This package implements a Smith-Waterman style local alignment algorithm. It works by calculating a sequence alignment between a query sequence and a reference. The scoring functions can be based on a matrix, or simple identity. Weights can be adjusted for match/mismatch and gaps, with gap extention penalties. Additionally, the gap penalty can be subject to a decay to prioritize long gaps over minor mismatches.
The input files are FASTA format files, or strings.
It is a useful example of how one might write an aligner from scratch. Even though it is not optimized, it is widely used as a teaching tool or for quick alignment searches when building an index would be overkill.
The aligner can be used in a stand-alone mode (
bin/swalign) or as an importable Python library.
swalign is available on PyPi and can be installed with pip.
$ pip install swalign
import swalign # Setup your scoring matrix # (this can also be read from a file like BLOSUM, etc) # # Or you can choose your own values. # 2 and -1 are common for an identity matrix. match = 2 mismatch = -1 scoring = swalign.NucleotideScoringMatrix(match, mismatch) # This sets up the aligner object. You must set your scoring matrix, but # you can also choose gap penalties, etc... sw = swalign.LocalAlignment(scoring) # Using your aligner object, calculate the alignment between # ref (first) and query (second) alignment = sw.align('ACACACTA','AGCACACA') alignment.dump()
Query: 1 AGCACAC-A 8 | ||||| | Ref : 1 A-CACACTA 8 Score: 12 Matches: 7 (77.8%) Mismatches: 2 CIGAR: 1M1I5M1D1M