swalign
This package implements a Smith-Waterman style local alignment algorithm. It calculates a sequence alignment between a query sequence and a reference. Scoring can be based on a substitution matrix or on simple identity. Weights can be adjusted for match/mismatch and gaps, including gap extension penalties. Additionally, the gap penalty can be subject to a decay to prioritize long gaps over minor mismatches.
Input sequences can be given as FASTA format files or as strings.
It is a useful example of how one might write an aligner from scratch. Even though it is not optimized, it is widely used as a teaching tool or for quick alignment searches when building an index would be overkill.
The aligner can be used in a stand-alone mode (bin/swalign) or as an importable Python library.
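To make the local alignment idea described above concrete, here is a minimal from-scratch sketch of the Smith-Waterman scoring recurrence. It is not swalign's implementation: the function name `smith_waterman_score` is hypothetical, it uses only identity scoring, and it assumes a flat penalty per gap position (no separate extension penalty or decay).

```python
def smith_waterman_score(ref, query, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between ref and query.

    Illustrative only: identity scoring with a flat per-position gap
    penalty (an assumption; no gap extension penalty or decay).
    """
    rows, cols = len(query) + 1, len(ref) + 1
    # H[i][j] holds the best score of any local alignment that ends
    # at query[i-1] and ref[j-1]; row 0 and column 0 stay at zero.
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            same = query[i - 1] == ref[j - 1]
            diag = H[i - 1][j - 1] + (match if same else mismatch)
            up = H[i - 1][j] + gap      # query base aligned to a gap in ref
            left = H[i][j - 1] + gap    # ref base aligned to a gap in query
            # The floor at 0 is what makes the alignment local:
            # a poorly scoring prefix is dropped instead of carried along.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


print(smith_waterman_score('ACACACTA', 'AGCACACA'))
```

A full aligner would also trace back from the best-scoring cell to recover the aligned sequences; this sketch stops at the score to keep the recurrence easy to read.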
Installation
swalign is available on PyPI and can be installed with pip.
$ pip install swalign
Example
import swalign
# Set up your scoring matrix
# (this can also be read from a file like BLOSUM, etc)
#
# Or you can choose your own values.
# 2 and -1 are common for an identity matrix.
match = 2
mismatch = -1
scoring = swalign.NucleotideScoringMatrix(match, mismatch)
# This sets up the aligner object. You must set your scoring matrix, but
# you can also choose gap penalties, etc...
sw = swalign.LocalAlignment(scoring)
# Using your aligner object, calculate the alignment between
# ref (first) and query (second)
alignment = sw.align('ACACACTA','AGCACACA')
alignment.dump()
Results:
Query: 1 AGCACAC-A 8
         | ||||| |
Ref  : 1 A-CACACTA 8
Score: 12
Matches: 7 (77.8%)
Mismatches: 2
CIGAR: 1M1I5M1D1M
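The CIGAR string and match count in the results can be read directly off the two gapped lines. The sketch below shows one way to do that, assuming the usual convention that a gap in the reference is an insertion (I) and a gap in the query is a deletion (D). The helper names `cigar_from_alignment` and `count_matches` are hypothetical and are not part of the swalign API.

```python
from itertools import groupby


def cigar_from_alignment(query_aln, ref_aln):
    """Build a CIGAR string from two gapped, equal-length alignment rows."""
    if len(query_aln) != len(ref_aln):
        raise ValueError("aligned rows must have the same length")
    ops = []
    for q, r in zip(query_aln, ref_aln):
        if q == '-' and r == '-':
            raise ValueError("a column cannot be a gap in both rows")
        if r == '-':
            ops.append('I')   # base present in query only: insertion
        elif q == '-':
            ops.append('D')   # base present in ref only: deletion
        else:
            ops.append('M')   # aligned pair (match or mismatch)
    # Collapse runs of the same operation into <length><op> pairs.
    return ''.join(f"{len(list(run))}{op}" for op, run in groupby(ops))


def count_matches(query_aln, ref_aln):
    """Count columns where both rows carry the same base."""
    return sum(q == r and q != '-' for q, r in zip(query_aln, ref_aln))


query_aln, ref_aln = 'AGCACAC-A', 'A-CACACTA'
print(cigar_from_alignment(query_aln, ref_aln))
matches = count_matches(query_aln, ref_aln)
print(f"{matches} ({100 * matches / len(query_aln):.1f}%)")
```

For the alignment above, 7 of the 9 columns are identical, which is where the 77.8% figure comes from; the remaining 2 columns are the gap positions.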