PASTA

Pattern Analysis vs Sequence-based Tree Alignment

This Python 3 package was developed for aligning sequences derived from binary tree structures, specifically but not limited to neuronal trees. The package, and results from alignments produced with it, were first published in:

Gillette TA, Hosseini P, Ascoli GA (2015) Topological characterization of neuronal arbor morphology via sequence representation. II. Global alignment. BMC Bioinformatics (submitted).

The main files are:

spaghetti.py: Pairwise global alignment between sequences in a single fasta file, or between sequences in two separate fasta files. The alignment is based on the Needleman-Wunsch algorithm modified to respect three specific bifurcation types to which each character must correspond. Matching and gapping rules are modified based on these bifurcation types. A second mode enables conversion from raw scores to per-character scores based on the shorter sequence length.
penne.py: Multiple alignment utilizing the same underlying methods as in spaghetti. The process by default iterates using a position-specific score matrix (PSSM) based on the initial alignment.
ditalini.py: A small program which produces measures and statistics of a multiple alignment.
orzo.py: Extracts potential domains from a multiple alignment. This module was not used in the aforementioned paper and has not been thoroughly tested.
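
As a point of reference for the alignment that spaghetti.py builds on, the following is a minimal sketch of textbook Needleman-Wunsch global alignment scoring, followed by the per-character conversion described above (raw score divided by the shorter sequence length). It does not implement PASTA's bifurcation-type matching and gapping rules. The function names and the match, mismatch, and gap values are illustrative assumptions, not the package's API or defaults.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of strings a and b.

    Textbook Needleman-Wunsch with a linear gap penalty. The scoring
    values are illustrative assumptions, not PASTA's defaults.
    """
    # At the start of row i, prev[j] is the best score for a[:i-1] vs b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap in b
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


def per_character_score(raw_score, a, b):
    """Normalize a raw score by the length of the shorter sequence."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        raise ValueError("both sequences must be non-empty")
    return raw_score / shorter
```

For example, `per_character_score(needleman_wunsch_score(s1, s2), s1, s2)` makes scores comparable across sequence pairs of different lengths, which is the purpose of the second mode.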

Data-type files:

sequence.py: Contains classes NeuriteSequence, ConsensusSequence, and MultipleSequenceAlignment, and various methods for dealing with score matrices and tree-type character mapping.
matrix.py: Contains several classes for holding matrices and transposing indices. Also contains Iterative Proportional Fitting (IPF) logic designed to determine most significant domains using multiple separate metrics.

Heavy-lifting processing:

pairwise.py: Performs pairwise global alignment (local alignment code in progress), with wrappers for running multiple pairwise alignments given a set of queries and targets.
msa.py: Performs multiple sequence alignment, including iteration using PSSM.
domain.py: Analyzes a consensus sequence and extracts domains that satisfy user-provided arguments.
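
To illustrate the PSSM idea behind the iterative step in msa.py, here is a minimal sketch that derives per-column character frequencies from an existing alignment and scores a character against a column. The function names, the gap character, and the use of raw frequencies are assumptions made for illustration; PASTA's own PSSM may use different weighting or scoring.

```python
from collections import Counter


def build_pssm(aligned, gap="-"):
    """Return one {character: frequency} dict per alignment column.

    `aligned` is a list of equal-length strings. Gaps count toward the
    column total but get no entry of their own, so frequencies in a
    gappy column sum to less than 1. This convention is an assumption.
    """
    if not aligned:
        return []
    width = len(aligned[0])
    if any(len(row) != width for row in aligned):
        raise ValueError("aligned sequences must all have the same length")
    pssm = []
    for column in zip(*aligned):
        counts = Counter(ch for ch in column if ch != gap)
        pssm.append({ch: n / len(column) for ch, n in counts.items()})
    return pssm


def score_against_pssm(pssm, position, character):
    """Score a character against one column: its observed frequency."""
    return pssm[position].get(character, 0.0)
```

In an iterative scheme, column scores of this kind would replace the fixed character-to-character scores when sequences are realigned against the current alignment.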

Other processing:

buffer.py: A high-level module that writes out the results produced by analysis.
parameter.py: Handles and validates program arguments (argparse) and produces an InputStateWrapper object.
score_converter.py: Converts raw alignment scores to per-character scores.
validate_tree.py: Tests whether an input sequence is a valid tree sequence.
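
The kind of check validate_tree.py performs can be sketched as follows. This README does not spell out the character encoding, so the sketch assumes a hypothetical one: each character is one bifurcation listed in pre-order, with 'A' meaning both child branches bifurcate again, 'C' meaning one does, and 'T' meaning neither does. The function name and the mapping are illustrative, not taken from the package.

```python
# Assumed encoding (not confirmed by this README): how many child
# branches bifurcate again for each character.
BIFURCATING_CHILDREN = {"A": 2, "C": 1, "T": 0}


def is_valid_tree_sequence(sequence):
    """Return True if `sequence` describes exactly one complete binary tree."""
    if not sequence:
        return False
    open_slots = 1  # bifurcations still expected; the root is the first
    for ch in sequence:
        if ch not in BIFURCATING_CHILDREN:
            return False  # unknown character
        if open_slots == 0:
            return False  # tree already complete, extra characters follow
        # This character fills one slot and opens one per bifurcating child.
        open_slots += BIFURCATING_CHILDREN[ch] - 1
    return open_slots == 0
```

Under this assumed encoding, `"ATT"` and `"ACTT"` are valid, while `"A"` (unfinished subtrees) and `"TT"` (characters after the tree is complete) are not.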
