Skip to content

Import genomic data to get a Pandas & Biopython hybrid with fancy shortcuts to make Machine Learning preprocessing easy!

License

Notifications You must be signed in to change notification settings

tmsincomb/SeqPandas

Repository files navigation

SeqPandas

Import genomic data to get a custom Pandas & Biopython hybrid class object with fancy shortcuts to make Machine Learning preprocessing easy!

Installation

pip install seqpandas

Usage

import seqpandas as spd

# Direct File Path
df = spd.read_seq('file.fasta', format='fasta')
df = spd.read_seq('file.sam', format='sam')
df = spd.read_vcf('file.vcf', format='vcf')
df = spd.read_bed('file.bed', format='bed')

# Just need BioPython Seqs? No problem!
seqrecords = spd.read('file.fasta', format='fasta')

# Already Opened BioPython Handle
from Bio import SeqIO
seqrecords = SeqIO.parse('file.fasta', format='fasta')
df = spd.BioDataFrame.from_seqrecords(seqrecords)

Tutorial

For a complete walkthrough and to use it for a machine learning pipeline please follow the tutorial notebook.

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

About

Import genomic data to get a Pandas & Biopython hybrid with fancy shortcuts to make Machine Learning preprocessing easy!

Resources

License

Stars

Watchers

Forks

Packages