
Python API documentation (core)

genomepy

This page describes the core genomepy functionality. These classes and functions can be found on the top level of the genomepy module (e.g. genomepy.search), and are made available when running from genomepy import * (we won't judge you).

Additional functions that do not fit the core functionality, but we feel are still pretty cool, are also described.

Finding genomic data

When looking to download a new genome/gene annotation, your first step would be genomepy.search. This function checks either one provider or all of them. Advanced users may want to specify a provider for their search to speed up the process. To see which providers are available, use genomepy.list_providers or genomepy.list_online_providers:


list_providers

list_online_providers

search
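As a minimal sketch of a provider-restricted search, the helper below gathers the hits into a list. The name collect_hits and its search parameter are hypothetical (the latter only exists so the helper can be tried offline), and the sketch assumes that genomepy.search yields one row per matching genome and accepts a provider keyword, as described above.

```python
def collect_hits(term, provider=None, search=None):
    """Gather every search hit for `term` into a list.

    `search` is a hypothetical hook for testing; by default the real
    genomepy.search is used.
    """
    if search is None:
        # Imported lazily so this helper can be exercised without genomepy.
        import genomepy

        search = genomepy.search
    # provider=None is assumed to mean "check all providers".
    return list(search(term, provider=provider))


# Example (needs genomepy and an internet connection):
# for row in collect_hits("homo sapiens", provider="Ensembl"):
#     print(row)
```

Restricting the search to one provider should return fewer rows, and return them sooner, than checking every provider.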


If you have no idea what you are looking for, you could even check out all available genomes. Be warned, genomepy.list_available_genomes is like watching the Star Wars title crawl.


list_available_genomes


If we search for homo sapiens, for instance, we find that GRCh38.p13 and hg38 are the latest versions. Both names describe the same genome, but they are different assemblies, and they differ in several ways.

One of these differences is the quality of the gene annotation. We can inspect the annotations with genomepy.head_annotations:


head_annotations


Installing genomic data

Now that you have seen what's available, it's time to download a genome. The default parameters for genomepy.install_genome are optimized for sequence alignment and gene counting, but you have full control over them, so have a look!

genomepy won't overwrite any files you already downloaded (unless specified), and you can review your local genomes with genomepy.list_installed_genomes.


install_genome

list_installed_genomes
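The two functions can be combined, roughly like this. It is a sketch under assumptions: install_if_missing and its api parameter are hypothetical names, the genome is assumed to be installed under the same name it was requested with, and only the provider and annotation options of genomepy.install_genome are used.

```python
def install_if_missing(name, provider=None, annotation=False, api=None):
    """Install genome `name` unless it is already in the genomes directory.

    Returns True if an installation was started, False otherwise.
    `api` is a hypothetical hook for testing; it defaults to genomepy.
    """
    if api is None:
        # Imported lazily so this helper can be exercised without genomepy.
        import genomepy as api

    # Review the local genomes first.
    if name in api.list_installed_genomes():
        return False

    # annotation=True also asks for the gene annotation.
    api.install_genome(name, provider=provider, annotation=annotation)
    return True


# Example (needs genomepy and an internet connection):
# install_if_missing("sacCer3", provider="UCSC", annotation=True)
```

Since genomepy already skips files that exist, the explicit check is mostly useful when you want to know whether anything new was downloaded.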


If you want to download a sequence blacklist, or create an aligner index, you might wanna look at plugins! Don't worry, you can rerun the genomepy.install_genome command, and genomepy will only run the new parts.


manage_plugins


The genome and gene annotations were installed in the genomes directory (unless specified otherwise). If you have a specific location in mind, you could set this as default in the genomepy config. To find and inspect it, use genomepy.manage_config:


manage_config


Errors

Did something go wrong? Oh noes! If the problem persists, clear the genomepy cache with genomepy.clean, and try again.


clean


Using a genome

Alright, you've got the goods! You can browse the genome's sequences and metadata with the genomepy.Genome class. This class builds on the pyfaidx.Fasta class to also provide you with several options to get specific sequences from your genome, and save these to file.


Genome

Methods

Genome.close

Genome.get_random_sequences

Genome.get_seq

Genome.get_spliced_seq

Genome.items

Genome.keys

Genome.track2fasta

Genome.values

Attributes

Genome.gaps

Genome.plugin

Genome.sizes

Genome.genomes_dir

Genome.name

Genome.genome_file

Genome.genome_dir

Genome.index_file

Genome.sizes_file

Genome.gaps_file

Genome.annotation_gtf_file

Genome.annotation_bed_file

Genome.readme_file
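A small sketch of how the sizes attribute and the get_seq method might be used together. The helper name contig_starts is hypothetical, and the sketch assumes that Genome.sizes maps contig names to lengths and that get_seq takes 1-based, inclusive coordinates, as in pyfaidx.

```python
def contig_starts(genome, n=10):
    """Return {contig name: first n bases} for every contig in `genome`."""
    starts = {}
    for name, size in genome.sizes.items():
        # Coordinates are assumed to be 1-based and inclusive, as in pyfaidx.
        # Short contigs are clipped to their own length.
        end = min(n, int(size))
        starts[name] = str(genome.get_seq(name, 1, end))
    return starts


# Example (needs genomepy and an installed genome):
# import genomepy
# g = genomepy.Genome("hg38")
# print(contig_starts(g, n=20))
```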


You can obtain genomic sequences from a wide variety of inputs with as_seqdict. To use the function, it must be explicitly imported with from genomepy.seq import as_seqdict.


genomepy.seq.as_seqdict


A non-core function worth mentioning is genomepy.files.filter_fasta, for when you wish to filter a fasta file by chromosome name using regex, but want the output straight to (another) fasta file.


genomepy.files.filter_fasta
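To illustrate what filtering a fasta file by chromosome name amounts to, here is a standalone sketch that does not use genomepy. The function filter_fasta_sketch is hypothetical, its parameter names are illustrative, and the matching details of genomepy.files.filter_fasta itself may differ.

```python
import re


def filter_fasta_sketch(infa, outfa, regex=".*", invert_match=False):
    """Copy records from `infa` to `outfa` if their name matches `regex`.

    With invert_match=True, the records that do NOT match are kept.
    Returns the names of the records that were written.
    """
    pattern = re.compile(regex)
    kept = []
    keep = False
    with open(infa) as fin, open(outfa, "w") as fout:
        for line in fin:
            if line.startswith(">"):
                # The name is the first word after ">".
                parts = line[1:].split()
                name = parts[0] if parts else ""
                keep = bool(pattern.search(name)) != invert_match
                if keep:
                    kept.append(name)
            # Sequence lines follow the decision made for their header.
            if keep:
                fout.write(line)
    return kept


# Example: keep only the numbered chromosomes.
# filter_fasta_sketch("genome.fa", "filtered.fa", regex=r"^chr\d+$")
```

Writing line by line keeps memory use flat, which matters for genome-sized files.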


Using a gene annotation

Similarly, the genomepy.Annotation class helps you get the genes in check. This class returns a number of neat pandas dataframes, such as the named_gtf, or an annotation with the gene or chromosome names remapped to another type. Remapping gene names to another type is also possible with Annotation.map_genes. This feature also comes as a separate function, genomepy.query_mygene, as it's just so darn useful.


Annotation

query_mygene


Another non-core function worth mentioning is genomepy.annotation.filter_regex, which allows you to filter a dataframe by any column using regex.


genomepy.annotation.filter_regex