Merge pull request #1141 from tripal/1040-tv3-gff3_performance
Much improved GFF3 loader
spficklin committed Jan 4, 2021
2 parents 95e3d45 + a4be52d commit 35885f1
Showing 32 changed files with 5,002 additions and 1,937 deletions.
67 changes: 47 additions & 20 deletions docs/user_guide/example_genomics/genomes_genes.rst
@@ -39,27 +39,54 @@ You should see output similar to the following:

::

Tripal Job Launcher
Running as user 'administrator'
-------------------
2018-06-29 18:00:50: There are 1 jobs queued.
2018-06-29 18:00:50: Job ID 8.
2018-06-29 18:00:50: Calling: tripal_run_importer(12)

Running 'Chado GFF3 File Loader' importer
NOTE: Loading of file is performed using a database transaction.
If it fails or is terminated prematurely then all insertions and
updates are rolled back and will not be found in the database

Opening /var/www/html/sites/default/files/tripal/users/1/Citrus_sinensis-orange1.1g015632m.g.gff3
Percent complete: 100.00%. Memory: 32,211,360 bytes.
Adding protein sequences if CDS exist and no proteins in GFF...
Setting ranks of children...

Done.
2020-10-02 21:53:18
Tripal Job Launcher
Running as user 'admin'
-------------------
2020-10-02 21:53:18: There are 1 jobs queued.
2020-10-02 21:53:18: Job ID 1310.
2020-10-02 21:53:18: Calling: tripal_run_importer(123)

Running 'Chado GFF3 File Loader' importer
NOTE: Loading of file is performed using a database transaction.
If it fails or is terminated prematurely then all insertions and
updates are rolled back and will not be found in the database

Opening /var/www/html/sites/default/files/tripal/users/1/Citrus_sinensis-orange1.1g015632m.g.gff3
Opening temporary cache file: /tmp/TripalGFF3Import_aUgoru
Step 1 of 26: Caching GFF3 file...
Step 2 of 26: Find existing landmarks...
Step 3 of 26: Insert new landmarks (if needed)...
Step 4 of 26: Find missing proteins...
Step 5 of 26: Add missing proteins to list of features...
Step 6 of 26: Find existing features...
Step 7 of 26: Clear attributes of existing features...
Step 8 of 26: Processing 135 features...
Step 9 of 26: Get new feature IDs...
Step 10 of 26: Insert locations...
Step 11 of 26: Associate parents and children...
Step 12 of 26: Calculate child ranks...
Step 13 of 26: Add child-parent relationships...
Step 14 of 26: Insert properties...
Step 15 of 26: Find synonyms (aliases)...
Step 16 of 26: Insert new synonyms (aliases)...
Step 17 of 26: Insert feature synonyms (aliases)...
Step 18 of 26: Find cross references...
Step 19 of 26: Insert new cross references...
Step 20 of 26: Get new cross references IDs...
Step 21 of 26: Insert feature cross references...
Step 22 of 26: Insert feature ontology terms...
Step 23 of 26: Insert 'derives_from' relationships...
Step 24 of 26: Insert Targets...
Step 25 of 26: Associate features with analysis....
Step 26 of 26: Adding sequences data (Skipped: none available)...

Done.
Committing Transaction...

Remapping Chado Controlled vocabularies to Tripal Terms...
Done.

Remapping Chado Controlled vocabularies to Tripal Terms...
Done.

.. note::

3 changes: 1 addition & 2 deletions legacy/tripal_feature/tripal_feature.module
@@ -307,8 +307,7 @@ function tripal_feature_theme($existing, $type, $theme, $path) {
return $items;
}
/**
* Implements hook_job_describe_args() in order to describe the various feature jobs
* to the tripal jobs interface.
* Implements hook_job_describe_args()
*
* @ingroup tripal_legacy_feature
*/
3 changes: 0 additions & 3 deletions legacy/tripal_pub/tripal_pub.module
@@ -305,9 +305,6 @@ function tripal_pub_form_alter(&$form, &$form_state, $form_id) {
/**
* Implements hook_job_describe_args().
*
* @param $callback
* @param $args
*
* @ingroup tripal_legacy_pub
*/
function tripal_pub_job_describe_args($callback, $args) {
41 changes: 41 additions & 0 deletions tests/tripal_chado/data/empty_landmarks.fasta
@@ -0,0 +1,41 @@
>Contig10036

>Contig1

>Contig0

>Contig100

>Contig10022

>Contig10023

>Contig10035

>Contig1001

>Contig10012

>Contig1002

>Contig10026

>Contig10018

>Contig1003

>Contig10030

>Contig10

>Contig10011

>Contig10005

>Contig10002

>Contig1000

>Contig10000

>Contig10001
499 changes: 499 additions & 0 deletions tests/tripal_chado/data/gff_duplicate_ids.gff

Large diffs are not rendered by default.

4 changes: 4 additions & 0 deletions tests/tripal_chado/data/gff_invalidstartend.gff
@@ -0,0 +1,4 @@
##gff-version 3
Contig0 FRAEX38873_v2 gene 44054 16315 . + . ID=FRAEX38873_v2_000000010;Name=FRAEX38873_v2_000000010;biotype=protein_coding
Contig0 FRAEX38873_v2 mRNA 16315 44054 . + . ID=FRAEX38873_v2_000000010.1;Parent=FRAEX38873_v2_000000010;Name=FRAEX38873_v2_000000010.1;biotype=protein_coding;AED=0.05
Contig0 FRAEX38873_v2 polypeptide 16315 44054 . + . ID=FRAEX38873_v2_000000010.1.3_test_protein;Parent=FRAEX38873_v2_000000010.1
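
The `gff_invalidstartend.gff` fixture above has a `gene` row whose start (44054) is greater than its end (16315), which the GFF3 specification does not allow. As a rough illustration of the kind of check such a fixture exercises, here is a minimal Python sketch. It is not the Tripal loader's code (the loader is written in PHP), and the function name `check_coordinates` is hypothetical. It assumes the nine columns are tab-separated, as the GFF3 specification requires.

```python
def check_coordinates(line):
    """Return an error message for one GFF3 feature line, or None if start/end look valid.

    Hypothetical helper for illustration only; not part of Tripal.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return f"expected 9 tab-separated columns, found {len(cols)}"
    try:
        start, end = int(cols[3]), int(cols[4])
    except ValueError:
        return f"start and end must be integers, found '{cols[3]}' and '{cols[4]}'"
    if start < 1:
        return f"start ({start}) must be a positive 1-based coordinate"
    if start > end:
        return f"start ({start}) is greater than end ({end})"
    return None


# The gene row from the fixture, rebuilt with explicit tabs.
bad_gene = "\t".join(
    ["Contig0", "FRAEX38873_v2", "gene", "44054", "16315", ".", "+", ".",
     "ID=FRAEX38873_v2_000000010"]
)
print(check_coordinates(bad_gene))  # start (44054) is greater than end (16315)
```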
6 changes: 6 additions & 0 deletions tests/tripal_chado/data/gff_phase.gff
@@ -0,0 +1,6 @@
##gff-version 3
Contig0 FRAEX38873_v2 gene 16315 44054 . + . ID=FRAEX38873_v2_000000010;Name=FRAEX38873_v2_000000010;biotype=protein_coding
Contig0 FRAEX38873_v2 mRNA 16315 44054 . + . ID=FRAEX38873_v2_000000010.1;Parent=FRAEX38873_v2_000000010;Name=FRAEX38873_v2_000000010.1;biotype=protein_coding;AED=0.05
Contig0 FRAEX38873_v2 five_prime_UTR 16315 16557 . + . ID=FRAEX38873_v2_000000010.1.5utr1;Parent=FRAEX38873_v2_000000010.1
Contig0 FRAEX38873_v2 exon 16315 16967 . + . ID=FRAEX38873_v2_000000010.1.exon1;Parent=FRAEX38873_v2_000000010.1
Contig0 FRAEX38873_v2 CDS 16558 16967 . + 1 ID=FRAEX38873_v2_000000010.1.cds1;Parent=FRAEX38873_v2_000000010.1
6 changes: 6 additions & 0 deletions tests/tripal_chado/data/gff_phase_invalid_character.gff
@@ -0,0 +1,6 @@
##gff-version 3
Contig0 FRAEX38873_v2 gene 16315 44054 . + . ID=FRAEX38873_v2_000000010;Name=FRAEX38873_v2_000000010;biotype=protein_coding
Contig0 FRAEX38873_v2 mRNA 16315 44054 . + . ID=FRAEX38873_v2_000000010.1;Parent=FRAEX38873_v2_000000010;Name=FRAEX38873_v2_000000010.1;biotype=protein_coding;AED=0.05
Contig0 FRAEX38873_v2 five_prime_UTR 16315 16557 . + . ID=FRAEX38873_v2_000000010.1.5utr1;Parent=FRAEX38873_v2_000000010.1
Contig0 FRAEX38873_v2 exon 16315 16967 . + . ID=FRAEX38873_v2_000000010.1.exon1;Parent=FRAEX38873_v2_000000010.1
Contig0 FRAEX38873_v2 CDS 16558 16967 . + a ID=FRAEX38873_v2_000000010.1.cds1;Parent=FRAEX38873_v2_000000010.1
6 changes: 6 additions & 0 deletions tests/tripal_chado/data/gff_phase_invalid_number.gff
@@ -0,0 +1,6 @@
##gff-version 3
Contig0 FRAEX38873_v2 gene 16315 44054 . + . ID=FRAEX38873_v2_000000010;Name=FRAEX38873_v2_000000010;biotype=protein_coding
Contig0 FRAEX38873_v2 mRNA 16315 44054 . + . ID=FRAEX38873_v2_000000010.1;Parent=FRAEX38873_v2_000000010;Name=FRAEX38873_v2_000000010.1;biotype=protein_coding;AED=0.05
Contig0 FRAEX38873_v2 five_prime_UTR 16315 16557 . + . ID=FRAEX38873_v2_000000010.1.5utr1;Parent=FRAEX38873_v2_000000010.1
Contig0 FRAEX38873_v2 exon 16315 16967 . + . ID=FRAEX38873_v2_000000010.1.exon1;Parent=FRAEX38873_v2_000000010.1
Contig0 FRAEX38873_v2 CDS 16558 16967 . + 3 ID=FRAEX38873_v2_000000010.1.cds1;Parent=FRAEX38873_v2_000000010.1
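
The three phase fixtures differ only in the eighth column of the CDS row: `1` (valid), `a` (not a number), and `3` (out of range). The GFF3 specification requires CDS features to carry a phase of 0, 1, or 2. The Python sketch below shows one way to express that rule. It is an illustration under stated assumptions, not the loader's actual implementation, and `check_phase` is a hypothetical name.

```python
VALID_PHASES = {"0", "1", "2"}


def check_phase(feature_type, phase):
    """Return an error message for a GFF3 phase value, or None if it is acceptable.

    Hypothetical helper for illustration only; not part of Tripal.
    Assumption: non-CDS features may use "." or a valid phase.
    """
    if feature_type == "CDS":
        if phase not in VALID_PHASES:
            return f"CDS phase must be 0, 1, or 2; found '{phase}'"
        return None
    if phase != "." and phase not in VALID_PHASES:
        return f"phase must be '.', 0, 1, or 2; found '{phase}'"
    return None


# Phase values taken from the CDS rows of the three fixtures.
for value in ("1", "a", "3"):
    print(value, "->", check_phase("CDS", value))
```

Comparing against the strings `"0"`, `"1"`, and `"2"` rather than converting to an integer first means that a non-numeric value such as `a` and an out-of-range value such as `3` are rejected by the same test.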
