In silico normalization problems with Trinityv2.4.0 #481

Closed
UlisesArcos opened this issue May 28, 2018 · 22 comments

Comments

@UlisesArcos

I am trying to use Trinity (trinityrnaseq) to practice bioinformatics on a single-end transcriptome, but when I run Trinity I get this error:

-sorting each stats file by read name.
Thread 1 terminated abnormally: Error, not all specified records have been retrieved (missing 600) from /home/manager/Transcriptoma/Jatropha/Totrans/Root0Trimm.fastq /home/manager/Transcriptoma/Jatropha/Totrans/Root001Trimm.fastq /home/manager/Transcriptoma/Jatropha/Totrans/Root100Trimm.fastq, see file: /home/manager/Transcriptoma/Jatropha/Totrans/Trinity_Jatropha/insilico_read_normalization/Root0Trimm.fastq_ext_all_reads.normalized_K25_C50_pctSD10000.fa.missing_accs for list of missing entries at /home/manager/Downloads/trinityrnaseq-Trinity-v2.4.0/util/insilico_read_normalization.pl line 548.
Error encountered with thread.
Error, at least one thread died at /home/manager/Downloads/trinityrnaseq-Trinity-v2.4.0/util/insilico_read_normalization.pl line 419.
Error, cmd: /home/manager/Downloads/trinityrnaseq-Trinity-v2.4.0/util/insilico_read_normalization.pl --seqType fa --JM 50G --max_cov 50 --CPU 8 --output /home/manager/Transcriptoma/Jatropha/Totrans/Trinity_Jatropha/insilico_read_normalization --max_pct_stdev 10000 --SS_lib_type F --single /home/manager/Transcriptoma/Jatropha/Totrans/Root0Trimm.fastq,/home/manager/Transcriptoma/Jatropha/Totrans/Root001Trimm.fastq,/home/manager/Transcriptoma/Jatropha/Totrans/Root100Trimm.fastq died with ret 7424 at /home/manager/Downloads/trinityrnaseq-Trinity-v2.4.0/Trinity line 2462.

I have tried to fix the problem but have not been able to.

Could someone help me?
Many Thanks!

@UlisesArcos changed the title from "In silico normalization problems" to "In silico normalization problems with Trinityv2.4.0" on May 29, 2018
@brianjohnhaas
Member

brianjohnhaas commented May 29, 2018 via email

@UlisesArcos
Author

UlisesArcos commented May 29, 2018

I use a file named Root0Trimm.fastq as input, so the file name in the Trinity output is correct.

I still cannot get Trinity to work. Is there something I can do with the file Root0Trimm.fastq_ext_all_reads.normalized_K25_C50_pctSD10000.fa.missing_accs?

@brianjohnhaas
Member

Closing old posts. If this is still an active issue, we can continue to explore it.

@izabelcavassim

I am getting a similar problem. Could you post the solution? Does it have anything to do with the FASTQ format?

@brianjohnhaas
Member

It usually has to do with the formatting of the input fastq records.

Can you show the top of your files? (i.e., head *fastq)
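
To make that check easier to read across several files, here is a small sketch. The function name `show_fastq_headers` is invented for this example and is not part of Trinity. It prints the first few read headers of a FASTQ file (plain or gzipped) and marks any header that contains whitespace, which is the pattern that turns out to matter later in this thread.

```bash
# show_fastq_headers FILE [N]
# Hypothetical helper: print the first N (default 3) read headers of FILE
# and flag each one as HAS_SPACE or ok.
show_fastq_headers() {
    case $1 in
        *.gz) gzip -dc "$1" ;;   # decompress gzipped input on the fly
        *)    cat "$1" ;;
    esac | awk -v n="${2:-3}" '
        NR % 4 == 1 {            # the header is line 1 of every 4-line record
            flag = ($0 ~ /[ \t]/) ? "HAS_SPACE" : "ok"
            print flag "\t" $0
            if (++seen >= n) exit
        }'
}

# Usage (file name taken from the posts in this thread):
#   show_fastq_headers Sample_A01_S1_R2_001.fastq 5
```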

@brianjohnhaas reopened this Jun 10, 2019
@izabelcavassim

izabelcavassim commented Jun 10, 2019

Thanks Brian for your quick response,

My data was generated by NextSeq sequencing; I did not extract it from SRA.
I decided not to run cutadapt outside of Trinity and instead let Trinity use Trimmomatic for trimming.

I am using the latest version of Trinity (v2.8.5).

The head of my file looks like:

[mc4719@shake Combined_lanes]$ head Sample_A01_S1_R2_001.fastq
@NB551405:99:HKCG5AFXY:1:11101:19526:1106 2:N:0:CCGCGGTT+NTAGCGCT
CAAATCTTCTAGGTGACCCAGAAAACTTTACACCAGCCAACCCACTAGTTACCCCTCCCCACATTAAACCAGAGTGATACTTCTTATTCGCCTATGCTATCCTACGATCAATCCCAAACAAACTTGGAGGCGTACTAGCCCTCCTACTTT
+
AAAAAEEEEEEEEEEAEEEEEEEEEEEEEEE/EEEEEE</EEEEEEEEEE/EEEEEEEEEEEEEEEEEEEEEEEEEEAEAEEEEEEEEEEAEEEEEEEEEEEEEEAEAEEEEEEEEEEEEAEEEEEEEAEEEEEEEA/EEEEEEEEEEEE
@NB551405:99:HKCG5AFXY:1:11101:14384:1106 2:N:0:CCGCGGTT+NTAGCGCT
AATAAATATGTCATAACTTCACGCCGAGGTCGACAGGTCCTTACAGTTAAGGATGTTGCTAAAGAAGACCAAGGAGAATATAGCTTTGTGGCAGATGGGAAAAAGACCTCCTGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTCTAG
+
AAA6AEEEEEEEEEEEEEEE<EEEEEEEEEEEEEEEEEE/EEEEEEEEEEEEEEEEEEEEEEEEEEE/EEEEEEEEEAEEEEEEEEEEEEEEEA<EAEE/EEEEEEEEEEEEEE//EEEAEEE/EEEEEE<AAEAEAE/EEEEA/EE/</

I tried modifying the read headers to mark forward and reverse reads, but I still get an error:

[mc4719@shake Combined_lanes]$ head trinity_Sample_A01_S1_R2_001.fastq
@NB551405:99:HKCG5AFXY:1:11101:19526:1106_2:N:0:CCGCGGTT+NTAGCGCT/2
CAAATCTTCTAGGTGACCCAGAAAACTTTACACCAGCCAACCCACTAGTTACCCCTCCCCACATTAAACCAGAGTGATACTTCTTATTCGCCTATGCTATCCTACGATCAATCCCAAACAAACTTGGAGGCGTACTAGCCCTCCTACTTT
+
AAAAAEEEEEEEEEEAEEEEEEEEEEEEEEE/EEEEEE</EEEEEEEEEE/EEEEEEEEEEEEEEEEEEEEEEEEEEAEAEEEEEEEEEEAEEEEEEEEEEEEEEAEAEEEEEEEEEEEEAEEEEEEEAEEEEEEEA/EEEEEEEEEEEE
@NB551405:99:HKCG5AFXY:1:11101:14384:1106_2:N:0:CCGCGGTT+NTAGCGCT/2
AATAAATATGTCATAACTTCACGCCGAGGTCGACAGGTCCTTACAGTTAAGGATGTTGCTAAAGAAGACCAAGGAGAATATAGCTTTGTGGCAGATGGGAAAAAGACCTCCTGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTCTAG
+
AAA6AEEEEEEEEEEEEEEE<EEEEEEEEEEEEEEEEEE/EEEEEEEEEEEEEEEEEEEEEEEEEEE/EEEEEEEEEAEEEEEEEEEEEEEEEA<EAEE/EEEEEEEEEEEEEE//EEEAEEE/EEEEEE<AAEAEAE/EEEEA/EE/</
@NB551405:99:HKCG5AFXY:1:11101:5101:1107_2:N:0:CCGCGGTT+NTAGCGCT/2
CCTCCATCCAAGCCCTCCCTGTGACAGATGTCCATCCTCTGTTTAGAAGGAATCCTAGCATCCTTGGCCTTTTCTGGCCATGCGCTCTTCCCTTCCATGCCAGGAGCAGCTCCGCCGGGCCTTTCGGTGCCAATGGGATCTGCCTTAATC

@brianjohnhaas
Member

I see the problem: it is the space in the original read definition line:
@NB551405:99:HKCG5AFXY:1:11101:19526:1106 2:N:0:CCGCGGTT+NTAGCGCT

Try making this line look like this:
@NB551405:99:HKCG5AFXY:1:11101:19526:1106/2

instead of:
@NB551405:99:HKCG5AFXY:1:11101:19526:1106_2:N:0:CCGCGGTT+NTAGCGCT/2

and likewise for the /1 entries.

After this, try running Trinity in a new, clean workspace so that it doesn't try to reuse any intermediate outputs from the earlier run.

best,

~b
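
The header change described above can be sketched as a single pass over the file. This is only an illustration: the function name `fix_fastq_headers` is made up, and the input is assumed to be standard 4-line FASTQ records. It picks out headers by line position rather than by a leading `@`, because a quality string can also begin with `@`.

```bash
# fix_fastq_headers MATE < in.fastq > out.fastq
# Hypothetical helper. MATE is 1 for forward reads and 2 for reverse reads.
fix_fastq_headers() {
    awk -v mate="$1" '
        NR % 4 == 1 {
            sub(/[ \t].*$/, "")   # drop the first space and everything after it
            sub(/\/[12]$/, "")    # do not double an existing /1 or /2 suffix
            print $0 "/" mate
            next
        }
        { print }                 # sequence, "+", and quality lines pass through
    '
}

# Usage (file names are placeholders):
#   gzip -dc reads_R1.fastq.gz | fix_fastq_headers 1 > fixed_R1.fastq
#   gzip -dc reads_R2.fastq.gz | fix_fastq_headers 2 > fixed_R2.fastq
```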

@izabelcavassim

izabelcavassim commented Jun 10, 2019

Hi, it seems to be working now. Here is the solution for my specific headers (forward reads as the example):

If the files are .gz:

  1. Delete everything after the space in the header:
    for i in *R1_001.fastq.gz; do zcat "$i" | sed '/^@/ s/ .*//' > "trinity_1_${i%.gz}"; done

  2. Append "/1" to the end of the headers:
    for i in trinity_1_*; do awk '{ if (NR%4==1) { print $1"/1" } else { print } }' "$i" > "new_$i"; done

For the reverse reads, run step 1 on the R2 files (writing to trinity_2_ files) and change the second step to print "/2" instead of "/1":
for i in trinity_2_*; do awk '{ if (NR%4==1) { print $1"/2" } else { print } }' "$i" > "new_$i"; done

Thanks!
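
After rewriting headers this way, it may be worth confirming that every header came out as intended before starting a long Trinity run. The following is a rough sketch with an invented function name, `count_bad_headers`; it counts headers that still contain whitespace or that do not end in the expected `/1` or `/2`.

```bash
# count_bad_headers MATE < reads.fastq
# Hypothetical helper: print how many read headers contain whitespace
# or do not end in "/MATE". A result of 0 means all headers look as expected.
count_bad_headers() {
    awk -v suffix="/$1" '
        NR % 4 == 1 {
            tail = substr($0, length($0) - length(suffix) + 1)
            if ($0 ~ /[ \t]/ || tail != suffix) bad++
        }
        END { print bad + 0 }     # "+ 0" prints 0 when no bad header was seen
    '
}

# Usage (file name is a placeholder):
#   count_bad_headers 1 < fixed_R1.fastq
```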

@isgilman

Hello Brian, do lines beginning with +A also need to be amended? For example, would the following be a valid fastq read?

@A00127:170:H2H3NDSXY:1:1101:22146:1235/1
ATTTGACTCAGAATCTGTGAGGCTCTCTTGATCGAATCAGCACTCAACAATTTCCCGGTTTCACTTTTGTCCAAAAGGGCTGGATGATAACAAAGTGCAAG
+A00127:170:H2H3NDSXY:1:1101:22146:1235 1:N:0:GGTAGAATTA+TGTAAGGTGG
FFFFFF:FFFFFF,F,,F,,F,:,,FFFFFFFFFFF,:F:,FF,,FF:FFF,,F,,,,F,FFFFF::F::F:FFFFFF,:F,,FF,F:,,FFFFF:F:FFF

Thanks,
Ian
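
For context, the FASTQ format allows the `+` separator line to optionally repeat the read identifier, so the record above is well-formed FASTQ. Whether Trinity's normalization step is sensitive to that line is not answered in the visible part of this thread. If in doubt, the repeated identifier can be dropped so that the separator is a bare `+`; a minimal sketch, with a made-up function name and assuming 4-line records:

```bash
# strip_plus_line_ids < in.fastq > out.fastq
# Hypothetical helper: replace the third line of every record with a bare "+".
strip_plus_line_ids() {
    awk 'NR % 4 == 3 { print "+"; next } { print }'
}
```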

@brianjohnhaas
Member

brianjohnhaas commented Mar 17, 2020 via email

@peanut-buddy

Hello!

I'm having a similar issue and looked into my trimmed reads. I just wanted to verify...

My read headers (for forward reads) look like this:
@a00920:888:HWTTMDSX2:4:1101:2627:1000 2:N:0:ACAATCCGTG+GGATTCTGTC
TAACTATCCTATCTTCTGATGACAGTTTAGCTCTTCAGAATCAAGAAACGCTTCTTAAGCTGAAACATCCTAAACCATCTAGATCTTTATCATTTCCTGAAACAATAACTTCTTTTACTGAAGTTTTGTTAGTCAAGTGAAGATGACGT
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:F:FFF:FFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFF:,FFFFFFFFFFF:
@a00920:888:HWTTMDSX2:4:1101:12192:1000 2:N:0:ACAATCCGTG+GGATTCTGTC
TTTTAAGTTAAAAGGCTTAAATAGGTATAAAAGACGAGAAGACCCCATAGAGTTTAATTTTTTATTTTATTTTAGAATAATTTTATAACCAAAAAATTGGTTGGGGTAACTTAAAGATAAAATAAATTCTTTAAATGTGTAATCATTGAT
+

So should I remove the space to make the headers look like this?
@a00920:888:HWTTMDSX2:4:1101:2627:1000/1

@a00920:888:HWTTMDSX2:4:1101:12192:1000/1

Thank you and Happy New Year!

Margot

@brianjohnhaas
Member

brianjohnhaas commented Jan 5, 2023 via email

@peanut-buddy

Thanks for pointing this out, I'll look into the raw reads.

Once this issue is corrected, would that change (removing the space and adding /1 or /2 as a suffix) be the correct way to address this?

Thanks again!!

@brianjohnhaas
Member

brianjohnhaas commented Jan 5, 2023 via email

@peanut-buddy

Hi Brian,

Okay, good news: I checked the read files and I had accidentally copied and pasted a reverse read earlier. The forward reads are labeled with 1:N:0.

Bad news...I'm not sure why I'm getting this error if the headers look fine.

Here is an example error:

Thread 6 terminated abnormally: Error, not all specified records have been retrieved (missing 15054418) from /PR09I1_2_val_2.fq.gz, see file: /trinity_out_dir/insilico_read_normalization/PR09I1_2_val_2.fq.gz.normalized_K25_maxC200_minC1_pctSD10000.fq.missing_accs for list of missing entries at /usr/local/bin/trinityrnaseq/util/insilico_read_normalization.pl line 552.
Thread 5 terminated abnormally: Error, not all specified records have been retrieved (missing 15054418) from /PR09I1_1_val_1.fq.gz, see file: /trinity_out_dir/insilico_read_normalization/PR09I1_1_val_1.fq.gz.normalized_K25_maxC200_minC1_pctSD10000.fq.missing_accs for list of missing entries at /usr/local/bin/trinityrnaseq/util/insilico_read_normalization.pl line 552.
Error encountered with thread.
Error encountered with thread.
Error, at least one thread died at /usr/local/bin/trinityrnaseq/util/insilico_read_normalization.pl line 423.

Do you have any suggestions for how to fix this? Thank you for your help, I really appreciate it!

Margot

@brianjohnhaas
Member

brianjohnhaas commented Jan 5, 2023 via email

@peanut-buddy

Here are the top entries for: /trinity_out_dir/insilico_read_normalization/PR09I1_2_val_2.fq.gz.normalized_K25_maxC200_minC1_pctSD10000.fq.missing_accs

A00877:895:HWWVNDSX2:3:2225:9959:20995
A00253:590:HYCJ2DSX2:1:2402:2808:24518
A00253:581:HYM7GDSX2:1:1402:3206:5196
A00920:888:HWTTMDSX2:4:1563:28980:11819
A00917:820:HYMNJDSX2:3:1102:4426:34491
A00877:895:HWWVNDSX2:3:2242:15393:19038
A00253:581:HYM7GDSX2:1:2353:20437:15687
A00920:888:HWTTMDSX2:4:2446:15067:9455
A00253:590:HYCJ2DSX2:1:1443:9417:8625

It doesn't seem to find the second file. I wonder if it hasn't been created yet?

Thanks!

@brianjohnhaas
Member

brianjohnhaas commented Jan 5, 2023 via email

@peanut-buddy

peanut-buddy commented Jan 5, 2023

Hi Brian,
I just grepped the trimmed and raw read files and didn't find the entries I posted for /trinity_out_dir/insilico_read_normalization/PR09I1_2_val_2.fq.gz.normalized_K25_maxC200_minC1_pctSD10000.fq.missing_accs

It seems like these reads are missing--what would you recommend next? EDIT: Sorry, I will try re-running in a new workspace and let you know how it turns out...thanks so much for your help.

Thank you!!
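
The check described above (looking for the listed accessions in the read files) can be scripted roughly as follows. This is a sketch under assumptions: `count_accs_found` is an invented name, and the accession list is taken to hold one read name per line without the leading `@` or a `/1` or `/2` suffix, as in the entries posted earlier.

```bash
# count_accs_found ACCS_FILE FASTQ
# Hypothetical helper: print how many distinct accessions from ACCS_FILE
# occur as read names in FASTQ (plain or gzipped).
count_accs_found() {
    case $2 in
        *.gz) gzip -dc "$2" ;;
        *)    cat "$2" ;;
    esac | awk -v accs="$1" '
        BEGIN { while ((getline line < accs) > 0) want[line] = 1 }
        NR % 4 == 1 {
            name = substr($1, 2)       # drop "@", keep text before the first space
            sub(/\/[12]$/, "", name)   # ignore a trailing /1 or /2
            if (name in want) { found++; delete want[name] }
        }
        END { print found + 0 }'
}

# Usage (paths as posted in this thread):
#   count_accs_found PR09I1_2_val_2.fq.gz.normalized_K25_maxC200_minC1_pctSD10000.fq.missing_accs PR09I1_2_val_2.fq.gz
```

A count of 0 would be consistent with what is reported here: the listed reads are not in the input at all.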

@brianjohnhaas
Member

brianjohnhaas commented Jan 5, 2023 via email

@peanut-buddy

Hi Brian, the jobs have finished. I just needed to run them in a new directory. Thanks so much for your help!!!!
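
One way to avoid reusing an old output directory by accident is to pick an unused directory name for every run. A minimal sketch, assuming nothing about Trinity beyond its `--output` option; the helper name `fresh_outdir` and the file names in the usage comment are placeholders, and the other flags should be checked against `Trinity --help` for your version.

```bash
# fresh_outdir BASE
# Hypothetical helper: print BASE if nothing by that name exists yet,
# otherwise the first unused name among BASE_1, BASE_2, ...
fresh_outdir() {
    candidate=$1
    i=0
    while [ -e "$candidate" ]; do
        i=$((i + 1))
        candidate="${1}_${i}"
    done
    printf '%s\n' "$candidate"
}

# Usage (placeholder read files):
#   out=$(fresh_outdir trinity_out_dir)
#   Trinity --seqType fq --left reads_1.fq.gz --right reads_2.fq.gz \
#           --max_memory 50G --CPU 8 --output "$out"
```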

@brianjohnhaas
Member

brianjohnhaas commented Jan 9, 2023 via email
