"Argument isn't numeric in sort" in SAM_to_frag_coords.pl #1121

Closed
kubu4 opened this issue Feb 11, 2022 · 10 comments

kubu4 commented Feb 11, 2022

Hi, I'm running a genome-guided assembly and just happened to notice the following messages in the sorting process:

CMD: /gscratch/srlab/programs/trinityrnaseq-v2.9.0/util/support_scripts//SAM_to_frag_coords.pl --CPU 40 --sort_buffer 500G --sam 20210131-cvir-hisat2.sorted.bam.-.sam --min_insert_size 1 --max_insert_size 10000 
-extracting read coordinates from 20210131-cvir-hisat2.sorted.bam.-.sam into 20210131-cvir-hisat2.sorted.bam.-.sam.read_coords

-extracting read coordinates from 20210131-cvir-hisat2.sorted.bam.+.sam into 20210131-cvir-hisat2.sorted.bam.+.sam.read_coords

CMD: touch 20210131-cvir-hisat2.sorted.bam.-.sam.read_coords.ok
CMD: /usr/bin/sort --parallel=40 -S500G -T . -k1,1 -k2,2 -k4,4n 20210131-cvir-hisat2.sorted.bam.-.sam.read_coords > 20210131-cvir-hisat2.sorted.bam.-.sam.read_coords.sort_by_readname
CMD: cp /gscratch/scrubbed/samwhite/outputs/20220207_cvir_trinity-gg_adult-oa-gonad_assembly-1.0/trinity_out_dir/20210131-cvir-hisat2.sorted.bam.-.sam.read_coords /gscratch/scrubbed/samwhite/outputs/20220207_cvir_trinity-gg_adult-oa-gonad_assembly-1.0/trinity_out_dir/20210131-cvir-hisat2.sorted.bam.-.sam.read_coords.sort_by_readname
CMD: touch 20210131-cvir-hisat2.sorted.bam.+.sam.read_coords.ok
CMD: /usr/bin/sort --parallel=40 -S500G -T . -k1,1 -k2,2 -k4,4n 20210131-cvir-hisat2.sorted.bam.+.sam.read_coords > 20210131-cvir-hisat2.sorted.bam.+.sam.read_coords.sort_by_readname
Argument "5688205NC_035780.1" isn't numeric in sort at /gscratch/srlab/programs/trinityrnaseq-v2.9.0/util/support_scripts//SAM_to_frag_coords.pl line 208, <$fh> line 171500519.
Argument "A01343:9:H2NGWDSX2:1:1303:16622:12962" isn't numeric in sort at /gscratch/srlab/programs/trinityrnaseq-v2.9.0/util/support_scripts//SAM_to_frag_coords.pl line 208, <$fh> line 171500519.
CMD: cp /gscratch/scrubbed/samwhite/outputs/20220207_cvir_trinity-gg_adult-oa-gonad_assembly-1.0/trinity_out_dir/20210131-cvir-hisat2.sorted.bam.+.sam.read_coords /gscratch/scrubbed/samwhite/outputs/20220207_cvir_trinity-gg_adult-oa-gonad_assembly-1.0/trinity_out_dir/20210131-cvir-hisat2.sorted.bam.+.sam.read_coords.sort_by_readname
Argument "568NC_035780.1" isn't numeric in sort at /gscratch/srlab/programs/trinityrnaseq-v2.9.0/util/support_scripts//SAM_to_frag_coords.pl line 208, <$fh> line 111031627.
CMD: /usr/bin/sort --parallel=40 -S500G -T . -k1,1 -k3,3n 20210131-cvir-hisat2.sorted.bam.-.sam.frag_coords > 20210131-cvir-hisat2.sorted.bam.-.sam.frag_coords.coord_sorted
CMD: /gscratch/srlab/programs/trinityrnaseq-v2.9.0/util/support_scripts//fragment_coverage_writer.pl 20210131-cvir-hisat2.sorted.bam.-.sam.frag_coords > 20210131-cvir-hisat2.sorted.bam.-.sam.frag_coverage.wig

The command I issued to execute Trinity was:

# Run Trinity
## Running as "stranded" (--SS_lib_type)
${trinity_dir}/Trinity \
--genome_guided_bam ${sorted_bam} \
--genome_guided_max_intron ${max_intron} \
--seqType fq \
--SS_lib_type RF \
--max_memory ${max_mem} \
--CPU ${threads} \
--left "${R1_list}" \
--right "${R2_list}"

Is this something I need to be concerned about? I'm running this as part of a SLURM script that has set -e at the beginning, yet the message doesn't cause the script to exit. Does that suggest it's not a problem?

Anyway, just thought I'd check in and see if you happened to have any insight/thoughts on the matter.

As always, thanks for all the work you continue to do with Trinity!
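
On the set -e question above: set -e only reacts to a non-zero exit status, not to text written to stderr. Perl reports "isn't numeric" as a warning rather than a fatal error, so by default the script keeps running and can still exit with status 0. The sketch below illustrates that general behavior with a hypothetical stand-in command (it is not Trinity or SAM_to_frag_coords.pl); the function name run_with_errexit and the messages are made up for illustration.

```shell
# run_with_errexit STATUS
# Runs a tiny script under `set -e`. The stand-in "tool" prints a message
# to stderr and exits with STATUS; the echo only runs if set -e lets it.
run_with_errexit() {
    sh -c '
        set -e
        # Hypothetical tool: writes to stderr, then exits with status $1.
        sh -c "echo \"warning: value is not numeric\" >&2; exit $1"
        echo "next step ran"
    ' sh "$1" 2>/dev/null
}

# Case 1: warning on stderr, exit status 0 -> the script continues.
if warn_out=$(run_with_errexit 0); then warn_status=0; else warn_status=$?; fi

# Case 2: same message, exit status 1 -> set -e stops the script.
if fail_out=$(run_with_errexit 1); then fail_status=0; else fail_status=$?; fi

echo "warning only: status=$warn_status output='$warn_out'"
echo "real failure: status=$fail_status output='$fail_out'"
```

A run that continues past a warning therefore says nothing about whether the data are intact; it only means no command returned a failure status.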

brianjohnhaas (Member) commented Feb 12, 2022 via email

brianjohnhaas (Member) commented Feb 12, 2022 via email

kubu4 (Author) commented Feb 12, 2022

Thanks so much for the quick response!

Re-running with lower --max_memory 100G.

Getting the two requested files compressed and transferred to a server to host them. Will report back when I have URLs.

brianjohnhaas (Member) commented Feb 12, 2022 via email

kubu4 (Author) commented Feb 13, 2022

Okay, here they are (finally):

If you just want to grab the directory with the two files and the MD5 checksum file, here's that URL:

https://gannet.fish.washington.edu/Atumefaciens/20220212_trinity_error/
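
For anyone fetching the files, here is a minimal sketch of how a checksum file like this is typically produced and verified, assuming GNU coreutils md5sum is available. The file names are hypothetical placeholders, not the names in the directory above.

```shell
# Work in a throwaway directory so the demo touches nothing real.
workdir=$(mktemp -d)
printf 'readA\t1\tNC_035780.1\t100\n' > "$workdir/example.read_coords"

# Sender side: record the checksum next to the file.
( cd "$workdir" && md5sum example.read_coords > checksums.md5 )

# Receiver side: md5sum -c exits non-zero if any listed file does not match.
verify() {
    if ( cd "$1" && md5sum -c checksums.md5 >/dev/null 2>&1 ); then
        echo "OK"
    else
        echo "MISMATCH"
    fi
}

intact_result=$(verify "$workdir")

# Simulate corruption by appending stray bytes, then verify again.
printf 'garbage' >> "$workdir/example.read_coords"
corrupted_result=$(verify "$workdir")

echo "before corruption: $intact_result"
echo "after corruption:  $corrupted_result"
rm -rf "$workdir"
```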

brianjohnhaas (Member) commented Feb 13, 2022 via email

brianjohnhaas (Member) commented Feb 13, 2022 via email

kubu4 (Author) commented Feb 13, 2022

Sounds good. Thanks again for looking at this! I'll report back on the re-run in a day or two (that last assembly run started throwing the error around day 5 of the job).

kubu4 (Author) commented Feb 16, 2022

The re-run has successfully gotten past the point that triggered the error.

File corruption was the cause. I suspect the file was corrupted when the university's HPC cluster exceeded its disk quota at some point during the assembly the previous week, but I don't have any way to prove this.

Again, I really can't thank you enough for your help and quick responses. It is greatly appreciated!
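
One way to look for this kind of corruption, sketched under assumptions: the sort commands in the log use -k4,4n, so column 4 of the .read_coords files is expected to be numeric. The helper below (hypothetical name find_bad_coord_lines) lists lines where that column is not a plain integer. The sample rows are invented; only the fused value 568NC_035780.1 is taken from the warning in the log, and the real file layout may differ.

```shell
# find_bad_coord_lines FILE [COLUMN]
# Prints "line_number<TAB>value" for each line whose COLUMN (default 4)
# is not a plain non-negative integer. Lines with too few fields are
# reported too, because the missing field is empty.
find_bad_coord_lines() {
    awk -v col="${2:-4}" '$col !~ /^[0-9]+$/ { print NR "\t" $col }' "$1"
}

# Invented sample data: line 2 has a coordinate fused with a contig name,
# as could happen if a write was cut short and the next record followed.
sample=$(mktemp)
{
    printf 'readA\t1\tNC_035780.1\t100\n'
    printf 'readB\t2\tNC_035780.1\t568NC_035780.1\n'
    printf 'readC\t1\tNC_035780.1\t250\n'
} > "$sample"

bad_lines=$(find_bad_coord_lines "$sample")
echo "$bad_lines"
rm -f "$sample"
```

An empty result would only mean that column passed this one check; it would not rule out other kinds of damage.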

kubu4 closed this as completed Feb 16, 2022

brianjohnhaas (Member) commented Feb 16, 2022 via email