"Argument isn't numeric in sort" in SAM_to_frag_coords.pl #1121
Comments
hi,
looks like this is the issue:
/usr/bin/sort --parallel=40 -S500G -T . -k1,1 -k2,2 -k4,4n 20210131-cvir-hisat2.sorted.bam.+.sam.read_coords > 20210131-cvir-hisat2.sorted.bam.+.sam.read_coords.sort_by_readname
Argument "5688205NC_035780.1" isn't numeric in sort
which does suggest that something led to corrupt files here.
If you have a way to share
"20210131-cvir-hisat2.sorted.bam.+.sam.read_coords" gzip-compressed, I
could aim to take a look.
If you can also share: 20210131-cvir-hisat2.sorted.bam.+.sam
(gzipped), that would help too.
I understand this could be a challenge if the files are massive.
Since it's early in the process, you could try rerunning everything in
a new directory so it doesn't try to reuse any of the earlier inputs,
and see if it's reproducible. My guess is that it will be, given that the
error shows up for both strands.
Since this is the first report of this kind of error to my
knowledge/memory, it could be something specific to your data. If I
can access the data, I'll figure it out.
best,
~b
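One way to look for the kind of damage described above is to scan the read_coords file for lines whose sort-key column is not a plain integer. The following is a minimal sketch, not part of Trinity: it assumes the file is whitespace-delimited and that the fourth column should be numeric (as the -k4,4n sort key suggests). The function name, the default column numbers, and the sample lines are hypothetical.

```python
import re
from typing import Iterable, Iterator, Tuple


def find_malformed_lines(
    lines: Iterable[str],
    numeric_columns: Tuple[int, ...] = (4,),  # 1-based; assumption taken from -k4,4n
    min_fields: int = 4,                      # assumption: at least four columns
) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, reason) for each line that looks damaged."""
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) < min_fields:
            yield line_number, f"expected at least {min_fields} fields, found {len(fields)}"
            continue
        for column in numeric_columns:
            value = fields[column - 1]
            # A value such as "5688205NC_035780.1" fails this check.
            if not re.fullmatch(r"[0-9]+", value):
                yield line_number, f"column {column} is not numeric: {value!r}"
                break


if __name__ == "__main__":
    # Hypothetical sample lines; the real column layout may differ.
    sample = [
        "NC_035780.1 readA 1 100",
        "NC_035780.1 readB 2 5688205NC_035780.1",
        "NC_035780.1 readC",
    ]
    for number, reason in find_malformed_lines(sample):
        print(f"line {number}: {reason}")
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, so a very large file is read one line at a time rather than loaded into memory.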
…On Fri, Feb 11, 2022 at 1:55 PM kubu4 ***@***.***> wrote:
Hi, I'm running a genome-guided assembly and just happened to notice the
following messages in the sorting process:
CMD: /gscratch/srlab/programs/trinityrnaseq-v2.9.0/util/support_scripts//SAM_to_frag_coords.pl --CPU 40 --sort_buffer 500G --sam 20210131-cvir-hisat2.sorted.bam.-.sam --min_insert_size 1 --max_insert_size 10000
-extracting read coordinates from 20210131-cvir-hisat2.sorted.bam.-.sam into 20210131-cvir-hisat2.sorted.bam.-.sam.read_coords
-extracting read coordinates from 20210131-cvir-hisat2.sorted.bam.+.sam into 20210131-cvir-hisat2.sorted.bam.+.sam.read_coords
CMD: touch 20210131-cvir-hisat2.sorted.bam.-.sam.read_coords.ok
CMD: /usr/bin/sort --parallel=40 -S500G -T . -k1,1 -k2,2 -k4,4n 20210131-cvir-hisat2.sorted.bam.-.sam.read_coords > 20210131-cvir-hisat2.sorted.bam.-.sam.read_coords.sort_by_readname
CMD: cp /gscratch/scrubbed/samwhite/outputs/20220207_cvir_trinity-gg_adult-oa-gonad_assembly-1.0/trinity_out_dir/20210131-cvir-hisat2.sorted.bam.-.sam.read_coords /gscratch/scrubbed/samwhite/outputs/20220207_cvir_trinity-gg_adult-oa-gonad_assembly-1.0/trinity_out_dir/20210131-cvir-hisat2.sorted.bam.-.sam.read_coords.sort_by_readname
CMD: touch 20210131-cvir-hisat2.sorted.bam.+.sam.read_coords.ok
CMD: /usr/bin/sort --parallel=40 -S500G -T . -k1,1 -k2,2 -k4,4n 20210131-cvir-hisat2.sorted.bam.+.sam.read_coords > 20210131-cvir-hisat2.sorted.bam.+.sam.read_coords.sort_by_readname
Argument "5688205NC_035780.1" isn't numeric in sort at /gscratch/srlab/programs/trinityrnaseq-v2.9.0/util/support_scripts//SAM_to_frag_coords.pl line 208, <$fh> line 171500519.
Argument "A01343:9:H2NGWDSX2:1:1303:16622:12962" isn't numeric in sort at /gscratch/srlab/programs/trinityrnaseq-v2.9.0/util/support_scripts//SAM_to_frag_coords.pl line 208, <$fh> line 171500519.
CMD: cp /gscratch/scrubbed/samwhite/outputs/20220207_cvir_trinity-gg_adult-oa-gonad_assembly-1.0/trinity_out_dir/20210131-cvir-hisat2.sorted.bam.+.sam.read_coords /gscratch/scrubbed/samwhite/outputs/20220207_cvir_trinity-gg_adult-oa-gonad_assembly-1.0/trinity_out_dir/20210131-cvir-hisat2.sorted.bam.+.sam.read_coords.sort_by_readname
Argument "568NC_035780.1" isn't numeric in sort at /gscratch/srlab/programs/trinityrnaseq-v2.9.0/util/support_scripts//SAM_to_frag_coords.pl line 208, <$fh> line 111031627.
CMD: /usr/bin/sort --parallel=40 -S500G -T . -k1,1 -k3,3n 20210131-cvir-hisat2.sorted.bam.-.sam.frag_coords > 20210131-cvir-hisat2.sorted.bam.-.sam.frag_coords.coord_sorted
CMD: /gscratch/srlab/programs/trinityrnaseq-v2.9.0/util/support_scripts//fragment_coverage_writer.pl 20210131-cvir-hisat2.sorted.bam.-.sam.frag_coords > 20210131-cvir-hisat2.sorted.bam.-.sam.frag_coverage.wig
The command I issued to execute Trinity was:
# Run Trinity
## Running as "stranded" (--SS_lib_type)
${trinity_dir}/Trinity \
--genome_guided_bam ${sorted_bam} \
--genome_guided_max_intron ${max_intron} \
--seqType fq \
--SS_lib_type RF \
--max_memory ${max_mem} \
--CPU ${threads} \
--left "${R1_list}" \
--right "${R2_list}"
Is the error something I need to be concerned about? I'm running this as
part of a SLURM script which has the set -e command at the beginning of
the script, yet this message isn't causing the script to exit. Does that suggest it's
not a problem?
Anyway, just thought I'd check in and see if you happened to have any
insight/thoughts on the matter.
As always, thanks for all the work you continue to do with Trinity!
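On the set -e question: set -e reacts to a command's non-zero exit status, not to text written to stderr, and Perl's "isn't numeric" message is a warning rather than a fatal error. The sketch below illustrates that distinction with two stand-in child processes. It assumes (the log is consistent with this, but does not prove it) that the Perl script kept running and returned a zero status. All names here are hypothetical.

```python
import subprocess
import sys

# Stand-in for a script that prints a warning on stderr but finishes normally.
WARN_ONLY = 'import sys; print("warning: value is not numeric", file=sys.stderr)'
# Stand-in for a script that reports a fatal error and exits with status 1.
FAILS = 'import sys; print("fatal: giving up", file=sys.stderr); sys.exit(1)'


def run_child(code: str) -> subprocess.CompletedProcess:
    """Run a short Python snippet as a child process and capture its output."""
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )


def would_stop_under_set_e(result: subprocess.CompletedProcess) -> bool:
    """Only the exit status matters here; stderr content is ignored."""
    return result.returncode != 0


if __name__ == "__main__":
    for label, code in (("warning only", WARN_ONLY), ("fatal", FAILS)):
        result = run_child(code)
        print(label, "exit status:", result.returncode,
              "stops under set -e:", would_stop_under_set_e(result))
```

In other words, a script continuing past a warning says nothing about whether the warning is harmless; it only shows that no command returned a failing status.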
--
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas <http://broad.mit.edu/~bhaas>
|
Also, when you run, try just giving it --max_memory 100G. Going way higher
could cause issues with some of the downstream steps.
best,
~b
|
Thanks so much for the quick response! Re-running with lower --max_memory 100G. Getting the two requested files compressed and transferred to a server to host them. Will report back when I have URLs. |
perfect!
|
Okay, here they are (finally):
- 20210131-cvir-hisat2.sorted.bam.+.sam.gz (46G)
  <https://gannet.fish.washington.edu/Atumefaciens/20220212_trinity_error/20210131-cvir-hisat2.sorted.bam.+.sam.gz>
  - MD5: 95fd59802b501f0948dc7a46fb5cfd25
- 20210131-cvir-hisat2.sorted.bam.+.sam.read_coords.gz (11G)
  <https://gannet.fish.washington.edu/Atumefaciens/20220212_trinity_error/20210131-cvir-hisat2.sorted.bam.+.sam.read_coords.gz>
  - MD5: aa8ad8fd4eff1c05740298f6493ac2df
If you just want to grab the directory with the two files and the MD5 checksum file, here's that URL: https://gannet.fish.washington.edu/Atumefaciens/20220212_trinity_error/ |
Thanks for sharing the files! It does appear that the read_coords file has
somehow become corrupt.
Let's see whether the re-run in a new workspace still has trouble; if so,
we'll dig further.
best,
~b
On Sun, Feb 13, 2022 at 10:34 AM Brian Haas ***@***.***> wrote:
Thanks! downloading now
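The MD5 checksums posted with the two files can be used to confirm that a download arrived intact before any debugging starts. A minimal sketch, assuming the files are saved locally under the same names; the function names and chunk size are illustrative choices, and the file is read in chunks because the archives are tens of gigabytes.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_md5: str) -> bool:
    """Compare a file's digest with a published checksum, ignoring case and padding."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, `matches_expected("20210131-cvir-hisat2.sorted.bam.+.sam.read_coords.gz", "aa8ad8fd4eff1c05740298f6493ac2df")` would report whether the local copy matches the checksum given in the thread.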
|
Sounds good. Thanks again for looking at this! I'll report back on the re-run in a day or two (that last assembly run started throwing the error around day 5 of the job). |
The re-run of the job has successfully gotten past the point that triggered the error. File corruption was the cause (I suspect the file was corrupted when the university's HPC cluster exceeded its disk quota at some point during assembly the previous week, but I don't have any way to prove this). Again, I really can't thank you enough for your help and quick responses. It is greatly appreciated! |
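If running out of disk space is the suspected trigger, a pre-flight check before launching a long job is one possible safeguard. The sketch below is an assumption-laden illustration, not something from the thread: it only inspects the free space the filesystem reports, which may not reflect a per-user or per-group quota on an HPC system, and the function name and threshold are hypothetical.

```python
import shutil


def has_free_space(path: str, required_bytes: int) -> bool:
    """Return True if the filesystem holding `path` reports enough free space.

    Note: this reads filesystem-level numbers only; a quota enforced by the
    cluster may be lower and is not visible here.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return usage.free >= required_bytes


if __name__ == "__main__":
    # Hypothetical threshold: require 1 TiB free in the current directory.
    needed = 1024 ** 4
    print("enough space:", has_free_space(".", needed))
```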
Glad to hear it!
|