Regarding error related to fastq files #64

chanchalrana · 2024-05-07T12:07:35Z

Hello!!
I am running args_oap on fastq files, and in stage one I get the following error:

Kindly tell me the solution for this. I am getting it for all the files.

chanchalrana · 2024-05-08T05:00:39Z

With the new version of ARGS-OAP, I get the following error:

It would be great if you can help me out with this issue.

xinehc · 2024-05-08T13:32:34Z

Your fastq file is truncated/incomplete somehow. Did you by any chance concatenate a fastq with a fasta file?

The + line is missing at line 29480427; please double-check the file.

chanchalrana · 2024-05-09T05:15:56Z

The file is a metagenomic fastq file in gzip format. I checked, and there is a + sign present at line 29480427. These are infant stool samples. I cannot figure out the problem.
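To look at the exact record around a reported line, it can help to print the whole four-line record that contains it rather than the single line. The following is a minimal sketch, assuming standard four-line FASTQ records (no wrapped sequences); the function name show_record is hypothetical and not part of args_oap or seqkit. Under that assumption a + line always sits at a line number that leaves remainder 3 when divided by 4, which is the case for 29480427.

```shell
# Print the 4-line FASTQ record that contains a given line number.
# gzip -cdf decompresses gzipped input and passes plain text through unchanged,
# so the same call works whether or not the file is really compressed.
show_record() {
    local file=$1 line=$2
    # First line of the record containing $line (records are 4 lines long).
    local start=$(( (line - 1) / 4 * 4 + 1 ))
    gzip -cdf "$file" | sed -n "${start},$(( start + 3 ))p;$(( start + 3 ))q"
}
```

For example, show_record input/sample.fq.gz 29480427 (with a placeholder file name) would print lines 29480425 to 29480428, so the header, sequence, + line and quality line can be checked together.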

xinehc · 2024-05-09T05:47:10Z

I would recommend running seqkit sana on your fastq file to see whether there are malformed records.

Reference: https://bioinf.shenwei.me/seqkit/usage/#sana
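As a rough complement to seqkit sana, the basic record layout can also be checked with standard tools. This is only a sketch under the assumption of four-line FASTQ records; the function name fastq_check is hypothetical, and it is not a replacement for seqkit's own validation.

```shell
# Rough structural check of a 4-line-per-record FASTQ file (gzipped or plain).
# Stops at the first problem, prints where it is, and returns a non-zero status.
fastq_check() {
    gzip -cdf "$1" | awk '
        BEGIN { bad = 0 }
        NR % 4 == 1 && substr($0, 1, 1) != "@" {
            printf "line %d: header does not start with @\n", NR; bad = 1; exit
        }
        NR % 4 == 2 { seqlen = length($0) }
        NR % 4 == 3 && substr($0, 1, 1) != "+" {
            printf "line %d: + line is missing\n", NR; bad = 1; exit
        }
        NR % 4 == 0 && length($0) != seqlen {
            printf "line %d: quality length differs from sequence length\n", NR; bad = 1; exit
        }
        END {
            # A line count that is not a multiple of 4 suggests a truncated file.
            if (!bad && NR % 4 != 0) { printf "file ends mid-record after line %d\n", NR; bad = 1 }
            if (!bad) print "ok: " (NR / 4) " records"
            exit bad
        }'
}
```

A file that passes both this check and seqkit sana is unlikely to be malformed, which points towards the file name or the environment rather than the reads themselves.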

chanchalrana · 2024-05-09T09:33:12Z

Yeah!! I used it. It is giving no errors.

What else can I do to find out more about this error?

When I run the files, the extracted.fa file is generated but the metadata file remains empty.

xinehc · 2024-05-09T10:04:02Z

Did you check the file <input/HeP-1057-162.fa>? Although it has a .fa extension, diamond identifies it as a fq file. Based on your screenshot, the file should indeed be a fq file, so my guess is that this particular file is either truncated or contains a mixture of fq/fa records. Something could be wrong with the upstream preprocessing. Why does it have a .fa extension?

chanchalrana · 2024-05-09T11:36:49Z

Yup, earlier I changed the extension of the fq.gz files to .fa after extraction. But then I ran the pipeline on the .fq.gz files only (the ones I got after quality check, adaptor trimming and removal of host sequences), and I am getting the same error.

Yes, all the files are fastq files in gzip format, and all of them give this same error. How can I tell whether the files are truncated? The de novo assemblies built from these fastq files are just fine.

Everything in the downstream analysis is just fine. I don't know why it is giving this error.

xinehc · 2024-05-09T12:47:08Z

Try seqkit fq2fa to convert your file into fa format, then run args_oap on the converted files. If something is wrong with your file, seqkit should raise errors.

chanchalrana · 2024-05-10T05:32:22Z

What might be the issue?? I cannot figure it out.

In metadata.txt, if only the nread column is important, we can create it manually, right? Because the extracted.fa file is produced in stage 1 and it contains the reads (it is not empty), I think that should work. Shouldn't it?

Also, I ran one file with an older version of args_oap on my desktop and it worked, but when I ran the same file with the new version, it gave the same error.

xinehc · 2024-05-10T06:46:29Z

> What might be the issue?? I cannot figure it out.

The file produced by seqkit fq2fa is not gzipped. You need to remove .gz from its name, otherwise it will be mistaken for a gzipped file.

> In metadata.txt, if only the nread column is important, we can create it manually, right?

Please do not edit the metadata file manually. It may lead to unexpected results.

> Also, I ran one file with an older version of args_oap on my desktop and it worked, but when I ran the same file with the new version, it gave the same error.

I am actually not sure what is wrong with your input file. If you don't mind, please attach a minimal reproducible example file so that I can check.

chanchalrana · 2024-05-10T07:47:41Z

subset_HeP-146-19.fq.gz

Is this sufficient? Please let me know.

Also, why does the same file give a result when run with the older version of ARGS-OAP (on my desktop) but not when run with the recent version (on the server)?

xinehc · 2024-05-10T09:48:27Z

If you remove .gz then it should work:

wget https://github.com/xinehc/args_oap/files/15271838/subset_HeP-146-19.fq.gz

mkdir -p input
mv subset_HeP-146-19.fq.gz input/subset_HeP-146-19.fq

args_oap stage_one -i input -o output -f fq
args_oap stage_two -i output

Your files are not gzipped. You can simply check whether a file is gzipped using gzip -t.
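The gzip -t check can be applied to a whole input folder to spot files whose name and content disagree. The following is a minimal sketch; the function name check_gz_names is hypothetical, and the file is fed to gzip on standard input so that the result does not depend on the file's suffix. Note that a truncated or corrupted gzip file also fails gzip -t, so a mismatch on a .gz file means "not a valid gzip file" rather than strictly "plain text".

```shell
# Compare each file's name (.gz or not) with what gzip -t says about its content.
check_gz_names() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        case "$f" in
            *.gz) named_gz=yes ;;
            *)    named_gz=no ;;
        esac
        # gzip -t exits with 0 only if the data is a valid gzip stream.
        if gzip -t < "$f" 2>/dev/null; then real_gz=yes; else real_gz=no; fi
        if [ "$named_gz" != "$real_gz" ]; then
            echo "MISMATCH $f (named_gz=$named_gz, real_gz=$real_gz)"
        else
            echo "OK $f"
        fi
    done
}
```

Running check_gz_names input would flag a file such as subset_HeP-146-19.fq.gz if it is in fact plain text, in which case renaming it without the .gz suffix, as in the commands above, resolves the mismatch.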

bioinfogini · 2024-07-16T13:14:56Z

Hello @xinehc,
sorry to jump in, but I am facing a similar issue, although I have paired-end files.
What I have are trimmed, paired, compressed fastq files.
As they are compressed, I decompressed them using pigz.
Then I tried to:

1. convert them with seqkit fq2fa
2. deinterleave them with reformat.sh from bbmap
3. run args_oap stage_one -i input2 -o output2 -f fa
--> this returned: WARNING: Something is wrong with <input2/R1.fa>, skip.

I also tried to:

1. deinterleave them with reformat.sh from bbmap
2. check them with seqkit sana
3. run args_oap stage_one -i input -o output -f fq
--> this returned: WARNING: Something is wrong with <input2/R1.fa>, skip

I can't really understand what is wrong.
I am running args_oap in a conda env on a server where I don't have walltime limits.
Can you please help?
Thanks

I am attaching one of my files as an example. Thanks in advance.

https://drive.google.com/file/d/1y-20f7rUJX2cm3rYBIdmxPtdI-iq6qec/view?usp=drive_link

chanchalrana · 2024-07-16T14:37:24Z

My issue was resolved.
First, there is no need to convert the files to .fa, as args_oap can handle gzipped fastq files too. Just replace .fa with .fq in the command line.
Second, I was working on a supercomputer and running args_oap directly in the command shell. That was my mistake. Instead, jobs have to be submitted on the supercomputer using sbatch or other predefined commands, depending on the server.

I tried everything, but this was the actual cause. There was nothing wrong with the sequences.

Also, the same command runs on the local system with the same file that was giving the error on the server.
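As an illustration of the batch submission described above, a Slurm job script could be generated along these lines. This is a minimal sketch: the function name write_job, the file name args_oap.sbatch and all #SBATCH values are placeholders that depend on the cluster, and it assumes args_oap is available on the compute node (for example through a conda environment). The args_oap commands are the ones shown earlier in this thread.

```shell
# Write a minimal Slurm job script that runs both args_oap stages.
write_job() {
    cat > "$1" <<'EOF'
#!/bin/bash
#SBATCH --job-name=args_oap
#SBATCH --time=24:00:00
#SBATCH --output=args_oap_%j.log

# Assumption: args_oap is on PATH here, e.g. after activating a conda env.
args_oap stage_one -i input -o output -f fq
args_oap stage_two -i output
EOF
}
```

After write_job args_oap.sbatch, the job would be submitted with sbatch args_oap.sbatch instead of running the commands directly in the login shell.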

bioinfogini · 2024-07-16T15:49:37Z

@chanchalrana thanks a lot for your feedback, I will try to adopt the same procedure! My only problem is that I also have to deinterleave my files, as they are paired-end while yours were not, if I understood correctly.
I will keep you posted!

UPDATE: I was not able to run args_oap on the server, but it worked on my laptop. Thanks @chanchalrana for the feedback.
