Regarding error related to fastq files #64

Open
chanchalrana opened this issue May 7, 2024 · 15 comments

Comments

@chanchalrana

Hello!!
I am running args_oap on fastq files, and in stage one I get this error:
image (1)

Kindly tell me the solution for this. I am getting it for all the files.

@chanchalrana
Author

chanchalrana commented May 8, 2024

With the new version of ARGS-OAP, I get this error:
image

It would be great if you can help me out with this issue.

@xinehc
Owner

xinehc commented May 8, 2024

Your fastq file seems to be truncated or incomplete. Did you by any chance concatenate a fastq file with a fasta file?

The + line is missing at line 29480427. Please double-check the file.
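
One way to test for the kind of damage described here (a missing + line, a truncated last record, or fasta records mixed in) is to scan the file record by record. The sketch below is a hypothetical helper, not part of args_oap or seqkit. It assumes strict 4-line fastq records (no wrapped sequences) and a gzip that passes plain text through unchanged with -cdf, as GNU gzip does.

```shell
# check_fastq FILE: scan a plain or gzipped fastq file and report the first
# line that breaks the 4-line layout (@header, sequence, + separator, quality).
# Returns 0 if every record looks well formed, 1 otherwise.
check_fastq() {
    local file=$1
    # gzip -cdf decompresses gzipped input and copies plain text unchanged
    gzip -cdf -- "$file" | awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" {
            printf "line %d: header does not start with @\n", NR; bad = 1; exit
        }
        NR % 4 == 2 { seqlen = length($0) }
        NR % 4 == 3 && substr($0, 1, 1) != "+" {
            printf "line %d: separator does not start with +\n", NR; bad = 1; exit
        }
        NR % 4 == 0 && length($0) != seqlen {
            printf "line %d: quality length differs from sequence length\n", NR; bad = 1; exit
        }
        END {
            # a line count that is not a multiple of 4 means the last record is cut off
            if (!bad && NR % 4 != 0) {
                printf "file ends mid-record after line %d\n", NR; bad = 1
            }
            exit bad + 0
        }'
}
```

Calling `check_fastq input/sample.fq.gz` (a placeholder path) prints the first offending line and returns non-zero. Line 29480427 is the third line of its record (29480427 mod 4 = 3), so `gzip -cdf FILE | sed -n '29480425,29480428p'` would print that whole record for inspection.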

@chanchalrana
Author

The file is a metagenomic fastq file in gzip format. I checked, and there is a + sign present at line 29480427. These are infant stool samples. I cannot figure out the problem.
image

@xinehc
Owner

xinehc commented May 9, 2024

I would recommend running seqkit sana on your fastq file to see whether there are malformed records.

Reference: https://bioinf.shenwei.me/seqkit/usage/#sana

@chanchalrana
Author

chanchalrana commented May 9, 2024

Yeah!! I used it. It is giving no errors.
image
image
What else can I do to find out more about this error?

When I run the files, the extracted.fa file is generated, but the metadata file remains empty.

@xinehc
Owner

xinehc commented May 9, 2024

Did you check the file <input/HeP-1057-162.fa>? Although it has a .fa extension, diamond identifies it as a fq file. Based on your screenshot, the file should indeed be a fq file, so my guess is that this particular file is either truncated or contains a mixture of fq/fa records. Something could be wrong with the upstream preprocessing. Why does it have a .fa extension?
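
A mismatch between extension and content can be checked by looking at the first character of the file: fastq records start with @ and fasta records with >. This is only an illustration of the idea. How diamond itself detects the format is not shown in this thread, and sniff_format is a hypothetical helper name.

```shell
# sniff_format FILE: print "fastq", "fasta" or "unknown" based on the first
# character of the (decompressed) content, ignoring the file extension.
sniff_format() {
    local first
    # gzip -cdf handles both gzipped and plain input; only one byte is read
    first=$(gzip -cdf -- "$1" 2>/dev/null | head -c 1)
    case $first in
        "@") echo fastq ;;
        ">") echo fasta ;;
        *)   echo unknown ;;
    esac
}
```

This only inspects the first record, so it cannot detect a file that starts as fastq and switches to fasta partway through.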

@chanchalrana
Author

chanchalrana commented May 9, 2024

Yes, earlier I changed the extension of the fq.gz files to .fa after extraction. But then I ran the pipeline on the .fq.gz files only (the ones I got after quality checking, adapter trimming, and removal of host sequences). I am getting the same error.

Yes, all the files are fastq files in gzip format, and all of them give this same error. How can I tell whether the files are truncated? The de novo assemblies of these fastq files are just fine.

Everything in the downstream analysis is just fine. I don't know why it is giving this error.

@xinehc
Owner

xinehc commented May 9, 2024

Try seqkit fq2fa to convert your file into fa format, then run args_oap on the converted files. If something is wrong with your file, seqkit should raise errors.

@chanchalrana
Author

chanchalrana commented May 10, 2024

image
image
image
image
What might be the issue?? I cannot figure out.

In metadata.txt, if only nread column is important, we can make it manually. Right? Because the extracted.fa file is created in stage 1 and contains reads (it is not empty), I think that should work. Shouldn't it?

image

Also, I ran one file with older version of args_oap on my desktop and it worked but when I ran the same file with the new version, it gave the same error.

image

image

@xinehc
Owner

xinehc commented May 10, 2024

> What might be the issue?? I cannot figure out.

The output of seqkit fq2fa is not gzipped. You need to remove the .gz extension; otherwise the file will be mistaken for a gzipped file.

> In metadata.txt, if only nread column is important, we can make it manually. Right?

Please do not edit the metadata file manually. It may lead to unexpected results.
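
Rather than editing metadata.txt, the read count of an input file can be computed independently and compared against whatever stage one reports. This is a minimal sketch, assuming 4-line fastq records and assuming that nread refers to the number of reads in the input file (the thread does not define it). count_fastq_reads is a hypothetical helper.

```shell
# count_fastq_reads FILE: print the number of records in a plain or gzipped
# fastq file, assuming each record occupies exactly 4 lines.
count_fastq_reads() {
    gzip -cdf -- "$1" | awk 'END { print int(NR / 4) }'
}
```

The result is meant for comparison only. If it disagrees with the pipeline output, that points to a problem with the input or the run, not to a value that should be pasted into the metadata file.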

> Also, I ran one file with older version of args_oap on my desktop and it worked but when I ran the same file with the new version, it gave the same error.

I am actually not sure what is wrong with your input file. If you don't mind, please attach a minimal reproducible example file so that I can check.

@chanchalrana
Author

subset_HeP-146-19.fq.gz

Is this sufficient? Please let me know.

Also, why does the same file give a result when run on the older version of ARGS-OAP (on my desktop) but not when run on the recent version (on the server)?

@xinehc
Owner

xinehc commented May 10, 2024

If you remove .gz then it should work:

wget https://github.com/xinehc/args_oap/files/15271838/subset_HeP-146-19.fq.gz

mkdir -p input
mv subset_HeP-146-19.fq.gz input/subset_HeP-146-19.fq

args_oap stage_one -i input -o output -f fq
args_oap stage_two -i output

Your files are not gzipped. You can simply check whether a file is gzipped using gzip -t.
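
Because gzip -t fails both for files that are not gzip at all and for gzip files that are truncated, it can help to separate the two cases by also looking at the gzip magic bytes (1f 8b). The sketch below is a hypothetical helper built only on gzip -t, head, od and tr. It is not something args_oap provides.

```shell
# gz_status FILE: print one of
#   gzip-ok       valid, complete gzip stream
#   gzip-corrupt  starts with the gzip magic bytes but fails the integrity test
#   not-gzip      does not start with the gzip magic bytes (1f 8b)
gz_status() {
    local file=$1 magic
    # read the first two bytes as hex, e.g. "1f8b" for gzip
    magic=$(head -c 2 -- "$file" | od -An -tx1 | tr -d '[:space:]')
    if [ "$magic" != "1f8b" ]; then
        echo not-gzip
    elif gzip -t -- "$file" 2>/dev/null; then
        echo gzip-ok
    else
        echo gzip-corrupt
    fi
}
```

A file reported as not-gzip but named .gz is the case discussed in this thread and can be renamed without the .gz suffix. A file reported as gzip-corrupt more likely needs to be transferred or generated again.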

@bioinfogini

Hello @xinehc,
Sorry to jump in, but I am facing a similar issue, although I have paired-end files.
What I have are trimmed, paired, compressed fastq files.
As they are compressed, I decompressed them using pigz.
Then I tried to:

  1. convert them with seqkit fq2fa
  2. deinterleave them with reformat.sh from bbmap
  3. run args_oap stage_one -i input2 -o output2 -f fa
    --> this returned WARNING: Something is wrong with <input2/R1.fa>, skip.

I also tried to:

  1. deinterleave them with reformat.sh from bbmap
  2. check them with seqkit sana
  3. run args_oap stage_one -i input -o output -f fq
    --> this returned WARNING: Something is wrong with <input2/R1.fa>, skip

I can't really understand what is wrong.
I am running args_oap in a conda env on a server where I have no wall-time limits.
Can you please help?
Thanks

I attach one of my files as an example. Thanks in advance

https://drive.google.com/file/d/1y-20f7rUJX2cm3rYBIdmxPtdI-iq6qec/view?usp=drive_link

@chanchalrana
Author

My issue was resolved.
First, there is no need to convert the files to .fa, as args_oap can handle gzipped fastq files too. Just replace the .fa with .fq in the command line.
Second, I was working on a supercomputer and running args_oap directly in the command shell. That was my mistake. Instead, jobs have to be submitted on the supercomputer using sbatch or another predefined command, depending on the server.
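
For readers on a SLURM cluster, submitting through sbatch could look roughly like the sketch below. It only wraps the two args_oap commands already shown in this thread. The function name, the #SBATCH values, and the log file name are placeholders chosen for illustration, and the thread does not explain why running in the interactive shell failed, so this is not a guaranteed fix.

```shell
# write_args_oap_job INPUT_DIR OUTPUT_DIR: print a SLURM batch script that runs
# both args_oap stages. All #SBATCH values below are placeholder assumptions.
write_args_oap_job() {
    local input=$1 output=$2
    cat <<EOF
#!/bin/bash
#SBATCH --job-name=args_oap
#SBATCH --cpus-per-task=8
#SBATCH --time=12:00:00
#SBATCH --output=args_oap_%j.log

# Activate the environment that provides args_oap here (site-specific).

set -euo pipefail
args_oap stage_one -i "$input" -o "$output" -f fq
args_oap stage_two -i "$output"
EOF
}
```

A possible usage is `write_args_oap_job input output > args_oap_job.sh` followed by `sbatch args_oap_job.sh`. Resource limits and environment setup differ between clusters, so the directives should be adapted to the local documentation.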

I tried everything, but this was the actual cause. There was nothing wrong with the sequences.

Also, the same command runs on the local system with the same file that was giving the error on the server.

@bioinfogini

bioinfogini commented Jul 16, 2024

@chanchalrana thanks a lot for your feedback, I will try to adopt the same procedure! My only problem is that I also have to deinterleave my files, as they are paired-end while yours were not, if I understood correctly.
I'll keep you posted!

UPDATE: I was not able to run args_oap on the server, but it worked on my laptop. Thanks @chanchalrana for the feedback.


3 participants