tagseq processing not recognizing fastq headers #3

Closed
laurahspencer opened this issue Apr 29, 2020 · 5 comments

@laurahspencer

I'm following your tagSeq_processing_README.txt protocol to trim and filter reads generated from QuantSeq libraries that were sequenced on an Illumina NovaSeq this month. The output indicates that a very large portion of my reads lack headers:

[Screenshot of the processing output]

Upon inspection, the FASTQ files don't appear to lack headers, so I'm wondering whether the tagseq_clipper.pl script expects a different header format. My headers are in the following format:

[Screenshot of an example FASTQ header]
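If the script's "no header" count refers to the FASTQ `@` header line (an assumption: the thread does not show how tagseq_clipper.pl defines a header, and it may mean something in the read sequence itself), a quick structural check can confirm whether the files really are intact. The sketch below uses a hypothetical helper, `validate_fastq`, that assumes standard unwrapped 4-line FASTQ records and counts records whose first line lacks `@`, whose third line lacks `+`, or whose sequence and quality strings differ in length.

```python
import gzip
import sys


def validate_fastq(lines):
    """Return (records, malformed) for an iterable of FASTQ text lines.

    Assumes unwrapped 4-line records. A record is malformed if line 1 does
    not start with '@', line 3 does not start with '+', or the sequence and
    quality strings differ in length. A trailing partial record (truncated
    file) is counted as one malformed record.
    """
    records = 0
    malformed = 0
    block = []
    for line in lines:
        block.append(line.rstrip("\n"))
        if len(block) < 4:
            continue
        header, seq, plus, qual = block
        block = []
        records += 1
        if not (header.startswith("@") and plus.startswith("+")
                and len(seq) == len(qual)):
            malformed += 1
    if block:  # leftover lines: the last record is incomplete
        records += 1
        malformed += 1
    return records, malformed


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]
    # Open gzip-compressed or plain FASTQ in text mode.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        n, bad = validate_fastq(handle)
    print(f"records: {n}, malformed: {bad}")
```

Run it as `python validate_fastq.py sample.fastq.gz` (file names hypothetical). A malformed count near zero would suggest the reads reported as lacking headers are being flagged for some other reason than a broken FASTQ `@` line.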

Here are abbreviated versions of an untrimmed file, and the trimmed file showing reads that passed the tagseq_clipper.pl script:
example_files.zip

Admittedly, I'm unfamiliar with Perl scripts, so any help would be great.

laurahspencer changed the title from "tagseq_clipper.pl not recognizing fastq header" to "tagseq processing not recognizing fastq headers" on Apr 29, 2020.
@z0on
Owner

z0on commented Apr 29, 2020 via email

@laurahspencer
Author

Ah, good to know! The QuantSeq manual/FAQ doesn't say whether deduplication is necessary (below is a screenshot of their recommended trimming), but my data are single-read without UMIs, and from a couple of things I've read online, deduplication isn't recommended (or even possible?) for this type of data. Let me know if you think otherwise!

Recommended trimming according to QuantSeq's FAQ:

[Screenshot of QuantSeq's recommended trimming]

@z0on
Owner

z0on commented May 1, 2020 via email

@laurahspencer
Author

Hi Misha-
I used your pipeline back in fall 2018 on some pilot QuantSeq data, at the suggestion of a colleague. It worked well then, but I don't think you had incorporated deduplication yet (?). I will probably depart from your process a bit, now that I more fully understand what your pipeline is intended for. I will align the data to the Olympia oyster (Ostrea lurida) genome, which my lab developed.

Regarding deduplication, that's interesting to know, and I'll definitely have to do more reading on the matter. I'm now wondering whether there is a tool I can use to identify duplicates based on the read sequences themselves (i.e., identical sequences), despite not having paired-end data or molecular identifiers. If you know of any, please let me know! Thanks for all your help!
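One way to flag duplicates from the read sequences alone is exact-sequence collapsing: keep the first read carrying each sequence and drop later reads with an identical sequence. The sketch below is a minimal in-memory illustration under that assumption, not a recommendation for this data; `read_fastq`, `dedup_by_sequence`, `write_fastq`, and the optional `prefix_len` parameter (compare only the first N bases) are hypothetical names. Without UMIs it cannot tell PCR copies from independent molecules that happen to share a sequence, which can be common for 3' tag reads from highly expressed genes, so it may undercount those genes.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Parse unwrapped 4-line FASTQ records from an iterable of text lines."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, 4))
        if not chunk:
            return
        if len(chunk) < 4:
            raise ValueError("truncated FASTQ record")
        header, seq, _plus, qual = (line.rstrip("\n") for line in chunk)
        yield header, seq, qual


def dedup_by_sequence(records: Iterable[Record],
                      prefix_len: Optional[int] = None) -> Iterator[Record]:
    """Yield the first record seen for each sequence (or sequence prefix).

    Every distinct key is held in memory, so very large files would need a
    sort-based or disk-backed approach instead.
    """
    seen = set()
    for header, seq, qual in records:
        key = seq if prefix_len is None else seq[:prefix_len]
        if key in seen:
            continue  # same key already emitted: treat this read as a duplicate
        seen.add(key)
        yield header, seq, qual


def write_fastq(records: Iterable[Record]) -> Iterator[str]:
    """Format records back into 4-line FASTQ text."""
    for header, seq, qual in records:
        yield f"{header}\n{seq}\n+\n{qual}\n"
```

For example, `dedup_by_sequence(read_fastq(open("reads.fastq")))` (file name hypothetical) yields the retained records, which `write_fastq` turns back into FASTQ text. Comparing read counts before and after gives a rough duplication rate, bearing in mind the caveat above.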

@z0on
Owner

z0on commented May 1, 2020 via email
