tagseq processing not recognizing fastq headers #3
hi laura - easily solvable! can you please poke me tomorrow if i forget to reply? in short, "header" is not the read title, but the lead 5' portion of the read used for de-duplication. do you have those in your quant-seq?
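To illustrate the distinction, here is a minimal sketch of that kind of 5' leader check. It assumes a hypothetical leader of four degenerate bases followed by GG; the actual pattern tagseq_clipper.pl looks for may differ, so treat this only as a conceptual illustration of how a read can be counted as "header-less" even though its fastq title line is fine:

```python
import re

# Hypothetical leader: 4 degenerate bases followed by GG (an assumption
# for illustration -- check your own library prep scheme).
LEADER = re.compile(r"^([ACGTN]{4}GG)")

def split_leader(seq):
    """Return (leader, insert) if the read starts with the expected 5'
    leader, else (None, seq) -- such reads would be the ones reported
    as having 'no header'."""
    m = LEADER.match(seq)
    if m:
        return m.group(1), seq[m.end():]
    return None, seq

reads = ["ACGTGGTTTCAGGAC",   # starts with a valid leader
         "TTTTTTTTTTTTTTT"]   # no leader -> counted as header-less
print([split_leader(r) for r in reads])
```

Reads whose first bases fail the leader check are the ones the clipper reports as lacking a "header", regardless of what the @-title line in the fastq file says.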
… On 28 Apr 2020, at 22:01, Laura H Spencer ***@***.***> wrote:
I'm following your tagSeq_processing_README.txt protocol to trim and filter reads generated from QuantSeq libraries run on an Illumina NovaSeq platform this month. The output indicates that a very large portion of my reads do not have headers:
Upon inspection, the fastq files don't appear to lack headers, so I'm wondering whether the tagseq_clipper.pl script is looking for a different header format? My headers are in the following format:
Here are abbreviated versions of an untrimmed file and of the trimmed file showing reads that passed the tagseq_clipper.pl script:
example_files.zip
I'm admittedly unfamiliar with Perl scripts, so any help would be great.
Ah, good to know! The QuantSeq manual/FAQ doesn't indicate whether or not deduplication is necessary (below is a screenshot of their recommended trimming), but my data is single-read without UMIs, and from a couple of things I've read online, deduplication isn't recommended (or possible?) for this type of data. Let me know if you think otherwise! Recommended trimming according to QuantSeq's FAQ:
Hi Laura - my position is that deduplication is always needed, because otherwise your counts-based stats (like DESeq2) are not valid; plus it removes noise due to over-dispersion of amplified counts. That said, if you don't have the means to deduplicate, you have no choice. Fortunately, it is still OK to publish stuff based on non-deduped data!
So why do you want to use the tagseq pipeline, if I may ask? There is really nothing special to it, except maybe deduplication :) What is the reference you are going to map to?
cheers
Misha
Hi Misha-
I used your pipeline back in fall 2018 on some pilot QuantSeq data, at the suggestion of a colleague. It worked well then, but I don't think you had incorporated deduplication yet (?). I will probably depart from your process a bit, now that I more fully understand what your pipeline is intended for. I will align data to the Olympia oyster (Ostrea lurida) genome, which my lab <https://faculty.washington.edu/sr320/> developed.
Regarding deduplication, that's interesting to know, and I'll definitely have to do more reading on the matter. I'm now wondering if there is a tool I can use to identify duplicates based on the read sequences themselves (i.e. identical sequences), despite not having paired data or molecular identifiers... if you know of any, please let me know! Thanks for all your help!
Hi Laura - I see!
If you map to a genome, my pipeline is really not too useful. Just use any mapper of your choice and then featureCounts to extract counts (you might wish to adjust your genome's GFF file to extend gene regions 1-2 kb towards the 3' end; otherwise, gene annotations often miss the non-coding 3' regions where our reads map).
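That GFF adjustment could look something like this (a rough, strand-aware sketch with hypothetical coordinates; it assumes a standard 9-column GFF and does not clip extensions at chromosome ends):

```python
def extend_gene_3prime(gff_line, ext=2000):
    """Extend a GFF 'gene' feature towards its 3' end, strand-aware.
    On the '+' strand the 3' end is the feature end; on '-' it is the start."""
    f = gff_line.rstrip("\n").split("\t")
    if f[2] != "gene":                       # only touch gene features
        return gff_line
    start, end, strand = int(f[3]), int(f[4]), f[6]
    if strand == "-":
        f[3] = str(max(1, start - ext))      # never run past position 1
    else:
        f[4] = str(end + ext)
    return "\t".join(f) + "\n"

line = "chr1\t.\tgene\t5000\t8000\t.\t+\t.\tID=gene1\n"
print(extend_gene_3prime(line), end="")  # '+' strand: end moves 8000 -> 10000
```

Applied over a whole GFF (one call per line), this shifts each gene's 3' boundary so that 3'-biased QuantSeq reads landing just past the annotated gene end still get counted by featureCounts.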
Yes, you can mark duplicates just based on reads, using the Picard tool. Still, since in quant-seq your reads will pile up in a relatively narrow region near the 3' end, there is a danger of over-deduplication (i.e., some reads might legitimately map to the same place because there is not much choice of where they could map). Check in the IGV viewer how your read pile-ups look.
(Both IGV and Picard are tools from the Broad Institute.)
cheers
Misha
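For intuition, collapsing exact-duplicate sequences (the simplest read-based deduplication, far cruder than what Picard MarkDuplicates does with mapped positions) can be sketched as:

```python
from collections import Counter

def dedup_reads(seqs):
    """Collapse identical read sequences; return (unique_seqs, n_duplicates).
    Toy illustration only -- real tools such as Picard MarkDuplicates work
    on mapped positions and tolerate sequencing errors, which this ignores."""
    counts = Counter(seqs)
    uniques = list(counts)                       # one representative each
    dups = sum(n - 1 for n in counts.values())   # extra copies flagged
    return uniques, dups

reads = ["ACGT", "ACGT", "ACGT", "TTGC"]
print(dedup_reads(reads))  # 2 unique sequences, 2 reads flagged as duplicates
```

As noted above, in QuantSeq data identical sequences can also come from genuinely distinct molecules piled up near the 3' end, so exact-match collapsing like this risks over-deduplication.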