Quality threshold issue with the Sanger Analysis Workflow #28

Closed
hadim opened this issue Feb 22, 2017 · 11 comments

@hadim

hadim commented Feb 22, 2017

Using Tools > Sanger Data Analysis > Read quality control and alignment, if I set the quality threshold in the FASTQ Quality Trimmer box to anything greater than 0, all the sequences are discarded.

I think the FASTQ conversion either drops the quality values or sets them to 0.

My input files are chromatograms (.abi). An example file can be found here: https://drive.google.com/open?id=0Bwt2y0cMI-oyRm1fVHpOcFhMYlU

@hadim
Author

hadim commented Feb 22, 2017

The quality plot of that file is supposed to look like this.

[Image: quality plot]

@hadim
Author

hadim commented Feb 23, 2017

ping @dkandrov

Is it OK to open new issues here?

@hadim
Author

hadim commented Feb 25, 2017

I have also tried manually converting the ab1 file to FASTQ with UGENE, and the quality values in the resulting FASTQ file are all set to "!":

@pFB7-CKAP5
TCCTACTGTCGCCTCTTCCACAGACATGCTCCACAGCAAACTCTCTCAGCTCCGGGAGTCACGGGAGCAGCACCAGCATT
CAGACCTGGATTCTAACCAGACTCACTCTTCAGGAACTGTGACCTCCTCCTCCTCCACAGCTAACATAGACGACTTGAAA
AAAAGACTGGAGAGAATAAAGAGCAGTCGCAAAAGTTCCCCTCAGCATCACCATCACCATCACCATTAGCTGCAGTCTCG
AGGCATGCGGTACCAAGCTTGTCGAGAAGTACTAGAGGATCATAATCAGCCATACCACATTTGTAGAGGTTTTACTTGCT
TTAAAAAACCTCCCACACCTCCCCCTGAACCTGAAACATAAAATGAATGCAATTGTTGTTGTTAACTTGTTTATTGCAGC
TTATAATGGTTACAAATAAAGCAATAGCATCACAAATTTCACAAATAAAGCATTTTTTTCACTGCATTCTAGTTGTGGTT
TGTCCAAACTCATCAATGTATCTTATCATGTCTGGATCTGATCACTGCTTGAGCCTAGGAGATCCGAACCAGATAAGTGA
AATCTAGTTCCAAACTATTTTGTCATTTTTAATTTTCGTATTAGCTTACGACGCTACACCCAGTTCCCATCTATTTTGTC
ACTCTTCCCTAAATAATCCTTAAAAACTCCATTTCCACCCCTCCCAGTTCCCAACTATTTTGTCCGCCCACAGCGGGGCA
TTTTTCTTCCTGTTATGTTTTTAATCAAACATCCTGCCAACTCCATGTGACAAACCGTCATCTTCGGCTACTTTTTCTCT
GTCACAGAATGAAAATTTTTCTGTCATCTCTTCGTTATTAATGTTTGTAATTGACTGAATATCAACGCTTATTTGCAGCC
TGAATGGCGAATGGGACGCGCCCTGTAGCGGCGCATTAAGCGCGGCGGGGGTGGTGGTTACGCGCAGCGTGACCGCTACA
CTTGCCAGNGNCCTAGCGCCNGCTCCTTTNCGCTTTCTTCCCTTCCNTTTCTCGCCACGNTCGCCGGGNTTTCCCCGTTC
AAGCTCTAAANTCGGGGGCTCCCTTTAGGGGTTCCGAATTNAGNGCTTTACGGCNCCNCNGACCCCNAAAAACTTGGATN
AGGGGGGANGGNTNCCGGTAGGGGGCCNTCCCCCNTGANAAACGGGTTTTTCNNCCCTTTGNANGTNGGNGGNCCCNGNT
TCTTAANAGGGGAACNTTGNTTCCNANNNGGAAAAACCCCACCCCNTTTNCGGGCTTTTNTTTNGATTTNAANGGNATTT
TGCNCANTTCGCCNNTNGGTTAAAAAGGGGGTNNTTTANAAANTTNNCCGGAATTN
+
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
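
For context, in the common Sanger (Phred+33) FASTQ encoding each quality character stands for the score `ord(char) - 33`, so "!" means quality 0. The sketch below decodes a quality line under that assumption; the function names are made up for illustration, and the mean-based filter is only a stand-in, not a description of how the UGENE trimmer actually decides.

```python
def phred33_scores(quality_line: str) -> list[int]:
    """Decode a FASTQ quality string, assuming the Sanger Phred+33 offset."""
    return [ord(ch) - 33 for ch in quality_line.strip()]


def survives_threshold(quality_line: str, min_quality: int) -> bool:
    """Hypothetical filter: keep a read only if its mean score meets the threshold."""
    scores = phred33_scores(quality_line)
    return bool(scores) and sum(scores) / len(scores) >= min_quality


# A read whose quality line is all "!" decodes to zeros,
# so it fails any threshold greater than 0.
print(phred33_scores("!!!!"))
print(survives_threshold("!!!!", 1))
```

With every score at 0, any filter that requires a quality above 0 would reject the read, which matches the behavior reported above.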

@dkandrov
Contributor

Hello, @hadim!
Sorry for the late response.
Your sample file contains no quality data. For comparison, I opened your file alongside one of mine that does contain quality data:
[Image: abi_files_difference]
This is the main reason the workflow does not work. Unfortunately, the Read quality control and alignment workflow only works with files that contain quality data.
As a workaround, you can generate this data with the PHRED tool and then use the prepared files in our workflow.

As for opening issues here, it is not a problem, but our issue tracker is preferred.

@hadim
Author

hadim commented Feb 27, 2017

It makes sense now. I thought quality values were computed on the fly from the chromatogram... Would it make sense for UGENE to integrate PHRED as an external tool, so people can compute quality values when they are missing from chromatograms?

@hadim
Author

hadim commented Feb 27, 2017

I realized that my sequencing service also provides .qual files along with the chromatogram files. I'll try to integrate them into a workflow.

@hadim
Author

hadim commented Feb 27, 2017

Here is an example of one of the quality files provided:

17 21 11 11 12 22 25 37 37 32 32 32 25 21 10 10 10 17 21 30 36 36 38 38 42 27 27 11 10 10 26 23 47 47 47 50 50 53 53 68 68 68 68 68 68 68 68 68 68 68 68 68 68 51 51 48 32 20 13 11 11 11 23 23 44 59 59 57 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 51 51 51 51 51 51 51 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 50 50 50 51 50 51 68 68 68 51 51 51 51 50 50 50 51 68 68 68 68 68 68 57 57 57 57 57 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 59 59 57 57 57 57 57 57 59 
59 59 59 57 57 57 57 57 53 59 59 59 59 59 59 59 44 44 44 37 37 44 44 44 47 59 59 59 59 59 59 59 59 59 59 59 59 48 47 47 42 47 44 44 44 41 41 32 37 37 37 37 37 59 59 50 50 50 50 48 47 47 47 47 47 41 41 37 37 37 37 37 32 31 41 41 37 37 44 44 47 59 59 59 59 41 37 32 32 32 41 41 37 32 37 37 37 39 37 37 27 25 27 27 28 37 43 59 59 47 41 41 24 17 16 14 14 21 34 43 41 37 37 43 59 41 41 41 39 39 40 40 39 37 37 32 29 29 29 25 32 27 26 31 27 25 28 36 37 59 37 37 31 35 35 34 34 36 36 35 29 28 31 30 25 31 27 24 25 26 19 19 20 20 16 27 15 18 18 28 27 36 31 35 36 36 27 26 25 27 36 34 23 18 14 16 18 17 30 28 35 31 31 31 34 37 37 33 29 22 23 19 22 20 17 22 21 8 8 8 10 9 9 11 11 10 10 15 14 10 10 10 11 15 23 23 19 19 23 19 22 25 25 25 30 25 15 15 13 15 22 22 31 24 24 28 12 17 10 10 10 10 9 9 9 8 9 8 11 14 9 10 10 11 11 11 11 11 11 11 11 15 13 22 20 19 24 13 10 8 8 8 10 10 14 13 19 11 9 9 14 15 13 11 10 10 10 8 9 9 8 8 8 8 8 8 8 11 10 15 18 20 17 19 12 12 9 9 9 13 15 8 8 9 12 12 10 10 10 10 10 9 11 11 9 11 9 8 8 8 8 8 8 11 11 23 16 28 24 21 16 11 11 8 8 8 8 8 8 10 10 10 12 13 17 17 26 33 27 14 12 12 10 13 11 11 8 8 10 9 11 9 11 9 8 8 8 10 10 10 10 10 10 10 12 13 9 10 10 7 7 9 11 10 10 10 9 9 9 8 8 8 9 9 16 30 19 14 14 9 9 10 18 16 14 14 15 15 10 10 10 10 9 9 9 8 8 8 8 8 8 8 8 14 11 15 8 8 8 8 9 9 10 9 9 8 8 8 8 8 8 9 9 9 17 12 14 10 10 10 12 10 12 11 8 8 6 6 6 6 8 9 9 9 8 8 8 8 8 9 10 11 11 11 11 12 12 10 11 14 11 11 9 10 10 9 9 8 8 8 8 8 8 11 17 17 23 17 8 8 8 8 8 8 8 10 8 21 15 13 8 8 9 10 9 9 6 6 6 6 6 6 8 10 6 7 6 6 6 6 8 8 8 9 9 9 10 10 14 14 9 8 6 6 6 8 7 12 11 11 6 6 6 6 6 6 10 10 10 8 8 10 11 11 8 9 9 10 10 10 10 9 8 8 8 8 8 8 8 8 9 11 9 9 9 8 8 8 8 8 8 8
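
To show how such a whitespace-separated list of scores relates to a FASTQ quality line, here is a minimal sketch that parses the numbers and encodes them with the Phred+33 offset. It assumes the file holds plain integer scores, optionally preceded by a ">" header line; the function names are hypothetical and this is not UGENE's own importer.

```python
def qual_scores(qual_text: str) -> list[int]:
    """Parse whitespace-separated integer scores, skipping any '>' header line."""
    scores = []
    for line in qual_text.splitlines():
        if line.startswith(">"):
            continue
        scores.extend(int(token) for token in line.split())
    return scores


def to_phred33(scores: list[int]) -> str:
    """Encode scores as Phred+33 characters.

    Scores are clamped to 0..93 because '~' (ASCII 126) is the highest
    printable character available in this encoding.
    """
    return "".join(chr(min(max(score, 0), 93) + 33) for score in scores)


example = "17 21 11 \n11 12 22"
print(to_phred33(qual_scores(example)))
```

The number of scores should equal the number of bases in the matching read; a mismatch would suggest the file does not belong to that chromatogram.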

@hadim
Author

hadim commented Feb 27, 2017

  1. The quality file raises an error when used in UGENE because it does not contain the ">Name of the Sequence" first line. Would it be possible for UGENE to read the quality file even without this first line? I am afraid my sequencing service won't fix that anytime soon... And I would like to avoid pre-processing files as much as possible, since I am not the only one using UGENE to analyze Sanger sequencing results.

  2. Would it also be possible for the "Import PHRED Qualities" Workflow Element to accept a list of quality files, so that it can be matched with a File List element?

Thank you.
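
Until something like this is supported, the two points above could be worked around outside UGENE with a small script. The following is only a sketch: it assumes a ".qual" file needs nothing more than a ">name" first line, and that each chromatogram and its quality file share the same base name. Both are assumptions, and the helper names are invented for illustration.

```python
from pathlib import Path


def with_header(qual_text: str, name: str) -> str:
    """Prepend a '>name' line unless the text already starts with a header."""
    if qual_text.lstrip().startswith(">"):
        return qual_text
    return f">{name}\n{qual_text}"


def pair_by_stem(trace_files: list[str], qual_files: list[str]) -> list[tuple]:
    """Match each trace file to the quality file with the same base name.

    Traces without a matching quality file are paired with None.
    """
    qual_by_stem = {Path(q).stem: q for q in qual_files}
    return [(t, qual_by_stem.get(Path(t).stem)) for t in trace_files]


print(with_header("17 21 11\n", "read1"))
print(pair_by_stem(["a.ab1", "b.ab1"], ["b.qual", "a.qual"]))
```

Pairing by base name keeps the two lists in step even when the files are listed in a different order.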

@dkandrov
Contributor

dkandrov commented Mar 2, 2017

About PHRED: its license is restrictive and unusable for us. It is free only for academic use, and we can't distribute it with our program, so I don't think we will integrate it into UGENE in the near future.
I found a free alternative, TraceTuner, but as far as I understand it is no longer maintained; the last changes were made in 2009.

Another option is to add support for .qual files like yours. To do this, could you tell us which sequencer model you are using? Is this type of .qual file standard across many sequencers? Any documentation or articles would be useful to us.

@dkandrov
Contributor

dkandrov commented Mar 2, 2017

About the "Import PHRED Qualities" Workflow Element: this element can't help with your data, because it requires quality values in PHRED format. Something like this:

>HWI-EAS209_0006_FC706VJ:5:58:5894:21141#ATCACG/1
efcfffffcfeefffcffffffddf`feed]`]_Ba_^__[YBBBBBBBBBBRTT\]][]dddd`ddd^dddadd^BBBBBBBBBBBBBBBBBBBBBBBB

@hadim
Author

hadim commented Mar 2, 2017

I am using the university sequencing service platform so I don't know what type of sequencer they're using.

About the .qual files, I don't know if it's a standard.

But since I can download FASTQ files from the sequencing service, it's OK for me not to use .ab1 + .qual files. See https://local.ugene.unipro.ru/tracker/browse/UGENE-5529 for details.

Thank you.

hadim closed this as completed Mar 2, 2017