-
Notifications
You must be signed in to change notification settings - Fork 12
Closed
Description
As per title.
cat test.fastq@test_1
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
0,-/(/--/133451659;6875662/''$.110%&)(./1137540.+((&(##%&$)***,-30..*-,/20/0.,/124466640(-.',+)*,/.-,,+-&$$#%*(+,+/2/%'((&&$*+'*++'(()$%+-..1374,--'-&$$$$'.0/00340.--*++0/003513.0&&'-00,./03240/-,-.,,014'%%%0211+04753./,+1(*/222.&-,(()$$%$$/32300./2&,0%)00/6//00-+%&%&&'&/+/22321+-,'%
@test_2
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
AA=96802,2-:9539;9897,+56771<:;<<;;4:;8008.<>;92@BBBA?(&54<=;4233334/$&&+2257./7421/4;B?;A5699@<:@:-)8145.335000
Manual implementation:
#! /usr/bin/env python
from statistics import mean
from Bio.SeqIO.QualityIO import FastqGeneralIterator
import math
def decode(c):
return ord(c) - 33
def phred_to_probability(phred_score):
return 10**(-phred_score/10)
def probability_to_phred(error_probability):
return -10 * math.log10(error_probability)
def raw_q_to_probability(q):
return phred_to_probability(decode(q))
def calc_stats(fastq):
mean_read_errors = []
mean_read_quals = []
with open(fastq) as in_handle:
for _title, _seq, qual in FastqGeneralIterator(in_handle):
error_probabilities = list(map(raw_q_to_probability, qual))
mean_read_error = mean(error_probabilities)
mean_read_errors.append(mean_read_error)
mean_read_quals.append(probability_to_phred(mean_read_error))
print(f"Averaging errors: {probability_to_phred(mean(mean_read_errors))}")
print(f"Averaging quals: {mean(mean_read_quals)}")
calc_stats("test.fastq")(base) mambauser@1:/phred_calc$ python3 mean_across_reads.py
Averaging errors: 11.13979284230169
Averaging quals: 12.12473614554866NanoStat:
(base) mambauser@1:/phred_calc$ NanoStat --fastq test.fastq -n nanostat.txt
(base) mambauser@1:/phred_calc$ cat nanostat.txt
General summary:
Mean read length: 198.0
Mean read quality: 12.1
Median read length: 198.0
Median read quality: 12.1
Number of reads: 2.0
Read length N50: 284.0
STDEV read length: 86.0
Total bases: 396.0
Number, percentage and megabases of reads above quality cutoffs
>Q5: 2 (100.0%) 0.0Mb
>Q7: 2 (100.0%) 0.0Mb
>Q10: 1 (50.0%) 0.0Mb
>Q12: 1 (50.0%) 0.0Mb
>Q15: 1 (50.0%) 0.0Mb
Top 5 highest mean basecall quality scores and their read lengths
1: 15.2 (112)
2: 9.1 (284)
3: NA
4: NA
5: NA
Top 5 longest reads and their mean basecall quality score
1: 284 (9.1)
2: 112 (15.2)
3: NA
4: NA
5: NAThank you!!
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels