Merge pull request #1 from jfass/master

Revamped adapter files (one for the older Solexa adapters, one for Illumina's current TruSeq adapters) and added a description of the adapters and how to modify them with barcodes.
2 parents: 5cef1e9 + 4e1d7a3
commit 872a54c996a1a9f5b3f5210d92bcbd8d7efcaa04
@vsbuffalo committed Jun 1, 2012
Showing with 32 additions and 6 deletions.
  1. +24 −2 README.md
  2. +4 −4 illumina_adapters.fa
  3. +4 −0 truseq_adapters.fasta
26 README.md
@@ -86,14 +86,36 @@ liberal trimming, i.e. of only a few bases.
## Notes
+Note that the two provided adapter sequence files contain non-FASTA
+characters to denote the locations of barcode sequences, which always
+appear in TruSeq adapters, and may or may not appear in forward and/or
+reverse reads using the original Solexa/Illumina adapter sequences,
+depending on library preparation. You'll need to modify the adapter
+sequence files in order to use them.
+
+In the case of the original Solexa/Illumina adapter sequences, we've seen
+barcodes "upstream" of forward reads (in which case the reverse complement
+of the barcode will appear before the adapter sequence at the 3'-end of
+reverse reads - replacing the [NNNNNN]). We've also seen barcodes upstream
+of reverse reads (in which case the reverse complement of the barcode will
+appear before the adapter sequence at the 3'-end of forward reads -
+replacing the [MMMMMM]). Your definition of the barcode may be someone
+else's reverse-complemented barcode, and the barcode may or may not be 6
+bases.
+
+In the case of TruSeq adapter sequences, there will always be a 6 bp
+barcode in place of the [NNNNNN] in sequence contaminating forward reads
+(if the fragment is short enough, of course). This barcode sequence should
+match the barcode included in the reads' FASTQ headers.
+
Scythe only checks for 3'-end contaminants, up to the adapter's length
into the 3'-end. For reads with contamination in *any* position, the
program TagDust (<http://genome.gsc.riken.jp/osc/english/dataresource/>)
is recommended. Scythe has the advantages of allowing fuzzier matching
and being base quality-aware, while TagDust has the advantages of very
fast matching (but allowing few mismatches, and not considering
-quality) and FDR. TagDust also removes contaminated reads *entirely*, while
-Scythe trims off contaminants.
+quality) and FDR. Note that TagDust removes contaminated reads *entirely*,
+while Scythe trims off contaminating sequence, leaving valuable reads!
A possible pipeline would run FASTQ reads through Scythe, then
TagDust, then a quality-based trimmer, and finally through a read
8 illumina_adapters.fa
@@ -1,4 +1,4 @@
->Solexa 3' adapter
-AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG
->Solexa 3' adapter (alt)
-AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
+>Solexa_forward_contam
+[MMMMMM]AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAA
+>Solexa_reverse_contam
+[NNNNNN]AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTGAAAAA
4 truseq_adapters.fasta
@@ -0,0 +1,4 @@
+>TruSeq_forward_contam
+AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC[NNNNNN]ATCTCGTATGCCGTCTTCTGCTTGAAAAA
+>TruSeq_reverse_contam
+AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAA
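
The README notes in this commit say the bracketed placeholders (`[NNNNNN]`, `[MMMMMM]`) must be replaced before the adapter files can be used. The following is a minimal Python sketch of that substitution, not part of Scythe itself. The helper names `fill_placeholder` and `reverse_complement` are hypothetical, and the barcode `ATCACG` is an arbitrary example value chosen for illustration; substitute the barcode from your own library preparation.

```python
import re

# Matches one bracketed placeholder such as [NNNNNN] or [MMMMMM].
PLACEHOLDER = re.compile(r"\[[NM]+\]")

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def fill_placeholder(adapter, barcode):
    """Replace the single bracketed placeholder in `adapter` with `barcode`.

    Raises ValueError if the adapter does not contain exactly one placeholder.
    """
    filled, count = PLACEHOLDER.subn(lambda _match: barcode, adapter)
    if count != 1:
        raise ValueError(f"expected exactly one placeholder, found {count}")
    return filled


# Adapter sequences copied from the files added in this commit.
truseq_forward = (
    "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC[NNNNNN]ATCTCGTATGCCGTCTTCTGCTTGAAAAA"
)
solexa_reverse = (
    "[NNNNNN]AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTGAAAAA"
)

barcode = "ATCACG"  # hypothetical example barcode, not taken from the source

# TruSeq: the placeholder is filled with the barcode itself, which the README
# says should match the barcode in the reads' FASTQ headers.
print(fill_placeholder(truseq_forward, barcode))

# Original Solexa/Illumina adapters with a barcode upstream of forward reads:
# the README says the reverse complement of the barcode precedes the adapter
# at the 3'-end of reverse reads.
print(fill_placeholder(solexa_reverse, reverse_complement(barcode)))
```

If a library has no barcode at a given position, passing an empty string removes the placeholder and leaves a plain FASTA sequence. Because one person's barcode may be another's reverse complement, as the README warns, it is worth checking the filled-in sequence against a few raw reads before trimming.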
