replaced adapter files and added explanatory text to README

jfass committed May 29, 2012
1 parent b20c02b commit 26addb6458199fe7ee79990161a88af3b9ea736b
Showing with 29 additions and 25 deletions.
  1. +0 −2 3primeContam_new.fa
  2. +0 −2 3primeContam_new.fa~
  3. +0 −4 3primeContam_old.fa
  4. +21 −17 README.md
  5. +4 −0 solexa_adapters.fa
  6. +4 −0 truseq_adapters.fasta
3primeContam_new.fa
@@ -1,2 +0,0 @@
->contam_new
-AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC[barcode]ATCTCGTATGCCGTCTTCTGCTTGAAAAA
3primeContam_new.fa~
@@ -1,2 +0,0 @@
->IlluminaTruSeqAdapter
-AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC[barcode]ATCTCGTATGCCGTCTTCTGCTTGAAAAA
3primeContam_old.fa
@@ -1,4 +0,0 @@
->contam1
-AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAA
->contam2
-AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTGAAAAA
README.md
@@ -77,23 +77,27 @@ Lastly, a minimum match length argument can be specified with -n <integer>:
## Notes
-Included in this distribution are two fasta files containing sequences
-that we've observed as frequent contaminants at the 3'-ends of Illumina
-reads, that we interpret to come from adapter sequences. We are in no way
-passing on sequence information from Illumina, Inc., and we make no claims
-about the accuracy or identity of the sequences in "3primeContam_old.fa"
-and "3primeContam_new.fa" ... USE AT YOUR OWN RISK!
-If you are multiplexing samples using "in-line" barcodes (nucleotides added
-to one or the other Solexa adapter sequence), you will need to prepend the
-reverse-complement of the barcode to one or both of the adapter sequences
-in "3primeContam_old.fa" (depending on whether barcodes were added
-to one or both adapter). If you are multiplexing samples using Illumina's
-TruSeq adapter, you should replace the text "[barcode]" in
-"3primeContam_new.fa" with the 6 bp sequence found in the filenames
-of reads demultiplexed by Illumina's pipeline, e.g.:
-FOO_ACAGTG_L006_R1_001.fastq.gz
-... ACAGTG is how the barcode will appear within the adapter sequence
-provided.
+Note that the two provided adapter sequence files contain non-FASTA
+characters to denote the locations of barcode sequences, which always
+appear in TruSeq adapters, and may or may not appear in forward and/or
+reverse reads using the original Solexa/Illumina adapter sequences,
+depending on library preparation. You'll need to modify the adapter
+sequence files in order to use them.
+
+In the case of the original Solexa/Illumina adapter sequences, we've seen
+barcodes "upstream" of forward reads (in which case the reverse complement
+of the barcode will appear before the adapter sequence at the 3'-end of
+reverse reads - replacing the [NNNNNN]). We've also seen barcodes upstream
+of reverse reads (in which case the reverse complement of the barcode will
+appear before the adapter sequence at the 3'-end of forward reads -
+replacing the [MMMMMM]). Your definition of the barcode may be someone
+else's reverse-complemented barcode, and the barcode may or may not be 6
+bases.
+
+In the case of TruSeq adapter sequences, there will always be a 6 bp
+barcode in place of the [NNNNNN] in sequence contaminating forward reads
+(if the fragment is short enough, of course). This barcode sequence should
+match the barcode included in the reads' FASTQ headers.
Scythe only checks for 3'-end contaminants, up to the adapter's length
into the 3'-end. For reads with contamination in *any* position, the
solexa_adapters.fa
@@ -0,0 +1,4 @@
+>Solexa_forward_contam
+[MMMMMM]AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAA
+>Solexa_reverse_contam
+[NNNNNN]AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTGAAAAA
truseq_adapters.fasta
@@ -0,0 +1,4 @@
+>TruSeq_forward_contam
+AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC[NNNNNN]ATCTCGTATGCCGTCTTCTGCTTGAAAAA
+>TruSeq_reverse_contam
+AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAA
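
The updated README says the adapter files must be modified before use, because the bracketed placeholders ([NNNNNN], [MMMMMM]) are not valid FASTA. A minimal sketch of that substitution follows. The helper names `reverse_complement` and `fill_barcode` are hypothetical and not part of Scythe, and ACAGTG is only the example barcode from the earlier README text. Whether a barcode must be reverse-complemented first depends on the library preparation, as the README notes, so that step is left to the caller.

```python
import re

# Matches one bracketed placeholder such as [NNNNNN] or [MMMMMM].
# Assumption: placeholders are a run of a single letter, N or M, in brackets.
PLACEHOLDER = re.compile(r"\[([NM])\1*\]")

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def fill_barcode(adapter, barcode):
    """Replace the single bracketed placeholder in `adapter` with `barcode`.

    Raises ValueError if the adapter does not contain exactly one placeholder,
    so an already-edited or unexpected sequence is not silently passed through.
    """
    filled, count = PLACEHOLDER.subn(lambda _match: barcode, adapter)
    if count != 1:
        raise ValueError(f"expected exactly one placeholder, found {count}")
    return filled


# TruSeq forward contaminant from truseq_adapters.fasta in this commit.
truseq_forward = (
    "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC[NNNNNN]ATCTCGTATGCCGTCTTCTGCTTGAAAAA"
)
# Solexa reverse contaminant from solexa_adapters.fa in this commit.
solexa_reverse = (
    "[NNNNNN]AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTGAAAAA"
)

# TruSeq: use the barcode as reported for the reads.
truseq_ready = fill_barcode(truseq_forward, "ACAGTG")

# Solexa with an in-line barcode upstream of forward reads: the README says
# the reverse complement appears before the adapter in reverse reads.
solexa_ready = fill_barcode(solexa_reverse, reverse_complement("ACAGTG"))
```

If the library carries no in-line barcode, the placeholder would presumably be replaced with an empty string instead, for example `fill_barcode(solexa_reverse, "")`.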
