Merge remote branch 'upstream/master'

2 parents 26addb6 + 5cef1e9 commit 4e1d7a3141a107c7e031e7d5f7ba6f6193457a92 @jfass jfass committed May 29, 2012
@@ -0,0 +1,46 @@
+# History files
+# Example code in package build process
+# Org-mode
+# Thumbnails
+# Files that might appear on external disk
+# Object files
+# Libraries
+# Shared objects (inc. Windows DLLs)
+# Executables
@@ -1,3 +1,4 @@
+MIT License
Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation
files (the "Software"), to deal in the Software without
@@ -1,5 +1,5 @@
-VERSION = 0.95
+VERSION = 0.981
CC = gcc
CFLAGS = -Wall -pedantic -DVERSION=$(VERSION) -std=gnu99
DEBUG = -g
@@ -45,4 +45,4 @@ build: match.o scythe.o util.o prob.o
$(CC) $(CFLAGS) $(LDFLAGS) $? -o scythe
- $(MAKE) build "CFLAGS=-Wall -pedantic -g -DDEBUG"
+ $(CC) $(LDFLAGS) $(DEBUG) -o scythe src/*.c
@@ -1,8 +1,13 @@
-# Scythe - A very simple adapter trimmer (version 0.93 BETA)
+# Scythe - A very simple adapter trimmer (version 0.981 BETA)
+Scythe and all supporting documentation
+Copyright (c) Vince Buffalo, 2011-2012
Contact: Vince Buffalo <> (with the poly-A tail removed)
-Copyright (c) 2011 The Regents of University of California, Davis Campus.
+If you wish to report a bug, please open an issue on Github
+( so that it can be
+tracked. You can contact me as well, but please open an issue first.
## About
@@ -73,7 +78,11 @@ or Solexa (pipeline < 1.3) qualities can be specified with -q:
Lastly, a minimum match length argument can be specified with -n <integer>:
- scythe -a adapter_file.fasta -n 4 -o trimmed_sequences.fasta sequences.fastq
+ scythe -a adapter_file.fasta -n 0 -o trimmed_sequences.fasta sequences.fastq
+The default is 5. If this pre-processing is upstream of assembly on a
+very contaminated lane, decreasing this parameter could lead to *very*
+liberal trimming, i.e. of only a few bases.
## Notes
@@ -111,5 +120,94 @@ while Scythe trims off contaminating sequence, leaving valuable reads!
A possible pipeline would run FASTQ reads through Scythe, then
TagDust, then a quality-based trimmer, and finally through a read
quality statistics program such as qrqc
-(<>) or FASTqc
+(<>) or FASTqc
+## FAQ
+### Does Scythe work with paired-end data?
+Scythe does work with paired-end data. Each file must be run
+separately, but Scythe never removes a read entirely, so it will not
+leave mismatched pairs.
+In some cases, barcodes are ligated to both the 3'-end and 5'-end of
+reads. 5'-end removal is trivial since base calling is near-perfect
+there, but 3'-end removal can be trickier. Some users have created
+Scythe adapter files that contain all possible barcodes concatenated
+with possible adapters, so that both can be recognized and
+removed. This has worked well and is recommended for cases when 3'-end
+quality deteriorates and prevents barcode removal. Newer Illumina
+chemistry has the barcode separated from the fragment, so that it
+appears as an entirely separate read and is used to demultiplex sample
+reads by Illumina's CASAVA pipeline.
+### Does Scythe work on 5'-end or other contaminants?
+No. Embracing the Unix tool philosophy that tools should do one thing
+very well, Scythe just removes 3'-end contaminants where there could
+be multiple base mismatches due to poor base quality. N-mismatch
+algorithms (such as TagDust) don't consider base qualities. Scythe
+will allow more mismatches in an alignment if the mismatched bases are
+of low quality.
+**Scythe only checks as far in as the entire adapter contaminant's
+length.** However, some investigation has shown that Illumina
+pipelines sometimes produce reads longer than the read length +
+adapter length. The extra bases have always been observed to be
+A's. Some testing has shown this can be addressed by appending A's to
+the adapters in the adapters file. Since Scythe begins by checking for
+contamination from the 5'-end of the adapter, this won't affect the
+normal adapter contaminant cases.
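
The A-padding workaround described above can be sketched as a small pre-processing step on the adapters file. This is only an illustration and not part of Scythe: the function name `pad_adapters` and the default of 10 extra bases are assumptions, since the README does not say how many A's to append.

```python
def pad_adapters(fasta_text, n_extra=10):
    """Return FASTA text with n_extra 'A' bases appended to every sequence.

    Hypothetical helper: n_extra=10 is an arbitrary illustrative default.
    """
    records = []  # each entry is [header, sequence]
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line, ""])
        elif records:
            # Join wrapped sequence lines into a single sequence.
            records[-1][1] += line
    return "".join(f"{header}\n{seq}{'A' * n_extra}\n" for header, seq in records)


if __name__ == "__main__":
    print(pad_adapters(">fake_adapter\nACGTACGT\n", n_extra=4), end="")
```

Because the padding goes on the 3'-end of each adapter, the 5'-end that Scythe matches first is left unchanged.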
+### What does the numeric output from Scythe mean?
+For each adapter in the file, the contaminants removed by position are
+returned via standard error. For example:
+ Adapter 1 'fake adapter' contamination occurences:
+ [10, 2, 4, 5, 6]
+indicates that "fake adapter" is 5 bases long (the length of the array
+returned), and that there were 10 contaminants found of the first base
+(-n was set to 0 then), 2 of the first two bases, 4 of the first 3
+bases, 5 of the first 4 bases, etc.
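
To make the reading of that array concrete, here is a minimal sketch that turns one array line into a per-length summary. It assumes the line has already been captured from standard error and has exactly the bracketed form shown above; `summarize_counts` is a hypothetical helper, not something Scythe provides.

```python
import ast


def summarize_counts(array_line):
    """Map matched-prefix length (1-based) to the number of contaminants found."""
    counts = ast.literal_eval(array_line.strip())
    return {length: count for length, count in enumerate(counts, start=1)}


if __name__ == "__main__":
    summary = summarize_counts("[10, 2, 4, 5, 6]")
    for length, count in summary.items():
        print(f"{count} contaminants matched the first {length} adapter base(s)")
    print("total contaminants counted:", sum(summary.values()))
```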
+### Does Scythe work on FASTA files?
+No, as these have no quality information.
+### How can I report a bug?
+See the section below.
+### How does Scythe compare to program "x"?
+As far as I know, Scythe is the only program that employs a Bayesian
+model that allows prior contaminant estimates to be used. Using a
+prior is more realistic than setting a fixed number of mismatches
+because the contamination rate can be estimated visually, for example
+by paging through the reads with the Unix tool `less`.
+Scythe also looks at base-level qualities, *not* just a fixed level of
+mismatches. A fixed number of mismatches works poorly on data our
+group (the UC Davis Bioinformatics Core) has seen, as a small run of
+bad-quality bases can quickly exhaust even a high number of allowed
+mismatches and lead to more false negatives.
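
The idea of combining a prior with base qualities can be illustrated with a toy model. This is a sketch under stated assumptions, not Scythe's actual code: it assumes Phred-scaled qualities, a uniform 0.25 chance per base under the "random sequence" hypothesis, and an arbitrary prior of 0.05. The function names are hypothetical.

```python
def phred_to_error(q):
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)


def posterior_contaminated(read_tail, adapter_head, quals, prior=0.05):
    """Posterior probability that read_tail is adapter sequence (toy model)."""
    like_contam = 1.0
    like_random = 1.0
    for read_base, adapter_base, q in zip(read_tail, adapter_head, quals):
        e = phred_to_error(q)
        # If contaminated, the true base is the adapter base: a match has
        # probability 1 - e, and one specific wrong base has probability e / 3.
        like_contam *= (1 - e) if read_base == adapter_base else e / 3
        # If not contaminated, assume each base is equally likely.
        like_random *= 0.25
    numerator = prior * like_contam
    return numerator / (numerator + (1 - prior) * like_random)


if __name__ == "__main__":
    adapter = "ACGTACGT"
    read = "ACGAACGA"  # two mismatches against the adapter
    low_q = [40, 40, 40, 2, 40, 40, 40, 2]  # mismatches at low-quality bases
    high_q = [40] * 8                       # mismatches at high-quality bases
    print(posterior_contaminated(read, adapter, low_q))
    print(posterior_contaminated(read, adapter, high_q))
```

In this model the same two mismatches cost far less when they fall on low-quality bases, which is the behavior a fixed mismatch count cannot express.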
+## Reporting Bugs
+Scythe is free software and is provided without a warranty. However, I
+am proud of this software and I will do my best to provide updates,
+bug fixes, and additional documentation as needed. Please report all
+bugs and issues to Github's issue tracker
+( If you want to email me,
+do so in addition to an issue request.
+If you have a suggestion or comment on Scythe's methods, you can email
+me directly.
+## Is there a paper about Scythe?
+I am currently writing a paper on Scythe's methods. In my preliminary
+testing, Scythe has fewer false positives and false negatives than
+its competitors.
File renamed without changes.