Merge remote branch 'upstream/master'

2 parents 26addb6 + 5cef1e9, commit 4e1d7a3141a107c7e031e7d5f7ba6f6193457a92, jfass committed May 29, 2012
@@ -0,0 +1,46 @@
+# History files
+.Rhistory
+
+# Example code in package build process
+*-Ex.R
+
+*~
+\#*\#
+/.emacs.desktop
+/.emacs.desktop.lock
+*.elc
+auto-save-list
+tramp
+.\#*
+
+# Org-mode
+.org-id-locations
+*_archive
+
+.DS_Store
+
+# Thumbnails
+._*
+
+# Files that might appear on external disk
+.Spotlight-V100
+.Trashes
+
+
+# Object files
+*.o
+
+# Libraries
+*.lib
+
+# Shared objects (inc. Windows DLLs)
+*.dll
+*.so
+
+# Executables
+*.exe
+*.out
+
+scythe
+
+*.gch
@@ -1,3 +1,4 @@
+MIT License
Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation
files (the "Software"), to deal in the Software without
@@ -1,5 +1,5 @@
PROGRAM_NAME = scythe
-VERSION = 0.95
+VERSION = 0.981
CC = gcc
CFLAGS = -Wall -pedantic -DVERSION=$(VERSION) -std=gnu99
DEBUG = -g
@@ -45,4 +45,4 @@ build: match.o scythe.o util.o prob.o
$(CC) $(CFLAGS) $(LDFLAGS) $? -o scythe
debug:
- $(MAKE) build "CFLAGS=-Wall -pedantic -g -DDEBUG"
+ $(CC) $(LDFLAGS) $(DEBUG) -o scythe src/*.c
106 README.md
@@ -1,8 +1,13 @@
-# Scythe - A very simple adapter trimmer (version 0.93 BETA)
+# Scythe - A very simple adapter trimmer (version 0.981 BETA)
+
+Scythe and all supporting documentation
+Copyright (c) Vince Buffalo, 2011-2012
Contact: Vince Buffalo <vsbuffaloAAAAA@gmail.com> (with the poly-A tail removed)
-Copyright (c) 2011 The Regents of University of California, Davis Campus.
+If you wish to report a bug, please open an issue on GitHub
+(http://github.com/vsbuffalo/scythe/issues) so that it can be
+tracked. You can contact me as well, but please open an issue first.
## About
@@ -73,7 +78,11 @@ or Solexa (pipeline < 1.3) qualities can be specified with -q:
Lastly, a minimum match length argument can be specified with -n <integer>:
- scythe -a adapter_file.fasta -n 4 -o trimmed_sequences.fasta sequences.fastq
+ scythe -a adapter_file.fasta -n 0 -o trimmed_sequences.fasta sequences.fastq
+
+The default is 5. If this pre-processing is upstream of assembly on a
+very contaminated lane, decreasing this parameter could lead to *very*
+liberal trimming, i.e. trimming of matches only a few bases long.
## Notes
@@ -111,5 +120,94 @@ while Scythe trims off contaminating sequence, leaving valuable reads!
A possible pipeline would run FASTQ reads through Scythe, then
TagDust, then a quality-based trimmer, and finally through a read
quality statistics program such as qrqc
-(<https://github.com/vsbuffalo/qrqc>) or FASTqc
+(<http://bioconductor.org/packages/devel/bioc/html/qrqc.html>) or FASTqc
(<http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/>).
+
+## FAQ
+
+### Does Scythe work with paired-end data?
+
+Scythe does work with paired-end data. Each file must be run
+separately, but Scythe never removes a read entirely, so the two
+files will not end up with mismatched pairs.
+
+In some cases, barcodes are ligated to both the 3'-end and 5'-end of
+reads. 5'-end removal is trivial since base calling is near-perfect
+there, but 3'-end removal can be trickier. Some users have created
+Scythe adapter files that contain all possible barcodes concatenated
+with possible adapters, so that both can be recognized and
+removed. This has worked well and is recommended for cases when 3'-end
+quality deteriorates and prevents barcode removal. Newer Illumina
+chemistry has the barcode separated from the fragment, so that it
+appears as an entirely separate read and is used to demultiplex sample
+reads by Illumina's CASAVA pipeline.
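The combined barcode-plus-adapter file described above could be generated with a short script. This is a minimal sketch, not part of Scythe: the function name `barcode_adapter_fasta`, the sequence names, and the sequences themselves are hypothetical, and it assumes the barcode sits immediately before the adapter in the contaminant.

```python
from itertools import product


def barcode_adapter_fasta(barcodes, adapters):
    """Build FASTA text with one record per barcode/adapter combination.

    Assumption: the barcode directly precedes the adapter. Swap the
    concatenation order if your library layout differs.
    """
    lines = []
    for (bc_name, bc_seq), (ad_name, ad_seq) in product(
        barcodes.items(), adapters.items()
    ):
        lines.append(f">{bc_name}_{ad_name}")
        lines.append(bc_seq + ad_seq)
    return "\n".join(lines) + "\n"


# Hypothetical sequences, for illustration only.
barcodes = {"bc1": "ACGT", "bc2": "TGCA"}
adapters = {"adapterA": "GGGCCC"}
fasta = barcode_adapter_fasta(barcodes, adapters)
print(fasta, end="")
```

The resulting text can be saved to a file and passed to Scythe with `-a`, as in the usage examples earlier in the README.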
+
+### Does Scythe work on 5'-end or other contaminants?
+
+No. Embracing the Unix tool philosophy that tools should do one thing
+very well, Scythe just removes 3'-end contaminants where there could
+be multiple base mismatches due to poor base quality. N-mismatch
+algorithms (such as TagDust) don't consider base qualities. Scythe
+will allow more mismatches in an alignment if the mismatched bases are
+of low quality.
+
+**Scythe only checks as far in as the entire adapter contaminant's
+length.** However, some investigation has shown that Illumina
+pipelines sometimes produce reads longer than the read length +
+adapter length. The extra bases have always been observed to be
+A's. Some testing has shown this can be addressed by appending A's to
+the adapters in the adapters file. Since Scythe begins by checking for
+contamination from the 5'-end of the adapter, this won't affect the
+normal adapter contaminant cases.
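Appending A's to every adapter can be scripted rather than done by hand. The following is a minimal sketch under stated assumptions: the helper names `read_fasta` and `append_poly_a` and the default of 10 extra bases are illustrative choices, not values taken from Scythe or its documentation.

```python
def read_fasta(text):
    """Parse FASTA text into (name, sequence) pairs, joining wrapped lines."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif records:
            records[-1][1] += line
    return [(name, seq) for name, seq in records]


def append_poly_a(text, n_extra=10):
    """Return FASTA text with n_extra A's appended to each adapter sequence."""
    return "".join(
        f">{name}\n{seq}{'A' * n_extra}\n" for name, seq in read_fasta(text)
    )


# Hypothetical adapter, for illustration only.
padded = append_poly_a(">adapter1\nACGT\nTT\n", n_extra=3)
print(padded, end="")
```

How many A's to append depends on how far the observed reads overrun the adapter, so `n_extra` should be chosen by inspecting the data.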
+
+### What does the numeric output from Scythe mean?
+
+For each adapter in the file, the contaminants removed by position are
+returned via standard error. For example:
+
+ Adapter 1 'fake adapter' contamination occurences:
+ [10, 2, 4, 5, 6]
+
+indicates that "fake adapter" is 5 bases long (the length of the array
+returned), and that there were 10 contaminants found of the first base
+(-n was set to 0 then), 2 of the first two bases, 4 contaminants of the
+first 3 bases, 5 of the first 4 bases, etc.
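The report format shown above is simple enough to parse for downstream summaries. This is a minimal sketch that assumes the standard error text looks exactly like the example (a header line followed by a bracketed list); the function name `parse_contamination_report` is hypothetical, and the regular expression deliberately matches the header's spelling as printed.

```python
import ast
import re


def parse_contamination_report(stderr_text):
    """Return {adapter_name: [count per match length]} from report text."""
    report = {}
    name = None
    for line in stderr_text.splitlines():
        line = line.strip()
        header = re.match(r"Adapter \d+ '(.*)' contamination", line)
        if header:
            name = header.group(1)
        elif name is not None and line.startswith("["):
            # The counts line is a plain bracketed list of integers.
            report[name] = ast.literal_eval(line)
            name = None
    return report


example = """Adapter 1 'fake adapter' contamination occurences:
[10, 2, 4, 5, 6]
"""
counts = parse_contamination_report(example)["fake adapter"]
for length, n in enumerate(counts, start=1):
    print(f"{n} contaminants matched the first {length} base(s)")

# Assumption: each trimmed read is counted once, under its match length.
total_trimmed = sum(counts)
```

In practice the text would come from redirecting Scythe's standard error to a file and reading that file.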
+
+### Does Scythe work on FASTA files?
+
+No, as these have no quality information.
+
+### How can I report a bug?
+
+See the section below.
+
+### How does Scythe compare to program "x"?
+
+As far as I know, Scythe is the only program that employs a Bayesian
+model that allows prior contaminant estimates to be used. This prior
+is a more realistic approach than setting a fixed number of mismatches
+because we can visually estimate it with the Unix tool `less`.
+
+Scythe also looks at base-level qualities, *not* just a fixed level of
+mismatches. A fixed number of mismatches is a bad approach with data
+our group (the UC Davis Bioinformatics Core) has seen, as a small bad
+quality run can quickly exhaust even a high number of fixed
+mismatches and lead to higher false negatives.
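To illustrate why a prior combined with base qualities behaves differently from a fixed mismatch count, here is a minimal sketch of a quality-aware Bayesian comparison. It is not Scythe's actual implementation: the function names, the per-base likelihood terms (a 1/4 chance of matching by coincidence), and the prior of 0.3 are all assumptions made for illustration.

```python
def phred_to_error(q):
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)


def posterior_contaminated(read_tail, adapter_head, quals, prior=0.3):
    """Posterior probability that read_tail is adapter contamination.

    Assumed model: under contamination, a match has probability 1 - e and
    a mismatch has probability e (e = error probability of that base);
    under random sequence, a match has probability 0.25, a mismatch 0.75.
    """
    like_contam = 1.0
    like_random = 1.0
    for base, adapter_base, q in zip(read_tail, adapter_head, quals):
        e = phred_to_error(q)
        if base == adapter_base:
            like_contam *= 1 - e
            like_random *= 0.25
        else:
            like_contam *= e
            like_random *= 0.75
    numerator = prior * like_contam
    return numerator / (numerator + (1 - prior) * like_random)


adapter = "ACGTACGT"
read = "ACGTACGA"  # one mismatch, at the last base

# Same mismatch, different quality at the mismatched base.
low_q = posterior_contaminated(read, adapter, [40] * 7 + [2])
high_q = posterior_contaminated(read, adapter, [40] * 8)
print(f"mismatch at Q2:  {low_q:.4f}")
print(f"mismatch at Q40: {high_q:.4f}")
```

Under these assumptions a mismatch at a low-quality base barely lowers the posterior, while the same mismatch at a high-quality base counts strongly against contamination, which a fixed mismatch threshold cannot express.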
+
+## Reporting Bugs
+
+Scythe is free software and is provided without a warranty. However, I
+am proud of this software and I will do my best to provide updates,
+bug fixes, and additional documentation as needed. Please report all
+bugs and issues to GitHub's issue tracker
+(http://github.com/vsbuffalo/scythe/issues). If you want to email me,
+do so in addition to an issue request.
+
+If you have a suggestion or comment on Scythe's methods, you can email
+me directly.
+
+## Is there a paper about Scythe?
+
+I am currently writing a paper on Scythe's methods. In my preliminary
+testing, Scythe has fewer false positives and false negatives than
+its competitors.
File renamed without changes.