=======================
README for MACS (1.4.2)
=======================
Time-stamp: <2012-03-19 17:43:47 Tao Liu>

Introduction
============

With the improvement of sequencing techniques, chromatin
immunoprecipitation followed by high-throughput sequencing (ChIP-Seq)
is becoming a popular way to study genome-wide protein-DNA
interactions. To address the lack of powerful ChIP-Seq analysis
methods, we present a novel algorithm, named Model-based Analysis of
ChIP-Seq (MACS), for identifying transcription factor binding
sites. MACS captures the influence of genome complexity to evaluate
the significance of enriched ChIP regions, and it improves the spatial
resolution of binding sites by combining the information of both
sequencing tag position and orientation. MACS can be used with
ChIP-Seq data alone, or with a control sample to increase
specificity.

Install
=======

Please check the file 'INSTALL' in the distribution.

Usage
=====

Parameters:
-----------

-t/--treatment FILENAME
~~~~~~~~~~~~~~~~~~~~~~~

This is the only REQUIRED parameter for MACS. If the format is
ELANDMULTIPET, the user must provide two treatment files separated by
a comma, e.g. s_1_1_eland_multi.txt,s_1_2_eland_multi.txt.

-c/--control
~~~~~~~~~~~~

The control or mock data file, in either BED format or any ELAND
output format specified by the --format option. Please follow the same
directions as for -t/--treatment.

-n/--name
~~~~~~~~~

The name string of the experiment. MACS will use this string NAME to
create output files like 'NAME_peaks.xls', 'NAME_negative_peaks.xls',
'NAME_peaks.bed', 'NAME_summits.bed', 'NAME_model.r' and so on. Please
make sure these filenames do not conflict with your existing files.

-f/--format FORMAT
~~~~~~~~~~~~~~~~~~

Format of the tag file. It can be "ELAND", "BED", "ELANDMULTI",
"ELANDEXPORT", "ELANDMULTIPET" (for paired-end tags), "SAM", "BAM" or
"BOWTIE". The default is "AUTO", which lets MACS decide the format
automatically. Please use "AUTO" only when you combine different
formats of files.

The BED format is defined in "http://genome.ucsc.edu/FAQ/FAQformat#format1".

If the format is ELAND, the file must be an ELAND result output file,
and each line MUST represent only ONE tag, with the following fields:

1. Sequence name (derived from file name and line number if format is
   not Fasta)

2. Sequence

3. Type of match

   :NM: no match found.

   :QC: no matching done: QC failure (too many Ns basically).

   :RM: no matching done: repeat masked (may be seen if repeatFile.txt was specified).

   :U0: Best match found was a unique exact match.

   :U1: Best match found was a unique 1-error match.

   :U2: Best match found was a unique 2-error match.

   :R0: Multiple exact matches found.

   :R1: Multiple 1-error matches found, no exact matches.

   :R2: Multiple 2-error matches found, no exact or 1-error matches.

4. Number of exact matches found.

5. Number of 1-error matches found.

6. Number of 2-error matches found. The rest of the fields are only
   seen if a unique best match was found (i.e. the match code in
   field 3 begins with "U").

7. Genome file in which match was found.

8. Position of match (bases in file are numbered starting at 1).

9. Direction of match (F=forward strand, R=reverse).

10. How N characters in read were interpreted: ("."=not applicable,
    "D"=deletion, "I"=insertion). The rest of the fields are only seen
    in the case of a unique inexact match (i.e. the match code was U1
    or U2).

11. Position and type of first substitution error (e.g. 12A: base 12
    was A, not whatever it was in the read).

12. Position and type of second substitution error, as above.

If the format is ELANDMULTI, the file must be an ELAND output file
from multiple-match mode, and each line MUST represent only ONE tag,
with the following fields:

1. Sequence name

2. Sequence

3. Either NM, QC, RM (as described above) or the following:

4. x:y:z where x, y, and z are the number of exact, single-error, and
   2-error matches found.

5. Blank, if no matches found or if too many matches found, or the
   following: BAC_plus_vector.fa:163022R1,170128F2,E_coli.fa:3909847R1
   This says there are two matches to BAC_plus_vector.fa: one in the
   reverse direction starting at position 163022 with one error, one
   in the forward direction starting at position 170128 with two
   errors. There is also a single-error match to E_coli.fa.

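The match list in the last ELANDMULTI field can be unpacked as in the
sketch below. This is only an illustration of the layout described
above, not the parser MACS uses; the function name
``parse_eland_multi_matches`` is hypothetical, and it assumes that a
reference name applies to every following position until a new
``name:`` prefix appears.

```python
import re

# One match looks like "163022R1": position, direction (F/R), error count.
_MATCH_RE = re.compile(r"(\d+)([FR])(\d)")


def parse_eland_multi_matches(field):
    """Return a list of (reference, position, direction, errors) tuples."""
    if not field.strip():
        # Blank field: no matches, or too many matches were found.
        return []
    matches = []
    reference = None
    for token in field.strip().split(","):
        if ":" in token:
            # A new reference name precedes the position, e.g. "E_coli.fa:3909847R1".
            reference, token = token.rsplit(":", 1)
        parsed = _MATCH_RE.fullmatch(token)
        if parsed is None or reference is None:
            raise ValueError("unrecognised match token: %r" % token)
        position, direction, errors = parsed.groups()
        matches.append((reference, int(position), direction, int(errors)))
    return matches


if __name__ == "__main__":
    example = "BAC_plus_vector.fa:163022R1,170128F2,E_coli.fa:3909847R1"
    for match in parse_eland_multi_matches(example):
        print(match)
```
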
If the data are from paired-end sequencing, you can specify the format
as ELANDMULTIPET (which stands for ELAND Multiple-match Pair-End
Tags). The -t/--treatment (and -c/--control if needed) parameter must
then be two file names separated by a comma. Each file must be in the
ELAND multiple-match format described above, e.g. ::

  macs14 --format ELANDMULTIPET -t s_1_1_eland_multi.txt,s_2_1_eland_multi.txt ...

If you use ELANDMULTIPET, you may need to modify the --petdist parameter.

If the format is BAM/SAM, please check the definition at
(http://samtools.sourceforge.net/samtools.shtml). Paired-end mapping
results can be saved in a single BAM file; if so, MACS will
automatically keep the left mate (5' end) tag.

If the format is BOWTIE, you need to provide the ASCII bowtie output
file with the suffix '.map'. Please make sure that the bowtie output
keeps only one location for each read. For details, check the bowtie
manual at (http://bowtie-bio.sourceforge.net/manual.shtml).

Here is the definition of the Bowtie output in ASCII characters,
copied from the above webpage:

1. Name of read that aligned

2. Orientation of read in the alignment, - for reverse complement, +
   otherwise

3. Name of reference sequence where alignment occurs, or ordinal ID if
   no name was provided

4. 0-based offset into the forward reference strand where leftmost
   character of the alignment occurs

5. Read sequence (reverse-complemented if orientation is -)

6. ASCII-encoded read qualities (reversed if orientation is -). The
   encoded quality values are on the Phred scale and the encoding is
   ASCII-offset by 33 (ASCII char !).

7. Number of other instances where the same read aligns against the
   same reference characters as were aligned against in this
   alignment. This is not the number of other places the read aligns
   with the same number of mismatches. The number in this column is
   generally not a good proxy for that number (e.g., the number in
   this column may be '0' while the number of other alignments with
   the same number of mismatches might be large). This column was
   previously described as "Reserved".

8. Comma-separated list of mismatch descriptors. If there are no
   mismatches in the alignment, this field is empty. A single
   descriptor has the format offset:reference-base>read-base. The
   offset is expressed as a 0-based offset from the high-quality (5')
   end of the read.

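As an illustration of field 8, the sketch below splits a mismatch
field into its descriptors. It follows only the
``offset:reference-base>read-base`` layout quoted above; the function
name ``parse_mismatch_descriptors`` is hypothetical and the code is
not taken from MACS or bowtie.

```python
def parse_mismatch_descriptors(field):
    """Return a list of (offset, reference_base, read_base) tuples.

    The offset is 0-based, counted from the 5' end of the read.
    An empty field means the alignment has no mismatches.
    """
    field = field.strip()
    if not field:
        return []
    descriptors = []
    for item in field.split(","):
        offset_text, bases = item.split(":", 1)
        reference_base, read_base = bases.split(">", 1)
        descriptors.append((int(offset_text), reference_base, read_base))
    return descriptors


if __name__ == "__main__":
    # Two mismatches: reference A read as G at offset 5, reference C read as T at offset 12.
    print(parse_mismatch_descriptors("5:A>G,12:C>T"))
```
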
Notes:

1. For BED format, the 6th column of strand information is required by
   MACS. Please note that the coordinates in BED format are
   zero-based and half-open
   (http://genome.ucsc.edu/FAQ/FAQtracks#tracks1).

2. For plain ELAND format, only matches with match type U0, U1 or U2
   are accepted by MACS, i.e. only a unique match for a sequence with
   fewer than 3 errors is involved in the calculation. If multiple
   hits of a single tag are included in your raw ELAND file, please
   remove the redundancy to keep the best hit for that sequencing tag.

3. For an experiment with several replicates, it is recommended to
   concatenate the ChIP-seq treatment files into a single file. To
   do this, under Unix/Mac or Cygwin (for Windows), type::

     cat replicate1.bed replicate2.bed replicate3.bed > all_replicates.bed

4. ELAND export format support may sometimes not work on your
   datasets, because people may mislabel the 11th and 12th
   columns. MACS uses the 11th column as the sequence name, which
   should be the chromosome name.

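To check a BED file against Note 1 before running MACS, something like
the following sketch can be used. It is not part of MACS; the function
name ``parse_bed_tag`` is hypothetical, and it assumes tab-separated
columns with the strand in column 6.

```python
def parse_bed_tag(line):
    """Parse one BED line and return (chrom, start, end, strand).

    start is 0-based and end is exclusive (half-open interval).
    Raises ValueError if the strand column is missing or invalid.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError("BED line needs at least 6 columns, got %d" % len(fields))
    chrom, start, end, strand = fields[0], int(fields[1]), int(fields[2]), fields[5]
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-', got %r" % strand)
    return chrom, start, end, strand


if __name__ == "__main__":
    chrom, start, end, strand = parse_bed_tag("chr1\t100\t136\ttag1\t0\t+")
    # A half-open interval [100, 136) covers 36 bases.
    print(chrom, start, end, strand, end - start)
```
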
--petdist=PETDIST
~~~~~~~~~~~~~~~~~

Best distance between Pair-End Tags. Only available when the format is
'ELANDMULTIPET'. The default is 200 bp. When MACS reads the mapped
positions for the 5' tag and the 3' tag, it decides the best pairing
for them using this best distance parameter. A simple scoring system
is used as follows::

  score = abs(abs(p5-p3)-200)+e5+e3

where p5 is one of the positions of the 5' tag, and e5 is the
mismatch/error count for this mapped position of the 5' tag; p3 and e3
are the same for the 3' tag. The lowest-scoring pairing is regarded as
the best pairing. The 5' tag position of the pair is kept for model
building and peak calling.

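The scoring rule can be sketched as follows, with e3 being the error
count of the 3' tag. This is an illustration of the formula as
described, not MACS's actual implementation; the names ``pair_score``
and ``best_pairing`` and the ``(position, errors)`` tuples are
assumptions made for the example.

```python
from itertools import product


def pair_score(p5, e5, p3, e3, petdist=200):
    """Score one candidate pairing; lower is better."""
    return abs(abs(p5 - p3) - petdist) + e5 + e3


def best_pairing(hits5, hits3, petdist=200):
    """Pick the lowest-scoring (5' hit, 3' hit) combination.

    hits5 and hits3 are lists of (position, errors) tuples, one entry
    per mapped location of the 5' and 3' tag respectively.
    """
    return min(
        product(hits5, hits3),
        key=lambda pair: pair_score(pair[0][0], pair[0][1],
                                    pair[1][0], pair[1][1], petdist),
    )


if __name__ == "__main__":
    hits5 = [(1000, 0), (5000, 1)]
    hits3 = [(1210, 1), (5195, 0)]
    hit5, hit3 = best_pairing(hits5, hits3)
    # The 5' position of the winning pair is what would be kept.
    print(hit5, hit3, pair_score(hit5[0], hit5[1], hit3[0], hit3[1]))
```
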
-g/--gsize
~~~~~~~~~~

PLEASE assign this parameter to fit your needs!

It's the mappable genome size or effective genome size, which is
defined as the genome size which can be sequenced. Because of the
repetitive features on the chromosomes, the actual mappable genome
size will be smaller than the original size, typically about 70% to
90% of the genome size. The default, hs (2.7e9), is recommended for
the UCSC human hg18 assembly. Here are all precompiled parameters for
effective genome size::

  -g hs = -g 2.7e9
  -g mm = -g 1.87e9
  -g ce = -g 9e7
  -g dm = -g 1.2e8

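If you script around MACS, the shortcut table above can be mirrored as
a small lookup. This is a sketch for your own wrapper code, not MACS's
option parser; ``GENOME_SIZE_SHORTCUTS`` and ``resolve_gsize`` are
hypothetical names.

```python
# Effective genome sizes copied from the shortcut table above.
GENOME_SIZE_SHORTCUTS = {
    "hs": 2.7e9,
    "mm": 1.87e9,
    "ce": 9e7,
    "dm": 1.2e8,
}


def resolve_gsize(value):
    """Turn a -g argument (shortcut or number string) into a float."""
    if value in GENOME_SIZE_SHORTCUTS:
        return GENOME_SIZE_SHORTCUTS[value]
    return float(value)


if __name__ == "__main__":
    print(resolve_gsize("mm"))
    print(resolve_gsize("1.0e9"))
```
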
-s/--tsize
~~~~~~~~~~

The size of sequencing tags. If you DON'T specify it, MACS will try to
use the first 10 sequences from your input treatment file to determine
the tag size. Specifying it will override the automatically determined
tag size.

--bw
~~~~

The band width which is used to scan the genome for model
building. You can set this parameter to the sonication fragment size
expected from the wet-lab experiment. The previous side effect on the
peak detection process has been removed, so this parameter only
affects the model building.

-p/--pvalue
~~~~~~~~~~~

The pvalue cutoff. Default is 1e-5.

-m/--mfold
~~~~~~~~~~

This parameter is used to select the regions within the MFOLD range of
high-confidence enrichment ratio against background to build the
model. The regions must be lower than the upper limit, and higher than
the lower limit of fold enrichment. DEFAULT:10,30 means using all
regions not too low (>10) and not too high (<30) to build the
paired-peaks model. If MACS cannot find more than 100 regions to build
the model, it will use the --shiftsize parameter to continue the peak
detection.

Check the related *--off-auto* and *--shiftsize* options for details.

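The MFOLD selection can be pictured with the sketch below. It only
illustrates the "higher than the lower limit and lower than the upper
limit" rule; the function name ``select_model_regions`` and the
``(name, fold_enrichment)`` tuples are hypothetical, and how MACS
computes the fold enrichment itself is not shown.

```python
def select_model_regions(regions, mfold=(10, 30)):
    """Keep regions whose fold enrichment lies strictly inside mfold.

    regions is a list of (name, fold_enrichment) tuples.
    """
    lower, upper = mfold
    return [region for region in regions if lower < region[1] < upper]


if __name__ == "__main__":
    candidates = [("a", 5), ("b", 12), ("c", 30), ("d", 29.5)]
    # "a" is too low and "c" sits on the upper limit, so both are dropped.
    print(select_model_regions(candidates))
```
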
--nolambda
~~~~~~~~~~

With this flag on, MACS will use the background lambda as the local
lambda. This means MACS will not consider the local bias at peak
candidate regions.

--slocal, --llocal
~~~~~~~~~~~~~~~~~~

These two parameters control which two levels of regions will be
checked around the peak regions to calculate the maximum lambda as the
local lambda. By default, MACS considers 1000 bp for the small local
region (--slocal), and 10000 bp for the large local region (--llocal),
which captures the bias from a long-range effect like an open
chromatin domain. You can tweak these according to your
project. Remember that if the region is set too small, a sharp spike
in the input data may kill a significant peak.

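The relationship between --nolambda and the two local windows can be
summarised in a few lines. This is a simplified sketch: it assumes the
three lambda values have already been estimated and scaled to the same
window size, and that the background lambda takes part in the maximum,
which this README does not state explicitly. The function name
``local_lambda`` is hypothetical.

```python
def local_lambda(lambda_bg, lambda_slocal, lambda_llocal, nolambda=False):
    """Choose the lambda used to test a candidate peak region."""
    if nolambda:
        # --nolambda: ignore local bias and fall back to the background rate.
        return lambda_bg
    # Assumption: the most conservative (largest) estimate wins.
    return max(lambda_bg, lambda_slocal, lambda_llocal)


if __name__ == "__main__":
    # A spike in the control near the peak raises the small-window lambda.
    print(local_lambda(0.5, 4.0, 1.2))
    print(local_lambda(0.5, 4.0, 1.2, nolambda=True))
```
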
--on-auto
~~~~~~~~~

Whether to turn on the automatic paired-peak model process. If set,
when MACS fails to build the paired model, it will use the nomodel
settings, i.e. the '--shiftsize' parameter, to shift and extend each
tag. If not set, MACS will terminate if the paired-peak model fails.

--nomodel
~~~~~~~~~

When on, MACS will bypass building the shifting model.

--shiftsize
~~~~~~~~~~~

When '--nomodel' is set, MACS uses this parameter to shift tags to
their midpoint. For example, if the size of the binding region for
your transcription factor is 200 bp, and you want to bypass the model
building by MACS, this parameter can be set to 100. This option is
only valid when --nomodel is set or when MACS fails to build the
paired-peak model.

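The effect of a fixed shift can be sketched as below. This is an
illustration under stated assumptions, not MACS's code: each tag is
represented by the position of its 5' end and its strand, tags are
moved toward their 3' direction, and the function name ``shift_tag``
is hypothetical.

```python
def shift_tag(position, strand, shiftsize):
    """Shift a tag's 5' position by shiftsize toward its 3' direction."""
    if strand == "+":
        return position + shiftsize
    if strand == "-":
        return position - shiftsize
    raise ValueError("strand must be '+' or '-', got %r" % strand)


if __name__ == "__main__":
    # Two tags from opposite ends of a 200 bp fragment spanning 1000..1200
    # land on the same point when shifted by 100.
    print(shift_tag(1000, "+", 100), shift_tag(1200, "-", 100))
```

With a 200 bp fragment and a shift of 100, the forward and reverse
tags meet at the fragment centre, which is why half the expected
fragment size is the suggested value.
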
--keep-dup
~~~~~~~~~~

It controls the MACS behavior towards duplicate tags at the exact same
location, i.e. the same coordinate and the same strand. The default
'auto' option makes MACS calculate the maximum number of tags at the
exact same location based on a binomial distribution using 1e-5 as the
pvalue cutoff; the 'all' option keeps every tag. If an integer is
given, at most this number of tags will be kept at the same
location. Default: 1.

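The two behaviours can be sketched as follows. The binomial cutoff in
``max_duplicates`` is one plausible reading of the description above
(the smallest count whose cumulative probability reaches 1 - pvalue
when each tag falls on a given position with probability
1/genome_size); the exact calculation inside MACS may differ. Both
function names are hypothetical.

```python
import math


def max_duplicates(total_tags, genome_size, pvalue=1e-5):
    """Smallest k with P(X <= k) >= 1 - pvalue, X ~ Binomial(total_tags, 1/genome_size)."""
    q = 1.0 / genome_size
    pmf = math.exp(total_tags * math.log1p(-q))  # P(X = 0)
    cdf = pmf
    k = 0
    while cdf < 1.0 - pvalue and k < total_tags:
        # Recurrence: P(X = k+1) = P(X = k) * (n - k) / (k + 1) * q / (1 - q)
        pmf *= (total_tags - k) / (k + 1) * q / (1.0 - q)
        k += 1
        cdf += pmf
    return max(k, 1)


def cap_duplicates(tags, limit):
    """Keep at most `limit` tags per (chrom, position, strand), in input order."""
    seen = {}
    kept = []
    for tag in tags:
        seen[tag] = seen.get(tag, 0) + 1
        if seen[tag] <= limit:
            kept.append(tag)
    return kept


if __name__ == "__main__":
    tags = [("chr1", 100, "+")] * 3 + [("chr1", 100, "-")]
    print(cap_duplicates(tags, 1))
    print(max_duplicates(1000000, 2.7e9))
```
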
--to-large
~~~~~~~~~~

When not set, the larger dataset is scaled down towards the smaller
dataset; when set, the smaller dataset is scaled up towards the larger
dataset.

-w/--wig
~~~~~~~~

If this flag is on, MACS will store the fragment pileup in wiggle
format for every chromosome. The gzipped wiggle files will be stored
in subdirectories named NAME+'_MACS_wiggle/treat' for treatment data
and NAME+'_MACS_wiggle/control' for control data. The --single-profile
option can be combined to generate a single wig file for the whole
genome.

-B/--bdg
~~~~~~~~

If this flag is on, MACS will store the fragment pileup in bedGraph
format for every chromosome. The bedGraph file is in general much
smaller than the wiggle file. However, the process will take a little
longer than with the -w option, since theoretically 1 bp resolution
data will be saved. The bedGraph files will be gzipped and stored in
subdirectories named NAME+'_MACS_bedGraph/treat' for treatment and
NAME+'_MACS_bedGraph/control' for control data. The --single-profile
option can be combined to generate a single bedGraph file for the
whole genome.

-S/--single-profile (formerly --single-wig)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If this flag is on, MACS will store the fragment pileup in wiggle or
bedGraph format for the whole genome instead of for every
chromosome. The gzipped wiggle files will be stored in subdirectories
named EXPERIMENT_NAME+'_MACS_wiggle'+'_MACS_wiggle/treat/'
+EXPERIMENT_NAME+'treat_afterfiting_all.wig.gz' or
'treat_afterfiting_all.bdg.gz' for treatment data, and
EXPERIMENT_NAME+'_MACS_wiggle'+'_MACS_wiggle/control/'
+EXPERIMENT_NAME+'control_afterfiting_all.wig.gz' or
'control_afterfiting_all.bdg.gz' for control data.

--space=SPACE
~~~~~~~~~~~~~

By default, the resolution for saving wiggle files is 10 bp, i.e.,
MACS will save the raw tag count every 10 bp. You can change it along
with the '--wig' option.

Note this option doesn't work if -B/--bdg is on.

--call-subpeaks
~~~~~~~~~~~~~~~

If set, MACS will invoke Mali Salmon's PeakSplitter software through a
system call. If PeakSplitter can't be found, instructions will be
shown for downloading and installing the PeakSplitter package.
PeakSplitter can refine the MACS peaks and split wide peaks into
smaller subpeaks. For more information, please check the following URL:

http://www.ebi.ac.uk/bertone/software/PeakSplitter_Cpp_usage.txt

Note this option doesn't work if -B/--bdg is on.

--verbose
~~~~~~~~~

If you don't want to see any messages while MACS is running, set it
to 0. CRITICAL messages will never be hidden, however. If you want to
see rich information, like how many peaks are called for every
chromosome, set it to 3 or larger.

--diag
~~~~~~

A diagnosis report can be generated through this option. This report
can help you assess the sequencing saturation. This function is only
in beta stage.

--fe-min, --fe-max & --fe-step
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For diagnostics, FEMIN and FEMAX are the minimum and maximum fold
enrichment to consider, and FESTEP is the interval of fold
enrichment. For example, "--fe-min 0 --fe-max 40 --fe-step 10" will
let MACS choose the following fold enrichment ranges to consider:
[0,10), [10,20), [20,30) and [30,40).

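The ranges in the example can be reproduced with the short sketch
below. It assumes integer arguments and half-open ranges as written
above; the function name ``fold_enrichment_ranges`` is hypothetical
and this is not the code MACS uses.

```python
def fold_enrichment_ranges(fe_min, fe_max, fe_step):
    """Return half-open (low, high) ranges covering [fe_min, fe_max)."""
    if fe_step <= 0:
        raise ValueError("fe_step must be positive")
    return [
        (low, min(low + fe_step, fe_max))
        for low in range(fe_min, fe_max, fe_step)
    ]


if __name__ == "__main__":
    for low, high in fold_enrichment_ranges(0, 40, 10):
        print("[%d,%d)" % (low, high))
```
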
Output files
------------

1. NAME_peaks.xls is a tabular file which contains information about
   called peaks. You can open it in Excel and sort/filter using Excel
   functions. The information includes: chromosome name, start
   position of peak, end position of peak, length of peak region, peak
   summit position relative to the start position of the peak region,
   number of tags in peak region, -10*log10(pvalue) for the peak
   region (e.g. if pvalue = 1e-10, then this value should be 100),
   fold enrichment for this region against a random Poisson
   distribution with local lambda, and FDR in percentage. Coordinates
   in the XLS file are 1-based, which is different from BED format.

2. NAME_peaks.bed is a BED format file which contains the peak
   locations. You can load it into the UCSC genome browser or
   Affymetrix IGB software. The 5th column in this file is the
   -10*log10(pvalue) of the peak region.

3. NAME_summits.bed is in BED format and contains the peak summit
   locations for every peak. The 5th column in this file is the
   summit height of the fragment pileup. If you want to find the
   motifs at the binding sites, this file is recommended.

4. NAME_negative_peaks.xls is a tabular file which contains
   information about negative peaks. Negative peaks are called by
   swapping the ChIP-seq and control channels.

5. NAME_model.r is an R script which you can use to produce a PDF
   image of the model based on your data. Load it into R by::

     R --vanilla < NAME_model.r

   Then a pdf file NAME_model.pdf will be generated in your current
   directory. Note, R is required to draw this figure.

6. NAME_treat/control_afterfiting.wig.gz files in the NAME_MACS_wiggle
   directory are wiggle format files which can be imported into the
   UCSC genome browser/GMOD/Affy IGB. The .bdg.gz files are in
   bedGraph format, which can also be imported into the UCSC genome
   browser or be converted into even smaller bigWig files.

7. NAME_diag.xls is the diagnosis report. The first column is for
   various fold_enrichment ranges; the second column is the number of
   peaks for that fold-enrichment range; the columns from the 3rd
   onwards are the percentage of peaks covered after sampling 90%,
   80%, 70% ... and 20% of the total tags.

8. NAME_peaks.subpeaks.bed is a text file which IS NOT in BED
   format. This file is generated by PeakSplitter
   (<http://www.ebi.ac.uk/bertone/software/PeakSplitter_Cpp_usage.txt>)
   when the --call-subpeaks option is set.

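Two conversions come up often when post-processing these files: the
-10*log10(pvalue) score and the shift between 1-based XLS coordinates
and BED coordinates. The sketch below illustrates both. It assumes the
XLS start and end are 1-based and inclusive, which the description
above does not spell out; the function names are hypothetical.

```python
import math


def pvalue_to_score(pvalue):
    """Score as reported in the peak files: -10 * log10(pvalue)."""
    return -10.0 * math.log10(pvalue)


def score_to_pvalue(score):
    """Inverse of pvalue_to_score."""
    return 10.0 ** (-score / 10.0)


def xls_to_bed_interval(start, end):
    """Convert a 1-based inclusive interval to 0-based half-open (BED)."""
    return start - 1, end


if __name__ == "__main__":
    print(pvalue_to_score(1e-10))
    print(xls_to_bed_interval(101, 200))
```
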
Other useful links
==================

Cistrome web server for ChIP-chip/seq analysis: http://cistrome.org/ap/

bedTools, a very useful toolkit for genome annotation files: http://code.google.com/p/bedtools/

UCSC toolkits: http://hgdownload.cse.ucsc.edu/admin/exe/