Input Files format

Peng Jia edited this page Dec 19, 2020 · 1 revision

1. Reference file

The scan module requires a reference file. The reference version and contig names must match those used in your bam files, and the reference must be saved in uncompressed fasta (fa) format.
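
The requirement above can be checked before running the scan module. The following is a minimal sketch, not part of MSIsensor-pro: the function name `fasta_contig_names` is hypothetical, and it assumes that a compressed reference would be gzip-compressed (detected by the gzip magic bytes). It returns the contig names so that they can be compared by hand with the contig names in the bam header.

```python
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip (and bgzip) file


def fasta_contig_names(path):
    """Return the contig names of an uncompressed FASTA file.

    Hypothetical helper: raises ValueError if the file looks gzip-compressed.
    """
    path = Path(path)
    with path.open("rb") as handle:
        if handle.read(2) == GZIP_MAGIC:
            raise ValueError(f"{path} looks gzip-compressed; decompress it first")

    names = []
    with path.open("r") as handle:
        for line in handle:
            if line.startswith(">"):
                # The contig name is the first token after '>'.
                tokens = line[1:].split()
                if tokens:
                    names.append(tokens[0])
    return names
```

For example, `fasta_contig_names("/path/to/reference.fa")` (a placeholder path) would list names such as `chr1`, which should be spelled the same way in the bam files.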

2. Microsatellite file

The microsatellite file is needed by all modules of MSIsensor-pro. It contains the chromosome, location, repeat unit, repeat unit length, and other information for each microsatellite.

Example:

chromosome location repeat_unit_length repeat_unit_binary repeat_times left_flank_binary right_flank_binary repeat_unit_bases left_flank_bases right_flank_bases threshold supportSamples
chr1 3780974 1 0 11 221 321 A ATCTC CCAAC 0.080981 30
chr1 3784993 1 0 13 885 758 A TCTCC GTTCG 0.007576 18
chr1 3836468 1 3 24 342 80 T CCCCG ACCAA 0.061750 19
chr1 3872414 1 0 13 834 545 A TCAAG GAGAC 0.02842 28
chr1 4712522 3 20 7 662 421 CCA GGCCG CGGCC 0.024391 3

Note:

  1. Columns named *_binary hold the corresponding DNA bases encoded with two bits per base (A=00, C=01, G=10, T=11) and written as a decimal integer. For example, the repeat unit CCA becomes 010100, which is 20.
  2. threshold is the baseline instability value for slippage at the site. It is calculated by the baseline module and applied by the pro module.
  3. supportSamples is the number of samples with sufficient read coverage at the site.
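
The *_binary encoding can be reproduced in a few lines. This is an illustrative sketch based only on the mapping stated in the note and on the example rows; `encode_bases` and `check_microsatellite_line` are hypothetical names, not functions of MSIsensor-pro, and the column positions are taken from the header line of the example.

```python
# Two-bit code per base, as given in the note: A=00, C=01, G=10, T=11.
BASE_CODES = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def encode_bases(bases):
    """Encode a DNA string as an integer, two bits per base, first base most significant."""
    value = 0
    for base in bases.upper():
        value = (value << 2) | BASE_CODES[base]
    return value


def check_microsatellite_line(line):
    """Return True if the *_binary columns of one data line match the *_bases columns.

    Expects a data line (not the header) with the column order of the example.
    """
    fields = line.split()
    unit_length = int(fields[2])
    unit_binary, left_binary, right_binary = int(fields[3]), int(fields[5]), int(fields[6])
    unit, left, right = fields[7], fields[8], fields[9]
    return (
        len(unit) == unit_length
        and encode_bases(unit) == unit_binary
        and encode_bases(left) == left_binary
        and encode_bases(right) == right_binary
    )
```

Applied to the first example row, `encode_bases("ATCTC")` gives 221 and `encode_bases("CCAAC")` gives 321, matching the left_flank_binary and right_flank_binary columns.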

3. Bam file

The bam file must be sorted, and its index file is required.
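
A quick way to catch a missing index before a run is to look for it next to the bam file. The sketch below is an assumption-laden illustration: `find_bam_index` is a hypothetical helper, and the candidate names (`sample.bam.bai`, `sample.bai`, `sample.bam.csi`) are the common index naming conventions, not a list confirmed by this page. It does not check whether the bam file is sorted.

```python
from pathlib import Path


def find_bam_index(bam_path):
    """Return the path of an index file found next to the bam file, or None.

    Hypothetical helper; the candidate names are common conventions and are
    an assumption here, not something stated by MSIsensor-pro.
    """
    bam = Path(bam_path)
    candidates = [
        Path(str(bam) + ".bai"),   # sample.bam.bai
        bam.with_suffix(".bai"),   # sample.bai
        Path(str(bam) + ".csi"),   # sample.bam.csi
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

If the function returns `None`, the bam file most likely still needs to be indexed.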

4. Configuration file for the baseline module

The baseline module requires a configuration file. The first column is the sample name and the second is the absolute path to that sample's bam file.

Example:

case1   /path/to/normal/case1_sorted.bam
case2   /path/to/normal/case2_sorted.bam
case3   /path/to/normal/case3_sorted.bam
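
The two-column layout can be validated with a short script before it is passed to the baseline module. This is a minimal sketch under stated assumptions: `read_baseline_config` is a hypothetical name, columns are assumed to be separated by whitespace (so paths containing spaces are not supported), paths are assumed to be POSIX-style, and rejecting duplicate sample names is a choice made here rather than a documented rule.

```python
from pathlib import PurePosixPath


def read_baseline_config(path):
    """Parse a baseline configuration file into {sample_name: bam_path}.

    Hypothetical helper: raises ValueError on malformed lines, relative
    paths, or repeated sample names.
    """
    samples = {}
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # ignore blank lines
            fields = line.split()
            if len(fields) != 2:
                raise ValueError(f"line {number}: expected 2 columns, found {len(fields)}")
            name, bam = fields
            if not PurePosixPath(bam).is_absolute():
                raise ValueError(f"line {number}: bam path is not absolute: {bam}")
            if name in samples:
                raise ValueError(f"line {number}: duplicate sample name: {name}")
            samples[name] = bam
    return samples
```

For the example above, the result would map `case1` to `/path/to/normal/case1_sorted.bam`, and likewise for `case2` and `case3`.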