RNAseq pipeline centered on Salmon

ikra v1.2.1 -RNAseq pipeline centered on Salmon-

Ikra is an RNAseq pipeline centered on salmon. It automatically creates a gene expression table (gene × sample) from the experiment matrix. The output can be used as input for idep.

The Japanese documentation is here.

Usage

Usage: ikra.sh experiment_table.csv species \
        [--test, --fastq, --help, --without-docker, --udocker, --protein-coding] \
        [--threads [VALUE]][--output [VALUE]]\
        [--suffix_PE_1 [VALUE]][--suffix_PE_2 [VALUE]]
  args
    1.experiment matrix(csv)
    2.reference(human or mouse)

Options:
  --test  test mode (MAX_SPOT_ID=100000). (default : False)
  --fastq use fastq files instead of SRRid. The extension must be foo.fastq.gz (default : False)
  -u, --udocker
  -w, --without-docker
  -pc, --protein-coding use protein coding transcripts instead of comprehensive transcripts.
  -t, --threads
  -o, --output  output file. (default : output.tsv)
  -s1, --suffix_PE_1    suffix for PE fastq files. (default : _1.fastq.gz)
  -s2, --suffix_PE_2    suffix for PE fastq files. (default : _2.fastq.gz)
  -h, --help    Show usage.
  -v, --version Show version.
  • The test option limits the number of reads to 100,000 in each sample.
  • udocker mode is for server environments where you only have user privileges. For more information, see https://github.com/indigo-dc/udocker.
  • without-docker mode requires all tools to be installed locally. Not recommended.
  • protein-coding mode restricts genes to protein coding genes only.
  • threads sets the number of threads to use.
  • output is output.tsv by default.
  • The experiment matrix must be comma-separated (csv format).

SRR mode

| name | SRR | Layout | condition1 | ... |
|---|---|---|---|---|
| Treg_LN_1 | SRR5385247 | SE | Treg | ... |
| Treg_LN_2 | SRR5385248 | SE | Treg | ... |
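
As a minimal sketch, the SRR mode table above could be written as a csv file like this. The file name experiment_table.csv follows the usage line; the awk check is only an illustration and not part of ikra, and it assumes that Layout is either SE or PE.

```shell
# Write a two-sample SRR-mode experiment table (values taken from the table above).
# Only one condition column is used here; more columns may follow it.
cat > experiment_table.csv <<'EOF'
name,SRR,Layout,condition1
Treg_LN_1,SRR5385247,SE,Treg
Treg_LN_2,SRR5385248,SE,Treg
EOF

# Illustrative sanity check (not part of ikra): every row needs at least the
# three required columns, and Layout is assumed to be SE or PE.
awk -F',' '
  NF < 3                             { printf "line %d: fewer than 3 columns\n", NR; bad = 1 }
  NR > 1 && $3 != "SE" && $3 != "PE" { printf "line %d: unexpected Layout %s\n", NR, $3; bad = 1 }
  END                                { exit bad + 0 }
' experiment_table.csv && echo "experiment_table.csv looks OK"
```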

fastq mode

| name | fastq(PREFIX) | Layout | condition1 | ... |
|---|---|---|---|---|
| Treg_LN_1 | hoge/SRR5385247 | SE | Treg | ... |
| Treg_LN_2 | hoge/SRR5385248 | SE | Treg | ... |

  • Name each sample by joining its condition and replicate with underscores. See idep's naming convention for details.
  • The first three columns are required.
  • If you want to use your own fastq files, add the --fastq option. Ikra supports only .fq, .fq.gz, .fastq and .fastq.gz.
  • In the fastq column, give the path without the .fastq.gz suffix (or without _1.fastq.gz and _2.fastq.gz for PE). For example, hoge/SRR5385247.fastq.gz is written as hoge/SRR5385247.
  • If the suffixes are not _1.fastq.gz and _2.fastq.gz, add the -s1 and -s2 options.
  • Docker cannot reach paths above the execution directory, such as ../fq/**.fastq.gz. You can work around this by placing a symbolic link in the execution directory. See the bonohu blog.
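
The PREFIX rule can be sketched with plain shell parameter expansion. The helper fastq_prefix below is hypothetical and not part of ikra; its default suffixes mirror the documented defaults of -s1 and -s2, and the _R1.fq.gz / _R2.fq.gz names in the last call are made up for illustration.

```shell
# Strip the fastq suffix from a path to get the PREFIX for the experiment table.
# fastq_prefix is a hypothetical helper, not part of ikra.
fastq_prefix() {
  p=$1
  s1=${2:-_1.fastq.gz}   # same default as --suffix_PE_1
  s2=${3:-_2.fastq.gz}   # same default as --suffix_PE_2
  case $p in
    *"$s1")     printf '%s\n' "${p%"$s1"}" ;;
    *"$s2")     printf '%s\n' "${p%"$s2"}" ;;
    *.fastq.gz) printf '%s\n' "${p%.fastq.gz}" ;;
    *)          printf '%s\n' "$p" ;;
  esac
}

fastq_prefix hoge/SRR5385247.fastq.gz                  # hoge/SRR5385247
fastq_prefix hoge/SRR5385247_1.fastq.gz                # hoge/SRR5385247
fastq_prefix fq/sample_R2.fq.gz _R1.fq.gz _R2.fq.gz    # fq/sample
```

With non-default suffixes such as those in the last call, the same suffixes would be passed to ikra.sh through -s1 and -s2.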

Output

  • output.tsv (scaledTPM)
  • multiqc_report.html: includes the fastQC reports and salmon's mapping rate (mapping rate for transcripts)

output sample

| | Treg_LN_1 | Treg_LN_2 |
|---|---|---|
| 0610005C13Rik | 0 | 0 |
| 0610006L08Rik | 0 | 1 |
| 0610009B22Rik | 4 | 10 |
| ... | | |
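
A quick way to inspect a table of this shape is to sum each sample column. The sketch below rebuilds the three sample rows in a throwaway file (demo_output.tsv is an assumed name) and assumes the file is tab-separated, with a header that either omits the gene column or leaves it empty. sample_totals is a hypothetical helper, not part of ikra.

```shell
# Recreate the sample above as a tab-separated file (demo_output.tsv is a throwaway name).
printf 'Treg_LN_1\tTreg_LN_2\n'  > demo_output.tsv
printf '0610005C13Rik\t0\t0\n'  >> demo_output.tsv
printf '0610006L08Rik\t0\t1\n'  >> demo_output.tsv
printf '0610009B22Rik\t4\t10\n' >> demo_output.tsv

# Sum each sample column. The header may have no label over the gene column
# (one field fewer than the data rows) or an empty first field; handle both.
sample_totals() {
  awk -F'\t' '
    NR == 1 { off = ($1 == "") ? 1 : 0; n = NF - off
              for (i = 1; i <= n; i++) name[i] = $(i + off); next }
            { for (i = 1; i <= n; i++) total[i] += $(i + 1) }
    END     { for (i = 1; i <= n; i++) printf "%s\t%d\n", name[i], total[i] }
  ' "$1"
}

sample_totals demo_output.tsv
# Treg_LN_1	4
# Treg_LN_2	11
```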

Specification

Major bugs that have been fixed

tximport_R.R 2019/04/30

A serious bug in tximport_R.R was reported and fixed. In older versions, Salmon's output and the multiqc reports were correct, but output.tsv was sometimes corrupted. If you are using an old version (<1.1.1), please update ikra to the latest version and re-run it. We apologize for the inconvenience.

fasterq-dump error 2019/09/21

A bug was reported in which processing stops with the following sra-tools error: docker: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "exec: \"fasterq-dump\": executable file not found in $PATH": unknown. This is already fixed in the latest version, so if you encounter the same error, please update.

Install

All you need to do is git clone ikra and install docker or udocker (v1.1.3). There is no need to install a lot of software! If you don't want to use docker (or udocker), you must install all the software yourself and use the --without-docker option.

$ git clone https://github.com/yyoshiaki/ikra.git

Upgrade

$ git pull origin master

Confirm the version

 $ bash ikra.sh --version
 ...
 ikra v1.2.1 -RNAseq pipeline centered on Salmon-
 ...
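
Since the tximport_R.R fix matters for versions older than 1.1.1, the version banner can be checked in a script. This is a rough sketch: ikra_version and version_lt are hypothetical helpers, not part of ikra, and the banner line is fed in by hand here so that the snippet runs without a checkout.

```shell
# Extract "1.2.1" from a banner line such as
# "ikra v1.2.1 -RNAseq pipeline centered on Salmon-".
ikra_version() {
  sed -n 's/^ *ikra v\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Succeeds (exit 0) when version $1 is older than version $2.
# Compares dot-separated numeric fields; missing fields count as 0.
version_lt() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      if (x[i] + 0 < y[i] + 0) exit 0
      if (x[i] + 0 > y[i] + 0) exit 1
    }
    exit 1
  }'
}

# In a real checkout: v=$(bash ikra.sh --version | ikra_version)
v=$(printf ' ikra v1.2.1 -RNAseq pipeline centered on Salmon-\n' | ikra_version)
if version_lt "$v" 1.1.1; then
  echo "ikra $v predates the tximport_R.R fix; update and re-run"
else
  echo "ikra $v includes the tximport_R.R fix"
fi
```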

Test

Illumina trim_galore ver.

SE

SRR mode

$ cd test/Illumina_SE && bash ../../ikra.sh Illumina_SE_SRR.csv mouse --test -t 10

fastq mode

Run this after the SRR mode test, because you do not have the fastq files until then.

$ cd test/Illumina_SE && bash ../../ikra.sh Illumina_SE_fastq.csv mouse --fastq -t 10

PE

SRR mode

$ cd test/Illumina_PE && bash ../../ikra.sh Illumina_PE_SRR.csv mouse --test -t 10

fastq mode

Run this after the SRR mode test, because you do not have the fastq files until then.

$ cd test/Illumina_PE && bash ../../ikra.sh Illumina_PE_fastq.csv mouse --fastq -t 10

For Mac Users

Dr. Ota (DBCLS) solved the problem of salmon not working on Mac. The cause is that Docker is allocated only 2 GB of memory by default on Mac. To fix it, allocate sufficient memory (8 GB or more) to Docker, then apply the setting and restart Docker.
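
To check the allocation from a terminal, something like the following may help. It is a sketch under assumptions: it reads the MemTotal field from docker info, and it uses 8 × 10^9 bytes as the threshold because Docker usually reports slightly less than the configured amount. enough_docker_memory is a hypothetical helper, not part of ikra.

```shell
# Succeeds when a byte count is at least 8 * 10^9 bytes (roughly 8 GB).
enough_docker_memory() {
  awk -v bytes="$1" 'BEGIN { exit !(bytes + 0 >= 8 * 1000 * 1000 * 1000) }'
}

# Ask the Docker daemon how much memory it sees; skip quietly if docker is not installed.
if command -v docker >/dev/null 2>&1; then
  mem=$(docker info --format '{{.MemTotal}}' 2>/dev/null || true)
  if [ -n "$mem" ] && enough_docker_memory "$mem"; then
    echo "Docker memory looks sufficient: $mem bytes"
  else
    echo "Docker reports '${mem:-unknown}' bytes; raise the memory limit in Docker's settings"
  fi
fi
```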


ikra pipeline

Tips

You can quickly find SRR data at http://sra.dbcls.jp/

Issue

Please refer to the issues.

Releases

Please refer to Releases

  • add support for udocker
  • add setting of species
  • gtf and transcript file from GENCODE
  • salmon
  • trimmomatic(legacy)
  • trim_galore!
  • tximport
  • fastxtools(for Ion)
  • judging fastq or SRR(manual)
  • introduce "salmon gcbias correction"
  • salmon validateMappings
  • pigz(multithread version of gzip)
  • fasterq-dump
  • cwl development is in progress
  • rename to "ikra"
  • protein coding option

Legacy

The workflow that uses trimmomatic has been moved to ./legacy

Reference

Development of cwl ver.

2019/03/22 https://youtu.be/weJrq5QNt1M We started developing a cwl version when Mr. Michael visited Japan. So far, trim_galore and salmon have been converted to cwl for PE.

cd test/cwl_PE && bash test.sh

Source and reference: cwl_tools

Citation

Hiraoka, Y., Yamada, K., Kawasaki, Y., Hirose, H., Matsumoto, K., Ishikawa, K., & Yasumizu, Y. (2019). ikra : RNAseq pipeline centered on Salmon. https://doi.org/10.5281/ZENODO.3352573
