25 changes: 12 additions & 13 deletions README.md
@@ -14,7 +14,7 @@ sashimi.py is a tool for visualizing various next-generation sequencing (NGS) da
3. Visualize coverage by heatmap, including HiC diagram
4. Visualize protein domains based on the given gene ID
5. Demultiplex single-cell RNA/ATAC-seq data into cell populations using cell barcodes
6. Support visualizing individual full-length reads in IGV-like style
6. Support visualizing individual full-length reads in read-by-read style
7. Support visualizing circRNA sequencing data

## Input
@@ -26,7 +26,7 @@ sashimi.py supports almost NGS data format, including
- bigBed
- bigWig
- Depth file generated by `samtools depth`
- naive HiC format
- naive Hi-C format


## Output
@@ -37,17 +37,16 @@ and each track on output corresponds these datasets from config file.
## Usage

sashimi.py is written in Python, and users can install it in any of the following ways:
1. install from pipy

1. install from PyPI
```bash
pip install sashimi.py

# or install from source
python setup.py install

sashimipy --help
```
2. using docker image
2. install from bioconda
```bash
conda install -c bioconda sashimi-py
```
3. using docker image
```bash
docker pull ygidtu/sashimi
docker run --rm ygidtu/sashimi --help
@@ -61,7 +60,7 @@ The sashimi.py is written in Python, and user could install it in a variety of w
docker run --rm ygidtu/sashimi --help
```

3. install from source code
4. install from source code

```bash
git clone https://github.com/ygidtu/sashimi.py sashimi
@@ -73,7 +72,7 @@ The sashimi.py is written in Python, and user could install it in a variety of w
python main.py --help
```

4. running from a local webserver
5. running from a local webserver

```bash
git clone https://github.com/ygidtu/sashimi.py sashimi
@@ -89,7 +88,7 @@ The sashimi.py is written in Python, and user could install it in a variety of w
python server.py --help
```

5. for `pipenv` users
6. for `pipenv` users

```bash
git clone https://github.com/ygidtu/sashimi.py
88 changes: 55 additions & 33 deletions docs/command.md
@@ -242,7 +242,7 @@ Options:

### Common options

1. `--color-factor`: the index of column to set colors
1.`--color-factor`: the index of column to set colors

- basic usage: the input file list as follows,

@@ -265,8 +265,8 @@ Then the `--color-factor 2` means sashimi assign red color to LUAD and "#000000"

### Output options

1. `-o, --output`: the path to output file, the common image format such as pdf, png, jpg and svg are supported.
2. `--backend`: the backend is used to switch matplotlib plotting backend,
1.`-o, --output`: the path to output file, the common image format such as pdf, png, jpg and svg are supported.
2.`--backend`: the backend is used to switch matplotlib plotting backend,

**known issues:**

@@ -283,7 +283,7 @@ The recommended combination of backend and image formats please check [matplotli

### Reference plot

1. `--domain`: fetch domain information from UniProt and Ensembl, then map amino acid coordinates to genomic coordinates.
1.`--domain`: fetch domain information from UniProt and Ensembl, then map amino acid coordinates to genomic coordinates.

For each transcript, sashimi first gets the UniProt ID from the [UniProt website](https://rest.uniprot.org/uniprotkb/search?&query=ENST00000380276&format=xml) and checks whether the protein length is one third of the CDS length. If so, it then fetches the UniProt feature information from [EBI](https://www.ebi.ac.uk/proteins/api/features/U2AF35a).

@@ -292,15 +292,15 @@ The sashimi will present these domains from ['DOMAIN_AND_SITES', 'MOLECULE_PROCE

![](imgs/cmd/domain.png)

2. `--local-domain`: load domain information from a folder that contains bigBed files downloaded from [UCSC](https://hgdownload.soe.ucsc.edu/gbdb/hg38/uniprot/)
2.`--local-domain`: load domain information from a folder that contains bigBed files downloaded from [UCSC](https://hgdownload.soe.ucsc.edu/gbdb/hg38/uniprot/)

To support users with limited network access, Sashimi also provides a local mode for domain visualization. First, the user must download the corresponding reference from UCSC and collect all bigBed files into a folder, which can then be passed to Sashimi with `--local-domain`.

However, because the bigBed files from UCSC don't provide a transcript or UniProt ID, Sashimi cannot map the protein information to the corresponding transcript ID.

![](imgs/cmd/local_domain.png)

3. `--interval`: add additional feature track into reference.
3.`--interval`: add additional feature track into reference.

In addition to fetching genomic features from a GTF or GFF file, Sashimi also provides a flexible way to load other features into the reference track.
Users can record custom annotation information in a config file, like this
@@ -330,7 +330,7 @@ example/bws/2.bw bw bw green
example/bams/sc.bam bam sc
```

1. `--customized-junction`
1.`--customized-junction`

This parameter is used to add user defined junctions

@@ -344,7 +344,7 @@ chr1:1000-20000 100 200
- the columns correspond to the input files in the file list.
- the table is filled with junction counts.

2. `--show-site` and `--show-strand`
2.`--show-site` and `--show-strand`

These two parameters are used to show the density of read start sites on the forward and reverse strands separately.

@@ -366,24 +366,24 @@ python main.py \

#### Single cell bam related parameters

1. `--barcode`
1.`--barcode`

Provide a manually curated barcode list to separate bam files by cell types or other groups.

The barcode list looks as follows:

```bash
#bam barcode cell_type(optional) cell_type(optional)
sc AAACCTGCACCTCGTT-1 AT2 #A6DCC2
sc AAAGATGTCCGAATGT-1 AT2 #A6DCC2
sc AAAGCAATCGTACGGC-1 AT2 #A6DCC2
```

Provide a manually curated barcode list to separate bam files by cell types or other groups.

The barcode list looks as follows:

```bash
#bam barcode cell_type(optional) cell_type(optional)
sc AAACCTGCACCTCGTT-1 AT2 #A6DCC2
sc AAAGATGTCCGAATGT-1 AT2 #A6DCC2
sc AAAGCAATCGTACGGC-1 AT2 #A6DCC2
```

2. `--barcode-tag` and `--umi-tag`
2.`--barcode-tag` and `--umi-tag`

3. The tags used to extract the barcode and UMI from each read record; the 10x Genomics BAM format is taken as the default.
3.The tags used to extract the barcode and UMI from each read record; the 10x Genomics BAM format is taken as the default.

4. `--group-by-cell`
4.`--group-by-cell`

Group by cell types in density/line plot.
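
As a rough illustration of how the barcode list shown for `--barcode` could be consumed, the following Python sketch groups barcodes by BAM label and cell type. It is not sashimi.py's own parser: the function name `parse_barcode_list` is hypothetical, whitespace-separated columns are assumed, and the fourth column is assumed to hold an optional color.

```python
from collections import defaultdict


def parse_barcode_list(lines):
    """Group barcodes by (bam label, cell type) and collect optional colors.

    Hypothetical helper for illustration only, not part of sashimi.py.
    """
    groups = defaultdict(list)
    colors = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and the header line that starts with '#'.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        bam, barcode = fields[0], fields[1]
        # Assumption: without a cell type column, fall back to the bam label.
        cell_type = fields[2] if len(fields) > 2 else bam
        # Assumption: the fourth column, when present, is a color.
        if len(fields) > 3:
            colors[cell_type] = fields[3]
        groups[(bam, cell_type)].append(barcode)
    return dict(groups), colors


example = """\
#bam barcode cell_type(optional) cell_type(optional)
sc AAACCTGCACCTCGTT-1 AT2 #A6DCC2
sc AAAGATGTCCGAATGT-1 AT2 #A6DCC2
sc AAAGCAATCGTACGGC-1 AT2 #A6DCC2
"""

groups, colors = parse_barcode_list(example.splitlines())
for (bam, cell_type), barcodes in groups.items():
    print(bam, cell_type, len(barcodes), colors.get(cell_type))
```

Grouping by the `(bam, cell_type)` pair keeps cells from different BAM files apart even when they share a cell type name.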

@@ -411,7 +411,7 @@ The line plot is simply another format of density plots.
The input file list is the same as for density plots.


1. `--hide-legend`, `--legend-position` and `--legend-ncol`
1.`--hide-legend`, `--legend-position` and `--legend-ncol`

These three parameters are used to hide the legend, set its position, and set its number of columns, respectively.

@@ -502,9 +502,9 @@ example/bws/0.bw bw bw YlOrBr
### Igv plot


1. The Sashimi.igv module supports several file formats as input.
1.The Sashimi.igv module supports several file formats as input.

An IGV-like plot provides a landscape of aligned reads in a straightforward and convenient way.
A read-by-read plot provides a landscape of aligned reads in a straightforward and convenient way.

Users can pass BED and BAM files to Sashimi, with the input config file list as follows

@@ -531,7 +531,7 @@ python main.py \

![](imgs/cmd/igv_plot.1.png)

2. The Sashimi.igv module loads and visualizes features from BAM tags.
2.The Sashimi.igv module loads and visualizes features from BAM tags.

Here, Sashimi.igv can load the m6A modification tag (`ma:i`) and the poly(A) length tag (`pa:f`) from a BAM file and display them on each read.

@@ -563,7 +563,7 @@ here is the command line,
In this picture, the red track and blue dots represent the poly(A) length and m6A modifications, respectively.
![](imgs/cmd/igv_plot.2.png)

3. The Sashimi.igv module also allows sorting these reads by a specific alternative exon
3.The Sashimi.igv module also allows sorting these reads by a specific alternative exon

User could modify the config file as follows,

@@ -619,7 +619,7 @@ for each hic track, a bigger `depth` means a higher y-axis.

Because `Li_et_al_2015.h5` doesn't contain chromosome 1, users can download a new toy dataset and add it to the example plot.

1. download hic file and convert into h5 format
1.download hic file and convert into h5 format

```bash
wget https://encode-public.s3.amazonaws.com/2016/12/01/a241cba5-df2e-45fb-9a8f-5af5587fb02a/ENCFF121YPY.hic
@@ -639,13 +639,13 @@ cooler cload pairix -p 16 hg38.chrom.sizes:1000 ENCFF931NQV.pairs.gz ENCFF931NQV
hicConvertFormat -m ENCFF121YPY_1000.cool --inputFormat cool --outputFormat h5 -o ENCFF121YPY.h5
```

2. prepare the config file
2.prepare the config file

```bash
# filepath file_category label color transform depth
example/ENCFF718AWL.h5 hic ENCFF718AWL RdYlBu_r log2 30000
```
3. run Sashimi
3.run Sashimi

```bash
python main.py \
@@ -661,6 +661,28 @@ python main.py \
```
here is the [results](https://github.com/ygidtu/sashimi/blob/dev/example/hic.example.pdf).

## circRNA plot

The linear and circRNA raw data were downloaded from [PRJNA541935](https://www.ncbi.nlm.nih.gov/bioproject/PRJNA541935).


The command for generating a circRNA coverage plot that highlights the back-splice junction:

```bash
python sashimi.py/main.py \
-e chr1:925421-944308:+ \
--density example/circRNA.tsv \
--stroke 937113-937713@blue \
-o circRNA.pdf \
--dpi 300 \
--width 10 \
--height 1 \
-t 10 \
-r Homo_sapiens.GRCh38.101.gtf \
--link 925921-943808
```
![](imgs/cmd/circRNA.png)
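
To make the coordinate arguments in the command above easier to read, here is a minimal Python sketch that parses the three string shapes it uses: `chrom:start-end:strand` for `-e`, `start-end@color` for `--stroke`, and `start-end` for `--link`. The names `Region`, `Span`, `parse_region`, `parse_span`, and `inside` are hypothetical, and the patterns cover only the forms shown in this example; sashimi.py's own parser may accept more.

```python
import re
from typing import NamedTuple, Optional


class Region(NamedTuple):
    chrom: str
    start: int
    end: int
    strand: str


class Span(NamedTuple):
    start: int
    end: int
    color: Optional[str]


def parse_region(text: str) -> Region:
    """Parse 'chr1:925421-944308:+' (the form passed to -e above)."""
    m = re.fullmatch(r"([^:]+):(\d+)-(\d+):([+-])", text)
    if m is None:
        raise ValueError(f"unrecognized region: {text!r}")
    chrom, start, end, strand = m.groups()
    return Region(chrom, int(start), int(end), strand)


def parse_span(text: str) -> Span:
    """Parse 'start-end' with an optional '@color' suffix."""
    m = re.fullmatch(r"(\d+)-(\d+)(?:@(.+))?", text)
    if m is None:
        raise ValueError(f"unrecognized span: {text!r}")
    return Span(int(m.group(1)), int(m.group(2)), m.group(3))


def inside(region: Region, span: Span) -> bool:
    """True if the span lies entirely within the plotted region."""
    return region.start <= span.start <= span.end <= region.end


region = parse_region("chr1:925421-944308:+")
stroke = parse_span("937113-937713@blue")
link = parse_span("925921-943808")
print(region, stroke, link)
print(inside(region, stroke), inside(region, link))
```

A check like `inside` is a cheap way to catch a highlight or link that falls outside the plotted window before rendering.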


## Motif plot

@@ -678,15 +700,15 @@ python main.py \
--motif-region 1270756-1270760
```
The motif weight matrix should be in a customized bedGraph format, as follows:

```bash
# chromosome start end A_weight T_weight C_weight G_weight
chr1 100 101 0.1 0.2 -0.3 -0.4
```

Then compress the file with bgzip and index it with tabix.

here is the [results](imgs/cmd/motif.png).

here is the [result](imgs/cmd/motif.png).
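
For readers preparing their own motif file, the following Python sketch parses one row of the customized bedGraph layout described above into per-base weights. The function name `parse_motif_line` is hypothetical and whitespace-separated columns are assumed; the column order A, T, C, G follows the header comment in the example.

```python
def parse_motif_line(line):
    """Parse 'chrom start end A_weight T_weight C_weight G_weight'.

    Hypothetical helper for illustration only, not part of sashimi.py.
    """
    chrom, start, end, *weights = line.split()
    if len(weights) != 4:
        raise ValueError(f"expected 4 weights, got {len(weights)}: {line!r}")
    # Column order follows the header comment: A, T, C, G.
    per_base = dict(zip("ATCG", (float(w) for w in weights)))
    return chrom, int(start), int(end), per_base


row = parse_motif_line("chr1 100 101 0.1 0.2 -0.3 -0.4")
print(row)
```

Validating rows this way before compressing and indexing can catch a missing weight column early.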


### Additional annotation
Binary file added docs/imgs/cmd/circRNA.png
4 changes: 2 additions & 2 deletions docs/index.md
@@ -6,6 +6,6 @@

---

The full-featured example
## Get started

![](imgs/example.png)
To learn sashimi.py, please follow the [Tutorial](https://sashimi.readthedocs.io/en/latest/command/)
30 changes: 22 additions & 8 deletions docs/installation.md
@@ -18,18 +18,32 @@ python setup.py install
sashimipy --help
```

### Run as script
1. using pipenv
or

```bash
pipenv install
pipenv run python main.py --help
## via Conda
conda install sashimi-py

## via Docker
docker pull quay.io/biocontainers/sashimi-py

## via PyPI
pip install sashimi.py

```

### Run as script
1. using pipenv
```bash
pipenv install
pipenv run python main.py --help
```

2. using python
```bash
pip install -r requirements.txt
python main.py
```
```bash
pip install -r requirements.txt
python main.py
```

**Note:**
If there is any problem with installation of `cairocffi`
6 changes: 6 additions & 0 deletions example/circRNA.tsv
@@ -0,0 +1,6 @@
example/circRNA_bam/SRR9029986.bam bam Linear_RNA_rep1 #00AFBB
example/circRNA_bam/SRR9029988.bam bam Linear_RNA_rep2 #00AFBB
example/circRNA_bam/SRR9029992.bam bam Linear_RNA_rep3 #00AFBB
example/circRNA_bam/SRR9029993.bam bam Circular_RNA_rep1 #FC4E07
example/circRNA_bam/SRR9029994.bam bam Circular_RNA_rep2 #FC4E07
example/circRNA_bam/SRR9029995.bam bam Circular_RNA_rep3 #FC4E07
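
The six rows above follow the file-list layout used elsewhere in the docs (`filepath file_category label color`). As a hedged sketch of how such a list could be read, the Python below parses it into named records; `Track` and `parse_file_list` are hypothetical names, whitespace is assumed as the separator, and missing trailing columns are filled with `None`.

```python
from typing import NamedTuple, Optional


class Track(NamedTuple):
    filepath: str
    file_category: Optional[str]
    label: Optional[str]
    color: Optional[str]


def parse_file_list(lines):
    """Parse file-list rows into Track records (illustrative only)."""
    tracks = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment/header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Pad rows that omit trailing optional columns such as color.
        fields += [None] * (4 - len(fields))
        tracks.append(Track(*fields[:4]))
    return tracks


example = """\
example/circRNA_bam/SRR9029986.bam bam Linear_RNA_rep1 #00AFBB
example/circRNA_bam/SRR9029988.bam bam Linear_RNA_rep2 #00AFBB
example/circRNA_bam/SRR9029992.bam bam Linear_RNA_rep3 #00AFBB
example/circRNA_bam/SRR9029993.bam bam Circular_RNA_rep1 #FC4E07
example/circRNA_bam/SRR9029994.bam bam Circular_RNA_rep2 #FC4E07
example/circRNA_bam/SRR9029995.bam bam Circular_RNA_rep3 #FC4E07
"""

tracks = parse_file_list(example.splitlines())
for track in tracks:
    print(track.label, track.color)
```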
Binary file added example/circRNA_bam/SRR9029986.bam
Binary file not shown.
Binary file added example/circRNA_bam/SRR9029986.bam.bai
Binary file not shown.
Binary file added example/circRNA_bam/SRR9029988.bam
Binary file not shown.
Binary file added example/circRNA_bam/SRR9029988.bam.bai
Binary file not shown.
Binary file added example/circRNA_bam/SRR9029992.bam
Binary file not shown.
Binary file added example/circRNA_bam/SRR9029992.bam.bai
Binary file not shown.
Binary file added example/circRNA_bam/SRR9029993.bam
Binary file not shown.
Binary file added example/circRNA_bam/SRR9029993.bam.bai
Binary file not shown.
Binary file added example/circRNA_bam/SRR9029994.bam
Binary file not shown.
Binary file added example/circRNA_bam/SRR9029994.bam.bai
Binary file not shown.
Binary file added example/circRNA_bam/SRR9029995.bam
Binary file not shown.
Binary file added example/circRNA_bam/SRR9029995.bam.bai
Binary file not shown.