Added initial tutorial as per issue #13
tleonardi committed Feb 10, 2019
1 parent 44feab4 commit 6811f8b
Showing 2 changed files with 62 additions and 0 deletions.
6 changes: 6 additions & 0 deletions docs/example.bed
@@ -0,0 +1,6 @@
chr1 11868 14409 ENST00000456328.2 0 + 14409 14409 0 3 359,109,1189, 0,744,1352,
chr1 12009 13670 ENST00000450305.2 0 + 13670 13670 0 6 48,49,85,78,154,218, 0,169,603,965,1211,1443,
chr1 14403 29570 ENST00000488147.1 0 - 29570 29570 0 11 98,34,152,159,198,136,137,147,99,154,37, 0,601,1392,2203,2454,2829,3202,3511,3864,10334,15130,
chr1 923927 939291 ENST00000420190.6 0 + 924431 939291 0 7 1021,92,182,51,125,90,17, 0,1994,6227,7111,11844,15112,15347,
chr1 925149 935793 ENST00000437963.5 0 + 925941 935793 0 5 40,92,182,51,22, 0,772,5005,5889,10622,
chr1 925737 944575 ENST00000342066.7 0 + 925941 944153 0 14 63,92,182,51,125,90,186,163,116,79,500,125,111,668, 0,184,4417,5301,10034,13302,13537,15406,16398,16672,16821,17515,17960,18170,
56 changes: 56 additions & 0 deletions docs/tutorial.md
@@ -0,0 +1,56 @@
# Bedparse tutorial

Hi, thanks for your interest in `bedparse`!

The following is a short tutorial that will guide you through the functionality of `bedparse`. You can find the `example.bed` file in this repo under docs/example.bed.
This file contains 6 human transcript models from Gencode. The first three are non-coding transcripts (i.e. fields 7 and 8, thickStart and thickEnd, contain the same coordinate), whereas the last three are coding (i.e. fields 7 and 8 indicate the start and end of the [CDS](https://en.wikipedia.org/wiki/Coding_region)).
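
To make the column layout concrete, here is a minimal Python sketch (not `bedparse`'s own code) that parses one BED12 line and applies the coding/non-coding rule described above. The names `Bed12`, `parse_bed12`, `is_coding` and `exons` are hypothetical; the sketch assumes well-formed BED12 input with block starts in ascending order, and uses `split()` so that both tabs and spaces are accepted.

```python
from typing import List, NamedTuple, Tuple


class Bed12(NamedTuple):
    """One BED12 record (0-based, half-open coordinates)."""
    chrom: str
    start: int                # column 2, chromStart
    end: int                  # column 3, chromEnd
    name: str                 # column 4
    score: str                # column 5
    strand: str               # column 6
    thick_start: int          # column 7, thickStart
    thick_end: int            # column 8, thickEnd
    rgb: str                  # column 9, itemRgb
    block_count: int          # column 10
    block_sizes: List[int]    # column 11
    block_starts: List[int]   # column 12, relative to chromStart


def _int_list(field: str) -> List[int]:
    # Columns 11 and 12 are comma-separated and usually end with a comma.
    return [int(x) for x in field.rstrip(",").split(",")]


def parse_bed12(line: str) -> Bed12:
    """Hypothetical helper: parse one tab- or space-separated BED12 line."""
    f = line.split()
    if len(f) != 12:
        raise ValueError(f"expected 12 columns, got {len(f)}")
    return Bed12(f[0], int(f[1]), int(f[2]), f[3], f[4], f[5],
                 int(f[6]), int(f[7]), f[8], int(f[9]),
                 _int_list(f[10]), _int_list(f[11]))


def is_coding(tx: Bed12) -> bool:
    """Coding transcripts have a non-empty thick region (thickStart != thickEnd)."""
    return tx.thick_start != tx.thick_end


def exons(tx: Bed12) -> List[Tuple[int, int]]:
    """Absolute (start, end) coordinates of each block, i.e. each exon."""
    return [(tx.start + rel, tx.start + rel + size)
            for size, rel in zip(tx.block_sizes, tx.block_starts)]
```

For example, on the first line of `example.bed`, `exons(...)` gives `[(11868, 12227), (12612, 12721), (13220, 14409)]` and `is_coding(...)` is `False`.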

## Extracting the promoters

The `bedparse promoter` command reports the promoter of each transcript, defined as a user-specified interval around the [TSS](https://en.wikipedia.org/wiki/Transcription_start_site).
For example, we can extract promoters consisting of 1000bp upstream and 500bp downstream of the TSS:

```
$ bedparse promoter --up 1000 --down 500 example.bed
chr1 10868 12368 ENST00000456328.2
chr1 11009 12509 ENST00000450305.2
chr1 29070 30570 ENST00000488147.1
chr1 922927 924427 ENST00000420190.6
chr1 924149 925649 ENST00000437963.5
chr1 924737 926237 ENST00000342066.7
```

Note how the TSS (and, as a consequence, the promoter) depends on the strand: for transcripts on the negative strand the TSS is the end coordinate (column 3), and "upstream" points towards higher coordinates, as for ENST00000488147.1 above. The `--unstranded` option overrides this behaviour and reports promoters as an interval around column 2, disregarding the strand.
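
The strand logic boils down to a few lines of interval arithmetic. The following is a minimal Python sketch, not `bedparse`'s implementation; `promoter` and `promoter_records` are hypothetical names, and clamping negative coordinates to 0 is an assumption of this sketch (the tutorial does not say how `bedparse` handles promoters near the start of a chromosome).

```python
from typing import Iterable, Iterator, Tuple


def promoter(start: int, end: int, strand: str, up: int, down: int,
             unstranded: bool = False) -> Tuple[int, int]:
    """Interval of `up` bp upstream and `down` bp downstream of the TSS."""
    if strand == "-" and not unstranded:
        # Minus strand: the TSS is chromEnd and "upstream" means higher coordinates.
        tss = end
        p_start, p_end = tss - down, tss + up
    else:
        # Plus strand (or any other strand value, or unstranded mode): TSS is chromStart.
        tss = start
        p_start, p_end = tss - up, tss + down
    # Assumption of this sketch: never report negative coordinates.
    return max(p_start, 0), p_end


def promoter_records(lines: Iterable[str], up: int, down: int,
                     unstranded: bool = False) -> Iterator[Tuple[str, int, int, str]]:
    """Yield (chrom, start, end, name) for each BED line, like the output above."""
    for line in lines:
        f = line.split()
        p_start, p_end = promoter(int(f[1]), int(f[2]), f[5], up, down, unstranded)
        yield f[0], p_start, p_end, f[3]
```

With `up=1000, down=500`, the minus-strand transcript ENST00000488147.1 (columns 2 and 3: 14403 and 29570) gives `(29070, 30570)`, matching the output above, whereas `unstranded=True` gives `(13403, 14903)`.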


## Extracting the CDS

To extract the CDS of the coding transcripts in the BED file use the `bedparse cds` command:

```
$ bedparse cds example.bed
chr1 924431 939291 ENST00000420190.6 0 + 924431 939291 0 7 517,92,182,51,125,90,17, 0,1490,5723,6607,11340,14608,14843,
chr1 925941 935793 ENST00000437963.5 0 + 925941 935793 0 4 72,182,51,22, 0,4213,5097,9830,
chr1 925941 944153 ENST00000342066.7 0 + 925941 944153 0 13 72,182,51,125,90,186,163,116,79,500,125,111,246, 0,4213,5097,9830,13098,13333,15202,16194,16468,16617,17311,17756,17966,
```

Note how non-coding transcripts are not reported (by definition they don't have a CDS). Also note how the start and end coordinates (columns 2 and 3), the number of exons (column 10) and the exon lengths and starts (columns 11 and 12) have been readjusted to reflect the fact that the transcripts have "lost" their UTRs. To visualise this operation, save the output of the command above to a new text file and upload it as a custom track in the UCSC Genome Browser: you'll see that the new transcripts correspond only to the thick portion of the original Gencode transcripts.
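
The trimming step can be sketched as follows: each block is intersected with the thick interval `[thickStart, thickEnd)`, blocks that fall entirely within the UTRs are dropped, and the surviving block starts are re-expressed relative to the new chromStart. `cds_line` is a hypothetical Python helper, not part of `bedparse`; it assumes well-formed BED12 input and writes tab-separated output.

```python
from typing import List, Optional


def _int_list(field: str) -> List[int]:
    # Columns 11 and 12 are comma-separated and usually end with a comma.
    return [int(x) for x in field.rstrip(",").split(",")]


def cds_line(line: str) -> Optional[str]:
    """Return a BED12 line restricted to the CDS, or None for non-coding input."""
    f = line.split()
    start, thick_start, thick_end = int(f[1]), int(f[6]), int(f[7])
    if thick_start == thick_end:
        return None  # non-coding: no thick region, so no CDS
    clipped = []
    for size, rel in zip(_int_list(f[10]), _int_list(f[11])):
        b_start = start + rel
        b_end = b_start + size
        # Intersect the block with the thick region; keep it only if non-empty.
        s, e = max(b_start, thick_start), min(b_end, thick_end)
        if s < e:
            clipped.append((s, e))
    if not clipped:
        return None  # thick region overlaps no block (malformed input)
    new_start, new_end = clipped[0][0], clipped[-1][1]
    sizes = ",".join(str(e - s) for s, e in clipped) + ","
    starts = ",".join(str(s - new_start) for s, _ in clipped) + ","
    out = [f[0], new_start, new_end, f[3], f[4], f[5], new_start, new_end,
           f[8], len(clipped), sizes, starts]
    return "\t".join(str(x) for x in out)
```

For ENST00000437963.5, the first block (925149-925189) lies entirely before thickStart and is dropped, which is why its CDS record above has 4 blocks instead of 5.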


## Extracting introns

In a BED file, introns are implicitly defined as the genomic regions between exons. The `bedparse introns` command creates new "artificial" transcripts whose exons correspond to the introns of the original transcripts:

```
$ bedparse introns example.bed
chr1 12227 13220 ENST00000456328.2 0 + 12227 12227 0 2 385,499, 0,494,
chr1 12057 13452 ENST00000450305.2 0 + 12057 12057 0 5 121,385,277,168,78, 0,170,640,995,1317,
chr1 14501 29533 ENST00000488147.1 0 - 14501 14501 0 10 503,757,659,92,177,237,172,206,6371,4642, 0,537,1446,2264,2554,2867,3241,3560,3865,10390,
chr1 924948 939274 ENST00000420190.6 0 + 924948 924948 0 6 973,4141,702,4682,3143,145, 0,1065,5388,6141,10948,14181,
chr1 925189 935771 ENST00000437963.5 0 + 925189 925189 0 4 732,4141,702,4682, 0,824,5147,5900,
chr1 925800 943907 ENST00000342066.7 0 + 925800 925800 0 13 121,4141,702,4682,3143,145,1683,829,158,70,194,320,99, 0,213,4536,5289,10096,13329,13660,15506,16451,16688,17258,17577,18008,
```
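
Since introns are simply the gaps between consecutive blocks, the conversion can be sketched in Python as below. `intron_line` is a hypothetical helper, not `bedparse`'s code. Following the output above, it sets thickStart and thickEnd to the new chromStart so that the intron records read as non-coding. It assumes block starts are sorted, skips zero-length gaps between adjacent blocks, and returning `None` for single-exon transcripts is this sketch's own choice.

```python
from typing import List, Optional


def _int_list(field: str) -> List[int]:
    # Columns 11 and 12 are comma-separated and usually end with a comma.
    return [int(x) for x in field.rstrip(",").split(",")]


def intron_line(line: str) -> Optional[str]:
    """Return a BED12 line whose blocks are the introns, or None if there are none."""
    f = line.split()
    start = int(f[1])
    exons = [(start + rel, start + rel + size)
             for size, rel in zip(_int_list(f[10]), _int_list(f[11]))]
    # Each intron runs from the end of one exon to the start of the next.
    introns = [(prev_end, next_start)
               for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
               if next_start > prev_end]
    if not introns:
        return None  # single-exon transcript: nothing to report
    new_start, new_end = introns[0][0], introns[-1][1]
    sizes = ",".join(str(e - s) for s, e in introns) + ","
    starts = ",".join(str(s - new_start) for s, _ in introns) + ","
    out = [f[0], new_start, new_end, f[3], f[4], f[5], new_start, new_start,
           f[8], len(introns), sizes, starts]
    return "\t".join(str(x) for x in out)
```

For the first transcript, the exons end at 12227 and 12721 and the following ones start at 12612 and 13220, giving introns of 385 and 499 bp, as in the first line of the output above.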


