# After running the R notebook
In the R notebook, gene counts are categorized by expression level. Here, a metagene plot shows the average mark level over the genes in each category. Two plots are made; the second takes three steps:
* Plot of all marked genes
* Plot of genes divided by expression
   * Prepare bed files from each list with a provided bed annotation of gene models
   * Write a configuration file with paths to bam and bed files
   * Run ngs.plot

In [1]:
## setup
jup_wd=~/work/jupyter-res
gal_wd=~/work/galaxy-res
bed_annot=~/work/lib/test_genome/genes.bed
peak_caller=epic2
cpus=6

cd ${jup_wd}/figures/ngsplot

In [2]:
## find input files on system
bed_01=($(ls ${gal_wd}/chipseq1/*bed | grep -i "${peak_caller}"))
bed_02=($(ls ${gal_wd}/chipseq2/*bed | grep -i "${peak_caller}"))

bam_1=($(ls ${gal_wd}/chipseq1/bam_files/*merged.bam* | grep -i chip))
inp_1=($(ls ${gal_wd}/chipseq1/bam_files/*merged.bam* | grep -i input))

bam_2=($(ls ${gal_wd}/chipseq2/bam_files/*merged.bam* | grep -i chip))
inp_2=($(ls ${gal_wd}/chipseq2/bam_files/*merged.bam* | grep -i input))
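
Each array above is later used as `"$bam_1"`, `"$inp_1"` and so on, which in bash expands to the first element only, so an empty or multi-file match would go unnoticed. A minimal sketch of a guard, assuming exactly one file is expected per variable. `require_one` is a hypothetical helper, not part of this notebook, and the demo runs on throwaway files rather than the real BAM files:

```bash
# Hypothetical helper: succeed only if exactly one path was passed and it exists.
require_one() {
    local label=$1; shift
    if [ "$#" -ne 1 ] || [ ! -e "$1" ]; then
        echo "ERROR: ${label}: expected exactly 1 existing file, got $#" >&2
        return 1
    fi
    echo "${label}: $1"
}

# Demo on throwaway files (stand-ins for the real merged BAM files).
demo_dir=$(mktemp -d)
touch "${demo_dir}/a_ChIP_merged.bam" "${demo_dir}/a_INPUT_merged.bam"

demo_bam=("${demo_dir}"/*ChIP*merged.bam)   # one match
demo_all=("${demo_dir}"/*merged.bam)        # two matches

rc_one=0; require_one "demo_bam" "${demo_bam[@]}" || rc_one=$?   # passes
rc_two=0; require_one "demo_all" "${demo_all[@]}" || rc_two=$?   # fails
rm -r "${demo_dir}"
```

In the notebook itself the call would look like `require_one bam_1 "${bam_1[@]}"`, repeated for each of the four BAM variables.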

## 1 - Plot of all marked genes
This is a second run of ngs.plot, using the gene list after empty rows have been removed.

In [3]:
## sample 1
ngs.plot.r \
-P $cpus \
-G test \
-R bed \
-E epic2_marked_genes_1.bed \
-C "$bam_1":"$inp_1" \
-T "" \
-IN 1 \
-O filtgenes_1 \
-FS 20 \
-SE 1 -LEG 0 \
-RR 5 \
-CD 0.7 -CO darkred:yellow:darkgreen

## sample 2
ngs.plot.r \
-P $cpus \
-G test \
-R bed \
-E epic2_marked_genes_2.bed \
-C "$bam_2":"$inp_2" \
-T "" \
-O filtgenes_2 \
-FS 20 \
-SE 1 -LEG 0 \
-CD 0.7 -CO darkred:yellow:darkgreen \
-RR 5 -RB 0.05

Configuring variables...Done
Loading R libraries.....Done
1: In headerIndexBam(bam.list) :
  Aligner for: /home/jovyan/work/galaxy-res/chipseq1/bam_files/240_2-MarkDupes_ChIP_merged.bam cannot be determined. Style of 
standard SAM mapping score will be used. Would you mind submitting an issue 
report to us on Github? This will benefit people using the same aligner.
2: In headerIndexBam(bam.list) :
  Aligner for: /home/jovyan/work/galaxy-res/chipseq1/bam_files/241_2-MarkDupes_INPUT_merged.bam cannot be determined. Style of 
standard SAM mapping score will be used. Would you mind submitting an issue 
report to us on Github? This will benefit people using the same aligner.
'isNotPrimaryRead' is deprecated.
Use 'isSecondaryAlignment' instead.
See help("Deprecated") 
..Done
Plotting figures...Done
Saving results...Done
Wrapping results up...sh: 1: : Permission denied
In system2(zip, args, input = input) : error in running command
Done
All done. Cheers!
Configuring variables...Done
Loading R

## 2 - Plot of genes divided by expression
The files created with genes divided by expression category are used to write configuration files and plot.

In [4]:
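# prepare bed files: keep the annotation rows whose gene ID (column 4) is in each list
gene_path=${jup_wd}/gene_expr_lists
for f in "${gene_path}"/*txt; do
    # join needs both inputs sorted on the join field
    join -1 1 -2 4 -o 2.1,2.2,2.3,2.4 -t $'\t' \
        <(sort -t $'\t' -k1,1 "$f") \
        <(sort -t $'\t' -k4,4 "${bed_annot}") > "${f%.txt}.bed"
done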
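
`join` only pairs lines correctly when both inputs are sorted on the join field. A small sketch of the lookup on made-up data, assuming each gene list holds one gene ID per line and the annotation carries the ID in column 4. The gene names and coordinates are invented for illustration:

```bash
demo_dir=$(mktemp -d)

# Toy annotation: chrom, start, end, gene ID (tab-separated, unsorted on column 4).
printf 'chr1\t100\t900\tgeneC\nchr1\t2000\t3500\tgeneA\nchr2\t50\t700\tgeneB\n' \
    > "${demo_dir}/genes.bed"
# Toy gene list, one ID per line.
printf 'geneC\ngeneA\n' > "${demo_dir}/high.txt"

# Sort both inputs on the join field under the same locale, then keep
# the four BED columns of every annotation row whose ID is in the list.
demo_bed=$(LC_ALL=C join -1 1 -2 4 -o 2.1,2.2,2.3,2.4 -t $'\t' \
    <(LC_ALL=C sort "${demo_dir}/high.txt") \
    <(LC_ALL=C sort -t $'\t' -k4,4 "${demo_dir}/genes.bed"))

echo "${demo_bed}"
rm -r "${demo_dir}"
```

The result comes out ordered by gene ID rather than by genomic position, and `geneB`, which is absent from the list, is dropped.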

In [5]:
s1=sample1
s2=sample2

# write config file for sample 1
echo '# base command: ngs.plot.r -G test -R bed -C config_sample1.txt -O plotBYexpr-sample1 -P 6 -FL 300 -IN 1 -FS 10 -WD 5 -HG 5 -SE 1' > config_"$s1".txt
echo '# Use TAB to separate the three columns: coverage file<TAB>gene list<TAB>title' >> config_"$s1".txt
echo '# "title" will be shown in the figure legend.' >> config_"$s1".txt
echo -e "${bam_1}:${inp_1}\t${gene_path}/${s1}.high.bed\t'High'" >> config_"$s1".txt
echo -e "${bam_1}:${inp_1}\t${gene_path}/${s1}.medium.bed\t'Medium'" >> config_"$s1".txt
echo -e "${bam_1}:${inp_1}\t${gene_path}/${s1}.low.bed\t'Low'" >> config_"$s1".txt
echo -e "${bam_1}:${inp_1}\t${gene_path}/${s1}.no_expr.bed\t'No expr'" >> config_"$s1".txt

# write config file for sample 2
echo '# base command: ngs.plot.r -G test -R bed -C config_sample2.txt -O plotBYexpr-sample2 -P 6 -FL 300 -IN 1 -FS 10 -WD 5 -HG 5 -SE 1' > config_"$s2".txt
echo '# Use TAB to separate the three columns: coverage file<TAB>gene list<TAB>title' >> config_"$s2".txt
echo '# "title" will be shown in the figure legend.' >> config_"$s2".txt
echo -e "${bam_2}:${inp_2}\t${gene_path}/${s2}.high.bed\t'High'" >> config_"$s2".txt
echo -e "${bam_2}:${inp_2}\t${gene_path}/${s2}.medium.bed\t'Medium'" >> config_"$s2".txt
echo -e "${bam_2}:${inp_2}\t${gene_path}/${s2}.low.bed\t'Low'" >> config_"$s2".txt
echo -e "${bam_2}:${inp_2}\t${gene_path}/${s2}.no_expr.bed\t'No expr'" >> config_"$s2".txt
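
A malformed line in a configuration file is easy to miss, so it can help to check the file before plotting. A minimal sketch, assuming the three-column, tab-separated layout described in the header comments. It runs on a throwaway demo file, and the file names in it are placeholders:

```bash
demo_cfg=$(mktemp)

# Write a demo config: two comment lines, one good row, one row missing its title.
{
    echo '# coverage file<TAB>gene list<TAB>title'
    echo '# "title" will be shown in the figure legend.'
    printf '%s\t%s\t%s\n' 'chip.bam:input.bam' 'high.bed' "'High'"
    printf '%s\t%s\n'     'chip.bam:input.bam' 'low.bed'
} > "${demo_cfg}"

# Report every non-comment, non-empty line that does not have exactly 3 fields.
bad_lines=$(awk -F '\t' '!/^#/ && NF > 0 && NF != 3 { print NR }' "${demo_cfg}")

echo "malformed lines: ${bad_lines:-none}"
rm "${demo_cfg}"
```

Pointing the same `awk` command at `config_"$s1".txt` or `config_"$s2".txt` would list the line numbers that need fixing, or print nothing if every row has three columns.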

In [6]:
## run ngs.plot
for f in config_*.txt
do
    # sample name from the file name: config_sample1.txt -> sample1
    s=${f#config_}; s=${s%.txt}
    ngs.plot.r \
    -G test \
    -R bed \
    -C "$f" \
    -O plotBYexpr-"$s" \
    -P "$cpus" \
    -FL 300 \
    -IN 1 \
    -FS 10 -WD 5 -HG 5 \
    -SE 1
done

Configuring variables...Done
Loading R libraries.....Done
1: In headerIndexBam(bam.list) :
  Aligner for: /home/jovyan/work/galaxy-res/chipseq1/bam_files/240_2-MarkDupes_ChIP_merged.bam cannot be determined. Style of 
standard SAM mapping score will be used. Would you mind submitting an issue 
report to us on Github? This will benefit people using the same aligner.
2: In headerIndexBam(bam.list) :
  Aligner for: /home/jovyan/work/galaxy-res/chipseq1/bam_files/241_2-MarkDupes_INPUT_merged.bam cannot be determined. Style of 
standard SAM mapping score will be used. Would you mind submitting an issue 
report to us on Github? This will benefit people using the same aligner.
'isNotPrimaryRead' is deprecated.
Use 'isSecondaryAlignment' instead.
See help("Deprecated") 
..........Done
Plotting figures...Done
Saving results...Done
Wrapping results up...sh: 1: : Permission denied
In system2(zip, args, input = input) : error in running command
Done
All done. Cheers!
Configuring variables...Done
L