<a href="https://colab.research.google.com/github/zzh24zzh/EPCOT_gradio/blob/main/gradio.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Install required packages

In [None]:
!pip install gradio==3.24.1
!pip install gdown
!pip install einops
!pip install pyBigWig==0.3.17
!pip install deepTools==3.5.1

In [None]:
# install samtools
!wget https://github.com/samtools/samtools/releases/download/1.17/samtools-1.17.tar.bz2
!bunzip2 samtools-1.17.tar.bz2
!tar -xf samtools-1.17.tar
!apt-get install -q samtools
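
Before moving on, it can help to confirm that the command-line tools are actually reachable from the notebook. The following is a minimal sketch, not part of the EPCOT code: `missing_tools` is a hypothetical helper, and the list of tools is an assumption based on the `samtools` install above and the deepTools `bamCoverage` command that the `atac_process.py` usage mentions.

```python
import shutil

def missing_tools(tools):
    """Return the names in `tools` that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]

# Assumed tool list: samtools (apt-get step) and bamCoverage (deepTools).
required = ["samtools", "bamCoverage"]
missing = missing_tools(required)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required command-line tools were found.")
```

If a tool is reported as missing, re-running the corresponding install cell is usually the first thing to try.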

### Clone the repository

In [None]:
!git clone https://github.com/zzh24zzh/EPCOT_gradio.git
%cd EPCOT_gradio/

###  Download trained models and reference genome data 

In [None]:
!python download.py 

### Process ATAC-seq data 

You'll need a .bam file of ATAC-seq data. Here's an example of how to read and process a .bam file stored in Google Drive:
```
#mount your Google Drive to the Google Colab environment
from google.colab import drive
drive.mount('/content/gdrive')
```
```
#usage: python atac_process.py -b <an ATAC-seq bam file> -p <number of processors used by the deepTools bamCoverage function>
!python atac_process.py -b /content/gdrive/MyDrive/GM12878.bam -p 12 
```

The processed file will be in .pickle format and stored in the "EPCOT_gradio/" folder. Check the message displayed when the script finishes running to confirm the location of the processed ATAC-seq file. You'll need the path of the processed ATAC-seq file to run the demo.
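
If the script's final message has scrolled away, the processed file can also be located from Python. This is a sketch under assumptions: it supposes the output ends in `.pickle`, sits in the current folder, and loads with the standard `pickle` module; `newest_pickle` and `describe_pickle` are hypothetical helpers, and nothing is assumed about the object stored inside the file.

```python
import pickle
from pathlib import Path

def newest_pickle(folder="."):
    """Return the most recently modified .pickle file in `folder`, or None."""
    files = sorted(Path(folder).glob("*.pickle"), key=lambda p: p.stat().st_mtime)
    return files[-1] if files else None

def describe_pickle(path):
    """Load a pickle file and describe its top-level object.

    Only load pickle files you created yourself: unpickling runs arbitrary code.
    """
    with open(path, "rb") as f:
        obj = pickle.load(f)
    size = len(obj) if hasattr(obj, "__len__") else "n/a"
    return f"{Path(path).name}: type={type(obj).__name__}, len={size}"

path = newest_pickle(".")
if path is None:
    print("No .pickle file found in the current folder.")
else:
    # This absolute path is what the demo's "ATAC-seq file" textbox expects.
    print("Processed ATAC-seq file:", path.resolve())
```

Calling `describe_pickle(path)` afterwards gives a quick sanity check that the file loads at all.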

In [None]:
!python atac_process.py -b <enter your .bam file> -p 12

### Launch the demo
The Gradio demo has two interfaces: **(1) Run Model**, and **(2) Visualize Prediction Results**.

In the first interface, you enter a genomic region and run the models. Each run produces two files:

* a file named **"prediction_xxxx.npz"**, which can be uploaded to the second interface for visualization,
* a compressed file named **"formatted_xxxx.zip"**, which contains ChIP-seq and CAGE-seq data in .bigWig format, and contact maps in .bedpe format.

The two files can also be found under the **"EPCOT_gradio/results/"** directory. 
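
To see what a run produced without downloading anything, the archives can be listed directly. The sketch below uses only the standard library and assumes the archives match `results/formatted_*.zip`, following the naming described above; `group_zip_members` is a hypothetical helper, and the internal layout of the archive is not assumed beyond it containing files with extensions.

```python
import zipfile
from collections import defaultdict
from pathlib import Path

def group_zip_members(zip_path):
    """Map each lower-cased file extension to the member names in the archive."""
    groups = defaultdict(list)
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if name.endswith("/"):  # skip directory entries
                continue
            groups[Path(name).suffix.lower()].append(name)
    return dict(groups)

# Assumed location and naming pattern; prints nothing if no archive exists yet.
for zip_path in sorted(Path("results").glob("formatted_*.zip")):
    print(zip_path.name)
    for ext, names in sorted(group_zip_members(zip_path).items()):
        print(f"  {ext or '(no extension)'}: {len(names)} file(s)")
```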

The file upload block has been replaced with a textbox block here, allowing you to simply enter the path to the processed ATAC-seq file, as uploading it can be quite slow. To help you try the demo, we provide an example ATAC-seq file located at "examples/atac_GM12878.pickle".
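
The region inputs in the demo cell are labelled "500kb for Micro-C and 1Mb for Hi-C", and both bundled examples span exactly those widths. A small pre-check like the one below can catch a mistyped coordinate before a run starts. It is only a sketch: `check_region` is a hypothetical helper, and treating the labelled width as an exact requirement is an assumption, since the behaviour of `predict_func_for_colab` on other widths is not documented here.

```python
# Window sizes taken from the input label in the demo cell (assumed to be exact).
EXPECTED_SPAN = {"Micro-C": 500_000, "Hi-C (ChIA-PET)": 1_000_000}

def check_region(map_type, start, end):
    """Return None if the region matches the labelled window size, else a message."""
    if map_type not in EXPECTED_SPAN:
        return f"Unknown contact map type: {map_type!r}"
    start, end = int(start), int(end)
    if start < 0 or end <= start:
        return "Expected 0 <= From < To"
    span = end - start
    expected = EXPECTED_SPAN[map_type]
    if span != expected:
        return f"{map_type} expects a {expected:,} bp region, got {span:,} bp"
    return None

# The two example rows from the demo pass; a 1 Mb Micro-C region does not.
print(check_region("Micro-C", 10_500_000, 11_000_000))
print(check_region("Hi-C (ChIA-PET)", 7_750_000, 8_750_000))
print(check_region("Micro-C", 10_500_000, 11_500_000))
```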



In [None]:
import gradio as gr
import os
from func_gradio import predict_func_for_colab,make_plots

inputs = [
    gr.Dropdown([str(i) for i in range(1,23)],label='Chromosome',value='1'),
    gr.Dropdown(['Micro-C', 'Hi-C (ChIA-PET)'],label='Chromatin contact map', info='One type of contact map is predicted per run'),
    gr.Number(label='Region of interest (500kb for Micro-C and 1Mb for Hi-C)',info='From'),
    gr.Number(info='To',show_label=False),
    gr.Textbox(label="ATAC-seq file",info="Path to the processed ATAC-seq file",lines=1),
]
outputs = [gr.Files(label='Download the results')]
app1 = gr.Interface(
    fn=predict_func_for_colab,inputs=inputs,outputs=outputs,
    title='A computational tool to use ATAC-seq to impute epigenome, transcriptome, and high-resolution chromatin contact maps',
    description='<a href="https://github.com/zzh24zzh/EPCOT_gradio" class="built-with svelte-1lyswbr" target="_blank" '
                'style="font-size: 15px; font-color: black; font-weight:bold" rel="noreferrer"> View Documentation </a>',
    examples=[["11", "Micro-C", "10500000", "11000000", "examples/atac_GM12878.pickle"],
              ["11", "Hi-C (ChIA-PET)", "7750000", "8750000", "examples/atac_GM12878.pickle"]]
)

with open(os.path.abspath('data/epigenomes.txt'), 'r') as f:
    epis=f.read().splitlines()
inputs1 = [
    gr.File(label="Prediction file (in .npz format)"),
    gr.Markdown(value='### Visualization options'),
    gr.Dropdown(epis,label='Epigenome features',multiselect=True,max_choices=10,value=['CTCF','H3K4me3']),
    gr.Radio(choices=['Signal p-values (archsinh)','Binding probability'], label='Type of epigenomic feature data', value='Signal p-values (archsinh)'),
    gr.Slider(maximum=16,label='Range of values displayed on the plots',info="Choose between 0 and 16 (contact maps)",value=4),
    gr.Slider(minimum=2,maximum=12,info="Choose between 2 and 12 (epigenomic feature signals)",value=4,show_label=False),
    gr.Slider(minimum=2,maximum=12,info="Choose between 2 and 12 (CAGE-seq)",value=8,show_label=False),
]
outputs1 = gr.Plot(label='Plots')
app2 = gr.Interface(
    fn=make_plots,
    inputs=inputs1,
    outputs=outputs1,
    live=True,
)

demo = gr.TabbedInterface([app1, app2], ["Run Model", "Visualize Prediction Results"], theme=gr.themes.Soft())
demo.launch(debug=True,share=True)


