<a href="https://colab.research.google.com/github/xuebingwu/amplicon-lab/blob/main/amplicon-lab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# <font color='MediumSlateBlue '> **Amplicon-Lab: a colab server for amplicon sequencing data analysis**  </font> 
## A Colab notebook for analyzing amplicon sequencing data
---
[Xuebing Wu lab @ Columbia](https://xuebingwu.github.io/)     |     [GitHub repository](https://github.com/xuebingwu/amplicon-lab) 


In [4]:
#@title Start analysis

INPUT = "Upload my own data" #@param ["Upload my own data","Sample: 2% DMS treated", "Sample: 0% DMS control", "Sample: Error-prone PCR mutagenesis"]

#@markdown - Input format: a single gzipped fastq file
#@markdown - To start analysis/upload an input file, click the Run button on the left, or from the dropdown menu `Runtime` -> `Run all`  </font>  

import os
from google.colab import files

if os.path.exists('sample.pdf'):
  !rm sample.pdf

if INPUT == "Upload my own data":
  # ask user to upload a file
  uploaded = files.upload()
  uploaded_file=list(uploaded.keys())[0]
  os.rename(uploaded_file,"uploaded.fastq.gz")
  # create a link to the input file: input.fq.gz
  !ln -s uploaded.fastq.gz input.fq.gz

if not os.path.exists("amplicon-lab"):
  !git clone https://github.com/xuebingwu/amplicon-lab.git
  
if INPUT == "Sample: Error-prone PCR mutagenesis":
  !ln -s amplicon-lab/error-prone-pcr.fastq.gz input.fq.gz
elif INPUT == "Sample: 2% DMS treated":
  !ln -s amplicon-lab/dms-treated.fastq.gz input.fq.gz
else:
  !ln -s amplicon-lab/dms-control.fastq.gz input.fq.gz

!python amplicon-lab/amplicon-lab.py input.fq.gz sample

!rm input.fq.gz 

# download the file
files.download(f"sample.pdf")


Saving error-prone-pcr.fastq.gz to error-prone-pcr.fastq.gz
ln: failed to create symbolic link 'input.fq.gz': File exists
92.0% of the first 100 reads have a length of 153
8741 of 10000 (87.41%) reads have a length of 153
3092 unique sequences
top 3 most abundant sequences and their counts:
[('GATAAGCTTGTCGACCACCAAGGTCTCCACACAAAATCACTAGTCTTCCAATCTCCAGAGGATCATAAGATATCCTCGAGTCTAGAGGGCCCGCGGTTCGAAGGTAAGCCTATCCCTAACCCTCTCCTCGGTCTCGATTCTACGCGTACCGGT', 1755), ('GATAAGCTTGTCGACCACCAAGGTCTCCACACAAATTCACTAGTCTTCCAATCTCCAGAGGATCATAAGATATCCTCGAGTCTAGAGGGCCCGCGGTTCGAAGGTAAGCCTATCCCTAACCCTCTCCTCGGTCTCGATTCTACGCGTACCGGT', 55), ('GATAAGCTTGTCGACCACCAAGGTCTCCACACAAAATCACTAGTCTTCCAATCTCCAGAGGATCATAAGATATCCTCGAGTCTAGAGGGTCCGCGGTTCGAAGGTAAGCCTATCCCTAACCCTCTCCTCGGTCTCGATTCTACGCGTACCGGT', 46)]


<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

# About <a name="Instructions"></a>

**Applications**
* Analyze mutational patterns in amplicon sequencing data in which substitutions rather than indels are the predominant type of mutations.
* Example: DMS/SHAPE RNA structure probing, Error-prone PCR mutagenesis, etc.

**Input**

* A single gzipped fastq file
* All reads should be from a single PCR product
* All reads should be of the same length
* Avoid uploading large files. Use the following code to save the first 20,000 reads and upload: 

* `zcat original.fastq.gz | head -n 80000 | gzip > top20000.fastq.gz`

**Output**

* A single PDF file with 5 figures
* Fig. 1. Rate of mutation to each nucleotide at each position
* Fig. 2. Total mutation rate at each position
* Fig. 3: Number of mutations per read.
* Fig. 4: Fraction of all mutations
* Fig. 5: Frequency of mutation types

**Limitations**
* A gmail account is required to run Google Colab notebooks.
* This notebook was designed for analyzing a single PCR product, such as DMS amplicon. 
* Your browser can block the pop-up for downloading the result file. You can choose the `save_to_google_drive` option to upload to Google Drive instead or manually download the result file: Click on the little folder icon to the left, navigate to file: `sample.pdf`, right-click and select \"Download\".


**Bugs**
- If you encounter any bugs, please report the issue by emailing Xuebing Wu (xw2629 at cumc dot columbia dot edu)

**License**

* The source code of this notebook is licensed under [MIT](https://raw.githubusercontent.com/sokrypton/ColabFold/main/LICENSE).


