ssCRISPR was created by Austin Rottinghaus.
This code accompanies the manuscript: Computational design of CRISPR guide RNAs to enable strain-specific control of microbial consortia
Files provided on GitHub: https://github.com/Austin-Rottinghaus/ssCRISPR
- All_NCBI_Strains.xlsx - Excel sheet containing the strain database used in the program
- Cas9_gRNA_prediction.py - Standalone script used to predict the efficiency of Cas9 gRNAs using the finalized_model_cas9.sav machine learning model
- Cas9_gradient boosting.py - Python script used to generate the finalized_model_cas9.sav machine learning model
- Cas_icon3.ico - Icon used for the ssCRISPR GUI
- Fastq_analyze_Ecoli.py - Python script used to analyze E. coli next-generation sequencing results
- Fastq_analyze_pseudo.py - Python script used to analyze Pseudomonas next-generation sequencing results
- cpf1_gRNA_prediction.py - Standalone script used to predict the efficiency of Cpf1 gRNAs using the finalized_model_cpf1.sav machine learning model
- cpf1_gradient boosting.py - Python script used to generate the finalized_model_cpf1.sav machine learning model
- finalized_model_cas9.sav - Machine learning model used to predict the efficiency of Cas9 gRNAs
- finalized_model_cpf1.sav - Machine learning model used to predict the efficiency of Cpf1 (LbCas12a) gRNAs
- ssCRISPR Script.py - Python script containing the GUI and code used to predict gRNAs
Relevant files provided at Mendeley Data: https://data.mendeley.com/datasets/gpgyytwgb5/2
- ssCRISPR.zip
To run ssCRISPR from the executable:
- Download the zipped folder, ssCRISPR.zip, from Mendeley Data to your computer: https://data.mendeley.com/datasets/gpgyytwgb5/2
- Unzip the folder in your desired location
- Ensure that the following files remain in the same location at all times:
  - Cas_icon3.ico
  - finalized_model_cpf1.sav
  - finalized_model_cas9.sav
  - All_NCBI_Strains.xlsx
  - ssCRISPR.exe
- Run the ssCRISPR.exe file
- Fill out the following items in the user interface:
  - Your email address - Note: This is only used to request genome sequences from NCBI. You will not receive any emails.
  - Whether you are using Cas9, Cpf1, or another Cas protein
  - PAM sequence in A, T, C, and G nucleotides - The program does not accept degenerate nucleotide codes, such as R, S, and N.
  - The desired number of nucleotides of specificity
  - Target sequence length
  - PAM orientation
  - List of target strains
  - List of non-target strains
- Click "Determine gRNAs"
- When the program finishes running, a text file with the results can be downloaded to your computer by clicking "Download results"
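Because the PAM field only accepts A, T, C, and G, a degenerate PAM such as NGG has to be written out as explicit sequences before it is entered. The following is a minimal sketch of that expansion using the standard IUPAC nucleotide codes. It is not part of ssCRISPR: the function name expand_pam is hypothetical, and entering each expanded PAM in a separate run is an assumption about the workflow.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def expand_pam(pam):
    """Return every A/C/G/T-only sequence matched by a degenerate PAM.

    Hypothetical helper for illustration; ssCRISPR does not provide it.
    """
    pam = pam.strip().upper()
    unknown = sorted(set(pam) - set(IUPAC))
    if unknown:
        raise ValueError("Unrecognized nucleotide code(s): " + ", ".join(unknown))
    # One list of candidate bases per PAM position, combined in order.
    return ["".join(bases) for bases in product(*(IUPAC[code] for code in pam))]


if __name__ == "__main__":
    for explicit_pam in expand_pam("NGG"):
        print(explicit_pam)
```

For example, expand_pam("NGG") yields AGG, CGG, GGG, and TGG, each of which contains only the letters the PAM field accepts.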
To run ssCRISPR from the Python script:
- Download the following files from GitHub to your computer: https://github.com/Austin-Rottinghaus/ssCRISPR
  - Cas_icon3.ico
  - finalized_model_cpf1.sav
  - finalized_model_cas9.sav
  - All_NCBI_Strains.xlsx
  - ssCRISPR.py
- Ensure that the files remain in the same location at all times
- Open the ssCRISPR.py script
- Ensure that the following packages are installed on your computer (note: many come with some IDEs, such as Anaconda Spyder):
  - Bio, re, sys, os, openpyxl, pickle, random, seqfold, itertools, math, time
- Run the script
- Fill out the following items in the user interface:
  - Your email address - Note: This is only used to request genome sequences from NCBI. You will not receive any emails.
  - Whether you are using Cas9, Cpf1, or another Cas protein
  - PAM sequence in A, T, C, and G nucleotides - The program does not accept degenerate nucleotide codes, such as R, S, and N.
  - The desired number of nucleotides of specificity
  - Target sequence length
  - PAM orientation
  - List of target strains
  - List of non-target strains
- Click "Determine gRNAs"
- When the program finishes running, a text file with the results can be downloaded to your computer by clicking "Download results"
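Before running the script, it can help to confirm that the listed packages can be imported and that the supporting files sit in one folder. The sketch below is one way to do that and is not part of ssCRISPR: the helper names missing_packages and missing_files are hypothetical, and it assumes it is run from the folder that holds the downloaded files. The Bio module is provided by the Biopython distribution; re, sys, os, pickle, random, itertools, math, and time ship with Python.

```python
import importlib.util
from pathlib import Path

# Module names as listed in the instructions above.
REQUIRED_PACKAGES = [
    "Bio", "re", "sys", "os", "openpyxl", "pickle",
    "random", "seqfold", "itertools", "math", "time",
]

# Supporting files that must stay next to the script.
REQUIRED_FILES = [
    "Cas_icon3.ico",
    "finalized_model_cpf1.sav",
    "finalized_model_cas9.sav",
    "All_NCBI_Strains.xlsx",
]


def missing_packages(names):
    """Return the module names that cannot be found by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_files(folder, names):
    """Return the file names that are not present in the given folder."""
    folder = Path(folder)
    return [name for name in names if not (folder / name).is_file()]


if __name__ == "__main__":
    # Assumption: the current working directory is the ssCRISPR folder.
    packages = missing_packages(REQUIRED_PACKAGES)
    files = missing_files(Path.cwd(), REQUIRED_FILES)
    print("Missing packages:", ", ".join(packages) or "none")
    print("Missing files:", ", ".join(files) or "none")
```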
Note: User-provided FASTA files can be used as target and non-target strains.
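A quick check of a user-provided FASTA file before loading it can catch formatting problems early. The sketch below uses only the Python standard library and is not how ssCRISPR itself reads FASTA files; the function names read_fasta and non_acgt_characters are hypothetical, and the rule that every record needs a unique header is an assumption made for this check.

```python
from pathlib import Path


def read_fasta(path):
    """Parse a FASTA file into a {header: sequence} dictionary.

    Hypothetical helper for illustration; sequences are upper-cased.
    """
    records = {}
    header = None
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            if header in records:
                raise ValueError("Duplicate header: " + header)
            records[header] = []
        elif header is None:
            raise ValueError("Sequence data found before the first '>' header line")
        else:
            records[header].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def non_acgt_characters(sequence):
    """Return the sorted characters in a sequence other than A, C, G, and T."""
    return sorted(set(sequence) - set("ACGT"))
```

Calling read_fasta on a file and then non_acgt_characters on each sequence shows whether the file parses cleanly and whether any sequence contains ambiguous bases such as N.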
If you want to update the All_NCBI_Strains.xlsx file to include the most recent catalogue of sequences, perform the following steps:
- Go to: https://www.ncbi.nlm.nih.gov/genome/browse#!/prokaryotes/
- Filter by bacteria and complete genomes
- Download the results table from the search as an Excel file
- Open the downloaded Excel file and the All_NCBI_Strains.xlsx file provided with the program
- Replace the "Organism name," "Strain," and "Replicons" columns in All_NCBI_Strains.xlsx with the same columns from the downloaded Excel file
- Drag the formula in the "Combined" column to apply to all rows with strains