Source code for "Computational design of CRISPR guide RNAs to enable strain-specific control of microbial consortia"

Austin-Rottinghaus/ssCRISPR

ssCRISPR

ssCRISPR was created by Austin Rottinghaus.

ssCRISPR can be used to design gRNAs with strain-specific cleavage profiles.

This code accompanies the manuscript: Computational design of CRISPR guide RNAs to enable strain-specific control of microbial consortia

Relevant files

  1. All_NCBI_Strains.xlsx - Excel sheet containing the strain database used in the program
  2. Cas9_gRNA_prediction.py - Standalone script used to predict the efficiency of Cas9 gRNAs using the finalized_model_cas9.sav machine learning model
  3. Cas9_gradient boosting.py - Python script used to generate the finalized_model_cas9.sav machine learning model
  4. Cas_icon3.ico - Icon used for the ssCRISPR GUI
  5. Fastq_analyze_Ecoli.py - Python script used to analyze E. coli next-generation sequencing results
  6. Fastq_analyze_pseudo.py - Python script used to analyze Pseudomonas next-generation sequencing results
  7. cpf1_gRNA_prediction.py - Standalone script used to predict the efficiency of Cpf1 gRNAs using the finalized_model_cpf1.sav machine learning model
  8. cpf1_gradient boosting.py - Python script used to generate the finalized_model_cpf1.sav machine learning model
  9. finalized_model_cas9.sav - Machine learning model used to predict the efficiency of Cas9 gRNAs
  10. finalized_model_cpf1.sav - Machine learning model used to predict the efficiency of LbCas12a gRNAs
  11. ssCRISPR Script.py - Python script containing the GUI and code used to predict gRNAs

Relevant files provided at Mendeley Data: https://data.mendeley.com/datasets/gpgyytwgb5/2

  1. ssCRISPR.zip

Tutorials

To use the ssCRISPR program as a stand-alone executable application, perform the following steps:

  1. Download the zipped folder, ssCRISPR.zip, from Mendeley Data to your computer: https://data.mendeley.com/datasets/gpgyytwgb5/2
  2. Unzip the folder in your desired location
  3. Ensure that the following files remain in the same location at all times:
    • Cas_icon3.ico
    • finalized_model_cpf1.sav
    • finalized_model_cas9.sav
    • All_NCBI_Strains.xlsx
    • ssCRISPR.exe
  4. Run the ssCRISPR.exe file
  5. Fill out the following items in the user interface:
    • Your email address - Note: This is only used to request genome sequences from NCBI. You won't receive any emails.
    • Whether you are using Cas9, Cpf1, or another Cas protein
    • PAM sequence in A, T, C, and G nucleotides - The program does not accept degenerate (ambiguity) nucleotide codes, such as R, S, and N.
    • The desired number of nucleotides of specificity
    • Target sequence length
    • PAM orientation
    • List of target strains
    • List of non-target strains
  6. Click "Determine gRNAs"
  7. When the program finishes running, a text file with the results can be downloaded to your computer by clicking "Download results"
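
Because the PAM field only accepts A, T, C, and G, a degenerate PAM such as NGG has to be written out as concrete sequences before it can be entered. The following is a minimal sketch of that expansion using the standard IUPAC ambiguity codes. The function name `expand_pam` is hypothetical and is not part of ssCRISPR, and running the program once per concrete PAM is an assumed workaround, not something this README documents.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_pam(pam):
    """Return every A/C/G/T-only sequence matched by a degenerate PAM.

    Hypothetical helper for illustration; ssCRISPR itself does not provide it.
    """
    pam = pam.strip().upper()
    unknown = sorted(set(pam) - set(IUPAC))
    if unknown:
        raise ValueError(f"Unsupported PAM characters: {unknown}")
    # One string of allowed bases per PAM position, combined in every possible way.
    choices = [IUPAC[base] for base in pam]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    for concrete_pam in expand_pam("NGG"):
        print(concrete_pam)
```

Note that the number of concrete PAMs grows quickly with the number of degenerate positions, so this is only practical for short PAMs with few ambiguous bases.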

To use the ssCRISPR program in Python, perform the following steps:

  1. Download the following files from GitHub to your computer: https://github.com/Austin-Rottinghaus/ssCRISPR
    • Cas_icon3.ico
    • finalized_model_cpf1.sav
    • finalized_model_cas9.sav
    • All_NCBI_Strains.xlsx
    • ssCRISPR.py
  2. Ensure that the files remain in the same location at all times
  3. Open the ssCRISPR.py script
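  4. Ensure that the following Python packages are available on your computer. Note: re, sys, os, pickle, random, itertools, math, and time are part of the Python standard library, and many of the others come with some IDEs, such as Anaconda Spyder.
    • Bio (Biopython), re, sys, os, openpyxl, pickle, random, seqfold, itertools, math, time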
  5. Run the script
  6. Fill out the following items in the user interface:
    • Your email address - Note: This is only used to request genome sequences from NCBI. You won't receive any emails.
    • Whether you are using Cas9, Cpf1, or another Cas protein
    • PAM sequence in A, T, C, and G nucleotides - The program does not accept degenerate (ambiguity) nucleotide codes, such as R, S, and N.
    • The desired number of nucleotides of specificity
    • Target sequence length
    • PAM orientation
    • List of target strains
    • List of non-target strains
  7. Click "Determine gRNAs"
  8. When the program finishes running, a text file with the results can be downloaded to your computer by clicking "Download results"
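
Before running the script, it can help to confirm that the packages and companion files listed in the steps above are actually in place. The sketch below is one possible pre-flight check and is not part of ssCRISPR. The helper names `missing_modules` and `missing_files` are hypothetical, and the file list simply mirrors step 1.

```python
import importlib.util
from pathlib import Path

# Import names taken from the package list in the tutorial above.
REQUIRED_MODULES = [
    "Bio", "re", "sys", "os", "openpyxl", "pickle",
    "random", "seqfold", "itertools", "math", "time",
]

# Files the tutorial says must stay together in one folder.
REQUIRED_FILES = [
    "Cas_icon3.ico",
    "finalized_model_cpf1.sav",
    "finalized_model_cas9.sav",
    "All_NCBI_Strains.xlsx",
    "ssCRISPR.py",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_files(folder=".", names=REQUIRED_FILES):
    """Return the required file names that are absent from `folder`."""
    folder = Path(folder)
    return [name for name in names if not (folder / name).is_file()]


if __name__ == "__main__":
    print("Missing modules:", missing_modules() or "none")
    print("Missing files:", missing_files() or "none")
```

If `Bio`, `openpyxl`, or `seqfold` are reported missing, they are distributed on PyPI as `biopython`, `openpyxl`, and `seqfold`, respectively. The remaining names belong to the standard library and should always be found.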

Note: User-provided FASTA files can be used as target and non-target strains.
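
A user-provided FASTA file can be sanity-checked before it is handed to the program. The sketch below reads a FASTA file with the standard library only and counts characters other than A, C, G, and T in each record. It is an illustration, not ssCRISPR's own loader: `read_fasta` and `count_non_acgt` are hypothetical names, and how the program itself parses or validates FASTA input is not described here.

```python
import sys


def read_fasta(path):
    """Read a FASTA file into a dict of {record id: uppercase sequence}.

    The record id is the first whitespace-delimited token after '>'.
    """
    records = {}
    header = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                fields = line[1:].split()
                header = fields[0] if fields else f"record_{len(records) + 1}"
                records[header] = []
            elif header is None:
                raise ValueError("Sequence data found before the first '>' header")
            else:
                records[header].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def count_non_acgt(sequence):
    """Count characters in `sequence` that are not A, C, G, or T."""
    return sum(1 for base in sequence if base not in "ACGT")


if __name__ == "__main__" and len(sys.argv) > 1:
    for name, seq in read_fasta(sys.argv[1]).items():
        print(name, len(seq), "bp,", count_non_acgt(seq), "non-ACGT characters")
```

Since Biopython is already a dependency of the script, `Bio.SeqIO.parse(handle, "fasta")` is an alternative way to read the same files.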

If you want to update the All_NCBI_Strains.xlsx file to include the most recent catalogue of sequences, perform the following steps:

  1. Go to: https://www.ncbi.nlm.nih.gov/genome/browse#!/prokaryotes/
  2. Filter by bacteria and complete genomes
  3. Download the results table from the search as an Excel file
  4. Open the downloaded Excel file and the All_NCBI_Strains.xlsx file provided with the program
  5. Replace the "Organism name," "Strain," and "Replicons" columns in All_NCBI_Strains.xlsx with the same columns from the downloaded Excel file
  6. Drag the formula in the "Combined" column to apply to all rows with strains
