Bioinformatics 3: ChIP-seq analysis #21

Open
twang15 opened this issue Feb 23, 2021 · 3 comments

twang15 commented Feb 23, 2021

Steps in data analysis

  1. Preprocessing:
    i) Bad quality -> Tool: Use “FASTQ Quality Filter” and/or “FASTQ Quality Trimmer”
    ii) Flagged Kmer Content: About 100% of the first six bases are the same sequence -> Tool: Use “FASTQ Trimmer”

  2. Quality control: Run fastqc on the processed samples to see if the problem has been removed. Tool: fastqc
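
To make the two preprocessing fixes concrete, here is a minimal Python sketch of what trimming the first six bases and filtering on quality amount to. It is not the implementation of the tools named above: the function names, the Phred+33 encoding, and the thresholds (Q20 in at least 80% of bases) are assumptions chosen for illustration.

```python
def phred_scores(qual, offset=33):
    """Decode a FASTQ quality string (Phred+33 encoding is assumed)."""
    return [ord(ch) - offset for ch in qual]


def trim_start(seq, qual, n_bases):
    """Drop the first n_bases from a read and its quality string."""
    return seq[n_bases:], qual[n_bases:]


def passes_quality(qual, min_score=20, min_fraction=0.8, offset=33):
    """Keep a read if at least min_fraction of its bases reach min_score.

    Both thresholds are illustrative assumptions, not tool defaults.
    """
    scores = phred_scores(qual, offset)
    if not scores:
        return False
    good = sum(1 for s in scores if s >= min_score)
    return good / len(scores) >= min_fraction


# Hypothetical read: six low-quality leading bases, then eight good ones.
raw_seq, raw_qual = "GATCGGACGTACGT", "######IIIIIIII"
keep_before = passes_quality(raw_qual)

trimmed_seq, trimmed_qual = trim_start(raw_seq, raw_qual, 6)
keep_after = passes_quality(trimmed_qual)

print(trimmed_seq, keep_before, keep_after)
```

In this toy read the low-quality prefix makes the untrimmed read fail the filter, while the trimmed read passes, which is the kind of change a second FastQC run should confirm.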

Library complexity: the fraction of unique fragments present in a given library. A proxy is the sequence duplication level in the FastQC report.

Low library complexity may be an indicator that:
– A new sample and a new library should be prepared.
– We have to find a better antibody (Ab) to perform the IP.
– We cannot sequence the same sample any further because we will not find new sequences.
In certain experimental settings we may expect a low library complexity, e.g. when we are profiling a protein that binds to a small subset of the genome.
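
As a rough illustration of the definition above, the fraction of unique fragments can be estimated from mapped reads. This sketch assumes each read is summarized by a hypothetical `(chromosome, start, strand)` tuple and treats reads sharing all three values as duplicates; that is a simplification, and it is a different proxy from the FastQC duplication plot.

```python
def library_complexity(read_positions):
    """Return distinct mapped positions divided by total mapped reads.

    read_positions: iterable of (chromosome, start, strand) tuples.
    Identical tuples are assumed to be duplicates of one fragment.
    """
    positions = list(read_positions)
    if not positions:
        raise ValueError("no mapped reads supplied")
    return len(set(positions)) / len(positions)


# Hypothetical example: 5 reads, 3 distinct positions.
reads = [
    ("chr1", 100, "+"),
    ("chr1", 100, "+"),
    ("chr1", 100, "-"),
    ("chr2", 500, "+"),
    ("chr2", 500, "+"),
]
complexity = library_complexity(reads)
print(f"library complexity: {complexity:.2f}")
```

A value near 1 means most reads come from distinct fragments; a value well below 1 points to heavy duplication, which may or may not be expected depending on the experiment.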

  3. Mapping (alignment): Treat IP and control the same way (preprocessing and mapping). Tool: bowtie 1 or bowtie 2 (use end-to-end mode) or bwa
    – map the reads and remove unmapped reads
    – filter mapped reads by mapping quality score

  4. Peak calling
    i) Read extension and signal profile generation: estimate the fragment length using strand cross-correlation analysis
    ii) Peak assignment and evaluation
    – Look for fold enrichment of the sample over input or expected background
    – Estimate the significance of the fold enrichment using one of:
  • Poisson distribution
  • negative binomial distribution
  • background distribution from input DNA
  • a model of the background data that adjusts for local variation (MACS). By default, MACS filters out redundant tags at the same location and on the same strand, allowing at most 1 tag. The tag file format can be “BED”, “SAM”, “BAM” or “BOWTIE”; the default is “BED”.
    iii) Look at your mapped reads and peaks in a genome browser to verify peak calling thresholds
  5. Peak analysis and interpretation
    i) Link peaks to genes: BEDTools (intersectBed, closestBed, coverageBed, slopBed)
  • Link peaks to nearby genes (intersectBed)
  • Link peaks to closest genes (closestBed)
    ii) Infer possible biological consequences of the binding
  6. Comparing ChIP-seq across samples
  • intersectBed (finds the subset of peaks common to 2 samples or unique to one of them)
  • macs2 bdgdiff (finds peaks present in only one of the samples)
  7. Visualizing ChIP-seq reads with ngsplot
    AnalysisofChIP-seqData2016.pdf
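
For the peak-calling step above, the fold enrichment and its Poisson significance can be sketched in a few lines of Python. This is only an illustration of the idea, not how MACS or any other peak caller computes its scores: the function names, the simple depth scaling of the control, and the example counts are all assumptions.

```python
import math


def poisson_upper_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), by summing the lower tail."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    if k <= 0:
        return 1.0
    term = math.exp(-lam)      # P(X = 0)
    cdf = term
    for i in range(1, k):      # add P(X = 1) .. P(X = k - 1)
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def enrichment(ip_count, control_count, ip_total, control_total):
    """Fold enrichment of IP over depth-scaled control, with a Poisson p-value.

    The control count in the window is scaled to the IP sequencing depth
    and used as the expected background (an illustrative assumption).
    """
    if control_count <= 0:
        raise ValueError("control_count must be positive in this sketch")
    expected = control_count * ip_total / control_total
    fold = ip_count / expected
    return fold, poisson_upper_tail(ip_count, expected)


# Hypothetical window: 30 IP reads vs 10 control reads at equal depth.
fold, p_value = enrichment(30, 10, 1_000_000, 1_000_000)
print(f"fold enrichment = {fold:.1f}, p = {p_value:.2e}")
```

Subtracting the lower tail from 1 loses precision for extremely small p-values, so a real analysis would rely on the peak caller's own statistics; the sketch only shows why a 3-fold enrichment over a background of 10 reads is unlikely under a Poisson model.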

twang15 commented Feb 23, 2021

An enhancer is a short DNA sequence that increases the rate of transcription. An enhancer is a type of cis-regulatory element and is between 20 and 400 base pairs in size. Transcription factors first bind to the enhancer. A DNA-bending protein then brings the enhancer closer to the promoter in a process known as DNA looping. Enhancers thus increase the rate of transcription by bringing transcription factors closer to the promoter. An enhancer can regulate more than one gene, regardless of its orientation relative to the gene or genes. Enhancers are also important genetic elements in development, since they help activate transcription in cells.

Promoters are DNA sequences that indicate where transcription of DNA by RNA polymerase starts. Promoters are involved in initiating transcription: they determine which DNA strand serves as the template (and therefore which strand is the sense strand) and in which direction transcription will occur. Promoters are usually found upstream of the transcription start site, on the 5’ side, and have to be near the gene to be transcribed. The 5’ end of a DNA strand is the end that terminates at a 5’ carbon. Promoters are bound by both the RNA polymerase enzyme and transcription factors, and the promoter initiates transcription through these interactions. The RNA polymerase enzyme binds weakly to the DNA and moves along the strand until it encounters a promoter. At this stage, it forms a closed promoter complex with the promoter. The RNA polymerase then unwinds the DNA at the transcription start site to form an open promoter complex. Transcription is then initiated.

Summary of Enhancer Vs. Promoter

  • An enhancer is a sequence of DNA that functions to enhance transcription.
  • A promoter is a sequence of DNA that initiates the process of transcription.
  • A promoter has to be close to the gene that is being transcribed while an enhancer does not need to be close to the gene of interest.
  • Both promoters and enhancers help to regulate genetic transcription.
  • Enhancers and promoters can be important in disease.


twang15 commented Feb 27, 2021

CAPs: chromatin-associated proteins

introns
exons

Mapping (alignment)

irreproducible discovery rate (IDR)
