Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

help with an erro please #8

Open
dklcdbi opened this issue Feb 14, 2024 · 11 comments
Open

help with an erro please #8

dklcdbi opened this issue Feb 14, 2024 · 11 comments
Assignees

Comments

@dklcdbi
Copy link

dklcdbi commented Feb 14, 2024

Hello, I was trying to use your BindingSiteFinder and got an error:

Error in if (all(!df.global$increaseOverMin)) { :
missing value where TRUE/FALSE needed

I was using BSFind(object = KObds, anno.genes = gns, anno.transcriptRegionList = regions,

  •           est.subsetChromosome = "chr3", veryQuiet = TRUE, est.maxBsWidth = 29)
    

and it also gave following message:
2.16% (17/788) peaks overlap with multiple anno.genes in the given gene annotation.
A single instance of each peak is kept. This is recommended.

Could you please tell me waht could be the problem? thanks

@MirkoBr
Copy link
Contributor

MirkoBr commented Feb 22, 2024

Hi,

thanks for point out this error.

It seems like you are running into a problem that happens with the automated binding sites width estimation. This is done through the estimateBsWidth() function, which is automatically invoked if you run BSFind() without specifying a particular binding sites size (via the parameter bsSize).

Without having seen some of your data as an example I can't follow it up in more detail.

For the moment I suggest, that you try to manually estimate the correct binding site size. Please have a look at the chapters 5.2 and 5.3 in the vignette on how to do this.

Hope this helps!

@MirkoBr MirkoBr self-assigned this Apr 20, 2024
@KevinDJMin
Copy link

Hi, it seems like I also encountered the error right here. As the default vignette showed subset.chromosome to be limited, I have created a list to incorporate about ten more chromosomes, but still encountered the error.

I tried to manually estimate the bsSize following the manual, but it cannot detect binding site signals:

Current definition does not result in binding sites on all chromosomes where signal was present.
No binding sites on: chr8 chr13 chr14 chr15 chr20 chr22 ML143359.1 ML143366.1 ML143369.1 ML143371.1 ML143377.1 ML143380.1 GL000009.2 GL000214.1 GL000218.1 GL000219.1 GL000220.1 GL000224.1 GL000255.2 GL000258.2 GL339449.2 GL383522.1 GL383546.1 GL949747.2 KI270333.1 KI270336.1 KI270337.1 KI270442.1 KI270466.1 KI270467.1 KI270706.1 KI270728.1 KI270733.1 KI270745.1 KI270754.1 KI270819.1 KI270831.1 KI270832.1 KI270850.1 KI270853.1 KI270866.1 KI270868.1 KI270879.1 KI270897.1 KI270905.1 KI270908.1 KI270913.1 KN538372.1 KV575244.1 KV880768.1 KZ208915.1 KZ559112.1

Any suggestion?

Thank you.

@MelinaKlostermann
Copy link
Contributor

Hi,
I have a few questions for clarification.
So you used your own data set and you find crosslinks on all of these chromosomes right? My first thought would be that the signal is to weak on these chromosomes to define binding sites. Maybe we can check that first. Did you get pureclip peaks on all of these chromosomes?
Did you check once in a Genome Browser (like IGV) if you encounter a bell shaped distribution of peaks somewhere on these chromosomes? What is the maximum number of crosslinks in the center of these peaks? (Both very low and very high numbers below 20-30 or above 1000 can cause problems here). Could you maybe send me a screenshot of the signal in the genome browser of one of these peaks that do not become binding sites?

@MelinaKlostermann
Copy link
Contributor

I made a new issue for this as it sounds like a different problem for me.

@KevinDJMin
Copy link

So you used your own data set and you find crosslinks on all of these chromosomes right?

Yes, I have completed in RBFOX2 eCLIP-seq PureCLIP crosslinking site definitions globally.

**Did you get pureclip peaks on all of these chromosomes?

Yes, when I manually checked the crosslinking sites outputted by PureCLIP peaks, all peaks are present in the chromosomes.

I am thinking on possibly the bigwig file that is having the problem, if the signal is not well captured. I will get back on the IGV screenshots in chromosome 8 which shows high enrichment score according to PureCLIP. Any recommendations on making a bigwig file? I was using deeptools bamcoverage function.

@KevinDJMin
Copy link

KevinDJMin commented Jul 2, 2024

Capture

Here is one IGV image that shows binding sites at chromosome 8. BED file denotes the PureCLIP crosslinking site defined. -f.bw denotes the forward bigwig, with -r.bw the reverse bigwig for your reference.

I highly think my bigwig is incorrect. Any suggestions for rectifications?

@MelinaKlostermann
Copy link
Contributor

MelinaKlostermann commented Jul 3, 2024

I think one problem are indeed the bigwig files. BindingSiteFinder is build to work with single nucleotide crosslinks in bigwig format. For this you cannot turn the complete bam file into a bigwig but you need to select the nucleotide position 1nt upstream of the 5'end of the read in the bam file. To do that I usually turn the bam first into bed, select the upstream nucleotide and then turn it to bigwig after. (This approach comes from the iCLIP processing workflow, published by Busch et al 2020: https://www.sciencedirect.com/science/article/pii/S1046202318304948)

Here is a possible bash code to do this:

#### Convert all read locations to intervals in bed file format using BEDTools
  bedtools bamtobed -i {input.bam} > {output.bed}  \
  #### Shift intervals depending on the strand by 1 bp upstream using BEDTools
  bedtools shift -m 1 -p -1 -i {output.bed} -g {params.genome_fasta}.fai | sort -k1,1 -k2,2n -k3,3n > {output.bed_shift}
  #### Extract the 5' end of the shifted intervals and pile up into coverage track in bedgraph file format (separately for each strand) using BEDTools (in case of RPM-normalised coverage tracks, use additional parameter -scale with 1,000,000/#mappedReads)
  bedtools genomecov -bg -strand + -5 -i {output.bed_shift} -g {params.genome_fasta}.fai | LC_COLLATE=C sort -k1,1 -k2,2n -k3,3n > {output.bed_plus} 
  bedtools genomecov -bg -strand - -5 -i {output.bed_shift} -g {params.genome_fasta}.fai | LC_COLLATE=C sort -k1,1 -k2,2n -k3,3n  > {output.bed_minus} 
  #### conversion of bedgraph files to bw file format files using bedGraphToBigWig of the kentUtils suite
  bedGraphToBigWig {input.bed_plus} {params.genome_fasta}.fai {output.bw_plus} 
  bedGraphToBigWig {input.bed_minus} {params.genome_fasta}.fai {output.bw_minus}

It uses bedtools and bedGraphToBigWig from the kentUtils suite. Both can be installed with conda.

@MelinaKlostermann
Copy link
Contributor

The code above comes from our clip processing tool racoon_clip, which retrieves the single nucleotide crosslinks from the raw fastq files. There racoon_clip also does the alignment to obtain the bam files a bit differently from encode, because it does not allow soft-clipping at the 5' end of the read and by default does not allow multimappers. Both make the final crosslink positions a bit cleaner. So maybe you could consider running racoon_clip on the encode fastq files instead (https://racoon-clip.readthedocs.io/en/latest/). But it should also work with using the code above.

@MelinaKlostermann
Copy link
Contributor

If you rather want to do the shifting in R instead of bash you could also:
Turn the bam files into bed files (in bash).
Then something like this R code should work:

library(rtracklayer)
library(GenomicRanges)
crosslinks <- import.bed("file.bed")
crosslinks <- resize(crosslinks, width = 1, fix="start")
crosslinks <- shift(crosslinks, upstream = 1)
export.bw("file.bw")

@KevinDJMin
Copy link

Awesome! Thanks for the suggestion. I will try the above options and get back to you as soon as possible.

By the way, thank you for developing an awesome software! I actually had a very hard time to filter the robust crosslinking peaks in PureCLIP.

@KevinDJMin
Copy link

Dear Melina,

I have converted the bigwig file as what you described, and it works without any errors. Apparently it was caused by my lack of knowledge on CLIP-seq bigwig file. Hope this forum can help on the others who are having the same troubles.

Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In progress
Development

No branches or pull requests

4 participants