vplagnol / ExomeDepth Public
Error at the end of getBam Counts and output is not S4. #22
Comments
Hmm, I recently updated ExomeDepth to 1.1.13 to deal with some new R compatibility issues, and something odd may be happening with tibbles and data frames. It could be that you need to update your dplyr package? Some incompatibility there? For now, I have created a version 1.1.14.
Dear vplagnol, thank you for your swift reply! I just tried 1.1.14 and I can now proceed with the data frame step without the previous error, though the output is still not an S4 object. You can see the output below.
I shall try proceeding with the rest of the processing to see whether it works on my dataset. Nevertheless, the >50 warnings at the end of getBamCounts still appear. Is this normal, or is it related to the chr notation in my BAM? I used hg19 for both my BAM and FASTA, and both use the chr notation (chr1, chr2, ..., chrX, chrY). Thank you very much in advance, and I wish you a happy New Year!
Now parsing ~/OPERATION/RedCellNGS_hg19/BAM_testing/Normal9.bam
OK, so the errors I see suggest an issue between the "chr" convention, probably used in your BAM file, and the "1..22" convention, which I think is the default I use.
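To illustrate the naming mismatch, here is a minimal base-R sketch that harmonises chromosome names in a BED-style data frame with a BAM that uses the "chr" prefix. The data frame and its column names are illustrative assumptions, not ExomeDepth internals.

```r
# Sketch: add the "chr" prefix to chromosome names only where it is missing.
# The 'bed' data frame and its column names are illustrative assumptions.
bed <- data.frame(chromosome = c("1", "2", "X"),
                  start = c(100L, 200L, 300L),
                  end   = c(150L, 250L, 350L),
                  stringsAsFactors = FALSE)

bed$chromosome <- ifelse(grepl("^chr", bed$chromosome),
                         bed$chromosome,
                         paste0("chr", bed$chromosome))
# bed$chromosome is now "chr1" "chr2" "chrX"

# To go the other way (strip the prefix), one could use:
# bed$chromosome <- sub("^chr", "", bed$chromosome)
```

In practice, getBamCounts exposes an `include.chr` argument for this purpose, so manual renaming should only be needed for non-standard inputs.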
Hello vplagnol, your quick response is much appreciated. I have attached a screenshot of the data frame for reference. "include.chr" was set to TRUE. I think the counts are reasonable at regions with coverage. Nevertheless, as I used a custom capture panel, a lot of the genes had no coverage at all; do you think that would affect the algorithm's enumeration of CNVs? Furthermore, do you think the warnings at the end of processing are safe to bypass? Thank you. Have a nice day and all the best in the new year!
Hello vplagnol, sorry for another issue. When I was trying to build my reference set from my samples, I encountered the following problem.
It looks like another data frame problem. After changing the attribute to numeric, the step worked, but there were three warnings at the end. The sample-data reference set also produces the same 3 warnings at the end of processing. Is this normal?
Furthermore, at the end of ExomeDepth CNV calling, there are two warning messages.
2: In model.matrix.default(mt, mfb, contrasts) :
Are these warnings safe to bypass? Thank you very much!
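The "changing the attribute to numeric" step the commenter describes can be sketched in base R. The data frame and column names here are illustrative assumptions; the real columns would come from getBamCounts output.

```r
# Sketch: coerce start/end columns that arrived as character back to numeric
# before downstream steps that expect numbers. Names are illustrative.
counts <- data.frame(chromosome = c("chr1", "chr1"),
                     start = c("100", "200"),
                     end   = c("150", "250"),
                     stringsAsFactors = FALSE)

counts$start <- as.numeric(counts$start)
counts$end   <- as.numeric(counts$end)
```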
Oops, sorry, in my rewrite of the code I stupidly converted start/end to character. Now fixed in 1.1.15. Thank you for picking this up! To your earlier question: if you use a custom panel, I would restrict the BED file to the covered regions to avoid all the 0s. I am not sure how much it really matters, but it may, so play it safe... The warning "non-list contrasts argument ignored" concerns me... but I am not sure where it comes from. It must be from:
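The suggestion to restrict to covered regions can be sketched as a simple filter that drops bins with zero counts across all samples. The count matrix here is simulated; real counts would come from getBamCounts.

```r
# Sketch: drop bins with zero coverage across all samples before CNV calling.
# Simulated counts stand in for real getBamCounts output.
set.seed(1)
counts <- matrix(rpois(40, lambda = 5), nrow = 10)  # 10 bins x 4 samples
counts[3, ] <- 0                                    # two uncovered bins
counts[7, ] <- 0

covered <- rowSums(counts) > 0
counts.filtered <- counts[covered, , drop = FALSE]  # the zero rows are dropped
```

Equivalently, one could pre-filter the BED file itself to the capture panel's targeted regions before counting.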
That's perfect @vplagnol, I've got it working now. Just a couple of quick questions. I'm working on a germline targeted capture panel of about 7M, with 10 normal controls and 160 samples (I presume about 20-30% of them carry CNVs in different genes). The samples were run on two occasions, following the same protocol and on the same sequencer.
Your support is truly appreciated! I wish you another prosperous year ahead!
So for question 1, n.bins.reduced is useful when you have more than 10K bins with non-zero counts, to speed things up. If you have more, downsampling should be fine, and if you have fewer, nothing should happen. So I would leave that parameter as is. Re question 2 and the choice of reference samples, much of the accuracy of ExomeDepth depends on tight correlations between the test and reference samples. So in general it is best to have more BAM files, in order to find the most closely technically matched samples (usually the ones run in the same sequencing batch). The only exception is when you expect many CNVs in the same locations across your cases, which would obviously make it difficult to distinguish CNVs in the test sample (as the reference samples would be similarly affected). So the general answer is NOT to restrict, unless you have a very specific and well-defined genetic cause of disease and you always look for relatively common CNVs in the same few genes/exons. Thank you for the kind words, and please let me know if you find what you expect to see; with the recent changes, some confirmation that nothing is broken would be good.
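The idea behind reference selection, ranking candidate samples by their correlation with the test sample, can be sketched with simulated counts. In real use this goes through select.reference.set() on getBamCounts output; everything below is an illustrative simulation.

```r
# Sketch of the idea behind reference selection: rank candidate reference
# samples by their count correlation with the test sample. Simulated data.
set.seed(42)
n.bins <- 500
truth <- rpois(n.bins, lambda = 100)                 # shared coverage profile
test  <- rpois(n.bins, lambda = truth)               # test sample counts
refs  <- sapply(1:5, function(i) rpois(n.bins, lambda = truth))
colnames(refs) <- paste0("ref", 1:5)

cors <- apply(refs, 2, cor, y = test)                # per-sample correlation
best <- names(sort(cors, decreasing = TRUE))         # most closely matched first
```

The most technically matched samples (typically from the same sequencing batch) end up at the head of `best`, which mirrors why having more BAM files generally helps.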
@vplagnol That's very clear and helpful! I've tried it on my samples: known deletions of about 20K and 3K-4K were not detected. I did an exploratory analysis on my samples in which no SNV/small indels could account for the phenotype. Unfortunately, no particularly relevant CNVs were detected. On the other hand, at the reference-building stage, I set up my variables similarly to the way you did in the tutorial, except that I used my own primary target BED. When I go through the variable my.choice, i.e. the output of select.reference.set, in my set of 176 samples only the top 10 or so samples show meaningful values and the rest are all labelled NA. Is this normal, or is there a problem with my script? Furthermore, can ExomeDepth detect deletions of 20K and 3K-4K? How should I optimise the bin size to detect CNVs of such sizes? Sorry for my many questions, but you have really guided me through using ExomeDepth and I'm much obliged to you.
$summary.stats
At first sight, the correlation level that you find (0.99962) seems too high, which I have not seen before. High is good, but being that high suggests that something is wrong, as if your reference sample exactly predicted the test sample. Odd... Are you sure that duplicate(s) of the test sample are NOT included in the potential reference set? Also, that 0.99962 does not seem to match the numbers I see in the summary stats table, which are more realistic. And such super-high correlations would explain why you see so few CNV calls: basically, the match is nearly perfect. I am a bit concerned...
@vplagnol That's a good suggestion. I've checked: they were not duplicates, and the test sample was excluded from the reference set. Could it be that, with more than 170 BAMs, the algorithm was able to select a BAM with very high correlation? Previously I used these BAMs with GATK, cnvkit, VisCap and several other programs, and the results appeared to correlate across programs. I think the problem I'm now having might be related to the worrisome error message. Is there any way to troubleshoot the error? I will also try several other reference set combinations to explore. Thank you and have a nice day!
Hello @vplagnol, the problem is partly resolved. In my BED file, there were several big chromosomal regions that were not separated into genes/exons, so they were mistakenly treated as one big exon. I have reworked my BED file, and the correlation problem appears to be resolved. I can also now detect the deletions mentioned earlier, which were buried in the large chromosomal regions. However, the error still persists.
OK, that's good news. The fact that you can see at least some of the CNVs you are meant to see tells me that the code is not broken; something good is happening. I am still a bit unsure about that error message, but I think I am comfortable enough to push the updated code to CRAN. I'll try to figure out what the error message is telling me, but I am really happy to hear things are now working, so thank you for this.
Uploading 1.1.15 now to CRAN. I will close the issue unless something else comes up (in which case re-open please). |


Hello, I'm new to ExomeDepth and am currently trying it on my custom targeted capture panel. At the end of getBamCounts, the following error shows up.
Now parsing ~/OPERATION/RedCellNGS_hg19/BAM_testing/Normal9.bam
Parsing chromosome chr1
Parsing chromosome chr10
Parsing chromosome chr11
Parsing chromosome chr12
Parsing chromosome chr13
Parsing chromosome chr14
Parsing chromosome chr15
Parsing chromosome chr16
Parsing chromosome chr17
Parsing chromosome chr18
Parsing chromosome chr19
Parsing chromosome chr2
Parsing chromosome chr20
Parsing chromosome chr21
Parsing chromosome chr22
Parsing chromosome chr3
Parsing chromosome chr4
Parsing chromosome chr5
Parsing chromosome chr6
Parsing chromosome chr7
Parsing chromosome chr8
Parsing chromosome chr9
Number of counted fragments : 836974
There were 50 or more warnings (use warnings() to see the first 50)
2: In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:
3: In .Seqinfo.mergexy(x, y) :
Each of the 2 combined objects has sequence levels not in the other:
4: In .Seqinfo.mergexy(x, y) :
Furthermore, the output of getBamCounts in my case is a list and when I tried to convert it to a data frame, it failed with the following error.
My script is as follows.
library(ExomeDepth)
data(exons.hg19)
hg19 <- file.path("~/OPERATION/RedCellNGS_hg19/ref", "hg19.fa")
bamName <- list.files("~/OPERATION/RedCellNGS_hg19/BAM", pattern = '*.bam$')
bamFile <- file.path("~/OPERATION/RedCellNGS_hg19/BAM", bamName)
ExomeCount <- getBamCounts(bed.frame = exons.hg19, bam.files = bamFile, include.chr = TRUE, referenceFasta = hg19)
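As a workaround for the list-to-data-frame conversion failure described above, a list of equal-length vectors can be bound directly into a data frame with as.data.frame. The element names below are illustrative assumptions, not the actual getBamCounts output.

```r
# Sketch: if getBamCounts() returns a plain list of equal-length vectors,
# the elements can be bound into a data frame. Names are illustrative.
ExomeCount <- list(chromosome = c("chr1", "chr1"),
                   start = c(100, 200),
                   end   = c(150, 250),
                   Normal9.bam = c(37L, 52L))

ExomeCount.dafr <- as.data.frame(ExomeCount, stringsAsFactors = FALSE)
```

Whether this matches the real failure depends on what the list elements contain; if they have unequal lengths, as.data.frame will error in the same way.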
My sessionInfo() is as follows.
R version 3.6.2 (2019-12-12)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ExomeDepth_1.1.13
loaded via a namespace (and not attached):
[1] matrixStats_0.55.0 lattice_0.20-38
[3] IRanges_2.20.1 Rsamtools_2.2.1
[5] Biostrings_2.54.0 GenomicAlignments_1.22.1
[7] bitops_1.0-6 grid_3.6.2
[9] GenomeInfoDb_1.22.0 stats4_3.6.2
[11] magrittr_1.5 zlibbioc_1.32.0
[13] XVector_0.26.0 S4Vectors_0.24.1
[15] Matrix_1.2-18 aod_1.3.1
[17] BiocParallel_1.20.1 tools_3.6.2
[19] Biobase_2.46.0 RCurl_1.95-4.12
[21] DelayedArray_0.12.1 parallel_3.6.2
[23] compiler_3.6.2 BiocGenerics_0.32.0
[25] GenomicRanges_1.38.0 SummarizedExperiment_1.16.1
[27] GenomeInfoDbData_1.2.2
Thank you very much for your help!