Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

radiator::filter_rad generating individual stats error #58

Closed
SarahEmel opened this issue Aug 20, 2019 · 3 comments
Closed

radiator::filter_rad generating individual stats error #58

SarahEmel opened this issue Aug 20, 2019 · 3 comments

Comments

@SarahEmel
Copy link

SarahEmel commented Aug 20, 2019

Hi Thierry,

I am attempting to use radiator::filter_rad to filter a populations.snps.vcf file from Stacks, but I am consistently getting an error in ''generating individual stats" and then I do not get the output files I've specified. Based on testing with different arguments and versions of the input file (including after filtering out some missing data in vcftools) and on reading other issues here, I suspect there might be a problem with my vcf header that I'm unaware of.

I appreciate any help you can offer. Thanks for making this package and I hope to be able to make it part of my workflow!

-Sarah

  • the exact command (function, arguments, values) used

All of these lead to the same error:
myfiltereddata <- filter_rad(
data = "populations.snps.vcf",
filter.short.ld = "random",
filter.long.ld = 0.5,
filter.hwe = TRUE,
strata = "strata_locyear.txt",
output = c("vcf","genind"),
filename = "filter_rad_output")

myfiltereddata <- filter_rad(
data = "populations.snps.vcf",
filter.short.ld = "random",
filter.long.ld = 0.5,
filter.hwe = TRUE,
strata = NULL,
output = c("vcf","genind"),
filename = "filter_rad_output")

myfiltereddata <- filter_rad(
data = "populations.snps.vcf",
strata = NULL,
interactive.filter = TRUE)

myfiltereddata <- filter_rad(
data = "populations.snps.vcftools.recode",
strata = NULL,
interactive.filter = TRUE)

myfiltereddata <- filter_rad(
data = "RH_subset.vcf",
strata = NULL,
interactive.filter = TRUE)

  • the complete error message you're getting

myfiltereddata <- filter_rad(

  • data = "RH_subset.vcf",
  • strata = NULL,
  • interactive.filter = TRUE)
    ################################################################################
    ############################# radiator::filter_rad #############################
    ################################################################################
    The function arguments names have changed: please read documentation

Execution date@time: 20190820@0008
Folder created: filter_rad_20190820@0008
Function call and arguments stored in: radiator_filter_rad_args_20190820@0008.tsv
File written: random.seed (563656)
Filters parameters file generated: filters_parameters_20190820@0008.tsv

Reading VCF
Data summary:
number of samples: 94
number of markers: 185
done! timing: 0 sec

Generating individual stats...
Error in if (stats::sd(id.info$COVERAGE_MEAN) != 0) { :
missing value where TRUE/FALSE needed

Computation time, overall: 1 sec
############################# completed filter_rad #############################
RH_subset.vcf.zip

  • the output of devtools::session_info()

############################# completed filter_rad #############################

devtools::session_info()
─ Session info ──────────────────────────────────────────────────────────────────────────────────
setting value
version R version 3.6.0 (2019-04-26)
os macOS High Sierra 10.13.6
system x86_64, darwin15.6.0
ui RStudio
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/New_York
date 2019-08-20

─ Packages ──────────────────────────────────────────────────────────────────────────────────────
package * version date lib source
assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
backports 1.1.4 2019-04-10 [1] CRAN (R 3.6.0)
Biobase 2.44.0 2019-05-02 [1] Bioconductor
BiocGenerics 0.30.0 2019-05-02 [1] Bioconductor
Biostrings 2.52.0 2019-05-02 [1] Bioconductor
bitops 1.0-6 2013-08-17 [1] CRAN (R 3.6.0)
boot 1.3-23 2019-07-05 [1] CRAN (R 3.6.0)
broom 0.5.2 2019-04-07 [1] CRAN (R 3.6.0)
callr 3.3.1 2019-07-18 [1] CRAN (R 3.6.0)
cli 1.1.0 2019-03-19 [1] CRAN (R 3.6.0)
crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
data.table 1.12.2 2019-04-07 [1] CRAN (R 3.6.0)
desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.0)
devtools 2.1.0 2019-07-06 [1] CRAN (R 3.6.0)
digest 0.6.20 2019-07-04 [1] CRAN (R 3.6.0)
dplyr 0.8.3 2019-07-04 [1] CRAN (R 3.6.0)
fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.0)
gdsfmt 1.20.0 2019-05-02 [1] Bioconductor
generics 0.0.2 2018-11-29 [1] CRAN (R 3.6.0)
GenomeInfoDb 1.20.0 2019-05-02 [1] Bioconductor
GenomeInfoDbData 1.2.1 2019-08-02 [1] Bioconductor
GenomicRanges 1.36.0 2019-05-02 [1] Bioconductor
glue 1.3.1 2019-03-12 [1] CRAN (R 3.6.0)
GWASExactHW 1.01 2013-01-05 [1] CRAN (R 3.6.0)
hms 0.5.0 2019-07-09 [1] CRAN (R 3.6.0)
IRanges 2.18.1 2019-05-31 [1] Bioconductor
jomo 2.6-9 2019-07-29 [1] CRAN (R 3.6.0)
lattice 0.20-38 2018-11-04 [1] CRAN (R 3.6.0)
lme4 1.1-21 2019-03-05 [1] CRAN (R 3.6.0)
logistf 1.23 2018-07-19 [1] CRAN (R 3.6.0)
magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0)
MASS 7.3-51.4 2019-03-31 [1] CRAN (R 3.6.0)
Matrix 1.2-17 2019-03-22 [1] CRAN (R 3.6.0)
memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.0)
mgcv 1.8-28 2019-03-21 [1] CRAN (R 3.6.0)
mice 3.6.0 2019-07-10 [1] CRAN (R 3.6.0)
minqa 1.2.4 2014-10-09 [1] CRAN (R 3.6.0)
mitml 0.3-7 2019-01-07 [1] CRAN (R 3.6.0)
nlme 3.1-140 2019-05-12 [1] CRAN (R 3.6.0)
nloptr 1.2.1 2018-10-03 [1] CRAN (R 3.6.0)
nnet 7.3-12 2016-02-02 [1] CRAN (R 3.6.0)
pan 1.6 2018-06-29 [1] CRAN (R 3.6.0)
pillar 1.4.2 2019-06-29 [1] CRAN (R 3.6.0)
pkgbuild 1.0.3 2019-03-20 [1] CRAN (R 3.6.0)
pkgconfig 2.0.2 2018-08-16 [1] CRAN (R 3.6.0)
pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.6.0)
prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.6.0)
processx 3.4.1 2019-07-18 [1] CRAN (R 3.6.0)
ps 1.3.0 2018-12-21 [1] CRAN (R 3.6.0)
purrr 0.3.2 2019-03-15 [1] CRAN (R 3.6.0)
R6 2.4.0 2019-02-14 [1] CRAN (R 3.6.0)
radiator * 1.1.2 2019-08-20 [1] Github (53a137d)
Rcpp 1.0.2 2019-07-25 [1] CRAN (R 3.6.0)
RCurl 1.95-4.12 2019-03-04 [1] CRAN (R 3.6.0)
readr 1.3.1 2018-12-21 [1] CRAN (R 3.6.0)
remotes 2.1.0 2019-06-24 [1] CRAN (R 3.6.0)
rlang 0.4.0 2019-06-25 [1] CRAN (R 3.6.0)
rpart 4.1-15 2019-04-12 [1] CRAN (R 3.6.0)
rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.0)
rstudioapi 0.10 2019-03-19 [1] CRAN (R 3.6.0)
S4Vectors 0.22.0 2019-05-02 [1] Bioconductor
SeqArray 1.24.2 2019-07-12 [1] Bioconductor
SeqVarTools 1.22.0 2019-05-02 [1] Bioconductor
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0)
stringi 1.4.3 2019-03-12 [1] CRAN (R 3.6.0)
survival 2.44-1.1 2019-04-01 [1] CRAN (R 3.6.0)
testthat 2.2.1 2019-07-25 [1] CRAN (R 3.6.0)
tibble 2.1.3 2019-06-06 [1] CRAN (R 3.6.0)
tidyr 0.8.3 2019-03-01 [1] CRAN (R 3.6.0)
tidyselect 0.2.5 2018-10-11 [1] CRAN (R 3.6.0)
usethis 1.5.1 2019-07-04 [1] CRAN (R 3.6.0)
vctrs 0.2.0 2019-07-05 [1] CRAN (R 3.6.0)
withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.0)
XVector 0.24.0 2019-05-02 [1] Bioconductor
yaml 2.2.0 2018-07-25 [1] CRAN (R 3.6.0)
zeallot 0.1.0 2018-01-28 [1] CRAN (R 3.6.0)
zlibbioc 1.30.0 2019-05-02 [1] Bioconductor

[1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library

  • complete data required to reproduce the problem, a subset of it. The data remains confidential.

See attached subset

@thierrygosselin
Copy link
Owner

thierrygosselin commented Aug 20, 2019

If your subset is a good representation of the original dataset, the problem is that your VCF as individual's that are missing all of their genotypes. My code was not prepared to handle this and the code throws an error when it tries to do a standard SD with base R code (it doesn't accept NA, NaN , etc).

The new commit should fix that.

I suggest looking at the file that start with individuals.qc.stats and highlight the bad samples.

@SarahEmel
Copy link
Author

Thank you Thierry! It took me a while to get everything working with my dataset, but I am able to run filter_rad now without errors. We had a lot of variation in reads per individual and there was one bad sample that had to be removed in addition to other filtering.

@thierrygosselin
Copy link
Owner

ok glad I could help!
lots of variation in reads per individuals can be lowered by...

Prior to sequencing (the most important steps):

  • good DNA
  • a puch or a tool that takes similar amount of tissue
  • using a robot for DNA extractions
  • having experienced people behind the wet lab bench
  • having the same, experienced, lab tech behind the wet bench
  • not having a student behind the wet lab bench
  • you get the idea ;)

bioinformatically:

  • normalization...if the variation generates weird individual heterozygosity pattern (check the Manhattan plot in the folder detect_mixed_genomes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants