ERROR with detect_duplicate_genomes + tidy format obtained from genomic_converter #179

Closed
GabryS3 opened this issue May 3, 2023 · 3 comments

GabryS3 commented May 3, 2023

Hi Thierry,
I have 2 issues:

  1. I want to understand how the tidy format is obtained, so I can check whether my genlight object was converted to tidy correctly.
  2. I get an error when using the detect_duplicate_genomes() function: Error in value_vars(value.var, names(data)) : value.var values [n] are not found in 'data'.

1) Question 1 - tidy format:
I want to understand whether the function genomic_converter() is working properly now that I have installed the most recent version of radiator (1.2.8).

I used genomic_converter() to convert my genlight dataset into a tidy dataset.
Code:
library(radiator)

# Convert the genlight object (created with dartR) to radiator's tidy format
My_dataset_TIDY <- genomic_converter(
  My_dataset,                                  # class = genlight object
  strata = NULL,                               # no strata file supplied
  output = "tidy",
  filename = "My_dataset_TIDY",
  parallel.core = parallel::detectCores() - 1,
  verbose = TRUE
)

However, I cannot tell whether the conversion was correct or whether there are issues.

First of all, did I need to include a STRATA file? I did not include one. My genlight object was obtained with the dartR package.

Second, my concern is that the genotypes are not properly coded. For example:

TIDY format: individual 1, chrom1_locus 1002_42_A_T_42 --> REF = T, ALT = A, GT_BIN = 2
GENLIGHT format: individual 1, chrom1_locus 1002_42_A_T_42 --> genotype = 0 (= homozygous for REF allele)

TIDY format: individual 2, chrom1_locus 1002_42_A_T_42 --> REF = T, ALT = A, GT_BIN = 1
GENLIGHT format: individual 2, chrom1_locus 1002_42_A_T_42 --> genotype = 1 (heterozygous)

TIDY format: individual 3, chrom1_locus 1002_42_A_T_42 --> REF = T, ALT = A, GT_BIN = 0
GENLIGHT format: individual 3, chrom1_locus 1002_42_A_T_42 --> genotype = 2 (= homozygous for ALT allele = SNP)

What is GT_BIN and how is it coded? Is the genotype coding transformation reported above correct? From my understanding, GT_BIN codes the genotype in the opposite way to the genlight object. Is that correct?
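A quick way to check this pattern across the whole dataset, rather than for three genotypes, is to test whether one coding is always 2 minus the other. compare_codings() below is a hypothetical helper, not part of radiator or dartR; it assumes both codings have already been extracted as numeric vectors in the same individual and locus order (for example from as.matrix() on the genlight object and the GT_BIN column of the tidy data, suitably aligned).

```r
# Hypothetical helper: report whether two 0/1/2 genotype codings of the same
# calls are identical, complementary (one equals 2 minus the other), or neither.
compare_codings <- function(a, b) {
  keep <- !is.na(a) & !is.na(b)   # ignore missing genotypes in either coding
  a <- a[keep]
  b <- b[keep]
  if (all(a == b)) return("identical")
  if (all(a == 2 - b)) return("complementary (REF/ALT swapped)")
  "inconsistent"
}

# Values for individuals 1-3 at locus 1002_42_A_T_42, as reported above
genlight_dosage <- c(0, 1, 2)
gt_bin          <- c(2, 1, 0)
compare_codings(genlight_dosage, gt_bin)
#> [1] "complementary (REF/ALT swapped)"
```

A "complementary" result means the two files agree on the genotypes and differ only in which allele is counted.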

2) Question 2 - error in detect_duplicate_genomes()
After converting my genlight object into tidy format with genomic_converter() (code above), I then tried to use detect_duplicate_genomes() on my tidy dataset. However, I get the following error:
Code:
My_dataset_duplicate_genomes <- detect_duplicate_genomes(
  data = "My_dataset_TIDY.rad",
  interactive.filter = TRUE,
  detect.duplicate.genomes = TRUE,
  dup.threshold = 0,
  distance.method = "manhattan",
  genome = FALSE,
  threshold.common.markers = NULL,
  blacklist.duplicates = FALSE,
  parallel.core = parallel::detectCores() - 1,
  verbose = TRUE
)

################################################################################
###################### radiator::detect_duplicate_genomes ######################
################################################################################
Execution date@time: 20230503@1746
Folder created: -604_detect_duplicate_genomes_20230503@1746
Function call and arguments stored in a file
File written: radiator_detect_duplicate_genomes_args_20230503@1746.tsv
File written: random.seed (247023)
Filters parameters file generated: filters_parameters_20230503@1746.tsv
Preparing data for analysis
Calculating manhattan distances between individuals...
Error in value_vars(value.var, names(data)) :
value.var values [n] are not found in 'data'.

In addition: There were 50 or more warnings (use warnings() to see the first 50)

Computation time, overall: 36 sec
###################### completed detect_duplicate_genomes ######################

What does the error "Error in value_vars(value.var, names(data)) : value.var values [n] are not found in 'data'." mean?
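The message itself comes from data.table: value_vars() is an internal helper of data.table's dcast(), and it stops when a column named in value.var does not exist in the table being reshaped. The toy sketch below (table and column names are illustrative) reproduces the same message. It does not explain why radiator expected a column named n for this dataset; that would need the files.

```r
library(data.table)

# Toy long-format genotype table; note there is no column called "n"
dt <- data.table(
  INDIVIDUALS = c("ind1", "ind1", "ind2", "ind2"),
  MARKERS     = c("m1", "m2", "m1", "m2"),
  GT_BIN      = c(0L, 1L, 2L, 1L)
)

# Asking dcast() to spread a column that does not exist triggers the error
msg <- tryCatch(
  data.table::dcast(dt, INDIVIDUALS ~ MARKERS, value.var = "n"),
  error = function(e) conditionMessage(e)
)
msg
#> [1] "value.var values [n] are not found in 'data'."
```

With value.var = "GT_BIN" the same call returns a wide table with one row per individual and one column per marker.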

I would really appreciate your help, as I have been trying to use this function for a while now and keep running into issues along the way...
Thanks a lot!
Best,
Gabriella

devtools session info:
devtools::session_info()
─ Session info ──────────────────────────────────────────────────────────────────────────────────────────────────────────
setting value
version R version 4.2.1 (2022-06-23 ucrt)
os Windows 10 x64 (build 19044)
system x86_64, mingw32
ui RStudio
language (EN)
collate English_Australia.utf8
ctype English_Australia.utf8
tz Australia/Brisbane
date 2023-05-03
rstudio 2022.07.0+548 Spotted Wakerobin (desktop)
pandoc NA

─ Packages ──────────────────────────────────────────────────────────────────────────────────────────────────────────────
! package * version date (UTC) lib source
ade4 * 1.7-19 2022-04-19 [1] CRAN (R 4.2.1)
adegenet * 2.1.7 2022-06-06 [1] CRAN (R 4.2.1)
amap * 0.8-19 2022-10-28 [1] CRAN (R 4.2.1)
ape 5.6-2 2022-03-02 [1] CRAN (R 4.2.1)
assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.2.1)
backports 1.4.1 2021-12-13 [1] CRAN (R 4.2.0)
BiocGenerics 0.42.0 2022-04-26 [1] Bioconductor
BiocManager 1.30.18 2022-05-18 [1] CRAN (R 4.2.1)
bit 4.0.4 2020-08-04 [1] CRAN (R 4.2.1)
bit64 4.0.5 2020-08-30 [1] CRAN (R 4.2.1)
broom 1.0.0 2022-07-01 [1] CRAN (R 4.2.1)
cachem 1.0.6 2021-08-19 [1] CRAN (R 4.2.1)
calibrate 1.7.7 2020-06-19 [1] CRAN (R 4.2.1)
callr 3.7.1 2022-07-13 [1] CRAN (R 4.2.1)
cellranger 1.1.0 2016-07-27 [1] CRAN (R 4.2.1)
cli 3.4.1 2022-09-23 [1] CRAN (R 4.2.2)
cluster 2.1.3 2022-03-28 [2] CRAN (R 4.2.1)
codetools 0.2-18 2020-11-04 [2] CRAN (R 4.2.1)
colorspace 2.0-3 2022-02-21 [1] CRAN (R 4.2.1)
combinat 0.0-8 2012-10-29 [1] CRAN (R 4.2.0)
crayon 1.5.1 2022-03-26 [1] CRAN (R 4.2.1)
VP dartR * 2.9.4 2022-06-05 [?] CRAN (R 4.2.1) (on disk 2.0.4)
dartR.data * 1.0.2 2022-11-16 [1] CRAN (R 4.2.2)
data.table 1.14.2 2021-09-27 [1] CRAN (R 4.2.1)
DBI 1.1.3 2022-06-18 [1] CRAN (R 4.2.1)
dbplyr 2.2.1 2022-06-27 [1] CRAN (R 4.2.1)
devtools * 2.4.3 2021-11-30 [1] CRAN (R 4.2.1)
digest 0.6.29 2021-12-01 [1] CRAN (R 4.2.1)
dismo 1.3-5 2021-10-11 [1] CRAN (R 4.2.1)
doParallel 1.0.17 2022-02-07 [1] CRAN (R 4.2.1)
dotCall64 1.0-1 2021-02-11 [1] CRAN (R 4.2.1)
dplyr * 1.0.9 2022-04-28 [1] CRAN (R 4.2.1)
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.2.1)
fansi 1.0.3 2022-03-24 [1] CRAN (R 4.2.1)
fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.2.1)
fields 14.0 2022-07-05 [1] CRAN (R 4.2.1)
forcats * 0.5.1 2021-01-27 [1] CRAN (R 4.2.1)
foreach 1.5.2 2022-02-02 [1] CRAN (R 4.2.1)
fs 1.5.2 2021-12-08 [1] CRAN (R 4.2.1)
fst 0.9.8 2022-02-08 [1] CRAN (R 4.2.1)
fstcore * 0.9.12 2022-03-23 [1] CRAN (R 4.2.1)
gap 1.2.3-6 2022-05-13 [1] CRAN (R 4.2.1)
gap.datasets 0.0.5 2022-05-09 [1] CRAN (R 4.2.0)
gdata 2.18.0.1 2022-05-10 [1] CRAN (R 4.2.1)
gdistance 1.3-6 2020-06-29 [1] CRAN (R 4.2.1)
gdsfmt * 1.32.0 2022-04-26 [1] Bioconductor
generics 0.1.3 2022-07-05 [1] CRAN (R 4.2.1)
genetics 1.3.8.1.3 2021-03-01 [1] CRAN (R 4.2.1)
GGally 2.1.2 2021-06-21 [1] CRAN (R 4.2.1)
ggplot2 * 3.4.0 2022-11-04 [1] CRAN (R 4.2.2)
glue 1.6.2 2022-02-24 [1] CRAN (R 4.2.1)
gridExtra 2.3 2017-09-09 [1] CRAN (R 4.2.1)
gtable 0.3.0 2019-03-25 [1] CRAN (R 4.2.1)
gtools 3.9.3 2022-07-11 [1] CRAN (R 4.2.1)
haven 2.5.0 2022-04-15 [1] CRAN (R 4.2.1)
hms 1.1.1 2021-09-26 [1] CRAN (R 4.2.1)
htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.2.1)
httpuv 1.6.5 2022-01-05 [1] CRAN (R 4.2.1)
httr 1.4.3 2022-05-04 [1] CRAN (R 4.2.1)
igraph 1.3.2 2022-06-13 [1] CRAN (R 4.2.1)
iterators 1.0.14 2022-02-05 [1] CRAN (R 4.2.1)
jsonlite 1.8.0 2022-02-22 [1] CRAN (R 4.2.1)
knitr 1.39 2022-04-26 [1] CRAN (R 4.2.1)
later 1.3.0 2021-08-18 [1] CRAN (R 4.2.1)
lattice 0.20-45 2021-09-22 [2] CRAN (R 4.2.1)
LEA * 3.8.0 2022-04-26 [1] Bioconductor
lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.2.2)
lubridate 1.8.0 2021-10-07 [1] CRAN (R 4.2.1)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.1)
maps 3.4.0 2021-09-25 [1] CRAN (R 4.2.1)
MASS 7.3-57 2022-04-22 [2] CRAN (R 4.2.1)
Matrix * 1.4-1 2022-03-23 [2] CRAN (R 4.2.1)
memoise 2.0.1 2021-11-26 [1] CRAN (R 4.2.1)
mgcv 1.8-40 2022-03-29 [2] CRAN (R 4.2.1)
mime 0.12 2021-09-28 [1] CRAN (R 4.2.0)
mmod 1.3.3 2017-04-06 [1] CRAN (R 4.2.1)
modelr 0.1.8 2020-05-19 [1] CRAN (R 4.2.1)
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.2.1)
mvtnorm 1.1-3 2021-10-08 [1] CRAN (R 4.2.0)
naniar 1.0.0 2023-02-02 [1] CRAN (R 4.2.3)
nlme 3.1-157 2022-03-25 [2] CRAN (R 4.2.1)
OutFLANK * 0.2 2022-07-18 [1] Github (whitlock/OutFLANK@e502e82)
patchwork 1.1.1 2020-12-17 [1] CRAN (R 4.2.1)
pegas 1.1 2021-12-16 [1] CRAN (R 4.2.1)
permute 0.9-7 2022-01-27 [1] CRAN (R 4.2.1)
pillar 1.7.0 2022-02-01 [1] CRAN (R 4.2.1)
pinfsc50 1.2.0 2020-06-03 [1] CRAN (R 4.2.0)
pkgbuild 1.3.1 2021-12-20 [1] CRAN (R 4.2.1)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.1)
pkgload 1.3.0 2022-06-27 [1] CRAN (R 4.2.1)
plotrix * 3.8-2 2021-09-08 [1] CRAN (R 4.2.0)
plyr * 1.8.7 2022-03-24 [1] CRAN (R 4.2.1)
png 0.1-7 2013-12-03 [1] CRAN (R 4.2.0)
PopGenReport 3.0.7 2022-05-27 [1] CRAN (R 4.2.1)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.2.1)
processx 3.7.0 2022-07-07 [1] CRAN (R 4.2.1)
promises 1.2.0.1 2021-02-11 [1] CRAN (R 4.2.1)
ps 1.7.1 2022-06-18 [1] CRAN (R 4.2.1)
purrr * 0.3.4 2020-04-17 [1] CRAN (R 4.2.1)
qvalue * 2.28.0 2022-04-26 [1] Bioconductor
R.methodsS3 1.8.2 2022-06-13 [1] CRAN (R 4.2.0)
R.oo 1.25.0 2022-06-12 [1] CRAN (R 4.2.0)
R.utils 2.12.0 2022-06-28 [1] CRAN (R 4.2.1)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.1)
VP radiator * 1.2.8 2022-07-16 [?] Github (6efdf14) (on disk 1.2.2)
raster 3.5-21 2022-06-27 [1] CRAN (R 4.2.1)
RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.2.0)
Rcpp 1.0.9 2022-07-08 [1] CRAN (R 4.2.1)
readr * 2.1.2 2022-01-30 [1] CRAN (R 4.2.1)
readxl 1.4.0 2022-03-28 [1] CRAN (R 4.2.1)
remotes 2.4.2 2021-11-30 [1] CRAN (R 4.2.1)
reprex 2.0.1 2021-08-05 [1] CRAN (R 4.2.1)
reshape 0.8.9 2022-04-12 [1] CRAN (R 4.2.1)
reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.2.1)
rgdal 1.5-32 2022-05-09 [1] CRAN (R 4.2.1)
RgoogleMaps 1.4.5.3 2020-02-12 [1] CRAN (R 4.2.1)
rlang 1.0.6 2022-09-24 [1] CRAN (R 4.2.2)
rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.2.1)
rvest 1.0.2 2021-10-16 [1] CRAN (R 4.2.1)
scales 1.2.0 2022-04-13 [1] CRAN (R 4.2.1)
seqinr 4.2-16 2022-05-19 [1] CRAN (R 4.2.1)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.1)
shiny 1.7.1 2021-10-02 [1] CRAN (R 4.2.1)
SNPRelate * 1.30.1 2022-05-15 [1] Bioconductor
snpStats * 1.46.0 2022-04-26 [1] Bioconductor
sp 1.5-0 2022-06-05 [1] CRAN (R 4.2.1)
spam 2.9-0 2022-07-11 [1] CRAN (R 4.2.1)
spida2 * 0.2.1 2023-04-26 [1] Github (gmonette/spida2@48e562d)
StAMPP 1.6.3 2021-08-08 [1] CRAN (R 4.2.1)
stockR * 1.0.74 2020-03-04 [1] CRAN (R 4.2.1)
stringi 1.7.8 2022-07-11 [1] CRAN (R 4.2.1)
stringr * 1.4.1 2022-08-20 [1] CRAN (R 4.2.1)
survival * 3.3-1 2022-03-03 [2] CRAN (R 4.2.1)
terra 1.5-34 2022-06-09 [1] CRAN (R 4.2.1)
tibble * 3.1.7 2022-05-03 [1] CRAN (R 4.2.1)
tidyr * 1.2.0 2022-02-01 [1] CRAN (R 4.2.1)
tidyselect 1.1.2 2022-02-21 [1] CRAN (R 4.2.1)
tidyverse * 1.3.1 2021-04-15 [1] CRAN (R 4.2.1)
tzdb 0.3.0 2022-03-28 [1] CRAN (R 4.2.1)
usethis * 2.1.6 2022-05-25 [1] CRAN (R 4.2.1)
utf8 1.2.2 2021-07-24 [1] CRAN (R 4.2.1)
vcfR * 1.12.0 2020-09-01 [1] CRAN (R 4.2.1)
vctrs 0.5.1 2022-11-16 [1] CRAN (R 4.2.2)
vegan 2.6-2 2022-04-17 [1] CRAN (R 4.2.1)
versions 0.3 2016-09-01 [1] CRAN (R 4.2.0)
viridis 0.6.2 2021-10-13 [1] CRAN (R 4.2.1)
viridisLite 0.4.0 2021-04-13 [1] CRAN (R 4.2.1)
visdat 0.6.0 2023-02-02 [1] CRAN (R 4.2.3)
vroom 1.5.7 2021-11-30 [1] CRAN (R 4.2.1)
withr 2.5.0 2022-03-03 [1] CRAN (R 4.2.1)
xfun 0.31 2022-05-10 [1] CRAN (R 4.2.1)
xml2 1.3.3 2021-11-30 [1] CRAN (R 4.2.1)
xtable 1.8-4 2019-04-21 [1] CRAN (R 4.2.1)
zlibbioc 1.42.0 2022-04-26 [1] Bioconductor

[1] C:/Users/scatag/AppData/Local/R/win-library/4.2
[2] C:/Program Files/R/R-4.2.1/library

V ── Loaded and on-disk version mismatch.
P ── Loaded and on-disk path mismatch.

──────────────────────────────────────

@thierrygosselin
Owner

@GabryS3
Sorry for the long delay

First of all, did I need to include a "STRATA" file? I did not include any. My genlight object was obtained through the package dartR.

I haven't used dartR in a long time and am no longer sure whether it stores the population map or stratification (STRATA) for individuals. The strata file is not required, because certain formats already store this information. If none is found, radiator assumes that you have just one population.
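If you do want to supply populations explicitly, a minimal sketch of preparing a strata table is below. It assumes radiator's layout of a tab-separated file with INDIVIDUALS and STRATA columns (check the genomic_converter() help page to confirm). make_strata() is a hypothetical helper; in a real session the two vectors could come from adegenet's indNames() and pop() on the genlight object. With no population labels it falls back to a single population, mirroring the default described above.

```r
# Hypothetical helper (not part of radiator): build a strata table with the
# assumed INDIVIDUALS / STRATA column layout.
make_strata <- function(individuals, populations = NULL) {
  if (is.null(populations)) {
    # No population labels: treat every sample as one population
    populations <- rep("pop1", length(individuals))
  }
  stopifnot(length(individuals) == length(populations))
  data.frame(
    INDIVIDUALS = as.character(individuals),
    STRATA = as.character(populations),
    stringsAsFactors = FALSE
  )
}

# Illustrative IDs and labels; with a genlight object these could be
# adegenet::indNames(My_dataset) and adegenet::pop(My_dataset)
strata <- make_strata(c("ind1", "ind2", "ind3"), c("popA", "popA", "popB"))

# Write a tab-separated file that could be passed to the strata argument
strata_file <- file.path(tempdir(), "strata.tsv")
write.table(strata, strata_file, sep = "\t", quote = FALSE, row.names = FALSE)
```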

@thierrygosselin
Owner

@GabryS3
GT_BIN format
What is GT_BIN and how is it coded? Is the genotype coding transformation reported above correct? From my understanding, it seems that GT_BIN codes the genotype in the opposite way compared to the genlight object, correct?

It will probably be renamed very soon. For bi-allelic datasets, it's the dosage of the alternate allele (the number of copies of the alternate allele).

I cannot be certain what is causing your problem without a thorough look at your file.
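To make the dosage definition concrete, here is a minimal sketch with a hypothetical helper (not radiator code) that counts alternate-allele copies in diploid calls written as "X/Y". Counting the other allele of the same calls gives 2 minus the first result, which is the kind of mirror-image coding reported in the question whenever two tools treat different alleles as the alternate one.

```r
# Hypothetical helper: alternate-allele dosage (0, 1 or 2) for diploid calls
# written as "X/Y"; missing calls (NA) stay NA.
alt_dosage <- function(genotypes, alt) {
  alleles <- strsplit(genotypes, "/", fixed = TRUE)
  vapply(alleles, function(x) sum(x == alt), integer(1))
}

genotypes <- c("T/T", "T/A", "A/A")   # illustrative calls, not the user's data
alt_dosage(genotypes, alt = "A")      # ALT = A
#> [1] 0 1 2
alt_dosage(genotypes, alt = "T")      # same calls with REF and ALT swapped
#> [1] 2 1 0
```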

@thierrygosselin
Owner

Question 2 - error in detect_duplicate_genomes()

I'm unable to reproduce or help you with your problem without the files.

Re-open the issue if it's still relevant to you.
Best
