DArT Counts Error: filter_rad (MARKERS %in% markers) #186

Closed
nsc-2024 opened this issue Apr 29, 2024 · 6 comments

nsc-2024 commented Apr 29, 2024

Hi Thierry,
We are reviewing DArT Count data for Lake Sturgeon. We contacted you in January 2024 about "filter_rad" erroring at the "filter_individuals" stage. This was fixed in radiator 1.3.0.

With the latest push, we now receive a similar error when running filter_rad:
"Generating statistics
✔ Missing genotypes [141ms]
✔ Heterozygosity [472ms]
Error in dplyr::filter():
ℹ In argument: MARKERS %in% markers.
Caused by error:
! object 'MARKERS' not found
Run rlang::last_trace() to see where the error occurred.
✖ Coverage ... [1.3s]".

However, when we generate the gds and run "filter_individuals" immediately after, the function works.
We have tried the following troubleshooting steps, but the error persists:

  • Loading the 1.3.0 version
  • Providing a more filtered strata file
  • Running datasets that are known to work (2018 sturgeon data)
  • Blacklisting or not blacklisting markers in filter_reproducibility

Data files have already been emailed to you and are in the OneDrive Folder (2 Thierry Shared Files)

Cheers,
Tina

Here's the error:
DUPSRead_tidy <- radiator::read_dart(
  data = "Report_DAci24-8986_2_moreOrders_SNPcount_2_CSedit_working_all_dupsremoved.csv",
  strata = "(5)strata_dart_sturgeon_20240427_mod.tsv",
  tidy.dart = TRUE  # we have also tried tidy.dart = FALSE
)

filtered_rad <- radiator::filter_rad(
  data = DUPSRead_tidy,
  strata = NULL,
  interactive.filter = TRUE,
  filter.hwe = TRUE,
  filter.common.markers = FALSE,
  verbose = TRUE
)

################################################################################
############################# radiator::filter_rad #############################
################################################################################
Execution date@time: 20240429@1342
Folder created: filter_rad_20240429@1342
Function call and arguments stored in: radiator_filter_rad_args_20240429@1342.tsv
File written: random.seed (542770)
Filters parameters file generated: filters_parameters_20240429@1342.tsv
################################################################################
#################### radiator::filter_dart_reproducibility #####################
################################################################################
Execution date@time: 20240429@1342
Function call and arguments stored in: radiator_filter_dart_reproducibility_args_20240429@1342.tsv

Interactive mode: on
2 steps to visualize and filter the data based on reproducibility:
Step 1. Visualization
Step 2. Choose the filtering threshold

File written: dart_reproducibility_stats.tsv
File written: dart_reproducibility_boxplot_20240429@1342.pdf
Generating helper table...
Files written: helper tables and plots

Step 2. Filtering markers based on markers reproducibility

Do you still want to blacklist markers? (y/n):
n

Computation time, overall: 3 sec
#################### completed filter_dart_reproducibility #####################
################################################################################
######################### radiator::filter_monomorphic #########################
################################################################################
Execution date@time: 20240429@1342
Function call and arguments stored in: radiator_filter_monomorphic_args_20240429@1342.tsv
File written: whitelist.polymorphic.markers_20240429@1342.tsv
################################### RESULTS ####################################

Filter monomorphic markers
Number of individuals / strata / chrom / locus / SNP:
Before: 2194 / 26 / 1 / 26333 / 30769
Blacklisted: 0 / 0 / 0 / 0 / 0
After: 2194 / 26 / 1 / 26333 / 30769

Computation time, overall: 1 sec
######################### completed filter_monomorphic #########################
################################################################################
######################### radiator::filter_individuals #########################
################################################################################
Execution date@time: 20240429@1342
Function call and arguments stored in: radiator_filter_individuals_args_20240429@1342.tsv
Interactive mode: on

Step 1. Visualization
Step 2. Missingness
Step 3. Heterozygosity
Step 4. Coverage (if available)

Step 1. Visualization of samples QC

Generating statistics
✔ Missing genotypes [141ms]
✔ Heterozygosity [472ms]
Error in dplyr::filter():
ℹ In argument: MARKERS %in% markers.
Caused by error:
! object 'MARKERS' not found
Run rlang::last_trace() to see where the error occurred.
✖ Coverage ... [1.3s]

Computation time, overall: 2 sec
######################### completed filter_individuals #########################

Computation time, overall: 5 sec
############################# completed filter_rad #############################

Here is the session info:
─ Session info ─────────────────────────────────────────────────────
setting value
version R version 4.4.0 (2024-04-24)
os macOS Sonoma 14.4.1
system aarch64, darwin20
ui RStudio
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/Winnipeg
date 2024-04-29
rstudio 2023.12.1+402 Ocean Storm (desktop)
pandoc NA

─ Packages ─────────────────────────────────────────────────────────
package * version date (UTC) lib source
ade4 * 1.7-22 2023-02-06 [1] CRAN (R 4.4.0)
adegenet * 2.1.10 2023-01-26 [1] CRAN (R 4.4.0)
amap * 0.8-19 2022-10-28 [1] CRAN (R 4.4.0)
ape 5.8 2024-04-11 [1] CRAN (R 4.4.0)
backports 1.4.1 2021-12-13 [1] CRAN (R 4.4.0)
BiocGenerics 0.49.1 2024-04-22 [1] Bioconductor 3.19 (R 4.4.0)
BiocManager * 1.30.22 2023-08-08 [1] CRAN (R 4.4.0)
Biostrings 2.70.3 2024-03-13 [1] Bioconductor 3.18 (R 4.4.0)
bit 4.0.5 2022-11-15 [1] CRAN (R 4.4.0)
bit64 4.0.5 2020-08-30 [1] CRAN (R 4.4.0)
bitops 1.0-7 2021-04-24 [1] CRAN (R 4.4.0)
boot 1.3-30 2024-02-26 [1] CRAN (R 4.4.0)
broom 1.0.5 2023-06-09 [1] CRAN (R 4.4.0)
cachem 1.0.8 2023-05-01 [1] CRAN (R 4.4.0)
carrier 0.1.1 2023-04-28 [1] CRAN (R 4.4.0)
cli 3.6.2 2023-12-11 [1] CRAN (R 4.4.0)
cluster 2.1.6 2023-12-01 [1] CRAN (R 4.4.0)
codetools 0.2-20 2024-03-31 [1] CRAN (R 4.4.0)
colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.4.0)
conflicted * 1.2.0 2023-02-01 [1] CRAN (R 4.4.0)
crayon 1.5.2 2022-09-29 [1] CRAN (R 4.4.0)
data.table 1.15.4 2024-03-30 [1] CRAN (R 4.4.0)
devtools * 2.4.5 2022-10-11 [1] CRAN (R 4.4.0)
digest 0.6.35 2024-03-11 [1] CRAN (R 4.4.0)
dplyr 1.1.4 2023-11-17 [1] CRAN (R 4.4.0)
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.4.0)
fansi 1.0.6 2023-12-08 [1] CRAN (R 4.4.0)
farver 2.1.1 2022-07-06 [1] CRAN (R 4.4.0)
fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.4.0)
foreach 1.5.2 2022-02-02 [1] CRAN (R 4.4.0)
fs 1.6.4 2024-04-25 [1] CRAN (R 4.4.0)
fst 0.9.8 2022-02-08 [1] CRAN (R 4.4.0)
fstcore * 0.9.18 2023-12-02 [1] CRAN (R 4.4.0)
future 1.33.2 2024-03-26 [1] CRAN (R 4.4.0)
gdsfmt * 1.39.3 2024-04-22 [1] Bioconductor 3.19 (R 4.4.0)
generics 0.1.3 2022-07-05 [1] CRAN (R 4.4.0)
GenomeInfoDb 1.38.8 2024-03-15 [1] Bioconductor 3.18 (R 4.4.0)
GenomeInfoDbData 1.2.11 2024-04-27 [1] Bioconductor
GenomicRanges 1.54.1 2023-10-29 [1] Bioconductor
ggplot2 3.5.1 2024-04-23 [1] CRAN (R 4.4.0)
glmnet 4.1-8 2023-08-22 [1] CRAN (R 4.4.0)
globals 0.16.3 2024-03-08 [1] CRAN (R 4.4.0)
glue 1.7.0 2024-01-09 [1] CRAN (R 4.4.0)
gridExtra 2.3 2017-09-09 [1] CRAN (R 4.4.0)
gtable 0.3.5 2024-04-22 [1] CRAN (R 4.4.0)
gtools 3.9.5 2023-11-20 [1] CRAN (R 4.4.0)
HardyWeinberg 1.7.8 2024-04-06 [1] CRAN (R 4.4.0)
hms 1.1.3 2023-03-21 [1] CRAN (R 4.4.0)
htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0)
htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.4.0)
httpuv 1.6.15 2024-03-26 [1] CRAN (R 4.4.0)
igraph 2.0.3 2024-03-13 [1] CRAN (R 4.4.0)
IRanges 2.36.0 2023-10-24 [1] Bioconductor
iterators 1.0.14 2022-02-05 [1] CRAN (R 4.4.0)
jomo 2.7-6 2023-04-15 [1] CRAN (R 4.4.0)
labeling 0.4.3 2023-08-29 [1] CRAN (R 4.4.0)
later 1.3.2 2023-12-06 [1] CRAN (R 4.4.0)
lattice 0.22-6 2024-03-20 [1] CRAN (R 4.4.0)
lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.4.0)
listenv 0.9.1 2024-01-29 [1] CRAN (R 4.4.0)
lme4 1.1-35.3 2024-04-16 [1] CRAN (R 4.4.0)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.4.0)
MASS 7.3-60.2 2024-04-24 [1] local
Matrix 1.7-0 2024-03-22 [1] CRAN (R 4.4.0)
matrixStats * 1.3.0 2024-04-11 [1] CRAN (R 4.4.0)
memoise 2.0.1 2021-11-26 [1] CRAN (R 4.4.0)
mgcv 1.9-1 2023-12-21 [1] CRAN (R 4.4.0)
mice 3.16.0 2023-06-05 [1] CRAN (R 4.4.0)
mime 0.12 2021-09-28 [1] CRAN (R 4.4.0)
miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.4.0)
minqa 1.2.6 2023-09-11 [1] CRAN (R 4.4.0)
mitml 0.4-5 2023-03-08 [1] CRAN (R 4.4.0)
munsell 0.5.1 2024-04-01 [1] CRAN (R 4.4.0)
nlme 3.1-164 2023-11-27 [1] CRAN (R 4.4.0)
nloptr 2.0.3 2022-05-26 [1] CRAN (R 4.4.0)
nnet 7.3-19 2023-05-03 [1] CRAN (R 4.4.0)
OutFLANK * 0.2 2024-04-27 [1] Github (whitlock/OutFLANK@e502e82)
pan 1.9 2023-12-07 [1] CRAN (R 4.4.0)
parallelly 1.37.1 2024-02-29 [1] CRAN (R 4.4.0)
permute 0.9-7 2022-01-27 [1] CRAN (R 4.4.0)
pillar 1.9.0 2023-03-22 [1] CRAN (R 4.4.0)
pkgbuild 1.4.4 2024-03-17 [1] CRAN (R 4.4.0)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.4.0)
pkgload 1.3.4 2024-01-16 [1] CRAN (R 4.4.0)
plyr 1.8.9 2023-10-02 [1] CRAN (R 4.4.0)
profvis 0.3.8 2023-05-02 [1] CRAN (R 4.4.0)
promises 1.3.0 2024-04-05 [1] CRAN (R 4.4.0)
purrr 1.0.2 2023-08-10 [1] CRAN (R 4.4.0)
qvalue * 2.35.0 2024-04-22 [1] Bioconductor 3.19 (R 4.4.0)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.4.0)
radiator * 1.3.1 2024-04-27 [1] Github (cae66c8)
ragg 1.3.0 2024-03-13 [1] CRAN (R 4.4.0)
RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.4.0)
Rcpp 1.0.12 2024-01-09 [1] CRAN (R 4.4.0)
RCurl 1.98-1.14 2024-01-09 [1] CRAN (R 4.4.0)
readr 2.1.5 2024-01-10 [1] CRAN (R 4.4.0)
remotes * 2.5.0 2024-03-17 [1] CRAN (R 4.4.0)
reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.4.0)
rlang 1.1.3 2024-01-10 [1] CRAN (R 4.4.0)
rpart 4.1.23 2023-12-05 [1] CRAN (R 4.4.0)
Rsolnp 1.16 2015-12-28 [1] CRAN (R 4.4.0)
rstudioapi 0.16.0 2024-03-24 [1] CRAN (R 4.4.0)
S4Vectors 0.40.2 2023-11-23 [1] Bioconductor 3.18 (R 4.4.0)
scales 1.3.0 2023-11-28 [1] CRAN (R 4.4.0)
SeqArray * 1.43.8 2024-04-27 [1] Github (zhengxwen/SeqArray@16bba1e)
seqinr 4.2-36 2023-12-08 [1] CRAN (R 4.4.0)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.4.0)
shape 1.4.6.1 2024-02-23 [1] CRAN (R 4.4.0)
shiny * 1.8.1.9000 2024-04-27 [1] Github (rstudio/shiny@950c630)
SNPRelate * 1.37.5 2024-04-22 [1] Bioconductor 3.19 (R 4.4.0)
stockR * 1.0.76 2023-04-26 [1] CRAN (R 4.4.0)
stringi 1.8.3 2023-12-11 [1] CRAN (R 4.4.0)
stringr 1.5.1 2023-11-14 [1] CRAN (R 4.4.0)
survival 3.6-4 2024-04-24 [1] CRAN (R 4.4.0)
systemfonts 1.0.6 2024-03-07 [1] CRAN (R 4.4.0)
textshaping 0.3.7 2023-10-09 [1] CRAN (R 4.4.0)
tibble 3.2.1 2023-03-20 [1] CRAN (R 4.4.0)
tidyr 1.3.1 2024-01-24 [1] CRAN (R 4.4.0)
tidyselect 1.2.1 2024-03-11 [1] CRAN (R 4.4.0)
truncnorm 1.0-9 2023-03-20 [1] CRAN (R 4.4.0)
tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.4.0)
UpSetR 1.4.0 2019-05-22 [1] CRAN (R 4.4.0)
urlchecker 1.0.1 2021-11-30 [1] CRAN (R 4.4.0)
usethis * 2.2.3 2024-02-19 [1] CRAN (R 4.4.0)
utf8 1.2.4 2023-10-22 [1] CRAN (R 4.4.0)
vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.4.0)
vegan 2.6-4 2022-10-11 [1] CRAN (R 4.4.0)
viridisLite 0.4.2 2023-05-02 [1] CRAN (R 4.4.0)
vroom 1.6.5 2023-12-05 [1] CRAN (R 4.4.0)
withr 3.0.0 2024-01-16 [1] CRAN (R 4.4.0)
xtable 1.8-4 2019-04-21 [1] CRAN (R 4.4.0)
XVector 0.42.0 2023-10-24 [1] Bioconductor
zlibbioc 1.48.2 2024-03-13 [1] Bioconductor 3.18 (R 4.4.0)

@thierrygosselin (Owner)

Welcome to GitHub, Tina.
I'll have a look at this later today.
Best,
Thierry

@thierrygosselin (Owner)

Update: I'm able to reproduce your error with the provided data.
I'll have a fix.

@thierrygosselin (Owner)

test1 <- radiator::read_dart(
  data = "Report_DAci24-8986_2_moreOrders_SNPcount_2_CSedit_working_all_dupsremoved.csv",
  strata= "(5)strata_dart_sturgeon_20240427_mod.tsv"
  )

works

Reading DArT file...
Number of blacklisted samples: 239
DArT SNP format: alleles coverage in 2 Rows counts
fstcore package v0.9.18
(OpenMP detected, using 56 threads)
Generating genotypes and calibrating REF/ALT alleles...
Number of markers recalibrated based on counts of allele read depth: 5398
Generating GDS...
File written: radiator_20240430@1040.gds.rad

Number of chrom: 1
Number of locus: 28804
Number of SNPs: 33736
Number of strata: 27
Number of individuals: 2525

Number of ind/strata:
GFR = 417
RAL = 49
RAR = 49
NMW = 5
ENG = 50
CAR = 138
WHD = 19
TET = 56
BOU = 27
PPE = 20
SFC = 88
SFB = 38
SFA = 24
DSS = 92
NUM = 125
NUT = 125
PIG = 2
PFR = 815
DOR = 35
SEM = 24
LDB = 87
DSP = 57
ASS = 30
EBC = 20
CHR = 20
GUL = 69
STE = 44

Number of duplicate id: 0

@thierrygosselin (Owner)

test2 <- radiator::read_dart(
  data = "Report_DAci24-8986_2_moreOrders_SNPcount_2_CSedit_working_all_dupsremoved.csv",
  strata= "(5)strata_dart_sturgeon_20240427_mod.tsv",
  tidy.dart = TRUE
)

works

@thierrygosselin (Owner)

test3 <- radiator::filter_rad(
  data = "Report_DAci24-8986_2_moreOrders_SNPcount_2_CSedit_working_all_dupsremoved.csv",
  strata= "(5)strata_dart_sturgeon_20240427_mod.tsv"
)
Generating statistics
✔ Missing genotypes [3.1s]
✔ Heterozygosity [1.5s]
Error in `dplyr::filter()`:
ℹ In argument: `MARKERS %in% markers`.
Caused by error:
! object 'MARKERS' not found
Run `rlang::last_trace()` to see where the error occurred.
✖ Coverage ... [2.9s]
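
The message above is what `dplyr::filter()` reports when the table it is given has no column named `MARKERS`. The sketch below reproduces that class of error on a made-up table; it is not radiator's internal code. The table `coverage`, its columns, and the marker names are hypothetical and only stand in for whatever the coverage step builds from a DArT counts file.

```r
library(dplyr)

# Hypothetical stand-in for a coverage table that lacks a MARKERS column.
coverage <- dplyr::tibble(
  VARIANT_ID = 1:3,
  READ_DEPTH = c(12L, 30L, 7L)
)

# Hypothetical whitelist of marker names.
markers <- c("locus_1", "locus_2")

# Filtering on a column that does not exist raises the same kind of error
# as in the log; capture its message instead of stopping the script.
msg <- tryCatch(
  {
    dplyr::filter(coverage, MARKERS %in% markers)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)

# A simple check that could be run before filtering (an assumption,
# not something radiator is known to do).
has_markers <- "MARKERS" %in% names(coverage)
```

Checking `names()` on the intermediate table is a quick way to tell this situation apart from a problem in the strata file or in the genotypes themselves.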

thierrygosselin added a commit that referenced this issue Apr 30, 2024
* works with R 4.3.4
* Fix issue #186 related to some particular DArT files

@thierrygosselin (Owner)

Version 1.3.2 should work with your dataset
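
Before re-running, it may help to confirm which radiator version is actually loaded, since the session info in the report lists 1.3.1. A minimal sketch follows; the helper `has_fix()` is hypothetical, and the reinstall hint assumes the package is installed from the GitHub repository named in the warning output further down.

```r
# Version named in this thread as containing the fix.
fixed_version <- package_version("1.3.2")

# Hypothetical helper: TRUE when a version is at least the fixed version.
has_fix <- function(installed) {
  package_version(as.character(installed)) >= fixed_version
}

# Version from the session info in the report, and the fixed version.
old_ok <- has_fix("1.3.1")
new_ok <- has_fix("1.3.2")

# Only inspect the installed package when radiator is available.
if (requireNamespace("radiator", quietly = TRUE)) {
  if (!has_fix(utils::packageVersion("radiator"))) {
    message(
      "radiator is older than 1.3.2; consider reinstalling, e.g. ",
      "remotes::install_github('thierrygosselin/radiator')"
    )
  }
}
```

Restarting the R session after reinstalling avoids running with the older version still loaded.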

test this:

data <- radiator::filter_rad(
  data = "Report_DAci24-8986_2_moreOrders_SNPcount_2_CSedit_working_all_dupsremoved.csv",
  strata= "(5)strata_dart_sturgeon_20240427_mod.tsv"
)
  • You'll see your duplicates easily. Look carefully at those from different groupings: those are ID errors or sampling/wet lab problems.
  • Be careful how you handle the juveniles and broods during SNP discovery; they can introduce a lot of bias. Looking at the close kin figure will highlight that.

Getting these warnings at the end is normal. Part of it will be fixed, but it doesn't change the results; it's purely cosmetic.

############################# completed filter_rad #############################
Warning messages:
1: In ggplot2::scale_y_log10(labels = scales::number_format(), oob = scales::squish_infinite) :
  log-10 transformation introduced infinite values.
2: There was 1 warning in `dplyr::mutate()`.
ℹ In argument: `WHITELISTED_MARKERS = purrr::map_int(...)`.
Caused by warning:
! Using one column matrices in `filter()` was deprecated in dplyr
  1.1.0.
ℹ Please use one dimensional logical vectors instead.
ℹ The deprecated feature was likely used in the radiator package.
  Please report the issue at
  <https://github.com/thierrygosselin/radiator/issues>.
This warning is displayed once every 8 hours.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning
was generated. 
3: Unknown or uninitialised column: `STRATA`. 
4: Unknown or uninitialised column: `STRATA`. 
5: Unknown or uninitialised column: `STRATA`.
