read.bed fails on simple .bed file #6

Closed
richelbilderbeek opened this issue Jun 30, 2021 · 3 comments

Comments

@richelbilderbeek

Dear ARTP2 maintainers,

Thanks for ARTP2's read.bed function, I am very happy it is available!

Unexpectedly, for a simple .bed file (created by PLINK v1.9 with --make-bed), read.bed fails, whereas the genio R package reads it successfully. Below is a reprex with the error message.

I predict this can be fixed easily, as the reprex lays the solution bare :-)

Thanks and cheers, Richel

# Get the data to work on
remotes::install_github("richelbilderbeek/plinkr")
library(plinkr) # needed for the unqualified get_plinkr_filename() calls below
bed_filename <- get_plinkr_filename("demo_assoc.bed")
bim_filename <- get_plinkr_filename("demo_assoc.bim")
fam_filename <- get_plinkr_filename("demo_assoc.fam")

# genio can read the .bed file
genio::read_bed(
  file = bed_filename,
  names_loci = genio::read_bim(bim_filename)$id,
  names_ind = genio::read_fam(fam_filename)$fam
)

# ARTP2 cannot read the .bed file yet
ARTP2::read.bed(
  bed = bed_filename,
  bim = bim_filename,
  fam = fam_filename
)

The error given is:

Error in `.rowNamesDF<-`(x, value = value) : 
  duplicate 'row.names' are not allowed
In addition: Warning messages:
1: In ARTP2::read.bed(bed = get_plinkr_filename("demo_assoc.bed"),  :
  Duplicated subjects exist in fam file: 
/home/richel/GitHubs/plinkr/inst/extdata/demo_assoc.fam
2: non-unique value when setting 'row.names': ‘1’ 

This tells us what read.bed objects to: according to the function, the .fam file has duplicate IDs. Checking for this is reasonable, as all individuals should be unique. Well done!

But maybe the function is a bit overeager. One hint is that PLINK works with this .fam file without problems. So, let's take a look at the .fam file:

genio::read_fam(fam_filename)
# A tibble: 3 x 6
  fam   id    pat   mat     sex pheno
  <chr> <chr> <chr> <chr> <int> <dbl>
1 1     1     0     0         1     1
2 2     1     0     0         1     1
3 3     1     0     0         1     1

Aha, all individuals are unique: the within-family ID (id) is 1 for everyone, but every combination of family ID (fam) and within-family ID is unique.

I guess that makes fixing this bug easy :-)
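
To make the suggestion concrete, here is a minimal sketch in base R of a uniqueness check on the pair of family ID and within-family ID. I have not looked at how read.bed does this internally, so this is an assumption about the kind of fix, not ARTP2's actual code. The data frame is typed in by hand to mirror the .fam file shown above, and the ":" separator for the row names is an arbitrary choice.

```r
# Hand-typed stand-in for the first two columns of demo_assoc.fam (see above)
fam <- data.frame(
  fam = c("1", "2", "3"),
  id = c("1", "1", "1"),
  stringsAsFactors = FALSE
)

# Looking only at the within-family ID reports duplicates
dup_by_id <- anyDuplicated(fam$id) > 0

# Looking at the (family ID, within-family ID) pair does not
dup_by_pair <- anyDuplicated(fam[, c("fam", "id")]) > 0

# Row names built from both columns are unique, so assigning them works
subject_ids <- paste(fam$fam, fam$id, sep = ":")
rownames(fam) <- subject_ids
```

With a check like this, a warning about duplicated subjects would only be given when two rows share both IDs.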

@zhangh12
Owner

zhangh12 commented Jul 9, 2021 via email

@richelbilderbeek
Author

Hi Han, thanks for your reply. In that case, I will use genio instead, which is tested to work correctly. Cheers, Richel

@zhangh12
Owner

zhangh12 commented Jul 9, 2021 via email
