read.bed fails on simple .bed file #6
Hi Richel,
Thank you for your information and I am sorry to reply to you so late. I
have left academia so I have very limited time to maintain this package
(cannot touch the data any longer).
Yes, read.bed assumes unique IDs for each row. Your testing approach shows
that the combination of family ID and within-family ID could be unique, so
an easier fix to this problem might be to manually replace the duplicated
IDs with these unique ones. I agree that it could be easy to fix this issue
in the code as well, however, this package is widely used in the NCI
community, and any small change will require a comprehensive test before
being released, so that published results could be replicated. I therefore
prefer to keep the package as it is for now.
If you are familiar with R, please feel free to update the code and
redistribute it! Thank you.
Best wishes,
Han
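A minimal sketch of the manual workaround described above, in base R only. It assumes a .fam file with the usual six whitespace-separated columns and no header; the toy file contents, the temporary file names, and the "_" separator are illustrative choices, not part of ARTP2. Whether read.bed accepts the rewritten file has not been tested here, so that call is left commented out.

```r
# Hypothetical stand-in for the .fam file from the issue
# (columns: family ID, within-family ID, father, mother, sex, phenotype)
fam_in <- tempfile(fileext = ".fam")
writeLines(c("1 1 0 0 1 1", "2 1 0 0 1 1", "3 1 0 0 1 1"), fam_in)

fam <- read.table(
  fam_in,
  header = FALSE,
  colClasses = "character",
  col.names = c("fam", "id", "pat", "mat", "sex", "pheno")
)

# Replace each within-family ID by "<family ID>_<within-family ID>"
fam$id <- paste(fam$fam, fam$id, sep = "_")

# The separator is an arbitrary choice, so verify the new IDs really are unique
stopifnot(anyDuplicated(fam$id) == 0)

# Write the modified table back in the same headerless layout
fam_out <- tempfile(fileext = ".fam")
write.table(
  fam, fam_out,
  quote = FALSE, sep = " ", row.names = FALSE, col.names = FALSE
)

# Assumption, not verified: read.bed would then be called with the new file
# ARTP2::read.bed(bed = bed_filename, bim = bim_filename, fam = fam_out)
```

If the father or mother columns refer to other individuals (anything other than 0), those references would need the same renaming; the sketch above does not handle that case.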
On Wed, Jun 30, 2021 at 4:06 AM Richel Bilderbeek ***@***.***> wrote:
Dear ARTP2 maintainers,
Thanks for ARTP2's read.bed function, I am very happy it is available!
Unexpectedly, for a simple .bed file (created by PLINK v1.9 with
--make-bed), read.bed fails, whereas the genio R package
<https://github.com/OchoaLab/genio> does read it successfully. Below is a
reprex with the error message.
I predict this can be fixed easily, as the solution is laid bare by the
reprex :-)
Thanks and cheers, Richel
# Get the data to work on
remotes::install_github("richelbilderbeek/plinkr")
library(plinkr) # makes get_plinkr_filename() available
bed_filename <- get_plinkr_filename("demo_assoc.bed")
bim_filename <- get_plinkr_filename("demo_assoc.bim")
fam_filename <- get_plinkr_filename("demo_assoc.fam")
# genio can read the .bed file
genio::read_bed(
  file = bed_filename,
  names_loci = genio::read_bim(bim_filename)$id,
  names_ind = genio::read_fam(fam_filename)$fam
)
# ARTP2 cannot read the .bed file yet
ARTP2::read.bed(
  bed = get_plinkr_filename("demo_assoc.bed"),
  bim = get_plinkr_filename("demo_assoc.bim"),
  fam = get_plinkr_filename("demo_assoc.fam")
)
The error given is:
Error in `.rowNamesDF<-`(x, value = value) :
duplicate 'row.names' are not allowed
In addition: Warning messages:
1: In ARTP2::read.bed(bed = get_plinkr_filename("demo_assoc.bed"), :
Duplicated subjects exist in fam file:
/home/richel/GitHubs/plinkr/inst/extdata/demo_assoc.fam
2: non-unique value when setting 'row.names': ‘1’
This gives us the reason for the problem: the .fam file has duplicate IDs.
This is reasonable: all individuals *should* be unique. Well done!
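The row-name error itself can be reproduced in base R without ARTP2. This is only a sketch of the mechanism, assuming read.bed uses the within-family IDs as row names (which the error message suggests); the toy data frame is made up to match the three individuals in the demo file.

```r
# Toy data frame standing in for the three individuals in the .fam file
fam <- data.frame(
  fam = c("1", "2", "3"),
  id = c("1", "1", "1"),
  stringsAsFactors = FALSE
)

# Using the within-family ID alone as row names fails, because "1" repeats.
# tryCatch() captures the error message instead of stopping the script.
result <- tryCatch(
  suppressWarnings({
    rownames(fam) <- fam$id
    "ok"
  }),
  error = function(e) conditionMessage(e)
)
print(result)
```

Base R refuses duplicated row names for data frames, so any code that assigns a non-unique ID column as row names will stop in the same way.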
But maybe the function is a bit overeager. One hint of this may
already be that PLINK can work with the .fam file without problems. So,
let's take a look at the .fam file:
genio::read_fam(fam_filename)
# A tibble: 3 x 6
  fam   id    pat   mat     sex pheno
  <chr> <chr> <chr> <chr> <int> <dbl>
1 1     1     0     0         1     1
2 2     1     0     0         1     1
3 3     1     0     0         1     1
Aha, all individuals *are* unique, as all *combinations* of family ID and
within-family ID are unique.
I guess that makes fixing this bug easy :-)
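The observation above can be checked in a few lines of base R. This is a sketch on a toy data frame that mirrors the fam and id columns shown in the tibble; the variable names are illustrative.

```r
# Family IDs and within-family IDs as shown in the .fam file above
fam <- data.frame(
  fam = c("1", "2", "3"),
  id = c("1", "1", "1"),
  stringsAsFactors = FALSE
)

# anyDuplicated() returns 0 when there are no duplicates
ids_unique <- anyDuplicated(fam$id) == 0
pairs_unique <- anyDuplicated(fam[, c("fam", "id")]) == 0

print(ids_unique)
print(pairs_unique)
```

A uniqueness check on the (family ID, within-family ID) pair rather than on the within-family ID alone would accept this file, which matches how PLINK identifies individuals.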
Hi Han, thanks for your reply. In that case, I will use genio instead, which is tested to work correctly. Cheers, Richel
Hi Richel,
Glad you have a solution! Have a good day
Han
On Fri, Jul 9, 2021 at 01:36 Richel Bilderbeek ***@***.***> wrote:
Hi Han, thanks for your reply. In that case, I will use genio instead,
which is tested to work correctly. Cheers, Richel