Problem making f2 file from plink .bed .fam and .bim files #74

Open
sbird1000 opened this issue May 19, 2024 · 6 comments

@sbird1000

I'm a first-time user of Admixtools2. I have been trying to create the f2 files with 'extract_f2', so far without success. Here are the calls I've tried:

I entered:
prefix = '/plink2/EAS006uc'
my_f2_dir = '~/F2/'
extract_f2(prefix, my_f2_dir)

This is what I got back:

ℹ Reading allele frequencies from PLINK files...
ℹ EAS006uc.bed has 1 samples and 2503505 SNPs
ℹ Calculating allele frequencies from 1 samples in 1 populations
ℹ Expected size of allele frequency data: 600 MB
2503k SNPs read...
✔ 2503505 SNPs read in total
Error in discard_snps(snpdat, maxmiss = maxmiss, auto_only = auto_only, :
Could not parse chromosome numbers! Set 'auto_only = FALSE' to ignore chromosome labels!
In addition: Warning message:
In discard_snps(snpdat, maxmiss = maxmiss, auto_only = auto_only, :
Keeping only chromosomes 1 to 22! Set auto_only = FALSE to keep all chromosomes!

I then ran the same command with 'auto_only = FALSE' added and got this:

prefix = '/plink2/EAS006uc'
my_f2_dir = '~/F2/'
extract_f2(prefix,'auto_only = FALSE', my_f2_dir)

The result was:

ℹ Reading allele frequencies from PLINK files...
Error in match_samples(fam$X2, fam$X1, inds, pops) :
Individuals missing in indfile:
~/F2/

Why is 'extract_f2' looking for a file called "indfile" when it was not looking for that file in the first run? Rather than .bed, .bim, and .fam files, it appears to be looking for files associated with the EIGENSTRAT format. Why won't it recognize the PLINK format files? Do I need to add some other function or flag to my input code?

Thanks for your help!

@sbird1000
Author

I did a third run with the quotation marks removed from around auto_only = FALSE and got this result:

prefix = '/plink2/EAS006uc'
my_f2_dir = '~/F2/'
extract_f2(prefix,auto_only = FALSE, my_f2_dir)

ℹ Reading allele frequencies from PLINK files...
ℹ EAS006uc.bed has 1 samples and 2503505 SNPs
ℹ Calculating allele frequencies from 1 samples in 1 populations
ℹ Expected size of allele frequency data: 600 MB
2503k SNPs read...
✔ 2503505 SNPs read in total
Error in extract_f2(prefix, auto_only = FALSE, my_f2_dir) :
There are no informative SNPs!

Just some extra information.

@uqrmaie1
Owner

uqrmaie1 commented Jun 5, 2024

Your PLINK files might not be formatted as described here. If you can share the first few lines of the .bim and .fam file, I might be able to see what the problem is. One thing I can tell from the log messages is that it's only recognizing a single sample. If there is indeed only one sample or one population in your data, then no f-statistics can be computed; at least two populations with one sample each are needed to do anything.
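The checks suggested above can be sketched in a few lines of base R. This is only an illustration: `check_plink_text_files` is a hypothetical helper, not part of Admixtools 2, and the demo files are toy data written to a temporary directory. It assumes the first .fam column holds the population label, which is consistent with the `match_samples(fam$X2, fam$X1, inds, pops)` call in the error above, and it assumes plain integer chromosome codes are what the "Could not parse chromosome numbers" message expects.

```r
# Hypothetical helper: look at the text files that accompany a PLINK .bed file.
check_plink_text_files <- function(prefix, n = 5) {
  # Read everything as character so labels are not silently converted.
  fam <- read.table(paste0(prefix, ".fam"), header = FALSE,
                    colClasses = "character", stringsAsFactors = FALSE)
  bim_head <- read.table(paste0(prefix, ".bim"), header = FALSE, nrows = n,
                         colClasses = "character", stringsAsFactors = FALSE)
  list(
    fam_head = head(fam, n),        # first few sample rows
    bim_head = bim_head,            # first few SNP rows
    n_samples = nrow(fam),          # one row per sample
    # Assumption: column 1 of the .fam file is used as the population label.
    n_pops = length(unique(fam[[1]])),
    # TRUE only if every inspected chromosome code is a plain integer.
    chrom_numeric = all(grepl("^[0-9]+$", bim_head[[1]]))
  )
}

# Toy files (made up for this sketch) with one sample and "chr"-style labels.
demo_prefix <- file.path(tempdir(), "demo")
writeLines("PopA ind1 0 0 0 1", paste0(demo_prefix, ".fam"))
writeLines(c("chr1 rs1 0 1000 A G", "chr1 rs2 0 2000 C T"),
           paste0(demo_prefix, ".bim"))

res <- check_plink_text_files(demo_prefix)
print(res$fam_head)
print(res$bim_head)
```

If `n_samples` or `n_pops` comes back below 2, the data set cannot yield f-statistics, regardless of how the chromosome labels are written.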

"indfile" is just the variable name used in some places for the .fam or the .ind file, it doesn't mean it's expecting EIGENSTRAT format.

I would generally avoid putting named arguments before positional arguments, as in extract_f2(prefix,auto_only = FALSE, my_f2_dir). I think R will interpret it correctly in this case, but argument position matters for unnamed arguments, so it is better to write extract_f2(prefix, my_f2_dir, auto_only = FALSE).
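R's argument matching can be seen with a small stand-in function. `toy_extract` below is hypothetical and only mimics a signature whose leading parameters are `pref`, `outdir`, and `inds`; that ordering is an assumption about `extract_f2`, though it would explain why the quoted attempt reported `~/F2/` as a missing individual.

```r
# Hypothetical stand-in; it only returns how its arguments were matched.
toy_extract <- function(pref, outdir, inds = NULL, auto_only = TRUE) {
  list(pref = pref, outdir = outdir, inds = inds, auto_only = auto_only)
}

prefix <- "/plink2/EAS006uc"
my_f2_dir <- "~/F2/"

# Quoted: 'auto_only = FALSE' is an ordinary string, so it fills the second
# positional slot (outdir) and my_f2_dir slides into the third (inds).
quoted <- toy_extract(prefix, 'auto_only = FALSE', my_f2_dir)

# Named argument in the middle: R matches names first, then fills the
# remaining parameters with the unnamed values in order.
named_mid <- toy_extract(prefix, auto_only = FALSE, my_f2_dir)

# Recommended order: positional arguments first, named arguments last.
named_last <- toy_extract(prefix, my_f2_dir, auto_only = FALSE)

str(quoted)
str(named_mid)
print(identical(named_mid, named_last))
```

The last two calls match identically here, but placing named arguments last keeps the call readable and avoids surprises when a function has many optional parameters.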

@sbird1000
Author

sbird1000 commented Jun 5, 2024 via email

@uqrmaie1
Owner

uqrmaie1 commented Jun 5, 2024

I'm glad you're finding Admixtools 2 useful! One of my goals when working on it was to make these methods a bit more accessible to non-experts. But I realize it's still challenging.

A single sample per file won't work. It's necessary that all samples are in the same set of files. You can use the PLINK --merge-list option to merge the file sets. Feel free to email me if you run into other issues!
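A rough sketch of preparing such a merge from R follows. It assumes PLINK 1.9, where each line of the `--merge-list` file can be a single fileset prefix and `--bfile` names the first fileset; PLINK 2 uses different merge flags. `build_merge_call` is a hypothetical helper, and every prefix other than `EAS006uc` is a made-up placeholder. The sketch only writes the list file and assembles the arguments; it does not run PLINK.

```r
# Hypothetical helper: write a PLINK 1.9 merge list and return the arguments.
build_merge_call <- function(prefixes, list_file, out_prefix) {
  stopifnot(length(prefixes) >= 2)
  # The first fileset is passed via --bfile; the rest go in the list file,
  # one prefix per line (each standing for its .bed/.bim/.fam trio).
  writeLines(prefixes[-1], list_file)
  c("--bfile", prefixes[1],
    "--merge-list", list_file,
    "--make-bed",
    "--out", out_prefix)
}

# Placeholder prefixes; only EAS006uc appears in this thread.
prefixes <- c("EAS006uc", "sample_B", "sample_C")
list_file <- file.path(tempdir(), "merge_list.txt")
plink_args <- build_merge_call(prefixes, list_file, "merged")

cat("plink", plink_args, "\n")
# To actually run it, assuming a `plink` binary is on the PATH:
# system2("plink", plink_args)
```

After merging, the combined .fam file still needs at least two distinct population labels before `extract_f2` has anything to compare.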

@sbird1000
Author

sbird1000 commented Jun 10, 2024 via email

@sbird1000
Author

sbird1000 commented Jun 10, 2024 via email
