Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error when estimating hybrid index using default settings #4

Open
selasphoruskershaw opened this issue Jul 18, 2024 · 11 comments
Open

Comments

@selasphoruskershaw
Copy link

Hello,

I successfully ran est_p, and am now trying to estimate the hybrid index using default settings. The data for the putative hybrids is formatted the same way as the parentals, but I am getting an error:

  1. I successfully ran this:

estimate parental allele frequencies, uses default HMC settings

p_out<-est_p(G0=gt.alhu,G1=gt.ruhu,model="genotype",ploidy="diploid")

  1. I received the error here:

estimate hybrid indexes, uses default HMC settings

and uses point estimates (posterior medians) of allele frequencies

h_out<-est_hi(Gx=gt.hybrids,p0=p_out$p0[,1],p1=p_out$p1[,1],model="genotype",ploidy="diploid")

The error:
"Error : Exception: variable does not exist; processing stage=data initialization; variable name=L; base type=int (in 'hi', line 14, column 1 to column 7)
failed to create the sampler; sampling not done
Stan model 'hi' does not contain samples.
Error in apply(rstan::extract(fit, "H")[[1]], 2, quantile, probs = c(0.5, :
dim(X) must have a positive length"

I'd greatly appreciate any help you can provide - the package seems very promising.

@zgompert
Copy link
Owner

zgompert commented Jul 19, 2024 via email

@selasphoruskershaw
Copy link
Author

Hello,

I think I figured out what is going on, but I'm unsure exactly how to solve it. When I enter dim(gt.hybrids), I get "NULL".

There is an issue after I convert my vcfs into genind objects. For example, with "gt.ruhu", which is a genind object converted from my vcf file of one of the parental species, I have a character matrix of genotypes: "chr [1:13268, 1:27] "0/1" "0/1"" etc

I convert these data to an integer form to make the dataset compatible with bgchm using the code below:
gt.ruhu <- extract.gt(vcf.ruhu)
gt.ruhu[gt.ruhu == "0/0"]<-0
gt.ruhu[gt.ruhu == "0/1"]<-1
gt.ruhu[gt.ruhu == "1/1"]<-2
gt.ruhu <- as.integer(gt.ruhu)

I then have a matrix that reads as follows for gt.ruhu: "int [1:358236] 0 0 0 1 2" etc

This happens for both parentals, and for my hybrids, and clearly appears to be the source of the issue.

So while est_p initially works, it is essentially estimating nonsense, and this is why I believe that est_hi does not work downstream.

I've tried many things over the last few days to correct this - do you think you can point me in the right direction?

Thanks again!

@zgompert
Copy link
Owner

zgompert commented Jul 23, 2024 via email

@selasphoruskershaw
Copy link
Author

Yes, this makes sense, thank you. I tried this before, and everything appears correct, but the data still appear to be in character format, in comparison to the tutorial dataset (GenHybrids, GenP0, GenP1), which are in integer format (see screenshot below). In character format, when I try to run est_p, I get the error below:

Error : Exception: variable does not exist; processing stage=data initialization; variable name=G0; base type=int (in 'p', line 5, column 1 to column 33)
failed to create the sampler; sampling not done
Stan model 'p' does not contain samples.
Error in apply(rstan::extract(fit, "P0")[[1]], 2, quantile, probs = c(0.5, :
dim(X) must have a positive length

After applying as.integer to each character matrix, est.p works, but the matrix is obviously jumbled after applying as.integer, as mentioned before. Is there a workaround to get my data in the correct format for est.p? Thank you.

Screen Shot 2024-07-23 at 10 08 55 AM

@zgompert
Copy link
Owner

zgompert commented Jul 24, 2024 via email

@selasphoruskershaw
Copy link
Author

selasphoruskershaw commented Jul 24, 2024

Thank you so much. A subset of my data is attached below. Below is the code I've been using - I've tried multiple variations, but don't include them here for simplicity.

#run bgchm
#generate a cline from a filtered, biallelic vcf
library("vcfR")
library("adegenet")
library("SNPfiltR")
library("data.table")
library("bgchm")

#import vcfs
vcf.ruhu <-read.vcfR("ruhu.phase2.Z.vcf.gz")
vcf.alhu <-read.vcfR("alhu.phase2.Z.vcf.gz")
vcf.hybrids <-read.vcfR("hybrids.phase2.Z.vcf.gz")

#convert vcfs to genind objects
gt.ruhu <- extract.gt(vcf.ruhu)
gt.ruhu[gt.ruhu == "0/0"]<-0
gt.ruhu[gt.ruhu == "0/1"]<-1
gt.ruhu[gt.ruhu == "1/1"]<-2

gt.alhu <- extract.gt(vcf.alhu)
gt.alhu[gt.alhu == "0/0"]<-0
gt.alhu[gt.alhu == "0/1"]<-1
gt.alhu[gt.alhu == "1/1"]<-2

gt.hybrids <- extract.gt(vcf.hybrids)
gt.hybrids[gt.hybrids == "0/0"]<-0
gt.hybrids[gt.hybrids == "0/1"]<-1
gt.hybrids[gt.hybrids == "1/1"]<-2

dim(gt.hybrids)

#estimate parental allele frequencies, uses default HMC settings
p_out<-est_p(G0=gt.alhu,G1=gt.ruhu,model="genotype",ploidy="diploid")
p_out

#estimate hybrid indexes, uses default HMC settings
#and uses point estimates (posterior medians) of allele frequencies
h_out<-est_hi(Gx=gt.hybrids,p0=p_out$p0[,1],p1=p_out$p1[,1],model="genotype",ploidy="diploid")

#plot hybrid index estimates with 90% equal-tail probability intervals
#sorted by hybrid index, just a nice way to visualize that in this example we have
#few hybrids with intermediate hybrid indexes
plot(sort(h_out$hi[,1]),ylim=c(0,1),pch=19,xlab="Individual (sorted by HI)",ylab="Hybrid index (HI)")
segments(1:100,h_out$hi[order(h_out$hi[,1]),3],1:100,h_out$hi[order(h_out$hi[,1]),4])

#estimate interspecific ancestry, this can
#be especially informative about the types of hybrids present
q_out<-est_Q(Gx=gt.hybrids,p0=p_out$p0[,1],p1=p_out$p1[,1],model="genotype",ploidy="diploid")

hybrids.phase2.Z.vcf.gz
alhu.phase2.Z.vcf.gz
ruhu.phase2.Z.vcf.gz

@zgompert
Copy link
Owner

zgompert commented Aug 2, 2024 via email

@selasphoruskershaw
Copy link
Author

Good evening! Thank you for your message - unfortunately I don't see a script attached. Do you think you might be able to resend it?

@zgompert
Copy link
Owner

zgompert commented Aug 2, 2024 via email

@selasphoruskershaw
Copy link
Author

Update: I've gotten the script you sent over to work!

One follow-up question - if I try a VCF with a different chromosome, I get the following error:

code: p_out<-est_p(G0=t(gt.alhu),G1=t(gt.ruhu),model="genotype",ploidy="diploid")

Error in FUN(X[[i]], ...) : Stan does not support NA (in G0) in data
failed to preprocess the data; sampling not done
Stan model 'p' does not contain samples.
Error in apply(rstan::extract(fit, "P0")[[1]], 2, quantile, probs = c(0.5, :
dim(X) must have a positive length

I checked my matrices and there are NA peppered throughout. This is not the case with the initial VCF files I got to successfully run (with the code you provided). Can you advise for dealing these NA?

Thanks again!

@selasphoruskershaw
Copy link
Author

Update: I ran the following code to generate a cline: gc_out<-est_genocl(Gx=t(gt.hybrids),p0=p_out$p0[,1],p1=p_out$p1[,1],H=h_out$hi[,1],model="genotype",ploidy="diploid",hier=TRUE,n_iters=4000)

It took about 4 days to run, then appeared to complete, but then gave this set of errors. Any help is appreciated - thanks again:

Error in serialize(data, node$con, xdr = FALSE) : ignoring SIGPIPE signal
Calls: ... doTryCatch -> sendData -> sendData.SOCK0node -> serialize
Execution halted
Error in serialize(data, node$con, xdr = FALSE) : ignoring SIGPIPE signal
Calls: ... doTryCatch -> sendData -> sendData.SOCK0node -> serialize
Execution halted
Error in serialize(data, node$con, xdr = FALSE) : ignoring SIGPIPE signal
Calls: ... doTryCatch -> sendData -> sendData.SOCK0node -> serialize
Execution halted
Error in serialize(data, node$con, xdr = FALSE) : ignoring SIGPIPE signal
Calls: ... doTryCatch -> sendData -> sendData.SOCK0node -> serialize
Execution halted

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants