fit_nullmodel Output is mostly Null and 0 #49
I would like to add that my WGS VCFs did not have the AVGDP annotation, so I added it manually for every chromosome by copying the DP value of each variant, as follows: `avgdp <- seqGetData(genofile, "annotation/info/DP")` |
Hi @samreenzafer, Thanks for reaching out. It's OK that your aGDS files do not have a specific annotation named For your question, I wonder whether the Best, |
Thank you for responding. Regarding the sampleID question: yes, they overlap.
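One quick way to confirm this kind of overlap is to compare the phenotype IDs against the row names of the sparse GRM. The following is only a minimal sketch with toy data; `pheno`, `sgrm`, and the column name `sample.id` are hypothetical stand-ins, not objects produced by the pipeline.

```r
# Toy stand-ins (hypothetical): a phenotype table and a GRM with named rows/columns.
pheno <- data.frame(sample.id = c("S1", "S2", "S4"), pheno = c(0, 1, 0))
sgrm  <- diag(4)
dimnames(sgrm) <- list(paste0("S", 1:4), paste0("S", 1:4))

# IDs present in both the phenotype table and the GRM.
shared <- intersect(pheno$sample.id, rownames(sgrm))

# IDs present in only one of the two inputs.
only_in_pheno <- setdiff(pheno$sample.id, rownames(sgrm))
only_in_grm   <- setdiff(rownames(sgrm), pheno$sample.id)

cat("shared:", length(shared),
    "| phenotype only:", length(only_in_pheno),
    "| GRM only:", length(only_in_grm), "\n")
```

A non-empty `only_in_pheno` would mean some phenotyped samples have no row in the GRM, which is worth resolving before fitting the null model.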
|
It seems like your outcome variable is binary, and if so, you should use Hope this helps. |
For group, do you mean I should use my I will try the I hadn't seen any instructions on the tutorial page to format the phenotype file, and had just assumed the plink-like notation. That was my bad. |
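If the phenotype was coded in the PLINK style, a recode along these lines may be what is needed before fitting a binary model. This is a sketch under assumptions: it takes the PLINK case/control convention (1 = control, 2 = case, 0 or -9 = missing) and maps it to 0/1, and the vector name `plink_pheno` is hypothetical.

```r
# Hypothetical PLINK-style phenotype vector: 1 = control, 2 = case, 0 / -9 = missing.
plink_pheno <- c(1, 2, 2, 1, -9, 0, 2)

# Map to 0/1 coding; anything that is not 1 or 2 becomes NA.
pheno01 <- ifelse(plink_pheno == 2, 1,
           ifelse(plink_pheno == 1, 0, NA))

# Sanity check: counts per recoded level, including missing values.
print(table(pheno01, useNA = "ifany"))
```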
I got this error while using the suggested logit Error in glmmkin(fixed = fixed, data = data, kins = kins, id = id, random.slope = random.slope, :
|
Hi @samreenzafer, for binary traits, please leave |
I'm not sure what I'm doing wrong but I get errors.
Then I edited the command to remove the
And I still get an error, but after 500 iterations.
I don't understand how the program works internally, so I'm not sure what it is that is "not a matrix" here. |
Is it even okay for me to analyse all 3 of my ethnic groups together (creating the sGRM, fitting the null model, and all the variant analyses)? Or should this pipeline be run separately for each of the 3 ethnic groups? |
And here is my GRM object
Which I had created using the following command.
outputting |
Hi @xihaoli, I tried a few things and realized that "SNPSEX" in the phenotype is causing the problems for me. I refitted the null model object without using SNPSEX as a covariate (and also did NOT use group_race).
The code ran to completion in only 16 iterations (as opposed to 500 iterations when I used SNPSEX, which had given me the "not a matrix" error). Then I tried one random Individual_Analysis run on arrayid=130, which gave the output log
and output of
which now has numeric p-values in the PValue column. I think I can now run the entire job array. Question: Why do you think SNPSEX could be causing a problem?
where SNPSEX is 1 (for male) or 2 (for female), |
Hi @samreenzafer, Thanks for following up. First of all, your output log of For your question, yes it is possible that the non-convergence issue is on Lastly, even if you end up not including |
That makes sense. The cases in our cohort are all male, whereas controls have males and females (only for the European cohort). You can see the tables below.
|
Hi @samreenzafer, Thanks for checking. It all makes sense now. Here you may not be able to include
This sounds good. |
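The problem discussed above is a case of (quasi-)complete separation: if every female is a control, sex predicts the outcome perfectly within that stratum and the corresponding coefficient has no finite estimate. A minimal sketch with invented toy data (not the cohort from this thread), using a plain `glm` instead of the mixed model purely for illustration:

```r
# Invented toy data: SNPSEX 1 = male, 2 = female; all cases (pheno = 1) are male.
toy <- data.frame(
  SNPSEX = c(rep(1, 20), rep(2, 10)),
  pheno  = c(rep(1, 10), rep(0, 10), rep(0, 10))
)

# Cross-tabulation: the (SNPSEX = 2, pheno = 1) cell is empty.
tab <- table(SNPSEX = toy$SNPSEX, pheno = toy$pheno)
print(tab)

# Flag covariate levels in which only one outcome value occurs.
separated <- rownames(tab)[rowSums(tab > 0) < 2]
cat("levels with a single outcome value:", separated, "\n")

# Ordinary logistic regression on the toy data: the estimate for 'female'
# drifts towards -Inf and its standard error becomes very large.
toy$female <- as.integer(toy$SNPSEX == 2)
fit <- suppressWarnings(glm(pheno ~ female, data = toy, family = binomial))
print(summary(fit)$coefficients)
```

Running a cross-tabulation like this on each categorical covariate before fitting can reveal empty cells that would otherwise surface only as slow or failed convergence.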
Thank you again for all your help. I've begun running the association
analyses and so far so good. I must say that the pipeline steps have been
implemented so well that adapting them to my directory structure and filenames
is very easy.
We are interested in noncoding gene regions, hence our use of your pipeline.
Regards,
Samreen
…On Wed, Mar 13, 2024 at 12:53 PM Xihao Li ***@***.***> wrote:
Hi @samreenzafer <https://github.com/samreenzafer>,
Thanks for checking. It all makes sense now. Here you may not be able to
include SNPSEX in the null model fitting, because knowing a sample with
SNPSEX=2 guarantees that pheno=0.
Also, we have a small number of cases in the Afr and Hispanic cohorts compared with the European cohort.
So I think adding as.factor(group_race) to the fixed-effect formula when fitting the null model, and excluding SNPSEX, makes the most sense.
This sounds good.
|
Hi Samreen, This is great to hear. You may close this case for now, and if you have any questions, please feel free to reopen it. Best, |
Hi,
I'm not sure if I'm getting a reasonable output of the null model.
My genotyping (WGS) data has 1026 samples.
`
genofile
Object of class "SeqVarGDSClass"
File: /analysis/21-WGS/analysis/STAARpipeline/gds_with_AVGDP/chr11.gds (283.1M)
|--+ description [ ] *
|--+ sample.id { Str8 1026 LZMA_ra(13.0%), 1.1K } *
|--+ variant.id { Int32 2496036 LZMA_ra(4.50%), 439.1K } *
|--+ position { Int32 2496036 LZMA_ra(30.8%), 2.9M } *
|--+ chromosome { Str8 2496036 LZMA_ra(0.02%), 1.2K } *
|--+ allele { Str8 2496036 LZMA_ra(15.8%), 1.7M } *
|--+ genotype [ ] *
| |--+ data { Bit2 2x1026x2496036 LZMA_ra(4.87%), 59.5M } *
| |--+ extra.index { Int32 3x0 LZMA_ra, 18B } *
| --+ extra { Int16 0 LZMA_ra, 18B }
|--+ phase [ ]
| |--+ data { Bit1 1026x2496036 LZMA_ra(0.01%), 45.6K } *
| |--+ extra.index { Int32 3x0 LZMA_ra, 18B } *
| --+ extra { Bit1 0 LZMA_ra, 18B }
|--+ annotation [ ]
| |--+ id { Str8 2496036 LZMA_ra(9.76%), 2.8M } *
| |--+ qual { Float32 2496036 LZMA_ra(62.4%), 5.9M } *
| |--+ filter { Int32,factor 2496036 LZMA_ra(2.17%), 211.1K } *
| |--+ info [ ]
| | |--+ AC { Int32 2496036 LZMA_ra(18.8%), 1.8M } *
| | |--+ AF { Float32 2496036 LZMA_ra(36.3%), 3.5M } *
| | |--+ ALLELE_END { Bit1 2496036 LZMA_ra(0.06%), 201B } *
| | |--+ AN { Int32 2496036 LZMA_ra(12.2%), 1.2M } *
| | |--+ AS_BaseQRankSum { Float32 0 LZMA_ra, 18B } *
| | |--+ AS_FS { Float32 1531413 LZMA_ra(26.4%), 1.5M } *
| | |--+ AS_InbreedingCoeff { Float32 3060955 LZMA_ra(23.0%), 2.7M } *
| | |--+ AS_MQ { Float32 0 LZMA_ra, 18B } *
| | |--+ AS_MQRankSum { Float32 0 LZMA_ra, 18B } *
| | |--+ AS_QD { Float32 3060955 LZMA_ra(35.3%), 4.1M } *
| | |--+ AS_ReadPosRankSum { Float32 0 LZMA_ra, 18B } *
| | |--+ AS_SB_TABLE { Str8 2496036 LZMA_ra(0.02%), 517B } *
| | |--+ AS_SOR { Float32 1531413 LZMA_ra(28.4%), 1.7M } *
| | |--+ BaseQRankSum { Float32 2496036 LZMA_ra(39.8%), 3.8M } *
| | |--+ ClippingRankSum { Float32 2496036 LZMA_ra(34.4%), 3.3M } *
| | |--+ DB { Bit1 2496036 LZMA_ra(0.06%), 201B } *
| | |--+ DP { Int32 2496036 LZMA_ra(40.8%), 3.9M } *
| | |--+ DS { Bit1 2496036 LZMA_ra(0.06%), 201B } *
| | |--+ END { Int32 2496036 LZMA_ra(0.02%), 1.6K } *
| | |--+ ExcessHet { Float32 2496036 LZMA_ra(24.4%), 2.3M } *
| | |--+ FS { Float32 2496036 LZMA_ra(35.9%), 3.4M } *
| | |--+ InbreedingCoeff { Float32 2496036 LZMA_ra(24.2%), 2.3M } *
| | |--+ MLEAC { Int32 2642114 LZMA_ra(17.2%), 1.7M } *
| | |--+ MLEAF { Float32 2642114 LZMA_ra(18.6%), 1.9M } *
| | |--+ MQ { Float32 2496036 LZMA_ra(12.3%), 1.2M } *
| | |--+ MQ0 { Int32 2496036 LZMA_ra(0.02%), 1.6K } *
| | |--+ MQRankSum { Float32 2496036 LZMA_ra(35.7%), 3.4M } *
| | |--+ NEGATIVE_TRAIN_SITE { Bit1 2496036 LZMA_ra(45.5%), 138.7K } *
| | |--+ POSITIVE_TRAIN_SITE { Bit1 2496036 LZMA_ra(96.5%), 294.1K } *
| | |--+ QD { Float32 2496036 LZMA_ra(37.5%), 3.6M } *
| | |--+ RAW_MQ { Float32 2496036 LZMA_ra(0.02%), 1.6K } *
| | |--+ ReadPosRankSum { Float32 2496036 LZMA_ra(39.5%), 3.8M } *
| | |--+ SOR { Float32 2496036 LZMA_ra(36.3%), 3.5M } *
| | |--+ VQSLOD { Float32 2496036 LZMA_ra(41.1%), 3.9M } *
| | |--+ culprit { Str8 2496036 LZMA_ra(7.96%), 1.0M } *
| | |--+ F_MISSING { Float32 2496036 LZMA_ra(13.2%), 1.3M } *
| | |--+ AN_case { Int32 2496036 LZMA_ra(6.09%), 593.7K } *
| | |--+ AN_control { Int32 2496036 LZMA_ra(11.0%), 1.0M } *
| | |--+ AC_case { Int32 2496036 LZMA_ra(9.24%), 900.6K } *
| | |--+ AC_control { Int32 2496036 LZMA_ra(19.1%), 1.8M } *
| | |--+ AC_Het_case { Int32 2496036 LZMA_ra(8.40%), 818.7K } *
| | |--+ AC_Het_control { Int32 2496036 LZMA_ra(17.7%), 1.7M } *
| | |--+ AC_Het { Int32 2496036 LZMA_ra(17.7%), 1.7M } *
| | |--+ AF_case { Float32 2496036 LZMA_ra(13.4%), 1.3M } *
| | |--+ AF_control { Float32 2496036 LZMA_ra(34.4%), 3.3M } *
| | |--+ HWE_case { Float32 2496036 LZMA_ra(7.81%), 761.3K } *
| | |--+ HWE_control { Float32 2496036 LZMA_ra(22.8%), 2.2M } *
| | |--+ HWE { Float32 2496036 LZMA_ra(23.6%), 2.2M } *
| | |--+ FunctionalAnnotation [ spec_tbl_df,tbl_df,tbl,data.frame,list ] *
| | | |--+ VarInfo { Str8 2496036 LZMA_ra(17.0%), 6.7M }
| | | |--+ apc_conservation { Float64 2496036 LZMA_ra(79.1%), 15.1M }
| | | |--+ apc_epigenetics { Float64 2496036 LZMA_ra(81.4%), 15.5M }
| | | |--+ apc_epigenetics_active { Float64 2496036 LZMA_ra(75.9%), 14.5M }
| | | |--+ apc_epigenetics_repressed { Float64 2496036 LZMA_ra(50.5%), 9.6M }
| | | |--+ apc_epigenetics_transcription { Float64 2496036 LZMA_ra(43.6%), 8.3M }
| | | |--+ apc_local_nucleotide_diversity { Float64 2496036 LZMA_ra(79.9%), 15.2M }
| | | |--+ apc_mappability { Float64 2496036 LZMA_ra(29.7%), 5.7M }
| | | |--+ apc_protein_function { Float64 2496036 LZMA_ra(2.15%), 419.9K }
| | | |--+ apc_transcription_factor { Float64 2496036 LZMA_ra(7.10%), 1.4M }
| | | |--+ cage_tc { Str8 2496036 LZMA_ra(5.19%), 183.1K }
| | | |--+ metasvm_pred { Str8 2496036 LZMA_ra(0.82%), 20.1K }
| | | |--+ rsid { Str8 2496036 LZMA_ra(36.4%), 9.4M }
| | | |--+ fathmm_xf { Float64 2496036 LZMA_ra(50.6%), 9.6M }
| | | |--+ genecode_comprehensive_category { Str8 2496036 LZMA_ra(0.57%), 147.1K }
| | | |--+ genecode_comprehensive_info { Str8 2496036 LZMA_ra(6.10%), 3.2M }
| | | |--+ genecode_comprehensive_exonic_category { Str8 2496036 LZMA_ra(1.19%), 34.6K }
| | | |--+ genecode_comprehensive_exonic_info { Str8 2496036 LZMA_ra(7.53%), 489.1K }
| | | |--+ genehancer { Str8 2496036 LZMA_ra(0.39%), 365.7K }
| | | |--+ linsight { Float64 2496036 LZMA_ra(21.7%), 4.1M }
| | | |--+ cadd_phred { Float64 2496036 LZMA_ra(23.7%), 4.5M }
| | | --+ rdhs { Str8 2496036 LZMA_ra(3.81%), 317.0K }
| | --+ AVGDP { Int32 2496036 LZMA_ra(40.8%), 3.9M } *
| --+ format [ ]
--+ sample.annotation [ ]
`
The sGRM matrix is 1026x1026. I made it using --degree 2 (everything else the same; I know none of my cases are related, and only 1 pair of controls are 1st-degree relatives, which the sGRM pipeline identified).
Phenotype has 1004 samples, with a binary variable, which I've created as follows: