KeyError: 'P1' #53

Closed
grafau opened this issue Jul 26, 2020 · 29 comments

Comments

@grafau

grafau commented Jul 26, 2020

Thanks for a quick reply to my previous issue. Here is another one, and I would be grateful for your help.

For two of many phenotypes I tried, I receive the following error from cluster:

Traceback (most recent call last):
  File "/home/rg187/bin/INSTALLS/kmeras/kmers_gwas.py", line 278, in <module>
    main()
  File "/home/rg187/bin/INSTALLS/kmeras/kmers_gwas.py", line 247, in main
    th_5per = get_threshold_from_perm(res, "P", args.n_permutations, 0.05)
  File "/home/rg187/bin/INSTALLS/kmeras/src/py/functions.py", line 110, in get_threshold_from_perm
    pvals.append(best_pvals[prefix + str(i)])
KeyError: 'P1'

It seems that the pipeline finished all 100 permutations but failed while summarizing the results and deleting the plink files (the last 3 lines of log_file are below):

RUN: mv S_SS_50/kmers/pheno.100.P100.fam S_SS_50/kmers/pheno.100.P100.fam.orig
RUN: cat S_SS_50/pheno.phenotypes_and_permutations | tail -n +2 | awk '{print $1 " " $1 " 0 0 0 " $102}' > S_SS_50/kmers/pheno.100.P100.fam
RUNG (5/101): /home/rg187/bin/INSTALLS/kmeras/external_programs/gemma_0_96 -bfile S_SS_50/kmers/pheno.100.P100 -lmm 2 -k S_SS_50/pheno.kinship -outdir S_SS_50/kmers/output -o P100 -maf 0.050000 -miss 0.5

Here is EMMA_perm.log:

EMMA_n_permutation = 100
EMMA_n_accessions = 159
EMMA_vg = 316.16095200859
EMMA_ve = 41.6692764404136
EMMA_herit = 0.883550149966293

Much obliged for your help.

@voichek
Owner

voichek commented Jul 27, 2020

Dear Rafal,

Just to be sure: for other phenotypes with the same kmers-table, does the pipeline run normally? So is it something specific to these two phenotypes?

Can you upload the log file from the run directory, and also the list of all files under the run directory?

Best,
Yoav

@sandipmkale

Hello Yoav,

I had the same issue too. In my case, it was because the values were not all rounded to the same number of digits.
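If inconsistent rounding is the suspected cause, as in the case described here, one quick experiment is to rewrite the phenotype values with a fixed number of decimals and rerun. The sketch below is only an illustration: the function name `round_phenotypes` is made up, and it assumes a header line followed by whitespace-separated `accession value` rows, which may not match your file exactly.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_phenotypes(lines, digits=3):
    """Return phenotype lines with every value written with `digits` decimals.

    Assumption: the first line is a header and each following line holds an
    accession name and a numeric value separated by whitespace.
    """
    quantum = Decimal(10) ** -digits  # e.g. Decimal("0.001") for digits=3
    out = [lines[0].rstrip("\n")]     # keep the header unchanged
    for line in lines[1:]:
        if not line.strip():
            continue                  # skip empty lines
        accession, value = line.split()[:2]
        rounded = Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)
        out.append(f"{accession}\t{rounded}")
    return out
```

Comparing a run on the original file with a run on the rounded file would show whether the number of digits really matters for a given phenotype.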

With best regards

Sandip

@voichek
Owner

voichek commented Jul 27, 2020

Hi Sandip,

Thank you very much for sharing this info, and nice catch/debugging. I don't understand why this happened. I will check whether I can fix it, or at least present a clearer error message for these cases.

Rafal, could Sandip's explanation account for your problem?

Best,
Yoav

@grafau
Author

grafau commented Jul 29, 2020

Hi Yoav and Sandip,

I am still trying to figure it out; I don't think it is just the number of digits. I have one phenotype file that has similar variation in number of digits and it works well. I am experimenting with modifying the file that didn't work, but have no conclusions yet.

If that helps, I can send you two phenotype files (one that works, and one that doesn't).

Best,
Rafal

@voichek
Owner

voichek commented Jul 30, 2020

Dear Rafal,

I think it would be faster for you if you let me help you debug this. Can you respond to my previous message:
"Just to be sure: for other phenotypes with the same kmers-table, does the pipeline run normally? So is it something specific to these two phenotypes?

Can you upload the log file from the run directory, and also the list of all files under the run directory?"

Best,
Yoav

@grafau
Author

grafau commented Jul 30, 2020

I was trying to reproduce this error today to get the list and the log file that you requested, but no error showed up. I got assigned to CPUs rather than GPUs on the cluster today, but I don't know if that should make any difference...

Please keep this issue open for a while, and I will report when I see the error again. There are a lot of phenotypes I will run next week, so I am hoping (not really... but would be happy to help debug) to see the error again.

Best,
Rafal

@sandipmkale

sandipmkale commented Jul 31, 2020 via email

@voichek
Owner

voichek commented Jul 31, 2020

Dear Sandip and Rafal,

I would be curious to look at the log files, and also the list of files it succeeded in creating, if you have it.

I have two guesses for the reason for this. One is that the R packages the pipeline depends on are missing from one cluster. The other is that the compilation of the C++ code does not work on this system, or that the system lacks the SSE4.1 instructions the code depends on (less likely).
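The SSE4.1 guess can be checked directly on the affected node. The following is a minimal sketch that assumes a Linux x86 machine, where the kernel lists CPU features in `/proc/cpuinfo` and reports SSE4.1 as the flag `sse4_1`; the helper names are made up for illustration.

```python
def cpu_flags(cpuinfo_text):
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_sse41(cpuinfo_text):
    # Linux reports SSE4.1 under the flag name "sse4_1".
    return "sse4_1" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            print("SSE4.1 available:", has_sse41(handle.read()))
    except OSError:
        print("/proc/cpuinfo not readable; this check is Linux-only")
```

On a cluster, this should be run on the compute partition where the failure occurs, since login nodes and compute nodes may have different CPUs.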

Best,
Yoav

@sandipmkale

sandipmkale commented Jul 31, 2020 via email

@voichek
Owner

voichek commented Jul 31, 2020

Dear Sandip,

Thanks for sharing this! I downloaded the file so if you want you can remove the sharing.

It seems that GEMMA was not able to run as there is nothing in the output directory (kmers/output).
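A quick way to see which runs left no results is to list the expected output names and check which files are absent. The sketch below is hedged: `missing_gemma_runs` is a hypothetical helper, it assumes permutation runs are named `P1`..`Pn` (as in the traceback and log above), and it assumes GEMMA's usual `<name>.assoc.txt` output file, which the pipeline may rename or remove after a successful run.

```python
from pathlib import Path


def missing_gemma_runs(output_dir, n_permutations, prefix="P", suffix=".assoc.txt"):
    """Return the run names (P1..Pn) that have no result file in output_dir.

    Assumption: each run writes <name><suffix> into output_dir.
    """
    out = Path(output_dir)
    names = [f"{prefix}{i}" for i in range(1, n_permutations + 1)]
    return [name for name in names if not (out / f"{name}{suffix}").is_file()]
```

For example, `missing_gemma_runs("S_SS_50/kmers/output", 100)` returning all 100 names would match the situation described here, where nothing was written, and would explain a later lookup of `'P1'` failing.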

Can you check whether running GEMMA on its own (e.g. with --help) returns an error on this server?

In any case, my two previous guesses seem to be wrong, judging from these log files.

@grafau
Author

grafau commented Aug 2, 2020

I have run into the "P1" issue again. I traced it back and could see that it happens in the same partition of our slurm cluster. I logged in to that partition and ran gemma -h without issue.

If you still need it, you can download the whole run directory here: https://drive.google.com/file/d/1qXEzpK8w7q3T7b0PbYIpFGs3ADAfq3vG/view?usp=sharing

@sandipmkale

sandipmkale commented Aug 3, 2020 via email

@lierking

lierking commented Aug 3, 2020

Dear Rafal,
When the kmers_gwas.py program finished, there were no candidate results in the pass_threshold_10per and pass_threshold_5per files. Part of "~/kmers/output/phenotype_value.log.txt" is as follows:

...
## Summary Statistics:
## number of total individuals = 336
## number of analyzed individuals = 336
## number of covariates = 1
## number of phenotypes = 1
## number of total SNPs = 10001
## number of analyzed SNPs = 10001
## REMLE log-likelihood in the null model = -777.293
## MLE log-likelihood in the null model = -774.204
## pve estimate in the null model = 0.999954
## se(pve) in the null model = 0.000578608
## vg estimate in the null model = 33.0531
## ve estimate in the null model = 0.000330531
## beta estimate in the null model =   -1.73203e-14
## se(beta) =   0.000991828

I was very surprised that the number of total SNPs = 10001. My total k-mer number is actually 25248947, and there were 25248947 in the file “/kmers/pheno.tested_kmers”.
Then I checked the file “/kmers/pheno.0.phenotype_value.bim” and found that it contained only 10001 k-mers.
Then I checked the file “/log_file” and found “n_kmers=10001” there, marked below:

...
gemma_path='/yun/kmersGWAS/external_programs/gemma_0_96', kmers_len=31, kmers_pattern_counter=False, kmers_table='kmers_table_mac1p2', mac=5, maf=0.05, min_data_points=30, n_extra_phenotype_kmers=None, **n_kmers=10001**, n_permutations=100, **n_snps=10001**, outdir='out_GWAS', parallel=8, remove_intermediate=True, run_kmers=True, run_one_step_snps=False, run_two_steps_snps=False, snps_matrix=None, use_kinship_from_kmers=True)Unique phenotype per accession, copying phenotype data to directory
Using kinship calculated on k-mersRUN: python2.7 /yun/kmersGWAS/src/py/align_kinship_phenotype.py --pheno out_GWAS/pheno.original_phenotypes --fam_file out_GWAS/pheno_for_accessions_order.fam --kinship_file kmers_table_mac5p2.kinship --output_pheno  out_GWAS/pheno.phenotypes --output_kinship out_GWAS/pheno.kinship --DBs_list kmers_table_mac5p2.names
RUN: Rscript /yun/kmersGWAS/src/R/transform_and_permute_phenotypes.R out_GWAS/pheno.phenotypes out_GWAS/pheno.kinship 100 out_GWAS/pheno.phenotypes_and_permutations out_GWAS/pheno.phenotypes_permuted_transformed out_GWAS/EMMA_perm.log > out_GWAS/phenotypes_transformation_permutation.log
RUN: mkdir out_GWAS/kmers
RUN: /yun/kmersGWAS/bin/associate_kmers -p out_GWAS/pheno.phenotypes_permuted_transformed -b pheno -o out_GWAS/kmers **-n 10001** --parallel 8 --kmers_table kmers_table_mac5p2 --kmer_len 31 --maf 0.050000 --mac 5 2> out_GWAS/associate_kmers.log
We have 101 phenotypes
RUN: mv out_GWAS/kmers/pheno.0.phenotype_value.fam out_GWAS/kmers/pheno.0.phenotype_value.fam.orig
RUN: cat out_GWAS/pheno.phenotypes_and_permutations | tail -n +2 | awk '{print $1 " " $1 " 0 0 0 " $2}' > out_GWAS/kmers/pheno.0.phenotype_value.fam
RUNG (1/1): /yun/kmersGWAS/external_programs/gemma_0_96 -bfile out_GWAS/kmers/pheno.0.phenotype_value -lmm 2 -k out_GWAS/pheno.kinship -outdir out_GWAS/kmers/output -o phenotype_value -maf 0.050000 -miss 0.5
RUN: mv out_GWAS/kmers/pheno.1.P1.fam out_GWAS/kmers/pheno.1.P1.fam.orig
RUN: cat out_GWAS/pheno.phenotypes_and_permutations | tail -n +2 | awk '{print $1 " " $1 " 0 0 0 " $3}' > out_GWAS/kmers/pheno.1.P1.fam
...

But why? n_kmers=10001 and /associate_kmers ... -n 10001 ... ????

Much obliged for your help.
Best,
Lierk

@grafau
Author

grafau commented Aug 3, 2020

Hi Lierk,

When you have an unrelated issue, it is good practice to start a new thread. Additionally, if you need help you should address the pipeline developer (not me, a random guy) in your message.

Finally, to save you some time, please read the paper by Voichek and Weigel carefully - they refer to the methods that result in your k-mer number.

Best,
Rafal

@voichek
Owner

voichek commented Aug 3, 2020

Dear Rafal,

First, thanks for answering Lierk. Indeed, this issue is explained in our methods.

I've downloaded the directory you shared, thank you. However, it seems to have finished running the pipeline without errors, right?

Regarding running GEMMA on the part of the cluster that produces the error: just to make sure, did you try to run the same gemma we use in the pipeline? In your case, it is located in:
/home/rg187/bin/INSTALLS/kmeras/external_programs/gemma_0_96

If you just typed gemma, it might have run a local version, and then it is not the right control.
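Whether a bare `gemma` on the command line is the same file as the one bundled with the pipeline can be checked programmatically. This is a minimal sketch using only the Python standard library; `same_executable` and its `search_path` argument are illustrative names, not part of the pipeline.

```python
import os
import shutil


def same_executable(command_name, bundled_path, search_path=None):
    """True if `command_name` resolves (via PATH) to the same file as bundled_path.

    search_path optionally overrides PATH; it uses the same format as PATH.
    """
    found = shutil.which(command_name, path=search_path)
    if found is None or not os.path.exists(bundled_path):
        return False
    # samefile follows symlinks, so a link to the bundled binary still matches.
    return os.path.samefile(found, bundled_path)
```

If `same_executable("gemma", "/home/rg187/bin/INSTALLS/kmeras/external_programs/gemma_0_96")` is false, then a test with plain `gemma -h` exercised a different binary than the pipeline does.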

Best,
Yoav

@lierking

lierking commented Aug 4, 2020

Hi Rafal,
I'm very sorry to disturb you by mistake, and thank you very much for your generous reply.

Best,
Lierk

@lierking

lierking commented Aug 4, 2020

"Dear Rafal,

First, thanks for answering Lierk. Indeed, this issue is explained in our methods.

I've downloaded the directory you shared, thank you. However, it seems to have finished running the pipeline without errors, right?

Regarding running GEMMA on the part of the cluster that produces the error: just to make sure, did you try to run the same gemma we use in the pipeline? In your case, it is located in:
/home/rg187/bin/INSTALLS/kmeras/external_programs/gemma_0_96

If you just typed gemma, it might have run a local version, and then it is not the right control.

Best,
Yoav"

Dear Yoav,
Thank you very much as well; I'll read the paper again.

Best,
Lierk

@voichek
Owner

voichek commented Aug 19, 2020

Can I close this until it bothers someone again?

@grafau
Author

grafau commented Aug 19, 2020

Ah, yes, sorry, I got into a whirlwind of things. Once I have a little more time to check and properly identify my failed run, I will reopen.

Thanks,
Rafal

@voichek voichek closed this as completed Aug 20, 2020
@MFSeidl

MFSeidl commented Sep 22, 2020

Hey all, thanks for the discussion on this topic. I am currently trying to run the pipeline and get the same error as @grafau. I have ~850 genomes and associated phenotype data, i.e. normalised growth under different conditions. Looking into the logs, the error seems to be connected with the permutations run with gemma. If I call gemma without any parameters, there doesn't seem to be a problem. If I run one of the permutation commands from the log file, I get an 'Illegal instruction (core dumped)' error and no output is generated. I assume that this leads to the Python error when the pipeline tries to access the output. Do you have an idea what might cause the error in the gemma execution?

Thanks
Michael
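A crash like this can be told apart from an ordinary failure by looking at the return code of the child process. The sketch below relies on the documented POSIX behaviour of Python's `subprocess` module, where a negative return code `-N` means the child was terminated by signal `N`; the function names and the wording of the messages are illustrative only.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Translate a subprocess return code into a short diagnosis."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative code means the child was killed by that signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        if name == "SIGILL":
            return ("killed by SIGILL (illegal instruction): the binary may use "
                    "CPU instructions this machine does not support")
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_and_diagnose(command):
    """Run `command` (a list of arguments) and describe how it ended."""
    result = subprocess.run(command, capture_output=True, text=True)
    return describe_exit(result.returncode)
```

Passing one of the GEMMA command lines from the log, split into a list of arguments, to `run_and_diagnose` would report the illegal instruction explicitly instead of letting it surface later as a `KeyError`.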

@voichek
Owner

voichek commented Sep 22, 2020

I think this error can have multiple causes. First, it is good that you checked that gemma can run.
Can you also provide the list of files in the run directory, or ideally share the whole directory zipped, either here or by sending it to my email?

Also, can you copy the full command line you used here?

@voichek voichek reopened this Sep 22, 2020
@MFSeidl

MFSeidl commented Sep 22, 2020

Thanks for your quick reply. The files are attached. The gemma command is:

/hosts/linuxhome/quorum/michael2/UU/kmerGWAS/kmerGWAS/external_programs/gemma_0_96 -bfile kmers/pheno.0.phenotype_value -lmm 2 -k pheno.kinship -outdir kmers/output -o phenotype_value -maf 0.050000 -miss 0.5

and the overall command is:

python2.7 ../kmerGWAS/kmers_gwas.py --pheno YPDCUSO410MM.pheno --kmers_table kmers_table -l 31 -p 6 --outdir testdir

The tar.gz of the run directory is here: https://www.dropbox.com/s/cpd5pqa4zrg88q4/testdir.tar.gz?dl=0

Thanks
Michael

@voichek
Owner

voichek commented Sep 24, 2020

Dear Michael,

From looking at your directory it seems that indeed the gemma step was not able to run.

I have now run the following two commands (from the directory you sent me, with my local gemma executable):
(1) /hosts/linuxhome/quorum/michael2/UU/kmerGWAS/kmerGWAS/external_programs/gemma_0_96 -bfile testdir/kmers/pheno.0.phenotype_value -lmm 2 -k testdir/pheno.kinship -outdir testdir/kmers/output -o phenotype_value -maf 0.050000 -miss 0.5
(2) /hosts/linuxhome/quorum/michael2/UU/kmerGWAS/kmerGWAS/external_programs/gemma_0_96 -bfile testdir/kmers/pheno.1.P1 -lmm 2 -k testdir/pheno.kinship -outdir testdir/kmers/output -o P1 -maf 0.050000 -miss 0.5

Both of them ran and produced the expected files.
If I understood correctly, you get an error running the second command and not the first?
If this is true, it is a weird problem with gemma that happens on your system but not on mine. Let me know if this is correct, and we can think of solutions and also report it to the GEMMA team.

BTW, if you can run the first command, you can already get the strongest k-mers from the kmers/output directory. You will not know the permutation-defined threshold, but in the meantime you can use a Bonferroni threshold based on the number of unique k-mer presence/absence patterns. The number of tests/unique patterns in your case is 17,378,387 (found in pheno.tested_kmers). Looking at the gemma results, you have very strong associations that pass this threshold easily :)
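The Bonferroni cutoff mentioned here is simply the significance level divided by the number of tests. A minimal sketch follows, assuming a family-wise significance level of 0.05 (the thread does not state which level to use for this fallback) and taking the number of unique patterns quoted above as the number of tests.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


n_patterns = 17_378_387  # unique presence/absence patterns quoted above
cutoff = bonferroni_threshold(0.05, n_patterns)  # alpha = 0.05 is an assumption
print(f"p-value cutoff: {cutoff:.3e}")
print(f"-log10 cutoff:  {-math.log10(cutoff):.2f}")
```

With these numbers the cutoff is roughly 2.9e-9, or about 8.5 on the -log10 scale, so only k-mers with smaller p-values would count as significant under this conservative rule.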

Best wishes,
Yoav

@MFSeidl

MFSeidl commented Sep 24, 2020

Hi Yoav,

thanks a lot for your efforts.

No, unfortunately I am getting the error with both commands (Illegal instruction (core dumped)). So I think it has something to do with GEMMA on my system. When I just run GEMMA without any options, it does show the version (and gemma -h gives the help).

Cheers
Michael

@voichek
Owner

voichek commented Sep 25, 2020

Dear Michael,

Please try to run the same GEMMA command with the new release of GEMMA (0.98.1)
https://github.com/genetics-statistics/GEMMA/releases

If it works, I will modify the pipeline a bit so that it can also take results from the newer version of gemma. Please let me know if it works; otherwise we should open an issue on their GitHub page.

Best,
Yoav

@MFSeidl

MFSeidl commented Sep 25, 2020

Dear Yoav,

the commands run with the static build of GEMMA (0.98.1). So there is something not okay on my system with the build that is delivered with your package. I also downloaded the 0.96 build from GitHub and obtained the same error.

Cheers
Michael

@voichek
Owner

voichek commented Sep 25, 2020

Dear Michael,

I tried a quick fix so that the pipeline will run with gemma 0.98.1. Please try running the pipeline after replacing the two files of the same names, in the main library directory and under src/py/, with the files in this archive:

replacement.zip

Then you can tell the pipeline to use the gemma 0.98.1 you downloaded with the --gemma_path parameter when you run kmers_gwas.py.
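As an illustration, the earlier command line can be extended with `--gemma_path` as sketched below. The other options are copied from the command posted earlier in this thread; the path to the downloaded GEMMA binary is a placeholder, and `build_pipeline_command` is a made-up helper that only assembles the command without running it.

```python
import shlex


def build_pipeline_command(gemma_path, pheno, kmers_table, outdir,
                           kmer_len=31, parallel=6,
                           script="../kmerGWAS/kmers_gwas.py"):
    """Assemble the kmers_gwas.py command line with an explicit GEMMA binary."""
    return [
        "python2.7", script,
        "--pheno", pheno,
        "--kmers_table", kmers_table,
        "-l", str(kmer_len),      # -l and -p values copied from the earlier command
        "-p", str(parallel),
        "--outdir", outdir,
        "--gemma_path", gemma_path,
    ]


command = build_pipeline_command(
    gemma_path="/path/to/gemma-0.98.1",  # placeholder: wherever you saved the binary
    pheno="YPDCUSO410MM.pheno",
    kmers_table="kmers_table",
    outdir="testdir",
)
print(shlex.join(command))
```

Building the command as a list keeps each argument separate, which avoids quoting problems if a path contains spaces.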

Please let me know if this solves the problem.
Best,
Yoav

@MFSeidl

MFSeidl commented Sep 25, 2020

Hi Yoav,

replacing the file and running the pipeline works. I now get the pass_threshold_x files with the significant k-mers. Now I can try to check out what these k-mers are ;) Thanks a lot for your help

Cheers
Michael

@voichek
Owner

voichek commented Sep 25, 2020

Dear Michael,

This is good to hear. I will try to find time to incorporate these changes into the next release so that people can use different GEMMA versions.

Best wishes,
Yoav
