Alleles not equal on tsinfer verify, v 0.2.0 #490

Closed
stsmall opened this issue Apr 22, 2021 · 6 comments · Fixed by #492

Comments

@stsmall

stsmall commented Apr 22, 2021

Hi,
I am working with tsinfer version 0.2.0. When running verify, I get an error about alleles not being equal. I checked that all my sites have only 2 alleles and that the ancestral allele matches either the REF or ALT. I also removed all singletons; I'm not sure whether that is advised. I used this script (https://github.com/stsmall/Kiribina_Folonzo/blob/debug/vcf2tsinfer.py), which is a modification of the one in the tutorial. tsinfer ran to completion without error.
Thanks for the help!
@stsmall

$ tsinfer list chrX.trees
path = chrX.trees
size = 519.0 MiB
edges = 9215138
trees = 910682
sites = 2160904
mutations = 2124826

$ tsinfer verify chrX.samples chrX.trees
2021-04-22 10:36:02,999 [47534] CRITICAL root: Traceback (most recent call last):
  File "tsinfer", line 8, in <module>
    sys.exit(main())
  File "tsinfer/main.py", line 5, in main
    cli.tsinfer_main()
  File "tsinfer/cli.py", line 499, in tsinfer_main
    args.runner(args)
  File "tsinfer/cli.py", line 238, in run_verify
    tsinfer.verify(samples, ts, progress_monitor=args.progress)
  File "tsinfer/inference.py", line 181, in verify
    raise ValueError(f"alleles not equal: {var1.alleles} != {var2.alleles}")
ValueError: alleles not equal: ('C', 'T') != ('C',)

@hyanwong
Member

Ah, it looks like these are monomorphic sites, right? Perhaps we forgot to test how those are handled in verify(). I suspect you can ignore this until we fix it, though.
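For anyone wanting to check their own data for monomorphic sites, here is a minimal sketch in plain Python. It assumes the genotypes are available as one row of allele indices per site, with no missing data. The `monomorphic_sites` helper and the toy matrix are made up for illustration and are not part of tsinfer.

```python
def monomorphic_sites(genotypes):
    """Return indices of sites at which every sample carries the same allele."""
    return [
        site_index
        for site_index, row in enumerate(genotypes)
        if len(set(row)) <= 1
    ]


# Toy data: 3 sites x 4 samples; values are allele indices (0 = ancestral).
toy_genotypes = [
    [0, 1, 0, 1],  # polymorphic
    [0, 0, 0, 0],  # monomorphic for the ancestral allele
    [1, 1, 1, 1],  # monomorphic (fixed) for the derived allele
]
print(monomorphic_sites(toy_genotypes))  # [1, 2]
```

Sites can become monomorphic after filtering, for example when the samples that carried the alternative allele are dropped, so a check like this is probably most useful when run on the final set of samples.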

@stsmall
Author

stsmall commented Apr 22, 2021

ugh, yep. Monomorphic sites. I am sorry that I missed that.

@stsmall stsmall closed this as completed Apr 22, 2021
@hyanwong
Member

No problem, we should handle this anyway.

@hyanwong
Member

hyanwong commented Apr 22, 2021

Here's a quick test case that currently fails:

import msprime
import numpy as np
import tsinfer

ts = msprime.sim_ancestry(3, ploidy=1, sequence_length=10, random_seed=123)
ts = msprime.sim_mutations(ts, rate=0.5, model="binary", random_seed=1)
sample_data = tsinfer.SampleData.from_tree_sequence(ts, use_sites_time=False)
# A site is monomorphic if every sample carries the same genotype there
is_monomorphic = np.all(np.diff(sample_data.sites_genotypes[:], axis=1) == 0, axis=1)
assert len(is_monomorphic) == sample_data.num_sites
assert np.any(is_monomorphic)  # make sure the test case covers a monomorphic site
ts_inf = tsinfer.infer(sample_data)
tsinfer.verify(sample_data, ts_inf)  # currently raises ValueError

Giving:

ValueError: alleles not equal: ('0', '1') != ('0',)
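A plausible reading of this error (an assumption, not confirmed in the thread): the sample data keeps every allele listed for the site, while the inferred tree sequence only reports the ancestral state plus the states of any mutations, so an allele that no sample carries drops out of the second tuple. The sketch below mimics that trimming in plain Python. `used_alleles` is a hypothetical helper, not a tsinfer or tskit function.

```python
def used_alleles(alleles, genotypes):
    """Return the alleles that at least one sample carries, in listed order."""
    carried = set(genotypes)
    return tuple(
        allele for index, allele in enumerate(alleles) if index in carried
    )


listed = ("0", "1")    # alleles recorded for the site in the input
genotypes = [0, 0, 0]  # every sample carries the ancestral allele
print(used_alleles(listed, genotypes))  # ('0',)
```

Under that assumption the two tuples describe the same data, which fits the suggestion above that the error can be ignored for monomorphic sites.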

@hyanwong
Member

hyanwong commented Apr 22, 2021

The new msprime mutation model is also not guaranteed to give the non-ancestral alleles in the same order as input. I'm working on a fix.

Edit - here's an example:

import msprime
import tsinfer

ts = msprime.sim_ancestry(3, ploidy=1, sequence_length=10, random_seed=123)
ts = msprime.sim_mutations(ts, rate=0.2, random_seed=1)
sd = tsinfer.SampleData.from_tree_sequence(ts, use_sites_time=False)
ts_inf = tsinfer.infer(sd)
# Look for a site with the same set of alleles listed in a different order
has_alt_order = False
for v1, v2 in zip(sd.variants(), ts_inf.variants()):
    if set(v1.alleles) == set(v2.alleles) and v1.alleles != v2.alleles:
        has_alt_order = True
assert has_alt_order
tsinfer.verify(sd, ts_inf)  # currently raises ValueError

Currently giving:

ValueError: alleles not equal: ('C', 'T', 'A') != ('C', 'A', 'T')
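One way to make such a comparison insensitive to allele order is to decode each genotype index into its allele string and compare the decoded lists instead of the allele tuples. This is only a sketch of the idea and not necessarily how the fix is implemented. `decode` and `same_variant` are hypothetical names, and the code assumes there is no missing data (no negative genotype indices).

```python
def decode(alleles, genotypes):
    """Translate per-sample allele indices into allele strings."""
    return [alleles[g] for g in genotypes]


def same_variant(alleles1, genotypes1, alleles2, genotypes2):
    """Compare two variants by the allele each sample actually carries."""
    return decode(alleles1, genotypes1) == decode(alleles2, genotypes2)


# Same data, different allele order: the indices differ, the decoded alleles match.
reordered = same_variant(("C", "T", "A"), [0, 1, 2], ("C", "A", "T"), [0, 2, 1])
# Monomorphic site: the unused allele is absent from the second tuple.
monomorphic = same_variant(("0", "1"), [0, 0, 0], ("0",), [0, 0, 0])
print(reordered, monomorphic)  # True True
```

Comparing decoded alleles covers both failures reported in this issue, because neither the position of an allele in the tuple nor the presence of an unused allele affects the result.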

@hyanwong
Member

@stsmall - if you have a moment could you check that the latest GitHub version of tsinfer now works with your files with the monomorphic sites?
