Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default SGkit alleles are byte strings, which confuses tsinfer: #810

Open
hyanwong opened this issue Mar 21, 2023 · 2 comments · May be fixed by #928
Open

Default SGkit alleles are byte strings, which confuses tsinfer: #810

hyanwong opened this issue Mar 21, 2023 · 2 comments · May be fixed by #928

Comments

@hyanwong
Copy link
Member

hyanwong commented Mar 21, 2023

If sd.sites_alleles[:] are byte strings, as occurs e.g. when using sgkit.simulate_genotype_call_dataset(), then when performing inference we get an error

>>> print(sd.sites_alleles[:])
[[b'T' b'A']
 [b'C' b'T']
 [b'A' b'G']
 [b'C' b'G']
 [b'C' b'T']
 [b'G' b'C']
 [b'A' b'G']
 [b'T' b'A']]
tsinfer.infer(sd)

... 
 1886     derived_state[mutation_id] = site.reorder_alleles().index(allele)
   1887     mutation_id += 1
   1888 site_id += 1

ValueError: tuple.index(x): x not in tuple

Because allele here has been translated to a normal string, e.g. "T", whereas the site.reorder_alleles() method returns a tuple of byte strings ((b'C', b'T'))

@jeromekelleher
Copy link
Member

I feel like this is an sgkit bug, as in we should have predictable usage of str types rather than bytes?

@benjeffery
Copy link
Member

As a workaround we can convert in the SgkitSampleData class sites_alleles property accessor?

hyanwong added a commit to hyanwong/tsinfer that referenced this issue Jun 6, 2024
@hyanwong hyanwong linked a pull request Jun 6, 2024 that will close this issue
hyanwong added a commit to hyanwong/tsinfer that referenced this issue Jun 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants