
Add map_deletions_to_ts as Dataset method #429

Merged
jeromekelleher merged 1 commit into tskit-dev:main from hyanwong:remap
Feb 5, 2025

Conversation

@hyanwong (Member) commented Dec 3, 2024

This seemed like the neatest API for mapping deletions, as we require a Dataset object and can add methods to it easily enough. For example:

```python
ds = sc2ts.Dataset("../data/viridian_2024-04-29.alpha1.zarr.zip")
ts = tszip.load("../data/find_problematic_v2-2023-01-01.ts.tsz")

start = 11284
end = 11302
del_ts = ds.map_deletions_to_ts(ts, start, end)
```

I have coded it so that the first sample can be Wuhan, or not. I have also put in a stub for a test, but I'm not sure how to actually test it, as I don't know whether we have a test tree sequence equivalent of fx_dataset with the same named samples.

@jeromekelleher (Member)

Nice, thanks @hyanwong.

I think we'd probably arrange the API a bit differently, in that what we'll probably want to do is remap all deletions that pass a specific frequency threshold, so we'll likely pass in a list of site IDs rather than one range. I'd also like to add some metadata so we can track these mutations more easily in analysis.

Leave it with me and I'll rejig and write tests when I get a chance.

@hyanwong (Member, Author) commented Dec 4, 2024

Great. Re ranges: given the issues I just had with alignments (and after chatting to Isobel), it does seem worth including a short flanking region too; i.e. we shouldn't assume the site positions/IDs are completely accurate with respect to deletions.

I was pleasantly surprised that when I remapped the range 11280 to 11305, counting only "significant" mutations affecting more than 50 samples, the only region with deletions was 11283-11296, as discussed in jeromekelleher/sc2ts-paper#249 (comment). That implies that significant deletions might actually be quite rare, and we might be able to pass in all the sites as a first approximation, then narrow down to only those with significant deletions.

@jeromekelleher (Member)

My guess is that if we do something like including only sites with > 10% frequency of deletions (or something) we'll get a very good approx. We track this in the site QC of the ARG:

[Screenshot from 2024-12-04 14-34-19: site QC of the ARG]
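The frequency-threshold filter described above could be sketched as follows with a toy genotype matrix. The encoding of deletions as -1 and the 10% cutoff are assumptions for illustration; in practice the frequencies would come from the Dataset / site QC rather than an in-memory array.

```python
# Sketch: select candidate sites whose deletion frequency exceeds 10%.
import numpy as np

# rows = sites, columns = samples; -1 marks a deletion call (assumed encoding)
genotypes = np.array([
    [0, 0, 0, 0],    # no deletions
    [-1, -1, 0, 0],  # 50% deleted
    [0, -1, 0, 0],   # 25% deleted
    [0, 0, 0, 0],
])
del_freq = (genotypes == -1).mean(axis=1)
candidate_sites = np.flatnonzero(del_freq > 0.10)
print(candidate_sites)  # indices of sites above the threshold
```

The resulting site indices are exactly the kind of list that the reworked API discussed earlier could accept in place of a single start/end range.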

@jeromekelleher jeromekelleher merged commit 3158f29 into tskit-dev:main Feb 5, 2025
3 checks passed