Batch ancestor matching #917
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #917      +/-   ##
==========================================
+ Coverage   87.04%   93.31%   +6.27%
==========================================
  Files           5       18      +13
  Lines        1767     6279    +4512
  Branches      310     1131     +821
==========================================
+ Hits         1538     5859    +4321
- Misses        140      285     +145
- Partials       89      135      +46
LGTM. I'm in favour of dropping LMDB for simpler file-system based methods 👍
tsinfer/inference.py
Outdated
os.makedirs(match_data_dir, exist_ok=True)
for file in os.listdir(match_data_dir):
    with open(os.path.join(match_data_dir, file), "rb") as f:
        (group, num_sites), batch_results = pickle.load(f)
It's worthwhile making a dataclass for this payload, as we'll inevitably need to add more things to it.
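As a rough illustration of this suggestion, the tuple payload could be wrapped in a dataclass along the following lines. The names `MatchBatch`, `write_batch` and `read_batches` are hypothetical and are not taken from tsinfer. The sketch also assumes the batch is pickled as a plain dict, which is one possible design choice, not necessarily what the PR does.

```python
import pickle
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class MatchBatch:
    """Hypothetical container for one batch of match results."""

    group: int
    num_sites: int
    results: list = field(default_factory=list)


def write_batch(path, batch):
    # Store a plain dict so the file does not depend on the class's import path.
    with open(path, "wb") as f:
        pickle.dump(asdict(batch), f)


def read_batches(match_data_dir):
    # Load every batch file in the directory, in sorted filename order.
    match_data_dir = Path(match_data_dir)
    batches = []
    for path in sorted(match_data_dir.iterdir()):
        with open(path, "rb") as f:
            batches.append(MatchBatch(**pickle.load(f)))
    return batches


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_batch(Path(tmp) / "batch_0.pkl", MatchBatch(group=3, num_sites=10))
        print(read_batches(tmp))
```

Adding a field later then only means adding an attribute with a default, so existing call sites that unpack the payload do not need to change.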
If we do this, will it be possible to run a lightweight version of tsinfer on pyodide? Since pyodide already includes
Yes, seems likely. We'd have to drop the original SampleData format though.
19207b3 to 8729437
The latest commit implements part of the scheme suggested at #921. What remains is the partition matching function.
8216f5b to dadb93c
This is ready to go - have filed #932 for follow-up work.
Generally looks good. A couple of high-level things:
- The mix of pathlib and os.path/os functions is weird. I would suggest embracing pathlib by doing
  some_path = pathlib.Path(some_path)
  at the top of a function that takes a pathlike as an argument, and then exclusively using pathlib functions. It leads to much more readable code. (for new code, old code can stay as is)
- There's some stuff that's not tested, mostly error conditions. Good to cover these.
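A minimal sketch of the first point, assuming a hypothetical helper `list_match_files` (not a tsinfer function): the argument is converted to a `pathlib.Path` once, and everything after that uses pathlib methods only.

```python
import pathlib
import tempfile


def list_match_files(match_data_dir):
    """Return the files in match_data_dir, creating the directory if needed.

    Hypothetical helper: accepts a str or any os.PathLike.
    """
    # Normalise once at the top, then use pathlib exclusively.
    match_data_dir = pathlib.Path(match_data_dir)
    match_data_dir.mkdir(parents=True, exist_ok=True)
    return sorted(p for p in match_data_dir.iterdir() if p.is_file())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(list_match_files(tmp + "/match_data"))
```

Callers can keep passing plain strings, since the conversion happens inside the function.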
tsinfer/inference.py
Outdated
else:
    total_work = sum(ancestor_lengths[ancestor] for ancestor in group_ancestors)
    min_work_per_job_group = min_work_per_job
    if total_work / 1000 > min_work_per_job:
Probably good to cover this somehow; it's the sort of thing that would catch us out in real applications. Could make the 1000 a parameter, and then test with something small?
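One way this could look is sketched below. The function name `work_per_job` and the parameter name `max_num_partitions` are hypothetical. The body of the `if` branch is not visible in the snippet under review, so the assignment inside it is an assumption about what the branch does.

```python
def work_per_job(
    ancestor_lengths, group_ancestors, min_work_per_job, max_num_partitions=1000
):
    """Return the minimum work per job for one ancestor group (sketch only)."""
    total_work = sum(ancestor_lengths[ancestor] for ancestor in group_ancestors)
    min_work_per_job_group = min_work_per_job
    if total_work / max_num_partitions > min_work_per_job:
        # ASSUMPTION: raise the per-job work so that the group is split into
        # at most max_num_partitions jobs. The original branch body is not shown.
        min_work_per_job_group = total_work / max_num_partitions
    return min_work_per_job_group


if __name__ == "__main__":
    lengths = {0: 10, 1: 20, 2: 30}
    # A small max_num_partitions makes the branch reachable with tiny inputs.
    print(work_per_job(lengths, [0, 1, 2], min_work_per_job=5, max_num_partitions=4))
```

With the constant exposed as a parameter, a test can exercise the branch with a handful of ancestors instead of needing a group whose total work exceeds 1000 times `min_work_per_job`.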
tsinfer/inference.py
Outdated
partitions.append(current_partition)
# Make directories for the path data
if len(partitions) > 1:
    os.mkdir(os.path.join(working_dir, f"group_{group_index}"))
I don't see the attraction of mixing os.path/os.mkdir etc with pathlib. I would make sure working_dir is a pathlib.Path at the top of the function, and then just use pathlib methods on it:
group_dir = working_dir / f"group_{group_index}"
group_dir.mkdir()
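Put together as a runnable sketch, with a hypothetical wrapper function `make_group_dir` that is not part of tsinfer:

```python
import pathlib
import tempfile


def make_group_dir(working_dir, group_index):
    """Create and return the directory for one group (hypothetical helper)."""
    working_dir = pathlib.Path(working_dir)
    group_dir = working_dir / f"group_{group_index}"
    # Like os.mkdir, Path.mkdir() raises FileExistsError if the directory
    # already exists, so the behaviour of the original call is preserved.
    group_dir.mkdir()
    return group_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(make_group_dir(tmp, 0).name)
```

The `FileExistsError` path is an example of the kind of error condition that a test can cover directly.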
Addressed the comments here - believe this is ready to merge.
LGTM. Just spotted a few redundant comments ("x += 1 # Add one to x").
We should also cover those error cases if we can. Happy to merge, though.
Comments addressed. I've added a bit more testing - some of the missing coverage in error cases is in code paths that will be removed by the sample matching batch refactor.
Merge away
Stacked on #896