Mutation map #711
750a529
to
f72c31c
Compare
Gee, looks just right. Good thinking.
Codecov Report
@@ Coverage Diff @@
## master #711 +/- ##
=========================================
+ Coverage 87.43% 87.5% +0.07%
=========================================
Files 13 14 +1
Lines 7902 8045 +143
Branches 1628 1648 +20
=========================================
+ Hits 6909 7040 +131
- Misses 499 505 +6
- Partials 494 500 +6
OK, that's most of it done. Just need to update the high-level code and test now. The only change here is that the map is defined by two equal length arrays now. There's no point in specifying the end value, as it just makes things tricky.
89a935a
to
bd2f8f7
Compare
720bf7d
to
3c1099d
Compare
@daniel-goldstein, I'd appreciate a quick look over this please. It's a WIP where I'm refactoring the recombination map class to make use of a What do you think of the basic setup? I think the plan now is to make a high-level Pythonic interface for the IntervalMap class and to give it some useful methods. We then refactor the RecombinationMap class to use this (not quite sure how yet), but maintaining the old interface for compatibility. I'm also changing my mind about insisting the arrays are the same length in Python - really, the cc also @grahamgower - I know you were thinking about #902. I think this PR should also close that issue (or at least plan to with a concrete follow up). |
This is very cool. One thing that I had in mind for #902 is that it would be nice for the python interface to do zero memory allocations/copies when getting the |
Thanks!
Ah, yes, now this is tricky. I've thought about doing it before (for table columns in tskit) and ended up giving up and just returning copies. In principle, we should be able to use this method and set the base member to the IntervalMap instance so that ref counting works properly. There are some worries about then changing the shape of the array (or something) and then breaking underlying C code, but maybe it can be set to a read-only-ish mode. It would be handy to directly access the underlying arrays when manipulating these maps... I might have another go at it, it's a good idea.
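To make the "base member" and "read-only-ish mode" idea concrete, here is a minimal pure-NumPy sketch. `IntervalMapSketch` is a hypothetical stand-in, not the real class: in the actual extension the base would be the C-backed IntervalMap object, whereas here a private NumPy array plays the role of the memory owned by the C struct.

```python
import numpy as np


class IntervalMapSketch:
    """Hypothetical stand-in for the IntervalMap class; not the real API."""

    def __init__(self, position, value):
        # These private arrays stand in for memory owned by the C struct.
        self._position = np.array(position, dtype=np.float64)
        self._value = np.array(value, dtype=np.float64)

    @property
    def position(self):
        # view() shares memory with the owning array and records it in
        # .base, so the owner stays alive for as long as the view does.
        view = self._position.view()
        # Mark the view read-only so callers cannot write through it.
        view.flags.writeable = False
        return view


m = IntervalMapSketch([0, 10, 20], [0.1, 0.2, 0.0])
p = m.position
```

Here `p` shares memory with the map (no copy is made), and an assignment such as `p[0] = 5.0` raises `ValueError` because the view is not writeable.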
Ah yes, now I remember why I didn't do this for Table columns. This is because they are dynamic, and the column memory can be realloced, therefore making the memory pointed to by the numpy array invalid. We don't have this problem here though.
Could you get round this by supporting a callback that the C API calls with the new memory address whenever it re-allocates? Would clearly add complexity - do you have a handle on how much copies are a perf issue?
Copies are totally not a perf issue, which is mainly why I decided to not worry about zero-copy semantics for this stuff. It's more about nice APIs really than perf. We'd like to be able to change the data in place sometimes, and modifying the array is much nicer. The realloc stuff isn't an issue here though as the size is fixed, that's a tskit Tables API concern.
I started messing around with zero copy stuff there and realised there's actually not much point in this case. The gains are pretty minimal and we'd have to do a lot of work to make sure that (e.g.) the user couldn't modify the position array while it was in use by the mutation generator. Taking copies is safe and simple. Having a good high-level API is more important here than a super-efficient low-level one.
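For comparison, the copy-based approach can be sketched as follows. `CopyingMapSketch` is a hypothetical name used only for illustration and is not the interface in this PR; the point is only that copying on the way in and on the way out means nothing the caller does can change the arrays the map holds.

```python
import numpy as np


class CopyingMapSketch:
    """Hypothetical copy-in, copy-out container; not the real API."""

    def __init__(self, position, value):
        # np.array copies its input by default, so later changes to the
        # caller's arrays do not reach the map.
        self._position = np.array(position, dtype=np.float64)
        self._value = np.array(value, dtype=np.float64)

    @property
    def position(self):
        # Hand back a fresh copy each time; the caller may modify it freely.
        return self._position.copy()

    @property
    def value(self):
        return self._value.copy()


source = np.array([0.0, 10.0, 20.0])
m = CopyingMapSketch(source, [0.1, 0.2, 0.0])
source[1] = 5.0    # does not affect the map
p = m.position
p[1] = 7.0         # does not affect the map either
```

After both writes, `m.position[1]` is still `10.0`.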
bc8e820
to
8373905
Compare
I think the thing to do here is to merge this PR as soon as possible and to follow up with more work on improving the high-level API and testing out the mutation mapping code. There's quite a big diff here, so it would be good to get it merged now so we can keep other PRs going through. I've added #920 as a starting follow-up where we create the high-level interval map tools and change the semantics to something more sensible.
@jeromekelleher Sounds good to me. I wonder though if It makes sense the way
I am also a fan of nixing the last entry on the values array. I think it's no more mental overhead to make sure it's one shorter than to make sure they're the same length and the last value is zero.
Worth noting that the C simulator copies its |
Thanks @daniel-goldstein, I was wondering myself about an |
Wow, this is great. And: Yes!! Please make values one shorter!!! That eliminates uncertainty about which number goes with which interval, a serious source of bugs. |
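A small sketch of the "values one shorter" layout may help show why it removes the ambiguity. The function names below are hypothetical and this is not the code in the PR; it assumes `position` holds the n + 1 breakpoints (starting at 0 and ending at the sequence length) and `value[j]` applies on the half-open interval `[position[j], position[j + 1])`.

```python
from bisect import bisect_right


def check_map(position, value):
    # With n values there must be exactly n + 1 breakpoints, so every
    # value belongs to exactly one interval and no trailing entry is ignored.
    if len(value) != len(position) - 1:
        raise ValueError("value must be one shorter than position")
    if position[0] != 0:
        raise ValueError("position must start at 0")
    if any(b <= a for a, b in zip(position, position[1:])):
        raise ValueError("position must be strictly increasing")


def value_at(position, value, x):
    # Look up the value on the interval containing x.
    if not position[0] <= x < position[-1]:
        raise ValueError("x is outside the map")
    # bisect_right counts the breakpoints <= x; subtracting one gives the
    # index of the interval whose left end is the last such breakpoint.
    return value[bisect_right(position, x) - 1]


position = [0, 10, 20, 100]
value = [0.1, 0.2, 0.0]
check_map(position, value)
rate_at_start = value_at(position, value, 0)
rate_at_break = value_at(position, value, 10)
```

With this layout a breakpoint belongs to the interval on its right, so `value_at(position, value, 10)` returns the second value, and passing three positions with three values is rejected outright instead of silently ignoring the last value.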
fprintf(out, "interval_map (%p):: size = %d\n", (void *) self, (int) self->size);
fprintf(out, "\tsequence_length = %f\n", interval_map_get_sequence_length(self));
fprintf(out, "\tindex\tposition\tvalue\n");
Shouldn't this be left, right, value?
}

size_t
interval_map_get_size(interval_map_t *self)
Maybe num_breakpoints instead of size?
Good idea. Will follow up in #920
I read over this. Looks good! We might do the mutation drawing differently if we expect a lot of small rate changes per edge, but it's good for now.
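As an illustration of one alternative for edges that span many rate changes (a sketch only, and not necessarily what this PR does or what was meant above), the cumulative "mass" of the map can be precomputed once so that the total rate over any edge is found with two binary searches instead of a walk over every interval. All names are hypothetical, and the sketch assumes `rate` is one shorter than `position`.

```python
from bisect import bisect_right
from itertools import accumulate


def cumulative_mass(position, rate):
    # cumulative[j] is the integral of the rate from position[0] to position[j].
    widths = [b - a for a, b in zip(position, position[1:])]
    return [0.0] + list(accumulate(w * r for w, r in zip(widths, rate)))


def mass_up_to(position, rate, cumulative, x):
    if not position[0] <= x <= position[-1]:
        raise ValueError("x is outside the map")
    # Interval containing x, clamped so that x == position[-1] is allowed.
    j = min(bisect_right(position, x) - 1, len(rate) - 1)
    return cumulative[j] + (x - position[j]) * rate[j]


def mass_between(position, rate, cumulative, left, right):
    # Integral of the rate over [left, right), as a difference of two lookups.
    return (mass_up_to(position, rate, cumulative, right)
            - mass_up_to(position, rate, cumulative, left))


position = [0, 10, 20, 100]
rate = [0.1, 0.2, 0.0]
cumulative = cumulative_mass(position, rate)
edge_mass = mass_between(position, rate, cumulative, 5, 15)
```

If the rate is per unit of sequence length per unit of time, multiplying `edge_mass` by the branch length gives the expected number of mutations on that edge, which could then be used as the mean of a Poisson draw.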
8373905
to
de191d5
Compare
OK, thanks @petrelharp. I'm going to merge this and then pick it up again in #920 to tidy up the rough edges on the interval map stuff.
Closes #710.
@petrelharp, I started thinking about this and I realise I was overcomplicating things. Let's just do it the simplest reasonable way first and see how it performs after.