Add metadata column to edges #496
1feefb9 to
0098fe7
Compare
Codecov Report
@@            Coverage Diff             @@
##           master     #496      +/-   ##
==========================================
- Coverage   87.23%   87.08%   -0.16%
==========================================
  Files          21       21
  Lines       15420    15597     +177
  Branches     2999     3036      +37
==========================================
+ Hits        13452    13583     +131
- Misses        975     1006      +31
- Partials      993     1008      +15
@jeromekelleher This is all ready for you to have a gander.
jeromekelleher
left a comment
Looks great, thanks @benjeffery. I know how tedious all that copy-pasting is...
One high-level point: do we want to increment FILE_VERSION_MAJOR for this? If all we were doing before the next release was adding edge metadata, I would say we should just increment the minor version and conditionally read in edge metadata if it's in the file. I think we want to get more flexible about having optional columns in the file format with well-defined defaults at some point anyway, so maybe now is the time to grasp that nettle. From a space perspective this will be important anyway, as adding an extra uint32 for each edge will be significant for things like UKB.
We could put in a special case to deal with edge metadata in this PR, and then think about a general mechanism for optional columns and defaults later. I think I've convinced myself that we do need to make this optional all right --- it's quite disruptive asking users to upgrade their files and the space argument is a clincher. Happy to be convinced otherwise though.
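As a rough illustration of the "special case" idea above, here is a minimal Python sketch of a reader that keeps the major version check but treats the edge metadata columns as optional. The store layout, the key names (`format/version`, `edges/metadata`, `edges/metadata_offset`) and the version numbers are all hypothetical, chosen only for this example; they are not the real file format.

```python
# Hypothetical version number and key names, for illustration only.
READER_VERSION_MAJOR = 1


def load_edge_metadata(store, num_edges):
    """Return (metadata, metadata_offset), with defaults if the columns are absent."""
    file_major, _file_minor = store["format/version"]
    if file_major != READER_VERSION_MAJOR:
        # Only a major version mismatch makes the file unreadable.
        raise ValueError(f"unsupported major version {file_major}")
    if "edges/metadata" in store:
        return store["edges/metadata"], store["edges/metadata_offset"]
    # Files from an older minor version never wrote the columns,
    # so every edge is given empty metadata.
    return b"", [0] * (num_edges + 1)


# An "old" file without edge metadata and a "new" file with it.
old_file = {"format/version": (1, 0)}
new_file = {
    "format/version": (1, 1),
    "edges/metadata": b"abc",
    "edges/metadata_offset": [0, 1, 3],
}
```

With this shape, old files keep loading without an upgrade step, and files that carry no edge metadata pay no storage cost for the extra column.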
One other thing: when mucking with the low-level C module it's a good idea to run the stress_lowlevel.py script to make sure there are no memory leaks. I find just running the low-level module tests is enough,
python3 stress_lowlevel.py -m lowlevel
This should get the iteration count up pretty quickly. A good way to build up a feel for how this should look is to run it first for say 100 iterations. Then comment out one of your XDECREFs, run it again and see what happens.
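The idea behind this kind of stress run can be sketched in a few lines of Python: call the same code many times and check whether memory keeps growing. This is not the `stress_lowlevel.py` script itself; the helper name `memory_growth` and the two toy functions are made up for the illustration, and `tracemalloc` is used here only because it is in the standard library.

```python
import tracemalloc


def memory_growth(func, iterations=100, warmup=10):
    """Return the change in traced memory (bytes) over `iterations` calls of func."""
    tracemalloc.start()
    try:
        # Warm up first so one-off allocations are not counted as a leak.
        for _ in range(warmup):
            func()
        before, _ = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            func()
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


_leaked = []


def leaky():
    # Stands in for a missing XDECREF: the object is never released.
    _leaked.append(bytearray(10_000))


def clean():
    # The temporary object is freed as soon as the call returns.
    bytearray(10_000)
```

Comparing `memory_growth(clean)` with `memory_growth(leaky)` gives the same kind of contrast as running the stress script before and after commenting out an XDECREF: flat memory in one case, steady growth in the other.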
It wasn't too bad; doing it gave me a good tour through the layers.
At some point we need to draw a line and from then on do everything we can to stay backward compatible. We talked about this being 1.0, which seems to make sense as long as it lines up with how things are being used on the ground. If you think a breaking change would already cause a lot of suffering, then we should start being backward compatible now. I think optional columns and defaults are a separate issue (although linked in design considerations), and certainly one to consider for metadata on all tables. I'll have a play and see what this might look like for edges in the C.
I was thinking about the optional columns thing this morning, and if we align it with the semantics of Anyway, I don't think we should do that here. We can do something like tsk_table_collection_load_indexes for the edge table. In my mind everything we're adding to the file format in terms of metadata is optional, so I'd like to keep the file format version major the same. I've no problem bumping the version number if we do find something we need to change that'll break old code, though. |
0098fe7 to
8ec7142
Compare
@jeromekelleher So I've had a go at making metadata an optional column for edges. Basically the I have only done the C side of things - wanted to get your thoughts before going further. It is probably easier to see these specific changes by just looking at 8ec7142. EDIT: Dammit! Just noticed the one spot I didn't put a new line between declarations and code ;) |
Looks good, thanks @benjeffery. I think we should simplify it a bit on this first pass though, and follow the same logic as the rest of the tables. Metadata/metadata_offset are never NULL, as we set them to default values if they are not specified. If we do the same thing when we read the data in from file, this should minimise the changes. It would be simpler to just do something specific for the edge_table_t (see the bit where we're reading in the indexes, e.g) rather than making a generic optional/non-optional mechanism. However, there's no point in taking out the changes you've made to the rest of the columns as we'll need them soon. Figuring out when we do and don't write out columns is something we can follow up on. I don't think storing NULL is the way to do it though, we should always have valid memory for each of the columns in each table as part of the API contract. (For metadata, we can do something simple like |
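Separately from the truncated remark above, the "never NULL" contract for ragged columns can be sketched in Python. This is an illustration under stated assumptions, not the C implementation: the function names are hypothetical, and the only convention relied on is that a ragged column is stored as a flat buffer plus an offset array with one more entry than there are rows.

```python
def fill_metadata_defaults(num_rows, metadata=None, metadata_offset=None):
    """Return valid (metadata, metadata_offset) columns, never None."""
    if metadata is None and metadata_offset is None:
        # Unspecified metadata: an empty buffer and all-zero offsets,
        # so every row decodes to empty bytes.
        return b"", [0] * (num_rows + 1)
    if metadata is None or metadata_offset is None:
        raise ValueError("metadata and metadata_offset must be given together")
    if len(metadata_offset) != num_rows + 1:
        raise ValueError("metadata_offset must have num_rows + 1 entries")
    return metadata, metadata_offset


def row_metadata(metadata, metadata_offset, row):
    """Slice one row out of the flat buffer; no None check is needed."""
    return metadata[metadata_offset[row]:metadata_offset[row + 1]]
```

Because the defaults are always valid memory, `row_metadata` works the same way whether or not the caller (or the file) supplied any metadata, which is what keeps the rest of the code free of special cases.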
8ec7142 to
5ef26ea
Compare
@jeromekelleher Ok, so this latest push creates an empty |
5ef26ea to
599b51b
Compare
Yes, that's the idea. It's probably more efficient to go through this in person though, so let's talk it through later.
d389c12 to
d5b5b12
Compare
@jeromekelleher I've added the |
jeromekelleher
left a comment
LGTM @benjeffery. One nitpick about indentation and we're good to merge.
Let's do the migrations as a follow-up PR - this is big enough already.
I'm assuming that codecov is being pedantic also, and we've covered everything that we reasonably can?
f7adb73 to
fa6c865
Compare
Thought I'd fixed that indent! Must have run |
Part of the work in #493.
I realise I'm now three levels deep on PR requests, but at ~600 lines this seemed to warrant its own PR....