Skip to content

Conversation

@jeromekelleher
Copy link
Member

Seemed simplest to tackle this head on, as it's complicating quite a lot of what we're doing. It should close #527 (there's a test in here to verify it), but we really should get some files in from earlier file minor versions to make sure we're doing the right thing.

Closes #528
Closes #527
Closes #506

Not ready for merging yet though, as it needs some more C test coverage and there's something weird going on in one of the tests that needs figuring out.

@benjeffery
Copy link
Member

This looks great, so much simpler to use set_columns now we're copying out of kastore

I've installed SLiM and confirmed that @petrelharp 's example now works for this branch, and understand how zero edges was happening before (think I should get a if (ret != 0) tattoo as a penance).

@petrelharp
Copy link
Contributor

Oh!! Thanks for the reminder, I think the code I am working on needs a ret !=0 or two yet.

@benjeffery
Copy link
Member

The failing test is fixed by removing TSK_NO_INIT_TABLES in b2e7536

Is there any value in that flag anymore? I can't see it used anywhere else in the code, and now that load_table is using set_columns it really needs an init'd table.

I've also formatted the C to make the CI happy in 8ae100c

@jeromekelleher
Copy link
Member Author

Awesome, thanks @benjeffery! I'll finish this up first thing tomorrow morning.

@codecov
Copy link

codecov bot commented Apr 14, 2020

Codecov Report

Merging #530 into master will increase coverage by 0.48%.
The diff coverage is 81.44%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #530      +/-   ##
==========================================
+ Coverage   86.84%   87.32%   +0.48%     
==========================================
  Files          21       21              
  Lines       15996    16014      +18     
  Branches     3109     3117       +8     
==========================================
+ Hits        13891    13984      +93     
+ Misses       1055     1000      -55     
+ Partials     1050     1030      -20     
Flag Coverage Δ
#c_tests 88.41% <81.44%> (+0.65%) ⬆️
#python_c_tests 90.55% <ø> (ø)
#python_tests 99.18% <ø> (ø)
Impacted Files Coverage Δ
c/tskit/tables.c 79.50% <81.25%> (+1.76%) ⬆️
c/tskit/core.c 90.54% <100.00%> (+0.06%) ⬆️
c/tskit/trees.c 90.86% <0.00%> (+0.11%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update be8c0f3...e63b9ae. Read the comment docs.

@jeromekelleher
Copy link
Member Author

Some more updates here which should push the test coverage in C right up, as well as put some infrastructure in place for making all the "x, x_offset" columns optional. I still need to add some tests for handling the input of bad offsets, but it'll be ready for final review and merge then.

@benjeffery
Copy link
Member

Great, I think making the "both columns" error explicit is a good choice.

Speaking of coverage - is it worth getting the coverage to ignore some of the lines that we can't easily hit like OOMs? I'd like to enforce 100% coverage then.

@jeromekelleher
Copy link
Member Author

Speaking of coverage - is it worth getting the coverage to ignore some of the lines that we can't easily hit like OOMs? I'd like to enforce 100% coverage then.

If there's an easy way to do this, then great. I'm wary about enforcing 100% test coverage as it leads to weird code and tests sometimes, but having more insight into the catchable errors would be good.

It is possible to test all the OOM corner cases (I've done it for kastore) but it's pretty tedious.

@jeromekelleher jeromekelleher force-pushed the copy-all-cols branch 2 times, most recently from 8582b93 to 39d4164 Compare April 14, 2020 19:48
@jeromekelleher jeromekelleher marked this pull request as draft April 14, 2020 19:54
@jeromekelleher jeromekelleher marked this pull request as ready for review April 15, 2020 08:30
@jeromekelleher
Copy link
Member Author

OK, I think this is ready to go. Can you have a look over please @benjeffery?

We probably won't be able to hit the test coverage threshold as there were a few incidental changes that have a lot of uncatchable errors, but I think we're covering the file format stuff pretty well now.

I hope you don't mind me squashing your "fixup!" commits, but I figured you wouldn't since they were marked as fixups.

Copy link
Member

@benjeffery benjeffery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Just a couple of questions/suggestions.

Comment on lines +7028 to +7060
if (ret != 0) {
goto out;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These lines have no effect. Is their aim to prevent an unchecked ret if code is added after at a later date? The pattern in the rest of the file is not to have these, but I can see an argument for them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glad you brought this up! I was thinking about this, and I think we should probably have explicit ret != 0 checks after every call even if it's at the end of the function. There's two good reasons for this:

  1. As you say, if code gets added later we won't accidentally skip the check (and it's really easy to miss this in review, as the diffs often won't show enough context to spot the problem)
  2. Also it helps us with figuring out where test coverage is missing. If we're testing error conditions on every other call except the last one, we could miss testing some important error conditions.

So, even though it's obviously stupid, I think it's the right thing to do. It might result in a tiny bit more code (but maybe not: the compiler might spot that it's redundant), but it's not important. Since you're in favour, I'm going to loop back and make the change on all the code I've just updated now.

c/tskit/tables.c Outdated
Comment on lines 7095 to 7133
/* Any read errors will have already happened so we ignore any errors here. */
kastore_close(&store);
return ret;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite get this - in the case where there have been no read errors we're ignoring errors from kastore_close, which can still return errors, even when the mode is KAS_READ.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yes. You're right, I'll update.


int TSK_WARN_UNUSED
tsk_table_collection_init(tsk_table_collection_t *self, tsk_flags_t options)
tsk_table_collection_init(tsk_table_collection_t *self, tsk_flags_t TSK_UNUSED(options))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the method signature kept the same for backwards compatibility? Or in case we need flags in future?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both maybe, but more the latter.

ret = TSK_REQUIRED_COL_NOT_FOUND;
ret = TSK_ERR_REQUIRED_COL_NOT_FOUND;
goto out;
} else {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benjeffery, I changed the semantics here slightly because we were getting inconsistent behaviour when the first column in a list was optional (so, everything else was expected to have 0 rows). Since we're already setting default values for the incoming columns, it seemed best to just use those values as the indicator and not to use the lengths at all. That's why we're now checking to see if metadata schema was set, rather than passing NULL, -1 through to set_metadata_schema.

Copy link
Member Author

@jeromekelleher jeromekelleher Apr 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like read_table_cols function though, it's too tricky. I think #537 will make things a lot simpler.

Slightly changes the error semantics when loading in malformed files to
be more consistent.
@jeromekelleher jeromekelleher merged commit ecfc5ad into tskit-dev:master Apr 16, 2020
@jeromekelleher jeromekelleher deleted the copy-all-cols branch April 16, 2020 08:53
@petrelharp
Copy link
Contributor

Fantastic! :yay:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[C API] Missing error check on table_x_copy tree sequence with edges has no edges on load [C API] Copy columns out of kastore

3 participants