Do not use kastore memory for columns. #530

jeromekelleher · 2020-04-11T17:33:59Z

Seemed simplest to tackle this head on, as it's complicating quite a lot of what we're doing. It should close #527 (there's a test in here to verify it), but we really should get some files in from earlier file minor versions to make sure we're doing the right thing.

Closes #528
Closes #527
Closes #506

Not ready for merging yet though, as it needs some more C test coverage and there's something weird going on in one of the tests that needs figuring out.

benjeffery · 2020-04-11T22:40:42Z

This looks great, so much simpler to use set_columns now we're copying out of kastore

I've installed SLiM and confirmed that @petrelharp 's example now works for this branch, and understand how zero edges was happening before (think I should get a if (ret != 0) tattoo as a penance).

petrelharp · 2020-04-11T22:46:13Z

Oh!! Thanks for the reminder, I think the code I am working on needs a ret !=0 or two yet.

benjeffery · 2020-04-11T23:08:24Z

The failing test is fixed by removing TSK_NO_INIT_TABLES in b2e7536

Is there any value in that flag anymore? I can't see it used anywhere else in the code, and now that load_table is using set_columns it really needs an init'd table.

I've also formatted the C to make the CI happy in 8ae100c

jeromekelleher · 2020-04-13T16:21:36Z

Awesome, thanks @benjeffery! I'll finish this up first thing tomorrow morning.

codecov · 2020-04-14T11:36:43Z

Codecov Report

Merging #530 into master will increase coverage by 0.48%.
The diff coverage is 81.44%.

@@            Coverage Diff             @@
##           master     #530      +/-   ##
==========================================
+ Coverage   86.84%   87.32%   +0.48%     
==========================================
  Files          21       21              
  Lines       15996    16014      +18     
  Branches     3109     3117       +8     
==========================================
+ Hits        13891    13984      +93     
+ Misses       1055     1000      -55     
+ Partials     1050     1030      -20

Flag	Coverage Δ
#c_tests	`88.41% <81.44%> (+0.65%)`	⬆️
#python_c_tests	`90.55% <ø> (ø)`
#python_tests	`99.18% <ø> (ø)`

Impacted Files	Coverage Δ
c/tskit/tables.c	`79.50% <81.25%> (+1.76%)`	⬆️
c/tskit/core.c	`90.54% <100.00%> (+0.06%)`	⬆️
c/tskit/trees.c	`90.86% <0.00%> (+0.11%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update be8c0f3...e63b9ae. Read the comment docs.

jeromekelleher · 2020-04-14T17:01:39Z

Some more updates here which should push the test coverage in C right up, as well as put some infrastructure in place for making all the "x, x_offset" columns optional. I still need to add some tests for handling the input of bad offsets, but it'll be ready for final review and merge then.

benjeffery · 2020-04-14T17:10:38Z

Great, I think making the "both columns" error explicit is a good choice.

Speaking of coverage - is it worth getting the coverage to ignore some of the lines that we can't easily hit like OOMs? I'd like to enforce 100% coverage then.

jeromekelleher · 2020-04-14T17:38:57Z

Speaking of coverage - is it worth getting the coverage to ignore some of the lines that we can't easily hit like OOMs? I'd like to enforce 100% coverage then.

If there's an easy way to do this, then great. I'm wary about enforcing 100% test coverage as it leads to weird code and tests sometimes, but having more insight into the catchable errors would be good.

It is possible to test all the OOM corner cases (I've done it for kastore) but it's pretty tedious.

c/tskit/tables.c

jeromekelleher · 2020-04-15T09:08:07Z

OK, I think this is ready to go. Can you have a look over please @benjeffery?

We probably won't be able to hit the test coverage threshold as there were a few incidental changes that have a lot of uncatchable errors, but I think we're covering the file format stuff pretty well now.

I hope you don't mind me squashing your "fixup!" commits, but I figured you wouldn't since they were marked as fixups.

benjeffery

Awesome! Just a couple of questions/suggestions.

benjeffery · 2020-04-16T00:04:17Z

c/tskit/tables.c

+        if (ret != 0) {
+            goto out;
+        }


These lines have no effect. Is their aim to prevent an unchecked ret if code is added after at a later date? The pattern in the rest of the file is not to have these, but I can see an argument for them.

I'm glad you brought this up! I was thinking about this, and I think we should probably have explicit ret != 0 checks after every call even if it's at the end of the function. There's two good reasons for this:

As you say, if code gets added later we won't accidentally skip the check (and it's really easy to miss this in review, as the diffs often won't show enough context to spot the problem)

Also it helps us with figuring out where test coverage is missing. If we're testing error conditions on every other call except the last one, we could miss testing some important error conditions.

So, even though it's obviously stupid, I think it's the right thing to do. It might result in a tiny bit more code (but maybe not: the compiler might spot that it's redundant), but it's not important. Since you're in favour, I'm going to loop back and make the change on all the code I've just updated now.

c/tskit/tables.c

benjeffery · 2020-04-16T00:15:17Z

c/tskit/tables.c

+    /* Any read errors will have already happened so we ignore any errors here. */
+    kastore_close(&store);
    return ret;


I don't quite get this - in the case where there have been no read errors we're ignoring errors from kastore_close, which can still return errors, even when the mode is KAS_READ.

Hmm, yes. You're right, I'll update.

c/tskit/tables.c

benjeffery · 2020-04-16T00:47:24Z

c/tskit/tables.c


 int TSK_WARN_UNUSED
-tsk_table_collection_init(tsk_table_collection_t *self, tsk_flags_t options)
+tsk_table_collection_init(tsk_table_collection_t *self, tsk_flags_t TSK_UNUSED(options))


Is the method signature kept the same for backwards compatibility? Or in case we need flags in future?

Both maybe, but more the latter.

Closes tskit-dev#536 Closes tskit-dev#528 Closes tskit-dev#527 Closes tskit-dev#506

jeromekelleher · 2020-04-16T08:15:12Z

c/tskit/tables.c

-            ret = TSK_REQUIRED_COL_NOT_FOUND;
+            ret = TSK_ERR_REQUIRED_COL_NOT_FOUND;
            goto out;
-        } else {


@benjeffery, I changed the semantics here slightly because we were getting inconsistent behaviour when the first column in a list was optional (so, everything else was expected to have 0 rows). Since we're already setting default values for the incoming columns, it seemed best to just use those values as the indicator and not to use the lengths at all. That's why we're now checking to see if metadata schema was set, rather than passing NULL, -1 through to set_metadata_schema.

I don't like read_table_cols function though, it's too tricky. I think #537 will make things a lot simpler.

Slightly changes the error semantics when loading in malformed files to be more consistent.

petrelharp · 2020-04-16T13:36:34Z

Fantastic! :yay:

jeromekelleher requested a review from benjeffery April 11, 2020 17:36

jeromekelleher force-pushed the copy-all-cols branch from 06aa379 to f46e920 Compare April 14, 2020 16:59

jeromekelleher force-pushed the copy-all-cols branch 2 times, most recently from 8582b93 to 39d4164 Compare April 14, 2020 19:48

jeromekelleher commented Apr 14, 2020

View reviewed changes

c/tskit/tables.c Outdated Show resolved Hide resolved

jeromekelleher marked this pull request as draft April 14, 2020 19:54

jeromekelleher marked this pull request as ready for review April 15, 2020 08:30

jeromekelleher force-pushed the copy-all-cols branch from d2daf56 to c242f31 Compare April 15, 2020 09:05

jeromekelleher mentioned this pull request Apr 15, 2020

[File format] Make all ragged column pairs optional on load #537

Closed

benjeffery reviewed Apr 16, 2020

View reviewed changes

Do not use kastore memory for columns.

a4a7220

Closes tskit-dev#536 Closes tskit-dev#528 Closes tskit-dev#527 Closes tskit-dev#506

jeromekelleher force-pushed the copy-all-cols branch from d08b9ff to d482151 Compare April 16, 2020 08:10

jeromekelleher commented Apr 16, 2020

View reviewed changes

Split file format tests to new file.

e63b9ae

Slightly changes the error semantics when loading in malformed files to be more consistent.

jeromekelleher force-pushed the copy-all-cols branch from d482151 to e63b9ae Compare April 16, 2020 08:25

jeromekelleher merged commit ecfc5ad into tskit-dev:master Apr 16, 2020

jeromekelleher deleted the copy-all-cols branch April 16, 2020 08:53

Do not use kastore memory for columns. #530

Do not use kastore memory for columns. #530

Uh oh!

Conversation

jeromekelleher commented Apr 11, 2020

Uh oh!

benjeffery commented Apr 11, 2020

Uh oh!

petrelharp commented Apr 11, 2020

Uh oh!

benjeffery commented Apr 11, 2020

Uh oh!

jeromekelleher commented Apr 13, 2020

Uh oh!

codecov bot commented Apr 14, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

jeromekelleher commented Apr 14, 2020

Uh oh!

benjeffery commented Apr 14, 2020

Uh oh!

jeromekelleher commented Apr 14, 2020

Uh oh!

Uh oh!

jeromekelleher commented Apr 15, 2020

Uh oh!

benjeffery left a comment

Choose a reason for hiding this comment

Uh oh!

benjeffery Apr 16, 2020

Choose a reason for hiding this comment

Uh oh!

jeromekelleher Apr 16, 2020

Choose a reason for hiding this comment

Uh oh!

Uh oh!

benjeffery Apr 16, 2020

Choose a reason for hiding this comment

Uh oh!

jeromekelleher Apr 16, 2020

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

benjeffery Apr 16, 2020

Choose a reason for hiding this comment

Uh oh!

petrelharp Apr 16, 2020

Choose a reason for hiding this comment

Uh oh!

jeromekelleher Apr 16, 2020

Choose a reason for hiding this comment

Uh oh!

jeromekelleher Apr 16, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

petrelharp commented Apr 16, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codecov bot commented Apr 14, 2020 •

edited

Loading

jeromekelleher Apr 16, 2020 •

edited

Loading