Skip to content

Conversation

@castedo
Copy link
Member

@castedo castedo commented Oct 27, 2020

Builds on top of #939 and final PR for #844.

Consitfies more or less everything else except haplotype_matching.h. Wasn't clear whether that header is used enough to be worth constifying. I've barely read it.

This the final PR for where a few final touches in trees.h cause a bunch of changes across many files.

@AdminBot-tskit
Copy link
Collaborator

📖 Docs for this PR can be previewed here

@castedo castedo force-pushed the pr/const_c_api branch 2 times, most recently from 610f23b to 64a6525 Compare October 27, 2020 01:20
@codecov
Copy link

codecov bot commented Oct 27, 2020

Codecov Report

Merging #940 into main will increase coverage by 0.00%.
The diff coverage is 97.29%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #940   +/-   ##
=======================================
  Coverage   93.40%   93.40%           
=======================================
  Files          25       25           
  Lines       20357    20362    +5     
  Branches      825      825           
=======================================
+ Hits        19014    19019    +5     
  Misses       1306     1306           
  Partials       37       37           
Impacted Files Coverage Δ
c/tskit/genotypes.c 94.66% <94.44%> (+0.06%) ⬆️
c/tskit/convert.c 92.04% <100.00%> (ø)
c/tskit/haplotype_matching.c 95.09% <100.00%> (ø)
c/tskit/stats.c 85.15% <100.00%> (ø)
c/tskit/tables.c 90.63% <100.00%> (ø)
c/tskit/trees.c 94.75% <100.00%> (ø)
_tskitmodule.c 90.21% <0.00%> (+<0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bd5eb19...8aaf2b2. Read the comment docs.

@castedo castedo marked this pull request as draft October 27, 2020 15:41
@castedo castedo mentioned this pull request Oct 28, 2020
@castedo castedo force-pushed the pr/const_c_api branch 2 times, most recently from 93db149 to d6e1f3a Compare November 2, 2020 21:47
@castedo
Copy link
Member Author

castedo commented Nov 2, 2020

Congrats on reaching the last of 4 commits to constify almost all of tskit! Here's some high level commentary.

Firstly, I have not tried to const literally everything in tskit. As mentioned I've completely skipped haplotype_matching.h/c. Most everything in all the other files has been constified. Everything in the "tree API" is now const correct. But surely there are some other smaller stuctures and functions which I have skipped. I was mainly working towards full constification of the tree API and was opportunistic about other code if it looked relatively easy.

So, more so than the other commits, this one touches more files even though it's less lines changed. Until now, everything I've changed I doubt actually changed what machine code is compiled in any non-trivial way. It's mostly all been 'compiler type sugar'.

But in this commit there are two exceptions.

  1. a function in test_stats.c needed a minor logic change which I don't think adversely affects the test.

  2. The tsk_vargen_t type in genotypes.h has a variable change and tsk_vargen_init() in genotypes.c is implemented differently.

The old code has bool sample_index_map_allocated to track whether tsk_id_t *sample_index_map points to the one inside the tree sequence or whether it's a copy of the alternative sample index map that is optionally passed to tsk_vargen_init. With const correct typing this is problematic because a pointer to the sample index map inside the tree sequence should be const. And trying to use tsk_safe_free with a const pointer and strict const warnings turning into errors opens a can of worms.

So I went with the solution of dropping the bool and have two separate pointers to a sample index map. One is a const pointer which always points to the sample index map to use and the other pointer is optional and is what needs to get freed if it is there. In the process of making this change, I went I ahead and split out some init logic into a separate function.

@castedo castedo marked this pull request as ready for review November 2, 2020 21:50
Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks again for all the hard work here @castedo. One minor comment above.

@castedo
Copy link
Member Author

castedo commented Nov 3, 2020

All resolved. Good to merge.

Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @castedo!

    - stats.{h,c}
    - newick tree converter
    - tsk_vargen_t
    - kc_distance and tsk_diff_iter
@mergify mergify bot merged commit 1c6ad9c into tskit-dev:main Nov 3, 2020
@castedo
Copy link
Member Author

castedo commented Nov 3, 2020

@jeromekelleher and @benjeffery, so now that this is finally all merged in. I'm curious what worked well or not so well as far as having lots of branches?

Definitely one take away is to only have ONE branch marked as non-draft when there is a chain of dependent branches/PR.

Did this feel like too many small merges? I was tilting towards lots of small PRs as an experiment.

With a chain of changes like this, was it worth it to have the "future" branches available as GitHub PRs rather than just topic branchs that can mentioned and then "manually" diffed?

One take way I'm thinking of following is that there is never more than one draft PR among the "future" branchs. So even though there might be a logical chain of 3+ branches, one is non-draft read-to-review, the next one is draft, and all further one exist as non-PR topic branches that are in-waiting and can be mentioned/linked to, but aren't yet github PRs.

@benjeffery
Copy link
Member

I think it worked better than having one huge PR. Was managing the branches annoying? Multiple open PRs isn't an issue as long as they are ready to discuss.

@jeromekelleher
Copy link
Member

From the maintainers perspective having a sequence of PRs is infinitely better than one giant one. I didn't really pay attention to how you were managing the sequencing @castedo, I just focused on the sequentially-ready ones. This worked well for me.

@castedo
Copy link
Member Author

castedo commented Nov 11, 2020

@benjeffery

Was managing the branches annoying?

I did not find managing multiple non-rebased "topic" branches annoying. However I did find attempting to have multiple rebased pr/... branches corresponding to each topic branch, all at the same time, was too much.

Another thing I discovered is that my git-prepr script does not gracefully handle mergebot doing a force push of a pr/... branch. I'm probably going to try and make a helper script on top of git-isograft to better handle that scenario.

So my take-away from this experiment is the following:

Assuming one wants to preserve git history for a topic branch (it's easier to not), manage the topic branch with this life-cycle:

  1. create topic branch named, say, foobar (and don't bother immediately creating a pr/foobar branch)
  2. once there is reason to discuss, create draft PR and branch pr/foobar that just follows foobar (and don't bother immediately rebasing-squashing-grafting pr/foobar)
  3. once the PR is non-draft and next to be merged, then rebase-squash-graft pr/foobar, a la git-prepr
  4. use TBD helper script to clean up after mergebot does a forced push of the pr/foobar branch

That last step 6) is needed so that all other topic branches depending on foobar get handled properly with git merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants