Skip to content

Conversation

@benjeffery
Copy link
Member

Fixes #463

WIP, getting some odd, unrelated test failures. Trying on CI.

@jeromekelleher
Copy link
Member

Might be worth doing a check perf check to see what promoting up to 32 bit before going through the painful stuff? Where should be a quick s/i8/i32 change you could do globally that would be easy.

@benjeffery
Copy link
Member Author

Might be worth doing a check perf check to see what promoting up to 32 bit before going through the painful stuff? Where should be a quick s/i8/i32 change you could do globally that would be easy.

In tracing all the types we also have to change to 32 in map_mutations so doing the lot and then testing that and genotypes for perf.

@benjeffery benjeffery force-pushed the 32bit_genotype branch 5 times, most recently from acb15c8 to 5b501e6 Compare February 3, 2022 17:51
@benjeffery
Copy link
Member Author

@petrelharp @bhaller @molpopgen This PR converts all internal genotype handline to 32bit from 8bit (some bits had optional 32bit which has been removed) Initial testing shows this has no effect on perf, at least when iterating through variants to get the genotypes from python. I still need to do some C tests.
The main effect is that people using ts.genotype_matrix will be using 4x ram, however, generally, this is advised against anyway.
What are your thoughts here? Do you foresee any issue with this change?

@molpopgen
Copy link
Member

I'm fine with this in principle. I'm still using a lot of 8-bit code on my end, but those code paths are separate.

@bhaller
Copy link

bhaller commented Feb 3, 2022

If I understand what's being proposed, this should be fine. We only use vargen_t for crosschecking the tskit data structures against SLiM (i.e., in debug mode) and when loading a .trees file from disk, I believe. In all places where we currently call tsk_vargen_init() we pass TSK_16_BIT_GENOTYPES; we don't use 8-bit. Is 16-bit still going to be supported, or is it also going away in favor of 32-bit? Is the TSK_16_BIT_GENOTYPES flag going to be deprecated, or honored, or just ignored? Anyhow, I could dig into the diffs here to answer my own questions, but this really ought to be no problem for us, if I'm not totally misunderstanding what is at issue.

@benjeffery
Copy link
Member Author

Thanks for the questions @bhaller - under the current proposal the 16bit option will be removed and the flag will be ignored, hence why we want to get this sorted for C API 1.0.

@codecov
Copy link

codecov bot commented Feb 4, 2022

Codecov Report

Merging #2108 (ad1e017) into main (2e21a1b) will decrease coverage by 0.04%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2108      +/-   ##
==========================================
- Coverage   93.37%   93.33%   -0.05%     
==========================================
  Files          27       27              
  Lines       25591    25538      -53     
  Branches     1163     1163              
==========================================
- Hits        23895    23835      -60     
- Misses       1666     1673       +7     
  Partials       30       30              
Flag Coverage Δ
c-tests 92.26% <100.00%> (-0.08%) ⬇️
lwt-tests 89.05% <ø> (ø)
python-c-tests 72.09% <100.00%> (-0.02%) ⬇️
python-tests 98.89% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
c/tskit/tables.c 90.27% <ø> (ø)
c/tskit/genotypes.c 89.74% <100.00%> (-2.84%) ⬇️
c/tskit/haplotype_matching.c 95.07% <100.00%> (ø)
c/tskit/trees.c 94.97% <100.00%> (ø)
python/_tskitmodule.c 90.92% <100.00%> (-0.04%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2e21a1b...ad1e017. Read the comment docs.

@benjeffery
Copy link
Member Author

benjeffery commented Feb 4, 2022

Some testing:
Screenshot from 2022-02-04 10-59-47

Python:

for v in ts.variants():
    v.genotypes

This branch: 129s
Main: 134s

C:

    ret = tsk_treeseq_load(&ts, "perf.ts", 0);
    CU_ASSERT_EQUAL_FATAL(ret, 0);
    ret = tsk_vargen_init(&vargen, &ts, NULL, 0, NULL, 0);
    CU_ASSERT_EQUAL_FATAL(ret, 0);
    while (tsk_vargen_next(&vargen, &var) == 1){
    };

This branch: 312s
Main: 312s

What I don't understand is why the python is so much faster. Must be a mistake right? Looking into it.
EDIT:
Compile flags - running again

@benjeffery
Copy link
Member Author

Ok think I have sorted my issues out.
Looks like the 32bit genotypes branch is faster by 2-3%.

@benjeffery
Copy link
Member Author

No noticeable impact on ram, as you would expect for iterating. Just needs a changelog.

@benjeffery benjeffery marked this pull request as ready for review February 4, 2022 13:08
@bhaller
Copy link

bhaller commented Feb 4, 2022

Thanks for the questions @bhaller - under the current proposal the 16bit option will be removed and the flag will be ignored, hence why we want to get this sorted for C API 1.0.

OK, good to know. Full steam ahead; will be a trivial fix when we decide to merge the next tskit version, I think. Thanks.

@benjeffery
Copy link
Member Author

@jeromekelleher What sort of info did you want from perf?

Here's the summary:
main:

         43,598.48 msec task-clock                #    1.000 CPUs utilized          
               594      context-switches          #    0.014 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
            32,826      page-faults               #    0.753 K/sec                  
   198,588,349,911      cycles                    #    4.555 GHz                    
   215,228,429,541      instructions              #    1.08  insn per cycle         
    19,162,813,346      branches                  #  439.529 M/sec                  
           778,793      branch-misses             #    0.00% of all branches        

      43.606464362 seconds time elapsed

      43.554717000 seconds user
       0.044002000 seconds sys

branch:

         45,034.81 msec task-clock                #    1.000 CPUs utilized          
               130      context-switches          #    0.003 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
            33,219      page-faults               #    0.738 K/sec                  
   205,385,182,502      cycles                    #    4.561 GHz                    
   234,245,790,040      instructions              #    1.14  insn per cycle         
    19,162,553,042      branches                  #  425.505 M/sec                  
           795,300      branch-misses             #    0.00% of all branches        

      45.036363412 seconds time elapsed

      44.991126000 seconds user
       0.043999000 seconds sys

The flame graph from both is almost 100% tsk_variant_update_site

Screenshot from 2022-02-07 12-46-59

@jeromekelleher
Copy link
Member

Looks great! The first ~10 lines of the perf report screen for both is the most informative I think.

@benjeffery
Copy link
Member Author

Cool, here's that for main:

# Overhead  Command  Shared Object      Symbol                                             
# ........  .......  .................  ...................................................
#
    65.86%  a.out    a.out              [.] tsk_variant_update_site
    33.69%  a.out    a.out              [.] tsk_variant_update_genotypes_i8_sample_list
     0.05%  a.out    a.out              [.] tsk_tree_update_sample_lists
     0.03%  a.out    a.out              [.] tsk_tree_insert_edge
     0.02%  a.out    a.out              [.] tsk_tree_insert_branch
     0.02%  a.out    a.out              [.] tsk_tree_remove_branch
     0.02%  a.out    a.out              [.] tsk_tree_remove_edge
     0.02%  a.out    a.out              [.] check_offsets
     0.01%  a.out    libc-2.31.so       [.] __memmove_avx_unaligned_erms
     0.01%  a.out    a.out              [.] tsk_table_collection_check_edge_integrity

and this branch:

# Overhead  Command  Shared Object      Symbol                                             
# ........  .......  .................  ...................................................
#
    63.06%  a.out    a.out              [.] tsk_variant_update_site
    36.32%  a.out    a.out              [.] tsk_variant_update_genotypes_sample_list
     0.06%  a.out    a.out              [.] tsk_tree_update_sample_lists
     0.03%  a.out    a.out              [.] tsk_tree_insert_edge
     0.02%  a.out    a.out              [.] tsk_tree_remove_branch
     0.02%  a.out    a.out              [.] tsk_tree_insert_branch
     0.02%  a.out    [kernel.kallsyms]  [k] gen8_irq_handler
     0.02%  a.out    a.out              [.] tsk_tree_advance
     0.02%  a.out    a.out              [.] check_offsets
     0.02%  a.out    a.out              [.] tsk_tree_remove_edge

Seems to be hardly any difference!

@jeromekelleher
Copy link
Member

OK, so there's a very slight increase in the time spent in tsk_variant_update_genotypes_sample_list, but it's within the range we'd expect for statistical noise.

I'm sold then, let's make the change!

@benjeffery benjeffery force-pushed the 32bit_genotype branch 2 times, most recently from e60a7f4 to 623326e Compare February 23, 2022 16:44
@benjeffery
Copy link
Member Author

@jeromekelleher Need an approving review here for the merge.

@mergify mergify bot merged commit 9955a69 into tskit-dev:main Mar 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[C API] Change vargen genotypes (and elsewhere) to int32?

4 participants