
Speed upgrade - Refactor generate network #152

Merged — 99 commits merged into master from refactor_generate_network on Jun 22, 2022

Conversation

@zktuong (Owner) commented on Jun 7, 2022

Bug fixes and Improvements

  • Speed up generate_network
    • pairwise Hamming distances are now calculated per clone/clonotype, and only when more than 1 cell is assigned to that clone/clonotype.
    • the .distance slot is removed; the distance information is now stored in, and converted directly from, the .graph slot.
    • new options:
      • compute_layout: bool = True. If the dataset is too large, compute_layout can be switched to False, in which case only the networkx graph is returned. The data can still be visualised later with scirpy's plotting method (see the usage sketch after this list).
      • layout_method: Literal['sfdp', 'mod_fr'] = 'sfdp'. The new default uses the ultra-fast, C++-implemented sfdp_layout algorithm from graph-tool to generate the final layout. sfdp stands for Scalable Force Directed Placement.
        • A minor caveat is that the repulsion is not as good: when there are a lot of singleton nodes, they do not separate well unless you work out which sfdp_layout parameters to tweak for an effective separation; changing gamma alone does not seem to do much.
        • The original layout can still be generated by specifying layout_method = 'mod_fr'. graph-tool requires a separate installation via conda (not managed by pip) as it has several C++ dependencies.
        • pytest on macOS may also stall because a different backend is called; this is solved by making the tests that call generate_network run last.
    • added steps to reduce memory hogging.
    • min_size was previously doing the opposite of what was intended; this is now fixed. [BUG] min_size in generate_network #155
  • Speed up transfer
  • Fix [BUG] allow manual paths for germline #154
    • reorder the if-else statements.
  • Speed up filter_contigs
    • tree construction is simplified by replacing for-loops with dictionary updates.
  • Speed up initialise_metadata. Dandelion should now initialise and read faster.
    • Removed an unnecessary data sanitization step when loading data.
    • load_data now renames umi_count to duplicate_count.
    • Speed up Query
      • tree construction is simplified by replacing for-loops with dictionary updates.
      • the AIRR validator is no longer used as it slows things down.
  • data initialised by Dandelion is now ordered by productive status first, followed by umi count (largest to smallest).
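
A minimal usage sketch of the new generate_network options, assuming the usual dandelion workflow where vdj is a Dandelion object and adata is the matching AnnData; exact module paths may differ slightly:

```python
import dandelion as ddl

# Default: fast layout via graph-tool's sfdp_layout.
ddl.tl.generate_network(vdj)

# Very large dataset: skip the layout and only build the networkx graph.
# The network can still be visualised later, e.g. with scirpy's plotting
# functions after transferring the results to AnnData.
ddl.tl.generate_network(vdj, compute_layout=False)
ddl.tl.transfer(adata, vdj)

# Fall back to the original modified Fruchterman-Reingold layout.
ddl.tl.generate_network(vdj, layout_method="mod_fr")
```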

Breaking Changes

  • initialise_metadata/update_metadata/Dandelion
    • For-loops to initialise the object have been vectorized, resulting in a minor speed upgrade.
    • This results in the removal of some columns in the .metadata which were probably bloated and not used.
      • vdj_status and vdj_status_summary removed and replaced with rearrangement_VDJ_status and rearrange_VJ_status
      • constant_status and constant_summary removed and replaced with constant_VDJ_status and constant_VJ_status.
      • productive and productive_summary combined and replaced with productive_status.
      • locus_status and locus_status_summary combined and replaced with locus_status.
      • isotype_summary replaced with isotype_status.
  • Values that were previously unassigned or '' are now the string 'None' in .metadata (see the sketch after this list).
    • They are not changed to NoneType as there is quite a bit of internal text processing that gets messed up if swapped.
    • No_contig will still be populated after transfer to AnnData to reflect cells with no TCR/BCR info.
  • deprecate use of nxviz<0.7.4
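
Because unassigned values in .metadata are now the string 'None' rather than NoneType or '', downstream code should filter with a string comparison. A minimal sketch, assuming vdj is a Dandelion object with the updated .metadata (column name taken from the list above):

```python
# Cells without an assigned isotype carry the string "None" (not NaN/None),
# so use a plain string comparison rather than .isna().
has_isotype = vdj.metadata["isotype_status"] != "None"
print(vdj.metadata.loc[has_isotype, "isotype_status"].value_counts())
```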

Minor changes

  • Rename and deprecate read_h5/write_h5. Use of read_h5ddl/write_h5ddl will be enforced in the next update.
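
For example (a minimal sketch; the writer is assumed to be exposed as a method on the Dandelion object, mirroring the old write_h5, and the filename is purely illustrative):

```python
import dandelion as ddl

# Save and reload a Dandelion object with the new names;
# read_h5/write_h5 still work for now, but read_h5ddl/write_h5ddl
# will be enforced in the next update.
vdj.write_h5ddl("dandelion_results.h5ddl")
vdj = ddl.read_h5ddl("dandelion_results.h5ddl")
```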

@codecov — codecov bot commented on Jun 7, 2022

Codecov Report

Merging #152 (8a20407) into master (5dbd1ab) will increase coverage by 5.61%.
The diff coverage is 85.79%.

@@            Coverage Diff             @@
##           master     #152      +/-   ##
==========================================
+ Coverage   73.41%   79.03%   +5.61%     
==========================================
  Files          22       44      +22     
  Lines        5748     7245    +1497     
==========================================
+ Hits         4220     5726    +1506     
+ Misses       1528     1519       -9     
Flag        Coverage Δ
unittests   79.03% <85.79%> (+5.61%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files                                        Coverage Δ
dandelion/logging/_badge.py                           100.00% <ø> (ø)
dandelion/preprocessing/_preprocessing.py             63.87% <ø> (-0.85%) ⬇️
tests/fixtures/fixtures.py                            100.00% <ø> (ø)
tests/fixtures/fixtures_mouse.py                      100.00% <ø> (ø)
dandelion/tools/_gini.py                              87.50% <60.00%> (+0.54%) ⬆️
dandelion/plotting/_plotting.py                       63.77% <64.39%> (-1.58%) ⬇️
dandelion/preprocessing/external/_preprocessing.py    65.28% <71.00%> (+0.14%) ⬆️
dandelion/tools/_network.py                           68.73% <74.81%> (+2.87%) ⬆️
dandelion/logging/_metadata.py                        80.00% <75.00%> (+5.00%) ⬆️
dandelion/tools/_tools.py                             81.15% <75.00%> (-1.02%) ⬇️
... and 58 more

@zktuong zktuong marked this pull request as draft June 7, 2022 17:38
@zktuong zktuong closed this Jun 7, 2022
@zktuong zktuong reopened this Jun 7, 2022

@zktuong zktuong linked an issue Jun 9, 2022 that may be closed by this pull request
@zktuong zktuong merged commit 7f38a03 into master Jun 22, 2022
@zktuong zktuong deleted the refactor_generate_network branch June 22, 2022 09:40