Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documentation/tutorial #42

Closed
wants to merge 32 commits into from
Closed

Documentation/tutorial #42

wants to merge 32 commits into from

Conversation

zoepiran
Copy link
Collaborator

@zoepiran zoepiran commented Oct 23, 2021

Hi @Marius1311,
added the tutorials - hope it follows your line of thought.

  • 'FGW-regimes'; wasn't sure how much more to get into it.
  • simulated trajectory;
    (a) showing only the 'true tree' setting - I think it passes the point and this way we can refrain from forcing users only going through tutorial to install Cassipeia (and Gurobi etc.).
    (b) thought it is useful and understandable to use the benchmarking functions (rather than breaking them line by line) hence some of the explanations are maybe not as detailed.

Marius1311 and others added 18 commits August 23, 2021 12:44
* Add basic config files

* Add basic docs

* Add linting CI action
* Add alpha version

* Fix FusedGromow

* [ci skip] Fix FGW

* Fix FGW v2

* Polish a few things

* Add tests

* Add tests, CI included

* Add notebook comparison with gt FGW

* Add docs

* Add numpy to requirements

* [ci skip] Skip 2 jobs to reduce credit usage

* Address comments

* Rename package

* Fix not passing epsilon, add novosparc
* Update the Readme

* [ci skip] Add logo

Co-authored-by: Michal Klein <michal.klein@protonmail.com>
* Print losses as from POT in FGW

* Print also tau
* Fix tau, remove novosparc option

* Remove artifact
* Fix tau, remove novosparc option

* Fix eps/None, update logging, jit=True

* Parametrize test by jitting

* Fix jitting sinkhorn instead GW

* Try fixing test

* Fix not passing kwargs to test
* Fix tau, remove novosparc option

* Fix eps/None, update logging, jit=True

* Add converged property

* Parametrize test by jitting

* Use rtol/atol for FGW convergence, rename n_iters

* Fix typo in test
* Add first comparison of GW with LineageOT

* Add benchmarking notebook

* Update bench notebook

* Add bench notebooks, plotting notebook

* Add first version of tedsim

* Add C.elegans notebook

* Update LOT bench notebook

* Update notebooks

* Remove old plotting notebook

* Fix Geometry, add POT impl.

* Change loops

* Fix typo

* Try out Cassiopeia

* Add LOT barcode notebook plot

* Add LOT barcode notebook plot fix

* Add fitted tree distances notebook

* Fix not propagating eps to regOT

* Create new Geometry

* Fix test
* Add FGW term scaling

* Remove redundant scale passing
* Add random init for FGW

* Fix typing for 3.7
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@zoepiran zoepiran marked this pull request as draft October 23, 2021 18:59
zoepiran and others added 7 commits October 24, 2021 12:31
* Add warning if not converged

* Add check for marginals

* Fix tests

* Update error msg
* Add first comparison of GW with LineageOT

* Add benchmarking notebook

* Update bench notebook

* Add bench notebooks, plotting notebook

* Add first version of tedsim

* Add C.elegans notebook

* Update LOT bench notebook

* Update notebooks

* Remove old plotting notebook

* Fix Geometry, add POT impl.

* Change loops

* Fix typo

* Try out Cassiopeia

* Add LOT barcode notebook plot

* Add LOT barcode notebook plot fix

* Add fitted tree distances notebook

* Add LOT/cassiopeia comparison

* Update LOT bench notebook

* Fix typo

* Update C. elegans notebook v2

* Add convergence notebook

* Update convergence notebook

* Update convergence notebook v2

* Add convergence bifurcation plot

* Add C. elegans plot notebook

* Add C. elegans UMAP

* Use color scaling as in LineageOT

* Update convergence notebooks

* Add working TedSim notebook

* TedSim generate datasets

* Update TedSim ranges

* Add LOT seeded notebok

* Add LOT seeded notebook v2

* Add forgotten scale

* Add C. elegans CellRank skeleton notebook

* Run c elegans analyis

* Add adata creation notebook

* Fix name

Co-authored-by: Marius Lange <marius.lange@t-online.de>
* Add warning if not converged

* Add check for marginals

* Fix tests

* Update error msg

* Fix jitting, FusedGW not passing kwargs
@@ -0,0 +1,509 @@
{
Copy link
Collaborator

@michalk8 michalk8 Nov 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would also enable 64-bit and add a sentence below why we do it (with 32-bit, sometimes the precision wasn't enough).


Reply via ReviewNB

@@ -0,0 +1,509 @@
{
Copy link
Collaborator

@michalk8 michalk8 Nov 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #4.    module_path = os.path.abspath(os.path.join('..'))

The local import seems unnecessary in this notebook, would remove it if that's the case.


Reply via ReviewNB

@@ -0,0 +1,509 @@
{
Copy link
Collaborator

@michalk8 michalk8 Nov 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix typos + maintain consistency (would start sentences capitalized + end with .)


Reply via ReviewNB

@@ -0,0 +1,509 @@
{
Copy link
Collaborator

@michalk8 michalk8 Nov 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add some closing remarks.


Reply via ReviewNB

@@ -0,0 +1,715 @@
{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add link to the paper.

Furthermore, I'd format both notebooks using black (there should be some extension, can't remember its name).


Reply via ReviewNB

@@ -0,0 +1,715 @@
{
Copy link
Collaborator

@michalk8 michalk8 Nov 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would use flow_types instead of integer indices (just like in Aden's notebook)


Reply via ReviewNB

@@ -0,0 +1,715 @@
{
Copy link
Collaborator

@michalk8 michalk8 Nov 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would remove this cell and run below benchmark_moscot(sim, alpha=0, epsilon=epsilon) - just adds extra cell with little info.

Also applied for the other alphas (1, and 0.9).


Reply via ReviewNB

@@ -0,0 +1,715 @@
{
Copy link
Collaborator

@michalk8 michalk8 Nov 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would all params (in this case scale) to one place (e.g. where you define epsilon


Reply via ReviewNB

@@ -0,0 +1,715 @@
{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tbh, not sure how informative this is, because of overplotting/the data.

To my eyes, OT, LOT and us look the same.


Reply via ReviewNB

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree in terms of comparison; hence added the cost but thought it was beneficial to show how one can visualize easily the obtained mappings/. Am open for suggestions for better visuals here.

@@ -0,0 +1,715 @@
{
Copy link
Collaborator

@michalk8 michalk8 Nov 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add some closing remark, summarizing what we've done/learned/...


Reply via ReviewNB

@Marius1311
Copy link
Collaborator

Thanks for the work on this @michalk8 and @zoepiran, and let me know once I should take a look

@zoepiran
Copy link
Collaborator Author

The current version has a after incorporation of @michalk8 comments

@Marius1311
Copy link
Collaborator

Marius1311 commented Nov 15, 2021

Are our results in the simple 1D example correct? In the original POT example, the selling point was that FGW is a best-of-both-worlds scenario: it preserves feature relations while also enforcing structural correspondence (Fig. 2). But that's not what we are showing (Fig. 1 below) - in our results, FGW looks just super similar to GW and the user if left wondering what the added benefit really is. We need to make this example more convincing. Also, at the moment, it's a bit confusing because x and y have different meanings throughout the notebook - they're at the same time the corresponding feature spaces for source and target, while also the x and y axis in the plots we show. I know this is also the case in the original POT tutorial, but I also find this confusing there. We should just improve the naming a bit. Also, I don't get why the features have two different names (a and b) - isn't the whole story here that features can be compared across spaces? So it should be the same feature we're looking at, no? On the other hand, the structural coordinates should have different names, maybe just not x and y, because that get's confusing. Also, when we shot the final couplings, it's a bit hard to relate this to what we had earlier because suddenly, there are i and j, which haven't been introduced before.

Fig. 1 (ours)

Screenshot 2021-11-15 at 19 32 40

Fig. 2 (POT)

Screenshot 2021-11-15 at 19 33 56

@Marius1311
Copy link
Collaborator

For a tutorial to be convincing, it just has to be super obvious - at the moment, we leave a lot of work on the users side to try and figure out what we're trying to sell here. We need to make this much clearer by improving nomenclature, descriptions and basically making sure our results also show a best-of-both-worlds scenario. Also, most users that come won't really understand what these two "extremal regimes" are, we need to add some explanations.

@Marius1311
Copy link
Collaborator

Also we need to give credit to the original POT example, not just to Vayer.

@Marius1311
Copy link
Collaborator

I did some minor edits on the text, but this needs some more work.

@zoepiran
Copy link
Collaborator Author

zoepiran commented Nov 15, 2021

Are our results in the simple 1D example correct? In the original POT example, the selling point was that FGW is a best-of-both-worlds scenario: it preserves feature relations while also enforcing structural correspondence (Fig. 2). But that's not what we are showing (Fig. 1 below) - in our results, FGW looks just super similar to GW and the user if left wondering what the added benefit really is. We need to make this example more convincing. Also, at the moment, it's a bit confusing because x and y have different meanings throughout the notebook - they're at the same time the corresponding feature spaces for source and target, while also the x and y axis in the plots we show. I know this is also the case in the original POT tutorial, but I also find this confusing there. We should just improve the naming a bit. Also, I don't get why the features have two different names (a and b) - isn't the whole story here that features can be compared across spaces? So it should be the same feature we're looking at, no? On the other hand, the structural coordinates should have different names, maybe just not x and y, because that get's confusing. Also, when we shot the final couplings, it's a bit hard to relate this to what we had earlier because suddenly, there are i and j, which haven't been introduced before.

Fig. 1 (ours)

Screenshot 2021-11-15 at 19 32 40

Fig. 2 (POT)

Screenshot 2021-11-15 at 19 33 56

I briefly mentioned this in our previous meeting - I was also concerned with this and it got me to think again about the scaling again ... however in the POT example \alpha = 1e-3. Its my mistake i didn;t add this option here as well.
This is actually an interesting point to discuss with Cuturi - I would love to hear his take on scaling the costs and having \alpha as an actual accountable weight.

Digging more into it there are two ways to go:
[1] showing the results w/ \alpha = 1e-3 (reproducing POT)
image

[2] using our scale_fn leading to:
image

@Marius1311
Copy link
Collaborator

Thanks Zoe! Using our scale_fn, can we also recover a coupling similar to what we get with GW? i.e., is there a value of \alpha where FGW "flips"?

@Marius1311
Copy link
Collaborator

Other than that, did you have a chance to incorporate my suggestions yet? Let me know once you do, happy to go over this again.

@zoepiran
Copy link
Collaborator Author

For a tutorial to be convincing, it just has to be super obvious - at the moment, we leave a lot of work on the users side to try and figure out what we're trying to sell here. We need to make this much clearer by improving nomenclature, descriptions and basically making sure our results also show a best-of-both-worlds scenario. Also, most users that come won't really understand what these two "extremal regimes" are, we need to add some explanations.

I am afraid my main problem is that its not super clear to me what do we add on POT's tutorial here ..

@zoepiran
Copy link
Collaborator Author

I am sure there is; will manually look for it

@Marius1311 Marius1311 added the documentation Improvements or additions to documentation label Jan 24, 2022
@michalk8 michalk8 changed the base branch from dev_old to dev January 31, 2022 22:27
@michalk8 michalk8 closed this Jan 31, 2022
@michalk8
Copy link
Collaborator

@zoepiran I had to close this PR to delete an old branch (old_dev).

@michalk8 michalk8 mentioned this pull request Jan 31, 2022
@michalk8 michalk8 deleted the documentation/tutorial branch March 17, 2022 13:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants