Optimization with Huge Speedup: Dask Dataframes Instead of Manual Dataframe Iterations #240
selmanozleyen merged 127 commits into main
commit 56888eb Merge: d169628 9068a93 Author: Selman Özleyen <32667648+selmanozleyen@users.noreply.github.com> Date: Fri Jun 13 12:03:07 2025 +0200 Merge pull request #235 from theislab/feature/predict_batch Feature and Speedup: `predict_batch`
commit d169628 Merge: f6cdf17 801adf0 Author: Dominik Klein <dominik.klein@helmholtz-munich.de> Date: Fri Jun 13 11:28:18 2025 +0200 Merge pull request #239 from theislab/feature/ooc-dataloading Feature: Out of Core Dataloading Option
commit 801adf0 Author: selman.ozleyen <syozleyen@gmail.com> Date: Thu Jun 12 15:31:53 2025 +0200 add return types
commit 6d44054 Author: selman.ozleyen <syozleyen@gmail.com> Date: Thu Jun 12 13:56:48 2025 +0200 update error message
commit 3bda128 Author: selman.ozleyen <syozleyen@gmail.com> Date: Thu Jun 12 13:56:20 2025 +0200 document better
commit de3b6ee Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Thu Jun 12 11:51:03 2025 +0000 [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
commit 8f18574 Author: selman.ozleyen <syozleyen@gmail.com> Date: Thu Jun 12 13:50:55 2025 +0200 remove unused fixture
commit 8e808a9 Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Thu Jun 12 11:17:33 2025 +0000 [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
commit 67e64e1 Author: selman.ozleyen <syozleyen@gmail.com> Date: Thu Jun 12 13:17:22 2025 +0200 add tests
commit ab95081 Author: selman.ozleyen <syozleyen@gmail.com> Date: Thu Jun 12 12:08:48 2025 +0200 throw error when there isn't any valid indices
commit 9068a93 Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Thu Jun 12 09:38:43 2025 +0000 [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
commit d65151a Author: selman.ozleyen <syozleyen@gmail.com> Date: Thu Jun 12 11:38:34 2025 +0200 When using PredictionSampler we can't use batched mode
commit bb941ea Author: selman.ozleyen <syozleyen@gmail.com> Date: Thu Jun 12 11:33:45 2025 +0200 clarify documentation that this requires same number of cells
commit 7461dc3 Author: LeonStadelmann <leonstadlmnn@gmail.com> Date: Sun Jun 1 16:07:15 2025 +0200 Fix type error
commit 04566e7 Author: LeonStadelmann <leonstadlmnn@gmail.com> Date: Tue May 27 17:11:02 2025 +0200 Replace tree map
commit 4a9bbcd Merge: 855c5d9 f6cdf17 Author: LeonStadelmann <169152764+LeonStadelmann@users.noreply.github.com> Date: Tue May 27 16:58:19 2025 +0200 Merge branch 'main' into feature/predict_batch
commit f6cdf17 Merge: b4b7c96 70766f5 Author: Dominik Klein <dominik.klein@helmholtz-munich.de> Date: Tue May 27 09:10:57 2025 +0200 Merge pull request #246 from theislab/pre-commit-ci-update-config [pre-commit.ci] pre-commit autoupdate
commit b4b7c96 Merge: 7a69c5d b67804f Author: Dominik Klein <dominik.klein@helmholtz-munich.de> Date: Tue May 27 09:10:36 2025 +0200 Merge pull request #245 from theislab/feature/stochastic_to_cellflow_class pass rng from cellflow.predict
commit 70766f5 Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Mon May 26 18:56:02 2025 +0000 [pre-commit.ci] pre-commit autoupdate updates: - [github.com/biomejs/pre-commit: v2.0.0-beta.4 → v2.0.0-beta.5](biomejs/pre-commit@v2.0.0-beta.4...v2.0.0-beta.5) - [github.com/tox-dev/pyproject-fmt: v2.5.1 → v2.6.0](tox-dev/pyproject-fmt@v2.5.1...v2.6.0) - [github.com/astral-sh/ruff-pre-commit: v0.11.10 → v0.11.11](astral-sh/ruff-pre-commit@v0.11.10...v0.11.11)
commit b67804f Author: Dominik Klein <dominik.klein@helmholtz-munich.de> Date: Mon May 26 11:03:13 2025 +0200 fix callback/
commit b9bc6c2 Author: Dominik Klein <dominik.klein@helmholtz-munich.de> Date: Mon May 26 09:04:25 2025 +0200 pass rng from cellflow.predict
commit 7a69c5d Merge: ce064e3 50a3cc8 Author: Dominik Klein <dominik.klein@helmholtz-munich.de> Date: Fri May 23 17:17:03 2025 +0200 Merge pull request #238 from theislab/pre-commit-ci-update-config [pre-commit.ci] pre-commit autoupdate
commit ce064e3 Merge: 6cb5540 b6c0bdb Author: Dominik Klein <dominik.klein@helmholtz-munich.de> Date: Fri May 23 17:14:53 2025 +0200 Merge pull request #215 from theislab/add/solver_callback Add solver to callback call
commit 6cb5540 Merge: 0dbf90f 83db489 Author: Dominik Klein <dominik.klein@helmholtz-munich.de> Date: Fri May 23 17:05:14 2025 +0200 Merge pull request #233 from theislab/tutorial/combosciplex add combosciplex
commit 0dbf90f Merge: d8ae659 69327d9 Author: Dominik Klein <dominik.klein@helmholtz-munich.de> Date: Fri May 23 17:04:55 2025 +0200 Merge pull request #244 from theislab/tests/find_failing2 deprecation python 3.10
commit 69327d9 Author: Dominik Klein <dominik.klein@helmholtz-munich.de> Date: Fri May 23 15:42:55 2025 +0200 enable all tests
commit 83db489 Merge: 63c4966 d8ae659 Author: Dominik Klein <domin.klein@gmail.com> Date: Fri May 23 15:29:53 2025 +0200 Merge branch 'main' into tutorial/combosciplex
commit 63c4966 Author: Dominik Klein <domin.klein@gmail.com> Date: Fri May 23 15:22:53 2025 +0200 add combosciplex
commit 855c5d9 Author: LeonStadelmann <leonstadlmnn@gmail.com> Date: Fri May 23 14:59:56 2025 +0200 revert duplicate removal
commit 235bc0e Author: LeonStadelmann <leonstadlmnn@gmail.com> Date: Fri May 23 14:03:43 2025 +0200 emove duplicate
commit 24ade1e Author: LeonStadelmann <leonstadlmnn@gmail.com> Date: Fri May 23 14:01:31 2025 +0200 Add testing for genot predict
commit 340b696 Author: Dominik Klein <dominik.klein@helmholtz-munich.de> Date: Fri May 23 12:23:15 2025 +0200 only enable test_cellflow_with_validation
commit 9dfcd18 Author: Dominik Klein <dominik.klein@helmholtz-munich.de> Date: Fri May 23 11:50:27 2025 +0200 skip some tests again
commit f481dac Author: Dominik Klein <dominik.klein@helmholtz-munich.de> Date: Fri May 23 10:38:10 2025 +0200 enable more tests again
commit 6bb6550 Author: Dominik Klein <dominik.klein@helmholtz-munich.de> Date: Fri May 23 10:27:57 2025 +0200 deprecate 3.10
commit 8e8d2ad Author: Dominik Klein <dominik.klein@helmholtz-munich.de> Date: Fri May 23 10:26:24 2025 +0200 deprecate 3.10
commit 11d2f55 Author: Dominik Klein <dominik.klein@helmholtz-munich.de> Date: Fri May 23 10:24:35 2025 +0200 deprecate 3.10
commit 459f435 Author: Dominik Klein <dominik.klein@helmholtz-munich.de> Date: Fri May 23 10:22:14 2025 +0200 enable some more tests
commit 0b56b4c Author: Dominik Klein <dominik.klein@helmholtz-munich.de> Date: Fri May 23 10:05:24 2025 +0200 skip more tests
commit 2e7aac9 Author: Dominik Klein <dominik.klein@helmholtz-munich.de> Date: Fri May 23 09:48:30 2025 +0200 skip more tests
commit d8ae659 Merge: 70eec9a f3c840c Author: Dominik Klein <dominik.klein@helmholtz-munich.de> Date: Thu May 22 16:27:36 2025 +0200 Merge pull request #241 from theislab/fix/tokenattention fix TokenAttention
commit 0da9154 Author: Dominik Klein <dominik.klein@helmholtz-munich.de> Date: Thu May 22 16:25:26 2025 +0200 skip more tests
commit 7de55b3 Author: Dominik Klein <dominik.klein@helmholtz-munich.de> Date: Thu May 22 15:06:36 2025 +0200 skip more tests
commit 6e266a2 Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Thu May 22 11:17:07 2025 +0000 [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
commit 7840c80 Author: Dominik Klein <dominik.klein@helmholtz-munich.de> Date: Thu May 22 13:16:42 2025 +0200 skip gene emb tests
commit f3c840c Author: Dominik Klein <dominik.klein@helmholtz-munich.de> Date: Thu May 22 08:41:30 2025 +0200 update scvi dependency
commit fd76f07 Author: selmanozleyen <syozleyen@gmail.com> Date: Wed May 21 13:15:15 2025 +0200 add genot
commit e97edd2 Author: LeonStadelmann <leonstadlmnn@gmail.com> Date: Wed May 21 12:41:07 2025 +0200 Handle empty input
commit 454734e Author: Dominik Klein <dominik.klein@helmholtz-munich.de> Date: Tue May 20 19:31:39 2025 +0200 revert previous precommit changes
commit 0884a8b Author: Dominik Klein <dominik.klein@helmholtz-munich.de> Date: Tue May 20 17:17:10 2025 +0200 fix tests
commit 186bcaa Author: Dominik Klein <dominik.klein@helmholtz-munich.de> Date: Tue May 20 15:35:46 2025 +0200 fix tests
commit c5c0494 Author: Dominik Klein <dominik.klein@helmholtz-munich.de> Date: Tue May 20 15:30:09 2025 +0200 fix TokenAttention
commit 1e585cd Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Tue May 20 10:46:42 2025 +0000 [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
commit 5dc4a54 Author: selmanozleyen <syozleyen@gmail.com> Date: Tue May 20 12:45:24 2025 +0200 support both dataloaders
commit dd098fa Author: selmanozleyen <syozleyen@gmail.com> Date: Tue May 20 11:21:35 2025 +0200 init
commit 50a3cc8 Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Mon May 19 18:57:43 2025 +0000 [pre-commit.ci] pre-commit autoupdate updates: - [github.com/biomejs/pre-commit: v2.0.0-beta.3 → v2.0.0-beta.4](biomejs/pre-commit@v2.0.0-beta.3...v2.0.0-beta.4) - [github.com/astral-sh/ruff-pre-commit: v0.11.9 → v0.11.10](astral-sh/ruff-pre-commit@v0.11.9...v0.11.10)
commit 2ccb43d Author: LeonStadelmann <leonstadlmnn@gmail.com> Date: Sun May 18 12:33:32 2025 +0200 Add batched predict test
commit 10edee2 Author: LeonStadelmann <leonstadlmnn@gmail.com> Date: Sat May 17 20:11:45 2025 +0200 Adjust docs
commit b6c0bdb Author: LeonStadelmann <leonstadlmnn@gmail.com> Date: Sat May 17 17:42:27 2025 +0200 fix docs
commit 021aff0 Author: selman.ozleyen <syozleyen@gmail.com> Date: Thu May 15 12:42:30 2025 +0200 update implementation so that two functions predict and predict batch are unified
commit 6bdbf5c Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Tue May 13 16:24:30 2025 +0000 [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
commit db6dc69 Author: selman.ozleyen <syozleyen@gmail.com> Date: Tue May 13 18:20:22 2025 +0200 needs testing
commit a3c43d3 Author: selman.ozleyen <syozleyen@gmail.com> Date: Tue May 13 18:00:26 2025 +0200 condition keys left to add
commit 03dfb2b Author: selman.ozleyen <syozleyen@gmail.com> Date: Tue May 13 17:50:41 2025 +0200 init
commit f1e8704 Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Mon May 12 11:24:11 2025 +0000 [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
commit facf52d Author: LeonStadelmann <leonstadlmnn@gmail.com> Date: Mon May 12 13:23:52 2025 +0200 Added source data parameter + stochastic test
commit 04a1d2c Author: Dominik Klein <domin.klein@gmail.com> Date: Fri May 9 13:35:02 2025 +0200 add combosciplex
commit 54e2386 Author: LeonStadelmann <leonstadlmnn@gmail.com> Date: Thu May 1 16:16:21 2025 +0200 Rename flow
commit ac8d073 Merge: 5327a05 4970ad4 Author: LeonStadelmann <leonstadlmnn@gmail.com> Date: Thu May 1 16:14:11 2025 +0200 Merge branch 'main' into add/solver_callback
commit 5327a05 Author: LeonStadelmann <leonstadlmnn@gmail.com> Date: Thu May 1 12:37:09 2025 +0200 Added test for custom callbacks
commit e650031 Author: LeonStadelmann <leonstadlmnn@gmail.com> Date: Thu Apr 24 18:09:57 2025 +0200 Add typing and solver for on_train_end
commit 5e18f6e Author: LeonStadelmann <leonstadlmnn@gmail.com> Date: Mon Apr 21 16:57:33 2025 +0200 Add solver to callback call
This reverts commit 1429cab.
This reverts commit 6f0d747.
This reverts commit d850771.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##             main     #240       +/-   ##
==========================================
- Coverage   81.44%    0.00%   -81.45%
==========================================
  Files          38       38
  Lines        2608     2723     +115
  Branches      331        0     -331
==========================================
- Hits         2124        0    -2124
- Misses        343     2723    +2380
+ Partials      141        0     -141
* save state
* a bit more tidying
* save state
src/cellflow/data/_datamanager.py
Outdated
    @property
    def is_conditional(self) -> bool:
        """Whether the model is conditional."""
        return (len(self._perturbation_covariates) > 0) or (len(self._sample_covariates) > 0)
I think we can just set this to `True` all the time, wdyt?
We can assert `(len(self._perturbation_covariates) > 0) or (len(self._sample_covariates) > 0)` at the beginning and then get rid of it as well.
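The suggestion in this thread (check the covariates once up front, then drop `is_conditional`) could look roughly like the sketch below. `DataManagerSketch` and its constructor arguments are hypothetical stand-ins, not the actual class in `src/cellflow/data/_datamanager.py`; only the two attribute names come from the snippet under review. A `ValueError` is used in place of a bare `assert` so the check survives `python -O`; that swap is an assumption, not something the reviewers asked for.

```python
class DataManagerSketch:
    """Hypothetical stand-in for the data manager; only the covariate check is shown."""

    def __init__(self, perturbation_covariates=None, sample_covariates=None):
        # Attribute names mirror the reviewed snippet; the argument types are assumed.
        self._perturbation_covariates = dict(perturbation_covariates or {})
        self._sample_covariates = list(sample_covariates or [])

        # Validate once at construction time. After this point every instance is
        # conditional, so a separate `is_conditional` property is no longer needed.
        has_covariates = (len(self._perturbation_covariates) > 0) or (
            len(self._sample_covariates) > 0
        )
        if not has_covariates:
            raise ValueError(
                "Expected at least one perturbation covariate or sample covariate."
            )


# Usage: a manager with at least one covariate constructs fine.
dm = DataManagerSketch(perturbation_covariates={"drug": ["drug1"]})
```

Failing early in the constructor keeps the invariant in one place, so downstream code would not need to branch on whether conditions exist.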
    new = dm_new._get_condition_data(
        adata=adata_perturbation.copy(),
    )
    compare_train_data(old, new)
Is this to check whether it gives the same result as it used to?
Yes, I want to be sure for a while that the results are exactly the same, because the code unfortunately isn't very interpretable.
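For readers unfamiliar with this kind of old-versus-new regression check, a minimal sketch follows. It is not the `compare_train_data` helper from the PR, whose implementation is not shown in this conversation; `assert_same_outputs` is a hypothetical name, and the nested dict/list layout of the results is assumed for illustration. Leaves are compared with NumPy's exact array equality.

```python
import numpy as np


def assert_same_outputs(old, new, path="root"):
    """Recursively assert that two nested results are exactly equal.

    Hypothetical helper: dicts are matched by key, lists/tuples by position,
    and everything else is compared as an array with exact equality.
    """
    if isinstance(old, dict):
        assert isinstance(new, dict), f"{path}: expected dict, got {type(new).__name__}"
        assert old.keys() == new.keys(), f"{path}: key mismatch"
        for key in old:
            assert_same_outputs(old[key], new[key], f"{path}.{key}")
    elif isinstance(old, (list, tuple)):
        assert isinstance(new, (list, tuple)), f"{path}: expected a sequence"
        assert len(old) == len(new), f"{path}: length mismatch"
        for i, (o, n) in enumerate(zip(old, new)):
            assert_same_outputs(o, n, f"{path}[{i}]")
    else:
        # np.asarray lets array-likes from the old and new code paths be
        # compared value by value even if their container types differ.
        np.testing.assert_array_equal(np.asarray(old), np.asarray(new), err_msg=path)


# Usage with made-up data: the old and new outputs have the same values.
old_result = {"cell_data": np.arange(6).reshape(2, 3), "masks": [np.array([0, 1])]}
new_result = {"cell_data": np.arange(6).reshape(2, 3), "masks": [np.array([0, 1])]}
assert_same_outputs(old_result, new_result)
```

Passing the `path` along means a failure names the exact nested entry that diverged, which helps when the code under test is hard to reason about directly.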
MUCDK left a comment
Looks great, thanks a lot. Let's just resolve the comments.
Fixes: #236
This shouldn't change the behavior, except that preprocessed arrays are now stored as NumPy arrays.