Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added F statistics and options for different types of tests #14

Open
wants to merge 64 commits into
base: correlation-metric
Choose a base branch
from

Conversation

handwerkerd
Copy link

@handwerkerd handwerkerd commented Feb 15, 2024

Proposes changes to ME-ICA#1021.

I merged my approach using F statistics for the external regressors into this PR.

Changes proposed in this pull request:

  • Added a CLI parameter external_regressor_dict which defines what test to use for fitting external regressors
    • corr_no_detrend.json runs @tsalo's correlation method
    • corr_detrend.json runs @tsalo's correlation method after detrending
    • Fmodel.json run a single F test using all external regressors an adds metrics for F, R2, and p.
    • Mot12_CSF.json is an example of running the F test on a full model and also outputting stats for partial models, like all motion parameters an a CSF ROI
  • Added a new demo decision tree minimal_external3.json which is the same as extern1 but it uses a full F test model with p<0.05 and R2>0.5 (i.e. statistically significant and over half the variance in a component). As expected, this rejects a component that was substantially modeled by two external regressors, but not each separately.
    • Demo trees are now all names demo_ so that we can merge them and make clear they're demonstrations of functionality without recommending immediate usage.
  • external_regressor_fits.py is expanded to parse the above files and calculate the correlation and F metrics. As set up, it will be relatively easy to add more metrics for external regressors.
  • new intregration test for using the F statistic
  • external_regressor_dict is being passed to the component selector class so that a descriptive line is added to the report log.

Stuff left to do:

  • Add unit tests
  • Improve the LGR.info() messages
  • Add documentation for how to use this new functionality
  • Consider if additional information should be saved in the output directory. At minimum we might want to copy the external regressors and external_regressor_dict into the output directory.

This passes all linting and seems fairly stable when I've tried a bunch of different things, but it's still a work in progress. Your choice if you want to merge into your PR so we're working on only one version or wait until this is cleaner. We can discuss on the dev call tomorrow.

tsalo and others added 18 commits August 7, 2023 09:38
…E-ICA#1024)

* Create dependabot.yml

* Drop release drafter in favor of release config.
Bumps [actions/setup-python](https://github.com/actions/setup-python) from 2 to 5.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](actions/setup-python@v2...v5)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/checkout](https://github.com/actions/checkout) from 2 to 4.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v2...v4)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
@tsalo tsalo self-requested a review February 15, 2024 15:02
tsalo and others added 2 commits February 15, 2024 17:36
* Base versions on last CircleCI 3.8 env build.

* Base max versions on 3.12 test.

* Set minimum versions too.

* Update pyproject.toml

Co-authored-by: Dan Handwerker <7406227+handwerkerd@users.noreply.github.com>

---------

Co-authored-by: Dan Handwerker <7406227+handwerkerd@users.noreply.github.com>
Copy link
Owner

@tsalo tsalo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am very much sold on using F-statistics, but I think the regressor definitions should be merged into the decision tree configs. The main concern about putting regressor definitions into decision trees is that the column names may vary by the individual, but I think users can deal with that by either modifying the decision trees' necessary metrics definitions or by writing in regular expressions into the metric definitions.

docs/api.rst Outdated Show resolved Hide resolved
tedana/metrics/collect.py Outdated Show resolved Hide resolved
@@ -0,0 +1,7 @@
{
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still rather merge these configs into the decision tree. Decision tree configs define necessary metrics, so we should be determining what metrics to calculate from them anyway (ME-ICA#921). If you are concerned about column names, we could include regular expressions to capture common cases, or just let users update the "info" to say that they modified the tree to match their column names, but that the didn't change the underlying logic of the tree.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still thinking about this & leaning towards agreeing with you. I think we might still need an option for a regressor column config file, but if a decision tree is looking for specific column classes in order to function, then that info could be in the decision tree json. I'll play around to see how this looks in practice.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now done and works with regular expressions.

tedana/resources/extern_regress_dicts/Fmodel.json Outdated Show resolved Hide resolved
component_table,
cross_component_metrics={},
status_table=None,
external_regressor_dict=None,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the external regressor config needs to be passed in here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason it's passed here is so that the LGR.report line for the regressor model appears along with the tree description. This wouldn't be an issue if the regressor model was in the decision tree json, but I think that would cause other issues... still thinking about this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To move everything into the decision tree json file, I think I'd need move a light version of ME-ICA#969 into this PR. This isn't a bad thing, but the only way to get this to work would be to initialize the component selector and read in the external regressor fields much earlier in the code. There are benefits to this, but it would expand this PR beyond just adding correlations. I might still give it a try, but wanted to bring this up.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

tedana/selection/selection_utils.py Outdated Show resolved Hide resolved
tedana/metrics/external_regressor_fits.py Outdated Show resolved Hide resolved
tedana/metrics/external_regressor_fits.py Outdated Show resolved Hide resolved
return comptable


def make_detrend_regressors(n_time, polort=None):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't some of the largest-variance components typically characterized by drifts? Are there some cases where we'd want to detrend regressors, but other cases where we wouldn't?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this, I think I'm doing something non-ideal here and in my earlier non-tedana version of this analysis. We need to model slow drifts because, they can be a large source of variance and our external regressors will less often model a significant portion of the variance if we don't model the drift. That said, if we're calculating fits based on the full F statistic model then tedana is detrending data by default. This isn't good because a benefit of tedana is it can potentially separate and retain T2*-weight slow drifts in learning or pharmacology studies.

I think the better approach is to output a partial model with all nuissance regressors except the slow drifts and use that in the decision tree. Based on feedback from Logan & Gang Chen, I might also give an option to model task regressors so decision trees might skew towards keeping them.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was confused in my above comment. The Full F test is comparing a model with all the external regressors and the detrending regressors to just the detrending regressors. That is, it's detrending the data and seeing if the remaining regressors model the signal above and beyond what is detrended. As such, this is what I think we want. A component won't be flagged and removed because of a large signal drift unless that componend is flagged for rejection based on kappa/rho criteria. If a drift component is T2* weighted, tedana should not remove it (but a user can remove it separately)

tsalo and others added 8 commits February 16, 2024 15:22
…4.2.0 (ME-ICA#1031)

* Fix empty file

* fix style

* style fix 2

* csv file check

* fix txt when empty and create test for empty files

* Fix style

* Run black

* unused import remove

* Explanation of function and remove unused variable

* Fix No blank lines allowed after function docstring

* use bare 'except'

* ica_reclassify logging init higher

* linting fixes

* fixed a test

* fixed linting and precommit black version

---------

Co-authored-by: martinezeguiluz <mmartinezeguiluz@gmail.com>
* docs: update README.md

* docs: update .all-contributorsrc

---------

Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com>
@handwerkerd
Copy link
Author

I'm going to get back to working on this after ME-ICA#952 is merged (assuming no big concerns with that one). Do you want me to keep working on this PR to your PR or should you just close your PR & I'll open a new one? (Depends on how much I'm just taking the lead on this PR going forward vs we'll both keep contributing).

tsalo and others added 4 commits February 28, 2024 10:41
Getting generate metrics fully running
* Add T2* and S0 figures.

* Add seaborn as a dependency.

* Update static_figures.py

* Fix.

* Update expected outputs.

* Reclassify doesn't need T2*/S0 plots.

* Add png_cmap to T2*/S0 plots.

* Add type hints and fix test outputs.

* Update html_report.py

* Update report_body_template.html

* Try improving the figures.

* Update report_body_template.html

* Drop seaborn.

* Update static_figures.py

* Update static_figures.py

* Update static_figures.py

* Update static_figures.py

* Document the figures.

* Update docs/outputs.rst

Co-authored-by: Dan Handwerker <7406227+handwerkerd@users.noreply.github.com>

* Switch to grayscale colormap.

* Update tedana/reporting/static_figures.py

Co-authored-by: Dan Handwerker <7406227+handwerkerd@users.noreply.github.com>

* Update tedana/reporting/static_figures.py

Co-authored-by: Dan Handwerker <7406227+handwerkerd@users.noreply.github.com>

* Update static_figures.py

* Update figure.

---------

Co-authored-by: Dan Handwerker <7406227+handwerkerd@users.noreply.github.com>
* Improve component plots maybe.

* Update static_figures.py

* Update static_figures.py

* Update static_figures.py

* Update static_figures.py

* Update static_figures.py

* Update static_figures.py

* Update static_figures.py

* Update static_figures.py

* Update static_figures.py

* Put title on top axis.

* Use suptitle.

* Update static_figures.py

* Update static_figures.py

* Update static_figures.py

* Update static_figures.py

* Update static_figures.py

* Add time ticks to time series subplot.

* Update static_figures.py

* Update static_figures.py
@tsalo
Copy link
Owner

tsalo commented Feb 29, 2024

I think I'll change the target for my PR to a new branch on ME-ICA, like we did with doc-tree. That way we can both contribute to it as needed before we merge into main.

dependabot bot and others added 4 commits March 4, 2024 14:05
…E-ICA#1055)

Updates the requirements on [nibabel](https://github.com/nipy/nibabel) to permit the latest version.
- [Release notes](https://github.com/nipy/nibabel/releases)
- [Changelog](https://github.com/nipy/nibabel/blob/master/Changelog)
- [Commits](nipy/nibabel@2.5.1...5.2.1)

---
updated-dependencies:
- dependency-name: nibabel
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
@handwerkerd
Copy link
Author

handwerkerd commented Mar 7, 2024

I've addressed almost all the comments above. The main thing left to text and fix is I'm fairly sure I've messed up how it handles individually named regressors in the partial model. With the new system, I think I'll need to use regex which will require a few more changes.

Based on a suggestion from Logan, I also added an option for Task regressors which are excluded from the other model, but calculated separately, so that it will be possible to protect against rejecting components that follow the task design.

As for branching, when you made a new branch on main I still needed to open a new PR, so it would have been just as easy for me to open a PR from my branch (and give maintainers edit access if they want to make changes).

Also, when the big changes settle down, I'll add tests and documentation.

@handwerkerd
Copy link
Author

@tsalo There's still some work to do, but I've addressed all of your above concerns and I was just about to open a new PR. I intended to merge into https://github.com/ME-ICA/tedana/tree/enh/external-regressors but I ended up creating https://github.com/ME-ICA/tedana/tree/external-regressors instead. Opinions on whether we should keep using the branch I created vs me trying to figure out how to push to the branch you created?

FWIW, the command I used was git push upstream external-regressors

git push upstream enh/external-regressors
outputted

error: src refspec enh/external-regressors does not match any
error: failed to push some refs to 'github.com:ME-ICA/tedana.git'

Either way, we should delete whichever one we aren't using. (There are also a bunch of all-contributors branches that should probably be deleted)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants