Harmonize feature type detection #701

Lilly-May · 2024-04-20T15:47:46Z

Description of feature

We currently have three issues (#637, #649, #662) dealing with how we determine feature types and the downstream problems that arise due to different approaches. I'll provide a summary of these issues and associated ToDos here, while closing the other three issues.

Problem:
We've introduced PR #697 to accurately determine feature types in ehrapy. With the new method ep.ad.infer_feature_types, feature types are guessed based on predefined rules, and the user is prompted to review these annotations. Currently, feature types are determined inconsistently at multiple stages and stored in several places in the AnnData object. Ideally, ehrapy would use ep.ad.infer_feature_types exclusively for feature annotation, eliminating guesswork in downstream analyses. This means that the new method would become part of the standard preprocessing steps.
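The sketch below illustrates the proposed workflow (guess types once with predefined rules, show them for review, keep them in one place) in plain Python. It does not call ehrapy: `infer_feature_types` here is a hypothetical stand-in for ep.ad.infer_feature_types, the returned dict stands in for a single annotation column in adata.var, and the rules (non-boolean numbers are `numeric`, everything else is `categorical`) are assumptions chosen for illustration, not ehrapy's actual rules.

```python
def infer_feature_types(columns: dict[str, list]) -> dict[str, str]:
    """Guess one type per column using simple, predefined rules (hypothetical).

    Assumed rules for this sketch:
    - every non-missing value is an int or float (but not a bool) -> "numeric"
    - anything else (strings, booleans, empty columns)             -> "categorical"
    """
    types = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]  # ignore missing values
        is_numeric = bool(present) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
        )
        types[name] = "numeric" if is_numeric else "categorical"
    return types


# Infer once during preprocessing and keep the result in one place,
# analogous to a single annotation column in adata.var.
columns = {
    "age": [34, 51, None, 72],
    "smoker": ["yes", "no", "no", "yes"],
    "icu_admission": [0, 1, 1, 0],
}
feature_types = infer_feature_types(columns)

# Show the guesses so the user can review them.
for name, ftype in feature_types.items():
    print(f"{name}: {ftype}")
```

With these rules the 0/1 column `icu_admission` is annotated `numeric`, which is exactly the kind of guess a user may want to review and correct.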

ToDos:


eroell · 2024-04-22T07:48:54Z

Nice, thanks for the summary and for bringing this together here!

Ideally, we would get rid of the adata.var[EHRAPY_TYPE_KEY] annotation entirely and, if autodetect is set to True in the encoding method, base the identification of features to encode on the annotation from ep.ad.infer_feature_types.
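A minimal sketch of an encoding step that relies on a stored annotation instead of re-guessing types, in plain Python. `one_hot_encode` and the `feature_types` dict are hypothetical stand-ins (for an encoding function with autodetect enabled and for a type annotation kept in adata.var); one-hot encoding is used only as an example encoding.

```python
def one_hot_encode(columns: dict[str, list], feature_types: dict[str, str]) -> dict[str, list]:
    """Expand every column annotated "categorical" into 0/1 indicator columns.

    Columns annotated otherwise, or missing from the annotation, pass through
    unchanged. Missing values (None) get 0 in every indicator column.
    """
    encoded = {}
    for name, values in columns.items():
        if feature_types.get(name) != "categorical":
            encoded[name] = list(values)
            continue
        # One indicator column per observed category, in a stable order.
        for category in sorted({v for v in values if v is not None}, key=str):
            encoded[f"{name}_{category}"] = [int(v == category) for v in values]
    return encoded


columns = {"age": [34, 51, 72], "smoker": ["yes", "no", "yes"]}
feature_types = {"age": "numeric", "smoker": "categorical"}  # stored annotation
print(one_hot_encode(columns, feature_types))
# {'age': [34, 51, 72], 'smoker_no': [0, 1, 0], 'smoker_yes': [1, 0, 1]}
```

The encoder never inspects values to decide which features to encode; it only reads the annotation, so whatever the user approved during review is what gets encoded.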

This is not what #697 currently suggests, right?

Also, I think this could lead to some hard-to-resolve issues if the type is not stored but inferred anew on every call. For example, labels are often encoded as True/False, 0/1, or yes/no. In the 0/1 case, type inference will likely infer the feature to be continuous, yet users will sometimes want 0/1 treated as categorical and sometimes as continuous.

-> Users will certainly want to switch the annotated type sometimes, which wouldn't be possible with on-the-fly type inference.
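To make this concern concrete, here is a hypothetical sketch contrasting a stored annotation with on-the-fly inference for a 0/1 label. `infer_type` uses an assumed rule (non-boolean numbers are `numeric`) and is not ehrapy's implementation.

```python
def infer_type(values: list) -> str:
    """Hypothetical rule: all non-missing values are non-bool numbers -> "numeric"."""
    present = [v for v in values if v is not None]
    if present and all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        return "numeric"
    return "categorical"


columns = {"icu_admission": [0, 1, 1, 0]}

# Stored approach: infer once, then let the user correct the annotation.
stored = {name: infer_type(values) for name, values in columns.items()}
stored["icu_admission"] = "categorical"  # user decides this 0/1 label is categorical

# On-the-fly approach: each downstream step re-runs inference from the data.
on_the_fly = {name: infer_type(values) for name, values in columns.items()}

print(stored["icu_admission"])      # categorical
print(on_the_fly["icu_admission"])  # numeric: the user's correction is lost
```

Because the on-the-fly result is recomputed from the raw values, there is nowhere to record the user's decision; only a stored annotation can carry it into later steps.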

Lilly-May added the enhancement (New feature or request) label Apr 20, 2024

This was referenced Apr 20, 2024

Single implementation of column type autodetection #662

Closed

Encoding information should be stored in var if possible #649

Closed

Add var type retrospectively #637

Closed

Lilly-May self-assigned this Apr 20, 2024

This was referenced Apr 22, 2024

Unify feature type annotations #697

Merged

Add bias detection to preprocessing #690

Merged

Lilly-May mentioned this issue May 3, 2024

Update tutorials to new feature type detection theislab/ehrapy-tutorials#26

Closed


Lilly-May mentioned this issue May 12, 2024

Unify feature type detection #724

Merged


Zethson closed this as completed in #724 May 19, 2024
