Unify feature type detection #724
# Conflicts: # pyproject.toml
Thank you so so much for your hard work on this!
undo_encoding
Does this also need to be removed from the docs?
For the release notes, it'd be great if we had a function which takes the old feature annotations and moves them to the right places. We would tell people to run that when moving versions. Is this even necessary when it will run our new functions anyways? It might just reannotate the features with the right slots then, right?
I removed it from
The changes shouldn't cause any function to fail, as |
No, I just probably missed it. Thanks! OK concerning the "transfer method" |
PR Checklist
docs is updated
Description of changes
ehrapy now fully relies on ep.ad.infer_feature_types whenever a distinction between categorical and numerical features is needed.
adata.var["ehrapy_column_type"] were removed.
ep.ad.infer_feature_types method: All dates stored in any ISO-format as a String are not automatically detected as dates.
Discussion points
adata.uns. With the new feature type detection, we could remove that now. The only reason I kept it is that it is used to determine whether the adata is already encoded, and the behavior of the encoding method changes depending on that. I could instead test this manually by checking that all features are numerical. However, that would be more computationally expensive than simply checking whether a specific key in adata.uns is present.
encoded=True). Because the encoding relies on feature types, those will automatically be inferred whenever data are loaded. I think automatically encoding them is a very convenient feature, so I guess we want to keep this behavior, but I still want to confirm that you are aware of and fine with it.
ep.ad.set_feature_type(["feature1", "feature2", "feature3"], "categorical"), where three features that were detected incorrectly would be corrected to the feature type "categorical"?
ToDos
adata.uns, as described above)
adata.uns in adata.var or remove it
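
To make the discussion point about passing a list of features to ep.ad.set_feature_type more concrete, here is a minimal sketch of how such a call could normalize its input. It is not ehrapy's implementation: the function name set_feature_type_sketch, the plain dict standing in for the feature annotations, and the label set in ALLOWED_TYPES are all assumptions made for illustration (the PR text only confirms the label "categorical" and the categorical/numerical distinction).

```python
from typing import Dict, Iterable, Union

# Assumed label set for this sketch; only "categorical" appears in the PR text.
ALLOWED_TYPES = {"categorical", "numeric", "date"}


def set_feature_type_sketch(
    feature_types: Dict[str, str],
    features: Union[str, Iterable[str]],
    corrected_type: str,
) -> Dict[str, str]:
    """Overwrite the detected type of one or several features.

    `feature_types` is a hypothetical stand-in (feature name -> type label)
    for wherever the real annotations are stored.
    """
    if corrected_type not in ALLOWED_TYPES:
        raise ValueError(f"Unknown feature type: {corrected_type!r}")

    # Accept a single name as well as any iterable of names.
    names = [features] if isinstance(features, str) else list(features)

    # Fail before changing anything if a name is not annotated at all.
    unknown = [name for name in names if name not in feature_types]
    if unknown:
        raise KeyError(f"Features not found: {unknown}")

    for name in names:
        feature_types[name] = corrected_type
    return feature_types


# Three features that were detected incorrectly are corrected in one call.
detected = {
    "feature1": "numeric",
    "feature2": "numeric",
    "feature3": "numeric",
    "age": "numeric",
}
set_feature_type_sketch(detected, ["feature1", "feature2", "feature3"], "categorical")
```

Validating all names before writing keeps the annotations unchanged if the call fails, which may matter if the encoding step later relies on them.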