Remove unused feature type inference to stick with one way of doing it #231

Merged: eroell merged 10 commits into main from remove-infer-column-types on Mar 25, 2026

@eroell eroell commented Mar 25, 2026

Add binary_as parameter to infer_feature_types

Summary

  • Feature: Add a binary_as parameter to ed.infer_feature_types that controls whether binary (0/1) features are classified as categorical or numeric.
  • Modified: ed.infer_feature_types now treats integer series 0, ..., n as numeric rather than categorical.
  • Removed: The private function _infer_numerical_column_indices, previously used by some imputation functions in ehrapy, has been removed (see ehrapy#1014, "Remove _infer_numerical_column_indices").
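
To illustrate which kind of column the "Modified" item refers to, here is a minimal sketch in plain Python. It is not ehrdata code: the helper `is_zero_to_n_series` and its `min_n` threshold are hypothetical, and the sketch assumes that "integer series 0, ..., n" means the distinct values are exactly the consecutive integers 0 through n, with binary 0/1 columns handled separately by binary_as.

```python
from typing import Iterable


def is_zero_to_n_series(values: Iterable[int], min_n: int = 2) -> bool:
    """Hypothetical check: are the distinct values exactly 0, 1, ..., n with n >= min_n?"""
    distinct = set(values)
    # Reject empty input and anything that is not a plain integer (bools excluded).
    if not distinct or not all(
        isinstance(v, int) and not isinstance(v, bool) for v in distinct
    ):
        return False
    n = max(distinct)
    # Assumption: n = 1 is the binary case and is governed by binary_as instead.
    return n >= min_n and distinct == set(range(n + 1))


print(is_zero_to_n_series([0, 1, 2, 3, 2, 1]))  # True
print(is_zero_to_n_series([0, 1, 1, 0]))        # False (binary)
print(is_zero_to_n_series([0, 2, 5]))           # False (gaps)
```

Under this reading, a column such as a 0 to 3 severity score would now be inferred as numeric, while a 0/1 column would still follow the binary_as setting.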

Fixes #214

Motivation

Binary 0/1 variables are ambiguous: they can represent true categorical labels (e.g., male/female) or numeric indicators (e.g., presence/absence of a diagnosis used in downstream arithmetic). Previously, infer_feature_types always classified them as categorical. This was cumbersome for datasets with many binary indicator columns that should be treated numerically, because users had to correct each one manually with ed.replace_feature_types.

Change

infer_feature_types (and the internal _detect_feature_type) now accept a keyword argument:

binary_as: Literal["categorical", "numeric"] = "categorical"
  • "categorical" (default): binary 0/1 features are classified as categorical — no change in behavior.
  • "numeric": binary 0/1 features are classified as numeric instead.
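
To make the two options concrete, the following is a minimal, self-contained sketch of the decision rule in plain Python. It is not the ehrdata implementation: the function `classify_binary_aware` is hypothetical, the check for "binary" as distinct values equal to {0, 1} is an assumption, and the non-binary branch is only a placeholder because the body of `_detect_feature_type` is not shown in this description.

```python
from typing import Iterable, Literal, Optional


def classify_binary_aware(
    values: Iterable[Optional[float]],
    binary_as: Literal["categorical", "numeric"] = "categorical",
) -> str:
    """Hypothetical illustration of the binary_as switch (not ehrdata code)."""
    if binary_as not in ("categorical", "numeric"):
        raise ValueError(
            f"binary_as must be 'categorical' or 'numeric', got {binary_as!r}"
        )

    # Ignore missing entries (None) when collecting the distinct values.
    distinct = {v for v in values if v is not None}

    # A binary 0/1 feature: follow the caller's choice.
    if distinct == {0, 1}:
        return binary_as

    # Placeholder for every other case; the real inference logic is richer.
    return "numeric"


print(classify_binary_aware([0, 1, 1, 0]))                       # categorical
print(classify_binary_aware([0, 1, 1, 0], binary_as="numeric"))  # numeric
```

Keeping "categorical" as the default means callers that never pass binary_as see the same result for binary columns as before.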

Usage

import ehrdata as ed

edata = ed.dt.mimic_2()

# Default — binary features are categorical (existing behavior)
ed.infer_feature_types(edata)

# Opt in — treat binary 0/1 features as numeric
ed.infer_feature_types(edata, binary_as="numeric")

@eroell eroell merged commit f1d93ba into main Mar 25, 2026
10 checks passed
@eroell eroell deleted the remove-infer-column-types branch March 25, 2026 23:32

Development

Successfully merging this pull request may close these issues.

citations broken
