DataFrameMapper - pass custom callable function and preserve column names #252

Open
@roei-simplex

As mentioned in the Dynamic Columns section of the documentation, DataFrameMapper supports selecting columns dynamically during the fit operation by passing a custom callable or using sklearn.compose.make_column_selector.
I tried this, and the column names are replaced with the column index (a number). According to the documentation, this is the expected behavior:

>>> import pandas as pd
>>> import sklearn.compose
>>> import sklearn.preprocessing
>>> from sklearn_pandas import DataFrameMapper
>>> # Callable selector: returns the columns whose names start with a prefix
>>> class GetColumnsStartingWith:
...     def __init__(self, start_str):
...         self.pattern = start_str
...
...     def __call__(self, X: pd.DataFrame = None):
...         return [c for c in X.columns if c.startswith(self.pattern)]
...
>>> df = pd.DataFrame({
...     'sepal length (cm)': [1.0, 2.0, 3.0],
...     'sepal width (cm)': [1.0, 2.0, 3.0],
...     'petal length (cm)': [1.0, 2.0, 3.0],
...     'petal width (cm)': [1.0, 2.0, 3.0]
... })
>>> t = DataFrameMapper([
...     (
...         sklearn.compose.make_column_selector(dtype_include=float),
...         sklearn.preprocessing.StandardScaler(),
...         {'alias': 'x'}
...     ),
...     (
...         GetColumnsStartingWith('petal'),
...         None,
...         {'alias': 'petal'}
...     )], df_out=True, default=False)
>>> t.fit(df).transform(df).shape
(3, 6)
>>> t.transformed_names_
['x_0', 'x_1', 'x_2', 'x_3', 'petal_0', 'petal_1']

I would like to know how I can select columns dynamically (e.g. by dtype) while preserving their names.
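To make the request concrete, here is a rough sketch of the renaming I have in mind, done as a post-processing step outside the mapper. `restore_column_names` is a hypothetical helper, not part of sklearn-pandas. It assumes that the numeric suffix in each transformed name is the position of the column in the list the selector returned during fit, and that each transformer emits exactly one output column per input column (as StandardScaler and None do). The alias is kept as a prefix so that a column picked by two selectors does not produce duplicate names.

```python
def restore_column_names(transformed_names, selected):
    """Map positional names such as 'x_0' back to source column names.

    transformed_names: names reported by the mapper, e.g. ['x_0', 'petal_1'].
    selected: dict mapping each alias to the ordered list of columns its
        selector returned (assumption: suffix == position in that list).
    """
    restored = []
    for name in transformed_names:
        # Split on the last underscore so aliases may contain underscores.
        alias, sep, index = name.rpartition('_')
        columns = selected.get(alias)
        if sep and index.isdigit() and columns and int(index) < len(columns):
            restored.append(f"{alias}_{columns[int(index)]}")
        else:
            # Unknown alias or non-positional name: leave it untouched.
            restored.append(name)
    return restored


# Column lists the two selectors from the example would return, in order.
selected = {
    'x': ['sepal length (cm)', 'sepal width (cm)',
          'petal length (cm)', 'petal width (cm)'],
    'petal': ['petal length (cm)', 'petal width (cm)'],
}
names = ['x_0', 'x_1', 'x_2', 'x_3', 'petal_0', 'petal_1']
print(restore_column_names(names, selected))
```

With a real mapper, the lists in `selected` could be obtained by calling each selector on the training frame, and the result assigned to the `columns` attribute of the transformed DataFrame. This is only a workaround; built-in support would be preferable.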
