Description
As mentioned in the Dynamic Columns section of the documentation, DataFrameMapper supports selecting columns dynamically during fit, either by passing a custom callable or by using sklearn.compose.make_column_selector.
I tried this, and the column names are replaced with column indices (numbers). According to the documentation, this is also the expected behavior:
>>> import pandas as pd
>>> import sklearn.compose
>>> import sklearn.preprocessing
>>> from sklearn_pandas import DataFrameMapper
>>> class GetColumnsStartingWith:
...     def __init__(self, start_str):
...         self.pattern = start_str
...
...     def __call__(self, X: pd.DataFrame = None):
...         return [c for c in X.columns if c.startswith(self.pattern)]
...
>>> df = pd.DataFrame({
...     'sepal length (cm)': [1.0, 2.0, 3.0],
...     'sepal width (cm)': [1.0, 2.0, 3.0],
...     'petal length (cm)': [1.0, 2.0, 3.0],
...     'petal width (cm)': [1.0, 2.0, 3.0]
... })
>>> t = DataFrameMapper([
...     (
...         sklearn.compose.make_column_selector(dtype_include=float),
...         sklearn.preprocessing.StandardScaler(),
...         {'alias': 'x'}
...     ),
...     (
...         GetColumnsStartingWith('petal'),
...         None,
...         {'alias': 'petal'}
...     )], df_out=True, default=False)
>>> t.fit(df).transform(df).shape
(3, 6)
>>> t.transformed_names_
['x_0', 'x_1', 'x_2', 'x_3', 'petal_0', 'petal_1']
I would like to know how I can select columns dynamically (e.g. by dtype) while preserving their original names.