# 1. ColumnTransformer

## Logic:
ColumnTransformer is a scikit-learn tool for applying different preprocessing steps to different subsets of a dataset's columns. It is particularly useful when numerical and categorical features need different transformations. Each transformer receives only its own columns, the outputs are concatenated side by side into a single feature matrix, and columns not assigned to any transformer are dropped by default (`remainder='drop'`).

## Usage:
You explicitly define which columns each transformer receives by passing a list of (name, transformer, columns) tuples. Several transformers can be combined this way, each working on its own group of columns.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# Sample data: two numeric columns (one with a missing value) and one categorical column
X = pd.DataFrame({
    'num_feature1': [1, 2, 3, None],
    'num_feature2': [4, 5, 6, 7],
    'cat_feature': ['a', 'b', 'a', 'b']
})

# Each transformer is a (name, transformer, columns) tuple
preprocessor = ColumnTransformer(
    transformers=[
        # Replace missing numeric values with the column mean
        ('num', SimpleImputer(strategy='mean'), ['num_feature1', 'num_feature2']),
        # Encode each category as its own 0/1 column
        ('cat', OneHotEncoder(), ['cat_feature'])
    ]
)

# Fit and transform the data; the transformer outputs are stacked side by side
X_transformed = preprocessor.fit_transform(X)
print(X_transformed)
```


# 2. make_column_transformer

## Logic:
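make_column_transformer is a convenience function that builds a ColumnTransformer without requiring you to name each transformer. Names are generated automatically from the lowercased class names (for example, 'simpleimputer' and 'onehotencoder').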

## Usage:
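It takes plain (transformer, columns) tuples, which keeps the syntax short and is handy for quick prototyping or simple pipelines. Use ColumnTransformer when you want meaningful names, for example to refer to a step in set_params or named_transformers_.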

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# Sample data
X = pd.DataFrame({
    'num_feature1': [1, 2, 3, None],
    'num_feature2': [4, 5, 6, 7],
    'cat_feature': ['a', 'b', 'a', 'b']
})

# Pass (transformer, columns) tuples; transformer names are generated automatically
preprocessor = make_column_transformer(
    (SimpleImputer(strategy='mean'), ['num_feature1', 'num_feature2']),
    (OneHotEncoder(), ['cat_feature'])
)

# Fit and transform the data
X_transformed = preprocessor.fit_transform(X)
print(X_transformed)
```
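To see which names make_column_transformer generates, you can read them from the transformers attribute before fitting, and use them afterwards to look up fitted transformers in named_transformers_. A short sketch on the same sample data; the second, scaler-only transformer is purely illustrative and is never fitted:

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

X = pd.DataFrame({
    'num_feature1': [1, 2, 3, None],
    'num_feature2': [4, 5, 6, 7],
    'cat_feature': ['a', 'b', 'a', 'b']
})

preprocessor = make_column_transformer(
    (SimpleImputer(strategy='mean'), ['num_feature1', 'num_feature2']),
    (OneHotEncoder(), ['cat_feature'])
)

# Before fitting, transformers holds (name, transformer, columns) tuples
names = [name for name, _, _ in preprocessor.transformers]
print(names)  # ['simpleimputer', 'onehotencoder']

# Repeated classes get numbered suffixes so that names stay unique
scalers = make_column_transformer(
    (StandardScaler(), ['num_feature1']),
    (StandardScaler(), ['num_feature2'])
)
scaler_names = [name for name, _, _ in scalers.transformers]
print(scaler_names)  # ['standardscaler-1', 'standardscaler-2']

# After fitting, the generated names give access to the fitted transformers
preprocessor.fit(X)
imputer = preprocessor.named_transformers_['simpleimputer']
print(imputer.statistics_)  # column means used to fill missing values
```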


# 3. make_column_selector

## Logic:
make_column_selector creates a callable that selects columns of a pandas DataFrame by name pattern (a regular expression, via pattern) and/or by data type (via dtype_include and dtype_exclude). It is usually passed as the columns argument of ColumnTransformer or make_column_transformer.

## Usage:
It resolves the columns dynamically when the transformer is fitted, which is useful when the column names are not known in advance or when you want to group columns by type.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# Sample data
X = pd.DataFrame({
    'num_feature1': [1, 2, 3, None],
    'num_feature2': [4, 5, 6, 7],
    'cat_feature': ['a', 'b', 'a', 'b']
})

# Select columns by dtype instead of listing their names
preprocessor = ColumnTransformer(
    transformers=[
        # All numeric columns (int and float)
        ('num', SimpleImputer(strategy='mean'), make_column_selector(dtype_include='number')),
        # Columns with object dtype
        ('cat', OneHotEncoder(), make_column_selector(dtype_include='object'))
    ]
)

# Fit and transform the data
X_transformed = preprocessor.fit_transform(X)
print(X_transformed)
```
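Because make_column_selector returns a callable, you can call it on a DataFrame directly to check which columns it picks before wiring it into a ColumnTransformer. A short sketch on the same sample data, also showing the pattern argument and a combination of pattern and dtype:

```python
import pandas as pd
from sklearn.compose import make_column_selector

X = pd.DataFrame({
    'num_feature1': [1, 2, 3, None],
    'num_feature2': [4, 5, 6, 7],
    'cat_feature': ['a', 'b', 'a', 'b']
})

# Given a DataFrame, a selector returns the list of matching column names
numeric_cols = make_column_selector(dtype_include='number')(X)
print(numeric_cols)  # ['num_feature1', 'num_feature2']

# pattern is a regular expression searched in each column name
prefix_cols = make_column_selector(pattern='^num_')(X)
print(prefix_cols)  # ['num_feature1', 'num_feature2']

# Everything that is not numeric
non_numeric_cols = make_column_selector(dtype_exclude='number')(X)
print(non_numeric_cols)  # ['cat_feature']

# Both criteria at once: numeric columns whose name ends in '1'
combined_cols = make_column_selector(pattern='1$', dtype_include='number')(X)
print(combined_cols)  # ['num_feature1']
```

Selecting the categorical group with dtype_exclude='number' rather than dtype_include='object' is a common alternative, since it also catches columns stored with pandas' category or dedicated string dtypes.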


# Summary
- ColumnTransformer: Flexible; each transformer is given an explicit name.
- make_column_transformer: Builds a ColumnTransformer with automatically generated transformer names.
- make_column_selector: Selects columns dynamically by name pattern or data type; usually passed as the columns argument of ColumnTransformer or make_column_transformer.
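The three tools are often combined: make_column_selector chooses the columns, make_column_transformer builds the preprocessor, and the result is placed in a Pipeline in front of an estimator. A minimal sketch, assuming LogisticRegression as an arbitrary example model and made-up labels y; handle_unknown='ignore' is added so that a category unseen during fitting is encoded as all zeros instead of raising an error:

```python
import pandas as pd
from sklearn.compose import make_column_selector, make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

X = pd.DataFrame({
    'num_feature1': [1, 2, 3, None],
    'num_feature2': [4, 5, 6, 7],
    'cat_feature': ['a', 'b', 'a', 'b']
})
# Hypothetical binary labels, made up for illustration only
y = [0, 1, 0, 1]

preprocessor = make_column_transformer(
    (SimpleImputer(strategy='mean'), make_column_selector(dtype_include='number')),
    (OneHotEncoder(handle_unknown='ignore'), make_column_selector(dtype_exclude='number')),
)

# Preprocessing and model are fitted together in one step
model = make_pipeline(preprocessor, LogisticRegression())
model.fit(X, y)

# New row with a missing number and an unseen category 'c'
X_new = pd.DataFrame({
    'num_feature1': [float('nan')],
    'num_feature2': [5],
    'cat_feature': ['c']
})
prediction = model.predict(X_new)
print(prediction)
```

Since the selectors are resolved during fit, X_new only needs the same column names as the training data; the preprocessing learned on X (column means, known categories) is reused automatically.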