# make_pipeline
The make_pipeline function in scikit-learn is a convenience function that builds a Pipeline without requiring an explicit name for each step. It names every step automatically, using the lowercased class name of the estimator (for example, StandardScaler becomes standardscaler).
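
To make the automatic naming concrete, the following minimal sketch compares make_pipeline with an explicitly named Pipeline. The explicit step names ("scaler" and "clf") are illustrative choices and do not come from the examples in this notebook.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler

# Explicit naming: every step is a (name, estimator) tuple.
explicit = Pipeline([
    ("scaler", StandardScaler()),   # illustrative name
    ("clf", LogisticRegression()),  # illustrative name
])

# make_pipeline derives each name from the lowercased class name.
auto = make_pipeline(StandardScaler(), LogisticRegression())

explicit_names = [name for name, _ in explicit.steps]
auto_names = [name for name, _ in auto.steps]

print(explicit_names)  # ['scaler', 'clf']
print(auto_names)      # ['standardscaler', 'logisticregression']
```

The generated names matter later: they are the prefixes used when addressing step parameters, such as logisticregression__C.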

## How It Works
- Step Definition: Define the individual transformers and estimators.
- Creating the Pipeline: Use make_pipeline to create a pipeline with the defined steps.
- Fitting the Pipeline: Train the pipeline on the training data.
- Making Predictions: Use the pipeline to make predictions on new data.

## Simple Example

In [None]:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load data
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)

# Create pipeline
pipeline = make_pipeline(
    StandardScaler(),
    LogisticRegression()
)

# Fit pipeline
pipeline.fit(X_train, y_train)

# Make predictions
predictions = pipeline.predict(X_test)

print(predictions)


- Uses StandardScaler and LogisticRegression.
- Demonstrates basic pipeline creation and usage.
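
As a possible follow-up to the simple example, the sketch below scores the fitted pipeline and looks up a fitted step by its generated name. It rebuilds the same pipeline so that it runs on its own. The variable names accuracy and scaler are introduced here for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

pipeline = make_pipeline(StandardScaler(), LogisticRegression())
pipeline.fit(X_train, y_train)

# score() applies the fitted scaler to X_test, then reports the
# classifier's mean accuracy on the transformed data.
accuracy = pipeline.score(X_test, y_test)

# Fitted steps are reachable through their auto-generated names.
scaler = pipeline.named_steps["standardscaler"]

print(f"Test accuracy: {accuracy:.3f}")
print(f"Feature means learned from the training split: {scaler.mean_}")
```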

## Complex Example

In [None]:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

# Load data
data = load_wine()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)

# Create pipeline
pipeline = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    SVC(kernel='linear')
)

# Fit pipeline
pipeline.fit(X_train, y_train)

# Make predictions
predictions = pipeline.predict(X_test)

print(predictions)


- Uses StandardScaler, PCA, and SVC.
- Shows how to include a dimensionality reduction step in the pipeline.
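
To see what the dimensionality reduction step does inside the pipeline, one option is to inspect the fitted PCA step and to run only the transformer part of the pipeline. This is a sketch under the same setup as the example above. The variable names explained and X_test_reduced are introduced here for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

data = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

pipeline = make_pipeline(StandardScaler(), PCA(n_components=2), SVC(kernel="linear"))
pipeline.fit(X_train, y_train)

# Share of the (scaled) training variance captured by each kept component.
explained = pipeline.named_steps["pca"].explained_variance_ratio_

# Slicing off the last step yields a sub-pipeline with only the transformers,
# so this returns the 2-D features that the SVC actually receives.
X_test_reduced = pipeline[:-1].transform(X_test)

print(f"Explained variance ratio: {explained}")
print(f"Reduced test shape: {X_test_reduced.shape}")
```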

## Very Complex Example

In [None]:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load data
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)

# Create pipeline
pipeline = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif),
    SVC()
)

# Define parameter grid
param_grid = [
    {
        'selectkbest__k': [5, 10, 15],
        'svc__C': [0.1, 1, 10]
    },
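    {
        'selectkbest__k': [5, 10, 15],
        # Swap the final step for a random forest. The step keeps its
        # generated name 'svc', so forest parameters still use 'svc__'.
        'svc': [RandomForestClassifier()],
        'svc__n_estimators': [10, 50, 100]
    }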
]

# Create grid search
grid_search = GridSearchCV(pipeline, param_grid, cv=5)

# Fit the grid search (the best pipeline is refit on the full training set)
grid_search.fit(X_train, y_train)

# Make predictions
predictions = grid_search.predict(X_test)

print(predictions)


- Uses StandardScaler, SelectKBest for feature selection, and either SVC or RandomForestClassifier.
- Integrates GridSearchCV to optimize parameters for the pipeline.
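
The second parameter grid relies on replacing a whole pipeline step by name. The sketch below applies one such candidate by hand with set_params, which is a way to see what the grid search does for each combination. The values chosen (k=5, n_estimators=10, random_state=0) are illustrative.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipeline = make_pipeline(StandardScaler(), SelectKBest(f_classif), SVC())

# Apply a single candidate from the second grid manually. The step swap and
# the nested parameter are passed together; the step is replaced first, then
# the 'svc__' parameter is set on the new estimator.
pipeline.set_params(
    selectkbest__k=5,
    svc=RandomForestClassifier(random_state=0),
    svc__n_estimators=10,
)

final_step = pipeline.named_steps["svc"]
print(type(final_step).__name__)  # RandomForestClassifier
print(final_step.n_estimators)    # 10
```

The step is still called svc after the swap, because the name belongs to the position in the pipeline rather than to the estimator class. After fitting a grid search, grid_search.best_params_ reports which estimator and settings were selected.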

## Test the examples

In [None]:
import unittest
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris, load_wine, load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.feature_selection import SelectKBest, f_classif

### test_simple example

In [None]:
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)

pipeline = make_pipeline(
    StandardScaler(),
    LogisticRegression()
)

param_grid = {
    'logisticregression__C': [0.1, 1, 10]
}

grid_search = GridSearchCV(pipeline, param_grid, cv=5)
grid_search.fit(X_train, y_train)
predictions = grid_search.predict(X_test)

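# Check that there is one prediction per test sample
assert len(predictions) == len(y_test)

# Nested cross-validation: the grid search is refit inside each outer fold
scores = cross_val_score(grid_search, data.data, data.target, cv=5)
print(f"Simple Pipeline Cross-Validation Scores: {scores}")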

- Uses StandardScaler and LogisticRegression.
- GridSearchCV optimizes the C parameter.
- Checks that the number of predictions equals the number of test samples.
- Evaluates performance with cross_val_score.
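
The imports above include unittest. One way the same length check could be organized as a unittest.TestCase is sketched below. The class name SimplePipelineTest and the method name are hypothetical, and the suite is run explicitly because unittest.main() would try to parse the notebook kernel's command-line arguments.

```python
import unittest

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


class SimplePipelineTest(unittest.TestCase):  # hypothetical class name
    def test_prediction_count(self):
        data = load_iris()
        X_train, X_test, y_train, y_test = train_test_split(
            data.data, data.target, test_size=0.2, random_state=42
        )
        pipeline = make_pipeline(StandardScaler(), LogisticRegression())
        grid_search = GridSearchCV(
            pipeline, {"logisticregression__C": [0.1, 1, 10]}, cv=5
        )
        grid_search.fit(X_train, y_train)
        predictions = grid_search.predict(X_test)

        # Inside a TestCase method, self.assertEqual is available.
        self.assertEqual(len(predictions), len(y_test))


# Load and run the test case explicitly so the cell works in a notebook.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SimplePipelineTest)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```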

### test_complex example

In [None]:
data = load_wine()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)

pipeline = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    SVC(kernel='linear')
)

param_grid = {
    'pca__n_components': [2, 3],
    'svc__C': [0.1, 1, 10]
}

grid_search = GridSearchCV(pipeline, param_grid, cv=5)
grid_search.fit(X_train, y_train)
predictions = grid_search.predict(X_test)

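# Check that there is one prediction per test sample
assert len(predictions) == len(y_test)

# Nested cross-validation: the grid search is refit inside each outer fold
scores = cross_val_score(grid_search, data.data, data.target, cv=5)
print(f"Complex Pipeline Cross-Validation Scores: {scores}")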

- Uses StandardScaler, PCA, and SVC.
- GridSearchCV optimizes the number of PCA components and the C parameter of SVC.
- Checks that the number of predictions equals the number of test samples.
- Evaluates performance with cross_val_score.

### test_very_complex example

In [None]:
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)

pipeline = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif),
    SVC()
)

param_grid = [
    {
        'selectkbest__k': [5, 10, 15],
        'svc__C': [0.1, 1, 10]
    },
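    {
        'selectkbest__k': [5, 10, 15],
        # Swap the final step for a random forest. The step keeps its
        # generated name 'svc', so forest parameters still use 'svc__'.
        'svc': [RandomForestClassifier()],
        'svc__n_estimators': [10, 50, 100]
    }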
]

grid_search = GridSearchCV(pipeline, param_grid, cv=5)
grid_search.fit(X_train, y_train)
predictions = grid_search.predict(X_test)

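# Check that there is one prediction per test sample
assert len(predictions) == len(y_test)

# Nested cross-validation: the grid search is refit inside each outer fold
scores = cross_val_score(grid_search, data.data, data.target, cv=5)
print(f"Very Complex Pipeline Cross-Validation Scores: {scores}")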

- Uses StandardScaler, SelectKBest for feature selection, and either SVC or RandomForestClassifier.
- GridSearchCV optimizes the number of features selected and the parameters of the classifiers.
- Checks that the number of predictions equals the number of test samples.
- Evaluates performance with cross_val_score.