
The diamonds dataset contains the price, cut, color, and other characteristics of a sample of nearly 54,000 diamonds. This data can be used to predict the price of a diamond based on its characteristics. Use **sklearn's GridSearchCV** class to train and evaluate an **elastic net model over a hyperparameter grid**.

- Create dataframe X with the features carat and depth.

- Create dataframe y with the feature price.

- Split the data into 80% training and 20% testing sets, with random_state = 42.

- Initialize an elastic net model with random_state = 0.

- Create a tuning grid with the hyperparameter name alpha and the values 0.1, 0.5, 0.9, 1.0.

- Use GridSearchCV() with cv=10 to initialize and fit a tuning grid to the training data.

- Print the mean testing score for each alpha value (averaged over the 10 folds) and the best estimator.
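
The steps above can be sketched on synthetic data to show what the grid search returns. This is a minimal illustration, not the lab solution: `X_demo`, `y_demo`, and the coefficients used to generate them are made-up stand-ins for the diamonds features. The point is that `cv_results_['mean_test_score']` holds one value per alpha, each being the average of that alpha's per-fold scores.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the diamonds data (assumption: not the real dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = 3.0 * X_demo[:, 0] - 1.5 * X_demo[:, 1] + rng.normal(scale=0.1, size=200)

# Same grid and fold count as the lab instructions
grid = {'alpha': [0.1, 0.5, 0.9, 1.0]}
search = GridSearchCV(ElasticNet(random_state=0), grid, cv=10)
search.fit(X_demo, y_demo)

# One mean score per alpha value
mean_scores = search.cv_results_['mean_test_score']

# Per-fold scores are stored under split0_test_score ... split9_test_score;
# stacking them gives one row per alpha and one column per fold
fold_scores = np.column_stack(
    [search.cv_results_[f'split{i}_test_score'] for i in range(10)]
)

print(mean_scores.shape)
print(fold_scores.shape)
print(np.allclose(fold_scores.mean(axis=1), mean_scores))
```

With the default `scoring=None`, each score is the estimator's own `score()` value, which for `ElasticNet` is R².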

----------------------------------

Ex: If random_state=123 is used to split the data, the output is:

Mean testing scores: [0.8487619  0.81653977 0.76847326 0.75584872]
Best estimator: ElasticNet(alpha=0.1, random_state=0)

In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import ElasticNet

# Load the diamonds dataset from seaborn; fall back to a local CSV
# (replace 'diamonds.csv' with your actual file path) if that fails
try:
    import seaborn as sns
    df = sns.load_dataset('diamonds')
except Exception:
    # seaborn is not installed or the download failed
    df = pd.read_csv('diamonds.csv')


# Create dataframe X with the features 'carat' and 'depth'
X = df[['carat', 'depth']]

# Create dataframe y with the feature 'price'
y = df[['price']]

# Split the data into 80% training and 20% testing sets, with random_state = 42
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42
)

# Initialize an elastic net model with random_state = 0
elastic_net = ElasticNet(random_state=0)

# Create a tuning grid with the hyperparameter name 'alpha' and values 0.1, 0.5, 0.9, 1.0
param_grid = {
    'alpha': [0.1, 0.5, 0.9, 1.0]
}

# Initialize the grid search with 10-fold cross-validation (cv=10)
grid_search = GridSearchCV(
    elastic_net,
    param_grid,
    cv=10
)

# Fit the grid search to the training data
grid_search.fit(X_train, y_train.values.ravel())

# Print the mean testing score for each alpha value (averaged over the folds)
print("Mean testing scores:", grid_search.cv_results_['mean_test_score'])

# Print the best estimator found by the search
print("Best estimator:", grid_search.best_estimator_)

Mean testing scores: [0.82224591 0.61369963 0.47103892 0.444506  ]
Best estimator: ElasticNet(alpha=0.1, random_state=0)
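
The scores above fall off quickly as alpha grows. A likely reason is that the features are not standardized in this cell, and the elastic net penalty depends on the scale of each feature. One way to handle scaling, sketched below as an alternative rather than as the lab's required method, is to put `StandardScaler` and `ElasticNet` in a pipeline so the scaler is refit inside every cross-validation fold. The data here is synthetic: `small` and `large` are hypothetical stand-ins for features on very different scales, and the coefficients are made up.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Two synthetic features on very different scales (assumption: illustrative only)
rng = np.random.default_rng(1)
small = rng.normal(loc=0.8, scale=0.5, size=300)
large = rng.normal(loc=60.0, scale=1.5, size=300)
X_demo = np.column_stack([small, large])
y_demo = 8000.0 * small - 100.0 * large + rng.normal(scale=500.0, size=300)

# make_pipeline names each step after its lowercased class name,
# so the ElasticNet step is called 'elasticnet'
pipe = make_pipeline(StandardScaler(), ElasticNet(random_state=0))

# Pipeline parameters are addressed as <step name>__<parameter name>
grid = {'elasticnet__alpha': [0.1, 0.5, 0.9, 1.0]}

search = GridSearchCV(pipe, grid, cv=10)
search.fit(X_demo, y_demo)

print('Mean testing scores:', search.cv_results_['mean_test_score'])
print('Best parameters:', search.best_params_)
```

Because the scaler lives inside the pipeline, each fold's validation data never influences the scaling statistics used to train on that fold.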


In [3]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import ElasticNet

# Load the diamonds dataset from seaborn; fall back to a local CSV if that fails
try:
    import seaborn as sns
    diamonds = sns.load_dataset('diamonds')
except Exception:
    # seaborn is not installed or the download failed
    diamonds = pd.read_csv('diamonds.csv')


X = diamonds[['carat', 'depth']]
y = diamonds[['price']]

# Create training/testing split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale the input features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Initialize elastic net model
ENModel = ElasticNet(random_state=0)

# Create tuning grid
alpha = {'alpha': [0.1, 0.5, 0.9, 1.0]}

# Initialize tuning grid and fit to training data
ENTuning = GridSearchCV(ENModel, alpha, cv=10)
ENTuning.fit(X_train, np.ravel(y_train))

# Mean testing score for each alpha value and the best estimator
print('Mean testing scores:', ENTuning.cv_results_['mean_test_score'])
print('Best estimator:', ENTuning.best_estimator_)

Mean testing scores: [0.84865529 0.81643291 0.76836873 0.75574554]
Best estimator: ElasticNet(alpha=0.1, random_state=0)
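
The cells above create a testing set but only report cross-validation scores from the training set. As a possible follow-up, the tuned model can be scored on the held-out data. The sketch below uses synthetic data (`X_demo`, `y_demo`, and the variable names are assumptions, not part of the lab) and relies on `GridSearchCV` refitting the best estimator on the full training set, which is its default behavior (`refit=True`).

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic regression data (assumption: stands in for the scaled features)
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(500, 2))
y_demo = 4.0 * X_demo[:, 0] + 0.5 * X_demo[:, 1] + rng.normal(scale=0.5, size=500)

# Same 80/20 split settings as the lab
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

search = GridSearchCV(
    ElasticNet(random_state=0), {'alpha': [0.1, 0.5, 0.9, 1.0]}, cv=10
)
search.fit(X_tr, y_tr)

# score() delegates to the refit best estimator, so this is R^2 on unseen data
test_r2 = search.score(X_te, y_te)

print('Best alpha:', search.best_params_['alpha'])
print('Held-out R^2:', round(test_r2, 3))
```

Comparing the held-out score with the best cross-validation score gives a rough check on whether the tuning overfit the training folds.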
