# Hybrid recommender
## Collaborative filtering + Content-Based

## Plan:
* Load the ratings and the wines metadata, then merge them on WineID.
* While loading, convert columns to the correct types:
    * Grapes and Harmonize: string-encoded list -> Python list.  
    **Note:** this conversion is required every time these columns are loaded from CSV.
    * Vintage: string -> integer. "N.V." (non-vintage) is replaced with 0 and the whole column is then cast to integer.
    * Date: string -> proper pandas datetime type.
* Columns used:
    * **WineID**: Integer. The wine primary key identification;
    * **WineName**: String. The textual wine identification presented in the label;
    * **Type**: String. The categorical type classification: Red, white or rosé for still wines, gasified sparkling or dessert for sweeter and fortified wines. Dessert/Port is a subclassification for liqueur dessert wines;
    * **Elaborate**: String. Categorical classification between varietal or assemblage/blend. The most famous blends are also considered, such as Bordeaux red and white blend, Valpolicella blend and Portuguese red and white blend;
    * **Grapes**: String list. It contains the grape varieties used in the wine elaboration. The original names found have been kept;
    * **Harmonize**: String list. It contains the main dishes set that pair with the wine item. These are provided by producers but openly recommended on the internet by sommeliers and even consumers;
    * **ABV**: Float. The alcohol by volume (ABV) percentage. According to [1], the value shown on the label may vary, and a tolerance of 0.5% per 100 volume is allowed, reaching 0.8% for some wines;
    * **Body**: String. The categorical body classification: Very light-bodied, light-bodied, medium-bodied, full-bodied or very full-bodied based on wine viscosity [37];
    * **Acidity**: String. The categorical acidity classification: Low, medium, or high, based on potential hydrogen (pH) score [38];
    * **Country**: String. The categorical origin country identification of the wine production (ISO-3166);
    * **RegionName**: String. The textual wine region identification. The appellation region name was retained when identified;
    * **WineryName**: String. The textual winery identification;
    * **UserID**: Integer. The sequential key without identifying the user's private data;
    * **Vintage**: String. A rated vintage year or the abbreviation "N.V." referring to "non-vintage";
    * **Date**: String. Datetime in the format YYYY-MM-DD hh:mm:ss informing when it was rated by the user. It can be easily converted to other formats.
    * **Rating** (**Target variable**): Float. The 5-star (1–5) rating value ∈ {1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5} given by the user.
* Columns dropped:
    * **RegionID** - it only holds unique IDs and is not descriptive for the recommender.
    * **Code** - it carries the same information as the **Country** column, so either one can be kept.
    * **Vintages** - it only lists the possible vintages, and the Vintage column already holds the exact value (year, or 0 for non-vintage).
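
The type conversions in the plan can be illustrated on a toy CSV round trip. This is a minimal sketch with made-up values, not the project's loading code: it shows that a list column written to CSV comes back as a string unless a converter is passed, and that the "N.V." replacement yields an integer column.

```python
import ast
import io

import pandas as pd

# Toy frame with a list-valued column and a mixed Vintage column (made-up values).
toy = pd.DataFrame({
    "WineID": [1, 2],
    "Grapes": [["Merlot", "Cabernet Sauvignon"], ["Chardonnay"]],
    "Vintage": ["2015", "N.V."],
})

# Writing to CSV serialises each list as its string representation.
buffer = io.StringIO()
toy.to_csv(buffer, index=False)

# Without converters, the Grapes column comes back as plain strings.
buffer.seek(0)
raw = pd.read_csv(buffer)

# With converters, each cell is parsed while the file is read.
buffer.seek(0)
parsed = pd.read_csv(
    buffer,
    converters={
        "Grapes": ast.literal_eval,
        "Vintage": lambda s: 0 if s == "N.V." else int(s),
    },
)

print(type(raw.loc[0, "Grapes"]).__name__)     # str
print(type(parsed.loc[0, "Grapes"]).__name__)  # list
print(parsed["Vintage"].tolist())              # [2015, 0]
```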

In [1]:

import pandas as pd
import ast
import numpy as np

# Training and evaluation
import optuna
import lightgbm as lgb
import xgboost as xgb
from sklearn.metrics import mean_squared_error, mean_absolute_error, root_mean_squared_error

# Preprocessing
from sklearn.model_selection import KFold
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.base import BaseEstimator, TransformerMixin
from category_encoders import TargetEncoder, HashingEncoder
from sklearn.preprocessing import MultiLabelBinarizer
from scipy.sparse import hstack, csr_matrix, issparse, lil_matrix, save_npz, load_npz




In [2]:
# Converter of string lists into Python lists
# (e.g. "['a', 'b', 'c']" → ['a', 'b', 'c'])
def parse_list_col(s):
    return ast.literal_eval(s)

# Converter of 'N.V.' to 0, so column is numeric
def parse_vintage(s):
    return 0 if s == 'N.V.' else int(s)


# Raw string, so the backslashes are not treated as escape sequences
base_path = r'..\..\..\..\data\main'


In [3]:
# Load the wines metadata and the train and test splits

wines = pd.read_csv(
    f'{base_path}\\XWines_Full_100K_wines.csv', 
    usecols=['WineID', 'Type', 'Elaborate', 'ABV', 'Body', 'Acidity', 'RegionName', 'WineryName', 'Grapes','Harmonize','Country'],
    converters={
        'Grapes':    parse_list_col,
        'Harmonize': parse_list_col
    }
)
train = pd.read_csv(
    f'{base_path}\\trainset.csv', 
    usecols=['UserID', 'WineID', 'Rating', 'Date', 'Vintage'],
    parse_dates=['Date'],
    date_format='%Y-%m-%d %H:%M:%S',
    converters={'Vintage': parse_vintage}
)
test_uwarm_iwarm = pd.read_csv(
    f'{base_path}\\testset_warm_user_warm_item.csv', 
    usecols=['RatingID', 'UserID', 'WineID', 'Rating', 'Date', 'Vintage'],
    parse_dates=['Date'],
    date_format='%Y-%m-%d %H:%M:%S',
    converters={'Vintage': parse_vintage}
)
test_uwarm_icold = pd.read_csv(
    f'{base_path}\\testset_warm_user_cold_item.csv', 
    usecols=['RatingID', 'UserID', 'WineID', 'Rating', 'Date', 'Vintage'],
    parse_dates=['Date'],
    date_format='%Y-%m-%d %H:%M:%S',
    converters={'Vintage': parse_vintage}
)
test_ucold_iwarm = pd.read_csv(
    f'{base_path}\\testset_cold_user_warm_item.csv', 
    usecols=['RatingID', 'UserID', 'WineID', 'Rating', 'Date', 'Vintage'],
    parse_dates=['Date'],
    date_format='%Y-%m-%d %H:%M:%S',
    converters={'Vintage': parse_vintage}
)
test_ucold_icold = pd.read_csv(
    f'{base_path}\\testset_cold_user_cold_item.csv', 
    usecols=['RatingID', 'UserID', 'WineID', 'Rating', 'Date', 'Vintage'],
    parse_dates=['Date'],
    date_format='%Y-%m-%d %H:%M:%S',
    converters={'Vintage': parse_vintage}
)

In [4]:
# Merge ratings with wines metadata on 'WineID'
train = train.merge(wines, on='WineID', how='left')
test_uwarm_iwarm = test_uwarm_iwarm.merge(wines, on='WineID', how='left')
test_uwarm_icold = test_uwarm_icold.merge(wines, on='WineID', how='left')
test_ucold_iwarm = test_ucold_iwarm.merge(wines, on='WineID', how='left')
test_ucold_icold = test_ucold_icold.merge(wines, on='WineID', how='left')

# Check the shapes
print(f"Train shape: {train.shape}")
print(f"Test warm user warm item shape: {test_uwarm_iwarm.shape}")
print(f"Test warm user cold item shape: {test_uwarm_icold.shape}")
print(f"Test cold user warm item shape: {test_ucold_iwarm.shape}")
print(f"Test cold user cold item shape: {test_ucold_icold.shape}")

Train shape: (16917894, 15)
Test warm user warm item shape: (2036778, 16)
Test warm user cold item shape: (35456, 16)
Test cold user warm item shape: (506800, 16)
Test cold user cold item shape: (16504, 16)


In [5]:
# Take a small sample of the training set
# train = train.sample(frac=0.1, random_state=42).reset_index(drop=True)


## Preprocessing
### Preprocessing methods for different features:
* **Standard Scaler** - is used for numerical type columns
    * **ABV**
    * **Vintage** (formatted to be numerical)
    * **DaysAgo(Date)** - see below
* **One-hot-encoding** - is used for categorical features, but is limited by the number of categories within a feature:
    * **Type**
    * **Body**
    * **Acidity**
    * **Elaborate**
* **Multi-Label** - is used for categorical features with too many categories, where also multiple active categories could be possible:
    * **Grapes** (774 classes)
    * **Harmonize** (~64 classes)
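* **Target Encoding** - used for **Country**. The wrapper passes a 5-fold KFold to the encoder with the aim of preventing data leakage, i.e. an encoded value should never be computed from its own row's target value.
* **Frequency Encoding** - used for IDs and high-cardinality text features (**WineID**, **UserID**, **WineryName**, **RegionName**). Each value is replaced by its relative frequency in the training data.
* **Date encoding** - a custom transformer converts the datetime column to DaysAgo, the number of days between each record and the most recent record in the training data. This keeps the time-related information while reducing the feature to a single numerical column. **Standard Scaler** is applied afterwards.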
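
The leakage-prevention idea behind K-fold target encoding can be sketched explicitly. This is a minimal illustration on made-up data, not the wrapper used in this notebook: the helper `out_of_fold_target_encode` is hypothetical, and it does the folding by hand so that each row is encoded only from targets in the other folds.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold


def out_of_fold_target_encode(categories, target, n_splits=5, seed=42):
    """Encode each row with its category's mean target computed on the other folds."""
    categories = pd.Series(categories).reset_index(drop=True)
    target = pd.Series(target, dtype=float).reset_index(drop=True)
    encoded = np.full(len(categories), np.nan)

    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fit_idx, enc_idx in kf.split(categories):
        # Category means come only from rows outside the fold being encoded.
        fold_means = target.iloc[fit_idx].groupby(categories.iloc[fit_idx]).mean()
        encoded[enc_idx] = categories.iloc[enc_idx].map(fold_means).to_numpy()

    # Categories absent from the fitting folds fall back to the global mean.
    return np.where(np.isnan(encoded), target.mean(), encoded)


# Made-up ratings per country, for illustration only.
toy = pd.DataFrame({
    "Country": ["FR", "FR", "IT", "IT", "FR", "US", "IT", "FR", "US", "IT"],
    "Rating": [4.0, 3.5, 4.5, 4.0, 3.0, 5.0, 3.5, 4.0, 2.5, 4.5],
})
toy["Country_target_encoded"] = out_of_fold_target_encode(toy["Country"], toy["Rating"])
print(toy)
```

At prediction time there is no target to leak, so test rows would typically be encoded with category means computed on the full training set.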

In [6]:
# Aggregate features

# Use StandardScaler for numerical features
# TODO: consider a binary Is_NonVintage column derived from Vintage, and filling NaN values with 0
numerical_features = ['ABV', 'Vintage']
# Use one-hot encoding for categorical features
categorical_features = ['Type', 'Elaborate', 'Body', 'Acidity']
# Use multilabel binarizer for multilabel features
multilabel_features = ['Grapes', 'Harmonize']
# Use target encoding for Country
targetencoder_features = ['Country']
# Use frequency encoder for IDs and high cardinality features
frequency_features = ['WineID', 'UserID', 'WineryName', 'RegionName']
# Use conversion to DaysAgo for time-based features
date_features = ['Date']

* **Create Preprocessing pipeline**:
    ##### **Important**: Encoding the Grapes column alone creates 774 classes, and the other encoders add around 100 more. A pandas DataFrame would require too much RAM for this (I received a 69GB memory allocation error for the Grapes column alone), and the same happens with a dense NumPy array. I therefore used csr_matrix from scipy.sparse, plus some additional optimizations for MultiLabelBinarizer in particular, to be able to store all the preprocessed features. More info in the code below.

    * **Created a custom transformer for Date column preprocessing**
    * **Created wrappers for the other preprocessors so that they always return sparse CSR matrices**. OneHotEncoder already implements sparse output. The StandardScaler in date_pipeline does not need a wrapper, since its input is already a CSR matrix.
    * **Created a pipeline for each preprocessor and a general pipeline that combines everything using ColumnTransformer** 
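
As a point of comparison for the sparse multi-label encoding described above, scikit-learn's `MultiLabelBinarizer` can itself emit a sparse matrix through its `sparse_output` parameter. The sketch below uses made-up label lists and is not the custom wrapper from this notebook; it only shows that the encoded result stores the non-zero entries rather than a full dense block.

```python
from scipy.sparse import issparse
from sklearn.preprocessing import MultiLabelBinarizer

# Made-up label lists standing in for the Grapes column.
grapes = [["Merlot", "Cabernet Sauvignon"], ["Chardonnay"], [], ["Merlot"]]

# sparse_output=True returns a scipy sparse matrix instead of a dense array.
mlb = MultiLabelBinarizer(sparse_output=True)
encoded = mlb.fit_transform(grapes)

print(list(mlb.classes_))  # ['Cabernet Sauvignon', 'Chardonnay', 'Merlot']
print(issparse(encoded))   # True
print(encoded.shape)       # (4, 3)
print(encoded.nnz)         # 4 stored non-zero entries
```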

In [7]:
# Date preprocessor
class DateTransformer(BaseEstimator, TransformerMixin):
    """Transforms a single datetime column into 'days ago' relative to the latest date in training data."""
    
    def fit(self, X, y=None):
        # Expect a DataFrame with a single datetime column
        self.col = X.columns[0]
        self.column = f"{self.col}_days_ago"
        self.reference_date = pd.to_datetime(X[self.col]).max()
        return self

    def transform(self, X):
        days_ago = (self.reference_date - pd.to_datetime(X[self.col])).dt.days
        
        return csr_matrix(days_ago.values.reshape(-1, 1))

    def get_feature_names_out(self, input_features=None):
        return np.array([self.column])
    

class MultiLabelWrapper(BaseEstimator, TransformerMixin):
    def __init__(self):
        self.encoders = {}
        self.feature_names = []
    
    def fit(self, X, y=None):
        self.feature_names = []
        for col in X.columns:
            mlb = MultiLabelBinarizer()
            # Fill missing with empty list for consistent fitting
            safe_col = X[col].apply(lambda x: x if isinstance(x, list) else [])
            mlb.fit(safe_col)
            self.encoders[col] = mlb
            self.feature_names.extend([f"{col}__{cls}" for cls in mlb.classes_])
        return self
            
    def transform(self, X):
        matricies = []
        for col in X.columns:
            mlb = self.encoders[col]
            class_index = {cls: i for i, cls in enumerate(mlb.classes_)}
            n_rows = len(X)
            n_classes = len(mlb.classes_)
            sparse = lil_matrix((n_rows, n_classes), dtype=np.uint8)

            for i, labels in enumerate(X[col]):
                # Handle missing or malformed entries
                if not isinstance(labels, list):
                    labels = []
                for label in labels:
                    if label in class_index:
                        sparse[i, class_index[label]] = 1

            matricies.append(sparse.tocsr())
        return hstack(matricies, format='csr')
    
    def get_feature_names_out(self, input_features=None):
        return np.array(self.feature_names)


class TargetEncoderWrapper(BaseEstimator, TransformerMixin):
    def __init__(self):
        self.encoders = {}
        self.feature_names = []
    
    def fit(self, X, y):
        self.feature_names = []
        self.global_means = {}
        kf = KFold(n_splits=5, shuffle=True, random_state=42)
        
        for col in X.columns:
            te = TargetEncoder(cols=[col])
            te.fit(X[[col]], y, cv=kf)
            self.encoders[col] = te
            self.global_means[col] = y.mean() 
            self.feature_names.append(f'{col}_target_encoded')
        return self
    
    def transform(self, X):
        matricies = []
        for col in X.columns:
            te = self.encoders[col]
            df_encoded = te.transform(X[[col]])
            arr = df_encoded[col].fillna(self.global_means[col]).values.reshape(-1, 1)
            matricies.append(csr_matrix(arr))
        return hstack(matricies, format='csr')
    
    def get_feature_names_out(self, input_features=None):
        return np.array(self.feature_names)
    
    
class FrequencyEncoderWrapper(BaseEstimator, TransformerMixin):
    def __init__(self):
        self.freq_maps = {}
        self.feature_names = []

    def fit(self, X, y=None):
        self.freq_maps = {}
        self.feature_names = [f"{col}_freq" for col in X.columns]
        for col in X.columns:
            self.freq_maps[col] = X[col].value_counts(normalize=True).to_dict()
        return self

    def transform(self, X):
        matrices = []
        for col in X.columns:
            freq = X[col].map(self.freq_maps[col]).fillna(0).values.reshape(-1, 1)
            matrices.append(csr_matrix(freq))
        return hstack(matrices, format='csr')

    def get_feature_names_out(self, input_features=None):
        return np.array(self.feature_names)


class StandardScalerWrapper(BaseEstimator, TransformerMixin):
    def __init__(self):
        self.scaler = StandardScaler(with_mean=False)
        self.feature_names = []

    def fit(self, X, y=None):
        self.feature_names = X.columns.tolist()
        self.scaler.fit(X)
        return self

    def transform(self, X):
        X_scaled = self.scaler.transform(X)
        # Always return sparse csr_matrix
        if not issparse(X_scaled):
            X_scaled = csr_matrix(X_scaled)
        return X_scaled

    def get_feature_names_out(self, input_features=None):
        return np.array(self.feature_names)

## Preprocessing pipeline

# Numerical
numerical_pipeline = Pipeline([
    ('scaler', StandardScalerWrapper()),
    ])

# Categorical via one-hot-encoding
categorical_pipeline = Pipeline([
    ('onehot', OneHotEncoder(handle_unknown='ignore', sparse_output=True))
])


# Categorical via MultiLabelBinarizer
multilabel_pipeline = Pipeline([
    ('multilabel', MultiLabelWrapper())
])

# Country via target encoding
target_pipeline = Pipeline([
    ('target', TargetEncoderWrapper())
])

# High cardinality categorical features via frequency encoding
# (i.e. WineryName, RegionName, UserID, WineID)
frequency_pipeline = Pipeline([
    ('frequency', FrequencyEncoderWrapper())
])

# Datetime via custom date transformer
date_pipeline = Pipeline([
    ('date', DateTransformer()),
    ('scaler', StandardScaler(with_mean=False))
])

# Preprocessor
# The remainder contains the RatingID column, which is needed neither for training nor for testing
preprocessor = ColumnTransformer(transformers=[
    ('numerical', numerical_pipeline, numerical_features),
    ('categorical', categorical_pipeline, categorical_features),
    ('multilabel', multilabel_pipeline, multilabel_features),
    ('target', target_pipeline, targetencoder_features),
    ('frequency', frequency_pipeline, frequency_features),
    ('date', date_pipeline, date_features)
], remainder='drop')

preprocessing_pipeline = Pipeline([
    ('preprocessor', preprocessor)
])


In [8]:
# Split features and target column for all datasets

X_train = train.drop(columns=['Rating'])
y_train = train['Rating']

X_test_uwarm_iwarm = test_uwarm_iwarm.drop(columns=['Rating'])
y_test_uwarm_iwarm = test_uwarm_iwarm['Rating']

X_test_uwarm_icold = test_uwarm_icold.drop(columns=['Rating'])
y_test_uwarm_icold = test_uwarm_icold['Rating']

X_test_ucold_iwarm = test_ucold_iwarm.drop(columns=['Rating'])
y_test_ucold_iwarm = test_ucold_iwarm['Rating']

X_test_ucold_icold = test_ucold_icold.drop(columns=['Rating'])
y_test_ucold_icold = test_ucold_icold['Rating']

In [9]:
from sklearn.model_selection import train_test_split
# Train/val split for hyperparameter tuning
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=42)


* **Fit the preprocessing pipeline on the training data. The target variable is passed as well, because the Target Encoder needs it**
* **Transform the train, validation and test sets with the fitted preprocessor**
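
The fit-on-train, transform-everywhere pattern can be seen on a toy example. This is a minimal sketch with made-up IDs and dates, written directly in pandas and not through the notebook's wrappers: statistics are learned from the training split only, so a cold item gets frequency 0 and a record newer than the training data gets a negative DaysAgo value.

```python
import pandas as pd

# Made-up training and test rows, for illustration only.
train_toy = pd.DataFrame({
    "WineID": [10, 10, 11, 12],
    "Date": pd.to_datetime(["2021-01-01", "2021-03-01", "2021-03-11", "2021-02-01"]),
})
test_toy = pd.DataFrame({
    "WineID": [10, 99],  # 99 is a cold item, absent from training
    "Date": pd.to_datetime(["2021-03-01", "2021-03-21"]),
})

# "Fit": statistics are computed on the training split only.
freq_map = train_toy["WineID"].value_counts(normalize=True)
reference_date = train_toy["Date"].max()

# "Transform": the same statistics are applied to any other split.
test_freq = test_toy["WineID"].map(freq_map).fillna(0.0)
test_days_ago = (reference_date - test_toy["Date"]).dt.days

print(test_freq.tolist())      # [0.5, 0.0]
print(test_days_ago.tolist())  # [10, -10]
```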

In [10]:
# Fit the preprocessing pipeline
preprocessing_pipeline.fit(X_train, y_train)

X_train_transformed = preprocessing_pipeline.transform(X_train)
X_val_transformed = preprocessing_pipeline.transform(X_val)
X_test_uwarm_iwarm_transformed = preprocessing_pipeline.transform(X_test_uwarm_iwarm)
X_test_uwarm_icold_transformed = preprocessing_pipeline.transform(X_test_uwarm_icold)
X_test_ucold_iwarm_transformed = preprocessing_pipeline.transform(X_test_ucold_iwarm)
X_test_ucold_icold_transformed = preprocessing_pipeline.transform(X_test_ucold_icold)

# Save feature names
feature_names = preprocessing_pipeline.get_feature_names_out()
# Check the size of feature names and transformed data features 
print(f"Feature names size: {len(feature_names)}")
print(f"Transformed train data size: {X_train_transformed.shape[1]}")



Feature names size: 884
Transformed train data size: 884




In [11]:
# Save transformed data to npz
save_npz(f'{base_path}\\preprocessed\\X_train_transformed.npz', X_train_transformed)
save_npz(f'{base_path}\\preprocessed\\X_val_transformed.npz', X_val_transformed)
save_npz(f'{base_path}\\preprocessed\\X_test_uwarm_iwarm_transformed.npz', X_test_uwarm_iwarm_transformed)
save_npz(f'{base_path}\\preprocessed\\X_test_uwarm_icold_transformed.npz', X_test_uwarm_icold_transformed)
save_npz(f'{base_path}\\preprocessed\\X_test_ucold_iwarm_transformed.npz', X_test_ucold_iwarm_transformed)
save_npz(f'{base_path}\\preprocessed\\X_test_ucold_icold_transformed.npz', X_test_ucold_icold_transformed)
# Save target values to csv
y_train.to_csv(f'{base_path}\\preprocessed\\y_train.csv', index=False)
y_val.to_csv(f'{base_path}\\preprocessed\\y_val.csv', index=False)
y_test_uwarm_iwarm.to_csv(f'{base_path}\\preprocessed\\y_test_uwarm_iwarm.csv', index=False)
y_test_uwarm_icold.to_csv(f'{base_path}\\preprocessed\\y_test_uwarm_icold.csv', index=False)
y_test_ucold_iwarm.to_csv(f'{base_path}\\preprocessed\\y_test_ucold_iwarm.csv', index=False)
y_test_ucold_icold.to_csv(f'{base_path}\\preprocessed\\y_test_ucold_icold.csv', index=False)

# Train models

In [12]:
# # Load transformed data from npz
# X_train_transformed = load_npz(f'{base_path}\\preprocessed\\X_train_transformed.npz')
# X_val_transformed = load_npz(f'{base_path}\\preprocessed\\X_val_transformed.npz')
# X_test_uwarm_iwarm_transformed = load_npz(f'{base_path}\\preprocessed\\X_test_uwarm_iwarm_transformed.npz')
# X_test_uwarm_icold_transformed = load_npz(f'{base_path}\\preprocessed\\X_test_uwarm_icold_transformed.npz')
# X_test_ucold_iwarm_transformed = load_npz(f'{base_path}\\preprocessed\\X_test_ucold_iwarm_transformed.npz')
# X_test_ucold_icold_transformed = load_npz(f'{base_path}\\preprocessed\\X_test_ucold_icold_transformed.npz')

# # Load target variables
# y_train = pd.read_csv(f'{base_path}\\preprocessed\\y_train.csv')
# y_val = pd.read_csv(f'{base_path}\\preprocessed\\y_val.csv')
# y_test_uwarm_iwarm = pd.read_csv(f'{base_path}\\preprocessed\\y_test_uwarm_iwarm.csv')
# y_test_uwarm_icold = pd.read_csv(f'{base_path}\\preprocessed\\y_test_uwarm_icold.csv')
# y_test_ucold_iwarm = pd.read_csv(f'{base_path}\\preprocessed\\y_test_ucold_iwarm.csv')
# y_test_ucold_icold = pd.read_csv(f'{base_path}\\preprocessed\\y_test_ucold_icold.csv')



## LightGBM model run

In [13]:
# Train a LightGBM model
lgb_model = lgb.LGBMRegressor(random_state=42)
lgb_model.fit(X_train_transformed, y_train, feature_name=list(feature_names))

y_pred_uwarm_iwarm = lgb_model.predict(X_test_uwarm_iwarm_transformed)
y_pred_uwarm_icold = lgb_model.predict(X_test_uwarm_icold_transformed)
y_pred_ucold_iwarm = lgb_model.predict(X_test_ucold_iwarm_transformed)
y_pred_ucold_icold = lgb_model.predict(X_test_ucold_icold_transformed)



[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 12.303795 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 2917
[LightGBM] [Info] Number of data points in the train set: 13534315, number of used features: 728
[LightGBM] [Info] Start training from score 3.858934




In [14]:

# Save the predictions as RatingID, Rating (the predicted rating)

# Warm user warm item
result_uwarm_iwarm = pd.DataFrame({
    'RatingID': test_uwarm_iwarm['RatingID'],
    'Rating': y_pred_uwarm_iwarm
})
result_uwarm_iwarm.to_csv(
    f'{base_path}\\lightgbm\\lightgbm_warm_user_warm_item.csv', 
    index=False, 
    header=['RatingID', 'Rating']
)
# Warm user cold item
result_uwarm_icold = pd.DataFrame({
    'RatingID': test_uwarm_icold['RatingID'],
    'Rating': y_pred_uwarm_icold
})
result_uwarm_icold.to_csv(
    f'{base_path}\\lightgbm\\lightgbm_warm_user_cold_item.csv', 
    index=False, 
    header=['RatingID', 'Rating']
)
# Cold user warm item
result_ucold_iwarm = pd.DataFrame({
    'RatingID': test_ucold_iwarm['RatingID'],
    'Rating': y_pred_ucold_iwarm
})
result_ucold_iwarm.to_csv(
    f'{base_path}\\lightgbm\\lightgbm_cold_user_warm_item.csv', 
    index=False, 
    header=['RatingID', 'Rating']
)
# Cold user cold item
result_ucold_icold = pd.DataFrame({
    'RatingID': test_ucold_icold['RatingID'],
    'Rating': y_pred_ucold_icold
})
result_ucold_icold.to_csv(
    f'{base_path}\\lightgbm\\lightgbm_cold_user_cold_item.csv', 
    index=False, 
    header=['RatingID', 'Rating']
)

In [15]:
# Evaluate

# Calculate MSE for each test set
mse_uwarm_iwarm = mean_squared_error(y_test_uwarm_iwarm, y_pred_uwarm_iwarm)
mse_uwarm_icold = mean_squared_error(y_test_uwarm_icold, y_pred_uwarm_icold)
mse_ucold_iwarm = mean_squared_error(y_test_ucold_iwarm, y_pred_ucold_iwarm)
mse_ucold_icold = mean_squared_error(y_test_ucold_icold, y_pred_ucold_icold)

# Calculate RMSE for each test set
rmse_uwarm_iwarm = root_mean_squared_error(y_test_uwarm_iwarm, y_pred_uwarm_iwarm)
rmse_uwarm_icold = root_mean_squared_error(y_test_uwarm_icold, y_pred_uwarm_icold)
rmse_ucold_iwarm = root_mean_squared_error(y_test_ucold_iwarm, y_pred_ucold_iwarm)
rmse_ucold_icold = root_mean_squared_error(y_test_ucold_icold, y_pred_ucold_icold)

# Calculate MAE for each test set
mae_uwarm_iwarm = mean_absolute_error(y_test_uwarm_iwarm, y_pred_uwarm_iwarm)
mae_uwarm_icold = mean_absolute_error(y_test_uwarm_icold, y_pred_uwarm_icold)
mae_ucold_iwarm = mean_absolute_error(y_test_ucold_iwarm, y_pred_ucold_iwarm)
mae_ucold_icold = mean_absolute_error(y_test_ucold_icold, y_pred_ucold_icold)

# Print the results
print(f'MSE for warm user warm item: {mse_uwarm_iwarm:.4f}')
print(f'RMSE for warm user warm item: {rmse_uwarm_iwarm:.4f}')
print(f'MAE for warm user warm item: {mae_uwarm_iwarm:.4f}')
print('-' * 50)
print(f'MSE for warm user cold item: {mse_uwarm_icold:.4f}')
print(f'RMSE for warm user cold item: {rmse_uwarm_icold:.4f}')
print(f'MAE for warm user cold item: {mae_uwarm_icold:.4f}')
print('-' * 50)
print(f'MSE for cold user warm item: {mse_ucold_iwarm:.4f}')
print(f'RMSE for cold user warm item: {rmse_ucold_iwarm:.4f}')
print(f'MAE for cold user warm item: {mae_ucold_iwarm:.4f}')
print('-' * 50)
print(f'MSE for cold user cold item: {mse_ucold_icold:.4f}')
print(f'RMSE for cold user cold item: {rmse_ucold_icold:.4f}')
print(f'MAE for cold user cold item: {mae_ucold_icold:.4f}')


MSE for warm user warm item: 0.4031
RMSE for warm user warm item: 0.6349
MAE for warm user warm item: 0.4711
--------------------------------------------------
MSE for warm user cold item: 0.4447
RMSE for warm user cold item: 0.6668
MAE for warm user cold item: 0.5031
--------------------------------------------------
MSE for cold user warm item: 0.5012
RMSE for cold user warm item: 0.7080
MAE for cold user warm item: 0.5233
--------------------------------------------------
MSE for cold user cold item: 0.5852
RMSE for cold user cold item: 0.7650
MAE for cold user cold item: 0.5821


## XGBoost model run

In [16]:
# XGBoost with DMatrix
dtrain = xgb.DMatrix(X_train_transformed, label=y_train, feature_names=list(feature_names))
xgb_model = xgb.train(
    params={
        'objective': 'reg:squarederror',
        'eval_metric': 'rmse',
        'seed': 42
    },
    dtrain=dtrain,
    num_boost_round=100
)

d_test_uwarm_iwarm = xgb.DMatrix(X_test_uwarm_iwarm_transformed, feature_names=list(feature_names))
d_test_uwarm_icold = xgb.DMatrix(X_test_uwarm_icold_transformed, feature_names=list(feature_names))
d_test_ucold_iwarm = xgb.DMatrix(X_test_ucold_iwarm_transformed, feature_names=list(feature_names))
d_test_ucold_icold = xgb.DMatrix(X_test_ucold_icold_transformed, feature_names=list(feature_names))

# Predict using the trained model
y_pred_uwarm_iwarm = xgb_model.predict(d_test_uwarm_iwarm)
y_pred_uwarm_icold = xgb_model.predict(d_test_uwarm_icold)
y_pred_ucold_iwarm = xgb_model.predict(d_test_ucold_iwarm)
y_pred_ucold_icold = xgb_model.predict(d_test_ucold_icold)


In [17]:

# Save the predictions as RatingID, Rating (the predicted rating)
# Warm user warm item
result_uwarm_iwarm = pd.DataFrame({
    'RatingID': test_uwarm_iwarm['RatingID'],
    'Rating': y_pred_uwarm_iwarm
})
result_uwarm_iwarm.to_csv(
    f'{base_path}\\xgboost\\xgboost_warm_user_warm_item.csv', 
    index=False, 
    header=['RatingID', 'Rating']
)
# Warm user cold item
result_uwarm_icold = pd.DataFrame({
    'RatingID': test_uwarm_icold['RatingID'],
    'Rating': y_pred_uwarm_icold
})
result_uwarm_icold.to_csv(
    f'{base_path}\\xgboost\\xgboost_warm_user_cold_item.csv', 
    index=False, 
    header=['RatingID', 'Rating']
)
# Cold user warm item
result_ucold_iwarm = pd.DataFrame({
    'RatingID': test_ucold_iwarm['RatingID'],
    'Rating': y_pred_ucold_iwarm
})
result_ucold_iwarm.to_csv(
    f'{base_path}\\xgboost\\xgboost_cold_user_warm_item.csv', 
    index=False, 
    header=['RatingID', 'Rating']
)
# Cold user cold item
result_ucold_icold = pd.DataFrame({
    'RatingID': test_ucold_icold['RatingID'],
    'Rating': y_pred_ucold_icold
})
result_ucold_icold.to_csv(
    f'{base_path}\\xgboost\\xgboost_cold_user_cold_item.csv', 
    index=False, 
    header=['RatingID', 'Rating']
)


In [18]:
# Evaluate
# Calculate MSE for each test set
mse_uwarm_iwarm = mean_squared_error(y_test_uwarm_iwarm, y_pred_uwarm_iwarm)
mse_uwarm_icold = mean_squared_error(y_test_uwarm_icold, y_pred_uwarm_icold)
mse_ucold_iwarm = mean_squared_error(y_test_ucold_iwarm, y_pred_ucold_iwarm)
mse_ucold_icold = mean_squared_error(y_test_ucold_icold, y_pred_ucold_icold)
# Calculate RMSE for each test set
rmse_uwarm_iwarm = root_mean_squared_error(y_test_uwarm_iwarm, y_pred_uwarm_iwarm)
rmse_uwarm_icold = root_mean_squared_error(y_test_uwarm_icold, y_pred_uwarm_icold)
rmse_ucold_iwarm = root_mean_squared_error(y_test_ucold_iwarm, y_pred_ucold_iwarm)
rmse_ucold_icold = root_mean_squared_error(y_test_ucold_icold, y_pred_ucold_icold)
# Calculate MAE for each test set
mae_uwarm_iwarm = mean_absolute_error(y_test_uwarm_iwarm, y_pred_uwarm_iwarm)
mae_uwarm_icold = mean_absolute_error(y_test_uwarm_icold, y_pred_uwarm_icold)
mae_ucold_iwarm = mean_absolute_error(y_test_ucold_iwarm, y_pred_ucold_iwarm)
mae_ucold_icold = mean_absolute_error(y_test_ucold_icold, y_pred_ucold_icold)
# Print the results
print(f'MSE for warm user warm item: {mse_uwarm_iwarm:.4f}')
print(f'RMSE for warm user warm item: {rmse_uwarm_iwarm:.4f}')
print(f'MAE for warm user warm item: {mae_uwarm_iwarm:.4f}')
print('-' * 50)
print(f'MSE for warm user cold item: {mse_uwarm_icold:.4f}')
print(f'RMSE for warm user cold item: {rmse_uwarm_icold:.4f}')
print(f'MAE for warm user cold item: {mae_uwarm_icold:.4f}')
print('-' * 50)
print(f'MSE for cold user warm item: {mse_ucold_iwarm:.4f}')
print(f'RMSE for cold user warm item: {rmse_ucold_iwarm:.4f}')
print(f'MAE for cold user warm item: {mae_ucold_iwarm:.4f}')
print('-' * 50)
print(f'MSE for cold user cold item: {mse_ucold_icold:.4f}')
print(f'RMSE for cold user cold item: {rmse_ucold_icold:.4f}')
print(f'MAE for cold user cold item: {mae_ucold_icold:.4f}')


MSE for warm user warm item: 0.3974
RMSE for warm user warm item: 0.6304
MAE for warm user warm item: 0.4677
--------------------------------------------------
MSE for warm user cold item: 0.5654
RMSE for warm user cold item: 0.7519
MAE for warm user cold item: 0.5578
--------------------------------------------------
MSE for cold user warm item: 0.4983
RMSE for cold user warm item: 0.7059
MAE for cold user warm item: 0.5247
--------------------------------------------------
MSE for cold user cold item: 0.8514
RMSE for cold user cold item: 0.9227
MAE for cold user cold item: 0.6975


# Evaluate top-k
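
Before the full evaluation, the top-k idea can be shown for a single user. This is a minimal sketch on made-up ratings, assuming the same binary relevance rule used in this section (a true rating of 3.5 or more counts as relevant); the helper `precision_recall_at_k` is hypothetical and is not the evaluation function defined in the cells that follow.

```python
import numpy as np


def precision_recall_at_k(true_ratings, predicted_ratings, k, threshold=3.5):
    """Precision@k and Recall@k for one user's rated items."""
    true_ratings = np.asarray(true_ratings, dtype=float)
    predicted_ratings = np.asarray(predicted_ratings, dtype=float)
    relevant = true_ratings >= threshold

    # Indices of the k items with the highest predicted rating.
    top_k = np.argsort(-predicted_ratings, kind="stable")[:k]
    hits = int(relevant[top_k].sum())

    precision = hits / len(top_k) if len(top_k) else 0.0
    recall = hits / int(relevant.sum()) if relevant.any() else 0.0
    return precision, recall


# Made-up true and predicted ratings for five wines rated by one user.
true_r = [4.5, 3.0, 4.0, 2.5, 5.0]
pred_r = [4.1, 3.9, 3.2, 3.0, 4.4]
precision, recall = precision_recall_at_k(true_r, pred_r, k=3)
print(precision, recall)
```

Here three items are relevant and two of them appear among the three highest predictions, so both metrics come out at two thirds.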

In [19]:
# Load the saved XGBoost predictions
results_uwarm_iwarm = pd.read_csv(
    f'{base_path}\\xgboost\\xgboost_warm_user_warm_item.csv', 
    usecols=['RatingID', 'Rating']
)
results_uwarm_icold = pd.read_csv(
    f'{base_path}\\xgboost\\xgboost_warm_user_cold_item.csv', 
    usecols=['RatingID', 'Rating']
)
results_ucold_iwarm = pd.read_csv(
    f'{base_path}\\xgboost\\xgboost_cold_user_warm_item.csv', 
    usecols=['RatingID', 'Rating']
)
results_ucold_icold = pd.read_csv(
    f'{base_path}\\xgboost\\xgboost_cold_user_cold_item.csv', 
    usecols=['RatingID', 'Rating']
)

# Load the test set
test_uwarm_iwarm = pd.read_csv(
    f'{base_path}\\testset_warm_user_warm_item.csv', 
    usecols=['RatingID', 'UserID', 'WineID', 'Rating']
)
test_uwarm_icold = pd.read_csv(
    f'{base_path}\\testset_warm_user_cold_item.csv', 
    usecols=['RatingID', 'UserID', 'WineID', 'Rating']
)
test_ucold_iwarm = pd.read_csv(
    f'{base_path}\\testset_cold_user_warm_item.csv', 
    usecols=['RatingID', 'UserID', 'WineID', 'Rating']
)
test_ucold_icold = pd.read_csv(
    f'{base_path}\\testset_cold_user_cold_item.csv', 
    usecols=['RatingID', 'UserID', 'WineID', 'Rating']
)
# Merge the results with the test set
results_uwarm_iwarm = results_uwarm_iwarm.merge(test_uwarm_iwarm, on='RatingID', how='left')
results_uwarm_icold = results_uwarm_icold.merge(test_uwarm_icold, on='RatingID', how='left')
results_ucold_iwarm = results_ucold_iwarm.merge(test_ucold_iwarm, on='RatingID', how='left')
results_ucold_icold = results_ucold_icold.merge(test_ucold_icold, on='RatingID', how='left')


In [20]:
# Create Rank and Rank_pred columns
# After the merge, Rating_x is the predicted rating and Rating_y is the true rating

# Warm user, warm item
results_uwarm_iwarm["Rank"] = results_uwarm_iwarm.groupby("UserID")["Rating_y"].rank(method="first", ascending=False)
results_uwarm_iwarm["Rank_pred"] = results_uwarm_iwarm.groupby("UserID")["Rating_x"].rank(method="first", ascending=False)
# Warm user, cold item
results_uwarm_icold["Rank"] = results_uwarm_icold.groupby("UserID")["Rating_y"].rank(method="first", ascending=False)
results_uwarm_icold["Rank_pred"] = results_uwarm_icold.groupby("UserID")["Rating_x"].rank(method="first", ascending=False)
# Cold user, warm item
results_ucold_iwarm["Rank"] = results_ucold_iwarm.groupby("UserID")["Rating_y"].rank(method="first", ascending=False)
results_ucold_iwarm["Rank_pred"] = results_ucold_iwarm.groupby("UserID")["Rating_x"].rank(method="first", ascending=False)
# Cold user, cold item
results_ucold_icold["Rank"] = results_ucold_icold.groupby("UserID")["Rating_y"].rank(method="first", ascending=False)
results_ucold_icold["Rank_pred"] = results_ucold_icold.groupby("UserID")["Rating_x"].rank(method="first", ascending=False)

# Calculate Relevance
results_uwarm_iwarm["Relevance"] = results_uwarm_iwarm["Rating_y"].apply(lambda x: 1 if x >= 3.5 else 0)
results_uwarm_icold["Relevance"] = results_uwarm_icold["Rating_y"].apply(lambda x: 1 if x >= 3.5 else 0)
results_ucold_iwarm["Relevance"] = results_ucold_iwarm["Rating_y"].apply(lambda x: 1 if x >= 3.5 else 0)
results_ucold_icold["Relevance"] = results_ucold_icold["Rating_y"].apply(lambda x: 1 if x >= 3.5 else 0)
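
The twelve near-identical assignments above could be folded into one helper that is applied to each merged frame. This is a sketch, assuming the frames have `UserID`, `Rating_x` and `Rating_y` columns as above; the function name `add_rank_columns` and the toy data are hypothetical.

```python
import pandas as pd

def add_rank_columns(df, threshold=3.5):
    """Return a copy of df with Rank, Rank_pred and Relevance columns added."""
    out = df.copy()
    # Rank 1 = highest rating within each user; ties are broken by row order.
    out["Rank"] = out.groupby("UserID")["Rating_y"].rank(method="first", ascending=False)
    out["Rank_pred"] = out.groupby("UserID")["Rating_x"].rank(method="first", ascending=False)
    # Vectorized equivalent of apply(lambda x: 1 if x >= 3.5 else 0).
    out["Relevance"] = (out["Rating_y"] >= threshold).astype(int)
    return out

toy = pd.DataFrame({
    "UserID":   [1, 1, 1, 2, 2],
    "Rating_x": [4.1, 3.0, 4.6, 2.5, 3.9],   # predicted rating
    "Rating_y": [4.0, 3.5, 5.0, 3.0, 4.5],   # true rating
})
ranked = add_rank_columns(toy)
print(ranked)
```

Applied to the notebook's frames, this would read for example `results_uwarm_iwarm = add_rank_columns(results_uwarm_iwarm)`, repeated for the other three scenarios.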


In [21]:

def evaluate_topk_fast(df, k=10):
    # Pre-sort so top-k is at the top per user
    df = df.sort_values(['UserID', 'Rank_pred'], ascending=[True, True])

    # Keep only the top-k rows per user (row_number is 0-based within each user)
    df['row_number'] = df.groupby('UserID').cumcount()
    topk_df = df[df['row_number'] < k].copy()

    # Precision@k
    precision = topk_df['Relevance'].groupby(topk_df['UserID']).mean().mean()

    # Recall@k
    relevant_per_user = df.groupby('UserID')['Relevance'].sum()
    hits_per_user = topk_df.groupby('UserID')['Relevance'].sum()
    recall = (hits_per_user / relevant_per_user).fillna(0).mean()

    # HitRate@k
    hits = (hits_per_user > 0).astype(int)
    hit_rate = hits.mean()

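    # MAP@k: per user, average precision@i over the relevant positions i in the top-k.
    # Note: this divides by the number of relevant items found in the top-k,
    # not by the user's total number of relevant items.
    def map_at_k_per_user(x):
        rels = x['Relevance'].values
        precisions = [(rels[:i + 1].sum() / (i + 1)) for i in range(len(rels)) if rels[i]]
        return np.mean(precisions) if precisions else 0
    mapk = topk_df.groupby('UserID').apply(map_at_k_per_user).mean()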

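    # nDCG@k with binary gains.
    # Note: the ideal ordering is built from the same top-k rows (x), so this
    # measures ordering quality within the top-k, not against all of the user's items.
    def dcg(rels):
        # Position i (0-based) is discounted by log2(i + 2)
        return np.sum(rels / np.log2(np.arange(2, len(rels) + 2)))
    def ndcg_per_user(x):
        dcg_val = dcg(x['Relevance'].values)
        ideal = x.sort_values('Relevance', ascending=False).head(k)
        idcg_val = dcg(ideal['Relevance'].values)
        return dcg_val / idcg_val if idcg_val > 0 else 0
    ndcg = topk_df.groupby('UserID').apply(ndcg_per_user).mean()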

    return {
        'Precision@k': precision,
        'Recall@k': recall,
        'HitRate@k': hit_rate,
        'MAP@k': mapk,
        'nDCG@k': ndcg
    }
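
To make the metric definitions concrete, here is a small hand-checkable example for a single user, written in plain Python. It is meant to follow the same conventions as `evaluate_topk_fast` (binary relevance, average precision normalized by the hits inside the top-k, ideal DCG built from the top-k rows). The helper name `topk_metrics` and the relevance vector are made up for illustration.

```python
import math

def dcg(rels):
    # Position i (0-based) is discounted by log2(i + 2).
    return sum(r / math.log2(i + 2) for i, r in enumerate(rels))

def topk_metrics(rels_sorted_by_pred, k):
    """rels_sorted_by_pred: 0/1 relevance of one user's items, best prediction first."""
    topk = rels_sorted_by_pred[:k]
    hits = sum(topk)
    total_relevant = sum(rels_sorted_by_pred)
    precision = hits / len(topk) if topk else 0.0
    recall = hits / total_relevant if total_relevant else 0.0
    # Precision at each relevant position inside the top-k, averaged over those positions.
    precisions = [sum(topk[:i + 1]) / (i + 1) for i, r in enumerate(topk) if r]
    ap = sum(precisions) / len(precisions) if precisions else 0.0
    # Ideal ordering is taken from the same top-k rows, as in the function above.
    idcg = dcg(sorted(topk, reverse=True))
    ndcg = dcg(topk) / idcg if idcg > 0 else 0.0
    return {"precision": precision, "recall": recall, "hit": int(hits > 0),
            "ap": ap, "ndcg": ndcg}

# Five items ordered by predicted rating; 1 = relevant (true rating >= 3.5).
m = topk_metrics([1, 0, 1, 1, 0], k=3)
print(m)
```

With `k=3` the top-k is `[1, 0, 1]`: two of three recommended items are relevant (precision 2/3), two of the user's three relevant items are retrieved (recall 2/3), and the average precision is (1/1 + 2/3) / 2 = 5/6.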


In [22]:
# Run evaluation
k = 10
results = {}
results['warm user, warm item'] = evaluate_topk_fast(results_uwarm_iwarm, k=k)
results['warm user, cold item'] = evaluate_topk_fast(results_uwarm_icold, k=k)
results['cold user, warm item'] = evaluate_topk_fast(results_ucold_iwarm, k=k)
results['cold user, cold item'] = evaluate_topk_fast(results_ucold_icold, k=k)

# Print evaluation results
for case, metrics in results.items():
    print(f"Evaluation on {case} at top {k}:")
    for metric, value in metrics.items():
        print(f"{metric}: {value:.4f}")
    print('-' * 50)



Evaluation on warm user, warm item at top 10:
Precision@k: 0.8812
Recall@k: 0.9232
HitRate@k: 0.9526
MAP@k: 0.9220
nDCG@k: 0.9335
--------------------------------------------------
Evaluation on warm user, cold item at top 10:
Precision@k: 0.8389
Recall@k: 0.8605
HitRate@k: 0.8605
MAP@k: 0.8500
nDCG@k: 0.8531
--------------------------------------------------
Evaluation on cold user, warm item at top 10:
Precision@k: 0.8171
Recall@k: 0.9218
HitRate@k: 0.9930
MAP@k: 0.9048
nDCG@k: 0.9417
--------------------------------------------------
Evaluation on cold user, cold item at top 10:
Precision@k: 0.7089
Recall@k: 0.7210
HitRate@k: 0.7210
MAP@k: 0.7157
nDCG@k: 0.7171
--------------------------------------------------




In [26]:
import pickle
# LGBM tuning

def objective(trial):
    params = {
        "objective": "regression",
        "metric": "mse",
        "boosting_type": "gbdt",
        "verbosity": -1,
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.2),
        "num_leaves": trial.suggest_int("num_leaves", 30, 300),
        "max_depth": trial.suggest_int("max_depth", 3, 16),
        "feature_fraction": trial.suggest_float("feature_fraction", 0.6, 1.0),
        "bagging_fraction": trial.suggest_float("bagging_fraction", 0.6, 1.0),
        "lambda_l1": trial.suggest_float("lambda_l1", 0.0, 10.0),
        "lambda_l2": trial.suggest_float("lambda_l2", 0.0, 10.0),
    }

    lgb_train = lgb.Dataset(X_train_transformed, y_train)
    lgb_valid = lgb.Dataset(X_val_transformed, y_val, reference=lgb_train)

    model = lgb.train(params, lgb_train,
                      valid_sets=[lgb_valid],
                      callbacks=[lgb.early_stopping(stopping_rounds=50), lgb.log_evaluation(period=0)],
                    )
    preds = model.predict(X_val_transformed)
    return mean_squared_error(y_val, preds) 

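study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)  # about 2 h 23 min in the run logged below

# Save the Optuna study (trial history and best parameters), not a fitted model
with open(f'{base_path}\\lightgbm\\lgbm_model.pkl', 'wb') as f:
    pickle.dump(study, f)

# Unfitted regressor configured with the best parameters; it is trained in the next cell
best_model = lgb.LGBMRegressor(**study.best_params)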




[I 2025-05-18 20:17:31,481] A new study created in memory with name: no-name-95b4cc90-4d36-43cc-beb3-947f3bb48189


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.460831


[I 2025-05-18 20:20:17,359] Trial 0 finished with value: 0.46083116428832877 and parameters: {'learning_rate': 0.13045391082284233, 'num_leaves': 119, 'max_depth': 11, 'feature_fraction': 0.7644506725476632, 'bagging_fraction': 0.9258527730057012, 'lambda_l1': 3.858310278489699, 'lambda_l2': 8.326726434066856}. Best is trial 0 with value: 0.46083116428832877.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.488147


[I 2025-05-18 20:22:57,493] Trial 1 finished with value: 0.4881468879232135 and parameters: {'learning_rate': 0.019777737861563657, 'num_leaves': 65, 'max_depth': 13, 'feature_fraction': 0.9388693825696973, 'bagging_fraction': 0.6466945952365604, 'lambda_l1': 3.2205831525455606, 'lambda_l2': 6.594478666684367}. Best is trial 0 with value: 0.46083116428832877.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.464115


[I 2025-05-18 20:24:41,782] Trial 2 finished with value: 0.46411500095808705 and parameters: {'learning_rate': 0.19474868965916237, 'num_leaves': 43, 'max_depth': 11, 'feature_fraction': 0.7288937039575442, 'bagging_fraction': 0.6402005577562181, 'lambda_l1': 7.860344225679329, 'lambda_l2': 5.633758470053102}. Best is trial 0 with value: 0.46083116428832877.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.452286


[I 2025-05-18 20:27:06,880] Trial 3 finished with value: 0.452286244393998 and parameters: {'learning_rate': 0.15814413689915602, 'num_leaves': 260, 'max_depth': 14, 'feature_fraction': 0.740975506150041, 'bagging_fraction': 0.7188835219382578, 'lambda_l1': 9.743894727018693, 'lambda_l2': 6.792800314545683}. Best is trial 3 with value: 0.452286244393998.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.458627


[I 2025-05-18 20:29:33,497] Trial 4 finished with value: 0.45862676301335553 and parameters: {'learning_rate': 0.10915788220475382, 'num_leaves': 162, 'max_depth': 14, 'feature_fraction': 0.8074632540566499, 'bagging_fraction': 0.695069513212854, 'lambda_l1': 8.85220703979081, 'lambda_l2': 2.3109636844337}. Best is trial 3 with value: 0.452286244393998.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.45951


[I 2025-05-18 20:32:23,662] Trial 5 finished with value: 0.45951021679515297 and parameters: {'learning_rate': 0.11223977782402415, 'num_leaves': 173, 'max_depth': 12, 'feature_fraction': 0.6503757085870779, 'bagging_fraction': 0.7283414849552249, 'lambda_l1': 8.102246973692763, 'lambda_l2': 7.192581559244634}. Best is trial 3 with value: 0.452286244393998.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.47218


[I 2025-05-18 20:35:31,981] Trial 6 finished with value: 0.47217994997562635 and parameters: {'learning_rate': 0.034909577194380585, 'num_leaves': 146, 'max_depth': 12, 'feature_fraction': 0.7525697163270872, 'bagging_fraction': 0.6516711363116362, 'lambda_l1': 0.0014736219575395282, 'lambda_l2': 2.77070908658429}. Best is trial 3 with value: 0.452286244393998.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.459312


[I 2025-05-18 20:37:49,513] Trial 7 finished with value: 0.4593122897919127 and parameters: {'learning_rate': 0.13794197871946656, 'num_leaves': 191, 'max_depth': 10, 'feature_fraction': 0.6473702240704083, 'bagging_fraction': 0.6728521583782838, 'lambda_l1': 1.0645535605146517, 'lambda_l2': 9.062763723638717}. Best is trial 3 with value: 0.452286244393998.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.465695


[I 2025-05-18 20:41:11,562] Trial 8 finished with value: 0.4656952297563073 and parameters: {'learning_rate': 0.04612994273991407, 'num_leaves': 219, 'max_depth': 12, 'feature_fraction': 0.6104200380438741, 'bagging_fraction': 0.8033584790815059, 'lambda_l1': 4.0440323803924425, 'lambda_l2': 7.601415335781261}. Best is trial 3 with value: 0.452286244393998.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.455506


[I 2025-05-18 20:43:45,040] Trial 9 finished with value: 0.4555055152173334 and parameters: {'learning_rate': 0.11376924209368486, 'num_leaves': 241, 'max_depth': 14, 'feature_fraction': 0.8712556437619501, 'bagging_fraction': 0.7837579168376795, 'lambda_l1': 8.927458059871508, 'lambda_l2': 5.03750873102343}. Best is trial 3 with value: 0.452286244393998.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.467264


[I 2025-05-18 20:45:11,105] Trial 10 finished with value: 0.46726380664375183 and parameters: {'learning_rate': 0.18159091720406087, 'num_leaves': 300, 'max_depth': 6, 'feature_fraction': 0.9917593118777853, 'bagging_fraction': 0.8925803633389914, 'lambda_l1': 6.396938802001518, 'lambda_l2': 0.5462316445068227}. Best is trial 3 with value: 0.452286244393998.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.45047


[I 2025-05-18 20:48:13,409] Trial 11 finished with value: 0.4504695007633011 and parameters: {'learning_rate': 0.15609282879099753, 'num_leaves': 278, 'max_depth': 16, 'feature_fraction': 0.8493414287851505, 'bagging_fraction': 0.7952777253929746, 'lambda_l1': 9.975690605396665, 'lambda_l2': 4.186645123364285}. Best is trial 11 with value: 0.4504695007633011.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.449721


[I 2025-05-18 20:51:22,908] Trial 12 finished with value: 0.44972070039428813 and parameters: {'learning_rate': 0.15592472750150818, 'num_leaves': 300, 'max_depth': 16, 'feature_fraction': 0.8478538250101956, 'bagging_fraction': 0.8504895325236643, 'lambda_l1': 9.666093633164055, 'lambda_l2': 3.5743798315364517}. Best is trial 12 with value: 0.44972070039428813.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.458259


[I 2025-05-18 20:55:18,744] Trial 13 finished with value: 0.4582588884658774 and parameters: {'learning_rate': 0.07068963114621002, 'num_leaves': 288, 'max_depth': 16, 'feature_fraction': 0.8666125137197733, 'bagging_fraction': 0.9999738589191689, 'lambda_l1': 6.295737073132086, 'lambda_l2': 3.152030247447189}. Best is trial 12 with value: 0.44972070039428813.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.464957


[I 2025-05-18 20:57:09,030] Trial 14 finished with value: 0.46495703131603566 and parameters: {'learning_rate': 0.1616434005367293, 'num_leaves': 260, 'max_depth': 7, 'feature_fraction': 0.8541098837095809, 'bagging_fraction': 0.8355637632550303, 'lambda_l1': 9.863297520603437, 'lambda_l2': 3.8193589095121836}. Best is trial 12 with value: 0.44972070039428813.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.487263


[I 2025-05-18 20:58:21,504] Trial 15 finished with value: 0.487262645421536 and parameters: {'learning_rate': 0.0833742506595046, 'num_leaves': 214, 'max_depth': 3, 'feature_fraction': 0.9302631050394562, 'bagging_fraction': 0.8683656301618697, 'lambda_l1': 6.678330004362659, 'lambda_l2': 1.2425390197383201}. Best is trial 12 with value: 0.44972070039428813.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.450427


[I 2025-05-18 21:01:26,232] Trial 16 finished with value: 0.45042739047787406 and parameters: {'learning_rate': 0.15720835983025633, 'num_leaves': 273, 'max_depth': 16, 'feature_fraction': 0.8168364711143219, 'bagging_fraction': 0.7700737566443343, 'lambda_l1': 7.527140875156801, 'lambda_l2': 4.016072482039324}. Best is trial 12 with value: 0.44972070039428813.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.450938


[I 2025-05-18 21:04:31,433] Trial 17 finished with value: 0.45093824305251407 and parameters: {'learning_rate': 0.17749955959674515, 'num_leaves': 234, 'max_depth': 16, 'feature_fraction': 0.8057841199942966, 'bagging_fraction': 0.7531968895968679, 'lambda_l1': 7.546869251270179, 'lambda_l2': 1.6137422255539775}. Best is trial 12 with value: 0.44972070039428813.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.466224


[I 2025-05-18 21:06:33,559] Trial 18 finished with value: 0.4662243797603351 and parameters: {'learning_rate': 0.1443708321732725, 'num_leaves': 89, 'max_depth': 7, 'feature_fraction': 0.9119010233685526, 'bagging_fraction': 0.9673417644876127, 'lambda_l1': 5.3165138710281585, 'lambda_l2': 5.553209864714651}. Best is trial 12 with value: 0.44972070039428813.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.456521


[I 2025-05-18 21:09:55,611] Trial 19 finished with value: 0.45652136873254306 and parameters: {'learning_rate': 0.08585244306842389, 'num_leaves': 266, 'max_depth': 15, 'feature_fraction': 0.9899899574715232, 'bagging_fraction': 0.8489089763503894, 'lambda_l1': 8.591179769422238, 'lambda_l2': 9.962663471398297}. Best is trial 12 with value: 0.44972070039428813.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.458422


[I 2025-05-18 21:12:16,338] Trial 20 finished with value: 0.4584220704443651 and parameters: {'learning_rate': 0.17628989950764756, 'num_leaves': 198, 'max_depth': 9, 'feature_fraction': 0.6945194411587525, 'bagging_fraction': 0.9265150040682585, 'lambda_l1': 7.145698792136635, 'lambda_l2': 4.228913611595041}. Best is trial 12 with value: 0.44972070039428813.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.450471


[I 2025-05-18 21:15:30,917] Trial 21 finished with value: 0.450470670375464 and parameters: {'learning_rate': 0.15724755316017788, 'num_leaves': 282, 'max_depth': 16, 'feature_fraction': 0.8332924968956186, 'bagging_fraction': 0.7917591739679939, 'lambda_l1': 9.997188884819565, 'lambda_l2': 4.441799370859382}. Best is trial 12 with value: 0.44972070039428813.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.452796


[I 2025-05-18 21:18:36,986] Trial 22 finished with value: 0.4527955673284349 and parameters: {'learning_rate': 0.13018484301660316, 'num_leaves': 297, 'max_depth': 15, 'feature_fraction': 0.8845147836509372, 'bagging_fraction': 0.6040193239997095, 'lambda_l1': 9.095984743567236, 'lambda_l2': 3.6868111445163505}. Best is trial 12 with value: 0.44972070039428813.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.449458


[I 2025-05-18 21:21:24,768] Trial 23 finished with value: 0.44945807184027836 and parameters: {'learning_rate': 0.19270501009662055, 'num_leaves': 251, 'max_depth': 15, 'feature_fraction': 0.7793483855756894, 'bagging_fraction': 0.8206422861447707, 'lambda_l1': 5.547639794671292, 'lambda_l2': 5.100626797993326}. Best is trial 23 with value: 0.44945807184027836.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.449717


[I 2025-05-18 21:24:53,748] Trial 24 finished with value: 0.4497169172189235 and parameters: {'learning_rate': 0.19939315896988113, 'num_leaves': 245, 'max_depth': 15, 'feature_fraction': 0.7999337666784668, 'bagging_fraction': 0.8304905178692752, 'lambda_l1': 5.648721461894074, 'lambda_l2': 5.929890443056419}. Best is trial 23 with value: 0.44945807184027836.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.450169


[I 2025-05-18 21:28:02,756] Trial 25 finished with value: 0.4501690931492071 and parameters: {'learning_rate': 0.1964048122691561, 'num_leaves': 239, 'max_depth': 14, 'feature_fraction': 0.7722539951733728, 'bagging_fraction': 0.8225007726565281, 'lambda_l1': 5.172499881604052, 'lambda_l2': 5.878746288648381}. Best is trial 23 with value: 0.44945807184027836.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.450892


[I 2025-05-18 21:30:54,877] Trial 26 finished with value: 0.4508919964971002 and parameters: {'learning_rate': 0.19872116191931358, 'num_leaves': 244, 'max_depth': 13, 'feature_fraction': 0.7135944759016642, 'bagging_fraction': 0.8881100115935279, 'lambda_l1': 2.400312608473629, 'lambda_l2': 4.944856811035524}. Best is trial 23 with value: 0.44945807184027836.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.451428


[I 2025-05-18 21:34:13,851] Trial 27 finished with value: 0.45142843459881726 and parameters: {'learning_rate': 0.18160284841870877, 'num_leaves': 216, 'max_depth': 15, 'feature_fraction': 0.7930556937250287, 'bagging_fraction': 0.8596509153068206, 'lambda_l1': 5.925126667459978, 'lambda_l2': 6.11998092588772}. Best is trial 23 with value: 0.44945807184027836.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.452396


[I 2025-05-18 21:37:36,820] Trial 28 finished with value: 0.452395938069915 and parameters: {'learning_rate': 0.1717008375551741, 'num_leaves': 255, 'max_depth': 13, 'feature_fraction': 0.8988031342143481, 'bagging_fraction': 0.9104795027739171, 'lambda_l1': 4.585572755566475, 'lambda_l2': 7.867498313116075}. Best is trial 23 with value: 0.44945807184027836.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.462491


[I 2025-05-18 21:40:02,742] Trial 29 finished with value: 0.4624909307042256 and parameters: {'learning_rate': 0.12791250486440564, 'num_leaves': 126, 'max_depth': 9, 'feature_fraction': 0.7738920993667691, 'bagging_fraction': 0.933079269750724, 'lambda_l1': 2.997014315868734, 'lambda_l2': 4.973470145429467}. Best is trial 23 with value: 0.44945807184027836.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.452068


[I 2025-05-18 21:43:06,315] Trial 30 finished with value: 0.4520675927886856 and parameters: {'learning_rate': 0.185533458297902, 'num_leaves': 192, 'max_depth': 15, 'feature_fraction': 0.6848954091369032, 'bagging_fraction': 0.8212449687123392, 'lambda_l1': 4.201944577423665, 'lambda_l2': 2.8670382700063977}. Best is trial 23 with value: 0.44945807184027836.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.450268


[I 2025-05-18 21:46:20,735] Trial 31 finished with value: 0.4502675602645327 and parameters: {'learning_rate': 0.19977311168456083, 'num_leaves': 232, 'max_depth': 14, 'feature_fraction': 0.7747267750103959, 'bagging_fraction': 0.8253259886312987, 'lambda_l1': 5.284923677728261, 'lambda_l2': 6.05608655576884}. Best is trial 23 with value: 0.44945807184027836.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.449365


[I 2025-05-18 21:49:48,484] Trial 32 finished with value: 0.44936544196071715 and parameters: {'learning_rate': 0.19020821286141062, 'num_leaves': 251, 'max_depth': 15, 'feature_fraction': 0.7780293627377767, 'bagging_fraction': 0.7521301101163334, 'lambda_l1': 4.772524582778525, 'lambda_l2': 6.79039850935762}. Best is trial 32 with value: 0.44936544196071715.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.449707


[I 2025-05-18 21:52:49,068] Trial 33 finished with value: 0.44970654235827306 and parameters: {'learning_rate': 0.18829266168830902, 'num_leaves': 279, 'max_depth': 15, 'feature_fraction': 0.8277361783401412, 'bagging_fraction': 0.7522757749106084, 'lambda_l1': 3.530870032144615, 'lambda_l2': 6.735247854350033}. Best is trial 32 with value: 0.44936544196071715.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.45267


[I 2025-05-18 21:55:45,890] Trial 34 finished with value: 0.4526698839968094 and parameters: {'learning_rate': 0.19014196836728586, 'num_leaves': 251, 'max_depth': 11, 'feature_fraction': 0.729022278684115, 'bagging_fraction': 0.7433677696750118, 'lambda_l1': 2.9477466962200602, 'lambda_l2': 6.878453162353491}. Best is trial 32 with value: 0.44936544196071715.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.453128


[I 2025-05-18 21:59:02,102] Trial 35 finished with value: 0.45312804735846246 and parameters: {'learning_rate': 0.17138484869002016, 'num_leaves': 271, 'max_depth': 13, 'feature_fraction': 0.7901878521001778, 'bagging_fraction': 0.7078865041916251, 'lambda_l1': 3.3881742663910046, 'lambda_l2': 8.57283406988132}. Best is trial 32 with value: 0.44936544196071715.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.452945


[I 2025-05-18 22:02:02,034] Trial 36 finished with value: 0.4529452960901806 and parameters: {'learning_rate': 0.16879554666698754, 'num_leaves': 224, 'max_depth': 14, 'feature_fraction': 0.8249835795765483, 'bagging_fraction': 0.7643252295553136, 'lambda_l1': 1.9889943028021246, 'lambda_l2': 7.373427975484681}. Best is trial 32 with value: 0.44936544196071715.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.464109


[I 2025-05-18 22:04:06,599] Trial 37 finished with value: 0.4641093674472255 and parameters: {'learning_rate': 0.1862526875045782, 'num_leaves': 39, 'max_depth': 15, 'feature_fraction': 0.7310993491783473, 'bagging_fraction': 0.7309513414288394, 'lambda_l1': 5.701773657541641, 'lambda_l2': 6.583313160221854}. Best is trial 32 with value: 0.44936544196071715.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.456955


[I 2025-05-18 22:07:04,580] Trial 38 finished with value: 0.4569553011708584 and parameters: {'learning_rate': 0.1473067728203028, 'num_leaves': 181, 'max_depth': 11, 'feature_fraction': 0.753720768834229, 'bagging_fraction': 0.678691627454951, 'lambda_l1': 4.40971362640455, 'lambda_l2': 8.073872264834275}. Best is trial 32 with value: 0.44936544196071715.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.452854


[I 2025-05-18 22:10:07,901] Trial 39 finished with value: 0.45285444929523777 and parameters: {'learning_rate': 0.18895119453733, 'num_leaves': 204, 'max_depth': 12, 'feature_fraction': 0.7915532631488302, 'bagging_fraction': 0.8043588101888348, 'lambda_l1': 3.5760888769255827, 'lambda_l2': 6.50804666173381}. Best is trial 32 with value: 0.44936544196071715.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.455448


[I 2025-05-18 22:12:37,519] Trial 40 finished with value: 0.4554479691458737 and parameters: {'learning_rate': 0.1664590576494036, 'num_leaves': 151, 'max_depth': 13, 'feature_fraction': 0.8311915129913011, 'bagging_fraction': 0.7684915450702613, 'lambda_l1': 4.949179868848883, 'lambda_l2': 5.442565083490085}. Best is trial 32 with value: 0.44936544196071715.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.449144


[I 2025-05-18 22:15:39,751] Trial 41 finished with value: 0.4491444240751035 and parameters: {'learning_rate': 0.19270944136663726, 'num_leaves': 290, 'max_depth': 15, 'feature_fraction': 0.8455080690450716, 'bagging_fraction': 0.8372887497075213, 'lambda_l1': 1.3127169173842077, 'lambda_l2': 4.667635782015954}. Best is trial 41 with value: 0.4491444240751035.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.4488


[I 2025-05-18 22:19:06,985] Trial 42 finished with value: 0.4488002186655424 and parameters: {'learning_rate': 0.19949864383408128, 'num_leaves': 283, 'max_depth': 15, 'feature_fraction': 0.8097140286131668, 'bagging_fraction': 0.8737349843116362, 'lambda_l1': 1.046040620381623, 'lambda_l2': 7.041416588234524}. Best is trial 42 with value: 0.4488002186655424.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.450251


[I 2025-05-18 22:22:40,028] Trial 43 finished with value: 0.45025086851258206 and parameters: {'learning_rate': 0.18900714139879496, 'num_leaves': 286, 'max_depth': 14, 'feature_fraction': 0.7518084901391857, 'bagging_fraction': 0.8837223394432299, 'lambda_l1': 0.15054899322936355, 'lambda_l2': 7.048036655844342}. Best is trial 42 with value: 0.4488002186655424.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.450617


[I 2025-05-18 22:25:41,221] Trial 44 finished with value: 0.4506172862325816 and parameters: {'learning_rate': 0.17729979478731114, 'num_leaves': 262, 'max_depth': 15, 'feature_fraction': 0.8173424781081493, 'bagging_fraction': 0.8076417393117806, 'lambda_l1': 1.3352452056918112, 'lambda_l2': 8.662765573030816}. Best is trial 42 with value: 0.4488002186655424.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.45049


[I 2025-05-18 22:28:43,076] Trial 45 finished with value: 0.45048978544902546 and parameters: {'learning_rate': 0.1930494341351584, 'num_leaves': 285, 'max_depth': 13, 'feature_fraction': 0.8446091590129626, 'bagging_fraction': 0.7832645921770944, 'lambda_l1': 0.7686294969726198, 'lambda_l2': 7.542445878884051}. Best is trial 42 with value: 0.4488002186655424.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.49112


[I 2025-05-18 22:32:57,803] Trial 46 finished with value: 0.4911203777838928 and parameters: {'learning_rate': 0.011627045031191552, 'num_leaves': 275, 'max_depth': 14, 'feature_fraction': 0.8858212939338967, 'bagging_fraction': 0.7176209909751331, 'lambda_l1': 1.6583644091430947, 'lambda_l2': 5.245575532947917}. Best is trial 42 with value: 0.4488002186655424.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.467408


[I 2025-05-18 22:37:36,871] Trial 47 finished with value: 0.46740844918849667 and parameters: {'learning_rate': 0.03490428188053521, 'num_leaves': 292, 'max_depth': 12, 'feature_fraction': 0.8593025200387122, 'bagging_fraction': 0.7443568321356486, 'lambda_l1': 2.292280215959863, 'lambda_l2': 6.3875674432151825}. Best is trial 42 with value: 0.4488002186655424.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.482866


[I 2025-05-18 22:38:56,215] Trial 48 finished with value: 0.48286619461775465 and parameters: {'learning_rate': 0.14761384966812108, 'num_leaves': 257, 'max_depth': 3, 'feature_fraction': 0.7107040637507662, 'bagging_fraction': 0.9064498009286492, 'lambda_l1': 0.5146823304446094, 'lambda_l2': 4.732260719056257}. Best is trial 42 with value: 0.4488002186655424.


Training until validation scores don't improve for 50 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's l2: 0.474517


[I 2025-05-18 22:40:42,041] Trial 49 finished with value: 0.47451737357624235 and parameters: {'learning_rate': 0.11856434388374129, 'num_leaves': 268, 'max_depth': 5, 'feature_fraction': 0.7825763803063628, 'bagging_fraction': 0.8735126049131225, 'lambda_l1': 3.825478000466773, 'lambda_l2': 9.02721392655709}. Best is trial 42 with value: 0.4488002186655424.
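
Every trial in the log above ends with "Did not meet early stopping. Best iteration is: [100]". `lgb.train` defaults to `num_boost_round=100`, so the 50-round early-stopping callback never fired and each trial was effectively compared at a fixed 100 trees. Separately, LightGBM only applies `bagging_fraction` when `bagging_freq` is non-zero, so that parameter likely had no effect in this search. A possible adjustment is sketched below as a configuration fragment. The round cap of 2000 and `bagging_freq=1` are illustrative assumptions, not values from the original run, and the results reported in this notebook were not produced with them.

```python
# Illustrative settings only; the specific numbers are assumptions.
NUM_BOOST_ROUND = 2000        # upper bound on trees; early stopping picks the actual count
EARLY_STOPPING_ROUNDS = 50    # same patience as in the objective above

fixed_params = {
    "objective": "regression",
    "metric": "mse",
    "boosting_type": "gbdt",
    "verbosity": -1,
    "bagging_freq": 1,        # non-zero so that bagging_fraction is actually applied
}

# Inside objective(trial), the tuned values would be merged in and the cap passed on:
#   params = {**fixed_params, "learning_rate": trial.suggest_float(...), ...}
#   model = lgb.train(params, lgb_train, num_boost_round=NUM_BOOST_ROUND,
#                     valid_sets=[lgb_valid],
#                     callbacks=[lgb.early_stopping(stopping_rounds=EARLY_STOPPING_ROUNDS),
#                                lgb.log_evaluation(period=0)])
#   trial.set_user_attr("best_iteration", model.best_iteration)
```

If the number of trees is tuned this way, the final `LGBMRegressor` would also need `n_estimators` set to the stored best iteration, since `study.best_params` only contains the values suggested by the trial.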


In [27]:
# Evaluate the best model on the test sets
best_model.fit(X_train_transformed, y_train, feature_name=list(feature_names))
y_pred_uwarm_iwarm = best_model.predict(X_test_uwarm_iwarm_transformed)
y_pred_uwarm_icold = best_model.predict(X_test_uwarm_icold_transformed)
y_pred_ucold_iwarm = best_model.predict(X_test_ucold_iwarm_transformed)
y_pred_ucold_icold = best_model.predict(X_test_ucold_icold_transformed)

# MSE
mse_uwarm_iwarm = mean_squared_error(y_test_uwarm_iwarm, y_pred_uwarm_iwarm)
mse_uwarm_icold = mean_squared_error(y_test_uwarm_icold, y_pred_uwarm_icold)
mse_ucold_iwarm = mean_squared_error(y_test_ucold_iwarm, y_pred_ucold_iwarm)
mse_ucold_icold = mean_squared_error(y_test_ucold_icold, y_pred_ucold_icold)
# RMSE
rmse_uwarm_iwarm = root_mean_squared_error(y_test_uwarm_iwarm, y_pred_uwarm_iwarm)
rmse_uwarm_icold = root_mean_squared_error(y_test_uwarm_icold, y_pred_uwarm_icold)
rmse_ucold_iwarm = root_mean_squared_error(y_test_ucold_iwarm, y_pred_ucold_iwarm)
rmse_ucold_icold = root_mean_squared_error(y_test_ucold_icold, y_pred_ucold_icold)
# MAE
mae_uwarm_iwarm = mean_absolute_error(y_test_uwarm_iwarm, y_pred_uwarm_iwarm)
mae_uwarm_icold = mean_absolute_error(y_test_uwarm_icold, y_pred_uwarm_icold)
mae_ucold_iwarm = mean_absolute_error(y_test_ucold_iwarm, y_pred_ucold_iwarm)
mae_ucold_icold = mean_absolute_error(y_test_ucold_icold, y_pred_ucold_icold)
# Print the results
print(f'MSE for warm user warm item: {mse_uwarm_iwarm:.4f}')
print(f'RMSE for warm user warm item: {rmse_uwarm_iwarm:.4f}')
print(f'MAE for warm user warm item: {mae_uwarm_iwarm:.4f}')
print('-' * 50)
print(f'MSE for warm user cold item: {mse_uwarm_icold:.4f}')
print(f'RMSE for warm user cold item: {rmse_uwarm_icold:.4f}')
print(f'MAE for warm user cold item: {mae_uwarm_icold:.4f}')
print('-' * 50)
print(f'MSE for cold user warm item: {mse_ucold_iwarm:.4f}')
print(f'RMSE for cold user warm item: {rmse_ucold_iwarm:.4f}')
print(f'MAE for cold user warm item: {mae_ucold_iwarm:.4f}')
print('-' * 50)
print(f'MSE for cold user cold item: {mse_ucold_icold:.4f}')
print(f'RMSE for cold user cold item: {rmse_ucold_icold:.4f}')
print(f'MAE for cold user cold item: {mae_ucold_icold:.4f}')



MSE for warm user warm item: 0.3818
RMSE for warm user warm item: 0.6179
MAE for warm user warm item: 0.4561
--------------------------------------------------
MSE for warm user cold item: 0.4436
RMSE for warm user cold item: 0.6660
MAE for warm user cold item: 0.5046
--------------------------------------------------
MSE for cold user warm item: 0.4804
RMSE for cold user warm item: 0.6931
MAE for cold user warm item: 0.5139
--------------------------------------------------
MSE for cold user cold item: 0.5864
RMSE for cold user cold item: 0.7658
MAE for cold user cold item: 0.5860
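
The three metrics reported above are linked: RMSE is the square root of MSE (for the warm user / warm item split, sqrt(0.3818) ≈ 0.6179). As a minimal sketch of how the same four-split report could be built in a loop instead of twelve separate variables, the snippet below uses plain Python rather than the scikit-learn functions called in the notebook. The names `regression_metrics` and `format_report` and the toy ratings are hypothetical and serve only as illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE and MAE for two equal-length sequences of ratings."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae}


def format_report(splits):
    """Format {split name: (y_true, y_pred)} in the same layout as the cell output."""
    blocks = []
    for name, (y_true, y_pred) in splits.items():
        metrics = regression_metrics(y_true, y_pred)
        blocks.append("\n".join(f"{k} for {name}: {v:.4f}" for k, v in metrics.items()))
    return ("\n" + "-" * 50 + "\n").join(blocks)


# Toy ratings made up for illustration only (not the notebook's data).
toy_splits = {
    "warm user warm item": ([4.0, 3.5, 5.0], [4.5, 3.5, 4.0]),
    "cold user cold item": ([2.0, 4.0], [3.0, 3.0]),
}
report = format_report(toy_splits)
print(report)
```

In the notebook itself, the dictionary would presumably map each split name to its `y_test_*` array and the matching `y_pred_*` array, which keeps the LightGBM and XGBoost evaluations from drifting apart when one of them is edited.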


In [28]:
# Hyperparameter tuning for XGBoost
def objective_xgb(trial):
    params = {
        "objective": "reg:squarederror",
        "eval_metric": "rmse",
        "seed": 42,
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.2),
        "max_depth": trial.suggest_int("max_depth", 3, 16),
        "min_child_weight": trial.suggest_int("min_child_weight", 1, 10),
        "subsample": trial.suggest_float("subsample", 0.6, 1.0),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.6, 1.0),
        "lambda": trial.suggest_float("lambda", 0.0, 10.0),
        "alpha": trial.suggest_float("alpha", 0.0, 10.0),
    }

    xgb_train = xgb.DMatrix(X_train_transformed, label=y_train, feature_names=list(feature_names))
    xgb_valid = xgb.DMatrix(X_val_transformed, label=y_val, feature_names=list(feature_names))
    model = xgb.train(
        params,
        xgb_train,
        num_boost_round=100,
        evals=[(xgb_valid, "validation")],
        early_stopping_rounds=50,
        verbose_eval=False
    )
    preds = model.predict(xgb_valid)
    return mean_squared_error(y_val, preds)
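
study_xgb = optuna.create_study(direction="minimize")
study_xgb.optimize(objective_xgb, n_trials=50)

# Save the Optuna study (all trials and the best parameters, not a fitted model) to pkl
with open(f'{base_path}\\xgboost\\xgboost_model.pkl', 'wb') as f:
    pickle.dump(study_xgb, f)

# Build an unfitted XGBoost regressor from the best parameters found by the study
best_model_xgb = xgb.XGBRegressor(**study_xgb.best_params)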

[I 2025-05-18 22:57:08,107] A new study created in memory with name: no-name-fc44de7d-bc8e-4cc3-ba9d-0ec1af8e886d
[I 2025-05-18 23:01:40,782] Trial 0 finished with value: 0.4635090842655603 and parameters: {'learning_rate': 0.08328078417767078, 'max_depth': 9, 'min_child_weight': 2, 'subsample': 0.7592351861881813, 'colsample_bytree': 0.6910308796289071, 'lambda': 8.758232003970122, 'alpha': 1.864305453206434}. Best is trial 0 with value: 0.4635090842655603.
[I 2025-05-18 23:09:53,233] Trial 1 finished with value: 0.46107416838745335 and parameters: {'learning_rate': 0.03105084920165796, 'max_depth': 12, 'min_child_weight': 8, 'subsample': 0.6462569000517838, 'colsample_bytree': 0.9288487123658122, 'lambda': 5.961820518044934, 'alpha': 2.557103185263112}. Best is trial 1 with value: 0.46107416838745335.
[I 2025-05-18 23:13:59,326] Trial 2 finished with value: 0.46589010614200993 and parameters: {'learning_rate': 0.052321244852632585, 'max_depth': 9, 'min_child_weight': 3, 'subsample': 
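
The cell above pickles the whole Optuna study. As a hedged alternative, the sketch below stores only the best parameters and the best validation MSE as JSON, which stays readable without Optuna installed. The parameter values are copied from Trial 1 in the log above purely as a stand-in for `study_xgb.best_params` and `study_xgb.best_value` (the actual best trial is not shown here). The helper names and the file name are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Stand-in for study_xgb.best_params: values copied from Trial 1 in the log above.
best_params = {
    "learning_rate": 0.03105084920165796,
    "max_depth": 12,
    "min_child_weight": 8,
    "subsample": 0.6462569000517838,
    "colsample_bytree": 0.9288487123658122,
    "lambda": 5.961820518044934,
    "alpha": 2.557103185263112,
}
best_value = 0.46107416838745335  # stand-in for study_xgb.best_value (validation MSE)


def save_params(path, params, value):
    """Write the parameters and their validation score as human-readable JSON."""
    payload = {"best_value": value, "best_params": params}
    Path(path).write_text(json.dumps(payload, indent=2), encoding="utf-8")


def load_params(path):
    """Read back the (params, value) pair written by save_params."""
    payload = json.loads(Path(path).read_text(encoding="utf-8"))
    return payload["best_params"], payload["best_value"]


with tempfile.TemporaryDirectory() as tmp_dir:
    params_file = Path(tmp_dir) / "xgboost_best_params.json"  # hypothetical file name
    save_params(params_file, best_params, best_value)
    restored_params, restored_value = load_params(params_file)

print(restored_params == best_params, restored_value == best_value)
```

The restored dictionary can be unpacked into the regressor in the same way as `study_xgb.best_params`. The pickle remains useful when the full trial history is needed.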

In [29]:
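# Evaluate the tuned XGBoost model on the test sets.
# The output recorded below is identical to the LightGBM results from cell 27 because this
# cell originally refit `best_model`; rerun it to obtain the XGBoost metrics.
# `feature_name` is a LightGBM fit argument, so it is not passed to XGBRegressor.fit.
best_model_xgb.fit(X_train_transformed, y_train)
y_pred_uwarm_iwarm = best_model_xgb.predict(X_test_uwarm_iwarm_transformed)
y_pred_uwarm_icold = best_model_xgb.predict(X_test_uwarm_icold_transformed)
y_pred_ucold_iwarm = best_model_xgb.predict(X_test_ucold_iwarm_transformed)
y_pred_ucold_icold = best_model_xgb.predict(X_test_ucold_icold_transformed)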

# MSE
mse_uwarm_iwarm = mean_squared_error(y_test_uwarm_iwarm, y_pred_uwarm_iwarm)
mse_uwarm_icold = mean_squared_error(y_test_uwarm_icold, y_pred_uwarm_icold)
mse_ucold_iwarm = mean_squared_error(y_test_ucold_iwarm, y_pred_ucold_iwarm)
mse_ucold_icold = mean_squared_error(y_test_ucold_icold, y_pred_ucold_icold)
# RMSE
rmse_uwarm_iwarm = root_mean_squared_error(y_test_uwarm_iwarm, y_pred_uwarm_iwarm)
rmse_uwarm_icold = root_mean_squared_error(y_test_uwarm_icold, y_pred_uwarm_icold)
rmse_ucold_iwarm = root_mean_squared_error(y_test_ucold_iwarm, y_pred_ucold_iwarm)
rmse_ucold_icold = root_mean_squared_error(y_test_ucold_icold, y_pred_ucold_icold)
# MAE
mae_uwarm_iwarm = mean_absolute_error(y_test_uwarm_iwarm, y_pred_uwarm_iwarm)
mae_uwarm_icold = mean_absolute_error(y_test_uwarm_icold, y_pred_uwarm_icold)
mae_ucold_iwarm = mean_absolute_error(y_test_ucold_iwarm, y_pred_ucold_iwarm)
mae_ucold_icold = mean_absolute_error(y_test_ucold_icold, y_pred_ucold_icold)
# Print the results
print(f'MSE for warm user warm item: {mse_uwarm_iwarm:.4f}')
print(f'RMSE for warm user warm item: {rmse_uwarm_iwarm:.4f}')
print(f'MAE for warm user warm item: {mae_uwarm_iwarm:.4f}')
print('-' * 50)
print(f'MSE for warm user cold item: {mse_uwarm_icold:.4f}')
print(f'RMSE for warm user cold item: {rmse_uwarm_icold:.4f}')
print(f'MAE for warm user cold item: {mae_uwarm_icold:.4f}')
print('-' * 50)
print(f'MSE for cold user warm item: {mse_ucold_iwarm:.4f}')
print(f'RMSE for cold user warm item: {rmse_ucold_iwarm:.4f}')
print(f'MAE for cold user warm item: {mae_ucold_iwarm:.4f}')
print('-' * 50)
print(f'MSE for cold user cold item: {mse_ucold_icold:.4f}')
print(f'RMSE for cold user cold item: {rmse_ucold_icold:.4f}')
print(f'MAE for cold user cold item: {mae_ucold_icold:.4f}')



MSE for warm user warm item: 0.3818
RMSE for warm user warm item: 0.6179
MAE for warm user warm item: 0.4561
--------------------------------------------------
MSE for warm user cold item: 0.4436
RMSE for warm user cold item: 0.6660
MAE for warm user cold item: 0.5046
--------------------------------------------------
MSE for cold user warm item: 0.4804
RMSE for cold user warm item: 0.6931
MAE for cold user warm item: 0.5139
--------------------------------------------------
MSE for cold user cold item: 0.5864
RMSE for cold user cold item: 0.7658
MAE for cold user cold item: 0.5860
