# Using Neural Networks to create Recommender Systems
In a sense, this approach is an evolution of the Singular Value Decomposition (SVD); see https://colab.research.google.com/drive/1goBESj_Z3nQBKEyQA3nq_PL4yWzW2PkP?usp=sharing. The idea is much the same: finding latent variables. But SVD is a linear framework, while neural networks (NN) are inherently **non-linear**.

We will see two different approaches with NN:
* a plain Multi-Layer Perceptron (MLP);
* a Two Tower model.

They are both Neural Network models, but they have some key differences in their architecture (and use cases, too).

## Using a Multi-Layer Perceptron (MLP) to define a Recommender System
This approach involves using a NN to learn the patterns and relationships between customers and products or services, and using this information to make recommendations. It can be particularly effective when dealing with complex datasets and non-linear patterns.

Here we will use some simulated data...

* we define the number of customers, products, and features;
* we then generate synthetic data for the customer and product features using the truncated normal distribution, with a mean of 0 and a standard deviation of 1, truncated to the range [-1, 1];
* we generate synthetic data for the product ratings using `np.random.randint` (integers from 1 to 5).

We then **flatten the customer and product features into the matrix X: these are the features**. Each row of X corresponds to one customer-product pair (the customer features followed by the product features), and each column of X corresponds to a single feature.

The y vector contains the corresponding **ratings = response variables**.
A rating can be an explicit judgment (think of Amazon or Netflix), but in our case it is an **implicit judgment**, for example the relative frequency of purchase of a product/service (or a click on a call to action).
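
To make the row layout concrete, here is a minimal sketch on a tiny, made-up dataset (3 customers, 2 products; all values are hypothetical). It shows how tiling the customers and repeating the products pairs them up, and which flattening of the ratings matrix matches that row order.

```python
import numpy as np

# Hypothetical toy data: 3 customers with 2 features, 2 products with 1 feature
customers = np.array([[10, 11], [20, 21], [30, 31]])
products = np.array([[100], [200]])
ratings = np.array([[1, 2], [3, 4], [5, 6]])  # ratings[customer, product]

n_customers, n_products = ratings.shape

# Tiling cycles through all customers once per product;
# repeating holds each product fixed for a whole block of customers.
X = np.concatenate(
    (np.tile(customers, (n_products, 1)), np.repeat(products, n_customers, axis=0)),
    axis=1,
)

# Rows of X are ordered product by product, so the ratings are flattened
# column by column (transpose first) to stay aligned with X.
y = ratings.T.reshape(-1)

print(X)
print(y)
```

With this layout, row 3 of X holds customer 0 paired with product 1, and the matching entry of y is `ratings[0, 1]`.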

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from scipy.stats import truncnorm
from keras.models import Sequential
from keras.layers import Dense

# Define the number of customers, products, and features
num_customers = 1000
num_products = 50
num_customer_features = 5
num_product_features = 10

# Generate synthetic data for customer and product features
customer_features = truncnorm.rvs(-1, 1, size=(num_customers, num_customer_features))
product_features = truncnorm.rvs(-1, 1, size=(num_products, num_product_features))

# Generate synthetic data for product ratings
ratings = np.random.randint(low=1, high=6, size=(num_customers, num_products))

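# Flatten customer and product features and ratings into X and y
# Rows of X are ordered product by product: all customers for product 0, then product 1, ...
X = np.concatenate((np.tile(customer_features, (num_products, 1)), np.repeat(product_features, num_customers, axis=0)), axis=1)
# Transpose before flattening so that y follows the same product-by-product order as X
y = ratings.T.reshape(-1)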

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [None]:
np.tile(customer_features, (num_products, 1)).shape

(50000, 5)

In [None]:
np.repeat(product_features, num_customers, axis=0).shape

(50000, 10)

In [None]:
X.shape

(50000, 15)

In [None]:
customer_features.shape

(1000, 5)

In [None]:
product_features.shape

(50, 10)

In [None]:
y.shape

(50000,)

## Define and train the model
Let's **define the neural network architecture**.
* It's a Multi-Layer Perceptron (MLP);
* It has just three dense layers (there can be more, but beware of overfitting);
* The input has as many units as there are features (columns) in the X matrix;
* There's a hidden layer with 64 neurons and a ReLU activation function;
* There's another hidden layer with 32 neurons and a ReLU activation function;
* Finally, we add an output layer with a single neuron and a linear activation function.

We then compile the model using the **mean squared error loss function** because the response variable is a rating, which we treat as a continuous variable. Note that we would instead use a **cross-entropy loss in the case of binary classification**, i.e., output in {0, 1}.
We use the **Adam optimizer**.
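
As a rough illustration of the two loss functions mentioned above, the following sketch computes them by hand with NumPy. The helper names and the example values are assumptions made for illustration; in the notebook itself Keras computes the loss internally.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average squared difference, suited to continuous targets such as ratings."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Average negative log-likelihood, suited to targets in {0, 1}."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

mse_value = mean_squared_error([3, 5], [2, 5])          # ((1)^2 + 0^2) / 2
bce_value = binary_cross_entropy([1, 0], [0.5, 0.5])    # equals ln(2)
print(mse_value, bce_value)
```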

We **train the model** on the training set using 10 epochs and a batch size of 32.

Then we **evaluate the model** on the test set and print out the mean squared error.

In [None]:
# Define the neural network architecture
model = Sequential()
model.add(Dense(units=64, activation='relu', input_dim=X.shape[1]))
model.add(Dense(units=32, activation='relu'))
model.add(Dense(units=1, activation='linear'))

# Compile the model
model.compile(loss='mean_squared_error', optimizer='adam')

# Train the model
model.fit(X_train, y_train, epochs=10, batch_size=32, verbose=1)

# Evaluate the model on the test set
mse = model.evaluate(X_test, y_test, verbose=0)
print('Mean Squared Error: ', mse)


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Mean Squared Error:  2.0460219383239746
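
An MSE of about 2 is what one would expect here: the ratings were drawn at random, independently of the features, so there is nothing for the network to learn. The sketch below (with an arbitrary seed chosen for illustration) checks that simply predicting the mean rating for everyone already gives an MSE close to 2, the variance of a uniform integer rating from 1 to 5.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for illustration only

# Ratings drawn uniformly from {1, 2, 3, 4, 5}, as in the simulated data
y = rng.integers(low=1, high=6, size=50_000)

# Baseline: predict the mean rating for every customer-product pair
baseline_mse = float(np.mean((y - y.mean()) ** 2))

# Variance of a discrete uniform distribution on n consecutive integers: (n^2 - 1) / 12
theoretical_mse = (5 ** 2 - 1) / 12

print(baseline_mse, theoretical_mse)
```

With real data, where ratings do depend on the features, a useful model should beat this constant baseline.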


## Prediction
Now let's **generate new customer data** using the same truncated normal distribution as before, tile it to match the number of products, and concatenate it with the product features, so that each row pairs the new customer with one product.

Then we make a **prediction for the ratings of the new data** using the predict method of the model, and **get the indices of the top-rated products**.

Note the use of the *argsort* method of NumPy. We also reverse the indices using the [::-1] slicing syntax to get the top-rated products in descending order.
For the sake of simplicity we take just the top 3 products using the [:3] slicing syntax, and in the end we print the indices of the top-rated products...

In [None]:
# Generate new customer data
new_customer_features = truncnorm.rvs(-1, 1, size=(1, num_customer_features))

# Tile the customer data to match the number of products
new_customer_data = np.tile(new_customer_features, (num_products, 1))

# Concatenate the new customer data with the product features
new_data = np.concatenate((new_customer_data, product_features), axis=1)

# Make a prediction for the ratings of the new data (i.e., ratings = goals' intensity)
new_ratings = model.predict(new_data)

# Get the indices of the top-rated products
top_products = np.argsort(new_ratings, axis=0)[::-1][:3]

# Print the top-rated products (i.e., goals/needs)
print('Top Rated Products: ', top_products)

Top Rated Products:  [[21]
 [11]
 [ 3]]
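
The output above is a column of indices because `model.predict` returns an array of shape (num_products, 1). As a small sketch of the same argsort-and-reverse idea on a flat array, using made-up scores:

```python
import numpy as np

# Hypothetical predicted ratings for 4 products, shaped like model.predict output
scores = np.array([[0.2], [0.9], [0.5], [0.7]])

flat_scores = scores.ravel()              # shape (4,)
order = np.argsort(flat_scores)[::-1]     # indices from highest to lowest score
top3 = order[:3]

print(top3)               # [1 3 2]
print(flat_scores[top3])  # [0.9 0.7 0.5]
```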


## Using a Two Tower Model
The Two Tower model is a NN architecture specifically designed for recommendation systems.

The model consists of **two separate sub-models, or "towers"**. Each tower is responsible for **encoding the features of either the user or the item being recommended**: the towers take the raw feature vectors representing the products and customers as input and transform them into a lower-dimensional representation that should capture the most important information for making recommendations (more or less like PCA, but non-linear).

This transformation is referred to as **"feature encoding" or "embedding."** Each tower uses a neural network to learn the encoding, i.e., a non-linear mapping from the high-dimensional input space to a lower-dimensional feature space that preserves the most important information about the input.

For those somewhat familiar with recommendation systems, the Two-Tower model combines the user-based and item-based approaches in a joint architecture:
* the user tower in the Two-Tower model is similar to the user-based approach, as it takes in user features to produce a vector representation of the user;
* the item (products) tower is similar to the item-based approach, as it takes in item features to produce a vector representation of the items;
* the joint space where the two vector representations are combined can be thought of as a hybrid of the user and item space.

**The two towers are trained jointly to learn how to encode products and customers**, and the **encoded features are concatenated and passed through one or more additional layers to make the final prediction** for the rating or preference of each customer for each item.
This output is then used to make recommendations.
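
The data flow described above can be sketched in plain NumPy. This is only an untrained forward pass with random weights, meant to show the shapes involved; the sizes, weight names, and the sigmoid output are assumptions made for illustration, not a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for illustration only

def relu(z):
    return np.maximum(z, 0.0)

def tower(x, weights, bias):
    """One dense layer with ReLU: encodes raw features into an embedding."""
    return relu(x @ weights + bias)

n_pairs, k_customer, k_product, emb_dim = 4, 5, 3, 2

# One row per (customer, product) pair
customers = rng.random((n_pairs, k_customer))
products = rng.random((n_pairs, k_product))

# Random, untrained weights for the two towers and the output layer
W_c, b_c = rng.normal(size=(k_customer, emb_dim)), np.zeros(emb_dim)
W_p, b_p = rng.normal(size=(k_product, emb_dim)), np.zeros(emb_dim)
W_out, b_out = rng.normal(size=(2 * emb_dim, 1)), np.zeros(1)

customer_emb = tower(customers, W_c, b_c)   # (4, 2)
product_emb = tower(products, W_p, b_p)     # (4, 2)

# Concatenate the two embeddings and map them to one score per pair
joint = np.concatenate([customer_emb, product_emb], axis=1)   # (4, 4)
scores = 1.0 / (1.0 + np.exp(-(joint @ W_out + b_out)))       # (4, 1), in (0, 1)

print(scores.shape)
```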

### Let's generate some synthetic data
We use a similar dataset, structured to emphasize the difference between product-related features and customer-related features.
Let's say we have:
* N = 1000 customers (i.e., examples, on the rows);
* K1 = 5 features describing the customers;
* K2 = 3 features describing each of the Q = 5 products;
* response variables representing an implicit or explicit rating given by a customer to a product. Conceptually there are Q = 5 ratings per customer, one per product; in the code below each training row carries a single rating (Q2 = 1), i.e., one rating for one customer-product pair.

In [None]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Concatenate
from tensorflow.keras.models import Model

# Generate synthetic data
N = 1000  # number of customers
K1 = 5  # number of features describing customers
K2 = 3  # number of features describing products
Q = 5  # number of products and response variables
Q2 = 1  # number of response variables per training row (a single rating)

np.random.seed(123)  # set the random seed for reproducibility

# Generate synthetic data for customers, products, and ratings
X1 = np.random.rand(N, K1)  # customer features
X2 = np.random.rand(Q, K2)  # product features
ratings = np.random.rand(N, Q2)  # one rating in [0, 1) per training row (customer-product pair)

Define the **model architecture**, again with the MSE loss and the Adam optimizer (see the MLP example above), then **train the model**.

Note: Since it's a similar example to before, for the sake of simplicity let's skip data splitting - for that step see the previous example on MLP.

In [None]:
# Build Two Tower model
input1 = Input(shape=(K1,))  # input layer for customer features
input2 = Input(shape=(K2,))  # input layer for product features
x1 = Dense(32, activation='relu')(input1)  # first hidden layer for customer features
x2 = Dense(32, activation='relu')(input2)  # first hidden layer for product features
x = Concatenate()([x1, x2])  # concatenate the output from the two hidden layers
output = Dense(Q2, activation='sigmoid')(x)  # output layer for predicted ratings
model = Model(inputs=[input1, input2], outputs=output)  # define the model

model.compile(loss='mse', optimizer='adam')  # compile the model with mean squared error loss and Adam optimizer

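# Train the model on the synthetic dataset
# Both inputs need one row per training example: tiling X2 yields N*Q rows,
# so keep the first N, which pairs customer i with product i % Q
model.fit([X1, np.tile(X2, (N, 1))[:N]], ratings, epochs=50, batch_size=32, validation_split=0.2)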

Epoch 1/50
...
Epoch 50/50


<keras.src.callbacks.History at 0x781a8432e650>

The model is then used to **make predictions** on new synthetic data with the same characteristics as the training data, but with 10 new customers. The code below outputs one predicted rating per new customer (customer i is paired with product i % Q); the expected rating is read as the intensity of the goal/need. **To identify the recommendation, sort the rating forecasts from highest to lowest and find the associated goal-based portfolios.**
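
To rank all products for each new customer, every customer-product pair has to be scored. The sketch below shows one way to build those pairs and sort the results. The function `score_pairs` is a hypothetical stand-in for `model.predict([customer_rows, product_rows])`, and the sizes are chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for illustration only

n_new, n_products, k1, k2 = 3, 5, 5, 3
customers_new = rng.random((n_new, k1))
products = rng.random((n_products, k2))

# One row per (customer, product) pair, grouped customer by customer
customer_rows = np.repeat(customers_new, n_products, axis=0)  # (15, 5)
product_rows = np.tile(products, (n_new, 1))                  # (15, 3)

def score_pairs(customer_rows, product_rows):
    """Hypothetical scorer standing in for the trained model's predict call."""
    return (customer_rows.sum(axis=1) * product_rows.sum(axis=1)).reshape(-1, 1)

# Reshape the flat predictions into a (customer, product) matrix
scores = score_pairs(customer_rows, product_rows).reshape(n_new, n_products)

# For each customer, product indices sorted from highest to lowest score
ranking = np.argsort(-scores, axis=1)

print(ranking[:, :3])  # top 3 products per new customer
```

Because the rows are grouped customer by customer, reshaping to `(n_new, n_products)` puts each customer's scores on one row, which makes the per-customer sort a single `argsort` call.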

In [None]:
# Generate new synthetic data for prediction
N_new = 10  # number of new customers
X1_new = np.random.rand(N_new, K1)  # new customer features
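X2_new = np.random.rand(Q, K2)  # newly sampled product features (reuse X2 instead to score the products seen in training)

# Do prediction on new data
# Keep the first N_new tiled rows so both inputs have N_new rows (customer i is paired with product i % Q)
predicted_ratings = model.predict([X1_new, np.tile(X2_new, (N_new, 1))[:N_new]])  # predict ratings for the new customers
print(predicted_ratings)  # one predicted rating per new customer (intensity of goal/need)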

[[0.46298596]
 [0.5108748 ]
 [0.52795464]
 [0.53764   ]
 [0.56107664]
 [0.4699516 ]
 [0.51727223]
 [0.5187645 ]
 [0.551029  ]
 [0.50386727]]
