# Introduction

In many real scenarios, the buying and rating behaviors of customers are associated with temporal information.

Time can be viewed from a recency and forecasting perspective, or from a contextual (e.g., seasonal) perspective. From a recency perspective, the basic idea is that recent ratings are more important than older ratings. In the contextual perspective, various periodic aspects, such as season or month, may be used.

When time is viewed as a continuous variable, the recommendations are often created as functions of time. The temporal context can be viewed from a periodic, recency, or modeling point of view.

Time can be treated as a modeling variable by explicitly expressing the predicted ratings as a function of time. The parameters of this function can be learned in a data-driven manner by minimizing the squared error of the predicted ratings with respect to the observed ratings. An example of such a model is time-SVD++, which expresses the predicted ratings as a function of temporally parameterized biases and factor matrices. 

# Temporal Collaborative Filtering

Temporal information can be used in one of three ways in order to improve the effectiveness of prediction:
1. Recency-based models: Some models consider recent ratings more important than older ratings. In these cases, window-based and decay-based models are used for more accurate prediction.
2. Periodic context-based models: In periodic context-based models, the specific property of a period, such as the time at the level of specificity of the hour, day, week, month, or season, is used to perform the recommendation.
3. Models that explicitly use time as an independent variable: A recent approach, referred to as time-SVD++, uses time as an independent variable within the modeling process. 

## Recency-Based Models

### Decay-Based Methods

In decay-based methods, a time-stamp $t_{uj}$ is associated with each observed rating of user $u$ and item $j$ in the $m \times n$ ratings matrix $R$. It is assumed that all recommendations are made at a future time $t_f$, which is also referred to as the target time. The weight $w_{uj}(t_f)$ of the rating $r_{uj}$ at target time $t_f$ is then defined with a decay function that penalizes larger distances between $t_{uj}$ and $t_f$. A common choice of decay function is the exponential function:
$$ w_{uj}(t_f) = \exp[-\lambda (t_f - t_{uj})] $$

The decay-rate $\lambda$ is a user-defined parameter that regulates the importance of time. Larger values of $\lambda$ de-emphasize older ratings to a greater degree.
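As a small numeric illustration of the effect of $\lambda$, the sketch below evaluates the exponential decay function for ratings of different ages. The helper name `decay_weight`, the month-based time unit, and the sample values are assumptions made for this illustration only.

```python
import numpy as np

def decay_weight(lam, t_f, t_uj):
    """Exponential decay weight exp(-lam * (t_f - t_uj)) of a rating made at time t_uj."""
    return np.exp(-lam * (np.asarray(t_f, dtype=float) - np.asarray(t_uj, dtype=float)))

# Hypothetical rating times in months; the target time t_f is month 12.
rating_times = np.array([12.0, 6.0, 0.0])
slow = decay_weight(0.1, 12.0, rating_times)  # small decay rate
fast = decay_weight(0.5, 12.0, rating_times)  # large decay rate
print(slow)
print(fast)
```

A rating made exactly at the target time keeps weight 1 under both decay rates, whereas the larger decay rate shrinks the weight of the twelve-month-old rating much more strongly.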

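The predicted rating of user $u$ for item $j$ at target time $t_f$ is then computed with the weighted neighborhood formula:
$$ \hat{r}_{uj} (t_f) = \mu_u + \dfrac{\sum_{v \in P_u(j)} w_{vj} (t_f) \cdot \mathrm{Sim}(u, v) \cdot (r_{vj} - \mu_v)}{\sum_{v \in P_u(j)} w_{vj} (t_f) \cdot |\mathrm{Sim}(u, v)|} $$

Here, $\mu_u$ is the mean rating of user $u$, and $P_u(j)$ represents the $k$ closest users to user $u$ that have specified ratings for item $j$. The optimal value of $\lambda$ can be learned using cross-validation methods.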

#### Example in MovieLens

In [1]:
import numpy as np
import tensorflow as tf
import sklearn
import csv
import pandas as pd
from datetime import datetime



In [2]:
import os

dir_path = os.path.abspath(os.path.join('', os.pardir))

#### Get data

In [3]:
names = ['user_id', 'item_id', 'rating', 'timestamp']
df = pd.read_csv(os.path.join(dir_path, 'data/ml-100k/u.data'), names=names, sep='\t')

In [4]:
n_users = df.user_id.unique().shape[0]
n_items = df.item_id.unique().shape[0]

In [5]:
nan = np.nan

time_matrix = np.zeros((n_users, n_items)) * nan
ratings_matrix = np.zeros((n_users, n_items)) * nan

for line in df.itertuples():
    ratings_matrix[line[1]-1, line[2]-1] = line[3]
    time_matrix[line[1]-1, line[2]-1] = line[4]

#### Get test data
The test set consists of the most recent ratings, so that the model is evaluated on ratings that were made after the ratings it was trained on.

In [6]:
flatten_time_matrix = np.reshape(time_matrix, -1)

In [7]:
sorted_time_indices = np.argsort(flatten_time_matrix) # ascending; NaN (unrated) entries are sorted last

In [8]:
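# rated entries come first in ascending time order; unrated (NaN) entries are sorted last
n_rated = np.count_nonzero(np.isfinite(flatten_time_matrix))
flatten_test_indices = sorted_time_indices[n_rated-5000:n_rated]  # the 5000 newest ratings
flatten_train_indices = sorted_time_indices[:n_rated-5000]        # all older ratings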

In [9]:
def get_indices(flatten_indices):
    X = []
    Y = []
    
    for indices in flatten_indices:
        x = indices // n_items
        y = indices % n_items
        
        X.append(x)
        Y.append(y)
        
    return [tuple(X), tuple(Y)]

In [10]:
test_indices = get_indices(flatten_test_indices)
train_indices = get_indices(flatten_train_indices)

In [11]:
def get_ratings_matrix(original_matrix, indices):
    shape = np.shape(original_matrix)
    matrix = np.ones(shape) * nan
    
    for i in range(len(indices[0])):
        x = indices[0][i]
        y = indices[1][i]
        
        matrix[x][y] = original_matrix[x][y]
    
    return matrix

In [12]:
train_ratings_matrix = get_ratings_matrix(ratings_matrix, train_indices)
test_ratings_matrix = get_ratings_matrix(ratings_matrix, test_indices)

train_time_matrix = get_ratings_matrix(time_matrix, train_indices)
test_time_matrix = get_ratings_matrix(time_matrix, test_indices)

In [13]:
# indices for vector
def specified_rating_indices(u):
    return np.where(np.isfinite(u))

In [14]:
# mean of the specified (non-NaN) ratings of a single user
def mean(u):
    specified_ratings = u[specified_rating_indices(u)]
    # a user without any specified rating has no mean
    if np.shape(specified_ratings)[0] == 0: return nan
    m = sum(specified_ratings)/np.shape(specified_ratings)[0]
    return m

In [15]:
def all_user_mean_ratings(ratings_matrix):
    return np.array([mean(ratings_matrix[u, :]) for u in range(ratings_matrix.shape[0])])

In [16]:
def get_mean_centered_ratings_matrix(ratings_matrix):
    users_mean_rating = all_user_mean_ratings(ratings_matrix)
    mean_centered_ratings_matrix = ratings_matrix - np.reshape(users_mean_rating, [-1, 1])
    return mean_centered_ratings_matrix

In [17]:
mean_centered_ratings_matrix = get_mean_centered_ratings_matrix(train_ratings_matrix)

In [18]:
def pearson(u, v):
    mean_u = mean(u)
    mean_v = mean(v)
    
    specified_rating_indices_u = set(specified_rating_indices(u)[0])
    specified_rating_indices_v = set(specified_rating_indices(v)[0])
    
    mutually_specified_ratings_indices = specified_rating_indices_u.intersection(specified_rating_indices_v)
    mutually_specified_ratings_indices = list(mutually_specified_ratings_indices)
    
    u_mutually = u[mutually_specified_ratings_indices]
    v_mutually = v[mutually_specified_ratings_indices]
      
    centralized_mutually_u = u_mutually - mean_u
    centralized_mutually_v = v_mutually - mean_v
#     print(np.sqrt(np.sum(np.square(centralized_mutually_u))))

    result = np.sum(np.multiply(centralized_mutually_u, centralized_mutually_v)) 
    result = result / (np.sqrt(np.sum(np.square(centralized_mutually_u))) * np.sqrt(np.sum(np.square(centralized_mutually_v))))
    
    return result

In [19]:
from sklearn.metrics.pairwise import cosine_similarity
from surprise import similarities

In [20]:
def mean_centered(u):
    return u - mean(u)

In [21]:
def get_user_similarity_value_for(u_index, ratings_matrix, func):
    user_ratings = ratings_matrix[u_index, :]
    similarity_value = np.array([func(ratings_matrix[i, :], user_ratings) for i in range(ratings_matrix.shape[0])])
    return similarity_value

In [22]:
from tqdm import tqdm
def get_user_similarity_matrix(ratings_matrix, func):
    similarity_matrix = []
    for u_index in tqdm(range(ratings_matrix.shape[0])):
        similarity_value = get_user_similarity_value_for(u_index, ratings_matrix, func)
        similarity_matrix.append(similarity_value)
    return np.array(similarity_matrix)
    

In [23]:
user_similarity_matrix = get_user_similarity_matrix(train_ratings_matrix, pearson)

100%|██████████| 943/943 [01:49<00:00,  8.59it/s]


In [24]:
users_mean_rating = all_user_mean_ratings(train_ratings_matrix)

In [25]:
def diff_month(d1, d2):
    d1 = datetime.fromtimestamp(d1)
    d2 = datetime.fromtimestamp(d2)
    return (d1.year - d2.year) * 12 + d1.month - d2.month

In [26]:
# weight of a rating: exponential decay in the number of months between rating time and target time

def weight_for_rating(lambda_param, rating_time, current_time):
    # elapsed months t_f - t_uj: positive when the rating is older than the target time
    months = diff_month(current_time, rating_time)
    return np.exp(-lambda_param*months)

In [122]:
# default target time t_f: the latest time-stamp in the data, so that t_f >= t_uj for every rating
default_target_time = np.nanmax(time_matrix)

def predict(u_index, i_index, k, current_time=None, lambda_param=0.1):
    if current_time is None:
        current_time = default_target_time

    similarity_value = user_similarity_matrix[u_index]
    # sort users by decreasing similarity; negating keeps NaN similarities at the end
    sorted_users_similar = np.argsort(-similarity_value)

    # keep only users who rated this item in the training data
    # and whose similarity to the target user is defined
    users_rated_item = specified_rating_indices(train_ratings_matrix[:, i_index])[0]
    set_2 = frozenset(users_rated_item)
    ranked_similar_user_rated_item = [
        u for u in sorted_users_similar
        if u in set_2 and np.isfinite(similarity_value[u])
    ]

    # P_u(j): the k most similar of these users (fewer if not enough exist)
    top_k_similar_user = np.array(ranked_similar_user_rated_item[:k], dtype=int)
    if len(top_k_similar_user) == 0:
        return nan

    # mean-centered ratings of the neighbors for this item
    ratings_in_item = mean_centered_ratings_matrix[:, i_index]
    top_k_ratings = ratings_in_item[top_k_similar_user]
    top_k_similarity_value = similarity_value[top_k_similar_user]

    # exponential decay weight of each neighbor's rating
    weights = np.array([
        weight_for_rating(lambda_param, time_matrix[v, i_index], current_time)
        for v in top_k_similar_user
    ])
    top_k_similarity_value = np.multiply(top_k_similarity_value, weights)

    r_hat = users_mean_rating[u_index] + np.sum(top_k_ratings * top_k_similarity_value)/np.sum(np.abs(top_k_similarity_value))
    return r_hat

In [123]:
predict(10, 1, 10)
users_mean_rating[10]

3.4640883977900554

In [125]:
pairs = []
for i in range(len(test_indices[0])):
    pairs.append((test_indices[0][i], test_indices[1][i]))

In [128]:
def test(pairs):
    for u, i in pairs:
        print(predict(u, i, 10))

In [130]:
# test(pairs)

In [131]:
def get_predicted_ratings_matrix(current_time=None):
    predicted_ratings = []
    for u_index in tqdm(range(n_users)):
        user_ratings = []
        for i_index in range(n_items):
#             rating = ratings_matrix[u_index][i_index]
#             if np.isnan(rating):
            rating = predict(u_index, i_index, 100, current_time)
            user_ratings.append(rating)
        predicted_ratings.append(user_ratings)
    return predicted_ratings            

In [57]:
# predicted_ratings = get_predicted_ratings_matrix()
# predicted_ratings = np.array(predicted_ratings)

100%|██████████| 943/943 [05:18<00:00,  2.96it/s]


### Window-Based Methods

In window-based methods, ratings that are older than a particular time are pruned from consideration. This approach can be viewed as a special case of pre-filtering or post-filtering methods in context-based models. There are several ways in which windows can be modeled:
1. If the difference between the target time $t_f$ and the rating time $t_{uj}$ is larger than a particular threshold, then the rating is dropped.
2. In some cases, it is possible to obtain some insight into the active periods for various items depending on the underlying domain. In such cases, the windows can be set in a domain- and item-specific way.
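The threshold-based window can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `prune_by_window`, the NaN encoding of unspecified ratings, and the sample values are hypothetical and not taken from the notebook above.

```python
import numpy as np

def prune_by_window(ratings, times, t_f, max_age):
    """Return a copy of `ratings` in which entries older than `max_age` are set to NaN."""
    ratings = np.array(ratings, dtype=float)  # copy, so the input is left untouched
    times = np.asarray(times, dtype=float)
    too_old = (t_f - times) > max_age         # NaN time-stamps compare as False
    ratings[too_old] = np.nan
    return ratings

# Hypothetical 2 x 2 example: NaN marks an unspecified rating.
ratings = np.array([[5.0, 3.0], [np.nan, 4.0]])
times = np.array([[100.0, 10.0], [np.nan, 95.0]])
pruned = prune_by_window(ratings, times, t_f=100.0, max_age=30.0)
print(pruned)
```

Passing an array with one entry per item as `max_age` gives item-specific windows through broadcasting, which corresponds to the second option in the list.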

## Handling Periodic Context

Periodic context is designed to handle cases in which the time dimension may refer to a specific period in time, such as hour of the day, day of the week, season, or the time intervals in the vicinity of specific periodic events.

### Pre-Filtering and Post-Filtering

In pre-filtering, the ratings that are not relevant to the specific target time (i.e., context) in which the recommendation is made are removed, which can be a significant part of the data. A separate model is then constructed for each context: after filtering, any non-contextual method may be used to make predictions on the pruned data within each segment.
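A minimal sketch of pre-filtering is given below, assuming a weekend/weekday context and using item means as a stand-in for the non-contextual model. The function names, the choice of context, and the sample ratings are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone

def is_weekend(timestamp):
    """Context function: True if the UTC time-stamp falls on a Saturday or Sunday."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).weekday() >= 5

def prefilter(ratings, target_context, context_of=is_weekend):
    """Keep only the (user, item, rating, timestamp) tuples whose context matches."""
    return [r for r in ratings if context_of(r[3]) == target_context]

def item_means(ratings):
    """Stand-in non-contextual model: the mean rating of each item."""
    sums, counts = defaultdict(float), defaultdict(int)
    for _, item, rating, _ in ratings:
        sums[item] += rating
        counts[item] += 1
    return {item: sums[item] / counts[item] for item in sums}

# Hypothetical data: 2021-01-02 was a Saturday and 2021-01-04 a Monday.
saturday = datetime(2021, 1, 2, 12, tzinfo=timezone.utc).timestamp()
monday = datetime(2021, 1, 4, 12, tzinfo=timezone.utc).timestamp()
ratings = [(0, "a", 5, saturday), (1, "a", 1, monday), (2, "a", 4, saturday)]

# One model per context segment.
weekend_model = item_means(prefilter(ratings, target_context=True))
weekday_model = item_means(prefilter(ratings, target_context=False))
print(weekend_model, weekday_model)
```

The same item receives different predictions in the two segments, because each model sees only the ratings made in its own context.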

In post-filtering, the recommendations are adjusted based on the context, after a non-contextual method has been used to generate the recommendation on all the data. Therefore, the basic approach of post-filtering uses the following two steps:
1. Generate the recommendations using a conventional collaborative filtering approach on all data, while ignoring the temporal context.
2. Adjust the generated recommendation list with the use of temporal context as a post-processing step. Either the ranks of the recommended list may be adjusted, or the list may be pruned of contextually irrelevant items.
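The two steps can be sketched as follows, assuming that the non-contextual scores are already available and that a contextual relevance value in $[0, 1]$ has been estimated for each item. How this relevance is estimated is left open here; the function name `postfilter` and all values are hypothetical.

```python
def postfilter(scores, relevance, threshold=0.0):
    """Re-rank items by score * contextual relevance.

    scores:    dict item -> predicted score from a non-contextual model
    relevance: dict item -> contextual relevance in [0, 1] (missing items count as 1)
    Items whose relevance is at or below `threshold` are pruned from the list.
    """
    adjusted = {
        item: score * relevance.get(item, 1.0)
        for item, score in scores.items()
        if relevance.get(item, 1.0) > threshold
    }
    return sorted(adjusted, key=adjusted.get, reverse=True)

# Step 1 (assumed done): scores from a conventional collaborative filtering model.
scores = {"sunscreen": 4.5, "umbrella": 4.0, "snow boots": 4.8}
# Step 2: adjust with a hypothetical relevance of each item in the summer context.
summer_relevance = {"sunscreen": 1.0, "umbrella": 0.6, "snow boots": 0.0}
summer_list = postfilter(scores, summer_relevance)
print(summer_list)
```

Here the item with the highest non-contextual score is pruned because it is irrelevant in the target context, and the remaining items are re-ranked by their adjusted scores.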


### Direct Incorporation of Temporal Context
It is also possible to directly modify existing models such as neighborhood methods in order to incorporate temporal context. In such cases, one works directly with the 3-dimensional representation corresponding to user, item, and context. One can also modify regression and latent-factor models to incorporate the temporal context directly. These methods apply generally to any context-based scenario (e.g., location), and not just the temporal context.

## Modeling Ratings as a Function of Time
In these methods, the ratings are modeled as a function of time and the parameters of the model are learned in a data-dependent way. These methods can intelligently separate long-term trends from transient and noisy trends.

### The Time-SVD++ Model
(This model is built based on both explicit and implicit feedback)

The factor model, which incorporates bias, expresses the ratings matrix $R = [r_{ij}]_{m \times n}$ in terms of the user biases, the item biases, and the factor matrices. The predicted rating $\hat{r}_{ij}$ is expressed in terms of these variables as follows:
$$ \hat{r}_{ij} = o_i + p_j + \sum_{s=1}^k u_{is} \cdot v_{js} $$

Intuitively, the variable $o_i$ indicates the propensity of user $i$ to rate all items highly, whereas the variable $p_j$ denotes the propensity of item $j$ to be rated highly.

This basic bias-based model is further enhanced with the notion of implicit feedback variables $Y = [y_{ij}]_{n \times k}$ for each item-factor pair. These variables encode the propensity of each factor-item combination to contribute to implicit feedback.

Let $I_i$ be the set of items rated by user $i$. Then, the predicted value of the rating, which includes implicit feedback, can be expressed as follows:
$$ \hat{r}_{ij} = o_i + p_j + \sum_{s=1}^k \left( u_{is} + \sum_{h \in I_i} \dfrac{y_{hs}}{\sqrt{|I_i|}} \right) \cdot v_{js} $$
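The prediction with implicit feedback can be written compactly with NumPy. The sketch below assumes that the parameters have already been learned and are stored in arrays named after the symbols in the formula; the function name `predict_svdpp` and the random parameter values are hypothetical.

```python
import numpy as np

def predict_svdpp(i, j, o, p, U, V, Y, rated_items):
    """Predicted rating of user i for item j, including implicit feedback.

    o: (m,) user biases       p: (n,) item biases
    U: (m, k) user factors    V: (n, k) item factors
    Y: (n, k) implicit feedback variables
    rated_items: indices of the items I_i rated by user i
    """
    implicit = np.zeros(U.shape[1])
    if len(rated_items) > 0:
        # sum of y_h over h in I_i, scaled by 1 / sqrt(|I_i|)
        implicit = Y[rated_items].sum(axis=0) / np.sqrt(len(rated_items))
    return o[i] + p[j] + np.dot(U[i] + implicit, V[j])

# Hypothetical parameters for m = 3 users, n = 4 items, and k = 2 factors.
rng = np.random.default_rng(0)
m, n, k = 3, 4, 2
o, p = rng.normal(size=m), rng.normal(size=n)
U, V, Y = rng.normal(size=(m, k)), rng.normal(size=(n, k)), rng.normal(size=(n, k))
r_hat = predict_svdpp(0, 1, o, p, U, V, Y, rated_items=[0, 2])
print(r_hat)
```

With an empty `rated_items` list, the implicit term vanishes and the function reduces to the basic bias-based factor model.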


Specifically, the time-SVD++ model assumes that the user biases $o_i$, item biases $p_j$, and the user factors $u_{is}$ are functions of time. Therefore, these terms will be expressed as $o_i(t)$, $p_j(t)$, and $u_{is}(t)$ to denote the fact that they are functions of time. By using these temporal variables, one now obtains the time-varying predicted value $\hat{r}_{ij}(t)$ of the $(i, j)$th entry of the ratings matrix at time $t$ as follows:
$$ \hat{r}_{ij}(t) = o_i(t) + p_j(t) + \sum_{s=1}^k \left( u_{is}(t) + \sum_{h \in I_i} \dfrac{y_{hs}}{\sqrt{|I_i|}} \right) \cdot v_{js} $$


### How to compute $o_i(t)$, $p_j(t)$ and $u_{is}(t)$

1. The intuition for choosing the temporal form of the item bias $p_j(t)$ is that the popularity of an item can vary significantly with time, but it shows a high level of continuity and stability over shorter periods. Therefore, the time horizon can be split into bins of equal size, and the ratings belonging to a particular bin have the same bias. Smaller bin sizes lead to finer granularity, but they may also result in overfitting because too few ratings may be present in each bin. The item bias $p_j(t)$ can now be split into a constant part and an offset parameter, which is bin-specific depending on the time $t$ at which item $j$ is rated:
$$ p_j(t) = C_j + \text{Offset}_{j, \text{Bin}(t)} $$

Note that both the constant part $C_j$ and the offsets are parameters that need to be learned in a data-driven manner.
2. A different approach is used to parameterize the user bias $o_i(t)$. Binning is less suitable for users, because many users do not have enough ratings to populate every bin. Therefore, a functional form may be used to parameterize the user bias, which captures the concept drift of the user over time. Let the mean date of all ratings of user $i$ be denoted by $\nu_i$. Then, the concept drift $dev_i(t)$ of user $i$ at time $t$ can be computed as a function of $t$ as follows:
$$ dev_i(t) = \text{sign}(t - \nu_i) \cdot |t - \nu_i|^\beta $$
The parameter $\beta$ is selected using cross-validation. A typical value of $\beta$ is around 0.4. In addition, the transient noise at each time $t$ is captured with the parameters $\epsilon_{it}$. Then, the user bias $o_i(t)$ is split into a constant part, a time-dependent part, and transient noise, as follows:
$$ o_i(t) = K_i + \alpha_i \cdot dev_i(t) + \epsilon_{it} $$

3. The user factors $u_{is}(t)$ correspond to the affinity of users towards various concepts. As in the case of user biases, the amount of elapsed time is a crucial factor in deciding the amount of drift. Therefore, a similar approach to user biases is used for modeling the temporal change in the user factors:
$$ u_{is}(t) = K'_{is} + \alpha'_{is} \cdot dev_i(t) + \epsilon'_{ist} $$
As in the case of user biases, the constant effects, long-term effects, and transient effects are modeled by the three terms.
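The temporal parameterizations of the item bias and the user bias can be sketched as plain functions. The parameters are assumed to be already learned; the function names, the bin layout (`t_start`, `bin_width`), the dictionary used for the transient noise, and all numeric values are hypothetical.

```python
import numpy as np

def item_bias(C_j, offsets_j, t, t_start, bin_width):
    """p_j(t) = C_j + Offset_{j, Bin(t)} with equal-width time bins."""
    b = int((t - t_start) // bin_width)
    b = min(max(b, 0), len(offsets_j) - 1)  # clamp to the available bins
    return C_j + offsets_j[b]

def dev(t, nu_i, beta=0.4):
    """Concept drift dev_i(t) = sign(t - nu_i) * |t - nu_i|^beta."""
    return np.sign(t - nu_i) * np.abs(t - nu_i) ** beta

def user_bias(K_i, alpha_i, eps_i, t, nu_i, beta=0.4):
    """o_i(t) = K_i + alpha_i * dev_i(t) + eps_it, where eps_i maps a time to its noise term."""
    return K_i + alpha_i * dev(t, nu_i, beta) + eps_i.get(t, 0.0)

# Hypothetical values: three bins of width 10, mean rating date nu_i = 10.
p_t = item_bias(C_j=3.0, offsets_j=[0.1, -0.2, 0.3], t=25, t_start=0, bin_width=10)
o_t = user_bias(K_i=3.5, alpha_i=0.1, eps_i={20: 0.05}, t=20, nu_i=10, beta=0.5)
print(p_t, o_t)
```

The drift is negative before the mean rating date $\nu_i$ and positive after it, and the user factors $u_{is}(t)$ can be computed with the same functional form as `user_bias`.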

### Optimization Problem
If $S$ contains the set of user-item pairs for which ratings are specified in the matrix $R = [r_{ij}]_{m \times n}$, and $t_{ij}$ denotes the time at which user $i$ rated item $j$, then one must solve the following optimization problem:
$$ \text{Minimize } J = \dfrac{1}{2} \sum_{(i, j) \in S} [r_{ij} - \hat{r}_{ij} (t_{ij})]^2 + \lambda \cdot (\text{Regularization Term}) $$
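The objective can be evaluated as in the sketch below. The text does not specify the regularization term, so half the sum of squared parameters (an L2 penalty) is used here as an assumption; the function name `objective`, its arguments, and the sample values are hypothetical.

```python
import numpy as np

def objective(observed, predict, params, lam):
    """J = 1/2 * (sum of squared errors) + lam * 1/2 * (sum of squared parameters).

    observed: iterable of (i, j, t_ij, r_ij) tuples for the specified entries S
    predict:  function (i, j, t) -> predicted rating at time t
    params:   list of arrays holding all learned parameters
    """
    squared_error = sum((r - predict(i, j, t)) ** 2 for i, j, t, r in observed)
    # Assumed L2 regularization term; the source leaves its form open.
    regularization = 0.5 * sum(np.sum(np.square(w)) for w in params)
    return 0.5 * squared_error + lam * regularization

# Hypothetical example: a constant predictor and a single parameter vector.
observed = [(0, 0, 1.0, 4.0), (0, 1, 2.0, 2.0)]
J = objective(observed, lambda i, j, t: 3.0, [np.array([1.0, 2.0])], lam=0.1)
print(J)
```

In practice, $J$ would be minimized over all parameters of the model, for example with stochastic gradient descent.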

# Discrete Temporal Models

Discrete temporal models are relevant to the case where the underlying data is received as discrete sequences. Such data can be encountered in a variety of application scenarios, most of which are associated with implicit user feedback rather than explicit ratings. Some examples of such application scenarios are as follows:
1. Web logs and clickstreams
2. Supermarket transactions
3. Query recommendations

## Markovian Models

# Location-Aware Recommender Systems

Location can influence the recommendation process in a wide variety of ways, of which the following two ways are particularly common:

1. The global geographical location of a user can have a significant influence on her preferences in terms of taste, culture, clothing, eating habits, and so on. This property is referred to as preference locality. In this case, the locality is inherently associated with the user, but not with the item. Therefore, in this case, the users are spatial, whereas the items are not.
2. Mobile users often want to discover restaurants or leisure places in the vicinity of their current location. In this case, the recommended items are inherently spatial. This property is referred to as travel locality.

It is also possible to imagine scenarios in which both users and items are spatial.