# 1 Evaluation Metrics for top-*N* Lists

When ratings are not available, i.e., with unary data, measuring the rating prediction accuracy isn't possible.

In these cases, we use $R_{train}$ to learn the recommendation list $L_u$ for each user $u$, and evaluate that list against $R_{test}$.

Let $T_u \subset I_u \cap I_{test}$ be the subset of test items that the user $u$ found relevant, e.g., rated positively, clicked, or purchased.

## 1.1 Precision

Precision measures how many recommended items are relevant, out of all recommended items to the user $u$.

$$Precision(L_u) = \frac{|L_u \cap T_u |}{|L_u|}$$

To evaluate the RS as a whole, we average the precision for all active users $u \in U$.

$$Precision(L) = \frac{\sum\limits_{u \in U} Precision(L_u)}{|U|}$$
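The two formulas above can be sketched in a few lines of Python. This is only an illustration: the function names, the toy dictionaries `recommendations` and `test_relevant`, and the choice to score an empty list as zero are assumptions, not part of the original material.

```python
def precision(recommended, relevant):
    """Per-user precision: |L_u ∩ T_u| / |L_u|."""
    if not recommended:
        return 0.0  # assumption: an empty recommendation list scores zero
    hits = len(set(recommended) & set(relevant))
    return hits / len(recommended)


def mean_precision(recommendations, test_relevant):
    """Average the per-user precision over all users that received a list."""
    scores = [
        precision(recs, test_relevant.get(user, set()))
        for user, recs in recommendations.items()
    ]
    return sum(scores) / len(scores)


# Hypothetical toy data: L_u per user and T_u per user.
recommendations = {"u1": ["a", "b", "c", "d"], "u2": ["e", "f", "g", "h"]}
test_relevant = {"u1": {"a", "c", "x"}, "u2": {"f"}}

print(precision(recommendations["u1"], test_relevant["u1"]))  # 0.5
print(mean_precision(recommendations, test_relevant))         # 0.375
```

User `u1` has 2 hits out of 4 recommendations and `u2` has 1 out of 4, so the system-level precision is the mean of 0.5 and 0.25.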

## 1.2 Recall

Recall, on the other hand, measures how many relevant items were recommended, out of all relevant items for the user $u$.

$$Recall(L_u) = \frac{|L_u \cap T_u |}{|T_u|}$$

Again, to evaluate the RS as a whole, we average the results over all active users $u \in U$.

$$Recall(L) = \frac{\sum\limits_{u \in U} Recall(L_u)}{|U|}$$
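A matching sketch for recall, under the same caveats: the names and toy data are hypothetical, and returning zero for a user with an empty $T_u$ is an assumption (the formula itself is undefined in that case).

```python
def recall(recommended, relevant):
    """Per-user recall: |L_u ∩ T_u| / |T_u|."""
    if not relevant:
        return 0.0  # assumption: no relevant test items means a score of zero
    hits = len(set(recommended) & set(relevant))
    return hits / len(relevant)


def mean_recall(recommendations, test_relevant):
    """Average the per-user recall over all users that received a list."""
    scores = [
        recall(recs, test_relevant.get(user, set()))
        for user, recs in recommendations.items()
    ]
    return sum(scores) / len(scores)


# Hypothetical toy data: L_u per user and T_u per user.
recommendations = {"u1": ["a", "b", "c", "d"], "u2": ["e", "f", "g", "h"]}
test_relevant = {"u1": {"a", "c", "x"}, "u2": {"f"}}

print(recall(recommendations["u1"], test_relevant["u1"]))  # 2 of 3 relevant items found
print(mean_recall(recommendations, test_relevant))
```

Note that item `x` is relevant for `u1` but was never recommended, which lowers the recall without affecting the precision.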

## 1.3 Average Precision (AP)

Precision and recall ignore the ordering of the list: a hit at rank 1 counts the same as a hit at rank $N$. Therefore, we need a ranking metric.

To understand average precision, we must start with Precision@k and Recall@k, i.e., precision and recall up to cut-off $k$.

In other words, we consider only the subset of recommendations $L_u^k \subset L_u$ from rank 1 through rank $k \leqslant N$.

$$PrecisionAtk(L_u) = \frac{|L_u^k \cap T_u |}{|L_u^k|}$$

$$RecallAtk(L_u) = \frac{|L_u^k \cap T_u |}{|T_u|}$$
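The cut-off versions could be sketched as follows, assuming $L_u$ is stored as a Python list ordered from rank 1 to rank $N$ without duplicates. The function names and the toy list are illustrative assumptions.

```python
def precision_at_k(ranked, relevant, k):
    """Precision over the first k ranked items: |L_u^k ∩ T_u| / |L_u^k|."""
    top_k = ranked[:k]
    if not top_k:
        return 0.0  # assumption: an empty prefix scores zero
    return sum(1 for item in top_k if item in relevant) / len(top_k)


def recall_at_k(ranked, relevant, k):
    """Recall over the first k ranked items: |L_u^k ∩ T_u| / |T_u|."""
    if not relevant:
        return 0.0  # assumption: no relevant test items means a score of zero
    return sum(1 for item in ranked[:k] if item in relevant) / len(relevant)


# Hypothetical ranked list L_u and relevant test items T_u.
L_u = ["a", "b", "c", "d"]
T_u = {"a", "c", "x"}

for k in range(1, len(L_u) + 1):
    print(k, precision_at_k(L_u, T_u, k), recall_at_k(L_u, T_u, k))
```

In this example the precision drops after the miss at rank 2 and recovers partially with the hit at rank 3, while the recall only ever stays flat or grows as $k$ increases.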

The AP is a ranking metric: it sums the precision at every rank where a relevant item appears, and normalizes by the number of relevant items.

$$APatN(L_u) = \frac{\sum\limits_{k = 1}^N PrecisionAtk(L_u) \cdot relevant(k^{th})}{|T_u|}$$

The $relevant(k^{th})$ term is an indicator: it equals 1 if the $k$-th element is relevant, and 0 otherwise.

Every hit at rank $k$ contributes the precision at that rank, i.e., the number of correct recommendations $|L_u^k \cap T_u|$ up to rank $k$, out of all recommendations $|L_u^k|$ up to that rank.
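As a small worked example, assume a hypothetical ranked list $L_u = (a, b, c, d)$ and relevant test items $T_u = \{a, c, x\}$ (both invented for illustration). The hits are at ranks 1 and 3, so only those two ranks contribute:

$$APatN(L_u) = \frac{\frac{1}{1} \cdot 1 + \frac{1}{2} \cdot 0 + \frac{2}{3} \cdot 1 + \frac{2}{4} \cdot 0}{3} = \frac{5}{9} \approx 0.556$$

If the same two hits had appeared at ranks 1 and 2 instead, the numerator would be $1 + 1 = 2$ and the AP would rise to $2/3$.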

A first interpretation is that the AP increases only with correct recommendations (what a surprise!).

Also, early hits, i.e., front-loading correct recommendations, carry over and are continuously rewarded.

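Finally, the AP can never decrease as you increase $N$: every additional term in the sum is non-negative, and the denominator $|T_u|$ does not depend on $N$.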

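There is, however, an alternative formula for AP, in terms of both the precision and the change in recall from rank $k - 1$ to rank $k$. The recall only changes when the $k$-th item is relevant, in which case it grows by $1 / |T_u|$, so both formulas give the same value.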

$$APatN(L_u) = \sum\limits_{k=1}^N PrecisionAtk(L_u) \cdot \Delta RecallAtk(L_u)$$
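To check that the two AP formulas agree, here is a minimal sketch that implements both. The function names and the toy data are assumptions made for illustration, as is returning zero when $T_u$ is empty.

```python
def average_precision(ranked, relevant, n):
    """AP@N as the sum of Precision@k at every hit, divided by |T_u|."""
    if not relevant:
        return 0.0  # assumption: no relevant test items means a score of zero
    hits = 0
    total = 0.0
    for k, item in enumerate(ranked[:n], start=1):
        if item in relevant:
            hits += 1
            total += hits / k  # Precision@k, counted only when rank k is a hit
    return total / len(relevant)


def average_precision_delta_recall(ranked, relevant, n):
    """AP@N as the sum of Precision@k times the change in Recall@k."""
    if not relevant:
        return 0.0  # same assumption as above
    hits = 0
    total = 0.0
    previous_recall = 0.0
    for k, item in enumerate(ranked[:n], start=1):
        if item in relevant:
            hits += 1
        current_recall = hits / len(relevant)
        total += (hits / k) * (current_recall - previous_recall)
        previous_recall = current_recall
    return total


# Hypothetical ranked list L_u and relevant test items T_u.
L_u = ["a", "b", "c", "d"]
T_u = {"a", "c", "x"}

for n in range(1, len(L_u) + 1):
    print(n, average_precision(L_u, T_u, n), average_precision_delta_recall(L_u, T_u, n))
```

Both columns should match up to floating-point rounding, and the values never decrease as the cut-off grows.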

## 1.4 Mean Average Precision (mAP)

The Average Precision (AP) is further averaged over all users and reported as a single score.

$$mAPatN(L) = \frac{\sum\limits_{u \in U} APatN(L_u)}{|U|}$$
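A possible sketch of the mAP computation is shown below. The helper is defined inside the block so that it runs on its own. All names and the toy data are hypothetical, and users without any relevant test items are scored as zero by assumption.

```python
def average_precision(ranked, relevant, n):
    """AP@N: sum of Precision@k at every hit up to rank n, divided by |T_u|."""
    if not relevant:
        return 0.0  # assumption: no relevant test items means a score of zero
    hits = 0
    total = 0.0
    for k, item in enumerate(ranked[:n], start=1):
        if item in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)


def mean_average_precision(recommendations, test_relevant, n):
    """mAP@N: the mean of AP@N over all users that received a list."""
    scores = [
        average_precision(ranked, test_relevant.get(user, set()), n)
        for user, ranked in recommendations.items()
    ]
    return sum(scores) / len(scores)


# Hypothetical toy data: ranked L_u per user and T_u per user.
recommendations = {"u1": ["a", "b", "c", "d"], "u2": ["e", "f", "g", "h"]}
test_relevant = {"u1": {"a", "c", "x"}, "u2": {"f"}}

print(mean_average_precision(recommendations, test_relevant, 4))
```

Here `u1` scores $5/9$ and `u2` scores $1/2$ (a single hit at rank 2), so the reported value is their mean, $19/36 \approx 0.528$.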

This way, we use a metric that considers both the number and the ranking of hits, i.e., useful recommendations.

In this last section, we learned how to use unary data, how to make predictions based on it, and how to evaluate our algorithms.

Time to practice!