# **Understanding NDCG (Normalized Discounted Cumulative Gain)**

## **1. Introduction to NDCG**

**NDCG** is a metric for evaluating the ranking quality of search engines and recommendation systems. It measures how well a system places more relevant items above less relevant ones. It is particularly useful when relevance is graded (for example, on a 0 to 3 scale) rather than binary, because both the degree of relevance and the position in the ranked list affect the score.

---

## **2. Key Concepts**

### **2.1 Discounted Cumulative Gain (DCG)**

**DCG** measures the usefulness of a ranked list based on the relevance of each document and its position in the list. Each document's gain (its relevance score) is "discounted" by the logarithm of its rank, so a document lower in the list contributes less to the DCG score than the same document would higher up.

<br/>

### **Mathematical Expression**

The DCG for a set of retrieved documents is given by:

$$ 
\text{DCG}_p = \frac{\text{rel}_1}{\log_2(2)} + \frac{\text{rel}_2}{\log_2(3)} + \frac{\text{rel}_3}{\log_2(4)} + \cdots + \frac{\text{rel}_p}{\log_2(p+1)} 
$$

where:
- $\text{rel}_i$ is the relevance score of the document at position $i$.
- $p$ is the cutoff position up to which DCG is calculated.
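The formula above can be sketched in a few lines of Python. The function name `dcg_at_p` and the sample lists are illustrative assumptions, not part of any library; the sketch only shows how the discount $1/\log_2(i+1)$ shrinks with position.

```python
import math

def dcg_at_p(relevances, p):
    """Sum rel_i / log2(i + 1) over positions i = 1..p (1-indexed)."""
    return sum(
        rel / math.log2(i + 1)
        for i, rel in enumerate(relevances[:p], start=1)
    )

# Discount factor 1 / log2(i + 1) at positions 1 through 5.
discounts = [round(1 / math.log2(i + 1), 3) for i in range(1, 6)]
print(discounts)  # [1.0, 0.631, 0.5, 0.431, 0.387]

# The same document (relevance 3) placed first versus fifth.
print(dcg_at_p([3, 0, 0, 0, 0], 5))            # 3.0
print(round(dcg_at_p([0, 0, 0, 0, 3], 5), 3))  # 1.161
```

Moving the only relevant document from position 1 to position 5 cuts its contribution to roughly 39% of its undiscounted value.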

### **Pros:**
- **Intuitive:** DCG is easy to understand and calculate.
- **Focus on Position:** It takes into account the position of relevant documents, giving higher scores to documents appearing earlier in the list.

### **Cons:**
- **Absolute Scores:** DCG is unbounded and grows with the number and grade of relevant documents, so raw values are difficult to compare across different queries or datasets.
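To make the comparability problem concrete, here is a small sketch with two hypothetical queries. Both lists are already in their best possible order, yet their DCG values differ widely simply because one query has more relevant documents. The helper `dcg` and the data are assumptions made for illustration.

```python
import math

def dcg(relevances):
    """DCG over the whole list, using the rel_i / log2(i + 1) form."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances, start=1))

# Hypothetical query A: many relevant documents, ranked perfectly.
query_a = [3, 3, 3, 2, 2]
# Hypothetical query B: a single relevant document, also ranked perfectly.
query_b = [3, 0, 0, 0, 0]

print(round(dcg(query_a), 3))  # 8.028
print(round(dcg(query_b), 3))  # 3.0
```

Neither ranking can be improved, so the gap between 8.028 and 3.0 says nothing about ranking quality.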

<br/>

---
   
### **2.2 Ideal Discounted Cumulative Gain (IDCG)**

**IDCG** is the DCG score of the ideal ranking, in which documents are ordered by relevance from highest to lowest. It is the maximum DCG achievable for the given set of documents, so it serves as the reference point against which the actual ranking is measured.

<br/>

### **Mathematical Expression**

The IDCG is calculated similarly to DCG but with the documents sorted by relevance:

$$ 
\text{IDCG}_p = \frac{\text{rel}_{\text{ideal}_1}}{\log_2(2)} + \frac{\text{rel}_{\text{ideal}_2}}{\log_2(3)} + \frac{\text{rel}_{\text{ideal}_3}}{\log_2(4)} + \cdots + \frac{\text{rel}_{\text{ideal}_p}}{\log_2(p+1)} 
$$

where $\text{rel}_{\text{ideal}_i}$ is the relevance score of the document at position $i$ in the ideal ranking.
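In code, the ideal ranking is just the relevance labels sorted in descending order. The sketch below assumes every candidate document has a label; `idcg_at_p` and the `judged` list are hypothetical names and data.

```python
import math

def idcg_at_p(relevances, p):
    """DCG of the best possible ordering, truncated at position p."""
    ideal = sorted(relevances, reverse=True)
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(ideal[:p], start=1))

judged = [1, 0, 3, 2]  # hypothetical relevance labels in arbitrary order
print(sorted(judged, reverse=True))    # [3, 2, 1, 0]
print(round(idcg_at_p(judged, 3), 3))  # 4.762
```

Note that sorting happens before truncation, so the top $p$ positions of the ideal list always hold the $p$ highest labels.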

### **Pros:**
- **Benchmarking:** Provides a benchmark to compare the DCG of the actual ranking.

### **Cons:**
- **Requires Relevance Judgments:** Sorting is cheap, but the ideal order can only be computed if relevance labels exist for every candidate document, and these may be incomplete or unavailable in practice.


---


### **2.3 Normalized Discounted Cumulative Gain (NDCG)**

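**NDCG** normalizes DCG by dividing it by IDCG, which yields a score between 0 and 1 when relevance scores are non-negative. A score of 1 means the actual ranking matches the ideal ranking.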

<br/>

### **Mathematical Expression**

The NDCG is given by:

$$\text{NDCG}_p = \frac{\text{DCG}_p}{\text{IDCG}_p}$$

where:
- $\text{DCG}_p$ is the Discounted Cumulative Gain at position $p$.
- $\text{IDCG}_p$ is the Ideal Discounted Cumulative Gain at position $p$.
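A minimal sketch of the ratio follows, assuming non-negative labels. Returning 0.0 when IDCG is zero is a convention chosen here, not something the formula dictates; the function names and labels are illustrative.

```python
import math

def dcg(relevances):
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances, start=1))

def ndcg_at_p(relevances, p):
    ideal = sorted(relevances, reverse=True)
    idcg = dcg(ideal[:p])
    # Assumed convention: return 0.0 when no document is relevant.
    return dcg(relevances[:p]) / idcg if idcg > 0 else 0.0

labels = [3, 2, 1, 0]  # hypothetical relevance labels, best order first
print(ndcg_at_p(labels, 4))                  # 1.0
print(round(ndcg_at_p(labels[::-1], 4), 3))  # 0.614
print(ndcg_at_p([0, 0, 0], 3))               # 0.0
```

Even the fully reversed list scores well above 0 here, because every relevant document still appears somewhere in the top 4.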

### **Pros:**
- **Normalization:** Provides a normalized score, making it easier to compare across different queries or datasets.
- **Interpretability:** Scores range from 0 to 1, which is intuitive for understanding the performance of the ranking.

### **Cons:**
- **Dependency on Ideal Ranking:** The normalization depends on the ideal ranking, which may not be available or may vary.
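The following sketch illustrates how the choice of ideal ranking changes the score. It assumes a hypothetical case where the system returned three documents but the judged pool contains a highly relevant document that was missed; all names and numbers are made up for illustration.

```python
import math

def dcg(relevances):
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances, start=1))

retrieved = [3, 2, 1]          # hypothetical labels of the returned top 3
all_judged = [3, 3, 2, 1, 0]   # hypothetical full pool, including a missed doc

p = 3
idcg_local = dcg(sorted(retrieved, reverse=True)[:p])    # ideal from returned docs only
idcg_global = dcg(sorted(all_judged, reverse=True)[:p])  # ideal from the whole pool

ndcg_local = dcg(retrieved) / idcg_local
ndcg_global = dcg(retrieved) / idcg_global
print(round(ndcg_local, 3))   # 1.0
print(round(ndcg_global, 3))  # 0.808
```

Normalizing against only the returned documents hides the missed document, so it is worth stating which ideal ranking a reported NDCG uses.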

---

## **3. When to Use NDCG**

<br/>

#### **3.1 Search Engines**

In search engines, NDCG evaluates how well the ranked results match a user query, rewarding systems that return the most relevant results at the top of the list. A higher NDCG indicates that more relevant documents appear higher in the search results.

#### **3.2 Recommender Systems**

For recommendation systems, such as those used in e-commerce, NDCG assesses how well the order of recommended items aligns with user preferences. It rewards systems that place highly relevant recommendations first.

#### **3.3 Ranking Algorithms**

NDCG is used to compare different ranking algorithms on the same queries, helping to select the one that most consistently places relevant items near the top.
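One common way to run such a comparison is to average NDCG over a set of queries. The sketch below uses two hypothetical rankers and made-up relevance labels; each inner list holds the labels of the same documents in the order that ranker returned them.

```python
import math

def dcg(relevances):
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    idcg = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / idcg if idcg > 0 else 0.0

def mean_ndcg(runs):
    """Average NDCG over a list of per-query relevance lists."""
    return sum(ndcg(r) for r in runs) / len(runs)

# Hypothetical results for two queries (same documents, different orderings).
ranker_a = [[3, 2, 0, 1], [2, 0, 1, 0]]
ranker_b = [[0, 3, 1, 2], [0, 2, 0, 1]]

score_a = mean_ndcg(ranker_a)
score_b = mean_ndcg(ranker_b)
print(f"ranker A: {score_a:.3f}, ranker B: {score_b:.3f}")
print("better:", "A" if score_a > score_b else "B")
```

Ranker A wins in this toy data because it places the top-graded document first for both queries, while ranker B opens each list with an irrelevant one.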

## **4. Example Calculation of DCG, IDCG, and NDCG**

### **4.1. Example Data**

Consider the following relevance scores for a set of documents: *(note: retrieval order is Document 1, 2, 3, 4, 5)*

- **Document 1**: Relevance = 3
- **Document 2**: Relevance = 2
- **Document 3**: Relevance = 3
- **Document 4**: Relevance = 0
- **Document 5**: Relevance = 1

### **4.2. Calculate DCG**

The Discounted Cumulative Gain (DCG) at position $p = 5$ is calculated as follows:

$$ \text{DCG}_5 = \frac{\text{rel}_1}{\log_2(2)} + \frac{\text{rel}_2}{\log_2(3)} + \frac{\text{rel}_3}{\log_2(4)} + \frac{\text{rel}_4}{\log_2(5)} + \frac{\text{rel}_5}{\log_2(6)} $$

Substitute the relevance scores:

$$ \text{DCG}_5 = 3 + \frac{2}{\log_2(3)} + \frac{3}{\log_2(4)} + \frac{0}{\log_2(5)} + \frac{1}{\log_2(6)} $$

$$ \text{DCG}_5 = 3 + \frac{2}{1.585} + \frac{3}{2} + \frac{0}{2.322} + \frac{1}{2.585} $$

$$ \text{DCG}_5 \approx 3 + 1.262 + 1.5 + 0 + 0.387 $$

$$ \text{DCG}_5 \approx 6.149 $$
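Hand arithmetic with rounded logarithms can drift in the last digit, so a quick full-precision check of each term is worthwhile. This short sketch recomputes the five terms of the sum for the example's retrieval order.

```python
import math

relevance_scores = [3, 2, 3, 0, 1]  # retrieval order from the example

# One term per position: rel_i / log2(i + 1).
terms = [rel / math.log2(i + 1) for i, rel in enumerate(relevance_scores, start=1)]
dcg_5 = sum(terms)

print([round(t, 3) for t in terms])  # [3.0, 1.262, 1.5, 0.0, 0.387]
print(round(dcg_5, 3))               # 6.149
```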

### **4.3. Calculate IDCG**

To calculate Ideal DCG (IDCG), sort the documents by relevance in descending order:

- **Ideal Order**: Document 1 (3), Document 3 (3), Document 2 (2), Document 5 (1), Document 4 (0)

Then calculate IDCG at position $p = 5$:

$$ \text{IDCG}_5 = \frac{\text{rel}_{\text{ideal}_1}}{\log_2(2)} + \frac{\text{rel}_{\text{ideal}_2}}{\log_2(3)} + \frac{\text{rel}_{\text{ideal}_3}}{\log_2(4)} + \frac{\text{rel}_{\text{ideal}_4}}{\log_2(5)} + \frac{\text{rel}_{\text{ideal}_5}}{\log_2(6)} $$

Substitute the ideal relevance scores:

$$ \text{IDCG}_5 = \frac{3}{\log_2(2)} + \frac{3}{\log_2(3)} + \frac{2}{\log_2(4)} + \frac{1}{\log_2(5)} + \frac{0}{\log_2(6)} $$

$$ \text{IDCG}_5 = 3 + \frac{3}{1.585} + \frac{2}{2} + \frac{1}{2.322} + 0 $$

$$ \text{IDCG}_5 \approx 3 + 1.893 + 1 + 0.431 $$

$$ \text{IDCG}_5 \approx 6.323 $$
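The same full-precision check can be applied to the ideal ordering. The sketch sorts the example's relevance scores in descending order and recomputes each term.

```python
import math

relevance_scores = [3, 2, 3, 0, 1]
ideal_scores = sorted(relevance_scores, reverse=True)
print(ideal_scores)  # [3, 3, 2, 1, 0]

# One term per position in the ideal order: rel_i / log2(i + 1).
ideal_terms = [rel / math.log2(i + 1) for i, rel in enumerate(ideal_scores, start=1)]
idcg_5 = sum(ideal_terms)

print([round(t, 3) for t in ideal_terms])  # [3.0, 1.893, 1.0, 0.431, 0.0]
print(round(idcg_5, 3))                    # 6.323
```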

### **4.4. Calculate NDCG**

The Normalized Discounted Cumulative Gain (NDCG) is the ratio of DCG to IDCG:

$$ \text{NDCG}_5 = \frac{\text{DCG}_5}{\text{IDCG}_5} $$

Substitute the calculated values:

$$ \text{NDCG}_5 = \frac{6.149}{6.323} $$

$$ \text{NDCG}_5 \approx 0.972 $$
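The cutoff $p$ matters: the same ranking can score differently at different depths. As an illustrative sketch (the helper names are assumptions), the snippet below evaluates the example ranking at every cutoff from 1 to 5.

```python
import math

def dcg(relevances):
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances, start=1))

relevance_scores = [3, 2, 3, 0, 1]
ideal_scores = sorted(relevance_scores, reverse=True)

# NDCG at each cutoff k: DCG of the first k retrieved / DCG of the first k ideal.
ndcg_by_k = [
    round(dcg(relevance_scores[:k]) / dcg(ideal_scores[:k]), 3)
    for k in range(1, 6)
]
print(ndcg_by_k)  # [1.0, 0.871, 0.978, 0.911, 0.972]
```

The score is not monotonic in $k$: it dips at $k = 2$ and $k = 4$, where the retrieved list holds a lower-graded document than the ideal list does at that position.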

### **4.5 Summary of Results**

- **DCG at 5**: 6.149
- **IDCG at 5**: 6.323
- **NDCG at 5**: 0.972

This example illustrates how to calculate DCG, IDCG, and NDCG using a small set of documents. These calculations can be applied to larger datasets or different queries in a similar manner.



## **5. Summary**

- DCG, IDCG, and NDCG are fundamental metrics for evaluating ranking systems. DCG measures the usefulness of a ranking, and IDCG is the DCG of the best possible ranking. NDCG divides the first by the second to produce a normalized score, which makes ranking performance easier to compare and interpret.

- NDCG is a powerful metric for evaluating the quality of ranked lists in search engines and recommendation systems. By considering both relevance and position, NDCG provides a comprehensive measure of ranking effectiveness.

## Python Code Implementation

```python
import math

# Example relevance scores
relevance_scores = [3, 2, 3, 0, 1]
ideal_relevance_scores = sorted(relevance_scores, reverse=True)

def calculate_dcg(relevance_scores, p):
    dcg = 0
    for i in range(min(p, len(relevance_scores))):
        dcg += relevance_scores[i] / math.log2(i + 2)
    return dcg

def calculate_idcg(ideal_relevance_scores, p):
    idcg = 0
    for i in range(min(p, len(ideal_relevance_scores))):
        idcg += ideal_relevance_scores[i] / math.log2(i + 2)
    return idcg

def calculate_ndcg(dcg, idcg):
    if idcg == 0:
        return 0
    return dcg / idcg

# Position Parameter
p = 5

# Calculate DCG and IDCG
dcg = calculate_dcg(relevance_scores, p)
idcg = calculate_idcg(ideal_relevance_scores, p)
ndcg = calculate_ndcg(dcg, idcg)

# Print results
print(f"DCG at {p}: {dcg:.3f}")
print(f"IDCG at {p}: {idcg:.3f}")
print(f"NDCG at {p}: {ndcg:.3f}")
```

Output:

```
DCG at 5: 6.149
IDCG at 5: 6.323
NDCG at 5: 0.972
```


### Additional Reading Resources:

- https://www.geeksforgeeks.org/normalized-discounted-cumulative-gain-multilabel-ranking-metrics-ml/
- https://en.wikipedia.org/wiki/Discounted_cumulative_gain
- https://spotintelligence.com/2024/08/08/normalised-discounted-cumulative-gain-ndcg/