# Glicko-2 Reward Function for Nectar Dataset

This notebook implements a reward function based on Glicko-2 ratings for the Berkeley NEST Nectar dataset. The reward function is defined as:

```
Reward = Glicko-2 Rating - Rating Volatility
```

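This favors items with high ratings and penalizes those with high volatility, i.e. items whose results fluctuate erratically. Volatility is distinct from the rating deviation (RD), which measures how uncertain the rating itself is.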
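Glicko-2 ratings are reported on a scale centred at 1500, while volatility starts at 0.06, so at the default scaling factor the penalty term is tiny. The arithmetic sketch below uses two hypothetical items (`steady`, `erratic`) to make this concrete. It also computes the scaling factor at which their order would flip. The helper `reward` mirrors the formula above and is illustrative only.

```python
def reward(rating, volatility, scaling_factor=1.0):
    """Reward = rating - scaling_factor * volatility."""
    return rating - scaling_factor * volatility

# Hypothetical Glicko-2 states on the display scale: (rating, volatility)
steady = (1520.0, 0.06)   # slightly lower rating, default-level volatility
erratic = (1521.0, 0.12)  # slightly higher rating, twice the volatility

print(round(reward(*steady), 2), round(reward(*erratic), 2))

# Scaling factor s at which both rewards are equal:
# r_e - s * v_e = r_s - s * v_s  =>  s = (r_e - r_s) / (v_e - v_s)
break_even = (erratic[0] - steady[0]) / (erratic[1] - steady[1])
print(round(break_even, 2))
```

In general, two items swap places only once `scaling_factor` exceeds their rating gap divided by their volatility gap. Here a 1-point rating lead survives until the factor passes roughly 16.7.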

## 1. Installing Required Libraries

In [None]:
!pip install datasets glicko2 matplotlib pandas numpy tqdm

## 2. Importing Libraries

In [12]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datasets import load_dataset
from glicko2 import Player
from tqdm.notebook import tqdm
import json
import random
from collections import defaultdict

## 3. Loading the Nectar Dataset from Hugging Face

In [13]:
# Load the dataset
print("Loading the Berkeley NEST Nectar dataset...")
try:
    nectar_dataset = load_dataset("berkeley-nest/Nectar")
    print(f"Dataset loaded with {len(nectar_dataset['train'])} entries.")
except Exception as e:
    print(f"Error loading dataset: {e}")
    print("\nTrying alternative approach...")
    try:
        # Try with explicit download mode
        nectar_dataset = load_dataset("berkeley-nest/Nectar", download_mode="force_redownload")
        print(f"Dataset loaded with {len(nectar_dataset['train'])} entries.")
    except Exception as e2:
        print(f"Error with alternative approach: {e2}")
        print("\nCreating synthetic dataset for demonstration purposes...")
        # Synthetic comparisons among a small pool of items, so each item plays
        # many matches; fields match the 'winner'/'loser' case in extract_comparisons
        synthetic_items = [f"response_{i}" for i in range(20)]
        synthetic_data = []
        for _ in range(1000):
            winner, loser = random.sample(synthetic_items, 2)
            synthetic_data.append({"winner": winner, "loser": loser})
        nectar_dataset = {"train": synthetic_data}

# Display dataset structure
print("\nDataset Structure:")
if isinstance(nectar_dataset, dict):
    print(f"Dictionary with keys: {list(nectar_dataset.keys())}")
    print(f"Train set length: {len(nectar_dataset['train'])}")
else:
    print(nectar_dataset)

# Examine the first few records in detail
print("\nDetailed Exploration of First 3 Records:")
for i in range(min(3, len(nectar_dataset['train']))):
    print(f"\n===== Record {i} =====")
    record = nectar_dataset['train'][i]
    
    # Print the keys first
    print(f"Keys: {list(record.keys())}")
    
    # Print the full record
    try:
        print(json.dumps(record, indent=2))
    except (TypeError, ValueError):
        print("Unable to JSON serialize the record. Printing key-by-key:")
        for key, value in record.items():
            print(f"{key}: {type(value)}")
            if isinstance(value, (dict, list)):
                try:
                    print(json.dumps(value, indent=2)[:500])
                    if len(json.dumps(value)) > 500:
                        print("... (truncated)")
                except (TypeError, ValueError):
                    print(f"  Complex structure with {len(value) if hasattr(value, '__len__') else 'unknown'} items")
            else:
                print(f"  {str(value)[:500]}")
                if len(str(value)) > 500:
                    print("... (truncated)")

Loading the Berkeley NEST Nectar dataset...
Dataset loaded with 182954 entries.

Dataset Structure:
Dictionary with keys: ['train']
Train set length: 182954

Detailed Exploration of First 3 Records:

===== Record 0 =====
Keys: ['prompt', 'answers', 'turns', 'num_responses', 'source', 'good_natured']
{
  "prompt": "\n\nHuman: 0.002 = 1000 \n1 = x?\n\nAssistant: ",
  "answers": [
    {
      "answer": "To find the value of x, we can set up a proportion using the given information:\n\n0.002/1000 = 1/x\n\nTo solve for x, we can cross multiply:\n\n0.002 * x = 1000 * 1\n\n0.002x = 1000\n\nDividing both sides by 0.002:\n\nx = 1000 / 0.002\n\nx = 500,000\n\nTherefore, 1 is equal to 500,000 in this proportion.",
      "model": "gpt-3.5-turbo",
      "rank": 1.0
    },
    {
      "answer": "To solve for x in this equation, you can use cross-multiplication. Here's how:\n\n0.002 / 1 = 1000 / x\n\nCross-multiplication gives:\n\n0.002 * x = 1000\n\nTo solve for x, divide both sides by 0.002:\n\nx =

## 4. Data Preprocessing

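Glicko-2 updates ratings from pairwise match outcomes, so each record has to be turned into (winner, loser) comparisons. The records above show that Nectar stores each prompt with a list of `answers`. Each answer carries the answer text, the `model` that produced it, and a `rank` (1 is best), so comparisons must be derived from those ranks.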
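The sketch below shows how a ranked list expands into pairwise comparisons. It uses a toy record shaped like Record 0 above. It assumes, as in Nectar, that a lower `rank` is better. Tied ranks are skipped. The record contents and the helper name `ranked_to_pairs` are hypothetical.

```python
from itertools import combinations

# Hypothetical record shaped like a Nectar entry (rank 1.0 = best)
record = {
    "prompt": "Human: ...",
    "answers": [
        {"answer": "A", "model": "model-x", "rank": 1.0},
        {"answer": "B", "model": "model-y", "rank": 2.0},
        {"answer": "C", "model": "model-z", "rank": 3.0},
    ],
}

def ranked_to_pairs(answers, key="model"):
    """Turn a ranked list of answers into (winner, loser) pairs.

    Every pair with different ranks yields one comparison; the lower rank wins.
    Pairs with tied ranks are skipped.
    """
    pairs = []
    for a, b in combinations(answers, 2):
        if a["rank"] == b["rank"]:
            continue
        winner, loser = (a, b) if a["rank"] < b["rank"] else (b, a)
        pairs.append((winner[key], loser[key]))
    return pairs

pairs = ranked_to_pairs(record["answers"])
print(pairs)
```

With n distinct ranks a prompt yields n(n-1)/2 comparisons, so a prompt with 7 ranked answers contributes 21.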

In [14]:
from itertools import combinations

def extract_comparisons(dataset, item_key='model'):
    """
    Extract pairwise comparison data from the Nectar dataset.

    Nectar stores each prompt with a ranked list of `answers`; every answer has
    the answer text, the `model` that produced it, and a `rank` (1 = best).
    Each pair of answers with different ranks becomes one comparison whose
    winner is the better-ranked answer. Items are identified by `item_key`
    (the model name by default), so ratings aggregate across prompts.

    The remaining branches handle other common pairwise-preference layouts,
    including the synthetic fallback dataset.
    """
    comparisons = []
    
    for entry in tqdm(dataset['train']):
        if 'answers' in entry:
            # Nectar layout: ignore answers without a rank
            ranked = [a for a in entry['answers'] if a.get('rank') is not None]
            for a, b in combinations(ranked, 2):
                if a['rank'] == b['rank']:
                    continue  # skip tied ranks
                winner, loser = (a, b) if a['rank'] < b['rank'] else (b, a)
                if winner[item_key] != loser[item_key]:  # avoid self-matches
                    comparisons.append({
                        'winner': winner[item_key],
                        'loser': loser[item_key]
                    })
        elif 'chosen_id' in entry and 'rejected_id' in entry:
            comparisons.append({
                'winner': entry['chosen_id'],
                'loser': entry['rejected_id']
            })
        elif 'winner' in entry and 'loser' in entry:
            comparisons.append({
                'winner': entry['winner'],
                'loser': entry['loser']
            })
        elif 'preferred' in entry and 'dispreferred' in entry:
            comparisons.append({
                'winner': entry['preferred'],
                'loser': entry['dispreferred']
            })
        elif 'prompt' in entry and 'chosen' in entry and 'rejected' in entry:
            # Common structure in preference datasets
            comparisons.append({
                'winner': f"{entry['prompt']}_chosen",
                'loser': f"{entry['prompt']}_rejected"
            })
        
    print(f"Extracted {len(comparisons)} comparisons")
    return comparisons

# Extract comparisons from the dataset
comparisons = extract_comparisons(nectar_dataset)



## 5. Implementing Glicko-2 Rating System

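Now we'll use the `glicko2` library to calculate a rating for each item from the comparison data. Every item starts at the library's default Glicko-2 values (rating 1500, RD 350, volatility 0.06), and comparisons are processed in fixed-size rating periods.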
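To make the library call less of a black box, here is a from-scratch sketch of a single Glicko-2 rating-period update. It follows the steps in Mark Glickman's published description of the algorithm. This is not the `glicko2` package's own code, so results may differ in the last decimal places. The function name `glicko2_period` is hypothetical, and `tau = 0.5` is an assumed system constant. Running it on Glickman's worked example should land close to his published result: rating ≈ 1464, RD ≈ 151.5, volatility ≈ 0.06.

```python
import math

SCALE = 173.7178  # conversion factor between the Glicko and Glicko-2 scales

def glicko2_period(rating, rd, vol, opponents, tau=0.5, eps=1e-6):
    """One Glicko-2 rating-period update (Glickman's steps 2-8).

    opponents: non-empty list of (opp_rating, opp_rd, score) with score 1 = win,
    0 = loss, using opponent values from the start of the period.
    Returns (new_rating, new_rd, new_vol) on the display scale.
    """
    # Step 2: convert to the Glicko-2 scale
    mu, phi = (rating - 1500) / SCALE, rd / SCALE

    def g(phi_j):
        return 1 / math.sqrt(1 + 3 * phi_j ** 2 / math.pi ** 2)

    # Steps 3-4: estimated variance v and estimated improvement delta
    v_inv, delta_sum = 0.0, 0.0
    for opp_rating, opp_rd, score in opponents:
        mu_j, phi_j = (opp_rating - 1500) / SCALE, opp_rd / SCALE
        g_j = g(phi_j)
        e_j = 1 / (1 + math.exp(-g_j * (mu - mu_j)))  # expected score
        v_inv += g_j ** 2 * e_j * (1 - e_j)
        delta_sum += g_j * (score - e_j)
    v = 1 / v_inv
    delta = v * delta_sum

    # Step 5: new volatility via the Illinois root-finding iteration
    a = math.log(vol ** 2)

    def f(x):
        ex = math.exp(x)
        return (ex * (delta ** 2 - phi ** 2 - v - ex) / (2 * (phi ** 2 + v + ex) ** 2)
                - (x - a) / tau ** 2)

    A = a
    if delta ** 2 > phi ** 2 + v:
        B = math.log(delta ** 2 - phi ** 2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau
    f_a, f_b = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * f_a / (f_b - f_a)
        f_c = f(C)
        if f_c * f_b <= 0:
            A, f_a = B, f_b
        else:
            f_a /= 2
        B, f_b = C, f_c
    new_vol = math.exp(A / 2)

    # Steps 6-7: inflate RD by the new volatility, then update RD and rating
    phi_star = math.sqrt(phi ** 2 + new_vol ** 2)
    new_phi = 1 / math.sqrt(1 / phi_star ** 2 + 1 / v)
    new_mu = mu + new_phi ** 2 * delta_sum

    # Step 8: convert back to the display scale
    return new_mu * SCALE + 1500, new_phi * SCALE, new_vol

# Glickman's worked example: a 1500/200/0.06 player beats a 1400/30 opponent
# and loses to 1550/100 and 1700/300 opponents
new_rating, new_rd, new_vol = glicko2_period(
    1500, 200, 0.06, [(1400, 30, 1), (1550, 100, 0), (1700, 300, 0)]
)
print(f"{new_rating:.2f} {new_rd:.2f} {new_vol:.5f}")
```

The key point for the notebook is the argument convention. Opponent ratings and RDs are fixed at their start-of-period values, so the order in which players are updated within a period does not matter.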

In [None]:
def calculate_glicko2_ratings(comparisons):
    """
    Calculate Glicko-2 ratings for all items based on comparison results.
    
    Returns a dictionary mapping item IDs to their Player objects containing rating, RD, and volatility.
    """
    # Initialize players (items) with default Glicko-2 values
    players = {}
    item_ids = set()
    
    # Collect all unique item IDs
    for comp in comparisons:
        item_ids.add(comp['winner'])
        item_ids.add(comp['loser'])
    
    # Initialize all players
    for item_id in item_ids:
        players[item_id] = Player()
    
    print(f"Initialized {len(players)} items with default Glicko-2 ratings")
    
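    # Process comparisons to update ratings.
    # Group comparisons into rating periods (batches). Within a period, every
    # player is updated against its opponents' ratings as they stood at the
    # start of the period, as Glicko-2 prescribes.
    batch_size = 50  # Number of comparisons per rating period
    num_batches = (len(comparisons) + batch_size - 1) // batch_size
    
    print(f"Processing {len(comparisons)} comparisons in {num_batches} rating periods...")
    
    for batch_idx in tqdm(range(num_batches)):
        batch_comparisons = comparisons[batch_idx * batch_size:(batch_idx + 1) * batch_size]
        
        # Snapshot (rating, RD) of every participant before any update this period
        snapshot = {}
        for comp in batch_comparisons:
            for pid in (comp['winner'], comp['loser']):
                if pid not in snapshot:
                    snapshot[pid] = (players[pid].rating, players[pid].rd)
        
        # Per player: parallel lists of opponent ratings, opponent RDs and outcomes
        player_matches = defaultdict(lambda: ([], [], []))
        for comp in batch_comparisons:
            winner_id, loser_id = comp['winner'], comp['loser']
            # Winner beats loser (winner scores 1, loser scores 0)
            for pid, opp_id, outcome in ((winner_id, loser_id, 1), (loser_id, winner_id, 0)):
                opp_rating, opp_rd = snapshot[opp_id]
                ratings, rds, outcomes = player_matches[pid]
                ratings.append(opp_rating)
                rds.append(opp_rd)
                outcomes.append(outcome)
        
        # glicko2.Player.update_player expects three parallel lists
        for player_id, (ratings, rds, outcomes) in player_matches.items():
            players[player_id].update_player(ratings, rds, outcomes)
        # Players absent from this period are left unchanged; strict Glicko-2
        # would also increase their RD for the skipped period.
    
    return players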

# Calculate Glicko-2 ratings
player_ratings = calculate_glicko2_ratings(comparisons)

## 6. Creating the Reward Function

Now we'll implement the reward function, Rating - (scaling_factor * Volatility). A `scaling_factor` of 1.0 reproduces the formula at the top of the notebook.

In [None]:
def calculate_rewards(players, scaling_factor=1.0):
    """
    Calculate rewards for each item based on the formula: Rating - (scaling_factor * Volatility)
    
    Args:
        players: Dictionary mapping item IDs to Player objects
        scaling_factor: Factor to scale volatility by in the reward calculation
        
    Returns:
        Dictionary mapping item IDs to their reward values
    """
    rewards = {}
    
    for item_id, player in players.items():
        # Extract Glicko-2 parameters
        rating = player.rating
        volatility = player.vol
        
        # Calculate reward
        reward = rating - (scaling_factor * volatility)
        rewards[item_id] = reward
    
    return rewards

# Calculate rewards using our function
item_rewards = calculate_rewards(player_ratings)

# Display top and bottom 10 items by reward
sorted_items = sorted(item_rewards.items(), key=lambda x: x[1], reverse=True)

print("Top 10 items by reward:")
for item_id, reward in sorted_items[:10]:
    player = player_ratings[item_id]
    print(f"Item: {item_id}, Reward: {reward:.2f}, Rating: {player.rating:.2f}, RD: {player.rd:.2f}, Volatility: {player.vol:.6f}")

print("\nBottom 10 items by reward:")
for item_id, reward in sorted_items[-10:]:
    player = player_ratings[item_id]
    print(f"Item: {item_id}, Reward: {reward:.2f}, Rating: {player.rating:.2f}, RD: {player.rd:.2f}, Volatility: {player.vol:.6f}")

## 7. Analyzing Reward Distribution

Let's visualize the distribution of rewards and examine the relationship between ratings and volatility.

In [None]:
# Create dataframe with Glicko-2 parameters and rewards
data = []
for item_id, player in player_ratings.items():
    data.append({
        'item_id': item_id,
        'rating': player.rating,
        'rating_deviation': player.rd,
        'volatility': player.vol,
        'reward': item_rewards[item_id]
    })

df = pd.DataFrame(data)

# Plot the distribution of rewards
plt.figure(figsize=(10, 6))
plt.hist(df['reward'], bins=30, edgecolor='black')
plt.title('Distribution of Rewards (Rating - Volatility)')
plt.xlabel('Reward')
plt.ylabel('Frequency')
plt.grid(alpha=0.3)
plt.show()

# Scatterplot of Rating vs Volatility, colored by Reward
plt.figure(figsize=(12, 8))
scatter = plt.scatter(df['rating'], df['volatility'], c=df['reward'], 
                     cmap='viridis', alpha=0.7, s=50)
plt.colorbar(scatter, label='Reward')
plt.title('Rating vs Volatility (colored by Reward)')
plt.xlabel('Rating')
plt.ylabel('Volatility')
plt.grid(alpha=0.3)
plt.show()

# Calculate correlation matrix
correlation = df[['rating', 'rating_deviation', 'volatility', 'reward']].corr()
print("Correlation Matrix:")
print(correlation)

## 8. Experimental: Adjusting the Scaling Factor

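We can try several scaling factors to see how strongly the volatility penalty reorders items. Volatility typically stays near its starting value of 0.06, while ratings are spread around 1500. As a result, small factors leave the ranking essentially unchanged. The penalty only matters once scaling_factor × (volatility gap) becomes comparable to the rating gap between items.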
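The rank-correlation loop below ranks items by sort position, which breaks ties arbitrarily. A common alternative gives tied values the average of their ranks and then takes the Pearson correlation of those ranks; this is the standard Spearman treatment. The sketch compares several scaling factors against the rating-only ranking, using five hypothetical items. The names `df_demo` and `spearman` are illustrative and not part of the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical Glicko-2 states for five items; in the notebook these would
# come from player_ratings.
df_demo = pd.DataFrame(
    {
        "rating": [1600.0, 1599.5, 1550.0, 1549.8, 1500.0],
        "volatility": [0.09, 0.06, 0.06, 0.05, 0.06],
    },
    index=["a", "b", "c", "d", "e"],
)

def spearman(x, y):
    """Spearman correlation as the Pearson correlation of ranks.

    rank(method="average") gives tied values the mean of their ranks.
    """
    rx = pd.Series(x).rank(method="average").to_numpy()
    ry = pd.Series(y).rank(method="average").to_numpy()
    return float(np.corrcoef(rx, ry)[0, 1])

baseline = df_demo["rating"]  # scaling_factor = 0: rating alone
for factor in [1.0, 10.0, 100.0]:
    rewards = df_demo["rating"] - factor * df_demo["volatility"]
    order = list(rewards.sort_values(ascending=False).index)
    print(f"factor={factor}: rho={spearman(baseline, rewards):.3f}, order={order}")
```

With these numbers the order stays a, b, c, d, e for factors 1 and 10. Only at 100 do a/b and c/d swap, and rho drops to 0.8.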

In [None]:
# Test different scaling factors
scaling_factors = [0.1, 0.5, 1.0, 2.0, 5.0, 10.0]
reward_variations = {}

for factor in scaling_factors:
    rewards = calculate_rewards(player_ratings, scaling_factor=factor)
    reward_variations[factor] = rewards
    
    # Get top 5 items for this scaling factor
    top_items = sorted(rewards.items(), key=lambda x: x[1], reverse=True)[:5]
    
    print(f"\nTop 5 items with scaling factor {factor}:")
    for item_id, reward in top_items:
        player = player_ratings[item_id]
        print(f"Item: {item_id}, Reward: {reward:.2f}, Rating: {player.rating:.2f}, Volatility: {player.vol:.6f}")

# Calculate rank correlation between different scaling factors
print("\nSpearman Rank Correlation between different scaling factors:")
for i, factor1 in enumerate(scaling_factors[:-1]):
    for factor2 in scaling_factors[i+1:]:
        # Create ranked lists
        items1 = sorted(reward_variations[factor1].keys(), 
                        key=lambda x: reward_variations[factor1][x], 
                        reverse=True)
        items2 = sorted(reward_variations[factor2].keys(), 
                        key=lambda x: reward_variations[factor2][x], 
                        reverse=True)
        
        # Convert to ranks
        ranks1 = {item: rank for rank, item in enumerate(items1)}
        ranks2 = {item: rank for rank, item in enumerate(items2)}
        
        # Calculate correlation
        common_items = set(ranks1.keys()) & set(ranks2.keys())
        if common_items:
            rank_pairs = [(ranks1[item], ranks2[item]) for item in common_items]
            rank_array = np.array(rank_pairs)
            correlation = np.corrcoef(rank_array[:, 0], rank_array[:, 1])[0, 1]
            print(f"Correlation between factor {factor1} and {factor2}: {correlation:.4f}")

## 9. Saving the Results

Finally, let's save the calculated ratings and rewards to a CSV file and define the final reward function as a standalone helper.
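Once saved, the CSV can be reloaded to score items by name without recomputing ratings. The sketch below writes a few hypothetical rows with the same columns the notebook saves. It writes to a temporary file rather than `nectar_glicko2_rewards.csv`, then reloads them and recomputes the reward at any scaling factor. The helper `reward_for` is hypothetical. So is its fallback to the Glicko-2 starting values for unseen items; that fallback is an assumption, not part of the notebook.

```python
import os
import tempfile

import pandas as pd

# Hypothetical rows with the columns the notebook writes to its CSV
rows = pd.DataFrame({
    "item_id": ["model-x", "model-y"],
    "rating": [1620.0, 1480.0],
    "rating_deviation": [40.0, 55.0],
    "volatility": [0.06, 0.07],
    "reward": [1619.94, 1479.93],
})

path = os.path.join(tempfile.mkdtemp(), "rewards.csv")
rows.to_csv(path, index=False)

# Reload and index by item so ratings can be looked up by name
loaded = pd.read_csv(path).set_index("item_id")

def reward_for(item_id, scaling_factor=1.0, default=(1500.0, 0.06)):
    """Recompute Rating - scaling_factor * Volatility for a stored item.

    Unseen items fall back to the Glicko-2 starting values (assumption).
    """
    if item_id in loaded.index:
        rating, vol = loaded.loc[item_id, ["rating", "volatility"]]
    else:
        rating, vol = default
    return rating - scaling_factor * vol

print(reward_for("model-x"), reward_for("unknown-model"))
```

Storing rating and volatility separately, rather than only the reward, keeps the scaling factor adjustable after the ratings have been computed.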

In [None]:
# Save the results to a CSV file
df.to_csv('nectar_glicko2_rewards.csv', index=False)
print("Results saved to 'nectar_glicko2_rewards.csv'")

# Define our final reward function
def glicko2_reward(rating, volatility, scaling_factor=1.0):
    """
    Calculate the reward as Rating - (scaling_factor * Volatility)
    
    Args:
        rating: Glicko-2 rating value
        volatility: Glicko-2 volatility value
        scaling_factor: Factor to scale volatility by (default=1.0)
        
    Returns:
        The calculated reward value
    """
    return rating - (scaling_factor * volatility)

print("\nFinal Reward Function:")
print("Reward = Rating - (scaling_factor * Volatility)")
print("\nWhere:")
print("- Rating is the Glicko-2 rating value")
print("- Volatility is the Glicko-2 volatility parameter")
print("- scaling_factor controls the penalty for volatility (default=1.0)")