# Proof that the maximum score is 30

In this brief notebook we will show that the maximum score for the competition is 30.

For this we just need to notice how the scoring metric is computed:

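For each essay, three LLM judges give scores q1, q2, q3, each between 0 and max_q = 9 (the bounds used in the code below). From these we compute avg_q = mean(q1, q2, q3) and var_q = var(q1, q2, q3). Here var is taken to be the population variance, which is what np.var computes by default. The final score then uses the mean of var_q and the mean of avg_q over all essays.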

**score = A * B**

Where **A = mean(var_q)/(max_q - mean(avg_q))** 

And **B = (avg_e/avg_s_clipped)**
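Below is a minimal sketch of how A could be computed. It assumes the judge scores are stored in a NumPy array of shape (n_essays, 3) and that var is the population variance. The helper name `score_a` is hypothetical and not part of the competition code. B depends on avg_e and avg_s_clipped, which this notebook does not define, so the sketch covers A only.

```python
import numpy as np

MAX_Q = 9  # highest score a judge can give, as in the notebook


def score_a(scores: np.ndarray, max_q: float = MAX_Q) -> float:
    """Hypothetical helper: A = mean(var_q) / (max_q - mean(avg_q)).

    `scores` has one row per essay and one column per LLM judge (q1, q2, q3).
    """
    avg_q = scores.mean(axis=1)  # per-essay mean of the three judge scores
    var_q = scores.var(axis=1)   # per-essay population variance (ddof=0)
    return float(var_q.mean() / (max_q - avg_q.mean()))


# Usage: two essays, each rated (9, 9, 0) by the three judges
example = np.array([[9.0, 9.0, 0.0],
                    [9.0, 9.0, 0.0]])
print(score_a(example))  # 6.0
```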

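**The max for B is 1/0.2 = 5**, which is the largest possible avg_e (1) divided by the smallest possible avg_s_clipped (0.2).

**What is the max of A?** **We will show that it is 6.**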

## Optimization 

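We want mean(var_q) to be as large as possible while keeping the denominator max_q - mean(avg_q) as small as possible. In other words, mean(avg_q) should be as close to max_q as possible. These two goals pull against each other, because a high average forces all three scores close to max_q, which limits their variance. If every essay receives the same scores (q1, q2, q3), A reduces to the per-essay ratio var_q / (max_q - avg_q). We therefore look for the values of q1, q2, q3 in [0, max_q] that maximize this ratio.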
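We can also run a sanity check that does not depend on where the optimizer starts: evaluate the per-essay ratio var_q / (max_q - avg_q) at every corner of the box [0, 9]^3. This sketches the argument rather than giving a formal proof. The ratio is a convex quadratic divided by a positive affine term, so it is convex wherever avg_q < max_q, and a convex function on a box attains its maximum at a corner. The corner (9, 9, 9) gives 0/0 and is treated as 0, which is the same choice the visualization code makes. The ratio also tends to 0 as scores approach that corner from inside the box. The names `per_essay_ratio` and `corner_values` are illustrative.

```python
import itertools

import numpy as np

MAX_Q = 9


def per_essay_ratio(q):
    """var_q / (max_q - avg_q) for one essay, using population variance like np.var."""
    avg_q = np.mean(q)
    var_q = np.var(q)
    if avg_q >= MAX_Q:
        return 0.0  # only the corner (9, 9, 9), where the ratio is 0/0
    return float(var_q / (MAX_Q - avg_q))


# Evaluate the ratio at all 2**3 corners of the box [0, MAX_Q]^3
corner_values = {
    corner: per_essay_ratio(np.array(corner, dtype=float))
    for corner in itertools.product((0, MAX_Q), repeat=3)
}
for corner, value in sorted(corner_values.items(), key=lambda kv: -kv[1]):
    print(corner, round(value, 4))

best = max(corner_values.values())
print("max over corners:", best)  # the three permutations of (9, 9, 0) give 6.0
```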

```python
import numpy as np
from scipy.optimize import minimize

# Parameters
max_q = 9  # highest score a judge can give

# Per-essay objective var_q / (max_q - avg_q), negated because scipy minimizes
def final_score(params):
    q1, q2, q3 = params
    avg_q = (q1 + q2 + q3) / 3
    var_q = np.var([q1, q2, q3])  # population variance (ddof=0)
    score = var_q / (max_q - avg_q)
    return -score  # Negate for minimization

# Initial guess
initial_guess = [9.0, 5.0, 4.0]

# Each judge score is restricted to [0, max_q]
bounds = [(0, max_q), (0, max_q), (0, max_q)]

# Optimize. L-BFGS-B (scipy's default when only bounds are given) is a local method:
# other starting points can stop at a worse corner, e.g. (9, 0, 0) where the ratio is 3.
result = minimize(final_score, initial_guess, method="L-BFGS-B", bounds=bounds)
optimal_q1, optimal_q2, optimal_q3 = result.x
max_score = -result.fun

print(f"Optimal q1: {optimal_q1:.4f}")
print(f"Optimal q2: {optimal_q2:.4f}")
print(f"Optimal q3: {optimal_q3:.4f}")
print(f"Maximum Final Score: {max_score:.4f}")
```

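So the maximum score is reached when an essay is perfect according to two judges and as bad as possible according to the third. For example, the scores (9, 9, 0) give avg_q = 6 and var_q = (3² + 3² + (-6)²) / 3 = 54 / 3 = 18, so A = 18 / (9 - 6) = 6.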
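The optimization above considers a single essay, but A is a ratio of means over all essays. In principle, giving different essays different score profiles could therefore make a difference. As a rough check we assume only two essays, with every score at a corner of [0, 9]^3, and brute-force all combinations. The convexity argument should carry over to the full collection of scores, so corners are where a maximum would appear. This is a small illustration, not a proof for any number of essays, and the names `score_a`, `N_ESSAYS` and `results` are hypothetical.

```python
import itertools

import numpy as np

MAX_Q = 9
N_ESSAYS = 2  # kept tiny so the enumeration stays small: 8**2 = 64 combinations


def score_a(scores):
    """A = mean(var_q) / (max_q - mean(avg_q)) for a (n_essays, 3) array."""
    avg_q = scores.mean(axis=1)
    var_q = scores.var(axis=1)  # population variance, as with np.var
    return float(var_q.mean() / (MAX_Q - avg_q.mean()))


corners = list(itertools.product((0, MAX_Q), repeat=3))
results = []
for combo in itertools.product(corners, repeat=N_ESSAYS):
    scores = np.array(combo, dtype=float)
    if scores.mean() >= MAX_Q:  # every essay at (9, 9, 9): A is 0/0, skip it
        continue
    results.append((score_a(scores), combo))

best_a, best_combo = max(results)
print(f"best A = {best_a}, reached e.g. at {best_combo}")
```

For example, one essay at (9, 9, 0) and another at (9, 9, 9) gives mean(var_q) = 9 and mean(avg_q) = 7.5, so A is again 9 / 1.5 = 6 and does not exceed it.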

## Score surface visualization

Just for fun, we can visualize the score surface over q1 and q2 with q3 fixed at 0.

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers the 3D projection on older Matplotlib)

# Parameters
max_q = 9
num_points = 50  # Number of points for each dimension

# Vectorized score function: works element-wise on arrays of q1, q2, q3 values
def final_score(q1, q2, q3):
    avg_q = (q1 + q2 + q3) / 3
    var_q = np.var(np.stack([q1, q2, q3]), axis=0)  # population variance at each grid point
    denom = max_q - avg_q
    # Where avg_q >= max_q (only at q1 = q2 = q3 = max_q) the ratio is 0/0; define it as 0
    return np.divide(var_q, denom, out=np.zeros_like(var_q), where=denom > 0)

# Create a grid of q1, q2, and q3 values (q3 varies along the last axis)
q1_vals = np.linspace(0, max_q, num_points)
q2_vals = np.linspace(0, max_q, num_points)
q3_vals = np.linspace(0, max_q, num_points)
q1, q2, q3 = np.meshgrid(q1_vals, q2_vals, q3_vals)

# Compute scores for every combination at once, without Python loops
scores = final_score(q1, q2, q3)

# Visualize the score as a function of q1, q2 (q3 fixed at zero; change the index to experiment)
fixed_q3_idx = 0  # e.g. num_points - 1 for q3 = max_q
fig = plt.figure(figsize=(12, 8))
ax = fig.add_subplot(111, projection='3d')
surf = ax.plot_surface(q1[:, :, fixed_q3_idx], q2[:, :, fixed_q3_idx], scores[:, :, fixed_q3_idx], cmap='viridis', alpha=0.8)

# Customize plot
ax.set_xlabel("q1")
ax.set_ylabel("q2")
ax.set_zlabel("Score")
ax.set_title(f"Score Surface for Fixed q3 = {q3_vals[fixed_q3_idx]:.2f}")
fig.colorbar(surf, shrink=0.5, aspect=10)

plt.show()
```


Therefore, suppose we could optimize our essays to score perfectly with two of the three judges and as badly as possible with the third, while also reaching the maximum of B. We would then maximize the score and achieve A × B = 6 × 5 = 30.