# P02: From Boltzmann to Choice (PhD Level)

## 1. Introduction: The Statistical Mechanics of Decision Making

In undergraduate courses, the **Multinomial Logit (MNL)** model is often presented simply as a formula:
$$ P_n(i) = \frac{e^{\beta V_{ni}}}{\sum_j e^{\beta V_{nj}}} $$

In this PhD-level notebook, we will derive this from first principles (**Random Utility Maximization**), expose its fundamental limitation (**IIA property**), and resolve it using a hierarchical structure (**Nested Logit**). Finally, we will prove that the denominator of the Logit model is exactly the **Partition Function ($Z$)** and relates directly to **Consumer Surplus (Free Energy)**.

### Core Mappings
| Urban Economics (RUM) | Statistical Physics (Canonical Ensemble) |
| :--- | :--- |
| Utility $U_j = V_j + \varepsilon_j$ | Energy $E_j$ (with thermal noise) |
| Deterministic Utility $V_j$ | Negative Energy $-E_j$ |
| Sensitivity $\beta$ | Inverse Temperature $\beta = 1/k_B T$ |
| Gumbel Noise $\varepsilon$ | Heat Bath Fluctuations |
| Choice Probability $P(j)$ | Boltzmann Probability $\frac{1}{Z}e^{-\beta E_j}$ |
| **Logsum** $\frac{1}{\beta}\ln \sum_j e^{\beta V_j}$ | **Negative Free Energy** $-F = k_B T \ln Z$ |

## 2. Derivation: Why Gumbel?

Why does the error term $\varepsilon$ have to be Gumbel distributed for the choice probability to be Logit (Boltzmann)?

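Let $U_j = V_j + \varepsilon_j$; the decision maker chooses $j$ if $U_j > U_k$ for all $k \neq j$.
If the $\varepsilon_j$ are i.i.d. **Gumbel(0, $\mu$)** with location 0 and scale parameter $\mu$:
$$ f(\varepsilon) = \mu e^{-\mu \varepsilon} \exp\left(-e^{-\mu \varepsilon}\right) $$
then the maximum of independent Gumbel variables with a common scale is again Gumbel (max-stability), and the difference of two such variables is Logistic distributed. Integrating over the noise then gives the Logit formula with $\beta = \mu$.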

This stability property is analogous to how **Gaussian** is stable under addition (Central Limit Theorem). **Gumbel** is stable under maximization (Extreme Value Theory).
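
The claim can be checked numerically. The following is a minimal sketch, not part of the original derivation: it draws i.i.d. Gumbel noise, picks the utility-maximizing alternative for each draw, and compares the simulated shares with the closed-form Logit probabilities. The function names `simulate_rum_shares` and `logit_shares`, the example utilities, and the draw count are illustrative assumptions. Note that NumPy's `scale` argument is the reciprocal of the $\mu$ used above, so it is set to $1/\beta$.

```python
import numpy as np

def simulate_rum_shares(V, beta=1.0, n_draws=200_000, seed=0):
    """Monte Carlo choice shares for U_j = V_j + Gumbel noise (NumPy scale = 1/beta)."""
    rng = np.random.default_rng(seed)
    V = np.asarray(V, dtype=float)
    # One independent Gumbel draw per (decision maker, alternative)
    eps = rng.gumbel(loc=0.0, scale=1.0 / beta, size=(n_draws, V.size))
    choices = np.argmax(V + eps, axis=1)
    return np.bincount(choices, minlength=V.size) / n_draws

def logit_shares(V, beta=1.0):
    """Closed-form Logit (Boltzmann) probabilities."""
    z = beta * np.asarray(V, dtype=float)
    w = np.exp(z - z.max())  # shift by the max for numerical stability
    return w / w.sum()

V_demo = [1.0, 0.5, -0.2]  # hypothetical utilities
sim = simulate_rum_shares(V_demo, beta=2.0)
closed = logit_shares(V_demo, beta=2.0)
print("simulated:", np.round(sim, 3))
print("logit    :", np.round(closed, 3))
```

The two rows should agree up to Monte Carlo error, which shrinks as `n_draws` grows.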

In [None]:
import numpy as np
import pandas as pd
import scipy.optimize as opt
import matplotlib.pyplot as plt
import seaborn as sns

# Set random seed for reproducibility
np.random.seed(42)

## 3. The Red Bus / Blue Bus Paradox (IIA Failure)

The standard MNL assumes **Independence of Irrelevant Alternatives (IIA)**: $\frac{P(A)}{P(B)}$ should not depend on whether option C exists.

**Scenario**:
- Originally: Car vs Red Bus. Utilities equal ($V_{car} = V_{red}$). $P(Car)=0.5, P(Red)=0.5$.
- Disruption: Introduce "Blue Bus". $V_{blue} = V_{red}$ (perfect substitute).
- MNL Prediction: $P(Car) = P(Red) = P(Blue) = 1/3$.
- Reality: Bus share stays 0.5, split between Red and Blue. $P(Car)=0.5, P(Red)=0.25, P(Blue)=0.25$.

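MNL fails here because it assumes the noise terms are independent across alternatives, so it treats Blue Bus as a genuinely distinct mode (like a train) even though its unobserved utility is perfectly correlated with that of Red Bus.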
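
The "Reality" shares in the scenario can be reproduced with a small simulation, sketched here under stated assumptions: Car and Bus receive independent standard Gumbel noise, Red and Blue share the same bus noise draw (perfect correlation), and the exact tie between the two buses is broken by a fair coin. The function name `simulate_red_blue` and the tie-breaking rule are illustrative choices, not part of the original scenario.

```python
import numpy as np

def simulate_red_blue(n_draws=200_000, seed=0):
    """Choice shares when Red and Blue Bus share one noise term (assumed setup)."""
    rng = np.random.default_rng(seed)
    V_car = V_bus = 1.0                  # equal deterministic utilities
    eps_car = rng.gumbel(size=n_draws)
    eps_bus = rng.gumbel(size=n_draws)   # one draw shared by both buses
    takes_bus = (V_bus + eps_bus) > (V_car + eps_car)
    # Red and Blue have identical utility, so a fair coin picks the colour (assumption)
    picks_red = rng.random(n_draws) < 0.5
    p_car = np.mean(~takes_bus)
    p_red = np.mean(takes_bus & picks_red)
    p_blue = np.mean(takes_bus & ~picks_red)
    return p_car, p_red, p_blue

p_car, p_red, p_blue = simulate_red_blue()
print(f"P(Car)={p_car:.3f}, P(Red)={p_red:.3f}, P(Blue)={p_blue:.3f}")
```

With correlated bus noise the car share stays close to one half, which is the behaviour that MNL's independence assumption cannot capture.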

In [None]:
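def logit_prob(V_array, beta=1.0):
    """MNL choice probabilities for utilities V_array of shape (N_alts,)."""
    z = beta * np.asarray(V_array, dtype=float)
    # Subtracting the max leaves the probabilities unchanged but avoids overflow in exp
    exp_v = np.exp(z - np.max(z))
    return exp_v / np.sum(exp_v)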

print("Scenario 1: Car vs Red Bus (Equal Utility)")
probs_1 = logit_prob([1.0, 1.0]) # [Car, Red Bus]
print(f"P(Car)={probs_1[0]:.2f}, P(Red)={probs_1[1]:.2f}")

print("\nScenario 2: Car vs Red Bus vs Blue Bus (MNL Prediction)")
probs_2 = logit_prob([1.0, 1.0, 1.0]) # [Car, Red Bus, Blue Bus]
print(f"P(Car)={probs_2[0]:.2f}, P(Red)={probs_2[1]:.2f}, P(Blue)={probs_2[2]:.2f}")
print("-> Note: Car share drops from 50% to 33%. This is the IIA fallacy.")

## 4. The Solution: Nested Logit (Renormalization)

We group Red Bus and Blue Bus into a **Nest** called "Bus".

**Step 1**: Calculate "inclusive value" (Logsum) of the nest.
$$ I_{bus} = \ln (e^{V_{red}} + e^{V_{blue}}) $$

**Step 2**: Calculate upper level choice (Car vs Bus).
$$ P_{bus} = \frac{e^{\mu I_{bus}}}{e^{V_{car}} + e^{\mu I_{bus}}} $$
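where $\mu$ is the nest scale parameter ($0 < \mu \le 1$). If $\mu=1$, the model collapses back to MNL. In this form the within-nest utilities are not divided by $\mu$, so the predicted shares depend on the overall level of the utilities and not only on their differences.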

**Physics Interpretation**: This is analogous to **Coarse Graining (Renormalization)**. We trace out the micro-states (Red/Blue) and replace them with a single macro-state (Bus) with an effective free energy ($F_{eff} = -k_B T \cdot I_{bus}$).
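
A commonly used alternative parameterization divides the within-nest utilities by $\mu$ before taking the Logsum, i.e. $I_{bus} = \ln (e^{V_{red}/\mu} + e^{V_{blue}/\mu})$, which makes the shares depend only on utility differences. The sketch below is an illustration of that variant, not the formulation used in this notebook; the function name `nested_logit_scaled` and the grid of $\mu$ values are assumptions. With equal utilities the car share under this variant is $1/(1 + 2^{\mu})$, so it moves from $1/3$ at $\mu = 1$ toward $1/2$ as $\mu \to 0$.

```python
import numpy as np

def nested_logit_scaled(V_car, V_red, V_blue, mu):
    """Nested logit sketch with within-nest utilities divided by mu (assumed variant)."""
    scaled = np.array([V_red, V_blue], dtype=float) / mu
    m = scaled.max()
    w = np.exp(scaled - m)               # stabilized within-nest weights
    I_bus = m + np.log(w.sum())          # inclusive value (Logsum) of the nest
    upper = np.array([V_car, mu * I_bus])
    e_upper = np.exp(upper - upper.max())
    p_upper = e_upper / e_upper.sum()    # [P_car, P_bus]
    p_within = w / w.sum()               # [P(red | bus), P(blue | bus)]
    return p_upper[0], p_upper[1] * p_within[0], p_upper[1] * p_within[1]

for mu in (1.0, 0.5, 0.1, 0.01):
    p_car, p_red, p_blue = nested_logit_scaled(1.0, 1.0, 1.0, mu)
    print(f"mu={mu:<4}: P(Car)={p_car:.3f}, P(Red)={p_red:.3f}, P(Blue)={p_blue:.3f}")
```

Adding the same constant to all three utilities leaves these probabilities unchanged, which is a quick sanity check for any nested logit implementation.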

In [None]:
def nested_logit_prob(V_car, V_red, V_blue, mu=0.5):
    # 1. Lower nest (Red, Blue)
    # Inclusive Value (Logsum) of the nest
    IV_bus = np.log(np.exp(V_red) + np.exp(V_blue))
    
    # 2. Upper level (Car, Bus_Nest)
    # Utility of nest = mu * IV
    V_upper = np.array([V_car, mu * IV_bus])
    exp_V_upper = np.exp(V_upper)
    probs_upper = exp_V_upper / np.sum(exp_V_upper) # [P_car, P_bus]
    
    # 3. Conditional probs inside nest
    P_red_given_bus = np.exp(V_red) / (np.exp(V_red) + np.exp(V_blue))
    P_blue_given_bus = np.exp(V_blue) / (np.exp(V_red) + np.exp(V_blue))
    
    # 4. Final probabilities
    P_car = probs_upper[0]
    P_red = probs_upper[1] * P_red_given_bus
    P_blue = probs_upper[1] * P_blue_given_bus
    
    return P_car, P_red, P_blue

print("\nScenario 3: Nested Logit (mu=0.5)")
P_car, P_red, P_blue = nested_logit_prob(1.0, 1.0, 1.0, mu=0.5)
print(f"P(Car)={P_car:.2f}")
print(f"P(Red)={P_red:.2f}")
print(f"P(Blue)={P_blue:.2f}")
print("-> Car share stays near 0.5 (about 0.54 here) instead of falling to 1/3, while the bus share is split.")

## 5. Consumer Surplus as Free Energy

In economics, the **Consumer Surplus (CS)** is the expected maximum utility a person gets from the choice set.
$$ CS = E[\max_j (V_j + \varepsilon_j)] $$

Williams (1977) showed that for Logit (Gumbel noise) this has a closed form, where $C$ is a constant that does not depend on the $V_j$:
$$ CS = \frac{1}{\beta} \ln \left( \sum_j e^{\beta V_j} \right) + C $$
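
The closed form can be checked by simulation. The sketch below assumes zero-location Gumbel noise with NumPy scale $1/\beta$, in which case the constant is $C = \gamma / \beta$ with $\gamma$ the Euler-Mascheroni constant. The function names `expected_max_utility_mc` and `logsum`, the example utilities, and the draw count are illustrative assumptions.

```python
import numpy as np

def expected_max_utility_mc(V, beta=1.0, n_draws=400_000, seed=0):
    """Monte Carlo estimate of E[max_j (V_j + eps_j)] under Gumbel noise."""
    rng = np.random.default_rng(seed)
    V = np.asarray(V, dtype=float)
    eps = rng.gumbel(loc=0.0, scale=1.0 / beta, size=(n_draws, V.size))
    return np.mean(np.max(V + eps, axis=1))

def logsum(V, beta=1.0):
    """(1/beta) * ln sum_j exp(beta * V_j), computed stably."""
    z = beta * np.asarray(V, dtype=float)
    m = z.max()
    return (m + np.log(np.sum(np.exp(z - m)))) / beta

V_demo = [1.0, 0.5, -0.2]  # hypothetical utilities
beta = 2.0
mc = expected_max_utility_mc(V_demo, beta)
closed = logsum(V_demo, beta) + np.euler_gamma / beta  # C = gamma / beta (assumed noise location 0)
print(f"Monte Carlo E[max U]: {mc:.4f}")
print(f"Logsum + gamma/beta : {closed:.4f}")
```

Because $C$ is the same for every choice set, it cancels whenever two surpluses are compared, which is why it is usually dropped.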

Compare this to Helmholtz Free Energy:
$$ F = -k_B T \ln Z = -\frac{1}{\beta} \ln \left( \sum_j e^{-\beta E_j} \right) $$

Notice the sign flip ($V_j \leftrightarrow -E_j$).

- **Physics Meaning**: We minimize Free Energy $F = U - TS$, where $U$ here is the internal (mean) energy, not utility.
- **Economics Meaning**: We maximize Surplus $CS = \text{Mean Utility} + \text{Entropy Benefit}$.
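
The economic decomposition can be made explicit. Since $\ln P_j = \beta V_j - \ln Z$ for Logit probabilities, the Shannon entropy $H = -\sum_j P_j \ln P_j$ satisfies $\frac{1}{\beta} \ln Z = \sum_j P_j V_j + \frac{1}{\beta} H$. The following sketch verifies this identity numerically; the function name `surplus_decomposition` and the example utilities are illustrative assumptions, and the constant $C$ is omitted.

```python
import numpy as np

def surplus_decomposition(V, beta=1.0):
    """Split the Logsum into mean utility plus entropy / beta."""
    V = np.asarray(V, dtype=float)
    z = beta * V
    m = z.max()
    log_Z = m + np.log(np.sum(np.exp(z - m)))  # stable ln Z
    P = np.exp(z - log_Z)                      # Logit probabilities
    mean_utility = np.sum(P * V)
    entropy = -np.sum(P * np.log(P))           # Shannon entropy of the choice distribution
    return log_Z / beta, mean_utility, entropy / beta

total, mean_u, entropy_term = surplus_decomposition([1.0, 0.5, -0.2], beta=2.0)
print(f"Logsum / beta  : {total:.6f}")
print(f"Mean utility   : {mean_u:.6f}")
print(f"Entropy / beta : {entropy_term:.6f}")
print(f"Sum of parts   : {mean_u + entropy_term:.6f}")
```

For $N$ equally attractive options the entropy term equals $\frac{1}{\beta} \ln N$, so the surplus grows with the number of options even though the mean utility does not.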

The term $\ln \sum e^{\beta V_j}$ is not just a mathematical trick; it measures the **Value of Variety**.

In [None]:
# Calculating Value of Variety
def calculate_cs(V_options, beta=1.0):
    exp_sum = np.sum(np.exp(beta * np.array(V_options)))
    return (1/beta) * np.log(exp_sum)

cs_1_option = calculate_cs([1.0])
cs_2_options = calculate_cs([1.0, 1.0])
cs_3_options = calculate_cs([1.0, 1.0, 1.0])

print(f"Surplus (1 option): {cs_1_option:.4f}")
print(f"Surplus (2 options): {cs_2_options:.4f}")
print(f"Surplus (3 options): {cs_3_options:.4f}")
print(f"Incremental benefit of 2nd option: {cs_2_options - cs_1_option:.4f} (should be ln(2) when beta=1)")