<a href="https://colab.research.google.com/github/vijaygwu/posts/blob/main/Explain_Arg_Max_MLE.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

The code generates an interactive heatmap visualization that represents a "likelihood landscape" and highlights the maximum likelihood estimate (MLE) on it. Let's break down its functionality step-by-step:

1. **Data Generation and Preparation:**

   * It creates sample data for the likelihood landscape using `np.linspace` to generate 100 evenly spaced values between -2 and 2 for each of two parameters, `Theta 1` and `Theta 2`.
   * A grid of all possible combinations of these parameter values is created using `np.meshgrid`.
   * An example likelihood function is defined (you would replace this with your actual likelihood function). In this case, it is `exp(-(Theta 1² + Theta 2²))`, a surface with the shape of an unnormalized, isotropic bivariate Gaussian centered at (0, 0).
   * The maximum likelihood estimates for `Theta 1` and `Theta 2` are hard-coded to (0, 0), the peak of this example surface (you would replace these with your actual MLE values).
   * The data is organized into a Pandas DataFrame, flattening the 2D grid into 1D arrays for plotting.

2. **Data Aggregation and Transformation:**

   * The continuous ranges of `Theta 1` and `Theta 2` values are divided into 20 equal-width bins using `pd.cut`.
   * The data is grouped by these bins, and the mean likelihood within each bin is calculated.
   * The bin intervals, which are of type `Interval`, are converted to strings for compatibility with JSON serialization when saving the chart.

3. **Visualization Creation:**

   * An Altair heatmap is created using the aggregated data. The x and y axes represent the bins for `Theta 1` and `Theta 2`, and the color intensity of each rectangle corresponds to the mean likelihood within that bin.
   * A black point is added to the chart to mark the location of the maximum likelihood estimate.
   * The heatmap and the MLE point are combined into a single layered chart.
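   * `.interactive()` is called on the layered chart, and the heatmap cells carry tooltips, so hovering over a cell shows its bin ranges and mean likelihood. Zooming and panning bind to continuous scales, so they have little effect on these binned (ordinal) axes.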

4. **Output and Display:**

   * The chart is saved as a JSON file named "likelihood_landscape_heatmap.json".
   * The chart is displayed within the Colab notebook using the `display()` function from IPython.display.
   * A message is printed to indicate that the visualization has been saved.
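
Step 1 hard-codes the MLE. As a minimal sketch of how it could instead be read off the grid, the block below takes the arg max of the same example surface with `np.argmax`. The names `grid_mle_theta1` and `grid_mle_theta2` are hypothetical and do not appear in the notebook.

```python
import numpy as np

# Same grid and example surface as in the notebook
theta1 = np.linspace(-2, 2, 100)
theta2 = np.linspace(-2, 2, 100)
theta1_grid, theta2_grid = np.meshgrid(theta1, theta2)
likelihood = np.exp(-(theta1_grid**2 + theta2_grid**2))

# np.argmax returns an index into the flattened array;
# np.unravel_index converts it back to a (row, column) pair
row, col = np.unravel_index(np.argmax(likelihood), likelihood.shape)

# Hypothetical names: the grid point with the highest likelihood
grid_mle_theta1 = theta1_grid[row, col]
grid_mle_theta2 = theta2_grid[row, col]
print(grid_mle_theta1, grid_mle_theta2)
```

Because the 100-point grid does not contain 0 exactly, the result is the grid point closest to (0, 0) rather than (0, 0) itself. A finer grid or a numerical optimizer would narrow the gap.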

In essence, this code visualizes how the likelihood function varies across different combinations of parameter values, which helps in understanding how the MLE is found: it is the pair of parameter values that maximizes the likelihood of observing the given data.
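
For the example surface used in this notebook, the arg max can be worked out by hand. The notation is introduced here for illustration: $L$ is the example likelihood, and $\hat{\theta}_1$, $\hat{\theta}_2$ are the values the code stores in `mle_theta1` and `mle_theta2`.

$$
(\hat{\theta}_1, \hat{\theta}_2) = \arg\max_{\theta_1, \theta_2} L(\theta_1, \theta_2),
\qquad
L(\theta_1, \theta_2) = \exp\left(-(\theta_1^2 + \theta_2^2)\right)
$$

$$
\log L(\theta_1, \theta_2) = -(\theta_1^2 + \theta_2^2),
\qquad
\frac{\partial \log L}{\partial \theta_1} = -2\theta_1 = 0,
\qquad
\frac{\partial \log L}{\partial \theta_2} = -2\theta_2 = 0
$$

Both conditions hold only at $(\theta_1, \theta_2) = (0, 0)$. Because the logarithm is increasing, the maximizer of $\log L$ is also the maximizer of $L$, which matches the hard-coded values `mle_theta1 = 0` and `mle_theta2 = 0`.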

In [None]:
import altair as alt
import numpy as np
import pandas as pd
from IPython.display import display  # Import display function

# Sample data for the likelihood landscape
theta1 = np.linspace(-2, 2, 100)
theta2 = np.linspace(-2, 2, 100)
theta1_grid, theta2_grid = np.meshgrid(theta1, theta2)

# Example likelihood function (replace with your actual function)
likelihood = np.exp(-(theta1_grid**2 + theta2_grid**2))

# Maximum likelihood estimate
mle_theta1 = 0
mle_theta2 = 0

# Create the data for the plot
data = pd.DataFrame({
    'Theta 1': theta1_grid.flatten(),
    'Theta 2': theta2_grid.flatten(),
    'Likelihood': likelihood.flatten()
})

# Aggregate data into bins for Theta 1 and Theta 2
data['Theta 1 Bin'] = pd.cut(data['Theta 1'], bins=20)
data['Theta 2 Bin'] = pd.cut(data['Theta 2'], bins=20)

# Calculate mean Likelihood for each bin
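# observed=False keeps every bin combination and states the current default explicitly,
# which avoids the pandas FutureWarning for categorical groupers
aggregated_data = data.groupby(['Theta 1 Bin', 'Theta 2 Bin'], observed=False).agg(Mean_Likelihood=('Likelihood', 'mean')).reset_index()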

# Convert Interval type columns to string for JSON serialization
aggregated_data['Theta 1 Bin'] = aggregated_data['Theta 1 Bin'].astype(str)
aggregated_data['Theta 2 Bin'] = aggregated_data['Theta 2 Bin'].astype(str)

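# Bin labels in ascending numeric order (the groupby output is already sorted by bin).
# Without an explicit order, the string labels would be sorted alphabetically on the axes.
theta1_order = aggregated_data['Theta 1 Bin'].unique().tolist()
theta2_order = aggregated_data['Theta 2 Bin'].unique().tolist()[::-1]  # largest bin at the top

# Create the heatmap
heatmap = alt.Chart(aggregated_data).mark_rect().encode(
    x=alt.X('Theta 1 Bin:O', sort=theta1_order),
    y=alt.Y('Theta 2 Bin:O', sort=theta2_order),
    color='Mean_Likelihood:Q',
    tooltip=['Theta 1 Bin', 'Theta 2 Bin', 'Mean_Likelihood']
).properties(
    title='Likelihood Landscape and Maximum Likelihood Estimate'
)

# Add the MLE point. It is placed in the bins that contain the MLE so that it shares
# the heatmap's ordinal axes (a quantitative x/y encoding would not line up with them).
# The dot therefore marks the bin containing the MLE, not its exact coordinates.
mle_bin1 = str(pd.cut([mle_theta1], bins=data['Theta 1 Bin'].cat.categories)[0])
mle_bin2 = str(pd.cut([mle_theta2], bins=data['Theta 2 Bin'].cat.categories)[0])
mle_point = alt.Chart(pd.DataFrame({
    'Theta 1 Bin': [mle_bin1],
    'Theta 2 Bin': [mle_bin2]
})).mark_point(size=100, color='black').encode(
    x=alt.X('Theta 1 Bin:O', sort=theta1_order),
    y=alt.Y('Theta 2 Bin:O', sort=theta2_order)
)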

# Combine the heatmap and the MLE point
chart = alt.layer(heatmap, mle_point).resolve_scale(
    color='independent'
).interactive()

# Save the chart as a JSON file
chart.save('likelihood_landscape_heatmap.json')

# Display the chart in Colab
display(chart)

print("Visualization saved as likelihood_landscape_heatmap.json")

Visualization saved as likelihood_landscape_heatmap.json


The "Theta bins" in the heatmap visualization refer to the intervals or ranges into which the values of the parameters `Theta 1` and `Theta 2` have been divided.

In the code that generated the heatmap, the lines `data['Theta 1 Bin'] = pd.cut(data['Theta 1'], bins=20)` and `data['Theta 2 Bin'] = pd.cut(data['Theta 2'], bins=20)` are responsible for creating these bins. The `pd.cut` function takes the continuous range of values for each Theta parameter and divides them into 20 equal-width intervals or "bins".

The purpose of binning is to aggregate the data and make it suitable for visualization as a heatmap. Instead of plotting the likelihood for every single combination of `Theta 1` and `Theta 2` values, we calculate the average likelihood within each bin and represent it with a colored rectangle in the heatmap. This allows us to see the overall trend of the likelihood landscape more clearly.
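
The binning and averaging described above can be tried on a grid small enough to inspect by eye. This is a sketch under assumptions: the 4x4 grid, the 2 bins per axis, and the names `small` and `binned` are chosen here for illustration, whereas the notebook uses a 100x100 grid and 20 bins.

```python
import numpy as np
import pandas as pd

# A 4x4 grid over the same range, with the same example surface
theta1_grid, theta2_grid = np.meshgrid(np.linspace(-2, 2, 4), np.linspace(-2, 2, 4))
small = pd.DataFrame({
    'Theta 1': theta1_grid.flatten(),
    'Theta 2': theta2_grid.flatten(),
    'Likelihood': np.exp(-(theta1_grid**2 + theta2_grid**2)).flatten(),
})

# Two equal-width bins per parameter
small['Theta 1 Bin'] = pd.cut(small['Theta 1'], bins=2)
small['Theta 2 Bin'] = pd.cut(small['Theta 2'], bins=2)

# One row per bin combination, holding the mean likelihood of the grid points inside it
binned = (
    small.groupby(['Theta 1 Bin', 'Theta 2 Bin'], observed=False)
    .agg(Mean_Likelihood=('Likelihood', 'mean'))
    .reset_index()
)

# Interval objects are not JSON-serializable, so keep their string form
binned['Theta 1 Bin'] = binned['Theta 1 Bin'].astype(str)
binned['Theta 2 Bin'] = binned['Theta 2 Bin'].astype(str)
print(binned)
```

Sixteen grid points collapse into four rows, one per rectangle in the heatmap. Because the example surface is symmetric about (0, 0), all four means come out equal in this small case.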

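The labels on the axes of the heatmap, such as '(-1.8, -1.6]', represent these bins. The interval notation indicates the range of `Theta` values that fall within that particular bin. For example, '(-1.8, -1.6]' includes all `Theta` values that are greater than -1.8 and less than or equal to -1.6. The lowest bin is labeled '(-2.004, -1.8]' rather than '(-2.0, -1.8]' because `pd.cut` extends the range by 0.1% so that the minimum value, -2, falls inside the first bin.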
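
The half-open interval notation can be checked directly with `pd.Interval`, the type that `pd.cut` uses for its bins. The interval below is one example bin, built by hand for illustration.

```python
import pandas as pd

# closed='right' means the right edge belongs to the bin and the left edge does not
example_bin = pd.Interval(left=-1.8, right=-1.6, closed='right')

print(str(example_bin))      # (-1.8, -1.6]
print(-1.7 in example_bin)   # True: strictly inside
print(-1.6 in example_bin)   # True: right edge is included
print(-1.8 in example_bin)   # False: left edge is excluded
```

A value that sits exactly on a shared edge therefore belongs to the bin on its left, which is why each grid value lands in exactly one bin.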

The heatmap visualizes the "likelihood landscape" for a statistical model with two parameters, Theta 1 and Theta 2. The color intensity of each rectangle in the heatmap represents the average likelihood associated with a specific range of Theta 1 and Theta 2 values. The black dot marks the Maximum Likelihood Estimate (MLE), which is the combination of Theta 1 and Theta 2 values that maximizes the likelihood of observing the given data.

**Key points for understanding the heatmap:**

* **Axes:** The x-axis and y-axis represent the ranges or "bins" for Theta 1 and Theta 2, respectively. Each bin encompasses a specific interval of values for the corresponding parameter.
* **Color Intensity:** The color of each rectangle encodes the average likelihood within that bin, and the legend shows the mapping. With the default scheme applied to this kind of chart (light yellow through dark blue), darker cells mean higher likelihood, that is, combinations of Theta 1 and Theta 2 under which the observed data are more probable.
* **Maximum Likelihood Estimate (MLE):** The black dot pinpoints the MLE, which is the combination of Theta 1 and Theta 2 that yields the highest likelihood. It represents the "best" parameter estimates based on the given data.
* **Overall Trend:** The heatmap provides a visual representation of how the likelihood function varies across different combinations of Theta 1 and Theta 2. It helps to understand the shape of the likelihood landscape and identify regions of high and low likelihood.

**Interpreting the heatmap:**

* **High Likelihood Regions:** Cells at the high end of the color legend (the darkest cells under the default scheme) mark combinations of Theta 1 and Theta 2 under which the observed data are more probable.
* **Low Likelihood Regions:** Cells at the low end of the legend (the lightest cells under the default scheme) mark parameter combinations under which the data are less probable.
* **MLE Significance:** The MLE, marked by the black dot, is the parameter combination that makes the observed data most probable under the model. It is a statement about the likelihood, not about the probability of the parameters themselves.
* **Exploration:** Hovering over a bin shows a tooltip with its parameter ranges and mean likelihood. Zooming and panning apply to continuous axes, so they have little effect on the binned axes used here.

By understanding these key points, you can effectively interpret the heatmap and gain insights into the likelihood landscape and the MLE for the given statistical model.
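
The surface in the notebook is a placeholder rather than a likelihood computed from data. As a rough sketch of what substituting an actual likelihood could look like, the block below assumes a normal model whose mean and standard deviation play the roles of Theta 1 and Theta 2. The simulated sample, the grid ranges, and every variable name are assumptions made for this illustration. It works with the log-likelihood, which has the same arg max and avoids numerical underflow.

```python
import numpy as np

# Simulated data for illustration only (fixed seed for reproducibility)
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.5, scale=1.2, size=100)

# Candidate parameter values: mean (Theta 1) and standard deviation (Theta 2)
mu_values = np.linspace(-2, 2, 101)
sigma_values = np.linspace(0.5, 2.5, 101)
mu_grid, sigma_grid = np.meshgrid(mu_values, sigma_values)

# Normal log-likelihood of the whole sample at every grid point
n = sample.size
sum_sq = ((sample[:, None, None] - mu_grid) ** 2).sum(axis=0)
log_likelihood = (
    -n * np.log(sigma_grid)
    - sum_sq / (2 * sigma_grid**2)
    - 0.5 * n * np.log(2 * np.pi)
)

# Grid arg max of the log-likelihood
row, col = np.unravel_index(np.argmax(log_likelihood), log_likelihood.shape)
grid_mu, grid_sigma = mu_grid[row, col], sigma_grid[row, col]

# Closed-form MLEs of the normal model, for comparison
analytic_mu = sample.mean()
analytic_sigma = sample.std()  # ddof=0 (the default) is the maximum likelihood estimate

print(grid_mu, analytic_mu)
print(grid_sigma, analytic_sigma)
```

The grid estimate should agree with the closed-form values to within about one grid step. Passing `log_likelihood` through the same binning and plotting steps would give a landscape whose peak depends on the data.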