# 🌌 LLM Thought Landscape Visualization
**Understanding Reasoning Patterns in Large Language Models**  

*Visual Diagnostics for Chain-of-Thought Reasoning*  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1QzAw5bW6RO1v-Tb68dowj5562nN3Cv_c?usp=sharing)  



![demo](./imgs/demo.png)

---

## Table of Contents
1. [Background & Motivation](#1)
2. [Data Preparation](#2)
3. [Visualization](#3) 
   

<a id="1"></a>
## 1. Background & Motivation

### Why Thought Landscapes?
Modern LLMs demonstrate impressive reasoning capabilities through techniques like chain-of-thought prompting, but their internal decision-making processes remain opaque. This notebook implements our novel visualization method that:

1. **Maps reasoning paths** as 2D landscapes using semantic distance metrics
2. **Identifies critical patterns**: 
   - Model capability boundaries
   - Answer confidence levels
   - Consistency between reasoning steps
3. **Supports model diagnostics** through:
   - Weak/Strong model differentiation
   - Error cluster detection
   - Uncertainty quantification

**Key Innovation**: The first method to enable direct visual comparison of different reasoning strategies (CoT, Least-to-Most, ToT, and MCTS) on multiple-choice questions.
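
To make the idea of mapping thoughts to a 2D landscape from pairwise distances concrete, here is a minimal sketch that projects a distance matrix to 2D with classical multidimensional scaling (MDS). This is an illustrative stand-in and not necessarily the projection this notebook uses; the function name `classical_mds` and the toy distance matrix are assumptions made for the example.

```python
import numpy as np

def classical_mds(distances: np.ndarray, dim: int = 2) -> np.ndarray:
    """Embed points in `dim` dimensions from a symmetric pairwise distance matrix."""
    n = distances.shape[0]
    # Double-center the squared distances to obtain a Gram (inner-product) matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (distances ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the `dim` largest.
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:dim]
    # Clip tiny negative eigenvalues caused by round-off or non-Euclidean distances.
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale

# Toy data (hypothetical): four "thoughts" placed at the corners of a unit square.
points = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
toy_distances = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

coords_2d = classical_mds(toy_distances)
# Pairwise distances of the embedding, for comparison with the input.
recovered = np.linalg.norm(coords_2d[:, None, :] - coords_2d[None, :, :], axis=-1)
```

For distances that really come from points in a plane, the embedding reproduces the input distances up to rotation and reflection; for semantic distances the 2D layout is only an approximation.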

<a id="2"></a>
## 2. Data Preparation

We provide the same data used to produce the plots in our paper. Download it with the following command:


In [None]:
!git lfs clone git@hf.co:datasets/GazeEzio/Landscape-Data

The expected file tree is:

```
Landscape-of-Thoughts/
│    ...
├──  Landscape-Data/
│    ├── aqua
│    │   ├── distance_matrix
│    │   └── thoughts
│    └──...
│   ...
```
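
Before loading anything, it can help to confirm that the download produced the layout shown above. The following is a minimal sketch using only the standard library; `missing_dataset_dirs` is a hypothetical helper, not part of this repository, and it checks only the two sub-directories listed in the tree.

```python
from pathlib import Path

def missing_dataset_dirs(root, dataset):
    """Return the expected sub-directories of `root/dataset` that are not present."""
    expected = ("distance_matrix", "thoughts")  # taken from the file tree above
    base = Path(root) / dataset
    return [name for name in expected if not (base / name).is_dir()]

missing = missing_dataset_dirs("./Landscape-Data", "aqua")
if missing:
    print(f"Missing under ./Landscape-Data/aqua: {missing}")
else:
    print("Data layout looks complete.")
```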

In [None]:
from step_3_plot_landscape import draw
# The helpers used below (process_single_thought_file, process_data,
# plot_chain_animation) are expected to come from this wildcard import.
from utils.visual_utils import *

ROOT = "./Landscape-Data"                    # path to the downloaded data
MODEL = "Meta-Llama-3.1-70B-Instruct-Turbo"
DATASET = "aqua"
METHOD = "cot"                               # reasoning method of the single problem
TOTAL_SAMPLE = 50
SAMPLE_INDEX = 1                             # which problem to load on its own

print("==> Loading data for single problem...")
(
    coordinates_2d, num_thoughts_each_chain, num_chains, labels_anchors,
    answer_gt_short, anchors_idx_x, num_all_thoughts
) = process_single_thought_file(
    thoughts_file=f"{ROOT}/{DATASET}/thoughts/{MODEL}--{METHOD}--{DATASET}--{SAMPLE_INDEX}.json",
)

print("==> Loading data for all the problems...")
# Reuse the constants above so both loads stay in sync.
list_all_T_2D, A_matrix_2D, list_plot_data, list_num_all_thoughts_w_start_list = process_data(
    model=MODEL,
    dataset=DATASET,
    plot_type='method',
    total_sample=TOTAL_SAMPLE,
    ROOT=ROOT,
)
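
The thoughts files follow the naming pattern `{MODEL}--{METHOD}--{DATASET}--{SAMPLE_INDEX}.json` used in the cell above. As a rough sketch, the helpers below build that path and list which sample indices exist on disk. Both functions are hypothetical conveniences, not part of this repository, and they assume the sample index is a plain integer.

```python
from pathlib import Path

def thoughts_path(root, model, method, dataset, sample_index):
    """Build the path of one thoughts file from the naming pattern used above."""
    filename = f"{model}--{method}--{dataset}--{sample_index}.json"
    return Path(root) / dataset / "thoughts" / filename

def available_sample_indices(root, model, method, dataset):
    """List sample indices that have a thoughts file on disk, in ascending order."""
    folder = Path(root) / dataset / "thoughts"
    if not folder.is_dir():
        return []
    prefix = f"{model}--{method}--{dataset}--"
    indices = []
    for path in folder.glob(f"{prefix}*.json"):
        # Everything after the prefix in the file stem should be the index.
        suffix = path.stem[len(prefix):]
        if suffix.isdigit():  # assumption: indices are non-negative integers
            indices.append(int(suffix))
    return sorted(indices)

example = thoughts_path(
    "./Landscape-Data", "Meta-Llama-3.1-70B-Instruct-Turbo", "cot", "aqua", 1
)
print(example.name)
```

Checking `available_sample_indices(...)` before choosing `SAMPLE_INDEX` avoids a file-not-found error when a given index was not downloaded.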

<a id="3"></a>
## 3. Visualization

Plot all the samples:

In [None]:
print("==> Drawing...")
for plot_datas, splited_T_2D, num_all_thoughts_w_start_list in zip(
    list_plot_data, list_all_T_2D, list_num_all_thoughts_w_start_list
):
    fig = draw(
        dataset_name=DATASET,
        plot_datas=plot_datas,
        splited_T_2D=splited_T_2D,
        A_matrix_2D=A_matrix_2D,
        num_all_thoughts_w_start_list=num_all_thoughts_w_start_list,
    )
    fig.show()

Plot a single sample:

In [None]:
print("==> Drawing...")
fig = plot_chain_animation(
    num_chains, num_thoughts_each_chain, coordinates_2d,
    anchors_idx_x, labels_anchors, answer_gt_short,
)
fig.show()