# Trip Planner — Evaluation Interface Demo

This notebook demonstrates the minimal evaluation interface you can import from `trip_planner.session`.

It covers:
- **What** interfaces are designed
- **Why** we designed them this way
- **How** evaluators should use them in practice


## 1) What interfaces are designed

We expose a single lightweight class: **`Session`**.

**Constructor**  
```python
from trip_planner.session import Session

# Create a new session (background info is stored as memory index 0)
sess = Session(background_info="User Name: Alice. Prefers quiet places and local food.")

# Load an existing session by ID
sess = Session(session_id="s_1b4c29ef")
```

**Arguments**

1. `background_info`: initial user profile or background description. It is stored as a history item (`mem_index=0`) and inserted into the session-scoped long-term memory (LTM) with `kind="profile"` so that it is retrievable.

2. `session_id`: if provided, the constructor loads the persisted history and memory from disk.

**Methods**

```python
sess.append_message(content: str, owner: Literal["user", "agent"]) -> None
sess.get_history() -> List[Dict]
sess.chat(user_request: str, context_size: int = 6, use_ltm: bool = True, store_to_cache: bool = False, verbose: bool = False) -> Dict
```

1. `append_message`: replays dataset turns (both user and agent) verbatim, without calling the model. Each appended message gets a `mem_index` (0-based, where 0 is the background).

2. `get_history`: returns a list of message records with minimal provenance fields: `{mem_index, owner, content, gen_by_engine, context_size, use_ltm, memory_injected, use_tools}`.

3. `chat`: runs one agent turn. It controls context trimming (`context_size`), toggles LTM retrieval (`use_ltm`), and optionally writes a compact Q/A summary to LTM (`store_to_cache`). It returns the agent’s response record; the demo in section 3 reads its `content` field.
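
The record fields listed above can be written down as a typed dictionary, which makes it easy to check that a record carries every provenance field before grading. This is only a sketch: `HistoryRecord`, `REQUIRED_FIELDS`, and `missing_fields` are hypothetical names, and the field types are inferred from the sample outputs in this notebook (in particular, `use_tools` is assumed to hold tool names as strings).

```python
from typing import List, Literal, TypedDict


class HistoryRecord(TypedDict):
    """Hypothetical type for one entry of sess.get_history()."""

    mem_index: int
    owner: Literal["user", "agent"]
    content: str
    gen_by_engine: bool          # True only for model-generated messages
    context_size: int            # sample outputs show 0 for replayed messages
    use_ltm: bool
    memory_injected: List[int]   # mem_index values of retrieved memories
    use_tools: List[str]         # assumption: tool names as strings


REQUIRED_FIELDS = set(HistoryRecord.__annotations__)


def missing_fields(record: dict) -> set:
    """Return the provenance fields that are absent from a record."""
    return REQUIRED_FIELDS - set(record)
```

An evaluator could call `missing_fields(rec)` on each record and treat a non-empty result as a malformed entry.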

## 2) Why we designed it this way

**Simplicity & Reproducibility**
- Evaluation is **session-level** (no cross-session leakage) to compare `use_ltm=True` vs `False` fairly.
- One class (`Session`) covers create/load/append/chat to minimize moving parts.

**Deterministic context policy**
- `context_size` is enforced via a deterministic trimming policy (implemented in `context.trim_context`).
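
The actual policy in `context.trim_context` is not shown in this notebook. As an illustration of what a deterministic policy can look like, the sketch below keeps the most recent `context_size` messages in their original order. The function name `trim_context_sketch` and the "keep the last N" rule are assumptions, not the project's implementation.

```python
from typing import Dict, List


def trim_context_sketch(history: List[Dict], context_size: int) -> List[Dict]:
    """Keep the most recent `context_size` messages, oldest first.

    Assumption: the real trim_context may use a different rule
    (for example, always keeping the background message).
    """
    if context_size <= 0:
        return []
    return history[-context_size:]
```

Because the result depends only on the history and `context_size`, two runs over the same replayed history see the same context, which is what makes the `use_ltm` comparison meaningful.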

**Unified long-term memory**
- Background info is stored as `mem_index=0` and also written to a session-scoped LTM as `kind="profile"` so it can be retrieved when relevant.
- The same LTM is optionally updated with compact Q/A summaries when `store_to_cache=True`.
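
To make the two kinds of LTM entries concrete, here is a minimal sketch of a session-scoped store. The `MemoryItem` field names mirror the items printed at the end of this notebook (`id`, `kind`, `text`, `created_at`, `meta`); the class `SessionMemorySketch`, its method names, and the millisecond-timestamp id scheme are assumptions made for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class MemoryItem:
    id: str
    kind: str                      # "profile" or "turn"
    text: str
    created_at: float
    meta: Dict[str, Any] = field(default_factory=dict)


class SessionMemorySketch:
    """Hypothetical session-scoped store; not the project's actual class."""

    def __init__(self) -> None:
        self.items: List[MemoryItem] = []

    def _add(self, kind: str, text: str, mem_index: int) -> MemoryItem:
        now = time.time()
        item = MemoryItem(
            id=str(int(now * 1000)),   # assumption: id derived from the timestamp
            kind=kind,
            text=text,
            created_at=now,
            meta={"mem_index": mem_index},
        )
        self.items.append(item)
        return item

    def add_profile(self, background_info: str) -> MemoryItem:
        """Store the background description as the profile entry (mem_index 0)."""
        return self._add("profile", background_info, mem_index=0)

    def add_turn(self, question: str, answer: str, mem_index: int) -> MemoryItem:
        """Store a compact Q/A summary of one turn."""
        return self._add("turn", f"Q: {question}\nA: {answer}", mem_index)
```

Keeping profile and turn entries in one store means a single retrieval call can surface either kind, which is the point of the unified design.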

**Traceability**
- Every generated agent message is stored with `gen_by_engine=True` and lists `memory_injected` mem_indices (e.g., includes `0` if background was retrieved), plus `use_tools` information.
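
These provenance fields can be queried with a few lines of plain Python over the list returned by `get_history()`. The helper names below (`generated_turns`, `used_background`) are hypothetical and rely only on the record fields described in this notebook.

```python
from typing import Dict, List


def generated_turns(history: List[Dict]) -> List[Dict]:
    """Agent messages produced by the engine rather than replayed from a dataset."""
    return [m for m in history if m["owner"] == "agent" and m["gen_by_engine"]]


def used_background(record: Dict) -> bool:
    """True if the background profile (mem_index 0) was injected for this turn."""
    return 0 in record["memory_injected"]
```

For example, `[used_background(m) for m in generated_turns(sess.get_history())]` would show, per generated turn, whether the profile was retrieved.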


## 3) How evaluators should use it

In [1]:
```python
import os

os.chdir("..")
```

### A. Create or load a session
- For a fresh run, provide `background_info` once.  
- For resuming, pass `session_id`.

In [2]:
```python
from backend.trip_planner.session import Session

# Fresh session with background (mem_index = 0)
sess = Session(background_info="User Name: Alice. Prefers quiet places and local food.")
sess_id = sess.session_id
sess_id
```

```
's_1b4c29ef'
```

### B. Preload conversation history
If your dataset includes prior turns, you can **replay** them without calling the model:


In [3]:
```python
sess.append_message("We will land in Tokyo on Friday evening.", owner="user")
sess.append_message("Great. Do you need late-night dining options near your hotel?", owner="agent")
sess.get_history()
```

```
[{'mem_index': 0,
  'owner': 'user',
  'content': 'We will land in Tokyo on Friday evening.',
  'gen_by_engine': False,
  'context_size': 0,
  'use_ltm': False,
  'memory_injected': [],
  'use_tools': []},
 {'mem_index': 1,
  'owner': 'agent',
  'content': 'Great. Do you need late-night dining options near your hotel?',
  'gen_by_engine': False,
  'context_size': 0,
  'use_ltm': False,
  'memory_injected': [],
  'use_tools': []}]
```

### C. Run a turn with and without LTM
Use the exact same context size to compare the impact of retrieval.


In [4]:
```python
resp_ltm_on  = sess.chat("Plan a 1-day Kyoto route avoiding crowds.", context_size=6, use_ltm=True,  store_to_cache=False, verbose=True)
print("\nWith LTM:\n", resp_ltm_on["content"][:800], "...\n\n")

resp_ltm_off = sess.chat("Plan a 1-day Kyoto route avoiding crowds.", context_size=6, use_ltm=False, store_to_cache=True, verbose=False)
print("\nWithout LTM:\n", resp_ltm_off["content"][:800], "...\n")
```

```
[Mem] query: 'Plan a 1-day Kyoto route avoiding crowds.'
[Mem] alpha= 0.7 min_sim= 0.55 half_life_days= 14.0
[Mem] ---- top candidates ----
[Mem] rank |  cos   kw    td    fused | pass | preview
[Mem]  1   |  0.30  0.08  1.00  0.232 |  X   | Q: We will land in Tokyo on Friday evening. A: Great. Do you need late-night din...
[Mem]  2   |  0.19  0.00  1.00  0.130 |  X   | User Name: Alice. Prefers quiet places and local food....
[Mem] no items >= min_sim(0.55); fallback to top-4.
[Mem] returned 2 item(s) with fused scores: [0.232, 0.130]

[INFO] search_tool is called. Executing...

With LTM:
 Here's a suggested 1-day itinerary for Kyoto that focuses on quieter spots and local experiences:

### Morning
1. **Arashiyama Bamboo Grove**: Start early to enjoy the serene atmosphere. Arrive before 8 AM to avoid crowds.
2. **Tenryu-ji Temple**: Visit this UNESCO World Heritage site nearby. Explore the gardens and enjoy the tranquility.

### Late Morning
3. **Okochi Sanso Villa**: A short walk fr
```
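
The verbose log reports `cos`, `kw`, `td`, and `fused` columns together with `alpha`, `min_sim`, and `half_life_days`, but not the formula behind them. The sketch below shows one scoring scheme that is consistent with those names: a weighted blend of cosine and keyword scores, scaled by an exponential time decay, with a fallback to the top-k candidates when nothing passes the threshold. The formula, the function names, and the choice to apply `min_sim` to the fused score are all assumptions; the retriever's real logic may differ.

```python
from typing import List, Tuple


def time_decay(age_days: float, half_life_days: float = 14.0) -> float:
    """Exponential decay: 1.0 for a fresh item, 0.5 after one half-life."""
    return 0.5 ** (age_days / half_life_days)


def fused_score(cos: float, kw: float, td: float, alpha: float = 0.7) -> float:
    """Blend cosine and keyword scores, then scale by the time-decay factor."""
    return (alpha * cos + (1.0 - alpha) * kw) * td


def select(
    scored: List[Tuple[float, str]], min_sim: float = 0.55, top_k: int = 4
) -> List[Tuple[float, str]]:
    """Return candidates at or above min_sim; if none pass, fall back to the top-k."""
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    passing = [pair for pair in ranked if pair[0] >= min_sim]
    return (passing or ranked)[:top_k]
```

With the rounded values from the first log row, `fused_score(0.30, 0.08, 1.00)` gives 0.234, close to the logged 0.232. The gap is compatible with the displayed columns being rounded, but that does not confirm the formula.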

### D. Inspect history with provenance
Each message record keeps minimal but sufficient fields for grading and analysis.

In [5]:
```python
sess.get_history()
```

```
[{'mem_index': 0,
  'owner': 'user',
  'content': 'We will land in Tokyo on Friday evening.',
  'gen_by_engine': False,
  'context_size': 0,
  'use_ltm': False,
  'memory_injected': [],
  'use_tools': []},
 {'mem_index': 1,
  'owner': 'agent',
  'content': 'Great. Do you need late-night dining options near your hotel?',
  'gen_by_engine': False,
  'context_size': 0,
  'use_ltm': False,
  'memory_injected': [],
  'use_tools': []},
 {'mem_index': 2,
  'owner': 'user',
  'content': 'Plan a 1-day Kyoto route avoiding crowds.',
  'gen_by_engine': False,
  'context_size': 0,
  'use_ltm': False,
  'memory_injected': [],
  'use_tools': []},
 {'mem_index': 2,
  'owner': 'agent',
  'content': "Here's a suggested 1-day route in Kyoto that focuses on less crowded spots:\n\n### Morning\n1. **Philosopher's Path**: Start your day early with a peaceful walk along this scenic canal lined with cherry trees. It's less crowded in the morning.\n2. **Ginkaku-ji (Silver Pavilion)**: Visit this beautiful Zen 
```

### E. Resume later
You can restore the session at any time by ID (history and LTM are persisted on disk under `./eval_runs/sessions/<session_id>/`).

In [6]:
```python
# Rehydrate a previous run
sess2 = Session(session_id=sess_id, root="./eval_runs")
len(sess2.get_history())
```

```
4
```
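
The on-disk layout mentioned above (`<root>/sessions/<session_id>/`) can be illustrated with a small JSON round trip. This is a sketch under stated assumptions: the file name `history.json` and the helpers `session_dir`, `save_history`, and `load_history` are hypothetical, and the project may persist history and LTM in a different format.

```python
import json
from pathlib import Path
from typing import Dict, List


def session_dir(root: str, session_id: str) -> Path:
    """Directory that holds one session's persisted state."""
    return Path(root) / "sessions" / session_id


def save_history(root: str, session_id: str, history: List[Dict]) -> Path:
    """Write the history records as JSON and return the file path."""
    folder = session_dir(root, session_id)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / "history.json"  # hypothetical file name
    path.write_text(json.dumps(history, ensure_ascii=False, indent=2), encoding="utf-8")
    return path


def load_history(root: str, session_id: str) -> List[Dict]:
    """Read back the records written by save_history."""
    path = session_dir(root, session_id) / "history.json"
    return json.loads(path.read_text(encoding="utf-8"))
```

Keying the directory by `session_id` keeps each run's history and memory isolated, which matches the session-level evaluation described in section 2.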

In [7]:
```python
sess2.mem._items
```

[MemoryItem(id='1760465352349', kind='profile', text='User Name: Alice. Prefers quiet places and local food.', created_at=1760465352.3494582, meta={'mem_index': 0}),
 MemoryItem(id='1760465352954', kind='turn', text='Q: We will land in Tokyo on Friday evening.\nA: Great. Do you need late-night dining options near your hotel?', created_at=1760465352.9546313, meta={'mem_index': 1}),
 MemoryItem(id='1760465378895', kind='turn', text="Q: Plan a 1-day Kyoto route avoiding crowds.\nA: Here's a suggested 1-day route in Kyoto that focuses on less crowded spots:\n\n### Morning\n1. **Philosopher's Path**: Start your day early with a peaceful walk along this scenic canal lined with cherry trees. It's less crowded in the morning.\n2. **Ginkaku-ji (Silver Pavilion)**: Visit this beautiful Zen temple, which is often quieter than its more famous counterparts.\n\n### Lunch\n3. **Nanzen-ji Temple**: Head to this large temple complex. Enjoy a quiet lunch at a nearby café or bring a bento box to enjoy in