# AI-Generation Pipeline

This Jupyter notebook documents the complete AI plan-generation workflow.

The core logic is implemented in Python scripts in the `src/ai_advice/v2/` directory.  
Because the source code is large, this notebook serves as the orchestration layer: it runs these scripts in sequence and explains the logic behind each step.  


## Step 1: Standardize Input

**Script:** `src/ai_advice/v2/standardize_input.py`

**Purpose:**  
This step reads the paper sets recommended by the recommendation system (e.g., `recommend_application.json`) and converts them into a standardised format (`standardize_input_YYYY-MM-DD_HHMM.json`).   
This ensures downstream AI models receive clean, consistent data unaffected by the original source format.
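
To illustrate what "standardised" can mean in practice, here is a minimal sketch that maps records with differing key names onto one fixed set of fields. The canonical field names (`paper_id`, `title`, `abstract`), the alias table and the functions `standardize_record`/`standardize_papers` are hypothetical; the actual fields written by `standardize_input.py` are not shown in this notebook.

```python
from typing import Any

# Hypothetical canonical fields; the real script's field names may differ.
CANONICAL_FIELDS = ("paper_id", "title", "abstract")

# Hypothetical aliases that different source files might use for the same field.
ALIASES = {
    "paper_id": ("paper_id", "id", "arxiv_id"),
    "title": ("title", "paper_title"),
    "abstract": ("abstract", "summary"),
}


def standardize_record(raw: dict[str, Any]) -> dict[str, str]:
    """Map one source record onto the canonical keys and normalise whitespace."""
    out: dict[str, str] = {}
    for field in CANONICAL_FIELDS:
        # Take the first alias that is present and not None, else an empty string.
        value = next((raw[k] for k in ALIASES[field] if raw.get(k) is not None), "")
        out[field] = " ".join(str(value).split())  # collapse runs of whitespace
    return out


def standardize_papers(raw_papers: list[dict[str, Any]]) -> list[dict[str, str]]:
    """Standardise every record and drop records that have no identifier."""
    records = [standardize_record(p) for p in raw_papers]
    return [r for r in records if r["paper_id"]]
```

For example, a record such as `{"id": "2101.00001", "paper_title": "  A  Title ", "summary": "Text"}` becomes `{"paper_id": "2101.00001", "title": "A Title", "abstract": "Text"}`, so later steps never need to know which key names the source file used.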

**Arguments:**  
- `--mode`: specifies which paper set to use (`application`, `review`, `trending` or `theory`).  
The recommendation system produces a separate paper set for each of these four modes, and this step processes the set that matches the mode selected by the user.  
We default to `application` here.
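
A minimal sketch of how the `--mode` flag could select the input file and how the timestamped output name could be built. The helper names `parse_args`/`io_paths`, the separate input and output directories, and making `application` the argparse default are assumptions; only the four mode names and the file-name patterns come from this notebook.

```python
import argparse
from datetime import datetime
from pathlib import Path
from typing import Optional

# The four recommendation modes named in this notebook.
MODES = ("application", "review", "trending", "theory")


def parse_args(argv: Optional[list[str]] = None) -> argparse.Namespace:
    """Parse --mode; using "application" as the default is an assumption of this sketch."""
    parser = argparse.ArgumentParser(description="Standardise a recommended paper set.")
    parser.add_argument("--mode", choices=MODES, default="application",
                        help="which recommended paper set to process")
    return parser.parse_args(argv)


def io_paths(mode: str, in_dir: Path, out_dir: Path, now: datetime) -> tuple[Path, Path]:
    """Map a mode to its input file and build a timestamped output path (hypothetical dirs)."""
    src = in_dir / f"recommend_{mode}.json"
    dst = out_dir / f"standardize_input_{now:%Y-%m-%d_%H%M}.json"
    return src, dst
```

Restricting `--mode` with `choices` makes argparse reject an unknown mode before any file is opened.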

```
!python ../src/ai_advice/v2/standardize_input.py --mode application
```

```
Processed 4 papers from recommend_application.json -> /Users/jasonh/Desktop/02807/PaperTrail/data/ai_advice/standardize_input_2025-12-04_1749.json
```


## Step 2: Define Schema Contract

**Script:** `src/ai_advice/v2/schema_contract.py`

**Purpose:**  
This step defines the JSON Schema that the AI model's output must conform to. It writes a schema file (`plan_schema_YYYY-MM-DD_HHMM.json`) and an example output file.  
Enforcing this schema guarantees that the AI output has exactly the format we expect, so our web pages can recognise and load it.
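
For illustration, a simplified schema and a small validator that checks only `type`, `required`, `properties` and `items`. The plan fields (`steps`, `paper_id`, `order`, `reason`) are hypothetical and are not the real contract in `schema_contract.py`; in practice a full JSON Schema validator such as the `jsonschema` package would normally replace this hand-rolled check.

```python
from typing import Any

# Hypothetical, simplified plan schema (not the real contract).
PLAN_SCHEMA: dict[str, Any] = {
    "type": "object",
    "required": ["steps"],
    "properties": {
        "steps": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["paper_id", "order", "reason"],
                "properties": {
                    "paper_id": {"type": "string"},
                    "order": {"type": "integer"},
                    "reason": {"type": "string"},
                },
            },
        }
    },
}

_TYPES = {"object": dict, "array": list, "string": str, "integer": int}


def validate(instance: Any, schema: dict[str, Any], path: str = "$") -> list[str]:
    """Return error messages (empty list if valid); supports a small schema subset only."""
    expected = _TYPES[schema["type"]]
    # bool is a subclass of int in Python, so reject it explicitly for "integer".
    if not isinstance(instance, expected) or (expected is int and isinstance(instance, bool)):
        return [f"{path}: expected {schema['type']}"]
    errors: list[str] = []
    if expected is dict:
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing '{key}'")
        for key, sub in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], sub, f"{path}.{key}"))
    elif expected is list and "items" in schema:
        for i, item in enumerate(instance):
            errors.extend(validate(item, schema["items"], f"{path}[{i}]"))
    return errors
```

Returning a list of path-qualified messages, rather than raising on the first problem, makes it easy to log every format error in a single model response.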

```
!python ../src/ai_advice/v2/schema_contract.py
```

```
Schema written -> /Users/jasonh/Desktop/02807/PaperTrail/data/ai_advice/plan_schema_2025-12-04_1752.json
Example plan saved -> /Users/jasonh/Desktop/02807/PaperTrail/data/ai_advice/plan_example_2025-12-04_1752.json
```


## Step 3: Generate Prompts

**Script:** `src/ai_advice/v2/prompts.py`

**Purpose:**  
This step constructs the actual prompts sent to the AI model.  
It combines the standardised paper data (from Step 1) with the schema constraints (from Step 2) to build the system prompt and the user prompt.  
It also writes a prompt-design document, a copy of the system prompt and a preview of the user prompt for logging.
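
A minimal sketch of the prompt assembly described above, assuming papers shaped like `{"paper_id": ..., "title": ...}`. The function names and the prompt wording are hypothetical and are not the text that `prompts.py` actually produces.

```python
import json
from typing import Any


def build_system_prompt(schema: dict[str, Any]) -> str:
    """Embed the JSON Schema so the model sees the exact output contract."""
    return (
        "You design personalised paper study plans.\n"
        "Reply with one JSON object that conforms to this JSON Schema:\n"
        + json.dumps(schema, indent=2)
    )


def build_user_prompt(papers: list[dict[str, str]]) -> str:
    """List the standardised papers the plan must cover, one per line."""
    lines = ["Papers to include in the plan:"]
    for i, paper in enumerate(papers, start=1):
        lines.append(f"{i}. [{paper['paper_id']}] {paper['title']}")
    lines.append("Arrange them into a reading order and explain each step.")
    return "\n".join(lines)
```

Keeping the schema in the system prompt and the per-run paper list in the user prompt means the system prompt stays identical across runs, which also makes the dumped `system_prompt_*.txt` easy to diff.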

```
!python ../src/ai_advice/v2/prompts.py
```

```
wrote docs -> /Users/jasonh/Desktop/02807/PaperTrail/data/ai_advice/docs/prompt_design_2025-12-04_1754.md
dumped system prompt -> /Users/jasonh/Desktop/02807/PaperTrail/data/ai_advice/system_prompt_2025-12-04_1754.txt
user prompt preview -> /Users/jasonh/Desktop/02807/PaperTrail/data/ai_advice/user_prompt_preview_2025-12-04_1754.txt
```


## Step 4: Generate AI Plan

**Script:** `src/ai_advice/v2/generate_plan_v3.py`

**Purpose:**  
This is the final execution step. It uses the prompts and schema prepared in the preceding steps to call the OpenAI API and generate a structured, personalised paper study plan for the user.  
Briefly, the workflow is:  
- Data Preparation: read the latest standardised paper data and the format constraints (schema).  
- AI Generation: call the OpenAI API to plan the reading order and generate suggestions.  
- Processing and Validation: check that the AI-generated data has the expected format, then add the paper titles and related-topic recommendations. The related topics come from our earlier clustering step and are not AI-generated.  
- Archiving: save the final plan as a JSON file and log the token usage, estimated cost and duration of the run.  
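
The Data Preparation and Processing and Validation steps can be sketched as follows. `latest_file` relies on the zero-padded `YYYY-MM-DD_HHMM` timestamps in the file names sorting chronologically; `enrich_plan` fills in titles and cluster topics after generation. The function names, the `steps`/`title`/`related_topics` keys and the `topics_by_paper` mapping are assumptions for illustration, not the code in `generate_plan_v3.py`.

```python
from pathlib import Path
from typing import Any


def latest_file(directory: Path, prefix: str) -> Path:
    """Return the newest '<prefix>_YYYY-MM-DD_HHMM.json' file in directory.

    Zero-padded timestamps sort lexicographically in chronological order,
    so the last name after sorting belongs to the most recent run.
    """
    candidates = sorted(directory.glob(f"{prefix}_*.json"))
    if not candidates:
        raise FileNotFoundError(f"no {prefix}_*.json in {directory}")
    return candidates[-1]


def enrich_plan(plan: dict[str, Any], papers: list[dict[str, str]],
                topics_by_paper: dict[str, list[str]]) -> dict[str, Any]:
    """Add titles and cluster-based topics that the model was not asked to generate."""
    titles = {p["paper_id"]: p["title"] for p in papers}
    for step in plan.get("steps", []):
        pid = step["paper_id"]
        step["title"] = titles.get(pid, "")  # title comes from our data, not the model
        step["related_topics"] = topics_by_paper.get(pid, [])  # from clustering output
    return plan
```

Copying titles from the standardised data, rather than trusting titles the model might echo back, avoids misspelled or invented titles in the final plan.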

**Note:** This step requires the `OPENAI_API_KEY`.  
Before running this script, create a `.env` file in the project root directory and add your OpenAI API key in the following format:

```
OPENAI_API_KEY=sk-XXXXXXXX
OPENAI_MODEL=gpt-5
```
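
How `generate_plan_v3.py` loads this file is not shown here; a common choice is `load_dotenv()` from the `python-dotenv` package. The stdlib-only sketch below shows the same idea: parse `KEY=VALUE` lines, skip comments and blanks, strip stray whitespace such as trailing spaces pasted after the key, and fail early if the key is missing. The function names are hypothetical.

```python
import os
from pathlib import Path


def load_env_file(path: Path) -> dict[str, str]:
    """Parse KEY=VALUE lines and export them without overwriting existing variables."""
    values: dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments and malformed lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    for key, value in values.items():
        os.environ.setdefault(key, value)  # real environment variables take precedence
    return values


def require_api_key() -> str:
    """Return the API key or stop with a clear message before any API call is made."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to the .env file in the project root")
    return key
```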

Alternatively, use our website [**PaperTrail**↗](https://xinhuangcs.github.io/PaperTrail/) to experience the full user journey; we have integrated our own API access into the backend, so it is free for you to use.

```
!python ../src/ai_advice/v2/generate_plan_v3.py
```

```
== Run Summary ==
trace_id: 20251204-180604-f22b6b58
model: gpt-4o  temp: 0.2
latency: 15.21s  tokens(in/out): 1301/613  cost~: 0.0
artifact: /Users/jasonh/Desktop/02807/PaperTrail/data/ai_advice/artifacts/plan_20251204-180604-f22b6b58.json
latest:   /Users/jasonh/Desktop/02807/PaperTrail/data/ai_advice/plan_latest.json
```
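
The run summary reports a trace ID, latency, token counts, an approximate cost and two output paths. Below is a minimal sketch of how such values could be produced. The helper names, the `uuid4`-based 8-character suffix and the caller-supplied per-million-token prices are assumptions (no real prices are quoted), and the actual logic in `generate_plan_v3.py` may differ.

```python
import json
import time
import uuid
from datetime import datetime
from pathlib import Path
from typing import Any, Callable, TypeVar

T = TypeVar("T")


def new_trace_id(now: datetime) -> str:
    """Timestamp plus 8 random hex characters, in the shape 20251204-180604-f22b6b58."""
    return f"{now:%Y%m%d-%H%M%S}-{uuid.uuid4().hex[:8]}"


def timed(fn: Callable[[], T]) -> tuple[T, float]:
    """Run fn and return its result together with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def estimate_cost(tokens_in: int, tokens_out: int,
                  usd_per_1m_in: float, usd_per_1m_out: float) -> float:
    """Approximate cost from token counts; per-million-token prices come from the caller."""
    return (tokens_in * usd_per_1m_in + tokens_out * usd_per_1m_out) / 1_000_000


def archive_plan(plan: dict[str, Any], out_dir: Path, trace_id: str) -> tuple[Path, Path]:
    """Write a per-run artifact and overwrite plan_latest.json with the same content."""
    text = json.dumps(plan, ensure_ascii=False, indent=2)
    artifact = out_dir / "artifacts" / f"plan_{trace_id}.json"
    artifact.parent.mkdir(parents=True, exist_ok=True)
    artifact.write_text(text, encoding="utf-8")
    latest = out_dir / "plan_latest.json"
    latest.write_text(text, encoding="utf-8")
    return artifact, latest
```

Writing both a uniquely named artifact and a fixed `plan_latest.json` keeps a history of every run while giving the website one stable path to load.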
