# Weeks 9 & 10 Assignment: AI Research Assistant Capstone Project


## Homework Introduction

In this two-week capstone, you will build a **fully integrated AI research assistant** by combining the components you developed in earlier weeks. The goal is a **real, demo-ready assistant** that takes voice queries, searches academic content, summarizes the findings, and saves the results, all in one polished system. Because the project is **interactive and full-stack**, it showcases your skills in ASR, NLP, web APIs, and persistence, which makes it a strong portfolio piece.

* Combine all prior work (ASR, search, summarization, etc.) into one assistant.
* Ensure the system is **interactive and persistent** (supports follow-up questions and context).
* Deliver a polished demo (voice in/out, UI or CLI) suitable for your career portfolio.

## Learning Objectives

By completing this capstone, you will learn to:

* **Integrate multiple AI tools:** Orchestrate ASR (Whisper), LLMs, retrieval, summarization, and TTS into a single assistant.
* **Pipeline chaining:** Build a workflow that chains **voice ASR → intent/tool decision → content retrieval → summarization → voice output**.
* **Function calling / tool use:** Use LLM function-calling to let the model decide when to invoke tools (e.g. `search_arxiv` vs `summarize`).
* **Persistence with Notion:** Automatically sync each session’s conversation and summary into Notion via the Notion API.
* **Session & context management:** Maintain a unique session ID and context history to enable follow-up queries and logging of dialogue.

## Project Design

You should **reuse and connect** components from previous weeks:

* **Whisper ASR (Week 3):** Use the Whisper speech-to-text pipeline to transcribe user voice input. For example, in Python: `model = whisper.load_model("base"); result = model.transcribe("audio.wav")`.
* **Retrieval API (Weeks 4–5):** Use your vector or hybrid search API (e.g. a semantic search over arXiv papers) to fetch relevant passages. Implement a function like `search_arxiv(query)` that returns the top-3 relevant documents or excerpts.
* **Function Calling Logic (Week 6):** Use function calling to orchestrate tools. For instance, the LLM’s output might request a call to `search_arxiv`, `summarize`, or `sync_to_notion`. Wrap each tool call with logging.
* **Summarization (Weeks 7–8):** Apply your summarization pipeline or fine-tuned model to condense the retrieved content, for example with Hugging Face’s `pipeline("summarization")` (see the Transformers documentation).
* **Notion Sync (Week 1):** Use your Notion integration to save content. Implement a function like `sync_to_notion(session_id, content)` that appends the conversation and summary to a Notion database.
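
Summarization models accept only a limited input length, so three concatenated passages may not fit in a single call. One common workaround is map-reduce: summarize each chunk, then summarize the joined partial summaries. The sketch below assumes a hypothetical `summarize_fn(text) -> str` callable (injected so the logic can be tested without a model) and uses word counts as a rough proxy for tokens; the 400-word default is an assumption, not a documented model limit.

```python
from typing import Callable, List


def chunk_words(text: str, max_words: int = 400) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize(
    texts: List[str],
    summarize_fn: Callable[[str], str],
    max_words: int = 400,
) -> str:
    """Map-reduce summary: summarize each chunk, then re-summarize the joined partials."""
    chunks = [chunk for text in texts for chunk in chunk_words(text, max_words)]
    if not chunks:
        return ""
    if len(chunks) == 1:
        return summarize_fn(chunks[0])  # fits in one call: no reduce step needed
    partial = " ".join(summarize_fn(chunk) for chunk in chunks)  # map step
    if len(partial.split()) >= sum(len(c.split()) for c in chunks):
        return partial  # summaries did not shrink the text; stop to avoid endless recursion
    return summarize([partial], summarize_fn, max_words)  # reduce step
```

With Hugging Face, `summarize_fn` could wrap a pipeline, e.g. `hf = pipeline("summarization")` and `lambda t: hf(t, max_length=130, min_length=30, do_sample=False)[0]["summary_text"]`; tune the lengths and the chunk size for the model you actually use.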

**Sample Interaction Flow:** A typical session might work like this:

* **User query (Voice → Text):** The user speaks a question. Whisper transcribes it to text.
* **Intent & Retrieval:** The assistant (via function-calling) decides to search. It invokes `search_arxiv(query)`, which fetches the top-3 relevant academic passages.
* **Summarization:** The passages are passed to `summarize(texts)`, generating a concise answer (e.g. via a Hugging Face pipeline).
* **Voice Response (TTS):** The assistant reads the summary aloud using a TTS engine. For example, with Python’s `pyttsx3`, create an engine with `engine = pyttsx3.init()`, then call `engine.say("Your summary")` followed by `engine.runAndWait()`.
* **Notion Sync:** After responding, the full dialogue and the generated summary are synced to Notion with `sync_to_notion(session_id, content)`.
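
The flow above can be wired as one function per turn. This is a minimal sketch that assumes each stage is injected as a plain callable (`Tools` and `run_turn` are hypothetical names). It fixes the order of the stages rather than letting the LLM choose tools, which is the job of the function-calling layer; injecting the stages lets you test the chain with stubs before plugging in Whisper, your retriever, and `pyttsx3`.

```python
import logging
from dataclasses import dataclass
from typing import Callable, List

logger = logging.getLogger("assistant.pipeline")


@dataclass
class Tools:
    """Pipeline stages injected as callables (swap in Whisper, your retriever, etc.)."""

    transcribe: Callable[[str], str]  # audio file path -> transcribed query
    search: Callable[[str], List[str]]  # query -> relevant passages
    summarize: Callable[[List[str]], str]  # passages -> concise answer
    speak: Callable[[str], None]  # answer -> spoken audio
    sync: Callable[[str, str], None]  # (session_id, content) -> Notion


def run_turn(tools: Tools, session_id: str, audio_path: str) -> str:
    """Run one voice turn in a fixed order and return the summary text."""
    query = tools.transcribe(audio_path)
    logger.info("[%s] query: %s", session_id, query)
    passages = tools.search(query)
    logger.info("[%s] retrieved %d passages", session_id, len(passages))
    summary = tools.summarize(passages)
    logger.info("[%s] summary: %s", session_id, summary)
    tools.speak(summary)
    tools.sync(session_id, f"Q: {query}\nA: {summary}")
    return summary
```

In the real app, `transcribe` might be `lambda path: whisper_model.transcribe(path)["text"]`, and `sync` would be your `sync_to_notion`.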

**Session Management:** Assign a **unique session ID** to each conversation. Log every query and response (with timestamps) under that session ID. This enables multi-turn dialogues (follow-up questions) and ensures the entire history can be synced to Notion. *Optionally*, maintain chat history or context so that follow-ups build on previous answers.
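
A minimal sketch of session management, assuming an in-memory store is acceptable for a single-process demo (`Session`, `SESSIONS`, and `get_session` are illustrative names). Each session gets a UUID4 ID, every turn is logged with a UTC timestamp, and the same log can be rendered either as chat messages for follow-up questions or as a plain transcript for Notion.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class Session:
    """One conversation: a unique ID plus a timestamped log of turns."""

    session_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    turns: List[Dict[str, str]] = field(default_factory=list)

    def log(self, role: str, text: str) -> None:
        # ISO-8601 UTC timestamps are unambiguous across time zones.
        self.turns.append(
            {"role": role, "text": text, "timestamp": datetime.now(timezone.utc).isoformat()}
        )

    def as_messages(self) -> List[Dict[str, str]]:
        # Chat-style history (role/content) to send back to the LLM for follow-ups.
        return [{"role": t["role"], "content": t["text"]} for t in self.turns]

    def transcript(self) -> str:
        # Plain-text transcript, e.g. the `content` for sync_to_notion(session_id, content).
        return "\n".join(f"[{t['timestamp']}] {t['role']}: {t['text']}" for t in self.turns)


# In-memory store: fine for a single-process demo, lost on restart.
SESSIONS: Dict[str, Session] = {}


def get_session(session_id: Optional[str] = None) -> Session:
    """Return the session for a known ID, otherwise start and register a new one."""
    if session_id is not None and session_id in SESSIONS:
        return SESSIONS[session_id]
    session = Session()
    SESSIONS[session.session_id] = session
    return session
```

Because `SESSIONS` lives in process memory, persist it elsewhere (e.g. a small database or Notion itself) if sessions must survive a server restart.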

## Starter Code & API Format

Your code should clearly define the tool functions and API endpoints. For example:

* **Function Signatures:** Create stub functions for your tools, such as:




```python
from typing import List


def search_arxiv(query: str) -> List[str]:
    """Return relevant document passages for the query."""
    ...


def summarize(texts: List[str]) -> str:
    """Return a concise summary of the given passages."""
    ...


def sync_to_notion(session_id: str, content: str) -> None:
    """Append the session content and summary to Notion."""
    ...
```



* **LLM Invocation & Logging:** Wrap LLM calls so that you can inspect their inputs and outputs. If you use OpenAI’s function calling, declare these tools so the model can output a JSON object like `{"name": "search_arxiv", "arguments": ...}`. When the LLM outputs a function call, execute the corresponding function and feed the result back to the LLM. Log every invocation for debugging.
* **API Endpoints (FastAPI):** Design a simple HTTP interface. For example:

  * `POST /ask`: Accepts a voice file or text question. Runs the full assistant pipeline and returns the answer (text and/or audio).
  * `POST /notion-sync`: (Optional) Manually triggers syncing the current conversation to Notion (or this can be done automatically at session end).
  * `GET /status`: Returns the current session ID or a health check status.
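
The sketch below shows one way to declare the three tools in the JSON-schema format used by the `tools` parameter of OpenAI's Chat Completions API, plus a logging dispatcher that turns a model tool call into a Python call. The tool descriptions, the `_tool` helper, and `dispatch` are assumptions for illustration. Errors are returned to the model as JSON instead of being raised, so the model gets a chance to recover.

```python
import json
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger("assistant.tools")


def _tool(name: str, description: str, properties: Dict[str, Any], required: List[str]) -> Dict[str, Any]:
    # One entry in the Chat Completions `tools` list (JSON Schema for the arguments).
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }


TOOLS = [
    _tool("search_arxiv", "Search academic papers and return the most relevant passages.",
          {"query": {"type": "string", "description": "The research question."}}, ["query"]),
    _tool("summarize", "Summarize a list of passages into a concise answer.",
          {"texts": {"type": "array", "items": {"type": "string"}}}, ["texts"]),
    _tool("sync_to_notion", "Save the session transcript and summary to Notion.",
          {"session_id": {"type": "string"}, "content": {"type": "string"}},
          ["session_id", "content"]),
]


def dispatch(name: str, arguments: str, registry: Dict[str, Callable[..., Any]]) -> str:
    """Run one tool call (arguments arrive as a JSON string) and return a JSON result."""
    logger.info("tool call: %s %s", name, arguments)
    if name not in registry:
        result: Any = {"error": f"unknown tool: {name}"}
    else:
        try:
            result = registry[name](**json.loads(arguments or "{}"))
        except Exception as exc:  # report failures to the model instead of crashing the turn
            logger.exception("tool %s failed", name)
            result = {"error": str(exc)}
    content = json.dumps(result)
    logger.info("tool result: %s -> %s", name, content[:200])
    return content
```

With the OpenAI Python SDK (v1 client), after `resp = client.chat.completions.create(model=..., messages=messages, tools=TOOLS)`, append `resp.choices[0].message` to `messages`; then, for each `call` in its `tool_calls`, append `{"role": "tool", "tool_call_id": call.id, "content": dispatch(call.function.name, call.function.arguments, registry)}` and call the model again so it can use the results.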

Make sure each endpoint handler logs its activity. For instance, `/ask` should accept the request, call Whisper, the LLM, the summarizer, and TTS in order, and return the final response.
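
To meet the logging requirement without repeating boilerplate in every handler, you could wrap handlers in a small decorator. This sketch (`log_endpoint` is a hypothetical name) logs entry, duration, and exceptions for both sync and async functions. It relies on `functools.wraps`, which exposes the wrapped function's signature to `inspect.signature`; since FastAPI reads that signature to parse request parameters, placing `@log_endpoint` directly below `@app.post("/ask")` should work, but verify it against your FastAPI version.

```python
import functools
import inspect
import logging
import time
from typing import Any, Callable

logger = logging.getLogger("assistant.api")


def log_endpoint(func: Callable[..., Any]) -> Callable[..., Any]:
    """Log entry, duration, and failures of a request handler (sync or async)."""
    name = func.__name__

    if inspect.iscoroutinefunction(func):
        @functools.wraps(func)  # keeps __name__, __doc__ and the visible signature
        async def async_wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            logger.info("-> %s", name)
            try:
                return await func(*args, **kwargs)
            except Exception:
                logger.exception("!! %s failed", name)
                raise
            finally:
                logger.info("<- %s (%.1f ms)", name, (time.perf_counter() - start) * 1000)

        return async_wrapper

    @functools.wraps(func)
    def sync_wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        logger.info("-> %s", name)
        try:
            return func(*args, **kwargs)
        except Exception:
            logger.exception("!! %s failed", name)
            raise
        finally:
            logger.info("<- %s (%.1f ms)", name, (time.perf_counter() - start) * 1000)

    return sync_wrapper
```

The decorator re-raises exceptions after logging them, so FastAPI's own error handling still decides the HTTP response.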

## Environment Setup

* **Local Development:** Use a Python 3 environment. Install the needed libraries: `fastapi` and `uvicorn` (for running the server), `openai-whisper` (for ASR; it is imported as `whisper`, but do not `pip install whisper`, which is an unrelated package), `transformers` or `llama-cpp-python` (for the LLM and summarization), retrieval libraries such as `sentence-transformers` and `faiss-cpu`, and a Notion client such as `notion-client`. Whisper also requires the `ffmpeg` command-line tool to be installed on your system.
* **Running the App:** You can run the assistant locally using FastAPI. For example, start the server with `uvicorn main:app --reload`. You may build a simple frontend (Web or CLI) to record microphone input and play audio output, but a cURL-based or script-based interface is sufficient.
* **Notion API:** Create a Notion integration and share the target database/page with it. Obtain the **Notion Secret** and **Database ID**. Store them as environment variables (e.g. `NOTION_TOKEN`, `NOTION_DB_ID`). Use these to authenticate API calls. (You can use the official Notion SDK for Python, or send HTTPS requests to the Notion API.)
* **Inference Backend:** Decide on your model setup. You could run a local Llama model via `llama-cpp-python` or use Hugging Face `transformers`. Ensure the model can handle chat and function-calling if needed. Also install any ASR/TTS engines: Whisper (for ASR) and a library like `pyttsx3` or an online TTS (for voice output).
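
A hedged sketch of the Notion side: read the credentials from the `NOTION_TOKEN` and `NOTION_DB_ID` environment variables named above, and build the keyword arguments for the `pages.create` call of the `notion-client` SDK. It assumes the target database's title property is called `Name` (check your database schema), and it splits long text because the Notion API limits each rich-text string to 2000 characters and each request to 100 child blocks. `load_notion_config`, `paragraph_blocks`, and `build_page` are illustrative names.

```python
import os
from dataclasses import dataclass
from typing import Any, Dict, List

NOTION_TEXT_LIMIT = 2000  # max characters per rich-text content string
NOTION_MAX_CHILDREN = 100  # max child blocks per request


@dataclass(frozen=True)
class NotionConfig:
    token: str
    database_id: str


def load_notion_config() -> NotionConfig:
    """Read NOTION_TOKEN and NOTION_DB_ID, failing fast if either is missing."""
    missing = [n for n in ("NOTION_TOKEN", "NOTION_DB_ID") if not os.environ.get(n)]
    if missing:
        raise RuntimeError(f"Missing environment variable(s): {', '.join(missing)}")
    return NotionConfig(os.environ["NOTION_TOKEN"], os.environ["NOTION_DB_ID"])


def paragraph_blocks(text: str) -> List[Dict[str, Any]]:
    """Split text into paragraph blocks, each within the rich-text length limit."""
    pieces = [text[i:i + NOTION_TEXT_LIMIT] for i in range(0, len(text), NOTION_TEXT_LIMIT)]
    return [
        {
            "object": "block",
            "type": "paragraph",
            "paragraph": {"rich_text": [{"type": "text", "text": {"content": piece}}]},
        }
        for piece in pieces
    ]


def build_page(database_id: str, title: str, content: str) -> Dict[str, Any]:
    """Keyword arguments for pages.create; assumes the title property is named "Name"."""
    return {
        "parent": {"database_id": database_id},
        "properties": {"Name": {"title": [{"text": {"content": title[:NOTION_TEXT_LIMIT]}}]}},
        # Only the first 100 blocks fit in this request; append the rest separately.
        "children": paragraph_blocks(content)[:NOTION_MAX_CHILDREN],
    }
```

Then `Client(auth=cfg.token).pages.create(**build_page(cfg.database_id, f"Session {session_id}", transcript))` would create one page per session; transcripts longer than 100 blocks need follow-up `blocks.children.append` calls.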

## Deliverables

* **Code Repository:** A complete, well-organized codebase for the assistant. Structure your code into modules (e.g. `asr.py`, `search.py`, `summarize.py`, `notion.py`, `api.py`). Include comments and docstrings for clarity.
* **Demo Video:** A ~60-second screen recording (you can use any screen-capture tool) demonstrating the assistant in action. The demo should show: speaking a question to the assistant, the assistant retrieving and summarizing a paper, the assistant reading the summary aloud, and the resulting summary appearing in Notion.
* **README:** A brief README file that explains how to set up and run your assistant. Include instructions on installing dependencies, setting environment variables (Notion token/DB), and how to start the server or client.



## Exploration Tips

For extra credit or a more polished project, consider adding:

* **Long Conversation Support:** Maintain chat history so the assistant can handle follow-up questions that refer to earlier parts of the conversation.
* **Notion Enhancements:** Tag or categorize summaries in Notion (e.g. by topic or source). Include previews or first lines of the content in the synced page. Generate a meaningful title for each entry.
* **UI Improvements:** Build a simple frontend (e.g. with Streamlit, Gradio, or a web app) so users can interact without CLI. The UI could record voice input, show the conversation transcript, and embed the Notion page link.
* **Additional Features:** Experiment with multi-source search (combining different databases), improve ASR accuracy (handle noise or different accents), or use a better TTS voice for output. Any enhancements that make the assistant more useful or user-friendly are welcome.
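
For long conversations, the full history eventually exceeds the model's context window. A simple policy is to keep the system prompt plus as many of the most recent turns as fit within a budget. This sketch measures the budget in characters as a rough stand-in for tokens (an assumption; use your model's tokenizer for accuracy), and `trim_history` is a hypothetical helper.

```python
from typing import Dict, List


def trim_history(messages: List[Dict[str, str]], max_chars: int = 8000) -> List[Dict[str, str]]:
    """Keep all system messages plus the newest contiguous turns that fit in max_chars."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    budget = max_chars - sum(len(m["content"]) for m in system)
    kept: List[Dict[str, str]] = []
    for msg in reversed(rest):  # walk backwards from the newest turn
        cost = len(msg["content"])
        if cost > budget:
            break  # stop at the first turn that does not fit, so no gaps appear
        kept.append(msg)
        budget -= cost
    return system + list(reversed(kept))
```

If your history includes function-calling messages, trim whole exchanges so that a `tool` message is never kept without the assistant message that requested it.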

**Hints:** Test each component separately (e.g. Whisper transcription, the search function, the summarizer) before integrating them. Keep the code modular so you can easily swap or update parts. Good luck building your AI research assistant!

**References:** For example usage of Whisper ASR and summarization pipelines, see the Whisper GitHub and Hugging Face docs. For TTS, Python’s `pyttsx3` is one option. (These are just for guidance on usage; focus on stitching the components together as described.)
