## Prompt Clarification for Research

Deep research models work best with detailed, well-structured prompts. When a user submits a vague or underspecified query, we can ask clarifying questions first and fold the answers into a more specific prompt before invoking the research service.

This notebook demonstrates how to use an LLM to interactively refine a research query through clarifying questions, ensuring the final prompt is specific, unambiguous, and aligned with what the user actually wants to learn.


**How it works**
1. **Initial input**: User provides a research query (potentially vague or incomplete)
2. **Clarification loop**: An LLM evaluates the query and asks targeted follow-up questions to gather missing details
3. **Iterative refinement**: The user responds, and the loop continues until the LLM determines it has enough context (or a max iteration limit is reached)
4. **Research**: The refined, detailed prompt is passed to Tavily's research API
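
The four steps above can be sketched as plain Python control flow. This is a minimal, network-free sketch: `refine_query`, `fake_ask`, and the canned reply are hypothetical stand-ins for the LLM call and the user's answers, so the loop logic can be exercised without API keys.

```python
from typing import Callable, List, Tuple

# Hypothetical signature for the LLM step: given the original query and the
# conversation so far, return (needs_clarification, message).
AskFn = Callable[[str, List[dict]], Tuple[bool, str]]


def refine_query(
    initial_query: str,
    ask: AskFn,
    reply: Callable[[str], str],
    max_iterations: int = 3,
) -> Tuple[str, List[dict]]:
    """Run the clarification loop and return (refined_query, conversation)."""
    conversation: List[dict] = []
    for _ in range(max_iterations):
        needs_more, message = ask(initial_query, conversation)
        if not needs_more:
            # Step 4 would hand this refined query to the research API
            return message, conversation
        # Record the follow-up questions and the user's answer
        conversation.append({"role": "assistant", "content": message})
        conversation.append({"role": "user", "content": reply(message)})
    # Iteration budget spent: take whatever the model produces next as final
    _, message = ask(initial_query, conversation)
    return message, conversation


def fake_ask(query: str, conversation: List[dict]) -> Tuple[bool, str]:
    # Stand-in LLM: asks one round of questions, then refines using the answer
    if not conversation:
        return True, "Which time frame and region?"
    answer = conversation[-1]["content"]
    return False, f"{query}, focusing on {answer}"


refined, convo = refine_query("EV battery recycling", fake_ask, lambda q: "Europe, 2020-2024")
print(refined)  # EV battery recycling, focusing on Europe, 2020-2024
print(len(convo))  # 2
```

Passing `ask` and `reply` in as functions keeps the loop itself free of I/O, so the same control flow works with a real model, `input()`, or scripted answers.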


In [None]:
%pip install -q tavily-python pydantic
%pip install -U "langchain[openai]"

In [None]:
import getpass
import os
from tavily import TavilyClient

if not os.environ.get("TAVILY_API_KEY"):
    os.environ["TAVILY_API_KEY"] = getpass.getpass("TAVILY_API_KEY:\n")

if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OPENAI_API_KEY:\n")

TAVILY_API_KEY = os.getenv("TAVILY_API_KEY")
tavily_client = TavilyClient(api_key=TAVILY_API_KEY)

headers = {"Authorization": f"Bearer {TAVILY_API_KEY}"}
url = "https://api.tavily.com/research/"

In [None]:
from langchain.chat_models import init_chat_model
from pydantic import BaseModel, Field
import time
import httpx
from IPython.display import display, Markdown

model = init_chat_model("gpt-5.1-mini", model_provider="openai")

In [None]:
class ClarificationResponse(BaseModel):
    """Structured response for query clarification."""
    needs_clarification: bool = Field(description="True if more info needed, False if ready to research")
    message: str = Field(description="Either follow-up questions OR the refined research query")

PROMPT = """You are a research assistant refining a research query through conversation.

Original topic: {query}

Conversation:
{conversation}

If you need more details, set needs_clarification=True and ask 2-3 questions about:
- Specific subtopics, time frame, depth needed, relevant contexts, or source types

If you have enough context, set needs_clarification=False and provide a detailed refined query.
"""

def clarify(query: str, conversation: list) -> ClarificationResponse:
    conv_text = "\n".join(f"{m['role'].title()}: {m['content']}" for m in conversation) or "(none)"
    return model.with_structured_output(ClarificationResponse).invoke(
        PROMPT.format(query=query, conversation=conv_text)
    )
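
`clarify` flattens the conversation into a plain-text transcript before filling `PROMPT`, with `(none)` as a placeholder on the first turn. Here is the same formatting pulled out into a hypothetical standalone helper, `format_conversation`, so it can be checked without calling the model:

```python
def format_conversation(conversation: list) -> str:
    # One "Role: content" line per message; "(none)" when nothing has been said yet
    return "\n".join(f"{m['role'].title()}: {m['content']}" for m in conversation) or "(none)"


history = [
    {"role": "assistant", "content": "Which region should the research cover?"},
    {"role": "user", "content": "Southeast Asia"},
]
print(format_conversation([]))
print(format_conversation(history))
```

Because an empty string is falsy, the `or "(none)"` fallback only applies on the first call; after that, the model sees every prior question and answer in order.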

## Interactive Query Refinement

> Note: This example uses `input()` for interactive prompts. If your notebook environment doesn't support stdin (for example, some IDEs or hosted runners), replace the `input()` calls with hard-coded strings for `initial_query` and the follow-up replies.

> This cell is primarily meant as a simple example of how to implement an interactive clarification loop; feel free to adapt the flow and UX for your own application.
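
One way to replace `input()` without editing the loop body is to shadow it with a function that replays canned answers. This is a minimal sketch; `make_scripted_input` and the sample answers are hypothetical.

```python
from typing import Callable, Iterable


def make_scripted_input(answers: Iterable[str]) -> Callable[[str], str]:
    """Return an input()-compatible function that replays `answers` in order."""
    it = iter(answers)

    def scripted_input(prompt: str = "") -> str:
        reply = next(it, "")  # Empty reply once the script runs out
        print(f"{prompt}{reply}")  # Echo so the transcript still reads naturally
        return reply

    return scripted_input


# Shadows the built-in input() for the cells that follow
input = make_scripted_input([
    "The impact of remote work on urban housing markets",
    "US metro areas, 2020 to present, peer-reviewed and government sources",
])
```

The first answer becomes `initial_query` and later answers feed the follow-up replies. Run `del input` afterwards to fall back to the built-in.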


In [None]:
max_iterations = 3

# Get initial query from user
initial_query = input("What would you like to research?\n> ")
conversation = []

# Refinement loop
for i in range(max_iterations):
    response = clarify(initial_query, conversation)
    
    if not response.needs_clarification:
        refined_query = response.message
        print(f"\n✅ Refined query:\n{refined_query}")
        break
    
    print(f"\n🤖 Assistant:\n{response.message}")
    conversation.append({"role": "assistant", "content": response.message})
    
    user_input = input("\n> ")
    conversation.append({"role": "user", "content": user_input})
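else:
    # Max iterations reached: tell the model to stop asking questions and
    # write the final query from the details gathered so far
    conversation.append({
        "role": "user",
        "content": "No more questions, please. Write the refined research query now using the details above.",
    })
    response = clarify(initial_query, conversation)
    refined_query = response.message
    print(f"\n✅ Refined query:\n{refined_query}")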

## Execute Research


In [None]:
result = tavily_client.research(input=refined_query, model="mini")
request_id = result["id"]

# Poll until complete
while True:
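    r = httpx.get(f"{url}{request_id}", headers=headers)
    r.raise_for_status()  # Surface HTTP errors (e.g. a bad API key) instead of a KeyError below
    resp = r.json()
    if resp["status"] == "completed":
        break
    if resp["status"] == "failed":
        raise RuntimeError(f"Research failed: {resp.get('error', 'unknown error')}")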
    print(f"Status: {resp['status']}... polling in 10s")
    time.sleep(10)

print("\n✅ Research Complete!\n")
display(Markdown(resp["content"]))
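
The polling loop above waits indefinitely if a request never reaches `completed` or `failed`. Below is a minimal sketch of a bounded alternative. It assumes the `status`/`error` fields used above. `poll_until_done`, its `interval`/`timeout` defaults, and the intermediate status values in the simulated sequence (`pending`, `in_progress`) are hypothetical. Passing the fetch step in as a function keeps the loop independent of `httpx`.

```python
import time
from typing import Callable


def poll_until_done(
    fetch: Callable[[], dict],
    interval: float = 10.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until status is 'completed'; raise on failure or timeout."""
    waited = 0.0  # Approximate: sums sleep intervals, not wall-clock time
    while True:
        resp = fetch()
        status = resp.get("status")
        if status == "completed":
            return resp
        if status == "failed":
            raise RuntimeError(f"Research failed: {resp.get('error', 'unknown error')}")
        if waited >= timeout:
            raise TimeoutError(f"Still '{status}' after ~{waited:.0f}s")
        print(f"Status: {status}... polling in {interval:.0f}s")
        sleep(interval)
        waited += interval


# Simulated status sequence standing in for the HTTP call (values are hypothetical)
statuses = iter([
    {"status": "pending"},
    {"status": "in_progress"},
    {"status": "completed", "content": "# Report"},
])
result = poll_until_done(lambda: next(statuses), sleep=lambda s: None)
print(result["content"])  # # Report
```

In the notebook, the fetch step would wrap the existing request, e.g. `poll_until_done(lambda: httpx.get(f"{url}{request_id}", headers=headers).json())`.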

In [None]:
resp.get("sources", [])
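
The cell above returns the raw `sources` list. To render it as a readable reference list, a small helper can turn each entry into a numbered Markdown item. This sketch rests on an assumption: that each source is a dict with `url` and optionally `title` keys. Those field names are not confirmed here, so the helper falls back to `str(source)` for anything else. `format_sources` is a hypothetical name.

```python
def format_sources(sources: list) -> str:
    """Render sources as a numbered Markdown list (assumes optional 'title'/'url' keys)."""
    lines = []
    for i, src in enumerate(sources, start=1):
        if isinstance(src, dict) and src.get("url"):
            title = src.get("title") or src["url"]  # Fall back to the URL as link text
            lines.append(f"{i}. [{title}]({src['url']})")
        else:
            lines.append(f"{i}. {src}")  # Unknown shape: show it as-is
    return "\n".join(lines) or "_No sources returned._"


sample = [
    {"title": "Example report", "url": "https://example.com/report"},
    {"url": "https://example.org/data"},
    "unstructured source string",
]
print(format_sources(sample))
```

In the notebook, this would be displayed with `display(Markdown(format_sources(resp.get("sources", []))))`.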