# Usecase Delivery Planning Agent

This notebook implements a RAG-based delivery planning agent using DSPy that helps users generate comprehensive project plans for data migration and delivery projects. The agent leverages indexed documents and follows a structured approach to gather requirements and generate actionable project plans.

## Features:
- **Intelligent Question Generation**: Automatically generates relevant questions based on project context
- **Document Retrieval**: Uses vector search to find relevant information from indexed documents
- **Structured Planning**: Generates comprehensive project plans with timelines, resources, and milestones
- **Risk Assessment**: Identifies potential risks and mitigation strategies
- **DSPy Optimization**: Uses DSPy for prompt optimization and performance improvement


## 1. Setup and Configuration


In [0]:
# Install required packages
# Note: These versions are compatible with Databricks Runtime 16.4 LTS
%pip install dspy-ai databricks-vectorsearch pydantic databricks-agents

[43mNote: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.[0m


In [0]:
# Restart Python to ensure packages are loaded
dbutils.library.restartPython()


In [0]:
# Configuration
VECTOR_SEARCH_ENDPOINT_NAME = "use-case-planning-agent"  # Your vector search endpoint
VECTOR_SEARCH_INDEX_NAME = "vbdemos.usecase_agent.migration_plan_pdfs"  # Your existing vector search index
CATALOG_NAME = "vbdemos"
SCHEMA_NAME = "usecase_agent"
DOCUMENTS_TABLE = f"{CATALOG_NAME}.{SCHEMA_NAME}.usecase_planning_agent_pdf_parsed"  # Source table
VECTOR_INDEX_TABLE = f"{CATALOG_NAME}.{SCHEMA_NAME}.migration_plan_pdfs"  # Vector search index table

# Project planning categories and questions
PLANNING_CATEGORIES = {
    "Resource": [
        "How many team members are there?",
        "Are they using PS skills?",
        "Are they using an SI?",
        "Are the teams sufficiently skilled/trained?",
        "Are resources shared with other projects?",
        "Have resources done this work before?",
        "Is there a product owner?",
        "Are the DevOps/SecOps/Infra teams under this 'teams' control/purview?",
        "Program manager?",
        "Are the BAU teams that will ultimately manage the new system sufficiently trained?",
        "Has an end-user adoption plan been created?"
    ],
    "Current Process Maturity": [
        "Do they have a history of delays?",
        "Do they have change management authority/equivalent?",
        "Do they have an identified way of working - agile/waterfall?"
    ],
    "Customer Background": [
        "Does the customer have a specific deadline/reason for that deadline?",
        "Customer is already using cloud?",
        "Customer has Databricks elsewhere?",
        "Customer has security approval for this migration?",
        "Are there any key connectors that are needed?",
        "What are the key drivers of the migration?",
        "Are there any legal compliance or requirements to consider?"
    ],
    "Scope": [
        "Has a pilot or POC been conducted?",
        "Does the customer have visibility of all the data and pipelines that need migration?",
        "Is customer aware of where and who uses the data?",
        "Is lift and shift or redesign preferred?",
        "How many pipelines are to be migrated?",
        "Relative complexity of pipelines?",
        "Volume of data to be migrated?",
        "How frequently is the data updated?",
        "Is there a proposed UC design/infrastructure design?",
        "Is PII handling included?",
        "Does the migration include monitoring?",
        "Does the migration include optimization?",
        "Will it be run in parallel or phased move over?",
        "Awareness of business critical pipelines/pipelines that cannot be down?",
        "Do they have control over how they receive data?",
        "Are additional data quality checks needing to be implemented?",
        "Are there any key connectors that need to be migrated?",
        "What level of testing is required and who will be doing this?",
        "Are data consumers/systems that use the data known?",
        "Does customer already have a plan?",
        "What is the quality of the data?",
        "Are the data pathways known?",
        "Has a permissions model been agreed?",
        "Is there a new data layout/has the UC catalog/schema structure been designed and agreed?",
        "Is disaster recovery included in this migration?"
    ]
}

print("Configuration loaded successfully!")
print(f"Vector Search Endpoint: {VECTOR_SEARCH_ENDPOINT_NAME}")
print(f"Vector Search Index: {VECTOR_SEARCH_INDEX_NAME}")
print(f"Documents Table: {DOCUMENTS_TABLE}")
print(f"Vector Index Table: {VECTOR_INDEX_TABLE}")
print(f"Planning Categories: {list(PLANNING_CATEGORIES.keys())}")
print("\nNote: Vector search index must be in format <catalog>.<schema>.<table>")
print("Vector index structure: path (string), text (string), __db_text_vector (array)")


Configuration loaded successfully!
Vector Search Endpoint: use-case-planning-agent
Vector Search Index: vbdemos.usecase_agent.migration_plan_pdfs
Documents Table: vbdemos.usecase_agent.usecase_planning_agent_pdf_parsed
Vector Index Table: vbdemos.usecase_agent.migration_plan_pdfs
Planning Categories: ['Resource', 'Current Process Maturity', 'Customer Background', 'Scope']

Note: Vector search index must be in format <catalog>.<schema>.<table>
Vector index structure: path (string), text (string), __db_text_vector (array)


## 2. DSPy Setup and Model Configuration


In [0]:
import dspy
import json
from typing import List, Dict, Any
from databricks.vector_search.client import VectorSearchClient
from pyspark.sql import SparkSession

# Initialize Spark
spark = SparkSession.builder.appName("DeliveryPlanningAgent").getOrCreate()

# Configure DSPy with Databricks model
token = dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiToken().get()
url = dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiUrl().get() + '/serving-endpoints'

lm = dspy.LM(
    model="databricks/databricks-claude-sonnet-4",
    api_key=token,
    api_base=url,
)

# Configure DSPy with Databricks model
# Note: MLflow integration is handled automatically in Databricks Runtime 16.4 LTS
# The warnings about MLflowCallback can be safely ignored as they don't affect functionality
dspy.configure(lm=lm)

print(f"DSPy version: {dspy.__version__}")
print("DSPy configured successfully with Databricks model")

# Initialize Vector Search client
vsc = VectorSearchClient(disable_notice=True)

print("DSPy and Vector Search configured successfully!")


DSPy version: 3.0.3
DSPy configured successfully with Databricks model
DSPy and Vector Search configured successfully!


## 3. DSPy Signatures and Modules


In [0]:
# DSPy Signatures for the delivery planning agent

class QuestionGenerator(dspy.Signature):
    """Generate relevant questions for migrating TO Databricks from existing data/analytics platforms."""
    project_context: str = dspy.InputField(desc="Description of the current data/analytics platform and objectives for migrating TO Databricks")
    category: str = dspy.InputField(desc="Planning category (Resource, Scope, Customer Background, etc.)")
    existing_answers: str = dspy.InputField(desc="Previously answered questions and responses")
    questions: str = dspy.OutputField(desc="List of 3-5 most relevant questions for this category specific to migrating TO Databricks")

class DocumentRetriever(dspy.Signature):
    """Retrieve relevant documents based on questions and context."""
    question: str = dspy.InputField(desc="Specific question to find relevant information for")
    project_context: str = dspy.InputField(desc="Project context and background")
    retrieved_docs: str = dspy.OutputField(desc="Relevant document excerpts and information")

class AnswerAnalyzer(dspy.Signature):
    """Analyze answers and extract key insights for project planning."""
    question: str = dspy.InputField(desc="The question that was asked")
    answer: str = dspy.InputField(desc="The answer provided by the user")
    relevant_docs: str = dspy.InputField(desc="Relevant documents retrieved from knowledge base")
    insights: str = dspy.OutputField(desc="Key insights, risks, and implications for project planning")

class ProjectPlanGenerator(dspy.Signature):
    """Generate comprehensive migration plan for moving TO Databricks with structured table outputs."""
    project_context: str = dspy.InputField(desc="Overall project context and objectives for migrating TO Databricks")
    gathered_insights: str = dspy.InputField(desc="All insights gathered from questions and documents about current platform")
    timeline_requirements: str = dspy.InputField(desc="Timeline constraints and requirements for migrating TO Databricks")
    project_plan: str = dspy.OutputField(desc="Comprehensive migration plan for moving TO Databricks. Output as structured tables with clear headers for: 1) Migration Timeline, 2) Resource Requirements, 3) Migration Phases, 4) Risk Assessment. Use markdown table format with | separators.")

class RiskAssessor(dspy.Signature):
    """Assess risks and provide mitigation strategies."""
    project_plan: str = dspy.InputField(desc="The proposed project plan")
    project_context: str = dspy.InputField(desc="Project context and constraints")
    risk_assessment: str = dspy.OutputField(desc="Identified risks, their likelihood, impact, and mitigation strategies")

print("DSPy signatures defined successfully!")


DSPy signatures defined successfully!


In [0]:
# DSPy Modules for the delivery planning agent

class QuestionGenerationModule(dspy.Module):
    """Module to generate relevant questions for each planning category."""
    
    def __init__(self):
        super().__init__()
        self.generate = dspy.ChainOfThought(QuestionGenerator)
    
    def forward(self, project_context: str, category: str, existing_answers: str = ""):
        return self.generate(
            project_context=project_context,
            category=category,
            existing_answers=existing_answers
        )

class DocumentRetrievalModule(dspy.Module):
    """Module to retrieve relevant documents using vector search."""
    
    def __init__(self, vector_search_endpoint: str, vector_search_index: str):
        super().__init__()
        self.vector_search_endpoint = vector_search_endpoint
        self.vector_search_index = vector_search_index
        self.retrieve = dspy.ChainOfThought(DocumentRetriever)
    
    def forward(self, question: str, project_context: str):
        # Perform vector search
        search_results = vsc.get_index(
            endpoint_name=self.vector_search_endpoint,
            index_name=self.vector_search_index
        ).similarity_search(
            query_text=question,
            columns=["path", "text"],  # Updated to match your vector index structure,
            num_results=5
        )
        
        # Format retrieved documents
        # Handle the case where search_results might be a list or dict
        if isinstance(search_results, list):
            documents = search_results
        else:
            documents = search_results.get('result', {}).get('data_array', [])
        
        retrieved_docs = "\n\n".join([
            f"Source: {doc.get('path', 'Unknown') if isinstance(doc, dict) else 'Unknown'}\nContent: {doc.get('text', '') if isinstance(doc, dict) else str(doc)}"
            for doc in documents
        ])
        
        return self.retrieve(
            question=question,
            project_context=project_context,
            retrieved_docs=retrieved_docs
        )

class AnswerAnalysisModule(dspy.Module):
    """Module to analyze answers and extract insights."""
    
    def __init__(self):
        super().__init__()
        self.analyze = dspy.ChainOfThought(AnswerAnalyzer)
    
    def forward(self, question: str, answer: str, relevant_docs: str):
        return self.analyze(
            question=question,
            answer=answer,
            relevant_docs=relevant_docs
        )

class ProjectPlanGenerationModule(dspy.Module):
    """Module to generate comprehensive project plans."""
    
    def __init__(self):
        super().__init__()
        self.generate_plan = dspy.ChainOfThought(ProjectPlanGenerator)
    
    def forward(self, project_context: str, gathered_insights: str, timeline_requirements: str):
        return self.generate_plan(
            project_context=project_context,
            gathered_insights=gathered_insights,
            timeline_requirements=timeline_requirements
        )

class RiskAssessmentModule(dspy.Module):
    """Module to assess risks and provide mitigation strategies."""
    
    def __init__(self):
        super().__init__()
        self.assess_risks = dspy.ChainOfThought(RiskAssessor)
    
    def forward(self, project_plan: str, project_context: str):
        return self.assess_risks(
            project_plan=project_plan,
            project_context=project_context
        )

print("DSPy modules defined successfully!")


DSPy modules defined successfully!


## 4. Main Delivery Planning Agent


In [0]:
class DeliveryPlanningAgent(dspy.Module):
    """Main delivery planning agent that orchestrates the entire planning process."""
    
    def __init__(self, vector_search_endpoint: str, vector_search_index: str):
        super().__init__()
        self.question_generator = QuestionGenerationModule()
        self.document_retriever = DocumentRetrievalModule(vector_search_endpoint, vector_search_index)
        self.answer_analyzer = AnswerAnalysisModule()
        self.plan_generator = ProjectPlanGenerationModule()
        self.risk_assessor = RiskAssessmentModule()
        
        # Store conversation state
        self.conversation_history = []
        self.gathered_insights = []
        self.project_context = ""
    
    def start_planning_session(self, project_context: str, timeline_requirements: str = ""):
        """Start a new planning session."""
        self.project_context = project_context
        self.conversation_history = []
        self.gathered_insights = []
        
        print(" Starting Delivery Planning Session")
        print(f" Project Context: {project_context}")
        if timeline_requirements:
            print(f"⏰ Timeline Requirements: {timeline_requirements}")
        print("\n" + "="*50)
        
        return self._generate_questions_for_category("Resource")
    
    def _generate_questions_for_category(self, category: str):
        """Generate questions for a specific category."""
        existing_answers = self._format_existing_answers()
        
        result = self.question_generator(
            project_context=self.project_context,
            category=category,
            existing_answers=existing_answers
        )
        
        print(f"\n {category} Questions:")
        print("-" * 30)
        questions = result.questions.split('\n')
        for i, question in enumerate(questions, 1):
            if question.strip():
                print(f"{i}. {question.strip()}")
        
        return questions
    
    def answer_question(self, question: str, answer: str, category: str = ""):
        """Process a user's answer to a question."""
        print(f"\n Processing Answer:")
        print(f"Q: {question}")
        print(f"A: {answer}")
        
        # Retrieve relevant documents
        doc_result = self.document_retriever(question, self.project_context)
        relevant_docs = doc_result.retrieved_docs
        
        # Analyze the answer
        analysis_result = self.answer_analyzer(question, answer, relevant_docs)
        insights = analysis_result.insights
        
        # Store in conversation history
        self.conversation_history.append({
            "question": question,
            "answer": answer,
            "category": category,
            "insights": insights,
            "relevant_docs": relevant_docs
        })
        
        self.gathered_insights.append(insights)
        
        print(f"\n Key Insights:")
        print(insights)
        
        return insights
    
    def generate_project_plan(self, timeline_requirements: str = ""):
        """Generate the final project plan based on all gathered information."""
        print("\n Generating Project Plan...")
        print("="*50)
        
        # Combine all insights
        all_insights = "\n\n".join(self.gathered_insights)
        
        # Generate project plan
        plan_result = self.plan_generator(
            project_context=self.project_context,
            gathered_insights=all_insights,
            timeline_requirements=timeline_requirements
        )
        
        project_plan = plan_result.project_plan
        
        print(" PROJECT PLAN:")
        print("="*50)
        print(project_plan)
        
        # Assess risks
        risk_result = self.risk_assessor(project_plan, self.project_context)
        risk_assessment = risk_result.risk_assessment
        
        print("\n  RISK ASSESSMENT:")
        print("="*50)
        print(risk_assessment)
        
        return {
            "project_plan": project_plan,
            "risk_assessment": risk_assessment,
            "conversation_history": self.conversation_history
        }
    
    def _format_existing_answers(self):
        """Format existing answers for context."""
        if not self.conversation_history:
            return "No previous answers yet."
        
        formatted = []
        for entry in self.conversation_history:
            formatted.append(f"Q: {entry['question']}\nA: {entry['answer']}")
        
        return "\n\n".join(formatted)
    
    def get_next_category_questions(self):
        """Get questions for the next planning category."""
        categories = list(PLANNING_CATEGORIES.keys())
        completed_categories = set(entry.get('category', '') for entry in self.conversation_history)
        
        for category in categories:
            if category not in completed_categories:
                return self._generate_questions_for_category(category)
        
        return None  # All categories completed

print("Delivery Planning Agent defined successfully!")


Delivery Planning Agent defined successfully!


## 5. Example Usage and Testing


In [0]:
# Initialize the delivery planning agent
agent = DeliveryPlanningAgent(VECTOR_SEARCH_ENDPOINT_NAME, VECTOR_SEARCH_INDEX_NAME)

# Example: Start a planning session
project_context = """
We are planning a data migration project for a large financial services company. 
They want to migrate their existing Oracle data warehouse to Databricks on Azure. 
The migration involves 50+ data pipelines, 2TB of data, and needs to be completed 
within 6 months. The company has some cloud experience but limited Databricks knowledge.
"""

timeline_requirements = "6 months deadline, must be completed by end of Q2 2024"

# Start the planning session
questions = agent.start_planning_session(project_context, timeline_requirements)

print("\n" + "="*50)
print("Example: Answer some questions to see the agent in action")
print("="*50)


🚀 Starting Delivery Planning Session
📋 Project Context: 
We are planning a data migration project for a large financial services company. 
They want to migrate their existing Oracle data warehouse to Databricks on Azure. 
The migration involves 50+ data pipelines, 2TB of data, and needs to be completed 
within 6 months. The company has some cloud experience but limited Databricks knowledge.

⏰ Timeline Requirements: 6 months deadline, must be completed by end of Q2 2024


📝 Resource Questions:
------------------------------
1. 1. What is the current size and composition of your technical team, and how many team members have experience with Databricks, Azure, and data migration projects?
3. 2. What is the allocated budget for this migration project, including costs for infrastructure, licensing, training, and potential external consulting services?
5. 3. Are you planning to engage external consultants or system integrators for this migration, and if so, what specific expertise are you l

Trace(trace_id=tr-52a09ccfc2d073910e77f272ac136c1a)

In [0]:
# Example: Simulate answering questions
# In a real scenario, these would be user inputs

# Answer Resource questions
resource_answers = [
    ("How many team members are there?", "We have 8 team members including 2 data engineers, 3 analysts, 2 developers, and 1 project manager"),
    ("Are the teams sufficiently skilled/trained?", "The data engineers have some cloud experience but no Databricks experience. The analysts are familiar with SQL but not Spark. We need training."),
    ("Is there a product owner?", "Yes, we have a product owner from the business side who understands the data requirements"),
    ("Are the DevOps/SecOps/Infra teams under this 'teams' control/purview?", "No, they are separate teams with their own priorities and reporting lines")
]

print(" Processing Resource Questions...")
for question, answer in resource_answers:
    agent.answer_question(question, answer, "Resource")

print("\n" + "="*50)
print("Moving to next category...")
print("="*50)

# Get next category questions
next_questions = agent.get_next_category_questions()


📝 Processing Resource Questions...

💬 Processing Answer:
Q: How many team members are there?
A: We have 8 team members including 2 data engineers, 3 analysts, 2 developers, and 1 project manager
[NOTICE] Using a notebook authentication token. Recommended for development only. For improved performance, please use Service Principal based authentication. To disable this message, pass disable_notice=True.

🔍 Key Insights:
**Team Composition Analysis:**
- **Total headcount:** 8 team members provides sufficient capacity for a 6-month, 50+ pipeline migration project
- **Technical depth:** 7 technical resources (87.5%) vs 1 management resource indicates hands-on project approach
- **Data engineering capacity:** 2 data engineers may be adequate but could become a bottleneck for 50+ pipeline migrations - monitor workload distribution
- **Analyst-heavy structure:** 3 analysts (largest group) suggests strong focus on requirements gathering and data analysis, which is critical for Oracle-to-Databri

[Trace(trace_id=tr-e5b229b679bad3de0c59cae6c3847d91), Trace(trace_id=tr-a57d6ee6e8d00836cb4e00dabb9f1a0c), Trace(trace_id=tr-4eb66219ce387d5758bbe30d23307900), Trace(trace_id=tr-72450886f817bfe1a4683f852657fce3), Trace(trace_id=tr-16ee1fc86ec6acd0f8d27e51da444efb), Trace(trace_id=tr-9574af1ec473d0bfb94f3ec9ec6336f9), Trace(trace_id=tr-3cfb9ab92cd28a7f965054b472cbf75a), Trace(trace_id=tr-57d34f06921df3898e0f9206ba98a031), Trace(trace_id=tr-b887991231222dc41531c5d8aa62e4ed)]

In [0]:
# Answer Scope questions
scope_answers = [
    ("How many pipelines are to be migrated?", "We have approximately 50 data pipelines that need to be migrated"),
    ("Volume of data to be migrated?", "The total data volume is about 2TB across multiple tables and schemas"),
    ("Is lift and shift or redesign preferred?", "We prefer a hybrid approach - lift and shift for simple pipelines, redesign for complex ones"),
    ("Does the migration include monitoring?", "Yes, we need comprehensive monitoring and alerting for the new system"),
    ("Will it be run in parallel or phased move over?", "We plan to do a phased migration over 6 months, starting with non-critical pipelines")
]

print(" Processing Scope Questions...")
for question, answer in scope_answers:
    agent.answer_question(question, answer, "Scope")

print("\n" + "="*50)
print("Generating Project Plan...")
print("="*50)

# Generate the final project plan
final_plan = agent.generate_project_plan(timeline_requirements)

print("\n Planning session completed!")
print(" Summary:")
print(f"- Total questions answered: {len(agent.conversation_history)}")
print(f"- Categories covered: {set(entry['category'] for entry in agent.conversation_history)}")
print(f"- Key insights gathered: {len(agent.gathered_insights)}")


📝 Processing Scope Questions...

💬 Processing Answer:
Q: How many pipelines are to be migrated?
A: We have approximately 50 data pipelines that need to be migrated
[NOTICE] Using a notebook authentication token. Recommended for development only. For improved performance, please use Service Principal based authentication. To disable this message, pass disable_notice=True.

🔍 Key Insights:
Key insights for project planning:
- **Scale Impact**: 50 pipelines represent a significant migration workload requiring systematic approach and phased execution
- **Resource Planning**: Will need dedicated teams for pipeline analysis, conversion, testing, and validation across all 50 pipelines
- **Timeline Considerations**: With 6-month deadline, this averages to ~8-9 pipelines per month, requiring parallel processing and efficient workflow management
- **Risk Factors**: High volume increases complexity and potential for cascading failures; need robust testing strategy for each pipeline
- **Dependenci

[Trace(trace_id=tr-ca0ee8e0cef57603991ced7f4a316f0c), Trace(trace_id=tr-a18c04f71c39c2c4380bd402a539dc71), Trace(trace_id=tr-0d32f8bc0de9bf06bd6239880b8b50f5), Trace(trace_id=tr-fb139b4d07eec24079a2e1122654b346), Trace(trace_id=tr-41af0774ed1afee715bbfb09d79e6cf6), Trace(trace_id=tr-e4ba617f5fbc7d5b15beae4c49a445a9), Trace(trace_id=tr-1089ad519f974e70b0c4a88f7db41536), Trace(trace_id=tr-6961f91e1c50507383d7e6db6a6f2e36), Trace(trace_id=tr-9b13cc7e1ff7b7a348bf4d5e0229c1e8), Trace(trace_id=tr-c76cd14815d762147f6aa8c1219ce0fb)]

## 6. Interactive Planning Interface


In [0]:
def interactive_planning_session():
    """Interactive function for users to run their own planning sessions."""
    
    print(" Welcome to the Delivery Planning Agent!")
    print("="*50)
    
    # Get project context from user
    project_context = input("Please describe your project context: ")
    timeline_requirements = input("Any specific timeline requirements? (press Enter to skip): ")
    
    # Initialize agent
    agent = DeliveryPlanningAgent(VECTOR_SEARCH_ENDPOINT_NAME, VECTOR_SEARCH_INDEX_NAME)
    
    # Start planning session
    questions = agent.start_planning_session(project_context, timeline_requirements)
    
    print("\n Please answer the questions above. Type 'next' to move to the next category, or 'plan' to generate the project plan.")
    
    while True:
        user_input = input("\nYour response (or 'next'/'plan'): ").strip()
        
        if user_input.lower() == 'next':
            next_questions = agent.get_next_category_questions()
            if next_questions is None:
                print(" All categories completed! Type 'plan' to generate your project plan.")
            continue
        
        elif user_input.lower() == 'plan':
            final_plan = agent.generate_project_plan(timeline_requirements)
            break
        
        else:
            # This is a simplified version - in practice, you'd need to match the response to the current question
            print("Please provide a more specific answer or use 'next'/'plan' commands.")
    
    return final_plan

# Uncomment the line below to run an interactive session
# interactive_planning_session()

print("Interactive planning interface ready!")
print("To start an interactive session, uncomment and run: interactive_planning_session()")


Interactive planning interface ready!
To start an interactive session, uncomment and run: interactive_planning_session()


In [0]:
interactive_planning_session()

🚀 Welcome to the Delivery Planning Agent!


Please describe your project context:  Oracle migration for a big pharmaceutical company, with 3 departments, 1.Fiannce 2. Sales 3. HR around 100 reports in each department with around roughly 200 Stored Procedures with ETL logic and 1000 tables need a project plan

Any specific timeline requirements? (press Enter to skip):  

🚀 Starting Delivery Planning Session
📋 Project Context: Oracle migration for a big pharmaceutical company, with 3 departments, 1.Fiannce 2. Sales 3. HR around 100 reports in each department with around roughly 200 Stored Procedures with ETL logic and 1000 tables need a project plan


📝 Resource Questions:
------------------------------
1. 1. What is the current size and composition of your technical team, and how many team members have Oracle migration experience or database expertise?
3. 2. Do you have dedicated database administrators (DBAs) and ETL developers available internally, or will you need to hire external consultants or contractors for this migration?
5. 3. What is your preferred project timeline, and are there any critical business periods (like financial year-end, sales cycles, or HR processes) when system downtime must be avoided?
7. 4. What is the approved budget range for this migration project, including potential costs for external resources, training, and migration 


Your response (or 'next'/'plan'):  What is the current size and composition of your technical team, and how many team members have Oracle migration experience or database expertise? 10 Data Engineers, no migration expertise to migrate to DAtabricks. Dedcated DBAs are available, within 1 year i.e. by September 2026, 1Million USD budget, no tools for migrations

Please provide a more specific answer or use 'next'/'plan' commands.



Your response (or 'next'/'plan'):  plan


📊 Generating Project Plan...


  Expected `Union[Choices, StreamingChoices]` but got `Choices` - serialized value may not be as expected"


📋 PROJECT PLAN:
# Oracle Migration Project Plan - Pharmaceutical Company

## Project Overview
**Objective**: Migrate Oracle database infrastructure for Finance, Sales, and HR departments including 300 reports, 200 stored procedures, and 1000 tables.

**Scope**: Complete database migration with minimal business disruption and full functionality preservation.

## Phase 1: Assessment & Planning (Weeks 1-8)

### 1.1 Current State Analysis
- **Database Assessment**
  - Inventory all 1000 tables with dependencies mapping
  - Catalog 200 stored procedures and ETL logic
  - Document 300 reports (100 per department)
  - Analyze data volumes and growth patterns
  - Identify custom code and third-party integrations

- **Business Impact Analysis**
  - Map critical business processes to database objects
  - Identify peak usage periods and maintenance windows
  - Document regulatory compliance requirements (FDA, GxP)
  - Assess data retention and archival policies

### 1.2 Target Architecture Design

  Expected `Union[Choices, StreamingChoices]` but got `Choices` - serialized value may not be as expected"



⚠️  RISK ASSESSMENT:
**CRITICAL RISKS (High Impact, High Probability):**

1. **Regulatory Compliance Failure**
   - **Risk**: Non-compliance with FDA 21 CFR Part 11, GxP, or other pharmaceutical regulations
   - **Impact**: Regulatory sanctions, audit failures, potential product recalls, legal liability
   - **Likelihood**: Medium-High
   - **Mitigation**: Engage regulatory compliance team from Phase 1, maintain complete audit trails, implement electronic signatures, conduct compliance-specific testing, obtain regulatory pre-approval for migration approach

2. **Data Integrity Compromise**
   - **Risk**: Data corruption, loss, or inconsistency during migration affecting patient safety data or financial records
   - **Impact**: Patient safety risks, financial misstatements, regulatory violations
   - **Likelihood**: Medium
   - **Mitigation**: Implement checksums and data validation at every step, maintain parallel systems during transition, conduct multiple data integrity tests, estab

{'project_plan': '# Oracle Migration Project Plan - Pharmaceutical Company\n\n## Project Overview\n**Objective**: Migrate Oracle database infrastructure for Finance, Sales, and HR departments including 300 reports, 200 stored procedures, and 1000 tables.\n\n**Scope**: Complete database migration with minimal business disruption and full functionality preservation.\n\n## Phase 1: Assessment & Planning (Weeks 1-8)\n\n### 1.1 Current State Analysis\n- **Database Assessment**\n  - Inventory all 1000 tables with dependencies mapping\n  - Catalog 200 stored procedures and ETL logic\n  - Document 300 reports (100 per department)\n  - Analyze data volumes and growth patterns\n  - Identify custom code and third-party integrations\n\n- **Business Impact Analysis**\n  - Map critical business processes to database objects\n  - Identify peak usage periods and maintenance windows\n  - Document regulatory compliance requirements (FDA, GxP)\n  - Assess data retention and archival policies\n\n### 1.2 T

[Trace(trace_id=tr-6ff28d7c3504962c6ec008df78bc048d), Trace(trace_id=tr-09bdc2b2ef0d2f1b9c69d7a894e8e13a), Trace(trace_id=tr-0d4f931294e7f689439708087045edce)]

In [0]:
# Deploy Agent using DSPy MLflow Integration
import mlflow
import mlflow.dspy
from mlflow.models.resources import DatabricksVectorSearchIndex, DatabricksServingEndpoint

# Set registry URI for Unity Catalog
mlflow.set_registry_uri("databricks-uc")

# Input example for the model
input_example = {
    "messages": [
        {
            "role": "user",
            "content": "How many team members are there for the Databricks migration? Category: Resource. Context: We have 5 team members."
        }
    ]
}

# Create a DSPy-compatible wrapper for the agent
class DeliveryPlanningAgentDSPy(dspy.Module):
    """DSPy-compatible wrapper for the Delivery Planning Agent."""
    
    def __init__(self):
        super().__init__()
        self.agent = None
        
    def forward(self, question, category="General", context=""):
        """Forward method for DSPy."""
        try:
            if not self.agent:
                # Initialize the agent if not already done
                self.agent = DeliveryPlanningAgent(
                    vector_search_endpoint=VECTOR_SEARCH_ENDPOINT_NAME,
                    vector_search_index=VECTOR_SEARCH_INDEX_NAME
                )
            
            result = self.agent.answer_question(question, context, category)
            return dspy.Prediction(answer=result["answer"])
        except Exception as e:
            return dspy.Prediction(answer=f"Error: {str(e)}")

# Create the DSPy model instance
dspy_agent = DeliveryPlanningAgentDSPy()

# Log the agent using DSPy MLflow integration
with mlflow.start_run() as run:
    logged_agent_info = mlflow.dspy.log_model(
        dspy_model=dspy_agent,
        artifact_path="delivery-planning-agent",
        input_example=input_example,
        task="llm/v1/chat",
        model_config={
            "vector_search_endpoint": VECTOR_SEARCH_ENDPOINT_NAME,
            "vector_search_index": VECTOR_SEARCH_INDEX_NAME
        },
        resources=[
            DatabricksVectorSearchIndex(index_name=VECTOR_SEARCH_INDEX_NAME),
            DatabricksServingEndpoint(endpoint_name="databricks-claude-sonnet-4")
        ]
    )
    
    print(f" Agent logged to MLflow: {logged_agent_info.model_uri}")

# Register the agent in Unity Catalog
model_name = f"{CATALOG_NAME}.{SCHEMA_NAME}.delivery-planning-agent"
uc_model_info = mlflow.register_model(model_uri=logged_agent_info.model_uri, name=model_name)

print(f" Agent registered in Unity Catalog: {uc_model_info.name}")
print(f"   Version: {uc_model_info.version}")

print(f" Agent ready for deployment!")
print(f" Model URI: {logged_agent_info.model_uri}")
print(f"  Registered as: {uc_model_info.name}")
print(f" Next step: Deploy via Databricks Model Serving UI or API")
print(f" Use the model URI to create a serving endpoint")


🔗 View Logged Model at: https://adb-984752964297111.11.azuredatabricks.net/ml/experiments/3013595711630001/models/m-7f4c40b1812a4289b6a7890f6f085478?o=984752964297111
KeyError('messages')Traceback (most recent call last):
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-5605b0f9-6d0c-4b1d-9c46-7dcbcfc9760c/lib/python3.12/site-packages/mlflow/utils/_capture_modules.py", line 166, in load_model_and_predict
    model.predict(input_example, params=params)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-5605b0f9-6d0c-4b1d-9c46-7dcbcfc9760c/lib/python3.12/site-packages/mlflow/dspy/wrapper.py", line 164, in predict
    converted_inputs = self._get_model_input(inputs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-5605b0f9-6d0c-4b1d-9c46-7dcbcfc9760c/lib/python3.12/site-packages/mlflow/dspy/wrapper.py", line 213, in _get_model_input
    return inputs["messages"]
           ~~~~~~^^^^^^^^^^^^
KeyError: 'messages'
2025/09/13 20:53:40 

✅ Agent logged to MLflow: models:/m-7f4c40b1812a4289b6a7890f6f085478


Registered model 'vbdemos.usecase_agent.delivery-planning-agent' already exists. Creating a new version of this model...


Downloading artifacts:   0%|          | 0/11 [00:00<?, ?it/s]

Uploading artifacts:   0%|          | 0/12 [00:00<?, ?it/s]

✅ Agent registered in Unity Catalog: vbdemos.usecase_agent.delivery-planning-agent
   Version: 3
✅ Agent ready for deployment!
📊 Model URI: models:/m-7f4c40b1812a4289b6a7890f6f085478
🏷️  Registered as: vbdemos.usecase_agent.delivery-planning-agent
📈 Next step: Deploy via Databricks Model Serving UI or API
🔗 Use the model URI to create a serving endpoint


🔗 Created version '3' of model 'vbdemos.usecase_agent.delivery-planning-agent': https://adb-984752964297111.11.azuredatabricks.net/explore/data/models/vbdemos/usecase_agent/delivery-planning-agent/version/3?o=984752964297111
