🌊 SlideFlow: AI-Powered PPT Automation Engine

SlideFlow is more than just a PPT generator; it's an end-to-end automation engine powered by LangGraph orchestration, MCP (Model Context Protocol), and RAG (Retrieval-Augmented Generation). It transforms vague ideas or complex documents into professional, high-fidelity, and fully editable presentations.

🏗️ Architecture: Highly Decoupled Productivity Model

SlideFlow uses a four-layer architecture that separates agent interaction, protocol handling, workflow orchestration, and infrastructure, so each layer can be scaled or replaced independently.

1. Logical Architecture Overview (Mermaid)

graph TD
    subgraph Client_Layer [1. Interaction Layer]
        User((User))
        Agent[AI Agent / Claude Desktop]
        Web_UI[Responsive Dashboard]
    end

    subgraph Interface_Layer [2. Interface & Protocol Layer]
        MCP_Server[FastMCP Server]
        REST_API[Flask / Web API]
        SSE_Engine[SSE Stream Engine]
    end

    subgraph Logic_Layer [3. Logic Orchestration - LangGraph]
        direction TB
        SearchOutline[1. Web Search & Outline Gen]
        ChapterContent[2. Deep Chapter Expansion]
        PageGeneration[3. Atomic HTML Slide Gen]
        PDFSynthesis[4. Pixel-Perfect PDF Synthesis]
        HTMLToPPTX[5. Vectorized PPTX Conversion]
        
        SearchOutline --> ChapterContent
        SearchOutline --> PageGeneration
        ChapterContent --> PDFSynthesis
        PageGeneration --> PDFSynthesis
        PDFSynthesis --> HTMLToPPTX
    end

    subgraph Infrastructure_Layer [4. Execution & Infrastructure]
        WebSearch[Serper / DuckDuckGo]
        VectorDB[Milvus Lite / RAG]
        Templates[Jinja2 / Tailwind CSS]
        Conversion[Playwright / Python-pptx]
    end

    %% Data Flow
    User --> Agent
    User --> Web_UI
    Agent -- Tool Calls --> MCP_Server
    Web_UI -- API Requests --> REST_API
    MCP_Server --> Logic_Layer
    REST_API --> Logic_Layer
    Logic_Layer --> Infrastructure_Layer
    Logic_Layer --> VectorDB
    Logic_Layer -.-> SSE_Engine
    SSE_Engine -.-> Web_UI

2. Core Components Responsibility Table

| Layer | Component | Key Technology | Core Responsibility |
| --- | --- | --- | --- |
| Interaction | Web Dashboard | Vue 3 + Tailwind | Provides real-time visual feedback, progress tracking, and logs. |
| Protocol | MCP Server | FastMCP | Exposes 10 atomic tools, giving AI Agents "hands" and "eyes" to operate SlideFlow. |
| Orchestration | LangGraph | StateGraph | Manages a complex DAG, enabling parallel content expansion and generation. |
| Infrastructure | Vector RAG Engine | Milvus Lite | Provides local PDF retrieval for specialized knowledge enhancement. |
| Infrastructure | Conversion Engine | Playwright | Performs pixel-level capture to map HTML templates to vectorized PPTX elements. |
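
The dashed arrows in the diagram carry progress from the logic layer to the dashboard over Server-Sent Events. As a rough illustration of what one such message could look like on the wire, the sketch below formats a progress update as an SSE frame. The event name `progress` and the payload fields are hypothetical and are not taken from SlideFlow's code.

```python
import json


def format_sse(data, event=None):
    """Serialize one Server-Sent Events frame (hypothetical helper)."""
    # json.dumps escapes newlines, so the payload always fits on one data line.
    payload = json.dumps(data, ensure_ascii=False)
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    lines.append(f"data: {payload}")
    # A blank line terminates the frame.
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    # Field names below are illustrative assumptions.
    print(format_sse({"node": "pdf_synthesis", "percent": 80}, event="progress"), end="")
```

A browser `EventSource` listening for `progress` events would receive the JSON string in `event.data`.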

🔬 Technical Deep Dive: Core Workflow Analysis

1. LangGraph Orchestration: Precise Task Control

The brain of SlideFlow is the state machine in ppt_graph.py. Unlike traditional linear scripts, it supports complex parallel processing and state backtracking.

# core/ppt_graph.py core logic snippet
from langgraph.graph import StateGraph, END

# PPTState is defined in core/state.py; the *_node functions live under
# core/nodes/ (their imports are omitted from this snippet).

def create_ppt_graph():
    # PPTState stores the entire context (outline, pages, paths, etc.)
    workflow = StateGraph(PPTState)

    # Register 5 core nodes
    workflow.add_node("search_outline", search_outline_node)   # Node 1: Structured JSON Outline
    workflow.add_node("chapter_content", chapter_content_node) # Node 2: Deep Content Expansion (Parallel)
    workflow.add_node("page_generation", page_generation_node) # Node 3: Atomic HTML Generation (Parallel)
    workflow.add_node("pdf_synthesis", pdf_synthesis_node)     # Node 4: High-Fidelity PDF Rendering
    workflow.add_node("html_to_pptx", html_to_pptx_node)       # Node 5: Vectorized PPTX Reconstruction

    # Orchestration: start expansion and page generation simultaneously after search
    workflow.set_entry_point("search_outline")
    workflow.add_edge("search_outline", "chapter_content")
    workflow.add_edge("search_outline", "page_generation")

    # Converge to synthesis after both parallel branches finish
    workflow.add_edge("chapter_content", "pdf_synthesis")
    workflow.add_edge("page_generation", "pdf_synthesis")

    workflow.add_edge("pdf_synthesis", "html_to_pptx")
    workflow.add_edge("html_to_pptx", END)

    return workflow.compile()
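
To make the fan-out and fan-in of this graph easier to see, here is a small standard-library sketch that groups the same five nodes into execution stages. It does not use LangGraph; it only mirrors the edges registered above, and the `execution_stages` helper is a hypothetical name introduced for illustration.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes it depends on (same edges as create_ppt_graph).
DEPENDENCIES = {
    "search_outline": set(),
    "chapter_content": {"search_outline"},
    "page_generation": {"search_outline"},
    "pdf_synthesis": {"chapter_content", "page_generation"},
    "html_to_pptx": {"pdf_synthesis"},
}


def execution_stages(dependencies):
    """Group nodes into stages; nodes within one stage do not depend on each other."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are done
        stages.append(ready)
        sorter.done(*ready)
    return stages


if __name__ == "__main__":
    for index, stage in enumerate(execution_stages(DEPENDENCIES), start=1):
        print(index, stage)
```

The second stage contains both `chapter_content` and `page_generation`, which is the pair of branches that can run in parallel before `pdf_synthesis` joins them.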

2. High-Fidelity HTML-to-PPTX: Bridging DOM and Vectors

SlideFlow's "secret sauce" is in html_to_ppt.py. Instead of simple screenshots, it performs true element reconstruction.

Coordinate Capture: Uses Playwright to execute custom JS that calls getBoundingClientRect() on every text element, image, and shape.
Style Restoration: Extracts computed CSS (RGB colors, font sizes, alignment) and maps them to python-pptx properties.
Layering Strategy: Background -> Decorative Elements -> Text -> Icons.

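// html_to_ppt.py coordinate capture logic (JavaScript evaluated in the page by Playwright)
async () => {
    // The two helpers below are not shown in the original snippet;
    // these are assumed minimal versions so the function runs on its own.
    // True if the element has at least one non-empty text node as a direct child
    const hasDirectText = el => Array.from(el.childNodes).some(
        node => node.nodeType === Node.TEXT_NODE && node.textContent.trim() !== '');
    // Convert 'rgb(r, g, b)' or 'rgba(r, g, b, a)' to [r, g, b]
    const parseRGB = value => (value.match(/[\d.]+/g) || []).slice(0, 3).map(Number);

    const elements = [];
    // Traverse all elements under <body> except scripts/styles
    const allElements = document.querySelectorAll('body *:not(script):not(style)');
    allElements.forEach(el => {
        const rect = el.getBoundingClientRect();
        const style = window.getComputedStyle(el);

        // Capture elements that carry text directly
        if (hasDirectText(el)) {
            elements.push({
                type: 'text',
                text: el.textContent.trim(),
                x: rect.left, y: rect.top, // Viewport coordinates in CSS pixels
                fontSize: parseFloat(style.fontSize),
                fontFamily: style.fontFamily,
                color: parseRGB(style.color), // Convert 'rgb(r,g,b)' to array
                textAlign: style.textAlign
            });
        }
    });
    return elements;
}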
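
The captured values are in CSS pixels, while PPTX geometry is expressed in EMU (914,400 per inch) and font sizes in points. The sketch below shows one way the mapping could work, assuming 96 CSS pixels per inch and a slide scaled to the viewport width. The function names and the returned dictionary are hypothetical; this is not the code in html_to_ppt.py.

```python
EMU_PER_INCH = 914_400   # fixed by the Office Open XML format
CSS_PX_PER_INCH = 96     # CSS reference pixel density (assumption for this sketch)
EMU_PER_PX = EMU_PER_INCH // CSS_PX_PER_INCH  # 9525


def px_to_emu(px, scale=1.0):
    """Convert CSS pixels to integer EMU, applying an optional scale factor."""
    return round(px * scale * EMU_PER_PX)


def px_to_pt(px, scale=1.0):
    """Convert a CSS pixel font size to points (72 points per inch)."""
    return px * scale * 72 / CSS_PX_PER_INCH


def to_slide_geometry(element, viewport_width_px, slide_width_emu):
    """Map one captured element onto slide units, scaling the viewport to the slide width."""
    scale = slide_width_emu / (viewport_width_px * EMU_PER_PX)
    return {
        "left": px_to_emu(element["x"], scale),
        "top": px_to_emu(element["y"], scale),
        "font_pt": px_to_pt(element["fontSize"], scale),
        "rgb": tuple(element["color"][:3]),
    }


if __name__ == "__main__":
    captured = {"x": 96, "y": 48, "fontSize": 16, "color": [255, 0, 0]}
    # 12,192,000 EMU is a 13.333-inch-wide 16:9 slide; a 1280 px viewport maps to it 1:1.
    print(to_slide_geometry(captured, viewport_width_px=1280, slide_width_emu=12_192_000))
```

Values like these could then be passed to python-pptx when placing a text box, since it measures shape positions in EMU.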

3. RAG Engine: Knowledge Enhancement with Milvus Lite

In vector_utils.py, we integrate Milvus Lite to supply domain-specific knowledge from local PDFs.

Local Indexing: No need for complex DB clusters; milvus_demo.db provides low-latency retrieval for local Agents.
Auto Chunking: Uses tiktoken for precise token counting and overlapping chunks to ensure context integrity.

# vector_utils.py core RAG logic
from pymilvus import MilvusClient

class VectorSearchManager:
    def __init__(self, db_path: str = "milvus_demo.db"):
        # Initialize Milvus Lite client (a local file path selects the embedded engine)
        self.milvus_client = MilvusClient(db_path)
        self.collection_name = "pdf_chunks"

    # extract_text_from_pdf, chunk_text, and get_embedding are defined
    # elsewhere in the class and are not shown in this snippet.

    async def add_pdf(self, pdf_path: str):
        """Vectorize PDF content and insert into Milvus"""
        text = self.extract_text_from_pdf(pdf_path)
        chunks = self.chunk_text(text, chunk_size=500, chunk_overlap=50)

        # Embed each chunk and insert it into the vector store
        for chunk in chunks:
            embedding = await self.get_embedding(chunk)
            self.milvus_client.insert(
                collection_name=self.collection_name,
                data=[{"vector": embedding, "text": chunk, "source": pdf_path}]
            )
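
The overlapping-chunk idea behind `chunk_text` can be sketched as a sliding window over a token list. The version below is an assumption-laden stand-in: it splits on whitespace instead of using tiktoken, and `chunk_tokens` is a hypothetical helper rather than the method in vector_utils.py.

```python
def chunk_tokens(tokens, chunk_size=500, chunk_overlap=50):
    """Split a token list into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the window already reached the end of the text
    return chunks


if __name__ == "__main__":
    # Whitespace splitting stands in for a real tokenizer here.
    words = "one two three four five six seven eight nine ten".split()
    for chunk in chunk_tokens(words, chunk_size=4, chunk_overlap=1):
        print(" ".join(chunk))
```

Each chunk starts with the last token of the previous one, so a sentence cut at a chunk boundary still appears with some of its context in the next chunk.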

🔌 MCP Mode: Mounting SlideFlow to Your Agent

1. Core Tool List (10 Atomic Tools)

| Tool Name | Stage | Description |
| --- | --- | --- |
| `initialize_task_workspace` | Setup | Creates a workspace directory for a new task. |
| `search_web` | Research | Fetches real-time internet information via Serper/DuckDuckGo. |
| `search_vector_db` | Research | Retrieves specialized knowledge from the local PDF vector store. |
| `get_generation_guidelines` | Planning | CRITICAL: Gets System Prompts and rules for different slide types. |
| `list_available_templates` | Design | Returns a list of built-in visual themes (e.g., `company_report`). |
| `get_template_reference` | Design | Gets reference HTML/CSS structure for a specific template slide. |
| `search_images` | Design | Searches for high-quality backgrounds to enhance visual appeal. |
| `save_html_to_workspace` | Design | Saves Agent-generated HTML to disk. |
| `list_workspace_files` | Verification | Lists all saved HTML slides for the current task. |
| `synthesize_final_documents` | Synthesis | Final Step: Triggers Playwright and PPTX engines to generate outputs. |
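
To give a feel for the setup, save, and verification steps in this list, the sketch below implements three stand-ins with the standard library only. The function names match the tools, but the signatures, the `slide_NNN.html` naming scheme, and the directory layout are all assumptions; the real tools in mcp_server.py may behave differently.

```python
import tempfile
from pathlib import Path


def initialize_task_workspace(root, task_id):
    """Create (or reuse) a directory for one task and return its path."""
    workspace = Path(root) / task_id
    workspace.mkdir(parents=True, exist_ok=True)
    return workspace


def save_html_to_workspace(workspace, page_number, html):
    """Write one slide's HTML to disk; zero-padded names keep slides in order."""
    path = Path(workspace) / f"slide_{page_number:03d}.html"
    path.write_text(html, encoding="utf-8")
    return path


def list_workspace_files(workspace):
    """Return the saved slide file names in sorted order."""
    return sorted(p.name for p in Path(workspace).glob("*.html"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        ws = initialize_task_workspace(root, "demo_task")
        save_html_to_workspace(ws, 1, "<h1>Title</h1>")
        save_html_to_workspace(ws, 2, "<h1>Agenda</h1>")
        print(list_workspace_files(ws))
```

Listing the files before calling the synthesis step lets an agent confirm that every planned slide was actually written.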

2. Configure Claude Desktop

Mount SlideFlow as an "external skill" for Claude:

{
  "mcpServers": {
    "slideflow": {
      "command": "python",
      "args": ["/Users/your_path/SlideFlow/mcp_server.py"],
      "env": {
        "OPENAI_API_KEY": "sk-xxxx",
        "SERPER_API_KEY": "xxxx"
      }
    }
  }
}
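
Because the `args` entry needs an absolute path to mcp_server.py, it can be convenient to generate this block instead of typing it by hand. The helper below is a hypothetical convenience script, not part of SlideFlow; it resolves the script path and prints JSON in the same shape as the example above.

```python
import json
from pathlib import Path


def build_mcp_config(server_script, env, name="slideflow", command="python"):
    """Return an mcpServers config dict with the script path made absolute."""
    script = Path(server_script).expanduser().resolve()
    return {
        "mcpServers": {
            name: {
                "command": command,
                "args": [str(script)],
                "env": dict(env),
            }
        }
    }


if __name__ == "__main__":
    # Placeholder keys; substitute real values before use.
    config = build_mcp_config(
        "mcp_server.py",
        {"OPENAI_API_KEY": "sk-xxxx", "SERPER_API_KEY": "xxxx"},
    )
    print(json.dumps(config, indent=2))
```

Run it from the SlideFlow directory and paste the output into the Claude Desktop configuration.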

🚀 Quick Start

1. Prerequisites

# Python 3.10+ recommended
pip install -r requirements.txt

# Install the Chromium browser used by Playwright (for HTML rendering)
playwright install chromium

2. Running Modes

Web Dashboard: Run python main.py and visit http://localhost:5001.
Agent Mode: Connect mcp_server.py via Claude Desktop or any MCP client.

🔮 Future Roadmap

Multimodal Chart Recognition: Recognize charts in uploaded images and convert them to native PPT charts.
Multi-Agent Collaborative Flow: Introduce "Design Agent" and "Copy Agent" for iterative optimization.
SVG Asset Library: Support a wider range of vectorized icons and dynamic background templates.

📂 Project Structure

SlideFlow/
├── core/
│   ├── nodes/          # LangGraph nodes (Outline, Content, Synthesis, etc.)
│   ├── utils/          # Core utilities (RAG, Conversion, Search)
│   ├── ppt_graph.py    # Workflow State Machine
│   └── state.py        # Shared state PPTState definition
├── web/                # Flask-based Dashboard
├── mcp_server.py       # MCP Protocol Entry
├── main.py             # Web Service Entry
└── assets/             # Templates & Static Assets

📄 License

This project is licensed under the MIT License.

💡 Pro Tip: For best results, use Claude 3.5 Sonnet or GPT-4o as the underlying model.
