xiaoyesoso/SlideFlow


🌊 SlideFlow: AI-Powered PPT Automation Engine

English | 中文版 (Chinese)

Python 3.10+ | License: MIT | Framework: LangGraph | Protocol: MCP

SlideFlow is more than just a PPT generator; it's an end-to-end automation engine powered by LangGraph orchestration, MCP (Model Context Protocol), and RAG (Retrieval-Augmented Generation). It transforms vague ideas or complex documents into professional, high-fidelity, and fully editable presentations.

[Image: SlideFlow Dashboard]


🏗️ Architecture: Highly Decoupled Productivity Model

SlideFlow adopts a four-layer architecture common in modern AI applications, ensuring excellence in Agent interaction, concurrent generation, and infrastructure scalability.

1. Logical Architecture Overview (Mermaid)

graph TD
    subgraph Client_Layer [1. Interaction Layer]
        User((User))
        Agent[AI Agent / Claude Desktop]
        Web_UI[Responsive Dashboard]
    end

    subgraph Interface_Layer [2. Interface & Protocol Layer]
        MCP_Server[FastMCP Server]
        REST_API[Flask / Web API]
        SSE_Engine[SSE Stream Engine]
    end

    subgraph Logic_Layer [3. Logic Orchestration - LangGraph]
        direction TB
        SearchOutline[1. Web Search & Outline Gen]
        ChapterContent[2. Deep Chapter Expansion]
        PageGeneration[3. Atomic HTML Slide Gen]
        PDFSynthesis[4. Pixel-Perfect PDF Synthesis]
        HTMLToPPTX[5. Vectorized PPTX Conversion]
        
        SearchOutline --> ChapterContent
        SearchOutline --> PageGeneration
        ChapterContent --> PDFSynthesis
        PageGeneration --> PDFSynthesis
        PDFSynthesis --> HTMLToPPTX
    end

    subgraph Infrastructure_Layer [4. Execution & Infrastructure]
        WebSearch[Serper / DuckDuckGo]
        VectorDB[Milvus Lite / RAG]
        Templates[Jinja2 / Tailwind CSS]
        Conversion[Playwright / Python-pptx]
    end

    %% Data Flow
    User --> Agent
    User --> Web_UI
    Agent -- Tool Calls --> MCP_Server
    Web_UI -- API Requests --> REST_API
    MCP_Server --> Logic_Layer
    REST_API --> Logic_Layer
    Logic_Layer --> Infrastructure_Layer
    Logic_Layer --> VectorDB
    Logic_Layer -.-> SSE_Engine
    SSE_Engine -.-> Web_UI

2. Core Components Responsibility Table

| Layer | Component | Key Technology | Core Responsibility |
| --- | --- | --- | --- |
| Interaction | Web Dashboard | Vue 3 + Tailwind | Provides real-time visual feedback, progress tracking, and logs. |
| Protocol | MCP Server | FastMCP | Exposes 10 atomic tools, giving AI Agents "hands" and "eyes" to operate SlideFlow. |
| Orchestration | LangGraph | StateGraph | Manages a complex DAG, enabling parallel content expansion and generation. |
| Infrastructure | Vector RAG Engine | Milvus Lite | Provides local PDF retrieval for specialized knowledge enhancement. |
| Infrastructure | Conversion Engine | Playwright | Performs pixel-level capture to map HTML templates to vectorized PPTX elements. |
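
The dashboard's progress feed goes through the SSE Stream Engine shown in the diagram. As a rough illustration of the Server-Sent Events wire format only, the sketch below frames one progress update. The function name `format_sse`, the event name `progress`, and the payload fields are assumptions for illustration, not taken from SlideFlow's code.

```python
import json


def format_sse(payload, event=None):
    """Frame a JSON-serializable payload as one Server-Sent Events message."""
    lines = []
    if event:
        # Optional event name; the browser dispatches it to a matching listener.
        lines.append(f"event: {event}")
    body = json.dumps(payload, ensure_ascii=False)
    # Every line of the body needs its own "data: " prefix.
    for line in body.splitlines():
        lines.append(f"data: {line}")
    # A blank line terminates the message.
    return "\n".join(lines) + "\n\n"


# Hypothetical progress update emitted while a node is running
message = format_sse({"node": "pdf_synthesis", "progress": 0.8}, event="progress")
print(message, end="")
```

A Flask route could yield such strings from a generator with the `text/event-stream` content type; whether SlideFlow does exactly this is not stated here.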

🔬 Technical Deep Dive: Core Workflow Analysis

1. LangGraph Orchestration: Precise Task Control

The brain of SlideFlow is the state machine defined in core/ppt_graph.py. Unlike a traditional linear script, it supports parallel branches and state backtracking.

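# core/ppt_graph.py core logic snippet
from langgraph.graph import StateGraph, END
# PPTState and the *_node functions come from core/state.py and core/nodes/ (imports omitted in this snippet)
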
def create_ppt_graph():
    # PPTState stores the entire context (outline, pages, paths, etc.)
    workflow = StateGraph(PPTState)

    # Register 5 core nodes
    workflow.add_node("search_outline", search_outline_node)   # Node 1: Structured JSON Outline
    workflow.add_node("chapter_content", chapter_content_node) # Node 2: Deep Content Expansion (Parallel)
    workflow.add_node("page_generation", page_generation_node) # Node 3: Atomic HTML Generation (Parallel)
    workflow.add_node("pdf_synthesis", pdf_synthesis_node)     # Node 4: High-Fidelity PDF Rendering
    workflow.add_node("html_to_pptx", html_to_pptx_node)       # Node 5: Vectorized PPTX Reconstruction

    # Orchestration: Start expansion and page gen simultaneously after search
    workflow.set_entry_point("search_outline")
    workflow.add_edge("search_outline", "chapter_content")
    workflow.add_edge("search_outline", "page_generation") 
    
    # Converge to synthesis after parallel branches finish
    workflow.add_edge("chapter_content", "pdf_synthesis")
    workflow.add_edge("page_generation", "pdf_synthesis") 
    
    workflow.add_edge("pdf_synthesis", "html_to_pptx")
    workflow.add_edge("html_to_pptx", END)

    return workflow.compile()
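
To see which nodes the edges above allow to run side by side, the same dependencies can be fed to Python's standard `graphlib` module. This is only an illustration of the DAG's shape, not a description of how LangGraph schedules nodes internally; `DEPENDENCIES` and `execution_waves` are names made up for this sketch.

```python
from graphlib import TopologicalSorter

# Edges copied from create_ppt_graph(): each node maps to its predecessors.
DEPENDENCIES = {
    "search_outline": set(),
    "chapter_content": {"search_outline"},
    "page_generation": {"search_outline"},
    "pdf_synthesis": {"chapter_content", "page_generation"},
    "html_to_pptx": {"pdf_synthesis"},
}


def execution_waves(dependencies):
    """Group nodes into waves; nodes in one wave do not depend on each other."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        waves.append(ready)
        sorter.done(*ready)                 # mark the whole wave as finished
    return waves


WAVES = execution_waves(DEPENDENCIES)
for number, wave in enumerate(WAVES, start=1):
    print(number, wave)
```

The second wave contains both `chapter_content` and `page_generation`, which is the parallel fan-out, and `pdf_synthesis` appears only after both have finished.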

2. High-Fidelity HTML-to-PPTX: Bridging DOM and Vectors

SlideFlow's "secret sauce" is in html_to_ppt.py. Instead of simple screenshots, it performs true element reconstruction.

  • Coordinate Capture: Uses Playwright to execute custom JS, getting getBoundingClientRect() for every character, image, and shape.
  • Style Restoration: Extracts computed CSS (RGB colors, font sizes, alignment) and maps them to python-pptx properties.
  • Layering Strategy: Background -> Decorative Elements -> Text -> Icons.

// html_to_ppt.py coordinate capture logic
async () => {
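    const elements = [];
    // Assumed helpers (not shown in the original snippet):
    // true when the element directly owns a non-empty text node
    const hasDirectText = (el) => Array.from(el.childNodes).some(
        (n) => n.nodeType === Node.TEXT_NODE && n.textContent.trim() !== '');
    // 'rgb(r, g, b)' or 'rgba(r, g, b, a)' -> [r, g, b]
    const parseRGB = (css) => (css.match(/[\d.]+/g) || []).slice(0, 3).map(Number);
    // Traverse all elements except scripts/styles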
    const allElements = document.querySelectorAll('body *:not(script):not(style)');
    allElements.forEach(el => {
        const rect = el.getBoundingClientRect();
        const style = window.getComputedStyle(el);
        
        // Handle TextNodes and line breaks accurately
        if (hasDirectText(el)) {
            elements.push({
                type: 'text',
                text: el.textContent.trim(),
                x: rect.left, y: rect.top, // Pixel-perfect coordinates
                fontSize: parseFloat(style.fontSize),
                fontFamily: style.fontFamily,
                color: parseRGB(style.color), // Convert 'rgb(r,g,b)' to array
                textAlign: style.textAlign
            });
        }
    });
    return elements;
}
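
On the Python side, the captured pixel values have to be converted into the units PPTX uses. The sketch below shows that arithmetic under two assumptions: the page is rendered at the CSS reference density of 96 px per inch, and one CSS pixel maps to a fixed length on the slide with no extra scaling. The function names and the output keys are hypothetical; the real mapping in html_to_ppt.py may differ.

```python
EMU_PER_INCH = 914_400  # English Metric Units per inch, the length unit in PPTX files
CSS_PX_PER_INCH = 96    # CSS reference pixels per inch (assumed render density)
POINTS_PER_INCH = 72


def px_to_emu(px: float) -> int:
    """Convert a CSS pixel length to EMU."""
    return round(px * EMU_PER_INCH / CSS_PX_PER_INCH)


def px_to_pt(px: float) -> float:
    """Convert a CSS pixel font size to typographic points."""
    return px * POINTS_PER_INCH / CSS_PX_PER_INCH


def to_pptx_geometry(element: dict) -> dict:
    """Map one captured element (fields as in the JS snippet) to PPTX-friendly units."""
    r, g, b = (int(c) for c in element["color"][:3])
    return {
        "left_emu": px_to_emu(element["x"]),
        "top_emu": px_to_emu(element["y"]),
        "font_size_pt": px_to_pt(element["fontSize"]),
        "rgb_hex": f"{r:02X}{g:02X}{b:02X}",
    }


# Hypothetical captured element
sample = {"type": "text", "text": "Hello", "x": 96, "y": 48,
          "fontSize": 16, "color": [255, 128, 0]}
print(to_pptx_geometry(sample))
```

With python-pptx, values like these could be passed on through `Emu(...)`, `Pt(...)`, and `RGBColor.from_string(...)` when positioning a text box and styling its runs.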

3. RAG Engine: Knowledge Enhancement with Milvus Lite

In vector_utils.py, we integrate Milvus Lite to retrieve domain-specific knowledge from local PDFs.

  • Local Indexing: No need for complex DB clusters; milvus_demo.db provides low-latency retrieval for local Agents.
  • Auto Chunking: Uses tiktoken for precise token counting and overlapping chunks to ensure context integrity.

# vector_utils.py core RAG logic
from pymilvus import MilvusClient

class VectorSearchManager:
    def __init__(self, db_path: str = "milvus_demo.db"):
        # Initialize Milvus Lite client
        self.milvus_client = MilvusClient(db_path)
        self.collection_name = "pdf_chunks"

    async def add_pdf(self, pdf_path: str):
        """Vectorize PDF content and insert into Milvus"""
        text = self.extract_text_from_pdf(pdf_path)
        chunks = self.chunk_text(text, chunk_size=500, chunk_overlap=50)
        
        # Embed and insert into vector store
        for chunk in chunks:
            embedding = await self.get_embedding(chunk)
            self.milvus_client.insert(
                collection_name=self.collection_name,
                data=[{"vector": embedding, "text": chunk, "source": pdf_path}]
            )
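
The `chunk_text` method called above is not shown. A minimal sketch of overlapping chunking, assuming a plain sliding window, could look like the following. Whitespace-separated words stand in for tiktoken tokens so the example runs with the standard library only; `chunk_tokens` is a hypothetical helper name.

```python
def chunk_tokens(tokens, chunk_size=500, chunk_overlap=50):
    """Split a token list into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the input
    return chunks


def chunk_text(text, chunk_size=500, chunk_overlap=50):
    """Word-based stand-in for token-based chunking (assumption: words ~ tokens)."""
    words = text.split()
    return [" ".join(chunk) for chunk in chunk_tokens(words, chunk_size, chunk_overlap)]


print(chunk_text("a b c d e", chunk_size=3, chunk_overlap=1))
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk, which is the "context integrity" the bullet above refers to.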

🔌 MCP Mode: Mounting SlideFlow to Your Agent

1. Core Tool List (10 Atomic Tools)

| Tool Name | Stage | Description |
| --- | --- | --- |
| initialize_task_workspace | Setup | Creates a workspace directory for a new task. |
| search_web | Research | Fetches real-time internet information via Serper/DuckDuckGo. |
| search_vector_db | Research | Retrieves specialized knowledge from the local PDF vector store. |
| get_generation_guidelines | Planning | CRITICAL: Gets System Prompts and rules for different slide types. |
| list_available_templates | Design | Returns a list of built-in visual themes (e.g., company_report). |
| get_template_reference | Design | Gets reference HTML/CSS structure for a specific template slide. |
| search_images | Design | Searches for high-quality backgrounds to enhance visual appeal. |
| save_html_to_workspace | Design | Saves Agent-generated HTML to disk. |
| list_workspace_files | Verification | Lists all saved HTML slides for the current task. |
| synthesize_final_documents | Synthesis | Final Step: Triggers Playwright and PPTX engines to generate outputs. |
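
The Stage column suggests an order in which an agent would typically call these tools. The sketch below encodes that order and flags the first call in a transcript that steps back to an earlier stage. The stage order is an assumption inferred from the table's row order, and `first_out_of_order_call` is a hypothetical helper; SlideFlow itself may not enforce any ordering.

```python
# Tool names and stages copied from the table above.
TOOL_STAGES = {
    "initialize_task_workspace": "Setup",
    "search_web": "Research",
    "search_vector_db": "Research",
    "get_generation_guidelines": "Planning",
    "list_available_templates": "Design",
    "get_template_reference": "Design",
    "search_images": "Design",
    "save_html_to_workspace": "Design",
    "list_workspace_files": "Verification",
    "synthesize_final_documents": "Synthesis",
}

# Assumed stage order, inferred from the table's row order.
STAGE_ORDER = ["Setup", "Research", "Planning", "Design", "Verification", "Synthesis"]


def first_out_of_order_call(calls):
    """Return the first tool call that steps back to an earlier stage, or None."""
    highest_rank = 0
    for name in calls:
        rank = STAGE_ORDER.index(TOOL_STAGES[name])
        if rank < highest_rank:
            return name
        highest_rank = rank
    return None


# A transcript that synthesizes before researching is flagged at "search_web".
print(first_out_of_order_call(
    ["initialize_task_workspace", "synthesize_final_documents", "search_web"]
))
```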

2. Configure Claude Desktop

Mount SlideFlow as an "external skill" for Claude:

{
  "mcpServers": {
    "slideflow": {
      "command": "python",
      "args": ["/Users/your_path/SlideFlow/mcp_server.py"],
      "env": {
        "OPENAI_API_KEY": "sk-xxxx",
        "SERPER_API_KEY": "xxxx"
      }
    }
  }
}
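
If the Claude Desktop config already lists other MCP servers, the SlideFlow entry should be merged in rather than pasted over them. A minimal sketch of that merge, working on plain dictionaries, follows. `merge_slideflow_server` is a hypothetical helper, and the paths and keys in the usage lines are placeholders.

```python
import json


def merge_slideflow_server(existing_config: dict, server_script: str, env: dict) -> dict:
    """Return a copy of the config with a 'slideflow' MCP server entry added or replaced."""
    merged = dict(existing_config)
    servers = dict(merged.get("mcpServers", {}))  # keep servers that are already configured
    servers["slideflow"] = {
        "command": "python",
        "args": [server_script],
        "env": dict(env),
    }
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing config with one unrelated server
existing = {"mcpServers": {"other": {"command": "node", "args": ["other.js"]}}}
merged = merge_slideflow_server(
    existing,
    "/Users/your_path/SlideFlow/mcp_server.py",
    {"OPENAI_API_KEY": "sk-xxxx", "SERPER_API_KEY": "xxxx"},
)
print(json.dumps(merged, indent=2))
```

The function never touches the input dictionary, so the original config can be kept as a backup before the merged result is written to claude_desktop_config.json.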

🚀 Quick Start

1. Prerequisites

# Recommended Python 3.10+
pip install -r requirements.txt

# Install Playwright drivers (for HTML rendering)
playwright install chromium

2. Running Modes

  • Web Dashboard: Run python main.py and visit http://localhost:5001.
  • Agent Mode: Connect mcp_server.py via Claude Desktop or any MCP client.

🔮 Future Roadmap

  1. Multimodal Chart Recognition: Recognize charts in uploaded images and convert them to native PPT charts.
  2. Multi-Agent Collaborative Flow: Introduce "Design Agent" and "Copy Agent" for iterative optimization.
  3. SVG Asset Library: Support a wider range of vectorized icons and dynamic background templates.

📂 Project Structure

SlideFlow/
├── core/
│   ├── nodes/          # LangGraph nodes (Outline, Content, Synthesis, etc.)
│   ├── utils/          # Core utilities (RAG, Conversion, Search)
│   ├── ppt_graph.py    # Workflow State Machine
│   └── state.py        # Shared state PPTState definition
├── web/                # Flask-based Dashboard
├── mcp_server.py       # MCP Protocol Entry
├── main.py             # Web Service Entry
└── assets/             # Templates & Static Assets

📄 License

This project is licensed under the MIT License.

💡 Pro Tip: For best results, use Claude 3.5 Sonnet or GPT-4o as the underlying model.
