SlideFlow is more than just a PPT generator; it's an end-to-end automation engine powered by LangGraph orchestration, MCP (Model Context Protocol), and RAG (Retrieval-Augmented Generation). It transforms vague ideas or complex documents into professional, high-fidelity, and fully editable presentations.
SlideFlow uses a four-layer architecture (interaction, interface and protocol, logic orchestration, and execution infrastructure) that keeps Agent interaction, concurrent generation, and infrastructure scaling as separate concerns.
graph TD
subgraph Client_Layer [1. Interaction Layer]
User((User))
Agent[AI Agent / Claude Desktop]
Web_UI[Responsive Dashboard]
end
subgraph Interface_Layer [2. Interface & Protocol Layer]
MCP_Server[FastMCP Server]
REST_API[Flask / Web API]
SSE_Engine[SSE Stream Engine]
end
subgraph Logic_Layer [3. Logic Orchestration - LangGraph]
direction TB
SearchOutline[1. Web Search & Outline Gen]
ChapterContent[2. Deep Chapter Expansion]
PageGeneration[3. Atomic HTML Slide Gen]
PDFSynthesis[4. Pixel-Perfect PDF Synthesis]
HTMLToPPTX[5. Vectorized PPTX Conversion]
SearchOutline --> ChapterContent
SearchOutline --> PageGeneration
ChapterContent --> PDFSynthesis
PageGeneration --> PDFSynthesis
PDFSynthesis --> HTMLToPPTX
end
subgraph Infrastructure_Layer [4. Execution & Infrastructure]
WebSearch[Serper / DuckDuckGo]
VectorDB[Milvus Lite / RAG]
Templates[Jinja2 / Tailwind CSS]
Conversion[Playwright / Python-pptx]
end
%% Data Flow
User --> Agent
User --> Web_UI
Agent -- Tool Calls --> MCP_Server
Web_UI -- API Requests --> REST_API
MCP_Server --> Logic_Layer
REST_API --> Logic_Layer
Logic_Layer --> Infrastructure_Layer
Logic_Layer --> VectorDB
Logic_Layer -.-> SSE_Engine
SSE_Engine -.-> Web_UI
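
The Logic_Layer edges in the diagram imply an execution order in which the two middle stages can run side by side. As a minimal sketch (standard library only, not SlideFlow's actual LangGraph code), the same dependencies can be grouped into waves with `graphlib`. The `execution_waves` helper and the `DEPENDENCIES` mapping are names introduced here for illustration.

```python
from graphlib import TopologicalSorter

# Node -> set of predecessors, copied from the Logic_Layer edges in the diagram.
DEPENDENCIES = {
    "SearchOutline": set(),
    "ChapterContent": {"SearchOutline"},
    "PageGeneration": {"SearchOutline"},
    "PDFSynthesis": {"ChapterContent", "PageGeneration"},
    "HTMLToPPTX": {"PDFSynthesis"},
}


def execution_waves(dependencies):
    """Group nodes into waves; nodes in one wave do not depend on each other."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose predecessors are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for index, wave in enumerate(execution_waves(DEPENDENCIES), start=1):
        print(index, wave)
```

Any wave with more than one entry is a candidate for concurrent execution; here that is the wave containing `ChapterContent` and `PageGeneration`.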
| Layer | Component | Key Technology | Core Responsibility |
|---|---|---|---|
| Interaction | Web Dashboard | Vue 3 + Tailwind | Provides real-time visual feedback, progress tracking, and logs. |
| Protocol | MCP Server | FastMCP | Exposes 10 atomic tools, giving AI Agents "hands" and "eyes" to operate SlideFlow. |
| Orchestration | LangGraph | StateGraph | Manages a complex DAG, enabling parallel content expansion and generation. |
| Infrastructure | Vector RAG Engine | Milvus Lite | Provides local PDF retrieval for specialized knowledge enhancement. |
| Infrastructure | Conversion Engine | Playwright | Performs pixel-level capture to map HTML templates to vectorized PPTX elements. |
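
The real-time feedback in the Interaction row travels over Server-Sent Events. The sketch below shows only the wire format of such a stream, under stated assumptions: the `progress` event name, the payload fields, and both function names are hypothetical and are not taken from SlideFlow's code.

```python
import json


def format_sse(data, event=None):
    """Serialize one Server-Sent Events message; a blank line ends the message."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps escapes newlines inside strings, so one data: line is enough.
    lines.append(f"data: {json.dumps(data, ensure_ascii=False)}")
    return "\n".join(lines) + "\n\n"


def progress_events(stages):
    """Yield one hypothetical 'progress' message per completed stage."""
    total = len(stages)
    for done, stage in enumerate(stages, start=1):
        payload = {"stage": stage, "percent": round(100 * done / total)}
        yield format_sse(payload, event="progress")


if __name__ == "__main__":
    for message in progress_events(["search_outline", "pdf_synthesis"]):
        print(message, end="")
```

A web framework would typically stream a generator like this with the `text/event-stream` content type, and the dashboard would read it through the browser's `EventSource` API.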
The brain of SlideFlow is the state machine in ppt_graph.py. Unlike traditional linear scripts, it supports complex parallel processing and state backtracking.
# core/ppt_graph.py core logic snippet
from langgraph.graph import StateGraph, END

# PPTState and the *_node functions are defined elsewhere in core/ (imports omitted here).


def create_ppt_graph():
    # PPTState stores the entire context (outline, pages, paths, etc.)
    workflow = StateGraph(PPTState)

    # Register 5 core nodes
    workflow.add_node("search_outline", search_outline_node)    # Node 1: Structured JSON Outline
    workflow.add_node("chapter_content", chapter_content_node)  # Node 2: Deep Content Expansion (Parallel)
    workflow.add_node("page_generation", page_generation_node)  # Node 3: Atomic HTML Generation (Parallel)
    workflow.add_node("pdf_synthesis", pdf_synthesis_node)      # Node 4: High-Fidelity PDF Rendering
    workflow.add_node("html_to_pptx", html_to_pptx_node)        # Node 5: Vectorized PPTX Reconstruction

    # Orchestration: start expansion and page generation simultaneously after search
    workflow.set_entry_point("search_outline")
    workflow.add_edge("search_outline", "chapter_content")
    workflow.add_edge("search_outline", "page_generation")

    # Converge to synthesis after the parallel branches finish
    workflow.add_edge("chapter_content", "pdf_synthesis")
    workflow.add_edge("page_generation", "pdf_synthesis")
    workflow.add_edge("pdf_synthesis", "html_to_pptx")
    workflow.add_edge("html_to_pptx", END)

    return workflow.compile()

SlideFlow's "secret sauce" is in html_to_ppt.py. Instead of taking simple screenshots, it reconstructs each element as a native, editable PPTX object.
- Coordinate Capture: Uses Playwright to execute custom JS, getting `getBoundingClientRect()` for every character, image, and shape.
- Style Restoration: Extracts computed CSS (RGB colors, font sizes, alignment) and maps them to `python-pptx` properties.
- Layering Strategy: Background -> Decorative Elements -> Text -> Icons.
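
The style-restoration step in the list above depends on a few unit conversions. This is a minimal sketch of them, not the code in html_to_ppt.py. It assumes the page is rendered at the CSS reference density of 96 px per inch with a device scale factor of 1, and that colors arrive as raw computed-style strings. All function names are hypothetical.

```python
import re

EMU_PER_INCH = 914_400   # English Metric Units per inch (Office Open XML)
CSS_PX_PER_INCH = 96     # CSS reference pixel density (assumed rendering scale)
EMU_PER_PX = EMU_PER_INCH // CSS_PX_PER_INCH  # 9525

_RGB_PATTERN = re.compile(r"rgba?\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)")


def parse_rgb(css_color):
    """Turn 'rgb(r, g, b)' or 'rgba(r, g, b, a)' into an (r, g, b) tuple."""
    match = _RGB_PATTERN.match(css_color.strip())
    if match is None:
        raise ValueError(f"unsupported color format: {css_color!r}")
    return tuple(int(channel) for channel in match.groups())


def px_to_emu(px):
    """Convert a CSS pixel length to EMU, the unit used for PPTX positions."""
    return round(px * EMU_PER_PX)


def px_to_pt(px):
    """Convert a CSS pixel font size to points (72 pt per inch)."""
    return px * 72 / CSS_PX_PER_INCH


def to_slide_units(element):
    """Map one captured text element to slide-space values."""
    return {
        "left_emu": px_to_emu(element["x"]),
        "top_emu": px_to_emu(element["y"]),
        "font_size_pt": px_to_pt(element["fontSize"]),
        "rgb": parse_rgb(element["color"]),
    }


if __name__ == "__main__":
    sample = {"x": 96.0, "y": 48.0, "fontSize": 16.0, "color": "rgb(255, 0, 128)"}
    print(to_slide_units(sample))
```

Under these assumptions a 1280 x 720 px viewport maps to 12192000 x 6858000 EMU, a 16:9 slide. With `python-pptx`, values like these would typically be passed through `Emu(...)`, `Pt(...)`, and `RGBColor(...)`.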
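// html_to_ppt.py coordinate capture logic
async () => {
  // The two helpers below are assumed implementations; the originals are not shown in this snippet.
  const hasDirectText = el => Array.from(el.childNodes).some(
    n => n.nodeType === Node.TEXT_NODE && n.textContent.trim() !== '');
  // Convert 'rgb(r, g, b)' or 'rgba(r, g, b, a)' to [r, g, b]
  const parseRGB = color => (color.match(/\d+/g) || []).slice(0, 3).map(Number);

  const elements = [];
  // Traverse all elements except scripts/styles (this snippet does not filter by visibility)
  const allElements = document.querySelectorAll('body *:not(script):not(style)');
  allElements.forEach(el => {
    const rect = el.getBoundingClientRect();
    const style = window.getComputedStyle(el);
    // Only record elements that directly contain text
    if (hasDirectText(el)) {
      elements.push({
        type: 'text',
        text: el.textContent.trim(),
        x: rect.left, y: rect.top, // Viewport coordinates in CSS pixels
        fontSize: parseFloat(style.fontSize),
        fontFamily: style.fontFamily,
        color: parseRGB(style.color),
        textAlign: style.textAlign
      });
    }
  });
  return elements;
}

In vector_utils.py, we integrate Milvus Lite to provide domain-specific knowledge retrieval.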
- Local Indexing: No need for complex DB clusters; `milvus_demo.db` provides low-latency retrieval for local Agents.
- Auto Chunking: Uses `tiktoken` for precise token counting and overlapping chunks to ensure context integrity.
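
The overlapping-chunk idea from the list above can be sketched as a sliding window over a token list. This is an illustration rather than the `chunk_text` method in vector_utils.py. To stay dependency-free it treats whitespace-separated words as tokens instead of `tiktoken` tokens, and both function names are hypothetical.

```python
def chunk_tokens(tokens, chunk_size=500, chunk_overlap=50):
    """Split a token list into windows that share `chunk_overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks


def chunk_words(text, chunk_size=500, chunk_overlap=50):
    """Word-based stand-in for token-based chunking (assumption: words ~ tokens)."""
    words = text.split()
    return [" ".join(chunk) for chunk in chunk_tokens(words, chunk_size, chunk_overlap)]


if __name__ == "__main__":
    for chunk in chunk_words("a b c d e f g h i j", chunk_size=4, chunk_overlap=1):
        print(chunk)
```

Each window starts `chunk_size - chunk_overlap` tokens after the previous one, so a sentence cut at a boundary still appears intact in at least one neighbouring chunk whenever it is shorter than the overlap.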
# vector_utils.py core RAG logic
from pymilvus import MilvusClient


class VectorSearchManager:
    def __init__(self, db_path: str = "milvus_demo.db"):
        # Initialize Milvus Lite client (a local file path selects Milvus Lite)
        self.milvus_client = MilvusClient(db_path)
        self.collection_name = "pdf_chunks"

    async def add_pdf(self, pdf_path: str):
        """Vectorize PDF content and insert into Milvus"""
        # extract_text_from_pdf, chunk_text, and get_embedding are methods not shown in this snippet
        text = self.extract_text_from_pdf(pdf_path)
        chunks = self.chunk_text(text, chunk_size=500, chunk_overlap=50)
        # Embed and insert into vector store
        for chunk in chunks:
            embedding = await self.get_embedding(chunk)
            self.milvus_client.insert(
                collection_name=self.collection_name,
                data=[{"vector": embedding, "text": chunk, "source": pdf_path}]
            )

| Tool Name | Stage | Description |
|---|---|---|
| `initialize_task_workspace` | Setup | Creates a workspace directory for a new task. |
| `search_web` | Research | Fetches real-time internet information via Serper/DuckDuckGo. |
| `search_vector_db` | Research | Retrieves specialized knowledge from the local PDF vector store. |
| `get_generation_guidelines` | Planning | CRITICAL: Gets System Prompts and rules for different slide types. |
| `list_available_templates` | Design | Returns a list of built-in visual themes (e.g., company_report). |
| `get_template_reference` | Design | Gets reference HTML/CSS structure for a specific template slide. |
| `search_images` | Design | Searches for high-quality backgrounds to enhance visual appeal. |
| `save_html_to_workspace` | Design | Saves Agent-generated HTML to disk. |
| `list_workspace_files` | Verification | Lists all saved HTML slides for the current task. |
| `synthesize_final_documents` | Synthesis | Final Step: Triggers Playwright and PPTX engines to generate outputs. |
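
The Stage column suggests a natural calling order for an Agent. As a rough sketch, the check below flags the first tool call in a transcript that steps back to an earlier stage. The stage ordering is read off the table's row order, and nothing here implies that SlideFlow itself enforces it; `first_out_of_order` is a hypothetical helper.

```python
# Stage order as implied by the row order of the tool table (an assumption).
STAGE_ORDER = ["Setup", "Research", "Planning", "Design", "Verification", "Synthesis"]

TOOL_STAGE = {
    "initialize_task_workspace": "Setup",
    "search_web": "Research",
    "search_vector_db": "Research",
    "get_generation_guidelines": "Planning",
    "list_available_templates": "Design",
    "get_template_reference": "Design",
    "search_images": "Design",
    "save_html_to_workspace": "Design",
    "list_workspace_files": "Verification",
    "synthesize_final_documents": "Synthesis",
}


def first_out_of_order(calls):
    """Return the first tool whose stage precedes an earlier call's stage, else None.

    Raises KeyError for a tool name that is not in the table.
    """
    highest_rank = 0
    for name in calls:
        rank = STAGE_ORDER.index(TOOL_STAGE[name])
        if rank < highest_rank:
            return name
        highest_rank = rank
    return None


if __name__ == "__main__":
    transcript = ["initialize_task_workspace", "search_web", "synthesize_final_documents"]
    print(first_out_of_order(transcript))
```

A strict ordering like this is only one possible policy; an Agent that returns to research mid-design may be behaving perfectly reasonably.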
Mount SlideFlow as an "external skill" for Claude:
{
  "mcpServers": {
    "slideflow": {
      "command": "python",
      "args": ["/Users/your_path/SlideFlow/mcp_server.py"],
      "env": {
        "OPENAI_API_KEY": "sk-xxxx",
        "SERPER_API_KEY": "xxxx"
      }
    }
  }
}

# Recommended Python 3.10+
pip install -r requirements.txt
# Install Playwright drivers (for HTML rendering)
playwright install chromium

- Web Dashboard: Run `python main.py` and visit `http://localhost:5001`.
- Agent Mode: Connect `mcp_server.py` via Claude Desktop or any MCP client.
- Multimodal Chart Recognition: Recognize charts in uploaded images and convert them to native PPT charts.
- Multi-Agent Collaborative Flow: Introduce "Design Agent" and "Copy Agent" for iterative optimization.
- SVG Asset Library: Support a wider range of vectorized icons and dynamic background templates.
SlideFlow/
├── core/
│   ├── nodes/            # LangGraph nodes (Outline, Content, Synthesis, etc.)
│   ├── utils/            # Core utilities (RAG, Conversion, Search)
│   ├── ppt_graph.py      # Workflow State Machine
│   └── state.py          # Shared state PPTState definition
├── web/                  # Flask-based Dashboard
├── mcp_server.py         # MCP Protocol Entry
├── main.py               # Web Service Entry
└── assets/               # Templates & Static Assets
This project is licensed under the MIT License.
💡 Pro Tip: For best results, use Claude 3.5 Sonnet or GPT-4o as the underlying model.
