Shandu is a cutting-edge AI research assistant that performs in-depth, multi-source research on any topic, combining advanced language models, intelligent web scraping, and iterative exploration to generate comprehensive, well-structured reports with proper citations.

Built on LangGraph's state-based workflow, Shandu automates the research process end to end: from initial query clarification through recursive topic exploration to in-depth content analysis and report generation, with dedicated algorithms for source evaluation, content extraction, and knowledge synthesis.
- Academic Research: Generate literature reviews, background information, and complex topic analyses
- Market Intelligence: Analyze industry trends, competitor strategies, and market opportunities
- Content Creation: Produce well-researched articles, blog posts, and reports with proper citations
- Technology Exploration: Track emerging technologies, innovations, and technical developments
- Policy Analysis: Research regulations, compliance requirements, and policy implications
- Competitive Analysis: Compare products, services, and company strategies across industries
Shandu 2.0 introduces a major redesign of the report generation pipeline to produce more coherent, reliable reports:
- Modular Report Generation: Process reports in self-contained sections, enhancing overall system reliability
- Robust Error Recovery: Automatic retry mechanisms with intelligent fallbacks prevent the system from getting stuck
- Section-By-Section Processing: Each section is processed independently, allowing better error isolation (see the sketch after this list)
- Progress Tracking: Detailed progress tracking shows exactly where the pipeline is at each stage
- Enhanced Citation Management: More reliable citation handling ensures proper attribution throughout reports
- Intelligent Parallelization: Key processes run in parallel where possible for improved performance
- Comprehensive Fallback Mechanisms: If any step fails, the system gracefully degrades rather than halting
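In code terms, section-by-section processing with graceful fallbacks might look roughly like the following minimal sketch; `generate_section`, the section list, and the placeholder text are hypothetical stand-ins, not Shandu's actual internals:

```python
# Illustrative sketch only; not Shandu's actual implementation.
def build_report(sections, generate_section):
    report_parts = []
    for i, section in enumerate(sections, start=1):
        print(f"[{i}/{len(sections)}] Processing section: {section}")  # progress tracking
        try:
            report_parts.append(generate_section(section))
        except Exception as exc:
            # Error isolation: a failed section degrades gracefully
            # instead of halting the whole report.
            report_parts.append(f"## {section}\n\n_Content unavailable: {exc}_")
    return "\n\n".join(report_parts)
```

A failed section yields a visible placeholder instead of aborting the run, which is the error isolation described above.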
```mermaid
flowchart TB
    subgraph Input
        Q[User Query]
        B[Breadth Parameter]
        D[Depth Parameter]
    end

    subgraph Research[Research Phase]
        direction TB
        DR[Deep Research]
        SQ[SERP Queries]
        PR[Process Results]
        NL[(Sources & Learnings)]
        ND[(Directions)]
    end

    subgraph Report[Report Generation]
        direction TB
        TG[Title Generation]
        TE[Theme Extraction]
        IR[Initial Report]
        ES[Section Enhancement]
        EX[Section Expansion]
        FR[Final Report]
    end

    %% Main Flow
    Q & B & D --> DR
    DR --> SQ --> PR
    PR --> NL
    PR --> ND

    DP{depth > 0?}
    NL & ND --> DP

    RD["Next Direction:
    - Prior Goals
    - New Questions
    - Learnings"]

    %% Circular Flow
    DP -->|Yes| RD
    RD -->|New Context| DR

    %% To Report Generation
    DP -->|No| TG
    TG --> TE --> IR --> ES --> EX --> FR

    %% Styling
    classDef input fill:#7bed9f,stroke:#2ed573,color:black
    classDef process fill:#70a1ff,stroke:#1e90ff,color:black
    classDef recursive fill:#ffa502,stroke:#ff7f50,color:black
    classDef output fill:#ff4757,stroke:#ff6b81,color:white
    classDef storage fill:#a8e6cf,stroke:#3b7a57,color:black

    class Q,B,D input
    class DR,SQ,PR,TG,TE,IR,ES,EX process
    class DP,RD recursive
    class FR output
    class NL,ND storage
```
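The recursive loop in the diagram maps naturally onto a LangGraph state graph. Here is a minimal sketch with a simplified state and placeholder node functions; the real graph carries far more state and nodes:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

# Simplified state; the real graph also tracks sources, directions, etc.
class ResearchState(TypedDict):
    query: str
    depth: int            # remaining recursion depth
    learnings: list[str]

def deep_research(state: ResearchState) -> ResearchState:
    # Placeholder for SERP queries and result processing.
    return {**state,
            "depth": state["depth"] - 1,
            "learnings": state["learnings"] + [f"finding about {state['query']}"]}

def generate_report(state: ResearchState) -> ResearchState:
    # Placeholder for title generation, theme extraction, and section passes.
    return state

graph = StateGraph(ResearchState)
graph.add_node("research", deep_research)
graph.add_node("report", generate_report)
graph.set_entry_point("research")
# Mirrors the "depth > 0?" decision node: loop back or move on to reporting.
graph.add_conditional_edges(
    "research",
    lambda state: "research" if state["depth"] > 0 else "report",
)
graph.add_edge("report", END)

app = graph.compile()
print(app.invoke({"query": "quantum computing", "depth": 2, "learnings": []}))
```

Each node returns an updated state, and the conditional edge decides whether to recurse or hand off to report generation, exactly as in the flowchart.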
- Intelligent State-based Workflow: Leverages LangGraph for a structured, step-by-step research process
- Iterative Deep Exploration: Recursively explores topics with dynamic depth and breadth parameters
- Multi-source Information Synthesis: Analyzes data from search engines, web content, and knowledge bases
- Enhanced Web Scraping: Features dynamic JS rendering, content extraction, and ethical scraping practices
- Smart Source Evaluation: Automatically assesses source credibility, relevance, and information value
- Content Analysis Pipeline: Uses advanced NLP to extract key information, identify patterns, and synthesize findings
- Sectional Report Generation: Creates detailed reports by processing individual sections for maximum reliability
- Parallel Processing Architecture: Implements concurrent operations for efficient multi-query execution
- Adaptive Search Strategy: Dynamically adjusts search queries based on discovered information
- Full Citation Management: Properly attributes all sources with formatted citations in multiple styles
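For a flavor of what multi-style citation formatting involves, here is a minimal sketch; the source fields and the two styles shown are illustrative assumptions, not Shandu's citation schema:

```python
# Illustrative only: the field names and styles are assumptions,
# not Shandu's actual citation schema.
def format_citation(source: dict, style: str = "apa") -> str:
    author, year = source["author"], source["year"]
    title, url = source["title"], source["url"]
    if style == "apa":
        return f"{author} ({year}). {title}. Retrieved from {url}"
    if style == "mla":
        return f'{author}. "{title}." {year}, {url}.'
    raise ValueError(f"Unsupported citation style: {style}")

print(format_citation(
    {"author": "Doe, J.", "year": 2024,
     "title": "Quantum Computing Trends", "url": "https://example.com"},
    style="mla",
))
```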
```bash
# Install from PyPI
pip install shandu

# Install from source
git clone https://github.com/jolovicdev/shandu.git
cd shandu
pip install -e .
```
```bash
# Configure API settings (supports various LLM providers)
shandu configure

# Run comprehensive research
shandu research "Your research query" --depth 2 --breadth 4 --output report.md

# Quick AI-powered search with web scraping
shandu aisearch "Who is the current president of the United States?" --detailed
```
shandu research "Your research query" \
--depth 3 \ # How deep to explore (1-5, default: 2)
--breadth 5 \ # How many parallel queries (2-10, default: 4)
--output report.md \ # Save to file instead of terminal
--verbose # Show detailed progress
You can find example reports in the examples directory:
- The Intersection of Quantum Computing, Synthetic Biology, and Climate Modeling, generated with:

```bash
shandu research "The Intersection of Quantum Computing, Synthetic Biology, and Climate Modeling" --depth 3 --breadth 3 --output examples/o3-mini-high.md
```
```python
from shandu.agents import ResearchGraph
from langchain_openai import ChatOpenAI

# Initialize with a custom LLM if desired
llm = ChatOpenAI(model="gpt-4")

# Initialize the research graph
researcher = ResearchGraph(
    llm=llm,
    temperature=0.5
)

# Perform deep research
results = researcher.research_sync(
    query="Your research query",
    depth=3,    # How deep to go with recursive research
    breadth=4,  # How many parallel queries to explore
    detail_level="high"
)

# Print or save the results
print(results.to_markdown())
```
Shandu's research pipeline consists of these key stages:
- Query Clarification: Interactive questions to understand research needs
- Research Planning: Strategic planning for comprehensive topic coverage
- Iterative Exploration:
  - Smart query generation based on knowledge gaps
  - Multi-engine search with parallelized execution (see the sketch after this list)
  - Relevance filtering of search results
  - Intelligent web scraping with content extraction
  - Source credibility assessment
  - Information analysis and synthesis
  - Reflection on findings to identify gaps
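As referenced in the list above, here is a minimal sketch of the parallelized multi-engine search step; the engine functions are hypothetical placeholders, not Shandu's actual search clients:

```python
import asyncio

# Hypothetical engine clients standing in for Shandu's real search backends.
async def search_google(query: str) -> list[str]:
    await asyncio.sleep(0.1)  # simulate network latency
    return [f"google result for {query}"]

async def search_duckduckgo(query: str) -> list[str]:
    await asyncio.sleep(0.1)
    return [f"duckduckgo result for {query}"]

async def multi_engine_search(queries: list[str]) -> list[str]:
    # Fan every query out to every engine concurrently.
    tasks = [engine(q)
             for q in queries
             for engine in (search_google, search_duckduckgo)]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    # Drop failed engines instead of failing the whole round.
    return [item for r in results if not isinstance(r, Exception) for item in r]

print(asyncio.run(multi_engine_search(["quantum computing", "synthetic biology"])))
```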
Shandu 2.0 introduces a robust, modular report generation pipeline:
- Data Preparation: Registration of all sources and their metadata for proper citation
- Title Generation: Creating a concise, professional title (with retry mechanisms)
- Theme Extraction: Identifying key themes to organize the report structure
- Citation Formatting: Properly formatting all citations for reference
- Initial Report Generation: Creating a comprehensive draft report
- Section Enhancement: Individually processing each section to add detail and depth
- Key Section Expansion: Identifying and expanding the most important sections
- Report Finalization: Final processing and validation of the complete report
Each step includes:
- Comprehensive error handling
- Automatic retries with exponential backoff (sketched after this list)
- Intelligent fallbacks when issues occur
- Progress tracking for transparency
- Validation to ensure quality output
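A minimal sketch of what retries with exponential backoff plus a fallback can look like; this decorator is illustrative, not Shandu's implementation:

```python
import time
from functools import wraps

# Illustrative decorator; Shandu's internal retry logic may differ.
def with_retries(max_attempts=3, base_delay=1.0, fallback=None):
    """Retry a step with exponential backoff, then fall back instead of halting."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts - 1:
                        return fallback  # graceful degradation, not a crash
                    time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
        return wrapper
    return decorator

@with_retries(max_attempts=3, fallback="Untitled Report")
def generate_title(query: str) -> str:
    raise RuntimeError("simulated transient LLM failure")

print(generate_title("quantum computing"))  # prints "Untitled Report"
```

The exponential delays give transient failures such as rate limits time to clear before the fallback kicks in.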
Supported search engines include:
- Google Search
- DuckDuckGo
- Wikipedia
- ArXiv (academic papers)
- Custom search engines can be added
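Shandu's exact extension interface isn't shown here, but a typical plug-in pattern for a custom engine might look like this hypothetical sketch:

```python
from abc import ABC, abstractmethod

# Hypothetical plug-in pattern; Shandu's real extension interface may differ.
class SearchEngine(ABC):
    name: str

    @abstractmethod
    def search(self, query: str, max_results: int = 10) -> list[dict]:
        """Return results as dicts with at least 'title' and 'url' keys."""

class CompanyWikiSearch(SearchEngine):
    name = "company-wiki"

    def search(self, query: str, max_results: int = 10) -> list[dict]:
        # A real engine would query an internal index; static data keeps this runnable.
        return [{"title": f"Wiki page about {query}",
                 "url": "https://wiki.example.com"}][:max_results]

ENGINES = {engine.name: engine for engine in [CompanyWikiSearch()]}
print(ENGINES["company-wiki"].search("quantum computing"))
```

Registering each instance under its name lets the search layer treat built-in and custom engines uniformly.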
Additional capabilities include:
- Dynamic JS Rendering: Handles JavaScript-heavy websites
- Content Extraction: Identifies and extracts main content from web pages
- Parallel Processing: Concurrent execution of searches and scraping
- Caching: Efficient caching of search results and scraped content
- Rate Limiting: Respectful access to web resources
- Robots.txt Compliance: Ethical web scraping practices (see the sketch below)
- Flexible Output Formats: Markdown, JSON, plain text
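The rate limiting and robots.txt compliance mentioned above can be built on the standard library alone; here is a minimal sketch, with an illustrative user agent string and per-domain interval:

```python
import time
import urllib.robotparser
from urllib.parse import urlparse

USER_AGENT = "ShanduResearchBot"  # illustrative, not Shandu's actual UA string
_last_request: dict[str, float] = {}

def polite_fetch_allowed(url: str, min_interval: float = 2.0) -> bool:
    """Check robots.txt, then enforce a per-domain minimum request interval."""
    domain = urlparse(url).netloc
    parser = urllib.robotparser.RobotFileParser()
    parser.set_url(f"https://{domain}/robots.txt")
    parser.read()  # fetches and parses the site's robots.txt
    if not parser.can_fetch(USER_AGENT, url):
        return False  # robots.txt disallows this path
    # Rate limiting: wait until the per-domain interval has elapsed.
    elapsed = time.time() - _last_request.get(domain, 0.0)
    if elapsed < min_interval:
        time.sleep(min_interval - elapsed)
    _last_request[domain] = time.time()
    return True
```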
This project is licensed under the MIT License - see the LICENSE file for details.