GitHub - IAAR-Shanghai/SurveyX: Academic Survey Paper Generation.

SurveyX: Academic Survey Automation via Large Language Models

✨Welcome to SurveyX! If you want to experience the full features, please log in to our website. This open-source code only provides offline processing capabilities.✨

If you find our work helpful, don't forget to give us a star! ⭐️
👉 Visit SurveyX 👈

[English | 中文]

🤔What is SurveyX?

SurveyX is an advanced academic survey automation system that leverages the power of Large Language Models (LLMs) to generate high-quality, domain-specific academic papers and surveys. By simply providing a paper title and keywords for literature retrieval, users can request comprehensive academic papers or surveys tailored to specific topics.

🆚 Full Version vs. Offline Open Source Version

The open-source code in this repository only provides offline processing capabilities. If you want to experience the full features, please log in to our website.

Missing features in the open-source version:

Real-time online search: You can only generate surveys based on your own uploaded .md format references. The open-source version lacks access to our paper database, web crawler system, keyword expansion algorithms, and dual-layer semantic filtering for literature acquisition.
Multimodal document parsing: The generated survey will not include image understanding or illustrations from the references.

🛠️ How to Use the Offline Open Source Version (This repo)

1. Prerequisites

Python 3.10+ (Anaconda recommended)
All Python dependencies in requirements.txt
LaTeX environment (for PDF compilation):
You need to convert all your reference documents to Markdown (.md) format and put them together in a single folder before running the pipeline.

sudo apt update && sudo apt install texlive-full

2. Installation

Clone the repository:

git clone https://github.com/IAAR-Shanghai/SurveyX.git
cd SurveyX

Install Python dependencies:

pip install -r requirements.txt

3. LLM Configuration

Edit src/configs/config.py to provide your LLM API URL, token, and model information before running the pipeline.

Example:

REMOTE_URL = "https://api.openai.com/v1/chat/completions"
TOKEN = "sk-xxxx..."
DEFAULT_EMBED_ONLINE_MODEL = "BAAI/bge-base-en-v1.5"
EMBED_REMOTE_URL = "https://api.siliconflow.cn/v1/embeddings"
EMBED_TOKEN = "your embed token here"

4. Workflow

Each run creates a unique result folder under outputs/, named by the task id outputs/<task_id> (e.g., outputs/2025-06-18-0935_keyword/).

Run the full pipeline:

python tasks/offline_run.py --title "Your Survey Title" --key_words "keyword1, keyword2, ..." --ref_path "path/to/your/reference/dir"

Or run step by step:

export task_id="your_task_id"
python tasks/workflow/03_gen_outlines.py --task_id $task_id
python tasks/workflow/04_gen_content.py --task_id $task_id
python tasks/workflow/05_post_refine.py --task_id $task_id
python tasks/workflow/06_gen_latex.py --task_id $task_id

Note: Your local reference documents must be in Markdown (.md) format and placed in a single directory.

5. Output

All results are saved under outputs/<task_id>/
- survey.pdf: Final compiled survey
- outlines.json: Generated outline
- latex/: LaTeX sources
- tmp/: Intermediate files

Example Papers

Title	Keywords
A Survey of NoSQL Database Systems for Flexible and Scalable Data Management	NoSQL, Database Systems, Flexibility, Scalability, Data Management
Vector Databases and Their Role in Modern Data Management and Retrieval A Survey	Vector Databases, Data Management, Data Retrieval, Modern Applications
Graph Databases A Survey on Models, Data Modeling, and Applications	Graph Databases, Data Modeling
A Survey on Large Language Model Integration with Databases for Enhanced Data Management and Survey Analysis	Large Language Models, Database Integration, Data Management, Survey Analysis, Enhanced Processing
A Survey of Temporal Databases Real-Time Databases and Data Management Systems	Temporal Databases, Real-Time Databases, Data Management
From BERT to GPT-4: A Survey of Architectural Innovations in Pre-trained Language Models	Transformer, BERT, GPT-3, self-attention, masked language modeling, cross-lingual transfer, model scaling
Unsupervised Cross-Lingual Word Embedding Alignment: Techniques and Applications	low-resource NLP, few-shot learning, data augmentation, unsupervised alignment, synthetic corpora, NLLB, zero-shot transfer
Vision-Language Pre-training: Architectures, Benchmarks, and Emerging Trends	multimodal learning, CLIP, Whisper, cross-modal retrieval, modality fusion, video-language models, contrastive learning
Efficient NLP at Scale: A Review of Model Compression Techniques	model compression, knowledge distillation, pruning, quantization, TinyBERT, edge computing, latency-accuracy tradeoff
Domain-Specific NLP: Adapting Models for Healthcare, Law, and Finance	domain adaptation, BioBERT, legal NLP, clinical text analysis, privacy-preserving NLP, terminology extraction, few-shot domain transfer
Attention Heads of Large Language Models: A Survey	attention head, attention mechanism, large language model, LLM,transformer architecture, neural networks, natural language processing
Controllable Text Generation for Large Language Models: A Survey	controlled text generation, text generation, large language model, LLM,natural language processing
A survey on evaluation of large language models	evaluation of large language models,large language models assessment, natural language processing, AI model evaluation
Large language models for generative information extraction: a survey	information extraction, large language models, LLM,natural language processing, generative AI, text mining
Internal consistency and self feedback of LLM	Internal consistency, self feedback, large language model, LLM,natural language processing, model evaluation, AI reliability
Review of Multi Agent Offline Reinforcement Learning	multi agent, offline policy, reinforcement learning,decentralized learning, cooperative agents, policy optimization
Reasoning of large language model: A survey	reasoning of large language models, large language models, LLM,natural language processing, AI reasoning, transformer models
Hierarchy Theorems in Computational Complexity: From Time-Space Tradeoffs to Oracle Separations	P vs NP, NP-completeness, polynomial hierarchy, space complexity, oracle separation, Cook-Levin theorem
Classical Simulation of Quantum Circuits: Complexity Barriers and Implications	BQP, quantum supremacy, Shor's algorithm, post-quantum cryptography, QMA, hidden subgroup problem
Kernelization: Theory, Techniques, and Limits	fixed-parameter tractable (FPT), kernelization, treewidth, W-hierarchy, ETH (Exponential Time Hypothesis), parameterized reduction
Optimal Inapproximability Thresholds for Combinatorial Optimization Problems	PCP theorem, approximation ratio, Unique Games Conjecture, APX-hardness, gap-preserving reduction, LP relaxation
Hardness in P: When Polynomial Time is Not Enough	SETH (Strong Exponential Time Hypothesis), 3SUM conjecture, all-pairs shortest paths (APSP), orthogonal vectors problem, fine-grained reduction, dynamic lower bounds
Consistency Models in Distributed Databases: From ACID to NewSQL	CAP theorem, ACID vs BASE, Paxos/Raft, Spanner, NewSQL, sharding, linearizability
Cloud-Native Databases: Architectures, Challenges, and Future Directions	cloud databases, AWS Aurora, Snowflake, storage-compute separation, auto-scaling, pay-per-query, multi-tenancy
Graph Database Systems: Storage Engines and Query Optimization Techniques	graph traversal, Neo4j, SPARQL, property graph, subgraph matching, RDF triplestore, Gremlin
Real-Time Aggregation in TSDBs: Techniques for High-Cardinality Data	time-series data, InfluxDB, Prometheus, downsampling, time windowing, high-cardinality indexing, stream processing
Self-Driving Databases: A Survey of AI-Powered Autonomous Management	autonomous databases, learned indexes, query optimization, Oracle AutoML, workload forecasting, anomaly detection
Multi-Model Databases: Integrating Relational, Document, and Graph Paradigms	multi-model database, MongoDB, ArangoDB, JSONB, unified query language, schema flexibility, polystore
Vector Databases for AI: Efficient Similarity Search and Retrieval-Augmented Generation	vector database, FAISS, Milvus, ANN search, embedding indexing, RAG (Retrieval-Augmented Generation), HNSW
Software-Defined Networking: Evolution, Challenges, and Future Scalability	OpenFlow, control plane/data plane separation, NFV orchestration, network slicing, P4 language, OpenDaylight, scalability bottlenecks
Beyond 5G: Architectural Innovations for Terahertz Communication and Network Slicing	network slicing, MEC (Multi-access Edge Computing), beamforming, mmWave, URLLC (Ultra-Reliable Low-Latency Communication), O-RAN, energy efficiency
IoT Network Protocols: A Comparative Study of LoRaWAN, NB-IoT, and Thread	LPWAN, LoRa, ZigBee 3.0, 6LoWPAN, TDMA scheduling, RPL routing, device density management
Edge Caching in Content Delivery Networks: Algorithms and Economic Incentives	CDN, Akamai, cache replacement policies, DASH (Dynamic Adaptive Streaming), QoE optimization, edge server placement, bandwidth cost reduction
A survey on flow batteries	battery electrolyte formulation
Research on battery electrolyte formulation	flow batteries

📃Citing SurveyX

Please cite us if you find this project helpful for your project/paper:

@misc{liang2025surveyxacademicsurveyautomation,
      title={SurveyX: Academic Survey Automation via Large Language Models}, 
      author={Xun Liang and Jiawei Yang and Yezhaohui Wang and Chen Tang and Zifan Zheng and Shichao Song and Zehao Lin and Yebin Yang and Simin Niu and Hanyu Wang and Bo Tang and Feiyu Xiong and Keming Mao and Zhiyu li},
      year={2025},
      eprint={2502.14776},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.14776}, 
}

Open Source Version Notice

This open source version of Surveyx is a simplified edition. It relies entirely on user-provided local reference documents and does not include advanced features such as:

Keyword expansion and filtering algorithms
Multimodal image parsing or figure extraction
Online reference search or automatic data fetching

These advanced modules are only available in the full version of Surveyx, which is hosted by MemTensor (Shanghai) Technology Co., Ltd. If you would like to experience the complete features, please visit our official website: surveyx.cn

For questions or issues, please open an issue on the repository.

⚠️ Disclaimer

SurveyX uses advanced language models to assist with the generation of academic papers. However, it is important to note that the generated content is a tool for research assistance. Users should verify the accuracy of the generated papers, as SurveyX cannot guarantee full compliance with academic standards.

Name		Name	Last commit message	Last commit date
Latest commit History 182 Commits
assets		assets
eval		eval
examples		examples
resources		resources
scripts		scripts
src		src
tasks		tasks
.gitignore		.gitignore
README.md		README.md
README_zh.md		README_zh.md
requirements.txt		requirements.txt
run.sh		run.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

SurveyX: Academic Survey Automation via Large Language Models

🤔What is SurveyX?

🆚 Full Version vs. Offline Open Source Version

🛠️ How to Use the Offline Open Source Version (This repo)

1. Prerequisites

2. Installation

3. LLM Configuration

4. Workflow

5. Output

Example Papers

📃Citing SurveyX

Open Source Version Notice

⚠️ Disclaimer

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

Languages

IAAR-Shanghai/SurveyX

Folders and files

Latest commit

History

Repository files navigation

SurveyX: Academic Survey Automation via Large Language Models

🤔What is SurveyX?

🆚 Full Version vs. Offline Open Source Version

🛠️ How to Use the Offline Open Source Version (This repo)

1. Prerequisites

2. Installation

3. LLM Configuration

4. Workflow

5. Output

Example Papers

📃Citing SurveyX

Open Source Version Notice

⚠️ Disclaimer

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages