SciScholar: A Large-Scale Knowledge Graph for Automated Scientific Research

🌐 English · 简体中文

A pip-installable client and CLI for literature-grounded scientific research workflows on top of the hosted SciScholar API.

📄 arXiv · 🔑 Get API Token · 🩺 API Health

✨ Overview

SciScholar is a research map you can use from the command line. Give it a topic, an idea, an author, or a paper trail, and it helps you look up literature, gather graph-backed evidence, and turn the result into readable reports and reusable JSON artifacts.

Behind that simple workflow is a large scientific knowledge graph. SciScholar connects papers, authors, institutions, venues, keywords, citations, and a four-level research taxonomy from domains down to topics. That means a search is not limited to matching words: it can follow how research areas, people, concepts, and papers relate to one another.

This repository packages that capability as a lightweight SciScholar client. New users can install it with pip, register an API token, and start running literature-grounded research tasks without setting up Neo4j, maintaining graph data, or touching backend infrastructure.

SciScholar spans a broad research landscape, from medicine and social sciences to engineering, computer science, materials science, mathematics, and more.

The graph links papers with authors, institutions, sources, keywords, citations, related work, and the domain-field-subfield-topic hierarchy.

With the client, SciScholar becomes a practical research assistant for:

graph-aware paper search: combine keywords, semantic matching, title anchors, references, and graph propagation instead of stopping at plain keyword matching;
research workflow automation: run literature review, idea grounding, idea evaluation, idea generation, trend analysis, related-author retrieval, and researcher profiling;
agent-friendly outputs: keep reproducible machine-readable artifacts such as request.json and response.json, plus user-facing summary.txt and report.md;
editable CLI skills: inspect, copy, modify, and rerun common downstream workflows as reusable JSON skills;
portable Agent Skill pack: use agent-skill/ to migrate the base search-papers capability into end-to-end downstream tasks for tools such as Codex, Claude Code, and other coding agents.

🚀 Quick Start

1. Install

Install directly from GitHub:

pip install "git+https://github.com/zjunlp/SciScholar.git#subdirectory=scinet"

For isolated CLI usage:

pipx install "git+https://github.com/zjunlp/SciScholar.git#subdirectory=scinet"

After installation:

scischolar -h

2. Register an API Token

Open:

http://scinet.openkg.cn/register

Complete email verification and copy your personal token.

Quick link: 🔑 API Token.

3. Configure

At minimum, configure the hosted SciScholar API endpoint and your personal token.

Linux / macOS:

export SCISCHOLAR_API_BASE_URL="http://scinet.openkg.cn"
export SCISCHOLAR_API_KEY="your-personal-scischolar-token"
export SCISCHOLAR_TIMEOUT=900
export SCISCHOLAR_RUNS_DIR="./runs"

Windows CMD:

set SCISCHOLAR_API_BASE_URL=http://scinet.openkg.cn
set SCISCHOLAR_API_KEY=your-personal-scischolar-token
set SCISCHOLAR_TIMEOUT=900
set SCISCHOLAR_RUNS_DIR=.\runs

Compatibility variables:

KG2API_BASE_URL=http://scinet.openkg.cn
KG2API_API_KEY=your-personal-scischolar-token

For new setups, prefer SCISCHOLAR_*.

📕 Optional: use your own LLM for keyword extraction

export LLM_PROVIDER="chat_completions"
export LLM_API_KEY="your-provider-api-key"
export LLM_BASE_URL="https://your-provider-or-gateway.example/v1"
export LLM_MODEL="your-model-name"
# Optional when your provider uses a custom endpoint or auth header:
# export LLM_CHAT_COMPLETIONS_URL="https://your-provider-or-gateway.example/v1/chat/completions"
# export LLM_AUTH_HEADER="x-api-key: your-provider-api-key"
export SCISCHOLAR_LLM_TIMEOUT=30
export SCISCHOLAR_LLM_TEMPERATURE=0
export SCISCHOLAR_LLM_MAX_TOKENS=512

This step is optional. Configure it only when you want SciScholar to use your LLM API to turn a free-form query into better search keywords.

Keep LLM_PROVIDER=chat_completions, then replace LLM_API_KEY, LLM_BASE_URL, and LLM_MODEL with your provider values. If your provider gives a full chat-completions endpoint, set LLM_CHAT_COMPLETIONS_URL; if it requires a custom auth header, set LLM_AUTH_HEADER.

Leave the LLM values empty if you do not need this. SciScholar will use built-in keyword extraction, and normal search, review, idea, trend, and researcher workflows still run.

User-editable template: .env.example. Set these variables only if you want LLM-assisted keyword extraction.

🖊 Optional: OpenAlex metadata support

export OA_API_KEY=""
export OPENALEX_MAILTO=""

OpenAlex is useful when you want extra metadata or PDF-related support. It is not required for the main CLI examples in this README. If you leave these variables empty, normal SciScholar retrieval still works.

User-editable template: .env.example. Set these only if you want OpenAlex-assisted metadata support.

🖌 Optional: GROBID for local PDF workflows

GROBID is only needed when you process local PDF files. It reads scientific PDFs and extracts titles, authors, abstracts, and references. If you are only running the text-based CLI commands above, you can skip this section.

Start GROBID locally:

docker pull lfoppiano/grobid:latest
docker run -d --rm --name grobid -p 8070:8070 lfoppiano/grobid:latest
curl http://127.0.0.1:8070/api/isalive

Then set:

export GROBID_BASE_URL="http://127.0.0.1:8070"

Windows CMD:

set GROBID_BASE_URL=http://127.0.0.1:8070

User-editable template: .env.example. Leave GROBID_BASE_URL empty unless you process local PDFs.

Runtime variables:

Variable	Required For	Notes
`SCISCHOLAR_API_BASE_URL`	all hosted SciScholar tasks	Hosted SciScholar API base URL.
`SCISCHOLAR_API_KEY`	all hosted SciScholar tasks	Sent as `X-API-Key` and `Authorization: Bearer`.
`LLM_PROVIDER`	optional frontend enhancement	Keep as `chat_completions`.
`LLM_API_KEY`	optional frontend enhancement	Your provider key; leave empty for local or no-auth services.
`LLM_BASE_URL`	optional frontend enhancement	Provider base URL, usually ending in `/v1`.
`LLM_CHAT_COMPLETIONS_URL`	optional frontend enhancement	Use only when your provider gives a full endpoint.
`LLM_MODEL`	optional frontend enhancement	Model name from your provider.
`LLM_AUTH_HEADER`	optional frontend enhancement	Use only for custom auth, for example `x-api-key: your-provider-api-key`.
`LLM_HTTP_HEADERS`	optional frontend enhancement	Optional extra headers as JSON.
`GROBID_BASE_URL`	PDF tasks	Needed for `--pdf-path` workflows.
`OA_API_KEY`	optional	OpenAlex metadata/PDF support.
`OPENALEX_MAILTO`	optional	OpenAlex contact email.

4. Test

scischolar health
scischolar config

5. Run a Paper Search

scischolar search-papers \
  --query "open world agent" \
  --keyword "high:open world agent" \
  --top-k 10

🔑 API Token

SciScholar uses personal API tokens for public access.

Browser Registration

Visit:

http://scinet.openkg.cn/register

Steps:

enter your name, email, organization, and use case;
click Send code;
check your inbox for the verification code;
enter the code and create a token;
copy the returned scischolar_xxx token.

The token is shown only once.

Check Token Status

curl -H "Authorization: Bearer $SCISCHOLAR_API_KEY" \
  http://scinet.openkg.cn/v1/auth/token/status

Check Usage

curl -H "Authorization: Bearer $SCISCHOLAR_API_KEY" \
  "http://scinet.openkg.cn/v1/auth/usage?days=7"

🧩 Supported Tasks

Command	Scenario	Main Output
`scischolar search-papers`	Paper search	Related papers and Markdown report
`scischolar related-authors`	Related-author discovery	Candidate authors and scores
`scischolar author-papers`	Author paper lookup	Papers by a specified author
`scischolar support-papers`	Support-paper retrieval	Evidence papers for candidate authors
`scischolar paper-search`	Lightweight low-level paper search	Fast paper candidates
`scischolar literature-review`	Literature review	Core paper pool, timeline, writing hints
`scischolar idea-grounding`	Idea grounding	Similar works and differentiation evidence
`scischolar idea-evaluate`	Idea evaluation	Evidence for novelty, feasibility, and soundness
`scischolar idea-generate`	Idea generation	Topic combinations and idea seeds
`scischolar trend-report`	Trend analysis	Evolution evidence and representative works
`scischolar researcher-review`	Researcher background review	Research trajectory and representative works
`scischolar skill`	Editable skill registry	Reusable workflow presets

🛠️ CLI-First Workflow

SciScholar is CLI-first: you can start with one command, inspect the saved artifacts, and then move into larger research workflows. If you are new, run help once, try a basic retrieval, then choose one of the downstream workflows below.

Documentation: 📚 SciScholar Documentation. Use it to check API setup, CLI commands, parameter meanings, and runnable examples.

Help

scischolar -h
scischolar search-papers -h
scischolar literature-review -h
scischolar skill -h

Input Styles

SciScholar supports two input styles. For formal runs, prefer expert parameters because every field is explicit and easier to reproduce. Natural-language input is useful for quick trials or exploratory use.

Recommended: expert parameters

scischolar --timeout 900 search-papers \
  --retrieval-mode hybrid \
  --query "open world agent" \
  --domain "artificial intelligence" \
  --time-range 2020-2024 \
  --keyword "high:open world agent" \
  --keyword "middle:embodied agent" \
  --title "middle:Voyager: An Open-Ended Embodied Agent with Large Language Models" \
  --reference "low:JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models" \
  --top-k 5 \
  --top-keywords 0 \
  --max-titles 0 \
  --max-refs 0 \
  --bias-keyword high \
  --bias-related high \
  --bias-exploration low \
  --ranking-profile precision \
  --report-max-items 5

Compatible: natural-language input

Use --text when you want SciScholar to parse the request from a short instruction. You can still add structured hints such as keyword[high]: ... in the text.

scischolar --timeout 900 search-papers \
  --retrieval-mode hybrid \
  --text "Find papers related to open world agent in artificial intelligence since 2020. Return 3 papers.

keyword[high]: open world agent" \
  --top-k 3 \
  --top-keywords 1 \
  --max-titles 0 \
  --max-refs 0

Basic Retrieval

Use this when you want a quick, evidence-backed paper list for one topic.

scischolar search-papers \
  --query "open world agent" \
  --domain "artificial intelligence" \
  --time-range 2020-2024 \
  --keyword "high:open world agent" \
  --top-k 5 \
  --top-keywords 0 \
  --max-titles 0 \
  --max-refs 0

Downstream Workflows

Each workflow prints a concise terminal summary and saves full artifacts under runs/<run_id>/.

Literature Review

Build an initial reading list and get evidence for writing a literature review.

scischolar literature-review \
  --query "retrieval augmented generation" \
  --domain "artificial intelligence" \
  --time-range 2020-2025 \
  --keyword "high:retrieval augmented generation" \
  --top-k 10

Idea Evaluation

Check whether a proposed research idea is novel, feasible, and well supported by existing work.

scischolar idea-evaluate \
  --idea "LLM-based multi-perspective evaluation for scientific research ideas" \
  --domain "artificial intelligence" \
  --time-range 2020-2025 \
  --keyword "high:idea evaluation" \
  --keyword "middle:LLM as a judge" \
  --top-k 10

Idea Generation

Explore promising topic combinations and generate candidate research directions.

scischolar idea-generate \
  --query "knowledge editing for large language models" \
  --domain "artificial intelligence" \
  --time-range 2020-2025 \
  --keyword "high:knowledge editing" \
  --keyword "middle:large language models" \
  --keyword "low:continual learning" \
  --top-k 10

Trend Report

Trace how a topic has developed and identify representative works along the way.

scischolar trend-report \
  --query "retrieval augmented generation" \
  --domain "artificial intelligence" \
  --time-range 2020-2025 \
  --keyword "high:retrieval augmented generation" \
  --keyword "middle:knowledge graph" \
  --top-k 10

Researcher Review

Summarize a researcher's publication trajectory and representative papers.

scischolar researcher-review \
  --author "Yoshua Bengio" \
  --limit 10 \
  --no-abstract

Retrieval Modes

Mode	Meaning	Best For
`keyword`	Keyword-driven KG retrieval	Clear terminology
`semantic`	Semantic retrieval	Broad semantic matching
`title`	Title-anchor retrieval	Known paper titles
`hybrid`	Keyword + semantic + title + graph walk	Default and recommended

If --retrieval-mode is omitted, SciScholar uses hybrid.

Expert Anchors

Use anchors when you already know a strong keyword, title, or reference and want the graph search to start from it.

--keyword "high:open world agent"
--title "middle:Voyager: An Open-Ended Embodied Agent with Large Language Models"
--reference "low:JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models"

Graph Bias Parameters

Parameter	Meaning
`--bias-keyword`	Keyword association strength
`--bias-non-seed-keyword`	Non-seed keyword expansion
`--bias-citation`	Citation edge strength
`--bias-related`	Paper relatedness strength
`--bias-authorship`	Author-paper relation strength
`--bias-coauthorship`	Coauthor network strength
`--bias-cooccurrence`	Keyword co-occurrence strength
`--bias-exploration`	Graph exploration level
`--ranking-profile`	Ranking preference: `precision`, `balanced`, `discovery`, `impact`

Recommended safe defaults:

--top-k 10
--top-keywords 0
--max-titles 0
--max-refs 0
--bias-exploration low

🧰 Editable Skills

SciScholar skills are JSON presets for downstream research workflows. They make complex workflows easier to inspect, reuse, and customize.

scischolar skill list
scischolar skill show literature-review
scischolar skill run literature-review --query "open world agent" --keyword "high:open world agent"
scischolar skill run --dry-run literature-review --query "open world agent" --keyword "high:open world agent"

Create a custom skill:

scischolar skill init my-review --from literature-review

This creates:

./skills/my-review.json

Edit it, then run:

scischolar skill run my-review --query "your topic"

User-defined skills are loaded from:

./skills/*.json
~/.scischolar/skills/*.json
directories specified by SCISCHOLAR_SKILLS_DIR

User-defined skills can override built-in skills with the same name.

🖊Agent Skill

SciScholar also ships a portable Agent Skill pack under agent-skill/. These are not runtime outputs or simple command aliases. They are downstream task playbooks that teach tools such as Codex, Claude Code, and other coding agents how to start from SciScholar's base search-papers retrieval capability, read saved artifacts, and complete a research goal.

Included skills:

Skill	Retrieval base	Downstream use case
`scischolar-quick-paper-search`	`search-papers`	Small evidence seed and downstream routing
`scischolar-literature-review`	`search-papers`	Reading lists and related-work reports
`scischolar-idea-grounding`	`search-papers`	Prior-art grounding for research ideas
`scischolar-idea-evaluate`	`search-papers`	Novelty, feasibility, and soundness checks
`scischolar-idea-generate`	`search-papers`	Literature-grounded idea seeds
`scischolar-trend-report`	`search-papers`	Timeline and trend analysis
`scischolar-researcher-review`	`search-papers` plus author seed lookup	Researcher profiles and representative works

To use one locally, copy its directory into the skill directory supported by your agent tool, then restart or refresh that tool. For Codex, that is usually ~/.codex/skills or %USERPROFILE%\.codex\skills. The CLI commands remain the retrieval/execution layer; the Agent Skill pack is the downstream reasoning layer on top.

🐍 Python SDK

SciScholar also provides a lightweight Python client.

from scischolar import SciScholarClient

client = SciScholarClient()

print(client.health())

result = client.search_papers(
    query="open world agent",
    keywords=[{"text": "open world agent", "score": 10}],
    top_k=3,
)

print(result)

You can also pass credentials directly:

from scischolar import SciScholarClient

client = SciScholarClient(
    base_url="http://scinet.openkg.cn",
    api_key="your-personal-scischolar-token",
)

print(client.token_status())

📦 Outputs and Artifacts

Terminal output is concise and table-based. Full outputs are saved under:

runs/<run_id>/

Common artifacts:

File	Description
`plan.json`	Structured search plan
`request.json`	Full request sent to SciScholar API
`response.json`	Raw backend response
`summary.txt`	Short summary
`report.md`	User-facing Markdown report
`metadata.json`	Runtime metadata

📂 Repository Layout

The tree below highlights the main user-facing areas of the repository. Generated outputs and local cache folders are omitted.

SciScholar/
  README.md / README_zh.md       # project documentation
  .env.example                   # root runtime configuration template
  requirements.txt
  run_scischolar.py                  # lightweight local runner
  agent-skill/                   # portable Agent Skill pack
  docs/api/                      # unified static API and CLI documentation site
  imgs/                          # README figures
  scinet/                        # pip-installable SciScholar client package
    pyproject.toml
    src/scinet/                  # packaged CLI, client, config, and skills
    core/ search/ tasks/         # retrieval planning and workflow logic
    evidence/ llm/ renderers/    # PDF evidence, optional LLM, report rendering
    examples/ tests/
  references/search/             # reference KG search implementation
  runs/                          # generated CLI outputs

🧯 Troubleshooting

`scischolar health` works but `search-papers` returns 401

Your token is missing or invalid.

echo $SCISCHOLAR_API_KEY
export SCISCHOLAR_API_KEY="your-personal-scischolar-token"

Windows CMD:

set SCISCHOLAR_API_KEY=your-personal-scischolar-token

No email verification code

Check the email address, spam folder, and resend interval.

Retrieval is slow or times out

Use lightweight settings:

--top-k 3
--top-keywords 0
--max-titles 0
--max-refs 0
--bias-exploration low

`scischolar` command is not found on Windows

Use the virtual environment executable directly:

.venv\Scripts\scischolar.exe -h

or reinstall:

.venv\Scripts\python.exe -m pip install -e .

📝 TODO

CLI Tools. Add more user-facing CLI capabilities so downstream users and AI agents can invoke retrieval workflows without touching database internals.
Portable Agent Skill pack. Package reusable agent skills for common scientific discovery workflows and expose best practices as easier-to-load components.
More Knowledge. Integrate more knowledge forms beyond paper-centric entities, such as datasets, code, standards, theorems, and experimental experience.
Benchmark and Evaluation. Build dedicated benchmarks and evaluation protocols for downstream scientific research tasks supported by SciScholar.
Dynamic UpdateImprove dynamic knowledge updates toward a more systematic and frequent refresh mechanism.

✍️ Citation

If you find SciScholar helpful, please cite:

📄 License

This project is licensed under the MIT License. See LICENSE for details.

Name		Name	Last commit message	Last commit date
Latest commit History 36 Commits
agent-skill		agent-skill
docs/api		docs/api
imgs		imgs
references		references
scinet		scinet
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
README_zh.md		README_zh.md
requirements.txt		requirements.txt
run_scinet.py		run_scinet.py
run_scischolar.py		run_scischolar.py

Folders and files

Latest commit

History

Repository files navigation

SciScholar: A Large-Scale Knowledge Graph for Automated Scientific Research

✨ Overview

📑 Table of Contents

🚀 Quick Start

1. Install

2. Register an API Token

3. Configure

4. Test

5. Run a Paper Search

🔑 API Token

Browser Registration

Check Token Status

Check Usage

🧩 Supported Tasks

🛠️ CLI-First Workflow

Help

Input Styles

Recommended: expert parameters

Compatible: natural-language input

Basic Retrieval

Downstream Workflows

Literature Review

Idea Evaluation

Idea Generation

Trend Report

Researcher Review

Retrieval Modes

Expert Anchors

Graph Bias Parameters

🧰 Editable Skills

🖊Agent Skill

🐍 Python SDK

📦 Outputs and Artifacts

📂 Repository Layout

🧯 Troubleshooting

scischolar health works but search-papers returns 401

No email verification code

Retrieval is slow or times out

scischolar command is not found on Windows

📝 TODO

✍️ Citation

📄 License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`scischolar health` works but `search-papers` returns 401

`scischolar` command is not found on Windows

Packages