SciNet: A Large-Scale Knowledge Graph for Automated Scientific Research

Open-source client for running literature-grounded scientific research tasks on top of the SciNet API.

Our KG backend is currently undergoing intensive deployment and testing and will be released within one week. Thank you for your patience!

✨ Overview

SciNet is a large-scale, multidisciplinary, heterogeneous knowledge graph of academic resources, designed as a panoramic network of scientific evolution. It integrates over 43M papers from 26 disciplines, totaling 157M entities and 3B triplets, and provides a structured topological cognitive substrate that dismantles disciplinary barriers and gives AI agents a global perspective.

Discipline Distribution in SciNet

Schema of SciNet

This repository provides a runnable client for several scientific research workflows, including idea evaluation, topic review, author discovery, author profiling, and idea generation.

The local client is responsible for:

building a structured request
calling a hosted SciNet API
running client-side post-processing such as reranking, PDF parsing, grounding, and Markdown report generation

Users do not need to connect to Neo4j or other database components directly.

🔍 Scope

This repository is intended to be a lightweight, runnable demo client.

run_scinet.py is the main entrypoint.
scinet/ contains the runnable workflow code.
references/search/ is a reference implementation for the standalone search stack and is not part of the main demo runtime.

🧩 Supported Tasks

| Task Type | Required Input | Main Output |
| --- | --- | --- |
| `grounded_review` | `--idea-text` or `--pdf-path` | grounded evidence, paragraph matches, and idea-level analysis |
| `topic_trend_review` | `--topic-text` | topic evolution summary and representative papers |
| `related_authors` | `--idea-text` or `--pdf-path` | related authors and supporting papers |
| `author_profile` | `--author-name` | research trajectory and representative works |
| `idea_generation` | `--topic-text` | generated ideas grounded in retrieved literature |
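The table implies a simple input contract: each task needs at least one of its listed input flags. As a minimal sketch of that contract (the `REQUIRED_INPUTS` mapping and `check_task_inputs` helper are hypothetical and not part of the `scinet` package), a wrapper script could validate arguments before invoking `run_scinet.py`:

```python
from typing import Mapping, Optional

# Hypothetical mapping derived from the "Supported Tasks" table:
# each task needs at least one of the listed input flags.
REQUIRED_INPUTS: dict[str, tuple[str, ...]] = {
    "grounded_review": ("--idea-text", "--pdf-path"),
    "topic_trend_review": ("--topic-text",),
    "related_authors": ("--idea-text", "--pdf-path"),
    "author_profile": ("--author-name",),
    "idea_generation": ("--topic-text",),
}


def check_task_inputs(task_type: str, inputs: Mapping[str, Optional[str]]) -> None:
    """Raise ValueError if `inputs` provides none of the accepted flags for `task_type`."""
    if task_type not in REQUIRED_INPUTS:
        raise ValueError(f"unknown task type: {task_type!r}")
    accepted = REQUIRED_INPUTS[task_type]
    # Missing keys, None, and blank strings all count as "not provided".
    if not any((inputs.get(flag) or "").strip() for flag in accepted):
        raise ValueError(f"{task_type} requires one of: {' or '.join(accepted)}")


# Passes: grounded_review accepts --pdf-path in place of --idea-text.
check_task_inputs("grounded_review", {"--pdf-path": "/absolute/path/to/paper.pdf"})
```

The real CLI may perform its own validation; this sketch only mirrors the table, so the mapping would need updating if the task list changes.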

🏗️ Workflow

Input -> Local Planning -> SciNet API Retrieval -> Local Post-processing -> JSON + Markdown Reports

Typical post-processing includes reranking, PDF extraction, evidence grounding, and response rendering.

📦 Installation

1. Create an environment

python3 -m venv .venv
source .venv/bin/activate

2. Install dependencies

pip install -U pip
pip install -r requirements.txt

⚙️ Configuration

1. Create the environment file

cp .env.example .env

2. Fill in the required variables

SCINET_API_BASE_URL=https://your-scinet-api.example.com
SCINET_API_KEY=replace-me
SCINET_API_TIMEOUT=120

OPENAI_API_KEY=replace-me
OPENAI_BASE_URL=https://your-openai-compatible-endpoint/v1
OPENAI_MODEL=your-model-name

GROBID_BASE_URL=http://127.0.0.1:8070
OA_API_KEY=
OPENALEX_MAILTO=

3. Check which variables are required

| Variable | Required For | Notes |
| --- | --- | --- |
| `SCINET_API_BASE_URL` | all tasks | hosted `SciNet API` base URL |
| `SCINET_API_KEY` | all tasks | sent in the `X-API-Key` request header |
| `OPENAI_API_KEY` | all tasks | used for planning and LLM summarization |
| `OPENAI_BASE_URL` | all tasks | OpenAI-compatible endpoint |
| `OPENAI_MODEL` | all tasks | chat model name |
| `GROBID_BASE_URL` | PDF tasks | needed for `--pdf-path` flows |
| `OA_API_KEY` | optional | OpenAlex fallback support |
| `OPENALEX_MAILTO` | optional | OpenAlex contact email |

The code still accepts legacy SCIMAP_* and KG2API_* variables for compatibility, but new setups should use SCINET_API_*.
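Before a first run, it can help to confirm that `.env` no longer contains empty or placeholder values. The sketch below is a hypothetical standard-library check, not the loader the client actually uses: `parse_env_text`, `load_env_file`, and `missing_variables` are illustrative names, the parser handles only simple `KEY=value` lines (no quoting, `export`, or multi-line values), and it checks only the `SCINET_API_*` names, not the legacy fallbacks.

```python
from pathlib import Path
from typing import Union

# Variables the table above marks as required for all tasks.
ALWAYS_REQUIRED = (
    "SCINET_API_BASE_URL",
    "SCINET_API_KEY",
    "OPENAI_API_KEY",
    "OPENAI_BASE_URL",
    "OPENAI_MODEL",
)
PLACEHOLDER = "replace-me"  # value used in the example .env above


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and '#' comments."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so URLs with query strings survive.
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def load_env_file(path: Union[str, Path] = ".env") -> dict[str, str]:
    """Read and parse an env file from disk."""
    return parse_env_text(Path(path).read_text(encoding="utf-8"))


def missing_variables(env: dict[str, str], uses_pdf: bool = False) -> list[str]:
    """Return required variables that are unset, empty, or still placeholders."""
    required = list(ALWAYS_REQUIRED)
    if uses_pdf:
        required.append("GROBID_BASE_URL")  # only needed for --pdf-path flows
    return [key for key in required if env.get(key, "") in ("", PLACEHOLDER)]
```

For example, `missing_variables(load_env_file(".env"), uses_pdf=True)` returns an empty list once every required value has been filled in.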

🛠️ GROBID

GROBID is a lightweight information extraction tool designed for technical and scientific publications. It can quickly extract metadata such as titles, authors, abstracts, and references from a paper's PDF file.

GROBID is needed for:

grounded_review
related_authors when using --pdf-path

Example startup with Docker:

docker pull lfoppiano/grobid:latest
docker run -d --rm --name grobid -p 8070:8070 lfoppiano/grobid:latest
curl http://127.0.0.1:8070/api/isalive
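If you script your setup, you may want to confirm GROBID is up before starting a PDF task. The following minimal standard-library sketch queries the same `/api/isalive` endpoint as the `curl` command above and treats a `true` response body as healthy. The helper names are hypothetical, and the injectable `fetch` argument exists only so the function can be exercised offline.

```python
import urllib.request
from typing import Callable, Optional


def _http_get_text(url: str, timeout: float = 5.0) -> str:
    """Fetch a URL and return its body decoded as UTF-8."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def grobid_is_alive(base_url: str, fetch: Optional[Callable[[str], str]] = None) -> bool:
    """Return True if GROBID's /api/isalive endpoint answers 'true'."""
    fetch = fetch or _http_get_text
    url = base_url.rstrip("/") + "/api/isalive"
    try:
        return fetch(url).strip().lower() == "true"
    except OSError:
        # URLError, connection refused, and timeouts are all OSError subclasses:
        # treat them as "not ready yet" rather than crashing.
        return False
```

For example, `grobid_is_alive(os.environ.get("GROBID_BASE_URL", "http://127.0.0.1:8070"))` returns `False` until the container has finished starting.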

📂 Repository Layout

.
├── run_scinet.py
├── scinet/
│   ├── cli.py
│   ├── core/
│   ├── llm/
│   ├── search/
│   ├── tasks/
│   ├── evidence/
│   └── renderers/
├── examples/
├── tests/
└── references/
    └── search/

Key directories:

scinet/core/: shared config, schemas, and API client code
scinet/tasks/: task dispatch and task-specific logic
scinet/evidence/: PDF manifest building and evidence grounding
scinet/renderers/: Markdown rendering
examples/: runnable request examples
references/search/: standalone search reference code

🚀 Quick Start

If you only want the shortest path to a working run:

1. Make sure the following services are ready

a hosted SciNet API
an OpenAI-compatible LLM endpoint
GROBID if you want to use --pdf-path

2. Run a task

python3 run_scinet.py \
  --task-type topic_trend_review \
  --topic-text "research idea evaluation with large language models" \
  --pretty

3. Check the output

Each run creates a directory under runs/ containing:

request.json
result.json
result.md
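To inspect results programmatically, you can load the three files from a run directory. The naming scheme of run directories is not documented here, so this hypothetical sketch simply picks the most recently modified directory under `runs/`; the structure of `result.json` is also undocumented, so it is returned as parsed JSON without further assumptions.

```python
import json
from pathlib import Path
from typing import Any


def latest_run_dir(runs_root: Path = Path("runs")) -> Path:
    """Return the most recently modified run directory under `runs_root`."""
    candidates = [p for p in runs_root.iterdir() if p.is_dir()]
    if not candidates:
        raise FileNotFoundError(f"no run directories under {runs_root}")
    return max(candidates, key=lambda p: p.stat().st_mtime)


def load_run(run_dir: Path) -> dict[str, Any]:
    """Load request.json, result.json, and result.md from one run directory."""
    return {
        "request": json.loads((run_dir / "request.json").read_text(encoding="utf-8")),
        "result": json.loads((run_dir / "result.json").read_text(encoding="utf-8")),
        "report_md": (run_dir / "result.md").read_text(encoding="utf-8"),
    }
```

For example, `load_run(latest_run_dir())["report_md"]` gives the Markdown report from the latest run.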

🧪 Run Tasks

`grounded_review`

python3 run_scinet.py \
  --task-type grounded_review \
  --idea-text "Use literature-grounded evidence to evaluate research ideas." \
  --pretty

With PDF input:

python3 run_scinet.py \
  --task-type grounded_review \
  --pdf-path /absolute/path/to/paper.pdf \
  --params-file examples/grounded_review_params.example.json \
  --pretty

`topic_trend_review`

python3 run_scinet.py \
  --task-type topic_trend_review \
  --topic-text "research idea evaluation with large language models" \
  --pretty

`related_authors`

python3 run_scinet.py \
  --task-type related_authors \
  --idea-text "knowledge-grounded evaluation of scientific research ideas" \
  --pretty

`author_profile`

python3 run_scinet.py \
  --task-type author_profile \
  --author-name "Geoffrey Hinton" \
  --pretty

`idea_generation`

python3 run_scinet.py \
  --task-type idea_generation \
  --topic-text "scientific idea generation with retrieval-augmented large language models" \
  --pretty
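To drive several of these tasks from a script (for example, in a batch job), you could wrap the commands above with `subprocess`. This is a hypothetical helper, not part of the repository: it reproduces only the text-input flags shown in this section and uses `sys.executable` in place of `python3` so the active virtual environment is reused.

```python
import subprocess
import sys
from typing import Optional, Sequence

# Text-input flag for each task, per the "Supported Tasks" table (PDF variants omitted).
TEXT_FLAG = {
    "grounded_review": "--idea-text",
    "topic_trend_review": "--topic-text",
    "related_authors": "--idea-text",
    "author_profile": "--author-name",
    "idea_generation": "--topic-text",
}


def build_command(task_type: str, text: str, extra: Sequence[str] = ()) -> list[str]:
    """Build the run_scinet.py argument list for one task, mirroring the commands above."""
    return [
        sys.executable, "run_scinet.py",
        "--task-type", task_type,
        TEXT_FLAG[task_type], text,
        "--pretty",
        *extra,
    ]


def run_task(task_type: str, text: str, extra: Sequence[str] = (), cwd: Optional[str] = None) -> int:
    """Run one task (from the repository root by default) and return its exit code."""
    return subprocess.run(build_command(task_type, text, extra), cwd=cwd, check=False).returncode
```

Passing each argument as a separate list element avoids shell quoting problems with multi-word inputs, and options such as `--params-file` can be appended through `extra`.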

📁 Request Files

You can also run tasks from JSON request files in examples/:

python3 run_scinet.py --request-file examples/grounded_review_request.json --pretty
python3 run_scinet.py --request-file examples/topic_trend_review_request.json --pretty
python3 run_scinet.py --request-file examples/related_authors_request.json --pretty
python3 run_scinet.py --request-file examples/author_profile_request.json --pretty
python3 run_scinet.py --request-file examples/idea_generation_request.json --pretty
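If you run all five examples in one go, it can save API calls to check up front that every file exists and parses as JSON, so a typo does not fail the batch halfway through. The request schema is not documented in this README, so this hypothetical sketch checks only that each file is well-formed JSON; the file names are the ones listed above.

```python
import json
import subprocess
import sys
from pathlib import Path

EXAMPLE_REQUESTS = [
    "grounded_review_request.json",
    "topic_trend_review_request.json",
    "related_authors_request.json",
    "author_profile_request.json",
    "idea_generation_request.json",
]


def validate_request_files(examples_dir: Path) -> list[Path]:
    """Return the request paths, raising early if any is missing or not valid JSON."""
    paths = [examples_dir / name for name in EXAMPLE_REQUESTS]
    for path in paths:
        # read_text raises FileNotFoundError; json.loads raises JSONDecodeError.
        json.loads(path.read_text(encoding="utf-8"))
    return paths


def run_all(examples_dir: Path = Path("examples")) -> dict[str, int]:
    """Run each request file in turn and collect exit codes by file name."""
    codes: dict[str, int] = {}
    for path in validate_request_files(examples_dir):
        cmd = [sys.executable, "run_scinet.py", "--request-file", str(path), "--pretty"]
        codes[path.name] = subprocess.run(cmd, check=False).returncode
    return codes
```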

For grounded_review, you can also override model-related parameters by passing one of these files via --params-file:

examples/grounded_review_params.example.json
examples/grounded_review_params.cpu.example.json

By default, grounded_review uses:

embedding model: BAAI/bge-large-en-v1.5
reranker model: BAAI/bge-reranker-large

The first run may download these models into the local Hugging Face cache.
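To avoid that download happening in the middle of a first grounded_review run, or to prepare a machine ahead of time, you could warm the Hugging Face cache in advance. This sketch assumes the `huggingface_hub` package is installed in your environment and that grounded_review loads the models by these repository IDs from the default cache; both are assumptions beyond what this README states. `prefetch_models` is a hypothetical helper.

```python
from typing import Iterable

# Default model IDs for grounded_review, as listed in this README.
DEFAULT_MODELS = (
    "BAAI/bge-large-en-v1.5",   # embedding model
    "BAAI/bge-reranker-large",  # reranker model
)


def prefetch_models(repo_ids: Iterable[str] = DEFAULT_MODELS) -> dict[str, str]:
    """Download each model snapshot into the local Hugging Face cache; return local paths."""
    # Imported lazily so this module can be imported without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return {repo_id: snapshot_download(repo_id=repo_id) for repo_id in repo_ids}
```

Calling `prefetch_models()` once downloads both snapshots; if you override the models with a params file, pass those repository IDs instead.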

✅ Testing

python3 -m unittest discover -s tests

📝 TODO

CLI Tools. Add more user-facing CLI capabilities so downstream users and AI agents can invoke retrieval workflows without touching database internals.
Skills. Package reusable agent skills for common scientific discovery workflows and expose best practices as easier-to-load components.
More Knowledge. Integrate more knowledge forms beyond paper-centric entities, such as datasets, code, standards, theorems, and experimental experience.
Benchmark and Evaluation. Build dedicated benchmarks and evaluation protocols for downstream scientific research tasks supported by SciNet.
Dynamic Update. Improve dynamic knowledge updates toward a more systematic and frequent refresh mechanism.

✍️ Citation

If you find our work helpful, please use the following citations.

License

This project is licensed under the MIT License - see the LICENSE file for details.
