Open-source client for running literature-grounded scientific research tasks on top of SciNet API.
Our KG backend is currently undergoing intensive deployment and testing and will be released within one week. Thank you for your patience!

- Overview
- Scope
- Supported Tasks
- Workflow
- Installation
- Configuration
- GROBID
- Repository Layout
- Quick Start
- Run Tasks
- Request Files
- Testing
- TODO
- Citation
SciNet is a large-scale, multi-disciplinary, heterogeneous academic resource knowledge graph designed as a panoramic scientific evolution network. By integrating over 43M papers from 26 disciplines, with a total of 157M entities and 3B triplets, SciNet provides a structured topological cognitive substrate that dismantles disciplinary barriers and furnishes AI agents with a global perspective.
This repository provides a runnable client for several scientific research workflows, including idea evaluation, topic review, author discovery, author profiling, and idea generation.
The local client is responsible for:
- building a structured request
- calling a hosted SciNet API
- running client-side post-processing such as reranking, PDF parsing, grounding, and Markdown report generation
Users do not need to connect to Neo4j or other database components directly.
This repository is intended to be a lightweight, runnable demo client.
`run_scinet.py` is the main entry point. `scinet/` contains the runnable workflow code. `references/search/` is a reference implementation of the standalone search stack and is not part of the main demo runtime.
| Task Type | Required Input | Main Output |
|---|---|---|
| `grounded_review` | `--idea-text` or `--pdf-path` | grounded evidence, paragraph matches, and idea-level analysis |
| `topic_trend_review` | `--topic-text` | topic evolution summary and representative papers |
| `related_authors` | `--idea-text` or `--pdf-path` | related authors and supporting papers |
| `author_profile` | `--author-name` | research trajectory and representative works |
| `idea_generation` | `--topic-text` | generated ideas grounded in retrieved literature |
Input -> Local Planning -> SciNet API Retrieval -> Local Post-processing -> JSON + Markdown Reports
Typical post-processing includes reranking, PDF extraction, evidence grounding, and response rendering.
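The pipeline above can be sketched in a few lines of Python. This is a hypothetical illustration of the client-side flow, not the real `scinet/` code: the function names, request fields, and result schema are assumptions made for demonstration.

```python
import json

# Illustrative sketch of Input -> Planning -> Retrieval -> Post-processing.
# Field names and the result schema below are invented for this example.

def build_request(task_type, **inputs):
    """Assemble a structured request like the one the client sends to the API."""
    return {"task_type": task_type, "inputs": inputs}

def render_markdown(result):
    """Render a retrieved-paper list into a small Markdown report."""
    lines = [f"# {result['title']}", ""]
    for paper in result.get("papers", []):
        lines.append(f"- {paper['title']} ({paper['year']})")
    return "\n".join(lines)

request = build_request("topic_trend_review",
                        topic_text="idea evaluation with LLMs")
print(json.dumps(request, indent=2))

# Pretend this came back from the hosted SciNet API:
mock_result = {"title": "Topic Trend Review",
               "papers": [{"title": "Example Paper", "year": 2024}]}
report = render_markdown(mock_result)
print(report)
```

In the real client, the retrieval step between these two halves is an HTTP call to the hosted SciNet API.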
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -r requirements.txt
```

```bash
cp .env.example .env
```

```
SCINET_API_BASE_URL=https://your-scinet-api.example.com
SCINET_API_KEY=replace-me
SCINET_API_TIMEOUT=120
OPENAI_API_KEY=replace-me
OPENAI_BASE_URL=https://your-openai-compatible-endpoint/v1
OPENAI_MODEL=your-model-name
GROBID_BASE_URL=http://127.0.0.1:8070
OA_API_KEY=
OPENALEX_MAILTO=
```

| Variable | Required For | Notes |
|---|---|---|
| `SCINET_API_BASE_URL` | all tasks | hosted SciNet API base URL |
| `SCINET_API_KEY` | all tasks | sent as `X-API-Key` |
| `OPENAI_API_KEY` | all tasks | used for planning and LLM summarization |
| `OPENAI_BASE_URL` | all tasks | OpenAI-compatible endpoint |
| `OPENAI_MODEL` | all tasks | chat model name |
| `GROBID_BASE_URL` | PDF tasks | needed for `--pdf-path` flows |
| `OA_API_KEY` | optional | OpenAlex fallback support |
| `OPENALEX_MAILTO` | optional | OpenAlex contact email |
The code still accepts legacy SCIMAP_* and KG2API_* variables for compatibility, but new setups should use SCINET_API_*.
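The legacy fallback can be pictured like this. A minimal sketch, assuming the resolution order is new prefix first, then legacy prefixes; the helper name and exact precedence are illustrative, not taken from `scinet/core`:

```python
import os

# Sketch of env-var loading with the legacy-prefix fallback described above.
# get_setting() and the exact lookup order are assumptions for illustration.

def get_setting(name, default=None):
    """Prefer SCINET_API_*, fall back to legacy SCIMAP_* / KG2API_* prefixes."""
    for prefix in ("SCINET_API_", "SCIMAP_", "KG2API_"):
        value = os.environ.get(prefix + name)
        if value:
            return value
    return default

# Only the legacy variable is set, so the lookup falls back to it.
os.environ["SCIMAP_BASE_URL"] = "https://legacy.example.com"
print(get_setting("BASE_URL"))  # prints the legacy value
```

Setting `SCINET_API_BASE_URL` as well would take precedence, which is why new setups should use the `SCINET_API_*` names.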
GROBID is a lightweight information extraction tool designed for technical and scientific publications; it can rapidly extract metadata, including titles, authors, abstracts, and references, from a paper's PDF file.
GROBID is needed for:
- `grounded_review` when using `--pdf-path`
- `related_authors` when using `--pdf-path`
Example startup with Docker:
```bash
docker pull lfoppiano/grobid:latest
docker run -d --rm --name grobid -p 8070:8070 lfoppiano/grobid:latest
```

Health check: `curl http://127.0.0.1:8070/api/isalive`
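GROBID returns TEI XML, for example from its `/api/processHeaderDocument` endpoint. Here is a minimal sketch of pulling the paper title out of such a response; the TEI fragment below is hand-written for illustration, not real GROBID output:

```python
import xml.etree.ElementTree as ET

# GROBID responses are TEI XML; the title lives under titleStmt/title.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

def extract_title(tei_xml):
    """Return the main title from a TEI document, or None if absent."""
    root = ET.fromstring(tei_xml)
    node = root.find(".//tei:titleStmt/tei:title", TEI_NS)
    return node.text if node is not None else None

# Trimmed, hand-made TEI fragment standing in for a GROBID response.
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt>
    <title level="a" type="main">An Example Paper Title</title>
  </titleStmt></fileDesc></teiHeader>
</TEI>"""

print(extract_title(sample))  # An Example Paper Title
```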
```
├── run_scinet.py
├── scinet/
│   ├── cli.py
│   ├── core/
│   ├── llm/
│   ├── search/
│   ├── tasks/
│   ├── evidence/
│   └── renderers/
├── examples/
├── tests/
└── references/
    └── search/
```
Key directories:
- `scinet/core/`: shared config, schemas, and API client code
- `scinet/tasks/`: task dispatch and task-specific logic
- `scinet/evidence/`: PDF manifest building and evidence grounding
- `scinet/renderers/`: Markdown rendering
- `examples/`: runnable request examples
- `references/search/`: standalone search reference code
If you only want the shortest path to a working run:
- a hosted SciNet API
- an OpenAI-compatible LLM endpoint
- GROBID, if you want to use `--pdf-path`
```bash
python3 run_scinet.py \
  --task-type topic_trend_review \
  --topic-text "research idea evaluation with large language models" \
  --pretty
```

Each run creates a directory under `runs/` containing:

- `request.json`
- `result.json`
- `result.md`
```bash
python3 run_scinet.py \
  --task-type grounded_review \
  --idea-text "Use literature-grounded evidence to evaluate research ideas." \
  --pretty
```

With PDF input:

```bash
python3 run_scinet.py \
  --task-type grounded_review \
  --pdf-path /absolute/path/to/paper.pdf \
  --params-file examples/grounded_review_params.example.json \
  --pretty
```

```bash
python3 run_scinet.py \
  --task-type topic_trend_review \
  --topic-text "research idea evaluation with large language models" \
  --pretty
```

```bash
python3 run_scinet.py \
  --task-type related_authors \
  --idea-text "knowledge-grounded evaluation of scientific research ideas" \
  --pretty
```

```bash
python3 run_scinet.py \
  --task-type author_profile \
  --author-name "Geoffrey Hinton" \
  --pretty
```

```bash
python3 run_scinet.py \
  --task-type idea_generation \
  --topic-text "scientific idea generation with retrieval-augmented large language models" \
  --pretty
```

You can also run tasks from JSON request files in `examples/`:
```bash
python3 run_scinet.py --request-file examples/grounded_review_request.json --pretty
python3 run_scinet.py --request-file examples/topic_trend_review_request.json --pretty
python3 run_scinet.py --request-file examples/related_authors_request.json --pretty
python3 run_scinet.py --request-file examples/author_profile_request.json --pretty
python3 run_scinet.py --request-file examples/idea_generation_request.json --pretty
```

For `grounded_review`, you can also override model-related parameters with:

- `examples/grounded_review_params.example.json`
- `examples/grounded_review_params.cpu.example.json`
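Writing your own request file is straightforward. A minimal sketch, assuming a flat schema mirroring the CLI flags (`task_type` plus the input fields); check the files in `examples/` for the authoritative shape:

```python
import json

# Hypothetical request file mirroring the CLI flags; the exact schema used by
# the examples/ files may differ.
request = {
    "task_type": "topic_trend_review",
    "topic_text": "research idea evaluation with large language models",
}

payload = json.dumps(request, indent=2)
with open("my_request.json", "w", encoding="utf-8") as f:
    f.write(payload)

print(payload)
```

The resulting file can then be run with `python3 run_scinet.py --request-file my_request.json --pretty`.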
By default, `grounded_review` uses:

- embedding model: [`BAAI/bge-large-en-v1.5`](https://huggingface.co/BAAI/bge-large-en-v1.5)
- reranker model: [`BAAI/bge-reranker-large`](https://huggingface.co/BAAI/bge-reranker-large)

The first run may download these models into the local Hugging Face cache.
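Conceptually, the embed-then-rerank step scores each candidate passage against the idea and sorts by score. A toy sketch with hand-made vectors standing in for the BGE models (the real pipeline uses learned embeddings and a cross-encoder reranker, not cosine over toy vectors):

```python
import math

# Toy stand-in for the rerank step: score candidates against the query
# by cosine similarity, then sort. Vectors here are invented for illustration.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

query_vec = [1.0, 0.0, 1.0]
candidates = {
    "paper A": [1.0, 0.1, 0.9],  # nearly parallel to the query
    "paper B": [0.0, 1.0, 0.0],  # orthogonal to the query
}

ranked = sorted(candidates,
                key=lambda k: cosine(query_vec, candidates[k]),
                reverse=True)
print(ranked)  # ['paper A', 'paper B']
```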
```bash
python3 -m unittest discover -s tests
```

- CLI Tools. Add more user-facing CLI capabilities so downstream users and AI agents can invoke retrieval workflows without touching database internals.
- Skills. Package reusable agent skills for common scientific discovery workflows and expose best practices as easier-to-load components.
- More Knowledge. Integrate more knowledge forms beyond paper-centric entities, such as datasets, code, standards, theorems, and experimental experience.
- Benchmark and Evaluation. Build dedicated benchmarks and evaluation protocols for downstream scientific research tasks supported by SciNet.
- Dynamic Update. Improve dynamic knowledge updates toward a more systematic and frequent refresh mechanism.
If you find our work helpful, please use the following citations.
This project is licensed under the MIT License - see the LICENSE file for details.

