DomAgent: Leveraging Knowledge Graphs and Case-Based Reasoning for Domain-Specific Code Generation

DomAgent is an autonomous coding agent designed to improve the domain adaptability of large language models (LLMs). General-purpose LLMs excel at generic code generation but often struggle in real-world software development that requires domain-specific knowledge. To bridge this gap, DomAgent enables LLMs to generate domain-adapted code through structured reasoning and targeted knowledge retrieval. Its core component, DomRetriever, combines knowledge-graph reasoning (top-down understanding) with case-based reasoning (bottom-up examples) to dynamically retrieve and synthesize domain knowledge and representative cases. We train Qwen2.5-7B and LLaMA-3.1-8B with the reinforcement learning (RL) algorithm GRPO (Group Relative Policy Optimization).
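
The way the two retrieval modes complement each other can be illustrated with a toy sketch. Everything in it is hypothetical: the `KG` dictionary, the `CASES` list, the token-overlap similarity, and all function names are illustrative stand-ins, not the actual DomRetriever code found in `kg-tool/` and `case-base-tool/`.

```python
from typing import Dict, List, Tuple

# Hypothetical toy knowledge graph: entity -> list of (relation, object) facts.
KG: Dict[str, List[Tuple[str, str]]] = {
    "pandas.DataFrame.groupby": [
        ("returns", "DataFrameGroupBy"),
        ("commonly_followed_by", "agg"),
    ],
    "numpy.argsort": [("returns", "indices that sort an array")],
}

# Hypothetical toy case base: (problem description, reference solution).
CASES: List[Tuple[str, str]] = [
    ("compute the mean of each group in a dataframe", "df.groupby('key').mean()"),
    ("sort an array and return the indices", "np.argsort(a)"),
]


def kg_lookup(query: str) -> List[str]:
    """Top-down step: collect facts for entities whose short name occurs in the query."""
    tokens = set(query.lower().split())
    facts: List[str] = []
    for entity, edges in KG.items():
        short_name = entity.split(".")[-1].lower()
        if short_name in tokens:
            facts.extend(f"{entity} {rel} {obj}" for rel, obj in edges)
    return facts


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity, used here in place of a real embedding model."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0


def case_lookup(query: str, k: int = 1) -> List[Tuple[str, str]]:
    """Bottom-up step: return the k cases most similar to the query."""
    ranked = sorted(CASES, key=lambda case: jaccard(query, case[0]), reverse=True)
    return ranked[:k]


def build_context(query: str) -> str:
    """Merge KG facts and retrieved cases into one prompt context."""
    lines = ["# Domain knowledge"] + kg_lookup(query) + ["# Similar cases"]
    lines += [f"Q: {q}\nA: {a}" for q, a in case_lookup(query)]
    lines += ["# Task", query]
    return "\n".join(lines)
```

Calling `build_context("use groupby to compute the mean of each group")` would place the two `groupby` facts and the `df.groupby('key').mean()` case ahead of the task description, which is the kind of context an LLM could then condition on.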

DomAgent can work as a standalone system or integrate with any LLM, helping small open-source models approach the performance of large proprietary ones in complex, domain-specific coding tasks.

This repository includes only the open-source benchmark dataset DS-1000. Due to the double-blind review policy, the truck CAN signal dataset used in our experiments will be released after the paper is accepted.

Package Directory Structure

.
├── OpenRLHF-RAG
├── README.md
├── requirements.txt
├── data
│   └── ds1000.jsonl
├── case-base-tool
│   ├── plain_text.json
│   └── semantic_search.py
├── kg-tool
│   ├── DS-KG.ttl
│   ├── kg_search_function.py
│   ├── prompt_list.py
│   └── utils.py
├── evaluation
│   ├── edit_ds1000.py
│   ├── execution.py
│   ├── extract_entity_from_query.py
│   ├── infer_domagent.py
│   ├── run_dist_inference.py
│   ├── run_inference.py
│   ├── run_openai.py
│   └── test_ds1000.py
├── reward-remote
│   └── reward_server.py
└── scripts
    ├── ray_start.sh
    └── reinforce_train.sh

Project Structure

OpenRLHF-RAG/
Contains tools for RLHF (Reinforcement Learning from Human Feedback) model training.
This folder is adapted from OpenRLHF/OpenRLHF.
data/
Stores the training and testing datasets.
evaluation/
Used to load trained models and perform evaluation.
kg-tool/
Provides utilities for retrieving information from a knowledge graph.
case-base-tool/
Provides utilities for retrieving information from a case base.
reward-remote/
Implements the remote reward function used during RL training.
scripts/
Contains training scripts used in the RL training pipeline.

Requirements

Python 3.x
Install the required libraries:
```
pip install -r requirements.txt
```

Usage

Enter the DomAgent folder:
```
cd DomAgent
```

Training:

 ## Ray start
 bash scripts/ray_start.sh

 ## Start Reward Server
 python reward-remote/reward_server.py --port 1278

 ## Training
 bash scripts/reinforce_train.sh
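
The introduction states that the models are trained with GRPO. Its central idea is to sample several completions for the same prompt, score each one, and normalize every reward against the statistics of its own group. The sketch below shows only that normalization step. It is not the code in `OpenRLHF-RAG/`; the function name and the choice of population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward by the mean and standard deviation of its group.

    `rewards` holds the scores of all completions sampled for one prompt.
    Population std is an assumption; implementations may use sample std.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every completion gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt: two pass the tests, two fail.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that score above the group mean receive a positive advantage and the others a negative one; when all completions score the same, every advantage is zero and that prompt contributes no learning signal.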

Inference:

 python evaluation/infer_domagent.py --gpu_id 0 --temp 0.0 --port 5004 --prompt_type v0 --src_file data/ds1000.jsonl --model_path the_path_to_model

Execute the generated code to obtain the results:
```
python evaluation/test_ds1000.py
```
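
Execution-based checking of generated code usually means running each solution together with its test assertions in a separate process. The following is a minimal sketch of that idea, assuming each problem provides test code as plain `assert` statements. The function `passes_tests` and its parameters are hypothetical and do not describe how `evaluation/test_ds1000.py` or `evaluation/execution.py` are actually implemented.

```python
import subprocess
import sys


def passes_tests(generated_code: str, test_code: str, timeout_s: float = 10.0) -> bool:
    """Run generated code followed by its assertions in a fresh interpreter.

    Returns True only if the process exits with status 0 within the timeout.
    Note: a subprocess is not a sandbox; untrusted code needs real isolation.
    """
    program = generated_code + "\n\n" + test_code
    try:
        result = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


# Hypothetical samples: (generated solution, test assertions).
samples = [
    ("def add(a, b):\n    return a + b", "assert add(2, 3) == 5"),
    ("def add(a, b):\n    return a - b", "assert add(2, 3) == 5"),
]
pass_rate = sum(passes_tests(code, tests) for code, tests in samples) / len(samples)
```

Running each sample in its own interpreter keeps a crash or infinite loop in one solution from affecting the others.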
