Wangshuaiia/DomAgent
DomAgent: Leveraging Knowledge Graphs and Case-Based Reasoning for Domain-Specific Code Generation

DomAgent is an autonomous coding agent designed to improve the domain adaptability of large language models (LLMs). General-purpose LLMs excel at generic code generation, but they often struggle in real-world software development that requires domain-specific knowledge. To bridge this gap, DomAgent enables LLMs to generate domain-adapted code through structured reasoning and targeted knowledge retrieval. Its core component, DomRetriever, combines knowledge-graph reasoning (top-down understanding) with case-based reasoning (bottom-up examples) to dynamically retrieve and synthesize domain knowledge and representative cases. We train the large language models Qwen2.5-7B and LLaMA-3.1-8B with GRPO, a reinforcement learning (RL) algorithm.
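
To make the two retrieval directions described above concrete, here is a minimal sketch of how a knowledge-graph lookup and a case lookup might be merged into one prompt context. It is not the DomRetriever implementation: the toy triples, the toy cases, the token-overlap matching, and every function name below are assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy knowledge graph: (subject, predicate, object) triples.
TRIPLES = [
    ("pandas.DataFrame.groupby", "belongsTo", "pandas"),
    ("pandas.DataFrame.groupby", "usedFor", "aggregation"),
    ("numpy.argsort", "belongsTo", "numpy"),
    ("numpy.argsort", "usedFor", "sorting"),
]

# Hypothetical toy case base: (problem description, solution snippet).
CASES = [
    ("group rows by a column and sum the values", "df.groupby('k')['v'].sum()"),
    ("sort an array and return the indices", "np.argsort(a)"),
]


def tokenize(text):
    """Lowercase the text and split on whitespace and dots."""
    return text.lower().replace(".", " ").split()


def kg_lookup(query, triples):
    """Top-down step: keep triples whose subject or object shares a token with the query."""
    q = set(tokenize(query))
    return [t for t in triples if q & (set(tokenize(t[0])) | set(tokenize(t[2])))]


def cosine(a, b):
    """Cosine similarity between two token lists, using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def case_lookup(query, cases, k=1):
    """Bottom-up step: return the k cases whose description is most similar to the query."""
    q = tokenize(query)
    ranked = sorted(cases, key=lambda c: cosine(q, tokenize(c[0])), reverse=True)
    return ranked[:k]


def build_context(query):
    """Merge retrieved facts and cases into one text block for the LLM prompt."""
    facts = kg_lookup(query, TRIPLES)
    examples = case_lookup(query, CASES)
    lines = ["# Knowledge"] + [f"{s} {p} {o}" for s, p, o in facts]
    lines += ["# Cases"] + [f"{desc} -> {code}" for desc, code in examples]
    return "\n".join(lines)


context = build_context("sort a numpy array and return indices")
print(context)
```

A real system would likely replace the token overlap with entity linking against the graph and the count-based cosine with embedding similarity; the sketch only shows how the two sources can feed one context.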

DomAgent can work as a standalone system or integrate with any LLM, helping small open-source models approach the performance of large proprietary ones in complex, domain-specific coding tasks.
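
For readers unfamiliar with GRPO, the group-relative step at its core can be sketched in a few lines: several responses are sampled for the same prompt, each is scored, and every reward is normalized against the statistics of its own group. The function name, the use of the population standard deviation, and the `eps` constant are assumptions of this sketch, not details taken from the training scripts in this package.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and spread of its own sampling group.

    `rewards` holds the scores of several responses sampled for ONE prompt.
    `eps` (an assumption of this sketch) avoids division by zero when all
    responses in the group receive the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled responses for one prompt: two passed the check, two failed.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Responses that score above their group's average receive a positive advantage and are reinforced; when every response in a group scores the same, all advantages are zero and the group contributes no learning signal.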

This package includes only the open-source benchmark dataset DS-1000. Because of the double-blind review policy, the truck CAN signal dataset used in our experiments will be released after the paper is accepted.

Package Directory Structure

.
├── OpenRLHF-RAG
├── README.md
├── requirements.txt
├── data
│   └── ds1000.jsonl
├── case-base-tool
│   ├── plain_text.json
│   └── semantic_search.py
├── kg-tool
│   ├── DS-KG.ttl
│   ├── kg_search_function.py
│   ├── prompt_list.py
│   └── utils.py
├── evaluation
│   ├── edit_ds1000.py
│   ├── execution.py
│   ├── extract_entity_from_query.py
│   ├── infer_domagent.py
│   ├── run_dist_inference.py
│   ├── run_inference.py
│   ├── run_openai.py
│   └── test_ds1000.py
├── reward-remote
│   └── reward_server.py
└── scripts
    ├── ray_start.sh
    └── reinforce_train.sh
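
The dataset in the tree above is stored as JSON Lines, one JSON object per line. A minimal sketch for reading such a file is shown below; the helper names are hypothetical, and the sketch makes no assumption about which fields each record contains.

```python
import json
from pathlib import Path


def parse_jsonl(lines):
    """Parse an iterable of JSON Lines strings, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def load_jsonl(path):
    """Read a .jsonl file into a list of Python objects."""
    with Path(path).open(encoding="utf-8") as f:
        return parse_jsonl(f)


# Usage, assuming the command is run from the package root:
dataset_path = Path("data/ds1000.jsonl")
if dataset_path.exists():
    records = load_jsonl(dataset_path)
    print(len(records), "records; fields of the first record:", sorted(records[0]))
```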

Project Structure

  • OpenRLHF-RAG/
    Contains tools for RLHF (Reinforcement Learning from Human Feedback) model training.
    This folder is adapted from OpenRLHF/OpenRLHF.

  • data/
    Stores the training and testing datasets.

  • evaluation/
    Used to load trained models and perform evaluation.

  • kg-tool/
    Provides utilities for retrieving information from a knowledge graph.

  • case-base-tool/
    Provides utilities for retrieving information from a case base.

  • reward-remote/
    Implements the remote reward function used during RL training.

  • scripts/
    Contains training scripts used in the RL training pipeline.
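
As an illustration of what a remote reward function such as the one in reward-remote/ might look like, the sketch below exposes a scoring function over HTTP using only the standard library. Everything in it is an assumption: the JSON request shape (`{"query": [...]}`), the response shape (`{"rewards": [...]}`), and the placeholder scoring rule (1.0 if the text parses as Python, else 0.0) are not taken from reward_server.py. Only the port number matches the Usage section.

```python
import ast
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def score(response: str) -> float:
    """Placeholder reward (assumption): 1.0 if the text is syntactically valid Python."""
    if not response.strip():
        return 0.0
    try:
        ast.parse(response)
    except (SyntaxError, ValueError):
        return 0.0
    return 1.0


class RewardHandler(BaseHTTPRequestHandler):
    """Answers POST requests carrying {"query": [...]} with {"rewards": [...]}."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        rewards = [score(text) for text in payload.get("query", [])]
        body = json.dumps({"rewards": rewards}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 1278) -> None:
    """Block forever, serving reward requests on the given local port."""
    HTTPServer(("127.0.0.1", port), RewardHandler).serve_forever()
```

Calling `serve()` starts the server. Keeping the reward behind an HTTP endpoint lets the trainer and the scorer run as separate processes, which is useful when scoring involves executing untrusted generated code.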


Requirements

  • Python 3.x
  • Install the required libraries:
    pip install -r requirements.txt

Usage

  1. Enter the DomAgent folder:
    cd DomAgent
  2. Training:
     ## Start Ray
     bash scripts/ray_start.sh
    
     ## Start Reward Server
     python reward-remote/reward_server.py --port 1278
    
     ## Training
     bash scripts/reinforce_train.sh
  3. Inference:
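     python evaluation/infer_domagent.py --gpu_id 0 --temp 0.0 --port 5004 --prompt_type v0 --src_file data/eval_set/bamboogle_500.jsonl --model_path the_path_to_model
     Set --src_file to the dataset to evaluate (this package ships data/ds1000.jsonl) and --model_path to the trained model.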
  4. Execute the generated code to obtain the result:
     python evaluation/test_ds1000.py
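
Step 4 runs model-generated code, which is usually done in a separate process with a time limit so that a crash or an infinite loop cannot take down the evaluator. The sketch below shows that general pattern; it is not the logic of evaluation/execution.py or evaluation/test_ds1000.py, and the function name and default timeout are assumptions.

```python
import subprocess
import sys


def run_snippet(code: str, timeout: float = 10.0):
    """Run a code string in a fresh Python process.

    Returns (passed, output): `passed` is True when the process exits with
    status 0; `output` is stdout on success, stderr on failure, or the
    string "timeout" when the time limit is exceeded.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, "timeout"
    if proc.returncode == 0:
        return True, proc.stdout
    return False, proc.stderr


passed, output = run_snippet("print(1 + 1)")
print(passed, output.strip())
```

A separate process isolates crashes but is not a security sandbox; untrusted code should additionally run inside a container or a restricted user account.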

About

Replication package of DomAgent
