
FastCoder: Accelerating Repository-level Code Generation via Efficient Retrieval and Verification

FastCoder is a simple yet highly efficient approach for accelerating LLM inference in code generation, without compromising output quality.

Contents

Benchmark results

  • Performance on repository-level code generation (DevEval & RepoEval)

  • Performance on standalone-level code generation (HumanEval)

Below is an example. FastCoder completes the inference in just 4.2 seconds, while REST and autoregressive decoding take 6.2 seconds and 13.5 seconds, respectively. FastCoder thus achieves a 3.21x speedup over autoregressive decoding and a 1.48x acceleration over REST.

The inference speeds of FastCoder and REST are comparable at first. At approximately 2.5 seconds, however, FastCoder's context- and LLM-preference-aware cache is activated, and FastCoder achieves a substantial acceleration for the remainder of inference.
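The reported ratios follow directly from the quoted wall-clock times:

```python
# Verify the reported speedup ratios from the quoted wall-clock times.
autoregressive_s = 13.5  # seconds, autoregressive decoding
rest_s = 6.2             # seconds, REST
fastcoder_s = 4.2        # seconds, FastCoder

speedup_vs_autoregressive = autoregressive_s / fastcoder_s
speedup_vs_rest = rest_s / fastcoder_s

print(f"{speedup_vs_autoregressive:.2f}x vs. autoregressive")  # 3.21x
print(f"{speedup_vs_rest:.2f}x vs. REST")                      # 1.48x
```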

Additionally, we have developed a VSCode plugin demo based on FastCoder, with more functionalities currently under development.

Installation

conda create -n fastcoder python=3.9
conda activate fastcoder
pip install -r requirements.txt
pip install DraftRetriever/wheels/draftretriever-0.1.0-cp39-cp39-linux_x86_64.whl

Build datastore

Build the common datastore

Build a common datastore from The Stack

cd datastore
python get_common_datastore.py --model-path deepseek-ai/deepseek-coder-6.7b-base
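The actual datastore construction is handled by the script above. As a rough illustration of the underlying idea only (not FastCoder's or DraftRetriever's real implementation, which uses a compact suffix-based index), a retrieval datastore for speculative decoding can be thought of as a token-level n-gram index mapping a short context to continuations seen in the corpus. All names below are hypothetical:

```python
from collections import defaultdict

def build_ngram_datastore(token_corpus, context_len=2):
    """Map each context of `context_len` tokens to tokens seen after it.

    `token_corpus` is a list of token sequences (e.g. tokenized source files).
    Illustrative stand-in for the suffix-based index DraftRetriever builds.
    """
    index = defaultdict(list)
    for tokens in token_corpus:
        for i in range(len(tokens) - context_len):
            context = tuple(tokens[i:i + context_len])
            index[context].append(tokens[i + context_len])
    return index

def retrieve_drafts(index, context, context_len=2):
    """Return candidate next tokens for the last `context_len` tokens."""
    return index.get(tuple(context[-context_len:]), [])

corpus = [["def", "foo", "(", ")", ":"], ["def", "bar", "(", ")", ":"]]
index = build_ngram_datastore(corpus)
print(retrieve_drafts(index, ["def", "foo"]))  # ['(']
```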

Build the repo datastore

Build a repo datastore for each repository from the pre-processed source code (with the portions to be generated excluded).

Users can download the original repositories of DevEval from DevEval_Source_Code (and of RepoEval from RepoEval_Source_Code). To exclude the ground truth from the original repositories, first place the downloaded archive into the designated directory dataset/DevEval (or dataset/RepoEval), rename it to source_code.zip, and extract its contents. Then execute the following command; afterwards, the dataset/DevEval/source_code directory will contain the original repositories with all ground-truth content removed.

cd dataset/DevEval
python filter_source_code.py

Alternatively, users can directly download our preprocessed source code (with ground truth removed) from the link, place it in the designated directory, and extract it.
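The filtering is done by filter_source_code.py in this repo. Conceptually, the step blanks out each ground-truth span so the retriever cannot hand the model its own target. A minimal sketch, assuming (hypothetically) that each benchmark task specifies a file and a line range to remove:

```python
def remove_ground_truth(source_lines, start, end):
    """Drop lines [start, end) -- the span to be generated -- from a file.

    `start`/`end` are 0-based line indices; a real pipeline would read these
    from the benchmark's task metadata (format assumed here for illustration).
    """
    return source_lines[:start] + source_lines[end:]

file_lines = ["import os", "def target():", "    return 1", "print('done')"]
# Suppose the benchmark asks the model to generate lines 1-2 (the function).
filtered = remove_ground_truth(file_lines, 1, 3)
print(filtered)  # ['import os', "print('done')"]
```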

Then, use the following commands to build a repo datastore.

cd datastore/DevEval
unzip source_code.zip
python3 get_repo_datastore.py --model-path deepseek-ai/deepseek-coder-6.7b-base --dataset DevEval

Inference

Inference on DevEval

cd evaluation
CUDA_VISIBLE_DEVICES=0 python deveval_test.py --model-path deepseek-ai/deepseek-coder-6.7b-base --p 0.5 --l 50 --s 20 --weights [1,1]

Inference on RepoEval

cd evaluation
CUDA_VISIBLE_DEVICES=0 python repoeval_test.py --model-path deepseek-ai/deepseek-coder-6.7b-base --p 0.5 --l 50 --s 20 --weights [1,1]

Inference on HumanEval

cd evaluation
CUDA_VISIBLE_DEVICES=0 python humaneval_test.py --model-path deepseek-ai/deepseek-coder-6.7b-base --p 0.5 --l 50 --s 20
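Retrieval-based speculation ultimately relies on the same verification step as other speculative decoding methods: the target model scores the drafted tokens and accepts the longest prefix that matches its own greedy choices, so output quality is unchanged. A toy sketch with a stub "model" (a greedy next-token function), independent of FastCoder's actual code:

```python
def verify_draft(greedy_next, prefix, draft):
    """Accept the longest prefix of `draft` that matches greedy decoding.

    `greedy_next(tokens)` returns the model's greedy next token; a real
    verifier scores all draft positions in a single forward pass.
    """
    accepted = []
    context = list(prefix)
    for token in draft:
        if greedy_next(context) != token:
            break  # first mismatch: discard the rest of the draft
        accepted.append(token)
        context.append(token)
    return accepted

# Stub model: always continues the sequence 0, 1, 2, 3, ...
greedy_next = lambda toks: (toks[-1] + 1) if toks else 0
print(verify_draft(greedy_next, [0, 1], [2, 3, 9, 4]))  # [2, 3]
```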
