The Adaptive AlpacaEval (AdapAlpaca) framework automates the generation and evaluation of datasets across different configurations. This repository includes scripts for generating data, hosting models, and evaluating the results. We continue to refine the approach and the open-source tools based on community feedback.
- Data Generation: Scripts to generate datasets using multiple configurations and language models.
- Model Hosting: Instructions and scripts for hosting various language models so they can be accessed through an API.
- Evaluation: Comprehensive evaluation tools to assess the performance of language models on generated datasets.
- Scalability: Designed to handle multiple datasets and language model configurations efficiently.
- [Date] New features or updates about the framework will be announced here.
Install the required dependencies so that all scripts run correctly:
pip install -r requirements.txt
Before running any scripts, update the API keys and model details in the provided shell scripts to match your setup, and make sure the input paths and other parameters are set correctly for your environment.
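Because the shell scripts must be edited with your own API keys and paths, a quick pre-flight check can catch missing values before a long run. The sketch below is written in POSIX sh (it also runs under bash); the function names check_config and check_input_path, and the variable names in the example comment, are hypothetical and not taken from the repository's scripts.

```shell
#!/bin/sh
# Hypothetical pre-flight checks; not part of the repository's scripts.

# check_config NAME... : return 1 if any named variable is unset or empty.
check_config() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"   # read the variable whose name is stored in $name
    if [ -z "$value" ]; then
      echo "missing setting: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# check_input_path PATH : return 1 unless PATH is an existing regular file.
check_input_path() {
  if [ ! -f "$1" ]; then
    echo "input file not found: $1" >&2
    return 1
  fi
}

# Example (placeholder names; use the ones your scripts actually define):
#   check_config OPENAI_API_KEY MODEL_NAME && check_input_path "$INPUT_PATH"
```

Checking by variable name keeps the error message readable: it reports which setting is missing rather than just failing.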
Data generation scripts are organized into separate folders according to their task.
Run the following command to generate the different dataset variants. The script handles multiple datasets and applies the specified generation modes:
bash generate_all_adap_alpaca.sh
To host a model locally, use the provided script, which starts a server for the specified model:
bash host_vllm_server.sh
Evaluation scripts assess the quality of the generated datasets and model responses. They compare outputs against a reference set and report detailed metrics:
bash generate_different_dataset_and_eval_gpt.sh
The rest of this section explains how to use each script in the AdapAlpaca framework.
The generate_all_adap_alpaca.sh script generates datasets with different word-count limits. Each dataset is tailored to specific model configurations and generation modes.
Usage:
bash generate_all_adap_alpaca.sh
What it does:
- Creates an output directory for the generated datasets.
- Runs a Python script multiple times with different configurations to cover the range of word-count limits defined in the template JSON.
- Stores the outputs in adapAlpaca_output, with filenames indicating the word-count range.
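The loop that generate_all_adap_alpaca.sh performs can be pictured roughly as follows. This is a sketch under stated assumptions: the Python entry point generate.py, its --min_words/--max_words/--output flags, the MIN-MAX range format, and the output filename pattern are all hypothetical placeholders, not the repository's actual interface. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of a word-count sweep; generate.py and its flags are hypothetical.
OUTPUT_DIR=${OUTPUT_DIR:-adapAlpaca_output}

# run CMD... : print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# output_path MIN MAX : output filename that encodes the word-count range.
output_path() {
  printf '%s/adapAlpaca_%s_%s.json\n' "$OUTPUT_DIR" "$1" "$2"
}

# generate_all RANGE... : each RANGE has the form MIN-MAX, e.g. 0-100.
generate_all() {
  run mkdir -p "$OUTPUT_DIR" || return 1
  for range in "$@"; do
    min=${range%-*}   # text before the dash
    max=${range#*-}   # text after the dash
    run python generate.py --min_words "$min" --max_words "$max" \
      --output "$(output_path "$min" "$max")" || return 1
  done
}

# Example: DRY_RUN=1 generate_all 0-100 100-200 200-300
```

Encoding the range in the filename keeps each output self-describing, so later evaluation steps can pick files by word-count range without a separate index.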
The host_vllm_server.sh script hosts a specified language model on a local server and exposes the model through an API.
Usage:
bash host_vllm_server.sh
Details:
- Starts a server for a LLaMA model, or any other model that vLLM can serve.
- Runs the server in the background and logs its output for monitoring.
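One common way to run a vLLM server in the background with logging, as host_vllm_server.sh is described as doing, is nohup plus output redirection. The sketch assumes vLLM's python -m vllm.entrypoints.openai.api_server entry point with its --model and --port options; the function names, the PID-file convention, and the argument order are hypothetical, so check the repository script and the vLLM documentation for the options your version supports.

```shell
#!/bin/sh
# Hypothetical helpers; the repository's host_vllm_server.sh may differ.

# start_vllm_server MODEL PORT LOG_FILE
# Starts vLLM's OpenAI-compatible server in the background, sends stdout and
# stderr to LOG_FILE, writes the PID next to it, and prints the PID.
start_vllm_server() {
  model=$1 port=$2 log_file=$3
  nohup python -m vllm.entrypoints.openai.api_server \
    --model "$model" --port "$port" >"$log_file" 2>&1 &
  pid=$!
  echo "$pid" >"${log_file%.log}.pid"
  echo "$pid"
}

# stop_vllm_server PID : terminate the background server if it is still running.
stop_vllm_server() {
  kill "$1" 2>/dev/null || true
}

# Example: pid=$(start_vllm_server <model-name-or-path> 8000 vllm_server.log)
#          tail -f vllm_server.log
```

Keeping the PID in a file makes it easy to stop the server later from a different shell session.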
The generate_different_dataset_and_eval_gpt.sh script generates datasets with a specified GPT model and evaluates them.
Usage:
bash generate_different_dataset_and_eval_gpt.sh
Steps:
- Dataset Generation: Generates datasets for different configurations (e.g., koala, vicuna).
- Evaluation Preparation: Prepares folders and configurations for evaluation.
- Evaluation Execution: Runs evaluation scripts to assess the dataset quality and model performance, outputting detailed metrics and logs.
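The three steps above (generate, prepare, evaluate) can be sketched as a per-configuration loop that stops at the first failure. Everything here is an assumption for illustration: the folder layout under EVAL_ROOT and the placeholder functions generate_dataset and evaluate_dataset (which call hypothetical generate.py and evaluate.py scripts) stand in for whatever the repository's script actually runs.

```shell
#!/bin/sh
# Illustrative pipeline; folder layout and script names are hypothetical.
EVAL_ROOT=${EVAL_ROOT:-eval_results}

# Placeholder steps: replace the bodies with the repository's real commands.
generate_dataset() { python generate.py --config "$1" --out "$2"; }
evaluate_dataset() { python evaluate.py --config "$1" --input "$2"; }

# prepare_eval_dir CONFIG : create the per-configuration folders, print the root.
prepare_eval_dir() {
  dir="$EVAL_ROOT/$1"
  mkdir -p "$dir/outputs" "$dir/logs" && echo "$dir"
}

# run_pipeline CONFIG... : generate, then evaluate, each configuration in turn.
run_pipeline() {
  for config in "$@"; do
    dir=$(prepare_eval_dir "$config") || return 1
    if ! generate_dataset "$config" "$dir/outputs" >"$dir/logs/generate.log" 2>&1; then
      echo "generation failed for $config (see $dir/logs/generate.log)" >&2
      return 1
    fi
    if ! evaluate_dataset "$config" "$dir/outputs" >"$dir/logs/eval.log" 2>&1; then
      echo "evaluation failed for $config (see $dir/logs/eval.log)" >&2
      return 1
    fi
  done
}

# Example: run_pipeline koala vicuna
```

Writing one log per step and configuration means a failed run points directly at the log to inspect.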
NOTE: Before running the following script, set up vLLM and host an OpenAI-compatible server (see host_vllm_server.sh).
The generate_different_dataset_and_eval_vllm.sh script generates datasets with a specified model hosted through vLLM and evaluates them.
Usage:
bash generate_different_dataset_and_eval_vllm.sh
Steps:
- Dataset Generation: Generates datasets for different configurations (e.g., koala, vicuna).
- Evaluation Preparation: Prepares folders and configurations for evaluation.
- Evaluation Execution: Runs evaluation scripts to assess the dataset quality and model performance, outputting detailed metrics and logs.
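Since the vLLM-based scripts need a running server, it can help to wait until the server answers before starting them. This sketch polls the OpenAI-compatible /v1/models endpoint with curl; the function name, the one-second polling interval, and the default timeout are hypothetical choices, and port 8000 in the example is vLLM's default, which host_vllm_server.sh may override.

```shell
#!/bin/sh
# wait_for_server BASE_URL [TIMEOUT_SECONDS]
# Polls BASE_URL/v1/models about once per second; returns 0 as soon as the
# server answers successfully, or 1 after roughly TIMEOUT_SECONDS attempts.
wait_for_server() {
  base_url=$1
  timeout=${2:-60}
  attempt=0
  while [ "$attempt" -lt "$timeout" ]; do
    # -s: quiet, -f: fail on HTTP errors, --max-time: cap each request
    if curl -sf --max-time 5 "$base_url/v1/models" >/dev/null 2>&1; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep 1
  done
  echo "no response from $base_url after $timeout attempts" >&2
  return 1
}

# Example: wait_for_server http://localhost:8000 300 && bash generate_different_dataset_and_eval_vllm.sh
```

Loading a large model can take minutes, so a generous timeout avoids starting generation against a server that is still initializing.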
The generate_different_dataset_gpt.sh script generates datasets with a GPT model for different instruction-set templates.
Usage:
bash generate_different_dataset_gpt.sh
Functionality:
- Iterates through various dataset templates.
- Applies the specified GPT model to generate outputs, which are saved in a designated output folder.
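The template loop described above might look like this in shell. It is a sketch under assumptions: TEMPLATE_DIR holding one *.json file per instruction-set template, the <name>_output.json naming scheme, and the placeholder generate_from_template function (calling a hypothetical generate.py) are illustrative and not taken from generate_different_dataset_gpt.sh.

```shell
#!/bin/sh
# Hypothetical layout: each *.json file in TEMPLATE_DIR is one template,
# and each template gets a matching output file in OUTPUT_DIR.
TEMPLATE_DIR=${TEMPLATE_DIR:-templates}
OUTPUT_DIR=${OUTPUT_DIR:-outputs}

# Placeholder for the actual generation call; replace with the real command.
generate_from_template() {
  python generate.py --template "$1" --output "$2"
}

# generate_all_templates : process every template, stop on the first failure.
generate_all_templates() {
  mkdir -p "$OUTPUT_DIR" || return 1
  count=0
  for template in "$TEMPLATE_DIR"/*.json; do
    [ -e "$template" ] || continue   # the glob matched no files
    name=$(basename "$template" .json)
    generate_from_template "$template" "$OUTPUT_DIR/${name}_output.json" || return 1
    count=$((count + 1))
  done
  echo "processed $count template(s)"
}

# Example: TEMPLATE_DIR=./my_templates OUTPUT_DIR=./gpt_outputs generate_all_templates
```

Naming each output after its template keeps the mapping between templates and generated datasets obvious.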
NOTE: Before running the following script, set up vLLM and host an OpenAI-compatible server (see host_vllm_server.sh).
The generate_different_dataset_vllm.sh script generates datasets using user-specified models hosted with vLLM.
Usage:
bash generate_different_dataset_vllm.sh
Operation:
- Works like the GPT script, but targets models served by vLLM.
- Ensures the datasets are compatible with the outputs and specifications of the vLLM-hosted model.
If you find the AdapAlpaca framework useful in your research, please consider citing:
@misc{adapalpaca2024,
title={Understanding Win Rate for Better LLM-based Preference Evaluation},
author={xxx, xxx},
year={2024},
note={Provided scripts and tools for dataset generation and model evaluation}
}
We welcome contributions and suggestions from the community. Please feel free to fork the repository, make changes, and submit pull requests. Your insights are valuable to us!
