Commit: Update README.md
Added twitter, discord and other links plus a bunch of emojis.
Albertorizzoli authored and simedw committed Jul 6, 2023
1 parent f462fe7 commit 2378d8c
Showing 1 changed file with 34 additions and 20 deletions.
# 🏋️‍♂️ BenchLLM 🏋️‍♀️

🦾 Continuous Integration for LLM-powered applications 🦙🦅🤖

[![GitHub Repo stars](https://img.shields.io/github/stars/v7labs/BenchLLM?style=social)](https://github.com/v7labs/BenchLLM/stargazers)
[![Twitter Follow](https://img.shields.io/twitter/follow/V7Labs?style=social)](https://twitter.com/V7Labs)
[![Discord Follow](https://dcbadge.vercel.app/api/server/x7ExfHb3bG?style=flat)](https://discord.gg/x7ExfHb3bG)

BenchLLM is a Python-based open-source library that streamlines the testing of Large Language Models (LLMs) and AI-powered applications. It measures the accuracy of your model, agent, or chain by validating responses on any number of tests via LLMs.

BenchLLM is actively used at [V7](https://www.v7labs.com) for improving our LLM applications and is now open-sourced under the MIT License to share with the wider community.


## 💡 Get help on [Discord](https://discord.gg/x7ExfHb3bG) or [Tweet at us](https://twitter.com/V7Labs)

<hr/>

Use BenchLLM to:

- Test the responses of your LLM across any number of prompts.
- Set up continuous integration for chains like [Langchain](https://github.com/hwchase17/langchain), agents like [AutoGPT](https://github.com/Significant-Gravitas/Auto-GPT), or LLM models like [Llama](https://github.com/facebookresearch/llama) or GPT-4.
- Eliminate flaky chains and create confidence in your code.
- Spot inaccurate responses and hallucinations in every version of your application.

<hr/>

> ⚠️ **NOTE:** BenchLLM is at an early stage of development and is subject to rapid change.
>
> For bug reporting, feature requests, or contributions, please open an issue or submit a pull request (PR) on our GitHub page.

## 🧪 BenchLLM Testing Methodology

BenchLLM implements a distinct two-step methodology for validating your machine learning models:

1. **Testing**: This stage runs your code against any number of tests and captures the predictions produced by your model, without immediate judgment or comparison.

2. **Evaluation**: The recorded predictions are compared against the expected output, using LLMs to verify factual similarity (or, optionally, manual review). Detailed comparison reports, including pass/fail status and other metrics, are generated.

This methodical separation offers a comprehensive view of your model's performance and allows for better control and refinement of each step.
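In CLI terms, this separation maps onto the two commands covered later in this README:

```bash
# Step 1 (testing): run the test functions and save the predictions without judging them.
$ bench run --no-eval

# Step 2 (evaluation): compare the saved predictions against the expected outputs.
$ bench eval output/latest/predictions
```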

## 🚀 Install

To install BenchLLM, use pip:

```bash
pip install benchllm
```

## 💻 Usage

Start by importing the library and using the `@benchllm.test` decorator to mark the function you'd like to test:
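A minimal sketch of such a test function (the `suite` argument and the values below are illustrative assumptions):

```python
import benchllm

def run_my_model(input: str) -> str:
    # Replace this with a call to your model, chain, or agent.
    return "2"

@benchllm.test(suite=".")  # assumed: the directory that holds the YML test files
def invoke_model(input: str):
    return run_my_model(input)
```

Tests pair an input with one or more accepted answers, for example in a YML file such as:

```yml
# 1+1.yml (an illustrative test file)
input: "What is 1 + 1? Reply with just the number."
expected:
  - "2"
  - "2.0"
```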

The non-interactive evaluators also support `--workers N` to run the evaluation in parallel:

```bash
$ bench run --evaluator string-match --workers 5
```

### 🧮 Eval

While `bench run` runs each test function and then evaluates its output, it can often be beneficial to separate these into two steps: for example, if you want a person to do the evaluation manually, or if you want to try multiple evaluation methods on the same function.

```bash
$ bench run --no-eval
```

Then later you can evaluate them with:

```bash
$ bench eval output/latest/predictions
```

## 🔌 API

For more detailed control, BenchLLM provides an API.
You are not required to add YML/JSON tests to be able to evaluate your model.
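A minimal sketch of that workflow (the `Test`, `Tester`, and `StringMatchEvaluator` names are assumptions about BenchLLM's public API):

```python
from benchllm import StringMatchEvaluator, Test, Tester

def run_my_model(input: str) -> str:
    # Replace this with a call to your model, chain, or agent.
    return "2"

# Define tests in code instead of in YML/JSON files.
tests = [
    Test(input="What is 1 + 1? Reply with just the number.", expected=["2", "2.0"]),
]

# Run the test function against every test and capture the predictions.
tester = Tester(run_my_model)
tester.add_tests(tests)
predictions = tester.run()

# Load the predictions into an evaluator.
evaluator = StringMatchEvaluator()
evaluator.load(predictions)
```

Running the evaluator then produces the results: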
```python
results = evaluator.run()
print(results)
```

## ☕️ Commands

- `bench add`: Add a new test to a suite.
- `bench tests`: List all tests in a suite.
- `bench run`: Run all or target test suites.
- `bench eval`: Run the evaluation of an existing test run.

## 🙌 Contribute

BenchLLM is developed for Python 3.10, although it may work with other Python versions as well. We recommend a Python 3.10 environment, which you can set up with conda or any other environment manager.
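A typical conda setup might look like this (the environment name is an illustrative choice):

```bash
$ conda create --name benchllm python=3.10
$ conda activate benchllm
```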

Contribution steps:

1. Fork the repository.
2. Create a new branch for your changes.
3. Make your changes.
4. Test your changes.
5. Submit a pull request.

We adhere to the PEP8 style guide. Please follow this guide when contributing.

If you need any support, feel free to open an issue on our GitHub page.
