This repository contains the code for the CircleCI Evals orb.
The Evals orb simplifies the definition and execution of evaluation jobs using popular third-party tools, and generates reports of evaluation results.
Given the volatile nature of evaluations, jobs orchestrated through this orb do not halt the pipeline when an evaluation fails. This approach ensures that the inherent flakiness of evaluations does not disrupt the development cycle. Instead, a summary of the evaluation results is created and presented:
- As an artifact within the CircleCI user interface
- As a comment on the corresponding GitHub pull request (only available for GitHub projects integrated through OAuth)
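To use the orb in a project, declare it in the `orbs` stanza of your `.circleci/config.yml`. The minimal sketch below assumes the orb is published as `circleci/evals` and uses a placeholder version; substitute the exact orb reference and latest version from the orb registry.

```yaml
version: 2.1

orbs:
  # Assumed orb reference and placeholder version; replace with the
  # actual name/version listed in the CircleCI orb registry.
  evals: circleci/evals@1.0.0
```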
To set up the integration, navigate to Project Settings > LLMOps and fill out the form by clicking Set up Integration. This will create a context with environment variables for the credentials you've set up above. Take note of the context name (e.g. `ai-llm-eval-examples`), as it will be used to set the `context` value in the CircleCI configuration file.
💡 You can also optionally store a `GITHUB_TOKEN` as an environment variable on this context if you'd like your pipelines to post summarized eval job results as comments on GitHub pull requests.
**Warning**: Currently, this feature is available only to GitHub projects integrated through OAuth. To find out which GitHub account type you have, refer to the GitHub OAuth integration page of our docs.
In order to post comments to GitHub pull requests, you will need to create an environment variable named `GITHUB_TOKEN` containing a GitHub personal access token with `repo` scope access. Once created, add `GITHUB_TOKEN` as an environment variable on the same context you created as part of the LLMOps integration via Project Settings > LLMOps. You can also access this context via Organization Settings > Contexts.
You will then need to add the context key to any job that requires access to it, as follows:
```yaml
# WORKFLOWS
workflows:
  braintrust-evals:
    when: << pipeline.parameters.run-braintrust-evals >>
    jobs:
      - run-braintrust-evals:
          context:
            - ai-llm-eval-examples # Replace this with your context name
  langsmith-evals:
    when: << pipeline.parameters.run-langsmith-evals >>
    jobs:
      - run-langsmith-evals:
          context:
            - ai-llm-eval-examples # Replace this with your context name
```
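Note that the `when:` clauses above reference pipeline parameters (`run-braintrust-evals`, `run-langsmith-evals`) that must be declared at the top level of the config. A minimal sketch, assuming simple boolean flags that whatever triggers your pipelines sets to `true`:

```yaml
# Top-level pipeline parameters referenced by the workflows above.
# The parameter names are illustrative; keep them in sync with the
# names used in your `when:` clauses.
parameters:
  run-braintrust-evals:
    type: boolean
    default: false
  run-langsmith-evals:
    type: boolean
    default: false
```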
The Evals orb accepts the following parameters; a usage sketch follows the list. Some parameters are optional depending on the eval platform being used.
- `circle_pipeline_id`: CircleCI pipeline ID
- `cmd`: Command to run the evaluation
- `eval_platform`: Evaluation platform (e.g. `braintrust`, `langsmith`, or `custom`; default: `custom`)
- `evals_result_location`: Location to save evaluation results (default: `./results`)
- `shell`: Shell to use (default: `/bin/bash`). This parameter only applies when `eval_platform` is not provided or is set to `custom`.
- `braintrust_experiment_name` (optional): Braintrust experiment name
  - If no value is provided, an experiment name will be auto-generated based on an MD5 hash of `<CIRCLE_PIPELINE_ID>_<CIRCLE_WORKFLOW_ID>`.
- `langsmith_endpoint` (optional): LangSmith API endpoint (default: `https://api.smith.langchain.com`)
- `langsmith_experiment_name` (optional): LangSmith experiment name
  - If no value is provided, an experiment name will be auto-generated based on an MD5 hash of `<CIRCLE_PIPELINE_ID>_<CIRCLE_WORKFLOW_ID>`.
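To illustrate how these parameters fit together, the sketch below defines a job that invokes the orb for a Braintrust evaluation. It assumes the orb is imported under the `evals` alias and exposes an `eval` command; the command name, Docker image, and eval script shown here are assumptions, so check the orb registry and the llm-eval-examples repo for exact usage.

```yaml
jobs:
  run-braintrust-evals:
    docker:
      - image: cimg/python:3.12  # any image with your eval dependencies installed
    steps:
      - checkout
      # Assumed command name (`evals/eval`); verify against the orb registry.
      - evals/eval:
          circle_pipeline_id: << pipeline.id >>
          eval_platform: braintrust
          cmd: python ./evals/run_braintrust_eval.py  # hypothetical eval script
          evals_result_location: ./results
```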
For full config usage guidelines, see the evals orb documentation.
For evals orb usage examples, check out the llm-eval-examples repo.
View the FAQ in the wiki.
We welcome issues to and pull requests against this repository!
For further questions/comments about this or other orbs, visit the CircleCI Orbs discussion forum.