Welcome to the docs for the LM Evaluation Harness!
- To learn about the library's public interface, and how to run evaluations from the command line or from within an external library, see the Interface.
- To learn how to add support for a new library, API, or model type, along with a brief explanation of the different ways to evaluate an LM, see the Model Guide.
- For a detailed description of how to extend the library to support new model classes served over an API, see the API Guide.
- For a crash course on adding new tasks to the library, see our New Task Guide.
- To learn more about the advanced task configuration options that the Eval Harness supports, see the Task Configuration Guide.