CivAgent is an LLM-based Human-like Agent acting as a Digital Player within the Strategy Game Unciv.
Updated Jun 5, 2024
Visualize LLM Evaluations for OpenAI Assistants
The prompt engineering, prompt management, and prompt evaluation tool for Ruby.
A framework for automatically manipulating and evaluating the political ideology of LLMs with two ideology tests: Wahl-O-Mat and Political Compass Test.
Open-LLM-Leaderboard: Open-Style Question Evaluation. Paper at https://arxiv.org/abs/2406.07545
A prompt collection for testing and evaluation of LLMs.
The prompt engineering, prompt management, and prompt evaluation tool for Java.
This repository contains the lab work for the Coursera course "Generative AI with Large Language Models".
A compilation of referenced benchmark metrics to evaluate different aspects of knowledge for Large Language Models.
Summary Evaluation Tool
The prompt engineering, prompt management, and prompt evaluation tool for Go.
Upload, score, and visually compare multiple LLM-graded summaries simultaneously!
The prompt engineering, prompt management, and prompt evaluation tool for Kotlin.
Evaluate LLMs and RAG pipelines on custom datasets using LangChain, with custom reasoning functions.
LLM Evaluation