Template for an AI application that extracts job information from a job description using OpenAI functions and LangChain.
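A minimal sketch of the extraction pattern such a template relies on, using LangChain's structured-output wrapper over OpenAI function calling (the JobPosting schema and its field names are illustrative, not the template's actual schema):

```python
# Hypothetical sketch: structured extraction with OpenAI function calling via
# LangChain. The JobPosting schema below is an invented example, not the
# template's real one. Assumes langchain-openai is installed and
# OPENAI_API_KEY is set in the environment.
from typing import List, Optional

from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field


class JobPosting(BaseModel):
    """Structured fields to pull out of a free-text job description."""
    title: str = Field(description="Job title")
    company: Optional[str] = Field(None, description="Hiring company, if stated")
    location: Optional[str] = Field(None, description="Work location or 'remote'")
    skills: List[str] = Field(default_factory=list, description="Required skills")


llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# with_structured_output drives OpenAI function/tool calling under the hood
# and parses the model's response back into the Pydantic model.
extractor = llm.with_structured_output(JobPosting)

job = extractor.invoke(
    "Acme Corp is hiring a Senior Python Engineer in Berlin. "
    "Must know Python, SQL, and Kubernetes."
)
print(job.title, job.skills)
```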
The implementation for the EMNLP 2023 paper "Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators".
A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
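Judge-style tools like this typically prompt a GPT model with a rubric and request per-aspect scores plus justifications; a minimal sketch with the OpenAI SDK (the aspect list, prompt wording, and JSON shape are assumptions, not this tool's actual interface):

```python
# Illustrative multi-aspect GPT-as-judge scoring; the rubric, aspects, and
# response schema are invented for this sketch.
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

ASPECTS = ["fluency", "coherence", "factuality", "helpfulness"]


def judge(question: str, answer: str) -> dict:
    prompt = (
        "Rate the answer on each aspect from 1 (poor) to 5 (excellent) and "
        "give a one-sentence justification per aspect. Respond as JSON: "
        '{"<aspect>": {"score": int, "reason": str}, ...}\n'
        f"Aspects: {', '.join(ASPECTS)}\n\nQuestion: {question}\nAnswer: {answer}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
        temperature=0,
    )
    return json.loads(resp.choices[0].message.content)


scores = judge("What causes tides?", "Mostly the Moon's gravity, plus the Sun's.")
print(scores)
```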
[ICLR 2024] Jaiswal, A., Gan, Z., Du, X., Zhang, B., Wang, Z., & Yang, Y. Compressing LLMs: The Truth Is Rarely Pure and Never Simple.
Evaluating LLMs with CommonGen-Lite
Link your OpenAI Assistants to a custom store + Evaluate Assistant responses
[Personalize@EACL 2024] LLM Agents in Interaction: Measuring Personality Consistency and Linguistic Alignment in Interacting Populations of Large Language Models.
FactScoreLite is an implementation of the FactScore metric, designed for detailed accuracy assessment in text generation. This package builds upon the framework provided by the original FactScore repository, which is no longer maintained and contains outdated functions.
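The metric itself is a simple aggregation: the share of a generation's atomic facts that the knowledge source supports. A toy illustration of that step (the fact decomposition and support checks, which the package performs with LLM calls, are hard-coded stand-ins here):

```python
# Toy illustration of the FactScore aggregation:
#   score = supported atomic facts / all atomic facts.
# In practice decomposition and support judgments come from LLM calls;
# here they are stand-ins for clarity.
def fact_score(atomic_facts: list[str], is_supported) -> float:
    """Fraction of atomic facts judged supported by the knowledge source."""
    if not atomic_facts:
        return 0.0
    return sum(is_supported(f) for f in atomic_facts) / len(atomic_facts)


facts = [
    "Marie Curie won two Nobel Prizes.",
    "Marie Curie was born in Berlin.",  # unsupported: she was born in Warsaw
]
knowledge = {"Marie Curie won two Nobel Prizes."}
print(fact_score(facts, lambda f: f in knowledge))  # 0.5
```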
A framework for automatically manipulating and evaluating the political ideology of LLMs with two ideology tests: the Wahl-O-Mat and the Political Compass Test.
EnsembleX utilizes the Knapsack algorithm to optimize Large Language Model (LLM) ensembles for quality-cost trade-offs, offering tailored suggestions across various domains through a Streamlit dashboard visualization.
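The quality-cost trade-off maps onto the classic 0/1 knapsack problem: choose models that maximize aggregate quality subject to a cost budget. A minimal dynamic-programming sketch of that idea (model names, quality scores, and costs are invented):

```python
# Toy 0/1 knapsack over candidate models: maximize total quality score within
# a cost budget. All names and numbers below are invented for illustration.
def best_ensemble(models, budget):
    # models: list of (name, quality, cost); costs as integers, e.g. cents/1k calls
    n = len(models)
    dp = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i, (_, quality, cost) in enumerate(models, start=1):
        for b in range(budget + 1):
            dp[i][b] = dp[i - 1][b]
            if cost <= b:
                dp[i][b] = max(dp[i][b], dp[i - 1][b - cost] + quality)
    # Backtrack to recover the chosen subset.
    chosen, b = [], budget
    for i in range(n, 0, -1):
        if dp[i][b] != dp[i - 1][b]:
            name, _, cost = models[i - 1]
            chosen.append(name)
            b -= cost
    return dp[n][budget], chosen


models = [("gpt-4o", 0.90, 500), ("claude-haiku", 0.78, 80), ("llama-8b", 0.70, 20)]
print(best_ensemble(models, budget=120))  # highest-quality subset within budget
```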
FM-Leaderboard-er lets you create a leaderboard to find the best LLM/prompt for your own business use case, based on your data, tasks, and prompts.
DCR-Consistency: Divide-Conquer-Reasoning for Consistency Evaluation and Improvement of Large Language Models
Code for "Prediction-Powered Ranking of Large Language Models", Arxiv 2024.
Large Model Evaluation Experiments
Framework for LLM evaluation, guardrails and security
Summary Evaluation Tool
Find better generation parameters for your LLM
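Concretely, that usually means scoring candidate decoding settings against a small eval set; a hedged grid-search sketch over temperature and top_p (the generate and score callables are placeholders you supply, not part of any specific library):

```python
# Illustrative grid search over decoding parameters. `generate` and `score`
# are hypothetical callables you provide: generate(prompt, temperature, top_p)
# returns model output, score(prompt, output) returns a task metric.
import itertools


def tune(prompts, generate, score,
         temperatures=(0.0, 0.3, 0.7, 1.0), top_ps=(0.8, 0.95, 1.0)):
    """Return the (temperature, top_p) pair with the best mean score."""
    best_params, best_score = None, float("-inf")
    for t, p in itertools.product(temperatures, top_ps):
        avg = sum(score(q, generate(q, t, p)) for q in prompts) / len(prompts)
        if avg > best_score:
            best_params, best_score = (t, p), avg
    return best_params, best_score
```

Any LLM client fits this shape; swapping the grid for random or Bayesian search changes only the loop.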
A framework for building scenario-simulation projects in which both human and LLM-based agents can participate, with a user-friendly web UI to visualize simulations and support for automatic evaluation at the agent-action level.
Superpipe - optimized LLM pipelines for structured data