Rapid Evaluation Framework for CMIP simulations
Updated Jun 11, 2025 - Python
Open source code for AIOpsServing
Machine Learning Model using Decision Trees on US Voting Dataset
This repo contains a study of the performance of LLMs on STS (Semantic Textual Similarity) data.
Interactive Python toolkit for exploring, testing, and benchmarking LLM tokenization, prompt behaviors, and sequence efficiency in a safe, modular sandbox environment.
An open-source evaluation suite for testing LLMs on refusal handling, tone control, and reasoning. Built to explore model behavior across nuanced user cases.