
SWE-bench

Organization for maintaining the SWE-bench/agent projects


This organization contains the source code for SWE-bench, a benchmark for evaluating AI systems on real-world GitHub issues.
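The benchmark's evaluation flow consumes a "predictions" file: one record per GitHub issue ("instance"), carrying the model's proposed fix as a unified diff. A minimal sketch of producing such a file, assuming the key names (`instance_id`, `model_name_or_path`, `model_patch`) described in the SWE-bench README; the instance ID shown is illustrative only:

```python
import json
import os
import tempfile

# Sketch of a SWE-bench-style predictions file. Key names assume the
# format documented in the SWE-bench README; the instance ID and patch
# below are placeholders, not real benchmark data.
predictions = [
    {
        "instance_id": "astropy__astropy-12907",   # illustrative instance ID
        "model_name_or_path": "my-model",          # label for the system under test
        "model_patch": "diff --git a/demo.py b/demo.py\n",  # model's proposed fix
    }
]

# Write the records as JSON, the shape the evaluation harness reads.
path = os.path.join(tempfile.mkdtemp(), "preds.json")
with open(path, "w") as f:
    json.dump(predictions, f, indent=2)
```

The harness in the main repository then applies each patch in an isolated environment and reruns the issue's test suite; see the repository README for the exact command-line invocation.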

Use the repositories in this organization to...

Also check out these related organizations:

  • SWE-bench-repos: Mirror clones of the repositories used for SWE-bench-style evaluations.
  • SWE-agent: Solve GitHub issues automatically with a language-model-powered agent!

Pinned

  1. SWE-bench Public

    SWE-bench [Multimodal]: Can Language Models Resolve Real-World GitHub Issues?

    Python · 2.9k stars · 485 forks

  2. experiments Public

    Open-sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task.

    Shell · 168 stars · 174 forks

  3. sb-cli Public

    Run SWE-bench evaluations remotely

    Python · 10 stars

  4. swe-bench.github.io Public

    Landing page + leaderboard for the SWE-bench benchmark

    HTML · 4 stars · 6 forks

Repositories

Showing 6 of 6 repositories
  • SWE-bench Public

    SWE-bench [Multimodal]: Can Language Models Resolve Real-World GitHub Issues?

    Python · 2,867 stars · MIT license · 485 forks · 36 issues · 8 pull requests · Updated Apr 22, 2025
  • sb-cli Public

    Run SWE-bench evaluations remotely

    Python · 10 stars · MIT license · 0 forks · 3 issues · 0 pull requests · Updated Apr 18, 2025
  • swe-bench.github.io Public

    Landing page + leaderboard for the SWE-bench benchmark

    HTML · 4 stars · 6 forks · 2 issues · 2 pull requests · Updated Mar 31, 2025
  • experiments Public

    Open-sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task.

    Shell · 168 stars · 174 forks · 6 issues · 13 pull requests · Updated Mar 31, 2025
  • .github Public
    0 stars · 0 forks · 0 issues · 0 pull requests · Updated Feb 26, 2025
  • humanevalfix-results Public archive

    Evaluation data + results for SWE-agent inference on HumanEvalFix task

    Jupyter Notebook · 0 stars · 0 forks · 0 issues · 0 pull requests · Updated Jul 12, 2024
