---
type: Glossary Term
title: SWE-bench
description: "SWE-bench is a standardized benchmark for evaluating how well AI systems can solve real-world software engineering tasks. The benchmark consists of over 2,000 a"
resource: "https://www.contextstudios.ai/glossary/swe-bench"
category: engineering
language: en
timestamp: "2026-04-07T12:11:07.689Z"
---

# SWE-bench

SWE-bench is a standardized benchmark for evaluating how well AI systems can solve real-world software engineering tasks. The benchmark consists of over 2,000 actual GitHub issues from popular open-source projects like Django, Flask, and scikit-learn. Each task includes a problem description, the relevant source code, and automated tests to verify the solution. AI models must analyze the code, identify the root cause of the issue, and generate a working patch — just like a human developer would. SWE-bench has become the primary benchmark for AI coding agents. Current top scores exceed 80 percent (Claude Opus 4.6 achieves 80.8%), demonstrating that AI agents are increasingly capable of solving complex software problems autonomously. Variants like SWE-bench Verified use human-validated subsets for even more reliable results.
