Official Leaderboard
Workspace-Bench compares AI agents on realistic workspace tasks with large-scale file dependencies. Scores below combine task-level success and rubric-level grading. Rows marked Verified correspond to paper-reported or maintainer-verified results.
Public Lite Rankings
Rows for public Workspace-Bench-Lite harness/model combinations, transcribed from the official repository figure.
Workspace-Bench Leaderboards
Framework × Model Matrix
Matrix view of public Workspace-Bench-Lite rubric pass rates. Blank cells mean the public figure does not expose that framework/model combination.
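As a rough illustration of how such a matrix can be assembled, the sketch below pivots a flat list of (framework, model, pass rate) rows into a grid and leaves a cell blank when no row exists for that combination. The framework and model names, the numbers, and the helper functions `build_matrix` and `render_matrix` are all hypothetical placeholders, not leaderboard data or part of any Workspace-Bench tooling.

```python
# Hypothetical rows: (framework, model, rubric pass rate in percent).
# Placeholder names and numbers only; these are NOT leaderboard results.
matrix_rows = [
    ("framework_a", "model_x", 42.0),
    ("framework_a", "model_y", 37.5),
    ("framework_b", "model_x", 40.0),
]


def build_matrix(rows):
    """Pivot flat rows into {framework: {model: rate or None}}."""
    frameworks = sorted({f for f, _, _ in rows})
    models = sorted({m for _, m, _ in rows})
    cells = {(f, m): rate for f, m, rate in rows}
    # dict.get returns None for combinations that have no row.
    matrix = {f: {m: cells.get((f, m)) for m in models} for f in frameworks}
    return frameworks, models, matrix


def render_matrix(frameworks, models, matrix):
    """Render the grid as tab-separated text; missing cells stay blank."""
    lines = ["\t".join(["framework"] + models)]
    for f in frameworks:
        cells = ["" if matrix[f][m] is None else f"{matrix[f][m]:.1f}" for m in models]
        lines.append("\t".join([f] + cells))
    return "\n".join(lines)


frameworks, models, matrix = build_matrix(matrix_rows)
print(render_matrix(frameworks, models, matrix))
```

Keeping a missing combination as `None` rather than `0.0` preserves the distinction between "not reported" and "scored zero", which is what a blank cell is meant to convey.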
Threshold Views
Compare how many public Lite system combinations clear different rubric pass-rate thresholds.
Public threshold summary
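A threshold summary of this kind can be sketched as a simple count per cutoff. The example below assumes that "clearing" a threshold means a rubric pass rate greater than or equal to it; the page does not state whether the comparison is inclusive. The pass rates, the thresholds, and the function name `count_clearing` are hypothetical.

```python
def count_clearing(pass_rates, thresholds):
    """Count how many pass rates meet or exceed each threshold.

    Assumption: a system "clears" a threshold when rate >= threshold.
    """
    return {t: sum(1 for rate in pass_rates if rate >= t) for t in thresholds}


# Placeholder pass rates in percent; NOT leaderboard results.
example_rates = [42.0, 37.5, 40.0, 55.0]
example_thresholds = [30, 40, 50]

threshold_summary = count_clearing(example_rates, example_thresholds)
print(threshold_summary)  # {30: 4, 40: 3, 50: 1}
```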
Composition Analysis
These views are drawn from the official benchmark distribution figure and break the benchmark down by composition, following the way research leaderboards are commonly compared.
Leaderboard Analysis
The secondary analysis views below do not repeat the headline ranking chart. Instead, they summarize the released public leaderboard by framework, model family, and threshold distribution.
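One way to produce per-framework or per-model-family summaries is to group rows by the chosen key and report the count, best, and mean pass rate for each group. This is a minimal sketch under stated assumptions: the rows, the family labels, and the helper `summarize_by` are hypothetical, and the page does not say which statistics its own views use.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (framework, model_family, rubric pass rate in percent).
# Placeholder values only; these are NOT leaderboard results.
analysis_rows = [
    ("framework_a", "family_1", 42.0),
    ("framework_a", "family_2", 37.5),
    ("framework_b", "family_1", 40.0),
]


def summarize_by(rows, key_index):
    """Group rows by the column at key_index and summarize the pass rates."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[2])
    return {
        key: {"n": len(rates), "best": max(rates), "mean": mean(rates)}
        for key, rates in groups.items()
    }


by_framework = summarize_by(analysis_rows, 0)  # group on the framework column
by_family = summarize_by(analysis_rows, 1)     # group on the model-family column
print(by_framework)
print(by_family)
```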