Representative Tasks
Task intelligence feed for Workspace-Bench examples. The page shows how hidden dependencies, evidence path construction, and rubric-based grading shape the benchmark.
Representative examples
| Signal | What the agent must do | Why it matters |
|---|---|---|
| Hidden Dependencies | Find relevant files that are not named in the instruction. | Prevents shallow root-file lookup from scoring well. |
| Evidence Path | Connect task instruction, workspace files, implicit dependencies, and rubrics. | Measures workspace learning rather than isolated QA. |
| Rubric Examples | Satisfy fine-grained grading criteria for correctness and completeness. | Captures partial failures that binary pass/fail would hide. |
Hidden Dependencies
Important context can live in comments, archives, policy appendices, historical notes, formulas, or metadata.
Rubric Examples
Rubrics check evidence use, policy constraints, conflict resolution, auditability, and final deliverable quality.
Benchmark Role
Examples expose the specific kinds of workspace failures that an aggregate score compresses into a single number.
Evidence path. A strong run identifies the task instruction, searches likely dependency
locations, resolves conflicts across files, and maps the final deliverable to rubric criteria.
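The steps in an evidence path can be sketched in a few lines of Python. This is an illustrative sketch only, not the benchmark's implementation: the `EvidencePath` record, the `build_evidence_path` function, the keyword-matching heuristic, and the sample workspace are all hypothetical, and conflict resolution across files is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class EvidencePath:
    """Hypothetical record linking an instruction to files and rubric criteria."""
    instruction: str
    named_files: list = field(default_factory=list)
    hidden_dependencies: list = field(default_factory=list)
    rubric_links: dict = field(default_factory=dict)


def build_evidence_path(instruction, workspace, rubric_keywords):
    """workspace maps path -> text; rubric_keywords maps criterion -> keywords."""
    path = EvidencePath(instruction=instruction)
    lowered = instruction.lower()

    # Step 1: files the instruction names explicitly.
    for name in sorted(workspace):
        if name.lower() in lowered:
            path.named_files.append(name)

    # Step 2 (assumed heuristic): a file is a hidden dependency if it is not
    # named in the instruction but its text mentions a file that is.
    for name in sorted(workspace):
        if name in path.named_files:
            continue
        text = workspace[name].lower()
        if any(named.lower() in text for named in path.named_files):
            path.hidden_dependencies.append(name)

    # Step 3: map each rubric criterion to the evidence files that mention it.
    evidence = path.named_files + path.hidden_dependencies
    for criterion, keywords in rubric_keywords.items():
        path.rubric_links[criterion] = [
            name for name in evidence
            if any(k.lower() in workspace[name].lower() for k in keywords)
        ]
    return path


# Made-up workspace for illustration only.
workspace = {
    "budget.csv": "team,amount\nops,1200",
    "notes/archive.txt": "Reminder: budget.csv excludes the travel cap from the policy appendix.",
    "readme.md": "General onboarding notes.",
}
path = build_evidence_path(
    "Summarize spending in budget.csv.",
    workspace,
    {"policy adherence": ["travel cap"], "correctness": ["amount"]},
)
print(path.hidden_dependencies)  # ['notes/archive.txt']
```

In this toy workspace the archive note is never named in the instruction, yet it is the only file that supports the policy-adherence criterion, which is the situation the Hidden Dependencies signal is meant to capture.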
Rubric examples. A task can be partially correct while still failing completeness,
traceability, or policy-adherence checks.
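A minimal sketch of how per-criterion grading surfaces partial failures, assuming equally weighted pass/fail criteria. The `Criterion` class, the `grade` function, and the equal weighting are assumptions made for illustration; the page does not specify how Workspace-Bench weights or aggregates its rubrics.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """Hypothetical rubric item with a pass/fail outcome."""
    name: str
    passed: bool


def grade(criteria):
    """Return (fraction of criteria passed, names of failed criteria)."""
    if not criteria:
        raise ValueError("at least one criterion is required")
    failed = [c.name for c in criteria if not c.passed]
    score = (len(criteria) - len(failed)) / len(criteria)
    return score, failed


# Made-up outcomes: the answer is correct but incomplete and violates policy.
results = [
    Criterion("correctness", True),
    Criterion("completeness", False),
    Criterion("traceability", True),
    Criterion("policy adherence", False),
]
score, failed = grade(results)
print(score, failed)  # 0.5 ['completeness', 'policy adherence']
```

A binary pass/fail grade would record this run only as a failure, while the per-criterion result shows that correctness and traceability held and names the two checks that did not.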
Example Feed
Example cards are generated from the same local Workspace-Bench data bundle used by the rest of the site, so the page works both when opened directly as a local file and when served from a static server.