Workspace-Bench Representative Tasks

Task intelligence feed

Representative examples

Signal	What the agent must do	Why it matters
Hidden Dependencies	Find relevant files that are not named in the instruction.	Prevents shallow root-file lookup from scoring well.
Evidence Path	Connect task instruction, workspace files, implicit dependencies, and rubrics.	Measures workspace learning rather than isolated QA.
Rubric Examples	Satisfy fine-grained grading criteria for correctness and completeness.	Captures partial failures that binary pass/fail would hide.

Hidden Dependencies

Important context can live in comments, archives, policy appendices, historical notes, formulas, or metadata.
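As a rough illustration of what a hidden-dependency search involves, the sketch below walks a workspace and lists files that mention a task keyword but are not named in the instruction. The function name, the keyword list, and the plain substring match are assumptions made for this example, not part of Workspace-Bench. It also reads files only as text, so it would miss context stored in archives, spreadsheet formulas, or file metadata.

```python
import tempfile
from pathlib import Path


def find_hidden_dependencies(workspace: Path, instruction: str, keywords: list[str]) -> list[str]:
    """Return relative paths of files that mention a keyword but are not named in the instruction.

    Hypothetical helper: a plain substring match stands in for whatever search an agent uses.
    """
    hits = []
    for path in sorted(workspace.rglob("*")):
        if not path.is_file():
            continue
        if path.name in instruction:
            continue  # explicitly named in the instruction, so not "hidden"
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        if any(keyword.lower() in text for keyword in keywords):
            hits.append(path.relative_to(workspace).as_posix())
    return hits


if __name__ == "__main__":
    # Build a tiny throwaway workspace to show the idea.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "archive").mkdir()
        (root / "report.md").write_text("Refund limit: TBD\n", encoding="utf-8")
        (root / "archive" / "policy_appendix.txt").write_text(
            "The refund limit is set per region.\n", encoding="utf-8"
        )
        instruction = "Update report.md with the refund limit."
        print(find_hidden_dependencies(root, instruction, ["refund limit"]))
```

In the demo, `report.md` is skipped because the instruction names it, while the appendix in `archive/` is reported as a candidate dependency.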

Rubric Examples

Rubrics check evidence use, policy constraints, conflict resolution, auditability, and final deliverable quality.

Benchmark Role

Examples clarify the kinds of workspace failures that aggregate scores compress into a single number.

Evidence path. A strong run identifies the task instruction, searches likely dependency locations, resolves conflicts across files, and maps the final deliverable to rubric criteria.
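One way to picture the last step of an evidence path is a mapping from each rubric criterion to the workspace files that support it. The sketch below flags criteria with no recorded evidence. The criterion names, file paths, and the `trace_gaps` helper are hypothetical and chosen only for illustration.

```python
def trace_gaps(rubric_criteria: list[str], evidence_by_criterion: dict[str, list[str]]) -> list[str]:
    """Return rubric criteria that have no supporting evidence file recorded."""
    return [c for c in rubric_criteria if not evidence_by_criterion.get(c)]


# Hypothetical rubric and evidence trace for a single run.
rubric = ["uses current policy", "resolves conflicting figures", "cites source files"]
evidence = {
    "uses current policy": ["policies/appendix_b.txt"],
    "resolves conflicting figures": [],  # the conflict was never traced to a file
}

gaps = trace_gaps(rubric, evidence)
print(gaps)
```

A criterion with an empty or missing entry marks a point where the deliverable is not traceable back to the workspace.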

Rubric examples. A task can be partially correct while still failing completeness, traceability, or policy-adherence checks.
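The difference between a binary verdict and a per-criterion rubric can be sketched as follows. The `Criterion` class, the criterion names, and the equal weighting are assumptions for this example, not the benchmark's actual scoring scheme.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    name: str
    passed: bool


def summarize(criteria: list[Criterion]) -> dict:
    """Summarize rubric results: fraction passed, failed criteria, and the binary verdict."""
    passed = sum(c.passed for c in criteria)
    return {
        "score": passed / len(criteria),  # equal weights assumed for illustration
        "failed": [c.name for c in criteria if not c.passed],
        "all_passed": passed == len(criteria),
    }


# Hypothetical run: the answer is correct but incomplete and not traceable.
results = [
    Criterion("correctness", True),
    Criterion("completeness", False),
    Criterion("traceability", False),
    Criterion("policy adherence", True),
]
summary = summarize(results)
print(summary)
```

A binary check would report only that the run failed. The per-criterion summary shows that it was correct and policy-compliant but fell short on completeness and traceability.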

Example Feed

Example cards are generated from the same local Workspace-Bench data bundle used by the rest of the site, so the page works both when opened directly as a local file and when served from a static server.