Task intelligence feed
Representative examples
Signal | What the agent must do | Why it matters
Hidden Dependencies | Find relevant files that are not named in the instruction. | Prevents shallow root-file lookup from scoring well.
Evidence Path | Connect task instruction, workspace files, implicit dependencies, and rubrics. | Measures workspace learning rather than isolated QA.
Rubric Examples | Satisfy fine-grained grading criteria for correctness and completeness. | Captures partial failures that binary pass/fail would hide.
Hidden Dependencies

Important context can live in comments, archives, policy appendices, historical notes, formulas, or metadata.
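As a rough illustration of why hidden dependencies defeat a simple root-file lookup, the sketch below scans a toy workspace for a task term, descending into a `.zip` archive, and keeps only the files the instruction never names. The function names, the in-memory workspace layout, and the plain substring matching are assumptions made for illustration. They are not how Workspace-Bench tasks or agents are actually implemented.

```python
import io
import zipfile
from pathlib import PurePosixPath


def iter_documents(workspace: dict[str, bytes]):
    """Yield (path, text) pairs, descending one level into .zip archives."""
    for path, data in workspace.items():
        if path.endswith(".zip"):
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                for member in archive.namelist():
                    text = archive.read(member).decode("utf-8", errors="ignore")
                    yield f"{path}!{member}", text
        else:
            yield path, data.decode("utf-8", errors="ignore")


def find_hidden_dependencies(instruction: str, workspace: dict[str, bytes], terms: list[str]) -> list[str]:
    """Return paths that mention a task term but are never named in the instruction."""
    hits = []
    for path, text in iter_documents(workspace):
        file_name = PurePosixPath(path.split("!")[-1]).name
        if file_name in instruction:
            continue  # explicitly named, so not a hidden dependency
        lowered = text.lower()
        if any(term.lower() in lowered for term in terms):
            hits.append(path)
    return sorted(hits)


# Hypothetical toy workspace: the policy appendix and an archived note are never named.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("notes/2021_history.txt", "Discount cap was lowered to 10% in 2021.")

workspace = {
    "pricing.csv": b"sku,price\nA,100\n",
    "policy/appendix_c.md": b"# Appendix C\nThe discount cap applies to all SKUs.",
    "archive/old_notes.zip": buffer.getvalue(),
    "README.md": b"Project overview.",
}
instruction = "Update pricing.csv so every SKU respects the discount policy."
print(find_hidden_dependencies(instruction, workspace, ["discount cap"]))
# ['archive/old_notes.zip!notes/2021_history.txt', 'policy/appendix_c.md']
```

The matching is deliberately naive. It only shows that the decisive evidence can sit in an appendix or an archived note that a lookup restricted to named or top-level files would never open.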

Rubric Examples

Rubrics check evidence use, policy constraints, conflict resolution, auditability, and final deliverable quality.

Benchmark Role

Examples clarify the kinds of workspace failures that aggregate scores compress into a single number.

Evidence path. A strong run identifies the task instruction, searches likely dependency locations, resolves conflicts across files, and maps the final deliverable to rubric criteria.
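The conflict-resolution step of an evidence path can be sketched as follows, assuming each file has already been reduced to (field, value, source) records. The `PRECEDENCE` order, the `Evidence` record, and `resolve` are hypothetical; a real task may define precedence through its own policy documents. The returned trace lists every source consulted for each field, which is the kind of record an auditability check could inspect.

```python
from dataclasses import dataclass

# Hypothetical precedence order (lower wins when files disagree); not taken from Workspace-Bench.
PRECEDENCE = {"policy": 0, "data": 1, "comment": 2, "historical_note": 3}


@dataclass(frozen=True)
class Evidence:
    field: str   # the fact being established, e.g. "discount_cap"
    value: str
    source: str  # workspace path the value came from
    kind: str    # one of the PRECEDENCE keys


def resolve(evidence: list[Evidence]):
    """Choose one value per field, flag disagreements, and keep an audit trail."""
    resolved: dict[str, str] = {}
    trace: dict[str, list[str]] = {}
    seen_values: dict[str, set[str]] = {}
    # Stable sort: the highest-precedence source for each field is visited first.
    for item in sorted(evidence, key=lambda e: PRECEDENCE[e.kind]):
        resolved.setdefault(item.field, item.value)
        trace.setdefault(item.field, []).append(item.source)
        seen_values.setdefault(item.field, set()).add(item.value)
    conflicts = {field for field, values in seen_values.items() if len(values) > 1}
    return resolved, conflicts, trace


evidence = [
    Evidence("discount_cap", "15%", "notes/2021_history.txt", "historical_note"),
    Evidence("discount_cap", "10%", "policy/appendix_c.md", "policy"),
    Evidence("region", "EU", "pricing.csv", "data"),
]
values, conflicts, trace = resolve(evidence)
print(values)                 # {'discount_cap': '10%', 'region': 'EU'}
print(conflicts)              # {'discount_cap'}
print(trace["discount_cap"])  # ['policy/appendix_c.md', 'notes/2021_history.txt']
```

Keeping the losing sources in the trace, rather than discarding them, lets a grader confirm that the conflict was noticed and not merely avoided.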
Rubric examples. A task can be partially correct while still failing completeness, traceability, or policy-adherence checks.
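A minimal sketch of how rubric grading exposes partial failures that binary pass/fail hides, assuming a simple weighted-criteria scheme. The criterion names echo the checks listed for rubrics, but the weights, `Criterion`, `rubric_score`, and `binary_pass` are illustrative assumptions rather than Workspace-Bench's scoring code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: float  # hypothetical weighting, not Workspace-Bench's published scheme
    passed: bool


def rubric_score(criteria: list[Criterion]) -> float:
    """Weighted fraction of rubric criteria satisfied (partial credit)."""
    total = sum(c.weight for c in criteria)
    earned = sum(c.weight for c in criteria if c.passed)
    return earned / total if total else 0.0


def binary_pass(criteria: list[Criterion]) -> bool:
    """All-or-nothing grading: one failed criterion fails the whole task."""
    return all(c.passed for c in criteria)


run = [
    Criterion("correctness", 4.0, True),
    Criterion("completeness", 2.0, False),
    Criterion("traceability", 1.0, True),
    Criterion("policy_adherence", 1.0, True),
]
print(rubric_score(run))  # 0.75
print(binary_pass(run))   # False
print([c.name for c in run if not c.passed])  # ['completeness']
```

Both views are computed from the same run: the binary result records only a failure, while the rubric view shows a largely correct deliverable that misses on completeness.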

Example Feed

Example cards are generated from the same local Workspace-Bench data bundle used by the rest of the site, so the page works both when opened directly as a local file and when served from a static server.