Submission
Submit results with enough metadata for maintainers to reproduce the run, inspect the deliverables, and verify the reported score. Verified Results are expected to include logs and configuration details.
Required Files
Result JSON, report link, logs, environment configuration, and task split details.
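A minimal sketch of what a result JSON might contain, assuming hypothetical field names; the benchmark may prescribe its own schema:

```python
import json

# Illustrative result payload; field names and values are placeholders,
# not a required schema.
result = {
    "benchmark_split": "verified",            # task split the run was scored on
    "score": 0.0,                             # reported aggregate score
    "report_url": "https://example.com/run",  # stable link to the full report
    "log_dir": "logs/",                       # relative path to execution logs
    "environment": "environment.json",        # environment configuration file
}

with open("result.json", "w") as f:
    json.dump(result, f, indent=2)
```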
Recommended Files
Run scripts, cost summary, patched artifacts, and a short write-up of known failure cases.
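A run script can be a thin wrapper around the harness invocation. The module name and flags below are assumptions, since the actual entry point depends on the harness being submitted:

```python
#!/usr/bin/env python3
"""Illustrative run script; replace the command with the harness's real invocation."""
import subprocess
import sys

# Hypothetical harness entry point and flags; these are assumptions, not a real CLI.
cmd = [
    "python", "-m", "agent_harness",
    "--split", "verified",
    "--output-dir", "runs/latest",
    "--timeout", "3600",
]
sys.exit(subprocess.run(cmd).returncode)
```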
Verified Results
Verified submissions should be reproducible by maintainers from the provided materials.
Submission Checklist
Every submission should make the evaluation setup explicit.
Agent Metadata
Agent harness name, backbone model and version, tool permissions, and prompt or policy configuration.
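One way to capture this is a small metadata file written alongside the results. The keys below mirror the checklist items and are illustrative rather than a fixed schema:

```python
import json

# Illustrative agent metadata; key names are placeholders that mirror the checklist.
agent_metadata = {
    "harness": "my-agent-harness",                # agent harness name (placeholder)
    "harness_version": "0.1.0",
    "backbone_model": "model-name",               # backbone model and its version
    "model_version": "YYYY-MM-DD",
    "tool_permissions": ["file_read", "file_write", "shell"],
    "prompt_config": "prompts/system_prompt.md",  # prompt or policy configuration
}

with open("agent_metadata.json", "w") as f:
    json.dump(agent_metadata, f, indent=2)
```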
Environment Metadata
Runtime versions, hardware notes if relevant, benchmark split, cost budget, timeout policy, and workspace limits.
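The environment configuration can be recorded the same way. A sketch, assuming placeholder field names; record whatever limits your runtime actually enforced:

```python
import json
import platform
import sys

# Illustrative environment metadata; field names are assumptions, not a required schema.
environment_metadata = {
    "python_version": sys.version.split()[0],   # runtime version actually used
    "platform": platform.platform(),            # hardware / OS note, if relevant
    "benchmark_split": "verified",
    "cost_budget_usd": 100.0,                   # per-run cost budget, if one applied
    "timeout_seconds": 3600,                    # timeout policy
    "workspace_limit_mb": 2048,                 # workspace limit, if enforced
}

with open("environment.json", "w") as f:
    json.dump(environment_metadata, f, indent=2)
```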
Execution Artifacts
Logs, final deliverables, patch summaries, and any auxiliary validation outputs generated during the run.
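Bundling these artifacts into a single archive keeps the submission self-contained. The directory names below are placeholders for whatever layout the run actually produced:

```python
import tarfile
from pathlib import Path

# Illustrative packaging step; directory names are placeholders.
artifact_dirs = ["logs", "deliverables", "patches", "validation"]

with tarfile.open("execution_artifacts.tar.gz", "w:gz") as archive:
    for name in artifact_dirs:
        path = Path(name)
        if path.exists():                        # skip directories the run did not produce
            archive.add(path, arcname=path.name)
```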
Reproduction Route
A stable report URL or repository path that maintainers can use to replay or inspect the run end to end.
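A small pointer file can make the reproduction route explicit. Every value below is a placeholder, not a required format:

```python
import json

# Illustrative reproduction pointer; all values are placeholders.
reproduction_route = {
    "report_url": "https://example.com/runs/<run-id>",  # stable report URL
    "repository": "https://example.com/<org>/<repo>",   # repository with scripts and configs
    "commit": "<commit-sha>",                            # revision used for the run
}

with open("reproduction.json", "w") as f:
    json.dump(reproduction_route, f, indent=2)
```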