Levelbrook Labs
An engineering experiment
AI Agent Evaluation Dashboard
New Evaluation Run
Agent Model
Helios-7b-v2
Orion-13b-FT
Cygnus-X1-Alpha
Environment
E-commerce Checkout
Customer Support Triage
Internal API Integration
Simulations
Run Evaluation
Evaluation History
All
Completed
Needs Review
Running
Failed
Run ID | Agent | Environment | Timestamp | Status | Success Rate | Actions
No runs match the current filter.