Baseline
Experiment Rules
A model is deployable only if it beats the current v1 baseline on more than one axis. A PnL-only win with no supporting improvement in AUC or calibration is treated as a p-hacking risk.
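A minimal sketch of this gate, assuming three axes (AUC, OOS PnL, and a calibration error where lower is better) and assuming each delta is measured against v1. The `Metrics` class, the `gate` function, and the `calib_error` field are hypothetical names for illustration, not part of any existing tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    auc: float          # higher is better
    oos_pnl: float      # out-of-sample policy PnL, higher is better
    calib_error: float  # hypothetical calibration error, lower is better


def gate(candidate: Metrics, baseline: Metrics) -> dict:
    """Compare a candidate to the baseline and apply the multi-axis rule."""
    d_auc = candidate.auc - baseline.auc
    d_pnl = candidate.oos_pnl - baseline.oos_pnl
    # Flip the sign so that a positive value always means "improved".
    d_calib = baseline.calib_error - candidate.calib_error

    wins = {"auc": d_auc > 0, "pnl": d_pnl > 0, "calibration": d_calib > 0}
    return {
        "dAUC": d_auc,
        "dPnL": d_pnl,
        # Deployable only if more than one axis improves.
        "deployable": sum(wins.values()) > 1,
        # A PnL win with no AUC or calibration support is flagged.
        "p_hacking_risk": wins["pnl"] and not (wins["auc"] or wins["calibration"]),
    }


v1 = Metrics(auc=0.70, oos_pnl=100.0, calib_error=0.10)
pnl_only = gate(Metrics(auc=0.70, oos_pnl=150.0, calib_error=0.10), v1)
two_axes = gate(Metrics(auc=0.72, oos_pnl=120.0, calib_error=0.10), v1)
```

The numbers in the usage lines are made up purely to exercise the two branches: `pnl_only` is rejected and flagged, while `two_axes` passes.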
Results So Far
Use the sort controls to inspect raw AUC, dAUC, OOS PnL, and dPnL. A candidate counts as better only if both AUC and OOS policy PnL improve; the roleplay columns serve as a stability guardrail.
| Rank | Model | Status | AUC | dAUC | OOS PnL | dPnL | 72h Roleplay | 96h Roleplay | Read |
|---|---|---|---|---|---|---|---|---|---|
Run Queue
| Priority | Status | Model | Variant | Dependency | Why test it | Risk | dAUC | dPnL | dRoleplay |
|---|---|---|---|---|---|---|---|---|---|