ColdFire / Model Research

Model Candidate Search

Candidate queue for testing model families on top of Feature Experiment Model Experiment Version 1. The goal is to compare AUC, out-of-sample (OOS) policy PnL, and 72h/96h roleplay deltas against the current baseline before touching production.

Baseline

Experiment Rules

A model is deployable only if it beats the current v1 on more than one axis. A pure PnL win without AUC or calibration support is treated as a p-hacking risk.
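The rule above can be sketched as a small gate function. This is a minimal illustration, not the project's actual tooling: the `Metrics` fields, the verdict strings, and every number are hypothetical, and it assumes that higher is better on each axis. Calibration is not modelled here because the page does not say how it is measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    """Hypothetical metric bundle; field names are illustrative only."""
    auc: float
    oos_pnl: float
    roleplay_72h: float
    roleplay_96h: float


def improved_axes(candidate: Metrics, baseline: Metrics) -> list[str]:
    """Names of the axes on which the candidate strictly beats the baseline."""
    names = ("auc", "oos_pnl", "roleplay_72h", "roleplay_96h")
    return [n for n in names if getattr(candidate, n) > getattr(baseline, n)]


def deploy_verdict(candidate: Metrics, baseline: Metrics) -> str:
    """Apply the 'more than one axis' rule, flagging PnL gains without AUC support."""
    axes = improved_axes(candidate, baseline)
    # Assumption: a PnL gain with no AUC gain is flagged even if roleplay improved.
    if "oos_pnl" in axes and "auc" not in axes:
        return "p-hacking risk"
    if len(axes) > 1:
        return "deployable"
    return "not deployable"


# Made-up numbers, used only to exercise the three branches.
baseline = Metrics(auc=0.60, oos_pnl=100.0, roleplay_72h=0.0, roleplay_96h=0.0)
both_up = Metrics(auc=0.62, oos_pnl=120.0, roleplay_72h=0.0, roleplay_96h=0.0)
pnl_only = Metrics(auc=0.60, oos_pnl=150.0, roleplay_72h=0.0, roleplay_96h=0.0)
auc_only = Metrics(auc=0.61, oos_pnl=100.0, roleplay_72h=0.0, roleplay_96h=0.0)
```

With these inputs, `deploy_verdict(both_up, baseline)` returns `"deployable"`, `pnl_only` is flagged as `"p-hacking risk"`, and `auc_only` is `"not deployable"` because it wins on a single axis.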

Results So Far

Results are reported as raw AUC, dAUC, OOS PnL, and dPnL. A candidate is not considered better unless both AUC and OOS policy PnL improve; roleplay is a stability guardrail.
Rank | Model | Status | AUC | dAUC | OOS PnL | dPnL | 72h Roleplay | 96h Roleplay | Read
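As a rough sketch of how the delta columns and the "better" test could be computed, the snippet below derives dAUC and dPnL as candidate minus baseline and ranks by dAUC. The row keys, the `roleplay_floor` tolerance, and all values are assumptions made for illustration. It also assumes the roleplay columns hold deltas against the baseline and that the guardrail means "must not fall below a floor"; the page does not define either.

```python
def add_deltas(rows, baseline):
    """Return copies of rows with dAUC and dPnL (candidate minus baseline) attached."""
    return [
        {
            **row,
            "dAUC": row["auc"] - baseline["auc"],
            "dPnL": row["oos_pnl"] - baseline["oos_pnl"],
        }
        for row in rows
    ]


def is_better(row, roleplay_floor=0.0):
    """Both AUC and OOS PnL must improve; roleplay deltas act only as a guardrail."""
    core = row["dAUC"] > 0 and row["dPnL"] > 0
    # Hypothetical guardrail: neither roleplay delta may drop below the floor.
    stable = (
        row["roleplay_72h"] >= roleplay_floor
        and row["roleplay_96h"] >= roleplay_floor
    )
    return core and stable


# Made-up baseline and candidates; none of these numbers come from real runs.
baseline = {"auc": 0.60, "oos_pnl": 100.0}
candidates = [
    {"model": "A", "auc": 0.62, "oos_pnl": 120.0, "roleplay_72h": 1.0, "roleplay_96h": 0.5},
    {"model": "B", "auc": 0.65, "oos_pnl": 90.0, "roleplay_72h": 0.0, "roleplay_96h": 0.0},
    {"model": "C", "auc": 0.61, "oos_pnl": 130.0, "roleplay_72h": -5.0, "roleplay_96h": 0.2},
]

# Rank by dAUC, largest first; swap the key to "dPnL" to rank by PnL delta instead.
ranked = sorted(add_deltas(candidates, baseline), key=lambda r: r["dAUC"], reverse=True)
better = {r["model"]: is_better(r) for r in ranked}
```

Here only candidate A counts as better: B has the largest dAUC but loses on PnL, and C improves both core metrics but fails the assumed roleplay guardrail.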

Run Queue

Priority | Status | Model | Variant | Dependency | Why test it | Risk | dAUC | dPnL | dRoleplay

Phases

Research Sources