ColdFire Model Candidates

Baseline

Experiment Rules

A model is deployable only if it beats the current v1 on more than one axis. A PnL-only win without AUC or calibration support is treated as a p-hacking risk.

Training Sample Logic

Decision-Time Match

P5 tests change only row selection or row weights. The features, the LightGBM model family, and the 120s policy stay fixed.
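As a rough illustration of what "change only row selection or row weights" can mean in practice, the sketch below filters and weights training rows while leaving every feature value untouched. The field name `t_minus_s`, the 180-second cutoff, and the weighting rule are hypothetical placeholders, not the actual P5 logic.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]


def select_and_weight(
    rows: List[Row],
    keep: Callable[[Row], bool],
    weight: Callable[[Row], float],
) -> Tuple[List[Row], List[float]]:
    """Return the kept rows and one weight per kept row.

    Rows are passed through unchanged, so the feature set stays fixed;
    only which rows are used, and how much each counts, can vary.
    """
    selected = [r for r in rows if keep(r)]
    weights = [float(weight(r)) for r in selected]
    return selected, weights


# Illustrative rows only; "t_minus_s" is an assumed field (seconds before the decision).
rows = [
    {"id": 1, "t_minus_s": 120, "x": 0.4},
    {"id": 2, "t_minus_s": 300, "x": 0.1},
    {"id": 3, "t_minus_s": 110, "x": 0.9},
]

# Hypothetical variant: keep rows within 180s of the decision,
# and up-weight rows at or beyond the 120s mark.
selected, weights = select_and_weight(
    rows,
    keep=lambda r: r["t_minus_s"] <= 180,
    weight=lambda r: 2.0 if r["t_minus_s"] >= 120 else 1.0,
)
```

The resulting weights could then be passed to the trainer as per-row sample weights (for example, the `sample_weight` argument of LightGBM's scikit-learn style `fit`), so the model family and features stay the same across variants.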

Results So Far

Use the sort controls to inspect raw AUC, dAUC, OOS PnL, and dPnL. A candidate is considered better only if both AUC and OOS policy PnL improve; roleplay serves as a stability guardrail.
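The acceptance rule above can be sketched as a small check. This is a minimal sketch under stated assumptions: the deltas are taken to be measured against the current reference model, the roleplay columns are treated as deltas that must not fall below a floor, and the field names, the `roleplay_floor` parameter, and the sample values are all hypothetical (they are not results from any run).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidateResult:
    name: str
    d_auc: float         # AUC delta vs. the reference model
    d_pnl: float         # OOS policy PnL delta vs. the reference model
    roleplay_72h: float  # assumed: 72h roleplay delta
    roleplay_96h: float  # assumed: 96h roleplay delta


def is_better(c: CandidateResult, roleplay_floor: float = 0.0) -> bool:
    """A candidate counts as better only if AUC and OOS PnL both improve.

    Roleplay acts as a guardrail here: it cannot make a candidate better,
    but a roleplay value below the (assumed) floor disqualifies it.
    """
    improves_both = c.d_auc > 0 and c.d_pnl > 0
    stable = min(c.roleplay_72h, c.roleplay_96h) >= roleplay_floor
    return improves_both and stable


# Made-up values for illustration only.
candidates: List[CandidateResult] = [
    CandidateResult("both_improve", 0.002, 15.0, 1.0, 2.0),
    CandidateResult("pnl_only", -0.001, 40.0, 3.0, 1.0),
    CandidateResult("unstable", 0.003, 10.0, -5.0, 2.0),
]

accepted = [c.name for c in candidates if is_better(c)]
```

Note that the `pnl_only` candidate is rejected despite having the largest PnL delta, which matches the stance that a PnL win without AUC support is a p-hacking risk.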

Rank	Model	Status	AUC	dAUC	OOS PnL	dPnL	72h Roleplay	96h Roleplay	Read

Sequential Sample Tests

Model	Status	Train Rows	Flips	Full dAUC	Full dPnL	Seq dPnL	Fixed Attempt dAUC	96h Roleplay

Calibration-Only Tests

Model	Status	Cal Rows	Flips	Full dAUC	Full dPnL	Seq dPnL	Fixed Attempt dAUC	96h Roleplay

P8 Iteration Winners

Candidate	Logic	AUC Delta	PnL Delta	Seq PnL Delta	72h Roleplay	96h Roleplay	Orders	96h Losses Rejected	96h Wins Rejected	Read

P9 Holdout Check

Candidate	dAUC	dPnL	Early OOS dPnL	Late OOS dPnL	Older Roleplay	Recent 96h	Recent 72h	Robust?	Read

P10/P11 Reweighting

Candidate	Rule	dAUC	dPnL	Seq dPnL	Fixed Attempt dAUC	72h Roleplay	96h Roleplay	168h Roleplay	Read

Run Queue

Priority	Status	Model	Variant	Dependency	Why test it	Risk	dAUC	dPnL	dRoleplay

Model Candidate Search