NBOT · Horizon
reinforcement learning

RL dataset & training

Each turn yields one (state, action, reward) sample. The REWARD column is the canonical reward_scalar from the backend, never a hand-combined sum.
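As a rough illustration of the one-turn, one-sample mapping, the sketch below turns a single turn record into a transition row. The record layout, the `Transition` class, the `to_transition` helper, and the `reward_components` field are all hypothetical. Only the column names (session, turn, ts, state, action, reward) and the `reward_scalar` name come from this page.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class Transition:
    """One (state, action, reward) sample, keyed by session and turn."""
    session: str
    turn: int
    ts: float
    state: str
    action: str
    reward: float


def to_transition(record: Mapping[str, Any]) -> Transition:
    """Map one turn record to one sample (record layout is an assumption)."""
    return Transition(
        session=str(record["session"]),
        turn=int(record["turn"]),
        ts=float(record["ts"]),
        state=str(record["state"]),
        action=str(record["action"]),
        # Take the backend's scalar verbatim; never re-derive it from parts.
        reward=float(record["reward_scalar"]),
    )


# Hypothetical turn record; "reward_components" is deliberately ignored.
example = {
    "session": "s-001",
    "turn": 3,
    "ts": 1700000000.0,
    "state": "user asked for a summary",
    "action": "call_tool:summarize",
    "reward_scalar": 0.75,
    "reward_components": {"quality": 0.9, "latency": 0.4},
}
sample = to_transition(example)
```

Reading `reward_scalar` directly, rather than summing component scores on the client, keeps the dataset consistent with whatever weighting the backend applies.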

No nbot serve instance is reachable. Horizon discovers the server via .nconfig/control.json. The dataset is still browsable offline, but training is disabled.
transitions · sessions · success % · x̄ reward · x̄ latency
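The summary figures above could plausibly be derived from the transition rows along the following lines. This is a sketch under assumptions, not Horizon's actual computation: the per-row `success` flag and `latency_ms` field are hypothetical, and success is assumed to be counted per transition rather than per session.

```python
from statistics import fmean
from typing import Any, Iterable, Mapping, Optional


def summarize(rows: Iterable[Mapping[str, Any]]) -> dict[str, Optional[float]]:
    """Compute dashboard-style summary metrics over transition rows."""
    rows = list(rows)
    if not rows:
        # Nothing to average; report counts as zero and means as missing.
        return {"transitions": 0, "sessions": 0, "success_pct": None,
                "mean_reward": None, "mean_latency_ms": None}
    return {
        "transitions": len(rows),
        # Distinct session ids seen across all rows.
        "sessions": len({r["session"] for r in rows}),
        # Assumption: "success" is a boolean on each row.
        "success_pct": 100.0 * sum(1 for r in rows if r["success"]) / len(rows),
        "mean_reward": fmean(r["reward"] for r in rows),
        # Assumption: latency is recorded per row in milliseconds.
        "mean_latency_ms": fmean(r["latency_ms"] for r in rows),
    }


rows = [
    {"session": "a", "reward": 1.0, "success": True, "latency_ms": 100},
    {"session": "a", "reward": 0.0, "success": False, "latency_ms": 200},
    {"session": "b", "reward": 0.5, "success": True, "latency_ms": 300},
    {"session": "b", "reward": 0.5, "success": True, "latency_ms": 400},
]
stats = summarize(rows)
```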
trainers & jobs

Policy training pipeline

Register trainers (built-in nbot-replay or external CLIs), run them against the dataset below, and track job history.

Trainers

Pick a trainer.
dataset

Transitions & policy view

session | turn | ts | state | action | reward