reinforcement learning
RL dataset & training
Each turn → one (state, action, reward) sample. The REWARD column is the canonical reward_scalar from the backend — never a hand-combined sum.
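As a rough illustration of the one-sample-per-turn rule, here is a minimal Python sketch. The turn record layout, the `Transition` class, the `turns_to_transitions` helper, and the `reward_components` field are all hypothetical; only the column names and `reward_scalar` come from this page. The point it shows is that the reward is copied from `reward_scalar` as-is and never recomputed from components.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Transition:
    """One row of the dataset table (columns as listed on this page)."""
    session: str
    turn: int
    ts: float
    state: str
    action: str
    reward: float  # copied verbatim from the backend's reward_scalar


def turns_to_transitions(turns: List[Dict[str, Any]]) -> List[Transition]:
    """Map each turn record to exactly one transition.

    The turn record layout is an assumption made for this sketch. Any
    reward components present are deliberately ignored: only
    reward_scalar is read.
    """
    return [
        Transition(
            session=t["session"],
            turn=t["turn"],
            ts=t["ts"],
            state=t["state"],
            action=t["action"],
            reward=float(t["reward_scalar"]),
        )
        for t in turns
    ]


# Hypothetical turn records; reward_components is invented for illustration.
turns = [
    {"session": "s1", "turn": 0, "ts": 0.0, "state": "greeting", "action": "reply",
     "reward_scalar": 0.5, "reward_components": {"helpful": 0.7, "latency": -0.1}},
    {"session": "s1", "turn": 1, "ts": 1.5, "state": "question", "action": "search",
     "reward_scalar": 1.0, "reward_components": {"helpful": 1.2, "latency": -0.3}},
]
dataset = turns_to_transitions(turns)
```

In the second record the components would sum to roughly 0.9, yet the stored reward is 1.0, because the backend value is treated as authoritative.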
nbot offline
No nbot serve reachable. Horizon discovers it via .nconfig/control.json. The dataset is still browsable offline; training is disabled.
transitions
—
sessions
—
success %
—
x̄ reward
—
x̄ latency
—
trainers & jobs
Policy training pipeline
Register trainers (built-in nbot-replay or external CLIs), run them against the dataset below, and track job history.
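The register, run, and track flow described above could be sketched roughly as follows. This is not how nbot or Horizon implement it: `TrainerRegistry`, `Job`, `register_cli`, the dataset path, and the stand-in callables are all hypothetical names chosen for illustration. The name `nbot-replay` is used only as a label for a placeholder function, and the external CLI is simulated with the current Python interpreter.

```python
import subprocess
import sys
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Job:
    trainer: str
    status: str  # "succeeded" or "failed"


@dataclass
class TrainerRegistry:
    """Hypothetical registry: maps trainer names to callables and logs jobs."""
    trainers: Dict[str, Callable[[str], bool]] = field(default_factory=dict)
    history: List[Job] = field(default_factory=list)

    def register(self, name: str, run: Callable[[str], bool]) -> None:
        self.trainers[name] = run

    def register_cli(self, name: str, argv: List[str]) -> None:
        """Wrap an external command; the dataset path is appended as the last argument."""
        def run(dataset_path: str) -> bool:
            result = subprocess.run(argv + [dataset_path], capture_output=True)
            return result.returncode == 0
        self.register(name, run)

    def run(self, name: str, dataset_path: str) -> Job:
        ok = self.trainers[name](dataset_path)
        job = Job(trainer=name, status="succeeded" if ok else "failed")
        self.history.append(job)  # job history is append-only in this sketch
        return job


registry = TrainerRegistry()
# Placeholder for a built-in trainer; the real nbot-replay behaviour is not modelled.
registry.register("nbot-replay", lambda dataset_path: True)
# Simulated external CLI that always exits with a non-zero status.
registry.register_cli("external-demo", [sys.executable, "-c", "import sys; sys.exit(1)"])

registry.run("nbot-replay", "dataset.jsonl")
registry.run("external-demo", "dataset.jsonl")
```

Keeping the job log separate from the trainer table means a trainer can be re-registered or removed without losing the record of earlier runs.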
Trainers
Pick a trainer.
dataset
Transitions & policy view
| session | turn | ts | state | action | reward |
|---|---|---|---|---|---|