
Kill-switch design: the one component that has to work

2026-05-15 · ~7 minute read

A trading bot is a closed-loop control system pointed at a financial account. When the loop misbehaves (bad market data, a flipped sign, a runaway re-quote), the account can drain in minutes. The kill-switch is the component whose only job is to break the loop the moment the loss crosses a hard line. Of every part of a trading stack, this is the one that has to work.

Why it must be a separate process

A kill-switch embedded inside the trading process can be disabled by the very bug it is supposed to catch: a tight loop monopolising the event loop, a swallowed exception that skips the check, a stale in-memory PnL value that never crossed the threshold because the position cache fell behind. The kill-switch should be a second OS process with its own connection to the exchange, its own copy of position state (pulled directly from the exchange REST API at fixed intervals), and its own “flatten everything” authority. If the trading process is wedged, the kill-switch still runs.
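As a rough illustration of that separation, here is a minimal sketch of the polling loop such a second process might run. All names are hypothetical: `fetch_positions` stands in for the kill-switch's own REST call to the exchange, `should_trip` for its breach checks, and `flatten` for its flatten routine. Treating a failed fetch as a trip is an assumption of this sketch, in line with the "unknown state" rule further down.

```python
import time
from typing import Callable, Mapping, Optional

Positions = Mapping[str, float]  # instrument -> signed quantity (assumed shape)


def run_kill_switch(
    fetch_positions: Callable[[], Positions],   # hypothetical: own REST pull
    should_trip: Callable[[Positions], bool],   # hypothetical: breach checks
    flatten: Callable[[], None],                # hypothetical: flatten routine
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    max_polls: Optional[int] = None,            # None means poll forever
) -> bool:
    """Poll position state at a fixed interval; return True if flatten ran."""
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        try:
            tripped = should_trip(fetch_positions())
        except Exception:
            # Assumption: if state cannot be read, it is unknown, so trip.
            tripped = True
        if tripped:
            flatten()
            return True
        sleep(interval_s)
    return False
```

The point of the sketch is that nothing in the loop reads memory owned by the trading process. It would be started as its own program under a process supervisor, not as a thread or task inside the bot.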

Failure modes a real kill-switch has to survive

  • Stale data: if the position feed has not ticked for N seconds, treat the system as “unknown state” and flatten.
  • Daily loss breach: a hard floor expressed in account currency, not bps. A bps limit moves whenever the equity it is measured against moves.
  • Position-size breach: notional per instrument and aggregate notional, both capped.
  • Connectivity loss: if the trading process misses its heartbeat for M seconds, the kill-switch must assume the worst and pull every open order itself.
  • Order-rate explosion: if the create/cancel rate approaches the exchange limit, stop the bot before the exchange does.
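The checks in this list can be sketched as one pure function over a snapshot of state. This is an illustration under assumptions, not a reference implementation: the `Limits` and `Snapshot` types, their field names, and the reason strings are all hypothetical, and the loss floor is modelled as a negative PnL value in account currency.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Limits:
    max_feed_age_s: float               # stale-data window
    max_heartbeat_age_s: float          # missed-heartbeat window
    daily_loss_floor: float             # account currency, e.g. -5000.0
    max_notional_per_instrument: float
    max_aggregate_notional: float
    max_order_rate_per_s: float         # assumed to sit below the exchange limit


@dataclass(frozen=True)
class Snapshot:
    feed_age_s: float
    heartbeat_age_s: float
    daily_pnl: float                    # account currency
    notional_by_instrument: Dict[str, float]
    order_rate_per_s: float


def breaches(s: Snapshot, lim: Limits) -> List[str]:
    """Return the name of every breached limit; an empty list means no trip."""
    reasons: List[str] = []
    if s.feed_age_s > lim.max_feed_age_s:
        reasons.append("stale_data")
    if s.heartbeat_age_s > lim.max_heartbeat_age_s:
        reasons.append("heartbeat_lost")
    if s.daily_pnl <= lim.daily_loss_floor:
        reasons.append("daily_loss")
    notionals = [abs(v) for v in s.notional_by_instrument.values()]
    if any(n > lim.max_notional_per_instrument for n in notionals):
        reasons.append("instrument_notional")
    if sum(notionals) > lim.max_aggregate_notional:
        reasons.append("aggregate_notional")
    if s.order_rate_per_s > lim.max_order_rate_per_s:
        reasons.append("order_rate")
    return reasons
```

Returning every breached reason, not only the first, keeps the alert informative when several limits fail at once, which is common when the underlying cause is a single bug.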

What “flatten” actually means

Flatten is not “send a market order for the inverse position.” In a fragile state, a market order can slip across an empty book and turn a small breach into a large one. The right primitive is:

  1. Cancel every open order on every venue.
  2. Verify the cancellation took effect — do not trust the ack, re-query open orders.
  3. Close positions with a passive-then-aggressive ladder, capped at a maximum slippage. If the cap is hit, alert a human and stop.
  4. Disable the trading process entrypoint.
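The four steps above can be sketched as follows. The `Venue` interface is hypothetical and does not correspond to any real exchange client; `ladder_bps` is an assumed way to express the passive-then-aggressive ladder, with its last rung acting as the slippage cap. Disabling the trading entrypoint on every exit path, including the failure paths, is also a choice made in this sketch.

```python
from typing import Callable, Dict, List, Protocol, Sequence


class Venue(Protocol):
    """Hypothetical venue interface, for illustration only."""

    def cancel_all(self) -> None: ...
    def open_orders(self) -> List[str]: ...
    def positions(self) -> Dict[str, float]: ...
    def close(self, instrument: str, qty: float, max_slippage_bps: float) -> float:
        """Try to close qty within the slippage bound; return what is still open."""
        ...


def flatten(
    venue: Venue,
    ladder_bps: Sequence[float],          # e.g. (0, 5, 20); last rung is the cap
    alert: Callable[[str], None],         # hypothetical pager hook
    disable_trader: Callable[[], None],   # hypothetical entrypoint lock
    max_cancel_checks: int = 3,
    qty_tolerance: float = 1e-12,
) -> bool:
    """Return True only if orders are gone and every position is closed."""
    try:
        # Steps 1 and 2: cancel, then re-query instead of trusting the ack.
        for _ in range(max_cancel_checks):
            venue.cancel_all()
            if not venue.open_orders():
                break
        else:
            alert("open orders survived cancellation")
            return False

        # Step 3: walk the ladder from passive to aggressive, up to the cap.
        for instrument, qty in venue.positions().items():
            remaining = qty
            for bps in ladder_bps:
                if abs(remaining) <= qty_tolerance:
                    break
                remaining = venue.close(instrument, remaining, bps)
            if abs(remaining) > qty_tolerance:
                alert(f"{instrument}: {remaining} still open at slippage cap")
                return False  # cap hit: hand over to a human and stop
        return True
    finally:
        # Step 4: runs on success and on every failure path.
        disable_trader()
```

A return value of False means a human now owns the problem; the sketch never escalates past the last rung of the ladder on its own.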

Test it the way you test a fire alarm

A kill-switch you have never tripped in production is a kill-switch you do not own. We trip ours on a scheduled cadence against a sandboxed sub-account: tighten the loss threshold until the current PnL already breaches it, then observe the trip, the flatten, and the alert. If any link in the chain silently fails, the test fails, and the bot does not get promoted to live capital that week.
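A drill of this kind might be sketched as below. The `KillSwitch` callable signature, `run_drill`, and `DrillReport` are hypothetical names for illustration, and the loss floor is again assumed to be a PnL value in account currency, so the drill forces a trip by placing the floor just above the current PnL.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical shape of the switch under test:
# (pnl, loss_floor, flatten, alert) -> True if it tripped.
KillSwitch = Callable[
    [float, float, Callable[[], None], Callable[[str], None]], bool
]


@dataclass(frozen=True)
class DrillReport:
    tripped: bool
    flattened: bool
    alerted: bool

    @property
    def passed(self) -> bool:
        # Every link must be observed; one silent failure fails the drill.
        return self.tripped and self.flattened and self.alerted


def run_drill(switch: KillSwitch, current_pnl: float, margin: float = 1.0) -> DrillReport:
    """Force a breach against a sandbox and record which links fired."""
    events: List[str] = []
    forced_floor = current_pnl + margin  # current PnL is already below this floor
    tripped = switch(
        current_pnl,
        forced_floor,
        lambda: events.append("flatten"),
        lambda msg: events.append("alert"),
    )
    return DrillReport(bool(tripped), "flatten" in events, "alert" in events)
```

A promotion gate would then check `report.passed`, so a switch that trips and flattens but never pages anyone still blocks the release.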

None of this is novel. It is, however, the difference between a system you can sleep next to and one that turns a Tuesday morning into an incident.