Week 8 — Real-Time Control Loops, Timing Jitter, and a CAN Actuator Protocol

Course 1 syllabus

Overview

The embedded-robotics phase begins where the ML phase ends: a model is only useful if it runs inside a real-time loop talking to real actuators. This week builds a fixed-frequency control loop in C++ on Linux, measures its timing jitter rigorously, and adds a robotics/automotive-style actuator command + telemetry protocol over CAN (via SocketCAN), including deliberate fault injection. The themes are determinism and robustness: a control loop that mostly runs at 100 Hz but occasionally stalls for 50 ms is dangerous, and a protocol that assumes messages never drop will fail in the field.

Course 6 gave you the sampling and discrete-time intuition behind a periodic loop; here you build the loop itself and confront the OS-level realities (scheduling, jitter, priority) that a clean sampling model ignores. This is the runtime that the Week 9 estimator and the Week 10 safety system plug into.

Readings

  • HLW (How Linux Works): kernel vs user space, processes and scheduling, devices, and networking/device interfaces. Extract: why a normal Linux process has timing jitter and what knobs reduce it.
  • CA: CPU chapter skim. Extract: sources of latency (interrupts, cache misses) that perturb loop timing.
  • Embedded AI: embedded systems, CAN, and automotive context. Extract: the CAN model and why robotics/AV systems use it.
  • (Sampling, periodic-rate, and discrete-time control intuition: assumed from Course 6.)

Key Concepts

The fixed-frequency loop

A control loop targets a fixed period \(T\) (e.g. 10 ms for 100 Hz): read sensors, compute, command actuators, then sleep until the next deadline. Use clock_nanosleep with TIMER_ABSTIME against a monotonic clock (CLOCK_MONOTONIC), advancing the deadline by exactly \(T\) from the previous deadline each cycle, so errors don’t accumulate. Never sleep for a relative duration: each cycle’s compute time and wake-up latency get added to the period, and the drift compounds. The loop’s correctness is defined by hitting deadlines, not by average rate.
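A minimal sketch of the absolute-deadline pattern, assuming Linux/POSIX time APIs. The helper names `add_ns` and `run_fixed_rate` are hypothetical, and `step` stands in for the read/compute/command body. The deadline is advanced by one period from the previous deadline, never from "now", so a late wake-up in one cycle does not push back every later deadline.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <ctime>
#include <functional>

// Add a non-negative nanosecond offset to a timespec, keeping tv_nsec in [0, 1e9).
inline timespec add_ns(timespec t, int64_t ns) {
    constexpr int64_t kNsPerSec = 1'000'000'000;
    assert(ns >= 0);
    t.tv_sec += static_cast<time_t>(ns / kNsPerSec);
    t.tv_nsec += static_cast<long>(ns % kNsPerSec);
    if (t.tv_nsec >= kNsPerSec) {
        t.tv_sec += 1;
        t.tv_nsec -= kNsPerSec;
    }
    return t;
}

// Call step(i) `iterations` times, one call per period, using absolute deadlines.
// Each deadline is the previous deadline plus period_ns, so wake-up latency in one
// cycle does not shift the deadlines of later cycles.
inline void run_fixed_rate(int64_t period_ns, int iterations,
                           const std::function<void(int)>& step) {
    timespec next{};
    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < iterations; ++i) {
        step(i);  // read sensors, compute, command actuators
        next = add_ns(next, period_ns);
        // clock_nanosleep returns an error number (not -1/errno); retry the same
        // absolute deadline if a signal interrupted the sleep.
        while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, nullptr) == EINTR) {
        }
    }
}
```

If `step` overruns, the next deadline is already in the past and `clock_nanosleep` returns immediately, so the loop runs back-to-back until it catches up. A real runtime would also count these overruns as deadline misses.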

Jitter and how to measure it

Jitter is the deviation of the actual period from the target. Record a monotonic timestamp at the top of each iteration, compute inter-arrival times, and report the full distribution (mean, p50, p99, max), not just the mean. The tail is what matters: a 100 Hz loop with p99.9 = 25 ms has a safety problem the mean hides. Reduce jitter with real-time scheduling (SCHED_FIFO), CPU affinity/isolation, memory locking (mlockall), and avoiding allocation and syscalls inside the loop.
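One way to compute the distribution, assuming loop-top timestamps (nanoseconds from CLOCK_MONOTONIC) are stored in a preallocated buffer and summarized after the run, outside the loop. `JitterStats` and `summarize_periods` are hypothetical names, and the percentile uses the nearest-rank definition; other percentile conventions give slightly different values for small samples. Jitter for each cycle is that period minus the target \(T\).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct JitterStats {
    double mean_ns;  // mean period
    int64_t p50_ns;  // median period
    int64_t p99_ns;  // 99th-percentile period
    int64_t max_ns;  // worst observed period
};

// Nearest-rank percentile of an ascending-sorted sample, for p in (0, 100].
inline int64_t percentile_sorted(const std::vector<int64_t>& sorted, double p) {
    assert(!sorted.empty() && p > 0.0 && p <= 100.0);
    const auto rank = static_cast<std::size_t>(std::ceil(p * sorted.size() / 100.0));
    return sorted[rank - 1];
}

// Convert loop-top timestamps (monotonic ns) into inter-arrival periods and summarize.
inline JitterStats summarize_periods(const std::vector<int64_t>& loop_top_ns) {
    assert(loop_top_ns.size() >= 2);
    std::vector<int64_t> periods;
    periods.reserve(loop_top_ns.size() - 1);
    for (std::size_t i = 1; i < loop_top_ns.size(); ++i) {
        periods.push_back(loop_top_ns[i] - loop_top_ns[i - 1]);
    }
    double sum = 0.0;
    for (const int64_t period : periods) sum += static_cast<double>(period);
    std::sort(periods.begin(), periods.end());
    return {sum / static_cast<double>(periods.size()),
            percentile_sorted(periods, 50.0),
            percentile_sorted(periods, 99.0),
            periods.back()};
}
```

With 99 periods of 10 ms and a single 25 ms stall, the mean is 10.15 ms and even the nearest-rank p99 is still 10 ms; only the max exposes the stall. That is why the max belongs in the report, and why long runs are needed before a p99.9 figure means anything.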

CAN and SocketCAN

CAN is a multi-master, priority-arbitrated, message-oriented bus standard widely used in automotive and robotics. A classic CAN frame has an 11-bit (standard) or 29-bit (extended) ID (lower ID = higher priority) and up to 8 data bytes. SocketCAN exposes each CAN interface as a Linux network interface, so you send and receive frames through sockets. Design a small protocol: actuator command frames, telemetry/state frames, and a heartbeat.
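A minimal SocketCAN sketch using the Linux raw CAN socket API (`socket(PF_CAN, SOCK_RAW, CAN_RAW)`, the `SIOCGIFINDEX` ioctl, `bind`, and `write` of a `struct can_frame`). The identifier `kCmdId = 0x100` and the payload layout (8-bit sequence number, then a little-endian signed 16-bit setpoint) are placeholder assumptions, not a prescribed protocol; choosing the real ID ordering is Theory Exercise 4. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <linux/can.h>
#include <net/if.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

// Placeholder 11-bit standard identifier for actuator commands (hypothetical).
constexpr canid_t kCmdId = 0x100;

// Hypothetical payload: byte 0 = 8-bit sequence number, bytes 1..2 = signed 16-bit
// setpoint in little-endian order. Packing byte-by-byte keeps the wire format
// independent of host endianness and struct padding.
inline can_frame make_command_frame(uint8_t seq, int16_t setpoint) {
    can_frame f{};
    f.can_id = kCmdId;  // no CAN_EFF_FLAG, so this is an 11-bit standard ID
    f.can_dlc = 3;      // payload length in bytes (classic CAN allows 0..8)
    const auto raw = static_cast<uint16_t>(setpoint);
    f.data[0] = seq;
    f.data[1] = static_cast<uint8_t>(raw & 0xFF);
    f.data[2] = static_cast<uint8_t>(raw >> 8);
    return f;
}

// Open a raw CAN socket bound to one interface (e.g. "can0" or "vcan0").
// Returns the socket descriptor, or -1 on any failure.
inline int open_can_socket(const char* ifname) {
    assert(ifname != nullptr);
    const int s = socket(PF_CAN, SOCK_RAW, CAN_RAW);
    if (s < 0) return -1;
    ifreq ifr{};
    std::strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
    if (ioctl(s, SIOCGIFINDEX, &ifr) < 0) {  // look up the interface index by name
        close(s);
        return -1;
    }
    sockaddr_can addr{};
    addr.can_family = AF_CAN;
    addr.can_ifindex = ifr.ifr_ifindex;
    if (bind(s, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
        close(s);
        return -1;
    }
    return s;
}

// Send one frame; on a CAN_RAW socket each write() carries exactly one can_frame.
inline bool send_frame(int s, const can_frame& f) {
    return write(s, &f, sizeof(f)) == static_cast<ssize_t>(sizeof(f));
}
```

Without CAN hardware, a virtual interface can be created as root (with the vcan kernel module available) via `ip link add dev vcan0 type vcan` and `ip link set up vcan0`. Calling `open_can_socket("vcan0")` and then `send_frame(s, make_command_frame(seq, setpoint))` exercises the same code path as real hardware, and `candump vcan0` from can-utils shows the frames.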

Designing for faults

Real buses drop frames, deliver them late, and see nodes drop off. Build in: sequence numbers (detect loss/reordering), a heartbeat + timeout (detect a dead peer), and a safe default on timeout (e.g. command zero / hold). Fault injection — deliberately dropping/delaying/corrupting frames — is how you test that the safe path actually works before it matters.
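A sketch of the receive-side bookkeeping, assuming an 8-bit wrapping sequence number per sender and a caller that supplies monotonic timestamps in nanoseconds. `SeqTracker`, `HeartbeatMonitor`, `select_command`, and the safe value of zero are hypothetical; the right safe default (zero command, hold, brake) depends on the actuator.

```cpp
#include <cassert>
#include <cstdint>

// Receive-side tracking of an 8-bit wrapping sequence number from one sender.
class SeqTracker {
public:
    // Returns how many frames were skipped before this one (0 if in order),
    // or -1 for a duplicate/older frame that the caller should discard.
    int on_frame(uint8_t seq) {
        if (!have_last_) {
            have_last_ = true;
            last_ = seq;
            return 0;
        }
        const auto delta = static_cast<uint8_t>(seq - last_);  // forward distance mod 256
        if (delta == 0 || delta >= 128) {  // duplicate, or "behind" the newest frame
            ++stale_;
            return -1;
        }
        last_ = seq;
        lost_ += delta - 1u;
        return delta - 1;
    }
    uint32_t lost() const { return lost_; }
    uint32_t stale() const { return stale_; }

private:
    bool have_last_ = false;
    uint8_t last_ = 0;
    uint32_t lost_ = 0;
    uint32_t stale_ = 0;
};

// Declares the peer dead when no heartbeat has arrived within timeout_ns.
class HeartbeatMonitor {
public:
    explicit HeartbeatMonitor(int64_t timeout_ns) : timeout_ns_(timeout_ns) {
        assert(timeout_ns > 0);
    }
    void on_heartbeat(int64_t now_ns) {
        last_ns_ = now_ns;
        seen_ = true;
    }
    bool peer_alive(int64_t now_ns) const {
        return seen_ && now_ns - last_ns_ <= timeout_ns_;
    }

private:
    int64_t timeout_ns_;
    int64_t last_ns_ = 0;
    bool seen_ = false;
};

// Per-cycle command choice: pass the request through only while the peer is alive.
inline int16_t select_command(const HeartbeatMonitor& hb, int64_t now_ns, int16_t requested) {
    constexpr int16_t kSafeCommand = 0;  // hypothetical safe default: command zero
    return hb.peer_alive(now_ns) ? requested : kSafeCommand;
}
```

Treating a forward distance of 128 or more as "behind" is what keeps the state to one byte, but it assumes fewer than 128 consecutive frames are ever lost. A longer outage is misread as reordering and frames are rejected until the counter wraps back into range, so a real design would resynchronize, for example after a heartbeat timeout.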

Theory Exercises

  1. Explain why clock_nanosleep(TIMER_ABSTIME) against a monotonic clock avoids the drift that relative sleeps accumulate.
  2. Define jitter and justify reporting p99/max over mean for a safety-critical loop.
  3. Derive a heartbeat timeout threshold from loop period and acceptable missed-frame count; discuss the false-positive vs latency tradeoff.
  4. Explain CAN priority arbitration from frame IDs; assign IDs to command/telemetry/heartbeat by criticality.
  5. Design a sequence-number scheme that detects loss and reordering with bounded state.

Implementation

Build a C++ fixed-frequency loop (runtime/) with absolute-time scheduling, optional SCHED_FIFO + affinity + mlockall. Add a SocketCAN layer (can/) implementing the command/telemetry/heartbeat protocol with sequence numbers and timeouts. Add a fault-injection harness (drop/delay/corrupt). Use a virtual CAN interface (vcan) if no hardware is attached.
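A sketch of the three optional real-time knobs for the calling thread, using the Linux calls `sched_setscheduler`, `sched_setaffinity`, and `mlockall`. The function names are hypothetical. Each returns an empty string on success and the `strerror` text otherwise, so a benchmark run can log which knobs actually took effect: SCHED_FIFO normally needs root or CAP_SYS_NICE (or a nonzero RLIMIT_RTPRIO), and mlockall can fail under a small RLIMIT_MEMLOCK.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <sched.h>
#include <string>
#include <sys/mman.h>

// Each helper returns "" on success or the strerror() text on failure, so a
// benchmark run can log which real-time knobs actually took effect.

// Switch the calling thread to SCHED_FIFO at the given priority (1..99 on Linux).
inline std::string set_fifo_priority(int priority) {
    sched_param sp{};
    sp.sched_priority = priority;
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) return std::strerror(errno);
    return "";
}

// Pin the calling thread to a single CPU (ideally one isolated from other work).
inline std::string pin_to_cpu(int cpu) {
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0) return std::strerror(errno);
    return "";
}

// Lock current and future pages in RAM so the loop never stalls on a page fault.
inline std::string lock_memory() {
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) return std::strerror(errno);
    return "";
}
```

These would be called once at startup, before the loop begins, with buffers such as the timestamp log preallocated beforehand so the loop itself never allocates.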

Benchmark

Jitter distribution (p50/p99/max) under three configurations (default scheduling, SCHED_FIFO, and FIFO+affinity+mlock); quantify each improvement. Protocol: detected vs actual frame loss under injection, time-to-detect a dead peer, and verification that a timeout drives the safe default.
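A drop-only sketch of the detected-vs-actual loss measurement, assuming an independent (Bernoulli) drop probability per frame and the same 8-bit wrapping sequence numbers as the protocol. `DropInjector`, `run_drop_trial`, and `LossResult` are hypothetical; delay and corruption injection would wrap the same send path but are not shown.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Drop-only fault injector: each frame is dropped independently with probability p_drop.
class DropInjector {
public:
    DropInjector(double p_drop, uint32_t seed) : rng_(seed), drop_(p_drop) {
        assert(p_drop >= 0.0 && p_drop <= 1.0);
    }
    bool deliver() { return !drop_(rng_); }  // false means the frame is dropped

private:
    std::mt19937 rng_;
    std::bernoulli_distribution drop_;
};

struct LossResult {
    uint32_t injected;  // frames the harness actually dropped
    uint32_t detected;  // frames the receiver inferred as lost from sequence gaps
};

// Push n frames with 8-bit wrapping sequence numbers through the injector and
// compare injected loss with the loss visible to a sequence-gap detector.
inline LossResult run_drop_trial(int n, double p_drop, uint32_t seed) {
    DropInjector injector(p_drop, seed);
    LossResult r{0, 0};
    bool have_last = false;
    uint8_t last = 0;
    for (int i = 0; i < n; ++i) {
        const auto seq = static_cast<uint8_t>(i);
        if (!injector.deliver()) {
            ++r.injected;
            continue;
        }
        if (have_last) {
            const int gap = static_cast<uint8_t>(seq - last);  // 1 means no loss
            if (gap > 0) r.detected += static_cast<uint32_t>(gap - 1);
        }
        have_last = true;
        last = seq;
    }
    return r;
}
```

Detected loss can be lower than injected loss: drops before the first delivered frame or after the last one leave no gap, and a run of 255 or more consecutive drops aliases under an 8-bit counter. Reporting both numbers side by side makes that blind spot visible.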

Expected baselines: default scheduling shows occasional large tail spikes; real-time scheduling + affinity + mlock tightens p99/max dramatically. The fault harness confirms loss detection and that a missed heartbeat triggers the safe command within the derived threshold.

Connections

This runtime hosts the Week 9 estimator (which needs periodic, timestamped sensor data) and the Week 10 safety state machine (which depends on the heartbeat/timeout machinery). The periodic-sampling foundation is Course 6’s; the OS-level jitter reality is new here and central to embedded autonomy.