Device operating modes and firmware behavior
Companion to mqtt-contract.md. Where the contract defines the wire protocol between the device and HA, this doc defines the device's own behavior — what it does locally, independent of the protocol.
Scope
Covers:
- Operating mode state machine (BOOT / ONLINE / OFFLINE)
- Button behavior in each mode
- LED status colors
- Persistent state (what's saved in NVS, what resets)
- Boot behavior and state restoration
- Reconnection strategy
Does not cover: the Rust implementation details (that's firmware code), the audio signal chain (see signal-chain.md), or the MQTT wire format (see mqtt-contract.md).
Operating modes

                        power on
                            │
                            ▼
                       ┌────────┐
                       │  BOOT  │  Initializing peripherals, restoring NVS state, attempting WiFi
                       └────┬───┘
             ┌──────────────┴──────────────┐
             │ WiFi ok + MQTT ok           │ WiFi fails OR MQTT fails
             ▼                             ▼
        ┌──────────┐                 ┌───────────┐
        │  ONLINE  │◄── reconnect ───│  OFFLINE  │
        │          │                 │           │
        │ Button → │                 │ Button →  │
        │ MQTT     │───── drop ─────►│ local     │
        │ events   │                 │ effect    │
        └──────────┘                 └───────────┘
BOOT
Entered on power-on or reset. Responsibilities:
- Initialize I2S, GPIO, NVS, RGB LED
- Read persistent state from NVS: volume_index, volume_direction, was_playing
- Read the STA MAC; the lowercase 12-char hex is the device's identity for MQTT topics and discovery unique_ids (see mqtt-contract.md)
- If was_playing == true: start white noise generator immediately at saved volume (power-blip recovery: don't wake the user with silence)
- Attempt WiFi connect against compile-time stored credentials, with 60s retry on failure
- If WiFi connects, attempt MQTT connect (the C MQTT client manages its own reconnect)
- Transition to ONLINE or OFFLINE based on outcome
- On the first successful MQTT Connected event, call esp_ota_mark_app_valid_cancel_rollback to confirm the running firmware (see "OTA + rollback" below)
BOOT should settle into a steady mode (ONLINE or OFFLINE) within ~45 seconds in the worst case.
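The identity step above (STA MAC to lowercase 12-char hex) can be sketched as follows. The function names `mac_hex` and `topic` are illustrative assumptions, not names taken from the firmware; only the `nightstand/<mac_hex>/...` topic shape comes from this doc.

```rust
/// Format a 6-byte station MAC as the lowercase, 12-character hex string
/// used in topic paths such as nightstand/<mac_hex>/state.
/// (Hypothetical helper name.)
pub fn mac_hex(mac: [u8; 6]) -> String {
    // {:02x} pads each byte to two lowercase hex digits.
    mac.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Build a topic under this device's prefix, e.g. suffix = "state" or "button".
/// (Hypothetical helper name.)
pub fn topic(mac: [u8; 6], suffix: &str) -> String {
    format!("nightstand/{}/{}", mac_hex(mac), suffix)
}
```

Zero-padding each byte matters: without it a MAC containing a byte below 0x10 would produce an 11-character identity.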
ONLINE
WiFi up, MQTT connected, discovery configs published, subscribed to command topics.
- Button events publish to nightstand/<mac_hex>/button
- Commands from nightstand/<mac_hex>/cmd/+ are received and acted on
- State changes publish to nightstand/<mac_hex>/state (retained)
- Availability topic shows online
- Short press round-trips through HA: button publishes {"event_type":"short"}, HA decides whether it's bedtime or morning and publishes back cmd/play ON/OFF; the device does NOT toggle audio locally. Long-press still cycles volume locally (and publishes the event for HA logging). Double-press is publish-only: a pure HA gesture.
If MQTT drops (broker down, network partition, etc.): transition to OFFLINE.
OFFLINE
WiFi or MQTT not available. Device keeps working locally.
- Button events not published (there's nobody listening)
- Short press toggles white noise locally — the network task supplies this fallback so muscle memory still works without HA
- Long press cycles volume locally (yo-yo through preset list — see "Button behavior")
- Double press is detected by firmware but has no local effect (no lights to control offline; serial log notes that the late-night-lights routine is online-only)
- No commands are received
- Background task retries WiFi + MQTT every 60s; on success, transition to ONLINE
Offline mode is the primary travel mode — when the device wakes up in a hotel room and doesn't see the home WiFi, it just works as a standalone white noise machine with a button.
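The three modes and their transitions can be written as a small pure function. This is a minimal sketch, not the firmware's actual types: the `LinkEvent` enum and its variant names are assumptions made for illustration.

```rust
/// The three operating modes described in this section.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Boot,
    Online,
    Offline,
}

/// Hypothetical event names; the real firmware may model these differently.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LinkEvent {
    /// WiFi is up and the MQTT client reported a successful connection.
    MqttConnected,
    /// WiFi failed, MQTT failed, or an established connection dropped.
    LinkLost,
}

/// Pure transition function: current mode plus event gives the next mode.
pub fn next_mode(current: Mode, event: LinkEvent) -> Mode {
    match (current, event) {
        (Mode::Boot, LinkEvent::MqttConnected) => Mode::Online,
        (Mode::Boot, LinkEvent::LinkLost) => Mode::Offline,
        (Mode::Online, LinkEvent::LinkLost) => Mode::Offline,
        (Mode::Offline, LinkEvent::MqttConnected) => Mode::Online,
        // Repeated events leave the mode unchanged.
        (Mode::Online, LinkEvent::MqttConnected) => Mode::Online,
        (Mode::Offline, LinkEvent::LinkLost) => Mode::Offline,
    }
}

/// Button events are only published while ONLINE.
pub fn publishes_button_events(mode: Mode) -> bool {
    mode == Mode::Online
}
```

Spelling out every (mode, event) pair lets the compiler flag a missing transition if a mode is ever added.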
Button behavior matrix
Full combined table (pairs with the state machine in mqtt-contract.md):
| Input | ONLINE | OFFLINE |
|---|---|---|
| Short press | Publish {"event_type":"short"} only; HA decides and publishes back cmd/play ON/OFF (round-trip) | Toggle white noise locally (network task supplies the fallback) |
| Long press (≥2s) | Cycle volume preset locally (yo-yo) + publish {"event_type":"long"} | Cycle volume preset locally (yo-yo) |
| Double press | Publish {"event_type":"double"} (HA late-night-lights routine; no local effect) | Detected, no-op (no lights to control offline) |
Long-press cycles volume identically in both modes — muscle memory doesn't change for volume. Short-press is the deliberate divergence: when HA is online we let it decide (is it bedtime? morning? toggle just this nightstand or both?), and when HA is offline the device stands alone. Double-press is online-only — it's a pure HA gesture with no useful local fallback.
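The matrix reads naturally as a lookup from (link state, gesture) to a list of actions. The sketch below is one way to encode it; the `Action` enum and the `actions_for` name are hypothetical, and only the JSON payloads and the per-cell behavior are taken from the table.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Gesture {
    Short,
    Long,
    Double,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Link {
    Online,
    Offline,
}

/// Hypothetical action vocabulary for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    /// Publish this JSON payload to nightstand/<mac_hex>/button.
    PublishButtonEvent(&'static str),
    /// Toggle white noise on the device itself.
    TogglePlayback,
    /// Advance the yo-yo volume preset on the device itself.
    CycleVolume,
    /// Note on the serial log that the gesture is online-only.
    LogOnlineOnly,
}

/// One arm per cell of the behavior matrix.
pub fn actions_for(gesture: Gesture, link: Link) -> Vec<Action> {
    match (link, gesture) {
        (Link::Online, Gesture::Short) => {
            vec![Action::PublishButtonEvent(r#"{"event_type":"short"}"#)]
        }
        (Link::Online, Gesture::Long) => vec![
            Action::CycleVolume,
            Action::PublishButtonEvent(r#"{"event_type":"long"}"#),
        ],
        (Link::Online, Gesture::Double) => {
            vec![Action::PublishButtonEvent(r#"{"event_type":"double"}"#)]
        }
        (Link::Offline, Gesture::Short) => vec![Action::TogglePlayback],
        (Link::Offline, Gesture::Long) => vec![Action::CycleVolume],
        (Link::Offline, Gesture::Double) => vec![Action::LogOnlineOnly],
    }
}
```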
Volume cycle: long-press advances through [10%, 25%, 50%, 75%, 100%] in the current direction; when it hits an end, the next long-press flips direction (yo-yo). Both volume index and direction are persisted in NVS so the cycle resumes from where you left off after a reboot.
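A minimal sketch of the yo-yo cycle, under two assumptions that the text leaves open: the direction is flipped as soon as the index arrives at an end of the list, and a stored index past the end is clamped. The `VolumeCycle` type and `long_press` method are illustrative names.

```rust
pub const VOLUME_PRESETS: [u8; 5] = [10, 25, 50, 75, 100];

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Direction {
    Up,
    Down,
}

/// The two values persisted in NVS (volume_index, volume_direction).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct VolumeCycle {
    pub index: u8,
    pub direction: Direction,
}

impl VolumeCycle {
    /// Advance one preset and return the new volume in percent.
    pub fn long_press(&mut self) -> u8 {
        let last = (VOLUME_PRESETS.len() - 1) as u8;
        // Assumption: clamp an out-of-range stored index instead of panicking.
        let idx = self.index.min(last);
        // If we start at an end, make sure we step back inward.
        if idx == 0 {
            self.direction = Direction::Up;
        } else if idx == last {
            self.direction = Direction::Down;
        }
        let next = match self.direction {
            Direction::Up => idx + 1,
            Direction::Down => idx - 1,
        };
        // Flip on arrival at an end so the persisted direction is already correct.
        if next == 0 {
            self.direction = Direction::Up;
        } else if next == last {
            self.direction = Direction::Down;
        }
        self.index = next;
        VOLUME_PRESETS[next as usize]
    }
}
```

Starting from the default (index 2, Up), successive long-presses walk up to 100%, back down to 10%, and up again.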
Double-press as the late-night gesture: when one of us has to get up to check on something at 3 AM, double-tapping the nightstand button asks HA to bring up the outdoor and downstairs lights. Detected and emitted by firmware in both modes, but only meaningful when online — offline just logs a note that the routine isn't available.
Latency note: short-press has ~400 ms of detection latency (we have to wait for the double-press window to close before knowing it's a single). Imperceptible for sleepy-user use cases; the cost we pay for unambiguous gesture detection.
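The latency trade-off falls out of how a detector has to be structured: a tap cannot be reported as short until the double-press window has expired. The sketch below illustrates that, assuming millisecond timestamps, a 400 ms window measured from the first release, and long-press reported on release. All names here (`GestureDetector`, `poll`, the constants) are hypothetical and the firmware's detector may differ.

```rust
pub const DOUBLE_WINDOW_MS: u64 = 400; // from the ~400 ms figure above
pub const LONG_PRESS_MS: u64 = 2000; // long press is >= 2 s

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Gesture {
    Short,
    Long,
    Double,
}

#[derive(Debug, Default)]
pub struct GestureDetector {
    pressed_at: Option<u64>,
    /// Release time of a tap that might still become a double-press.
    pending_tap_at: Option<u64>,
}

impl GestureDetector {
    pub fn on_press(&mut self, now_ms: u64) {
        self.pressed_at = Some(now_ms);
    }

    /// Call on button release. Long and Double resolve immediately.
    pub fn on_release(&mut self, now_ms: u64) -> Option<Gesture> {
        let start = self.pressed_at.take()?;
        if now_ms.saturating_sub(start) >= LONG_PRESS_MS {
            self.pending_tap_at = None;
            return Some(Gesture::Long);
        }
        match self.pending_tap_at.take() {
            Some(first) if now_ms.saturating_sub(first) <= DOUBLE_WINDOW_MS => {
                Some(Gesture::Double)
            }
            _ => {
                // First tap: wait for the window before calling it Short.
                self.pending_tap_at = Some(now_ms);
                None
            }
        }
    }

    /// Call periodically. Emits Short once the double-press window has expired.
    pub fn poll(&mut self, now_ms: u64) -> Option<Gesture> {
        match self.pending_tap_at {
            Some(first) if now_ms.saturating_sub(first) > DOUBLE_WINDOW_MS => {
                self.pending_tap_at = None;
                Some(Gesture::Short)
            }
            _ => None,
        }
    }
}
```

This sketch relies on `poll` being called more often than the window length; otherwise a stale pending tap would be overwritten by the next release.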
LED status colors
The SK6812 behind the button cap is the only status indicator. Goal: visible enough to read at 1m, dim enough to not light the room at night.
All colors are at dim brightness (~5-10% of full) unless noted.
The LED state machine composes a base color from two orthogonal axes — audio playback state and network state — and applies overrides for OTA and unrecoverable errors on top. Updating and Error win over the base; PressFlash is a transient brightening overlay that decays over ~150 ms.
| Audio × Net | Color | Pattern |
|---|---|---|
| Connecting (any audio) | Cyan | Slow pulse (~1 Hz) |
| Online, idle | Green | Solid, very dim |
| Online, playing | Green | Solid, medium-dim |
| Offline, idle | Amber | Solid, very dim |
| Offline, playing | Amber | Solid, medium-dim |
| Override | Color | Pattern |
|---|---|---|
| OTA download in progress | Magenta | Slow pulse (~1.25 Hz) |
| Error (I2S init failed, etc.) | Red | Slow blink (~2 Hz) |
| Button press ack | Brighten the current color ~50 % | Decays over 150 ms |
The button-press flash is a nice tactile confirmation — press, see a brief brighter pulse, know it registered even in the dark.
The OTA-failure path explicitly clears the magenta override (via an internal UpdateDone signal from the OTA worker) so a failed install drops the LED back to the audio×net base color instead of leaving it stuck pulsing magenta forever.
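The composition rule (base color from audio × net, overrides on top) can be sketched as a single function. The type names below are assumptions, the pattern frequencies are the approximate values from the tables expressed in millihertz, and the PressFlash overlay is left out because it modifies brightness rather than selecting a color.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Audio {
    Idle,
    Playing,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Net {
    Connecting,
    Online,
    Offline,
}

/// Hypothetical name for the Updating / Error overrides.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LedOverride {
    Updating,
    Error,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Color {
    Cyan,
    Green,
    Amber,
    Magenta,
    Red,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Pattern {
    Pulse { millihertz: u32 },
    Blink { millihertz: u32 },
    SolidVeryDim,
    SolidMediumDim,
}

/// Overrides win over the base; otherwise the base comes from audio x net.
pub fn led_state(audio: Audio, net: Net, active: Option<LedOverride>) -> (Color, Pattern) {
    if let Some(o) = active {
        return match o {
            LedOverride::Updating => (Color::Magenta, Pattern::Pulse { millihertz: 1250 }),
            LedOverride::Error => (Color::Red, Pattern::Blink { millihertz: 2000 }),
        };
    }
    let solid = match audio {
        Audio::Idle => Pattern::SolidVeryDim,
        Audio::Playing => Pattern::SolidMediumDim,
    };
    match net {
        // Connecting looks the same regardless of audio state.
        Net::Connecting => (Color::Cyan, Pattern::Pulse { millihertz: 1000 }),
        Net::Online => (Color::Green, solid),
        Net::Offline => (Color::Amber, solid),
    }
}
```

Passing `None` once the OTA worker signals completion is what drops the LED back to the base color after a failed install.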
Persistent state (NVS)
Stored in ESP32's NVS flash partition. Survives power loss, restarts, even OTA updates (separate partition from app binaries).
| Key | Type | Purpose | Written when |
|---|---|---|---|
| volume_index | u8 | Index 0..=4 into VOLUME_PRESETS = [10, 25, 50, 75, 100] | Long-press cycles the index, or HA sets volume |
| volume_direction | u8 | 0 = Up, 1 = Down (the current yo-yo direction) | Flipped when index hits an end of the preset list |
| was_playing | u8 (0/1) | Whether white noise was playing at last state change | Every play/stop transition |
Not stored (computed / volatile):
- Logical name (derived from MAC)
- Connection state
- RSSI, uptime
WiFi credentials: not in NVS — they live in firmware/cfg.toml (gitignored) and are baked into the binary at compile time via toml-cfg. This is a v1 simplification; an NVS-backed multi-SSID list is a future v2+ enhancement (home + travel router + backup, tried in order during BOOT).
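The three NVS keys are all single bytes, so restoring them at boot is mostly a matter of validating raw values and falling back to the defaults listed under "Error handling" (volume_index=2, Up, not playing). The sketch below assumes the raw reads arrive as `Option<u8>` (None meaning the read failed or the key is absent) and that fallback is applied per key; both are assumptions, as are the type and method names.

```rust
pub const VOLUME_PRESETS: [u8; 5] = [10, 25, 50, 75, 100];

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Direction {
    Up,
    Down,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Persisted {
    pub volume_index: u8,
    pub volume_direction: Direction,
    pub was_playing: bool,
}

impl Default for Persisted {
    /// Defaults used when NVS cannot be read: 50 %, Up, not playing.
    fn default() -> Self {
        Persisted {
            volume_index: 2,
            volume_direction: Direction::Up,
            was_playing: false,
        }
    }
}

impl Persisted {
    /// Decode raw NVS bytes; missing or invalid values fall back per key.
    pub fn decode(index: Option<u8>, direction: Option<u8>, playing: Option<u8>) -> Self {
        let d = Persisted::default();
        Persisted {
            volume_index: index
                .filter(|&i| (i as usize) < VOLUME_PRESETS.len())
                .unwrap_or(d.volume_index),
            volume_direction: match direction {
                Some(0) => Direction::Up,
                Some(1) => Direction::Down,
                _ => d.volume_direction,
            },
            was_playing: match playing {
                Some(0) => false,
                Some(1) => true,
                _ => d.was_playing,
            },
        }
    }

    /// Saved volume in percent (index clamped defensively).
    pub fn volume_percent(&self) -> u8 {
        let i = (self.volume_index as usize).min(VOLUME_PRESETS.len() - 1);
        VOLUME_PRESETS[i]
    }
}
```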
Boot-time state restoration
Design decision: if the device was playing white noise before losing power, it resumes automatically on boot.
Rationale:
- Power blips happen. If the device came back silent at 3 AM, the user wakes up.
- If the user deliberately turned it off (via button or HA), was_playing is already false in NVS, so it stays off.
- The check is was_playing (what was the last state I was asked to be in?), not "am I currently playing" (obviously no, I just booted).
Edge case: if WiFi/MQTT never connect and was_playing was true, the device plays offline from the start. Correct behavior.
Reconnection strategy
When OFFLINE (either never connected or dropped from ONLINE):
- Retry WiFi every 60 seconds
- If WiFi connects, retry MQTT within 10s
- On MQTT success, transition to ONLINE:
  - Re-publish retained discovery configs (idempotent, covers HA restart during our offline window)
  - Re-publish retained online to availability topic
  - Re-publish retained state snapshot to nightstand/<mac_hex>/state
  - Re-subscribe to command topics
- On MQTT fail, stay OFFLINE; try again in 60s
No exponential backoff — device is wall-powered, we don't care about battery life, and 60s is a reasonable balance between "react to the network coming back" and "not spam the broker during multi-hour outages."
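The fixed-interval schedule can be expressed as a decision function over attempt outcomes. This is a sketch only: the `Attempt` and `Next` types are hypothetical, and it assumes that after an MQTT failure the next cycle starts again from the WiFi check, which the text does not state explicitly.

```rust
use std::time::Duration;

/// Fixed retry interval; deliberately no exponential backoff.
pub const RETRY_INTERVAL: Duration = Duration::from_secs(60);
/// Once WiFi is up, MQTT is attempted within this window.
pub const MQTT_AFTER_WIFI: Duration = Duration::from_secs(10);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Attempt {
    Wifi,
    Mqtt,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Next {
    /// Stay OFFLINE and start the next cycle after this delay.
    RetryAfter(Duration),
    /// WiFi is up: attempt MQTT within this deadline.
    TryMqttWithin(Duration),
    /// Both links are up: republish and resubscribe, then go ONLINE.
    GoOnline,
}

/// Decide what the background task does after one attempt finishes.
pub fn after_attempt(attempt: Attempt, succeeded: bool) -> Next {
    match (attempt, succeeded) {
        (Attempt::Wifi, false) => Next::RetryAfter(RETRY_INTERVAL),
        (Attempt::Wifi, true) => Next::TryMqttWithin(MQTT_AFTER_WIFI),
        (Attempt::Mqtt, true) => Next::GoOnline,
        // Assumption: the next cycle restarts from the WiFi check.
        (Attempt::Mqtt, false) => Next::RetryAfter(RETRY_INTERVAL),
    }
}
```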
OTA + rollback
The firmware ships with a two-OTA partition layout (ota_0 and ota_1, each 1.875 MB) plus an otadata partition that records which slot is active. New firmware is written to the inactive slot via esp_https_ota; on success, otadata is flipped and the device reboots into the new slot.
Partition layout (4 MB ESP32-PICO-D4)
| Region | Offset | Size | Purpose |
|---|---|---|---|
| bootloader | 0x01000 | 28 KB | ESP-IDF stage-2 loader |
| partition table | 0x08000 | 4 KB | This file's binary form |
| nvs | 0x09000 | 24 KB | Volume, was_playing |
| otadata | 0x0F000 | 8 KB | Active-slot pointer |
| phy_init | 0x11000 | 4 KB | RF calibration (regenerated if missing) |
| ota_0 | 0x20000 | 1.875 MB | App slot A |
| ota_1 | 0x200000 | 1.875 MB | App slot B |
NVS sits at the same offset as the single-slot v0.1.0/v0.2.0 layout, so the partition swap preserves persisted audio state. The 56 KB gap between phy_init and ota_0 is the cost of the 64 KB alignment requirement on app partitions.
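The offsets and sizes in the table can be checked mechanically. The sketch below re-enters the table as constants and computes the gaps; the struct and function names are illustrative, and 1.875 MB is written as 1920 KB.

```rust
const KB: u32 = 1024;

#[derive(Debug, Clone, Copy)]
pub struct Part {
    pub name: &'static str,
    pub offset: u32,
    pub size: u32,
}

pub const FLASH_SIZE: u32 = 4 * 1024 * KB; // 4 MB
pub const APP_ALIGN: u32 = 64 * KB; // app partitions must start on 64 KB

/// Same rows as the table above, in flash order.
pub const PARTS: [Part; 7] = [
    Part { name: "bootloader", offset: 0x01000, size: 28 * KB },
    Part { name: "partition table", offset: 0x08000, size: 4 * KB },
    Part { name: "nvs", offset: 0x09000, size: 24 * KB },
    Part { name: "otadata", offset: 0x0F000, size: 8 * KB },
    Part { name: "phy_init", offset: 0x11000, size: 4 * KB },
    Part { name: "ota_0", offset: 0x20000, size: 1920 * KB },
    Part { name: "ota_1", offset: 0x200000, size: 1920 * KB },
];

/// First byte after the partition.
pub fn end(p: &Part) -> u32 {
    p.offset + p.size
}

/// Unused bytes between partition `i` and the one before it (requires i >= 1).
pub fn gap_before(i: usize) -> u32 {
    PARTS[i].offset - end(&PARTS[i - 1])
}

/// True if the partition starts on the app alignment boundary.
pub fn is_app_aligned(p: &Part) -> bool {
    p.offset % APP_ALIGN == 0
}
```

With these numbers every region is contiguous except for the 56 KB between phy_init (ending at 0x12000) and ota_0 (starting at 0x20000), and ota_1 ends at 0x3E0000, inside the 4 MB flash.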
Pending-verify and mark_app_valid
After an OTA reboot, the new firmware boots in pending-verify state. The bootloader expects the running app to call esp_ota_mark_app_valid_cancel_rollback once it's confident things work; if a reset happens before that call, the bootloader reverts to the previous slot on the next boot. The firmware calls this on the first MQTT Connected event — proving WiFi and the broker both work, which is the device's primary job. Wired-flashed firmware isn't in pending-verify state, so the call is a no-op (documented behavior).
If MQTT never connects after an OTA, the device will roll back on the next reset and come up on the previous version. HA notices the installed_version in update/state reverted; the update card flips back to "Update available."
The trade-off is that any post-OTA reset before MQTT comes up looks like a rollback. In practice that means: don't power-cycle a device for 30 s after clicking Install. Watching the LED flip from magenta → cyan → green is the proxy for "OTA succeeded."
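The "confirm on first MQTT Connected" rule amounts to a one-shot latch around the ESP-IDF call. In the sketch below the closure stands in for esp_ota_mark_app_valid_cancel_rollback so the logic can run off-device; the `RollbackLatch` type is a hypothetical name, not the firmware's.

```rust
/// One-shot guard: confirms the running image on the first MQTT connect only.
#[derive(Debug, Default)]
pub struct RollbackLatch {
    confirmed: bool,
}

impl RollbackLatch {
    pub fn new() -> Self {
        RollbackLatch { confirmed: false }
    }

    /// Call on every MQTT Connected event. `mark_valid` is a stand-in for
    /// the real ESP-IDF call and runs at most once per boot.
    /// Returns true only on the call that performed the confirmation.
    pub fn on_mqtt_connected<F: FnOnce()>(&mut self, mark_valid: F) -> bool {
        if self.confirmed {
            return false;
        }
        mark_valid();
        self.confirmed = true;
        true
    }
}
```

Later reconnects after a broker drop pass through the latch without touching otadata again.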
One-time wired migration
The two-OTA layout is not the default ESP-IDF partition table. Devices going from v0.1.0 / v0.2.0 → v0.3.x must be wire-flashed once to write the new partition table; from v0.3.0 onward, every bump is OTA. The Makefile's flash target writes the new bootloader, partition table, and otadata in addition to the app, so the migration is a single make flash.
Error handling
| Error | Behavior |
|---|---|
| I2S driver init fails | Red blink, no audio. Stay in whatever connection mode works. Log loudly. |
| NVS read fails | Use defaults (volume_index=2 (=50%), volume_direction=Up, was_playing=false). Log. |
| NVS write fails | Log, keep running. State won't persist across reboot but that's a graceful degradation. |
| WiFi password wrong | Stay OFFLINE until the firmware is rebuilt with corrected credentials and reflashed (credentials are compile-time). No automatic recovery. |
| MQTT broker unreachable | Stay OFFLINE, retry per strategy above. |
| OTA download fails | Keep running current firmware. Log. Republish update/state with in_progress: false so HA's progress bar disappears. LED reverts from magenta to the audio×net base color. |
| OTA boot fails / app crashes before mark_valid | ESP-IDF's two-partition rollback auto-reverts to the previous slot. Device comes back on the old version; HA sees installed_version revert and lights up the "Update available" card again. |
What's not in this doc
- WiFi provisioning mechanism: credentials are baked into the binary at compile time via cfg.toml. SoftAP/BLE/Improv provisioning is a possible future addition.
- Audio generation parameters: the actual noise generator's filter shape, amplitude, etc. live in source and are tuned over time.
- Secure boot / signed firmware: we don't sign images. Threat model is LAN-only, same as MQTT being plain.
Sources
- MQTT Contract — companion doc; wire protocol
- Signal chain — hardware audio path
- Atom Echo pinmap — GPIO usage
- ESP-IDF NVS documentation
- ESP-IDF OTA documentation
- ESP-IDF App rollback (mark_app_valid)