A nightstand noise generator based on M5Stack Atom Echo and integrating with Home Assistant

v0.3.4: OTA updates over MQTT, with HA progress bar

Devices now ship new firmware to themselves via Home Assistant. The
publish side is `make firmware-ota-publish`, which builds the binary,
copies it into a directory served by a static HTTP host on the LAN, and
announces the version on a shared retained MQTT topic. HA's `update`
entity compares the shared latest against each device's installed
version and shows an Install button on the device card. Clicking it
publishes `install` to the device's `cmd/update` topic; the device then
fetches `<ota_url_base>/sound-machine-<version>.bin` and streams it via
`esp_https_ota` into the inactive partition slot, republishing
`update_percentage` every 5% so HA renders a real progress bar. After
reboot, the firmware confirms itself with
`esp_ota_mark_app_valid_cancel_rollback` only once MQTT is back, so a
version that can't reach the broker never confirms and ESP-IDF's
two-slot rollback returns the device to the previous slot on the next
reset.
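
The image URL follows the `<ota_url_base>/sound-machine-<version>.bin` pattern documented in `cfg.toml.example`. A minimal sketch of that join, assuming a hypothetical helper named `ota_image_url` (the actual code in `ota.rs` is not shown in this diff and may differ) and assuming a trailing slash on the base should be tolerated:

```rust
/// Join the OTA base URL and a firmware version into the image URL.
///
/// `ota_image_url` is a hypothetical name used for illustration only.
/// `base` stands for the `ota_url_base` value from cfg.toml; any trailing
/// slashes are dropped so the result never contains `//` before the file.
pub fn ota_image_url(base: &str, version: &str) -> String {
    format!(
        "{}/sound-machine-{}.bin",
        base.trim_end_matches('/'),
        version
    )
}
```

With the example values from `cfg.toml.example`, the base `http://firmware.example.lan/sound-machine` and version `0.3.4` would give `http://firmware.example.lan/sound-machine/sound-machine-0.3.4.bin`.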

Notable bits in the diff:

* Two-slot OTA partition table (`firmware/partitions.csv`); a one-time
wired flash migrates a v0.2.x device to it. From v0.3 onward every
bump is OTA.
* `CONFIG_ESP_HTTPS_OTA_ALLOW_HTTP=y` — without it the C function
rejects http:// URLs at config-validation time and never opens a
socket. Plain HTTP is intentional; the trust boundary is the LAN.
* `make firmware-ota-publish` reads `OTA_LOCAL_DIR`/`OTA_URL_BASE`/
`MQTT_URL` from `.envrc.private` (gitignored). The firmware reads
`ota_url_base` from `cfg.toml`; consequently any change to that
value is a wired-flash event for now.
* `firmware/src/channels.rs` — the std::sync::mpsc + esp-idf-rs
pthread-mutex incompatibility bites the OTA worker too; we use
FreeRTOS native queues here as elsewhere.
* `reference/mqtt-contract.md` and `operating-modes.md` rewritten to
match the actually-implemented behavior: button is a `sensor` with
an idle-after-N-ms reset (the v0.2.1 change), no more `rssi`, the
`update` entity, the shared `sound-machine/firmware/latest` topic,
and the OTA flow + rollback semantics.
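
The rollback behavior described above can be pictured as a tiny state machine. The sketch below is a toy model only, not the ESP-IDF API: the type and method names are invented for illustration, and ESP-IDF's real image states are reduced here to a single `pending_verify` flag.

```rust
/// The two app slots from partitions.csv.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Slot {
    Ota0,
    Ota1,
}

impl Slot {
    /// The slot that is not this one.
    pub fn other(self) -> Slot {
        match self {
            Slot::Ota0 => Slot::Ota1,
            Slot::Ota1 => Slot::Ota0,
        }
    }
}

/// Toy model of what the bootloader tracks (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Boot {
    pub running: Slot,
    /// True while the running image has not yet confirmed itself.
    pub pending_verify: bool,
}

impl Boot {
    /// An OTA install wrote the inactive slot and rebooted into it.
    pub fn after_ota(self) -> Boot {
        Boot { running: self.running.other(), pending_verify: true }
    }

    /// MQTT came back, so the firmware marks itself valid.
    pub fn mqtt_connected(self) -> Boot {
        Boot { pending_verify: false, ..self }
    }

    /// Any reset: an unconfirmed image is rolled back to the other slot.
    pub fn reset(self) -> Boot {
        if self.pending_verify {
            Boot { running: self.running.other(), pending_verify: false }
        } else {
            self
        }
    }
}
```

In this model, a device that installs an update and resets before MQTT reconnects ends up back on its original slot, while one that reconnects first stays on the new slot across later resets.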

References:
* https://docs.espressif.com/projects/esp-idf/en/v5.3.3/esp32/api-reference/system/ota.html#app-rollback
* https://www.home-assistant.io/integrations/update.mqtt/

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

+944 -140
+4
.envrc
···
 # The symlink in firmware/.lib points the old SONAME at the new library;
 # the ABI clang touches is small and stable enough that this works.
 export LD_LIBRARY_PATH="$PWD/firmware/.lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
+
+# Host-only secrets (OTA push directory, etc.) live in .envrc.private,
+# which is gitignored. See .envrc.private.example for the shape.
+source_env_if_exists .envrc.private
+24
.envrc.private.example
···
+#!/usr/bin/env bash
+# Host-only env vars for the publish/OTA flow.
+#
+# Copy this file to `.envrc.private` (gitignored) and fill in real values.
+# `.envrc` sources it automatically when direnv loads.
+#
+# These are needed by the `make ota-publish` flow on the dev machine; the
+# firmware itself reads its own copy of OTA_URL_BASE from `firmware/cfg.toml`
+# at compile time.
+
+# Where on this machine the .bin files get copied so the static HTTP server
+# can pick them up. Subdirectory per project; the publish flow copies to
+# $OTA_LOCAL_DIR/sound-machine-<version>.bin.
+export OTA_LOCAL_DIR="/path/to/firmware/sound-machine"
+
+# Public-ish base URL the device will fetch from. Must match the firmware's
+# `ota_url_base` in cfg.toml. Trailing slash optional; the publish flow
+# normalizes it.
+export OTA_URL_BASE="http://firmware.example.lan/sound-machine"
+
+# MQTT broker URL used by `make ota-publish` to push the new latest_version
+# to the shared topic. mosquitto_pub accepts the same scheme as the device's
+# cfg.toml mqtt_url, so it's fine to reuse that value.
+export MQTT_URL="mqtt://mqtt.example.lan:1883"
+1
.gitignore
···
 .direnv/
+.envrc.private
 *.swp
 *.swo
 .DS_Store
+5 -1
Makefile
···

 MAKEFLAGS += --no-print-directory

-.PHONY: all firmware firmware-check firmware-flash firmware-flash-monitor firmware-monitor firmware-clean clean help
+.PHONY: all firmware firmware-check firmware-flash firmware-flash-monitor firmware-monitor firmware-ota-publish firmware-clean clean help

 all: firmware

···
 firmware-monitor:
 	$(MAKE) -C firmware monitor

+firmware-ota-publish:
+	$(MAKE) -C firmware ota-publish
+
 firmware-clean:
 	$(MAKE) -C firmware clean

···
 	@echo "  firmware-flash          headless: build + flash, no monitor"
 	@echo "  firmware-flash-monitor  interactive: build + flash + serial monitor (needs TTY)"
 	@echo "  firmware-monitor        serial monitor only"
+	@echo "  firmware-ota-publish    save .bin to OTA_LOCAL_DIR + publish latest_version"
 	@echo "  firmware-clean          cargo clean"
 	@echo "  clean                   clean everything"
 	@echo ""
+17 -13
README.md
···

 ```
 sound-machine/
-├── README.md             # this file
-├── Makefile              # top-level entry: `make` builds firmware (and future model rendering)
-├── .envrc                # direnv: ESP toolchain env + libxml2 shim path
-├── firmware/             # Rust firmware (esp-idf-svc, std). See firmware/README.md
-└── reference/            # Design docs and hardware reference
-    ├── mqtt-contract.md      # wire protocol between device and HA
-    ├── operating-modes.md    # firmware state machine, LED scheme, NVS
-    ├── signal-chain.md       # audio path: ESP32 → MAX98357A → speaker
-    ├── atom-echo/            # M5Stack Atom Echo pinmap, dimensions, schematic
-    ├── speakers/             # Adafruit 1314 driver notes
-    └── datasheets/           # vendor PDFs for ESP32-PICO-D4, NS4168, SPM1423
+├── README.md                 # this file
+├── Makefile                  # top-level entry: `make` builds firmware (and future model rendering)
+├── .envrc                    # direnv: ESP toolchain env + libxml2 shim path
+├── .envrc.private.example    # template for host-only secrets (OTA dir, MQTT URL)
+├── firmware/                 # Rust firmware (esp-idf-svc, std). See firmware/README.md
+└── reference/                # Design docs and hardware reference
+    ├── mqtt-contract.md      # wire protocol between device and HA
+    ├── operating-modes.md    # firmware state machine, LED scheme, NVS, OTA
+    ├── signal-chain.md       # audio path: ESP32 → MAX98357A → speaker
+    ├── atom-echo/            # M5Stack Atom Echo pinmap, dimensions, schematic
+    ├── speakers/             # Adafruit 1314 driver notes
+    └── datasheets/           # vendor PDFs for ESP32-PICO-D4, NS4168, SPM1423
 ```

 ## Status
···
 - ✅ **Hardware research and selection complete** — see `reference/`
 - ✅ **MQTT contract and operating modes designed** — `reference/mqtt-contract.md`, `reference/operating-modes.md`
 - ✅ **Toolchain validated end-to-end** — `firmware/` builds, flashes, and runs on real hardware
-- ✅ **v0.1.0 — offline-mode firmware** — button, audio, NVS, LED — 2026-04-25
+- ✅ **v0.1.0 — offline-mode firmware** — button, audio, NVS, LED
 - ✅ **v0.2.0 — online-mode firmware** — WiFi + MQTT + HA Discovery
+- ✅ **v0.3.x — OTA-capable firmware** — two-slot partition layout, HA `update` entity with progress bar, `esp_ota_mark_app_valid` rollback
 - 🚧 **Awaiting hardware** — MAX98357A amps on order from DigiKey
 - 🚧 **Enclosure design** — 3D-printable case TBD

···

 Each device runs Rust firmware (esp-idf-svc, std mode) on an Atom Echo. WiFi connects to the home network, MQTT to a LAN-only broker (no TLS), HA Discovery announces entities. The button publishes events; HA decides what to do; HA sends back a "play white noise" command. Audio is generated locally on-device (no streaming dependency) and sent over I2S to an external MAX98357A amp driving a 3" 4Ω speaker. Onboard NS4168 amp is bypassed (no I2S data sent to its pins) — it's known not to be sized for sustained white noise. When WiFi or MQTT drops, the device falls into offline mode where the button toggles white noise locally; same code path as travel use.

-Both units run **the same firmware binary** — identity is derived at runtime from the chip's STA MAC and used directly as the topic-prefix segment (`nightstand/<mac_hex>/...`). HA users name each device in the HA UI; the MQTT contract guarantees stable `unique_id`s per MAC. One build, OTA-pushed to both. (OTA is v1.5; v1 ships USB-flashed.)
+Firmware updates are over-the-air via HA's MQTT `update` entity: `make firmware-ota-publish` builds the new binary, copies it to a static HTTP host on the LAN, and announces the version on a shared MQTT topic. HA shows an Install button on each device's card; clicking it streams the firmware in over plain HTTP, with a live progress bar driven by retained MQTT publishes. ESP-IDF's two-slot partition layout means a broken firmware automatically rolls back to the previous version on the next reset.
+
+Both units run **the same firmware binary** — identity is derived at runtime from the chip's STA MAC and used directly as the topic-prefix segment (`nightstand/<mac_hex>/...`). HA users name each device in the HA UI; the MQTT contract guarantees stable `unique_id`s per MAC.

 For the gory details: `firmware/README.md`, `reference/mqtt-contract.md`, `reference/operating-modes.md`.

37 39 38 - Both units run **the same firmware binary** — identity is derived at runtime from the chip's STA MAC and used directly as the topic-prefix segment (`nightstand/<mac_hex>/...`). HA users name each device in the HA UI; the MQTT contract guarantees stable `unique_id`s per MAC. One build, OTA-pushed to both. (OTA is v1.5; v1 ships USB-flashed.) 40 + Firmware updates are over-the-air via HA's MQTT `update` entity: `make firmware-ota-publish` builds the new binary, copies it to a static HTTP host on the LAN, and announces the version on a shared MQTT topic. HA shows an Install button on each device's card; clicking it streams the firmware in over plain HTTP, with a live progress bar driven by retained MQTT publishes. ESP-IDF's two-slot partition layout means a broken firmware automatically rolls back to the previous version on the next reset. 41 + 42 + Both units run **the same firmware binary** — identity is derived at runtime from the chip's STA MAC and used directly as the topic-prefix segment (`nightstand/<mac_hex>/...`). HA users name each device in the HA UI; the MQTT contract guarantees stable `unique_id`s per MAC. 39 43 40 44 For the gory details: `firmware/README.md`, `reference/mqtt-contract.md`, `reference/operating-modes.md`. 41 45
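
The README text above says each device derives its topic prefix from the STA MAC, as `nightstand/<mac_hex>/...`, and the `discovery.rs` doc comment specifies lowercase hex with no separators. A small sketch of that derivation, assuming the MAC is available as six raw bytes; the function names are illustrative and not taken from the firmware:

```rust
/// Render a 6-byte MAC as lowercase hex with no separators,
/// e.g. [0xAA, 0xBB, ...] becomes "aabb...".
/// (Hypothetical helper name.)
pub fn mac_hex(mac: [u8; 6]) -> String {
    mac.iter().map(|b| format!("{b:02x}")).collect()
}

/// Per-device MQTT topic prefix, following the `nightstand/<mac_hex>`
/// shape described in the README. (Hypothetical helper name.)
pub fn topic_prefix(mac: [u8; 6]) -> String {
    format!("nightstand/{}", mac_hex(mac))
}
```

Because the prefix depends only on the MAC, the same binary yields a distinct, stable topic tree on every unit.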
+1 -1
firmware/Cargo.lock
···

 [[package]]
 name = "sound-machine"
-version = "0.2.0"
+version = "0.3.4"
 dependencies = [
  "anyhow",
  "embuild",
+1 -1
firmware/Cargo.toml
···
 [package]
 name = "sound-machine"
-version = "0.2.0"
+version = "0.3.4"
 edition = "2021"
 resolver = "2"
 rust-version = "1.77"
+65 -5
firmware/Makefile
···

 export LD_LIBRARY_PATH := $(LIB_DIR):$(LD_LIBRARY_PATH)

+# Inject the absolute path to partitions.csv via a generated overlay file.
+# ESP-IDF resolves relative CONFIG_PARTITION_TABLE_FILENAME against an
+# internal CMake source dir under target/, not against this directory, so
+# baking a relative path into the committed sdkconfig.defaults would fail.
+# Generating the overlay here keeps the committed config portable.
+GEN_DIR := $(CURDIR)/target/gen
+GEN_PARTITIONS_SDKCONFIG := $(GEN_DIR)/sdkconfig.defaults.partitions
+PARTITIONS_CSV := $(CURDIR)/partitions.csv
+
+# esp-idf-sys reads ESP_IDF_SDKCONFIG_DEFAULTS as the list of project sdkconfig
+# defaults files (it then internally builds its own SDKCONFIG_DEFAULTS env var
+# for cmake by prepending its generated defaults).
+export ESP_IDF_SDKCONFIG_DEFAULTS := $(CURDIR)/sdkconfig.defaults;$(GEN_PARTITIONS_SDKCONFIG)
+
 CARGO ?= cargo
 BIN := target/xtensa-esp32-espidf/release/sound-machine
+BOOTLOADER_BIN := target/xtensa-esp32-espidf/release/bootloader.bin
 # First /dev/ttyUSB* / /dev/ttyACM* found, used for headless flashing.
 # Override on the command line: `make flash PORT=/dev/ttyUSB1`
 PORT ?= $(firstword $(wildcard /dev/ttyUSB* /dev/ttyACM*))

-.PHONY: build check flash flash-monitor monitor clean help $(LIBXML2_COMPAT)
+# Firmware version, parsed from Cargo.toml. Used for the OTA filename and
+# the latest_version MQTT publish.
+VERSION := $(shell sed -nE '0,/^version *= *"([^"]+)"/{s//\1/p}' Cargo.toml)
+OTA_BIN := sound-machine-$(VERSION).bin

-build: $(LIBXML2_COMPAT)
+.PHONY: build check flash flash-monitor monitor clean help ota-publish $(LIBXML2_COMPAT)
+
+build: $(LIBXML2_COMPAT) $(GEN_PARTITIONS_SDKCONFIG)
 	$(CARGO) build --release

-check: $(LIBXML2_COMPAT)
+check: $(LIBXML2_COMPAT) $(GEN_PARTITIONS_SDKCONFIG)
 	$(CARGO) check

 # Headless flash — builds, then writes to the device with no interactive
···
 		echo "no serial port found (looked for /dev/ttyUSB* and /dev/ttyACM*)"; \
 		exit 1; \
 	fi
-	espflash flash --port $(PORT) $(BIN)
+	espflash flash --port $(PORT) \
+		--bootloader $(BOOTLOADER_BIN) \
+		--partition-table $(PARTITIONS_CSV) \
+		--erase-parts otadata \
+		$(BIN)

 # Interactive flash + monitor — builds, flashes, attaches the serial monitor.
 # Needs a TTY (the monitor writes to terminal and reads keyboard input).
-flash-monitor: $(LIBXML2_COMPAT)
+flash-monitor: $(LIBXML2_COMPAT) $(GEN_PARTITIONS_SDKCONFIG)
 	$(CARGO) run --release

 monitor:
 	espflash monitor

+# Publish a new firmware version. Reads OTA_LOCAL_DIR / OTA_URL_BASE / MQTT_URL
+# from the environment (sourced via direnv from .envrc.private). Builds the
+# release binary, generates the flat .bin via espflash save-image, copies it
+# into the static-HTTP serve directory, then publishes the new latest_version
+# retained to the shared MQTT topic. Both nightstands' HA update cards light
+# up the moment the publish lands.
+ota-publish: build
+	@if [ -z "$(OTA_LOCAL_DIR)" ]; then \
+		echo "OTA_LOCAL_DIR is not set (see .envrc.private.example)"; exit 1; \
+	fi
+	@if [ -z "$(OTA_URL_BASE)" ]; then \
+		echo "OTA_URL_BASE is not set (see .envrc.private.example)"; exit 1; \
+	fi
+	@if [ -z "$(MQTT_URL)" ]; then \
+		echo "MQTT_URL is not set (see .envrc.private.example)"; exit 1; \
+	fi
+	@if [ -z "$(VERSION)" ]; then \
+		echo "could not parse version from Cargo.toml"; exit 1; \
+	fi
+	@echo "publishing v$(VERSION) → $(OTA_URL_BASE)/$(OTA_BIN)"
+	espflash save-image --chip esp32 --flash-size 4mb $(BIN) $(OTA_LOCAL_DIR)/$(OTA_BIN)
+	mosquitto_pub \
+		-L "$(MQTT_URL)/sound-machine/firmware/latest" \
+		-r \
+		-m "$(VERSION)"
+	@echo "done. devices will see the update in HA within a few seconds."
+
 clean:
 	$(CARGO) clean

···
 		fi; \
 	fi

+# Generated sdkconfig overlay carrying the absolute path to partitions.csv.
+# Regenerated whenever the CSV or this Makefile changes.
+$(GEN_PARTITIONS_SDKCONFIG): $(PARTITIONS_CSV) $(MAKEFILE_LIST)
+	@mkdir -p $(GEN_DIR)
+	@printf 'CONFIG_PARTITION_TABLE_CUSTOM_FILENAME="%s"\nCONFIG_PARTITION_TABLE_FILENAME="%s"\n' \
+		'$(PARTITIONS_CSV)' '$(PARTITIONS_CSV)' > $@
+	@echo "generated $@"
+
 help:
 	@echo "firmware targets:"
 	@echo "  build          cargo build --release (default)"
···
 	@echo "  flash          headless: build + flash, no monitor (works in any shell)"
 	@echo "  flash-monitor  interactive: build + flash + serial monitor (needs a TTY)"
 	@echo "  monitor        espflash monitor only"
+	@echo "  ota-publish    save .bin to \$$OTA_LOCAL_DIR + publish latest_version to MQTT"
 	@echo "  clean          cargo clean"
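
The `VERSION` line in the Makefile above uses `sed` to pick the first `version = "..."` line out of `Cargo.toml`. For readers less familiar with that `sed` idiom, here is a rough Rust equivalent of the same matching rule. It is a sketch for explanation only (the build uses `sed`, not this function), and `package_version` is a made-up name:

```rust
/// Return the value of the first line of the form `version = "X"` that
/// starts at column 0, mirroring the Makefile's sed expression.
/// Lines such as `versions = ...` or an indented `version` do not match.
pub fn package_version(cargo_toml: &str) -> Option<&str> {
    cargo_toml.lines().find_map(|line| {
        // Must begin with the literal key, then optional spaces, `=`, spaces.
        let rest = line.strip_prefix("version")?;
        let rest = rest.trim_start().strip_prefix('=')?;
        let rest = rest.trim_start().strip_prefix('"')?;
        // Take everything up to the closing quote; reject empty values.
        let end = rest.find('"')?;
        if end == 0 {
            None
        } else {
            Some(&rest[..end])
        }
    })
}
```

Only the first match counts, so a later `version = "..."` line under `[dependencies]` would not override the package version.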
+88 -16
firmware/README.md
···

 ## Status

-**Milestone v0.2.0 — online mode (WiFi + MQTT + HA Discovery).** Adds WiFi/MQTT layered on the v0.1.0 offline foundation. Short press now round-trips through HA when online (button publishes `{"event_type":"short"}`, HA's automation publishes back `cmd/play ON`/`OFF`); offline, short-press still toggles audio locally as a fallback. Long-press cycles volume locally in both modes (with a publish for HA logging when online). Double-press is purely an MQTT gesture for HA's late-night-lights routine. The MAC-derived topic prefix means one binary works on every unit; HA names devices in its UI.
+**Milestone v0.3.x — OTA-capable online firmware.** v0.3 adds end-to-end OTA: a two-slot partition layout, an HA `update` entity with installed/latest versions and a live progress bar, the `make ota-publish` workflow, and `esp_ota_mark_app_valid_cancel_rollback` boot validation so a broken firmware reverts to the previous slot. v0.2 (online mode: WiFi + MQTT + HA Discovery, round-tripped short-press) and v0.1 (offline button + audio + NVS + LED) are the foundation underneath. One binary works on every unit — identity is derived at runtime from the STA MAC.

 | Subsystem | State |
 | --- | --- |
···
 | Continuous pink noise generator (Paul Kellet IIR) | ✅ xorshift32 white → pink filter, volume-scaled |
 | Button state machine (short / long / double) | ✅ working |
 | NVS persistence (volume + direction + playing) | ✅ working |
-| RGB LED (SK6812 on G27 via RMT) | ✅ working — composes (net, audio) → color, press-flash overlay |
+| RGB LED (SK6812 on G27 via RMT) | ✅ working — (net, audio) base + OTA/Error overrides + press-flash |
 | WiFi | ✅ working — STA, hostname `nightstand-<mac_hex>`, auto-reconnect |
-| MQTT client + HA discovery | ✅ working — LWT, retained discovery on (re)connect, `cmd/play` + `cmd/volume` subscribed |
-| OTA updates | ❌ v1.5 (design in `reference/mqtt-contract.md`) |
+| MQTT client + HA discovery | ✅ working — LWT, retained discovery on (re)connect, `cmd/play`/`cmd/volume`/`cmd/update` subscribed |
+| OTA updates | ✅ working — `esp_https_ota` chunked download with HA progress bar, two-slot rollback |
 | Hardware: external MAX98357A + 1314 speaker | ❌ amps in transit; using onboard for now |

 ## Module layout
···
 ├── nvs.rs — typed NVS wrapper for volume_index, volume_direction, was_playing
 ├── audio.rs — I2S + xorshift white noise + Paul Kellet pink filter + audio task
 ├── button.rs — 5-state button FSM + button task (owns G39 PinDriver)
-├── led.rs — SK6812 RMT driver + LED task; (net, audio) → color composition
+├── led.rs — SK6812 RMT driver + LED task; (net, audio) base + OTA/Error overrides
 ├── network.rs — WiFi + MQTT state machine; gatekeeper for online/offline routing
 ├── discovery.rs — HA Discovery JSON payloads (built via format!() — no serde_json)
+├── ota.rs — esp_https_ota wrapper with chunked progress + mark_app_valid
+├── channels.rs — FreeRTOS-queue-backed Sender/Receiver (std::sync::mpsc is broken on esp-idf)
 └── secrets.rs — toml-cfg config struct sourced from cfg.toml at compile time
 ```

-The deliberate factoring: each task owns its peripherals exclusively; cross-task communication is via `std::sync::mpsc` channels carrying typed events. The network task is the gatekeeper — it decides whether short presses toggle audio locally (offline) or wait for HA to publish back via MQTT (online). Long-press cycles volume locally in both modes; double is a pure-MQTT gesture.
+The deliberate factoring: each task owns its peripherals exclusively; cross-task communication is via FreeRTOS-queue channels carrying typed events. The network task is the gatekeeper — it decides whether short presses toggle audio locally (offline) or wait for HA to publish back via MQTT (online), and it owns the OTA install path. Long-press cycles volume locally in both modes; double is a pure-MQTT gesture.

 ## Build & flash

···

 ```bash
 # from anywhere in the repo:
-make firmware # cargo build --release
-make firmware-flash # cargo run --release (flash + monitor)
-make firmware-monitor # serial monitor only
-make firmware-check # cargo check
+make firmware                 # cargo build --release
+make firmware-flash           # headless: build + write bootloader + partition table + app
+make firmware-flash-monitor   # interactive: build + flash + serial monitor (needs TTY)
+make firmware-monitor         # serial monitor only
+make firmware-check           # cargo check
+make firmware-ota-publish     # build + save .bin to OTA dir + publish latest_version (see OTA below)
 make firmware-clean
 ```

···
 ```bash
 cd firmware
 cargo build --release
-espflash flash --port /dev/ttyUSB0 target/xtensa-esp32-espidf/release/sound-machine
+espflash flash --port /dev/ttyUSB0 \
+  --bootloader target/xtensa-esp32-espidf/release/bootloader.bin \
+  --partition-table partitions.csv \
+  --erase-parts otadata \
+  target/xtensa-esp32-espidf/release/sound-machine
 ```

 Incremental builds are ~5 s. First-time builds are ~20 min — they download and compile ESP-IDF (~500 MB) plus all the Rust deps.

-### `cfg.toml` — secrets
+### `cfg.toml` — compile-time config

 Before the first build, copy [`cfg.toml.example`](cfg.toml.example) to `cfg.toml` and fill in real values:

···
 wifi_ssid = "your-wifi-ssid"
 wifi_password = "your-wifi-password"
 mqtt_url = "mqtt://your-broker.lan:1883"
+ota_url_base = "http://firmware.example.lan/sound-machine"
 ```

-`cfg.toml` is gitignored. The build will panic at boot if any of the three values is empty, so you can't accidentally flash a no-config binary.
+`cfg.toml` is gitignored. The firmware checks for empty values at boot and refuses to proceed with an empty `wifi_ssid`/`wifi_password`/`mqtt_url`, so you can't accidentally flash a no-config binary. An empty `ota_url_base` only blocks OTA installs (with a warning); the rest of the firmware still runs.
+
+**Compile-time means wired-flash** — changing any of these values requires a USB reflash of every device that needs the new value. WiFi/MQTT/OTA host changes are infrequent enough to be worth the inconvenience for the simpler model.
+
+### `.envrc.private` — host-only secrets
+
+The publish flow needs three host-side variables — `OTA_LOCAL_DIR` (where to copy the .bin), `OTA_URL_BASE` (matches `cfg.toml`), and `MQTT_URL`. Copy [`../.envrc.private.example`](../.envrc.private.example) to `../.envrc.private` and fill in real values; direnv loads it automatically. `.envrc.private` is gitignored.

 ### Monitoring without re-flashing

···

 `Ctrl+C` exits the monitor; the device keeps running. Anything written via `log::info!()` etc. shows up here, prefixed with the crate name and a millisecond timestamp.

+## OTA workflow
+
+Once a device is on v0.3.x, the next version goes out over WiFi. Two parts: the publish, and the install.
+
+### Publish (your dev machine)
+
+```bash
+# Cargo.toml: bump version
+make firmware-ota-publish
+```
+
+That target:
+1. Builds a release binary
+2. `espflash save-image`s it to `$OTA_LOCAL_DIR/sound-machine-<version>.bin`
+3. `mosquitto_pub`s the new version retained to `sound-machine/firmware/latest`
+
+Both nightstands' HA update cards light up within a couple seconds.
+
+### Install (HA UI)
+
+Click "Install" on the device's Firmware card. The device:
+1. LED switches to a magenta pulse
+2. Streams the binary into the inactive OTA slot, publishing `update_percentage` every 5 % (HA renders a progress bar)
+3. Reboots into the new slot
+4. Reconnects to WiFi + MQTT
+5. Calls `esp_ota_mark_app_valid_cancel_rollback` — confirms the new firmware works and disables the bootloader's pending-rollback timer
+6. Republishes `installed_version` matching `latest_version`; HA's card flips back to "Up to date"
+
+If the new firmware fails to reach MQTT (panic, WiFi misconfig, etc.) the bootloader rolls back to the previous slot on the next reset, and the device comes up on the old version. HA notices `installed_version` reverted and shows "Update available" again.
+
+### Bootstrap (the one-time wired flash)
+
+Going from a pre-v0.3 partition layout (single-slot) to v0.3.x's two-slot layout requires a wired `make firmware-flash` once per device. The `flash` target writes the new bootloader, partition table, and otadata in addition to the app. After that, OTA is the path forever; if you ever need wired access (changing `cfg.toml`, debugging OTA breakage in the OTA path itself), USB still works.
+
+### Webserver expectations
+
+`$OTA_LOCAL_DIR` should be served as static HTTP at `$OTA_URL_BASE`. Plain HTTP is fine and intentional — the trust boundary is the LAN, same as MQTT. ESP-IDF will refuse plain HTTP for OTA *unless* `CONFIG_ESP_HTTPS_OTA_ALLOW_HTTP=y` is set in `sdkconfig.defaults` (it is).
+
 ## Toolchain (one-time setup)

 Already done on Chris's vega; documented here for re-setup or a second machine.
···

 This keeps the amp's input continuously fed, even when "doing nothing."

-### 5. The onboard NS4168 + tiny speaker is a prototype-only path
+### 5. `std::sync::mpsc` and `crossbeam-channel` are broken on esp-idf-rs
+
+Both libraries assume the GNU/newlib `pthread_mutex_t` ABI (40 bytes, zero-initializer). ESP-IDF's pthread layer uses a 4-byte handle with `0xffffffff` as the lazy-init sentinel. Sending across a channel under contention triggers `pthread_mutex_lock` on a misaligned struct → `LoadProhibited` exception. Symptom: `<channel_type>::try_recv` or `::send` panics in the bowels of mpsc.
+
+**Fix:** [`channels.rs`](src/channels.rs) wraps `esp_idf_svc::hal::task::queue::Queue` (the FreeRTOS native queue, ISR-safe, byte-copy semantics) with `Sender`/`Receiver` types that mimic the mpsc API. `T: Copy` is required (FreeRTOS queues memcpy values) — usually fine for typed events. Use these in place of `std::sync::mpsc::channel` everywhere.
+
+### 6. ESP-IDF's `CONFIG_PARTITION_TABLE_FILENAME` resolves relative to the CMake source dir, not your project
+
+The ESP-IDF Kconfig docs imply the path is relative to your project. In an esp-idf-rs build, the CMake "project" is the embuild-generated directory under `target/`, so a relative path silently fails. The Makefile generates a small sdkconfig overlay at build time with the *absolute* path to `partitions.csv`, chained via `ESP_IDF_SDKCONFIG_DEFAULTS`. The committed `sdkconfig.defaults` only carries the boolean (`CONFIG_PARTITION_TABLE_CUSTOM=y`).
+
+### 7. `esp_https_ota` rejects plain HTTP unless an opt-in flag is set
+
+Despite working fine at the protocol level, `esp_https_ota` validates the URL scheme and refuses plain HTTP unless `CONFIG_ESP_HTTPS_OTA_ALLOW_HTTP=y` is in sdkconfig. With the flag missing, `esp_https_ota_begin` returns immediately with an error and never opens a socket — webserver logs show no GET requests. The flag is set in `sdkconfig.defaults`.
+
+### 8. The onboard NS4168 + tiny speaker is a prototype-only path

 We're using the I2S pins that drive the built-in amp (G19 BCLK, G22 DOUT, G33 LRCK) for hello-world. The plan ([signal-chain.md](../reference/signal-chain.md)) is to switch to an external MAX98357A on G21/G26/G32 once amps arrive. Don't run sustained white noise on the built-in speaker — short test bursts only. The thermal warning on this is real (we briefly demonstrated it via the swapped-pins bug above).

···
 ├── rust-toolchain.toml   # pins to the `esp` channel
 ├── .cargo/config.toml    # target = xtensa-esp32-espidf, ldproxy, runner
 ├── build.rs              # embuild bootstrap
-├── sdkconfig.defaults    # ESP-IDF kconfig knobs
+├── Makefile              # build/flash/ota-publish entry points
+├── sdkconfig.defaults    # ESP-IDF kconfig knobs (4MB flash, two-OTA layout, allow-HTTP-OTA)
+├── partitions.csv        # custom two-slot OTA partition table
+├── cfg.toml.example      # template for compile-time config
+├── cfg.toml              # real values (gitignored)
 ├── src/
-│   └── main.rs           # entry point — currently the hello-world
+│   └── *.rs              # see "Module layout" above
 ├── .lib/                 # libxml2 shim (gitignored)
 ├── target/               # cargo build output (gitignored)
+│   └── gen/              # generated sdkconfig overlay for absolute partitions.csv path
 └── .embuild/             # embuild-managed ESP-IDF clone (gitignored)
 ```

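
The install steps in the README diff above mention publishing `update_percentage` every 5 %. One way to throttle progress reports to fixed steps is sketched below. This is an assumption about how such throttling could work, not the code in `ota.rs`; the `ProgressReporter` type and its methods are hypothetical.

```rust
/// Emits a percentage only when the download crosses into a new step
/// (for example every 5 %), so the MQTT topic is not flooded per chunk.
pub struct ProgressReporter {
    step: u8,
    last: Option<u8>,
}

impl ProgressReporter {
    /// `step` is the reporting granularity in percent; 0 is treated as 1.
    pub fn new(step: u8) -> Self {
        Self { step: step.max(1), last: None }
    }

    /// Call after each chunk. Returns `Some(percent)` when a new step
    /// boundary has been reached, otherwise `None`.
    pub fn update(&mut self, received: u64, total: u64) -> Option<u8> {
        if total == 0 {
            return None; // size unknown: nothing sensible to report
        }
        let pct = (received.min(total) * 100 / total) as u8;
        // Round down to the nearest multiple of `step`.
        let bucket = pct - pct % self.step;
        if self.last == Some(bucket) {
            return None;
        }
        self.last = Some(bucket);
        Some(bucket)
    }
}
```

A caller would publish the returned value whenever `update` yields `Some`, which bounds a full download to 21 publishes at a 5 % step (0, 5, ..., 100).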
+4
firmware/cfg.toml.example
···
 wifi_password = "your-wifi-password"
 # Full URL form so we can swap to mqtts://host:8883 later without a schema change.
 mqtt_url = "mqtt://mqtt.example.local:1883"
+# Base URL the device fetches OTA images from. Image URL is
+# `<ota_url_base>/sound-machine-<version>.bin`. Plain HTTP is fine on a
+# trusted LAN; TLS-without-code-signing only protects transit, not authenticity.
+ota_url_base = "http://firmware.example.lan/sound-machine"
+22
firmware/partitions.csv
···
+# Two-slot OTA partition table for the Atom Echo (4MB flash, ESP32-PICO-D4).
+#
+# Layout:
+#   0x01000 - 0x08000   bootloader        (28KB, fixed)
+#   0x08000 - 0x09000   partition table   (this file, 4KB)
+#   0x09000 - 0x0F000   nvs        24KB     — audio state, WiFi creds cache
+#   0x0F000 - 0x11000   otadata    8KB      — bootloader's "active slot" pointer
+#   0x11000 - 0x12000   phy_init   4KB      — RF calibration data
+#   0x20000 - 0x200000  ota_0      1.875MB  — app slot A
+#   0x200000- 0x3E0000  ota_1      1.875MB  — app slot B
+#   0x3E0000- 0x400000  (free)     128KB    — slack for future use
+#
+# App partitions must be 0x10000-aligned, hence the 56KB gap after phy_init.
+# NVS offset/size is unchanged from the single-app default so existing
+# persisted state (volume, was_playing) survives the swap.
+#
+# Name,     Type, SubType, Offset,   Size,
+nvs,        data, nvs,     0x9000,   0x6000,
+otadata,    data, ota,     0xf000,   0x2000,
+phy_init,   data, phy,     0x11000,  0x1000,
+ota_0,      app,  ota_0,   0x20000,  0x1E0000,
+ota_1,      app,  ota_1,   0x200000, 0x1E0000,
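
The arithmetic in the partition table comments (no overlaps, 0x10000 alignment for app slots, everything inside 4MB, a 56KB gap after `phy_init`) can be checked mechanically. The sketch below copies the offsets and sizes from `partitions.csv`; the `Partition` struct and `check` function are illustrative and not part of the repository.

```rust
/// One row of partitions.csv (illustrative type, not from the repo).
pub struct Partition {
    pub name: &'static str,
    pub offset: u32,
    pub size: u32,
    pub is_app: bool,
}

/// 4MB flash on the ESP32-PICO-D4.
pub const FLASH_SIZE: u32 = 0x40_0000;

/// Offsets and sizes copied from partitions.csv.
pub const TABLE: [Partition; 5] = [
    Partition { name: "nvs", offset: 0x9000, size: 0x6000, is_app: false },
    Partition { name: "otadata", offset: 0xF000, size: 0x2000, is_app: false },
    Partition { name: "phy_init", offset: 0x1_1000, size: 0x1000, is_app: false },
    Partition { name: "ota_0", offset: 0x2_0000, size: 0x1E_0000, is_app: true },
    Partition { name: "ota_1", offset: 0x20_0000, size: 0x1E_0000, is_app: true },
];

/// Verify ordering, app alignment, and that everything fits in flash.
pub fn check(table: &[Partition], flash_size: u32) -> Result<(), String> {
    // Bootloader and the partition table itself occupy flash up to 0x9000.
    let mut prev_end: u32 = 0x9000;
    for p in table {
        if p.offset < prev_end {
            return Err(format!("{} overlaps the previous region", p.name));
        }
        if p.is_app && p.offset % 0x1_0000 != 0 {
            return Err(format!("{} is not 0x10000-aligned", p.name));
        }
        let end = p
            .offset
            .checked_add(p.size)
            .ok_or_else(|| format!("{} overflows u32", p.name))?;
        if end > flash_size {
            return Err(format!("{} ends past the end of flash", p.name));
        }
        prev_end = end;
    }
    Ok(())
}
```

Running the same check against a 2MB flash size fails on `ota_1`, which matches the note in `sdkconfig.defaults` that the default flash size cannot hold two OTA slots.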
+28
firmware/sdkconfig.defaults
···
 # Use the hostname we set on the netif for DHCP, so routers and HA show
 # something meaningful (nightstand-<mac_hex>) instead of generic ESP32-XXXX
 CONFIG_LWIP_LOCAL_HOSTNAME=y
+
+# 4MB flash on the ESP32-PICO-D4 in the Atom Echo. Default is 2MB which
+# can't accommodate two OTA slots.
+CONFIG_ESPTOOLPY_FLASHSIZE_4MB=y
+CONFIG_ESPTOOLPY_FLASHSIZE="4MB"
+
+# Custom partition table with two OTA slots (see partitions.csv). This is
+# the one-time disruptive change — after this flash, every future firmware
+# update can be delivered over-the-air via MQTT.
+#
+# The absolute path to partitions.csv is injected via a generated overlay
+# (see Makefile target sdkconfig.defaults.partitions). ESP-IDF resolves
+# relative paths against an internal CMake dir buried in target/, not this
+# directory, so a relative entry here would break the build.
+CONFIG_PARTITION_TABLE_CUSTOM=y
+
+# Enable rollback support: new firmware boots in a "pending verify" state.
+# If the app crashes (or doesn't call esp_ota_mark_app_valid_cancel_rollback)
+# before the next reset, the bootloader rolls back to the previous slot.
+# Belt and braces against bricked devices.
+CONFIG_BOOTLOADER_APP_ROLLBACK_ENABLE=y
+
+# Permit `esp_https_ota` to fetch over plain HTTP. Without this, the C
+# function rejects http:// URLs at config-validation time and never opens
+# a socket. Our LAN-only firmware host is plain HTTP intentionally — TLS
+# without code signing only protects transit, not authenticity, and the
+# trust boundary already matches MQTT.
+CONFIG_ESP_HTTPS_OTA_ALLOW_HTTP=y
+69 -4
firmware/src/discovery.rs
···
     pub payload: String,
 }

-/// Build the discovery entries (button, switch, number, uptime sensor) for
-/// the given device. `mac_hex` is lowercase hex with no separators.
+/// Topic that carries the latest available firmware version. Same for every
+/// device on this firmware; the publisher pushes one retained message and
+/// every nightstand sees it. Per-device `installed_version` lives under
+/// `nightstand/<mac>/update/installed`.
+pub const SHARED_LATEST_VERSION_TOPIC: &str = "sound-machine/firmware/latest";
+
+/// Build the discovery entries (button, switch, number, uptime sensor, update)
+/// for the given device. `mac_hex` is lowercase hex with no separators.
 pub fn all(mac_hex: &str, sw_version: &str) -> Vec<DiscoveryEntry> {
     let device_id = format!("nightstand_{mac_hex}");
     let topic_prefix = format!("nightstand/{mac_hex}");
···
         switch(&device_id, &topic_prefix, &avail_topic, &state_topic),
         number(&device_id, &topic_prefix, &avail_topic, &state_topic),
         uptime(&device_id, &avail_topic, &state_topic),
+        update(&device_id, &topic_prefix, &avail_topic),
     ]
 }

···
     DiscoveryEntry { topic, payload }
 }

+/// HA `update` entity. The state_topic carries a single JSON payload with
+/// `installed_version`, `in_progress`, and (during a download) the
+/// `update_percentage` — that gives HA enough to render a progress bar
+/// while OTA runs. The latest_version comes from the shared topic so a
+/// single `make ota-publish` lights up the card on every nightstand at
+/// once. `cmd/update` receives `install` when the user clicks Install
+/// (`cmd/+` is already subscribed for play/volume).
+fn update(device_id: &str, topic_prefix: &str, avail: &str) -> DiscoveryEntry {
+    let topic = format!("homeassistant/update/{device_id}/firmware/config");
+    let payload = format!(
+        concat!(
+            r#"{{"name":"Firmware","unique_id":"{device_id}_update","#,
+            r#""state_topic":"{topic_prefix}/update/state","#,
+            r#""latest_version_topic":"{shared}","#,
+            r#""latest_version_template":"{{{{ value }}}}","#,
+            r#""command_topic":"{topic_prefix}/cmd/update","#,
+            r#""payload_install":"install","#,
+            r#""device_class":"firmware","entity_category":"config","#,
+            r#""device":{{"identifiers":["{device_id}"]}},"#,
+            r#""availability_topic":"{avail}"}}"#,
+        ),
+        device_id = device_id,
+        topic_prefix = topic_prefix,
+        shared = SHARED_LATEST_VERSION_TOPIC,
+        avail = avail,
+    );
+    DiscoveryEntry { topic, payload }
+}
+
 #[cfg(test)]
 mod tests {
     use super::*;

     #[test]
-    fn four_entries_with_correct_topics() {
+    fn five_entries_with_correct_topics() {
         let entries = all("aabbccddeeff", "0.2.0");
-        assert_eq!(entries.len(), 4);
+        assert_eq!(entries.len(), 5);
         let topics: Vec<&str> = entries.iter().map(|e| e.topic.as_str()).collect();
         assert!(topics.contains(&"homeassistant/sensor/nightstand_aabbccddeeff/button/config"));
         assert!(topics.contains(&"homeassistant/switch/nightstand_aabbccddeeff/white_noise/config"));
         assert!(topics.contains(&"homeassistant/number/nightstand_aabbccddeeff/volume/config"));
         assert!(topics.contains(&"homeassistant/sensor/nightstand_aabbccddeeff/uptime/config"));
+        assert!(topics.contains(&"homeassistant/update/nightstand_aabbccddeeff/firmware/config"));
+    }
+
+    #[test]
+    fn update_entry_uses_shared_latest_and_json_state() {
+        let entries = all("aabbccddeeff", "0.3.0");
+        let update = entries
+            .iter()
+            .find(|e| e.topic.contains("/update/"))
+            .expect("update entry");
+        assert!(
+            update.payload.contains(SHARED_LATEST_VERSION_TOPIC),
+            "{}",
+            update.payload
+        );
+        assert!(
+            update
+                .payload
+                .contains(r#""command_topic":"nightstand/aabbccddeeff/cmd/update""#),
+            "{}",
+            update.payload
+        );
+        assert!(
+            update
+                .payload
+                .contains(r#""state_topic":"nightstand/aabbccddeeff/update/state""#),
+            "{}",
+            update.payload
+        );
     }

     #[test]
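The quadruple braces in `latest_version_template` are easy to misread. Inside `format!`, `{{` and `}}` each produce one literal brace, so `{{{{ value }}}}` is what yields the Jinja expression `{{ value }}` that HA expects. A minimal sketch, using a hypothetical two-key excerpt (`update_payload_excerpt` is not a function in the firmware):

```rust
/// Build a trimmed-down update payload for one device. Only two keys are
/// shown; this is an illustration of the brace escaping, not the full payload.
fn update_payload_excerpt(topic_prefix: &str) -> String {
    format!(
        // `{{` / `}}` become literal `{` / `}`; `{topic_prefix}` is substituted.
        r#"{{"state_topic":"{topic_prefix}/update/state","latest_version_template":"{{{{ value }}}}"}}"#,
        topic_prefix = topic_prefix,
    )
}
```

Calling it with `"nightstand/aabbccddeeff"` produces a JSON object whose template value is the literal text `{{ value }}`.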
+11 -2
firmware/src/events.rs
···
 /// What the audio / network tasks tell the LED task to display.
 ///
 /// The LED task tracks the most recent `Audio(_)` and `Net(_)` separately and
-/// renders the combined color from a 2-axis lookup. `Error` overrides both;
-/// `PressFlash` is an overlay that brightens whatever is currently shown.
+/// renders the combined color from a 2-axis lookup. `Updating` and `Error`
+/// override both; `PressFlash` is an overlay that brightens whatever is
+/// currently shown.
 #[derive(Debug, Clone, Copy, PartialEq, Eq)]
 pub enum LedSignal {
     /// Audio task reporting current playback state.
···
     Net(NetStatus),
     /// Brief brighter flash on top of whatever's currently being shown.
     PressFlash,
+    /// OTA download in progress — overrides everything else with a magenta
+    /// pulse so it's visually obvious the device is mid-update.
+    Updating,
+    /// OTA download finished (either success or failure). Clears the
+    /// `Updating` override so the LED falls back to the audio/net axes.
+    /// On success the device reboots immediately, so this primarily exists
+    /// for the failure path (so the LED doesn't stay stuck magenta).
+    UpdateDone,
     /// Something's broken — slow red blink. Reserved for unrecoverable
     /// failures (I2S init, etc.); network outages are just `Net(Offline)`.
     Error,
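One way to read the override rules in these doc comments is as a fold over incoming signals. The sketch below is a simplified model, not the LED task itself: `Signal` and `Shown` are hypothetical stand-ins for the real types, the audio/net axes are collapsed into a single `Base` value, and it assumes `Error` takes precedence over `Updating` and is never cleared.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Signal {
    Updating,
    UpdateDone,
    Error,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Shown {
    Base,     // whatever the audio/net lookup would pick
    Updating, // magenta pulse
    Error,    // red blink
}

/// Fold a sequence of signals into the pattern the LED would end up showing.
fn shown_after(signals: &[Signal]) -> Shown {
    let mut updating = false;
    let mut error = false;
    for s in signals {
        match s {
            Signal::Updating => updating = true,
            Signal::UpdateDone => updating = false,
            Signal::Error => error = true, // assumed sticky in this model
        }
    }
    if error {
        Shown::Error
    } else if updating {
        Shown::Updating
    } else {
        Shown::Base
    }
}
```

The failure path is the interesting case: `[Updating, UpdateDone]` falls back to `Base`, which is what keeps the LED from staying magenta after a failed download.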
+16 -2
firmware/src/led.rs
···
     let mut tx = TxRmtDriver::new(rmt_channel, pin, &config)
         .map_err(|e| anyhow!("RMT init: {e}"))?;

-    // Two orthogonal pieces of state, plus the optional overriding Error and
-    // the transient PressFlash overlay.
+    // Two orthogonal pieces of state, plus optional overrides (Updating,
+    // Error) and the transient PressFlash overlay.
     let mut audio = AudioStatus::Idle;
     let mut net = NetStatus::Connecting;
+    let mut updating = false;
     let mut error = false;
     let mut flash_started: Option<Instant> = None;
     let start = Instant::now();
···
             Some(LedSignal::Audio(a)) => audio = a,
             Some(LedSignal::Net(n)) => net = n,
             Some(LedSignal::PressFlash) => flash_started = Some(Instant::now()),
+            Some(LedSignal::Updating) => updating = true,
+            Some(LedSignal::UpdateDone) => updating = false,
             Some(LedSignal::Error) => error = true,
             None => {
                 // Timeout — render the next frame.
                 let elapsed = Instant::now().duration_since(start);
                 let base = if error {
                     error_color(elapsed)
+                } else if updating {
+                    updating_color(elapsed)
                 } else {
                     base_color_for(net, audio, elapsed)
                 };
···
             AudioStatus::Playing => Rgb::new(60, 24, 0), // brighter amber
         },
     }
+}
+
+/// Magenta pulse during an OTA download — distinct from any normal state so
+/// it's obvious the device is mid-update and shouldn't be power-cycled.
+fn updating_color(t: Duration) -> Rgb {
+    let phase = (t.as_millis() as f32 / 400.0) * std::f32::consts::PI;
+    let pulse = (phase.sin() * 0.5 + 0.5) * 35.0 + 10.0;
+    let v = pulse as u8;
+    Rgb::new(v, 0, v)
 }

 /// ~2 Hz red blink for unrecoverable failures (I2S init, etc.).
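The pulse arithmetic in `updating_color` can be checked in isolation. The sketch below repeats the same formula on a plain millisecond count; `updating_level` is a hypothetical helper that returns only the shared red/blue channel value, so it needs no `Rgb` or `Duration`.

```rust
/// Same arithmetic as `updating_color`, returning only the level used for
/// both the red and blue channels.
fn updating_level(elapsed_ms: u32) -> u8 {
    let phase = (elapsed_ms as f32 / 400.0) * std::f32::consts::PI;
    // sin() is in [-1, 1], so the scaled pulse stays within [10, 45].
    let pulse = (phase.sin() * 0.5 + 0.5) * 35.0 + 10.0;
    pulse as u8
}
```

Because the phase advances by π every 400 ms, one full bright-dim cycle takes 800 ms, and the level never drops to zero, so the LED stays visibly lit throughout the download.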
+1
firmware/src/main.rs
···
 mod led;
 mod network;
 mod nvs;
+mod ota;
 mod secrets;
 mod state;

+270 -2
firmware/src/network.rs
··· 29 29 use crate::events::{ 30 30 AudioCommand, ButtonEvent, LedSignal, NetStatus, OutboundEvent, StateSnapshot, 31 31 }; 32 + use crate::ota; 32 33 use crate::secrets::CONFIG; 33 34 use crate::state::snap_to_preset_index; 34 35 use anyhow::{anyhow, Result}; ··· 42 43 BlockingWifi, ClientConfiguration, Configuration, EspWifi, WifiDeviceId, 43 44 }; 44 45 use log::{info, warn}; 46 + use std::sync::atomic::{AtomicBool, Ordering}; 47 + use std::sync::Arc; 45 48 use std::thread::{Builder, JoinHandle}; 46 49 use std::time::{Duration, Instant}; 47 50 ··· 88 91 Connected, 89 92 Disconnected, 90 93 Outbound(OutboundEvent), 94 + /// New `latest_version` seen on the shared topic. Cached so that an 95 + /// install request later can build the download URL from it. 96 + LatestVersion(VersionBuf), 97 + /// HA published `install` to `cmd/update`. 98 + OtaInstall, 99 + /// OTA worker reporting download progress (0..=100). 100 + OtaProgress(u8), 101 + /// OTA worker finished (true = success, false = failure). On success 102 + /// the worker also calls `esp_restart` so we may never observe this 103 + /// variant for the success case; on failure it lets us repaint the 104 + /// LED and clear the in-progress state. 105 + OtaFinished(bool), 106 + } 107 + 108 + /// Stack-allocated, `Copy`-friendly version string. FreeRTOS queues copy by 109 + /// value, so we can't pass `String`/`heapless::String` (both are non-Copy). 110 + /// 31 bytes covers any reasonable semver, including pre-release tags. 111 + #[derive(Debug, Clone, Copy)] 112 + struct VersionBuf { 113 + bytes: [u8; 31], 114 + len: u8, 115 + } 116 + 117 + impl VersionBuf { 118 + fn from_bytes(b: &[u8]) -> Option<Self> { 119 + if b.is_empty() || b.len() > 31 { 120 + return None; 121 + } 122 + // Reject anything that isn't valid UTF-8; saves the as_str caller a 123 + // failure mode it can't recover from. 
124 + std::str::from_utf8(b).ok()?; 125 + let mut bytes = [0u8; 31]; 126 + bytes[..b.len()].copy_from_slice(b); 127 + Some(Self { 128 + bytes, 129 + len: b.len() as u8, 130 + }) 131 + } 132 + 133 + fn as_str(&self) -> &str { 134 + // SAFETY: from_bytes verified UTF-8 at construction. 135 + unsafe { std::str::from_utf8_unchecked(&self.bytes[..self.len as usize]) } 136 + } 91 137 } 92 138 93 139 fn run( ··· 133 179 let cmd_filter = format!("{topic_prefix}/cmd/+"); 134 180 let cmd_play_topic = format!("{topic_prefix}/cmd/play"); 135 181 let cmd_volume_topic = format!("{topic_prefix}/cmd/volume"); 182 + let cmd_update_topic = format!("{topic_prefix}/cmd/update"); 183 + // HA's update entity reads this single JSON-state topic for installed 184 + // version, in_progress flag, and update_percentage. We keep the file- 185 + // path-style suffix `update/state` even though the discovery payload 186 + // calls it state_topic — clearer when subscribed via `mosquitto_sub`. 187 + let update_state_topic = format!("{topic_prefix}/update/state"); 136 188 let client_id = format!("nightstand_{mac_hex}"); 137 189 let hostname = format!("nightstand-{mac_hex}"); 138 190 ··· 175 227 // Connected/Disconnected → state-thread queue 176 228 // Received(/cmd/play) → audio_tx::Play/Stop 177 229 // Received(/cmd/volume) → audio_tx::SetVolumeIndex(snap_to_preset(pct)) 230 + // Received(/cmd/update) → state-thread queue (install request) 231 + // Received(shared latest)→ state-thread queue (cache new version) 178 232 let cb_msg_tx = msg_tx.clone(); 179 233 let cb_audio_tx = audio_tx.clone(); 180 234 let cb_cmd_play = cmd_play_topic.clone(); 181 235 let cb_cmd_volume = cmd_volume_topic.clone(); 236 + let cb_cmd_update = cmd_update_topic.clone(); 182 237 let mqtt_lwt_payload = b"offline"; 183 238 let mqtt_config = MqttClientConfiguration { 184 239 client_id: Some(&client_id), ··· 198 253 &cb_audio_tx, 199 254 &cb_cmd_play, 200 255 &cb_cmd_volume, 256 + &cb_cmd_update, 201 257 ); 202 258 }) 203 259 
.map_err(|e| anyhow!("EspMqttClient::new_cb: {e}"))?; 204 260 205 - // Drop our extra Sender so msg_rx will know if everyone hangs up. 206 - drop(msg_tx); 261 + // Keep one Sender alive so the OTA worker can post Progress/Finished 262 + // back to this loop without racing the MQTT callback's clone. We 263 + // intentionally don't drop the original msg_tx — there's no point 264 + // detecting a closed channel from this thread, since this thread is 265 + // the only consumer and the only loop body. 266 + let msg_tx_for_ota = msg_tx; 207 267 208 268 let mut last_snapshot: Option<StateSnapshot> = None; 209 269 let mut online = false; ··· 213 273 // sensor-style button entity in HA needs a stable resting state to show 214 274 // instead of the "Unknown" of the old event entity. 215 275 let mut button_idle_at: Option<Instant> = None; 276 + // Most recent `latest_version` seen on the shared topic. None until 277 + // we've received our first retained message. The OTA URL is built as 278 + // `<ota_url_base>/sound-machine-<latest>.bin` at install time. 279 + let mut latest_version: Option<VersionBuf> = None; 280 + // Cancel-rollback runs once on the first healthy MQTT connect of a 281 + // boot. Set after the call so re-Connecteds are no-ops. 282 + let mut have_marked_valid = false; 283 + // Guards against a second OTA being kicked off while one is already 284 + // running (e.g., HA Install double-click). Cleared on failure; on 285 + // success the device reboots and the flag goes away with it. 286 + let ota_in_progress = Arc::new(AtomicBool::new(false)); 287 + // OTA progress state surfaced into HA's update entity via JSON state. 288 + // None when no OTA is running. Updated in 5% steps from the OTA worker. 
289 + let mut ota_progress: Option<u8> = None; 216 290 217 291 info!("network task: entering main loop"); 218 292 ··· 246 320 &avail_topic, 247 321 &state_topic, 248 322 &button_topic, 323 + &update_state_topic, 249 324 &cmd_filter, 250 325 &mac_hex, 251 326 sw_version, 252 327 last_snapshot, 328 + ota_progress, 253 329 boot_at, 254 330 ); 331 + if !have_marked_valid { 332 + match ota::mark_app_valid() { 333 + Ok(()) => info!("OTA: marked running app as valid (rollback canceled)"), 334 + Err(e) => warn!("OTA: mark_app_valid failed: {e}"), 335 + } 336 + have_marked_valid = true; 337 + } 255 338 } 256 339 NetTaskMsg::Disconnected => { 257 340 info!("MQTT disconnected"); ··· 271 354 publish_state(&mut client, &state_topic, snap, boot_at); 272 355 } 273 356 } 357 + NetTaskMsg::LatestVersion(v) => { 358 + info!("OTA: latest_version is now {}", v.as_str()); 359 + latest_version = Some(v); 360 + } 361 + NetTaskMsg::OtaInstall => { 362 + if handle_ota_install(latest_version, &led_tx, &ota_in_progress, &msg_tx_for_ota) { 363 + ota_progress = Some(0); 364 + if online { 365 + publish_update_state( 366 + &mut client, 367 + &update_state_topic, 368 + sw_version, 369 + ota_progress, 370 + ); 371 + } 372 + } 373 + } 374 + NetTaskMsg::OtaProgress(pct) => { 375 + ota_progress = Some(pct); 376 + if online { 377 + publish_update_state( 378 + &mut client, 379 + &update_state_topic, 380 + sw_version, 381 + ota_progress, 382 + ); 383 + } 384 + } 385 + NetTaskMsg::OtaFinished(success) => { 386 + ota_progress = None; 387 + ota_in_progress.store(false, Ordering::SeqCst); 388 + let _ = led_tx.send(LedSignal::UpdateDone); 389 + if !success && online { 390 + // Republish a non-progress state JSON so HA stops 391 + // showing the progress bar. (On success the device 392 + // reboots before reaching this, so success path 393 + // mainly exists for symmetry.) 
394 + publish_update_state( 395 + &mut client, 396 + &update_state_topic, 397 + sw_version, 398 + ota_progress, 399 + ); 400 + } 401 + } 274 402 } 275 403 } 276 404 } ··· 303 431 audio_tx: &Sender<AudioCommand>, 304 432 cmd_play: &str, 305 433 cmd_volume: &str, 434 + cmd_update: &str, 306 435 ) { 307 436 match event.payload() { 308 437 EventPayload::Connected(_) => { ··· 339 468 let _ = audio_tx.try_send(AudioCommand::SetVolumeIndex(idx)); 340 469 } 341 470 } 471 + } else if topic == cmd_update { 472 + if data == b"install" { 473 + let _ = msg_tx.try_send(NetTaskMsg::OtaInstall); 474 + } 475 + } else if topic == discovery::SHARED_LATEST_VERSION_TOPIC { 476 + let trimmed = trim_ascii(data); 477 + if let Some(v) = VersionBuf::from_bytes(trimmed) { 478 + let _ = msg_tx.try_send(NetTaskMsg::LatestVersion(v)); 479 + } else { 480 + warn!( 481 + "shared latest_version: invalid payload (len={}, dropped)", 482 + data.len() 483 + ); 484 + } 342 485 } 343 486 } 344 487 EventPayload::Error(e) => { ··· 348 491 } 349 492 } 350 493 494 + /// Strip leading/trailing ASCII whitespace from a byte slice without 495 + /// allocating. (`bytes::trim_ascii` is unstable.) 
496 + fn trim_ascii(b: &[u8]) -> &[u8] { 497 + let mut start = 0; 498 + let mut end = b.len(); 499 + while start < end && b[start].is_ascii_whitespace() { 500 + start += 1; 501 + } 502 + while end > start && b[end - 1].is_ascii_whitespace() { 503 + end -= 1; 504 + } 505 + &b[start..end] 506 + } 507 + 351 508 fn connect_wifi_with_retry(wifi: &mut BlockingWifi<EspWifi<'static>>) { 352 509 loop { 353 510 match try_connect_wifi(wifi) { ··· 413 570 avail_topic: &str, 414 571 state_topic: &str, 415 572 button_topic: &str, 573 + update_state_topic: &str, 416 574 cmd_filter: &str, 417 575 mac_hex: &str, 418 576 sw_version: &str, 419 577 last_snapshot: Option<StateSnapshot>, 578 + ota_progress: Option<u8>, 420 579 boot_at: Instant, 421 580 ) { 422 581 if let Err(e) = client.publish(avail_topic, QoS::AtLeastOnce, true, b"online") { ··· 433 592 // v0.2.1 changed the button from `event` (no resting state) to 434 593 // `sensor` (idle/short/long/double). 435 594 format!("homeassistant/event/nightstand_{mac_hex}/button/config"), 595 + // v0.3.3 split the update entity to a JSON state_topic so we can 596 + // surface progress; the old plain-string `update/installed` 597 + // retains stale config after the discovery payload changed. 598 + format!("nightstand/{mac_hex}/update/installed"), 436 599 ]; 437 600 for topic in &retired { 438 601 if let Err(e) = client.publish(topic, QoS::AtLeastOnce, true, b"") { ··· 455 618 // (and HA on first discovery) see a stable resting state. 456 619 publish_button_idle(client, button_topic); 457 620 621 + // Publish our installed firmware version + any in-flight OTA progress 622 + // as a single retained JSON to the update entity's state_topic. HA 623 + // reads installed_version, in_progress, and update_percentage from it. 
624 + publish_update_state(client, update_state_topic, sw_version, ota_progress); 625 + 458 626 if let Some(snap) = last_snapshot { 459 627 publish_state(client, state_topic, snap, boot_at); 460 628 } ··· 462 630 if let Err(e) = client.subscribe(cmd_filter, QoS::AtLeastOnce) { 463 631 warn!("subscribe {cmd_filter} failed: {e}"); 464 632 } 633 + // Subscribe to the shared latest-version topic. Retained, so the broker 634 + // delivers the current value immediately (if any) and we cache it. 635 + if let Err(e) = client.subscribe(discovery::SHARED_LATEST_VERSION_TOPIC, QoS::AtLeastOnce) { 636 + warn!( 637 + "subscribe {} failed: {e}", 638 + discovery::SHARED_LATEST_VERSION_TOPIC 639 + ); 640 + } 465 641 } 642 + 643 + /// Publish the JSON `state_topic` for HA's update entity. `installed_version` 644 + /// is always present; `in_progress` and `update_percentage` are added when 645 + /// an OTA is mid-download. Retained so HA picks up the current state on 646 + /// discovery + restart without waiting for the next change. 647 + fn publish_update_state( 648 + client: &mut EspMqttClient<'_>, 649 + update_state_topic: &str, 650 + sw_version: &str, 651 + ota_progress: Option<u8>, 652 + ) { 653 + let payload = match ota_progress { 654 + Some(pct) => format!( 655 + r#"{{"installed_version":"{sw}","in_progress":true,"update_percentage":{pct}}}"#, 656 + sw = sw_version, 657 + pct = pct, 658 + ), 659 + None => format!( 660 + r#"{{"installed_version":"{sw}","in_progress":false}}"#, 661 + sw = sw_version 662 + ), 663 + }; 664 + if let Err(e) = client.publish(update_state_topic, QoS::AtLeastOnce, true, payload.as_bytes()) 665 + { 666 + warn!("publish update state failed: {e}"); 667 + } 668 + } 669 + 670 + /// Triggered when HA publishes "install" to `cmd/update`. Builds the 671 + /// firmware URL from the cached latest version and the configured 672 + /// `ota_url_base`, then spawns a worker thread that does the chunked 673 + /// download with progress callbacks. 
Returns `true` if the install was 674 + /// accepted (so the caller can update its local progress state). 675 + fn handle_ota_install( 676 + latest: Option<VersionBuf>, 677 + led_tx: &Sender<LedSignal>, 678 + in_progress: &Arc<AtomicBool>, 679 + msg_tx: &Sender<NetTaskMsg>, 680 + ) -> bool { 681 + let Some(version) = latest else { 682 + warn!("OTA install requested but no latest_version cached yet — ignoring"); 683 + return false; 684 + }; 685 + if CONFIG.ota_url_base.is_empty() { 686 + warn!("OTA install requested but ota_url_base is empty in cfg.toml — ignoring"); 687 + return false; 688 + } 689 + if in_progress 690 + .compare_exchange(false, true, Ordering::SeqCst, Ordering::SeqCst) 691 + .is_err() 692 + { 693 + warn!("OTA install requested but one is already in progress — ignoring"); 694 + return false; 695 + } 696 + 697 + let base = CONFIG.ota_url_base.trim_end_matches('/'); 698 + let url = format!("{base}/sound-machine-{}.bin", version.as_str()); 699 + info!("OTA install: kicking download thread for {url}"); 700 + let _ = led_tx.send(LedSignal::Updating); 701 + 702 + let progress_tx = msg_tx.clone(); 703 + let finished_tx = msg_tx.clone(); 704 + if let Err(e) = Builder::new() 705 + .name("ota".into()) 706 + .stack_size(OTA_THREAD_STACK) 707 + .spawn(move || { 708 + let result = ota::download_and_install(&url, |pct| { 709 + let _ = progress_tx.try_send(NetTaskMsg::OtaProgress(pct)); 710 + }); 711 + match result { 712 + Ok(()) => { 713 + // No need to send OtaFinished(true) — we're about to 714 + // reboot, the network state thread won't get a chance 715 + // to act on it. esp_restart returns `!`. 
716 + info!("OTA: rebooting into new firmware"); 717 + esp_idf_svc::hal::reset::restart(); 718 + } 719 + Err(e) => { 720 + warn!("OTA: download_and_install failed: {e}"); 721 + let _ = finished_tx.send(NetTaskMsg::OtaFinished(false)); 722 + } 723 + } 724 + }) 725 + { 726 + warn!("OTA: failed to spawn worker thread: {e}"); 727 + in_progress.store(false, Ordering::SeqCst); 728 + return false; 729 + } 730 + true 731 + } 732 + 733 + const OTA_THREAD_STACK: usize = 12 * 1024; 466 734 467 735 /// Publish a retained "idle" state to the button topic. Called on connect 468 736 /// and after the BUTTON_IDLE_AFTER_MS window following any gesture.
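The double-install guard in `handle_ota_install` relies on `compare_exchange` succeeding for exactly one caller. A minimal standalone sketch of that pattern follows; `try_begin_ota` and `end_ota` are hypothetical names introduced for the example, not functions in the firmware.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Try to claim the "OTA running" flag. Succeeds only if the flag was false,
/// and flips it to true in the same atomic step.
fn try_begin_ota(in_progress: &AtomicBool) -> bool {
    in_progress
        .compare_exchange(false, true, Ordering::SeqCst, Ordering::SeqCst)
        .is_ok()
}

/// Release the flag, as the failure path does after a failed download.
fn end_ota(in_progress: &AtomicBool) {
    in_progress.store(false, Ordering::SeqCst);
}
```

A second Install click while a download is running sees the flag already set and is rejected; after a failure releases the flag, a retry is accepted again.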
+126
firmware/src/ota.rs
···
+//! Over-the-air firmware updates via the chunked `esp_https_ota_*` API.
+//!
+//! `download_and_install` runs the begin/perform-loop/finish dance from a
+//! single Rust call and reports progress via a caller-supplied closure
+//! (typically posts an `OtaProgress(percent)` message to the network state
+//! thread so HA's update entity can render a progress bar). Throttled to
+//! ~5% steps so the broker doesn't see ~1200 messages per upgrade.
+//!
+//! Despite the name, `esp_https_ota` is fine over plain HTTP, but only when
+//! `CONFIG_ESP_HTTPS_OTA_ALLOW_HTTP=y` is set in sdkconfig — without that
+//! the underlying validator rejects http:// URLs at config time, before
+//! opening any socket. The trust boundary is the LAN, same as MQTT; signed
+//! images (ESP-IDF secure boot) are the answer for tamper resistance.
+//!
+//! After the new firmware boots, `mark_app_valid` cancels the bootloader's
+//! pending-rollback timer once the app proves it works (in `network.rs`,
+//! after the first MQTT `Connected` event).
+
+use anyhow::{anyhow, Result};
+use esp_idf_svc::sys::{
+    esp, esp_http_client_config_t, esp_https_ota_abort, esp_https_ota_begin,
+    esp_https_ota_config_t, esp_https_ota_finish, esp_https_ota_get_image_len_read,
+    esp_https_ota_get_image_size, esp_https_ota_handle_t, esp_https_ota_is_complete_data_received,
+    esp_https_ota_perform, esp_ota_mark_app_valid_cancel_rollback, ESP_ERR_HTTPS_OTA_IN_PROGRESS,
+    ESP_OK,
+};
+use log::info;
+use std::ffi::CString;
+use std::ptr;
+
+/// Step size (in percent) between successive progress callbacks. Smaller
+/// values mean smoother HA progress bars at the cost of more MQTT chatter.
+/// 5% → ~20 publishes per upgrade — plenty smooth, easy on the broker.
+const PROGRESS_STEP_PCT: u8 = 5;
+
+/// Download firmware from `url` into the inactive OTA partition, calling
+/// `progress(pct)` at ~5% intervals as bytes flow in. On success the new
+/// image is set as the boot partition; caller must reboot.
+///
+/// The callback is also invoked once with `0` immediately after the HTTP
+/// connection is established, and once with `100` right before returning,
+/// so HA always sees a complete 0→100 sweep.
+pub fn download_and_install(url: &str, mut progress: impl FnMut(u8)) -> Result<()> {
+    info!("OTA: downloading from {url}");
+
+    let url_c = CString::new(url).map_err(|_| anyhow!("OTA URL contains nul byte"))?;
+
+    let http_config = esp_http_client_config_t {
+        url: url_c.as_ptr(),
+        // Per-recv timeout, not a total deadline. Generous for slow/flaky
+        // 2.4 GHz links — beats failing partway through a 1+ MB download.
+        timeout_ms: 60_000,
+        keep_alive_enable: true,
+        ..Default::default()
+    };
+
+    let ota_config = esp_https_ota_config_t {
+        http_config: &http_config as *const _,
+        ..Default::default()
+    };
+
+    let mut handle: esp_https_ota_handle_t = ptr::null_mut();
+    // SAFETY: configs and url_c live for the duration of this function.
+    unsafe { esp!(esp_https_ota_begin(&ota_config as *const _, &mut handle)) }
+        .map_err(|e| anyhow!("esp_https_ota_begin: {e}"))?;
+
+    // Total size from Content-Length. Returns -1 for chunked encoding,
+    // in which case we just can't report a percentage. Treat that as 0
+    // for the math and the callback effectively becomes a heartbeat.
+    let total = unsafe { esp_https_ota_get_image_size(handle) };
+    info!("OTA: image size = {total} bytes");
+    progress(0);
+    let mut last_reported = 0u8;
+
+    let result = loop {
+        let r = unsafe { esp_https_ota_perform(handle) };
+        if r == ESP_ERR_HTTPS_OTA_IN_PROGRESS {
+            if total > 0 {
+                let read = unsafe { esp_https_ota_get_image_len_read(handle) };
+                let pct = ((read as i64 * 100) / total as i64).clamp(0, 99) as u8;
+                if pct >= last_reported.saturating_add(PROGRESS_STEP_PCT) {
+                    progress(pct);
+                    last_reported = pct;
+                }
+            }
+            continue;
+        }
+        // Anything else terminates the perform loop — success or failure.
+        break r;
+    };
+
+    if result != ESP_OK as i32 {
+        // SAFETY: handle is non-null past begin(); abort accepts it.
+        unsafe { esp_https_ota_abort(handle) };
+        // Keep url_c alive until after abort.
+        drop(url_c);
+        return Err(anyhow!("esp_https_ota_perform failed: 0x{:x}", result));
+    }
+
+    // The HTTP server can return 200 with a truncated body; the helper
+    // explicitly checks Content-Length matches what was actually written.
+    if !unsafe { esp_https_ota_is_complete_data_received(handle) } {
+        unsafe { esp_https_ota_abort(handle) };
+        drop(url_c);
+        return Err(anyhow!(
+            "OTA download incomplete: server sent fewer bytes than Content-Length"
+        ));
+    }
+
+    unsafe { esp!(esp_https_ota_finish(handle)) }
+        .map_err(|e| anyhow!("esp_https_ota_finish: {e}"))?;
+
+    drop(url_c);
+    progress(100);
+    info!("OTA: download complete; new firmware staged in inactive slot");
+    Ok(())
+}
+
+/// Confirm the running firmware is healthy and cancel the bootloader's
+/// pending rollback. No-op on partitions that aren't in pending-verify
+/// state (i.e., wired-flashed firmware), so safe to call on every boot's
+/// first successful MQTT connect.
+pub fn mark_app_valid() -> Result<()> {
+    unsafe { esp!(esp_ota_mark_app_valid_cancel_rollback()) }
+        .map_err(|e| anyhow!("esp_ota_mark_app_valid_cancel_rollback: {e}"))
+}
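The 5% throttling in the perform loop can be exercised without any hardware. The sketch below replays a download as a list of cumulative byte counts and collects what the progress callback would receive; `reported_percentages` is a hypothetical test helper, and it models only the percentage math, not the ESP-IDF calls.

```rust
const PROGRESS_STEP_PCT: u8 = 5;

/// Replay a download as cumulative byte counts and collect the percentages
/// that would be handed to the progress callback.
fn reported_percentages(total: i64, cumulative_reads: &[i64]) -> Vec<u8> {
    let mut reported = vec![0u8]; // initial callback after connecting
    let mut last_reported = 0u8;
    for &read in cumulative_reads {
        // An unknown size (total <= 0) reports nothing while data flows.
        if total > 0 {
            let pct = ((read * 100) / total).clamp(0, 99) as u8;
            if pct >= last_reported.saturating_add(PROGRESS_STEP_PCT) {
                reported.push(pct);
                last_reported = pct;
            }
        }
    }
    reported.push(100); // final callback once the image is staged
    reported
}
```

For a 1000-byte image delivered in 10-byte reads this yields 0, 5, 10, and so on up to 95, then 100: 21 callbacks in total, in line with the "~20 publishes per upgrade" estimate. The in-loop clamp to 99 is what reserves 100 for the final, post-verification callback.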
+5
firmware/src/secrets.rs
···
     pub wifi_password: &'static str,
     #[default("")]
     pub mqtt_url: &'static str,
+    /// Base URL the device fetches OTA images from. The full image URL is
+    /// `<ota_url_base>/sound-machine-<version>.bin`. Trailing slash is
+    /// stripped at use-time so either form works.
+    #[default("")]
+    pub ota_url_base: &'static str,
 }
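The URL rule in that doc comment is small enough to sketch directly. `image_url` below is a hypothetical helper that follows the `<ota_url_base>/sound-machine-<version>.bin` pattern, and the host name in the usage note is made up for illustration.

```rust
/// Build the firmware image URL from the configured base and a version
/// string, tolerating a trailing slash on the base.
fn image_url(ota_url_base: &str, version: &str) -> String {
    let base = ota_url_base.trim_end_matches('/');
    format!("{base}/sound-machine-{version}.bin")
}
```

With a base of `http://firmware.lan/ota` or `http://firmware.lan/ota/` (an assumed host), version `0.3.4` resolves to the same `.../ota/sound-machine-0.3.4.bin` URL either way.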
+128 -74
reference/mqtt-contract.md
··· 24 24 | Drive speaker via I2S | ✓ | | 25 25 | Track playing state | ✓ | | 26 26 | Announce entities via discovery | ✓ | | 27 + | Download new firmware over HTTP, write to flash | ✓ | | 28 + | Decide when to push a new version | | ✓ (driven by `make ota-publish`) | 27 29 28 30 ## Device identity 29 31 ··· 31 33 32 34 - At boot, the firmware reads the ESP32's STA MAC and logs it loudly so it's visible in the serial monitor before any WiFi attempt. 33 35 - Topic prefix: `nightstand/<mac_hex>/...` 34 - - Discovery `unique_id`s: `nightstand_<mac_hex>_button`, `_white_noise`, `_volume`, `_rssi`, `_uptime`. Stable across firmware upgrades. 36 + - Discovery `unique_id`s: `nightstand_<mac_hex>_button`, `_white_noise`, `_volume`, `_uptime`, `_update`. Stable across firmware upgrades. 35 37 - Discovery `device.name` defaults to `"Nightstand"`. The HA UI lets the user rename each device per-unit ("Bedroom Nightstand", "Guest Room Nightstand", etc.) without breaking the MQTT contract — `unique_id` is what HA uses to track entities, not `name`. 36 38 37 39 **Why:** one firmware binary works on every unit, no per-unit table to maintain, no reflash dance after first boot. The user names devices in the place that already understands renaming (HA) instead of in firmware source. 
··· 46 48 ``` 47 49 homeassistant/<type>/nightstand_<mac_hex>/<object>/config ← discovery (retain=true) 48 50 nightstand/<mac_hex>/available ← LWT + online announce (retain=true) 49 - nightstand/<mac_hex>/button ← event stream (retain=false) 50 - nightstand/<mac_hex>/state ← JSON state snapshot (retain=true) 51 + nightstand/<mac_hex>/button ← gesture sensor JSON (retain=true) 52 + nightstand/<mac_hex>/state ← audio state snapshot JSON (retain=true) 53 + nightstand/<mac_hex>/update/state ← firmware update state JSON (retain=true) 51 54 nightstand/<mac_hex>/cmd/play ← inbound: "ON" / "OFF" 52 55 nightstand/<mac_hex>/cmd/volume ← inbound: integer 0-100 56 + nightstand/<mac_hex>/cmd/update ← inbound: "install" 57 + sound-machine/firmware/latest ← shared latest_version (retain=true) 53 58 ``` 54 59 55 60 `<mac_hex>` is the lowercase 12-char STA MAC with no separators (e.g. `aabbccddeeff`). 56 61 57 - Keeping all state in a single `state` JSON topic (rather than one topic per field) simplifies the device's publish logic and HA's `value_template` wiring. 62 + Per-device state topics are split by *concern* — `state` for audio playback, `update/state` for firmware progress, `button` for the most recent gesture — because HA's update entity wants its progress fields in their own topic and mixing them would force every audio publish to also re-emit firmware fields. 63 + 64 + The shared `sound-machine/firmware/latest` topic carries the announced latest version once, retained, for every device on this firmware. One `make ota-publish` lights up the update card on every nightstand at the same time without per-device fanout. 58 65 59 66 ## Entities exposed 60 67 61 - ### 1. Button — `event` type 68 + ### 1. Button — `sensor` type 62 69 63 - Distinguishes short press, double press, and long press (≥ 2s hold). HA automations trigger on event type. 70 + Carries the most-recent gesture as a sensor state (idle/short/long/double). 
The device publishes the gesture on press, then publishes a retained `idle` ~800 ms later so the entity has a stable resting value — HA's automations trigger on the state transition (e.g. `to: short`) rather than on event types.

-Discovery topic: `homeassistant/event/nightstand_<mac_hex>/button/config`
+Discovery topic: `homeassistant/sensor/nightstand_<mac_hex>/button/config`
 ```json
 {
   "name": "Button",
   "unique_id": "nightstand_<mac_hex>_button",
   "state_topic": "nightstand/<mac_hex>/button",
-  "event_types": ["short", "double", "long"],
   "value_template": "{{ value_json.event_type }}",
+  "icon": "mdi:gesture-tap-button",
   "device": {
     "identifiers": ["nightstand_<mac_hex>"],
     "name": "Nightstand",
     "manufacturer": "guid.foo",
-    "model": "Sound Machine v1",
-    "sw_version": "0.2.0"
+    "model": "Sound Machine",
+    "sw_version": "0.3.4"
   },
   "availability_topic": "nightstand/<mac_hex>/available"
 }
 ```

-Event payload (published to `nightstand/<mac_hex>/button`, not retained):
+Payload (retained):
 ```json
 {"event_type": "short"}
 ```
-or `double`, or `long`.
+…where the value is one of `idle`, `short`, `long`, `double`. After ~800 ms the device publishes `{"event_type":"idle"}` so the entity returns to a stable resting state instead of stuck on the gesture.
+
+**Why not `event`-type?** Earlier firmware (≤ 0.2.0) used HA's `event` entity, which is event-as-fact-without-resting-state. HA renders that as "Unknown" any time you look at the device card outside the brief moment of a press. The `sensor` + idle-after-N-ms pattern gives the same automation triggers (`to: short`) plus a sane idle reading.

 ### 2. White noise — `switch`

···
 }
 ```

-### 4. Diagnostics — `sensor` ×2
+### 4. Uptime diagnostic — `sensor`

-Helpful for debugging; marked as diagnostic so they hide in the default device view.
+Helpful for debugging power blips and reconnection. Marked as diagnostic so it hides in the default device view.

-`homeassistant/sensor/nightstand_<mac_hex>/rssi/config`:
+`homeassistant/sensor/nightstand_<mac_hex>/uptime/config`:
 ```json
 {
-  "name": "WiFi Signal",
-  "unique_id": "nightstand_<mac_hex>_rssi",
+  "name": "Uptime",
+  "unique_id": "nightstand_<mac_hex>_uptime",
   "state_topic": "nightstand/<mac_hex>/state",
-  "value_template": "{{ value_json.rssi }}",
-  "unit_of_measurement": "dBm",
-  "device_class": "signal_strength",
+  "value_template": "{{ value_json.uptime_s }}",
+  "unit_of_measurement": "s",
+  "device_class": "duration",
   "entity_category": "diagnostic",
   "device": {"identifiers": ["nightstand_<mac_hex>"]},
   "availability_topic": "nightstand/<mac_hex>/available"
 }
 ```

-`homeassistant/sensor/nightstand_<mac_hex>/uptime/config`:
+(Earlier firmware also exposed an RSSI sensor; it was dropped in v0.2.0 because the value was rarely meaningful — WiFi signal at the nightstand is consistent.)
+
+### 5. Firmware update — `update`
+
+Drives HA's standard update card: shows installed-vs-latest version, an Install button, and a progress bar during a download. State is split between a per-device JSON state topic and the shared latest-version topic:
+
+- `state_topic`: `nightstand/<mac_hex>/update/state` — JSON, retained, written by the device on connect and during an OTA. Carries `installed_version` always; `in_progress` and `update_percentage` while a download is underway.
+- `latest_version_topic`: `sound-machine/firmware/latest` — plain string, retained, written by `make ota-publish`. Shared across every device running this firmware.
+- `command_topic`: `nightstand/<mac_hex>/cmd/update` — receives the literal `install`.
+
+Discovery topic: `homeassistant/update/nightstand_<mac_hex>/firmware/config`
 ```json
 {
-  "name": "Uptime",
-  "unique_id": "nightstand_<mac_hex>_uptime",
-  "state_topic": "nightstand/<mac_hex>/state",
-  "value_template": "{{ value_json.uptime_s }}",
-  "unit_of_measurement": "s",
-  "device_class": "duration",
-  "entity_category": "diagnostic",
+  "name": "Firmware",
+  "unique_id": "nightstand_<mac_hex>_update",
+  "state_topic": "nightstand/<mac_hex>/update/state",
+  "latest_version_topic": "sound-machine/firmware/latest",
+  "latest_version_template": "{{ value }}",
+  "command_topic": "nightstand/<mac_hex>/cmd/update",
+  "payload_install": "install",
+  "device_class": "firmware",
+  "entity_category": "config",
   "device": {"identifiers": ["nightstand_<mac_hex>"]},
   "availability_topic": "nightstand/<mac_hex>/available"
 }
 ```

-## State payload
+State payload (idle):
+```json
+{"installed_version":"0.3.4","in_progress":false}
+```
+
+State payload during a download:
+```json
+{"installed_version":"0.3.4","in_progress":true,"update_percentage":35}
+```
+
+The device updates `update_percentage` in 5% steps (~20 publishes per upgrade) — smooth enough for HA's progress bar, light enough that the broker isn't drinking from a hose.
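The 5% stepping described above can be sketched as a small throttle that reports a percentage only when a new step boundary is crossed. This is a minimal host-side sketch under stated assumptions, not the firmware's actual code: `ProgressThrottle`, `on_bytes`, and `state_payload` are hypothetical names, and the JSON is formatted by hand to mirror the two state payloads shown above.

```rust
/// Hypothetical helper (not from the firmware source): reports download
/// progress only when a new step boundary is crossed.
pub struct ProgressThrottle {
    step: u8,
    last_published: Option<u8>,
}

impl ProgressThrottle {
    /// `step` is the publish granularity in percent (5 in the text above).
    pub fn new(step: u8) -> Self {
        assert!((1..=100).contains(&step), "step must be within 1..=100");
        Self { step, last_published: None }
    }

    /// Call with the running byte count after each chunk is written.
    /// Returns `Some(percentage)` when that value should be published.
    pub fn on_bytes(&mut self, written: u64, total: u64) -> Option<u8> {
        if total == 0 {
            return None; // unknown length: nothing sensible to report
        }
        // u128 arithmetic keeps the multiplication from overflowing.
        let pct = ((written.min(total) as u128 * 100) / total as u128) as u8;
        // Round down to the nearest multiple of `step`.
        let bucket = pct - pct % self.step;
        match self.last_published {
            Some(prev) if bucket <= prev => None,
            _ => {
                self.last_published = Some(bucket);
                Some(bucket)
            }
        }
    }
}

/// Builds the `update/state` JSON in the two shapes shown above.
/// `pct = None` stands for the idle payload.
pub fn state_payload(installed: &str, pct: Option<u8>) -> String {
    match pct {
        Some(p) => format!(
            r#"{{"installed_version":"{}","in_progress":true,"update_percentage":{}}}"#,
            installed, p
        ),
        None => format!(
            r#"{{"installed_version":"{}","in_progress":false}}"#,
            installed
        ),
    }
}
```

Feeding every chunk's running byte count through `on_bytes` with a step of 5 yields at most 21 publishes for a full download (0, 5, and so on up to 100), in line with the roughly 20 publishes mentioned above.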

+## State payload (audio)

-Published to `nightstand/<mac_hex>/state` (retained) on every state change:
+Published to `nightstand/<mac_hex>/state` (retained) on every audio-state change:

 ```json
 {
   "playing": "ON",
   "volume": 65,
-  "rssi": -58,
   "uptime_s": 12847
 }
 ```

-Single JSON payload keeps discovery templates simple and lets HA parse any field out with `value_template`.
+Single JSON payload keeps discovery templates simple and lets HA parse any field with `value_template`. Firmware state lives in the separate `update/state` topic so an OTA progress publish doesn't churn the audio entities.

 ## Availability (LWT)

···

 1. WiFi up → MQTT connect (with LWT registered)
 2. Publish retained `online` to `nightstand/<mac_hex>/available`
-3. Publish retained discovery configs for every entity (cheap — broker dedupes retained messages)
-4. Publish retained initial state snapshot to `nightstand/<mac_hex>/state`
-5. Subscribe to `nightstand/<mac_hex>/cmd/+`
-6. Enter main loop
+3. Publish retained empty payloads to any retired discovery topics (clears stale HA entities from earlier firmware versions)
+4. Publish retained discovery configs for every current entity (cheap — broker dedupes retained messages)
+5. Publish retained `idle` to `nightstand/<mac_hex>/button` so the gesture sensor has a stable resting value
+6. Publish retained `update/state` JSON with the running `installed_version`
+7. Publish retained audio state snapshot to `nightstand/<mac_hex>/state` (if cached)
+8. Subscribe to `nightstand/<mac_hex>/cmd/+` and to `sound-machine/firmware/latest`
+9. Call `esp_ota_mark_app_valid_cancel_rollback` — confirms the running app is healthy and stops the bootloader's pending-rollback timer (no-op for wired flashes; meaningful only after an OTA reboot)
+10. Enter main loop

-Republishing discovery every boot is fine — it's idempotent and makes entity config portable even after HA restores from backup or the broker loses retained state.
+Republishing discovery every boot is fine — it's idempotent and makes entity config portable even after HA restores from backup or the broker loses retained state. Republishing empty payloads to retired topics keeps HA from carrying stale entities forward across firmware versions.

 ## Button behavior

···

 Total latency: tens of milliseconds on LAN. Feels instant.

-## v1.5 planned extension: OTA via HA `update` entity
+## OTA workflow

-Not in v1, but designed-in so we don't paint ourselves into a corner. Added when the units are enclosed and physical USB reflash becomes tedious.
+Firmware is delivered over plain HTTP from a static file server on the LAN. The trust boundary is already the LAN (MQTT is also plain), so TLS would only protect transit, not authenticity — secure boot + signed images is the answer for tamper resistance and isn't in scope yet.

-### Mechanism
+### Roles

-- Chris builds firmware locally, copies `.bin` to HA's `/config/www/firmware/`, publishes a `latest_version` announcement to MQTT.
-- HA's MQTT `update` entity compares `installed_version` vs `latest_version` and shows an "Install" button on the device card.
-- User clicks Install → HA publishes to `nightstand/<mac_hex>/cmd/update` → firmware downloads from `http://homeassistant.local:8123/local/firmware/sound-machine-<ver>.bin` → `esp_ota_*` writes to the inactive partition → reboot into new firmware → device reports new `installed_version`.
-- ESP-IDF's OTA handles two-partition rollback automatically — a bootloop reverts to the previous good firmware.
+- **Publisher** (Chris's dev machine): builds the binary, copies it to the static host, announces the new version on MQTT.
+- **Static HTTP host**: serves `<ota_url_base>/sound-machine-<version>.bin`. Plain HTTP, LAN-only.
+- **HA**: renders the update card from the MQTT entity, sends the install command on user click, watches the progress bar.
+- **Device**: subscribes to the shared latest topic and to its own `cmd/update`; on `install`, downloads + flashes + reboots; reports installed version + progress on `update/state`.

-### Additional entity
+### Flow

-`homeassistant/update/nightstand_<mac_hex>/firmware/config`:
-```json
-{
-  "name": "Firmware",
-  "unique_id": "nightstand_<mac_hex>_firmware",
-  "state_topic": "nightstand/<mac_hex>/update",
-  "command_topic": "nightstand/<mac_hex>/cmd/update",
-  "payload_install": "INSTALL",
-  "latest_version_topic": "nightstand/<mac_hex>/update",
-  "latest_version_template": "{{ value_json.latest_version }}",
-  "value_template": "{{ value_json.installed_version }}",
-  "release_url": "",
-  "device": {"identifiers": ["nightstand_<mac_hex>"]},
-  "availability_topic": "nightstand/<mac_hex>/available"
-}
 ```
-
-Update state topic payload (retained, written by device on boot and after install):
-```json
-{
-  "installed_version": "0.2.1",
-  "latest_version": "0.3.0"
-}
+publisher static host broker HA device
+│ make ota-publish: │ │ │ │
+│ espflash save-image │ │ │ │
+│ cp .../sound-machine-X.bin│ │ │ │
+│ ─────────────────────────► (file) │ │ │
+│ mosquitto_pub -L .../sound-machine/firmware/latest -m X │ │
+│ ──────────────────────────────────────────► retained ─────►│ │
+│ │ │ │
+│ │ │ ──cmp installed│
+│ │ │ vs latest───►│
+│ │ │ │
+│ (user clicks │ │ │
+│ Install in HA) │ │ │
+│ │ │ cmd/update │
+│ │ │ "install" │
+│ │ │ ─────────────► │
+│ │ │ │ esp_https_ota_begin
+│ │ │ │ → GET <url>
+│ ────HTTP 200────────────────────────────────── │
+│ │ │ │ chunks → ota_1
+│ │ │ update/state │ (every 5%)
+│ │ │ in_progress=true,update_percentage=N
+│ │ │ ◄─────────────── │
+│ │ │ │ esp_https_ota_finish
+│ │ │ │ esp_restart()
+│ │ │ │
+│ │ │ │ (reboot from ota_1)
+│ │ │ update/state │
+│ │ │ installed=X,in_progress=false
+│ │ │ ◄─────────────── │
+│ │ │ │ esp_ota_mark_app_valid_
+│ │ │ │ cancel_rollback()
 ```

-Chris's release workflow (`make ota` or similar) publishes the `latest_version` field with retain=true; devices pick it up on next connect.
+### Boot validation and rollback

-### Why one binary works for both units
+Each OTA leaves the new firmware in **pending-verify** state. The device must explicitly call `esp_ota_mark_app_valid_cancel_rollback` once it confirms the new firmware works — the firmware does this on the first successful MQTT `Connected` event after boot. If the new firmware crashes before that point, or never connects to MQTT, the bootloader rolls back to the previous slot on next reset and the device comes up on the old version. Belt-and-braces against bricked devices.
+
+For wired-flashed firmware (`make flash`), the partition isn't in pending-verify state; `mark_app_valid` is a documented no-op.
+
+### Why one binary works for every unit
+
+MAC-derived identity (see Device Identity section) means the same `sound-machine-<version>.bin` runs correctly on both nightstands without per-unit builds. The shared `sound-machine/firmware/latest` topic means one publish notifies every device — no `nightstand/+/update` fanout required.
+
+### Compile-time vs. runtime config

-MAC-derived identity (see Device Identity section) means the same `sound-machine-v0.3.0.bin` runs correctly on both nightstands without per-unit builds. `mosquitto_pub -t nightstand/+/update ...` notifies both units of the new version with one command.
+`ota_url_base` lives in `firmware/cfg.toml` next to the WiFi and MQTT config — compile-time. Changing the firmware host is currently a wired-flash event, the same as changing WiFi credentials. (Putting the URL in the `latest_version` payload would make it pure runtime config; that's a future cleanup if hosts change often, which they don't.)

-## What we're deliberately NOT including (v1)
+## What we're deliberately NOT including

-- **Noise type selection** (pink, brown, rain, etc.) — shipping with a single hand-tuned noise generator that Chris will iterate on to match what he and his wife actually want. Parameters live in source, not in MQTT; tuning = reflash, not a runtime knob.
-- **RGB LED control from HA** — the onboard SK6812 will be used by firmware for local status (idle / playing / WiFi down). No HA entity for it yet.
+- **Noise type selection** (pink, brown, rain, etc.) — shipping with a single hand-tuned noise generator that Chris will iterate on to match what he and his wife actually want. Parameters live in source, not in MQTT; tuning = OTA, not a runtime knob.
+- **RGB LED control from HA** — the onboard SK6812 is used by firmware for local status (audio × net axes, OTA progress, error). No HA entity for it.
 - **Media player entity** — too much complexity for what is basically a toggle. Can revisit if we want HA TTS announcements on the device.
 - **Triple press patterns** — too much to remember. Single/double/long is the max.
-- **OTA updates** — designed in (see v1.5 section) but not built for v1. Bring-up with USB flashing; add OTA once enclosed.
+- **TLS / signed firmware** — LAN-only deployment; TLS without code signing only protects transit. Secure boot + signed images is the right answer when the threat model warrants it.

 ## Sources

+58 -19
reference/operating-modes.md
···
 Entered on power-on or reset. Responsibilities:
 1. Initialize I2S, GPIO, NVS, RGB LED
 2. Read persistent state from NVS: `volume_index`, `volume_direction`, `was_playing`
-3. Look up this chip's MAC in `KNOWN_DEVICES` table → logical identity
+3. Read the STA MAC; the lowercase 12-char hex is the device's identity for MQTT topics and discovery `unique_id`s (see [`mqtt-contract.md`](./mqtt-contract.md))
 4. **If `was_playing == true`**: start white noise generator immediately at saved volume (power-blip recovery — don't wake the user with silence)
-5. Attempt WiFi connect with 30s timeout against stored credentials
-6. If WiFi connects, attempt MQTT connect with 10s timeout
+5. Attempt WiFi connect against compile-time stored credentials, with 60s retry on failure
+6. If WiFi connects, attempt MQTT connect (the C MQTT client manages its own reconnect)
 7. Transition to ONLINE or OFFLINE based on outcome
+8. On the first successful MQTT `Connected` event, call `esp_ota_mark_app_valid_cancel_rollback` to confirm the running firmware (see "OTA + rollback" below)

 BOOT should complete to some steady mode within ~45 seconds worst case.

···

 All colors are at **dim brightness** (~5-10% of full) unless noted.

-| State | Color | Pattern |
+The LED state machine composes a base color from two orthogonal axes — audio playback state and network state — and applies overrides for OTA and unrecoverable errors on top. `Updating` and `Error` win over the base; `PressFlash` is a transient brightening overlay that decays over ~150 ms.
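The composition rule in the paragraph above can be sketched as a pure function over the two axes plus an optional override. This is an illustrative sketch only: the type and function names (`Net`, `Override`, `Color`, `Pattern`, `compose`) are assumptions rather than the firmware's own, the colors and patterns are taken from the tables in this diff, and the transient `PressFlash` overlay is left out. The text does not say which of `Updating` and `Error` takes priority when both apply, so the sketch models a single optional override.

```rust
/// Network axis of the LED state (hypothetical names).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Net {
    Connecting,
    Online,
    Offline,
}

/// Overrides that win over the audio x net base color.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Override {
    Updating,
    Error,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Color {
    Cyan,
    Green,
    Amber,
    Magenta,
    Red,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Pattern {
    SolidVeryDim,
    SolidMediumDim,
    SlowPulse,
    SlowBlink,
}

/// Resolves what the LED should show. An active override replaces the
/// base entirely; otherwise the base comes from the two axes.
pub fn compose(playing: bool, net: Net, overlay: Option<Override>) -> (Color, Pattern) {
    if let Some(o) = overlay {
        return match o {
            Override::Updating => (Color::Magenta, Pattern::SlowPulse),
            Override::Error => (Color::Red, Pattern::SlowBlink),
        };
    }
    // Playing only changes brightness, never the color.
    let solid = if playing {
        Pattern::SolidMediumDim
    } else {
        Pattern::SolidVeryDim
    };
    match net {
        // Connecting looks the same regardless of audio state.
        Net::Connecting => (Color::Cyan, Pattern::SlowPulse),
        Net::Online => (Color::Green, solid),
        Net::Offline => (Color::Amber, solid),
    }
}
```

Keeping the override as an `Option` means that clearing it after a failed install is just passing `None` again, at which point the LED falls back to whatever the audio and network axes currently dictate.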
+
+| Audio × Net | Color | Pattern |
+| --- | --- | --- |
+| Connecting (any audio) | Cyan | Slow pulse (~1 Hz) |
+| Online, idle | Green | Solid, very dim |
+| Online, playing | Green | Solid, medium-dim |
+| Offline, idle | Amber | Solid, very dim |
+| Offline, playing | Amber | Solid, medium-dim |
+
+| Override | Color | Pattern |
 | --- | --- | --- |
-| BOOT (connecting WiFi) | Blue | Slow pulse (1 Hz) |
-| BOOT (connecting MQTT) | Cyan | Slow pulse (1 Hz) |
-| ONLINE, idle | Green | Solid, very dim |
-| ONLINE, playing | Green | Solid, medium-dim |
-| OFFLINE, idle | Amber | Solid, very dim |
-| OFFLINE, playing | Amber | Solid, medium-dim |
-| Error (I2S failed, OTA failed, etc.) | Red | Slow blink |
-| OTA in progress (v1.5) | Magenta | Slow pulse |
-| Button press ack (transient) | Flash brighter for ~100ms, then return to status color | — |
+| OTA download in progress | Magenta | Slow pulse (~1.25 Hz) |
+| Error (I2S init failed, etc.) | Red | Slow blink (~2 Hz) |
+| Button press ack | Brighten the current color ~50 % | Decays over 150 ms |

 The button-press flash is a nice tactile confirmation — press, see a brief brighter pulse, know it registered even in the dark.
+
+The OTA-failure path explicitly clears the magenta override (via an internal `UpdateDone` signal from the OTA worker) so a failed install drops the LED back to the audio×net base color instead of leaving it stuck pulsing magenta forever.

 ## Persistent state (NVS)

···

 No exponential backoff — device is wall-powered, we don't care about battery life, and 60s is a reasonable balance between "react to the network coming back" and "not spam the broker during multi-hour outages."

+## OTA + rollback
+
+The firmware ships with a two-OTA partition layout (`ota_0` and `ota_1`, each 1.875 MB) plus an `otadata` partition that records which slot is active. New firmware is written to the *inactive* slot via `esp_https_ota`; on success, otadata is flipped and the device reboots into the new slot.
+
+### Partition layout (4 MB ESP32-PICO-D4)
+
+| Region | Offset | Size | Purpose |
+| --- | --- | --- | --- |
+| bootloader | `0x01000` | 28 KB | ESP-IDF stage-2 loader |
+| partition table | `0x08000` | 4 KB | This file's binary form |
+| nvs | `0x09000` | 24 KB | Volume, was_playing |
+| otadata | `0x0F000` | 8 KB | Active-slot pointer |
+| phy_init | `0x11000` | 4 KB | RF calibration (regenerated if missing) |
+| ota_0 | `0x20000` | 1.875 MB | App slot A |
+| ota_1 | `0x200000` | 1.875 MB | App slot B |
+
+NVS sits at the same offset as the single-slot v0.1.0/v0.2.0 layout, so the partition swap preserves persisted audio state. The 56 KB gap between phy_init and ota_0 is the cost of the 64 KB alignment requirement on app partitions.
+
+### Pending-verify and `mark_app_valid`
+
+After an OTA reboot, the new firmware boots in **pending-verify** state. The bootloader expects the running app to call `esp_ota_mark_app_valid_cancel_rollback` once it's confident things work; if a reset happens before that call, the bootloader reverts to the previous slot on the next boot. The firmware calls this on the first MQTT `Connected` event — proving WiFi and the broker both work, which is the device's primary job. Wired-flashed firmware isn't in pending-verify state, so the call is a no-op (documented behavior).
+
+If MQTT never connects after an OTA, the device will roll back on the next reset and come up on the previous version. HA notices the `installed_version` in `update/state` reverted; the update card flips back to "Update available."
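The "confirm on the first MQTT `Connected` event" rule above amounts to a one-shot gate around the mark-valid call. The sketch below is a host-side illustration under stated assumptions: `RollbackGuard` and `on_mqtt_connected` are hypothetical names, and the `mark_valid` closure stands in for whatever wrapper the firmware puts around `esp_ota_mark_app_valid_cancel_rollback`. Retrying after a failed call is a choice made for this sketch, not something the text specifies.

```rust
/// One-shot gate: confirms the running firmware on the first successful
/// MQTT connection and does nothing on later reconnects.
/// Hypothetical type, not taken from the firmware source.
#[derive(Default)]
pub struct RollbackGuard {
    confirmed: bool,
}

impl RollbackGuard {
    pub fn new() -> Self {
        Self::default()
    }

    /// Call from the MQTT `Connected` handler.
    ///
    /// `mark_valid` is a stand-in for the ESP-IDF call named in the text;
    /// on a device it would wrap that function and map its error code.
    /// Returns `Ok(true)` if the app was confirmed by this call and
    /// `Ok(false)` if it had already been confirmed.
    pub fn on_mqtt_connected<F, E>(&mut self, mark_valid: F) -> Result<bool, E>
    where
        F: FnOnce() -> Result<(), E>,
    {
        if self.confirmed {
            return Ok(false);
        }
        // Assumption of this sketch: on failure the guard stays
        // unconfirmed, so the next Connected event tries again.
        mark_valid()?;
        self.confirmed = true;
        Ok(true)
    }

    pub fn is_confirmed(&self) -> bool {
        self.confirmed
    }
}
```

Because the guard only flips after a successful call, a firmware image that never reaches the broker never confirms itself, which is the condition under which the bootloader falls back to the previous slot.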
+
+The trade-off is that any post-OTA reset before MQTT comes up looks like a rollback. In practice that means: don't power-cycle a device for 30 s after clicking Install. Watching the LED flip from magenta → cyan → green is the proxy for "OTA succeeded."
+
+### One-time wired migration
+
+The two-OTA layout is *not* the default ESP-IDF partition table. Devices going from v0.1.0 / v0.2.0 → v0.3.x must be wire-flashed once to write the new partition table; from v0.3.0 onward, every bump is OTA. The Makefile's `flash` target writes the new bootloader, partition table, and otadata in addition to the app, so the migration is a single `make flash`.
+
 ## Error handling

 | Error | Behavior |
···
 | NVS write fails | Log, keep running. State won't persist across reboot but that's a graceful degradation. |
 | WiFi password wrong | Stay OFFLINE forever until updated. No good recovery. |
 | MQTT broker unreachable | Stay OFFLINE, retry per strategy above. |
-| OTA download fails (v1.5) | Keep running current firmware. Log. Report failure via MQTT. |
-| OTA bootloop (v1.5) | ESP-IDF's two-partition system auto-reverts to previous firmware. User sees the device come back on old version; MQTT state reflects it. |
+| OTA download fails | Keep running current firmware. Log. Republish `update/state` with `in_progress: false` so HA's progress bar disappears. LED reverts from magenta to the audio×net base color. |
+| OTA boot fails / app crashes before mark_valid | ESP-IDF's two-partition rollback auto-reverts to the previous slot. Device comes back on the old version; HA sees `installed_version` revert and lights up the "Update available" card again. |

 ## What's not in this doc

-- **WiFi provisioning mechanism** — for v1, credentials are hardcoded-per-flash (via `cfg.toml` or similar). SoftAP/BLE/Improv provisioning is a v2 concern if we want it.
-- **Audio generation parameters** — the actual noise generator's filter shape, amplitude, etc. live in source and are tuned over time. Chris will iterate on these with his wife's input once hardware is assembled.
-- **OTA implementation** — designed in MQTT contract for v1.5; firmware implementation TBD.
+- **WiFi provisioning mechanism** — credentials are baked into the binary at compile time via `cfg.toml`. SoftAP/BLE/Improv provisioning is a possible future addition.
+- **Audio generation parameters** — the actual noise generator's filter shape, amplitude, etc. live in source and are tuned over time.
+- **Secure boot / signed firmware** — we don't sign images. Threat model is LAN-only, same as MQTT being plain.

 ## Sources

···
 - [Signal chain](./signal-chain.md) — hardware audio path
 - [Atom Echo pinmap](./atom-echo/pinmap.md) — GPIO usage
 - [ESP-IDF NVS documentation][esp-idf-nvs]
-- [ESP-IDF OTA documentation][esp-idf-ota] — v1.5 reference
+- [ESP-IDF OTA documentation][esp-idf-ota]
+- [ESP-IDF App rollback (mark_app_valid)][esp-idf-rollback]

 [esp-idf-nvs]: https://docs.espressif.com/projects/esp-idf/en/stable/esp32/api-reference/storage/nvs_flash.html
 [esp-idf-ota]: https://docs.espressif.com/projects/esp-idf/en/stable/esp32/api-reference/system/ota.html
+[esp-idf-rollback]: https://docs.espressif.com/projects/esp-idf/en/stable/esp32/api-reference/system/ota.html#app-rollback