flash: stop OTA from shipping corrupt initramfs (ENOSPC silent failure)

User report: a device OTA'd from keen-swallow-hail to polar-swallow-beat.
The OS piece reported "update installed!" and "verified 13.3MB written",
prompted a reboot, and the device then panicked at boot with "initramfs
unpacking failed: read error / Kernel panic - not syncing".

Cause: on a typical 600 MB Fedora ESP, the OTA flow FIRST renamed the
existing 326 MB initramfs.cpio.gz to .prev (a rename frees no space),
then called flash_copy_file to write the new 326 MB initramfs alongside
it. Peak occupancy: 13 MB (new kernel) + 13 MB (kernel.prev) + 13 MB
(bootmgfw.efi) + 326 MB (initramfs.prev) + 326 MB (initramfs being
written) = 691 MB, which does not fit in 600 MB. flash_copy_file hit
ENOSPC mid-stream and returned the partial byte count (positive!). The
post-write check was `if (initramfs_copied <= 0)`, which passed because
a partial count is > 0. The OS piece reported success, the user
rebooted, and the kernel found a truncated initramfs.cpio.gz and
panicked.
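
A minimal C sketch of why the old check let the short copy through, and
what a size comparison catches. The helper names old_check_passes and
copy_is_complete are hypothetical, and the return convention assumed
for flash_copy_file (bytes written, <= 0 on total failure) is inferred
from the description above rather than taken from the real source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MB (1024LL * 1024LL)

/* The pre-fix test: only a zero or negative return counts as failure. */
static bool old_check_passes(int64_t copied)
{
    return !(copied <= 0);
}

/* Hypothetical stricter test: success only if every source byte landed. */
static bool copy_is_complete(int64_t copied, int64_t src_size)
{
    /* A negative source size would be a caller bug, not a copy failure. */
    assert(src_size >= 0);
    return copied > 0 && copied == src_size;
}
```

With the numbers from this report, old_check_passes(235 * MB) is true
even though only 235 of 326 MB were written, while
copy_is_complete(235 * MB, 326 * MB) is false.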

Fix:
1. Pre-flight space check: stat() the source, statvfs() the ESP, account
   for the old initramfs we're about to delete, and abort hard if free
   space can't hold src + 4MB margin. Log the math via flash_tlog so
   MongoDB triage can see exactly why.
2. Skip the .prev backup for initramfs entirely (no JS rollback path
   uses it, and at 326MB it's too big to keep alongside on tight ESPs).
   Unlink old, sync, then write new.
3. Also unlink kernel.prev to free a few more MB.
4. Verify the byte count matches src_size after the copy. A SHORT
   return from flash_copy_file (e.g. 235MB of 326MB on partial ENOSPC)
   is now a hard failure: unlink the truncated dest, set flash_ok=0,
   and return.

No more silent shipping of corrupt initramfs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>