CLI app for developers prototyping atproto functionality

docs: add test-labeler implementation plan

Six-phase implementation plan derived from the test-labeler design,
plus a test-requirements.md mapping every acceptance criterion to the
test(s) that verify it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

+2279
+504
docs/implementation-plans/2026-04-13-test-labeler/phase_01.md
1 + # Test-labeler Implementation Plan — Phase 1 2 + 3 + **Goal:** Get the binary crate compiling with a clap-derived subcommand tree, tokio async runtime, miette-installed error handling, and a stub `test labeler` command that parses its arguments and prints a "not yet implemented" message. 4 + 5 + **Architecture:** Single-crate Rust 2024 binary. Root `Cli` in `cli.rs` dispatches to a `Command` enum in `commands.rs`, which nests a `TestCmd` enum in `commands/test.rs`, which holds a `Labeler(LabelerCmd)` variant whose `LabelerCmd` args struct lives in `commands/test/labeler.rs`. Sibling-file module layout (no `mod.rs`). Shared cross-feature primitives live under `common.rs` and `common/`. 6 + 7 + **Tech Stack:** `clap` (derive), `humantime`, `tokio` (`rt`, `macros`, `time`, `net`), `miette` (`fancy`), `thiserror`, `tracing`, `tracing-subscriber` (`env-filter`, `fmt`), `url`. 8 + 9 + **Scope:** Phase 1 of 6 (Phase 1 in the source design). 10 + 11 + **Codebase verified:** 2026-04-13 12 + 13 + --- 14 + 15 + ## Acceptance Criteria Coverage 16 + 17 + **Verifies: None.** 18 + 19 + Phase 1 is an infrastructure/bootstrap phase. The design's "Done when" specifies operational verification (`cargo build` succeeds, `--help` renders, stub invocation prints a placeholder). Acceptance criteria for CLI invocation modes (`test-labeler.AC1.*`) and all later stages arrive in Phases 2–6 as the stages they test come online. Explicitly, **no AC cases from the design are tested in this phase**. 20 + 21 + --- 22 + 23 + ## Codebase state at start of phase 24 + 25 + Verified 2026-04-13: 26 + 27 + - **`Cargo.toml`** exists at repository root. Package name `atproto-devtool`, version `0.0.0`, Rust edition `2024`, `rust-version = "1.85"`, dual-licensed `MIT OR Apache-2.0`. `[dependencies]` table is **empty**. 28 + - **`src/main.rs`** exists containing only the default `fn main() { println!("Hello, world!"); }` stub. Not yet async; nothing imported. 29 + - No other `src/` modules. No `tests/`, no `examples/`, no `benches/`. 
30 + - No `CLAUDE.md`, no `AGENTS.md`. Global Rust conventions (user `~/.claude/CLAUDE.md`) apply: sibling-file module pattern, Rust 2024, `#[expect]` not `#[allow]`, sentence-case headings, Oxford commas, no fully-qualified `std::fmt::Display`, no workspace for single-crate projects, comments end with periods. 31 + - `docs/design-plans/2026-04-13-test-labeler.md` present and committed. `docs/implementation-plans/2026-04-13-test-labeler/` is being populated by this plan. 32 + - `.gitignore` contains `/target` and `/.worktrees/`. 33 + - No existing patterns to follow — this phase lays the foundation. 34 + 35 + **Implication for tasks below:** every file listed under "Create" is genuinely new. No "create or update" conditionals anywhere. Only two pre-existing files are _modified_: `Cargo.toml` (its `[dependencies]` table, in Task 1) and `src/main.rs` (replaced wholesale in Task 5). 36 + 37 + --- 38 + 39 + ## External dependency notes 40 + 41 + Crate versions verified against `cargo read` on 2026-04-13: 42 + 43 + - `clap` 4.6.0 — derive macros via `features = ["derive"]`. Rust-version 1.85, edition 2024. 44 + - `tokio` 1.51.1 — features `rt`, `macros`, `time`, `net` are sufficient for the single-threaded `current_thread` runtime this binary uses. `tokio::main(flavor = "current_thread")` attribute macro requires the `macros` feature. 45 + - `miette` 7.6.0 — `features = ["fancy"]` pulls in `GraphicalReportHandler`, the panic handler (`miette::set_panic_hook`), the `MietteHandlerOpts` builder used to honour `NO_COLOR`, and the `fancy` renderer for `Diagnostic` types. `miette::Result<T>` is `Result<T, miette::Report>`. 46 + - `thiserror` 2.0.18 — `#[derive(thiserror::Error)]` for typed error enums returned from stage code. 47 + - `tracing` 0.1.44 and `tracing-subscriber` 0.3.23 — subscriber built with `EnvFilter` + `fmt` layer writing to `stderr`. The default filter is `warn`, raised to `debug` when `--verbose` is passed; a `RUST_LOG` value in the environment takes precedence over both. 
48 + - `url` 2.5.8 — newtype wrapper around parsed URLs used by every later stage; pulled in early so CLI parsing can reject non-URL targets with a clap error. 49 + 50 + There are no non-obvious gotchas for this infrastructure phase. The current-thread tokio flavor matches the single-sequential-stage nature of the pipeline and avoids needlessly pulling `tokio` multi-thread scheduler code. 51 + 52 + --- 53 + 54 + <!-- START_TASK_1 --> 55 + ### Task 1: Populate `Cargo.toml` with Phase 1 dependencies 56 + 57 + **Files:** 58 + - Modify: `Cargo.toml` (the `[dependencies]` table, currently empty, lines 9–10) 59 + 60 + **Implementation:** 61 + 62 + Add the following dependencies under `[dependencies]`, keeping the existing `[package]` section unchanged. Group in alphabetical order within a single table (the crate is not a workspace yet, per the user's global rule: "Only create a Cargo workspace when the project genuinely has two or more crates"): 63 + 64 + ```toml 65 + [dependencies] 66 + clap = { version = "4.6", features = ["derive"] } 67 + humantime = "2.1" 68 + miette = { version = "7.6", features = ["fancy"] } 69 + thiserror = "2.0" 70 + tokio = { version = "1.51", features = ["rt", "macros", "time", "net"] } 71 + tracing = "0.1" 72 + tracing-subscriber = { version = "0.3", features = ["env-filter", "fmt"] } 73 + url = "2.5" 74 + ``` 75 + 76 + `humantime` is used by Task 4's `parse_subscribe_timeout` value parser — using a maintained crate is preferable to hand-rolling a duration parser in the CLI layer. 77 + 78 + No dev-dependencies in this phase. 79 + 80 + **Verification:** 81 + 82 + Run: `cargo check` 83 + Expected: compiles successfully (the stub `src/main.rs` remains unchanged at this point and still prints "Hello, world!"; the new dependencies resolve and download). 84 + 85 + Run: `cargo build` 86 + Expected: succeeds. 
87 + 88 + **Commit:** 89 + 90 + ```bash 91 + git add Cargo.toml Cargo.lock 92 + git commit -m "Add Phase 1 dependencies for CLI bootstrap" 93 + ``` 94 + <!-- END_TASK_1 --> 95 + 96 + <!-- START_TASK_2 --> 97 + ### Task 2: Create `src/common.rs` + `src/common/diagnostics.rs` skeleton 98 + 99 + **Files:** 100 + - Create: `src/common.rs` 101 + - Create: `src/common/diagnostics.rs` 102 + 103 + **Implementation:** 104 + 105 + `src/common.rs` declares the `common` module tree. Because this project uses the sibling-file layout (no `mod.rs`), `common.rs` contains only `pub mod` declarations: 106 + 107 + ```rust 108 + //! Cross-feature primitives shared by every `atproto-devtool` subcommand. 109 + 110 + pub mod diagnostics; 111 + ``` 112 + 113 + `src/common/diagnostics.rs` holds miette helpers that every feature will need. Phase 1 seeds it with just two items: installation of the miette handler respecting `NO_COLOR`, and a thin wrapper type for a re-usable `NamedSource` constructed from raw bytes. Both are used by Phase 3 onwards, but installing them here lets `main.rs` wire the handler without reaching into a feature module. 114 + 115 + ```rust 116 + //! Shared miette configuration and `NamedSource` helpers. 117 + 118 + use std::sync::Arc; 119 + 120 + use miette::{GraphicalTheme, MietteHandlerOpts, NamedSource}; 121 + 122 + // Re-exports so Phase 3 onwards can import `LabeledSpan` / `SourceSpan` from a 123 + // single path. Unused in Phase 1 itself; declared here to keep the diagnostics 124 + // surface area in one file. 125 + pub use miette::{LabeledSpan, SourceSpan}; 126 + 127 + /// Install the miette panic hook and graphical report handler. 128 + /// 129 + /// Honours `NO_COLOR=1` by dropping to an unstyled theme. Call this exactly once 130 + /// from `main` before any `miette::Result`-returning code runs. 
131 + pub fn install_miette_handler(no_color: bool) -> miette::Result<()> { 132 + // `NO_COLOR` is also respected automatically by miette when set in the 133 + // environment; passing an explicit theme here covers the `--no-color` flag 134 + // path without having to touch process-wide env vars (which is `unsafe` in 135 + // Rust 2024). 136 + miette::set_hook(Box::new(move |_| { 137 + let theme = if no_color { 138 + GraphicalTheme::unicode_nocolor() 139 + } else { 140 + GraphicalTheme::unicode() 141 + }; 142 + Box::new( 143 + MietteHandlerOpts::new() 144 + .graphical_theme(theme) 145 + .context_lines(3) 146 + .build(), 147 + ) 148 + }))?; 149 + 150 + // Install miette's panic hook so panics render through the same handler. 151 + miette::set_panic_hook(); 152 + 153 + Ok(()) 154 + } 155 + 156 + /// Build a `NamedSource` from a name and raw bytes. 157 + /// 158 + /// The bytes are copied into a freshly allocated `Arc<[u8]>`, 159 + /// so callers may drop the original slice after this returns. 160 + pub fn named_source_from_bytes(name: impl Into<String>, bytes: &[u8]) -> NamedSource<Arc<[u8]>> { 161 + NamedSource::new(Into::<String>::into(name), Arc::<[u8]>::from(bytes)) 162 + } 163 + 164 + /// Build a `NamedSource` from a name and a UTF-8 string slice. 165 + pub fn named_source_from_str(name: impl Into<String>, text: &str) -> NamedSource<String> { 166 + NamedSource::new(Into::<String>::into(name), text.to_string()) 167 + } 168 + ``` 169 + 170 + **Note on the panic hook:** `miette::set_panic_hook` is distinct from `miette::set_hook`. The former installs a process-wide panic hook (through `std::panic::set_hook`), while the latter installs the runtime `ReportHandler`. Both are needed for the UX described in the design. 171 + 172 + **Verification:** 173 + 174 + Run: `cargo check` 175 + Expected: succeeds. Nothing declares `mod common;` yet, so the new files are not part of the crate and are not type-checked until Task 5 wires them in. 
176 + 177 + **Commit:** 178 + 179 + ```bash 180 + git add src/common.rs src/common/diagnostics.rs 181 + git commit -m "Add common::diagnostics scaffold" 182 + ``` 183 + <!-- END_TASK_2 --> 184 + 185 + <!-- START_SUBCOMPONENT_A (tasks 3-4) --> 186 + <!-- START_TASK_3 --> 187 + ### Task 3: Create `src/cli.rs` with the root `Cli` parser 188 + 189 + **Files:** 190 + - Create: `src/cli.rs` 191 + 192 + **Implementation:** 193 + 194 + `src/cli.rs` defines the root clap `Parser` struct and its `run` entry point. `Command` enum dispatch lives in its own file (Task 4) so new features can be added with a one-line enum edit. 195 + 196 + ```rust 197 + //! Root clap parser and dispatch entry point. 198 + 199 + use clap::Parser; 200 + use miette::Result; 201 + use tracing_subscriber::{EnvFilter, fmt}; 202 + 203 + use crate::commands::Command; 204 + use crate::common::diagnostics::install_miette_handler; 205 + 206 + /// Top-level `atproto-devtool` CLI. 207 + #[derive(Debug, Parser)] 208 + #[command( 209 + name = "atproto-devtool", 210 + version, 211 + about = "Diagnostics and conformance tooling for atproto services.", 212 + long_about = None, 213 + )] 214 + pub struct Cli { 215 + /// Enable verbose (DEBUG-level) logging to stderr. 216 + #[arg(long, global = true)] 217 + verbose: bool, 218 + 219 + /// Disable ANSI color in rendered diagnostics. 220 + #[arg(long, global = true)] 221 + no_color: bool, 222 + 223 + #[command(subcommand)] 224 + command: Command, 225 + } 226 + 227 + /// Parse `std::env::args()`, install global handlers, and dispatch. 228 + /// 229 + /// Returns `miette::Result<()>` so main can print the report via the installed 230 + /// handler. An `Err` here produces exit code 1; clap parse errors exit with 231 + /// code 2 from inside `Cli::parse()`, before any handler is installed. 
232 + pub async fn run() -> Result<()> { 233 + let cli = Cli::parse(); 234 + 235 + install_miette_handler(cli.no_color)?; 236 + install_tracing(cli.verbose); 237 + 238 + cli.command.run().await 239 + } 240 + 241 + fn install_tracing(verbose: bool) { 242 + let default_filter = if verbose { "debug" } else { "warn" }; 243 + let filter = EnvFilter::try_from_default_env() 244 + .unwrap_or_else(|_| EnvFilter::new(default_filter)); 245 + 246 + let _ = fmt() 247 + .with_env_filter(filter) 248 + .with_writer(std::io::stderr) 249 + .try_init(); 250 + } 251 + ``` 252 + 253 + **Verification:** 254 + 255 + Run: `cargo check` 256 + Expected: succeeds, but only because `src/cli.rs` is not yet declared as a module from any crate root (`src/main.rs` is still the "Hello, world!" stub), so the compiler never sees it. The file is not valid on its own: it references `crate::commands::Command`, which Task 4 creates. Proceed to Task 4 **without** committing Task 3 in isolation. 257 + 258 + **Important:** Tasks 3 and 4 are co-dependent and form Subcomponent A. Task 4 creates the `commands::Command` type `cli.rs` references. **Do not commit Task 3 on its own; once wired into the crate it would not compile.** Commit at the end of Task 4 with the files from both tasks staged together. 259 + <!-- END_TASK_3 --> 260 + 261 + <!-- START_TASK_4 --> 262 + ### Task 4: Create `src/commands.rs` + nested `test` / `labeler` subcommand skeleton 263 + 264 + **Files:** 265 + - Create: `src/commands.rs` 266 + - Create: `src/commands/test.rs` 267 + - Create: `src/commands/test/labeler.rs` 268 + 269 + **Implementation:** 270 + 271 + `src/commands.rs` declares the `Command` enum plus the `commands` module tree. Only one top-level variant exists today (`Test`); future features extend this enum. 272 + 273 + ```rust 274 + //! Top-level subcommand dispatch. 275 + 276 + use clap::Subcommand; 277 + use miette::Result; 278 + 279 + pub mod test; 280 + 281 + use self::test::TestCmd; 282 + 283 + #[derive(Debug, Subcommand)] 284 + pub enum Command { 285 + /// Conformance and diagnostic checks against atproto services. 
286 + #[command(subcommand)] 287 + Test(TestCmd), 288 + } 289 + 290 + impl Command { 291 + pub async fn run(self) -> Result<()> { 292 + match self { 293 + Command::Test(cmd) => cmd.run().await, 294 + } 295 + } 296 + } 297 + ``` 298 + 299 + `src/commands/test.rs` declares the `TestCmd` enum and its own submodule tree: 300 + 301 + ```rust 302 + //! `atproto-devtool test ...` subcommand tree. 303 + 304 + use clap::Subcommand; 305 + use miette::Result; 306 + 307 + pub mod labeler; 308 + 309 + use self::labeler::LabelerCmd; 310 + 311 + #[derive(Debug, Subcommand)] 312 + pub enum TestCmd { 313 + /// Run the labeler conformance suite against an atproto labeler. 314 + Labeler(LabelerCmd), 315 + } 316 + 317 + impl TestCmd { 318 + pub async fn run(self) -> Result<()> { 319 + match self { 320 + TestCmd::Labeler(cmd) => cmd.run().await, 321 + } 322 + } 323 + } 324 + ``` 325 + 326 + `src/commands/test/labeler.rs` holds the clap `Args` struct and a stub `run` that prints a placeholder. Later phases flesh out the pipeline entry point; the signature `pub async fn run(self) -> Result<()>` is final. 327 + 328 + Phase 1 intentionally accepts the target as a single `String` (not yet parsed into a typed `LabelerTarget`) to avoid pulling in identity primitives before Phase 2. The final `--subscribe-timeout` and `--did` flags are _declared_ here so `--help` is complete, matching the design's AC1.6, but the run body does not yet use them. No AC is tested in this phase. 329 + 330 + ```rust 331 + //! `atproto-devtool test labeler <target>` command. 332 + 333 + use std::time::Duration; 334 + 335 + use clap::Args; 336 + use miette::Result; 337 + 338 + /// Run the labeler conformance suite against a handle, DID, or endpoint URL. 339 + #[derive(Debug, Args)] 340 + pub struct LabelerCmd { 341 + /// Handle (`alice.example`), DID (`did:plc:...` / `did:web:...`), or labeler endpoint URL. 342 + pub target: String, 343 + 344 + /// Explicit DID override. 
Required (and combined with the target URL) when 345 + /// `target` is a raw endpoint URL and you want identity/crypto checks to run. 346 + #[arg(long)] 347 + pub did: Option<String>, 348 + 349 + /// Per-connection time budget for the subscription-layer checks. 350 + /// 351 + /// Minimum 1 second; values below 1 second are rejected at parse time. 352 + #[arg( 353 + long, 354 + default_value = "5s", 355 + value_parser = parse_subscribe_timeout, 356 + )] 357 + pub subscribe_timeout: Duration, 358 + } 359 + 360 + impl LabelerCmd { 361 + pub async fn run(self) -> Result<()> { 362 + println!("atproto-devtool test labeler (not yet implemented)"); 363 + println!(" target = {}", self.target); 364 + println!(" did = {:?}", self.did); 365 + println!(" subscribe_timeout = {:?}", self.subscribe_timeout); 366 + Ok(()) 367 + } 368 + } 369 + 370 + pub(crate) fn parse_subscribe_timeout(raw: &str) -> Result<Duration, String> { 371 + let parsed = humantime::parse_duration(raw) 372 + .map_err(|e| format!("invalid --subscribe-timeout value `{raw}`: {e}"))?; 373 + 374 + if parsed < Duration::from_secs(1) { 375 + return Err(format!( 376 + "--subscribe-timeout must be at least 1 second (got {parsed:?})" 377 + )); 378 + } 379 + 380 + Ok(parsed) 381 + } 382 + ``` 383 + 384 + `humantime::parse_duration` accepts `5s`, `1500ms`, `2m`, `1h30m`, etc. — strictly richer than a hand-rolled parser, and maintained. `parse_subscribe_timeout` is marked `pub(crate)` so Phase 5 Task 4 can test it directly for `test-labeler.AC4.7`. 385 + 386 + Note on `--subscribe-timeout`: the design's `test-labeler.AC4.7` requires that `--subscribe-timeout 0` (and other invalid durations) be rejected as a clap parse error. Phase 1 declares the flag with the value parser so the help text and validation rejection are in place; later phases (Phase 5) wire the parsed value into the subscription stage and add the AC4.7 test. 
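The two rejection paths described here (input that does not parse at all, versus input that parses but falls under the floor) can be sketched without any dependencies. The block below is illustrative only: `parse_simple_duration` is a hypothetical stand-in that understands just `<n>s` and `<n>ms` and is not a replacement for `humantime::parse_duration`, and `check_timeout` is an assumed name that mirrors the shape of `parse_subscribe_timeout`.

```rust
use std::time::Duration;

/// Hypothetical stand-in for `humantime::parse_duration`; accepts only
/// `<n>ms` and `<n>s`. A bare number with no unit is rejected.
fn parse_simple_duration(raw: &str) -> Result<Duration, String> {
    let number = |digits: &str| {
        digits
            .parse::<u64>()
            .map_err(|e| format!("invalid number in `{raw}`: {e}"))
    };
    // Check `ms` before `s`, since `500ms` also ends with `s`.
    if let Some(digits) = raw.strip_suffix("ms") {
        number(digits).map(Duration::from_millis)
    } else if let Some(digits) = raw.strip_suffix('s') {
        number(digits).map(Duration::from_secs)
    } else {
        Err(format!("time unit needed in `{raw}`"))
    }
}

/// Same two-stage shape as the plan's value parser: parse first, then
/// apply the 1-second floor to whatever parsed successfully.
fn check_timeout(raw: &str) -> Result<Duration, String> {
    let parsed = parse_simple_duration(raw)?;
    if parsed < Duration::from_secs(1) {
        return Err(format!("must be at least 1 second (got {parsed:?})"));
    }
    Ok(parsed)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn floor_and_parse_errors_are_distinct() {
        assert_eq!(check_timeout("5s"), Ok(Duration::from_secs(5)));
        // Parses fine, then fails the floor check.
        assert!(check_timeout("500ms").unwrap_err().contains("at least 1 second"));
        // Never reaches the floor check: the stand-in parser rejects it.
        assert!(check_timeout("0").unwrap_err().contains("time unit needed"));
    }
}
```

Keeping the floor check separate from parsing is what lets the two error messages stay distinguishable in the Task 5 verification runs.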
387 + 388 + **Verification:** 389 + 390 + With Task 3's `src/cli.rs` already on disk, run: `cargo check` 391 + Expected: succeeds. The new files are still not reachable from a crate root, so this only confirms that the existing stub builds; they are first type-checked in Task 5. 392 + 393 + Run: `cargo build` 394 + Expected: succeeds. 395 + 396 + Run: `cargo run -- test labeler --help` 397 + Expected: still prints "Hello, world!", because `src/main.rs` is unchanged until Task 5. The help text (including `--did`, `--subscribe-timeout`, and the global `--verbose`/`--no-color` flags) is checked in Task 5's verification. 398 + 399 + **Commit:** commit Tasks 3 and 4 together. 400 + 401 + ```bash 402 + git add src/cli.rs src/commands.rs src/commands/test.rs src/commands/test/labeler.rs 403 + git commit -m "Add CLI root and test labeler subcommand stub" 404 + ``` 405 + <!-- END_TASK_4 --> 406 + <!-- END_SUBCOMPONENT_A --> 407 + 408 + <!-- START_TASK_5 --> 409 + ### Task 5: Add `src/lib.rs` and rewrite `src/main.rs` as a tokio + miette bootstrap 410 + 411 + **Files:** 412 + - Create: `src/lib.rs` 413 + - Modify: `src/main.rs` (replace existing `fn main()` at lines 1–3 in its entirety) 414 + 415 + **Why both a library and a binary:** Rust integration tests under `tests/*.rs` link against the crate as an external library. A pure-binary crate (one with only `src/main.rs`) cannot be exercised from `tests/*.rs` — integration tests would be unable to reach `crate::commands::test::labeler::pipeline::run_pipeline` or the other per-stage entry points. Phases 3–6 all add `tests/*.rs` integration tests that construct fake clients and call into the pipeline directly, so Phase 1 must establish a library target up front. Cargo auto-detects the dual lib+bin layout when both `src/lib.rs` and `src/main.rs` are present; no `[[bin]]` or `[lib]` stanzas are required. This is also the point at which the modules from Tasks 2–4 first become part of a compiled crate. 416 + 417 + `src/lib.rs` contents: 418 + 419 + ```rust 420 + //! `atproto-devtool` library crate. 421 + //! 422 + //! The binary at `src/main.rs` is a thin tokio bootstrap over [`cli::run`]. 423 + //! All pipeline modules are re-exported here so integration tests under 424 + //! 
`tests/*.rs` can reach them as `atproto_devtool::commands::test::labeler::...`. 425 + 426 + pub mod cli; 427 + pub mod commands; 428 + pub mod common; 429 + ``` 430 + 431 + `src/main.rs` contents: 432 + 433 + ```rust 434 + //! `atproto-devtool` binary entry point. 435 + 436 + use miette::Result; 437 + 438 + #[tokio::main(flavor = "current_thread")] 439 + async fn main() -> Result<()> { 440 + atproto_devtool::cli::run().await 441 + } 442 + ``` 443 + 444 + **Verification:** 445 + 446 + Run: `cargo build` 447 + Expected: succeeds. 448 + 449 + Run: `cargo run -- test labeler --help` 450 + Expected: prints subcommand help including `--did`, `--subscribe-timeout`, `--verbose`, `--no-color`, and the positional `<TARGET>`. Exits 0. 451 + 452 + Run: `cargo run -- test labeler example.bsky.social` 453 + Expected: prints the "not yet implemented" placeholder with the parsed target, `did = None`, and `subscribe_timeout = 5s`. Exits 0. 454 + 455 + Run: `cargo run -- test labeler --subscribe-timeout 500ms example.bsky.social` 456 + Expected: clap parse error, exit code 2, stderr message contains `"at least 1 second"` (the floor check fires because `humantime` parses `500ms` successfully but `parse_subscribe_timeout` rejects anything under 1 second). 457 + 458 + Run: `cargo run -- test labeler --subscribe-timeout 0 example.bsky.social` 459 + Expected: clap parse error, exit code 2. Stderr contains humantime's `"time unit needed"` message (because `humantime::parse_duration` requires a unit suffix on every numeric component and rejects bare `0` at parse time, before the floor check runs). Any non-zero exit with a clap error is acceptable here — we are only verifying that bare-integer input is rejected, not the specific error text. 460 + 461 + Run: `cargo run -- test labeler` 462 + Expected: clap parse error, exit code 2, stderr message about the missing `<TARGET>` argument. 
463 + 464 + Run: `cargo run -- --help` 465 + Expected: root help lists `test` subcommand and the global `--verbose` / `--no-color` flags. 466 + 467 + **Commit:** 468 + 469 + ```bash 470 + git add src/lib.rs src/main.rs 471 + git commit -m "Wire tokio + miette bootstrap for atproto-devtool" 472 + ``` 473 + <!-- END_TASK_5 --> 474 + 475 + <!-- START_TASK_6 --> 476 + ### Task 6: Final phase verification 477 + 478 + **Files:** none modified. 479 + 480 + **Steps:** 481 + 482 + Run the full verification battery in order, from the crate root: 483 + 484 + 1. `cargo fmt --check` — expected: exit 0, no output. 485 + 2. `cargo clippy -- -D warnings` — expected: clean (no warnings and no errors). If clippy flags a stylistic nit in the stub duration parser, address it in this task rather than deferring. 486 + 3. `cargo build` — expected: succeeds. 487 + 4. `cargo run -- test labeler --help` — expected: help text as in Task 5. 488 + 5. `cargo run -- test labeler example.bsky.social` — expected: placeholder output as in Task 5. 489 + 490 + **Done when:** all five commands succeed with the expected output. No commit is created by this task (it is verification-only) unless clippy surfaced a fix, in which case commit that fix atomically with message `chore: fix Phase 1 clippy lint`. 491 + <!-- END_TASK_6 --> 492 + 493 + --- 494 + 495 + ## Phase 1 done-when checklist 496 + 497 + - `cargo build` succeeds from a clean state. 498 + - `cargo run -- test labeler --help` prints subcommand help with `--did`, `--subscribe-timeout`, `--verbose`, `--no-color`, and the positional `<TARGET>` argument. 499 + - `cargo run -- test labeler example.bsky.social` prints the parsed-args placeholder and exits 0. 500 + - `cargo run -- test labeler --subscribe-timeout 500ms example.bsky.social` exits 2 with a clap parse error mentioning the 1-second floor (`"at least 1 second"` substring). 
501 + - `cargo run -- test labeler --subscribe-timeout 0 example.bsky.social` exits 2 with a clap parse error (humantime's `"time unit needed"` message — the bare `0` is rejected at parse time before the floor check). 502 + - `cargo clippy -- -D warnings` and `cargo fmt --check` are clean. 503 + - Commit history contains the four atomic commits listed above (dependencies, `common::diagnostics`, CLI + subcommand scaffold, lib + main bootstrap). 504 + - No acceptance criteria from the design plan are tested yet — this is an infrastructure phase by design.
+280
docs/implementation-plans/2026-04-13-test-labeler/phase_02.md
1 + # Test-labeler Implementation Plan — Phase 2 2 + 3 + **Goal:** Build a reusable `common::identity` module covering handle resolution, DID-document fetching, service-entry lookup, and multikey parsing — the primitives every future `test` subcommand will rely on. 4 + 5 + **Architecture:** Single module file `src/common/identity.rs` exposing four functions (two async, two sync) plus the types they return. Functions are written against a narrow `HttpClient` trait and a narrow `DnsResolver` trait so the unit tests can replay recorded fixtures without network access. `resolve_did` returns both the parsed document _and_ the raw bytes (wrapped in a struct) so Phase 3 onwards can attach those bytes to `miette` diagnostics. All resolution paths are instrumented with `tracing::debug!` calls gated by the existing `--verbose` flag. 6 + 7 + **Tech Stack:** adds `reqwest` (`rustls-tls`, `json`, `gzip`), `hickory-resolver`, `multibase`, `k256` (`ecdsa`), `p256` (`ecdsa`), `serde`, `serde_json`, `async-trait` (for the narrow IO traits). 8 + 9 + **Scope:** Phase 2 of 6. 10 + 11 + **Codebase verified:** 2026-04-13. Phase 1 landed the `common.rs` + `common/diagnostics.rs` scaffold, `Cargo.toml` dependency set, and the CLI bootstrap. `src/common/identity.rs` **does not yet exist**; nothing in the repository imports it. Phase 2 creates it fresh and adds one `pub mod identity;` line to `src/common.rs`. 12 + 13 + --- 14 + 15 + ## Acceptance Criteria Coverage 16 + 17 + **Verifies: None.** 18 + 19 + Phase 2 is an infrastructure/library phase. Per the source design's "Done when" section: _"Infrastructure/library phase — tests verify primitive correctness but do not yet cover acceptance criteria (those arrive once wired into the labeler pipeline in Phase 3)."_ Unit tests in this phase verify primitive correctness (e.g., "did:plc resolver parses a recorded document", "k256 multikey parses to the correct key") but do **not** exercise any AC from the design — the ACs they enable are evaluated in Phase 3. 
20 + 21 + --- 22 + 23 + ## Codebase state at start of phase 24 + 25 + - Phase 1 present and committed: `Cargo.toml` has the bootstrap dependency set; `src/main.rs` is the tokio/miette bootstrap; `src/common.rs` declares `pub mod diagnostics;` only; `src/common/diagnostics.rs` holds the shared miette helpers. 26 + - `src/common/identity.rs` does not exist. 27 + - No test fixtures directory yet (`tests/fixtures/` will be created in this phase, subdirectory `tests/fixtures/identity/`). 28 + - No `CLAUDE.md` in the repo. User-global Rust conventions apply (sibling-file modules, Rust 2024, `#[expect]`, etc.). 29 + - `Cargo.lock` exists from Phase 1 build. 30 + 31 + --- 32 + 33 + ## External dependency notes 34 + 35 + All crate versions verified via `cargo read` on 2026-04-13: 36 + 37 + - **`reqwest` 0.13.2** — used with features `rustls-tls`, `json`, `gzip`. We deliberately avoid the default `native-tls` feature so the binary does not link against OpenSSL on Linux. `reqwest::Client` is cheap to clone and should be shared across the run. 38 + - **`hickory-resolver` 0.25.2** — public API of note: `TokioResolver::tokio(ResolverConfig, ResolverOpts)` builds an async resolver; `.txt_lookup(&str)` is the method used for the `_atproto.<handle>` TXT lookup. Default config reads `/etc/resolv.conf` on Unix. There is no stable `from_system_conf` any more; use `TokioResolver::builder_tokio()` followed by `.build()` in 0.25. 39 + - **`multibase` 0.9.2** — `decode(&str) -> Result<(Base, Vec<u8>)>` returns the base encoding marker plus raw bytes. For atproto multikeys we expect `Base::Base58Btc` (prefix `z`), and the decoded bytes begin with a two-byte varint curve tag (`0xe7 0x01` for `secp256k1` / `k256`, `0x80 0x24` for `p256`). 40 + - **`k256` 0.13.4** and **`p256` 0.13.2** — use feature `ecdsa`. `VerifyingKey::from_sec1_bytes(&[u8])` parses both compressed and uncompressed SEC1 encodings. atproto multikeys use **compressed** 33-byte SEC1 bodies after the curve-tag prefix. 
41 + - **`serde` 1.0.228** + **`serde_json` 1.0.149** — standard derive usage. Per user global rules, **do not use `#[serde(flatten)]` or `#[serde(untagged)]`**; use custom visitors where needed. Prefer `serde_ignored` if we need to warn on unknown fields, but Phase 2 does not require this. 42 + - **`async-trait` 0.1** — needed because `HttpClient` and `DnsResolver` declare async methods and are used as trait objects (`&dyn HttpClient`, `&dyn DnsResolver`); native `async fn` in traits is not usable through `dyn`, so the boxed-future expansion is still required in Rust 2024. 43 + 44 + **atproto specs consulted:** 45 + 46 + - **Handle resolution** — `https://atproto.com/specs/handle` — resolvers try in order: DNS TXT lookup on `_atproto.<handle>` (record shape `did=did:plc:...`), then HTTPS GET on `https://<handle>/.well-known/atproto-did` (body is the DID string, trimmed). A handle may use either mechanism; the resolver returns on the first success and falls back to the second on a "no such record" error from the first. 47 + - **did:plc** — `https://did-method-plc.github.io/did-method-plc/` — resolve by `GET https://plc.directory/{did}` → JSON DID document. 48 + - **did:web** — `https://w3c-ccg.github.io/did-method-web/` — strip the `did:web:` prefix, split the remainder on `:` into a host followed by optional path segments, then percent-decode each piece (so an encoded port such as `example.com%3A8443` becomes `example.com:8443`). Resolve via `https://<host>/.well-known/did.json` when there are no path segments, or `https://<host>/<path>/did.json` when there are. Phase 2 implements the no-path form plus the simple path form; richer forms can be added later. 49 + - **Multikey** — `https://w3c-ccg.github.io/multikey/` — multibase string whose decoded bytes are a multicodec-prefixed raw public key. The two curves this tool accepts are `secp256k1-pub` (varint `0xe7 0x01`) and `p256-pub` (varint `0x80 0x24`). Both use compressed SEC1 for the key body. 
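As a rough illustration of the multikey layout in the last bullet, the following dependency-free sketch splits already-decoded multikey bytes into a curve tag and a key body. It assumes the multibase (base58btc) decoding has already happened elsewhere; `Curve`, `read_varint`, and `split_multikey` are hypothetical names chosen for this example, not the plan's final API, and no SEC1 point validation is attempted.

```rust
/// Curves recognised by this sketch, identified by multicodec code.
#[derive(Debug, PartialEq, Eq)]
enum Curve {
    Secp256k1,
    P256,
}

/// Decode an unsigned LEB128 varint from the front of `bytes`, returning
/// the value and the number of bytes consumed.
fn read_varint(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    // Nine 7-bit groups keep the shift below 64 bits.
    for (i, &b) in bytes.iter().enumerate().take(9) {
        value |= u64::from(b & 0x7f) << (7 * i);
        if b & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// Split decoded multikey bytes into (curve, compressed SEC1 key body).
fn split_multikey(decoded: &[u8]) -> Result<(Curve, &[u8]), String> {
    let (code, used) = read_varint(decoded).ok_or("truncated multicodec prefix")?;
    let curve = match code {
        0xe7 => Curve::Secp256k1, // encoded as bytes 0xe7 0x01
        0x1200 => Curve::P256,    // encoded as bytes 0x80 0x24
        other => return Err(format!("unsupported multicodec code {other:#x}")),
    };
    let key = &decoded[used..];
    if key.len() != 33 {
        return Err(format!(
            "expected a 33-byte compressed SEC1 key, got {} bytes",
            key.len()
        ));
    }
    Ok((curve, key))
}
```

In the real module the returned key body would then be handed to the curve-specific `VerifyingKey::from_sec1_bytes`, which is where an invalid point would be rejected.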
50 + 51 + --- 52 + 53 + <!-- START_TASK_1 --> 54 + ### Task 1: Add Phase 2 dependencies to `Cargo.toml` 55 + 56 + **Files:** 57 + - Modify: `Cargo.toml` `[dependencies]` (append, keep Phase 1 entries alphabetised in the merged result) 58 + 59 + **Implementation:** 60 + 61 + Add the following, merging alphabetically with the Phase 1 block: 62 + 63 + ```toml 64 + async-trait = "0.1" 65 + hickory-resolver = "0.25" 66 + k256 = { version = "0.13", features = ["ecdsa"] } 67 + multibase = "0.9" 68 + p256 = { version = "0.13", features = ["ecdsa"] } 69 + reqwest = { version = "0.13", default-features = false, features = ["rustls-tls", "json", "gzip"] } 70 + serde = { version = "1.0", features = ["derive"] } 71 + serde_json = "1.0" 72 + ``` 73 + 74 + Also add `[dev-dependencies]`: 75 + 76 + ```toml 77 + [dev-dependencies] 78 + tokio = { version = "1.51", features = ["rt", "macros", "test-util", "time"] } 79 + ``` 80 + 81 + (The runtime `tokio` line keeps its Phase 1 features; dev-dependencies only adds `test-util` for `tokio::time::pause()` in the unit tests.) 82 + 83 + **Verification:** 84 + 85 + Run: `cargo check` 86 + Expected: all new crates resolve and compile. 87 + 88 + **Commit:** 89 + ```bash 90 + git add Cargo.toml Cargo.lock 91 + git commit -m "Add Phase 2 dependencies for identity primitives" 92 + ``` 93 + <!-- END_TASK_1 --> 94 + 95 + <!-- START_SUBCOMPONENT_A (tasks 2-4) --> 96 + <!-- START_TASK_2 --> 97 + ### Task 2: Define `common::identity` types and narrow IO traits 98 + 99 + **Files:** 100 + - Create: `src/common/identity.rs` (types and traits only; function bodies arrive in Task 3) 101 + - Modify: `src/common.rs` (add `pub mod identity;`) 102 + 103 + **Implementation:** 104 + 105 + The types declared in this task: 106 + 107 + - `DidDocument` — minimal serde struct mirroring only the fields we need (`id`, `also_known_as`, `verification_method`, `service`). 
Uses explicit field-level `#[serde(rename = ...)]` attributes; **no `#[serde(flatten)]`** per user-global rules. 108 + - `RawDidDocument { parsed: DidDocument, source_bytes: Arc<[u8]>, source_name: String }` — what `resolve_did` actually returns. The bytes and name feed `NamedSource` in Phase 3 diagnostics. 109 + - `Service { id, type_, service_endpoint }` and `VerificationMethod { id, type_, controller, public_key_multibase }` with the matching `#[serde(rename)]`s. 110 + - `ParsedMultikey { curve: Curve, verifying_key: AnyVerifyingKey }` where `Curve` is `Secp256k1 | P256` and `AnyVerifyingKey` wraps the curve-specific types. 111 + - `AnyVerifyingKey` — enum `{ K256(k256::ecdsa::VerifyingKey), P256(p256::ecdsa::VerifyingKey) }` with a `verify_prehash(&self, prehash: &[u8; 32], sig: &Signature) -> Result<(), SignatureError>` helper that dispatches to the correct curve. Phase 6 uses this; declaring it here keeps the curve-awareness in one place. 112 + - `IdentityError` — `thiserror::Error` enum with variants covering `DnsLookupFailed`, `HandleHttpFallbackFailed`, `DidParseError`, `DidResolutionFailed { status, body }`, `DidDocumentDecodeFailed`, `MultikeyDecodeFailed`, `UnsupportedCurve`, `HttpTransport`. Every non-trivial variant carries `#[source]` causes and `#[backtrace]` where applicable. The error type is wrapped later by `IdentityFacts` miette diagnostics in Phase 3. 113 + - Traits: 114 + - `#[async_trait] pub trait HttpClient: Send + Sync { async fn get_bytes(&self, url: &Url) -> Result<(StatusCode, Vec<u8>), IdentityError>; }` 115 + - `#[async_trait] pub trait DnsResolver: Send + Sync { async fn txt_lookup(&self, name: &str) -> Result<Vec<String>, IdentityError>; }` 116 + - Real implementations live in Task 3; test fakes live in Task 4. 
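The fixture-replay idea behind these narrow traits can be illustrated with a dependency-free sketch. The block below is a deliberately simplified, synchronous stand-in: `TxtLookup`, `FakeDns`, and `did_from_txt` are hypothetical names, whereas the plan's real `DnsResolver` is async (via `async_trait`) and returns `IdentityError` rather than `String`.

```rust
use std::collections::HashMap;

/// Synchronous stand-in for the plan's async `DnsResolver` trait.
trait TxtLookup {
    fn txt_lookup(&self, name: &str) -> Result<Vec<String>, String>;
}

/// Test fake that replays canned TXT records instead of touching the network.
struct FakeDns {
    records: HashMap<String, Vec<String>>,
}

impl TxtLookup for FakeDns {
    fn txt_lookup(&self, name: &str) -> Result<Vec<String>, String> {
        self.records
            .get(name)
            .cloned()
            .ok_or_else(|| format!("no TXT records for {name}"))
    }
}

/// The code under test depends only on the trait, so it accepts the fake
/// and a real resolver interchangeably.
fn did_from_txt(dns: &dyn TxtLookup, handle: &str) -> Result<String, String> {
    let records = dns.txt_lookup(&format!("_atproto.{handle}"))?;
    records
        .iter()
        // Trim each record, then keep the first one shaped like `did=<did>`.
        .find_map(|r| r.trim().strip_prefix("did="))
        .map(str::to_owned)
        .ok_or_else(|| format!("no `did=` record for {handle}"))
}
```

Because the function takes `&dyn TxtLookup`, a unit test can build a `FakeDns` from a recorded fixture and assert on the result without any network access, which is the property the real traits are meant to preserve.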
117 + 118 + Type-level invariants to preserve: 119 + - `DidDocument.id` is typed as `String` for Phase 2 (we accept anything the server returned and parse it into a `Did` newtype only when we need to reason about the method). Phase 3 can tighten this if needed. 120 + - `RawDidDocument::source_bytes` uses `Arc<[u8]>` so cloning into `NamedSource` in Phase 3 is cheap. 121 + - `ParsedMultikey::curve` and the `AnyVerifyingKey` variant tag MUST agree (constructor is private, built only via `parse_multikey`). 122 + 123 + **Testing:** deferred to Task 4. 124 + 125 + **Verification:** 126 + 127 + Run: `cargo check` 128 + Expected: compiles. No tests yet. 129 + 130 + **Commit:** 131 + ```bash 132 + git add src/common.rs src/common/identity.rs 133 + git commit -m "Add common::identity types and IO traits" 134 + ``` 135 + <!-- END_TASK_2 --> 136 + 137 + <!-- START_TASK_3 --> 138 + ### Task 3: Implement the four resolution primitives 139 + 140 + **Files:** 141 + - Modify: `src/common/identity.rs` — add the function bodies and the two real-IO implementations. 142 + 143 + **Implementation:** 144 + 145 + Four functions to add, plus the real trait impls: 146 + 147 + **`pub async fn resolve_handle(handle: &str, http: &dyn HttpClient, dns: &dyn DnsResolver) -> Result<Did, IdentityError>`** 148 + 149 + Logic: 150 + 1. Validate `handle` syntactically — at least one `.`, all ASCII, no leading/trailing dot. Return `IdentityError::InvalidHandle` on failure. 151 + 2. DNS path: look up `_atproto.<handle>` via `dns.txt_lookup`. For each record, strip leading/trailing whitespace; if it starts with `did=`, parse the tail as a DID. Return the first match. 152 + 3. HTTPS fallback: `GET https://<handle>/.well-known/atproto-did`. Expect `200 OK` with a plain-text body that is a valid DID when trimmed. Return `IdentityError::HandleHttpFallbackFailed` on non-200 or an invalid DID body. 153 + 4. If both paths fail, return `IdentityError::HandleUnresolvable { dns_error, http_error }`. 
Both underlying errors are captured; Phase 3 surfaces them in the miette diagnostic. 154 + 155 + `Did` here is a local newtype `struct Did(String)` with `fn method(&self) -> DidMethod { .. }` returning `Plc | Web | Other`. Keep it in the same file. 156 + 157 + **`pub async fn resolve_did(did: &Did, http: &dyn HttpClient) -> Result<RawDidDocument, IdentityError>`** 158 + 159 + Logic by method: 160 + - `did:plc:<id>` → `GET https://plc.directory/{did}`. Expect `200 OK`; body is JSON. 161 + - `did:web:<host>[:path...]` → URL-decode path segments, then `GET https://<host>/.well-known/did.json` when path is empty, else `https://<host>/<path>/did.json`. Expect `200 OK`; body is JSON. 162 + - Any other method → `IdentityError::UnsupportedDidMethod(method.to_string())`. 163 + 164 + Parse the body bytes with `serde_json::from_slice::<DidDocument>`. On parse failure return `IdentityError::DidDocumentDecodeFailed { source_name, source_bytes, cause }` so Phase 3 can use the bytes directly as a `NamedSource`. On success, return `RawDidDocument { parsed, source_bytes: Arc::from(body), source_name }` where `source_name` is the URL the document came from. 165 + 166 + **`pub fn find_service<'a>(doc: &'a DidDocument, id_fragment: &str, expected_type: &str) -> Option<&'a Service>`** 167 + 168 + Spec-correct matching: a DID doc service entry has `id: "<did>#<fragment>"` or sometimes just `"#<fragment>"`. Accept both forms. Match `type` case-sensitively (atproto uses PascalCase type names like `"AtprotoLabeler"`). 169 + 170 + **`pub fn parse_multikey(raw: &str) -> Result<ParsedMultikey, IdentityError>`** 171 + 172 + Logic: 173 + 1. `multibase::decode(raw)` → `(Base::Base58Btc, bytes)`. Require `Base58Btc`; anything else is `IdentityError::UnsupportedMultibase(_)`. 174 + 2. Parse an unsigned varint (the multicodec prefix) from the start of `bytes` by hand (two bytes is enough for both supported curves; write a tiny inline decoder rather than adding the `unsigned-varint` crate as a dep — it is three lines of code).
Curve dispatch: 175 + - `0xe7 0x01` (decimal `231`, `secp256k1-pub`) → `k256::ecdsa::VerifyingKey::from_sec1_bytes(rest)` → `AnyVerifyingKey::K256`. 176 + - `0x80 0x24` (decimal `4608`, `p256-pub`) → `p256::ecdsa::VerifyingKey::from_sec1_bytes(rest)` → `AnyVerifyingKey::P256`. 177 + - Any other prefix → `IdentityError::UnsupportedCurve { codec_prefix: Vec<u8> }`. 178 + 3. `rest` MUST be exactly 33 bytes (compressed SEC1). Length mismatch is `IdentityError::MultikeyLengthInvalid`. 179 + 180 + **Real trait implementations:** 181 + 182 + - `struct RealHttpClient { inner: reqwest::Client }` implementing `HttpClient`. Build the client in `RealHttpClient::new()` with a conservative default timeout (10s) and rustls. `get_bytes` calls `self.inner.get(url.clone()).send().await?.error_for_status()?.bytes().await?` and maps failures to `IdentityError::HttpTransport`. 183 + - `struct RealDnsResolver { inner: hickory_resolver::TokioResolver }` implementing `DnsResolver`. Build via `TokioResolver::builder_tokio().build()`. `txt_lookup` calls `self.inner.txt_lookup(name).await?.iter().map(|r| r.to_string()).collect()`. 184 + 185 + Both real types have a `new` constructor. For the DNS case it is `pub fn new() -> Self` and never fails (system config issues surface on first query); for the HTTP case it returns `Result<Self, IdentityError>`. 186 + 187 + **Tracing:** every resolution function emits a `tracing::debug!(target: "atproto_devtool::identity", ..)` event at entry including the input. Details (URL, status, bytes-read) log at debug level. This is wired to stderr by Phase 1's `install_tracing` when `--verbose` is passed. 188 + 189 + **Testing:** covered in Task 4. 190 + 191 + **Verification:** 192 + 193 + Run: `cargo check` 194 + Expected: compiles. 195 + 196 + Run: `cargo build` 197 + Expected: succeeds.
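The multicodec prefix handling in `parse_multikey` step 2 can be sketched with the standard library alone. This is a minimal illustration, not the plan's implementation: the function name `split_varint_prefix` and the two constants are hypothetical, and the sketch assumes the prefix never exceeds two bytes, which holds for both supported curves.

```rust
/// Multicodec codes for the two supported curves.
/// (Constant names are illustrative, not part of the plan.)
const SECP256K1_PUB: u64 = 0xe7; // encoded on the wire as 0xe7 0x01
const P256_PUB: u64 = 0x1200; // encoded on the wire as 0x80 0x24

/// Decode one unsigned LEB128 varint from the front of `bytes`.
///
/// Returns the decoded code and the remaining bytes, or `None` when the
/// input is truncated or the varint is longer than the two bytes this
/// sketch supports.
fn split_varint_prefix(bytes: &[u8]) -> Option<(u64, &[u8])> {
    let mut value: u64 = 0;
    for (i, &b) in bytes.iter().take(2).enumerate() {
        // The low 7 bits carry payload, least-significant group first.
        value |= u64::from(b & 0x7f) << (7 * i);
        // A clear high bit marks the final byte of the varint.
        if b & 0x80 == 0 {
            return Some((value, &bytes[i + 1..]));
        }
    }
    None
}
```

For a secp256k1 multikey the decoded bytes begin `0xe7 0x01`, which yields the code `0xe7` (decimal 231) and leaves the 33-byte compressed SEC1 point as the remainder to hand to `from_sec1_bytes`.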
198 + 199 + **Commit:** 200 + ```bash 201 + git add src/common/identity.rs 202 + git commit -m "Implement handle, DID, service, and multikey resolution" 203 + ``` 204 + <!-- END_TASK_3 --> 205 + 206 + <!-- START_TASK_4 --> 207 + ### Task 4: Unit tests over recorded fixtures 208 + 209 + **Verifies:** no design AC — this task verifies primitive correctness only, per the Phase 2 "Done when" in the source design. 210 + 211 + **Files:** 212 + - Create: `tests/fixtures/identity/plc_bsky_labeler.json` (recorded DID document for a well-known Bluesky labeler; byte-for-byte what `plc.directory` returned on the fixture-capture date) 213 + - Create: `tests/fixtures/identity/plc_bsky_labeler.txt` (short README describing the source URL and capture date) 214 + - Create: `tests/fixtures/identity/web_example.json` (a did:web document for `did:web:example.com`; minimal hand-authored) 215 + - Create: `tests/fixtures/identity/multikey_k256.txt` (single-line text: a known-good k256 multikey string) 216 + - Create: `tests/fixtures/identity/multikey_p256.txt` (single-line text: a known-good p256 multikey string) 217 + - Create: `tests/fixtures/README.md` (one paragraph explaining the overall fixture directory and why fixtures are versioned with the repo) 218 + - Modify: `src/common/identity.rs` — append a `#[cfg(test)] mod tests { .. }` block at end of the file with the test cases below, plus the fake `HttpClient`/`DnsResolver` implementations they need. 219 + 220 + **Test implementations (as `#[tokio::test]` or `#[test]` as appropriate):** 221 + 222 + Each test constructs a `FakeHttpClient` and/or `FakeDnsResolver` from a map of URL→bytes or name→records, then calls the primitive under test. 223 + 224 + - `fake_clients` helper — in the same test module, two structs: `FakeHttpClient { responses: HashMap<Url, (StatusCode, Vec<u8>)> }` and `FakeDnsResolver { records: HashMap<String, Vec<String>> }`. Both implement the corresponding trait. 
Any lookup not in the map returns a `NotFound`-flavored `IdentityError` so tests can exercise the fallback path. 225 + - `resolve_handle_via_dns` — DNS fake returns `["did=did:plc:abc123"]` for `_atproto.alice.example`. Asserts the resolved DID is `did:plc:abc123` and HTTP client is never called. 226 + - `resolve_handle_via_https_fallback` — DNS fake returns `NotFound`; HTTP fake returns `200 OK` body `"did:plc:abc123\n"` for `https://alice.example/.well-known/atproto-did`. Asserts the trimmed DID returned matches. 227 + - `resolve_handle_both_paths_fail` — both fakes return `NotFound`. Asserts `IdentityError::HandleUnresolvable` variant with both causes populated. 228 + - `resolve_did_plc_success` — HTTP fake returns the `plc_bsky_labeler.json` bytes for the PLC URL. Asserts the `RawDidDocument.parsed` contains the expected `id`, at least one service with type `AtprotoLabeler`, and that `source_bytes` equals the input. 229 + - `resolve_did_web_success` — HTTP fake returns `web_example.json` for `https://example.com/.well-known/did.json`. Asserts parse succeeds and the `source_name` field equals the URL. 230 + - `resolve_did_decode_failure_preserves_bytes` — HTTP fake returns 200 with `b"not valid json"`. Asserts the error variant is `DidDocumentDecodeFailed` and that `source_bytes` of the error equals `b"not valid json"` (the Phase 3 diagnostic relies on this). 231 + - `find_service_matches_both_id_forms` — constructs a `DidDocument` in-memory with two services, one with `id: "did:plc:abc#atproto_labeler"` and one with `id: "#atproto_pds"`. Asserts both are matched by `find_service(&doc, "atproto_labeler", "AtprotoLabeler")` and `find_service(&doc, "atproto_pds", "AtprotoPersonalDataServer")` respectively. 232 + - `find_service_type_mismatch_returns_none` — same doc, but searching for type `WrongType`. Asserts `None`. 
233 + - `parse_multikey_k256` — reads `multikey_k256.txt`, parses, asserts `curve == Secp256k1`, asserts the decoded `VerifyingKey`'s SEC1 compressed bytes (`verifying_key.to_sec1_bytes()` on the `K256` branch) match a hex literal captured at fixture-authoring time. This is a pure parsing round-trip test — no signature verification — so the failure mode is unambiguous. 234 + - `parse_multikey_p256` — mirror of the above with p256, comparing against the P256 branch's `to_sec1_bytes()` output. 235 + - `parse_multikey_unsupported_curve` — builds an in-memory multikey string whose decoded varint prefix is `0x01 0x00` (a reserved codec). Asserts `UnsupportedCurve`. 236 + - `parse_multikey_not_base58btc` — builds a `base16`-prefixed multikey. Asserts `UnsupportedMultibase`. 237 + - `parse_multikey_wrong_length` — constructs a valid-prefix multikey with a 10-byte body. Asserts `MultikeyLengthInvalid`. 238 + 239 + All tests run with `cargo test -p atproto-devtool` or simply `cargo test` (single-crate project). 240 + 241 + **Fixture generation note:** the k256/p256 multikey strings committed under `tests/fixtures/identity/multikey_k256.txt` and `multikey_p256.txt` are generated **once** at phase implementation time from ephemeral keypairs; the expected SEC1 hex strings embedded in the test bodies are derived from those same multikeys at fixture-authoring time. They are NOT the real labeler's production keys. Document the generation procedure as a two-line comment above the test. 242 + 243 + Signature verification using the decoded keys is deferred to Phase 6, where it is exercised end-to-end against real label fixtures via the crypto stage. Phase 2's tests cover only the parse → `VerifyingKey` primitive round-trip. 244 + 245 + **Verification:** 246 + 247 + Run: `cargo test` 248 + Expected: all new tests pass. No network access occurs (verified by the absence of real clients in the test harness — fakes are constructed explicitly). 
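The two `id` forms that `find_service_matches_both_id_forms` exercises can be illustrated with a std-only sketch. The `Service` struct here is a pared-down stand-in for the Phase 2 type, and taking a slice of services instead of `&DidDocument` is an assumption made to keep the example self-contained.

```rust
/// Pared-down stand-in for the plan's `Service` type (illustrative only).
struct Service {
    id: String,
    type_: String,
}

/// Return the first service whose `id` ends in `#<id_fragment>` (either the
/// full `<did>#<fragment>` form or the bare `#<fragment>` form) and whose
/// type matches exactly, including case.
fn find_service<'a>(
    services: &'a [Service],
    id_fragment: &str,
    expected_type: &str,
) -> Option<&'a Service> {
    services.iter().find(|s| {
        // Everything after the last '#' is the fragment; an id without
        // a '#' never matches.
        let fragment_matches = s
            .id
            .rsplit_once('#')
            .is_some_and(|(_, fragment)| fragment == id_fragment);
        fragment_matches && s.type_ == expected_type
    })
}
```

Comparing only the fragment keeps the helper indifferent to which DID prefix the document used, while the exact type comparison rejects a lowercase `atprotolabeler`.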
249 + 250 + **Commit:** 251 + ```bash 252 + git add tests/fixtures/identity src/common/identity.rs tests/fixtures/README.md 253 + git commit -m "Add unit tests for common::identity primitives" 254 + ``` 255 + <!-- END_TASK_4 --> 256 + <!-- END_SUBCOMPONENT_A --> 257 + 258 + <!-- START_TASK_5 --> 259 + ### Task 5: Final phase verification 260 + 261 + **Steps:** 262 + 263 + 1. `cargo fmt --check` — expected clean. 264 + 2. `cargo clippy -- -D warnings` — expected clean. 265 + 3. `cargo test` — expected: all tests (Phase 2's unit tests) pass; no network access. 266 + 4. `cargo run -- test labeler example.bsky.social` — the Phase 1 stub still runs and prints its placeholder (Phase 2 has not wired anything into the CLI yet). Expected: placeholder output, exit 0. 267 + 268 + **Done when:** All four commands succeed; commit history shows the four atomic commits from this phase on top of the Phase 1 commits. 269 + <!-- END_TASK_5 --> 270 + 271 + --- 272 + 273 + ## Phase 2 done-when checklist 274 + 275 + - `common::identity` exposes `resolve_handle`, `resolve_did`, `find_service`, `parse_multikey` against the narrow `HttpClient` / `DnsResolver` traits. 276 + - Real `RealHttpClient` / `RealDnsResolver` impls exist but are unused outside of their constructors (Phase 3 wires them in). 277 + - Unit tests pass without network access, using the in-file `FakeHttpClient` / `FakeDnsResolver` and fixtures under `tests/fixtures/identity/`. 278 + - `cargo clippy -- -D warnings` and `cargo fmt --check` are clean. 279 + - **No acceptance criteria from the design plan are tested yet** — those are covered in Phase 3 once the primitives are wired into the labeler identity stage. 280 + - Commit history contains the four atomic commits listed above.
+423
docs/implementation-plans/2026-04-13-test-labeler/phase_03.md
··· 1 + # Test-labeler Implementation Plan — Phase 3 2 + 3 + **Goal:** Ship the first end-to-end labeler check path. The pipeline driver runs with only the identity stage implemented and renders a real `LabelerReport` to the terminal. Users can already diagnose DID-document and labeler-record misconfigurations after this phase. 4 + 5 + **Architecture:** New module tree under `src/commands/test/labeler/`: `report.rs` (report aggregator + rendering), `pipeline.rs` (target parsing + driver), `identity.rs` (first stage). The `Outcome<T>` enum, the `CheckResult` type, and the `LabelerReport` live in `report.rs`. The pipeline is non-fail-fast — subsequent stages are present in the driver but return `Outcome::Skipped("not yet implemented")` until Phase 4. The identity stage consumes `common::identity` primitives from Phase 2, producing an `IdentityFacts` that later stages depend on. 6 + 7 + **Tech Stack:** adds `atrium-api`, `atrium-xrpc-client` (for fetching `app.bsky.labeler.service/self` via `com.atproto.repo.getRecord`); `insta` (dev-dependency, snapshot tests). 8 + 9 + **Scope:** Phase 3 of 6. Implements `test-labeler.AC1.*` and `test-labeler.AC2.*` entirely. 10 + 11 + **Codebase verified:** 2026-04-13. Phase 1 + 2 landed. `src/commands/test/labeler.rs` is the Phase 1 stub; `src/common/identity.rs` is the Phase 2 primitive module. The `labeler/` subdirectory and all three new files do not yet exist. 12 + 13 + --- 14 + 15 + ## Acceptance Criteria Coverage 16 + 17 + This phase implements and tests: 18 + 19 + ### test-labeler.AC1: CLI skeleton and invocation modes 20 + - **test-labeler.AC1.1 Success:** `atproto-devtool test labeler <handle>` accepts an atproto handle, resolves it to a DID, and runs all four stages. 21 + - **test-labeler.AC1.2 Success:** `atproto-devtool test labeler <did>` accepts a DID directly (did:plc or did:web) and runs all four stages without a prior handle resolution. 
22 + - **test-labeler.AC1.3 Success:** `atproto-devtool test labeler <https-url>` accepts a raw endpoint URL and runs the HTTP and subscription stages while marking identity and crypto checks `Skipped` with a clear reason. 23 + - **test-labeler.AC1.4 Success:** `atproto-devtool test labeler <https-url> --did <did>` unlocks identity checks (and therefore the crypto stage) against an endpoint URL, and also cross-checks that the DID's declared labeler service endpoint matches the provided URL — reporting a `SpecViolation` on mismatch. 24 + - **test-labeler.AC1.5 Failure:** A target argument that is neither a valid handle, DID, nor URL produces a clap parsing error with a helpful message and exit code `2`. 25 + - **test-labeler.AC1.6 Edge:** `atproto-devtool test labeler --help` renders the subcommand's help text including all stage-related flags (`--subscribe-timeout`, `--did`, `--verbose`, `--no-color`). 26 + 27 + ### test-labeler.AC2: Identity-layer checks 28 + - **test-labeler.AC2.1 Success:** A DID document containing a `#atproto_labeler` service entry of type `AtprotoLabeler` with a valid `serviceEndpoint`, plus a signing-key verification method parseable as k256 or p256, passes all identity checks. 29 + - **test-labeler.AC2.2 Success:** A `app.bsky.labeler.service/self` record with a non-empty `policies.labelValues` list passes the labeler-record checks. 30 + - **test-labeler.AC2.3 Failure:** A DID document missing the `atproto_labeler` service entry produces a `SpecViolation` `CheckResult` whose diagnostic's `NamedSource` is the DID-document JSON and whose `#[label]` span highlights the `service` array. 31 + - **test-labeler.AC2.4 Failure:** A DID document missing the labeler signing key produces a `SpecViolation` whose diagnostic highlights the `verificationMethod` array. 32 + - **test-labeler.AC2.5 Failure:** A labeler `serviceEndpoint` that is not a valid HTTPS URL produces a `SpecViolation` with the offending endpoint value highlighted in the DID doc. 
33 + - **test-labeler.AC2.6 Failure:** A missing `app.bsky.labeler.service/self` record (PDS returns 404) produces a `SpecViolation` distinct from a PDS transport failure. 34 + - **test-labeler.AC2.7 Failure:** An `app.bsky.labeler.service/self` record with an empty `policies.labelValues` list produces a `SpecViolation` whose diagnostic highlights the `policies` field in the re-serialized record. 35 + - **test-labeler.AC2.8 NetworkError:** A DNS failure resolving a handle, an unreachable `plc.directory`, or an unreachable PDS produces a `NetworkError` result that is called out separately in the summary and does not by itself fail the run. 36 + 37 + Phases 4–6 cover the remaining AC groups (AC3, AC4, AC5, AC6). Where this phase must cross-link with later stages (e.g., rendering "not yet implemented" for HTTP/subscription/crypto), the stub pipeline driver marks those stages `Skipped` with a stable reason string so the rendered report is forward-compatible. 38 + 39 + --- 40 + 41 + ## Codebase state at start of phase 42 + 43 + Verified 2026-04-13 (post Phases 1 and 2): 44 + 45 + - `src/common/identity.rs` exists with `resolve_handle`, `resolve_did`, `find_service`, `parse_multikey`, `RealHttpClient`, `RealDnsResolver`, the `HttpClient` and `DnsResolver` traits, the `DidDocument` / `RawDidDocument` types, the `AnyVerifyingKey` enum, and the `IdentityError` type. 46 + - `src/commands/test/labeler.rs` contains the Phase 1 stub `LabelerCmd` struct that accepts `target: String`, `did: Option<String>`, `subscribe_timeout: Duration` and prints a placeholder in its `run` method. 47 + - No `src/commands/test/labeler/` subdirectory yet. No `report.rs`, `pipeline.rs`, or `identity.rs` under it. 48 + - No `tests/fixtures/labeler/` yet. Phase 2 seeded `tests/fixtures/identity/` only. 49 + - `insta` is not yet a dev-dependency. 50 + - No real `atrium-api` or `atrium-xrpc-client` dependency. 
51 + 52 + --- 53 + 54 + ## External dependency notes 55 + 56 + - **`atrium-api` 0.25.8** provides: 57 + - `atrium_api::com::atproto::repo::get_record::Parameters` / `Output` / `Error` with `const NSID: &str = "com.atproto.repo.getRecord"`. 58 + - `atrium_api::client::AtpServiceClient::new(xrpc)` to construct a typed client over any `atrium_xrpc::XrpcClient`. 59 + - `atrium_api::app::bsky::labeler::defs::{LabelerPolicies, LabelerPoliciesData}` — the typed labeler policies record. 60 + - **`atrium-xrpc-client` 0.5.15** provides: 61 + - `atrium_xrpc_client::reqwest::ReqwestClient::new(base_uri)` — the concrete client built on top of `reqwest::Client`. This is the only client impl we need. 62 + - **`miette` 7.6** `NamedSource<T>` + `#[source_code]` + `#[label]` span derive. The `#[label("...")]` span attribute on a `SourceSpan` field works with the `Diagnostic` derive when `#[source_code]` is a `NamedSource` field on the same struct. 63 + - **`insta` 1.47.2** — dev-only. `insta::assert_snapshot!` for plain text snapshots of the rendered `LabelerReport`. Snapshots live in `tests/snapshots/`. 64 + 65 + **atproto spec notes:** 66 + 67 + - The labeler service record is stored under collection `app.bsky.labeler.service` at `rkey = "self"`. Its policy field is `policies.labelValues: Vec<String>` (plus an optional `policies.labelValueDefinitions`). A non-empty `labelValues` is the minimum bar the tool checks. 68 + - The `#atproto_labeler` DID service entry has `type: "AtprotoLabeler"` and `serviceEndpoint: <https URL>`. 69 + - The `#atproto_pds` service entry has `type: "AtprotoPersonalDataServer"` and the `serviceEndpoint` used to fetch the labeler record via the `com.atproto.repo.getRecord` XRPC call. 
70 + 71 + --- 72 + 73 + <!-- START_TASK_1 --> 74 + ### Task 1: Add Phase 3 dependencies 75 + 76 + **Files:** 77 + - Modify: `Cargo.toml` (`[dependencies]` and `[dev-dependencies]`) 78 + 79 + **Implementation:** 80 + 81 + Append to `[dependencies]` (alphabetised overall): 82 + 83 + ```toml 84 + atrium-api = "0.25" 85 + atrium-xrpc-client = { version = "0.5", default-features = false, features = ["reqwest"] } 86 + ``` 87 + 88 + Append to `[dev-dependencies]`: 89 + 90 + ```toml 91 + insta = { version = "1.47", features = ["yaml"] } 92 + ``` 93 + 94 + **Verification:** 95 + 96 + Run: `cargo check` 97 + Expected: new crates resolve and compile. 98 + 99 + **Commit:** 100 + ```bash 101 + git add Cargo.toml Cargo.lock 102 + git commit -m "Add Phase 3 dependencies (atrium, insta)" 103 + ``` 104 + <!-- END_TASK_1 --> 105 + 106 + <!-- START_TASK_2 --> 107 + ### Task 2: `LabelerReport`, `CheckResult`, `CheckStatus` and renderer 108 + 109 + **Verifies:** test-labeler.AC6.1, test-labeler.AC6.5 partial (glyph/reason rendering for `Skipped`); the rest of AC6 arrives in Phase 6 once exit-code wiring is complete. 110 + 111 + **Files:** 112 + - Modify: `src/commands/test/labeler.rs` (add `pub mod report; pub mod pipeline; pub mod identity;` declarations; keep the existing `LabelerCmd` struct, whose `run` is rewritten in Task 5). 113 + - Create: `src/commands/test/labeler/report.rs` 114 + 115 + **Implementation:** 116 + 117 + `report.rs` contains: 118 + 119 + - `pub enum CheckStatus { Pass, SpecViolation, NetworkError, Advisory, Skipped }` — the five rendering severities. `Pass` maps to `[OK]`, `SpecViolation` to `[FAIL]`, `NetworkError` to `[NET]`, `Advisory` to `[WARN]`, `Skipped` to `[SKIP]`. 120 + - `pub struct CheckResult { pub id: &'static str, pub stage: Stage, pub status: CheckStatus, pub summary: Cow<'static, str>, pub diagnostic: Option<miette::Report>, pub skipped_reason: Option<Cow<'static, str>> }`.
121 + - `pub enum Stage { Identity, Http, Subscription, Crypto }` with `fn label(self) -> &'static str` returning the section heading. 122 + - `pub struct LabelerReport { pub header: ReportHeader, pub results: Vec<CheckResult>, pub started_at: Instant, pub finished_at: Option<Instant> }`. 123 + - `pub struct ReportHeader { pub target: String, pub resolved_did: Option<String>, pub pds_endpoint: Option<String>, pub labeler_endpoint: Option<String> }`. 124 + - `LabelerReport::new(header)`, `LabelerReport::record(result: CheckResult)`, `LabelerReport::finish(&mut self)` helpers. 125 + - `LabelerReport::exit_code(&self) -> i32` computing exit code per the severity rules: any `SpecViolation` → 1; else 0. (`NetworkError` and `Advisory` don't fail the run; see AC6.3 / AC6.4.) 126 + - `LabelerReport::summary_counts(&self) -> SummaryCounts` with fields `pass`, `spec_violation`, `network_error`, `advisory`, `skipped`. Network errors are reported as a separate count (AC6.3 calls this out). 127 + - `LabelerReport::render<W: io::Write>(&self, out: &mut W, config: &RenderConfig) -> io::Result<()>` — the main renderer. Writes: 128 + 1. One header line with the target and resolved DID/PDS/labeler endpoint, plus elapsed time. 129 + 2. One section per `Stage`, in order `Identity → Http → Subscription → Crypto`. Each section prints: 130 + - A heading line `== Identity ==`. 131 + - Every `CheckResult` for that stage as `[GLYPH] <summary> — <skipped_reason or nothing>`. 132 + - For each result with `status ∈ {SpecViolation, NetworkError, Advisory}` that carries a `diagnostic`, render the miette `Report` using a **local** `GraphicalReportHandler` the renderer constructs from `RenderConfig`, not the process-global handler installed in Phase 1. See "Why a local handler" below. `Skipped` checks do NOT render a miette diagnostic; they print only the `skipped_reason`. 133 + 3. A summary footer: `Summary: X passed, Y failed (spec), Z network errors, W advisories, V skipped. Exit code: <code>`. 
134 + 135 + - `pub struct RenderConfig { pub no_color: bool }` — controls whether miette diagnostics are rendered with the graphical theme's color. When `no_color` is true, the local handler uses `GraphicalTheme::unicode_nocolor()`. 136 + 137 + **Why a local handler, not the Phase 1 global:** Phase 1 installs a process-global miette handler via `miette::set_hook`. That global is fine for `main`'s top-level `miette::Report` error rendering, but using it here would make `LabelerReport::render` race-sensitive: snapshot tests in Phases 3–6 run in parallel by default, and two tests rendering with different `RenderConfig::no_color` values would clobber each other's global state. Instead, `LabelerReport::render` constructs its own `GraphicalReportHandler` with the per-call theme, calls `handler.render_report(&mut buf, diag.as_ref())` for each diagnostic, and writes the resulting string into `out`. The global is used only for the top-level bootstrap error path (`miette::Result<()>` returned from `main`), and this renderer never depends on it being installed. 138 + 139 + - A `Display` impl for `CheckStatus` producing the ASCII glyph; used for both the text renderer and debug output. 140 + 141 + **Testing (unit-level, in the same file):** 142 + 143 + - `exit_code_only_advisory_is_zero` — empty report with one `Advisory` CheckResult → `exit_code() == 0`. 144 + - `exit_code_only_network_errors_is_zero` — one `NetworkError` → `exit_code() == 0`. 145 + - `exit_code_with_spec_violation_is_one` — one `SpecViolation` → `exit_code() == 1`. 146 + - `summary_counts_partition_correct` — mixed report yields correct per-severity counts. 147 + - `render_basic_glyphs` — a tiny hand-rolled report with one pass, one fail, one skipped. Capture the rendered bytes with a `Vec<u8>` writer; assert the byte sequence contains `[OK]`, `[FAIL]`, and `[SKIP]` — NOT a full snapshot (the full rendered shape is locked down in Task 6 via `insta`). 
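The severity rules that the exit-code, summary, and glyph tests above pin down can be sketched as follows. This is a simplified illustration over a bare slice of statuses rather than the real `LabelerReport`; the free functions `exit_code` and `summary_counts` are hypothetical stand-ins for the methods described in this task.

```rust
use std::fmt;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CheckStatus {
    Pass,
    SpecViolation,
    NetworkError,
    Advisory,
    Skipped,
}

impl fmt::Display for CheckStatus {
    /// Render the ASCII glyph used in the text report.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            CheckStatus::Pass => "[OK]",
            CheckStatus::SpecViolation => "[FAIL]",
            CheckStatus::NetworkError => "[NET]",
            CheckStatus::Advisory => "[WARN]",
            CheckStatus::Skipped => "[SKIP]",
        })
    }
}

#[derive(Debug, Default, PartialEq, Eq)]
struct SummaryCounts {
    pass: usize,
    spec_violation: usize,
    network_error: usize,
    advisory: usize,
    skipped: usize,
}

/// Partition the statuses into per-severity counts.
fn summary_counts(statuses: &[CheckStatus]) -> SummaryCounts {
    let mut counts = SummaryCounts::default();
    for status in statuses {
        match status {
            CheckStatus::Pass => counts.pass += 1,
            CheckStatus::SpecViolation => counts.spec_violation += 1,
            CheckStatus::NetworkError => counts.network_error += 1,
            CheckStatus::Advisory => counts.advisory += 1,
            CheckStatus::Skipped => counts.skipped += 1,
        }
    }
    counts
}

/// Only a spec violation fails the run; network errors and advisories
/// are reported but leave the exit code at 0.
fn exit_code(statuses: &[CheckStatus]) -> i32 {
    if statuses.contains(&CheckStatus::SpecViolation) {
        1
    } else {
        0
    }
}
```

Keeping the glyph mapping in `Display` means the renderer and any debug output share one source for the `[OK]` / `[FAIL]` / `[SKIP]` strings that `render_basic_glyphs` asserts on.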
148 + 149 + **Verification:** 150 + 151 + Run: `cargo test` 152 + Expected: the unit tests above pass. 153 + 154 + **Commit:** 155 + ```bash 156 + git add src/commands/test/labeler.rs src/commands/test/labeler/report.rs 157 + git commit -m "Add LabelerReport types and renderer" 158 + ``` 159 + <!-- END_TASK_2 --> 160 + 161 + <!-- START_TASK_3 --> 162 + ### Task 3: `LabelerTarget` parsing and pipeline driver skeleton 163 + 164 + **Verifies:** test-labeler.AC1.1, AC1.2, AC1.3, AC1.4, AC1.5, AC1.6. 165 + 166 + **Files:** 167 + - Create: `src/commands/test/labeler/pipeline.rs` 168 + 169 + **Implementation:** 170 + 171 + Types: 172 + 173 + ```text 174 + pub enum LabelerTarget { 175 + Identified { identifier: AtIdentifier, explicit_did: Option<Did> }, 176 + Endpoint { url: Url, did: Option<Did> }, 177 + } 178 + 179 + pub enum AtIdentifier { 180 + Handle(String), 181 + Did(Did), 182 + } 183 + 184 + pub struct LabelerOptions<'a> { 185 + pub http: &'a dyn HttpClient, 186 + pub dns: &'a dyn DnsResolver, 187 + pub subscribe_timeout: Duration, 188 + pub verbose: bool, 189 + } 190 + ``` 191 + 192 + Functions: 193 + 194 + - `pub fn parse_target(raw: &str, explicit_did: Option<&str>) -> Result<LabelerTarget, TargetParseError>` — called once at CLI argument binding time. The rules (verifies AC1.1 through AC1.5): 195 + - If `raw` starts with `did:` → `AtIdentifier::Did` in `Identified`. `explicit_did`, if also provided, is rejected as "ambiguous — target is already a DID" (an error, not a silent override). This keeps the flag's meaning clear. 196 + - Else if `raw` starts with `https://` or `http://` → parse as `Url`; `Endpoint { url, did: explicit_did.map(Did::new) }`. Reject `http://` with a helpful error pointing the user at `https://`. 197 + - Else if `raw` contains a `.` and otherwise matches the atproto handle grammar → `AtIdentifier::Handle`. `explicit_did`, if provided, is carried in `Identified::explicit_did` so the pipeline can cross-check the resolved DID matches. 
198 + - Else → `TargetParseError::UnrecognizedTarget { raw: raw.to_string() }` with a helpful message listing the three accepted forms. This error is returned from `LabelerCmd::run` via `map_err` into a miette diagnostic and exits 2 (bootstrap/usage error) — AC1.5. 199 + 200 + - `pub async fn run_pipeline(target: LabelerTarget, opts: LabelerOptions<'_>) -> LabelerReport` — the driver. Pseudocode: 201 + 1. Build a `ReportHeader` from the target (URL/handle/DID). Initialize `LabelerReport::new(header)`. 202 + 2. Call `identity::run(&target, opts.http, opts.dns) -> IdentityOutcome`, where `IdentityOutcome = Outcome<IdentityFacts>`. Push every per-check `CheckResult` emitted by the identity stage. 203 + - `LabelerTarget::Endpoint { did: None, .. }` → identity stage returns `Outcome::Skipped("no DID supplied; run with a handle, a DID, or --did <did>")` (AC1.3). 204 + - `LabelerTarget::Endpoint { did: Some(d), url }` → identity stage runs and, after resolving `d`, emits an _additional_ `CheckResult` cross-checking the `#atproto_labeler` serviceEndpoint against `url`. On mismatch, severity `SpecViolation` (AC1.4). 205 + 3. For each subsequent stage (HTTP, Subscription, Crypto), push a `CheckResult { status: Skipped, skipped_reason: "not yet implemented (phase N)" }` per stage into the report. Phase 4/5/6 replace these calls with real stage implementations. 206 + 4. `report.finish()` and return. 207 + 208 + - `struct TargetParseError(String)` — `impl From<_> for miette::Report` so the CLI layer can propagate it. 209 + 210 + **Testing (unit, in-file):** 211 + 212 + - `parse_target_handle` — `parse_target("alice.bsky.social", None)` → `Identified::Handle`. AC1.1. 213 + - `parse_target_did_plc` / `parse_target_did_web` — AC1.2. 214 + - `parse_target_endpoint_https` → `Endpoint { did: None, .. }`. AC1.3. 215 + - `parse_target_endpoint_with_explicit_did` → `Endpoint { did: Some(..), .. }`. AC1.4 (the cross-check itself is tested in Task 4). 
216 + - `parse_target_endpoint_http_rejected` — `parse_target("http://evil.example", None)` returns an error pointing at `https://`. Related to AC1.5 but not the exact case. 217 + - `parse_target_unrecognised` — `parse_target("not a handle or did", None)` → `TargetParseError::UnrecognizedTarget`. AC1.5. 218 + - `parse_target_did_with_conflicting_flag` — `parse_target("did:plc:abc", Some("did:web:example.com"))` → error mentioning ambiguity. 219 + 220 + **Verification:** 221 + 222 + Run: `cargo test` 223 + Expected: pass. 224 + 225 + **Commit:** 226 + ```bash 227 + git add src/commands/test/labeler/pipeline.rs 228 + git commit -m "Add LabelerTarget parsing and pipeline driver skeleton" 229 + ``` 230 + <!-- END_TASK_3 --> 231 + 232 + <!-- START_SUBCOMPONENT_A (tasks 4-6) --> 233 + <!-- START_TASK_4 --> 234 + ### Task 4: Identity stage — `IdentityFacts` and all identity checks 235 + 236 + **Verifies:** test-labeler.AC2.1, AC2.2, AC2.3, AC2.4, AC2.5, AC2.6, AC2.7, AC2.8, AC1.4 (endpoint/DID cross-check). 237 + 238 + **Files:** 239 + - Create: `src/commands/test/labeler/identity.rs` 240 + 241 + **Implementation:** 242 + 243 + Types: 244 + 245 + ```text 246 + pub struct IdentityFacts { 247 + pub did: Did, 248 + pub raw_did_doc: RawDidDocument, 249 + pub labeler_endpoint: Url, 250 + pub pds_endpoint: Url, 251 + pub signing_key_id: String, 252 + pub signing_key: AnyVerifyingKey, 253 + pub labeler_record_bytes: Arc<[u8]>, 254 + pub labeler_policies: LabelerPolicies, 255 + } 256 + 257 + pub struct IdentityStageOutput { 258 + pub facts: Option<IdentityFacts>, 259 + pub results: Vec<CheckResult>, 260 + } 261 + 262 + pub async fn run( 263 + target: &LabelerTarget, 264 + http: &dyn HttpClient, 265 + dns: &dyn DnsResolver, 266 + ) -> IdentityStageOutput; 267 + ``` 268 + 269 + The driver emits **named checks** (constant IDs) in this order: 270 + 271 + 1. **`identity::target_resolved`** — handle → DID resolved, or DID already known. Skipped for `Endpoint { did: None }`. 
DNS failure resolving a handle → `NetworkError` (AC2.8). Invalid handle → `SpecViolation`. 272 + 2. **`identity::did_document_fetched`** — `resolve_did` returned `RawDidDocument`. `plc.directory` unreachable → `NetworkError` (AC2.8); JSON decode failure → `SpecViolation` with diagnostic carrying the raw bytes. 273 + 3. **`identity::labeler_service_present`** — `find_service(&doc, "atproto_labeler", "AtprotoLabeler")`. Missing → `SpecViolation` with `NamedSource` = the DID doc bytes and `#[label("service array")]` span covering the `"service"` JSON key range (AC2.3). Found → `Pass`. 274 + 4. **`identity::labeler_endpoint_is_https`** — parse the service endpoint as a `Url` and require scheme `https`. Non-https → `SpecViolation`, span over the endpoint value (AC2.5). 275 + 5. **`identity::labeler_endpoint_matches_flag`** — only when target is `Endpoint { did: Some(_), url }`: compare the resolved labeler endpoint to the flag URL with scheme and authority normalized. Mismatch → `SpecViolation` (AC1.4). Otherwise skipped. 276 + 6. **`identity::signing_key_present`** — `find_verification_method(&doc, "atproto_label")` (a small helper, co-located) returning the multikey string. Missing → `SpecViolation`, span over the `verificationMethod` array key (AC2.4). Present and parseable via `parse_multikey` → `Pass`; present but unparseable → `SpecViolation` with the multikey string in the diagnostic. 277 + 7. **`identity::pds_endpoint_present`** — `find_service(&doc, "atproto_pds", "AtprotoPersonalDataServer")` → `Pass` or `SpecViolation`. 278 + 8. **`identity::labeler_record_fetched`** — construct an `atrium_xrpc_client::reqwest::ReqwestClient` against the PDS endpoint, build an `AtpServiceClient`, call `client.service.com.atproto.repo.get_record(params)` with `repo = did`, `collection = "app.bsky.labeler.service"`, `rkey = "self"`. Map: 279 + - 404 → `SpecViolation` "PDS has no labeler service record" (AC2.6). 280 + - Network / TLS / DNS failure → `NetworkError` (AC2.8) distinct from 404. 
281 + - 200 with successfully deserialised `LabelerPolicies` → `Pass`, capture the raw bytes for the next check's `NamedSource`. 282 + 9. **`identity::labeler_record_policies_nonempty`** — inspect `policies.label_values`. Empty → `SpecViolation` with diagnostic carrying re-serialised record bytes and a span over the `"policies"` JSON key (AC2.7). Non-empty → `Pass` (AC2.2). 283 + 284 + **Diagnostic types** — each failing check owns a `#[derive(thiserror::Error, miette::Diagnostic)]` struct with `#[source_code] named_source: NamedSource<Arc<[u8]>>` and `#[label("...")] span: SourceSpan`. Each diagnostic has a stable `code` string of the form `labeler::identity::<check_id>`. A small helper `json_span_for_key(bytes: &[u8], key: &str) -> SourceSpan` scans the JSON byte slice for the first occurrence of `"<key>"` at the top level and returns the span covering the key (and optionally the value). For Phase 3 a simple substring search is acceptable since the DID documents and labeler records are small; if this ever mis-hits, Phase 6 can replace it with a proper JSON tokenizer. 285 + 286 + The stage populates `IdentityFacts` only when every check listed above produces a `Pass` (the optional cross-check `identity::labeler_endpoint_matches_flag` may instead be skipped when it does not apply). Any `SpecViolation` or `NetworkError` blocks `IdentityFacts` from being produced, and downstream stages that depend on it are skipped by the pipeline with `skipped_reason = "blocked by identity stage failures"`. 287 + 288 + **Testing:** deferred to Task 6. Unit tests for the `json_span_for_key` helper live in this file as a `#[cfg(test)]` submodule. 289 + 290 + **Verification:** 291 + 292 + Run: `cargo check` 293 + Expected: compiles. 
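A minimal sketch of the `json_span_for_key` helper described under **Diagnostic types** above, assuming the plain substring search the plan allows for Phase 3. To stay compilable with only the standard library it returns an `Option<(offset, length)>` tuple instead of a `miette::SourceSpan`; both the `Span` alias and the `Option` return are illustrative assumptions, not the plan's signature. It also does not enforce the "top level" constraint, so a key nested inside another object (or the same text inside a string value) would match first.

```rust
/// Byte span as `(offset, length)`. Hypothetical stand-in for
/// `miette::SourceSpan`, used here so the sketch needs no external crates.
pub type Span = (usize, usize);

/// Return the span of the first occurrence of `"<key>"` (including the
/// surrounding quotes) in `bytes`, or `None` when the key does not appear.
///
/// This is a plain substring search: it does not parse JSON and therefore
/// cannot tell a top-level key from a nested key or a string value.
pub fn json_span_for_key(bytes: &[u8], key: &str) -> Option<Span> {
    // Search for the quoted form so `service` does not match `services`.
    let needle = format!("\"{key}\"");
    let needle = needle.as_bytes();

    // `windows` yields nothing when the needle is longer than the input,
    // so short inputs fall through to `None` without a bounds check.
    bytes
        .windows(needle.len())
        .position(|window| window == needle)
        .map(|offset| (offset, needle.len()))
}
```

A caller that gets `None` would still need a fallback span (for example, the whole document) so the diagnostic always has something to label; how the real helper handles that case is not specified in the plan.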
294 + 295 + **Commit:** 296 + ```bash 297 + git add src/commands/test/labeler/identity.rs 298 + git commit -m "Implement labeler identity stage with per-check diagnostics" 299 + ``` 300 + <!-- END_TASK_4 --> 301 + 302 + <!-- START_TASK_5 --> 303 + ### Task 5: Wire `LabelerCmd::run` into the pipeline and render the report 304 + 305 + **Verifies:** test-labeler.AC6.1 (partial — all-OK render on a passing run), AC6.6 (NO_COLOR honored by the render path). 306 + 307 + **Files:** 308 + - Modify: `src/commands/test/labeler.rs` (replace the stub `LabelerCmd::run` body; keep the existing `LabelerCmd` struct, the `parse_subscribe_timeout` helper, and the `humantime_like` submodule unchanged). 309 + 310 + **Implementation:** 311 + 312 + `LabelerCmd::run` now: 313 + 314 + 1. Parses the target using `pipeline::parse_target(&self.target, self.did.as_deref())`, mapping its error to a `miette::Report` and returning immediately (exit 2 via the clap/bootstrap path when the error is from argument parsing; exit 1 otherwise). 315 + 2. Builds `RealHttpClient::new()` and `RealDnsResolver::new()` once. 316 + 3. Calls `pipeline::run_pipeline(target, LabelerOptions { http: &http, dns: &dns, subscribe_timeout: self.subscribe_timeout, verbose }).await`. 317 + 4. Calls `report.render(&mut stdout_lock, &RenderConfig { no_color })` where `no_color` is threaded through from `cli::run` (add a `no_color: bool` to `LabelerOptions` _or_ use a thread-local set at handler-install time — choose the parameter approach, it is explicit). 318 + 5. Returns `Ok(())` if `report.exit_code() == 0`, otherwise returns a minimal `miette::Report` whose only role is to produce the non-zero exit code. (The actual diagnostics were already rendered in step 4; we do NOT rely on returning `Err` as the primary rendering path.) 
319 + 320 + **Non-obvious detail:** because the report is rendered via `println!`-style writes directly to stdout and miette diagnostics are rendered inline within that output, the `Err`-return-for-exit-code path should use a blank `miette::miette!("")` message that the main handler will swallow. An alternative — calling `std::process::exit(report.exit_code())` — skips tokio/drop semantics and is avoided per user Rust conventions. 321 + 322 + A cleaner pattern (chosen here): `LabelerCmd::run` returns `Result<ExitCode, miette::Report>`, and the dispatch chain in `cli::run` matches on `ExitCode` before returning from `main`. This requires propagating `std::process::ExitCode` up through `cli.rs::run` and `main.rs`. Update both. 323 + 324 + **Verification:** 325 + 326 + Run: `cargo build` 327 + Expected: succeeds. 328 + 329 + Run: `cargo run -- test labeler moderation.bsky.app` 330 + Expected (online integration; skip in CI): real resolution of the public Bluesky moderation labeler; identity section prints a mix of `[OK]` lines; subsequent stages print `[SKIP]` "not yet implemented". Exit 0. 331 + 332 + Run: `cargo run -- test labeler not-a-thing` 333 + Expected: `TargetParseError` rendered as a miette diagnostic, exit 2 (AC1.5). 334 + 335 + Run: `cargo run -- test labeler https://mod.bsky.app` 336 + Expected: identity checks `[SKIP]` with reason "no DID supplied"; HTTP/subscription `[SKIP]` "not yet implemented"; exit 0 (AC1.3). 337 + 338 + **Commit:** 339 + ```bash 340 + git add src/commands/test/labeler.rs src/cli.rs src/main.rs 341 + git commit -m "Wire test labeler pipeline into CLI and exit codes" 342 + ``` 343 + <!-- END_TASK_5 --> 344 + 345 + <!-- START_TASK_6 --> 346 + ### Task 6: Snapshot tests for the identity stage 347 + 348 + **Verifies:** all of AC1.1–AC1.6 and AC2.1–AC2.8, via the rendered report for each fixture. 
349 + 350 + **Files:** 351 + - Create: `tests/fixtures/labeler/identity/` — recorded DID documents and labeler records, one subdirectory per named case: 352 + - `healthy_plc/` — `did.json`, `labeler_record.json`. A real healthy labeler document (hand-sanitized if needed); signing key resolves through `parse_multikey` cleanly. 353 + - `missing_service/` — DID doc with no `#atproto_labeler` service entry (AC2.3). 354 + - `missing_signing_key/` — DID doc with no `#atproto_label` verification method (AC2.4). 355 + - `non_https_endpoint/` — DID doc whose service endpoint is `http://...` (AC2.5). 356 + - `missing_labeler_record/` — healthy DID doc, but the PDS returns 404 for the record (AC2.6). 357 + - `empty_policies/` — healthy DID doc, record present but `policies.labelValues` is `[]` (AC2.7). 358 + - `plc_directory_unreachable/` — DID doc fetch returns transport error (AC2.8). 359 + - `endpoint_mismatch/` — `Endpoint { url, did: Some(...) }` where the DID's service endpoint does not match `url` (AC1.4). 360 + - `endpoint_only_no_did/` — `Endpoint { url, did: None }` to cover AC1.3 (all identity checks skipped with the "no DID supplied" reason). 361 + - Create: `tests/labeler_identity.rs` — integration-level snapshot tests using `insta::assert_snapshot!`. Each test constructs a `FakeHttpClient` + `FakeDnsResolver` seeded from one fixture directory, runs `pipeline::run_pipeline(target, opts).await`, renders the report to a `Vec<u8>` with `RenderConfig { no_color: true }`, and asserts the resulting string matches a locked snapshot in `tests/snapshots/`. 362 + - Create: `tests/snapshots/labeler_identity__*.snap` — produced by `cargo insta test --accept` on first run and committed alongside the fixtures. 363 + 364 + **Test list (one test function per fixture case):** 365 + 366 + - `healthy_plc_renders_all_ok` — verifies AC2.1, AC2.2. Snapshot shows all identity checks `[OK]`. 367 + - `missing_service_renders_spec_violation_with_span` — verifies AC2.3. 
Snapshot shows `[FAIL] identity::labeler_service_present` plus miette diagnostic with `source` pointing at the DID doc bytes and a span over `"service"`. 368 + - `missing_signing_key_renders_spec_violation` — AC2.4. 369 + - `non_https_endpoint_renders_spec_violation` — AC2.5. 370 + - `missing_labeler_record_renders_404_distinct_from_transport` — AC2.6. Assert that the failure diagnostic's `code` is `labeler::identity::labeler_record_fetched` AND the message distinguishes "PDS returned 404" from a transport error. 371 + - `empty_policies_renders_spec_violation_with_span` — AC2.7. 372 + - `plc_directory_unreachable_renders_network_error` — AC2.8. Assert severity `NetworkError` and that the summary footer lists 1 network error, 0 spec violations, exit 0. 373 + - `endpoint_mismatch_spec_violation` — AC1.4. 374 + - `endpoint_only_no_did_skips_identity` — AC1.3. Assert all identity checks render `[SKIP]` with the "no DID supplied" reason, and HTTP/subscription still render `[SKIP]` "not yet implemented". 375 + - `handle_resolution_happy_path` — AC1.1. Uses `FakeDnsResolver` returning `"did=did:plc:..."`. Re-uses the `healthy_plc` fixture files for the DID doc + labeler record. 376 + - `did_plc_direct_happy_path` — AC1.2 for did:plc. 377 + - `did_web_direct_happy_path` — AC1.2 for did:web (reuses a hand-authored did:web fixture). 378 + 379 + **Snapshot strategy:** 380 + - Use `insta::with_settings!{ { filters => vec![(r"elapsed: [0-9]+ms", "elapsed: XXms")] }, { insta::assert_snapshot!(..) } }` to mask the wall-clock time in the header so snapshots are deterministic. 381 + - Commit snapshots under `tests/snapshots/` (the `insta` default). 382 + - Add a top-level `insta.yaml` (if not yet present) or rely on defaults — no custom config required. 383 + 384 + **Verification:** 385 + 386 + Run: `cargo insta test --accept` (first time, to capture snapshots) 387 + Run: `cargo test` (subsequent runs) 388 + Expected: all 12 tests pass. 
On code changes that alter rendered output, `cargo insta review` is used to approve changes explicitly. 389 + 390 + **Commit:** 391 + ```bash 392 + git add tests/labeler_identity.rs tests/fixtures/labeler/identity tests/snapshots 393 + git commit -m "Add identity stage snapshot tests covering AC1 and AC2" 394 + ``` 395 + <!-- END_TASK_6 --> 396 + <!-- END_SUBCOMPONENT_A --> 397 + 398 + <!-- START_TASK_7 --> 399 + ### Task 7: Final phase verification 400 + 401 + **Steps:** 402 + 403 + 1. `cargo fmt --check` — clean. 404 + 2. `cargo clippy -- -D warnings` — clean. 405 + 3. `cargo test` — all tests including the 12 Phase 3 snapshot tests pass. 406 + 4. `cargo run -- test labeler --help` — AC1.6: help renders with `--did`, `--subscribe-timeout`, `--verbose`, `--no-color`, positional `TARGET`. 407 + 5. (Optional, online) `cargo run -- test labeler moderation.bsky.app` — identity stage all `[OK]`, later stages `[SKIP]`. 408 + 409 + **Done when:** steps 1–4 pass. Step 5 is a smoke check against a real labeler and is allowed to vary from one release to the next; it is not part of the `cargo test` pass criterion. 410 + <!-- END_TASK_7 --> 411 + 412 + --- 413 + 414 + ## Phase 3 done-when checklist 415 + 416 + - `src/commands/test/labeler/{report,pipeline,identity}.rs` exist and compile. 417 + - `atrium-api` and `atrium-xrpc-client` are in the dependency graph; `insta` is a dev-dependency. 418 + - All 12 Phase 3 snapshot tests pass. Snapshots are committed. 419 + - Running the binary against a real healthy labeler produces an identity section of all `[OK]` glyphs and exits 0. 420 + - Running with an unparseable target exits 2 with a clap/miette error (AC1.5). 421 + - Running with `https://<endpoint>` (no `--did`) renders identity stage as `[SKIP]` with reason "no DID supplied" (AC1.3). 422 + - AC6 items beyond AC6.1/AC6.5/AC6.6 remain to be completed in Phase 6 — they are acknowledged here only as far as rendering/exit-code plumbing. 
423 + - Commit history shows the six atomic commits from this phase.
+245
docs/implementation-plans/2026-04-13-test-labeler/phase_04.md
··· 1 + # Test-labeler Implementation Plan — Phase 4 2 + 3 + **Goal:** Implement the HTTP stage. Hit `com.atproto.label.queryLabels` on the labeler endpoint, verify schema conformance, exercise pagination once, and publish the collected labels into `HttpFacts::first_page` for the crypto stage to consume in Phase 6. 4 + 5 + **Architecture:** New module `src/commands/test/labeler/http.rs`. Uses the typed `atrium-api` client (already in the dependency graph from Phase 3) over an `atrium-xrpc-client::reqwest::ReqwestClient` pointed at the labeler endpoint. The stage emits named per-check `CheckResult`s and returns an `HttpFacts { first_page: Vec<Label>, pagination_ok: bool, first_page_raw_bytes: Arc<[u8]> }`. The pipeline driver (Phase 3) is edited to call this stage between identity and the still-stubbed subscription stage. Tests run against a small in-process `FakeXrpcClient` implementing `atrium_xrpc::XrpcClient` directly, so no real sockets are opened. 6 + 7 + **Tech Stack:** No new crates. Uses `atrium-api` + `atrium-xrpc-client` + existing fixture/testing infrastructure. 8 + 9 + **Scope:** Phase 4 of 6. 10 + 11 + **Codebase verified:** 2026-04-13 (post Phase 3). `src/commands/test/labeler/{report,pipeline,identity}.rs` exist. `src/commands/test/labeler/http.rs` does **not** yet exist. 12 + 13 + --- 14 + 15 + ## Acceptance Criteria Coverage 16 + 17 + ### test-labeler.AC3: HTTP-layer checks (`queryLabels`) 18 + - **test-labeler.AC3.1 Success:** A labeler endpoint that responds to `com.atproto.label.queryLabels` with a well-formed lexicon response decodes into typed labels and passes the schema check. 19 + - **test-labeler.AC3.2 Success:** A labeler that honors the `cursor` parameter — returning a distinct page when called with a cursor from the first page — passes the pagination round-trip check. 
20 + - **test-labeler.AC3.3 Success:** A labeler that returns an empty labels array passes the schema and pagination checks and contributes a distinct `Advisory` ("labeler has no published labels") to inform downstream stages. 21 + - **test-labeler.AC3.4 Failure:** A `queryLabels` response that omits required fields or otherwise fails lexicon decoding produces a `SpecViolation` whose diagnostic carries the response JSON as its `NamedSource`. 22 + - **test-labeler.AC3.5 Failure:** A labeler that ignores the `cursor` parameter (returns the first page again) produces a `SpecViolation` on the pagination check. 23 + - **test-labeler.AC3.6 NetworkError:** An unreachable or TLS-failing labeler endpoint produces a `NetworkError` that does not cascade into `queryLabels` schema failures. 24 + 25 + No other AC groups are covered in this phase. 26 + 27 + --- 28 + 29 + ## Codebase state at start of phase 30 + 31 + Post Phase 3: 32 + - `src/commands/test/labeler/report.rs` — the `LabelerReport` and `CheckResult` types. 33 + - `src/commands/test/labeler/pipeline.rs` — `LabelerTarget`, `parse_target`, `run_pipeline`. The driver currently emits `Skipped("not yet implemented (phase N)")` for HTTP, Subscription, and Crypto. 34 + - `src/commands/test/labeler/identity.rs` — complete Phase 3 implementation. 35 + - `src/commands/test/labeler.rs` — wires CLI → pipeline → render. 36 + - No `http.rs` yet; no `tests/fixtures/labeler/http/`. 37 + 38 + --- 39 + 40 + ## External dependency notes 41 + 42 + - **`atrium-api::com::atproto::label::query_labels`** is a generated module containing `ParametersData { cursor: Option<String>, limit: Option<LimitedNonZeroU8<250>>, sources: Option<Vec<Did>>, uri_patterns: Vec<String> }` and `OutputData { cursor: Option<String>, labels: Vec<Label> }`. The wire field name for `uri_patterns` is `uriPatterns` via `#[serde(rename_all = "camelCase")]`. The test calls `query_labels` with `uri_patterns = vec!["*".into()]` as a catch-all pattern. 
43 + - **`atrium-api::com::atproto::label::defs::Label`** — the typed label record returned in the `labels` field. Used directly for Phase 4's schema check: any JSON body that fails to decode into `Output` is a `SpecViolation`. The Phase 4 HTTP stage captures the raw bytes of the response before handing them to `atrium_xrpc` so a decode failure's diagnostic can attach the offending JSON as a `NamedSource`. Because `atrium_xrpc::XrpcClient` does the decode internally, capturing the raw bytes requires providing a small wrapper client whose `send_xrpc` impl tees the HTTP body into a `Vec<u8>` before delegating — see Task 2. 44 + - **`atrium-xrpc::XrpcClient` / `atrium-xrpc::HttpClient`** (re-exported through `atrium-xrpc-client::reqwest::ReqwestClient`) — trait surface is small: a single `async fn send_http(..)` that returns bytes. The typed client layer decodes those bytes into `Output`. We exploit this split in Phase 4: the tee-client wraps the inner `reqwest::Client`-backed `ReqwestClient` and clones the response body, making the raw bytes available for diagnostics without re-issuing the request. 45 + 46 + --- 47 + 48 + <!-- START_SUBCOMPONENT_A (tasks 1-3) --> 49 + <!-- START_TASK_1 --> 50 + ### Task 1: `HttpFacts`, `HttpStageOutput`, and stage entry point 51 + 52 + **Verifies:** structural precondition for AC3.1–AC3.6 (types only; tests arrive in Task 3). 
53 + 54 + **Files:** 55 + - Create: `src/commands/test/labeler/http.rs` (types and stage signature) 56 + - Modify: `src/commands/test/labeler.rs` to add `pub mod http;` 57 + 58 + **Implementation:** 59 + 60 + ```text 61 + pub struct HttpFacts { 62 + pub first_page: Vec<Label>, 63 + pub first_page_raw_bytes: Arc<[u8]>, 64 + pub first_page_source_url: String, 65 + pub pagination_ok: bool, 66 + } 67 + 68 + pub struct HttpStageOutput { 69 + pub facts: Option<HttpFacts>, 70 + pub results: Vec<CheckResult>, 71 + } 72 + 73 + pub async fn run( 74 + labeler_endpoint: &Url, 75 + http: &dyn RawHttpTee, 76 + ) -> HttpStageOutput; 77 + ``` 78 + 79 + `RawHttpTee` is a new narrow async trait defined here (similar in shape to the `HttpClient` trait from Phase 2, but specialised for returning both the decoded typed response and the raw bytes): 80 + 81 + ```text 82 + #[async_trait] 83 + pub trait RawHttpTee: Send + Sync { 84 + /// Perform a `com.atproto.label.queryLabels` call against the labeler. 85 + /// Returns both the raw response body and, if decoding succeeded, the typed Output. 86 + async fn query_labels( 87 + &self, 88 + cursor: Option<&str>, 89 + ) -> Result<RawXrpcResponse, HttpStageError>; 90 + } 91 + 92 + pub struct RawXrpcResponse { 93 + pub status: StatusCode, 94 + pub raw_body: Arc<[u8]>, 95 + pub decoded: Result<query_labels::Output, serde_json::Error>, 96 + pub source_url: String, 97 + } 98 + ``` 99 + 100 + The trait signature takes only the wire-level knob the Phase 4 stage actually cares about — the opaque `cursor` string — rather than an atrium `Parameters` value. The stage always sends `uriPatterns=*` and `limit=50`; those are implementation details of the tee, not caller-configurable inputs. Using `cursor: Option<&str>` keeps the trait minimal and mirrors what the real implementation will actually use when constructing its URL (see Task 2). 101 + 102 + The trait keeps HTTP IO out of the stage body, so Task 3 can supply a fake implementation. 
The real implementation (Task 2) issues a direct reqwest GET against `<endpoint>/xrpc/com.atproto.label.queryLabels` with the hard-coded `uriPatterns=*&limit=50` query string and the optional `&cursor=<cursor>` when present. 103 + 104 + Six CheckResult IDs are declared as `const`s in this file so both the real stage and the fake-driven tests reference them by the same name: 105 + 106 + - `http::endpoint_reachable` 107 + - `http::query_labels_schema_first_page` 108 + - `http::query_labels_empty_advisory` 109 + - `http::query_labels_schema_second_page` 110 + - `http::pagination_round_trip` 111 + - `http::pagination_ignored_cursor` 112 + 113 + Also add an error enum `HttpStageError { Transport(reqwest::Error), DecodeFailed { raw_body: Arc<[u8]>, source: serde_json::Error, source_url: String } }`. The decode path returns `raw_body` so the Phase 4 diagnostic can attach it as `NamedSource`. 114 + 115 + **Verification:** 116 + 117 + Run: `cargo check` 118 + Expected: compiles. 119 + 120 + **Commit:** 121 + ```bash 122 + git add src/commands/test/labeler/http.rs src/commands/test/labeler.rs 123 + git commit -m "Add HTTP stage types and RawHttpTee trait" 124 + ``` 125 + <!-- END_TASK_1 --> 126 + 127 + <!-- START_TASK_2 --> 128 + ### Task 2: `RealHttpTee` implementation and pipeline wiring 129 + 130 + **Verifies:** end-to-end plumbing for AC3.1, AC3.2, AC3.3, AC3.6 (exercised by Task 3 tests). 131 + 132 + **Files:** 133 + - Modify: `src/commands/test/labeler/http.rs` — add the real implementation and the stage body. 134 + - Modify: `src/commands/test/labeler/pipeline.rs` — replace the HTTP-stage `Skipped` stub with a real call. 135 + 136 + **`RealHttpTee` implementation:** 137 + 138 + A small newtype wrapping a `reqwest::Client`, pointed at the labeler's base URL. 
It performs the `queryLabels` call directly as an HTTP GET to `<endpoint>/xrpc/com.atproto.label.queryLabels?uriPatterns=*&limit=50[&cursor=...]`, reads the full response body into a `Bytes`, converts to `Arc<[u8]>`, and then attempts `serde_json::from_slice::<query_labels::Output>` on the bytes. This intentionally bypasses the atrium typed client for the Phase 4 stage specifically because we need the raw bytes on hand whether the decode succeeded or failed. 139 + 140 + Rationale: atrium's `AtpServiceClient` does the decode internally and discards the bytes, which would make AC3.4's "diagnostic carries the response JSON as its NamedSource" impossible without re-requesting. Issuing the HTTP GET ourselves and running the same typed decode is structurally cleaner and keeps the atrium typed `Output` / `Label` types for downstream Phase 6 consumption. 141 + 142 + Return values: 143 + - Transport/TLS error → `Err(HttpStageError::Transport(e))`. 144 + - Non-2xx → `Ok(RawXrpcResponse { status, raw_body, decoded: Err(..), .. })` where the decode error reflects "unexpected status". The stage body (below) classifies a failed decode as a `SpecViolation` only when the status is in the 2xx range; a non-2xx status is instead a `NetworkError`, distinct from "labeler returned a malformed body". 145 + - 2xx success → attempt `serde_json::from_slice`; populate `decoded` accordingly. 146 + 147 + **Stage body (`http::run`):** 148 + 149 + 1. Emit `http::endpoint_reachable` — tee a zero-pattern GET. Any network/TLS failure → `NetworkError` and return `HttpStageOutput { facts: None, results }` immediately (AC3.6). 4xx/5xx with a readable body → `Pass` on endpoint reachable but subsequent checks may fail. (We still call the real `queryLabels` next.) 150 + 2. Emit `http::query_labels_schema_first_page`: 151 + - Call `tee.query_labels(None)` (no cursor → first page). 152 + - On decode success → `Pass` and capture the typed `Output` + raw bytes for later. 
153 + - On decode failure → `SpecViolation` with diagnostic `{ code: "labeler::http::schema_failure", source_code: NamedSource::new(source_url, raw_body), label: span-over-first-error-offset }`. AC3.4. Return `HttpStageOutput { facts: None, results }` — subsequent checks cannot run. 154 + - On transport failure → `NetworkError` + return. AC3.6. 155 + 3. If the first page has an empty `labels` array, emit `http::query_labels_empty_advisory` with severity `Advisory`, summary `"labeler has no published labels"` (AC3.3). Still proceed to pagination check — an empty-but-legal response should still honor `cursor`. 156 + 4. Emit `http::query_labels_schema_second_page` + `http::pagination_round_trip`: 157 + - If the first page had a `cursor` in the response, call `tee.query_labels(Some(&cursor))` again with that cursor. 158 + - Second call decode failure → `SpecViolation` `http::query_labels_schema_second_page`. 159 + - Decode success but `output.labels == first_page.labels` (deep equality) → `SpecViolation` `http::pagination_ignored_cursor`, summary "labeler ignored the cursor parameter". AC3.5. 160 + - Decode success and labels differ (or second page is empty with no cursor → "we hit the end") → `Pass` `http::pagination_round_trip`. 161 + - If the first page had **no** cursor, emit `http::pagination_round_trip` as `Pass` with summary "first page was complete; pagination not exercised". This is not a violation — labelers smaller than the limit legitimately have no cursor. 162 + 5. Build `HttpFacts { first_page, first_page_raw_bytes, first_page_source_url, pagination_ok }` and return. 163 + 164 + **Pipeline driver wiring:** 165 + 166 + Replace the Phase 3 stub `Skipped` for HTTP with: 167 + - If `identity_facts.is_some()` → derive the labeler endpoint URL from `IdentityFacts::labeler_endpoint`. 168 + - Else if `target == LabelerTarget::Endpoint { url, .. }` → use `url` directly (AC1.3 path: endpoint-only invocations still run the HTTP stage). 
169 + - Else → HTTP stage is `Skipped("identity stage produced no labeler endpoint")`. 170 + - Build a `RealHttpTee` and call `http::run(endpoint, &tee).await`. Append all results. Propagate `HttpFacts` into the pipeline state for Phase 6's crypto stage to consume later. 171 + 172 + **Verification:** 173 + 174 + Run: `cargo check` and `cargo build` 175 + Expected: compiles. 176 + 177 + Run: `cargo run -- test labeler moderation.bsky.app` (online) 178 + Expected: HTTP section renders with `[OK]` for reachable, schema, pagination round-trip; later stages still `[SKIP]`. 179 + 180 + **Commit:** 181 + ```bash 182 + git add src/commands/test/labeler/http.rs src/commands/test/labeler/pipeline.rs 183 + git commit -m "Implement HTTP stage against real labeler endpoint" 184 + ``` 185 + <!-- END_TASK_2 --> 186 + 187 + <!-- START_TASK_3 --> 188 + ### Task 3: Snapshot tests for the HTTP stage 189 + 190 + **Verifies:** AC3.1, AC3.2, AC3.3, AC3.4, AC3.5, AC3.6. 191 + 192 + **Files:** 193 + - Create: `tests/fixtures/labeler/http/` — one subdirectory per case, each containing one or more recorded response JSON files: 194 + - `healthy/first_page.json`, `healthy/second_page.json` — two distinct non-empty pages, with a `cursor` on the first and none on the second. AC3.1, AC3.2. 195 + - `empty/first_page.json` — `{ "cursor": null, "labels": [] }`. AC3.3. 196 + - `malformed/first_page.json` — JSON that fails `serde_json::from_slice::<Output>` (e.g., `"labels"` is `42` instead of an array). AC3.4. 197 + - `ignored_cursor/first_page.json`, `ignored_cursor/second_page.json` — two pages with identical `labels` arrays. AC3.5. 198 + - `unreachable/` — empty directory, tests use a fake that returns transport error without looking anything up. AC3.6. 199 + - Create: `tests/labeler_http.rs` — integration-level snapshot tests. 
Defines a `FakeRawHttpTee { responses: Vec<RawXrpcResponse>, transport_error: bool }` that returns queued responses in order and ignores the request parameters except for the presence of `cursor`. 200 + - Update: `tests/snapshots/` — capture the new snapshots with `cargo insta test --accept`. 201 + 202 + **Test list:** 203 + 204 + - `http_healthy_renders_all_ok` — AC3.1 + AC3.2. Snapshot shows `[OK]` for reachable, schema_first_page, pagination_round_trip. 205 + - `http_empty_labeler_emits_advisory` — AC3.3. Snapshot shows `[WARN]` on `http::query_labels_empty_advisory` and `[OK]` for the schema + pagination checks. Exit code remains 0 (the rendered report's summary line includes 1 advisory). 206 + - `http_malformed_schema_fails_with_source_span` — AC3.4. Snapshot shows `[FAIL] http::query_labels_schema_first_page` with the miette diagnostic rendering the malformed JSON as source (matching the raw bytes fixture). Assert the rendered string contains `"labels"`. 207 + - `http_ignored_cursor_fails` — AC3.5. Snapshot shows `[FAIL] http::pagination_ignored_cursor` with summary "labeler ignored the cursor parameter". 208 + - `http_transport_error_renders_network_error` — AC3.6. Fake tee's `query_labels` returns `HttpStageError::Transport(_)`. Snapshot shows `[NET] http::endpoint_reachable`, exit 0, summary footer mentions 1 network error. The upstream identity stage in this test uses the `healthy_plc` fixture from Phase 3, so identity is `[OK]`; the snapshot asserts that the AC3.6 network error does not cascade into schema failures. 209 + 210 + **Verification:** 211 + 212 + Run: `cargo insta test --accept` (first run) 213 + Run: `cargo test` (subsequent runs) 214 + Expected: all 5 new tests pass. 
215 + 216 + **Commit:** 217 + ```bash 218 + git add tests/labeler_http.rs tests/fixtures/labeler/http tests/snapshots 219 + git commit -m "Add HTTP stage snapshot tests covering AC3" 220 + ``` 221 + <!-- END_TASK_3 --> 222 + <!-- END_SUBCOMPONENT_A --> 223 + 224 + <!-- START_TASK_4 --> 225 + ### Task 4: Final phase verification 226 + 227 + **Steps:** 228 + 1. `cargo fmt --check` — clean. 229 + 2. `cargo clippy -- -D warnings` — clean. 230 + 3. `cargo test` — all Phase 3 and Phase 4 tests pass (17 total from both phases). 231 + 4. (Optional, online) `cargo run -- test labeler moderation.bsky.app` — HTTP section populated with `[OK]`s. 232 + 233 + **Done when:** steps 1–3 pass. 234 + <!-- END_TASK_4 --> 235 + 236 + --- 237 + 238 + ## Phase 4 done-when checklist 239 + 240 + - `src/commands/test/labeler/http.rs` exists and exports `HttpFacts`, `HttpStageOutput`, `run`, the `RawHttpTee` trait, and a real `RealHttpTee` implementation. 241 + - Pipeline driver calls `http::run` between the identity stage and the still-stubbed subscription stage. 242 + - All five Phase 4 snapshot tests pass, verifying AC3.1 through AC3.6. 243 + - Running the binary against a real labeler shows a populated HTTP section. 244 + - `cargo clippy -- -D warnings` and `cargo fmt --check` clean. 245 + - Commit history shows the three atomic commits from this phase.
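The pagination decision in step 4 of the Task 2 stage body can be sketched as a pure function, which is also the shape the `FakeRawHttpTee` tests exercise. The names `Page`, `PaginationOutcome`, and `classify_pagination` are hypothetical, the synchronous closure stands in for the async `tee.query_labels(Some(&cursor))` call, and labels are left generic rather than using atrium's `Label` type. One reading is assumed here: an empty second page with no cursor counts as "we hit the end" and passes even when the first page was also empty.

```rust
/// One decoded `queryLabels` page. Hypothetical stand-in for the typed
/// `Output`, generic over the label type so the sketch has no dependencies.
pub struct Page<L> {
    pub labels: Vec<L>,
    pub cursor: Option<String>,
}

/// Result of the pagination check; variant names are illustrative.
#[derive(Debug, PartialEq, Eq)]
pub enum PaginationOutcome {
    /// First page carried no cursor, so pagination was not exercised (Pass).
    NotExercised,
    /// Second page differed from the first, or was the empty final page (Pass).
    RoundTripOk,
    /// Second page repeated the first page's labels (SpecViolation, AC3.5).
    IgnoredCursor,
}

/// Classify the pagination round trip. `fetch_second` is only invoked when
/// the first page returned a cursor, mirroring the "call again with that
/// cursor" step in the stage body.
pub fn classify_pagination<L: PartialEq>(
    first: &Page<L>,
    fetch_second: impl FnOnce(&str) -> Page<L>,
) -> PaginationOutcome {
    let cursor = match first.cursor.as_deref() {
        Some(cursor) => cursor,
        // No cursor: the labeler is smaller than the limit, not a violation.
        None => return PaginationOutcome::NotExercised,
    };

    let second = fetch_second(cursor);

    // Assumption: an empty page without a cursor means the end was reached.
    if second.labels.is_empty() && second.cursor.is_none() {
        return PaginationOutcome::RoundTripOk;
    }

    // Deep equality of the two label lists signals an ignored cursor.
    if second.labels == first.labels {
        PaginationOutcome::IgnoredCursor
    } else {
        PaginationOutcome::RoundTripOk
    }
}
```

Keeping the decision free of IO means the `ignored_cursor` and `healthy` fixtures can be checked against the same function the real stage would call, with only the fetch closure swapped out.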
+287
docs/implementation-plans/2026-04-13-test-labeler/phase_05.md
··· 1 + # Test-labeler Implementation Plan — Phase 5 2 + 3 + **Goal:** Add the subscription stage — a two-connection strategy against `com.atproto.label.subscribeLabels` with a configurable per-connection time budget. First connection does backfill with `cursor=0`; a second connection runs the live-tail check if the backfill did not also exercise the live tail. 4 + 5 + **Architecture:** New module `src/commands/test/labeler/subscription.rs`. Contains the two-CBOR-block frame parser (header + payload), the `BackfillOutcome` and `LiveTailOutcome` enums, the `SubscriptionFacts`, and the stage `run` function. The stage is written against a narrow `WebSocketClient` trait so unit tests replay recorded frame sequences from a fake. The real implementation uses `tokio-tungstenite` with rustls native-roots. `ciborium` decodes each of the two CBOR blocks; the stage validates header `op`/`t` fields and routes `#labels` and `#info` message types. 6 + 7 + **Tech Stack:** adds `tokio-tungstenite` 0.29 (`rustls-tls-native-roots`), `ciborium` 0.2 (strict-decoding mode), `futures-util` (for `StreamExt`), `bytes`. No new dev-dependencies. 8 + 9 + **Scope:** Phase 5 of 6. 10 + 11 + **Codebase verified:** 2026-04-13 (post Phase 4). `src/commands/test/labeler/subscription.rs` does **not** yet exist. Pipeline driver calls Phase 4 HTTP stage and still emits `Skipped("not yet implemented (phase 5)")` for subscription. 12 + 13 + --- 14 + 15 + ## Acceptance Criteria Coverage 16 + 17 + ### test-labeler.AC4: Subscription-layer checks (`subscribeLabels`) 18 + - **test-labeler.AC4.1 Success (backfill completes within budget):** A labeler whose backfill flushes within `--subscribe-timeout` followed by a ≥500ms idle gap produces a `Pass` backfill check and an implicit `Pass` live-tail check ("live tail observed after backfill completed"). 
19 + - **test-labeler.AC4.2 Success (backfill exceeds budget):** A labeler whose backfill is still producing frames at the end of the budget produces an `Advisory` backfill check and triggers a second connection for the live-tail check. A clean live-tail connection (any frames decoded, or connection held open with no decode errors) produces a `Pass` live-tail check. 20 + - **test-labeler.AC4.3 Success (empty labeler):** A labeler producing no frames at all during the budget produces an `Advisory` backfill check ("labeler has no published labels") and a `Skipped` live-tail check with the same reason. 21 + - **test-labeler.AC4.4 Failure:** A frame that fails the two-CBOR-block decode, or a `#labels` payload that fails lexicon decoding, produces a `SpecViolation` `CheckResult` independent of the backfill/live-tail pass/fail dimension. The diagnostic carries the offending frame bytes as `NamedSource`. 22 + - **test-labeler.AC4.5 Failure:** A WebSocket handshake that succeeds but whose first frame has `op: -1` (error) with a malformed `#info` payload produces a `SpecViolation` on decode. 23 + - **test-labeler.AC4.6 NetworkError:** An unreachable WebSocket endpoint or a handshake TLS failure produces a `NetworkError` rather than a subscription `Fail`. 24 + - **test-labeler.AC4.7 Edge:** Passing `--subscribe-timeout 0` (or another invalid duration) produces a clap parse error; values below a reasonable floor (e.g., 1s) are rejected with a helpful message. 25 + 26 + AC4.7 is structurally in place since Phase 1 (the `parse_subscribe_timeout` value parser rejects sub-1s). Phase 5 adds an **integration test** asserting the clap rejection still fires and stderr mentions "at least 1 second". 27 + 28 + --- 29 + 30 + ## Codebase state at start of phase 31 + 32 + Phase 1 → Phase 4 complete. `subscription.rs` does not exist. 
The Phase 1 `LabelerCmd::subscribe_timeout` flag with `parse_subscribe_timeout` value parser and the 1-second floor remain in `src/commands/test/labeler.rs`; Phase 5 consumes the already-parsed `Duration` through `LabelerOptions::subscribe_timeout`. 33 + 34 + --- 35 + 36 + ## External dependency notes 37 + 38 + - **`tokio-tungstenite` 0.29.0**: 39 + - `tokio_tungstenite::connect_async(request)` returns `Result<(WebSocketStream<MaybeTlsStream<TcpStream>>, Response), WsError>`. Build the request with `tokio_tungstenite::tungstenite::client::IntoClientRequest::into_client_request(&url)` so custom headers are possible even though Phase 5 does not need any. 40 + - `WebSocketStream` implements `Stream<Item = Result<Message, WsError>>`; `futures_util::StreamExt` supplies `next()`. Each `Message::Binary(Bytes)` is a full WebSocket binary frame; text frames are rejected with a `SpecViolation` diagnostic. 41 + - Features: enable `rustls-tls-native-roots` (the default feature set for 0.29 includes the non-TLS transports only). Do **not** enable the `native-tls` feature alongside `connect`. 42 + - Build with `url::Url::parse(...)?` then mutate the scheme from `https` to `wss`. (`com.atproto.label.subscribeLabels` lives under `wss://<endpoint>/xrpc/com.atproto.label.subscribeLabels[?cursor=0]`.) 43 + - **`ciborium` 0.2.2**: 44 + - `ciborium::de::from_reader::<Value, _>(&mut cursor)` decodes one top-level CBOR item and returns the decoded value, not the remaining bytes; the `Read` impl for `&[u8]` shrinks the slice as it is consumed, so the reader is left pointing at the remainder. To decode **two sequential** blocks from a single WebSocket frame, keep the frame bytes in a named `&[u8]` cursor, pass `&mut cursor` to `from_reader` twice, and let the cursor advance naturally. (Passing a temporary such as `&mut &bytes[..]` discards the advanced position.) 45 + - We also use `ciborium::Value` for the header and decode the payload to a concrete lexicon type via `ciborium::de::from_reader::<PayloadStruct, _>` for the `#labels` case. Two possible approaches: 46 + 1.
Decode the header into `HeaderData { op: i64, t: Option<String> }`, then pick a decoder for the payload based on `header.t`. 47 + 2. Decode header as `ciborium::Value`, match on the `t` map entry manually, then decode the payload. 48 + - Prefer approach 1 with a typed `HeaderData` struct; it is the pattern atrium uses. 49 + - **`futures-util` 0.3** — needed for `StreamExt::next()` on the WebSocket stream. Add as a dependency; nothing else in the crate uses it yet. 50 + - **`bytes` 1.x** — `tokio-tungstenite` re-exports this. The raw frame payload arrives as `bytes::Bytes`; convert with `.into()` to `Vec<u8>`/`Arc<[u8]>` for diagnostics. 51 + - **atproto subscribeLabels frame format**: 52 + - Each WebSocket binary frame is **two concatenated CBOR items**: a header map and a payload map. 53 + - Header map: `{ "op": Int, "t": Option<Text> }`. 54 + - `op == 1` → message frame; `t` is a `#`-prefixed lexicon type NSID fragment like `"#labels"` or `"#info"`. 55 + - `op == -1` → error frame; payload has shape `{ "error": Text, "message": Option<Text> }`. AC4.5 specifically requires us to decode such frames and report a malformed one as `SpecViolation`. 56 + - Payload for `#labels`: `{ "seq": Int, "labels": Array<Label> }` where each label has the canonical atproto label shape. For Phase 5 decoding, we reuse `atrium_api::com::atproto::label::defs::Label` via ciborium/serde — `Label` already derives `serde::Deserialize`, so `ciborium::de::from_reader::<SubscribeLabelsPayload, _>` works out of the box. 57 + - Payload for `#info`: `{ "name": Text, "message": Option<Text> }`. 
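The header routing rules in the frame-format notes above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the plan's `decode_frame`: `FrameKind` and `route_header` are hypothetical names, and the CBOR decoding of header and payload is left out so that only the `op`/`t` dispatch is visible.

```rust
/// Which payload decoder a decoded frame header selects.
/// `FrameKind` is a hypothetical name used only for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum FrameKind {
    /// `op == 1`, `t == "#labels"`: payload is `{ seq, labels }`.
    Labels,
    /// `op == 1`, `t == "#info"`: payload is `{ name, message? }`.
    Info,
    /// `op == -1`: payload is `{ error, message? }`; `t` is not consulted.
    Error,
}

/// Route on the two header fields. Anything outside the three known
/// combinations is reported as an error string (the plan maps this case
/// to its `UnknownMessageType` variant).
pub fn route_header(op: i64, t: Option<&str>) -> Result<FrameKind, String> {
    match (op, t) {
        (1, Some("#labels")) => Ok(FrameKind::Labels),
        (1, Some("#info")) => Ok(FrameKind::Info),
        (-1, _) => Ok(FrameKind::Error),
        (op, t) => Err(format!("unknown frame type: op={op}, t={t:?}")),
    }
}
```

Keeping the dispatch separate from the byte-level decode makes the "any other combination" branch easy to test without constructing CBOR fixtures.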
58 + 59 + --- 60 + 61 + <!-- START_TASK_1 --> 62 + ### Task 1: Phase 5 dependencies 63 + 64 + **Files:** 65 + - Modify: `Cargo.toml` `[dependencies]` 66 + 67 + **Implementation:** 68 + 69 + Append to `[dependencies]`: 70 + 71 + ```toml 72 + bytes = "1.10" 73 + ciborium = "0.2" 74 + futures-util = { version = "0.3", default-features = false, features = ["std"] } 75 + tokio-tungstenite = { version = "0.29", default-features = false, features = ["connect", "rustls-tls-native-roots"] } 76 + ``` 77 + 78 + **Verification:** 79 + 80 + Run: `cargo check` 81 + Expected: new crates resolve. 82 + 83 + **Commit:** 84 + ```bash 85 + git add Cargo.toml Cargo.lock 86 + git commit -m "Add Phase 5 dependencies (tokio-tungstenite, ciborium)" 87 + ``` 88 + <!-- END_TASK_1 --> 89 + 90 + <!-- START_SUBCOMPONENT_A (tasks 2-4) --> 91 + <!-- START_TASK_2 --> 92 + ### Task 2: Frame parser and `SubscriptionFacts` types 93 + 94 + **Files:** 95 + - Create: `src/commands/test/labeler/subscription.rs` (types + parser only) 96 + - Modify: `src/commands/test/labeler.rs` add `pub mod subscription;` 97 + 98 + **Implementation:** 99 + 100 + Types: 101 + 102 + - `pub struct FrameHeader { op: i64, t: Option<String> }`. No `#[serde(rename_all = ...)]` attribute is needed because the raw atproto keys (`op`, `t`) are already lowercase. 103 + - `pub enum DecodedFrame { Labels(SubscribeLabelsPayload), Info(SubscribeInfoPayload), Error(SubscribeErrorPayload) }`. 104 + - `pub struct SubscribeLabelsPayload { seq: i64, labels: Vec<atrium_api::com::atproto::label::defs::Label> }`. 105 + - `pub struct SubscribeInfoPayload { name: String, message: Option<String> }`. 106 + - `pub struct SubscribeErrorPayload { error: String, message: Option<String> }`. 107 + - `pub enum FrameDecodeError { HeaderDecode { raw: Arc<[u8]>, cause: String }, PayloadDecode { header: FrameHeader, raw: Arc<[u8]>, cause: String }, UnknownMessageType { t: String, raw: Arc<[u8]> }, TextFrameRejected(Arc<[u8]>) }`.
108 + - `pub fn decode_frame(bytes: &[u8]) -> Result<DecodedFrame, FrameDecodeError>`: 109 + 1. Wrap the bytes in a `&mut &[u8]` cursor. 110 + 2. `ciborium::de::from_reader::<FrameHeader, _>(&mut cursor)` → header, or `HeaderDecode` error on failure. 111 + 3. Based on `header.op` and `header.t`: 112 + - `op == 1, t == Some("#labels")` → `ciborium::de::from_reader::<SubscribeLabelsPayload, _>(&mut cursor)` → `DecodedFrame::Labels` or `PayloadDecode` error. 113 + - `op == 1, t == Some("#info")` → similar for `SubscribeInfoPayload`. 114 + - `op == -1` → `ciborium::de::from_reader::<SubscribeErrorPayload, _>(&mut cursor)` → `DecodedFrame::Error` or `PayloadDecode` error. AC4.5 hits this path. 115 + - Any other combination → `UnknownMessageType`. 116 + 117 + - `pub struct SubscriptionFacts { pub backfill_outcome: BackfillOutcome, pub live_tail_outcome: Option<LiveTailOutcome>, pub decode_errors: Vec<FrameDecodeError> }`. 118 + - `pub enum BackfillOutcome { CompletedWithIdleGap { frames_observed: usize, idle_gap_ms: u64 }, ExceededBudget { frames_observed: usize }, NoFramesWithinBudget }`. 119 + - `pub enum LiveTailOutcome { FromBackfill, CleanHold { frames_observed: usize }, SkippedEmpty }`. 120 + 121 + **Unit tests (in-file `#[cfg(test)]`):** 122 + - `decode_labels_frame_valid` — construct a valid two-block byte buffer by serializing `FrameHeader { op: 1, t: Some("#labels") }` and `SubscribeLabelsPayload { seq: 0, labels: vec![] }` with `ciborium::ser::into_writer`, concatenating the two, then passing to `decode_frame`. Expect `DecodedFrame::Labels` with `seq == 0` and empty labels. 123 + - `decode_info_frame_valid` — similar for `#info`. 124 + - `decode_error_frame_valid` — op=-1, valid error payload. Expect `DecodedFrame::Error`. 125 + - `decode_frame_header_decode_failure` — feed 3 random bytes. Expect `HeaderDecode`, raw bytes preserved. 126 + - `decode_frame_payload_decode_failure` — valid header for `#labels` followed by `0xff` garbage. 
Expect `PayloadDecode { header, raw, .. }`. 127 + - `decode_frame_unknown_message_type` — valid header with `t = Some("#futureType")`. Expect `UnknownMessageType`. 128 + - `decode_frame_error_payload_malformed` — op=-1 but payload bytes do not decode as `SubscribeErrorPayload`. Expect `PayloadDecode`. This is the programmatic shape of AC4.5. 129 + 130 + **Verification:** 131 + 132 + Run: `cargo test` 133 + Expected: all new unit tests pass. 134 + 135 + **Commit:** 136 + ```bash 137 + git add src/commands/test/labeler/subscription.rs src/commands/test/labeler.rs 138 + git commit -m "Add subscribeLabels frame parser and types" 139 + ``` 140 + <!-- END_TASK_2 --> 141 + 142 + <!-- START_TASK_3 --> 143 + ### Task 3: `WebSocketClient` trait, real impl, and stage `run` 144 + 145 + **Files:** 146 + - Modify: `src/commands/test/labeler/subscription.rs` — add the IO trait, the real impl, and the stage body. 147 + - Modify: `src/commands/test/labeler/pipeline.rs` — replace the subscription stage stub with a real call. 148 + 149 + **Implementation:** 150 + 151 + ```text 152 + #[async_trait] 153 + pub trait WebSocketClient: Send + Sync { 154 + async fn connect( 155 + &self, 156 + url: &Url, 157 + ) -> Result<Box<dyn FrameStream>, SubscriptionStageError>; 158 + } 159 + 160 + #[async_trait] 161 + pub trait FrameStream: Send { 162 + async fn next_frame(&mut self) -> Option<Result<Vec<u8>, SubscriptionStageError>>; 163 + async fn close(&mut self); 164 + } 165 + ``` 166 + 167 + - `RealWebSocketClient` — uses `tokio_tungstenite::connect_async(request)` internally. `next_frame` filters text frames (→ error), ping/pong frames (discard), and close frames (→ `None`). Binary frames yield `Vec<u8>`. 168 + 169 + Stage `run`: 170 + 171 + ```text 172 + pub async fn run( 173 + labeler_endpoint: &Url, 174 + ws: &dyn WebSocketClient, 175 + budget_per_connection: Duration, 176 + ) -> SubscriptionStageOutput; 177 + ``` 178 + 179 + Logic: 180 + 181 + 1. 
Build `backfill_url = labeler_endpoint + "/xrpc/com.atproto.label.subscribeLabels?cursor=0"` with scheme `wss`. 182 + 2. **Backfill connection.** Start a `tokio::time::timeout(budget_per_connection, ...)` around the frame-draining loop: 183 + - `first_frame_at: Option<Instant> = None` 184 + - `last_frame_at: Option<Instant> = None` 185 + - `frames_observed: usize = 0` 186 + - Loop: `tokio::select!` on `next_frame` and a `tokio::time::sleep_until(last_frame_at + 500ms)` once any frame has arrived. 187 + - Frame arrived → decode via `decode_frame`. On `Ok(DecodedFrame::Labels | Info | Error)`, update counters. On `Err(..)`, push to `decode_errors` (AC4.4) and continue reading; decode errors do not abort the loop. 188 + - Idle gap timer fired → set `backfill_outcome = CompletedWithIdleGap { frames_observed, idle_gap_ms: 500 }`, mark live tail as `FromBackfill`, break. 189 + - Top-level timeout fired → if `frames_observed > 0`, `backfill_outcome = ExceededBudget { frames_observed }`; else `NoFramesWithinBudget`. Break. 190 + 3. **Live-tail connection** (only if `live_tail_outcome.is_none()`): 191 + - For `ExceededBudget`, issue a second connection to the same URL _without_ `cursor=0` (live-only). Run for `budget_per_connection`, counting frames. Any decode errors accumulate into `decode_errors`. At end-of-budget → `CleanHold { frames_observed }`. AC4.2. 192 + - For `NoFramesWithinBudget`, set `live_tail_outcome = SkippedEmpty`. AC4.3. Do not issue a second connection. 193 + 4. Emit `CheckResult`s: 194 + - `subscription::backfill` — `Pass` for `CompletedWithIdleGap`, `Advisory` for `ExceededBudget`, `Advisory` for `NoFramesWithinBudget` (summary differentiates the two). 195 + - `subscription::live_tail` — `Pass` for `FromBackfill` or `CleanHold`, `Skipped` for `SkippedEmpty` with reason `"labeler has no published labels"`. 
196 + - One additional `SpecViolation` `CheckResult` per unique `FrameDecodeError` variant (dedupe by variant kind so we do not flood the report when thousands of frames fail). AC4.4, AC4.5. Each diagnostic carries the raw frame bytes as `NamedSource` and a `#[label("frame decode failure")]` span covering the first byte. 197 + 5. Return the `SubscriptionFacts`. 198 + 199 + If the initial `connect_async` fails with a transport/TLS error, emit a single `subscription::endpoint_reachable` `CheckResult` with severity `NetworkError` (AC4.6) and return with no facts. 200 + 201 + **Pipeline driver wiring:** Replace the subscription `Skipped` stub with a real call. Route the `subscribe_timeout` from `LabelerOptions::subscribe_timeout` into `run`. 202 + 203 + **Verification:** 204 + 205 + Run: `cargo build` 206 + Expected: compiles. 207 + 208 + Run: `cargo run -- test labeler moderation.bsky.app --subscribe-timeout 3s` (online) 209 + Expected: subscription section populated; backfill and live-tail checks render `[OK]` or `[WARN]` depending on the live labeler's current behaviour. 210 + 211 + **Commit:** 212 + ```bash 213 + git add src/commands/test/labeler/subscription.rs src/commands/test/labeler/pipeline.rs 214 + git commit -m "Implement subscription stage with two-connection strategy" 215 + ``` 216 + <!-- END_TASK_3 --> 217 + 218 + <!-- START_TASK_4 --> 219 + ### Task 4: Snapshot tests for the subscription stage 220 + 221 + **Verifies:** AC4.1, AC4.2, AC4.3, AC4.4, AC4.5, AC4.6, AC4.7. 222 + 223 + **Files:** 224 + - Create: `tests/fixtures/labeler/subscription/` — one subdirectory per case: 225 + - `backfill_complete/frames.bin` — pre-encoded CBOR binary frame stream (sequence of two-CBOR-block binary frames) that ends with a 500ms idle gap before the budget runs out. The fixture is generated by a `tests/support/encode_frames.rs` helper at fixture-capture time, **not** at test run time (fixtures are static bytes; the generator is a `#[ignore]`d dev binary). AC4.1. 
226 + - `backfill_exceeds_budget/frames.bin` — continuous frame stream with no idle gap within the budget. AC4.2. 227 + - `empty_stream/frames.bin` — 0-byte fixture; fake simulates "no frames ever arrive". AC4.3. 228 + - `malformed_frame/frames.bin` — 2 valid frames followed by 3 random bytes. AC4.4. 229 + - `error_frame_malformed/frames.bin` — one frame whose header has `op == -1` and whose payload cannot decode as `SubscribeErrorPayload`. AC4.5. 230 + - `unreachable/` — empty. Fake's `connect` returns `SubscriptionStageError::Transport(_)`. AC4.6. 231 + - Create: `tests/labeler_subscription.rs` — snapshot tests. 232 + - Fixture generator: a separate `tests/support/encode_frames.rs` built as a `[[bin]]` target in `Cargo.toml` is overkill, so do not create one. Instead, place the generator as a `#[test] #[ignore]` function named `gen_fixtures` in the integration test file. When run with `cargo test --test labeler_subscription -- --ignored gen_fixtures`, it writes all `frames.bin` files from in-memory constructions. Document this in a top comment. 233 + 234 + **Test fakes:** 235 + 236 + - `FakeWebSocketClient { scripts: Vec<FakeScript> }` where `FakeScript = { frames: Vec<Vec<u8>>, inter_frame_delay: Duration, final_wait: Option<Duration>, transport_error: bool }`. Each call to `.connect` pops the next script. The returned `FakeFrameStream` uses `tokio::time::sleep` (**not** `std::thread::sleep`) for `inter_frame_delay` between frames and `final_wait` after the last frame (used to simulate the 500ms idle gap after backfill completes, or a continuing stream exceeding the budget). Both the fake's sleeps and the stage's `tokio::time::timeout`/`sleep_until` calls must run on the same paused clock. 237 + - **Runtime flavor:** every `#[tokio::test]` function in `labeler_subscription.rs` must be annotated as `#[tokio::test(flavor = "current_thread", start_paused = true)]`.
This both matches the production binary's runtime (`current_thread` from Phase 1) and ensures `tokio::time::pause` is active for the whole test, so `tokio::time::advance(...)` drives both the fake's inter-frame sleeps and the stage's budget timer deterministically. Without `start_paused = true`, any `tokio::time::sleep` that fires before the first `advance` will wall-clock block. 238 + 239 + **Test list:** 240 + 241 + - `backfill_completes_within_budget_passes` — AC4.1. Script: 3 frames, inter-frame 10ms, then a 600ms idle gap. Subscribe timeout 2s. Assert `backfill: [OK]` and `live_tail: [OK]` with "observed after backfill completed" reason. 242 + - `backfill_exceeds_budget_triggers_live_tail` — AC4.2. Script 1: 20 frames, inter-frame 50ms → the 2s budget expires while still producing frames. Script 2: 1 frame then 1s idle, for the live-tail connection. Assert `backfill: [WARN] ExceededBudget`, `live_tail: [OK] CleanHold`. 243 + - `empty_stream_advisories` — AC4.3. Script: no frames, budget runs out. Assert `backfill: [WARN]` with "no published labels" reason and `live_tail: [SKIP]` with the same reason. Assert exit code is 0. 244 + - `malformed_frame_emits_spec_violation` — AC4.4. Script: 1 valid frame, 1 malformed. Assert one `[FAIL] subscription::frame_decode` `CheckResult` with source bytes equal to the malformed fixture. 245 + - `error_frame_malformed_payload_spec_violation` — AC4.5. 246 + - `unreachable_endpoint_network_error` — AC4.6. Assert severity `NetworkError`, summary calls out one network error, exit code 0. 247 + - `subscribe_timeout_below_floor_rejected` — AC4.7. Preferred implementation: unit test on the `parse_subscribe_timeout` helper in `labeler.rs`: 248 + - Assert `parse_subscribe_timeout("500ms")` returns `Err(_)` with a message containing `"at least 1 second"` (the floor check fires because `humantime` parses `500ms` successfully, then `parse_subscribe_timeout` rejects it as sub-second). 
249 + - Assert `parse_subscribe_timeout("0")` returns `Err(_)` (any error is acceptable — `humantime::parse_duration` rejects bare `0` at parse time with `"time unit needed"`, before the floor check runs). 250 + - Optional end-to-end complement using `assert_cmd`: `Command::cargo_bin("atproto-devtool").args(["test", "labeler", "--subscribe-timeout", "500ms", "not-a-real-thing"])` → exit code 2, stderr contains `"at least 1 second"`. Deterministic because clap rejects the argument before any network IO runs. 251 + 252 + **Verification:** 253 + 254 + Run: `cargo insta test --accept` (first run) 255 + Run: `cargo test` (subsequent runs) 256 + Expected: all 7 new tests pass. 257 + 258 + **Commit:** 259 + ```bash 260 + git add tests/labeler_subscription.rs tests/fixtures/labeler/subscription tests/snapshots 261 + git commit -m "Add subscription stage snapshot tests covering AC4" 262 + ``` 263 + <!-- END_TASK_4 --> 264 + <!-- END_SUBCOMPONENT_A --> 265 + 266 + <!-- START_TASK_5 --> 267 + ### Task 5: Final phase verification 268 + 269 + **Steps:** 270 + 1. `cargo fmt --check` — clean. 271 + 2. `cargo clippy -- -D warnings` — clean. 272 + 3. `cargo test` — all Phase 3/4/5 tests pass. 273 + 4. (Optional, online) `cargo run -- test labeler moderation.bsky.app --subscribe-timeout 5s` — populated subscription section. 274 + 275 + **Done when:** steps 1–3 pass. 276 + <!-- END_TASK_5 --> 277 + 278 + --- 279 + 280 + ## Phase 5 done-when checklist 281 + 282 + - `src/commands/test/labeler/subscription.rs` exists, compiles, and exports the frame parser, the `WebSocketClient` and `FrameStream` traits, the `RealWebSocketClient` impl, the `SubscriptionFacts` struct, and the `run` entry point. 283 + - Pipeline driver calls `subscription::run` with the parsed `subscribe_timeout`. 284 + - All 7 Phase 5 tests pass (6 snapshot + 1 argument parser unit test). 285 + - Running against a real labeler populates the subscription section. 
286 + - `cargo clippy -- -D warnings` and `cargo fmt --check` clean. 287 + - Commit history shows the four atomic commits from this phase.
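The outcome-to-check mapping from Task 3 (step 3 and step 4 of the stage logic) can be sketched as plain functions. This is an illustration under stated assumptions: the outcome enums mirror the shapes listed in Task 2, while `Severity`, `backfill_severity`, `live_tail_severity`, and `needs_second_connection` are hypothetical names that stand in for whatever the report module actually exposes.

```rust
/// Outcome of the first (cursor=0) connection, as listed in Task 2.
#[derive(Debug, PartialEq, Eq)]
pub enum BackfillOutcome {
    CompletedWithIdleGap { frames_observed: usize, idle_gap_ms: u64 },
    ExceededBudget { frames_observed: usize },
    NoFramesWithinBudget,
}

/// Outcome of the live-tail check, as listed in Task 2.
#[derive(Debug, PartialEq, Eq)]
pub enum LiveTailOutcome {
    FromBackfill,
    CleanHold { frames_observed: usize },
    SkippedEmpty,
}

/// Hypothetical stand-in for the report severities used by these checks.
#[derive(Debug, PartialEq, Eq)]
pub enum Severity {
    Pass,
    Advisory,
    Skipped,
}

/// `subscription::backfill`: only a completed backfill is a pass; both
/// "still streaming" and "nothing arrived" are advisories.
pub fn backfill_severity(outcome: &BackfillOutcome) -> Severity {
    match outcome {
        BackfillOutcome::CompletedWithIdleGap { .. } => Severity::Pass,
        BackfillOutcome::ExceededBudget { .. } | BackfillOutcome::NoFramesWithinBudget => {
            Severity::Advisory
        }
    }
}

/// `subscription::live_tail`: an empty labeler is skipped, not failed.
pub fn live_tail_severity(outcome: &LiveTailOutcome) -> Severity {
    match outcome {
        LiveTailOutcome::FromBackfill | LiveTailOutcome::CleanHold { .. } => Severity::Pass,
        LiveTailOutcome::SkippedEmpty => Severity::Skipped,
    }
}

/// A second connection is opened only when the backfill was still
/// producing frames at the end of the budget.
pub fn needs_second_connection(outcome: &BackfillOutcome) -> bool {
    matches!(outcome, BackfillOutcome::ExceededBudget { .. })
}
```

Writing the mapping as total `match` expressions means a future outcome variant forces a compile error here instead of silently falling through to a default severity.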
+393
docs/implementation-plans/2026-04-13-test-labeler/phase_06.md
··· 1 + # Test-labeler Implementation Plan — Phase 6 2 + 3 + **Goal:** Ship the final stage — signature verification on the labels collected by the HTTP stage — and close out all outstanding acceptance criteria. This phase adds did:plc key-history fallback for rotated-out keys, the DRISL-CBOR canonicalizer, end-to-end snapshot tests covering mixed-failure runs, and a README. The tool is feature-complete at the end of this phase. 4 + 5 + **Architecture:** New module `src/commands/test/labeler/crypto.rs`. Extends `src/common/identity.rs` with a `plc_history` helper (PLC operation log fetch, lazy). The crypto stage consumes `IdentityFacts` and `HttpFacts`, canonicalizes each label to DRISL-CBOR, SHA-256 prehashes the bytes, dispatches to `k256::ecdsa::VerifyingKey::verify_prehash` or `p256::ecdsa::VerifyingKey::verify_prehash` via `AnyVerifyingKey`, and on any per-label failure lazily fetches the PLC audit log to retry against each historic verification method for the `#atproto_label` key slot. Emits a per-label `CheckResult` for failures only (passes are rolled up into a single summary `CheckResult`). 6 + 7 + The phase also finalises AC6 — exit-code semantics, `NO_COLOR` rendering, `--verbose` tracing, and the bootstrap/usage `exit 2` path — and adds the end-to-end `insta` suite that covers mixed-failure runs across all four stages. 8 + 9 + **Tech Stack:** adds `sha2` 0.11 to `[dependencies]` and `assert_cmd` 2 to `[dev-dependencies]`. All other crates are already in the graph. 10 + 11 + **Scope:** Phase 6 of 6. 12 + 13 + **Codebase verified:** 2026-04-13 (post Phase 5). `src/commands/test/labeler/crypto.rs` does **not** yet exist. `common::identity` has no PLC audit-log helper yet. Pipeline driver calls identity, HTTP, and subscription stages; crypto remains `Skipped("not yet implemented (phase 6)")`. 
14 + 15 + --- 16 + 17 + ## Acceptance Criteria Coverage 18 + 19 + ### test-labeler.AC5: Crypto-layer checks (signature verification with key rotation) 20 + - **test-labeler.AC5.1 Success:** Every label in `HttpFacts::first_page` whose signature verifies against the current declared signing key produces a `Pass` on the crypto rollup with no historic-key fetch performed. 21 + - **test-labeler.AC5.2 Success (rotated-out key, did:plc):** When at least one label fails against the current key, the crypto stage fetches the PLC audit log and retries verification against each historic verification-method entry for the labeler's signing-key slot. Labels that verify against a historic key are accepted, the rollup is `Pass`, and an `Advisory` is attached listing the count and key ids involved. 22 + - **test-labeler.AC5.3 Success (empty labeler):** A labeler with zero published labels results in the crypto stage being `Skipped("labeler published no labels; nothing to verify")` and does not affect the exit code. 23 + - **test-labeler.AC5.4 Failure (current-key mismatch, no history):** A did:web labeler whose labels do not verify against the current key produces a `Fail` rollup with a diagnostic listing the current key id and stating that did:web provides no rotation history. 24 + - **test-labeler.AC5.5 Failure (current and historic mismatch, did:plc):** A did:plc labeler whose labels verify against neither the current nor any historic key produces a `Fail` rollup with a diagnostic listing every key id that was tried. 25 + - **test-labeler.AC5.6 Failure (canonicalization mismatch surfaced):** A label whose serialized-for-signing bytes cannot be produced (e.g., invalid CBOR in the fetched record) produces a per-label `Fail` with a distinct diagnostic code from signature mismatch, so misbehaviour of the canonicalizer is distinguishable from a genuine signature problem. 
26 + - **test-labeler.AC5.7 NetworkError:** A failure to fetch the PLC audit log (`plc.directory` unreachable) during the historic-key path produces a `NetworkError` result and prevents the stage from issuing a false `Fail`; the labels that failed against the current key are reported as `Fail` only if history could not be consulted. 27 + 28 + ### test-labeler.AC6: Cross-cutting reporting and exit semantics 29 + - **test-labeler.AC6.1 Success:** A labeler that passes every stage produces a `LabelerReport` whose rendered output shows each stage with `[OK]` glyphs, a header listing the target and resolved DID/PDS, a summary with all-zero failure/advisory counts, and exits `0`. 30 + - **test-labeler.AC6.2 Success:** A run containing at least one `SpecViolation` exits `1`, and the summary footer shows the count broken down by severity. 31 + - **test-labeler.AC6.3 Success:** A run containing only `NetworkError` results (no `SpecViolation`s) exits `0`, with the network-error count called out separately in the summary. 32 + - **test-labeler.AC6.4 Success:** A run containing only `Advisory` results exits `0`. 33 + - **test-labeler.AC6.5 Success:** `Skipped` checks are rendered with a reason string taken from the stage's `Outcome::Skipped` variant, so users can see why a check was not run (missing DID, upstream failure, empty labeler, not-yet-implemented). 34 + - **test-labeler.AC6.6 Success:** Setting `NO_COLOR=1` suppresses ANSI color codes in the rendered output while keeping ASCII glyphs (`[OK]`/`[FAIL]`/`[SKIP]`/`[WARN]`) and miette diagnostic layout intact. 35 + - **test-labeler.AC6.7 Success:** `--verbose` raises the tracing filter, causing stage IO (HTTP requests, WebSocket frames, PLC log fetch) to be logged at DEBUG to stderr without affecting the rendered report or exit code. 
36 + - **test-labeler.AC6.8 Edge:** The tool's own unrecoverable bootstrap failures (invalid CLI args, panics caught by the miette handler, tokio runtime failure) exit `2` — distinct from a `SpecViolation`-driven `1`. 37 + 38 + --- 39 + 40 + ## Codebase state at start of phase 41 + 42 + - `src/commands/test/labeler/{report,pipeline,identity,http,subscription}.rs` — all present. 43 + - `src/commands/test/labeler/crypto.rs` — does not exist. 44 + - `src/common/identity.rs` — Phase 2's module. Does not yet have a PLC history helper. 45 + - No `README.md` yet. 46 + - Pipeline driver is currently missing the crypto-stage call; it emits `Skipped("not yet implemented (phase 6)")`. 47 + 48 + --- 49 + 50 + ## External dependency notes 51 + 52 + - **`sha2` 0.11.0** — `use sha2::{Digest, Sha256};` then `let digest: [u8; 32] = Sha256::digest(canon_bytes).into();`. No other features needed. 53 + - **`k256::ecdsa::VerifyingKey`** and **`p256::ecdsa::VerifyingKey`** — verify via `ecdsa::signature::hazmat::PrehashVerifier::verify_prehash(&self, prehash: &[u8], sig: &Signature) -> Result<(), SignatureError>`. Import the trait at call site. `Signature` is `k256::ecdsa::Signature` / `p256::ecdsa::Signature` respectively; both parse from raw (`r || s`, 64 bytes) via `Signature::from_slice(..)`. 54 + - **DRISL-CBOR canonicalization** (the "strict CBOR subset required by atproto for cryptographically signed data"): 55 + - Encode maps with keys sorted lexicographically as byte strings (RFC 7049 / RFC 8949 deterministic encoding rules), not integers-first. 56 + - Encode integers in minimum-length form (no leading zeros; prefer the shortest argument byte count). 57 + - Disallow floats and disallow indefinite-length items. 58 + - Encode tags only when part of the source document. 59 + - Strip the `sig` field from the label record before encoding; the signature byte string is over the record with `sig` absent. 
60 + - `ciborium` does not provide a guaranteed-deterministic encoder out of the box, so Phase 6 implements a small canonicalizer on top of `ciborium::Value`: parse the label into `ciborium::Value`, walk the tree (collecting map entries, sorting keys), reject floats and indefinite-length items, and re-serialise with `ciborium::ser::into_writer` in a mode that relies on the canonicalized `Value` tree already being in correct shape. Deterministic byte output is then asserted against a golden fixture from a real labeler's production label. 61 + - **PLC operation log audit endpoint** — `https://plc.directory/{did}/log/audit` returns a JSON array of operation objects. Each operation has `operation.verificationMethods.<fragment>` when the fragment is present in that historic revision, plus a `nullified: bool` flag. The crypto stage walks the array newest-to-oldest and extracts each distinct multikey string for the `atproto_label` fragment. 62 + - **did:web key history** — explicitly unavailable; the crypto stage reports "did:web provides no rotation history" in its Fail diagnostic. 63 + 64 + --- 65 + 66 + <!-- START_TASK_1 --> 67 + ### Task 1: Add Phase 6 dependencies 68 + 69 + **Files:** `Cargo.toml`. 70 + 71 + Append to `[dependencies]`: 72 + 73 + ```toml 74 + sha2 = "0.11" 75 + ``` 76 + 77 + Append to `[dev-dependencies]`: 78 + 79 + ```toml 80 + assert_cmd = "2.0" 81 + ``` 82 + 83 + `sha2` is used by the canonicalizer in Task 3 for the SHA-256 prehash. `assert_cmd` is used by Task 5's `tests/labeler_cli.rs` to drive the compiled binary with specific environment variables (`NO_COLOR=1`) and `--verbose` to validate AC6.6, AC6.7, and AC6.8 end-to-end. 84 + 85 + **Verification:** `cargo check` succeeds. 
86 + 87 + **Commit:** 88 + ```bash 89 + git add Cargo.toml Cargo.lock 90 + git commit -m "Add Phase 6 dependencies (sha2, assert_cmd)" 91 + ``` 92 + <!-- END_TASK_1 --> 93 + 94 + <!-- START_TASK_2 --> 95 + ### Task 2: Extend `common::identity` with `plc_history` 96 + 97 + **Verifies:** mechanism for AC5.2 / AC5.5 / AC5.7 (exercised by the Phase 6 crypto tests). 98 + 99 + **Files:** 100 + - Modify: `src/common/identity.rs` — append the helper. 101 + - Create: `tests/fixtures/identity/plc_audit_log_with_rotation.json` — hand-authored (or recorded) operation-log JSON showing one key rotation for the `atproto_label` fragment. 102 + 103 + **Implementation:** 104 + 105 + Types + helper: 106 + 107 + ```text 108 + pub struct PlcHistoricKey { 109 + pub key_id: String, // the multikey string (not the DID#fragment) 110 + pub operation_cid: String, // the CID of the op that introduced it 111 + pub introduced_at: String, // ISO8601 from the op 112 + pub nullified: bool, 113 + } 114 + 115 + pub async fn plc_history_for_fragment( 116 + did: &Did, 117 + fragment: &str, 118 + http: &dyn HttpClient, 119 + ) -> Result<Vec<PlcHistoricKey>, IdentityError>; 120 + ``` 121 + 122 + Behaviour: 123 + - Only valid for `did.method() == DidMethod::Plc`. Passing a `did:web` returns `Err(IdentityError::UnsupportedDidMethod(..))` in every build profile; a debug-only panic would abort the `plc_history_unsupported_method_errors` unit test, which runs in a debug build and asserts on this error. The caller is still expected to check the method first. 124 + - `GET https://plc.directory/{did}/log/audit` → expect 200, JSON array of operations. 125 + - For each operation (newest-first in the wire order), read `operation.verificationMethods[fragment]` if present; if it is a multikey string, construct a `PlcHistoricKey`. Operations with `nullified == true` are **included** in the list for Phase 6, with `nullified = true` set on the resulting entry, so the crypto stage can choose whether to trust them. 126 + - Return the list in wire order (newest-first).
- Transport failure → `Err(IdentityError::HttpTransport(..))`.
- Decode failure → `Err(IdentityError::DidDocumentDecodeFailed { .. })` (reused variant, with `source_name = "plc audit log"`).

**Unit tests:**
- `plc_history_parses_rotation_fixture` — feed `plc_audit_log_with_rotation.json` via `FakeHttpClient`, assert two historic keys returned with expected multikey strings and `nullified == false`.
- `plc_history_unsupported_method_errors` — pass `did:web:example.com`, assert `UnsupportedDidMethod`.
- `plc_history_transport_error_propagates` — `FakeHttpClient` returns transport error, assert `HttpTransport` bubbles up.

**Verification:**

Run: `cargo test`
Expected: pass.

**Commit:**
```bash
git add src/common/identity.rs tests/fixtures/identity/plc_audit_log_with_rotation.json
git commit -m "Add plc_history_for_fragment helper to common::identity"
```
<!-- END_TASK_2 -->

<!-- START_SUBCOMPONENT_A (tasks 3-5) -->
<!-- START_TASK_3 -->
### Task 3: DRISL-CBOR canonicalizer

**Verifies:** AC5.1, AC5.2, AC5.4, AC5.5, AC5.6 (prerequisite: canonical encoder must produce byte-identical output to what the labeler signed).

**Files:**
- Create: `src/commands/test/labeler/crypto.rs` (canonicalizer only — stage body arrives in Task 4)
- Modify: `src/commands/test/labeler.rs` — `pub mod crypto;`.
- Create: `tests/fixtures/labeler/crypto/reference_label.json` — a real labeler's label as JSON (captured from `moderation.bsky.app` or similar).
- Create: `tests/fixtures/labeler/crypto/reference_label.cbor` — the same label's canonical DRISL-CBOR bytes, captured directly from the atproto reference encoder output at fixture-capture time. Committed as a test vector so the canonicalizer's byte output is regression-checked.

**Implementation:**

```text
pub struct CanonicalLabel {
    pub prehash: [u8; 32],          // SHA-256 of canonical bytes (sig stripped)
    pub canonical_bytes: Vec<u8>,   // the DRISL-CBOR bytes that were hashed
    pub signature_bytes: Vec<u8>,   // raw (r || s) bytes from the `sig` field
}

pub enum CanonicalizeError {
    InvalidLabelCbor { cause: String },
    FloatRejected,
    IndefiniteLengthRejected,
    MissingSigField,
    SigFieldWrongType,
    SigFieldWrongLength,
}

pub fn canonicalize_label_for_signing(
    label: &atrium_api::com::atproto::label::defs::Label,
) -> Result<CanonicalLabel, CanonicalizeError>;
```

Logic:
1. Serialize the `Label` directly to a `ciborium::value::Value` via `ciborium::value::Value::serialized(&label)`. This uses atrium's derived serde impl and yields a `Value` tree in one step — no byte round-trip. `atrium_api::com::atproto::label::defs::Label` contains no float-typed fields (all numeric fields are integers), so the produced tree is expected to be free of `Value::Float` in practice; Phase 6 asserts this invariant with a runtime check (step 2).
2. Walk the `Value::Map` tree:
   - Reject any `Value::Float(_)` → `FloatRejected`. (Defensive: should never fire for well-formed `Label` values, but catches a future schema regression immediately.)
   - For each map, sort entries by their canonical CBOR-encoded key bytes (not by the raw string — RFC 8949 deterministic ordering sorts by the byte serialisation of keys).
   - Extract and remove the `"sig"` entry from the top-level map. If the entry is absent, return `MissingSigField`. Its value must be a `Value::Bytes(Vec<u8>)` — if anything else, return `SigFieldWrongType`. Length must be 64 bytes — else `SigFieldWrongLength`.
3. Re-serialize the stripped, sorted tree to canonical CBOR bytes with `ciborium::ser::into_writer`.
   Since the tree is already canonical (sorted maps, no floats, no indefinite lengths) and ciborium uses minimum-length integer encoding for integer-typed values, the output matches DRISL-CBOR rules for the label record schema. The `reference_label.cbor` golden fixture (loaded by `canonicalize_reference_label_golden`) is the regression check that guards against any ciborium/DRISL drift — if that test ever fails, the canonicalizer has diverged from what the real labeler signed and must be replaced with a hand-rolled serializer before the crypto stage can be trusted. This is called out explicitly in the source design's "Additional Considerations" as the load-bearing risk of the phase.
4. Compute `prehash = Sha256::digest(&canonical_bytes).into()`.
5. Return `CanonicalLabel { prehash, canonical_bytes, signature_bytes }`.

**Tests (in-file):**
- `canonicalize_reference_label_golden` — loads `reference_label.json`, deserialises to `Label`, canonicalizes, asserts `canonical_bytes == read(reference_label.cbor)`. **This is the single most important test in Phase 6.** If the canonicalizer drifts, this test catches it before the user sees a false `Fail` on signature verification.
- `canonicalize_rejects_nan_float` — constructs a label (via `ciborium::Value` directly, bypassing `Label`'s schema) containing a float, asserts `FloatRejected`.
- `canonicalize_missing_sig_errors` — asserts `MissingSigField`.
- `canonicalize_sig_wrong_length_errors` — 32-byte sig, asserts `SigFieldWrongLength`.

**Verification:**

Run: `cargo test`
Expected: pass.
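
The key-ordering rule in step 2 of the canonicalizer logic can be sketched with the standard library alone. This is a minimal illustration, not the planned implementation: `encode_text_key` and `sort_keys_canonically` are hypothetical names, and the sketch assumes every map key is a CBOR text string (major type 3), which is how the label's field names would be encoded.

```rust
/// Hypothetical helper: encode a CBOR text-string key (major type 3) using
/// the shortest possible length header, as deterministic encoding requires.
fn encode_text_key(key: &str) -> Vec<u8> {
    let len = key.len() as u64;
    let mut out = Vec::with_capacity(key.len() + 9);
    match len {
        // Lengths 0..=23 fit directly in the initial byte.
        0..=23 => out.push(0x60 | len as u8),
        // Longer lengths use a 1-, 2-, 4-, or 8-byte big-endian length field.
        24..=0xff => {
            out.push(0x78);
            out.push(len as u8);
        }
        0x100..=0xffff => {
            out.push(0x79);
            out.extend_from_slice(&(len as u16).to_be_bytes());
        }
        0x1_0000..=0xffff_ffff => {
            out.push(0x7a);
            out.extend_from_slice(&(len as u32).to_be_bytes());
        }
        _ => {
            out.push(0x7b);
            out.extend_from_slice(&len.to_be_bytes());
        }
    }
    out.extend_from_slice(key.as_bytes());
    out
}

/// Hypothetical helper: order keys by the bytewise order of their encoded
/// form rather than by the raw string.
fn sort_keys_canonically(keys: &mut [&str]) {
    keys.sort_by_key(|k| encode_text_key(k));
}
```

Sorting `["aa", "b", "a"]` this way yields `["a", "b", "aa"]`: the length header comes first in the encoded bytes, so shorter keys sort ahead of longer ones, which a plain string sort would not do. In the real canonicalizer the same comparison would apply to the keys of each `Value::Map`.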

**Commit:**
```bash
git add src/commands/test/labeler/crypto.rs src/commands/test/labeler.rs tests/fixtures/labeler/crypto
git commit -m "Add DRISL-CBOR canonicalizer for label signing"
```
<!-- END_TASK_3 -->

<!-- START_TASK_4 -->
### Task 4: Crypto stage `run` — current key then PLC history

**Verifies:** AC5.1, AC5.2, AC5.3, AC5.4, AC5.5, AC5.6, AC5.7 (exercised by Task 5 tests).

**Files:**
- Modify: `src/commands/test/labeler/crypto.rs` — append the stage body.
- Modify: `src/commands/test/labeler/pipeline.rs` — replace the crypto `Skipped` stub with a real call; thread `&http_client` through.

**Implementation:**

```text
pub struct CryptoFacts {
    pub verified_with_current: usize,
    pub verified_with_historic: Vec<HistoricKeyHit>,  // (key_id, label_count)
    pub unverified: usize,
}

pub struct HistoricKeyHit {
    pub key_id: String,
    pub label_count: usize,
}

pub async fn run(
    identity: &IdentityFacts,
    labels: &[Label],
    http: &dyn HttpClient,
) -> CryptoStageOutput;
```

Logic:

1. **Empty labels** → return `CryptoStageOutput { facts: None, results: vec![Skipped("labeler published no labels; nothing to verify")] }`. AC5.3.
2. For each label in `labels`:
   - Call `canonicalize_label_for_signing(label)` and match on the result rather than propagating with `?`. On `Err`, emit a `SpecViolation` `CheckResult` with code `labeler::crypto::canonicalization_failed` containing the canonicalization error detail. AC5.6. Continue to the next label; this label counts as `unverified` for rollup purposes but is reported separately so users can distinguish it from a genuine signature mismatch.
   - Verify the signature against `identity.signing_key` (the current key) using `AnyVerifyingKey::verify_prehash(&prehash, &signature)`. On success, increment `verified_with_current`.
   - On failure, push the label onto a `failed_against_current: Vec<FailedLabel>` buffer and continue.
3. If `failed_against_current.is_empty()`:
   - Emit `CheckResult { id: "crypto::rollup", status: Pass, summary: format!("{} labels verified against current key", verified_with_current) }`. AC5.1.
4. Else if `identity.did.method() == DidMethod::Plc`:
   - Call `plc_history_for_fragment(&identity.did, "atproto_label", http).await`.
   - Transport failure → emit `CheckResult { id: "crypto::plc_history_fetch", status: NetworkError, .. }` and emit one `Fail` per label in `failed_against_current` (no history was consulted, so the failure stands but is explained). AC5.7.
   - Success → iterate historic keys newest-to-oldest:
     - For each historic key, parse via `parse_multikey`. Skip parse failures (corrupt history entries, logged at `tracing::warn!`).
     - For each failed label, try `verify_prehash` against this historic key. On success, remove from the buffer and attribute the hit to the key (accumulate into a `BTreeMap<String, usize>`).
   - After the loop, emit `CheckResult { id: "crypto::rollup", status: Pass, advisories: true }` only if every failed label found a historic-key match. The check is a `Pass` with an _additional_ `Advisory` `CheckResult { id: "crypto::rotated_keys_used", summary: format!("{} label(s) signed by a rotated-out key ({} distinct key id(s))", total, distinct.len()) }` listing the hit key ids. AC5.2.
   - If any label remained unverified, emit `CheckResult { id: "crypto::rollup", status: SpecViolation, .. }` with a diagnostic listing every key id that was tried (current + historic). AC5.5.
5. Else (`did:web`):
   - Emit `CheckResult { id: "crypto::rollup", status: SpecViolation, .. }` with a diagnostic stating "did:web provides no rotation history" and listing the current key id. AC5.4.
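
The current-key-then-history flow in steps 2 to 4 can be sketched independently of any real cryptography. The following is a hedged illustration only: `Attribution`, `attribute`, and the `verify` closure are hypothetical stand-ins for `CryptoFacts`, the stage body, and `verify_prehash`, and keys are reduced to plain string ids.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the counts the stage would record.
#[derive(Debug, Default, PartialEq)]
struct Attribution {
    verified_with_current: usize,
    verified_with_historic: BTreeMap<String, usize>,
    unverified: usize,
}

/// Try the current key first; retry only the failures against each
/// historic key in the order given (newest first in the plan).
fn attribute<L>(
    labels: &[L],
    current_key: &str,
    historic_keys: &[&str],
    verify: impl Fn(&str, &L) -> bool,
) -> Attribution {
    let mut out = Attribution::default();
    let mut failed: Vec<&L> = Vec::new();

    for label in labels {
        if verify(current_key, label) {
            out.verified_with_current += 1;
        } else {
            failed.push(label);
        }
    }

    for &key in historic_keys {
        if failed.is_empty() {
            break;
        }
        let before = failed.len();
        // Labels that verify against this key leave the failure buffer.
        failed.retain(|label| !verify(key, *label));
        let hits = before - failed.len();
        if hits > 0 {
            *out.verified_with_historic.entry(key.to_string()).or_insert(0) += hits;
        }
    }

    out.unverified = failed.len();
    out
}
```

With this shape, the rollup decision reduces to inspecting the result: no historic hits and nothing unverified corresponds to AC5.1, historic hits with nothing unverified to AC5.2, and a non-zero `unverified` count to AC5.5.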

**CheckResult shape guarantees:**

- Exactly one `crypto::rollup` `CheckResult` per run (or zero if AC5.3 empty path).
- Zero or more `crypto::canonicalization_failed` results per run, one for each label that failed to canonicalise. These are severity `SpecViolation` (they are not network errors; the canonicalizer produced a deterministic failure).
- Zero or one `crypto::plc_history_fetch` `NetworkError` per run.
- Zero or one `crypto::rotated_keys_used` `Advisory` per run.

**Pipeline driver wiring:**

Replace the crypto `Skipped` stub. The crypto stage runs only when `identity_facts.is_some()` AND `http_facts.is_some()` — otherwise emit `Skipped("blocked by upstream stage failures")` with the appropriate reason. If HTTP facts exist but `first_page.is_empty()`, pass the empty slice through so the stage itself emits the AC5.3 Skipped result with its canonical reason string.

**Verification:**

Run: `cargo check` and `cargo build`
Expected: compiles.

**Commit:**
```bash
git add src/commands/test/labeler/crypto.rs src/commands/test/labeler/pipeline.rs
git commit -m "Implement crypto stage with PLC key history fallback"
```
<!-- END_TASK_4 -->

<!-- START_TASK_5 -->
### Task 5: End-to-end snapshot tests for AC5 and AC6

**Verifies:** AC5.1 through AC5.7 and AC6.1 through AC6.8.

**Files:**
- Create: `tests/fixtures/labeler/endtoend/` — one subdirectory per end-to-end case. Each contains the full set of fixtures the four stages need: DID doc, labeler record, queryLabels responses (first + second page), subscribeLabels frame fixtures, and optionally a PLC audit log.
  - `all_pass/` — fully healthy labeler with real-labeler-derived fixtures. AC5.1 and AC6.1.
  - `identity_only_failure/` — DID doc missing the labeler signing key; all later stages skipped. AC6.2.
  - `http_decode_failure/` — malformed `queryLabels` response. AC6.2.
  - `subscription_decode_error/` — frame fixture with one malformed frame. AC6.2.
  - `current_key_fail_history_pass/` — did:plc labeler whose current key does not verify any labels but whose prior historic key does. AC5.2.
  - `current_key_fail_history_also_fail/` — did:plc labeler whose labels verify against neither current nor any historic key. AC5.5.
  - `did_web_current_key_fail/` — did:web labeler whose labels do not verify against current key. AC5.4.
  - `canonicalization_error/` — a label whose CBOR breaks the canonicalizer (e.g., contains a float — hand-authored fixture). AC5.6.
  - `plc_directory_unreachable/` — same as `current_key_fail_history_pass/` but the PLC audit log fetch returns transport error. AC5.7.
  - `empty_labeler/` — `queryLabels` returns `[]`; subscription fixture also empty. AC5.3, AC6.4.
- Create: `tests/labeler_endtoend.rs` — top-level snapshot tests that assemble `FakeHttpClient`, `FakeDnsResolver`, `FakeWebSocketClient` from each fixture directory, run the full pipeline, render the report, and assert against `insta::assert_snapshot!`. Also asserts the return value of `report.exit_code()` matches the AC's expectation.
- Create: `tests/labeler_cli.rs` — a handful of `std::process::Command`-based tests (using `assert_cmd` or direct `cargo run --`) that cover AC6 items requiring the real binary:
  - `bootstrap_failure_exits_two_on_invalid_target` — AC6.8.
  - `no_color_env_var_renders_without_ansi` — AC6.6. Set `NO_COLOR=1`; pipe output to a `Vec<u8>`; assert no `\x1b[` escape sequences appear and that the glyphs `[OK]`, `[FAIL]`, etc. are still present.
  - `verbose_flag_emits_debug_tracing_to_stderr` — AC6.7. Run with `--verbose`, assert stderr contains one or more `DEBUG atproto_devtool::identity` lines. (Uses a fake backend via an env variable, or targets a known-fast `did:plc` test fixture.)

**Test list — end-to-end module:**

- `all_pass_exits_zero_and_renders_all_ok` — AC5.1 + AC6.1.
- `identity_only_failure_exits_one_with_severity_breakdown` — AC6.2.
- `http_decode_failure_exits_one` — AC6.2.
- `subscription_decode_error_exits_one` — AC6.2.
- `current_key_fail_history_pass_exits_zero_with_advisory` — AC5.2. Assert severity is `Pass` on rollup, an `Advisory` is present on `crypto::rotated_keys_used`, exit 0.
- `current_key_fail_history_also_fail_exits_one` — AC5.5.
- `did_web_current_key_fail_exits_one` — AC5.4.
- `canonicalization_error_distinct_diagnostic_code` — AC5.6. Assert the `code` on the diagnostic is `labeler::crypto::canonicalization_failed` and NOT `labeler::crypto::signature_mismatch`.
- `plc_directory_unreachable_network_error_no_false_fail` — AC5.7. Assert `crypto::plc_history_fetch` is `NetworkError` and the labels that failed against the current key are reported as `Fail` only if history could not be consulted (i.e., one `Fail` per failed label, plus the `NetworkError`). Exit code is still 1 because of the `Fail`s — this is correct per the AC text "the labels that failed against the current key are reported as Fail only if history could not be consulted".
- `empty_labeler_skipped_crypto_only_advisory_exits_zero` — AC5.3 + AC6.4.
- `exit_code_summary_for_network_only_run_is_zero` — AC6.3. Use a fixture where all stages hit only network errors. Assert exit 0 and summary footer lists "X network errors, 0 spec violations".
- `skipped_reasons_rendered` — AC6.5. Use the `endpoint_only_no_did` fixture from Phase 3 repurposed; assert each `[SKIP]` line includes the reason text.

**Test list — CLI module:**

- `bootstrap_failure_exits_two` — AC6.8.
- `no_color_renders_without_ansi` — AC6.6.
- `verbose_emits_debug_tracing` — AC6.7.

**Verification:**

Run: `cargo insta test --accept` (first run — captures all new snapshots)
Run: `cargo test` (subsequent runs)
Expected: all new tests pass. Total test count across the whole crate is roughly 35–40 at this point.

**Commit:**
```bash
git add tests/labeler_endtoend.rs tests/labeler_cli.rs tests/fixtures/labeler/endtoend tests/snapshots
git commit -m "Add end-to-end tests covering AC5 and AC6"
```
<!-- END_TASK_5 -->
<!-- END_SUBCOMPONENT_A -->

<!-- START_TASK_6 -->
### Task 6: README and final polish

**Verifies:** no additional AC (documentation task).

**Files:**
- Create: `README.md` — install instructions (`cargo install --path .`), the `--help` excerpt produced by `cargo run -- test labeler --help`, and three example invocations:
  - Handle target: `atproto-devtool test labeler moderation.bsky.app`
  - DID target: `atproto-devtool test labeler did:plc:ar7c4by46qjdydhdevvrndac`
  - Endpoint + DID: `atproto-devtool test labeler https://mod.bsky.app --did did:plc:ar7c4by46qjdydhdevvrndac`
- Modify: `src/commands/test/labeler.rs` — audit `tracing::debug!` coverage across the four stages. Ensure HTTP requests, WebSocket frame counts, and PLC log fetches all emit `tracing::debug!` lines so `--verbose` surfaces them (AC6.7). The identity stage already has trace coverage from Phase 2; the HTTP stage should add `debug!` at request start/end; the subscription stage should add `debug!` at connect/disconnect and per decoded frame (at `trace!` level for frames, `debug!` for connect/disconnect); the crypto stage should `debug!` current-key success and each historic-key fetch attempt.

**Verification:**

Run: `cargo run -- test labeler --help > /tmp/help.txt` then copy relevant excerpt into `README.md`.
Run: `cargo run -- test labeler --verbose https://mod.bsky.app 2> /tmp/stderr.txt` (online). Expected: `/tmp/stderr.txt` contains `DEBUG` level tracing lines.

**Commit:**
```bash
git add README.md src/commands/test/labeler
git commit -m "Add README and audit --verbose tracing coverage"
```
<!-- END_TASK_6 -->

<!-- START_TASK_7 -->
### Task 7: Final phase verification

**Steps:**
1. `cargo fmt --check` — clean.
2. `cargo clippy -- -D warnings` — clean.
3. `cargo test` — all tests pass (Phase 1 through Phase 6, roughly 35–40 total).
4. `cargo run -- test labeler --help` — renders with all AC1.6-required flags.
5. (Online) `cargo run -- test labeler moderation.bsky.app` — all four stages `[OK]`, exit 0 (AC6.1, AC5.1).
6. (Online) `cargo run -- test labeler https://mod.bsky.app` — endpoint-only mode skips identity/crypto, runs HTTP/subscription, exits 0 (AC1.3).
7. `NO_COLOR=1 cargo run -- test labeler moderation.bsky.app | cat` — stdout contains `[OK]`/`[FAIL]`/`[SKIP]`/`[WARN]` glyphs and no `\x1b[` sequences (AC6.6).

**Done when:** steps 1–4 and 7 pass. Steps 5 and 6 are smoke checks against a real labeler and are allowed to vary from run to run (the real labeler may have temporary incidents); they are not `cargo test` gating.
<!-- END_TASK_7 -->

---

## Phase 6 done-when checklist

- `src/commands/test/labeler/crypto.rs` exists with the canonicalizer and the stage body.
- `src/common/identity.rs::plc_history_for_fragment` exists and is tested.
- Pipeline driver wires the crypto stage between subscription and report finalization.
- `README.md` present with install instructions and examples.
- Every AC from the design is covered by at least one automated test and the automated test suite is green.
- `cargo run -- test labeler moderation.bsky.app` — running the binary against a real healthy labeler produces an all-`[OK]` report and exits 0.
- Running the same binary against a fixture labeler with a rotated-out signing key produces an `Advisory` rollup and still exits 0.
- Running the same binary against a fixture labeler with an actual signature mismatch produces a `Fail` and exits 1.
- `cargo clippy -- -D warnings` and `cargo fmt --check` clean.
- Commit history shows the six atomic commits from this phase.
+147
docs/implementation-plans/2026-04-13-test-labeler/test-requirements.md
# Test requirements: `atproto-devtool test labeler`

This document maps every acceptance criterion from the design plan to the concrete test(s) that verify it, names the phase in which the test is authored, and lists the fixtures and verification approach.

**Design plan:** `docs/design-plans/2026-04-13-test-labeler.md`
**Implementation plan:** `docs/implementation-plans/2026-04-13-test-labeler/`

---

## Conventions

- **Unit tests** live in the same file as the code they test (`#[cfg(test)] mod tests { ... }`).
- **Integration tests** live in `tests/*.rs` and link against `src/lib.rs`. Naming follows the stage: `labeler_identity.rs`, `labeler_http.rs`, `labeler_subscription.rs`, `labeler_crypto.rs`, `labeler_end_to_end.rs`.
- **Snapshot tests** use `insta` with `cargo insta review`/`cargo insta test --accept`. Snapshots live under `tests/snapshots/`.
- **CLI-level tests** drive the compiled binary via `assert_cmd::Command::cargo_bin("atproto-devtool")`.
- **Subscription tests** use `#[tokio::test(flavor = "current_thread", start_paused = true)]` so `tokio::time::advance` drives both the fake's inter-frame sleeps and the stage's budget timer deterministically.
- **Identity-stage report rendering** builds a local `miette::GraphicalReportHandler` per `LabelerReport::render` call rather than relying on the process-global handler, avoiding parallel-test races.
- **Fixture hygiene:** on-disk fixtures under `tests/fixtures/labeler/` are captured once from real responses (recorded in commit messages), then hand-mutated for failure cases. Each fixture file names the AC it exercises.
- **Do not mock external crates.** Fakes implement the narrow trait(s) the stage depends on (`RawHttpTee`, `WebSocketClient`, `PlcLogFetcher`, `IdentityResolver`) so tests can drive them without touching the network.

---

## AC coverage matrix

Each AC is owned by exactly one verification test unless stated otherwise. Phase column names the phase that writes the test; some diagnostics from earlier phases are verified end-to-end again in Phase 6's `insta` suite.

### test-labeler.AC1 — CLI skeleton and invocation modes

| AC | Test | Phase | Kind | Notes |
|----|------|-------|------|-------|
| AC1.1 | `handle_target_runs_all_stages` | 6 | `insta` end-to-end | Fake backends for all four stages; handle resolver returns a known DID; assert snapshot shows identity + HTTP + subscription + crypto stages with `[OK]` glyphs. |
| AC1.2 | `did_target_runs_all_stages` | 6 | `insta` end-to-end | Same fakes, invoked with bare `did:plc:…` — handle resolver must not be called. |
| AC1.3 | `url_target_skips_identity_and_crypto` | 6 | `insta` end-to-end | Invoked with `https://labeler.example/`; assert identity and crypto stages are `[SKIP]` with the documented reasons, HTTP + subscription run. |
| AC1.4 | `url_with_did_cross_checks_endpoint` | 6 | `insta` end-to-end | Two variants: match (all-pass) and mismatch (`SpecViolation` on endpoint comparison). |
| AC1.5 | `invalid_target_exits_two` | 1 | CLI (`assert_cmd`) | `test labeler not a valid target` → exit code 2, clap error text. |
| AC1.6 | `help_lists_all_flags` | 1 | CLI (`assert_cmd`) | `test labeler --help` stdout contains `--did`, `--subscribe-timeout`, `--verbose`, `--no-color`, `<TARGET>`. |

### test-labeler.AC2 — Identity-layer checks

| AC | Test | Phase | Kind | Notes |
|----|------|-------|------|-------|
| AC2.1 | `valid_did_doc_and_labeler_record_passes` | 3 | Integration | Fakes return fixture DID doc with `#atproto_labeler` service + parseable k256 key, plus fixture labeler record with non-empty `policies.labelValues`. Assert `Outcome::Pass` on both identity checks. |
| AC2.2 | `labeler_record_with_label_values_passes` | 3 | Integration | Covered as the record-half of AC2.1's fixture; asserted independently. |
| AC2.3 | `missing_service_entry_spec_violation` | 3 | `insta` | Fixture DID doc with no `#atproto_labeler`. Snapshot asserts `[FAIL]`, diagnostic `NamedSource` = the DID JSON, `#[label]` span highlights the `service` array. |
| AC2.4 | `missing_signing_key_spec_violation` | 3 | `insta` | Fixture DID doc with service but no labeler verification method. |
| AC2.5 | `invalid_service_endpoint_spec_violation` | 3 | `insta` | Fixture DID doc with `serviceEndpoint: "ftp://bad"`. |
| AC2.6 | `missing_labeler_record_spec_violation` | 3 | Integration | Fake PDS returns 404; assert distinct `SpecViolation` code from transport failure. |
| AC2.7 | `empty_label_values_spec_violation` | 3 | `insta` | Fixture labeler record with `policies.labelValues: []`; span highlights `policies`. |
| AC2.8 | `dns_failure_is_network_error` | 3 | Integration | Fake identity resolver returns a DNS-style error; assert `NetworkError` and that run exit code is `0` on its own. |

### test-labeler.AC3 — HTTP-layer checks

| AC | Test | Phase | Kind | Notes |
|----|------|-------|------|-------|
| AC3.1 | `well_formed_query_labels_decodes` | 4 | Integration | `FakeRawHttpTee` returns valid JSON; assert typed decode via `atrium_api::com::atproto::label::query_labels::Output` and `Outcome::Pass`. |
| AC3.2 | `cursor_pagination_round_trip_passes` | 4 | Integration | Fake returns distinct page for cursor; assert stage called `query_labels(None)` then `query_labels(Some(&cursor))` and that responses differ. |
| AC3.3 | `empty_labels_produces_advisory` | 4 | Integration | Fake returns `{ "labels": [] }`; assert pass with attached `Advisory` ("labeler has no published labels"). |
| AC3.4 | `malformed_query_labels_spec_violation` | 4 | `insta` | Fake returns JSON missing `labels` field; diagnostic `NamedSource` is the response bytes; snapshot assertion. |
| AC3.5 | `ignored_cursor_spec_violation` | 4 | Integration | Fake returns the same page regardless of cursor; assert `SpecViolation` on pagination check. |
| AC3.6 | `unreachable_endpoint_network_error` | 4 | Integration | Fake returns transport error; assert `NetworkError` and that schema check is `Skipped`, not `Fail`. |

### test-labeler.AC4 — Subscription-layer checks

| AC | Test | Phase | Kind | Notes |
|----|------|-------|------|-------|
| AC4.1 | `backfill_completes_within_budget_passes` | 5 | `insta` | Fake script: 3 frames + 600ms idle; `tokio::time::advance` drives the clock. |
| AC4.2 | `backfill_exceeds_budget_triggers_live_tail` | 5 | `insta` | Two-script fake: budget-exceeding first connection, clean live-tail second. |
| AC4.3 | `empty_stream_advisories` | 5 | `insta` | No frames at all; assert backfill `[WARN]` and live-tail `[SKIP]` with matching reason strings, exit code `0`. |
| AC4.4 | `malformed_frame_emits_spec_violation` | 5 | `insta` | 1 valid + 1 malformed frame; assert `[FAIL] subscription::frame_decode` with source bytes = the malformed fixture. |
| AC4.5 | `error_frame_malformed_payload_spec_violation` | 5 | `insta` | First frame is `op: -1` with malformed `#info` payload. |
| AC4.6 | `unreachable_endpoint_network_error` (sub) | 5 | Integration | Fake returns transport error on connect; assert `NetworkError`, summary calls out one network error, exit `0`. |
| AC4.7 | `subscribe_timeout_below_floor_rejected` | 5 (helper) + 1 (CLI) | Unit + CLI | Unit test on `parse_subscribe_timeout` asserts `"500ms"` → `Err` containing `"at least 1 second"`; `"0"` → any `Err`. Optional `assert_cmd` end-to-end with `500ms` asserts exit 2 and `"at least 1 second"` substring. |

### test-labeler.AC5 — Crypto-layer checks

| AC | Test | Phase | Kind | Notes |
|----|------|-------|------|-------|
| AC5.1 | `current_key_verifies_all_labels_passes` | 6 | Integration | Fake `PlcLogFetcher` not called; assert stage passes without fetching history. |
| AC5.2 | `rotated_out_key_verifies_via_plc_history` | 6 | Integration | Hand-crafted fixture: label signed by a historic key; fake PLC log returns the history; assert `Pass` + `Advisory` listing key ids. |
| AC5.3 | `empty_labeler_skips_crypto_stage` | 6 | Integration | HttpFacts has empty `first_page`; assert `Skipped("labeler published no labels; nothing to verify")` and that it does not affect exit code. |
| AC5.4 | `did_web_current_key_mismatch_fails` | 6 | `insta` | Fixture did:web labeler with bad signature; assert `Fail` rollup diagnostic listing the current key id and the "no rotation history" message. |
| AC5.5 | `did_plc_current_and_historic_mismatch_fails` | 6 | `insta` | did:plc labeler, none of {current, historic} verify; assert diagnostic lists every key id tried. |
| AC5.6 | `canonicalization_failure_distinct_from_sig_fail` | 6 | Integration | Fixture label with invalid CBOR in the record field; assert per-label `Fail` with a diagnostic code distinct from the signature-mismatch code. |
| AC5.7 | `plc_log_fetch_failure_is_network_error` | 6 | Integration | Fake PLC log returns transport error; assert `NetworkError` and that labels failing against the current key are reported as `Fail` (not cascade-hidden). |

**Plus** the canonicalizer regression fixture — `tests/fixtures/labeler/crypto/reference_label.cbor` — is a byte-exact golden. `canonicalize_label_for_signing` tests assert the output equals the golden file; drift in canonicalization breaks this test. Written in Phase 6, subcomponent A.
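
For AC4.7 in the subscription table above, the floor check on `parse_subscribe_timeout` might look like the following. This is a minimal sketch under stated assumptions: the accepted input grammar (an integer followed by `s` or `ms`) and the exact error wording beyond the `"at least 1 second"` substring are hypothetical, since the document only fixes the two behaviours the unit test asserts.

```rust
use std::time::Duration;

/// Sketch of the AC4.7 helper. The `<n>s` / `<n>ms` grammar is an assumption.
fn parse_subscribe_timeout(raw: &str) -> Result<Duration, String> {
    let raw = raw.trim();
    // Check `ms` before `s`, because every `ms` suffix also ends in `s`.
    let (digits, unit_ms) = if let Some(n) = raw.strip_suffix("ms") {
        (n, 1u64)
    } else if let Some(n) = raw.strip_suffix('s') {
        (n, 1_000u64)
    } else {
        return Err(format!("missing unit in `{raw}`; expected e.g. `5s` or `1500ms`"));
    };
    let value: u64 = digits
        .parse()
        .map_err(|e| format!("invalid number in `{raw}`: {e}"))?;
    let millis = value
        .checked_mul(unit_ms)
        .ok_or_else(|| format!("timeout `{raw}` is too large"))?;
    // Enforce the floor that AC4.7 tests for.
    if millis < 1_000 {
        return Err(format!("subscribe timeout must be at least 1 second, got `{raw}`"));
    }
    Ok(Duration::from_millis(millis))
}
```

Under this grammar `"500ms"` is rejected by the floor check and `"0"` is rejected for lacking a unit, which matches the two cases the unit test names; a function of this shape could be passed to clap as a value parser so the CLI-level test sees the same message.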

### test-labeler.AC6 — Cross-cutting reporting and exit semantics

| AC | Test | Phase | Kind | Notes |
|----|------|-------|------|-------|
| AC6.1 | `all_pass_run_exits_zero_with_ok_glyphs` | 6 | `insta` end-to-end | Full fake stack; snapshot the rendered report and assert exit code `0`. |
| AC6.2 | `spec_violation_run_exits_one` | 6 | `insta` end-to-end | Run with a fixture producing one `SpecViolation` in each stage; assert footer severity breakdown and exit `1`. |
| AC6.3 | `network_error_only_run_exits_zero` | 6 | `insta` end-to-end | All stages produce only `NetworkError`; assert separate network-error count and exit `0`. |
| AC6.4 | `advisory_only_run_exits_zero` | 6 | `insta` end-to-end | Only advisories; exit `0`. |
| AC6.5 | `skipped_checks_render_reason_strings` | 6 | `insta` end-to-end | Verified alongside AC1.3's URL-mode snapshot. |
| AC6.6 | `no_color_suppresses_ansi_keeps_glyphs` | 6 | CLI (`assert_cmd`) | Run with `NO_COLOR=1`; assert stdout contains `[OK]`/`[FAIL]` glyphs but no ESC sequences. |
| AC6.7 | `verbose_flag_emits_debug_tracing_to_stderr` | 6 | CLI (`assert_cmd`) | Run with `--verbose`; assert stderr contains at least one `DEBUG atproto_devtool::` line and rendered report on stdout is unchanged from the non-verbose snapshot. |
| AC6.8 | `bootstrap_failure_exits_two` | 6 | CLI (`assert_cmd`) | Run with an unparseable CLI arg; assert exit code `2`, distinct from `1`. Pairs with AC1.5. |

---

## Fixtures

All fixtures live under `tests/fixtures/labeler/<stage>/`. Each fixture file begins with a header comment naming the AC(s) it supports.

- **identity/**: `valid_did_doc.json`, `missing_service.json`, `missing_signing_key.json`, `invalid_service_endpoint.json`, `empty_label_values.json`, `valid_labeler_record.json`.
- **http/**: `first_page.json`, `second_page.json`, `empty_page.json`, `missing_labels_field.json`, `ignores_cursor_page.json`.
- **subscription/**: `frame_info_ok.bin`, `frame_labels_ok.bin`, `frame_malformed_cbor.bin`, `frame_error_malformed_info.bin`. Each `.bin` is the raw WebSocket payload.
- **crypto/**: `reference_label.cbor` (canonicalizer golden), `label_signed_current.json`, `label_signed_historic.json`, `label_invalid_cbor.json`, `plc_log_with_rotation.json`, `did_web_doc_no_history.json`.
- **end-to-end/**: composed from the above; no unique fixture files.

Capture procedure (documented in Phase 2 for identity, Phase 4 for HTTP, Phase 5 for subscription, Phase 6 for crypto):

1. Run the stage against a known-good production labeler, save raw bytes.
2. Hand-mutate into failure variants with a single minimal edit per variant.
3. Commit with a message naming the source and the mutation.

---

## Verification gates

Before the plan is considered complete, ALL of the following must pass from a clean worktree:

1. `cargo fmt --check` — exit 0, no output.
2. `cargo clippy -- -D warnings` — clean.
3. `cargo build` — succeeds.
4. `cargo test` — all tests pass (unit + integration).
5. `cargo insta test` — no pending snapshot changes.
6. Every AC in the table above resolves to at least one passing test under (4) or (5).

The end-to-end Phase 6 `insta` suite is the final gate: if any cross-cutting AC (AC6.x) regresses but phase-local tests still pass, the report-rendering or exit-code wiring has drifted and must be fixed before merge.

---

## Manual verification

A small set of ACs are difficult to exercise without real network IO. These are verified manually by running the compiled binary against known production labelers and recording the output in the PR description.
None of these are required for CI green, but they are required before merging to `main`:

- **AC1.1–AC1.4:** Run `cargo run -- test labeler moderation.bsky.app` and variants (handle, DID, URL, URL + `--did`). Confirm each stage produces output consistent with the fake-backed snapshots.
- **AC2.8:** Run against an intentionally-bad handle (`cargo run -- test labeler does-not-resolve.invalid`) and confirm `NetworkError` classification and exit code `0`.
- **AC3.6:** Run against an offline endpoint (`cargo run -- test labeler https://127.0.0.1:1/`) and confirm HTTP-stage `NetworkError`.
- **AC4.6:** As AC3.6 but confirming the subscription-stage classification.
- **AC5.2:** Run against `moderation.bsky.app` if/when it has rotated signing keys historically. If PLC log contains no rotation history for the labeler, this path is exercised only via the unit test with a crafted fixture.
- **AC6.7:** Run `cargo run -- test labeler --verbose moderation.bsky.app 2>/tmp/stderr.txt` and confirm `/tmp/stderr.txt` contains `DEBUG` lines mentioning `atproto_devtool::identity`, `::http`, `::subscription`, and `::crypto`.

Record these manual checks in the PR description with the exact commands run and the output observed.
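
The exit-code semantics that AC6.1 to AC6.4 describe can be summarised in a few lines. The sketch below is illustrative only: `Status`, `Summary`, `summarize`, and `exit_code` are hypothetical names rather than the crate's actual types, and the bootstrap-failure exit code `2` (AC6.8, AC1.5) is assumed to be produced before any report exists, so it is not modelled here.

```rust
/// Hypothetical per-check outcome, mirroring the severities named in the plan.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Pass,
    Advisory,
    Skipped,
    NetworkError,
    SpecViolation,
}

/// Hypothetical footer counts for the rendered report.
#[derive(Debug, Default, PartialEq, Eq)]
struct Summary {
    spec_violations: usize,
    network_errors: usize,
    advisories: usize,
    skipped: usize,
}

fn summarize(results: &[Status]) -> Summary {
    let mut summary = Summary::default();
    for status in results {
        match status {
            Status::SpecViolation => summary.spec_violations += 1,
            Status::NetworkError => summary.network_errors += 1,
            Status::Advisory => summary.advisories += 1,
            Status::Skipped => summary.skipped += 1,
            Status::Pass => {}
        }
    }
    summary
}

/// Only spec violations fail the run; network errors, advisories, and
/// skipped checks are reported in the footer but leave the exit code at 0.
fn exit_code(summary: &Summary) -> i32 {
    if summary.spec_violations > 0 {
        1
    } else {
        0
    }
}
```

Keeping the count and the exit decision as separate functions means the snapshot tests can assert the footer breakdown and the exit code independently, which is what the AC6.2 and AC6.3 rows call for.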