CLI app for developers prototyping atproto functionality
1
fork

Configure Feed

Select the types of activity you want to include in your feed.

docs: add test-labeler design plan

Design plan for the first atproto-devtool feature, `test labeler`,
which runs a layered conformance suite (identity, HTTP, subscription,
crypto) against a labeler identified by handle, DID, or raw endpoint.
Establishes the crate's subcommand-module pattern for future features.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

+398
+398
docs/design-plans/2026-04-13-test-labeler.md
··· 1 + # `atproto-devtool test labeler` design 2 + 3 + ## Summary 4 + 5 + The implementation is organized as a **layered four-stage pipeline** (identity → HTTP → subscription → crypto) driving a single `LabelerReport`. Each stage is an async function with a typed input and a typed output (`Outcome<T>`), so the Rust type system enforces the dependency graph at compile time: the crypto stage literally cannot be called without the `IdentityFacts` and label slice produced by earlier stages. The pipeline driver is non-fail-fast — it runs every stage whose inputs are available regardless of prior failures — so a single invocation always produces a complete diagnostic picture rather than stopping at the first problem. 6 + 7 + Diagnostics are first-class throughout. Every non-passing result carries a `miette::Diagnostic` whose `NamedSource` attaches the offending raw bytes (JSON document, CBOR frame, serialized record) and whose `#[label]` spans highlight the exact field that caused the violation — making the tool useful for debugging rather than merely detecting. The subscription stage uses a two-connection strategy with a configurable time budget, inferring backfill completion from an idle gap heuristic. The crypto stage is designed around the reality of key rotation: it verifies label signatures against the currently declared key first, then falls back lazily to the PLC operation audit log to accept labels signed by a legitimately rotated-out key, attaching an `Advisory` rather than a `Fail` in that case. Stages are coded against narrow async traits (`DidResolver`, `HttpClient`, `WebSocketClient`, `PlcLogFetcher`) so unit tests can replay recorded fixtures from real labelers without any network access; rendered output is locked down with `insta` snapshots to catch diagnostic drift automatically. 8 + 9 + ## Definition of Done 10 + 11 + **Primary deliverable:** A Rust binary `atproto-devtool` with a modular subcommand architecture (single crate, clap-derived, async/tokio), designed so new feature modules can be added with minimal ceremony. 12 + 13 + **First feature:** `atproto-devtool test labeler <handle | endpoint> [--did <did>]` that runs a conformance suite across four layers: 14 + 15 + - **Identity:** DID document contains an `atproto_labeler` service entry of type `AtprotoLabeler` and a `#atproto_label` signing key; the labeler's PDS holds a valid `app.bsky.labeler.service/self` record with policies. 16 + - **HTTP:** `com.atproto.label.queryLabels` responds with well-formed labels and handles pagination/schema correctly. 17 + - **Subscription:** `com.atproto.label.subscribeLabels` WebSocket firehose emits valid frames and supports `cursor=0` backfill. 18 + - **Crypto:** Label signatures verify against the declared `#atproto_label` key. 19 + 20 + Skipped checks are clearly reported when inputs don't unlock them (e.g., endpoint-only invocation without `--did` skips identity/crypto checks). 21 + 22 + **Output:** Rich human-readable diagnostics via `miette`; non-zero exit on any check failure. 23 + 24 + **Out of scope (initial):** JSON output, Ozone (`tools.ozone.*`) moderation endpoints, authenticated check flows, additional subcommands beyond `test labeler`. 25 + 26 + ## Acceptance Criteria 27 + 28 + ### test-labeler.AC1: CLI skeleton and invocation modes 29 + 30 + - **test-labeler.AC1.1 Success:** `atproto-devtool test labeler <handle>` accepts an atproto handle, resolves it to a DID, and runs all four stages. 31 + - **test-labeler.AC1.2 Success:** `atproto-devtool test labeler <did>` accepts a DID directly (did:plc or did:web) and runs all four stages without a prior handle resolution. 32 + - **test-labeler.AC1.3 Success:** `atproto-devtool test labeler <https-url>` accepts a raw endpoint URL and runs the HTTP and subscription stages while marking identity and crypto checks `Skipped` with a clear reason. 33 + - **test-labeler.AC1.4 Success:** `atproto-devtool test labeler <https-url> --did <did>` unlocks identity checks (and therefore the crypto stage) against an endpoint URL, and also cross-checks that the DID's declared labeler service endpoint matches the provided URL — reporting a `SpecViolation` on mismatch. 34 + - **test-labeler.AC1.5 Failure:** A target argument that is neither a valid handle, DID, nor URL produces a clap parsing error with a helpful message and exit code `2`. 35 + - **test-labeler.AC1.6 Edge:** `atproto-devtool test labeler --help` renders the subcommand's help text including all stage-related flags (`--subscribe-timeout`, `--did`, `--verbose`, `--no-color`). 36 + 37 + ### test-labeler.AC2: Identity-layer checks 38 + 39 + - **test-labeler.AC2.1 Success:** A DID document containing a `#atproto_labeler` service entry of type `AtprotoLabeler` with a valid `serviceEndpoint`, plus a signing-key verification method parseable as k256 or p256, passes all identity checks. 40 + - **test-labeler.AC2.2 Success:** A `app.bsky.labeler.service/self` record with a non-empty `policies.labelValues` list passes the labeler-record checks. 41 + - **test-labeler.AC2.3 Failure:** A DID document missing the `atproto_labeler` service entry produces a `SpecViolation` `CheckResult` whose diagnostic's `NamedSource` is the DID-document JSON and whose `#[label]` span highlights the `service` array. 42 + - **test-labeler.AC2.4 Failure:** A DID document missing the labeler signing key produces a `SpecViolation` whose diagnostic highlights the `verificationMethod` array. 43 + - **test-labeler.AC2.5 Failure:** A labeler `serviceEndpoint` that is not a valid HTTPS URL produces a `SpecViolation` with the offending endpoint value highlighted in the DID doc. 44 + - **test-labeler.AC2.6 Failure:** A missing `app.bsky.labeler.service/self` record (PDS returns 404) produces a `SpecViolation` distinct from a PDS transport failure. 45 + - **test-labeler.AC2.7 Failure:** An `app.bsky.labeler.service/self` record with an empty `policies.labelValues` list produces a `SpecViolation` whose diagnostic highlights the `policies` field in the re-serialized record. 46 + - **test-labeler.AC2.8 NetworkError:** A DNS failure resolving a handle, an unreachable `plc.directory`, or an unreachable PDS produces a `NetworkError` result that is called out separately in the summary and does not by itself fail the run. 47 + 48 + ### test-labeler.AC3: HTTP-layer checks (`queryLabels`) 49 + 50 + - **test-labeler.AC3.1 Success:** A labeler endpoint that responds to `com.atproto.label.queryLabels` with a well-formed lexicon response decodes into typed labels and passes the schema check. 51 + - **test-labeler.AC3.2 Success:** A labeler that honors the `cursor` parameter — returning a distinct page when called with a cursor from the first page — passes the pagination round-trip check. 52 + - **test-labeler.AC3.3 Success:** A labeler that returns an empty labels array passes the schema and pagination checks and contributes a distinct `Advisory` ("labeler has no published labels") to inform downstream stages. 53 + - **test-labeler.AC3.4 Failure:** A `queryLabels` response that omits required fields or otherwise fails lexicon decoding produces a `SpecViolation` whose diagnostic carries the response JSON as its `NamedSource`. 54 + - **test-labeler.AC3.5 Failure:** A labeler that ignores the `cursor` parameter (returns the first page again) produces a `SpecViolation` on the pagination check. 55 + - **test-labeler.AC3.6 NetworkError:** An unreachable or TLS-failing labeler endpoint produces a `NetworkError` that does not cascade into `queryLabels` schema failures. 56 + 57 + ### test-labeler.AC4: Subscription-layer checks (`subscribeLabels`) 58 + 59 + - **test-labeler.AC4.1 Success (backfill completes within budget):** A labeler whose backfill flushes within `--subscribe-timeout` followed by a ≥500ms idle gap produces a `Pass` backfill check and an implicit `Pass` live-tail check ("live tail observed after backfill completed"). 60 + - **test-labeler.AC4.2 Success (backfill exceeds budget):** A labeler whose backfill is still producing frames at the end of the budget produces an `Advisory` backfill check and triggers a second connection for the live-tail check. A clean live-tail connection (any frames decoded, or connection held open with no decode errors) produces a `Pass` live-tail check. 61 + - **test-labeler.AC4.3 Success (empty labeler):** A labeler producing no frames at all during the budget produces an `Advisory` backfill check ("labeler has no published labels") and a `Skipped` live-tail check with the same reason. 62 + - **test-labeler.AC4.4 Failure:** A frame that fails the two-CBOR-block decode, or a `#labels` payload that fails lexicon decoding, produces a `SpecViolation` `CheckResult` independent of the backfill/live-tail pass/fail dimension. The diagnostic carries the offending frame bytes as `NamedSource`. 63 + - **test-labeler.AC4.5 Failure:** A WebSocket handshake that succeeds but whose first frame has `op: -1` (error) with a malformed `#info` payload produces a `SpecViolation` on decode. 64 + - **test-labeler.AC4.6 NetworkError:** An unreachable WebSocket endpoint or a handshake TLS failure produces a `NetworkError` rather than a subscription `Fail`. 65 + - **test-labeler.AC4.7 Edge:** Passing `--subscribe-timeout 0` (or another invalid duration) produces a clap parse error; values below a reasonable floor (e.g., 1s) are rejected with a helpful message. 66 + 67 + ### test-labeler.AC5: Crypto-layer checks (signature verification with key rotation) 68 + 69 + - **test-labeler.AC5.1 Success:** Every label in `HttpFacts::first_page` whose signature verifies against the current declared signing key produces a `Pass` on the crypto rollup with no historic-key fetch performed. 70 + - **test-labeler.AC5.2 Success (rotated-out key, did:plc):** When at least one label fails against the current key, the crypto stage fetches the PLC audit log and retries verification against each historic verification-method entry for the labeler's signing-key slot. Labels that verify against a historic key are accepted, the rollup is `Pass`, and an `Advisory` is attached listing the count and key ids involved. 71 + - **test-labeler.AC5.3 Success (empty labeler):** A labeler with zero published labels results in the crypto stage being `Skipped("labeler published no labels; nothing to verify")` and does not affect the exit code. 72 + - **test-labeler.AC5.4 Failure (current-key mismatch, no history):** A did:web labeler whose labels do not verify against the current key produces a `Fail` rollup with a diagnostic listing the current key id and stating that did:web provides no rotation history. 73 + - **test-labeler.AC5.5 Failure (current and historic mismatch, did:plc):** A did:plc labeler whose labels verify against neither the current nor any historic key produces a `Fail` rollup with a diagnostic listing every key id that was tried. 74 + - **test-labeler.AC5.6 Failure (canonicalization mismatch surfaced):** A label whose serialized-for-signing bytes cannot be produced (e.g., invalid CBOR in the fetched record) produces a per-label `Fail` with a distinct diagnostic code from signature mismatch, so misbehaviour of the canonicalizer is distinguishable from a genuine signature problem. 75 + - **test-labeler.AC5.7 NetworkError:** A failure to fetch the PLC audit log (`plc.directory` unreachable) during the historic-key path produces a `NetworkError` result and prevents the stage from issuing a false `Fail`; the labels that failed against the current key are reported as `Fail` only if history could not be consulted. 76 + 77 + ### test-labeler.AC6: Cross-cutting reporting and exit semantics 78 + 79 + - **test-labeler.AC6.1 Success:** A labeler that passes every stage produces a `LabelerReport` whose rendered output shows each stage with `[OK]` glyphs, a header listing the target and resolved DID/PDS, a summary with all-zero failure/advisory counts, and exits `0`. 80 + - **test-labeler.AC6.2 Success:** A run containing at least one `SpecViolation` exits `1`, and the summary footer shows the count broken down by severity. 81 + - **test-labeler.AC6.3 Success:** A run containing only `NetworkError` results (no `SpecViolation`s) exits `0`, with the network-error count called out separately in the summary. 82 + - **test-labeler.AC6.4 Success:** A run containing only `Advisory` results exits `0`. 83 + - **test-labeler.AC6.5 Success:** `Skipped` checks are rendered with a reason string taken from the stage's `Outcome::Skipped` variant, so users can see why a check was not run (missing DID, upstream failure, empty labeler, not-yet-implemented). 84 + - **test-labeler.AC6.6 Success:** Setting `NO_COLOR=1` suppresses ANSI color codes in the rendered output while keeping ASCII glyphs (`[OK]`/`[FAIL]`/`[SKIP]`/`[WARN]`) and miette diagnostic layout intact. 85 + - **test-labeler.AC6.7 Success:** `--verbose` raises the tracing filter, causing stage IO (HTTP requests, WebSocket frames, PLC log fetch) to be logged at DEBUG to stderr without affecting the rendered report or exit code. 86 + - **test-labeler.AC6.8 Edge:** The tool's own unrecoverable bootstrap failures (invalid CLI args, panics caught by the miette handler, tokio runtime failure) exit `2` — distinct from a `SpecViolation`-driven `1`. 87 + 88 + ## Glossary 89 + 90 + - **atproto (AT Protocol)**: The open, federated social-network protocol developed by Bluesky. It defines the identity, data, and communication standards that this tool validates against. 91 + - **labeler**: An atproto service that publishes moderation labels — structured judgments that clients use to filter or annotate content. A labeler is identified by its DID and exposes two XRPC endpoints for querying and streaming its labels. 92 + - **DID (Decentralized Identifier)**: A W3C standard for self-sovereign identity. In atproto, every account and service is identified by a DID; the DID resolves to a DID document that declares the entity's public keys and service endpoints. 93 + - **did:plc**: An atproto-specific DID method maintained by a centralized-but-auditable directory (`plc.directory`). Key rotations and endpoint changes are recorded as a signed operation log, which this tool consults to accept labels signed by rotated-out keys. 94 + - **did:web**: A DID method that derives identity from a domain's HTTPS server (e.g., `did:web:example.com` resolves via `https://example.com/.well-known/did.json`). Unlike did:plc, did:web provides no rotation history, so historic-key fallback is unavailable. 95 + - **PLC audit log**: The append-only sequence of signed operations at `https://plc.directory/{did}/log/audit` recording every key rotation and service change for a did:plc identity. The crypto stage walks this log newest-to-oldest to find historic signing keys. 96 + - **handle**: A human-readable atproto username (e.g., `alice.bsky.social`). Handles are resolved to DIDs via a DNS TXT record or an HTTPS `.well-known` path before any protocol checks can proceed. 97 + - **PDS (Personal Data Server)**: The server that hosts a user's or labeler's repository of records. The identity stage fetches the `app.bsky.labeler.service/self` record from the labeler's declared PDS endpoint. 98 + - **XRPC**: The HTTP-based RPC layer used by atproto. Procedure names like `com.atproto.label.queryLabels` are lexicon-defined endpoints served under the `/xrpc/` path prefix. 99 + - **`com.atproto.label.queryLabels`**: The XRPC HTTP endpoint that returns a paginated list of labels published by a labeler. The HTTP stage calls this endpoint to fetch labels and verify schema conformance and pagination behavior. 100 + - **`com.atproto.label.subscribeLabels`**: The XRPC WebSocket endpoint that streams label events as they are published, and replays historical labels when called with `cursor=0`. The subscription stage tests both the backfill replay and the live-tail behavior. 101 + - **`app.bsky.labeler.service/self`**: The well-known record in a labeler's PDS repository that declares the labeler's policy configuration, including the set of label values it may apply. Absence of this record or an empty `labelValues` list is a spec violation. 102 + - **`#atproto_labeler` service entry**: The entry in a DID document's `service` array with the fragment `#atproto_labeler` and type `AtprotoLabeler`, declaring the labeler's XRPC endpoint URL. The identity stage checks that this entry is present and well-formed. 103 + - **DRISL-CBOR**: The deterministic CBOR serialization required by atproto for data that is cryptographically signed. It mandates strict map key ordering, minimal integer encoding, and no floats or indefinite-length items. The crypto stage canonicalizes each label to this format before hashing and verifying its signature. 104 + - **multikey**: The atproto encoding for public keys in DID documents — a multibase-prefixed byte string that encodes both the curve identifier and the raw key bytes. This tool decodes multikeys to obtain `k256` (secp256k1) or `p256` (NIST P-256) verifying keys. 105 + - **miette**: A Rust diagnostics library that provides rich, human-readable error reporting with source code spans, labels, and help text. This tool uses it to attach the offending JSON or CBOR bytes to every check failure, so users see exactly which field caused a violation. 106 + - **`NamedSource` / `#[label]` spans**: The miette types that attach a named source document (e.g., a DID document's raw JSON) and highlight specific byte ranges within it in the rendered output. Each `CheckResult` failure uses these to point to the offending field. 107 + - **`insta` snapshot testing**: A Rust testing library that records the text output of a function on first run and then asserts it matches on subsequent runs. Used here to lock down rendered `LabelerReport` output so any change in diagnostic formatting is caught automatically. 108 + - **clap**: The standard Rust CLI argument-parsing library, used in derive mode here to generate the subcommand tree and argument validation declaratively. 109 + 110 + ## Architecture 111 + 112 + ### High-level shape 113 + 114 + `atproto-devtool` is a single-crate async Rust binary. The top-level CLI is a tree of clap-derived subcommands: the root `Command` enum has one initial variant `Test`, which itself is an enum whose initial variant `Labeler` holds the `test labeler` arguments. Adding a new feature is a two-line enum edit plus a new sibling module — the same pattern used by `zcash-devtool`. 115 + 116 + The first feature, `test labeler`, is a **layered pipeline** of four typed stages (identity → HTTP → subscription → crypto). Each stage is an async function whose signature consumes the outputs the stage actually needs, so the type system enforces the dependency graph: crypto verification cannot be called without a resolved signing key and a set of labels to verify. The pipeline driver runs every stage whose inputs are available — no fail-fast — so a single invocation produces a complete diagnostic picture. 117 + 118 + Each stage emits one or more `CheckResult`s tagged with a named check ID, a status (`Pass | Fail | Skipped | Advisory`), and — for non-`Pass` results — a `miette::Diagnostic` with source spans pointing into the JSON/CBOR document that caused the violation. The collected `LabelerReport` is rendered through `miette`'s `GraphicalReportHandler` at the end of the run. 119 + 120 + ### Crate layout 121 + 122 + Sibling-file module pattern throughout (no `mod.rs`): 123 + 124 + ```text 125 + atproto-devtool/ 126 + ├── Cargo.toml 127 + ├── src/ 128 + │ ├── main.rs # tokio bootstrap 129 + │ ├── cli.rs # root clap Parser + error reporting 130 + │ ├── commands.rs # top-level Command enum 131 + │ ├── commands/ 132 + │ │ ├── test.rs # Test subcommand enum { Labeler(..) } 133 + │ │ └── test/ 134 + │ │ ├── labeler.rs # clap Args + pipeline entry point 135 + │ │ └── labeler/ 136 + │ │ ├── pipeline.rs # drives the four stages, assembles Report 137 + │ │ ├── identity.rs # stage 1: DID doc + labeler record 138 + │ │ ├── http.rs # stage 2: queryLabels 139 + │ │ ├── subscription.rs # stage 3: subscribeLabels WS 140 + │ │ ├── crypto.rs # stage 4: signature verification 141 + │ │ └── report.rs # LabelerReport + miette rendering 142 + │ ├── common.rs # cross-feature primitives 143 + │ └── common/ 144 + │ ├── identity.rs # handle→DID, DID doc fetch, service lookup 145 + │ └── diagnostics.rs # miette helpers shared by all features 146 + ``` 147 + 148 + ### Pipeline contracts 149 + 150 + The pipeline operates on two input shapes that reflect the two invocation modes: 151 + 152 + ```rust 153 + pub enum LabelerTarget { 154 + Identified { identifier: AtIdentifier }, // handle or DID 155 + Endpoint { url: Url, did: Option<Did> }, // raw endpoint, optional DID 156 + } 157 + ``` 158 + 159 + Each stage returns an `Outcome<T>`: 160 + 161 + ```rust 162 + pub enum Outcome<T> { 163 + Pass(T), 164 + Fail(Diagnostic), 165 + Skipped(&'static str), 166 + } 167 + ``` 168 + 169 + Stage signatures: 170 + 171 + ```rust 172 + identity::run(&LabelerTarget, &IdentityOptions) 173 + -> Outcome<IdentityFacts>; 174 + 175 + http::run(&Url /* labeler endpoint */, &HttpOptions) 176 + -> Outcome<HttpFacts>; 177 + 178 + subscription::run(&Url, &SubscriptionOptions) 179 + -> Outcome<SubscriptionFacts>; 180 + 181 + crypto::run(&IdentityFacts, &[Label]) 182 + -> Outcome<CryptoFacts>; 183 + 184 + pub struct IdentityFacts { 185 + pub did: Did, 186 + pub did_doc: DidDocument, // carries source bytes for span-tagging 187 + pub labeler_endpoint: Url, 188 + pub pds_endpoint: Url, 189 + pub signing_key: VerifyingKey, // k256 or p256, parsed from multikey 190 + pub service_record: LabelerServiceRecord, 191 + } 192 + 193 + pub struct HttpFacts { 194 + pub first_page: Vec<Label>, // fed into crypto stage 195 + pub pagination_ok: bool, 196 + } 197 + 198 + pub struct SubscriptionFacts { 199 + pub backfill_outcome: BackfillOutcome, 200 + pub live_tail_outcome: Option<LiveTailOutcome>, 201 + pub decode_errors: Vec<FrameDecodeError>, 202 + } 203 + 204 + pub struct CryptoFacts { 205 + pub verified_with_current: usize, 206 + pub verified_with_historic: Vec<HistoricKeyHit>, // (key_id, label_count) 207 + pub unverified: usize, 208 + } 209 + ``` 210 + 211 + The `LabelerReport` aggregator exposes `record_<stage>` methods that collect per-stage `Vec<CheckResult>` values into the final rendered output. 212 + 213 + ### Error classification 214 + 215 + Every `CheckResult` carries one of three severities for exit-code purposes: 216 + 217 + - **`SpecViolation`** — the check ran successfully and the labeler does not conform. Non-zero exit. 218 + - **`NetworkError`** — the check could not run because transport/DNS/TLS failed. Reported but does not fail the run by itself (exit 0); they are called out separately in the summary. 219 + - **`Advisory`** — informational (e.g., empty labeler, rotated-out key used). Does not affect exit code. 220 + 221 + The top-level exit codes are: `0` clean, `1` at least one `SpecViolation`, `2` usage error or unrecoverable bootstrap failure. 222 + 223 + ### Subscription strategy 224 + 225 + The subscription stage runs a sequence of up to two WebSocket connections sharing the same time budget (`--subscribe-timeout`, default 5s each): 226 + 227 + 1. **Backfill check.** Connects with `cursor=0`. During the budget window: 228 + - If an idle gap (≥500ms without new frames) is observed after frames have been flowing, backfill is judged to have completed and the remaining time is implicitly the live tail. The live-tail check is marked `Pass` ("live tail observed after backfill completed") and the stage ends. 229 + - If the budget is exhausted while frames are still arriving, backfill is marked `Advisory` ("backfill exceeded budget — could not confirm full replay") and the live-tail check runs as a second connection. 230 + - If no frames arrive at all, the backfill check is marked `Advisory` ("labeler has no published labels") and the live-tail check is `Skipped` for the same reason. 231 + 2. **Live-tail check** (only if not already satisfied). Opens a fresh connection using the current high-water sequence number (live-only). Runs for the same timeout. Passes if the connection holds open cleanly and any frames that arrive decode correctly; zero frames in the window is acceptable. 232 + 233 + Both checks reuse the same two-CBOR-block frame parser. Decode errors always surface as their own `Fail` diagnostics regardless of which check captured them. 234 + 235 + ### Crypto strategy with key rotation 236 + 237 + Each label from `HttpFacts::first_page` is verified independently: 238 + 239 + 1. Canonicalize the label record to DRISL-CBOR (strict CBOR subset, `sig` field removed), SHA-256 the bytes. 240 + 2. Verify against the **currently declared** labeler signing key (`VerifyingKey` from `IdentityFacts`). If it verifies, the label is accepted. 241 + 3. If verification fails and the DID is `did:plc`, fetch the PLC operation log from `https://plc.directory/{did}/log/audit` **lazily** (only when at least one label fails against the current key). Walk operations newest-to-oldest, extract each historic `verificationMethod` entry for the labeler's signing-key slot, and retry verification. A label that verifies against a historic key is accepted with an `Advisory` attached to the rollup: "N label(s) signed by a rotated-out key — expected for labels older than the most recent rotation; confirm rotations were intentional." 242 + 4. If verification fails against the current key AND all historic keys (or history is unavailable — `did:web` or the PLC log fetch fails), the label is `Fail` with a diagnostic listing which keys were tried. 243 + 244 + Rollup: `CryptoFacts` is `Pass` iff every label verified against some key (current or historic); otherwise `Fail`. Empty-label labelers result in `Skipped("labeler published no labels; nothing to verify")` and do not fail the stage. 245 + 246 + ### Diagnostics and rendering 247 + 248 + Diagnostics use `miette`'s `Diagnostic` derive with `code`, `help`, and — where a source document exists — `source_code` (`NamedSource`) plus `#[label]` spans. Examples of source-carrying diagnostics: 249 + 250 + - `labeler::identity::signing_key_missing` highlights the `verificationMethod` array in the DID document where the labeler's signing key would live. 251 + - `labeler::identity::record_policies_empty` highlights the empty `policies` field in the re-serialized `app.bsky.labeler.service/self` record. 252 + - `labeler::crypto::signature_mismatch` highlights the `sig` field in the offending label record. 253 + 254 + `LabelerReport::render` prints, in order: a header (target, resolved DID and PDS if known, wall-clock duration), each stage as a section with ASCII status glyphs (`[OK]`, `[FAIL]`, `[SKIP]`, `[WARN]`), every non-`Pass` `CheckResult`'s full miette diagnostic via `GraphicalReportHandler` (respecting `NO_COLOR`/`CLICOLOR`), and a summary footer with counts per status and the separate network-error count. 255 + 256 + ### Testing surface 257 + 258 + Stages are written against narrow async traits for their IO surface (`DidResolver`, `HttpClient`, `WebSocketClient`, `PlcLogFetcher`). Unit tests supply fakes that replay recorded fixtures captured from real labelers (e.g., `mod.bsky.app`). Rendered `LabelerReport` output is locked down with `insta` snapshots for the golden path and one snapshot per named failure case, so any drift in diagnostics is caught automatically. 259 + 260 + ## Existing Patterns 261 + 262 + Investigation of the repository found only an empty binary crate (`src/main.rs` stub, single top-level `Cargo.toml`) with no prior conventions to follow. The design therefore introduces new patterns rather than extending existing code: 263 + 264 + - **Subcommand module shape** modeled directly on [`zcash-devtool`](https://github.com/zcash/zcash-devtool): a single binary crate, clap derive parsing, nested subcommand enums per feature area, and each command struct exposing an async `run()` method invoked from `main`. 265 + - **Sibling-file module layout** (`foo.rs` + `foo/` rather than `foo/mod.rs`) per the user's global Rust conventions. 266 + - **Typed-stage pipeline with accumulating report** is new to this codebase and becomes the default pattern for future "test" subcommands where multiple dependent checks need to be run against a single target. 267 + 268 + No existing `atproto-devtool`-style tool in the Rust ecosystem was found. The closest prior art is [`atproto-labeler-diagnostics`](https://github.com/FlippingBinary/atproto-labeler-diagnostics), a small TypeScript proof-of-concept whose check surface is substantially narrower than this design. 269 + 270 + ## Implementation Phases 271 + 272 + <!-- START_PHASE_1 --> 273 + ### Phase 1: Crate skeleton and CLI bootstrap 274 + 275 + **Goal:** Get the binary crate compiling with a clap-derived subcommand tree, working async runtime, miette-installed error handling, and a stub `test labeler` command that parses its arguments and prints a "not yet implemented" message. 276 + 277 + **Components:** 278 + - `Cargo.toml` — single-crate (no workspace) `[dependencies]` section with `clap` (derive), `tokio` (`rt`, `macros`, `time`, `net`), `miette` (`fancy`), `thiserror`, `tracing`, `tracing-subscriber`, `url`. 279 + - `src/main.rs` — `#[tokio::main(flavor = "current_thread")]` bootstrap that installs the miette panic handler and hands off to `cli::run`. 280 + - `src/cli.rs` — root `Cli` struct with global flags (`--verbose`, `--no-color`), `Command` dispatch, and `miette::Result` return type. 281 + - `src/commands.rs` + `src/commands/test.rs` + `src/commands/test/labeler.rs` — the nested subcommand enums and stub `LabelerCmd::run` that prints its parsed args. 282 + - `src/common.rs` + `src/common/diagnostics.rs` — shared miette helpers (report handler installation, `NamedSource` constructors from byte slices, JSON-pointer-to-span helpers). 283 + 284 + **Dependencies:** None (first phase). 285 + 286 + **Done when:** `cargo build` succeeds; `cargo run -- test labeler --help` prints the subcommand help; `cargo run -- test labeler example.bsky.social` prints a parsed-args placeholder message and exits 0. Infrastructure phase — operational verification only, no acceptance criteria tests. 287 + <!-- END_PHASE_1 --> 288 + 289 + <!-- START_PHASE_2 --> 290 + ### Phase 2: Shared identity primitives 291 + 292 + **Goal:** A reusable `common::identity` module covering handle resolution, DID-document fetch, service-entry lookup, and multikey parsing — the primitives that every future `test` subcommand will need. 293 + 294 + **Components:** 295 + - `src/common/identity.rs` with: 296 + - `resolve_handle(&str) -> Result<Did>` — DNS TXT lookup at `_atproto.<handle>` via `hickory-resolver`, HTTPS fallback to `https://<handle>/.well-known/atproto-did`. 297 + - `resolve_did(&Did) -> Result<DidDocument>` — `did:plc:*` via `https://plc.directory/{did}`, `did:web:*` via `https://<host>/.well-known/did.json` (including path-form targets). Returns both the parsed document and the raw bytes as a `NamedSource`. 298 + - `find_service(&DidDocument, id_fragment, service_type) -> Option<&Service>` — spec-correct matching. 299 + - `parse_multikey(&str) -> Result<VerifyingKey>` — multibase decode with curve-tag dispatch to `k256` or `p256`. 300 + - New dependencies: `reqwest` (`rustls-tls`, `json`), `hickory-resolver`, `multibase`, `k256` (`ecdsa`), `p256` (`ecdsa`), `serde`, `serde_json`. 301 + - Unit tests over recorded fixtures (real handles and DID documents from well-known Bluesky accounts and labelers) for each resolution path and each multikey curve. 302 + 303 + **Dependencies:** Phase 1. 304 + 305 + **Done when:** All identity resolution unit tests pass against recorded fixtures. Infrastructure/library phase — tests verify primitive correctness but do not yet cover acceptance criteria (those arrive once wired into the labeler pipeline in Phase 3). 306 + <!-- END_PHASE_2 --> 307 + 308 + <!-- START_PHASE_3 --> 309 + ### Phase 3: Identity stage and report scaffolding 310 + 311 + **Goal:** The first end-to-end labeler check path: run the pipeline driver with only the identity stage implemented, render a `LabelerReport` with real diagnostics. The tool becomes genuinely useful for debugging DID/record misconfigurations at the end of this phase. 312 + 313 + **Components:** 314 + - `src/commands/test/labeler/report.rs` — `LabelerReport`, `CheckResult`, `CheckStatus`, severity classification, ASCII glyph renderer, `render(&self, &mut impl ReportHandler)`. 315 + - `src/commands/test/labeler/pipeline.rs` — `LabelerTarget`, `LabelerOptions`, pipeline driver function (identity stage only for now; remaining stages return `Skipped("not yet implemented")`). 316 + - `src/commands/test/labeler/identity.rs` — the identity stage. Emits named `CheckResult`s for: DID resolves, DID doc carries `atproto_labeler` service entry of correct type, labeler endpoint URL is valid, signing key present and parseable, PDS entry present and reachable, `app.bsky.labeler.service/self` record fetchable, record schema valid, record `policies.labelValues` non-empty. 317 + - `src/commands/test/labeler.rs` — wire the clap args to the pipeline and render the report. 318 + - New dependencies: `atrium-api`, `atrium-xrpc-client` (for the `com.atproto.repo.getRecord` call against the PDS). 319 + - `insta` snapshot tests (dev-dependency) covering: golden-path labeler, missing DID, DID doc missing labeler service, DID doc missing signing key, PDS unreachable, missing/empty labeler record. 320 + 321 + **Dependencies:** Phase 2. 322 + 323 + **Acceptance criteria covered:** `test-labeler.AC1.*`, `test-labeler.AC2.*`. 324 + 325 + **Done when:** Every listed identity `CheckResult` is emitted correctly for its fixture, the `LabelerReport` renders as expected (snapshot-locked), and running the binary against a real healthy labeler exits 0 with an identity section of all `[OK]`s. 326 + <!-- END_PHASE_3 --> 327 + 328 + <!-- START_PHASE_4 --> 329 + ### Phase 4: HTTP stage (`queryLabels`) 330 + 331 + **Goal:** The HTTP stage hits `com.atproto.label.queryLabels` on the labeler endpoint, verifies schema conformance, exercises pagination once, and publishes its collected labels into `HttpFacts::first_page` for the crypto stage to consume. 332 + 333 + **Components:** 334 + - `src/commands/test/labeler/http.rs` — HTTP stage using the typed `atrium-api` client against the labeler endpoint. Emits `CheckResult`s for: endpoint reachable (TCP+TLS), `queryLabels` returns a well-formed lexicon response, optional `cursor` round-trip is honored (one second page requested with a cursor from the first), returned labels decode into typed records. 335 + - Pipeline driver update to thread `HttpFacts` into the accumulating report. 336 + - Dev fixtures: recorded `queryLabels` responses from a real labeler plus synthetic malformed responses for each failure case; tests run against a fake in-process XRPC server that replays them. 337 + 338 + **Dependencies:** Phase 3. 339 + 340 + **Acceptance criteria covered:** `test-labeler.AC3.*`. 341 + 342 + **Done when:** Running against a real labeler shows a populated HTTP section in the report with pass/fail classification matching fixtures, and snapshot tests are locked for each named failure case. 343 + <!-- END_PHASE_4 --> 344 + 345 + <!-- START_PHASE_5 --> 346 + ### Phase 5: Subscription stage (`subscribeLabels`) 347 + 348 + **Goal:** Add the two-check subscription stage — backfill check and conditional live-tail check — each time-bounded by `--subscribe-timeout` (default 5s). 349 + 350 + **Components:** 351 + - `src/commands/test/labeler/subscription.rs` with: 352 + - Two-CBOR-block frame parser for `subscribeLabels` frames (header + payload), using `ciborium` in strict mode. Validates header `op`/`t` fields, routes `#labels` / `#info` message types. 353 + - Backfill check: connects with `cursor=0`, consumes frames up to the budget, detects backfill completion via ≥500ms idle gap after frames started flowing, handles "no frames at all" as an `Advisory` distinct from the decode-error path. 354 + - Live-tail check: fresh connection with cursor set to current high-water sequence, runs the same budget, passes on clean hold or any correctly-decoded frame. Skipped when backfill already implied a live-tail observation. 355 + - `BackfillOutcome` / `LiveTailOutcome` enums feeding into `SubscriptionFacts`. 356 + - New dependencies: `tokio-tungstenite` (`rustls-tls-native-roots`), `ciborium`. 357 + - Dev fixtures: recorded frame sequences plus synthetic malformed frames; tests run against a fake WebSocket server. 358 + 359 + **Dependencies:** Phase 4. 360 + 361 + **Acceptance criteria covered:** `test-labeler.AC4.*`. 362 + 363 + **Done when:** Snapshot tests pass for each named subscription outcome (backfill-complete-within-budget, backfill-exceeds-budget, empty-stream, decode-error). Running against a real labeler shows a populated subscription section. 364 + <!-- END_PHASE_5 --> 365 + 366 + <!-- START_PHASE_6 --> 367 + ### Phase 6: Crypto stage and end-to-end polish 368 + 369 + **Goal:** The final stage — verify the signatures of labels collected by the HTTP stage, with did:plc key-history fallback for rotated-out keys. Ship docs and end-to-end tests for the full pipeline. 370 + 371 + **Components:** 372 + - `src/common/identity.rs` extended with a `plc_history` helper that lazily fetches `https://plc.directory/{did}/log/audit` and extracts historic verification methods by fragment. Tests over recorded audit logs. 373 + - `src/commands/test/labeler/crypto.rs`: 374 + - DRISL-CBOR canonicalizer wrapper over `ciborium` (strict map key ordering, minimal integer encoding, no floats). Test vectors generated from a real labeler's labels plus the reference spec. 375 + - SHA-256 prehash → `k256`/`p256` `verify_prehash` dispatch based on the `IdentityFacts::signing_key` variant. 376 + - Current-key-first verification; on any failure, lazy PLC history fetch (did:plc only) and retry against each historic key. 377 + - Empty-labeler path → `Skipped`, not `Fail`. Rotated-out-key hits → `Advisory` attached to the rollup. 378 + - End-to-end `insta` snapshot tests covering: all-pass labeler, identity-only failure, HTTP decode failure, subscription decode error, current-key signature failure that resolves against history, unresolvable signature failure (mixed-failure run) — each fixture runs the full pipeline. 379 + - `README.md` with install instructions, `--help` excerpt, and one example invocation per mode (handle, endpoint-only, endpoint + `--did`). 380 + - New dependency: `sha2`. 381 + 382 + **Dependencies:** Phase 5. 383 + 384 + **Acceptance criteria covered:** `test-labeler.AC5.*`, `test-labeler.AC6.*`. 385 + 386 + **Done when:** Full-pipeline snapshot tests pass, `cargo test` is green, running the binary against a real healthy labeler produces an all-`[OK]` report and exits 0, running against a fixture labeler with a rotated-out signing key produces an `Advisory` rollup and still exits 0, and running against a fixture labeler with an actual signature mismatch produces a `Fail` and exits 1. 387 + <!-- END_PHASE_6 --> 388 + 389 + ## Additional Considerations 390 + 391 + **Error classification discipline.** The distinction between `SpecViolation`, `NetworkError`, and `Advisory` is the single most important thing to get right for this tool's usefulness. A user debugging their own labeler needs network failures called out separately so they can re-run with connectivity fixed; a user checking another team's labeler needs advisories (empty labels, rotated keys) visibly separated from real violations. Every new check added to the suite must explicitly pick a severity — there is no default. 392 + 393 + **Canonicalization risk.** DRISL-CBOR label canonicalization is subtle (strict map key ordering, minimal integer encoding, no floats/indefinite-length items). Phase 6 should validate the canonicalizer against labels fetched from a known-good production labeler (e.g., `mod.bsky.app`) before declaring the crypto stage done, to avoid false-positive `Fail` results caused by canonicalization drift rather than real signature problems. 394 + 395 + **Future extensibility (not in scope for this design):** The layered-pipeline pattern is intentionally feature-local. If future `test` subcommands (`test pds`, `test feed-generator`) share checks with `test labeler` — most plausibly DID/PDS resolution — they will extract into `common::identity` rather than growing a general-purpose check registry. A shared registry is explicitly YAGNI territory until a second feature demands it. 396 + 397 + **Fixtures as a versioned asset.** Phase 3 onwards depends on recorded fixtures (DID docs, repo records, queryLabels responses, WebSocket frames, PLC audit logs). Store these under `tests/fixtures/<stage>/<case>.{json,cbor,bin}` with a short README per fixture describing its source and the case it exercises, so regressions can be triaged by reading one file rather than re-running network probes. 398 +