CLI app for developers prototyping atproto functionality

docs: add test plan for labeler report stage

Human verification plan generated after Phase 8 completion. All 38
acceptance criteria (AC1.1–AC8.4) are automated; this plan covers
quality-of-life visual checks, real-labeler end-to-end verification,
and the release-gate pollution-avoidance URL replacement.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Authored by Jack Grigg and Claude Opus 4.7, and committed by Tangled
77325dd0 a01c9860

+171
docs/test-plans/2026-04-17-labeler-report-stage.md
# Human Test Plan: Labeler Report Stage

Generated after Phase 8 completion of the labeler report stage implementation. All 38 acceptance criteria (AC1.1 through AC8.4) are covered by automated tests; this plan is confirmatory and quality-of-life focused. Items that benefit from visual inspection or live-system verification are marked explicitly.

## Prerequisites

- Rust toolchain pinned by `rust-toolchain.toml` (MSRV).
- `cargo build --release` completes cleanly.
- `cargo test` passing locally.
- `cargo clippy -- -D warnings` passing.
- Access to a real atproto labeler for end-to-end verification, ideally one you control so you can observe and prune test reports from the moderation queue.
- Optional: a Bluesky account with a generated app-password for exercising the PDS-mediated modes.
- Environment: a terminal emulator that honours ANSI escapes (for colour verification) and one that respects `NO_COLOR`.

## Phase 1: Smoke (CLI surface and help text)

| Step | Action | Expected |
|---|---|---|
| 1.1 | Run `cargo run --release -- test labeler --help`. | Exit code 0. `--help` output lists: `--did`, `--subscribe-timeout`, `--verbose`, `--no-color`, `--commit-report`, `--force-self-mint`, `--self-mint-curve` (with default `es256k`), `--report-subject-did`, `--handle`, `--app-password`, and a `<TARGET>` positional. |
| 1.2 | Run `cargo run --release -- test labeler`. | Exit code 2 with a clap parse error mentioning the missing `<TARGET>`. |
| 1.3 | Run `cargo run --release -- test labeler mod.bsky.app --handle alice.bsky.social`. | Non-zero exit code with a clap-style error mentioning `--app-password` (or `app_password`). |
| 1.4 | Run `cargo run --release -- test labeler mod.bsky.app --app-password xxxx-xxxx-xxxx-xxxx`. | Non-zero exit code with a clap-style error mentioning `--handle`. |
| 1.5 | Run `cargo run --release -- test labeler --no-color did:web:nonexistent.invalid 2>/dev/null`. | Stdout contains a rendered report with an `== Identity ==` section and at least one `[FAIL]` or `[NET]` glyph. Stdout contains no ANSI escape bytes (`\x1b[`). |
| 1.6 | Run `cargo run --release -- test labeler --verbose --no-color did:web:nonexistent.invalid` and inspect stderr. | Stderr contains `DEBUG`-prefixed lines from `tracing`. |

## Phase 2: Read-only conformance run against a real labeler

Target a real labeler whose behaviour you trust (e.g. `mod.bsky.app` or a community labeler you operate). Do not pass `--commit-report` in this phase.

| Step | Action | Expected |
|---|---|---|
| 2.1 | `cargo run --release -- test labeler <real-labeler-handle-or-DID> --no-color`. | Exit code 0 if the labeler is conformant. The rendered report has five stage sections in order: Identity, HTTP, Subscription, Crypto, Report. The Report section contains exactly 10 rows, whose IDs appear in this exact order: `report::contract_published`, `report::unauthenticated_rejected`, `report::malformed_bearer_rejected`, `report::wrong_aud_rejected`, `report::wrong_lxm_rejected`, `report::expired_rejected`, `report::rejected_shape_returns_400`, `report::self_mint_accepted`, `report::pds_service_auth_accepted`, `report::pds_proxied_accepted`. |
| 2.2 | In the same output, confirm the gating behaviour. | `report::contract_published` is `[OK]` iff the labeler advertises both `reasonTypes` and `subjectTypes`. `report::unauthenticated_rejected` and `report::malformed_bearer_rejected` should be `[OK]` for a conformant labeler (they do unauthenticated POSTs but don't write). `report::self_mint_accepted`, `report::pds_service_auth_accepted`, and `report::pds_proxied_accepted` should be `[SKIP]` with a reason mentioning `--commit-report`. |
| 2.3 | Confirm that the run emitted no side-effect traffic that reached the labeler's moderation queue. Check the labeler's moderation inbox; there should be no new entries from this run. | No reports present. |
| 2.4 | Run without `--no-color` in a colour-capable terminal. | Glyphs are coloured (green `[OK]`, red `[FAIL]`, yellow `[WARN]`, blue `[SKIP]`, magenta `[NET]`). Setting `NO_COLOR=1` in the env reproduces the `--no-color` layout. |

## Phase 3: Self-mint against a locally-reachable labeler

If you have (or can set up) a labeler reachable on 127.0.0.1 / localhost, the self-mint checks become real end-to-end tests. Otherwise, simulate by pointing `<TARGET>` at an SSH tunnel or local dev deployment.

| Step | Action | Expected |
|---|---|---|
| 3.1 | `cargo run --release -- test labeler http://localhost:8080 --did did:plc:<yours> --commit-report --no-color`. | Report stage rows 3..=6 (zero-indexed: `wrong_aud_rejected`, `wrong_lxm_rejected`, `expired_rejected`, `rejected_shape_returns_400`) run real self-minted JWT attempts. A conformant labeler returns `[OK]` for all four. Row 7 (`report::self_mint_accepted`) is `[OK]` and produces exactly one new entry in the labeler's moderation queue. |
| 3.2 | Inspect the moderation queue entry created by step 3.1. | The `reason` string starts with `atproto-devtool conformance test` and ends with the 16-hex-char run id. The `reasonType` is the first `reasonTypes` advertised by the labeler (expected `com.atproto.moderation.defs#reasonSpam`). The `subject` is an account-shape `com.atproto.admin.defs#repoRef` pointing at the ephemeral did:web DID (`did:web:127.0.0.1%3A<port>`). |
| 3.3 | Run the same command with `--self-mint-curve es256` added. | Self-mint rows still pass. The labeler-visible DID document (if you can observe the fetch) carries an ES256 multikey. |
| 3.4 | Run the same command with `--report-subject-did did:plc:somewellknownaccount`. | Row 7 still `[OK]`. The moderation queue entry's `subject.did` is the override DID, regardless of what the labeler advertises in `subjectTypes`. |

## Phase 4: Self-mint against a remote labeler with `--force-self-mint`

| Step | Action | Expected |
|---|---|---|
| 4.1 | Run without the flag: `cargo run --release -- test labeler mod.bsky.app --commit-report --no-color`. | Report stage rows 3..=7 are `[SKIP]` with a reason mentioning `--force-self-mint`. |
| 4.2 | Re-run with `--force-self-mint`. | The tool attempts to publish the self-mint DID doc on a local port and the remote labeler cannot resolve it. Expected outcome: rows 3..=7 will likely be `[FAIL]` or `[NET]` depending on how the labeler handles unresolvable issuer DIDs. This is the expected behaviour; `--force-self-mint` is only useful when the labeler has out-of-band access to the devtool's loopback port (e.g. for dev rigs that punch through NAT). Confirm the error rendering is coherent: every failing row carries a diagnostic with a stable code. |

## Phase 5: PDS-mediated modes

Use a burner or test Bluesky account. Generate an app password at <https://bsky.app/settings/app-passwords>.

| Step | Action | Expected |
|---|---|---|
| 5.1 | `cargo run --release -- test labeler <conformant-labeler> --handle <your-handle>.bsky.social --app-password xxxx-xxxx-xxxx-xxxx --commit-report --no-color`. | Exit code 0 on a conformant labeler. Rows 8 and 9 (`pds_service_auth_accepted`, `pds_proxied_accepted`) are `[OK]`. Exactly two additional moderation queue entries appear on the labeler side, both carrying the sentinel reason and the run-id. |
| 5.2 | Check the authorship of the two extra reports. | Both should appear to come from your real DID (the PDS-backed identity), not from a did:web loopback DID. |
| 5.3 | Use an intentionally wrong app password. | `createSession` fails. Rows 8 and 9 are `[NET]` with a transport or auth diagnostic. Exit code 2. |
| 5.4 | Point `--handle` at a non-existent PDS-resident handle. | `createSession` still fails in a recognisable way; the failure is not swallowed silently. |

## Phase 6: Exit-code matrix

| Step | Action | Expected |
|---|---|---|
| 6.1 | Run against a conformant labeler with no flags. | Exit code 0. |
| 6.2 | Run against an unreachable endpoint: `cargo run --release -- test labeler https://doesnt-exist.example.test`. | Exit code 2 (NetworkError). |
| 6.3 | Run against a deliberately non-conformant labeler (e.g. one that returns 200 OK for unauthenticated createReport) with `--commit-report`. | Exit code 1 (SpecViolation precedence). Even if some rows are also `[NET]`, the exit code is 1. |

## Phase 7: Misconfigured-labeler rendering

| Step | Action | Expected |
|---|---|---|
| 7.1 | Diff `cargo run --release -- test labeler <labeler>` output against the accepted `tests/snapshots/labeler_report__report_all_fail_misconfigured_labeler_snapshot.snap` if you have a rigged test target available. | Every `report::*` row except `report::contract_published` is a `[FAIL]` or `[WARN]` with a diagnostic. All 10 stable diagnostic codes listed below appear at least once across the rendered rows. |

Stable diagnostic codes worth eyeballing for accuracy in rendered output:

- `labeler::report::contract_missing`
- `labeler::report::unauthenticated_accepted`
- `labeler::report::malformed_bearer_accepted`
- `labeler::report::wrong_aud_accepted`
- `labeler::report::wrong_lxm_accepted`
- `labeler::report::expired_accepted`
- `labeler::report::shape_not_400`
- `labeler::report::self_mint_rejected`
- `labeler::report::pds_service_auth_rejected`
- `labeler::report::pds_proxied_rejected`
- `labeler::report::transport_error`

## End-to-End: "Day one" conformance run

Purpose: validate a labeler operator's likely first interaction with the tool.

Steps:

1. Operator generates or uses a did:web labeler on their own domain.
2. Operator runs `atproto-devtool test labeler <their-labeler-URL>` (no flags).
3. Verify: all read-only stages pass; Report stage shows 10 rows with write-side rows `[SKIP]`. Exit code 0.
4. Operator re-runs with `--commit-report --handle ... --app-password ...`.
5. Verify: Report stage now exercises every row including PDS-mediated ones. Exit code 0. Exactly 3 new entries in the operator's moderation queue, each with the sentinel reason and the same run-id.
6. Operator searches for "atproto-devtool conformance test" in the moderation queue and can find all three in one filter.

## End-to-End: Sentinel grep-ability

Purpose: confirm the sentinel string is actually useful for operators.

Steps:

1. Run a `--commit-report` conformance run against a test labeler.
2. Note the run-id printed near the top of the rendered report (or scrape it from the final row's diagnostics).
3. Filter the moderation queue by reason string containing the run-id.
4. Verify the resulting set is exactly the reports produced by this one run, with no cross-contamination from other runs.

## Human Verification Required

Per `test-requirements.md`, there are zero required human-verification items; every AC is covered by automated tests. The items below specifically benefit from a human eye.

| Item | Why Manual | Steps |
|---|---|---|
| Rendered colour output | ANSI-sequence presence is unit-tested but colour appropriateness (green for OK, red for FAIL, etc.) benefits from visual check. | Phase 2.4. |
| Moderation queue cleanliness after a `--commit-report` run | The sentinel string's format is unit-tested, but its actual grep-ability in a real moderation UI is an ergonomics question. | Phase 5.2 and End-to-End "Sentinel grep-ability". |
| Exit-code + stderr diagnostic layout when a remote labeler is conformance-broken | Specific rendered-text ergonomics (e.g. miette span placement) benefit from visual inspection even though the codes and glyphs are snapshot-pinned. | Phase 7.1. |
| Sentinel `CONFORMANCE_REPORT_SUBJECT_URI` placeholder value | The requirements doc notes this is a pre-release content task, not a coverage gap. Reviewer should visually confirm the placeholder has been replaced with a real, stable URI before the release that ships the report stage. | Pre-release checklist: grep `CONFORMANCE_REPORT_SUBJECT_URI` in `src/commands/test/labeler/create_report/pollution.rs` and verify the resolved URI is a stable public record the operator community can refer to. |

## Traceability

| Acceptance Criterion | Automated Test | Manual Step |
|---|---|---|
| AC1.1 | `ac1_1_contract_present_emits_pass` + snapshot | Phase 2.1 |
| AC1.2 | `ac1_2_contract_missing_without_commit_skips_stage` + snapshot | Phase 2.2 |
| AC1.3 | `ac1_3_contract_missing_with_commit_is_spec_violation` + snapshot | Phase 7.1 |
| AC1.4 | `ac1_4_empty_arrays_equivalent_to_absent` | Phase 2.2 |
| AC2.1 | `ac2_1_unauthenticated_401_with_envelope_passes` | Phase 2.1 |
| AC2.2 | `ac2_2_unauthenticated_200_is_spec_violation` | Phase 7.1 |
| AC2.3 | `ac2_1` (co-assert) | Phase 2.1 |
| AC2.4 | `ac2_4_malformed_bearer_200_is_spec_violation` | Phase 7.1 |
| AC2.5 | `ac2_5_401_without_envelope_still_passes` | Phase 2.1 |
| AC3.1 | `ac3_1_wrong_aud_401_passes` | Phase 3.1 |
| AC3.2 | `ac3_2_wrong_aud_200_is_spec_violation` | Phase 7.1 |
| AC3.3 | `ac3_3_wrong_lxm_401_passes` + `ac3_4_wrong_lxm_200_is_spec_violation` | Phase 3.1 / 7.1 |
| AC3.4 | `ac3_5_expired_401_passes` + misconfigured snapshot | Phase 3.1 / 7.1 |
| AC3.5 | `ac3_1` (co-assert) | Phase 3.1 |
| AC3.6 | `ac3_6_shape_not_400_emits_advisory` | Phase 7.1 |
| AC3.7 | `ac3_7_non_local_labeler_skips_self_mint_checks` | Phase 4.1 |
| AC3.8 | `ac3_8_force_self_mint_overrides_non_local` | Phase 4.2 |
| AC4.1 | `ac4_1_local_labeler_accepts_with_lex_first_reason_and_account_subject` | Phase 3.1, 3.2 |
| AC4.2 | `ac4_2_non_local_labeler_prefers_other_and_record` | Phase 4.2 |
| AC4.3 | `ac4_3_non_2xx_is_spec_violation` | Phase 7.1 |
| AC4.4 | `ac4_4_commit_false_skips` | Phase 2.2 |
| AC4.5 | `ac4_5_non_viable_skip_matches_phase_6_reason` | Phase 4.1 |
| AC4.6 | `ac4_1` (co-assert) | Phase 3.2 + Sentinel grep-ability |
| AC5.1 | `ac5_1_full_flow_passes` | Phase 5.1, 5.2 |
| AC5.2 | `ac5_2_labeler_rejects_service_auth_jwt` | Phase 7.1 |
| AC5.3 | `ac5_3_pds_unreachable` | Phase 5.3 |
| AC5.4 | `ac5_4_missing_creds_or_commit_skips` | Phase 2.2 |
| AC6.1 | `ac6_1_proxied_pass` | Phase 5.1 |
| AC6.2 | `ac6_2_labeler_side_rejection_via_proxy` | Phase 7.1 |
| AC6.3 | `ac6_3_pds_rejects_proxy` | Phase 5.4 |
| AC6.4 | `ac6_4_missing_creds_or_commit_skips` | Phase 2.2 |
| AC7.1 | Every AC1–AC6 test + `ac7_1_row_count_is_always_10` + three end-to-end snapshots | Phase 2.1 |
| AC7.2 | `ac7_2_row_order_is_stable` + every snapshot | Phase 2.1 |
| AC8.1 | `ac8_1_handle_without_app_password_fails` + `ac8_1_app_password_without_handle_fails` | Phase 1.3, 1.4 |
| AC8.2 (unit) | `self_mint_signer_es256k_round_trips` + `self_mint_signer_es256_round_trips` | Phase 3.3 |
| AC8.2 (help) | `help_lists_all_flags` | Phase 1.1 |
| AC8.3 | `ac8_3_report_subject_did_overrides_subject` | Phase 3.4 |
| AC8.4 (unit) | `exit_code_*` unit tests in `report.rs` | Phase 6.1–6.3 |
| AC8.4 (smoke) | `ac8_4_unreachable_endpoint_nonzero_exit` | Phase 6.2 |
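
The exit-code matrix in Phase 6 can be read as a small precedence rule: any spec violation yields 1, otherwise any network error yields 2, otherwise 0. The following is a minimal sketch of that rule, not the tool's actual code. The `RowStatus` enum, the constants, and the `exit_code` function are hypothetical names, and the sketch assumes that a `[FAIL]` row is what counts as a spec violation and that `[WARN]` and `[SKIP]` rows never affect the exit code; the plan does not state either point explicitly.

```rust
/// Status glyphs a report row can render with, as named in the test plan.
/// The enum itself is a hypothetical stand-in for the tool's real type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RowStatus {
    Ok,
    Warn,
    Skip,
    Fail,
    Net,
}

/// Exit codes described in Phase 6 of the plan.
pub const EXIT_OK: i32 = 0;
pub const EXIT_SPEC_VIOLATION: i32 = 1;
pub const EXIT_NETWORK_ERROR: i32 = 2;

/// Computes the process exit code from the rendered rows.
///
/// Assumption: `[FAIL]` maps to a spec violation and `[NET]` to a network
/// error. A spec violation takes precedence, so a run with both kinds of
/// row still exits with 1 (step 6.3).
pub fn exit_code(rows: &[RowStatus]) -> i32 {
    if rows.contains(&RowStatus::Fail) {
        EXIT_SPEC_VIOLATION
    } else if rows.contains(&RowStatus::Net) {
        EXIT_NETWORK_ERROR
    } else {
        // Only OK, WARN, and SKIP rows remain (assumed non-fatal).
        EXIT_OK
    }
}
```

Under these assumptions, a run whose rows are `[OK]` and `[SKIP]` only maps to step 6.1, a run with a `[NET]` row and no `[FAIL]` row maps to step 6.2, and a run containing both maps to step 6.3.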