···
https://pds.example.com/metrics
```

Checked-in Prometheus and Grafana examples live under:

- [ops/prometheus/perlsky.yml](../ops/prometheus/perlsky.yml)
- [ops/grafana/prometheus-datasource.yml](../ops/grafana/prometheus-datasource.yml)
- [ops/grafana/perlsky-dashboard-provider.yml](../ops/grafana/perlsky-dashboard-provider.yml)
- [ops/grafana/perlsky-dashboard.json](../ops/grafana/perlsky-dashboard.json)

See [METRICS.md](./METRICS.md) for the metric surface and dashboard notes.

## Prometheus

Merge [ops/prometheus/perlsky.yml](../ops/prometheus/perlsky.yml) into your Prometheus config and replace the placeholder bearer token with the `metrics_token` value from `/etc/perlsky/perlsky.json`.

A minimal local scrape job looks like:

```yaml
- job_name: perlsky
  scrape_interval: 15s
  scrape_timeout: 5s
  metrics_path: /metrics
  scheme: http
  authorization:
    credentials: REPLACE_WITH_PERLSKY_METRICS_TOKEN
  static_configs:
    - targets: ['127.0.0.1:7755']
      labels:
        service: perlsky
```

Validate and reload:

```sh
promtool check config /etc/prometheus/prometheus.yml
systemctl reload prometheus || systemctl restart prometheus
curl -fsS 'http://127.0.0.1:9090/api/v1/query?query=up%7Bjob%3D%22perlsky%22%7D'
```

## Grafana

Provision the Prometheus data source and dashboard provider with the checked-in examples, then copy the dashboard JSON into the watched directory:

```sh
install -d /etc/grafana/provisioning/datasources
install -d /etc/grafana/provisioning/dashboards
install -d /var/lib/grafana/dashboards
cp /opt/perlsky/app/ops/grafana/prometheus-datasource.yml /etc/grafana/provisioning/datasources/perlsky-prometheus.yml
cp /opt/perlsky/app/ops/grafana/perlsky-dashboard-provider.yml /etc/grafana/provisioning/dashboards/perlsky.yml
cp /opt/perlsky/app/ops/grafana/perlsky-dashboard.json /var/lib/grafana/dashboards/perlsky-overview.json
systemctl restart grafana-server || systemctl restart grafana
```

The example data source uses the stable UID `prometheus`. Keep that UID, or update the dashboard file to match your local Prometheus data source UID.
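
For reference, a provisioned data source pinned to that UID looks roughly like the following sketch. The URL is an assumption for a local Prometheus; the checked-in [ops/grafana/prometheus-datasource.yml](../ops/grafana/prometheus-datasource.yml) remains the authoritative example:

```yaml
# Sketch of a Grafana provisioned data source with the stable uid "prometheus".
# The url below assumes Prometheus on localhost; match your deployment.
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    uid: prometheus
    access: proxy
    url: http://127.0.0.1:9090
```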

## Upgrades

docs/METRICS.md
···
  Counts instrumented SQLite-backed store operations by operation and status.
- `perlsky_store_operation_duration_seconds`
  Histogram of instrumented store operation duration.
- `perlsky_service_proxy_requests_total`
  Counts local and upstream `app.bsky.*` proxy requests by NSID, source, and status.
- `perlsky_service_proxy_request_duration_seconds`
  Histogram of service-proxy request latency with the same labels.
- `perlsky_service_proxy_local_post_index_cache_access_total`
  Counts request-local hits, process-cache hits, and rebuilds for the local post index.
- `perlsky_service_proxy_local_post_index_rebuild_duration_seconds`
  Histogram of local post-index rebuild time.
- `perlsky_service_proxy_local_post_index_entries`
  Gauge of local post-index entry counts by kind.
- `perlsky_service_proxy_local_post_resolution_total`
  Counts how local post lookups were resolved: request cache, shared index, store, or non-local bypass.
- `perlsky_service_proxy_profile_record_cache_total`
  Counts local profile record cache hits and misses.
- `perlsky_repo_resolution_total`
  Counts repo/DID resolution paths, including request-cache reuse versus fallback scans.
- `perlsky_build_info`
  Static build/service info gauge.
4359···6379- crawler errors from `perlsky_crawler_requests_total{result="error"}`
6480- large ingress with low egress or vice versa on blob byte counters
6581- persistent growth in store latency histograms
8282+- sustained `result="rebuild"` growth in `perlsky_service_proxy_local_post_index_cache_access_total`
8383+- high `p95` in `perlsky_service_proxy_local_post_index_rebuild_duration_seconds`
8484+- unexpected growth in `source="list_scan"` for `perlsky_repo_resolution_total`
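
Hedged example checks for the last three items, assuming a local Prometheus at `127.0.0.1:9090` (`promtool query instant` ships with Prometheus 2.x):

```sh
# Sustained local post-index rebuild rate over 5m windows
promtool query instant http://127.0.0.1:9090 \
  'sum(rate(perlsky_service_proxy_local_post_index_cache_access_total{result="rebuild"}[5m]))'

# p95 local post-index rebuild duration
promtool query instant http://127.0.0.1:9090 \
  'histogram_quantile(0.95, sum by (le) (rate(perlsky_service_proxy_local_post_index_rebuild_duration_seconds_bucket[5m])))'

# Repo resolutions falling back to list scans
promtool query instant http://127.0.0.1:9090 \
  'sum(rate(perlsky_repo_resolution_total{source="list_scan"}[5m]))'
```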

## Prometheus

The repo includes a checked-in example scrape job at [ops/prometheus/perlsky.yml](../ops/prometheus/perlsky.yml).

On the live VPS we scrape every `15s` rather than more aggressively, to avoid adding load while Prometheus is already remote-writing to Grafana Cloud.

## Grafana

The repo includes:

- [ops/grafana/perlsky-dashboard.json](../ops/grafana/perlsky-dashboard.json): overview dashboard for XRPC, service-proxy, store, subscription, and blob metrics
- [ops/grafana/prometheus-datasource.yml](../ops/grafana/prometheus-datasource.yml): example provisioned Prometheus data source
- [ops/grafana/perlsky-dashboard-provider.yml](../ops/grafana/perlsky-dashboard-provider.yml): example dashboard provider that watches a dashboard directory

The dashboard expects a Prometheus data source. When provisioning, either keep the checked-in `uid` from the example data source or update the dashboard's `${DS_PROMETHEUS}` mapping during import.
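
To sanity-check which data source UIDs the dashboard JSON actually references, a generic `jq` walk over every `datasource` field works (this is not specific to this dashboard's layout, and handles both string- and object-valued references):

```sh
jq -r '.. | .datasource? | select(. != null)
       | if type == "object" then .uid // empty else . end' \
  ops/grafana/perlsky-dashboard.json | sort -u
```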

## Example Scrape

docs/PERFORMANCE.md

# Performance

This document covers two practical tools for `perlsky` performance work:

- Prometheus metrics for live visibility
- `script/benchmark-local-appview` for repeatable local endpoint timing

## Benchmark Script

`script/benchmark-local-appview` spins up an ephemeral local `perlsky` daemon, seeds a small repo with posts and a reply chain, then benchmarks the hottest local appview endpoints over real HTTP:

- `app.bsky.actor.getProfile`
- `app.bsky.feed.getAuthorFeed`
- `app.bsky.feed.getPostThread`

Example:

```sh
script/benchmark-local-appview --iterations 75 --warmup 15 --posts 100 --replies 12
```

JSON output:

```sh
script/benchmark-local-appview --format json > data/local-appview-benchmark.json
```

The benchmark is intentionally small and deterministic. It is best for comparing one local appview change against another, not for claiming cluster-scale throughput.

Useful flags:

- `--iterations N`: measured requests per endpoint, default `50`
- `--warmup N`: unmeasured warmup requests per endpoint, default `10`
- `--posts N`: number of posts to seed for the author feed, default `50`
- `--replies N`: length of the reply chain for the thread benchmark, default `8`
- `--feed-limit N`: `getAuthorFeed` page size, default `25`
- `--keep-tmp`: keep the temporary benchmark dataset on disk for inspection

The script also prints a metrics excerpt, so it is easy to sanity-check whether local appview cache counters moved during the run.
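
You can also pull those counters directly between runs. A minimal sketch, assuming the daemon listens on `127.0.0.1:7755` (the scrape target shown in the Prometheus example) and that `$PERLSKY_METRICS_TOKEN` (a hypothetical variable name) holds your `metrics_token`:

```sh
# Filter /metrics down to the local appview cache counters
curl -fsS -H "Authorization: Bearer $PERLSKY_METRICS_TOKEN" \
  http://127.0.0.1:7755/metrics |
  grep -E 'perlsky_service_proxy_(local_post_index_cache_access|local_post_resolution|profile_record_cache)_total'
```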

## Recommended Workflow

For repeatable tuning:

1. Run the benchmark script before a perf change and save the output.
2. Apply the change.
3. Run the same benchmark again with the same arguments.
4. Compare (see the sketch after this list):
   - `p50`
   - `p95`
   - `max`
   - derived `req/s`
5. Check `/metrics` to confirm that cache hit/rebuild counters changed the way you expected.
6. If the change looks promising, compare the same Prometheus panels in Grafana after deployment.
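
For step 4, a comparison sketch using `jq`. The field names here (`endpoints`, `p50_ms`, `p95_ms`, `max_ms`, `rps`) are hypothetical; adjust them to whatever `--format json` actually emits:

```sh
# Hypothetical JSON schema; check the real benchmark output first.
for f in before.json after.json; do
  echo "== $f"
  jq -r '.endpoints[] | [.endpoint, .p50_ms, .p95_ms, .max_ms, .rps] | @tsv' "$f"
done | column -t
```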

## Current Hot Metrics

The most relevant local appview metrics are listed below (an example query follows the list):

- `perlsky_service_proxy_requests_total`
- `perlsky_service_proxy_request_duration_seconds`
- `perlsky_service_proxy_local_post_index_cache_access_total`
- `perlsky_service_proxy_local_post_index_rebuild_duration_seconds`
- `perlsky_service_proxy_local_post_index_entries`
- `perlsky_service_proxy_local_post_resolution_total`
- `perlsky_service_proxy_profile_record_cache_total`
- `perlsky_repo_resolution_total`
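
As one example over these, service-proxy p95 latency broken out by NSID, assuming the label described in METRICS.md is literally named `nsid` and Prometheus runs locally:

```sh
# p95 service-proxy latency by NSID over 5m windows
promtool query instant http://127.0.0.1:9090 \
  'histogram_quantile(0.95, sum by (le, nsid) (rate(perlsky_service_proxy_request_duration_seconds_bucket[5m])))'
```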

See [METRICS.md](./METRICS.md) for more Prometheus and Grafana queries.