perlsky is a Perl 5 implementation of an AT Protocol Personal Data Server.

Metrics

perlsky now exposes Prometheus-style metrics at /metrics.

Security

  • If metrics_token is configured, the endpoint requires Authorization: Bearer <token>.
  • If metrics_token is omitted, the endpoint is public.
  • For internet-facing deployments, prefer setting metrics_token and/or restricting /metrics at the reverse proxy layer, as sketched below.
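
A minimal sketch of the reverse-proxy option, assuming nginx is fronting perlsky on 127.0.0.1:7755 (the address used in the Example Scrape section below):

  # Hypothetical nginx fragment: only the local scraper may reach
  # /metrics; every other client is denied.
  location /metrics {
      allow 127.0.0.1;
      deny all;
      proxy_pass http://127.0.0.1:7755;
  }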

Main Metrics

  • perlsky_xrpc_requests_total: Counts HTTP XRPC requests by method, NSID, endpoint type, and status.
  • perlsky_xrpc_request_duration_seconds: Histogram for HTTP XRPC latency with the same labels.
  • perlsky_xrpc_errors_total: Counts rendered XRPC failures by method, NSID, endpoint type, status, and error code.
  • perlsky_xrpc_unhandled_exceptions_total: Counts true unhandled exceptions on XRPC routes by method, NSID, and endpoint type.
  • perlsky_subscription_connections_total: Counts websocket subscription opens by NSID.
  • perlsky_subscription_active: Gauge of active websocket subscriptions by NSID.
  • perlsky_subscription_closes_total: Counts websocket closes by NSID and close code.
  • perlsky_subscription_frames_total: Counts emitted websocket frames by NSID, frame type, and encoding.
  • perlsky_subscription_bytes_total: Counts emitted websocket bytes by NSID and encoding.
  • perlsky_subscription_duration_seconds: Histogram of websocket lifetime by NSID.
  • perlsky_crawler_requests_total: Counts outbound com.atproto.sync.requestCrawl calls by crawler service and result.
  • perlsky_crawler_request_duration_seconds: Histogram of outbound crawler request latency.
  • perlsky_blob_ingress_bytes_total: Counts uploaded blob bytes by MIME type.
  • perlsky_blob_egress_bytes_total: Counts downloaded blob bytes by MIME type.
  • perlsky_store_operations_total: Counts instrumented SQLite-backed store operations by operation and status.
  • perlsky_store_operation_duration_seconds: Histogram of instrumented store operation duration.
  • perlsky_service_proxy_requests_total: Counts local and upstream app.bsky.* proxy requests by NSID, source, and status.
  • perlsky_service_proxy_request_duration_seconds: Histogram for service-proxy request latency with the same labels.
  • perlsky_service_proxy_local_post_index_cache_access_total: Counts request-local hits, process-cache hits, and rebuilds for the local post index.
  • perlsky_service_proxy_local_post_index_rebuild_duration_seconds: Histogram of local post-index rebuild time.
  • perlsky_service_proxy_local_post_index_entries: Gauge of local post-index entry counts by kind.
  • perlsky_service_proxy_local_post_resolution_total: Counts how local post lookups were resolved (request cache, shared index, store, or non-local bypass).
  • perlsky_service_proxy_profile_record_cache_total: Counts local profile record cache hits and misses.
  • perlsky_repo_resolution_total: Counts repo/DID resolution paths, including request-cache reuse versus fallback scans.
  • perlsky_build_info: Static build/service info gauge.
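
These counters compose directly in PromQL. As one example, a per-NSID error rate over the last five minutes, assuming the NSID label is exported as nsid (the name the alert suggestions below use):

  sum by (nsid) (rate(perlsky_xrpc_errors_total[5m]))
    / sum by (nsid) (rate(perlsky_xrpc_requests_total[5m]))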

Current Store Coverage

The store metrics currently cover the highest-signal operations on the live path:

  • transactions
  • event append and event stream reads
  • event high-watermark reads
  • blob put/get
  • label put/list
  • record list
  • repo CAR export

This is enough to understand the hot PDS paths under load without trying to wrap every SQLite call in the codebase.
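
For watching that coverage, one example query: a p95 over the store histogram, assuming the standard Prometheus _bucket series and the operation label noted above:

  histogram_quantile(0.95,
    sum by (operation, le) (rate(perlsky_store_operation_duration_seconds_bucket[5m])))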

Suggested Alerts

  • high error rate on perlsky_xrpc_requests_total
  • spikes in perlsky_xrpc_errors_total for a specific nsid or error
  • any growth in perlsky_xrpc_unhandled_exceptions_total
  • sustained increase in perlsky_xrpc_request_duration_seconds
  • non-zero perlsky_subscription_active with no corresponding frame growth
  • crawler errors from perlsky_crawler_requests_total{result="error"}
  • large ingress with low egress or vice versa on blob byte counters
  • persistent growth in store latency histograms
  • sustained result="rebuild" growth in perlsky_service_proxy_local_post_index_cache_access_total
  • high p95 in perlsky_service_proxy_local_post_index_rebuild_duration_seconds
  • unexpected growth in source="list_scan" for perlsky_repo_resolution_total
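
As a starting point, two of these suggestions written as a hypothetical Prometheus alerting-rule file; the 5% threshold and the windows are placeholders to tune against real traffic:

  groups:
    - name: perlsky
      rules:
        # Handled XRPC failures exceed 5% of requests for 10 minutes.
        - alert: PerlskyXrpcErrorRateHigh
          expr: |
            sum(rate(perlsky_xrpc_errors_total[5m]))
              / sum(rate(perlsky_xrpc_requests_total[5m])) > 0.05
          for: 10m
        # Any unhandled exception at all is worth a look.
        - alert: PerlskyUnhandledExceptions
          expr: increase(perlsky_xrpc_unhandled_exceptions_total[10m]) > 0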

Prometheus

The repo includes a checked-in example scrape job at ops/prometheus/perlsky.yml.

On the live VPS we scrape every 15s, rather than more aggressively, to avoid adding pressure while Prometheus is already remote-writing to Grafana Cloud.
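
The checked-in file is authoritative, but a job along these lines is roughly what it amounts to (the authorization block only matters if metrics_token is set):

  scrape_configs:
    - job_name: perlsky
      scrape_interval: 15s
      metrics_path: /metrics
      authorization:
        credentials: YOUR_TOKEN   # matches metrics_token
      static_configs:
        - targets: ['127.0.0.1:7755']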

Grafana

The repo also includes a Grafana dashboard and an example Prometheus data source for provisioning.

The dashboard expects a Prometheus data source. When provisioning, either keep the checked-in uid from the example data source or update the dashboard's ${DS_PROMETHEUS} mapping during import.
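
A hypothetical provisioning stub for that data source; the uid and url here are placeholders, and the uid must match whatever the dashboard's ${DS_PROMETHEUS} mapping resolves to:

  apiVersion: 1
  datasources:
    - name: Prometheus
      type: prometheus
      access: proxy
      url: http://127.0.0.1:9090    # placeholder Prometheus address
      uid: prometheus               # keep in sync with the dashboard import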

Sentry

Prometheus is still the main place to watch rates and latency. If you also configure sentry_dsn, perlsky will report unhandled XRPC exceptions to Sentry with request metadata and Perl stack frames. That works well as a complement to:

  • perlsky_xrpc_errors_total for handled request failures
  • perlsky_xrpc_unhandled_exceptions_total for internal 500-class failures

Example Scrape

curl -H 'Authorization: Bearer YOUR_TOKEN' \
  http://127.0.0.1:7755/metrics