a textual notation to locate fields within atproto records (draft spec) microcosm.tngl.io/RecordPath/
# RecordPath (draft spec)

Informal community spec, seeking feedback.

**Scope:** This document defines *RecordPath*, a textual notation to locate fields within atproto records.

The proposed syntax is *mostly* compatible with Constellation's `source` record field locator syntax.
Constellation uses a `<collection nsid>:<path>` format for its `source` parameter; *RecordPath* will replace the `<path>` part.

While the driving motivation for this spec is lexicon-agnostic backlink indexing, being able to canonically reference field locations in records is broadly useful.

> [!TIP]
> For example: [Graze Turbostream](https://help.graze.social/en/article/graze-turbostream-1cmhebt/) resolves references from Bluesky posts into a richly-hydrated firehose, an incredibly useful enhancement.
> The locations where Turbostream searches for references are currently hard-coded for Bluesky posts, but what if you wanted a richly-hydrated feed of Tangled issue comments?
>
> RecordPath offers a syntax that a configurable Turbostream could use to describe arbitrary reference locations: to hydrate content for any feed in the atmosphere, without changing any code.


## 1. Design goals

1. **Canonical:** Two records that conform to the same lexicon and contain a field at the same _semantic location_ (see 2.3) produce and match the same RecordPath.
2. **Lexicon-agnostic:** Path generation and matching are derived from and operated upon record data directly.
3. **Complete:** Every field in a valid atproto record should be reachable by a RecordPath.
4. **Readable:** JSON-Path-like syntax, with only URL-unreserved syntax characters.

Some RecordPaths: `text` (1), `langs[]` (2), `reply.root.uri` (3).

```json
{
  "$type": "app.bsky.feed.post",
  "text": "I love a good wake up and the problem is solved.", // (1)
  "langs": [
    "en" // (2)
  ],
  "createdAt": "2026-04-15T12:38:49.982Z",
  "reply": {
    "root": {
      "cid": "bafyreieac34fnjyhuuzvgdnsyeeueyn45se5kuk6yppesn25gydjf5m5hy",
      "uri": "at://did:plc:rnpkyqnmsw4ipey6eotbdnnf/app.bsky.feed.post/3mjjvykdfo22r" // (3)
    },
    "parent": {
      "cid": "bafyreieac34fnjyhuuzvgdnsyeeueyn45se5kuk6yppesn25gydjf5m5hy",
      "uri": "at://did:plc:rnpkyqnmsw4ipey6eotbdnnf/app.bsky.feed.post/3mjjvykdfo22r"
    }
  }
}
```


RecordPath takes heavy inspiration from JSONPath, but is not a general-purpose data query language.


## 2. Proposed syntax

### 2.1 Dot-separated object traversal

A RecordPath is a sequence of field names separated by `.`. For a Bluesky "like" record:

```json
{
  "$type": "app.bsky.feed.like",
  "createdAt": "2024-01-15T00:00:00.000Z",
  "subject": {
    "uri": "at://did:plc:pxa3amkp7jhfclaads3zud7q/app.bsky.feed.post/3mjkx2hpvqc2t",
    "cid": "bafyreicuxyp5rmsrqsf3v63ww6gc3j7q6cp7qyk6bv47c3bqkj2gzswiqq"
  }
}
```

The RecordPaths are:

```
$type          -- selects: "app.bsky.feed.like"
createdAt      -- "2024-01-15T00:00:00.000Z"
subject        -- the entire { uri: .., cid: ..} sub-object
subject.uri    -- "at://did:plc:pxa3amkp7jhfclaads3zud7q/app.bsky.feed.post/3mjkx2hpvqc2t"
subject.cid    -- "bafyreicuxyp5rmsrqsf3v63ww6gc3j7q6cp7qyk6bv47c3bqkj2gzswiqq"
```


### 2.2 Field name character !escape

Object field names can be arbitrary strings, so RecordPath needs to disambiguate its own syntax when fields contain syntax characters.

RecordPath has six syntax characters, which must be escaped by a preceding `!` if they appear in a field name in the path.

| Literal | Escaped | Syntax hint     |
|---------|---------|-----------------|
| `.`     | `!.`    | field separator |
| `[`     | `![`    | arrays          |
| `]`     | `!]`    | arrays          |
| `{`     | `!{`    | union refs      |
| `}`     | `!}`    | union refs      |
| `!`     | `!!`    | escape          |

Object field names in atproto are Unicode strings, but only the six syntax characters are escaped.

```
meta.dot!.name    key named "dot.name"
meta.a!!b         key named "a!b"
meta.$unknown     key named "$unknown"
```

In practice it's very rare for field names in atproto to contain non-ASCII characters, and most field names are camelCase alphabetic-only strings, requiring no escaping.

Handling control characters and other non-printable codes is currently undefined.


### 2.3 Arrays are unordered sets

Arrays in real atproto records *almost universally* lack index-bound significance. That is: a selector for a record's array element at `index=0` is *almost never useful*.

Consider an array of **mentions** in Bluesky posts: the order in the record data carries no meaningful information -- it can even be different from the order of appearance in the post text!

On the other hand, selecting data from an atproto record array without regard for order *is* almost always what you want.

RecordPath makes a pragmatic choice to treat all arrays as unordered sets. Index position is not encoded in the path, and selecting a RecordPath that traverses an array always matches zero-or-more items ("vector" matches), instead of the zero-or-one behaviour of object-only paths ("scalar" matches).

> [!NOTE]
> It's a compromise and a bit unfortunate, but in practice it works well. Even for a data format like GeoJSON, if you want only `longitude`s from its `[longitude, latitude]` pair format: you can RecordPath the array-pair itself and index into it at the application level.

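
As a rough illustration of the note above, the sketch below resolves a dot-only RecordPath to a coordinate pair and then indexes into the pair in application code. The `resolve_scalar` helper and the `location.coordinates` field are hypothetical, and the helper assumes that no field name in the path needs `!` escaping.

```python
from typing import Any, Optional


def resolve_scalar(record: dict, path: str) -> Optional[Any]:
    """Follow a dot-only RecordPath; return None when a field is missing.

    Simplification: assumes no field name contains an escaped syntax character.
    """
    current: Any = record
    for name in path.split("."):
        if not isinstance(current, dict) or name not in current:
            return None
        current = current[name]
    return current


# Hypothetical record holding a GeoJSON-style [longitude, latitude] pair.
record = {"location": {"type": "Point", "coordinates": [-73.57, 45.5]}}

# The path has no `[]`, so it selects the whole pair as a single scalar match.
pair = resolve_scalar(record, "location.coordinates")

# Positional meaning is applied here by the application, not by the path.
longitude = pair[0] if pair else None
```

Keeping the index out of the path means the same RecordPath still identifies the field for every record, whatever the application decides to do with the pair.
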
Arrays are descended into with `[]`:

```
tags[]
references[].title
references[].urls[]
```

Given:

```json
{
  "tags": ["art", "science"],
  "references": [
    {
      "title": "a nice paper",
      "urls": [
        "https://bad-example.com/a-sketchy-source",
        "https://example.com/a-reputable-source"
      ]
    },
    {
      "title": "another paper",
      "urls": ["https://example.com/source-for-another-paper"]
    }
  ]
}
```

- `tags[]` reaches `"art"` and `"science"`
- `references[].title` reaches `"a nice paper"` and `"another paper"`
- `references[].urls[]` matches all three URLs in the example


### 2.4 Array-of-unions

Atproto lexicons commonly use arrays of unions to collect different kinds of child objects in a list.
This pattern is visible in the record data by the presence of a `$type` field on the contained object.

A RecordPath descending into an array-of-unions MUST include the `$type` field's value within the square brackets:

```
facets[].features[app.bsky.richtext.facet#link].uri
facets[].features[app.bsky.richtext.facet#mention].did
```

So while arrays are unordered in RecordPath, their elements are always segmented by union-ref type when they contain unioned types.


### 2.5 Scalar union fields

An object from a union outside of an array is visible in record data by the presence of a `$type` field -- plain objects should not have this key, and union objects should have it.

Since unions represent different child data types, the union-type is captured in RecordPath by including it within curly braces:

```
embed{app.bsky.embed.external}.external.uri
embed{app.bsky.embed.record}.record.uri
embed{app.bsky.embed.recordWithMedia}.record.record.uri
embed{app.bsky.embed.images}.images[].image
```

Given a post with an external link embed:

```json
{
  "$type": "app.bsky.feed.post",
  "text": "my god",
  "langs": [ "en" ],
  "createdAt": "2026-04-16T01:39:17.721Z",
  "embed": {
    "$type": "app.bsky.embed.external", // captured
    "external": {
      "uri": "https://youtu.be/-pns419xAoc?si=K4XMQfFv-t4Q1cn0",
      "title": "Paramore - Someday (The Strokes Cover)",
      "description": "YouTube video by microwave"
    }
  }
}
```

The RecordPath `embed{app.bsky.embed.external}.external.uri` reaches the external embed's URI, "https://youtu.be/-pns419xAoc?si=K4XMQfFv-t4Q1cn0".

- The NSID in the braces is the `$type` value verbatim, including any `#fragment` suffix if present. The `#main` suffix is never present per the lexicon spec (*"use of a `#main` suffix is invalid"* in `$type` values). Relative `#fragment` references (without NSID) only exist within lexicon definition files as shorthand; `$type` values in record data are always fully-qualified (`nsid` or `nsid#name`). So the data is already canonical and no normalization is needed.
- The top-level record `$type` is excluded from this rule.


## 3. Data-model assumptions

RecordPath depends on several properties of the atproto [data model](https://atproto.com/specs/data-model) and [lexicon](https://atproto.com/specs/lexicon) system.

### `$type` is on unions (and blobs), not on plain object refs

- **Records** include `$type` at the top level (not included in RecordPaths)
- **Union variants** MUST include `$type` (consistent with lexicon spec)
- **Plain (non-union) object refs**: The lexicon spec states that `$type` *"should not be included in encoded data as a discriminator."*
  Since this is only a "should-not" and not a "must-not", valid records are technically allowed to have a `$type` where they shouldn't, and we aren't guaranteed to have a canonical RecordPath for every valid atproto record field.
  Hopefully record serializers are following this SHOULD :/

### Forward-compatibility

Lexicon evolution rules forbid **type changes** across schema revisions, so a plain object ref cannot be converted to a union in a forward-compatible revision under the same NSID. RecordPath canonicalization is therefore hopefully safe from forward-compatible lexicon changes.


todo:

- API recommendations for scalar vs vector queries

  > A RecordPath is *scalar* if it contains no `[]` qualifiers, otherwise *vector*. Implementations that evaluate RecordPaths against records typically expose apis for blah blah

  - maybe compare to DOM `querySelector` / `querySelectorAll`

- lexicon awareness notes
  - backlinks example: match links on non-link fields if they happen to parse

- bring back the expected order of matches: depth-first-search order (probably as a "should")
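
To make the syntax in section 2 concrete, here is a minimal sketch of a parser and matcher in Python. It is not a reference implementation: `parse_path` and `select` are hypothetical names, the text inside `{}` and `[]` is assumed to be read verbatim up to the closing character, and matches are yielded in depth-first order, which the todo list only proposes. The sketch is also lenient: a bare `[]` matches every element, so it does not enforce the MUST from 2.4.

```python
from typing import Any, Iterator, List, Optional, Tuple

# One parsed step: ("field", name), ("union", nsid) for {nsid},
# or ("array", nsid-or-None) for [nsid] / [].
Step = Tuple[str, Optional[str]]


def parse_path(path: str) -> List[Step]:
    """Split a RecordPath into steps, honouring `!` escapes in field names."""
    steps: List[Step] = []
    name: List[str] = []

    def flush() -> None:
        if name:
            steps.append(("field", "".join(name)))
            name.clear()

    i = 0
    while i < len(path):
        ch = path[i]
        if ch == "!":  # escape: the next character is a literal part of the name
            name.append(path[i + 1])
            i += 2
        elif ch == ".":  # field separator
            flush()
            i += 1
        elif ch in "[{":  # array or scalar-union qualifier
            flush()
            end = path.index("]" if ch == "[" else "}", i)
            inner = path[i + 1:end]  # assumption: read verbatim, no escapes inside
            steps.append(("array" if ch == "[" else "union", inner or None))
            i = end + 1
        else:
            name.append(ch)
            i += 1
    flush()
    return steps


def select(value: Any, steps: List[Step]) -> Iterator[Any]:
    """Yield every value reached by the steps, in depth-first order."""
    if not steps:
        yield value
        return
    (kind, arg), rest = steps[0], steps[1:]
    if kind == "field":
        if isinstance(value, dict) and arg in value:
            yield from select(value[arg], rest)
    elif kind == "union":
        if isinstance(value, dict) and value.get("$type") == arg:
            yield from select(value, rest)
    elif kind == "array" and isinstance(value, list):
        for item in value:
            if arg is None or (isinstance(item, dict) and item.get("$type") == arg):
                yield from select(item, rest)


# Usage with a made-up record that combines shapes shown in section 2.
record = {
    "tags": ["art", "science"],
    "facets": [{"features": [
        {"$type": "app.bsky.richtext.facet#link", "uri": "https://example.com"},
        {"$type": "app.bsky.richtext.facet#mention", "did": "did:plc:example"},
    ]}],
    "embed": {
        "$type": "app.bsky.embed.record",
        "record": {"uri": "at://did:plc:rnpkyqnmsw4ipey6eotbdnnf/app.bsky.feed.post/3mjjvykdfo22r"},
    },
}

tags = list(select(record, parse_path("tags[]")))
links = list(select(record, parse_path("facets[].features[app.bsky.richtext.facet#link].uri")))
quoted = list(select(record, parse_path("embed{app.bsky.embed.record}.record.uri")))
```

Because `select` always returns an iterator, scalar and vector paths share one code path; an API that distinguishes them could wrap it, much like `querySelector` wraps `querySelectorAll`.
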