a textual notation to locate fields within atproto records (draft spec) microcosm.tngl.io/RecordPath/
# RecordPath (draft spec)

Informal community spec, seeking feedback.

**Scope:** This document defines *RecordPath*, a textual notation to locate fields within atproto records.

The proposed syntax is *mostly* compatible with Constellation's `source` record field locator syntax.
Constellation uses a `<collection nsid>:<path>` format for its `source` parameter; *RecordPath* will replace the `<path>` part.

While the driving motivation for this spec is lexicon-agnostic backlink indexing, being able to canonically reference field locations in records is broadly useful.

> [!TIP]
> For example: [Graze Turbostream](https://help.graze.social/en/article/graze-turbostream-1cmhebt/) resolves references from Bluesky posts into a richly-hydrated firehose, an incredibly useful enhancement.
> The locations where Turbostream searches for references are currently hard-coded for Bluesky posts, but what if you wanted a richly-hydrated feed of Tangled issue comments?
>
> RecordPath offers a syntax that a configurable Turbostream could use to describe arbitrary reference locations: to hydrate content for any feed in the atmosphere, without changing any code.


## 1. Design goals

1. **Canonical:** Two records that conform to the same lexicon and contain a field at the same _semantic location_ (see 2.3) produce and match the same RecordPath.
2. **Lexicon-agnostic:** Path generation and matching are derived from and operated upon record data directly.
3. **Complete:** Every field in a valid atproto record should be reachable by a RecordPath.
4. **Readable:** JSONPath-like syntax, with only URL-unreserved syntax characters.

Some RecordPaths: `text` (1), `langs[]` (2), `reply.root.uri` (3).

```json
{
  "$type": "app.bsky.feed.post",
  "text": "I love a good wake up and the problem is solved.", // (1)
  "langs": [
    "en" // (2)
  ],
  "createdAt": "2026-04-15T12:38:49.982Z",
  "reply": {
    "root": {
      "cid": "bafyreieac34fnjyhuuzvgdnsyeeueyn45se5kuk6yppesn25gydjf5m5hy",
      "uri": "at://did:plc:rnpkyqnmsw4ipey6eotbdnnf/app.bsky.feed.post/3mjjvykdfo22r" // (3)
    },
    "parent": {
      "cid": "bafyreieac34fnjyhuuzvgdnsyeeueyn45se5kuk6yppesn25gydjf5m5hy",
      "uri": "at://did:plc:rnpkyqnmsw4ipey6eotbdnnf/app.bsky.feed.post/3mjjvykdfo22r"
    }
  }
}
```


RecordPath takes heavy inspiration from JSONPath, but is not a general-purpose data query language.


## 2. Proposed syntax

### 2.1 Dot-separated object traversal

A RecordPath is a sequence of field names separated by `.`. For a Bluesky "like" record:

```json
{
  "$type": "app.bsky.feed.like",
  "createdAt": "2024-01-15T00:00:00.000Z",
  "subject": {
    "uri": "at://did:plc:pxa3amkp7jhfclaads3zud7q/app.bsky.feed.post/3mjkx2hpvqc2t",
    "cid": "bafyreicuxyp5rmsrqsf3v63ww6gc3j7q6cp7qyk6bv47c3bqkj2gzswiqq"
  }
}
```

The RecordPaths are:

```
$type        -- selects: "app.bsky.feed.like"
createdAt    -- "2024-01-15T00:00:00.000Z"
subject      -- the entire { uri: .., cid: ..} sub-object
subject.uri  -- "at://did:plc:pxa3amkp7jhfclaads3zud7q/app.bsky.feed.post/3mjkx2hpvqc2t"
subject.cid  -- "bafyreicuxyp5rmsrqsf3v63ww6gc3j7q6cp7qyk6bv47c3bqkj2gzswiqq"
```


### 2.2 Field name character !escape

Object field names can be arbitrary strings, so RecordPath needs to disambiguate its own syntax when fields contain syntax characters.

The backslash is the conventional escape character in many languages and systems, but as per the URL-unreserved character constraint in section 1, it cannot be used here. The exclamation mark `!` serves as its replacement.

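
As a rough illustration of `!`-escaping, the sketch below escapes and unescapes a single field name, using the six syntax characters this section goes on to list (`.`, `[`, `]`, `{`, `}`, `!`). The function names are hypothetical, and rejecting a `!` that is not followed by a syntax character is an assumption: the draft does not say how such input should be handled.

```python
# The six RecordPath syntax characters from section 2.2.
SYNTAX_CHARS = ".[]{}!"


def escape_field_name(name: str) -> str:
    """Prefix every syntax character in a field name with `!`."""
    return "".join("!" + ch if ch in SYNTAX_CHARS else ch for ch in name)


def unescape_field_name(escaped: str) -> str:
    """Reverse escape_field_name for a single escaped path segment."""
    out = []
    chars = iter(escaped)
    for ch in chars:
        if ch == "!":
            nxt = next(chars, None)
            # Assumption: `!` must be followed by one of the six syntax characters.
            if nxt is None or nxt not in SYNTAX_CHARS:
                raise ValueError(f"invalid escape in {escaped!r}")
            out.append(nxt)
        else:
            out.append(ch)
    return "".join(out)


print(escape_field_name("dot.name"))    # dot!.name
print(escape_field_name("a!b"))         # a!!b
print(unescape_field_name("dot!.name")) # dot.name
```
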

RecordPath has six syntax characters, which must be escaped by a preceding `!` if they appear in a field name in the path.

| Literal | Escaped | Syntax hint     |
|---------|---------|-----------------|
| `.`     | `!.`    | field separator |
| `[`     | `![`    | arrays          |
| `]`     | `!]`    | arrays          |
| `{`     | `!{`    | union refs      |
| `}`     | `!}`    | union refs      |
| `!`     | `!!`    | escape          |

Object field names in atproto are Unicode strings, but only the six syntax characters are escaped.

```
meta.dot!.name   key named "dot.name"
meta.a!!b        key named "a!b"
meta.$unknown    key named "$unknown"
```

In practice it's very rare for field names in atproto to contain non-ASCII characters, and most field names are camelCase alphabetic-only strings, requiring no escaping.

Handling control characters and other non-printable codes is currently undefined.


### 2.3 Arrays are unordered sets

Arrays in real atproto records *almost universally* lack index-bound significance. That is: a selector for a record's array element at `index=0` is *almost never useful*.

Consider an array of **mentions** in Bluesky posts: the order in the record data carries no meaningful information -- it can even be different from the order of appearance in the post text!

On the other hand, selecting data from an atproto record array without regard for order *is* almost always what you want.

RecordPath makes a pragmatic choice to treat all arrays as unordered sets. Index position is not encoded in the path, and selecting a RecordPath that traverses an array always matches zero-or-more items ("vector" matches), instead of the zero-or-one behaviour of object-only paths ("scalar" matches).

> [!NOTE]
> It's a compromise and a bit unfortunate, but in practice it works well.
> Even for a data format like GeoJSON, if you want only the `longitude`s from its `[longitude, latitude]` pair format, you can use a RecordPath to select the array pair itself and index into it at the application level.

Arrays are descended into with `[]`:

```
tags[]
references[].title
references[].urls[]
```

Given:

```json
{
  "tags": ["art", "science"],
  "references": [
    {
      "title": "a nice paper",
      "urls": [
        "https://bad-example.com/a-sketchy-source",
        "https://example.com/a-reputable-source"
      ]
    },
    {
      "title": "another paper",
      "urls": ["https://example.com/source-for-another-paper"]
    }
  ]
}
```

- `tags[]` reaches `"art"` and `"science"`
- `references[].title` reaches `"a nice paper"` and `"another paper"`
- `references[].urls[]` reaches all three URLs in the example


### 2.4 Array-of-unions

Atproto lexicons commonly use arrays of unions to collect different kinds of child objects in a list.
This pattern is visible in the record data by the presence of a `$type` field on the contained object.

A RecordPath descending into an array-of-unions MUST include the `$type` field's value within the square brackets:

```
facets[].features[app.bsky.richtext.facet#link].uri
facets[].features[app.bsky.richtext.facet#mention].did
```

So while arrays are unordered in RecordPath, their elements are always segmented by union-ref type when they contain unioned types.


### 2.5 Scalar union fields

An object from a union outside of an array is visible in record data by the presence of a `$type` field -- plain objects should not have this key, and union objects should have it.

Since unions represent different child data types, the union type is captured in RecordPath by including it within curly braces:

```
embed{app.bsky.embed.external}.external.uri
embed{app.bsky.embed.record}.record.uri
embed{app.bsky.embed.recordWithMedia}.record.record.uri
embed{app.bsky.embed.images}.images[].image
```

Given a post with an external link embed:

```json
{
  "$type": "app.bsky.feed.post",
  "text": "my god",
  "langs": [ "en" ],
  "createdAt": "2026-04-16T01:39:17.721Z",
  "embed": {
    "$type": "app.bsky.embed.external", // captured
    "external": {
      "uri": "https://youtu.be/-pns419xAoc?si=K4XMQfFv-t4Q1cn0",
      "title": "Paramore - Someday (The Strokes Cover)",
      "description": "YouTube video by microwave"
    }
  }
}
```

The RecordPath `embed{app.bsky.embed.external}.external.uri` reaches the post's external embed URI, "https://youtu.be/-pns419xAoc?si=K4XMQfFv-t4Q1cn0".

- The NSID in the braces is the `$type` value verbatim, including any `#fragment` suffix if present. The `#main` suffix is never present per the lexicon spec (*"use of a `#main` suffix is invalid"* in `$type` values). Relative `#fragment` references (without NSID) only exist within lexicon definition files as shorthand; `$type` values in record data are always fully-qualified (`nsid` or `nsid#name`). So the data is already canonical and no normalization is needed.
- The top-level record `$type` is excluded from this rule.


## 3. Data-model assumptions

RecordPath depends on several properties of the atproto [data model](https://atproto.com/specs/data-model) and [lexicon](https://atproto.com/specs/lexicon) system.

### `$type` is on unions (and blobs), not on plain object refs

- **Records** include `$type` at the top level (it is not encoded as a union qualifier in RecordPaths)
- **Union variants** MUST include `$type` (consistent with the lexicon spec)
- **Plain (non-union) object refs**: The lexicon spec states that `$type` *"should not be included in encoded data as a discriminator."*
  Since this is only a "should-not" and not a "must-not", valid records are technically allowed to have a `$type` where they shouldn't, and we aren't guaranteed to have a canonical RecordPath for every valid atproto record field.
  Hopefully record serializers are following this SHOULD :/

### Forward-compatibility

Lexicon evolution rules forbid **type changes** across schema revisions, so a plain object ref cannot be converted to a union in a forward-compatible revision under the same NSID. RecordPath canonicalization is therefore hopefully safe from forward-compatible lexicon changes.


#### todo

- API recommendations for scalar vs vector queries

  > A RecordPath is *scalar* if it contains no `[]` qualifiers, otherwise *vector*. Implementations that evaluate RecordPaths against records typically expose APIs for blah blah

  - maybe compare to DOM `querySelector` / `querySelectorAll`

- lexicon awareness notes
  - backlinks example: match links on non-link fields if they happen to parse

- bring back the expected order of matches: depth-first-search order (probably as a "should")

- include cbor/drisl -- keep json for examples, but all this should be applicable

#### questions

- should an empty RecordPath be legal? would match the entire record.
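
To make the rules in section 2 concrete, here is a minimal, non-normative sketch of how RecordPaths could be enumerated from record data without consulting a lexicon. All names (`escape`, `union_type`, `walk`, `select`) are hypothetical. It rests on several assumptions the draft does not settle: `$type` keys below the top level are folded into the `{}` / `[]` qualifier rather than emitted as their own paths, objects whose `$type` is `"blob"` are not treated as union variants, and matching is plain string equality on the generated path.

```python
from typing import Any, Iterator, List, Optional, Tuple

SYNTAX_CHARS = ".[]{}!"  # the six syntax characters from section 2.2


def escape(name: str) -> str:
    """`!`-escape syntax characters in a field name."""
    return "".join("!" + ch if ch in SYNTAX_CHARS else ch for ch in name)


def union_type(value: Any) -> Optional[str]:
    """Return the `$type` of a union variant, or None for anything else."""
    if isinstance(value, dict):
        t = value.get("$type")
        # Assumption: blobs carry `$type` but are not union variants.
        if isinstance(t, str) and t != "blob":
            return t
    return None


def walk(value: Any, path: str = "", top: bool = True) -> Iterator[Tuple[str, Any]]:
    """Yield (RecordPath, matched value) pairs in depth-first order."""
    if isinstance(value, dict):
        for key, child in value.items():
            if key == "$type" and not top:
                continue  # assumption: nested `$type` lives in the qualifier only
            t = union_type(child)
            qualifier = "{" + t + "}" if t else ""  # scalar union (2.5)
            child_path = (path + "." if path else "") + escape(key) + qualifier
            yield child_path, child
            yield from walk(child, child_path, top=False)
    elif isinstance(value, list):
        for item in value:
            # `[]` for plain elements (2.3), `[$type]` for union elements (2.4)
            item_path = path + "[" + (union_type(item) or "") + "]"
            yield item_path, item
            yield from walk(item, item_path, top=False)


def select(record: Any, record_path: str) -> List[Any]:
    """Return every value in the record reached by the given RecordPath."""
    return [value for path, value in walk(record) if path == record_path]


record = {
    "tags": ["art", "science"],
    "references": [
        {"title": "a nice paper", "urls": ["https://example.com/a-reputable-source"]},
        {"title": "another paper", "urls": ["https://example.com/source-for-another-paper"]},
    ],
}
print(select(record, "tags[]"))              # ['art', 'science']
print(select(record, "references[].title"))  # ['a nice paper', 'another paper']
```

Because `select` always returns a list, scalar paths yield zero or one element and vector paths yield zero or more, which mirrors the scalar/vector distinction noted in the todo list.
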