a textual notation to locate fields within atproto records (draft spec) microcosm.tngl.io/RecordPath/

first part of draft

phil bacc29b5

spec.md
# RecordPath (draft spec)

Informal community spec, seeking feedback.

**Scope:** This document defines *RecordPath*, a textual notation to locate fields within atproto records.

The proposed syntax is *mostly* compatible with Constellation's `source` record field locator syntax.
Constellation uses a `<collection nsid>:<path>` format for its `source` parameter; *RecordPath* will replace the `<path>` part.

While the driving motivation for this spec is lexicon-agnostic backlink indexing, the ability to canonically reference field locations in records is broadly useful.

> [!TIP]
> For example: [Graze Turbostream](https://help.graze.social/en/article/graze-turbostream-1cmhebt/) resolves references from Bluesky posts into a richly-hydrated firehose, an incredibly useful enhancement.
> The locations where Turbostream searches for references are currently hard-coded for Bluesky posts, but what if you wanted a richly-hydrated feed of Tangled issue comments?
>
> RecordPath offers a syntax that a configurable Turbostream could use to describe arbitrary reference locations, to hydrate content for any feed in the atmosphere without changing any code.


## 1. Design goals

1. **Pragmatically canonical:** Two records that conform to the same lexicon and contain a field at the same _semantic location_ (see 2.3) produce and match the same RecordPath.
2. **Lexicon-agnostic:** Path generation and matching are derived from, and operate on, record data directly.
3. **Complete:** Every field in a valid atproto record should be reachable by a RecordPath.
4. **Readable:** JSONPath-like syntax, with only URL-unreserved syntax characters.

Some RecordPaths: `text` (1), `langs[]` (2), `reply.root.uri` (3).

```json
{
  "$type": "app.bsky.feed.post",
  "text": "I love a good wake up and the problem is solved.", // (1)
  "langs": [
    "en" // (2)
  ],
  "createdAt": "2026-04-15T12:38:49.982Z",
  "reply": {
    "root": {
      "cid": "bafyreieac34fnjyhuuzvgdnsyeeueyn45se5kuk6yppesn25gydjf5m5hy",
      "uri": "at://did:plc:rnpkyqnmsw4ipey6eotbdnnf/app.bsky.feed.post/3mjjvykdfo22r" // (3)
    },
    "parent": {
      "cid": "bafyreieac34fnjyhuuzvgdnsyeeueyn45se5kuk6yppesn25gydjf5m5hy",
      "uri": "at://did:plc:rnpkyqnmsw4ipey6eotbdnnf/app.bsky.feed.post/3mjjvykdfo22r"
    }
  }
}
```


RecordPath takes heavy inspiration from JSONPath, but is not a general-purpose data query language.


## 2. Proposed syntax

### 2.1 Dot-separated object traversal

A RecordPath is a sequence of field names separated by `.`. For a Bluesky "like" record:

```json
{
  "$type": "app.bsky.feed.like",
  "createdAt": "2024-01-15T00:00:00.000Z",
  "subject": {
    "uri": "at://did:plc:pxa3amkp7jhfclaads3zud7q/app.bsky.feed.post/3mjkx2hpvqc2t",
    "cid": "bafyreicuxyp5rmsrqsf3v63ww6gc3j7q6cp7qyk6bv47c3bqkj2gzswiqq"
  }
}
```

The RecordPaths are:

```
$type        -- selects: "app.bsky.feed.like"
createdAt    -- "2024-01-15T00:00:00.000Z"
subject      -- the entire { uri: .., cid: .. } sub-object
subject.uri  -- "at://did:plc:pxa3amkp7jhfclaads3zud7q/app.bsky.feed.post/3mjkx2hpvqc2t"
subject.cid  -- "bafyreicuxyp5rmsrqsf3v63ww6gc3j7q6cp7qyk6bv47c3bqkj2gzswiqq"
```


### 2.2 Field name character escaping (`!`)

Object field names can be arbitrary strings, so RecordPath needs to disambiguate its own syntax when field names contain syntax characters.

RecordPath has six syntax characters, which must be escaped with a preceding `!` when they appear in a field name in the path.
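
The following is a minimal sketch of this rule, assuming TypeScript. The `escapeFieldName` and `splitFieldPath` helpers are hypothetical and not part of the spec. Escaping prefixes each syntax character with `!`. Splitting a dot-only path walks it one character at a time, so `!.` is never mistaken for a separator. The splitter deliberately rejects unescaped `[`, `]`, `{`, `}`, since array and union steps are covered later. It also treats `!` followed by a non-syntax character as an error, which is an assumption: this draft does not say how that case is handled.

```typescript
// The six RecordPath syntax characters (section 2.2): . [ ] { } !
const SYNTAX_CHARS = new Set([".", "[", "]", "{", "}", "!"]);

// Escape a single field name so it can be used as one RecordPath segment.
// Hypothetical helper: every syntax character gets a preceding "!".
function escapeFieldName(name: string): string {
  return name.replace(/[.\[\]{}!]/g, (ch) => "!" + ch);
}

// Split a dot-only RecordPath (no `[]` or `{}` steps) into raw field names.
// Hypothetical helper: "!x" yields a literal "x"; unescaped brackets/braces are
// rejected, and "!" before a non-syntax character is treated as an error
// (an assumption: the draft leaves that case open).
function splitFieldPath(path: string): string[] {
  const names: string[] = [];
  let current = "";
  for (let i = 0; i < path.length; i++) {
    const ch = path[i];
    if (ch === "!") {
      if (i + 1 >= path.length || !SYNTAX_CHARS.has(path[i + 1])) {
        throw new Error(`invalid escape at offset ${i} in ${JSON.stringify(path)}`);
      }
      current += path[i + 1];
      i++; // consume the escaped character as well
    } else if (ch === ".") {
      names.push(current);
      current = "";
    } else if (SYNTAX_CHARS.has(ch)) {
      throw new Error(`unsupported syntax character '${ch}' at offset ${i}`);
    } else {
      current += ch;
    }
  }
  names.push(current);
  return names;
}

console.log(JSON.stringify(splitFieldPath("meta.dot!.name"))); // ["meta","dot.name"]
console.log(JSON.stringify(splitFieldPath("meta." + escapeFieldName("a!b")))); // ["meta","a!b"]
```

Because escaping and splitting are inverses for plain field names, a path generator and a path matcher built this way agree on every key, including keys like `"$unknown"` that need no escaping at all.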

| Literal | Escaped | Syntax hint       |
|---------|---------|-------------------|
| `.`     | `!.`    | field separator   |
| `[`     | `![`    | arrays            |
| `]`     | `!]`    | arrays            |
| `{`     | `!{`    | union refs        |
| `}`     | `!}`    | union refs        |
| `!`     | `!!`    | escape            |

Object field names in atproto are Unicode strings, but only the six syntax characters are escaped.

```
meta.dot!.name   key named "dot.name"
meta.a!!b        key named "a!b"
meta.$unknown    key named "$unknown"
```

In practice it's very rare for field names in atproto to contain non-ASCII characters, and most field names are camelCase alphabetic-only strings that require no escaping.

Handling of control characters and other non-printable code points is currently undefined.


### 2.3 Arrays are unordered sets

Arrays in real atproto records *almost universally* lack index-bound significance. That is: a selector for a record's array element at `index=0` is *almost never useful*.

Consider an array of **mentions** in Bluesky posts: the order in the record data carries no meaningful information -- it can even differ from the order of appearance in the post text!

On the other hand, selecting data from an atproto record array without regard for order *is* almost always what you want.

RecordPath makes a pragmatic choice to treat all arrays as unordered sets. Index position is not encoded in the path, and a RecordPath that traverses an array always matches zero or more items ("vector" matches), instead of the zero-or-one behaviour of object-only paths ("scalar" matches).

> [!NOTE]
> It's a compromise and a bit unfortunate, but in practice it works well. Even for a data format like GeoJSON, if you want only the `longitude`s from its `[longitude, latitude]` pair format, you can RecordPath the array pair itself and index into it at the application level.
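
To make the scalar/vector distinction concrete, here is a minimal TypeScript sketch that evaluates an already-parsed path. The `Segment` type and the `select` function are hypothetical illustrations, not part of this spec. The `{ kind: "array" }` step corresponds to the `[]` notation described below, and union steps are left out. Results come back in an array, but callers should treat them as an unordered collection. The `place` record is made-up example data.

```typescript
// Hypothetical parsed form of a RecordPath: plain field steps and array steps.
type Segment = { kind: "field"; name: string } | { kind: "array" };

// Return every value reached by `segments`. Paths made only of field steps
// yield zero or one match ("scalar"); each array step fans out over all
// elements, so such paths yield zero or more matches ("vector").
function select(value: unknown, segments: Segment[]): unknown[] {
  if (segments.length === 0) return [value];
  const [head, ...rest] = segments;

  if (head.kind === "field") {
    // Field steps only apply to plain (non-array) objects that own the key.
    if (value === null || typeof value !== "object" || Array.isArray(value)) return [];
    const obj = value as Record<string, unknown>;
    if (!Object.prototype.hasOwnProperty.call(obj, head.name)) return [];
    return select(obj[head.name], rest);
  }

  // Array step: visit every element; index positions are never recorded.
  if (!Array.isArray(value)) return [];
  const out: unknown[] = [];
  for (const item of value) out.push(...select(item, rest));
  return out;
}

// The GeoJSON-style case from the note: select the pair itself (a scalar
// match), then index into it in application code.
const place = { location: { coordinates: [-73.98, 40.75] } };
const [pair] = select(place, [
  { kind: "field", name: "location" },
  { kind: "field", name: "coordinates" },
]);
const longitude = (pair as number[])[0];
console.log(longitude); // -73.98
```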

Arrays are descended into with `[]`:

```
tags[]
references[].title
references[].urls[]
```

Given:

```json
{
  "tags": ["art", "science"],
  "references": [
    {
      "title": "a nice paper",
      "urls": [
        "https://bad-example.com/a-sketchy-source",
        "https://example.com/a-reputable-source"
      ]
    },
    {
      "title": "another paper",
      "urls": ["https://example.com/source-for-another-paper"]
    }
  ]
}
```

- `tags[]` matches `"art"` and `"science"`
- `references[].title` matches `"a nice paper"` and `"another paper"`
- `references[].urls[]` matches all three URLs in the example


### 2.4 Array-of-unions

Atproto lexicons commonly use arrays of unions to collect different kinds of child objects in a list.
This pattern is visible in the record data by the presence of a `$type` field on each contained object.

A RecordPath descending into an array-of-unions MUST include the `$type` field's value within the square brackets:

```
facets[].features[app.bsky.richtext.facet#link].uri
facets[].features[app.bsky.richtext.facet#mention].did
```

So while arrays are unordered in RecordPath, their elements are always segmented by union-ref type when they contain union types.


### 2.5 Scalar union fields

An object from a union outside of an array is visible in record data by the presence of a `$type` field: plain objects should not have this key, and union objects should have it.
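
Both the array-of-unions rule in 2.4 and the scalar case introduced here hinge on the same test: does an object carry a `$type` field? Below is a minimal TypeScript sketch of that test, and of how a `[type]` step from 2.4 could use it to filter array elements. `unionType` and `selectUnionMembers` are hypothetical names. Comparing the bracketed value to `$type` verbatim is an assumption, carried over from the verbatim rule this section states for `{}` steps. The facet feature values are made-up examples.

```typescript
// Return the union type of `value` if it looks like a union member (a
// non-array object with a string `$type` field); plain objects yield undefined.
function unionType(value: unknown): string | undefined {
  if (value === null || typeof value !== "object" || Array.isArray(value)) return undefined;
  const t = (value as Record<string, unknown>)["$type"];
  return typeof t === "string" ? t : undefined;
}

// Hypothetical evaluation of a `[type]` step (section 2.4): keep only the
// elements whose `$type` equals `type` verbatim, ignoring their positions.
function selectUnionMembers(value: unknown, type: string): unknown[] {
  if (!Array.isArray(value)) return [];
  return value.filter((item) => unionType(item) === type);
}

// Illustrative facet features, as reached by `facets[].features[...]`.
const features = [
  { $type: "app.bsky.richtext.facet#link", uri: "https://example.com" },
  { $type: "app.bsky.richtext.facet#mention", did: "did:plc:rnpkyqnmsw4ipey6eotbdnnf" },
];
const links = selectUnionMembers(features, "app.bsky.richtext.facet#link");
console.log(links.length); // 1
console.log(unionType({ text: "hello" })); // undefined
```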

Since unions represent different child data types, the union type is captured in RecordPath by including it within curly braces:

```
embed{app.bsky.embed.external}.external.uri
embed{app.bsky.embed.record}.record.uri
embed{app.bsky.embed.recordWithMedia}.record.record.uri
embed{app.bsky.embed.images}.images[].image
```

Given a post with an external link embed:

```json
{
  "$type": "app.bsky.feed.post",
  "text": "my god",
  "langs": [ "en" ],
  "createdAt": "2026-04-16T01:39:17.721Z",
  "embed": {
    "$type": "app.bsky.embed.external", // captured
    "external": {
      "uri": "https://youtu.be/-pns419xAoc?si=K4XMQfFv-t4Q1cn0",
      "title": "Paramore - Someday (The Strokes Cover)",
      "description": "YouTube video by microwave"
    }
  }
}
```

The RecordPath `embed{app.bsky.embed.external}.external.uri` reaches the external embed's URI, `"https://youtu.be/-pns419xAoc?si=K4XMQfFv-t4Q1cn0"`.

- The NSID in the braces is the `$type` value verbatim, including any `#fragment` suffix if present. The `#main` suffix is never present per the lexicon spec, so this is close to canonical, but I need to think more about relative `#fragments` (which omit the NSID part) and whether they introduce any non-canonical problem.
- The top-level record `$type` is excluded from this rule.


## 3. Data-model assumptions

RecordPath depends on several properties of the atproto [data model](https://atproto.com/specs/data-model) and [lexicon](https://atproto.com/specs/lexicon) system.