# RecordPath (draft spec)

Informal community spec, seeking feedback.

**Scope:** This document defines *RecordPath*, a textual notation to locate fields within atproto records.

The proposed syntax is *mostly* compatible with Constellation's `source` record field locator syntax.
Constellation uses a `<collection nsid>:<path>` format for its `source` parameter; *RecordPath* will replace the `<path>` part.

While the driving motivation for this spec is lexicon-agnostic backlink indexing, being able to canonically reference field locations in records is broadly useful.

> [!TIP]
> For example: [Graze Turbostream](https://help.graze.social/en/article/graze-turbostream-1cmhebt/) resolves references from Bluesky posts into a richly-hydrated firehose, an incredibly useful enhancement.
> The locations where Turbostream searches for references are currently hard-coded for Bluesky posts, but what if you wanted a richly-hydrated feed of Tangled issue comments?
>
> RecordPath offers a syntax that a configurable Turbostream could use to describe arbitrary reference locations: to hydrate content for any feed in the atmosphere, without changing any code.


## 1. Design goals

1. **Canonical:** Two records that conform to the same lexicon and contain a field at the same _semantic location_ (see 2.3) produce and match the same RecordPath.
2. **Lexicon-agnostic:** Path generation and matching are derived from, and operate on, record data directly.
3. **Complete:** Every field in a valid atproto record should be reachable by a RecordPath.
4. **Readable:** JSONPath-like syntax, with only URL-unreserved syntax characters.

Some RecordPaths: `text` (1), `langs[]` (2), `reply.root.uri` (3).

```json
{
  "$type": "app.bsky.feed.post",
  "text": "I love a good wake up and the problem is solved.", // (1)
  "langs": [
    "en" // (2)
  ],
  "createdAt": "2026-04-15T12:38:49.982Z",
  "reply": {
    "root": {
      "cid": "bafyreieac34fnjyhuuzvgdnsyeeueyn45se5kuk6yppesn25gydjf5m5hy",
      "uri": "at://did:plc:rnpkyqnmsw4ipey6eotbdnnf/app.bsky.feed.post/3mjjvykdfo22r" // (3)
    },
    "parent": {
      "cid": "bafyreieac34fnjyhuuzvgdnsyeeueyn45se5kuk6yppesn25gydjf5m5hy",
      "uri": "at://did:plc:rnpkyqnmsw4ipey6eotbdnnf/app.bsky.feed.post/3mjjvykdfo22r"
    }
  }
}
```


RecordPath takes heavy inspiration from JSONPath, but is not a general-purpose data query language.


## 2. Proposed syntax

### 2.1 Dot-separated object traversal

A RecordPath is a sequence of field names separated by `.`. For a Bluesky "like" record:

```json
{
  "$type": "app.bsky.feed.like",
  "createdAt": "2024-01-15T00:00:00.000Z",
  "subject": {
    "uri": "at://did:plc:pxa3amkp7jhfclaads3zud7q/app.bsky.feed.post/3mjkx2hpvqc2t",
    "cid": "bafyreicuxyp5rmsrqsf3v63ww6gc3j7q6cp7qyk6bv47c3bqkj2gzswiqq"
  }
}
```

The RecordPaths are:

```
$type        -- selects: "app.bsky.feed.like"
createdAt    -- "2024-01-15T00:00:00.000Z"
subject      -- the entire { uri: .., cid: .. } sub-object
subject.uri  -- "at://did:plc:pxa3amkp7jhfclaads3zud7q/app.bsky.feed.post/3mjkx2hpvqc2t"
subject.cid  -- "bafyreicuxyp5rmsrqsf3v63ww6gc3j7q6cp7qyk6bv47c3bqkj2gzswiqq"
```
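
A minimal Python sketch of this traversal, assuming a path with no escapes, arrays, or unions. `select_scalar` is a hypothetical helper name chosen for illustration, not part of any existing library:

```python
from typing import Any


def select_scalar(record: dict, path: str, default: Any = None) -> Any:
    """Follow a dot-separated path through nested objects.

    Sketch only: escapes, arrays, and unions are not handled.
    """
    current: Any = record
    for name in path.split("."):
        if not isinstance(current, dict) or name not in current:
            return default  # an object-only path matches zero or one value
        current = current[name]
    return current


like = {
    "$type": "app.bsky.feed.like",
    "createdAt": "2024-01-15T00:00:00.000Z",
    "subject": {
        "uri": "at://did:plc:pxa3amkp7jhfclaads3zud7q/app.bsky.feed.post/3mjkx2hpvqc2t",
        "cid": "bafyreicuxyp5rmsrqsf3v63ww6gc3j7q6cp7qyk6bv47c3bqkj2gzswiqq",
    },
}

print(select_scalar(like, "subject.uri"))
```

Returning a default for a missing field is a choice made for the sketch; an implementation could equally raise or return an explicit "no match" value.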


### 2.2 Field name character !escape

Object field names can be arbitrary strings, so RecordPath needs to disambiguate its own syntax when fields contain syntax characters.

The backslash is the conventional escape character in many languages and systems, but the URL-unreserved character constraint in section 1 rules it out here. The exclamation mark `!` serves as its replacement.

RecordPath has six syntax characters, each of which must be escaped by a preceding `!` when it appears in a field name in the path.

| Literal | Escaped | Syntax hint     |
|---------|---------|-----------------|
| `.`     | `!.`    | field separator |
| `[`     | `![`    | arrays          |
| `]`     | `!]`    | arrays          |
| `{`     | `!{`    | union refs      |
| `}`     | `!}`    | union refs      |
| `!`     | `!!`    | escape          |

Object field names in atproto are Unicode strings, but only the six syntax characters are escaped.

```
meta.dot!.name   -- key named "dot.name"
meta.a!!b        -- key named "a!b"
meta.$unknown    -- key named "$unknown"
```

In practice it's very rare for field names in atproto to contain non-ASCII characters, and most field names are camelCase alphabetic-only strings, requiring no escaping.

Handling control characters and other non-printable codes is currently undefined.
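
The escaping rule can be sketched in Python as below. The function names are hypothetical, and the splitter only understands dot-separated paths (no `[]` or `{}` qualifiers). Rejecting a `!` that is not followed by a syntax character is this sketch's choice; the draft does not say what should happen in that case.

```python
SYNTAX_CHARS = ".[]{}!"  # the six syntax characters from the table above


def escape_field_name(name: str) -> str:
    """Prefix every syntax character in a field name with `!`."""
    return "".join("!" + ch if ch in SYNTAX_CHARS else ch for ch in name)


def split_field_names(path: str) -> list[str]:
    """Split a dot-separated path on unescaped dots, unescaping each name."""
    names: list[str] = []
    current: list[str] = []
    i = 0
    while i < len(path):
        ch = path[i]
        if ch == "!":
            # Assumption: only the six syntax characters may follow `!`.
            if i + 1 >= len(path) or path[i + 1] not in SYNTAX_CHARS:
                raise ValueError(f"invalid escape at offset {i}")
            current.append(path[i + 1])
            i += 2
        elif ch == ".":
            names.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    names.append("".join(current))
    return names


print(escape_field_name("dot.name"))
print(split_field_names("meta.dot!.name"))
```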


### 2.3 Arrays are unordered sets

Arrays in real atproto records *almost universally* lack index-bound significance. That is: a selector for a record's array element at `index=0` is *almost never useful*.

Consider an array of **mentions** in Bluesky posts: the order in the record data carries no meaningful information -- it can even be different from the order of appearance in the post text!

On the other hand, selecting data from an atproto record array without regard for order *is* almost always what you want.

RecordPath makes a pragmatic choice to treat all arrays as unordered sets. Index position is not encoded in the path, and selecting a RecordPath that traverses an array always matches zero-or-more items ("vector" matches), instead of the zero-or-one behaviour of object-only paths ("scalar" matches).

> [!NOTE]
> It's a compromise and a bit unfortunate, but in practice it works well. Even for a data format like GeoJSON, if you want only `longitude`s from its `[longitude, latitude]` pair format, you can RecordPath the array-pair itself and index into it at the application level.

Arrays are descended into with `[]`:

```
tags[]
references[].title
references[].urls[]
```

Given:

```json
{
  "tags": ["art", "science"],
  "references": [
    {
      "title": "a nice paper",
      "urls": [
        "https://bad-example.com/a-sketchy-source",
        "https://example.com/a-reputable-source"
      ]
    },
    {
      "title": "another paper",
      "urls": ["https://example.com/source-for-another-paper"]
    }
  ]
}
```

- `tags[]` reaches `"art"` and `"science"`
- `references[].title`: `"a nice paper"` and `"another paper"`
- `references[].urls[]` matches all three URLs in the example
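
One way to evaluate such vector paths is sketched below in Python. `select_all` is a hypothetical name, and the sketch assumes paths without escapes or union qualifiers. Matches come back in document order here, but that is incidental to the sketch rather than something the draft currently requires.

```python
from typing import Any


def select_all(record: Any, path: str) -> list[Any]:
    """Return every value reached by a path made of `.` steps and `[]` qualifiers."""
    matches = [record]
    for segment in path.split("."):
        # Count and strip trailing `[]` qualifiers (nested arrays give `name[][]`).
        depth = 0
        while segment.endswith("[]"):
            segment = segment[:-2]
            depth += 1
        # Step into the named field on every object matched so far.
        matches = [m[segment] for m in matches if isinstance(m, dict) and segment in m]
        # Each `[]` flattens one level of array; non-arrays simply drop out.
        for _ in range(depth):
            matches = [item for m in matches if isinstance(m, list) for item in m]
    return matches


doc = {
    "tags": ["art", "science"],
    "references": [
        {
            "title": "a nice paper",
            "urls": [
                "https://bad-example.com/a-sketchy-source",
                "https://example.com/a-reputable-source",
            ],
        },
        {
            "title": "another paper",
            "urls": ["https://example.com/source-for-another-paper"],
        },
    ],
}

print(select_all(doc, "references[].urls[]"))
```

A path that traverses a missing field or a non-array yields an empty list, which is the "zero" case of a zero-or-more match.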


### 2.4 Array-of-unions

Atproto lexicons commonly use arrays of unions to collect different kinds of child objects in a list.
This pattern is visible in the record data by the presence of a `$type` field on the contained object.

A RecordPath descending into an array-of-unions MUST include the `$type` field's value within the square brackets:

```
facets[].features[app.bsky.richtext.facet#link].uri
facets[].features[app.bsky.richtext.facet#mention].did
```

So while arrays are unordered in RecordPath, their elements are always segmented by union-ref type when they contain unioned types.
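
Because the `$type` value inside the brackets contains dots itself, a path can no longer be split naively on `.`. The Python sketch below uses a small bracket-aware tokenizer; `parse_steps` and `select_all` are hypothetical names, escapes and `{}` qualifiers are ignored, and the sample post is made-up data. Treating a bare `[]` as "only elements without a `$type`" is this sketch's reading of the segmentation rule.

```python
import re
from typing import Any

# One step: a field name, zero or more [..] qualifiers, then "." or end of path.
_STEP = re.compile(r"([^.\[\]]+)((?:\[[^\]]*\])*)(?:\.|$)")
_QUALIFIER = re.compile(r"\[([^\]]*)\]")


def parse_steps(path: str) -> list[tuple[str, list[str]]]:
    """Split a path into (field name, [qualifier contents, ...]) steps."""
    steps, pos = [], 0
    while pos < len(path):
        m = _STEP.match(path, pos)
        if m is None:
            raise ValueError(f"cannot parse path at offset {pos}")
        steps.append((m.group(1), _QUALIFIER.findall(m.group(2))))
        pos = m.end()
    return steps


def select_all(record: Any, path: str) -> list[Any]:
    matches = [record]
    for name, qualifiers in parse_steps(path):
        matches = [m[name] for m in matches if isinstance(m, dict) and name in m]
        for type_ref in qualifiers:
            items = [item for m in matches if isinstance(m, list) for item in m]
            # "" (from `[]`) keeps untyped elements; otherwise match `$type` verbatim.
            wanted = type_ref or None
            matches = [
                item for item in items
                if (item.get("$type") if isinstance(item, dict) else None) == wanted
            ]
    return matches


# Hypothetical post data, for illustration only.
post = {
    "$type": "app.bsky.feed.post",
    "text": "hi @alice.example see https://example.com",
    "facets": [
        {"index": {"byteStart": 3, "byteEnd": 17},
         "features": [{"$type": "app.bsky.richtext.facet#mention", "did": "did:plc:example"}]},
        {"index": {"byteStart": 22, "byteEnd": 41},
         "features": [{"$type": "app.bsky.richtext.facet#link", "uri": "https://example.com"}]},
    ],
}

print(select_all(post, "facets[].features[app.bsky.richtext.facet#link].uri"))
```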


### 2.5 Scalar union fields

An object from a union outside of an array is visible in record data by the presence of a `$type` field -- plain objects should not have this key, and union objects should have it.

Since unions represent different child data types, the union-type is captured in RecordPath by including it within curly braces:

```
embed{app.bsky.embed.external}.external.uri
embed{app.bsky.embed.record}.record.uri
embed{app.bsky.embed.recordWithMedia}.record.record.uri
embed{app.bsky.embed.images}.images[].image
```

Given a post with an external link embed:

```json
{
  "$type": "app.bsky.feed.post",
  "text": "my god",
  "langs": [ "en" ],
  "createdAt": "2026-04-16T01:39:17.721Z",
  "embed": {
    "$type": "app.bsky.embed.external", // captured
    "external": {
      "uri": "https://youtu.be/-pns419xAoc?si=K4XMQfFv-t4Q1cn0",
      "title": "Paramore - Someday (The Strokes Cover)",
      "description": "YouTube video by microwave"
    }
  }
}
```

The RecordPath `embed{app.bsky.embed.external}.external.uri` reaches the external embed's URI, `"https://youtu.be/-pns419xAoc?si=K4XMQfFv-t4Q1cn0"`.

- The NSID in the braces is the `$type` value verbatim, including any `#fragment` suffix if present. The `#main` suffix is never present per the lexicon spec (*"use of a `#main` suffix is invalid"* in `$type` values). Relative `#fragment` references (without NSID) only exist within lexicon definition files as shorthand; `$type` values in record data are always fully-qualified (`nsid` or `nsid#name`). So the data is already canonical and no normalization is needed.
- The top-level record `$type` is excluded from this rule.
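
A Python sketch of matching a scalar union step follows. `select_scalar` is a hypothetical name, and arrays and escapes are left out. The sketch is strict: a step without braces only matches a value that carries no `$type` of its own. That strictness is an assumption of the sketch, and blobs (which also carry `$type`) are not special-cased.

```python
import re
from typing import Any, Optional

# One step: a field name, an optional {type} qualifier, then "." or end of path.
_STEP = re.compile(r"([^.{}]+)(?:\{([^}]*)\})?(?:\.|$)")


def select_scalar(record: dict, path: str) -> Optional[Any]:
    """Evaluate a path made of `.` steps with optional `{type}` qualifiers."""
    current: Any = record
    pos = 0
    while pos < len(path):
        m = _STEP.match(path, pos)
        if m is None:
            raise ValueError(f"cannot parse path at offset {pos}")
        name, type_ref = m.group(1), m.group(2)  # type_ref is None without braces
        if not isinstance(current, dict) or name not in current:
            return None
        current = current[name]
        actual = current.get("$type") if isinstance(current, dict) else None
        if actual != type_ref:
            return None  # wrong union variant, or braces missing/unexpected
        pos = m.end()
    return current


post = {
    "$type": "app.bsky.feed.post",
    "text": "my god",
    "langs": ["en"],
    "createdAt": "2026-04-16T01:39:17.721Z",
    "embed": {
        "$type": "app.bsky.embed.external",
        "external": {
            "uri": "https://youtu.be/-pns419xAoc?si=K4XMQfFv-t4Q1cn0",
            "title": "Paramore - Someday (The Strokes Cover)",
            "description": "YouTube video by microwave",
        },
    },
}

print(select_scalar(post, "embed{app.bsky.embed.external}.external.uri"))
```

The top-level `$type` needs no special handling in this sketch: the path `$type` selects a string, which has no `$type` of its own to capture.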


## 3. Data-model assumptions

RecordPath depends on several properties of the atproto [data model](https://atproto.com/specs/data-model) and [lexicon](https://atproto.com/specs/lexicon) system.

### `$type` is on unions (and blobs), not on plain object refs

- **Records** include `$type` at the top level (not included in RecordPaths)
- **Union variants** MUST include `$type` (consistent with lexicon spec)
- **Plain (non-union) object refs**: The lexicon spec states `$type` *"should not be included in encoded data as a discriminator."*
  Since this is only a "should-not" and not a "must-not", valid records are technically allowed to have a `$type` where they shouldn't, and we aren't guaranteed to have a canonical RecordPath for every valid atproto record field.
  Hopefully record serializers are following this SHOULD :/

### Forward-compatibility

Lexicon evolution rules forbid **type changes** across schema revisions, so a plain object ref cannot become a union in a forward-compatible revision under the same NSID. RecordPath canonicalization is therefore hopefully safe from forward-compatible lexicon changes.


#### todo

- api recommendations for scalar vs vector queries

  > A RecordPath is *scalar* if it contains no `[]` qualifiers, otherwise *vector*. Implementations that evaluate RecordPaths against records typically expose apis for blah blah

  - maybe compare to DOM `querySelector` / `querySelectorAll`

- lexicon awareness notes
  - backlinks example: match links on non-link fields if they happen to parse

- bring back the expected order of matches: depth-first-search order (probably as a "should")

- include cbor/drisl -- keep json for examples, but all this should be applicable

#### questions

- should an empty RecordPath be legal? would match the entire record.