# RecordPath (draft spec)

Informal community spec, seeking feedback.

**Scope:** This document defines *RecordPath*, a textual notation to locate fields within atproto records.

The proposed syntax is *mostly* compatible with Constellation's `source` record field locator syntax.
Constellation uses a `<collection nsid>:<path>` format for its `source` parameter; *RecordPath* will replace the `<path>` part.

While the driving motivation for this spec is lexicon-agnostic backlink indexing, being able to canonically reference field locations in records is broadly useful.

> [!TIP]
> For example: [Graze Turbostream](https://help.graze.social/en/article/graze-turbostream-1cmhebt/) resolves references from Bluesky posts into a richly-hydrated firehose, an incredibly useful enhancement.
> The locations where Turbostream searches for references are currently hard-coded for Bluesky posts, but what if you wanted a richly-hydrated feed of Tangled issue comments?
>
> RecordPath offers a syntax that a configurable Turbostream could use to describe arbitrary reference locations, hydrating content for any feed in the atmosphere without changing any code.

## 1. Design goals

1. **Canonical:** Two records that conform to the same lexicon and contain a field at the same _semantic location_ (see 2.3) produce and match the same RecordPath.
2. **Lexicon-agnostic:** Paths are generated from, and matched against, record data directly.
3. **Complete:** Every field in a valid atproto record should be reachable by a RecordPath.
4. **Readable:** JSONPath-like syntax, with only URL-unreserved syntax characters.

Some RecordPaths: `text` (1), `langs[]` (2), `reply.root.uri` (3).

```json
{
  "$type": "app.bsky.feed.post",
  "text": "I love a good wake up and the problem is solved.", // (1)
  "langs": [
    "en" // (2)
  ],
  "createdAt": "2026-04-15T12:38:49.982Z",
  "reply": {
    "root": {
      "cid": "bafyreieac34fnjyhuuzvgdnsyeeueyn45se5kuk6yppesn25gydjf5m5hy",
      "uri": "at://did:plc:rnpkyqnmsw4ipey6eotbdnnf/app.bsky.feed.post/3mjjvykdfo22r" // (3)
    },
    "parent": {
      "cid": "bafyreieac34fnjyhuuzvgdnsyeeueyn45se5kuk6yppesn25gydjf5m5hy",
      "uri": "at://did:plc:rnpkyqnmsw4ipey6eotbdnnf/app.bsky.feed.post/3mjjvykdfo22r"
    }
  }
}
```

RecordPath takes heavy inspiration from JSONPath, but is not a general-purpose data query language.

## 2. Proposed syntax

### 2.1 Dot-separated object traversal

A RecordPath is a sequence of field names separated by `.`. For a Bluesky "like" record:

```json
{
  "$type": "app.bsky.feed.like",
  "createdAt": "2024-01-15T00:00:00.000Z",
  "subject": {
    "uri": "at://did:plc:pxa3amkp7jhfclaads3zud7q/app.bsky.feed.post/3mjkx2hpvqc2t",
    "cid": "bafyreicuxyp5rmsrqsf3v63ww6gc3j7q6cp7qyk6bv47c3bqkj2gzswiqq"
  }
}
```

The RecordPaths are:

```
$type         -- selects: "app.bsky.feed.like"
createdAt     -- "2024-01-15T00:00:00.000Z"
subject       -- the entire { uri: .., cid: .. } sub-object
subject.uri   -- "at://did:plc:pxa3amkp7jhfclaads3zud7q/app.bsky.feed.post/3mjkx2hpvqc2t"
subject.cid   -- "bafyreicuxyp5rmsrqsf3v63ww6gc3j7q6cp7qyk6bv47c3bqkj2gzswiqq"
```
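To make scalar (object-only) traversal concrete, here is a minimal Python sketch. It assumes the path contains no `!` escapes and no `[]` or `{}` segments, so a plain split on `.` yields the field names. The `get_scalar` function and the `MISSING` sentinel are hypothetical names for illustration, not part of any existing library.

```python
from typing import Any

# Hypothetical sentinel: distinguishes "no match" from a JSON null value.
MISSING = object()


def get_scalar(record: dict, path: str) -> Any:
    """Follow an object-only RecordPath; return the single match or MISSING.

    Sketch assumption: the path has no `!` escapes and no `[]`/`{}` segments,
    so splitting on "." yields the field names directly.
    """
    node: Any = record
    for field in path.split("."):
        if not isinstance(node, dict) or field not in node:
            return MISSING  # zero matches
        node = node[field]
    return node  # exactly one match


like = {
    "$type": "app.bsky.feed.like",
    "createdAt": "2024-01-15T00:00:00.000Z",
    "subject": {
        "uri": "at://did:plc:pxa3amkp7jhfclaads3zud7q/app.bsky.feed.post/3mjkx2hpvqc2t",
        "cid": "bafyreicuxyp5rmsrqsf3v63ww6gc3j7q6cp7qyk6bv47c3bqkj2gzswiqq",
    },
}

print(get_scalar(like, "subject.cid"))  # bafyreicuxyp5rmsrqsf3v63ww6gc3j7q6cp7qyk6bv47c3bqkj2gzswiqq
print(get_scalar(like, "subject.rkey") is MISSING)  # True
```

Returning a sentinel rather than `None` keeps a field that is present with a `null` value distinguishable from a field that is absent.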

### 2.2 Field name character !escape

Object field names can be arbitrary strings, so RecordPath needs to disambiguate its own syntax when fields contain syntax characters.

RecordPath has six syntax characters, which must be escaped by a preceding `!` if they appear in a field name in the path.

| Literal | Escaped | Syntax hint       |
|---------|---------|-------------------|
| `.`     | `!.`    | field separator   |
| `[`     | `![`    | arrays            |
| `]`     | `!]`    | arrays            |
| `{`     | `!{`    | union refs        |
| `}`     | `!}`    | union refs        |
| `!`     | `!!`    | escape            |

Object field names in atproto are Unicode strings, but only the six syntax characters are escaped; every other character appears in the path as-is.

```
meta.dot!.name    key named "dot.name"
meta.a!!b         key named "a!b"
meta.$unknown     key named "$unknown"
```

In practice, field names in atproto very rarely contain non-ASCII characters, and most are camelCase, alphabetic-only strings that require no escaping.

Handling control characters and other non-printable codes is currently undefined.
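Escaping and unescaping per the table above can be sketched in Python as follows. The function names are hypothetical, and two behaviours are assumptions that this draft does not pin down: a `!` followed by a non-syntax character is rejected, and unescaped `[`, `]`, `{`, `}` are rejected because this sketch only handles object-only paths.

```python
SYNTAX_CHARS = frozenset(".[]{}!")


def escape_field(name: str) -> str:
    """Prefix each of the six syntax characters in a field name with `!`."""
    return "".join("!" + ch if ch in SYNTAX_CHARS else ch for ch in name)


def split_fields(path: str) -> list[str]:
    """Split an object-only RecordPath on unescaped `.` and unescape each name."""
    fields: list[str] = []
    current: list[str] = []
    chars = iter(path)
    for ch in chars:
        if ch == "!":
            escaped = next(chars, None)  # consume the escaped character
            if escaped is None or escaped not in SYNTAX_CHARS:
                raise ValueError(f"invalid escape in {path!r}")  # assumed behaviour
            current.append(escaped)
        elif ch == ".":
            fields.append("".join(current))
            current = []
        elif ch in SYNTAX_CHARS:
            # `[`, `]`, `{`, `}` start array/union segments, not handled in this sketch
            raise ValueError(f"unescaped {ch!r} in {path!r}")
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


print(split_fields("meta.dot!.name"))  # ['meta', 'dot.name']
print(split_fields("meta.a!!b"))       # ['meta', 'a!b']
print(escape_field("$unknown"))        # $unknown
```

Since `escape_field` and `split_fields` are inverses, joining escaped names with `.` and splitting the result recovers the original field names.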

### 2.3 Arrays are unordered sets

Arrays in real atproto records *almost universally* lack index-bound significance. That is: a selector for a record's array element at `index=0` is *almost never useful*.

Consider an array of **mentions** in Bluesky posts: the order in the record data carries no meaningful information -- it can even differ from the order of appearance in the post text!

On the other hand, selecting data from an atproto record array without regard for order *is* almost always what you want.

RecordPath makes a pragmatic choice to treat all arrays as unordered sets. Index position is not encoded in the path, and selecting a RecordPath that traverses an array always matches zero-or-more items ("vector" matches), instead of the zero-or-one behaviour of object-only paths ("scalar" matches).

> [!NOTE]
> It's a compromise and a bit unfortunate, but in practice it works well. Even for a data format like GeoJSON, if you want only the `longitude`s from its `[longitude, latitude]` pairs, you can select the pair array itself with a RecordPath and index into it at the application level.

Arrays are descended into with `[]`:

```
tags[]
references[].title
references[].urls[]
```

Given:

```json
{
  "tags": ["art", "science"],
  "references": [
    {
      "title": "a nice paper",
      "urls": [
        "https://bad-example.com/a-sketchy-source",
        "https://example.com/a-reputable-source"
      ]
    },
    {
      "title": "another paper",
      "urls": ["https://example.com/source-for-another-paper"]
    }
  ]
}
```

- `tags[]` matches `"art"` and `"science"`
- `references[].title` matches `"a nice paper"` and `"another paper"`
- `references[].urls[]` matches all three URLs in the example
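A vector evaluator needs only one rule on top of object traversal: a `[]` segment fans out over every element of the array. This Python sketch uses hypothetical function names, assumes no `!` escapes and no union segments, and yields matches in depth-first order (the draft has not yet fixed a match order).

```python
from typing import Any, Iterator


def select(node: Any, segments: list[str]) -> Iterator[Any]:
    """Yield every value reached by `segments`, depth-first."""
    if not segments:
        yield node
        return
    head, rest = segments[0], segments[1:]
    fan_out = head.endswith("[]")
    key = head[:-2] if fan_out else head
    if not isinstance(node, dict) or key not in node:
        return  # this branch contributes zero matches
    child = node[key]
    if not fan_out:
        yield from select(child, rest)
    elif isinstance(child, list):
        for item in child:  # element order carries no meaning in RecordPath
            yield from select(item, rest)


def select_all(record: dict, path: str) -> list[Any]:
    # Sketch assumption: no `!` escapes or `[type]`/`{type}` segments.
    return list(select(record, path.split(".")))


doc = {
    "tags": ["art", "science"],
    "references": [
        {
            "title": "a nice paper",
            "urls": [
                "https://bad-example.com/a-sketchy-source",
                "https://example.com/a-reputable-source",
            ],
        },
        {
            "title": "another paper",
            "urls": ["https://example.com/source-for-another-paper"],
        },
    ],
}

print(select_all(doc, "references[].title"))  # ['a nice paper', 'another paper']
print(len(select_all(doc, "references[].urls[]")))  # 3
```

Note that `tags` (without `[]`) still matches exactly once, returning the whole array; that is the route the GeoJSON note above suggests for positional data.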

### 2.4 Array-of-unions

Atproto lexicons commonly use arrays of unions to collect different kinds of child objects in a list.
This pattern is visible in the record data through the presence of a `$type` field on each contained object.

A RecordPath descending into an array-of-unions MUST include the `$type` field's value within the square brackets:

```
facets[].features[app.bsky.richtext.facet#link].uri
facets[].features[app.bsky.richtext.facet#mention].did
```

So while arrays are unordered in RecordPath, their elements are always segmented by union-ref type when they contain unioned types.
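Matching `[type]` amounts to filtering array elements by their `$type`. The Python sketch below is illustrative only: the function names and the sample post (including its DID) are made up, and `!` escapes are not handled. Two rules are assumptions drawn from this section's MUST: a bare `[]` matches only elements *without* a `$type`, and an element with `"$type": "blob"` counts as untyped, since blobs carry `$type` but are not union variants.

```python
import re
from typing import Any, Iterator, Optional

# One segment: a field name, optionally followed by "[]" or "[<$type>]".
SEGMENT = re.compile(r"([^.\[\]]+)(?:\[([^\]]*)\])?")


def parse(path: str) -> list[tuple[str, Optional[str]]]:
    """Split a path into (field, bracket) pairs.

    bracket is None for a plain field, "" for `[]`, or a `$type` value.
    """
    segments, pos = [], 0
    while True:
        m = SEGMENT.match(path, pos)
        if m is None:
            raise ValueError(f"cannot parse {path!r} at offset {pos}")
        segments.append((m.group(1), m.group(2)))
        pos = m.end()
        if pos == len(path):
            return segments
        if path[pos] != ".":
            raise ValueError(f"expected '.' at offset {pos} in {path!r}")
        pos += 1


def union_type(value: Any) -> Optional[str]:
    """Union discriminator of `value`; blobs are treated as untyped (assumption)."""
    if isinstance(value, dict):
        t = value.get("$type")
        if isinstance(t, str) and t != "blob":
            return t
    return None


def select(node: Any, segments: list) -> Iterator[Any]:
    if not segments:
        yield node
        return
    (field, bracket), rest = segments[0], segments[1:]
    if not isinstance(node, dict) or field not in node:
        return
    child = node[field]
    if bracket is None:
        yield from select(child, rest)
    elif isinstance(child, list):
        for item in child:
            # "[]" keeps untyped elements; "[t]" keeps elements whose $type is t.
            if union_type(item) == (bracket or None):
                yield from select(item, rest)


def select_all(record: dict, path: str) -> list[Any]:
    return list(select(record, parse(path)))


post = {  # made-up sample post
    "text": "hi @alice.test see example.com",
    "facets": [
        {
            "index": {"byteStart": 3, "byteEnd": 14},
            "features": [{"$type": "app.bsky.richtext.facet#mention", "did": "did:plc:alice"}],
        },
        {
            "index": {"byteStart": 19, "byteEnd": 30},
            "features": [{"$type": "app.bsky.richtext.facet#link", "uri": "https://example.com"}],
        },
    ],
}

print(select_all(post, "facets[].features[app.bsky.richtext.facet#link].uri"))  # ['https://example.com']
print(select_all(post, "facets[].features[].uri"))  # []
```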

### 2.5 Scalar union fields

An object from a union outside of an array is likewise visible in record data through the presence of a `$type` field: plain objects should not have this key, and union objects must have it.

Since unions represent different child data types, the union type is captured in RecordPath by including it within curly braces:

```
embed{app.bsky.embed.external}.external.uri
embed{app.bsky.embed.record}.record.uri
embed{app.bsky.embed.recordWithMedia}.record.record.uri
embed{app.bsky.embed.images}.images[].image
```

Given a post with an external link embed:

```json
{
  "$type": "app.bsky.feed.post",
  "text": "my god",
  "langs": [ "en" ],
  "createdAt": "2026-04-16T01:39:17.721Z",
  "embed": {
    "$type": "app.bsky.embed.external", // captured
    "external": {
      "uri": "https://youtu.be/-pns419xAoc?si=K4XMQfFv-t4Q1cn0",
      "title": "Paramore - Someday (The Strokes Cover)",
      "description": "YouTube video by microwave"
    }
  }
}
```

The RecordPath `embed{app.bsky.embed.external}.external.uri` reaches the external embed's URI, "https://youtu.be/-pns419xAoc?si=K4XMQfFv-t4Q1cn0".

- The NSID in the braces is the `$type` value verbatim, including any `#fragment` suffix if present. The `#main` suffix is never present per the lexicon spec (*"use of a `#main` suffix is invalid"* in `$type` values). Relative `#fragment` references (without NSID) only exist within lexicon definition files as shorthand; `$type` values in record data are always fully-qualified (`nsid` or `nsid#name`). So the data is already canonical and needs no normalization.
- The top-level record `$type` is excluded from this rule.
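Scalar `{type}` segments can be checked the same way. In this Python sketch (hypothetical names; no `!` escapes and no array segments), every step compares the child's `$type` with what the segment asked for, so a union object is reachable only through its braces and a plain object only through a bare name. Treating blobs as untyped is an assumption inferred from the `images[].image` example; the record's own top-level `$type` is never checked.

```python
import re
from typing import Any, Optional

MISSING = object()  # "no match", distinct from a JSON null

# One segment: a field name, optionally followed by "{<$type>}".
SEGMENT = re.compile(r"([^.{}]+)(?:\{([^}]*)\})?")


def parse(path: str) -> list[tuple[str, Optional[str]]]:
    """Split a path into (field, braced $type or None) pairs."""
    segments, pos = [], 0
    while True:
        m = SEGMENT.match(path, pos)
        if m is None:
            raise ValueError(f"cannot parse {path!r} at offset {pos}")
        segments.append((m.group(1), m.group(2)))
        pos = m.end()
        if pos == len(path):
            return segments
        if path[pos] != ".":
            raise ValueError(f"expected '.' at offset {pos} in {path!r}")
        pos += 1


def union_type(value: Any) -> Optional[str]:
    """Union discriminator of `value`; blobs are treated as untyped (assumption)."""
    if isinstance(value, dict):
        t = value.get("$type")
        if isinstance(t, str) and t != "blob":
            return t
    return None


def get_scalar(record: dict, path: str) -> Any:
    node: Any = record  # the top-level record $type is never checked
    for field, braced in parse(path):
        if not isinstance(node, dict) or field not in node:
            return MISSING
        node = node[field]
        if union_type(node) != braced:
            return MISSING  # unions need {type}; plain objects must not have one
    return node


post = {
    "$type": "app.bsky.feed.post",
    "text": "my god",
    "langs": ["en"],
    "createdAt": "2026-04-16T01:39:17.721Z",
    "embed": {
        "$type": "app.bsky.embed.external",
        "external": {
            "uri": "https://youtu.be/-pns419xAoc?si=K4XMQfFv-t4Q1cn0",
            "title": "Paramore - Someday (The Strokes Cover)",
            "description": "YouTube video by microwave",
        },
    },
}

print(get_scalar(post, "embed{app.bsky.embed.external}.external.uri"))
print(get_scalar(post, "embed.external.uri") is MISSING)  # True
```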

## 3. Data-model assumptions

RecordPath depends on several properties of the atproto [data model](https://atproto.com/specs/data-model) and [lexicon](https://atproto.com/specs/lexicon) system.

### `$type` is on unions (and blobs), not on plain object refs

- **Records** include `$type` at the top level (its value is not encoded into RecordPaths)
- **Union variants** MUST include `$type` (consistent with the lexicon spec)
- **Plain (non-union) object refs**: the lexicon spec states that `$type` *"should not be included in encoded data as a discriminator."*
  Since this is only a "should-not" and not a "must-not", valid records are technically allowed to have a `$type` where they shouldn't, so a canonical RecordPath is not guaranteed for every field of every valid atproto record.
  Hopefully record serializers are following this SHOULD :/
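Because `$type` is the only discriminator RecordPath relies on, canonical paths can be generated from record data alone, in line with the lexicon-agnostic design goal. The Python sketch below walks a record depth-first and yields a `(path, value)` pair for every reachable value. The names are hypothetical, and two choices are assumptions: blobs (`"$type": "blob"`) are not braced, mirroring the `images[].image` example, and the `$type` value inside braces or brackets is written unescaped, as in the spec's examples.

```python
from typing import Any, Iterator, Optional

SYNTAX_CHARS = frozenset(".[]{}!")


def escape_field(name: str) -> str:
    """Prefix each syntax character in a field name with `!`."""
    return "".join("!" + ch if ch in SYNTAX_CHARS else ch for ch in name)


def union_type(value: Any) -> Optional[str]:
    """Union discriminator of `value`; blobs are treated as untyped (assumption)."""
    if isinstance(value, dict):
        t = value.get("$type")
        if isinstance(t, str) and t != "blob":
            return t
    return None


def walk(value: Any, path: str) -> Iterator[tuple[str, Any]]:
    """Yield (RecordPath, value) for everything nested inside `value`, depth-first."""
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{escape_field(key)}" if path else escape_field(key)
            t = union_type(child)
            if t is not None:
                child_path += "{" + t + "}"  # scalar union: field{type}
            yield child_path, child
            yield from walk(child, child_path)
    elif isinstance(value, list):
        for item in value:
            t = union_type(item)
            item_path = f"{path}[{t}]" if t is not None else f"{path}[]"
            yield item_path, item
            yield from walk(item, item_path)


def record_paths(record: dict) -> Iterator[tuple[str, Any]]:
    # Starting from "" means the record's own top-level $type is never braced.
    return walk(record, "")


like = {
    "$type": "app.bsky.feed.like",
    "createdAt": "2024-01-15T00:00:00.000Z",
    "subject": {
        "uri": "at://did:plc:pxa3amkp7jhfclaads3zud7q/app.bsky.feed.post/3mjkx2hpvqc2t",
        "cid": "bafyreicuxyp5rmsrqsf3v63ww6gc3j7q6cp7qyk6bv47c3bqkj2gzswiqq",
    },
}

for p, _ in record_paths(like):
    print(p)
```

Run on the like record from 2.1, it prints exactly the five paths listed there: `$type`, `createdAt`, `subject`, `subject.uri`, `subject.cid`.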

### Forward-compatibility

Lexicon evolution rules forbid **type changes** across schema revisions, so a plain object ref cannot be converted to a union in a forward-compatible revision under the same NSID. RecordPath canonicalization should therefore be safe from forward-compatible lexicon changes.

todo:

- api recommendations for scalar vs vector queries

  > A RecordPath is *scalar* if it contains no `[]` qualifiers, otherwise *vector*. Implementations that evaluate RecordPaths against records typically expose apis for blah blah

  - maybe compare to DOM `querySelector` / `querySelectorAll`

- lexicon awareness notes
  - backlinks example: match links on non-link fields if they happen to parse

- bring back the expected order of matches: depth-first-search order (probably as a "should")