# RecordPath (draft spec)

Informal community spec, seeking feedback.

**Scope:** This document defines *RecordPath*, a textual notation to locate fields within atproto records.

The proposed syntax is *mostly* compatible with Constellation's `source` record field locator syntax.
Constellation uses a `<collection nsid>:<path>` format for its `source` parameter; *RecordPath* will replace the `<path>` part.

While the driving motivation for this spec is lexicon-agnostic backlink indexing, the ability to canonically reference field locations in records is broadly useful.

> [!TIP]
> For example: [Graze Turbostream](https://help.graze.social/en/article/graze-turbostream-1cmhebt/) resolves references from Bluesky posts into a richly-hydrated firehose, an incredibly useful enhancement.
> The locations where Turbostream searches for references are currently hard-coded for Bluesky posts, but what if you wanted a richly-hydrated feed of Tangled issue comments?
>
> RecordPath offers a syntax that a configurable Turbostream could use to describe arbitrary reference locations: to hydrate content for any feed in the atmosphere, without changing any code.


## 1. Design goals

1. **Pragmatically canonical:** Two records that conform to the same lexicon and contain a field at the same _semantic location_ (see 2.3) produce and match the same RecordPath.
2. **Lexicon-agnostic:** Path generation and matching are derived from and operated upon record data directly.
3. **Complete:** Every field in a valid atproto record should be reachable by a RecordPath.
4. **Readable:** JSON-Path-like syntax, with only URL-unreserved syntax characters.

Some RecordPaths: `text` (1), `langs[]` (2), `reply.root.uri` (3).

```json
{
  "$type": "app.bsky.feed.post",
  "text": "I love a good wake up and the problem is solved.", // (1)
  "langs": [
    "en" // (2)
  ],
  "createdAt": "2026-04-15T12:38:49.982Z",
  "reply": {
    "root": {
      "cid": "bafyreieac34fnjyhuuzvgdnsyeeueyn45se5kuk6yppesn25gydjf5m5hy",
      "uri": "at://did:plc:rnpkyqnmsw4ipey6eotbdnnf/app.bsky.feed.post/3mjjvykdfo22r" // (3)
    },
    "parent": {
      "cid": "bafyreieac34fnjyhuuzvgdnsyeeueyn45se5kuk6yppesn25gydjf5m5hy",
      "uri": "at://did:plc:rnpkyqnmsw4ipey6eotbdnnf/app.bsky.feed.post/3mjjvykdfo22r"
    }
  }
}
```


RecordPath takes heavy inspiration from JSONPath, but is not a general-purpose data query language.


## 2. Proposed syntax

### 2.1 Dot-separated object traversal

A RecordPath is a sequence of field names separated by `.`. For a Bluesky "like" record:

```json
{
  "$type": "app.bsky.feed.like",
  "createdAt": "2024-01-15T00:00:00.000Z",
  "subject": {
    "uri": "at://did:plc:pxa3amkp7jhfclaads3zud7q/app.bsky.feed.post/3mjkx2hpvqc2t",
    "cid": "bafyreicuxyp5rmsrqsf3v63ww6gc3j7q6cp7qyk6bv47c3bqkj2gzswiqq"
  }
}
```

The RecordPaths are:

```
$type        -- selects: "app.bsky.feed.like"
createdAt    -- "2024-01-15T00:00:00.000Z"
subject      -- the entire { uri: .., cid: .. } sub-object
subject.uri  -- "at://did:plc:pxa3amkp7jhfclaads3zud7q/app.bsky.feed.post/3mjkx2hpvqc2t"
subject.cid  -- "bafyreicuxyp5rmsrqsf3v63ww6gc3j7q6cp7qyk6bv47c3bqkj2gzswiqq"
```


### 2.2 Field name character !escape

Object field names can be arbitrary strings, so RecordPath needs to disambiguate its own syntax when fields contain syntax characters.

RecordPath has six syntax characters, which must be escaped by a preceding `!` if they appear in a field name in the path.

| Literal | Escaped | Syntax hint     |
|---------|---------|-----------------|
| `.`     | `!.`    | field separator |
| `[`     | `![`    | arrays          |
| `]`     | `!]`    | arrays          |
| `{`     | `!{`    | union refs      |
| `}`     | `!}`    | union refs      |
| `!`     | `!!`    | escape          |

Object field names in atproto are Unicode strings, but only the six syntax characters are escaped.

```
meta.dot!.name    key named "dot.name"
meta.a!!b         key named "a!b"
meta.$unknown     key named "$unknown"
```
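
The escaping rule can be sketched in Python as below. The function names are hypothetical, and rejecting a `!` that is not followed by a syntax character is an assumption of this sketch: the draft does not say how such input should be treated. Only object-only paths are handled.

```python
from typing import List

SYNTAX_CHARS = ".[]{}!"  # the six syntax characters from the table above


def escape_field_name(name: str) -> str:
    """Prefix every syntax character in a field name with `!`."""
    return "".join("!" + ch if ch in SYNTAX_CHARS else ch for ch in name)


def split_object_path(path: str) -> List[str]:
    """Split an object-only path on unescaped dots, unescaping each field name."""
    names: List[str] = []
    current: List[str] = []
    chars = iter(path)
    for ch in chars:
        if ch == "!":
            escaped = next(chars, None)
            # Assumption: a dangling or unnecessary escape is an error.
            if escaped is None or escaped not in SYNTAX_CHARS:
                raise ValueError(f"invalid escape in path: {path!r}")
            current.append(escaped)
        elif ch == ".":
            names.append("".join(current))
            current = []
        else:
            current.append(ch)
    names.append("".join(current))
    return names


print(split_object_path("meta.dot!.name"))
print(escape_field_name("a!b"))
```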

In practice it's very rare for field names in atproto to contain non-ASCII characters, and most field names are camelCase alphabetic-only strings, requiring no escaping.

Handling control characters and other non-printable codes is currently undefined.


### 2.3 Arrays are unordered sets

Arrays in real atproto records *almost universally* lack index-bound significance. That is: a selector for a record's array element at `index=0` is *almost never useful*.

Consider an array of **mentions** in Bluesky posts: the order in the record data carries no meaningful information -- it can even be different from the order of appearance in the post text!

On the other hand, selecting data from an atproto record array without regard for order *is* almost always what you want.

RecordPath makes a pragmatic choice to treat all arrays as unordered sets. Index position is not encoded in the path, and selecting a RecordPath that traverses an array always matches zero-or-more items ("vector" matches), instead of the zero-or-one behaviour of object-only paths ("scalar" matches).

> [!NOTE]
> It's a compromise and a bit unfortunate, but in practice it works well. Even for a data format like GeoJSON, if you want only the `longitude`s from its `[longitude, latitude]` pair format, you can RecordPath the array pair itself and index into it at the application level.

Arrays are descended into with `[]`:

```
tags[]
references[].title
references[].urls[]
```

Given:

```json
{
  "tags": ["art", "science"],
  "references": [
    {
      "title": "a nice paper",
      "urls": [
        "https://bad-example.com/a-sketchy-source",
        "https://example.com/a-reputable-source"
      ]
    },
    {
      "title": "another paper",
      "urls": ["https://example.com/source-for-another-paper"]
    }
  ]
}
```

- `tags[]` reaches `"art"` and `"science"`
- `references[].title` reaches `"a nice paper"` and `"another paper"`
- `references[].urls[]` reaches all three URLs in the example
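
A minimal Python sketch of these "vector" matches follows. The name `select_all` is hypothetical, and the sketch only understands segments that are a plain field name optionally followed by a single `[]`; escapes, unions, and arrays nested directly inside arrays are left out.

```python
from typing import Any, List


def select_all(record: Any, path: str) -> List[Any]:
    """Return every value reached by a path of plain names and `[]` descents."""
    matches = [record]
    for segment in path.split("."):
        descend = segment.endswith("[]")
        name = segment[:-2] if descend else segment
        next_matches: List[Any] = []
        for value in matches:
            if not isinstance(value, dict) or name not in value:
                continue
            child = value[name]
            if not descend:
                next_matches.append(child)
            elif isinstance(child, list):
                # Every element matches; index position plays no role.
                next_matches.extend(child)
        matches = next_matches
    return matches


doc = {
    "tags": ["art", "science"],
    "references": [
        {
            "title": "a nice paper",
            "urls": [
                "https://bad-example.com/a-sketchy-source",
                "https://example.com/a-reputable-source",
            ],
        },
        {
            "title": "another paper",
            "urls": ["https://example.com/source-for-another-paper"],
        },
    ],
}

print(select_all(doc, "references[].urls[]"))
```

The result is a Python list only for convenience; since arrays are treated as unordered sets, callers should not attach meaning to the order of the matches.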


### 2.4 Array-of-unions

Atproto lexicons commonly use arrays of unions to collect different kinds of child objects in a list.
This pattern is visible in the record data by the presence of a `$type` field on the contained object.

A RecordPath descending into an array-of-unions MUST include the `$type` field's value within the square brackets:

```
facets[].features[app.bsky.richtext.facet#link].uri
facets[].features[app.bsky.richtext.facet#mention].did
```

So while arrays are unordered in RecordPath, their elements are always segmented by union-ref type when they contain union types.
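
One way to match typed array descents, sketched in Python. All names here are hypothetical and the example record is made up. The sketch makes two assumptions the draft does not settle: bracket contents are read verbatim up to the closing `]` (so the dots inside an NSID are not field separators), and a bare `[]` only matches elements that carry no `$type`. Escapes and curly-brace unions are not handled.

```python
import re
from typing import Any, List, Optional, Tuple

# One segment: a field name, optionally followed by `[]` or `[<$type>]`.
_SEGMENT = re.compile(r"([^.\[\]]+)(?:\[([^\]]*)\])?(?:\.|$)")


def parse(path: str) -> List[Tuple[str, Optional[str]]]:
    """Split a path into (name, bracket) pairs; bracket is None, "" or a $type."""
    if path.endswith("."):
        raise ValueError(f"trailing separator in {path!r}")
    segments, pos = [], 0
    while pos < len(path):
        m = _SEGMENT.match(path, pos)
        if m is None:
            raise ValueError(f"cannot parse {path!r} at offset {pos}")
        segments.append((m.group(1), m.group(2)))
        pos = m.end()
    return segments


def select_all(record: Any, path: str) -> List[Any]:
    matches = [record]
    for name, bracket in parse(path):
        next_matches: List[Any] = []
        for value in matches:
            if not isinstance(value, dict) or name not in value:
                continue
            child = value[name]
            if bracket is None:
                next_matches.append(child)
            elif isinstance(child, list):
                for item in child:
                    item_type = item.get("$type") if isinstance(item, dict) else None
                    # Assumption: `[]` selects untyped items, `[T]` selects items typed T.
                    if (item_type or "") == bracket:
                        next_matches.append(item)
        matches = next_matches
    return matches


# Made-up post record, for illustration only.
post = {
    "$type": "app.bsky.feed.post",
    "text": "hi @alice.example see https://example.com",
    "facets": [
        {"features": [{"$type": "app.bsky.richtext.facet#mention", "did": "did:example:alice"}]},
        {"features": [{"$type": "app.bsky.richtext.facet#link", "uri": "https://example.com"}]},
    ],
}

print(select_all(post, "facets[].features[app.bsky.richtext.facet#link].uri"))
```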


### 2.5 Scalar union fields

An object from a union outside of an array is visible in record data by the presence of a `$type` field -- plain objects should not have this key, and union objects should have it.

Since unions represent different child data types, the union-type is captured in RecordPath by including it within curly braces:

```
embed{app.bsky.embed.external}.external.uri
embed{app.bsky.embed.record}.record.uri
embed{app.bsky.embed.recordWithMedia}.record.record.uri
embed{app.bsky.embed.images}.images[].image
```

Given a post with an external link embed:

```json
{
  "$type": "app.bsky.feed.post",
  "text": "my god",
  "langs": [ "en" ],
  "createdAt": "2026-04-16T01:39:17.721Z",
  "embed": {
    "$type": "app.bsky.embed.external", // captured
    "external": {
      "uri": "https://youtu.be/-pns419xAoc?si=K4XMQfFv-t4Q1cn0",
      "title": "Paramore - Someday (The Strokes Cover)",
      "description": "YouTube video by microwave"
    }
  }
}
```

The RecordPath `embed{app.bsky.embed.external}.external.uri` reaches the external embed's URI, `"https://youtu.be/-pns419xAoc?si=K4XMQfFv-t4Q1cn0"`.

- The NSID in the curly braces is the `$type` value verbatim, including any `#fragment` suffix if present. The `#main` suffix is never present per the lexicon spec, so this is close to canonical, but I need to think more about relative `#fragments` (that omit the NSID part) and whether they introduce any non-canonical problem.
- The top-level record `$type` is excluded from this rule.
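
To show how paths could be generated from record data alone, here is a rough Python sketch that enumerates the RecordPaths present in a record using the rules from sections 2.1 to 2.5. Every name in it is hypothetical and the sample record is made up. It also makes choices the draft leaves open: it emits a path for nested `$type` fields and for intermediate objects, emits nothing for an array itself (only for its elements), and copies `$type` values into brackets and braces without escaping.

```python
from typing import Any, Iterator, Optional

SYNTAX_CHARS = ".[]{}!"


def escape(name: str) -> str:
    return "".join("!" + ch if ch in SYNTAX_CHARS else ch for ch in name)


def type_of(value: Any) -> Optional[str]:
    """Return the `$type` of a union object, or None for anything else."""
    t = value.get("$type") if isinstance(value, dict) else None
    return t if isinstance(t, str) else None


def iter_paths(obj: dict, prefix: str = "") -> Iterator[str]:
    """Yield a path for each field below `obj`; duplicates are possible."""
    for key, child in obj.items():
        path = f"{prefix}.{escape(key)}" if prefix else escape(key)
        if isinstance(child, list):
            yield from _array_items(child, path)
        else:
            union_type = type_of(child)
            if union_type is not None:
                path = f"{path}{{{union_type}}}"  # scalar union: name{$type}
            yield from _value(child, path)


def _array_items(items: list, path: str) -> Iterator[str]:
    for item in items:
        # Array element: name[] or, for union members, name[$type].
        yield from _value(item, f"{path}[{type_of(item) or ''}]")


def _value(value: Any, path: str) -> Iterator[str]:
    yield path
    if isinstance(value, dict):
        yield from iter_paths(value, path)
    elif isinstance(value, list):
        yield from _array_items(value, path)


# Made-up record, for illustration only.
record = {
    "$type": "app.bsky.feed.post",
    "text": "look at this",
    "langs": ["en"],
    "facets": [
        {"features": [{"$type": "app.bsky.richtext.facet#link", "uri": "https://example.com"}]}
    ],
    "embed": {
        "$type": "app.bsky.embed.record",
        "record": {"uri": "at://did:example:alice/app.bsky.feed.post/abc", "cid": "bafy-example"},
    },
}

paths = sorted(set(iter_paths(record)))
for p in paths:
    print(p)
```

Because the top-level call never inspects the record's own `$type`, the record type is not wrapped in braces, in line with the rule above. Collecting into a set removes the duplicates that arise when several array elements share a shape.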


## 3. Data-model assumptions

RecordPath depends on several properties of the atproto [data model](https://atproto.com/specs/data-model) and [lexicon](https://atproto.com/specs/lexicon) system.