RecordPath (draft spec)#
Informal community spec, seeking feedback.
Scope: This document defines RecordPath, a textual notation to locate fields within atproto records.
The proposed syntax is mostly compatible with Constellation's source record field locator syntax.
Constellation uses a <collection nsid>:<path> format for its source parameter; RecordPath will replace the <path> part.
While the driving motivation for this spec is lexicon-agnostic backlink indexing, being able to canonically reference field locations in records is broadly useful.
TIP
For example: Graze Turbostream resolves references from Bluesky posts into a richly-hydrated firehose, an incredibly useful enhancement. The locations where Turbostream searches for references are currently hard-coded for Bluesky posts, but what if you wanted a richly-hydrated feed of Tangled issue comments?
RecordPath offers a syntax that a configurable Turbostream could use to describe arbitrary reference locations: to hydrate content for any feed in the atmosphere, without changing any code.
1. Design goals#
- Canonical: Two records that conform to the same lexicon and contain a field at the same semantic location (see 2.3) produce and match the same RecordPath.
- Lexicon-agnostic: Path generation and matching are derived from and operated upon record data directly.
- Complete: Every field in a valid atproto record should be reachable by a RecordPath.
- Readable: JSON-Path-like syntax, with only URL-unreserved syntax characters.
Some RecordPaths: text (1), langs[] (2), reply.root.uri (3).
{
"$type": "app.bsky.feed.post",
"text": "I love a good wake up and the problem is solved.", // (1)
"langs": [
"en" // (2)
],
"createdAt": "2026-04-15T12:38:49.982Z",
"reply": {
"root": {
"cid": "bafyreieac34fnjyhuuzvgdnsyeeueyn45se5kuk6yppesn25gydjf5m5hy",
"uri": "at://did:plc:rnpkyqnmsw4ipey6eotbdnnf/app.bsky.feed.post/3mjjvykdfo22r" // (3)
},
"parent": {
"cid": "bafyreieac34fnjyhuuzvgdnsyeeueyn45se5kuk6yppesn25gydjf5m5hy",
"uri": "at://did:plc:rnpkyqnmsw4ipey6eotbdnnf/app.bsky.feed.post/3mjjvykdfo22r"
}
}
}
RecordPath takes heavy inspiration from JSONPath, but is not a general-purpose data query language.
2. Proposed syntax#
2.1 Dot-separated object traversal#
A RecordPath is a sequence of field names separated by dots (.). For a Bluesky "like" record:
{
"$type": "app.bsky.feed.like",
"createdAt": "2024-01-15T00:00:00.000Z",
"subject": {
"uri": "at://did:plc:pxa3amkp7jhfclaads3zud7q/app.bsky.feed.post/3mjkx2hpvqc2t",
"cid": "bafyreicuxyp5rmsrqsf3v63ww6gc3j7q6cp7qyk6bv47c3bqkj2gzswiqq"
}
}
The RecordPaths are:
$type -- selects: "app.bsky.feed.like"
createdAt -- "2024-01-15T00:00:00.000Z"
subject -- the entire { uri: .., cid: ..} sub-object
subject.uri -- "at://did:plc:pxa3amkp7jhfclaads3zud7q/app.bsky.feed.post/3mjkx2hpvqc2t"
subject.cid -- "bafyreicuxyp5rmsrqsf3v63ww6gc3j7q6cp7qyk6bv47c3bqkj2gzswiqq"
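As a rough illustration of object-only traversal, the sketch below follows a dot-separated path through nested dictionaries. The function name `resolve_scalar` is hypothetical and not part of this spec; escaping, arrays and unions are deliberately left out, and returning `None` for a missing field is an assumption of this example.

```python
from typing import Any


def resolve_scalar(record: dict, path: str) -> Any:
    """Follow a dot-separated path through nested objects.

    Sketch only: escapes, arrays and unions are not handled here.
    Returns None when any field along the path is missing (an assumption
    of this example, not something the spec defines).
    """
    current: Any = record
    for name in path.split("."):
        if not isinstance(current, dict) or name not in current:
            return None
        current = current[name]
    return current


like = {
    "$type": "app.bsky.feed.like",
    "createdAt": "2024-01-15T00:00:00.000Z",
    "subject": {
        "uri": "at://did:plc:pxa3amkp7jhfclaads3zud7q/app.bsky.feed.post/3mjkx2hpvqc2t",
        "cid": "bafyreicuxyp5rmsrqsf3v63ww6gc3j7q6cp7qyk6bv47c3bqkj2gzswiqq",
    },
}

print(resolve_scalar(like, "subject.cid"))
print(resolve_scalar(like, "subject"))
```

Object-only paths match at most one value, which is why this sketch can return a single result rather than a list.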
2.2 Field name character !escape#
Object field names can be arbitrary strings, so RecordPath needs to disambiguate its own syntax when fields contain syntax characters.
The backslash is the conventional escape character in many languages and systems, but as per the URL-unreserved character constraint in section 1, it cannot be used here. The exclamation mark ! serves as its replacement.
RecordPath has six syntax characters, which must be escaped by a preceding ! if they appear in a field name in the path.
| Literal | Escaped | Syntax hint |
|---|---|---|
| . | !. | field separator |
| [ | ![ | arrays |
| ] | !] | arrays |
| { | !{ | union refs |
| } | !} | union refs |
| ! | !! | escape |
Object field names in atproto are unicode strings, but only the six syntax characters are escaped.
meta.dot!.name key named "dot.name"
meta.a!!b key named "a!b"
meta.$unknown key named "$unknown"
In practice it's very rare for field names in atproto to contain non-ASCII characters, and most field names are camelCase alphabetic-only strings, requiring no escaping.
Handling control characters and other non-printable codes is currently undefined.
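The escaping rule could be sketched as a pair of helpers: one that escapes a raw field name, and one that splits a dot-only path back into unescaped names. The names `escape_field_name` and `split_fields` are hypothetical. Rejecting a `!` that is not followed by one of the six syntax characters is an assumption made here, since the spec does not say how such input should be treated.

```python
SYNTAX_CHARS = frozenset(".[]{}!")


def escape_field_name(name: str) -> str:
    """Prefix each of the six syntax characters with `!`."""
    return "".join("!" + ch if ch in SYNTAX_CHARS else ch for ch in name)


def split_fields(path: str) -> list:
    """Split a dot-only RecordPath into unescaped field names.

    Sketch only: array and union qualifiers are not interpreted here.
    """
    fields, current, i = [], [], 0
    while i < len(path):
        ch = path[i]
        if ch == "!":
            # Assumption: `!` must be followed by a syntax character.
            if i + 1 >= len(path) or path[i + 1] not in SYNTAX_CHARS:
                raise ValueError(f"invalid escape at offset {i}")
            current.append(path[i + 1])
            i += 2
        elif ch == ".":
            # An unescaped dot ends the current field name.
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    fields.append("".join(current))
    return fields


print(escape_field_name("dot.name"))
print(split_fields("meta.dot!.name"))
```

Escaping and splitting are inverses for any field name, so a path built from escaped names always splits back into the original names.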
2.3 Arrays are unordered sets#
Arrays in real atproto records almost universally lack index-bound significance. That is: a selector for a record's array element at index=0 is almost never useful.
Consider an array of mentions in Bluesky posts: the order in the record data carries no meaningful information -- it can even be different from the order of appearance in the post text!
On the other hand, selecting data from an atproto record array without regard for order is almost always what you want.
RecordPath makes a pragmatic choice to treat all arrays as unordered sets. Index position is not encoded in the path, and selecting a RecordPath that traverses an array always matches zero-or-more items ("vector" matches), instead of the zero-or-one behaviour of object-only paths ("scalar" matches).
NOTE
It's a compromise and a bit unfortunate, but in practice it works well. Even for a data format like GeoJSON, if you want only longitudes from its [longitude, latitude] pair format: you can RecordPath the array-pair itself and index into it at the application level.
Arrays are descended into with []:
tags[]
references[].title
references[].urls[]
Given:
{
"tags": ["art", "science"],
"references": [
{
"title": "a nice paper",
"urls": [
"https://bad-example.com/a-sketchy-source",
"https://example.com/a-reputable-source"
]
},
{
"title": "another paper",
"urls": ["https://example.com/source-for-another-paper"]
}
]
}
- tags[] reaches "art" and "science"
- references[].title: "a nice paper" and "another paper"
- references[].urls[] matches all three URLs in the example
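The zero-or-more ("vector") matching behaviour can be sketched with a small recursive matcher. The function name `match_all` is hypothetical, and the sketch assumes paths built only from plain field names and `[]`, with no escapes and no union qualifiers. The empty-path base case exists only to end the recursion; it takes no position on whether an empty RecordPath should be legal.

```python
from typing import Any, Iterator


def match_all(value: Any, path: str) -> Iterator[Any]:
    """Yield every value reached by a path made of field names and `[]`."""
    if path == "":
        # Base case of the recursion: nothing left to traverse.
        yield value
        return
    segment, _, rest = path.partition(".")
    descend = segment.endswith("[]")
    name = segment[:-2] if descend else segment
    if not isinstance(value, dict) or name not in value:
        return
    child = value[name]
    if descend:
        # `[]` fans out over every element, ignoring index position.
        if isinstance(child, list):
            for item in child:
                yield from match_all(item, rest)
    else:
        yield from match_all(child, rest)


doc = {
    "tags": ["art", "science"],
    "references": [
        {
            "title": "a nice paper",
            "urls": [
                "https://bad-example.com/a-sketchy-source",
                "https://example.com/a-reputable-source",
            ],
        },
        {
            "title": "another paper",
            "urls": ["https://example.com/source-for-another-paper"],
        },
    ],
}

for p in ("tags[]", "references[].title", "references[].urls[]"):
    print(p, list(match_all(doc, p)))
```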
2.4 Array-of-unions#
Atproto lexicons commonly use arrays of unions to collect different kinds of child objects in a list.
This pattern is visible in the record data by the presence of a $type field on the contained object.
A RecordPath descending into an array-of-unions MUST include the $type field's value within the square brackets:
facets[].features[app.bsky.richtext.facet#link].uri
facets[].features[app.bsky.richtext.facet#mention].did
So while arrays are unordered in RecordPath, their elements are always segmented by union-ref type when they contain unioned types.
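The segmentation by union type amounts to filtering array elements on their `$type` value. The sketch below hand-walks the two example paths above; `union_items` and `facet_feature_values` are hypothetical helper names, and the sample record fragment uses made-up values for illustration only.

```python
from typing import Any, List


def union_items(array: List[Any], type_ref: str) -> List[dict]:
    """Keep only the array elements whose $type equals type_ref."""
    return [
        item for item in array
        if isinstance(item, dict) and item.get("$type") == type_ref
    ]


def facet_feature_values(post: dict, type_ref: str, field: str) -> List[Any]:
    """Hand-rolled walk of facets[].features[<type_ref>].<field>."""
    values = []
    for facet in post.get("facets", []):  # facets[]
        if not isinstance(facet, dict):
            continue
        # features[<type_ref>]: only elements of the requested union type
        for feature in union_items(facet.get("features", []), type_ref):
            if field in feature:
                values.append(feature[field])
    return values


# Illustrative record fragment; the values are invented for this example.
post = {
    "facets": [
        {"features": [
            {"$type": "app.bsky.richtext.facet#link", "uri": "https://example.com/a"},
            {"$type": "app.bsky.richtext.facet#mention", "did": "did:plc:example"},
        ]},
        {"features": [
            {"$type": "app.bsky.richtext.facet#link", "uri": "https://example.com/b"},
        ]},
    ],
}

print(facet_feature_values(post, "app.bsky.richtext.facet#link", "uri"))
print(facet_feature_values(post, "app.bsky.richtext.facet#mention", "did"))
```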
2.5 Scalar union fields#
An object from a union outside of an array is visible in record data by the presence of a $type field -- plain objects should not have this key, and union objects should have it.
Since unions represent different child data types, the union-type is captured in RecordPath by including it within curly braces:
embed{app.bsky.embed.external}.external.uri
embed{app.bsky.embed.record}.record.uri
embed{app.bsky.embed.recordWithMedia}.record.record.uri
embed{app.bsky.embed.images}.images[].image
Given a post with an external link embed:
{
"$type": "app.bsky.feed.post",
"text": "my god",
"langs": [ "en" ],
"createdAt": "2026-04-16T01:39:17.721Z",
"embed": {
"$type": "app.bsky.embed.external", // captured
"external": {
"uri": "https://youtu.be/-pns419xAoc?si=K4XMQfFv-t4Q1cn0",
"title": "Paramore - Someday (The Strokes Cover)",
"description": "YouTube video by microwave"
}
}
}
The RecordPath embed{app.bsky.embed.external}.external.uri reaches the external embed's URI, "https://youtu.be/-pns419xAoc?si=K4XMQfFv-t4Q1cn0".
- The NSID in the braces is the $type value verbatim, including any #fragment suffix if present. The #main suffix is never present per the lexicon spec ("use of a #main suffix is invalid" in $type values). Relative #fragment references (without NSID) only exist within lexicon definition files as shorthand; $type values in record data are always fully-qualified (nsid or nsid#name). So the data is already canonical and no normalization is needed.
- The top-level record $type is excluded from this rule.
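Putting sections 2.1 to 2.5 together, a path evaluator might look roughly like the sketch below. Everything named here (`Step`, `parse`, `evaluate`) is hypothetical, and several behaviours are assumptions rather than spec text: qualifier contents are read verbatim with no escape processing, a `!` must be followed by a syntax character, and a step without a qualifier descends without checking `$type`.

```python
from typing import Any, Iterator, List, NamedTuple, Optional


class Step(NamedTuple):
    name: str                # unescaped field name
    kind: str                # "field", "array" or "union"
    type_ref: Optional[str]  # $type inside [..] or {..}, None if absent


def parse(path: str) -> List[Step]:
    """Split a RecordPath into steps, honouring `!` escapes in field names."""
    steps: List[Step] = []
    i, n = 0, len(path)
    while i < n:
        name = []
        while i < n and path[i] not in ".[{":
            if path[i] in "]}":
                raise ValueError(f"unescaped {path[i]!r} at offset {i}")
            if path[i] == "!":
                i += 1  # take the next character literally
                if i >= n or path[i] not in ".[]{}!":
                    raise ValueError("invalid escape")
            name.append(path[i])
            i += 1
        kind, type_ref = "field", None
        if i < n and path[i] in "[{":
            close = "]" if path[i] == "[" else "}"
            end = path.find(close, i)
            if end < 0:
                raise ValueError(f"missing {close!r}")
            kind = "array" if close == "]" else "union"
            # Assumption: qualifier text is the $type verbatim, no escapes.
            type_ref = path[i + 1:end] or None
            if kind == "union" and type_ref is None:
                raise ValueError("empty {} qualifier")
            i = end + 1
        steps.append(Step("".join(name), kind, type_ref))
        if i < n:
            if path[i] != ".":
                raise ValueError(f"expected '.' at offset {i}")
            i += 1
    return steps


def evaluate(value: Any, steps: List[Step]) -> Iterator[Any]:
    """Yield every value in `value` reached by the parsed steps."""
    if not steps:
        yield value
        return
    step, rest = steps[0], steps[1:]
    if not isinstance(value, dict) or step.name not in value:
        return
    child = value[step.name]
    if step.kind == "field":
        # Assumption: unqualified steps do not inspect $type.
        yield from evaluate(child, rest)
    elif step.kind == "union":
        if isinstance(child, dict) and child.get("$type") == step.type_ref:
            yield from evaluate(child, rest)
    elif isinstance(child, list):
        for item in child:
            if step.type_ref is None or (
                isinstance(item, dict) and item.get("$type") == step.type_ref
            ):
                yield from evaluate(item, rest)


post = {
    "$type": "app.bsky.feed.post",
    "text": "my god",
    "langs": ["en"],
    "embed": {
        "$type": "app.bsky.embed.external",
        "external": {"uri": "https://youtu.be/-pns419xAoc?si=K4XMQfFv-t4Q1cn0"},
    },
}

path = "embed{app.bsky.embed.external}.external.uri"
print(list(evaluate(post, parse(path))))
```

A stricter matcher could reject unqualified steps that land on an object carrying a `$type`; this sketch stays lenient because blobs also carry `$type` without being unions.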
3. Data-model assumptions#
RecordPath depends on several properties of the atproto data model and lexicon system.
$type is on unions (and blobs), not on plain object refs#
- Records include $type at the top level (not included in RecordPaths)
- Union variants MUST include $type (consistent with lexicon spec)
- Plain (non-union) object refs: Lexicon states $type "should not be included in encoded data as a discriminator." Since this is only a "should-not" and not a "must-not", valid records are technically allowed to have a $type where they shouldn't, and we aren't guaranteed to have a canonical RecordPath for every valid atproto record field. Hopefully record serializers are following this SHOULD :/
Forward-compatibility#
Lexicon evolution rules forbid type changes across schema revisions, so a plain object ref cannot be converted to a union in a forward-compatible revision under the same NSID. RecordPath canonicalization is therefore hopefully safe from forward-compatible lexicon changes.
todo#
- api recommendations for scalar vs vector queries
  A RecordPath is scalar if it contains no [] qualifiers, otherwise vector. Implementations that evaluate RecordPaths against records typically expose apis for blah blah
  - maybe compare to DOM querySelector/querySelectorAll
- lexicon awareness notes
  - backlinks example: match links on non-link fields if they happen to parse
- bring back the expected order of matches: depth-first-search order (probably as a "should")
- include cbor/drisl -- keep json for examples, but all this should be applicable
questions#
- should an empty RecordPath be legal? would match the entire record.