A loose federation of distributed, typed datasets

refactor: add extensible schema type system with union pattern and schemaType lexicon

Refactored sampleSchema to support multiple schema formats:

Schema Field Changes:
- Renamed 'jsonSchema' → 'schema' (now union type, closed: false)
- Extracted JSON Schema to separate #jsonSchemaFormat def
- Added $type discriminator to jsonSchemaFormat
- Moved ndarrayShimUri into jsonSchemaFormat (format-specific)

SchemaType Registry:
- Created ac.foundation.dataset.schemaType lexicon
- Token-based registry with knownValues pattern
- schemaType field now refs to schemaType lexicon (was enum)
- Enables adding new schema formats without breaking changes

Documentation:
- Added README_SCHEMA_TYPES.md explaining registry pattern
- Documents how to add new schema formats (Avro, Protobuf, etc.)

Updated examples to reflect new structure with discriminator.

This follows ATProto union patterns for extensible type systems.

Closes #71

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

+211 -32
.chainlink/issues.db

This is a binary file and will not be displayed.

+3 -2
.planning/examples/sampleSchema_example.json
@@ -2,8 +2,9 @@
   "$type": "ac.foundation.dataset.sampleSchema",
   "name": "ImageSample",
   "version": "1.0.0",
-  "schemaType": "jsonschema",
-  "jsonSchema": {
+  "schemaType": "jsonSchema",
+  "schema": {
+    "$type": "ac.foundation.dataset.sampleSchema#jsonSchemaFormat",
     "$schema": "http://json-schema.org/draft-07/schema#",
     "title": "ImageSample",
     "type": "object",
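The record-shape change in this example can be read as a small migration. The sketch below is illustrative only: `migrate_sample_schema` is a hypothetical helper that is not part of this commit, and it assumes legacy records carry a `jsonSchema` object plus an optional top-level `ndarrayShimUri`, as in the pre-refactor lexicon.

```python
import copy

JSON_SCHEMA_FORMAT = "ac.foundation.dataset.sampleSchema#jsonSchemaFormat"


def migrate_sample_schema(record: dict) -> dict:
    """Return a copy of a sampleSchema record in the post-refactor shape.

    Hypothetical helper for illustration; the input is never mutated.
    """
    if "schema" in record:
        # Already in the new shape, nothing to do.
        return copy.deepcopy(record)

    migrated = copy.deepcopy(record)
    body = migrated.pop("jsonSchema")
    # The union member carries a $type discriminator.
    schema = {"$type": JSON_SCHEMA_FORMAT, **body}
    # ndarrayShimUri is format-specific, so it moves inside the union member.
    if "ndarrayShimUri" in migrated:
        schema["ndarrayShimUri"] = migrated.pop("ndarrayShimUri")
    migrated["schema"] = schema
    migrated["schemaType"] = "jsonSchema"  # previously "jsonschema"
    return migrated
```

Running the helper on an already migrated record returns an equal copy, so it could be applied repeatedly without harm.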
+150
.planning/lexicons/README_SCHEMA_TYPES.md
@@ -0,0 +1,150 @@
+# Schema Type Registry
+
+This document explains the token-based registry pattern for atdata schema types.
+
+## Pattern
+
+Schema types in atdata are managed through the `ac.foundation.dataset.schemaType` Lexicon:
+
+1. **Single Lexicon file**: `ac.foundation.dataset.schemaType.json`
+2. **Main def**: String type with `knownValues` listing supported schema types
+3. **Token defs**: Each schema type has a corresponding token def (e.g., `#jsonSchema`)
+4. **Reference in sampleSchema**: The `schemaType` field refs to `ac.foundation.dataset.schemaType`
+
+## Structure
+
+```json
+{
+  "lexicon": 1,
+  "id": "ac.foundation.dataset.schemaType",
+  "defs": {
+    "main": {
+      "type": "string",
+      "knownValues": ["jsonSchema"],
+      "maxLength": 50
+    },
+    "jsonSchema": {
+      "type": "token",
+      "description": "JSON Schema Draft 7 format..."
+    }
+  }
+}
+```
+
+## Usage in sampleSchema
+
+The `schemaType` field references the schemaType Lexicon:
+
+```json
+{
+  "$type": "ac.foundation.dataset.sampleSchema",
+  "name": "ImageSample",
+  "version": "1.0.0",
+  "schemaType": "jsonSchema",
+  "schema": {
+    "$type": "ac.foundation.dataset.sampleSchema#jsonSchemaFormat",
+    ...
+  }
+}
+```
+
+In the Lexicon definition:
+
+```json
+{
+  "schemaType": {
+    "type": "ref",
+    "ref": "ac.foundation.dataset.schemaType"
+  }
+}
+```
+
+## Adding New Schema Types
+
+To add support for a new schema format (e.g., Avro, Protobuf):
+
+### 1. Add token def to schemaType Lexicon
+
+Edit `ac.foundation.dataset.schemaType.json`:
+
+```json
+{
+  "defs": {
+    "main": {
+      "type": "string",
+      "knownValues": ["jsonSchema", "avro"],
+      "maxLength": 50
+    },
+    "avro": {
+      "type": "token",
+      "description": "Apache Avro schema format..."
+    }
+  }
+}
+```
+
+### 2. Add format def to sampleSchema Lexicon
+
+Edit `ac.foundation.dataset.sampleSchema.json`:
+
+```json
+{
+  "defs": {
+    "avroFormat": {
+      "type": "object",
+      "description": "Apache Avro schema format...",
+      "required": ["$type", "type"],
+      "properties": {
+        "$type": {
+          "type": "string",
+          "const": "ac.foundation.dataset.sampleSchema#avroFormat"
+        },
+        "type": {
+          "type": "string"
+        },
+        "fields": {
+          "type": "array"
+        }
+      }
+    }
+  }
+}
+```
+
+### 3. Update schema union refs
+
+In sampleSchema main record:
+
+```json
+{
+  "schema": {
+    "type": "union",
+    "refs": [
+      "ac.foundation.dataset.sampleSchema#jsonSchemaFormat",
+      "ac.foundation.dataset.sampleSchema#avroFormat"
+    ],
+    "closed": false
+  }
+}
+```
+
+## Current Schema Types
+
+| Token Def | knownValue | Format Def | Description |
+|-----------|------------|------------|-------------|
+| `#jsonSchema` | `"jsonSchema"` | `#jsonSchemaFormat` | JSON Schema Draft 7 |
+
+## Design Rationale
+
+This pattern provides:
+
+1. **Centralized Registry**: Single Lexicon (`schemaType`) lists all supported types
+2. **Type Safety**: Token defs provide canonical documentation for each schema type
+3. **Extensibility**: New types added to `knownValues` + token defs without breaking changes
+4. **Validation**: Refs ensure schemaType values are validated against known types
+5. **Discoverability**: Query `ac.foundation.dataset.schemaType` to see all supported types
+
+## References
+
+- [ATProto Lexicon Token Type](https://atproto.com/guides/lexicon)
+- [ATProto Lexicon Spec](.reference/atproto_lexicon_spec.md)
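One way a consumer might act on the `$type` discriminator described in this README is a small dispatch table. This is a minimal sketch under stated assumptions: `HANDLERS`, `describe_json_schema`, and `describe_schema` are hypothetical names, not atdata APIs, and because the union is declared with `closed: false` the sketch tolerates members it does not recognize instead of raising.

```python
from typing import Callable, Dict

JSON_SCHEMA_FORMAT = "ac.foundation.dataset.sampleSchema#jsonSchemaFormat"


def describe_json_schema(schema: dict) -> str:
    """Summarize a jsonSchemaFormat union member by its field names."""
    fields = sorted(schema.get("properties", {}))
    return "JSON Schema with fields: " + ", ".join(fields)


# Maps a union member's $type to its handler (hypothetical registry).
# Supporting a new format would mean adding one entry here.
HANDLERS: Dict[str, Callable[[dict], str]] = {
    JSON_SCHEMA_FORMAT: describe_json_schema,
}


def describe_schema(record: dict) -> str:
    """Dispatch on the $type of the record's schema field."""
    schema = record["schema"]
    type_id = schema.get("$type")
    handler = HANDLERS.get(type_id)
    if handler is None:
        # Open union: an unknown member is reported, not rejected.
        return f"unrecognized schema format: {type_id}"
    return handler(schema)
```

Keeping the handlers in a table mirrors the registry idea: supporting Avro would touch the lexicons and one table entry, while existing call sites stay unchanged.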
+42 -30
.planning/lexicons/ac.foundation.dataset.sampleSchema.json
@@ -4,7 +4,7 @@
   "defs": {
     "main": {
       "type": "record",
-      "description": "Definition of a PackableSample-compatible sample type using JSON Schema. Supports versioning via rkey format: {NSID}@{semver}",
+      "description": "Definition of a PackableSample-compatible sample type. Supports versioning via rkey format: {NSID}@{semver}. Schema format is extensible via union type.",
       "key": "any",
       "record": {
         "type": "object",
@@ -12,7 +12,7 @@
           "name",
           "version",
           "schemaType",
-          "jsonSchema",
+          "schema",
           "createdAt"
         ],
         "properties": {
@@ -28,41 +28,21 @@
             "maxLength": 100
           },
           "schemaType": {
-            "type": "string",
-            "description": "Type of schema definition (currently only 'jsonschema' supported)",
-            "enum": ["jsonschema"],
-            "default": "jsonschema"
+            "type": "ref",
+            "ref": "ac.foundation.dataset.schemaType",
+            "description": "Type of schema definition. This field indicates which union member is present in the schema field."
           },
-          "jsonSchema": {
-            "type": "object",
-            "description": "JSON Schema Draft 7 definition for this sample type. Use standard JSON Schema with NDArray shim for array types.",
-            "required": ["$schema", "type", "properties"],
-            "properties": {
-              "$schema": {
-                "type": "string",
-                "const": "http://json-schema.org/draft-07/schema#"
-              },
-              "type": {
-                "type": "string",
-                "const": "object"
-              },
-              "properties": {
-                "type": "object",
-                "minProperties": 1
-              }
-            }
+          "schema": {
+            "type": "union",
+            "refs": ["ac.foundation.dataset.sampleSchema#jsonSchemaFormat"],
+            "closed": false,
+            "description": "Schema definition for this sample type. Currently supports JSON Schema Draft 7. Union allows for future schema formats (Avro, Protobuf, etc.) without breaking changes."
           },
           "description": {
             "type": "string",
             "description": "Human-readable description of what this sample type represents",
             "maxLength": 5000
           },
-          "ndarrayShimUri": {
-            "type": "string",
-            "format": "uri",
-            "description": "URI to the NDArray JSON Schema shim definition (optional, defaults to standard shim at https://foundation.ac/schemas/atdata-ndarray-bytes/1.0.0)",
-            "maxLength": 500
-          },
           "metadata": {
             "type": "object",
             "description": "Optional metadata about this schema. Common fields include author, license, and tags, but any additional fields are permitted.",
@@ -94,6 +74,38 @@
             "format": "datetime",
             "description": "Timestamp when this schema version was created. Immutable once set (ATProto records are permanent)."
           }
+        }
+      }
+    },
+    "jsonSchemaFormat": {
+      "type": "object",
+      "description": "JSON Schema Draft 7 format for sample type definitions. Used with NDArray shim for array types.",
+      "required": ["$type", "$schema", "type", "properties"],
+      "properties": {
+        "$type": {
+          "type": "string",
+          "const": "ac.foundation.dataset.sampleSchema#jsonSchemaFormat"
+        },
+        "$schema": {
+          "type": "string",
+          "const": "http://json-schema.org/draft-07/schema#",
+          "description": "JSON Schema version identifier"
+        },
+        "type": {
+          "type": "string",
+          "const": "object",
+          "description": "Sample types must be objects"
+        },
+        "properties": {
+          "type": "object",
+          "description": "Field definitions for the sample type",
+          "minProperties": 1
+        },
+        "ndarrayShimUri": {
+          "type": "string",
+          "format": "uri",
+          "description": "URI to the NDArray JSON Schema shim definition. Optional, defaults to https://foundation.ac/schemas/atdata-ndarray-bytes/1.0.0",
+          "maxLength": 500
         }
       }
     }
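The constraints written into the `#jsonSchemaFormat` def can be mirrored in a few lines of plain Python. The sketch below is not a Lexicon validator and not part of this commit: `check_json_schema_format` is a hypothetical helper that only restates the required fields, the three `const` values, `minProperties`, and the `maxLength` on `ndarrayShimUri` as they appear in the def above.

```python
from typing import List

JSON_SCHEMA_FORMAT = "ac.foundation.dataset.sampleSchema#jsonSchemaFormat"
DRAFT_07 = "http://json-schema.org/draft-07/schema#"

REQUIRED = ("$type", "$schema", "type", "properties")
CONSTANTS = {"$type": JSON_SCHEMA_FORMAT, "$schema": DRAFT_07, "type": "object"}


def check_json_schema_format(schema: dict) -> List[str]:
    """Return a list of problems; an empty list means these checks pass."""
    problems = []
    for key in REQUIRED:
        if key not in schema:
            problems.append(f"missing required field: {key}")
    for key, expected in CONSTANTS.items():
        if key in schema and schema[key] != expected:
            problems.append(f"{key} must be {expected!r}")
    props = schema.get("properties")
    if props is not None and (not isinstance(props, dict) or len(props) < 1):
        problems.append("properties must be an object with at least one entry")
    shim = schema.get("ndarrayShimUri")
    # Lexicon string maxLength is counted in UTF-8 bytes, hence the encode().
    if shim is not None and (
        not isinstance(shim, str) or len(shim.encode("utf-8")) > 500
    ):
        problems.append("ndarrayShimUri must be a string of at most 500 bytes")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to report everything wrong with a record in a single pass.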
+16
.planning/lexicons/ac.foundation.dataset.schemaType.json
@@ -0,0 +1,16 @@
+{
+  "lexicon": 1,
+  "id": "ac.foundation.dataset.schemaType",
+  "defs": {
+    "main": {
+      "type": "string",
+      "description": "Schema type identifier for atdata sample definitions. Known values correspond to token definitions in this Lexicon. New schema types can be added as tokens without breaking changes.",
+      "knownValues": ["jsonSchema"],
+      "maxLength": 50
+    },
+    "jsonSchema": {
+      "type": "token",
+      "description": "JSON Schema Draft 7 format for sample type definitions. When schemaType is 'jsonSchema', the schema field must contain an object conforming to ac.foundation.dataset.sampleSchema#jsonSchemaFormat."
+    }
+  }
+}
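In the Lexicon spec, `knownValues` lists suggested values and is not a closed enum, so a reader of this registry would typically treat an unlisted `schemaType` as unknown rather than invalid. The sketch below illustrates that reading; `classify_schema_type` is a hypothetical helper, and the embedded lexicon is a trimmed copy of the file above with descriptions omitted.

```python
import json

# Trimmed copy of ac.foundation.dataset.schemaType.json (descriptions omitted).
SCHEMA_TYPE_LEXICON = json.loads("""
{
  "lexicon": 1,
  "id": "ac.foundation.dataset.schemaType",
  "defs": {
    "main": {"type": "string", "knownValues": ["jsonSchema"], "maxLength": 50},
    "jsonSchema": {"type": "token"}
  }
}
""")


def classify_schema_type(value: str, lexicon: dict = SCHEMA_TYPE_LEXICON) -> str:
    """Classify a schemaType value as "known", "unknown", or "invalid"."""
    main = lexicon["defs"]["main"]
    # maxLength is a hard constraint, counted in UTF-8 bytes.
    if len(value.encode("utf-8")) > main["maxLength"]:
        return "invalid"
    # knownValues is advisory: unlisted values are allowed, just not recognized.
    return "known" if value in main["knownValues"] else "unknown"
```

Under this reading the pre-refactor spelling `"jsonschema"` classifies as unknown, since the comparison is case-sensitive and only `"jsonSchema"` is listed.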