Durable Streams over Toon

remove opus & glm protocols, accept kimi's synthesis

rektide 735fe089 38175c92

+3 -2100
+3
README.md
+ # duratoon - durable streams on TOON
+
+ Alas not actually a good idea. I was hoping we could maybe find a way to re-use more values across multiple entries but I haven't figured out how to map this.
-1218
proto/state-protocol-toon.glm.md
··· 1 - # DuraTOON - Durable Streams State Protocol over TOON 2 - 3 - **Document:** DuraTOON Specification 4 - **Version:** 1.0 5 - **Date:** 2025-02-10 6 - **Status:** Extension of [Durable Streams State Protocol](https://github.com/durable-streams/durable-streams/blob/main/packages/state/STATE-PROTOCOL.md) 7 - **Content-Type:** `text/toon` 8 - 9 - --- 10 - 11 - ## Abstract 12 - 13 - DuraTOON extends the Durable Streams State Protocol to use TOON (Token-Oriented Object Notation) as an alternative serialization format. While maintaining **complete semantic equivalence** with the base JSON protocol, DuraTOON achieves 30-60% payload reduction through TOON's tabular arrays, minimal quoting rules, and line-oriented structure. 14 - 15 - The primary optimization is **tabular array batching**: when multiple change messages share the same schema, field names are declared once in a header, eliminating per-message overhead. This CSV-like format remains human-readable while dramatically reducing bandwidth consumption. 16 - 17 - ### Key Design Insight 18 - 19 - State synchronization protocols transmit the same schemas repeatedly. A real-time application sending 1,000 presence updates repeats `"type": "presence"` and `"operation": "update"` 1,000 times. TOON's tabular format declares these once: 20 - 21 - ``` 22 - presence[1000]{type,key,value.status,value.lastSeen,headers.operation}: 23 - presence,user:1,online,1705312200000,update 24 - presence,user:2,online,1705312200100,update 25 - presence,user:3,away,1705312199000,update 26 - ... (997 more rows) 27 - ``` 28 - 29 - This optimization is transparent to protocol semantics—after decoding TOON to JSON, clients materialize state identically to the JSON variant. 30 - 31 - --- 32 - 33 - ## Table of Contents 34 - 35 - 1. [Introduction](#1-introduction) 36 - 2. [Terminology](#2-terminology) 37 - 3. [Protocol Overview](#3-protocol-overview) 38 - 4. [Message Format Mapping](#4-message-format-mapping) 39 - - 4.1. 
[Change Messages](#41-change-messages) 40 - - 4.2. [Tabular Change Batches](#42-tabular-change-batches) 41 - - 4.3. [Control Messages](#43-control-messages) 42 - 5. [Encoding Guidelines](#5-encoding-guidelines) 43 - 6. [Decoding Requirements](#6-decoding-requirements) 44 - 7. [State Materialization](#7-state-materialization) 45 - 8. [Schema Validation](#8-schema-validation) 46 - 9. [Security Considerations](#9-security-considerations) 47 - 10. [IANA Considerations](#10-iana-considerations) 48 - 11. [References](#11-references) 49 - 12. [Appendix A: Payload Reduction Analysis](#appendix-a-payload-reduction-analysis) 50 - 13. [Appendix B: Implementation Guidelines](#appendix-b-implementation-guidelines) 51 - 14. [Appendix C: Migration Guide](#appendix-c-migration-guide) 52 - 53 - --- 54 - 55 - ## 1. Introduction 56 - 57 - ### 1.1. Relationship to Base Protocol 58 - 59 - The Durable Streams State Protocol [STATE-PROTOCOL] defines a standard message format for state synchronization built on the Durable Streams Protocol [PROTOCOL]. It specifies: 60 - 61 - - **Change messages** ([Section 4.1](https://github.com/durable-streams/durable-streams/blob/main/packages/state/STATE-PROTOCOL.md#41-change-messages)) representing insert, update, and delete operations on entities 62 - - **Control messages** ([Section 4.2](https://github.com/durable-streams/durable-streams/blob/main/packages/state/STATE-PROTOCOL.md#42-control-messages)) for snapshot boundaries and reset signals 63 - - **State materialization** ([Section 6](https://github.com/durable-streams/durable-streams/blob/main/packages/state/STATE-PROTOCOL.md#6-state-materialization)) rules for applying changes sequentially 64 - 65 - DuraTOON extends this specification by defining TOON encoding for the same message structures. It **does not modify protocol semantics**—it only provides an alternative serialization. 66 - 67 - ### 1.2. 
The Bandwidth Problem 68 - 69 - ```mermaid 70 - flowchart LR 71 - subgraph JSON["JSON Encoding"] 72 - direction TB 73 - J1["{type:'user',<br/>key:'1',<br/>value:{name:'Alice'},<br/>headers:{operation:'insert'}}"] 74 - J2["{type:'user',<br/>key:'2',<br/>value:{name:'Bob'},<br/>headers:{operation:'insert'}}"] 75 - J3["{type:'user',<br/>key:'3',<br/>value:{name:'Carol'},<br/>headers:{operation:'insert'}}"] 76 - end 77 - 78 - subgraph TOON["TOON Tabular Encoding"] 79 - direction TB 80 - H["users[3]{type,key,value.name,headers.operation}:"] 81 - R1["user,1,Alice,insert"] 82 - R2["user,2,Bob,insert"] 83 - R3["user,3,Carol,insert"] 84 - end 85 - 86 - JSON -->|~435 bytes| NW["Network"] 87 - TOON -->|~170 bytes| NW 88 - 89 - style JSON fill:#ffcccc 90 - style TOON fill:#ccffcc 91 - ``` 92 - 93 - **Figure 1:** JSON repeats field names for every message; TOON declares them once. For 1,000 messages, this difference compounds to **~260 KB saved** just from field name repetition. 94 - 95 - ### 1.3. Design Goals 96 - 97 - DuraTOON aims to be: 98 - 99 - - **Semantically Equivalent**: When decoded, TOON messages produce identical in-memory representations to JSON messages. Clients materialize the same state regardless of encoding. 100 - - **Bandwidth Efficient**: 30-60% payload reduction through tabular batching and minimal quoting rules 101 - - **Backward Compatible**: Servers support both `application/json` and `text/toon`; clients negotiate format via `Accept` header 102 - - **Human Readable**: Line-oriented, indentation-based structure remains debuggable and manually editable 103 - - **Streaming Friendly**: Explicit array lengths `[N]` enable validation without buffering entire arrays 104 - - **Format-Agnostic Materialization**: After decoding, materialization logic is identical to base protocol 105 - 106 - --- 107 - 108 - ## 2. 
Terminology 109 - 110 - This document uses terminology from [STATE-PROTOCOL Section 2](https://github.com/durable-streams/durable-streams/blob/main/packages/state/STATE-PROTOCOL.md#2-terminology) with these additions: 111 - 112 - **TOON (Token-Oriented Object Notation)**: A compact serialization format using indentation-based structure, tabular arrays with explicit lengths, and minimal quoting rules [TOON-SPEC]. TOON preserves the JSON data model exactly. 113 - 114 - **Tabular Array**: TOON's optimization for homogeneous data. A header declares field names once; rows contain only values: 115 - ``` 116 - users[3]{name,email}: 117 - Alice,alice@example.com 118 - Bob,bob@example.com 119 - Carol,carol@example.com 120 - ``` 121 - 122 - **Field Path**: Dot-notation reference to nested fields in tabular headers (e.g., `value.name`, `headers.operation`). 123 - 124 - **Semantic Equivalence**: Two encodings produce identical data models when parsed. A TOON-encoded change message, when decoded, yields the same in-memory representation as its JSON equivalent from the base protocol. 125 - 126 - **Text/toon**: Provisional media type per [TOON-SPEC Section 18.2](https://github.com/toon-format/spec#182-provisional-media-type). Implementers should monitor the TOON specification repository for updates and formal IANA registration. 127 - 128 - --- 129 - 130 - ## 3. Protocol Overview 131 - 132 - ### 3.1. Content-Type Negotiation 133 - 134 - Per [TOON-SPEC Section 18.2](https://github.com/toon-format/spec#182-provisional-media-type), the provisional media type is: 135 - 136 - ``` 137 - Content-Type: text/toon 138 - ``` 139 - 140 - This media type is specified by the TOON format specification and should be used for TOON-encoded messages. TOON documents decode to the same JSON data model. 
141 - 142 - **Client Request Example:** 143 - ```http 144 - GET /v1/stream/my-app HTTP/1.1 145 - Host: example.com 146 - Accept: text/toon, application/json;q=0.9 147 - ``` 148 - 149 - **Server Response (TOON):** 150 - ```http 151 - HTTP/1.1 200 OK 152 - Content-Type: text/toon 153 - ``` 154 - 155 - **Implementation Requirements:** 156 - 157 - - Clients **MAY** request `text/toon` via `Accept` header 158 - - Servers **MAY** respond with TOON if requested and supported 159 - - Implementations **MUST** support both `application/json` and `text/toon` for interoperability 160 - - Servers **SHOULD** default to `application/json` when no `Accept` header is present for backward compatibility 161 - 162 - ### 3.2. Message Flow Overview 163 - 164 - ```mermaid 165 - flowchart TD 166 - subgraph Stream["Durable Stream"] 167 - direction TB 168 - 169 - subgraph Batch["TOON Batch"] 170 - direction TB 171 - TB["changes[4]{type,key,value.name,headers.operation}:"] 172 - R1["user,1,Alice,insert"] 173 - R2["user,2,Bob,insert"] 174 - R3["user,1,Alice Smith,update"] 175 - R4["user,2,Bob Johnson,update"] 176 - end 177 - 178 - subgraph Ctrl["Control Message"] 179 - C["headers:\n control: snapshot-end\n offset: 1000"] 180 - end 181 - 182 - Batch --> Ctrl 183 - end 184 - 185 - subgraph Decode["TOON Decoder"] 186 - D1["Expand tabular array\ninto 4 change messages"] 187 - D2["Parse control\nmessage"] 188 - end 189 - 190 - subgraph Materialize["Materialization<br/>(identical to JSON variant)"] 191 - M["Apply insert(user:1, {name:Alice})\nApply insert(user:2, {name:Bob})\nApply update(user:1, {name:Alice Smith})\nApply update(user:2, {name:Bob Johnson})\nProcess snapshot-end"] 192 - end 193 - 194 - Stream --> Decode 195 - Decode --> Materialize 196 - 197 - style Stream fill:#e6f3ff 198 - style Decode fill:#fff4e6 199 - style Materialize fill:#e6ffe6 200 - ``` 201 - 202 - **Figure 2:** TOON batches are decoded into individual messages, then materialized exactly as defined in [STATE-PROTOCOL 
Section 6](https://github.com/durable-streams/durable-streams/blob/main/packages/state/STATE-PROTOCOL.md#6-state-materialization). 203 - 204 - ### 3.3. Recommended Batching Strategies 205 - 206 - TOON enables efficient batching patterns not present in the base specification: 207 - 208 - | Strategy | Description | Best For | 209 - |----------|-------------|----------| 210 - | **By Operation** | Group `insert`, `update`, `delete` operations separately | Bulk imports, cleanup jobs | 211 - | **By Entity Type** | Group messages for the same `type` together | Multi-type streams (chat: users, messages, reactions) | 212 - | **By Transaction** | Group messages with the same `txid` together | Atomic operations | 213 - | **Time-based** | Batch messages within time windows | High-frequency updates | 214 - 215 - **Encoding Recommendation:** 216 - 217 - Encoders **SHOULD**: 218 - 1. Accumulate consecutive change messages by schema shape 219 - 2. Encode homogeneous groups as tabular arrays 220 - 3. Fall back to TOON object format for heterogeneous messages 221 - 4. Always encode control messages as individual TOON objects (see [Section 4.3](#43-control-messages)) 222 - 223 - --- 224 - 225 - ## 4. Message Format Mapping 226 - 227 - This section maps each message structure from [STATE-PROTOCOL Section 4](https://github.com/durable-streams/durable-streams/blob/main/packages/state/STATE-PROTOCOL.md#4-message-types) to its TOON equivalent. 228 - 229 - ### 4.1. Change Messages 230 - 231 - #### 4.1.1. 
Single Change Message 232 - 233 - **JSON** ([from STATE-PROTOCOL Section 5.1](https://github.com/durable-streams/durable-streams/blob/main/packages/state/STATE-PROTOCOL.md#51-change-message-structure)): 234 - 235 - ```json 236 - { 237 - "type": "user", 238 - "key": "user:123", 239 - "value": { 240 - "name": "Alice", 241 - "email": "alice@example.com" 242 - }, 243 - "headers": { 244 - "operation": "insert", 245 - "timestamp": "2025-01-15T10:30:00Z" 246 - } 247 - } 248 - ``` 249 - 250 - **TOON equivalent:** 251 - 252 - ```toon 253 - type: user 254 - key: user:123 255 - value: 256 - name: Alice 257 - email: alice@example.com 258 - headers: 259 - operation: insert 260 - timestamp: 2025-01-15T10:30:00Z 261 - ``` 262 - 263 - **Size comparison:** 264 - 265 - | Format | Bytes | Reduction | 266 - |--------|-------|-----------| 267 - | JSON | ~145 bytes | Baseline | 268 - | TOON | ~80 bytes | **45%** | 269 - 270 - **Savings breakdown:** 271 - 272 - | Component | JSON | TOON | Reduction | 273 - |-----------|------|------|-----------| 274 - | Field name quotes | `"type":` (7 chars) | `type:` (5 chars) | 29% | 275 - | String value quotes | `"user"` (6 chars) | `user` (4 chars) | 33% | 276 - | Object delimiters | `"` `"value":{` (13 chars) | `value:` (6 chars) | 54% | 277 - | Closing braces | `}` (2 chars) | (none) | 100% | 278 - | Comma separators | 5 commas | 0 commas | 100% | 279 - 280 - #### 4.1.2. 
Insert Operation 281 - 282 - **JSON** ([Section 4.1.1](https://github.com/durable-streams/durable-streams/blob/main/packages/state/STATE-PROTOCOL.md#411-insert-operation)): 283 - 284 - ```json 285 - { 286 - "type": "user", 287 - "key": "user:123", 288 - "value": { 289 - "name": "Alice", 290 - "email": "alice@example.com" 291 - }, 292 - "headers": { 293 - "operation": "insert", 294 - "timestamp": "2025-01-15T10:30:00Z" 295 - } 296 - } 297 - ``` 298 - 299 - **TOON single message:** 300 - 301 - ```toon 302 - type: user 303 - key: user:123 304 - value: 305 - name: Alice 306 - email: alice@example.com 307 - headers: 308 - operation: insert 309 - timestamp: 2025-01-15T10:30:00Z 310 - ``` 311 - 312 - **TOON tabular batch (5 users):** 313 - 314 - ```toon 315 - users[5]{type,key,value.name,value.email,headers.operation,headers.timestamp}: 316 - user,user:1,Alice,alice1@example.com,insert,2025-01-15T10:30:00Z 317 - user,user:2,Bob,bob2@example.com,insert,2025-01-15T10:31:00Z 318 - user,user:3,Charlie,charlie3@example.com,insert,2025-01-15T10:32:00Z 319 - user,user:4,Diana,diana4@example.com,insert,2025-01-15T10:33:00Z 320 - user,user:5,Eve,eve5@example.com,insert,2025-01-15T10:34:00Z 321 - ``` 322 - 323 - **Size comparison:** 324 - 325 - | Format | 1 message | 5 messages | 100 messages | 1,000 messages | 326 - |--------|-----------|------------|--------------|-----------------| 327 - | JSON | ~145 bytes | ~725 bytes | ~14,500 bytes | ~145 KB | 328 - | TOON single | ~80 bytes | ~400 bytes | ~8,000 bytes | ~80 KB | 329 - | TOON tabular | N/A | ~215 bytes | ~4,200 bytes | ~42 KB | 330 - | **Reduction** | **45%** | **70%** | **71%** | **71%** | 331 - 332 - #### 4.1.3. 
Update Operation 333 - 334 - **JSON** ([Section 4.1.2](https://github.com/durable-streams/durable-streams/blob/main/packages/state/STATE-PROTOCOL.md#412-update-operation)): 335 - 336 - ```json 337 - { 338 - "type": "user", 339 - "key": "user:123", 340 - "value": { 341 - "name": "Alice Smith", 342 - "email": "alice.new@example.com" 343 - }, 344 - "old_value": { 345 - "name": "Alice", 346 - "email": "alice@example.com" 347 - }, 348 - "headers": { 349 - "operation": "update", 350 - "timestamp": "2025-01-15T10:35:00Z" 351 - } 352 - } 353 - ``` 354 - 355 - **TOON single:** 356 - 357 - ```toon 358 - type: user 359 - key: user:123 360 - value: 361 - name: Alice Smith 362 - email: alice.new@example.com 363 - old_value: 364 - name: Alice 365 - email: alice@example.com 366 - headers: 367 - operation: update 368 - timestamp: 2025-01-15T10:35:00Z 369 - ``` 370 - 371 - **TOON tabular batch (3 updates):** 372 - 373 - ```toon 374 - updates[3]{type,key,value.email,old_value.email,headers.operation,headers.timestamp}: 375 - user,user:1,alice.new@example.com,alice@example.com,update,2025-01-15T10:35:00Z 376 - user,user:2,bob.updated@example.com,bob@example.com,update,2025-01-15T10:36:00Z 377 - user,user:3,charlie.new@example.com,charlie@example.com,update,2025-01-15T10:37:00Z 378 - ``` 379 - 380 - **Partial Update Optimization:** When batching updates where only certain fields change, tabular format can include only changed fields: 381 - 382 - ```toon 383 - updates[3]{type,key,value.email,headers.operation}: 384 - user,user:1,alice.new@example.com,update 385 - user,user:2,bob.updated@example.com,update 386 - user,user:3,charlie.new@example.com,update 387 - ``` 388 - 389 - #### 4.1.4. 
Delete Operation 390 - 391 - **JSON** ([Section 4.1.3](https://github.com/durable-streams/durable-streams/blob/main/packages/state/STATE-PROTOCOL.md#413-delete-operation)): 392 - 393 - ```json 394 - { 395 - "type": "user", 396 - "key": "user:123", 397 - "old_value": { 398 - "name": "Alice", 399 - "email": "alice@example.com" 400 - }, 401 - "headers": { 402 - "operation": "delete", 403 - "timestamp": "2025-01-15T10:40:00Z" 404 - } 405 - } 406 - ``` 407 - 408 - **TOON single (value omitted):** 409 - 410 - ```toon 411 - type: user 412 - key: user:123 413 - old_value: 414 - name: Alice 415 - email: alice@example.com 416 - headers: 417 - operation: delete 418 - timestamp: 2025-01-15T10:40:00Z 419 - ``` 420 - 421 - **TOON tabular batch (soft delete with old_value):** 422 - 423 - ```toon 424 - deletes[3]{type,key,old_value.name,old_value.email,headers.operation,headers.timestamp}: 425 - user,user:1,Alice,alice@example.com,delete,2025-01-15T10:40:00Z 426 - user,user:2,Bob,bob@example.com,delete,2025-01-15T10:41:00Z 427 - user,user:3,Charlie,charlie@example.com,delete,2025-01-15T10:42:00Z 428 - ``` 429 - 430 - **TOON tabular batch (hard delete, no old_value):** 431 - 432 - ```toon 433 - deletes[3]{type,key,headers.operation,headers.timestamp}: 434 - user,user:1,delete,2025-01-15T10:40:00Z 435 - user,user:2,delete,2025-01-15T10:41:00Z 436 - user,user:3,delete,2025-01-15T10:42:00Z 437 - ``` 438 - 439 - ### 4.2. Tabular Change Batches 440 - 441 - Tabular encoding is the primary optimization of DuraTOON. It applies when: 442 - 443 - 1. All messages in the batch are change messages (not control messages) 444 - 2. All messages share the same set of scalar fields in `value` and `old_value` 445 - 3. All messages use the same `headers` fields 446 - 447 - #### 4.2.1. Field Flattening 448 - 449 - Tabular arrays require flat column schemas. 
Nested objects **MUST** be flattened using dot notation in the header: 450 - 451 - | Nested Path | Tabular Column | 452 - |-------------|----------------| 453 - | `value.name` | `value.name` | 454 - | `value.email` | `value.email` | 455 - | `value.address.city` | `value.address.city` | 456 - | `headers.operation` | `headers.operation` | 457 - | `headers.timestamp` | `headers.timestamp` | 458 - | `headers.txid` | `headers.txid` | 459 - | `old_value.name` | `old_value.name` | 460 - 461 - #### 4.2.2. Named Tabular Arrays 462 - 463 - Tabular arrays **SHOULD** use descriptive names that indicate their content: 464 - 465 - | Array Name | Use Case | 466 - |------------|----------| 467 - | `changes` | Mixed operations | 468 - | `inserts` | Insert-only batch | 469 - | `updates` | Update-only batch | 470 - | `deletes` | Delete-only batch | 471 - | `users` | Entity-type-specific batch | 472 - | `messages` | Entity-type-specific batch | 473 - | `presence` | Entity-type-specific batch | 474 - 475 - #### 4.2.3. Field Ordering Recommendation 476 - 477 - For tabular arrays, field order **SHOULD** follow this convention for efficient streaming parsers: 478 - 479 - 1. `type` 480 - 2. `key` 481 - 3. `value.*` fields (alphabetically) 482 - 4. `old_value.*` fields (alphabetically, if present) 483 - 5. `headers.*` fields (alphabetically) 484 - 485 - **Example:** 486 - 487 - ```toon 488 - changes[3]{type,key,value.email,value.name,old_value.email,old_value.name,headers.operation,headers.timestamp,headers.txid}: 489 - user,1,alice@example.com,Alice,,,insert,2025-01-15T10:30:00Z,tx-001 490 - ``` 491 - 492 - #### 4.2.4. 
Mixed Operation Batch 493 - 494 - When batching different operations with the same schema: 495 - 496 - ```toon 497 - changes[4]{type,key,value.name,value.status,headers.operation}: 498 - user,1,Alice,active,insert 499 - user,2,Bob,active,insert 500 - user,1,Alice,inactive,update 501 - user,2,,deleted,delete 502 - ``` 503 - 504 - **Empty Cell Handling:** Empty cells between delimiters (e.g., `value.name` for delete) are decoded as omitted fields, matching the semantics of delete operations where `value` is typically omitted. 505 - 506 - #### 4.2.5. Delimiter Selection 507 - 508 - TOON supports comma (default), tab, or pipe delimiters. Selection depends on data characteristics: 509 - 510 - | Delimiter | Best For | Quoting Impact | 511 - |-----------|----------|----------------| 512 - | Comma (default) | General purpose | May require quotes for values containing commas | 513 - | Tab | Data with few quoted strings | ~10-15% better efficiency | 514 - | Pipe | Data with commas and tabs | Similar to comma | 515 - 516 - **Tab-delimited example:** 517 - 518 - ```toon 519 - users[2 ]{type key value.name value.bio headers.operation}: 520 - user 1 Alice Writer, editor insert 521 - user 2 Bob Developer, speaker insert 522 - ``` 523 - 524 - ### 4.3. Control Messages 525 - 526 - Control messages **MUST NOT** be included in tabular arrays because they have different schemas than change messages. They must be encoded as individual TOON objects. 527 - 528 - #### 4.3.1. Snapshot Boundaries 529 - 530 - **JSON** ([Section 4.2.1](https://github.com/durable-streams/durable-streams/blob/main/packages/state/STATE-PROTOCOL.md#421-snapshot-boundaries)): 531 - 532 - ```json 533 - { 534 - "headers": { 535 - "control": "snapshot-start", 536 - "offset": "123456_000" 537 - } 538 - } 539 - ``` 540 - 541 - **TOON:** 542 - 543 - ```toon 544 - headers: 545 - control: snapshot-start 546 - offset: 123456_000 547 - ``` 548 - 549 - #### 4.3.2. 
Reset Control 550 - 551 - **JSON** ([Section 4.2.2](https://github.com/durable-streams/durable-streams/blob/main/packages/state/STATE-PROTOCOL.md#422-reset-control)): 552 - 553 - ```json 554 - { 555 - "headers": { 556 - "control": "reset", 557 - "offset": "123456_000" 558 - } 559 - } 560 - ``` 561 - 562 - **TOON:** 563 - 564 - ```toon 565 - headers: 566 - control: reset 567 - offset: 123456_000 568 - ``` 569 - 570 - #### 4.3.3. Mixed Arrays with Control Messages 571 - 572 - When a batch contains both change and control messages, use TOON's mixed array format: 573 - 574 - ```toon 575 - [5]: 576 - - headers: 577 - control: snapshot-start 578 - offset: 100 579 - - type: user 580 - key: user:1 581 - value: 582 - name: Alice 583 - headers: 584 - operation: insert 585 - - type: user 586 - key: user:2 587 - value: 588 - name: Bob 589 - headers: 590 - operation: insert 591 - - type: config 592 - key: theme 593 - value: dark 594 - headers: 595 - operation: insert 596 - - headers: 597 - control: snapshot-end 598 - offset: 104 599 - ``` 600 - 601 - --- 602 - 603 - ## 5. Encoding Guidelines 604 - 605 - ### 5.1. When to Use Each Format 606 - 607 - | Format | Use When | Example | 608 - |--------|----------|---------| 609 - | **TOON Tabular** | High-frequency, homogeneous messages (>1/second) | Bulk user imports, presence heartbeats, flag updates | 610 - | **TOON Object** | Low-frequency or heterogeneous messages | Mixed entity types, single operations, debugging | 611 - | **JSON** | Interoperability with non-TOON clients | Legacy system integration, debugging tools | 612 - 613 - ### 5.2. Key Folding (OPTIONAL) 614 - 615 - TOON v3.0 supports key folding for single-key object chains. 
This is **OPTIONAL** but can reduce indentation overhead: 616 - 617 - **Without folding:** 618 - 619 - ```toon 620 - headers: 621 - operation: insert 622 - timestamp: 2025-01-15T10:30:00Z 623 - txid: tx-001 624 - ``` 625 - 626 - **With folding:** 627 - 628 - ```toon 629 - headers.operation: insert 630 - headers.timestamp: 2025-01-15T10:30:00Z 631 - headers.txid: tx-001 632 - ``` 633 - 634 - Key folding is **NOT RECOMMENDED** for tabular arrays—use dot notation in headers instead. 635 - 636 - ### 5.3. Quoting Rules 637 - 638 - Per TOON spec [TOON-SPEC], strings require quotes only when: 639 - 640 - - Empty string (`""`) 641 - - Leading or trailing whitespace 642 - - Literally equals `true`, `false`, or `null` 643 - - Looks like a number (e.g., `42`, `3.14`) 644 - - Contains special characters: `:`, `"`, `\`, `[`, `]`, `{`, `}` 645 - - Contains the active delimiter (comma, tab, or pipe) 646 - 647 - **Valid unquoted:** 648 - 649 - ```toon 650 - name: Alice 651 - email: alice@example.com 652 - key: user-123 653 - status: active 654 - ``` 655 - 656 - **Must quote:** 657 - 658 - ```toon 659 - name: "Alice "nickname"" 660 - key: "user:123" 661 - status: "true" 662 - count: "42" 663 - description: "Hello, world!" 664 - ``` 665 - 666 - --- 667 - 668 - ## 6. Decoding Requirements 669 - 670 - ### 6.1. Tabular Array Expansion 671 - 672 - Decoders **MUST** expand tabular arrays into individual change message objects: 673 - 674 - **Input:** 675 - 676 - ```toon 677 - users[2]{type,key,value.name,headers.operation}: 678 - user,1,Alice,insert 679 - user,2,Bob,insert 680 - ``` 681 - 682 - **Decoded JSON equivalent:** 683 - 684 - ```json 685 - [ 686 - {"type":"user","key":"1","value":{"name":"Alice"},"headers":{"operation":"insert"}}, 687 - {"type":"user","key":"2","value":{"name":"Bob"},"headers":{"operation":"insert"}} 688 - ] 689 - ``` 690 - 691 - ### 6.2. 
Dot-Notation Unflattening 692 - 693 - Dotted field names in tabular headers **MUST** be unflattened to nested objects: 694 - 695 - | Tabular Column | Decoded JSON Path | Resulting Structure | 696 - |----------------|-------------------|-------------------| 697 - | `value.name` | `value.name` | `{"value": {"name": ...}}` | 698 - | `value.address.city` | `value.address.city` | `{"value": {"address": {"city": ...}}}` | 699 - | `headers.operation` | `headers.operation` | `{"headers": {"operation": ...}}` | 700 - | `headers.txid` | `headers.txid` | `{"headers": {"txid": ...}}` | 701 - | `old_value.email` | `old_value.email` | `{"old_value": {"email": ...}}` | 702 - 703 - ### 6.3. Empty Cell Handling 704 - 705 - Empty cells in tabular rows **MUST** be interpreted as omitted fields, NOT as empty strings or null: 706 - 707 - **Input:** 708 - 709 - ```toon 710 - deletes[2]{type,key,value.name,headers.operation}: 711 - user,1,,delete 712 - user,2,,delete 713 - ``` 714 - 715 - **Decoded:** 716 - 717 - ```json 718 - [ 719 - {"type":"user","key":"1","headers":{"operation":"delete"}}, 720 - {"type":"user","key":"2","headers":{"operation":"delete"}} 721 - ] 722 - ``` 723 - 724 - Note: `value` is absent, matching the semantics of delete operations where `value` may be omitted. 725 - 726 - ### 6.4. Length Validation 727 - 728 - Per TOON spec, the `[N]` length declaration enables validation. Decoders **MUST**: 729 - 730 - 1. Read the declared length from the array header 731 - 2. Track the actual number of rows parsed 732 - 3. Verify that actual count matches declared length 733 - 734 - **Mismatch Handling:** If the actual row count does not match the declared `[N]`, this indicates truncation or corruption. Decoders **SHOULD**: 735 - 736 - - Reject the entire batch 737 - - Log an error with mismatch details 738 - - Request replay from the last known good offset 739 - 740 - ### 6.5. Streaming Considerations 741 - 742 - For large tabular arrays, decoders **SHOULD**: 743 - 744 - 1. 
Read the array header to get `[N]` and field declarations 745 - 2. Pre-allocate row buffers with enforced size limits (see [Section 9.3](#93-resource-limits)) 746 - 3. Parse rows sequentially, validating field count per row 747 - 4. Decode each row into a change message 748 - 5. **Apply each message immediately** to materialized state 749 - 6. Verify actual row count matches `[N]` at the end 750 - 751 - **Important:** Do not wait for the entire array to be received before processing. This reduces latency and memory usage. 752 - 753 - ### 6.6. Error Handling 754 - 755 - | Error Condition | Recommended Action | 756 - |-----------------|-------------------| 757 - | Row count ≠ `[N]` | Reject batch, log error, request replay from last good offset | 758 - | Wrong field count in row | Skip row, log error, continue processing | 759 - | Unparseable value in cell | Skip row, log error, continue processing | 760 - | Array size exceeds limit | Reject batch, close connection, log security event | 761 - 762 - --- 763 - 764 - ## 7. State Materialization 765 - 766 - State materialization follows [STATE-PROTOCOL Section 6](https://github.com/durable-streams/durable-streams/blob/main/packages/state/STATE-PROTOCOL.md#6-state-materialization) exactly. The encoding format (JSON vs TOON) is transparent to materialization logic. 767 - 768 - ### 7.1. 
Materialization Process 769 - 770 - ```mermaid 771 - flowchart LR 772 - subgraph Input["TOON Input"] 773 - T["users[3]{type,key,value.name}:\n user,1,Alice\n user,2,Bob\n user,3,Carol"] 774 - end 775 - 776 - subgraph Decode["TOON Decoder"] 777 - D["Expand to 3 change messages:\n• insert(user:1, {name:Alice})\n• insert(user:2, {name:Bob})\n• insert(user:3, {name:Carol})"] 778 - end 779 - 780 - subgraph Materialize["Materialized State"] 781 - S["user:\n '1' → {name:'Alice'}\n '2' → {name:'Bob'}\n '3' → {name:'Carol'}"] 782 - end 783 - 784 - Input --> Decode 785 - Decode --> Materialize 786 - 787 - style Input fill:#e6f3ff 788 - style Decode fill:#fff4e6 789 - style Materialize fill:#e6ffe6 790 - ``` 791 - 792 - **Figure 3:** TOON-encoded batches decode to the same change messages as JSON, producing identical materialized state. 793 - 794 - ### 7.2. Materialization Rules 795 - 796 - Clients materialize state by applying decoded change messages sequentially: 797 - 798 - 1. **Process messages in stream order** (as received) 799 - 2. **For change messages:** 800 - - `insert` operations: Store the entity at `type`/`key` 801 - - `update` operations: Replace the entity at `type`/`key` 802 - - `delete` operations: Remove the entity at `type`/`key` 803 - 3. **For control messages:** 804 - - Handle according to application logic (e.g., clear state on `reset`, use snapshot boundaries for consistency checks) 805 - 806 - ### 7.3. Storage Independence 807 - 808 - The protocol does not prescribe how state is stored. Implementations **MAY** use: 809 - 810 - - In-memory maps (for simple cases) 811 - - IndexedDB (for browser persistence) 812 - - SQLite (for local databases) 813 - - TanStack DB collections (for query interfaces) 814 - - Custom storage backends 815 - 816 - The choice of serialization format (JSON vs TOON) for the protocol **does not affect** storage decisions—storage remains in whatever format the implementation chooses. 817 - 818 - --- 819 - 820 - ## 8. 
Schema Validation 821 - 822 - Schema validation operates on decoded values, not TOON encoding itself. Implementations **MAY** validate using Standard Schema [STANDARD-SCHEMA] per [STATE-PROTOCOL Section 7](https://github.com/durable-streams/durable-streams/blob/main/packages/state/STATE-PROTOCOL.md#7-schema-validation). 823 - 824 - ### 8.1. Validation Process 825 - 826 - 1. **Decode TOON** to in-memory representation (same as JSON) 827 - 2. **Validate** each change message's `value` and `old_value` against entity type schema 828 - 3. **Apply** only valid messages to materialized state 829 - 4. **Handle** invalid messages per implementation policy (reject, log, or quarantine) 830 - 831 - ### 8.2. TOON-Specific Validation Considerations 832 - 833 - **Array length validation:** TOON's `[N]` syntax enables early detection of truncated arrays before schema validation. 834 - 835 - **Type preservation:** TOON preserves the JSON data model exactly. Schema validation is semantically equivalent regardless of encoding. 836 - 837 - **Quoting rules:** Minimal quoting does not affect validation—decoded strings are identical to their JSON equivalents. 
838 - 839 - **Validation example:** 840 - 841 - **Tabular TOON Input:** 842 - 843 - ```toon 844 - users[3]{type,key,value.name,value.age,headers.operation}: 845 - user,user:1,Alice,30,insert 846 - user,user:2,Bob,25,insert 847 - user,user:3,Charlie,35,insert 848 - ``` 849 - 850 - **After decoding**, validation proceeds as with JSON: 851 - 852 - ```javascript 853 - import { z } from 'zod' 854 - 855 - const userSchema = z.object({ 856 - name: z.string(), 857 - age: z.number().int().positive() 858 - }) 859 - 860 - // Validate each row's value object 861 - userSchema.parse({ name: "Alice", age: 30 }) // ✓ Valid 862 - userSchema.parse({ name: "Bob", age: 25 }) // ✓ Valid 863 - userSchema.parse({ name: "Charlie", age: 35 }) // ✓ Valid 864 - ``` 865 - 866 - The protocol does not require schema validation, but implementations **SHOULD** provide validation capabilities for production use. 867 - 868 - --- 869 - 870 - ## 9. Security Considerations 871 - 872 - ### 9.1. Parser Security 873 - 874 - TOON parsers **MUST** implement the same security validations as JSON parsers: 875 - 876 - - **Depth limits** to prevent stack overflow (recommended: 32 levels maximum) 877 - - **Size limits** to prevent memory exhaustion (recommended: 10 MB document maximum) 878 - - **UTF-8 validation** on all string inputs 879 - - **Recursive reference detection** to prevent infinite loops 880 - 881 - ### 9.2. Length Declaration Attacks 882 - 883 - Malicious payloads may declare large `[N]` lengths without providing rows. Decoders **SHOULD**: 884 - 885 - - Stream-process rows without pre-allocating based on declared length 886 - - Enforce maximum array size limits (see [Section 9.3](#93-resource-limits)) 887 - - Timeout or abort if rows are not received within reasonable time 888 - - Validate actual row count matches declaration 889 - 890 - ### 9.3. 
Resource Limits 891 - 892 - When parsing tabular arrays, implementations **MUST** enforce these limits: 893 - 894 - | Resource | Recommended Limit | Rationale | 895 - |-----------|-------------------|------------| 896 - | Array size `[N]` | 10,000 rows | Prevent memory exhaustion from large declarations | 897 - | Row length | 64 KB | Prevent buffer overflow attacks | 898 - | Nesting depth | 32 levels | Prevent stack overflow | 899 - | Document size | 10 MB | Prevent memory exhaustion | 900 - | Field count per row | 256 fields | Prevent parsing complexity attacks | 901 - 902 - **Streaming Parsing:** Implementations **SHOULD** use streaming parsers that process data incrementally rather than loading entire documents into memory. 903 - 904 - ### 9.4. Type and Key Validation 905 - 906 - Per [STATE-PROTOCOL Section 9.4](https://github.com/durable-streams/durable-streams/blob/main/packages/state/STATE-PROTOCOL.md#94-type-and-key-validation), implementations **SHOULD** validate that `type` and `key` fields contain only expected values to prevent injection of unauthorized entity types or keys. 907 - 908 - This validation is performed after decoding, regardless of JSON vs TOON encoding. 909 - 910 - --- 911 - 912 - ## 10. 
IANA Considerations 913 - 914 - Per [TOON-SPEC Section 18.2](https://github.com/toon-format/spec#182-provisional-media-type), the provisional media type is: 915 - 916 - **Media Type:** `text/toon` 917 - 918 - | Field | Value | 919 - |-------|-------| 920 - | **Type name** | text | 921 - | **Subtype name** | toon | 922 - | **Required parameters** | None | 923 - | **Optional parameters** | `charset` (default UTF-8) | 924 - | **Encoding considerations** | 8-bit, UTF-8 encoded text with LF line endings | 925 - | **Security considerations** | See Section 9 (Security Considerations) | 926 - | **Interoperability** | Semantically equivalent to application/json when decoded | 927 - | **Published specification** | This document and [TOON-SPEC] | 928 - | **Applications** | Real-time state synchronization, durable streams, bandwidth-constrained systems requiring JSON semantics | 929 - | **Intended usage** | COMMON (upon standardization) | 930 - | **Restrictions on usage** | None | 931 - | **Author** | DuraTOON working group | 932 - | **Change controller** | ElectricSQL | 933 - | **Additional information** | File extension: .toon, Macintosh file type code: TEXT | 934 - 935 - **Note:** The `text/toon` media type is provisional. Implementers should monitor [TOON-SPEC](https://github.com/toon-format/spec) for updates and formal IANA registration. 936 - 937 - --- 938 - 939 - ## 11. References 940 - 941 - ### 11.1. Normative References 942 - 943 - **[STATE-PROTOCOL]** 944 - Durable Streams State Protocol. ElectricSQL, 2025. 945 - <https://github.com/durable-streams/durable-streams/blob/main/packages/state/STATE-PROTOCOL.md> 946 - 947 - **[PROTOCOL]** 948 - Durable Streams Protocol. ElectricSQL, 2025. 949 - <https://github.com/electric-sql/durable-streams/blob/main/PROTOCOL.md> 950 - 951 - **[TOON-SPEC]** 952 - TOON (Token-Oriented Object Notation) Specification v3.0. Johann Schopplich, 2025. 
953 - <https://github.com/toon-format/spec> 954 - 955 - **[RFC2119]** 956 - Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997. 957 - <https://www.rfc-editor.org/info/rfc2119> 958 - 959 - **[RFC3339]** 960 - Klyne, G. and C. Newman, "Date and Time on Internet: Timestamps", RFC 3339, DOI 10.17487/RFC3339, July 2002. 961 - <https://www.rfc-editor.org/info/rfc3339> 962 - 963 - **[RFC6839]** 964 - Kyzivat, P., "Additional Media Type Suffixes", RFC 6839, DOI 10.17487/RFC6839, January 2013. 965 - <https://www.rfc-editor.org/info/rfc6839> 966 - 967 - **[RFC8174]** 968 - Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017. 969 - <https://www.rfc-editor.org/info/rfc8174> 970 - 971 - **[RFC8259]** 972 - Bray, T., "The JavaScript Object Notation (JSON) Data Interchange Format", STD 90, RFC 8259, December 2017. 973 - <https://www.rfc-editor.org/info/rfc8259> 974 - 975 - **[STANDARD-SCHEMA]** 976 - Standard Schema Specification. 977 - <https://github.com/standard-schema/spec> 978 - 979 - ### 11.2. Informative References 980 - 981 - **[JSON-SCHEMA]** 982 - Wright, A., Andrews, H., and B. Hutton, "JSON Schema: A Media Type for Describing JSON Documents", draft-wright-json-schema-00 (work in progress). 983 - 984 - **[TOON-WEBSITE]** 985 - TOON Format Official Website. 986 - <https://toonformat.dev> 987 - 988 - **[TOON-RUST]** 989 - toon-rs: Rust implementation of TOON format. Jimmy Stridh, 2025. 990 - <https://github.com/jimmystridh/toon-rs> 991 - 992 - --- 993 - 994 - ## Appendix A: Payload Reduction Analysis 995 - 996 - ### A.1. 
Single Message Component Breakdown 997 - 998 - | Component | JSON | TOON | Reduction | 999 - |-----------|------|------|-----------| 1000 - | Field name quotes (e.g., `"type":`) | 7 chars | 5 chars | 29% | 1001 - | String value quotes (e.g., `"user"`) | 6 chars | 4 chars | 33% | 1002 - | Object brace start (`"value":{`) | 13 chars | 6 chars | 54% | 1003 - | Object brace end (`}`) | 2 chars | 0 chars | 100% | 1004 - | Array brackets (`[` and `]`) | 2 chars | 4 chars* | -100%* | 1005 - | Commas between fields | 5 × 2 = 10 chars | 0 chars | 100% | 1006 - | **Total** | **~145 bytes** | **~80 bytes** | **~45%** | 1007 - 1008 - * TOON's `[N]:` syntax adds 4 chars instead of 2, but enables validation. Net benefit in batching scenarios. 1009 - 1010 - ### A.2. Scaling with Batch Size 1011 - 1012 - | Batch Size | JSON | TOON Single | TOON Tabular | Best Reduction | 1013 - |------------|------|-------------|--------------|----------------| 1014 - | 1 message | 145 B | 80 B | 80 B | 45% | 1015 - | 10 messages | 1,450 B | 800 B | 500 B | 65% | 1016 - | 100 messages | 14,500 B | 8,000 B | 4,200 B | 71% | 1017 - | 1,000 messages | 145 KB | 80 KB | 40 KB | 72% | 1018 - 1019 - ### A.3. Snapshot Export Analysis 1020 - 1021 - | Entity Type | JSON | TOON | Reduction | 1022 - |-------------|------|------|-----------| 1023 - | Users (100 records) | ~30 KB | ~11 KB | **63%** | 1024 - | Messages (500 records) | ~150 KB | ~52 KB | **65%** | 1025 - | Reactions (200 records) | ~60 KB | ~21 KB | **65%** | 1026 - | Snapshot metadata | ~200 bytes | ~120 bytes | **40%** | 1027 - | **Total** | **~240 KB** | **~84 KB** | **65%** | 1028 - 1029 - ### A.4. 
Real-World Bandwidth Impact 1030 - 1031 - | Application | Messages/sec | JSON Bandwidth/hour | TOON Bandwidth/hour | Monthly Savings | 1032 - |-------------|--------------|-------------------|---------------------|----------------| 1033 - | Chat (1M users) | 10 avg | ~2 GB | ~0.7 GB | ~30 GB | 1034 - | Presence tracking | 5 updates | ~1 GB | ~0.4 GB | ~18 GB | 1035 - | Feature flags | 1/min | ~10 MB | ~4 MB | ~180 MB | 1036 - | Collaborative editing | 20/sec | ~4 GB | ~1.5 GB | ~75 GB | 1037 - 1038 - --- 1039 - 1040 - ## Appendix B: Implementation Guidelines 1041 - 1042 - ### B.1. Rust Implementation 1043 - 1044 - **Recommended crate:** `toon` (toon-rs) with performance features 1045 - 1046 - ```toml 1047 - [dependencies] 1048 - toon = { version = "0.1", features = ["de_direct", "perf_memchr", "perf_smallvec"] } 1049 - ``` 1050 - 1051 - **Example: Encoding and Decoding** 1052 - 1053 - ```rust 1054 - use toon::{Options, Delimiter}; 1055 - 1056 - let opts = Options { 1057 - delimiter: Delimiter::Comma, // or Tab, Pipe 1058 - ..Options::default() 1059 - }; 1060 - 1061 - // Encode a batch of changes 1062 - let toon = toon::encode_to_string(&changes_batch, &opts)?; 1063 - 1064 - // Decode a batch 1065 - let changes: Vec<ChangeMessage> = toon::decode_from_str(&toon, &opts)?; 1066 - ``` 1067 - 1068 - ### B.2. TypeScript/JavaScript Implementation 1069 - 1070 - ```bash 1071 - npm install toon 1072 - ``` 1073 - 1074 - ```typescript 1075 - import { parse, stringify } from 'toon' 1076 - 1077 - // Encode a batch 1078 - const toon = stringify(changesBatch, { delimiter: ',' }) 1079 - 1080 - // Decode a batch 1081 - const changes = parse(toon) 1082 - ``` 1083 - 1084 - ### B.3. 
Content Negotiation Implementation 1085 - 1086 - **Server (Express.js):** 1087 - 1088 - ```typescript 1089 - import { stringify } from 'toon' 1090 - 1091 - app.get('/stream/:id', (req, res) => { 1092 - const format = req.accepts('text/toon', 'application/json') 1093 - 1094 - res.format({ 1095 - 'text/toon': () => { 1096 - res.type('text/toon') 1097 - res.send(stringify(messages, { delimiter: ',' })) 1098 - }, 1099 - 'application/json': () => { 1100 - res.json(messages) 1101 - } 1102 - }) 1103 - }) 1104 - ``` 1105 - 1106 - **Client:** 1107 - 1108 - ```typescript 1109 - const stream = await DurableStream.create({ 1110 - url: 'https://server.com/v1/stream/my-app', 1111 - contentType: 'text/toon' // Request TOON 1112 - }) 1113 - 1114 - stream.subscribeText(async (batch) => { 1115 - const messages = parse(batch) // TOON decoder 1116 - for (const msg of messages) { 1117 - state.apply(msg) 1118 - } 1119 - }) 1120 - ``` 1121 - 1122 - ### B.4. Streaming Parser Guidelines 1123 - 1124 - For optimal performance with large tabular arrays: 1125 - 1126 - 1. **Read incrementally**: Process data as it arrives, don't buffer entire documents 1127 - 2. **Validate early**: Check `[N]` and field count as you parse 1128 - 3. **Apply immediately**: Don't wait for entire batch before materializing 1129 - 4. **Use streaming APIs**: Leverage native streaming where available (Node.js streams, Rust `Read` traits) 1130 - 1131 - --- 1132 - 1133 - ## Appendix C: Migration Guide 1134 - 1135 - ### C.1. 
Server-Side Migration 1136 - 1137 - **Step 1: Add TOON encoding capability** 1138 - 1139 - ```typescript 1140 - import { stringify } from 'toon' 1141 - 1142 - function encodeMessages( 1143 - messages: ChangeMessage[], 1144 - format: 'json' | 'toon' 1145 - ): string { 1146 - if (format === 'toon') { 1147 - return stringify(messages, { delimiter: ',' }) 1148 - } 1149 - return JSON.stringify(messages) 1150 - } 1151 - ``` 1152 - 1153 - **Step 2: Support content negotiation** 1154 - 1155 - ```typescript 1156 - app.get('/stream/:id', (req, res) => { 1157 - const format = req.accepts('text/toon', 'application/json') 1158 - 1159 - res.format({ 1160 - 'text/toon': () => { 1161 - res.type('text/toon') 1162 - res.send(encodeMessages(messages, 'toon')) 1163 - }, 1164 - 'application/json': () => { 1165 - res.send(JSON.stringify(messages)) 1166 - } 1167 - }) 1168 - }) 1169 - ``` 1170 - 1171 - ### C.2. Client-Side Migration 1172 - 1173 - ```typescript 1174 - import { parse } from 'toon' 1175 - 1176 - const stream = await DurableStream.create({ 1177 - url: 'https://server.com/v1/stream/my-app', 1178 - contentType: 'text/toon' // New: TOON 1179 - // contentType: 'application/json' // Old: JSON 1180 - }) 1181 - 1182 - stream.subscribeText(async (batch) => { 1183 - const messages = parse(batch) // TOON decoder 1184 - for (const msg of messages) { 1185 - state.apply(msg) 1186 - } 1187 - }) 1188 - ``` 1189 - 1190 - ### C.3. Phased Rollout Strategy 1191 - 1192 - | Phase | Action | Risk | Duration | 1193 - |-------|--------|-------|----------| 1194 - | **Phase 1** | Add TOON support, keep JSON as default | Low | 1-2 weeks | 1195 - | **Phase 2** | Roll out TOON clients gradually (10% → 50% → 100%) | Medium | 2-4 weeks | 1196 - | **Phase 3** | Make TOON default, JSON as fallback | Low | 1 week | 1197 - | **Phase 4** | Deprecate JSON (future version) | High | TBD | 1198 - 1199 - ### C.4.
A/B Testing Metrics 1200 - 1201 - | Metric | Measurement | Expected Change | 1202 - |--------|--------------|----------------| 1203 - | **Bandwidth** | Bytes transmitted per message | 40-60% reduction | 1204 - | **Encoding latency** | Time to encode a batch | 10-20% increase | 1205 - | **Decoding latency** | Time to decode a batch | 5-10% decrease | 1206 - | **Error rates** | Decode failures per 1M messages | No change | 1207 - | **Memory usage** | Peak memory allocation | 20-30% reduction | 1208 - | **CPU usage** | CPU time per message | ±5% (depends on implementation) | 1209 - 1210 - --- 1211 - 1212 - ## Copyright Notice 1213 - 1214 - Copyright (c) 2025 rektide 1215 - 1216 - This document extends the Durable Streams State Protocol specification. The base specification is available at <https://github.com/durable-streams/durable-streams/blob/main/packages/state/STATE-PROTOCOL.md>. 1217 - 1218 - This document and information contained herein are provided on an "AS IS" basis. rektide disclaims all warranties, express or implied, including but not limited to any warranty that the use of the information herein will not infringe any rights or any implied warranties of merchantability or fitness for a particular purpose.
proto/state-protocol-toon.kimi.md proto/state-protocol-toon.md
-882
proto/state-protocol-toon.opus.md
··· 1 - # Durable Streams State Protocol - TOON Encoding Extension 2 - 3 - **Document:** State Protocol TOON Extension 4 - **Version:** 0.1 5 - **Date:** 2025-02-10 6 - **Status:** Draft Extension of Durable Streams State Protocol 7 - **Content-Type:** `text/toon` 8 - 9 - --- 10 - 11 - ## Abstract 12 - 13 - This document specifies a TOON (Token-Oriented Object Notation) encoding for the Durable Streams State Protocol [STATE-PROTOCOL]. By leveraging TOON's tabular arrays, minimal quoting, and line-oriented structure, this extension achieves 30-60% reduction in payload size compared to JSON encoding while enabling efficient batching of homogeneous change messages. 14 - 15 - ### Key Optimization: Tabular Array Batching 16 - 17 - The primary optimization is **TOON's tabular array format** for batching multiple change records: 18 - 19 - 1. **Declare fields once** in the array header instead of repeating keys per record 20 - 2. **Eliminate quotes** for most values (minimal quoting rules) 21 - 3. **Eliminate braces** between objects (line-based structure) 22 - 4. **Explicit length `[N]`** for validation and streaming 23 - 24 - This is both human-readable (CSV-like) and lightweight, ideal for state synchronization. 25 - 26 - ## Table of Contents 27 - 28 - 1. [Introduction](#1-introduction) 29 - 2. [Rationale](#2-rationale) 30 - 3. [Content-Type](#3-content-type) 31 - 4. [Message Format Mapping](#4-message-format-mapping) 32 - - 4.1. [Change Messages](#41-change-messages) 33 - - 4.2. [Tabular Change Batches](#42-tabular-change-batches) 34 - - 4.3. [Control Messages](#43-control-messages) 35 - 5. [Encoding Guidelines](#5-encoding-guidelines) 36 - 6. [Decoding Requirements](#6-decoding-requirements) 37 - 7. [Examples](#7-examples) 38 - 8. [Security Considerations](#8-security-considerations) 39 - 9. [IANA Considerations](#9-iana-considerations) 40 - 10. [References](#10-references) 41 - 11. 
[Appendix A: Payload Reduction Analysis](#appendix-a-payload-reduction-analysis) 42 - 12. [Appendix B: Implementation Guidelines](#appendix-b-implementation-guidelines) 43 - 13. [Appendix C: Migration Guide](#appendix-c-migration-guide) 44 - 45 - --- 46 - 47 - ## 1. Introduction 48 - 49 - The base State Protocol [STATE-PROTOCOL] requires `Content-Type: application/json`. This extension defines an alternative encoding using TOON [TOON-SPEC], a token-efficient serialization format designed for LLM workloads and bandwidth-constrained environments. 50 - 51 - ### 1.1. Key Benefits 52 - 53 - | Aspect | JSON Encoding | TOON Encoding | 54 - |--------|---------------|---------------| 55 - | Payload size | Baseline | 30-60% smaller | 56 - | Batched changes | Array of objects | Tabular array (CSV-like) | 57 - | Human readability | Good | Good (indentation-based) | 58 - | Quoting overhead | All strings quoted | Minimal quoting | 59 - | Array validation | Implicit | Explicit length `[N]` | 60 - 61 - --- 62 - 63 - ## 2. Rationale 64 - 65 - ### 2.1. Original JSON Encoding 66 - 67 - Section 4.1 of STATE-PROTOCOL defines change messages as JSON objects: 68 - 69 - ```json 70 - { 71 - "type": "user", 72 - "key": "user:123", 73 - "value": { "name": "Alice", "email": "alice@example.com" }, 74 - "headers": { "operation": "insert", "timestamp": "2025-01-15T10:30:00Z" } 75 - } 76 - ``` 77 - 78 - **Overhead sources:** 79 - - Repeated field names (`"type"`, `"key"`, `"value"`, `"headers"`, `"operation"`) per message 80 - - Mandatory quoting of all string keys and values 81 - - Brace/bracket delimiters `{ } [ ]` 82 - - Colon and comma separators with quotes 83 - 84 - ### 2.2. TOON Advantages 85 - 86 - TOON addresses these overheads through: 87 - 88 - 1. **Tabular Arrays** - When multiple objects share the same schema, TOON renders them as header + rows, eliminating repeated field names 89 - 2. 
**Minimal Quoting** - Strings are unquoted unless they contain special characters or resemble other types 90 - 3. **Indentation-Based Structure** - Replaces braces with whitespace 91 - 4. **Explicit Lengths** - `[N]` declarations enable validation and stream recovery 92 - 93 - ### 2.3. Tabular Batching Opportunity 94 - 95 - The State Protocol frequently transmits batches of homogeneous change messages (e.g., multiple inserts of the same entity type). TOON's tabular format is ideal for this pattern: 96 - 97 - **JSON (3 user inserts):** 98 - ```json 99 - [ 100 - {"type":"user","key":"1","value":{"name":"Alice","role":"admin"},"headers":{"operation":"insert"}}, 101 - {"type":"user","key":"2","value":{"name":"Bob","role":"user"},"headers":{"operation":"insert"}}, 102 - {"type":"user","key":"3","value":{"name":"Carol","role":"user"},"headers":{"operation":"insert"}} 103 - ] 104 - ``` 105 - 106 - **TOON tabular equivalent:** 107 - ```toon 108 - [3]{type,key,value.name,value.role,headers.operation}: 109 - user,1,Alice,admin,insert 110 - user,2,Bob,user,insert 111 - user,3,Carol,user,insert 112 - ``` 113 - 114 - The TOON representation eliminates repeated field names entirely, achieving ~60% size reduction for uniform batches. 115 - 116 - --- 117 - 118 - ## 3. Content-Type 119 - 120 - Streams using TOON encoding **MUST** use: 121 - 122 - ``` 123 - Content-Type: text/toon 124 - ``` 125 - 126 - or with charset: 127 - 128 - ``` 129 - Content-Type: text/toon; charset=utf-8 130 - ``` 131 - 132 - The `text/toon` media type carries no `+json` structured-syntax suffix (RFC 6839); equivalence to JSON is a property of the decoded data model, not of the media type name. 133 - 134 - ### Content Negotiation 135 - 136 - Implementations **MUST** support both JSON (`application/json`) and TOON (`text/toon`) content types.
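A server that supports both media types has to pick one from the request's `Accept` header. The sketch below shows one way to do that with q-values. `negotiateEncoding` and `Encoding` are hypothetical names, and two behaviors are assumptions rather than requirements of this document: a bare `*/*` is served as JSON, and a header that names neither supported type also falls back to JSON (a stricter server could answer `406 Not Acceptable` instead).

```typescript
type Encoding = 'text/toon' | 'application/json'

const SUPPORTED: readonly Encoding[] = ['text/toon', 'application/json']

// Choose a response encoding from the raw Accept header value.
function negotiateEncoding(accept: string | undefined): Encoding {
  // No Accept header: fall back to JSON for backward compatibility.
  if (accept === undefined || accept.trim() === '') return 'application/json'

  let best: { encoding: Encoding; q: number } | null = null
  for (const part of accept.split(',')) {
    const segments = part.split(';').map((s) => s.trim())
    const mediaType = (segments[0] ?? '').toLowerCase()

    // Read the optional quality value; it defaults to 1.
    let q = 1
    for (const param of segments.slice(1)) {
      const eq = param.indexOf('=')
      if (eq > 0 && param.slice(0, eq).trim().toLowerCase() === 'q') {
        const parsed = Number(param.slice(eq + 1).trim())
        if (!Number.isNaN(parsed)) q = parsed
      }
    }
    if (q <= 0) continue // q=0 marks the type as not acceptable

    // Assumption: a bare wildcard is served as JSON, the base protocol's format.
    const encoding: Encoding | undefined =
      mediaType === '*/*' ? 'application/json' : SUPPORTED.find((e) => e === mediaType)
    if (encoding === undefined) continue

    // Strict comparison keeps the earlier entry when two q-values tie.
    if (best === null || q > best.q) best = { encoding, q }
  }
  // Assumption: nothing matched, so serve JSON rather than fail the request.
  return best === null ? 'application/json' : best.encoding
}
```

For example, `negotiateEncoding('text/toon, application/json;q=0.9')` selects `text/toon`, while a missing header selects `application/json`.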
137 - 138 - **Client request:** 139 - ```http 140 - Accept: text/toon, application/json;q=0.9 141 - ``` 142 - 143 - **Server response (TOON preferred):** 144 - ```http 145 - Content-Type: text/toon 146 - ``` 147 - 148 - Servers **MAY** default to JSON for backward compatibility when no `Accept` header is present. 149 - 150 - --- 151 - 152 - ## 4. Message Format Mapping 153 - 154 - This section maps each STATE-PROTOCOL message structure to its TOON equivalent. 155 - 156 - ### 4.1. Change Messages 157 - 158 - #### 4.1.1. Single Change Message 159 - 160 - **STATE-PROTOCOL Section 5.1** defines change messages with fields: `type`, `key`, `value`, `old_value` (optional), and `headers`. 161 - 162 - **JSON (from spec):** 163 - ```json 164 - { 165 - "type": "user", 166 - "key": "user:123", 167 - "value": { 168 - "name": "Alice", 169 - "email": "alice@example.com" 170 - }, 171 - "headers": { 172 - "operation": "insert", 173 - "timestamp": "2025-01-15T10:30:00Z" 174 - } 175 - } 176 - ``` 177 - 178 - **TOON equivalent:** 179 - ```toon 180 - type: user 181 - key: user:123 182 - value: 183 - name: Alice 184 - email: alice@example.com 185 - headers: 186 - operation: insert 187 - timestamp: 2025-01-15T10:30:00Z 188 - ``` 189 - 190 - **Commentary:** Single messages benefit from eliminated quoting and brace removal. The indentation-based nesting is more readable and ~25% smaller. 191 - 192 - #### 4.1.2. 
Update with old_value 193 - 194 - **JSON:** 195 - ```json 196 - { 197 - "type": "user", 198 - "key": "user:123", 199 - "value": { "name": "Alice Smith", "email": "alice.new@example.com" }, 200 - "old_value": { "name": "Alice", "email": "alice@example.com" }, 201 - "headers": { "operation": "update", "timestamp": "2025-01-15T10:35:00Z" } 202 - } 203 - ``` 204 - 205 - **TOON equivalent:** 206 - ```toon 207 - type: user 208 - key: user:123 209 - value: 210 - name: Alice Smith 211 - email: alice.new@example.com 212 - old_value: 213 - name: Alice 214 - email: alice@example.com 215 - headers: 216 - operation: update 217 - timestamp: 2025-01-15T10:35:00Z 218 - ``` 219 - 220 - #### 4.1.3. Delete Operation 221 - 222 - **JSON (value omitted):** 223 - ```json 224 - { 225 - "type": "user", 226 - "key": "user:123", 227 - "old_value": { "name": "Alice", "email": "alice@example.com" }, 228 - "headers": { "operation": "delete", "timestamp": "2025-01-15T10:40:00Z" } 229 - } 230 - ``` 231 - 232 - **TOON equivalent:** 233 - ```toon 234 - type: user 235 - key: user:123 236 - old_value: 237 - name: Alice 238 - email: alice@example.com 239 - headers: 240 - operation: delete 241 - timestamp: 2025-01-15T10:40:00Z 242 - ``` 243 - 244 - **Commentary:** Delete messages with `value: null` in JSON can simply omit the field in TOON, as the absence is semantically equivalent. 245 - 246 - ### 4.2. Tabular Change Batches 247 - 248 - This is the primary optimization of the TOON extension. When a batch contains multiple change messages with uniform structure, they **SHOULD** be encoded as a tabular array. 249 - 250 - #### 4.2.1. Tabular Format Requirements 251 - 252 - Tabular encoding applies when: 253 - 1. All messages in the batch are change messages (not control messages) 254 - 2. All messages share the same set of scalar fields in `value` 255 - 3. All messages use the same `headers` fields 256 - 257 - #### 4.2.2. Field Flattening 258 - 259 - TOON tabular arrays require flat column schemas. 
Nested objects **MUST** be flattened using dot notation in the header: 260 - 261 - | Nested Path | Tabular Column | 262 - |-------------|----------------| 263 - | `value.name` | `value.name` | 264 - | `value.email` | `value.email` | 265 - | `headers.operation` | `headers.operation` | 266 - | `headers.timestamp` | `headers.timestamp` | 267 - | `headers.txid` | `headers.txid` | 268 - 269 - #### 4.2.3. Named Tabular Arrays 270 - 271 - Tabular arrays **SHOULD** use descriptive names that indicate their content: 272 - 273 - | Array Name | Use Case | 274 - |------------|----------| 275 - | `changes` | Mixed operations | 276 - | `inserts` | Insert-only batch | 277 - | `updates` | Update-only batch | 278 - | `deletes` | Delete-only batch | 279 - | `users` | Entity-type-specific batch | 280 - | `messages` | Entity-type-specific batch | 281 - 282 - #### 4.2.4. Homogeneous Batch Example 283 - 284 - **STATE-PROTOCOL Section 8.3** shows multi-type streams. For homogeneous batches within a type: 285 - 286 - **JSON:** 287 - ```json 288 - [ 289 - {"type":"user","key":"1","value":{"name":"Alice","role":"admin"},"headers":{"operation":"insert","txid":"tx-001"}}, 290 - {"type":"user","key":"2","value":{"name":"Bob","role":"user"},"headers":{"operation":"insert","txid":"tx-001"}}, 291 - {"type":"user","key":"3","value":{"name":"Carol","role":"user"},"headers":{"operation":"insert","txid":"tx-001"}} 292 - ] 293 - ``` 294 - 295 - **TOON tabular (named `users`):** 296 - ```toon 297 - users[3]{type,key,value.name,value.role,headers.operation,headers.txid}: 298 - user,1,Alice,admin,insert,tx-001 299 - user,2,Bob,user,insert,tx-001 300 - user,3,Carol,user,insert,tx-001 301 - ``` 302 - 303 - **Size comparison:** 304 - - JSON: ~380 bytes 305 - - TOON: ~185 bytes 306 - - **Reduction: ~51%** 307 - 308 - #### 4.2.5. 
Mixed Operation Batch 309 - 310 - When a batch contains different operations but the same schema: 311 - 312 - **TOON:** 313 - ```toon 314 - [4]{type,key,value.name,value.status,headers.operation}: 315 - user,1,Alice,active,insert 316 - user,2,Bob,active,insert 317 - user,1,Alice,inactive,update 318 - user,2,,deleted,delete 319 - ``` 320 - 321 - **Commentary:** Empty values (like `value.name` for delete) render as empty cells between delimiters. 322 - 323 - #### 4.2.6. Delimiter Selection 324 - 325 - Per TOON spec, delimiters can be comma (default), tab, or pipe. For State Protocol payloads: 326 - 327 - - **Comma** (default): Best for most cases 328 - - **Tab**: Better when values may contain commas (e.g., addresses, descriptions) 329 - - **Pipe**: Alternative when both comma and tab appear in data 330 - 331 - **Tab-delimited example:** 332 - ```toon 333 - [2 ]{type key value.name value.bio headers.operation}: 334 - user 1 Alice Writer, editor insert 335 - user 2 Bob Developer, speaker insert 336 - ``` 337 - 338 - ### 4.3. Control Messages 339 - 340 - #### 4.3.1. Snapshot Boundaries 341 - 342 - **STATE-PROTOCOL Section 4.2.1** defines snapshot-start and snapshot-end controls. 343 - 344 - **JSON:** 345 - ```json 346 - { "headers": { "control": "snapshot-start", "offset": "123456_000" } } 347 - ``` 348 - 349 - **TOON:** 350 - ```toon 351 - headers: 352 - control: snapshot-start 353 - offset: 123456_000 354 - ``` 355 - 356 - #### 4.3.2. Reset Control 357 - 358 - **JSON:** 359 - ```json 360 - { "headers": { "control": "reset", "offset": "123456_000" } } 361 - ``` 362 - 363 - **TOON:** 364 - ```toon 365 - headers: 366 - control: reset 367 - offset: 123456_000 368 - ``` 369 - 370 - #### 4.3.3. Control Messages in Batches 371 - 372 - Control messages **MUST NOT** be included in tabular arrays because they have different schemas than change messages. 
When a batch contains both change and control messages, use mixed array format: 373 - 374 - **TOON mixed array:** 375 - ```toon 376 - [3]: 377 - - headers: 378 - control: snapshot-start 379 - offset: 100 380 - - type: user 381 - key: 1 382 - value: 383 - name: Alice 384 - headers: 385 - operation: insert 386 - - headers: 387 - control: snapshot-end 388 - offset: 200 389 - ``` 390 - 391 - --- 392 - 393 - ## 5. Encoding Guidelines 394 - 395 - ### 5.1. Batching Strategies 396 - 397 - The TOON variant enables batching patterns not present in the original specification: 398 - 399 - | Strategy | Description | Best For | 400 - |----------|-------------|----------| 401 - | **By Operation** | Group `insert`, `update`, `delete` separately | Bulk imports, cleanup jobs | 402 - | **By Entity Type** | Group same `type` together | Multi-type streams | 403 - | **By Transaction** | Group same `txid` together | Atomic operations | 404 - | **Time-based** | Batch messages within time window | High-frequency updates | 405 - 406 - ### 5.2. Tabular Preference 407 - 408 - Encoders **SHOULD**: 409 - 1. Group consecutive change messages by schema shape 410 - 2. Encode homogeneous groups as tabular arrays 411 - 3. Fall back to object format for heterogeneous messages or control messages 412 - 413 - **When to use single messages:** 414 - - Low-frequency updates (< 1 per second) 415 - - Messages with diverse field structures 416 - - Scenarios where batching adds unacceptable latency 417 - - Debugging/development (easier to read) 418 - 419 - **When to use tabular batching:** 420 - - High-frequency updates (> 1 per second) 421 - - Messages with uniform field structure 422 - - Bulk operations (imports, initial sync, snapshots) 423 - - Production environments where bandwidth matters 424 - 425 - ### 5.3. Field Ordering 426 - 427 - For tabular arrays, field order in the header **SHOULD** follow: 428 - 1. `type` 429 - 2. `key` 430 - 3. `value.*` fields (alphabetically) 431 - 4. 
`old_value.*` fields (alphabetically, if present) 432 - 5. `headers.*` fields (alphabetically) 433 - 434 - Consistent ordering enables efficient streaming parsers. 435 - 436 - ### 5.4. Key Folding 437 - 438 - TOON v3.0 supports key folding for single-key chains. For change messages, key folding **MAY** be used: 439 - 440 - **Without folding:** 441 - ```toon 442 - headers: 443 - operation: insert 444 - timestamp: 2025-01-15T10:30:00Z 445 - ``` 446 - 447 - **With folding:** 448 - ```toon 449 - headers.operation: insert 450 - headers.timestamp: 2025-01-15T10:30:00Z 451 - ``` 452 - 453 - Key folding is **OPTIONAL** but can reduce indentation overhead. 454 - 455 - ### 5.5. Quoting Rules 456 - 457 - Per TOON spec, strings require quotes only when: 458 - - Empty string (`""`) 459 - - Leading/trailing whitespace 460 - - Equals `true`, `false`, or `null` literally 461 - - Looks like a number 462 - - Contains special characters (`:`, `"`, `\`, `[`, `]`, `{`, `}`) 463 - - Contains the active delimiter 464 - 465 - **Unquoted (valid):** 466 - ```toon 467 - name: Alice 468 - email: alice@example.com 469 - role: admin 470 - ``` 471 - 472 - **Quoted (required):** 473 - ```toon 474 - name: "Alice " 475 - key: "user:123" 476 - status: "true" 477 - count: "42" 478 - ``` 479 - 480 - **Commentary:** The `key: "user:123"` example shows that a value containing a colon must be quoted. Implementations **MUST** handle this correctly. 481 - 482 - --- 483 - 484 - ## 6. Decoding Requirements 485 - 486 - ### 6.1.
Tabular Expansion 487 - 488 - Decoders **MUST** expand tabular arrays into individual change message objects: 489 - 490 - **TOON input:** 491 - ```toon 492 - [2]{type,key,value.name,headers.operation}: 493 - user,1,Alice,insert 494 - user,2,Bob,insert 495 - ``` 496 - 497 - **Decoded objects:** 498 - ```json 499 - [ 500 - {"type":"user","key":"1","value":{"name":"Alice"},"headers":{"operation":"insert"}}, 501 - {"type":"user","key":"2","value":{"name":"Bob"},"headers":{"operation":"insert"}} 502 - ] 503 - ``` 504 - 505 - ### 6.2. Dot-Notation Unflattening 506 - 507 - Dotted field names in tabular headers **MUST** be unflattened to nested objects: 508 - 509 - | Column | Resulting Path | 510 - |--------|----------------| 511 - | `value.name` | `{"value":{"name":...}}` | 512 - | `value.address.city` | `{"value":{"address":{"city":...}}}` | 513 - | `headers.operation` | `{"headers":{"operation":...}}` | 514 - 515 - ### 6.3. Empty Cell Handling 516 - 517 - Empty cells in tabular rows **SHOULD** be interpreted as: 518 - - Omitted field (not present in decoded object) 519 - - NOT as empty string or null 520 - 521 - This matches delete operation semantics where `value` may be absent. 522 - 523 - ### 6.4. Length Validation 524 - 525 - Per TOON spec, the `[N]` length declaration enables validation. Decoders **SHOULD** verify row count matches declared length. Mismatches indicate truncation or corruption. 526 - 527 - --- 528 - 529 - ## 7. Examples 530 - 531 - ### 7.1. Chat Application Batch 532 - 533 - **Scenario:** Real-time chat with users, messages, and reactions (STATE-PROTOCOL Section 8.3). 
534 - 535 - **JSON (original):** 536 - ```json 537 - [ 538 - {"type":"user","key":"user:123","value":{"name":"Alice"},"headers":{"operation":"insert"}}, 539 - {"type":"message","key":"msg:456","value":{"userId":"user:123","text":"Hello!"},"headers":{"operation":"insert"}}, 540 - {"type":"reaction","key":"reaction:789","value":{"messageId":"msg:456","emoji":"👍"},"headers":{"operation":"insert"}} 541 - ] 542 - ``` 543 - 544 - **TOON (mixed types, non-tabular):** 545 - ```toon 546 - [3]: 547 - - type: user 548 - key: user:123 549 - value: 550 - name: Alice 551 - headers: 552 - operation: insert 553 - - type: message 554 - key: msg:456 555 - value: 556 - userId: user:123 557 - text: Hello! 558 - headers: 559 - operation: insert 560 - - type: reaction 561 - key: reaction:789 562 - value: 563 - messageId: msg:456 564 - emoji: 👍 565 - headers: 566 - operation: insert 567 - ``` 568 - 569 - **Commentary:** Heterogeneous types prevent tabular optimization. The line-oriented format still provides ~20% reduction through eliminated quoting. 570 - 571 - ### 7.2. Bulk User Import 572 - 573 - **Scenario:** Importing 1000 users with uniform schema. 574 - 575 - **JSON approach:** 1000 repeated `{"type":"user","key":...,"value":{...},"headers":{...}}` objects. 576 - 577 - **TOON tabular:** 578 - ```toon 579 - [1000]{type,key,value.name,value.email,value.role,headers.operation,headers.txid}: 580 - user,1,Alice,alice@example.com,admin,insert,import-001 581 - user,2,Bob,bob@example.com,user,insert,import-001 582 - user,3,Carol,carol@example.com,user,insert,import-001 583 - ... (997 more rows) 584 - ``` 585 - 586 - **Estimated savings:** 587 - - JSON: ~100 bytes/user × 1000 = 100KB 588 - - TOON: 80 byte header + 50 bytes/row × 1000 = 50KB 589 - - **Reduction: ~50%** 590 - 591 - ### 7.3. Presence Heartbeats 592 - 593 - **Scenario:** High-frequency presence updates (STATE-PROTOCOL Section 8.2). 

**TOON tabular (tab-delimited for readability):**
```toon
[5	]{type	key	value.status	value.lastSeen	headers.operation}:
  presence	user:1	online	1705312200000	update
  presence	user:2	online	1705312200100	update
  presence	user:3	away	1705312199000	update
  presence	user:4	online	1705312200200	update
  presence	user:5	offline	1705312100000	update
```

### 7.4. Snapshot with Boundaries

**Scenario:** Full state snapshot with control messages.

**TOON:**
```toon
[5]:
  - headers:
      control: snapshot-start
      offset: 1000
  - type: user
    key: 1
    value:
      name: Alice
      active: true
    headers:
      operation: insert
  - type: user
    key: 2
    value:
      name: Bob
      active: true
    headers:
      operation: insert
  - type: config
    key: theme
    value: dark
    headers:
      operation: insert
  - headers:
      control: snapshot-end
      offset: 1003
```

---

## 8. Security Considerations

### 8.1. Parser Security

TOON parsers **MUST** implement the same security validations as JSON parsers:
- Depth limits to prevent stack overflow
- Size limits to prevent memory exhaustion
- UTF-8 validation

### 8.2. Injection Prevention

Per STATE-PROTOCOL Section 9.4, `type` and `key` fields require validation. TOON's quoting rules do not alter this requirement. Implementations **MUST** validate field contents after decoding.

### 8.3. Length Declaration Attacks

Malicious payloads may declare large `[N]` lengths without providing rows. Decoders **SHOULD**:
- Stream process rows without pre-allocating based on declared length
- Timeout or abort on missing rows
- Validate actual row count matches declaration

### 8.4. Resource Limits

When parsing tabular arrays, implementations **MUST** enforce limits:

| Resource | Recommended Limit | Rationale |
|----------|-------------------|-----------|
| Array size `[N]` | 10,000 rows | Prevent memory exhaustion |
| Row length | 64 KB | Prevent buffer overflow |
| Nesting depth | 32 levels | Prevent stack overflow |
| Document size | 10 MB | Prevent memory exhaustion |

Implementations **SHOULD** use streaming parsing to avoid loading entire arrays into memory.

---

## 9. IANA Considerations

Per [TOON-SPEC Section 18.2](https://github.com/toon-format/spec#182-provisional-media-type), the provisional media type is:

**Media Type:** `text/toon`

| Field | Value |
|-------|-------|
| Type name | text |
| Subtype name | toon |
| Required parameters | None |
| Optional parameters | `charset` (default UTF-8) |
| Encoding considerations | 8-bit, UTF-8 encoded text with LF line endings |
| Security considerations | See Section 8 |
| Interoperability | Semantically equivalent to application/json when decoded |
| Published specification | This document and [TOON-SPEC] |
| Intended usage | COMMON (upon standardization) |
| Applications | Real-time state synchronization, durable streams, bandwidth-constrained systems requiring JSON semantics |
| Restrictions on usage | None |
| Additional information | File extension: .toon, Macintosh file type code: TEXT |

**Note:** The `text/toon` media type is provisional. Implementers should monitor [TOON-SPEC](https://github.com/toon-format/spec) for updates and formal IANA registration.

---

## 10. References

### 10.1. Normative References

**[STATE-PROTOCOL]**
Durable Streams State Protocol. ElectricSQL, 2025.
<https://github.com/durable-streams/durable-streams/blob/main/packages/state/STATE-PROTOCOL.md>

**[TOON-SPEC]**
TOON (Token-Oriented Object Notation) Specification v3.0.
<https://github.com/toon-format/spec>

**[PROTOCOL]**
Durable Streams Protocol. ElectricSQL, 2025.
<https://github.com/durable-streams/durable-streams/blob/main/PROTOCOL.md>

### 10.2. Informative References

**[TOON-FORMAT]**
TOON Format Website.
<https://toonformat.dev>

---

## Appendix A: Payload Reduction Analysis

### A.1. Single Change Message

| Field | JSON | TOON | Savings |
|-------|------|------|---------|
| `type` field | `"type":"user",` | `type: user` | 4 chars (36%) |
| `key` field | `"key":"user:123",` | `key: user:123` | 6 chars (38%) |
| `value` object | `"value":{` | `value:` | 7 chars (100%) |
| `headers` object | `"headers":{` | `headers:` | 9 chars (100%) |
| `operation` field | `"operation":"insert"` | `operation: insert` | 2 chars (18%) |
| Closing braces | `}}` | (none) | 2 chars (100%) |
| **Total** | **~145 bytes** | **~80 bytes** | **~45%** |

### A.2. Tabular Batch Scaling

| Batch Size | JSON | TOON | Reduction |
|------------|------|------|-----------|
| 1 message | 145 bytes | 80 bytes | 45% |
| 10 messages | 1,450 bytes | 500 bytes | 65% |
| 100 messages | 14,500 bytes | 4,200 bytes | 71% |
| 1,000 messages | 145 KB | 40 KB | 72% |

### A.3. Snapshot Export (100 users, 500 messages)

| Entity Type | JSON | TOON | Reduction |
|-------------|------|------|-----------|
| Users (100 records) | ~30 KB | ~11 KB | **63%** |
| Messages (500 records) | ~150 KB | ~52 KB | **65%** |
| Snapshot metadata | ~200 bytes | ~120 bytes | **40%** |
| **Total** | **~180 KB** | **~63 KB** | **65%** |

### A.4. Real-World Bandwidth Impact

| Application | Messages/sec | JSON Bandwidth | TOON Bandwidth | Monthly Savings |
|-------------|--------------|----------------|----------------|-----------------|
| Chat (1M users) | 10 msg/sec | ~2 GB/hr | ~0.7 GB/hr | ~30 GB/month |
| Presence tracking | 5 updates/sec | ~1 GB/hr | ~0.4 GB/hr | ~18 GB/month |
| Feature flags | 1 update/min | ~10 MB/day | ~4 MB/day | ~180 MB/month |

---

## Appendix B: Implementation Guidelines

### B.1. Rust Crate Recommendations

| Use Case | Recommended Crate | Features |
|----------|-------------------|----------|
| Full spec compliance | `toon` (toon-rs) | `de_direct`, `perf_memchr`, `perf_smallvec` |
| Performance-critical | `toon-rust` | SIMD optimizations |
| Serde integration | `serde_toon` | `toon!` macro |

**Example configuration:**
```toml
[dependencies]
toon = { version = "0.1", features = ["de_direct", "perf_memchr"] }
```

### B.2. Delimiter Selection

| Delimiter | Best For | Quoting Impact |
|-----------|----------|----------------|
| Comma (default) | General purpose | May require quotes for values with commas |
| Tab | Data with few quoted strings | ~10-15% better efficiency |
| Pipe | Data with commas and tabs | Similar to comma |

### B.3. Streaming Tabular Arrays

1. Read array header to get `[N]` and field declarations
2. Reserve buffer capacity only up to a bounded limit; never pre-allocate from the declared `[N]` alone (see Section 8.3)
3. Parse rows sequentially, validating field count
4. Decode each row into change message
5. Apply messages immediately (don't wait for full array)

### B.4. Error Handling

| Error Condition | Recommended Action |
|-----------------|-------------------|
| Row count ≠ `[N]` | Reject batch, request replay from last good offset |
| Wrong field count | Skip row, log error, continue |
| Decode failure | Skip row, log error, continue |

### B.5. Hybrid Encoding Strategy

For maximum efficiency, implementations **MAY**:
1. Accumulate change messages by type
2. Encode each type batch as tabular array
3. Interleave control messages as individual objects
4. Concatenate into single TOON document

---

## Appendix C: Migration Guide

### C.1. Server-Side Changes

**Add TOON encoding capability:**
```typescript
import { stringify } from 'toon'

// ChangeMessage is the change message type defined by the State Protocol.
function encodeMessages(messages: ChangeMessage[], format: 'json' | 'toon'): string {
  if (format === 'toon') {
    // Comma is the default TOON delimiter
    return stringify(messages, { delimiter: ',' })
  }
  return JSON.stringify(messages)
}
```

**Support content negotiation:**
```typescript
app.get('/stream/:id', (req, res) => {
  // `loadMessages` is a hypothetical lookup of the change messages to send
  const messages = loadMessages(req.params.id)
  // res.format picks a handler from the request's Accept header
  res.format({
    'text/toon': () => {
      res.type('text/toon')
      res.send(encodeMessages(messages, 'toon'))
    },
    'application/json': () => {
      res.send(encodeMessages(messages, 'json'))
    }
  })
})
```

### C.2. Client-Side Changes

```typescript
import { DurableStream } from '@durable-streams/client'
import { parse } from 'toon' // TOON decoder, assumed to ship alongside `stringify`

const stream = await DurableStream.create({
  url: 'https://server.com/v1/stream/my-app',
  contentType: 'text/toon' // Request TOON
})

// Open a read session on the stream (assumed client API)
const res = await stream.stream()

res.subscribeText(async (batch) => {
  const messages = parse(batch) // TOON decoder
  for (const msg of messages) {
    state.apply(msg) // `state` is the application's materialized state
  }
})
```

### C.3. Phased Rollout

| Phase | Description | Risk |
|-------|-------------|------|
| **Phase 1** | Add TOON support, keep JSON default | Low |
| **Phase 2** | Roll out TOON clients gradually | Medium |
| **Phase 3** | Make TOON default, JSON fallback | Low |
| **Phase 4** | Deprecate JSON (future) | High |

### C.4. A/B Testing Metrics

| Metric | Expected Change |
|--------|-----------------|
| Bandwidth | 40-60% reduction |
| Encoding latency | 10-20% increase |
| Decoding latency | 5-10% decrease |
| Error rates | No change |
| Memory usage | 20-30% reduction |
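
### C.5. Decoder Sketch

The decoding rules in Section 6 (tabular expansion, dot-notation unflattening, empty cell handling, length validation) can be illustrated with a small standalone sketch. It is not a TOON implementation: it assumes comma-delimited rows with unquoted cells, and the names `parseCell`, `setPath`, and `expandTabular` are hypothetical. It also assumes `type` and `key` cells stay strings, as in the decoded objects of Section 6.1.

```typescript
// Sketch only: comma delimiter, no quoting or escapes (assumptions).
type Json = string | number | boolean | null | { [key: string]: Json };
type Obj = { [key: string]: Json };

// Hypothetical helper: turn one unquoted cell into a primitive value.
function parseCell(cell: string): Json {
  if (cell === "true") return true;
  if (cell === "false") return false;
  if (cell === "null") return null;
  if (/^-?\d+(\.\d+)?$/.test(cell)) return Number(cell);
  return cell;
}

// 6.2: set a dotted path such as "value.address.city" on a nested object.
function setPath(target: Obj, path: string, value: Json): void {
  const parts = path.split(".");
  const last = parts.pop() ?? path;
  let node: Obj = target;
  for (const part of parts) {
    const next = node[part];
    if (typeof next !== "object" || next === null) {
      const created: Obj = {};
      node[part] = created;
      node = created;
    } else {
      node = next;
    }
  }
  node[last] = value;
}

// 6.1: expand a tabular array into individual change message objects.
function expandTabular(toon: string): Obj[] {
  const lines = toon.split("\n").filter((line) => line.trim() !== "");
  const header = /^\[(\d+)\]\{([^}]*)\}:$/.exec((lines[0] ?? "").trim());
  if (!header) throw new Error("not a tabular array header");
  const declared = Number(header[1] ?? "0");
  const fields = (header[2] ?? "").split(",");
  const rows = lines.slice(1);
  // 6.4: the declared [N] must match the actual row count.
  if (rows.length !== declared) {
    throw new Error(`expected ${declared} rows, got ${rows.length}`);
  }
  return rows.map((row) => {
    const cells = row.trim().split(",");
    if (cells.length !== fields.length) throw new Error("wrong field count");
    const message: Obj = {};
    cells.forEach((cell, i) => {
      const field = fields[i] ?? "";
      // 6.3: an empty cell means the field is omitted, not "" or null.
      if (cell === "") return;
      // Assumption: `type` and `key` are always kept as strings.
      const keepString = field === "type" || field === "key";
      setPath(message, field, keepString ? cell : parseCell(cell));
    });
    return message;
  });
}
```

Calling `expandTabular` on the TOON input from Section 6.1 yields the two decoded objects shown there, and a row such as `user,1,,delete` decodes without a `value` field. A production decoder would additionally stream rows and enforce the limits in Section 8.4 rather than splitting the whole document in memory.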