···11+# duratoon - durable streams on TOON
22+33+Alas not actually a good idea. I was hoping we could maybe find a way to re-use more values across multiple entries but I haven't figured out how to map this.
-1218
proto/state-protocol-toon.glm.md
···11-# DuraTOON - Durable Streams State Protocol over TOON
22-33-**Document:** DuraTOON Specification
44-**Version:** 1.0
55-**Date:** 2025-02-10
66-**Status:** Extension of [Durable Streams State Protocol](https://github.com/durable-streams/durable-streams/blob/main/packages/state/STATE-PROTOCOL.md)
77-**Content-Type:** `text/toon`
88-99----
1010-1111-## Abstract
1212-1313-DuraTOON extends the Durable Streams State Protocol to use TOON (Token-Oriented Object Notation) as an alternative serialization format. While maintaining **complete semantic equivalence** with the base JSON protocol, DuraTOON achieves 30-60% payload reduction through TOON's tabular arrays, minimal quoting rules, and line-oriented structure.
1414-1515-The primary optimization is **tabular array batching**: when multiple change messages share the same schema, field names are declared once in a header, eliminating per-message overhead. This CSV-like format remains human-readable while dramatically reducing bandwidth consumption.
1616-1717-### Key Design Insight
1818-1919-State synchronization protocols transmit the same schemas repeatedly. A real-time application sending 1,000 presence updates repeats `"type": "presence"` and `"operation": "update"` 1,000 times. TOON's tabular format declares these once:
2020-2121-```
2222-presence[1000]{type,key,value.status,value.lastSeen,headers.operation}:
2323- presence,"user:1",online,1705312200000,update
2424- presence,"user:2",online,1705312200100,update
2525- presence,"user:3",away,1705312199000,update
2626- ... (997 more rows)
2727-```
2828-2929-This optimization is transparent to protocol semantics—after decoding TOON to JSON, clients materialize state identically to the JSON variant.
3030-3131----
3232-3333-## Table of Contents
3434-3535-1. [Introduction](#1-introduction)
3636-2. [Terminology](#2-terminology)
3737-3. [Protocol Overview](#3-protocol-overview)
3838-4. [Message Format Mapping](#4-message-format-mapping)
3939- - 4.1. [Change Messages](#41-change-messages)
4040- - 4.2. [Tabular Change Batches](#42-tabular-change-batches)
4141- - 4.3. [Control Messages](#43-control-messages)
4242-5. [Encoding Guidelines](#5-encoding-guidelines)
4343-6. [Decoding Requirements](#6-decoding-requirements)
4444-7. [State Materialization](#7-state-materialization)
4545-8. [Schema Validation](#8-schema-validation)
4646-9. [Security Considerations](#9-security-considerations)
4747-10. [IANA Considerations](#10-iana-considerations)
4848-11. [References](#11-references)
4949-12. [Appendix A: Payload Reduction Analysis](#appendix-a-payload-reduction-analysis)
5050-13. [Appendix B: Implementation Guidelines](#appendix-b-implementation-guidelines)
5151-14. [Appendix C: Migration Guide](#appendix-c-migration-guide)
5252-5353----
5454-5555-## 1. Introduction
5656-5757-### 1.1. Relationship to Base Protocol
5858-5959-The Durable Streams State Protocol [STATE-PROTOCOL] defines a standard message format for state synchronization built on the Durable Streams Protocol [PROTOCOL]. It specifies:
6060-6161-- **Change messages** ([Section 4.1](https://github.com/durable-streams/durable-streams/blob/main/packages/state/STATE-PROTOCOL.md#41-change-messages)) representing insert, update, and delete operations on entities
6262-- **Control messages** ([Section 4.2](https://github.com/durable-streams/durable-streams/blob/main/packages/state/STATE-PROTOCOL.md#42-control-messages)) for snapshot boundaries and reset signals
6363-- **State materialization** ([Section 6](https://github.com/durable-streams/durable-streams/blob/main/packages/state/STATE-PROTOCOL.md#6-state-materialization)) rules for applying changes sequentially
6464-6565-DuraTOON extends this specification by defining TOON encoding for the same message structures. It **does not modify protocol semantics**—it only provides an alternative serialization.
6666-6767-### 1.2. The Bandwidth Problem
6868-6969-```mermaid
7070-flowchart LR
7171- subgraph JSON["JSON Encoding"]
7272- direction TB
7373- J1["{type:'user',<br/>key:'1',<br/>value:{name:'Alice'},<br/>headers:{operation:'insert'}}"]
7474- J2["{type:'user',<br/>key:'2',<br/>value:{name:'Bob'},<br/>headers:{operation:'insert'}}"]
7575- J3["{type:'user',<br/>key:'3',<br/>value:{name:'Carol'},<br/>headers:{operation:'insert'}}"]
7676- end
7777-7878- subgraph TOON["TOON Tabular Encoding"]
7979- direction TB
8080- H["users[3]{type,key,value.name,headers.operation}:"]
8181- R1["user,1,Alice,insert"]
8282- R2["user,2,Bob,insert"]
8383- R3["user,3,Carol,insert"]
8484- end
8585-8686- JSON -->|~435 bytes| NW["Network"]
8787- TOON -->|~170 bytes| NW
8888-8989- style JSON fill:#ffcccc
9090- style TOON fill:#ccffcc
9191-```
9292-9393-**Figure 1:** JSON repeats field names for every message; TOON declares them once. At about 145 bytes per JSON message, 1,000 messages total roughly 145 KB, so the saving from tabular encoding is on the order of **100 KB** (see the size comparison in Section 4.1.2).
9494-9595-### 1.3. Design Goals
9696-9797-DuraTOON aims to be:
9898-9999-- **Semantically Equivalent**: When decoded, TOON messages produce identical in-memory representations to JSON messages. Clients materialize the same state regardless of encoding.
100100-- **Bandwidth Efficient**: 30-60% payload reduction through tabular batching and minimal quoting rules
101101-- **Backward Compatible**: Servers support both `application/json` and `text/toon`; clients negotiate format via `Accept` header
102102-- **Human Readable**: Line-oriented, indentation-based structure remains debuggable and manually editable
103103-- **Streaming Friendly**: Explicit array lengths `[N]` enable validation without buffering entire arrays
104104-- **Format-Agnostic Materialization**: After decoding, materialization logic is identical to base protocol
105105-106106----
107107-108108-## 2. Terminology
109109-110110-This document uses terminology from [STATE-PROTOCOL Section 2](https://github.com/durable-streams/durable-streams/blob/main/packages/state/STATE-PROTOCOL.md#2-terminology) with these additions:
111111-112112-**TOON (Token-Oriented Object Notation)**: A compact serialization format using indentation-based structure, tabular arrays with explicit lengths, and minimal quoting rules [TOON-SPEC]. TOON preserves the JSON data model exactly.
113113-114114-**Tabular Array**: TOON's optimization for homogeneous data. A header declares field names once; rows contain only values:
115115-```
116116-users[3]{name,email}:
117117- Alice,alice@example.com
118118- Bob,bob@example.com
119119- Carol,carol@example.com
120120-```
121121-122122-**Field Path**: Dot-notation reference to nested fields in tabular headers (e.g., `value.name`, `headers.operation`).
123123-124124-**Semantic Equivalence**: Two encodings produce identical data models when parsed. A TOON-encoded change message, when decoded, yields the same in-memory representation as its JSON equivalent from the base protocol.
125125-126126-**`text/toon`**: Provisional media type per [TOON-SPEC Section 18.2](https://github.com/toon-format/spec#182-provisional-media-type). Implementers should monitor the TOON specification repository for updates and formal IANA registration.
127127-128128----
129129-130130-## 3. Protocol Overview
131131-132132-### 3.1. Content-Type Negotiation
133133-134134-Per [TOON-SPEC Section 18.2](https://github.com/toon-format/spec#182-provisional-media-type), the provisional media type is:
135135-136136-```
137137-Content-Type: text/toon
138138-```
139139-140140-This media type is specified by the TOON format specification and should be used for TOON-encoded messages. TOON documents decode to the same JSON data model.
141141-142142-**Client Request Example:**
143143-```http
144144-GET /v1/stream/my-app HTTP/1.1
145145-Host: example.com
146146-Accept: text/toon, application/json;q=0.9
147147-```
148148-149149-**Server Response (TOON):**
150150-```http
151151-HTTP/1.1 200 OK
152152-Content-Type: text/toon
153153-```
154154-155155-**Implementation Requirements:**
156156-157157-- Clients **MAY** request `text/toon` via `Accept` header
158158-- Servers **MAY** respond with TOON if requested and supported
159159-- Implementations that support `text/toon` **MUST** also support `application/json`, so that any peer can fall back to the base encoding
160160-- Servers **SHOULD** default to `application/json` when no `Accept` header is present for backward compatibility
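
The negotiation rules above can be sketched as a small server-side helper. This is a minimal illustration, not part of the specification: the function name `negotiateContentType` is hypothetical, wildcard handling is simplified to `*/*`, and falling back to JSON instead of returning `406 Not Acceptable` is an assumption.

```javascript
// Hypothetical helper: choose a response media type from an Accept header.
function negotiateContentType(acceptHeader) {
  const supported = ['text/toon', 'application/json'];
  // No Accept header: default to JSON for backward compatibility.
  if (!acceptHeader) return 'application/json';

  const ranked = acceptHeader
    .split(',')
    .map((part, index) => {
      const [type, ...params] = part.trim().split(';').map((s) => s.trim());
      const qParam = params.find((p) => p.startsWith('q='));
      const q = qParam ? Number.parseFloat(qParam.slice(2)) : 1;
      return { type: type.toLowerCase(), q: Number.isNaN(q) ? 0 : q, index };
    })
    .filter((entry) => entry.q > 0)
    // Highest q-value first; ties keep the order used in the header.
    .sort((a, b) => b.q - a.q || a.index - b.index);

  for (const { type } of ranked) {
    if (supported.includes(type)) return type;
    if (type === '*/*') return 'application/json';
  }
  // Assumption of this sketch: fall back to JSON rather than reject.
  return 'application/json';
}
```

With the request shown earlier, `negotiateContentType('text/toon, application/json;q=0.9')` selects `text/toon`, while a missing header selects `application/json`.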
161161-162162-### 3.2. Message Flow Overview
163163-164164-```mermaid
165165-flowchart TD
166166- subgraph Stream["Durable Stream"]
167167- direction TB
168168-169169- subgraph Batch["TOON Batch"]
170170- direction TB
171171- TB["changes[4]{type,key,value.name,headers.operation}:"]
172172- R1["user,1,Alice,insert"]
173173- R2["user,2,Bob,insert"]
174174- R3["user,1,Alice Smith,update"]
175175- R4["user,2,Bob Johnson,update"]
176176- end
177177-178178- subgraph Ctrl["Control Message"]
179179- C["headers:\n control: snapshot-end\n offset: 1000"]
180180- end
181181-182182- Batch --> Ctrl
183183- end
184184-185185- subgraph Decode["TOON Decoder"]
186186- D1["Expand tabular array\ninto 4 change messages"]
187187- D2["Parse control\nmessage"]
188188- end
189189-190190- subgraph Materialize["Materialization<br/>(identical to JSON variant)"]
191191- M["Apply insert(user:1, {name:Alice})\nApply insert(user:2, {name:Bob})\nApply update(user:1, {name:Alice Smith})\nApply update(user:2, {name:Bob Johnson})\nProcess snapshot-end"]
192192- end
193193-194194- Stream --> Decode
195195- Decode --> Materialize
196196-197197- style Stream fill:#e6f3ff
198198- style Decode fill:#fff4e6
199199- style Materialize fill:#e6ffe6
200200-```
201201-202202-**Figure 2:** TOON batches are decoded into individual messages, then materialized exactly as defined in [STATE-PROTOCOL Section 6](https://github.com/durable-streams/durable-streams/blob/main/packages/state/STATE-PROTOCOL.md#6-state-materialization).
203203-204204-### 3.3. Recommended Batching Strategies
205205-206206-TOON enables efficient batching patterns not present in the base specification:
207207-208208-| Strategy | Description | Best For |
209209-|----------|-------------|----------|
210210-| **By Operation** | Group `insert`, `update`, `delete` operations separately | Bulk imports, cleanup jobs |
211211-| **By Entity Type** | Group messages for the same `type` together | Multi-type streams (chat: users, messages, reactions) |
212212-| **By Transaction** | Group messages with the same `txid` together | Atomic operations |
213213-| **Time-based** | Batch messages within time windows | High-frequency updates |
214214-215215-**Encoding Recommendation:**
216216-217217-Encoders **SHOULD**:
218218-1. Accumulate consecutive change messages by schema shape
219219-2. Encode homogeneous groups as tabular arrays
220220-3. Fall back to TOON object format for heterogeneous messages
221221-4. Always encode control messages as individual TOON objects (see [Section 4.3](#43-control-messages))
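
As a rough illustration of steps 1 to 4, the sketch below groups consecutive change messages by their set of dotted leaf paths and keeps every control message in a group of its own. The names `leafPaths` and `groupBySchemaShape` are hypothetical, and the sketch assumes messages are already decoded JavaScript objects; it does not emit TOON text.

```javascript
// Collect the dotted leaf paths of a message, e.g. "value.name".
function leafPaths(obj, prefix = '') {
  const paths = [];
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
      paths.push(...leafPaths(value, path));
    } else {
      paths.push(path);
    }
  }
  return paths;
}

// Group consecutive messages that share a schema shape.
// Control messages get shape `null` and are never merged into a group.
function groupBySchemaShape(messages) {
  const groups = [];
  for (const message of messages) {
    const isControl = Boolean(message.headers && 'control' in message.headers);
    const shape = isControl ? null : leafPaths(message).sort().join(',');
    const last = groups[groups.length - 1];
    if (shape !== null && last && last.shape === shape) {
      last.messages.push(message);
    } else {
      groups.push({ shape, messages: [message] });
    }
  }
  return groups;
}
```

An encoder could then emit a group with more than one message as a tabular array and a single-message group as a plain TOON object. Stream order is preserved because only adjacent messages are merged.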
222222-223223----
224224-225225-## 4. Message Format Mapping
226226-227227-This section maps each message structure from [STATE-PROTOCOL Section 4](https://github.com/durable-streams/durable-streams/blob/main/packages/state/STATE-PROTOCOL.md#4-message-types) to its TOON equivalent.
228228-229229-### 4.1. Change Messages
230230-231231-#### 4.1.1. Single Change Message
232232-233233-**JSON** ([from STATE-PROTOCOL Section 5.1](https://github.com/durable-streams/durable-streams/blob/main/packages/state/STATE-PROTOCOL.md#51-change-message-structure)):
234234-235235-```json
236236-{
237237- "type": "user",
238238- "key": "user:123",
239239- "value": {
240240- "name": "Alice",
241241- "email": "alice@example.com"
242242- },
243243- "headers": {
244244- "operation": "insert",
245245- "timestamp": "2025-01-15T10:30:00Z"
246246- }
247247-}
248248-```
249249-250250-**TOON equivalent:**
251251-252252-```toon
253253-type: user
254254-key: "user:123"
255255-value:
256256- name: Alice
257257- email: alice@example.com
258258-headers:
259259- operation: insert
260260- timestamp: "2025-01-15T10:30:00Z"
261261-```
262262-263263-**Size comparison:**
264264-265265-| Format | Bytes | Reduction |
266266-|--------|-------|-----------|
267267-| JSON | ~145 bytes | Baseline |
268268-| TOON | ~80 bytes | **45%** |
269269-270270-**Savings breakdown:**
271271-272272-| Component | JSON | TOON | Reduction |
273273-|-----------|------|------|-----------|
274274-| Field name quotes | `"type":` (7 chars) | `type:` (5 chars) | 29% |
275275-| String value quotes | `"user"` (6 chars) | `user` (4 chars) | 33% |
276276-| Object delimiters | `"value": {` (10 chars) | `value:` (6 chars) | 40% |
277277-| Closing braces | `}` × 3 (3 chars) | (none) | 100% |
278278-| Comma separators | 5 commas | 0 commas | 100% |
279279-280280-#### 4.1.2. Insert Operation
281281-282282-**JSON** ([Section 4.1.1](https://github.com/durable-streams/durable-streams/blob/main/packages/state/STATE-PROTOCOL.md#411-insert-operation)):
283283-284284-```json
285285-{
286286- "type": "user",
287287- "key": "user:123",
288288- "value": {
289289- "name": "Alice",
290290- "email": "alice@example.com"
291291- },
292292- "headers": {
293293- "operation": "insert",
294294- "timestamp": "2025-01-15T10:30:00Z"
295295- }
296296-}
297297-```
298298-299299-**TOON single message:**
300300-301301-```toon
302302-type: user
303303-key: "user:123"
304304-value:
305305- name: Alice
306306- email: alice@example.com
307307-headers:
308308- operation: insert
309309- timestamp: "2025-01-15T10:30:00Z"
310310-```
311311-312312-**TOON tabular batch (5 users):**
313313-314314-```toon
315315-users[5]{type,key,value.name,value.email,headers.operation,headers.timestamp}:
316316- user,"user:1",Alice,alice1@example.com,insert,"2025-01-15T10:30:00Z"
317317- user,"user:2",Bob,bob2@example.com,insert,"2025-01-15T10:31:00Z"
318318- user,"user:3",Charlie,charlie3@example.com,insert,"2025-01-15T10:32:00Z"
319319- user,"user:4",Diana,diana4@example.com,insert,"2025-01-15T10:33:00Z"
320320- user,"user:5",Eve,eve5@example.com,insert,"2025-01-15T10:34:00Z"
321321-```
322322-323323-**Size comparison:**
324324-325325-| Format | 1 message | 5 messages | 100 messages | 1,000 messages |
326326-|--------|-----------|------------|--------------|-----------------|
327327-| JSON | ~145 bytes | ~725 bytes | ~14,500 bytes | ~145 KB |
328328-| TOON single | ~80 bytes | ~400 bytes | ~8,000 bytes | ~80 KB |
329329-| TOON tabular | N/A | ~215 bytes | ~4,200 bytes | ~42 KB |
330330-| **Reduction** | **45%** | **70%** | **71%** | **71%** |
331331-332332-#### 4.1.3. Update Operation
333333-334334-**JSON** ([Section 4.1.2](https://github.com/durable-streams/durable-streams/blob/main/packages/state/STATE-PROTOCOL.md#412-update-operation)):
335335-336336-```json
337337-{
338338- "type": "user",
339339- "key": "user:123",
340340- "value": {
341341- "name": "Alice Smith",
342342- "email": "alice.new@example.com"
343343- },
344344- "old_value": {
345345- "name": "Alice",
346346- "email": "alice@example.com"
347347- },
348348- "headers": {
349349- "operation": "update",
350350- "timestamp": "2025-01-15T10:35:00Z"
351351- }
352352-}
353353-```
354354-355355-**TOON single:**
356356-357357-```toon
358358-type: user
359359-key: "user:123"
360360-value:
361361- name: Alice Smith
362362- email: alice.new@example.com
363363-old_value:
364364- name: Alice
365365- email: alice@example.com
366366-headers:
367367- operation: update
368368- timestamp: "2025-01-15T10:35:00Z"
369369-```
370370-371371-**TOON tabular batch (3 updates):**
372372-373373-```toon
374374-updates[3]{type,key,value.email,old_value.email,headers.operation,headers.timestamp}:
375375- user,"user:1",alice.new@example.com,alice@example.com,update,"2025-01-15T10:35:00Z"
376376- user,"user:2",bob.updated@example.com,bob@example.com,update,"2025-01-15T10:36:00Z"
377377- user,"user:3",charlie.new@example.com,charlie@example.com,update,"2025-01-15T10:37:00Z"
378378-```
379379-380380-**Partial Update Optimization:** When batching updates where only certain fields change, tabular format can include only changed fields:
381381-382382-```toon
383383-updates[3]{type,key,value.email,headers.operation}:
384384- user,"user:1",alice.new@example.com,update
385385- user,"user:2",bob.updated@example.com,update
386386- user,"user:3",charlie.new@example.com,update
387387-```
388388-389389-#### 4.1.4. Delete Operation
390390-391391-**JSON** ([Section 4.1.3](https://github.com/durable-streams/durable-streams/blob/main/packages/state/STATE-PROTOCOL.md#413-delete-operation)):
392392-393393-```json
394394-{
395395- "type": "user",
396396- "key": "user:123",
397397- "old_value": {
398398- "name": "Alice",
399399- "email": "alice@example.com"
400400- },
401401- "headers": {
402402- "operation": "delete",
403403- "timestamp": "2025-01-15T10:40:00Z"
404404- }
405405-}
406406-```
407407-408408-**TOON single (value omitted):**
409409-410410-```toon
411411-type: user
412412-key: "user:123"
413413-old_value:
414414- name: Alice
415415- email: alice@example.com
416416-headers:
417417- operation: delete
418418- timestamp: "2025-01-15T10:40:00Z"
419419-```
420420-421421-**TOON tabular batch (soft delete with old_value):**
422422-423423-```toon
424424-deletes[3]{type,key,old_value.name,old_value.email,headers.operation,headers.timestamp}:
425425- user,"user:1",Alice,alice@example.com,delete,"2025-01-15T10:40:00Z"
426426- user,"user:2",Bob,bob@example.com,delete,"2025-01-15T10:41:00Z"
427427- user,"user:3",Charlie,charlie@example.com,delete,"2025-01-15T10:42:00Z"
428428-```
429429-430430-**TOON tabular batch (hard delete, no old_value):**
431431-432432-```toon
433433-deletes[3]{type,key,headers.operation,headers.timestamp}:
434434- user,"user:1",delete,"2025-01-15T10:40:00Z"
435435- user,"user:2",delete,"2025-01-15T10:41:00Z"
436436- user,"user:3",delete,"2025-01-15T10:42:00Z"
437437-```
438438-439439-### 4.2. Tabular Change Batches
440440-441441-Tabular encoding is the primary optimization of DuraTOON. It applies when:
442442-443443-1. All messages in the batch are change messages (not control messages)
444444-2. All messages share the same set of scalar fields in `value` and `old_value`
445445-3. All messages use the same `headers` fields
446446-447447-#### 4.2.1. Field Flattening
448448-449449-Tabular arrays require flat column schemas. Nested objects **MUST** be flattened using dot notation in the header:
450450-451451-| Nested Path | Tabular Column |
452452-|-------------|----------------|
453453-| `value.name` | `value.name` |
454454-| `value.email` | `value.email` |
455455-| `value.address.city` | `value.address.city` |
456456-| `headers.operation` | `headers.operation` |
457457-| `headers.timestamp` | `headers.timestamp` |
458458-| `headers.txid` | `headers.txid` |
459459-| `old_value.name` | `old_value.name` |
460460-461461-#### 4.2.2. Named Tabular Arrays
462462-463463-Tabular arrays **SHOULD** use descriptive names that indicate their content:
464464-465465-| Array Name | Use Case |
466466-|------------|----------|
467467-| `changes` | Mixed operations |
468468-| `inserts` | Insert-only batch |
469469-| `updates` | Update-only batch |
470470-| `deletes` | Delete-only batch |
471471-| `users` | Entity-type-specific batch |
472472-| `messages` | Entity-type-specific batch |
473473-| `presence` | Entity-type-specific batch |
474474-475475-#### 4.2.3. Field Ordering Recommendation
476476-477477-For tabular arrays, field order **SHOULD** follow this convention for efficient streaming parsers:
478478-479479-1. `type`
480480-2. `key`
481481-3. `value.*` fields (alphabetically)
482482-4. `old_value.*` fields (alphabetically, if present)
483483-5. `headers.*` fields (alphabetically)
484484-485485-**Example:**
486486-487487-```toon
488488-changes[3]{type,key,value.email,value.name,old_value.email,old_value.name,headers.operation,headers.timestamp,headers.txid}:
489489- user,1,alice@example.com,Alice,,,insert,"2025-01-15T10:30:00Z",tx-001
490490-```
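
The ordering convention above could be applied when building a tabular header, for example as follows. This is a sketch only: `orderColumns` is a hypothetical name, and placing unrecognized columns last is an assumption that the text does not cover.

```javascript
// Sort dotted column names by the recommended convention:
// type, key, value.*, old_value.*, headers.* (alphabetical within each group).
function orderColumns(columns) {
  const rank = (column) => {
    if (column === 'type') return 0;
    if (column === 'key') return 1;
    if (column.startsWith('value.')) return 2;
    if (column.startsWith('old_value.')) return 3;
    if (column.startsWith('headers.')) return 4;
    return 5; // assumption: unknown columns go last
  };
  const compareText = (a, b) => (a < b ? -1 : a > b ? 1 : 0);
  // Copy before sorting so the caller's array is left untouched.
  return [...columns].sort((a, b) => rank(a) - rank(b) || compareText(a, b));
}
```

Sorting a copy keeps the function free of side effects, which matters if the same column list is reused for several batches.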
491491-492492-#### 4.2.4. Mixed Operation Batch
493493-494494-When batching different operations with the same schema:
495495-496496-```toon
497497-changes[4]{type,key,value.name,value.status,headers.operation}:
498498- user,1,Alice,active,insert
499499- user,2,Bob,active,insert
500500- user,1,Alice,inactive,update
501501- user,2,,deleted,delete
502502-```
503503-504504-**Empty Cell Handling:** An empty cell between delimiters (here, `value.name` in the delete row) is decoded as an omitted field. The `value` object itself is absent only when every `value.*` cell in the row is empty; in the delete row above, `value.status` is still present.
505505-506506-#### 4.2.5. Delimiter Selection
507507-508508-TOON supports comma (default), tab, or pipe delimiters. Selection depends on data characteristics:
509509-510510-| Delimiter | Best For | Quoting Impact |
511511-|-----------|----------|----------------|
512512-| Comma (default) | General purpose | May require quotes for values containing commas |
513513-| Tab | Data with few quoted strings | ~10-15% better efficiency |
514514-| Pipe | Data with commas and tabs | Similar to comma |
515515-516516-**Tab-delimited example:**
517517-518518-```toon
519519-users[2 ]{type key value.name value.bio headers.operation}:
520520- user 1 Alice Writer, editor insert
521521- user 2 Bob Developer, speaker insert
522522-```
523523-524524-### 4.3. Control Messages
525525-526526-Control messages **MUST NOT** be included in tabular arrays because their schema differs from that of change messages. They **MUST** be encoded as individual TOON objects, either standalone or as items of a mixed (non-tabular) array as shown in Section 4.3.3.
527527-528528-#### 4.3.1. Snapshot Boundaries
529529-530530-**JSON** ([Section 4.2.1](https://github.com/durable-streams/durable-streams/blob/main/packages/state/STATE-PROTOCOL.md#421-snapshot-boundaries)):
531531-532532-```json
533533-{
534534- "headers": {
535535- "control": "snapshot-start",
536536- "offset": "123456_000"
537537- }
538538-}
539539-```
540540-541541-**TOON:**
542542-543543-```toon
544544-headers:
545545- control: snapshot-start
546546- offset: 123456_000
547547-```
548548-549549-#### 4.3.2. Reset Control
550550-551551-**JSON** ([Section 4.2.2](https://github.com/durable-streams/durable-streams/blob/main/packages/state/STATE-PROTOCOL.md#422-reset-control)):
552552-553553-```json
554554-{
555555- "headers": {
556556- "control": "reset",
557557- "offset": "123456_000"
558558- }
559559-}
560560-```
561561-562562-**TOON:**
563563-564564-```toon
565565-headers:
566566- control: reset
567567- offset: 123456_000
568568-```
569569-570570-#### 4.3.3. Mixed Arrays with Control Messages
571571-572572-When a batch contains both change and control messages, use TOON's mixed array format:
573573-574574-```toon
575575-[5]:
576576- - headers:
577577- control: snapshot-start
578578- offset: 100
579579- - type: user
580580- key: "user:1"
581581- value:
582582- name: Alice
583583- headers:
584584- operation: insert
585585- - type: user
586586- key: user:2
587587- value:
588588- name: Bob
589589- headers:
590590- operation: insert
591591- - type: config
592592- key: theme
593593- value: dark
594594- headers:
595595- operation: insert
596596- - headers:
597597- control: snapshot-end
598598- offset: 104
599599-```
600600-601601----
602602-603603-## 5. Encoding Guidelines
604604-605605-### 5.1. When to Use Each Format
606606-607607-| Format | Use When | Example |
608608-|--------|----------|---------|
609609-| **TOON Tabular** | High-frequency, homogeneous messages (>1/second) | Bulk user imports, presence heartbeats, flag updates |
610610-| **TOON Object** | Low-frequency or heterogeneous messages | Mixed entity types, single operations, debugging |
611611-| **JSON** | Interoperability with non-TOON clients | Legacy system integration, debugging tools |
612612-613613-### 5.2. Key Folding (OPTIONAL)
614614-615615-TOON v3.0 supports key folding for single-key object chains. This is **OPTIONAL** but can reduce indentation overhead:
616616-617617-**Without folding:**
618618-619619-```toon
620620-headers:
621621- operation: insert
622622- timestamp: "2025-01-15T10:30:00Z"
623623- txid: tx-001
624624-```
625625-626626-**With folding:**
627627-628628-```toon
629629-headers.operation: insert
630630-headers.timestamp: "2025-01-15T10:30:00Z"
631631-headers.txid: tx-001
632632-```
633633-634634-Key folding is **NOT RECOMMENDED** for tabular arrays—use dot notation in headers instead.
635635-636636-### 5.3. Quoting Rules
637637-638638-Per TOON spec [TOON-SPEC], strings require quotes only when:
639639-640640-- Empty string (`""`)
641641-- Leading or trailing whitespace
642642-- Literally equals `true`, `false`, or `null`
643643-- Looks like a number (e.g., `42`, `3.14`)
644644-- Contains special characters: `:`, `"`, `\`, `[`, `]`, `{`, `}`
645645-- Contains the active delimiter (comma, tab, or pipe)
646646-647647-**Valid unquoted:**
648648-649649-```toon
650650-name: Alice
651651-email: alice@example.com
652652-key: user-123
653653-status: active
654654-```
655655-656656-**Must quote:**
657657-658658-```toon
659659-name: "Alice \"nickname\""
660660-key: "user:123"
661661-status: "true"
662662-count: "42"
663663-description: "Hello, world!"
664664-```
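
The quoting rules listed in this section can be approximated with a small predicate. This is a hedged sketch that covers only the conditions named above; the function names `needsQuotes` and `quoteIfNeeded` are hypothetical, and escaping of control characters such as newlines is deliberately left out.

```javascript
// Return true when a string value must be quoted under the rules in this section.
function needsQuotes(value, delimiter = ',') {
  if (value === '') return true;                         // empty string
  if (value !== value.trim()) return true;               // leading/trailing whitespace
  if (value === 'true' || value === 'false' || value === 'null') return true;
  if (/^-?\d+(\.\d+)?([eE][+-]?\d+)?$/.test(value)) return true; // looks like a number
  if (/[:"\\\[\]{}]/.test(value)) return true;           // special characters
  if (value.includes(delimiter)) return true;            // active delimiter
  return false;
}

// Quote a value only when required, escaping backslashes and double quotes.
function quoteIfNeeded(value, delimiter = ',') {
  if (!needsQuotes(value, delimiter)) return value;
  return `"${value.replace(/\\/g, '\\\\').replace(/"/g, '\\"')}"`;
}
```

Note that the delimiter is a parameter: `Hello, world!` needs quotes under the default comma delimiter but not under a tab delimiter, which is the effect described in Section 4.2.5.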
665665-666666----
667667-668668-## 6. Decoding Requirements
669669-670670-### 6.1. Tabular Array Expansion
671671-672672-Decoders **MUST** expand tabular arrays into individual change message objects:
673673-674674-**Input:**
675675-676676-```toon
677677-users[2]{type,key,value.name,headers.operation}:
678678- user,"1",Alice,insert
679679- user,"2",Bob,insert
680680-```
681681-682682-**Decoded JSON equivalent:**
683683-684684-```json
685685-[
686686- {"type":"user","key":"1","value":{"name":"Alice"},"headers":{"operation":"insert"}},
687687- {"type":"user","key":"2","value":{"name":"Bob"},"headers":{"operation":"insert"}}
688688-]
689689-```
690690-691691-### 6.2. Dot-Notation Unflattening
692692-693693-Dotted field names in tabular headers **MUST** be unflattened to nested objects:
694694-695695-| Tabular Column | Decoded JSON Path | Resulting Structure |
696696-|----------------|-------------------|-------------------|
697697-| `value.name` | `value.name` | `{"value": {"name": ...}}` |
698698-| `value.address.city` | `value.address.city` | `{"value": {"address": {"city": ...}}}` |
699699-| `headers.operation` | `headers.operation` | `{"headers": {"operation": ...}}` |
700700-| `headers.txid` | `headers.txid` | `{"headers": {"txid": ...}}` |
701701-| `old_value.email` | `old_value.email` | `{"old_value": {"email": ...}}` |
702702-703703-### 6.3. Empty Cell Handling
704704-705705-Empty cells in tabular rows **MUST** be interpreted as omitted fields, NOT as empty strings or null:
706706-707707-**Input:**
708708-709709-```toon
710710-deletes[2]{type,key,value.name,headers.operation}:
711711- user,"1",,delete
712712- user,"2",,delete
713713-```
714714-715715-**Decoded:**
716716-717717-```json
718718-[
719719- {"type":"user","key":"1","headers":{"operation":"delete"}},
720720- {"type":"user","key":"2","headers":{"operation":"delete"}}
721721-]
722722-```
723723-724724-Note: `value` is absent, matching the semantics of delete operations where `value` may be omitted.
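
Sections 6.2 and 6.3 together can be sketched as one row-expansion step. The helper below is illustrative only: `expandRow` is a hypothetical name, and it assumes a tokenizer has already split the row and typed each cell, passing `undefined` for an empty cell so that it stays distinguishable from a quoted empty string.

```javascript
// Expand one tabular row into a nested change message.
// `columns` come from the tabular header; `cells` is one already-split row.
function expandRow(columns, cells) {
  if (cells.length !== columns.length) {
    throw new Error(`expected ${columns.length} cells, got ${cells.length}`);
  }
  const message = {};
  columns.forEach((column, i) => {
    const cell = cells[i];
    if (cell === undefined) return; // empty cell: omit the field entirely
    const parts = column.split('.');
    let target = message;
    // Walk (and create) intermediate objects for every path segment but the last.
    for (const part of parts.slice(0, -1)) {
      if (typeof target[part] !== 'object' || target[part] === null) {
        target[part] = {};
      }
      target = target[part];
    }
    target[parts[parts.length - 1]] = cell;
  });
  return message;
}
```

Because intermediate objects are created only when a non-empty cell needs them, a row whose `value.*` cells are all empty produces a message with no `value` key at all.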
725725-726726-### 6.4. Length Validation
727727-728728-Per TOON spec, the `[N]` length declaration enables validation. Decoders **MUST**:
729729-730730-1. Read the declared length from the array header
731731-2. Track the actual number of rows parsed
732732-3. Verify that actual count matches declared length
733733-734734-**Mismatch Handling:** If the actual row count does not match the declared `[N]`, this indicates truncation or corruption. Decoders **SHOULD**:
735735-736736-- Reject the entire batch
737737-- Log an error with mismatch details
738738-- Request replay from the last known good offset
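
A minimal sketch of the length check follows. It handles only comma-delimited headers with unquoted field names; `parseTabularHeader` and `checkBatch` are hypothetical names, and the shape of the returned result is an assumption of this example.

```javascript
// Parse a comma-delimited tabular header such as
// "users[2]{type,key,value.name,headers.operation}:".
function parseTabularHeader(line) {
  const match = /^(\w*)\[(\d+)\]\{([^}]*)\}:$/.exec(line.trim());
  if (!match) throw new Error(`not a comma-delimited tabular header: ${line}`);
  return { name: match[1], length: Number(match[2]), columns: match[3].split(',') };
}

// Compare the declared [N] with the number of rows actually parsed.
function checkBatch(headerLine, rowLines) {
  const { length, columns } = parseTabularHeader(headerLine);
  if (rowLines.length !== length) {
    return { ok: false, reason: `declared ${length} rows, parsed ${rowLines.length}` };
  }
  return { ok: true, columns };
}
```

A caller would log `reason` and request a replay from the last known good offset when `ok` is false.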
739739-740740-### 6.5. Streaming Considerations
741741-742742-For large tabular arrays, decoders **SHOULD**:
743743-744744-1. Read the array header to get `[N]` and field declarations
745745-2. Pre-allocate row buffers with enforced size limits (see [Section 9.3](#93-resource-limits))
746746-3. Parse rows sequentially, validating field count per row
747747-4. Decode each row into a change message
748748-5. **Apply each message immediately** to materialized state
749749-6. Verify actual row count matches `[N]` at the end
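750750-751751-**Important:** Do not wait for the entire array to be received before processing. This reduces latency and memory usage. Because rows are applied before the final count check in step 6, a streaming decoder cannot simply discard a batch on a length mismatch; it should treat the mismatch as a signal to replay from the last known good offset (see Section 6.4).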
752752-753753-### 6.6. Error Handling
754754-755755-| Error Condition | Recommended Action |
756756-|-----------------|-------------------|
757757-| Row count ≠ `[N]` | Reject batch, log error, request replay from last good offset |
758758-| Wrong field count in row | Skip row, log error, continue processing |
759759-| Unparseable value in cell | Skip row, log error, continue processing |
760760-| Array size exceeds limit | Reject batch, close connection, log security event |
761761-762762----
763763-764764-## 7. State Materialization
765765-766766-State materialization follows [STATE-PROTOCOL Section 6](https://github.com/durable-streams/durable-streams/blob/main/packages/state/STATE-PROTOCOL.md#6-state-materialization) exactly. The encoding format (JSON vs TOON) is transparent to materialization logic.
767767-768768-### 7.1. Materialization Process
769769-770770-```mermaid
771771-flowchart LR
772772- subgraph Input["TOON Input"]
773773- T["users[3]{type,key,value.name}:\n user,1,Alice\n user,2,Bob\n user,3,Carol"]
774774- end
775775-776776- subgraph Decode["TOON Decoder"]
777777- D["Expand to 3 change messages:\n• insert(user:1, {name:Alice})\n• insert(user:2, {name:Bob})\n• insert(user:3, {name:Carol})"]
778778- end
779779-780780- subgraph Materialize["Materialized State"]
781781- S["user:\n '1' → {name:'Alice'}\n '2' → {name:'Bob'}\n '3' → {name:'Carol'}"]
782782- end
783783-784784- Input --> Decode
785785- Decode --> Materialize
786786-787787- style Input fill:#e6f3ff
788788- style Decode fill:#fff4e6
789789- style Materialize fill:#e6ffe6
790790-```
791791-792792-**Figure 3:** TOON-encoded batches decode to the same change messages as JSON, producing identical materialized state.
793793-794794-### 7.2. Materialization Rules
795795-796796-Clients materialize state by applying decoded change messages sequentially:
797797-798798-1. **Process messages in stream order** (as received)
799799-2. **For change messages:**
800800- - `insert` operations: Store the entity at `type`/`key`
801801- - `update` operations: Replace the entity at `type`/`key`
802802- - `delete` operations: Remove the entity at `type`/`key`
803803-3. **For control messages:**
804804- - Handle according to application logic (e.g., clear state on `reset`, use snapshot boundaries for consistency checks)
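
The rules above can be sketched with nested in-memory maps, mirroring the `type` → `key` layout of Figure 3. This is one possible shape, not a prescribed one: `applyMessage` is a hypothetical name, clearing state on `reset` follows the example policy mentioned above, and ignoring other control messages is an assumption.

```javascript
// Apply one decoded message to `state`, a Map of type -> Map of key -> value.
function applyMessage(state, message) {
  const headers = message.headers ?? {};
  if (headers.control !== undefined) {
    if (headers.control === 'reset') state.clear(); // example policy: clear on reset
    return; // other control messages are left to application logic
  }
  let entities = state.get(message.type);
  if (!entities) {
    entities = new Map();
    state.set(message.type, entities);
  }
  switch (headers.operation) {
    case 'insert':
    case 'update':
      entities.set(message.key, message.value); // store or replace the entity
      break;
    case 'delete':
      entities.delete(message.key);
      break;
    default:
      throw new Error(`unknown operation: ${headers.operation}`);
  }
}
```

The function never inspects how the message was encoded, which is the point of format-agnostic materialization: the same code serves messages decoded from JSON or from TOON.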
805805-806806-### 7.3. Storage Independence
807807-808808-The protocol does not prescribe how state is stored. Implementations **MAY** use:
809809-810810-- In-memory maps (for simple cases)
811811-- IndexedDB (for browser persistence)
812812-- SQLite (for local databases)
813813-- TanStack DB collections (for query interfaces)
814814-- Custom storage backends
815815-816816-The choice of serialization format (JSON vs TOON) for the protocol **does not affect** storage decisions—storage remains in whatever format the implementation chooses.
817817-818818----
819819-820820-## 8. Schema Validation
821821-822822-Schema validation operates on decoded values, not TOON encoding itself. Implementations **MAY** validate using Standard Schema [STANDARD-SCHEMA] per [STATE-PROTOCOL Section 7](https://github.com/durable-streams/durable-streams/blob/main/packages/state/STATE-PROTOCOL.md#7-schema-validation).
823823-824824-### 8.1. Validation Process
825825-826826-1. **Decode TOON** to in-memory representation (same as JSON)
827827-2. **Validate** each change message's `value` and `old_value` against entity type schema
828828-3. **Apply** only valid messages to materialized state
829829-4. **Handle** invalid messages per implementation policy (reject, log, or quarantine)
830830-831831-### 8.2. TOON-Specific Validation Considerations
832832-833833-**Array length validation:** TOON's `[N]` syntax enables early detection of truncated arrays before schema validation.
834834-835835-**Type preservation:** TOON preserves the JSON data model exactly. Schema validation is semantically equivalent regardless of encoding.
836836-837837-**Quoting rules:** Minimal quoting does not affect validation—decoded strings are identical to their JSON equivalents.
838838-839839-**Validation example:**
840840-841841-**Tabular TOON Input:**
842842-843843-```toon
844844-users[3]{type,key,value.name,value.age,headers.operation}:
845845- user,user:1,Alice,30,insert
846846- user,user:2,Bob,25,insert
847847- user,user:3,Charlie,35,insert
848848-```
849849-850850-**After decoding**, validation proceeds as with JSON:
851851-852852-```javascript
853853-import { z } from 'zod'
854854-855855-const userSchema = z.object({
856856- name: z.string(),
857857- age: z.number().int().positive()
858858-})
859859-860860-// Validate each row's value object
861861-userSchema.parse({ name: "Alice", age: 30 }) // ✓ Valid
862862-userSchema.parse({ name: "Bob", age: 25 }) // ✓ Valid
863863-userSchema.parse({ name: "Charlie", age: 35 }) // ✓ Valid
864864-```
865865-866866-The protocol does not require schema validation, but implementations **SHOULD** provide validation capabilities for production use.
867867-868868----
869869-870870-## 9. Security Considerations
871871-872872-### 9.1. Parser Security
873873-874874-TOON parsers **MUST** implement the same security validations as JSON parsers:
875875-876876-- **Depth limits** to prevent stack overflow (recommended: 32 levels maximum)
877877-- **Size limits** to prevent memory exhaustion (recommended: 10 MB document maximum)
878878-- **UTF-8 validation** on all string inputs
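879879- - **Cycle detection** when encoding in-memory object graphs, to prevent infinite loops (the TOON data model, like JSON, has no reference syntax, so a decoded document cannot itself be cyclic)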
880880-881881-### 9.2. Length Declaration Attacks
882882-883883-Malicious payloads may declare large `[N]` lengths without providing rows. Decoders **SHOULD**:
884884-885885-- Stream-process rows without pre-allocating based on declared length
886886-- Enforce maximum array size limits (see [Section 9.3](#93-resource-limits))
887887- - Time out or abort if rows are not received within a reasonable time
888888-- Validate actual row count matches declaration
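A decoder can treat `[N]` as a claim to verify rather than a size to allocate. The sketch below illustrates that idea only: `parseHeader` and `readRows` are hypothetical helpers, the header grammar is simplified (comma delimiter, no quoted values), and timeouts are left to the transport layer.

```typescript
// Recommended row limit from Section 9.3; adjust per deployment.
const MAX_ROWS = 10000

interface TabularHeader {
  name: string
  declared: number
  fields: string[]
}

// Parses a header such as `users[3]{type,key}:` (comma delimiter only).
function parseHeader(line: string): TabularHeader {
  const match = /^([^\s\[\]{}]+)\[(\d+)\]\{([^}]*)\}:$/.exec(line.trim())
  if (!match) throw new Error(`not a tabular header: ${line}`)
  const declared = Number(match[2])
  // Reject oversized declarations before reading a single row.
  if (!Number.isSafeInteger(declared) || declared > MAX_ROWS) {
    throw new Error(`declared length ${match[2]} exceeds limit ${MAX_ROWS}`)
  }
  return { name: match[1], declared, fields: match[3].split(',') }
}

// Yields one row at a time; no buffer is sized from the declared length.
function* readRows(lines: Iterable<string>): Generator<string[]> {
  let header: TabularHeader | undefined
  let seen = 0
  for (const line of lines) {
    if (!header) {
      header = parseHeader(line)
      continue
    }
    seen += 1
    if (seen > header.declared) {
      throw new Error(`more rows than the declared ${header.declared}`)
    }
    const cells = line.trim().split(',')
    if (cells.length !== header.fields.length) {
      throw new Error(`row ${seen} has ${cells.length} cells, expected ${header.fields.length}`)
    }
    yield cells
  }
  if (!header) throw new Error('missing tabular header')
  // A short table means the stream was truncated or the declaration was false.
  if (seen !== header.declared) {
    throw new Error(`declared ${header.declared} rows, received ${seen}`)
  }
}
```

Memory use here is bounded by the rows actually received, so a payload that declares a huge length and sends nothing costs the decoder only the header parse.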
889889-890890-### 9.3. Resource Limits
891891-892892-When parsing tabular arrays, implementations **MUST** enforce a limit on each of the following resources. The values shown are recommended defaults:
893893-894894-| Resource | Recommended Limit | Rationale |
895895-|-----------|-------------------|------------|
896896-| Array size `[N]` | 10,000 rows | Prevent memory exhaustion from large declarations |
897897-| Row length | 64 KB | Prevent buffer overflow attacks |
898898-| Nesting depth | 32 levels | Prevent stack overflow |
899899-| Document size | 10 MB | Prevent memory exhaustion |
900900-| Field count per row | 256 fields | Prevent parsing complexity attacks |
901901-902902-**Streaming Parsing:** Implementations **SHOULD** use streaming parsers that process data incrementally rather than loading entire documents into memory.
903903-904904-### 9.4. Type and Key Validation
905905-906906-Per [STATE-PROTOCOL Section 9.4](https://github.com/durable-streams/durable-streams/blob/main/packages/state/STATE-PROTOCOL.md#94-type-and-key-validation), implementations **SHOULD** validate that `type` and `key` fields contain only expected values to prevent injection of unauthorized entity types or keys.
907907-908908-This validation is performed after decoding, regardless of JSON vs TOON encoding.
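One way to express such a check is an allowlist of entity types plus a conservative key pattern. The set contents, the pattern, and the name `isAuthorized` below are illustrative assumptions, not values taken from either specification.

```typescript
// Hypothetical policy: the entity types this client accepts.
const ALLOWED_TYPES: ReadonlySet<string> = new Set(['user', 'presence'])

// Hypothetical key policy: 1 to 256 characters from a conservative alphabet.
const KEY_PATTERN = /^[A-Za-z0-9_.:-]{1,256}$/

// Runs on decoded messages, so it is identical for JSON and TOON input.
function isAuthorized(msg: { type: string; key: string }): boolean {
  return ALLOWED_TYPES.has(msg.type) && KEY_PATTERN.test(msg.key)
}
```

Using a `Set` rather than a plain object lookup avoids accidental matches on inherited property names such as `constructor`.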
909909-910910----
911911-912912-## 10. IANA Considerations
913913-914914-Per [TOON-SPEC Section 18.2](https://github.com/toon-format/spec#182-provisional-media-type), the provisional media type is:
915915-916916-**Media Type:** `text/toon`
917917-918918-| Field | Value |
919919-|-------|-------|
920920-| **Type name** | text |
921921-| **Subtype name** | toon |
922922-| **Required parameters** | None |
923923-| **Optional parameters** | `charset` (default UTF-8) |
924924-| **Encoding considerations** | 8-bit, UTF-8 encoded text with LF line endings |
925925-| **Security considerations** | See Section 9 (Security Considerations) |
926926-| **Interoperability** | Semantically equivalent to application/json when decoded |
927927-| **Published specification** | This document and [TOON-SPEC] |
928928-| **Applications** | Real-time state synchronization, durable streams, bandwidth-constrained systems requiring JSON semantics |
929929-| **Intended usage** | COMMON (upon standardization) |
930930-| **Restrictions on usage** | None |
931931-| **Author** | DuraTOON working group |
932932-| **Change controller** | ElectricSQL |
933933-| **Additional information** | File extension: .toon, Macintosh file type code: TEXT |
934934-935935-**Note:** The `text/toon` media type is provisional. Implementers should monitor [TOON-SPEC](https://github.com/toon-format/spec) for updates and formal IANA registration.
936936-937937----
938938-939939-## 11. References
940940-941941-### 11.1. Normative References
942942-943943-**[STATE-PROTOCOL]**
944944-Durable Streams State Protocol. ElectricSQL, 2025.
945945-<https://github.com/durable-streams/durable-streams/blob/main/packages/state/STATE-PROTOCOL.md>
946946-947947-**[PROTOCOL]**
948948-Durable Streams Protocol. ElectricSQL, 2025.
949949-<https://github.com/durable-streams/durable-streams/blob/main/PROTOCOL.md>
950950-951951-**[TOON-SPEC]**
952952-TOON (Token-Oriented Object Notation) Specification v3.0. Johann Schopplich, 2025.
953953-<https://github.com/toon-format/spec>
954954-955955-**[RFC2119]**
956956-Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.
957957-<https://www.rfc-editor.org/info/rfc2119>
958958-959959-**[RFC3339]**
960960-Klyne, G. and C. Newman, "Date and Time on the Internet: Timestamps", RFC 3339, DOI 10.17487/RFC3339, July 2002.
961961-<https://www.rfc-editor.org/info/rfc3339>
962962-963963-**[RFC6839]**
964964-Hansen, T. and A. Melnikov, "Additional Media Type Structured Syntax Suffixes", RFC 6839, DOI 10.17487/RFC6839, January 2013.
965965-<https://www.rfc-editor.org/info/rfc6839>
966966-967967-**[RFC8174]**
968968-Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017.
969969-<https://www.rfc-editor.org/info/rfc8174>
970970-971971-**[RFC8259]**
972972-Bray, T., Ed., "The JavaScript Object Notation (JSON) Data Interchange Format", STD 90, RFC 8259, DOI 10.17487/RFC8259, December 2017.
973973-<https://www.rfc-editor.org/info/rfc8259>
974974-975975-**[STANDARD-SCHEMA]**
976976-Standard Schema Specification.
977977-<https://github.com/standard-schema/spec>
978978-979979-### 11.2. Informative References
980980-981981-**[JSON-SCHEMA]**
982982-Wright, A., Andrews, H., and B. Hutton, "JSON Schema: A Media Type for Describing JSON Documents", draft-wright-json-schema-00 (work in progress).
983983-984984-**[TOON-WEBSITE]**
985985-TOON Format Official Website.
986986-<https://toonformat.dev>
987987-988988-**[TOON-RUST]**
989989-toon-rs: Rust implementation of TOON format. Jimmy Stridh, 2025.
990990-<https://github.com/jimmystridh/toon-rs>
991991-992992----
993993-994994-## Appendix A: Payload Reduction Analysis
995995-996996-### A.1. Single Message Component Breakdown
997997-998998-| Component | JSON | TOON | Reduction |
999999-|-----------|------|------|-----------|
10001000-| Field name quotes (e.g., `"type":`) | 7 chars | 5 chars | 29% |
10011001-| String value quotes (e.g., `"user"`) | 6 chars | 4 chars | 33% |
10021002-| Object brace start (`"value":{`) | 9 chars | 6 chars | 33% |
10031003-| Object brace end (`}`) | 1 char | 0 chars | 100% |
10041004-| Array brackets (`[` and `]`) | 2 chars | 4 chars* | -100%* |
10051005-| Commas between fields | 5 × 2 = 10 chars | 0 chars | 100% |
10061006-| **Total** | **~145 bytes** | **~80 bytes** | **~45%** |
10071007-10081008-\* TOON's `[N]:` syntax costs 4 chars where JSON's brackets cost 2, but the declared length enables truncation checks, and the header is paid once per batch rather than once per message.
10091009-10101010-### A.2. Scaling with Batch Size
10111011-10121012-| Batch Size | JSON | TOON Single | TOON Tabular | Best Reduction |
10131013-|------------|------|-------------|--------------|----------------|
10141014-| 1 message | 145 B | 80 B | 80 B | 45% |
10151015-| 10 messages | 1,450 B | 800 B | 500 B | 65% |
10161016-| 100 messages | 14,500 B | 8,000 B | 4,200 B | 71% |
10171017-| 1,000 messages | 145 KB | 80 KB | 40 KB | 72% |
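The reduction figures in these tables follow the usual relative-savings formula, (JSON size - TOON size) / JSON size. A small helper makes the arithmetic explicit; `reductionPercent` is a hypothetical name, and the sizes passed to it are the approximate byte counts from the table above, not new measurements.

```typescript
// Relative size reduction in percent, rounded to one decimal place.
function reductionPercent(jsonBytes: number, toonBytes: number): number {
  if (jsonBytes <= 0) throw new RangeError('jsonBytes must be positive')
  return Math.round(((jsonBytes - toonBytes) / jsonBytes) * 1000) / 10
}

// Single message: (145 - 80) / 145
console.log(reductionPercent(145, 80)) // 44.8

// 100-message batch, tabular: (14500 - 4200) / 14500
console.log(reductionPercent(14500, 4200)) // 71
```

The tabular figure improves with batch size because the header cost is fixed while the per-row saving repeats for every message.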
10181018-10191019-### A.3. Snapshot Export Analysis
10201020-10211021-| Entity Type | JSON | TOON | Reduction |
10221022-|-------------|------|------|-----------|
10231023-| Users (100 records) | ~30 KB | ~11 KB | **63%** |
10241024-| Messages (500 records) | ~150 KB | ~52 KB | **65%** |
10251025-| Reactions (200 records) | ~60 KB | ~21 KB | **65%** |
10261026-| Snapshot metadata | ~200 bytes | ~120 bytes | **40%** |
10271027-| **Total** | **~240 KB** | **~84 KB** | **65%** |
10281028-10291029-### A.4. Real-World Bandwidth Impact
10301030-10311031-| Application | Message rate | JSON Bandwidth/hour | TOON Bandwidth/hour | Monthly Savings (720 h) |
10321032-|-------------|--------------|---------------------|---------------------|-------------------------|
10331033-| Chat (1M users) | 10/sec avg | ~2 GB | ~0.7 GB | ~940 GB |
10341034-| Presence tracking | 5 updates/sec | ~1 GB | ~0.4 GB | ~430 GB |
10351035-| Feature flags | 1/min | ~10 MB | ~4 MB | ~4.3 GB |
10361036-| Collaborative editing | 20/sec | ~4 GB | ~1.5 GB | ~1.8 TB |
10371037-10381038----
10391039-10401040-## Appendix B: Implementation Guidelines
10411041-10421042-### B.1. Rust Implementation
10431043-10441044-**Recommended crate:** `toon` (toon-rs) with performance features
10451045-10461046-```toml
10471047-[dependencies]
10481048-toon = { version = "0.1", features = ["de_direct", "perf_memchr", "perf_smallvec"] }
10491049-```
10501050-10511051-**Example: Encoding and Decoding**
10521052-10531053-```rust
10541054-use toon::{Options, Delimiter};
10551055-10561056-let opts = Options {
10571057- delimiter: Delimiter::Comma, // or Tab, Pipe
10581058- ..Options::default()
10591059-};
10601060-10611061-// Encode a batch of changes
10621062-let toon = toon::encode_to_string(&changes_batch, &opts)?;
10631063-10641064-// Decode a batch
10651065-let changes: Vec<ChangeMessage> = toon::decode_from_str(&toon, &opts)?;
10661066-```
10671067-10681068-### B.2. TypeScript/JavaScript Implementation
10691069-10701070-```bash
10711071-npm install @toon-format/toon
10721072-```
10731073-10741074-```typescript
10751075-import { decode, encode } from '@toon-format/toon'
10761076-10771077-// Encode a batch
10781078-const toon = encode(changesBatch, { delimiter: ',' })
10791079-10801080-// Decode a batch
10811081-const changes = decode(toon)
10821082-```
10831083-10841084-### B.3. Content Negotiation Implementation
10851085-10861086-**Server (Express.js):**
10871087-10881088-```typescript
10891089-import { encode } from '@toon-format/toon'
10901090-10911091-app.get('/stream/:id', (req, res) => {
10921092- // res.format() negotiates on the Accept header and replies 406 when neither type is acceptable
10931093-10941094- res.format({
10951095- 'text/toon': () => {
10961096- res.type('text/toon')
10971097- res.send(encode(messages, { delimiter: ',' }))
10981098- },
10991099- 'application/json': () => {
11001100- res.json(messages)
11011101- }
11021102- })
11031103-})
11041104-```
11051105-11061106-**Client:**
11071107-11081108-```typescript
11091109-const stream = await DurableStream.create({
11101110- url: 'https://server.com/v1/stream/my-app',
11111111- contentType: 'text/toon' // Request TOON
11121112-})
11131113-11141114-stream.subscribeText(async (batch) => {
11151115- const messages = decode(batch) // TOON decoder from '@toon-format/toon'
11161116- for (const msg of messages) {
11171117- state.apply(msg)
11181118- }
11191119-})
11201120-```
11211121-11221122-### B.4. Streaming Parser Guidelines
11231123-11241124-For optimal performance with large tabular arrays:
11251125-11261126-1. **Read incrementally**: Process data as it arrives, don't buffer entire documents
11271127-2. **Validate early**: Check `[N]` and field count as you parse
11281128-3. **Apply immediately**: Don't wait for entire batch before materializing
11291129-4. **Use streaming APIs**: Leverage native streaming where available (Node.js streams, Rust `Read` traits)
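Guidelines 1 and 2 can be illustrated with a splitter that turns arbitrarily chunked text into complete lines and rejects oversized rows as soon as they appear. `LineSplitter` is a hypothetical helper written for this sketch; it measures row length in string characters as an approximation of the 64 KB limit from Section 9.3.

```typescript
// Approximates the recommended 64 KB row limit in string characters.
const MAX_ROW_LENGTH = 64 * 1024

class LineSplitter {
  private pending = ''

  // Accepts the next chunk and returns every line it completes.
  push(chunk: string): string[] {
    const parts = (this.pending + chunk).split('\n')
    // The last element is an unfinished line, or '' if the chunk ended on '\n'.
    this.pending = parts.pop() ?? ''
    for (const part of [...parts, this.pending]) {
      if (part.length > MAX_ROW_LENGTH) throw new Error('row exceeds length limit')
    }
    return parts
  }

  // Call at end of stream to obtain a final line that had no trailing newline.
  flush(): string[] {
    const rest = this.pending
    this.pending = ''
    return rest === '' ? [] : [rest]
  }
}
```

Each returned line can be handed to the row parser and applied to state immediately, so memory use is bounded by one row plus one chunk rather than by the whole document.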
11301130-11311131----
11321132-11331133-## Appendix C: Migration Guide
11341134-11351135-### C.1. Server-Side Migration
11361136-11371137-**Step 1: Add TOON encoding capability**
11381138-11391139-```typescript
11401140-import { encode } from '@toon-format/toon'
11411141-11421142-function encodeMessages(
11431143- messages: ChangeMessage[],
11441144- format: 'json' | 'toon'
11451145-): string {
11461146- if (format === 'toon') {
11471147- return encode(messages, { delimiter: ',' })
11481148- }
11491149- return JSON.stringify(messages)
11501150-}
11511151-```
11521152-11531153-**Step 2: Support content negotiation**
11541154-11551155-```typescript
11561156-app.get('/stream/:id', (req, res) => {
11571157- // res.format() negotiates on the Accept header; encodeMessages() is the helper from Step 1
11581158-11591159- res.format({
11601160- 'text/toon': () => {
11611161- res.type('text/toon')
11621162- res.send(encodeMessages(messages, 'toon'))
11631163- },
11641164- 'application/json': () => {
11651165- res.send(encodeMessages(messages, 'json'))
11661166- }
11671167- })
11681168-})
11691169-```
11701170-11711171-### C.2. Client-Side Migration
11721172-11731173-```typescript
11741174-import { decode } from '@toon-format/toon'
11751175-11761176-const stream = await DurableStream.create({
11771177- url: 'https://server.com/v1/stream/my-app',
11781178- contentType: 'text/toon' // New: TOON
11791179- // contentType: 'application/json' // Old: JSON
11801180-})
11811181-11821182-stream.subscribeText(async (batch) => {
11831183- const messages = decode(batch) // TOON decoder
11841184- for (const msg of messages) {
11851185- state.apply(msg)
11861186- }
11871187-})
11881188-```
11891189-11901190-### C.3. Phased Rollout Strategy
11911191-11921192-| Phase | Action | Risk | Duration |
11931193-|-------|--------|-------|----------|
11941194-| **Phase 1** | Add TOON support, keep JSON as default | Low | 1-2 weeks |
11951195-| **Phase 2** | Roll out TOON clients gradually (10% → 50% → 100%) | Medium | 2-4 weeks |
11961196-| **Phase 3** | Make TOON default, JSON as fallback | Low | 1 week |
11971197-| **Phase 4** | Deprecate JSON (future version) | High | TBD |
11981198-11991199-### C.4. A/B Testing Metrics
12001200-12011201-| Metric | Measurement | Expected Change |
12021202-|--------|--------------|----------------|
12031203-| **Bandwidth** | Bytes transmitted per message | 40-60% reduction |
12041204-| **Encoding latency** | Time to encode a batch | 10-20% increase |
12051205-| **Decoding latency** | Time to decode a batch | 5-10% decrease |
12061206-| **Error rates** | Decode failures per 1M messages | No change |
12071207-| **Memory usage** | Peak memory allocation | 20-30% reduction |
12081208-| **CPU usage** | CPU time per message | ±5% (depends on implementation) |
12091209-12101210----
12111211-12121212-## Copyright Notice
12131213-12141214-Copyright (c) 2025 rektide
12151215-12161216-This document extends the Durable Streams State Protocol specification. The base specification is available at <https://github.com/durable-streams/durable-streams/blob/main/packages/state/STATE-PROTOCOL.md>.
12171217-12181218-This document and information contained herein are provided on an "AS IS" basis. rektide disclaims all warranties, express or implied, including but not limited to any warranty that the use of the information herein will not infringe any rights or any implied warranties of merchantability or fitness for a particular purpose.