Rewrite subtable section · corporate.fm/hobbes@8089388

+21 -29

1 changed file

rfds/001_xks_storage_engine.md

@@ -250,45 +250,37 @@
 
 ### Subtables
 
-Each sub consists of key/value pairs in sorted order followed by their indexes and then a trailer indicating the number of pairs.
+A subtable ("sub") is a subset of a table which contains a range of key/value pairs.
+Subs are limited to 64 KiB in size, but could be smaller due to compression
+(compression is future work).
 
-Subs target a particular max size (64 KiB) but are intentionally variable in length to allow for efficient compression
-(compression is future work and not specified in this document).
+The key/value pairs are written contiguously in the following format:
 
 ```
-# Key/value pair:
+# Key/Value Pair
 key (variable) | version (8 bytes) | type (1 byte) | value (variable)
-
-# Key index:
-offset (3 bytes) | key length (2 bytes)
-
-# Trailer
-pair_count (2 bytes)
 ```
 
 The `key` and `value` are self-explanatory.
-
-The `version` is the 8-byte database version (we may be able to compress these in the future).
-
-The `type` is either `0` for an insert or `1` for a tombstone.
-Note that for a tombstone the value would naturally be 0 bytes in length.
-
-The `offset`s and `length`s of the keys are written consecutively after the pair data.
-Note that the lengths of the values are stored implicitly as a value takes up all remaining space between `offset`s.
-
-The `pair_count` tells us how many offsets/lengths to expect.
+The `version` is the database version of the key.
 
-The algorithm to perform a lookup within a sub is as follows:
+The `type` is `0` for an insert or `1` for a tombstone.
+Note that for a tombstone the value would naturally be zero bytes in length.
 
-1. Read the sub into memory in full as a single binary term.
+Because keys and values are variable in length,
+the pairs are followed by an array of fixed-length slots:
 
-2. Read the last byte to determine the `pair_count`.
+```
+# Pair Slot
+offset (3 bytes) | key length (2 bytes)
+```
 
-3. Read the offsets/lengths from the range `[byte_length(sub) - ((pair_count * 5) - 2), byte_length(sub) - 2)`.
-   This results in a list `[{offset, key_length}, ...]` with an entry for each pair in the sub.
+Finally, the slot array is followed by a fixed-length sub trailer:
 
-   We could also compute the `value_length` here by looking at the next `offset` for each pair.
+```
+# Subtable Trailer
+pair_count (2 bytes)
+```
 
-4. Binary search the pairs.
-   Using the offsets, we can look up any pair's key at the byte range `[index, index + key_length + 8)`.
-   Note that we *include* the version (8 bytes) in the key when binary searching because this is a multiversion LSM.
+The `pair_count` records the number of pairs in the subtable,
+which is needed to binary search the slot array.
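
Reviewer note: to sanity-check the layout described on the `+` side of this diff (pairs, then 5-byte slots, then a 2-byte trailer), here is a minimal Python sketch of a round trip through that format. It is not taken from the hobbes codebase. The function names `encode_sub` and `decode_sub` are hypothetical, and several details are assumptions that the section does not state: big-endian integers, offsets measured from the start of the sub, and each value running up to the next pair's offset (the last one up to the start of the slot array), as the removed text implied.

```python
SLOT_SIZE = 5       # offset (3 bytes) + key length (2 bytes)
TRAILER_SIZE = 2    # pair_count (2 bytes)
VERSION_SIZE = 8    # version (8 bytes)

def encode_sub(pairs):
    """Encode (key, version, type, value) tuples, assumed to be pre-sorted."""
    body = bytearray()
    slots = bytearray()
    for key, version, typ, value in pairs:
        # Slot: where the pair starts in the sub, and how long its key is.
        slots += len(body).to_bytes(3, "big") + len(key).to_bytes(2, "big")
        # Pair: key | version | type | value (byte order is an assumption).
        body += key + version.to_bytes(VERSION_SIZE, "big") + bytes([typ]) + value
    trailer = len(pairs).to_bytes(TRAILER_SIZE, "big")
    return bytes(body + slots + trailer)

def decode_sub(sub):
    """Decode a sub back into a list of (key, version, type, value) tuples."""
    # The trailer is the last 2 bytes and gives the number of slots.
    pair_count = int.from_bytes(sub[-TRAILER_SIZE:], "big")
    slots_start = len(sub) - TRAILER_SIZE - pair_count * SLOT_SIZE

    slots = []
    for i in range(pair_count):
        s = slots_start + i * SLOT_SIZE
        offset = int.from_bytes(sub[s:s + 3], "big")
        key_len = int.from_bytes(sub[s + 3:s + 5], "big")
        slots.append((offset, key_len))

    pairs = []
    for i, (offset, key_len) in enumerate(slots):
        # Value length is implicit: it ends where the next pair (or the slot array) begins.
        end = slots[i + 1][0] if i + 1 < pair_count else slots_start
        key = sub[offset:offset + key_len]
        p = offset + key_len
        version = int.from_bytes(sub[p:p + VERSION_SIZE], "big")
        typ = sub[p + VERSION_SIZE]
        value = sub[p + VERSION_SIZE + 1:end]
        pairs.append((key, version, typ, value))
    return pairs
```

Because every slot is fixed-length, slot `i` can be read directly at `slots_start + i * SLOT_SIZE`, which is what makes a binary search over the slot array possible without scanning the variable-length pairs. The sketch decodes all slots eagerly only to keep the example short.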
