Add transaction path · corporate.fm/hobbes@bb6e40d

2 changed files, +48


internals/overview.md

```diff
@@ -24,3 +24,50 @@
 Data is split into contiguous (ordered by key) shards and distributed (and replicated) amongst Storage servers.
 If a Storage server fails or becomes unresponsive,
 the Transaction Plane will issue commands to re-replicate its shards to restore fault tolerance.
+
+
+## Transaction Path
+
+Before a transaction is started, the client must be aware of the topology of the cluster,
+particularly the pids of the various servers in the Transaction Plane.
+This information is requested from the Coordinators, as they are the ultimate authority and have static registered names within the cluster.
+Once obtained, the information can be aggressively cached on each node, as it changes rarely (only after a recovery).
+
+### Read version
+
+To begin a transaction, the client needs to get a read version at which all reads will be performed.
+The client chooses a `BeginBuffer` server and sends a `get_read_version()` request.
+
+The `BeginBuffer` batches a set of requests before sending its own request for a read version to the `Sequencer`.
+The `Sequencer` replies with a strictly monotonic read version guaranteed to be higher than any committed version
+(this ensures strict serializability of transactions).
+Additionally, the `BeginBuffer` sends requests to the fellow `TLog` servers from its generation asking whether any of them have been locked,
+which would indicate a recovery taking place.
+If no recovery is in progress or has already occurred, the `BeginBuffer` returns the read version to every client in the batch.
+
+### Performing reads
+
+Before performing a read, the client needs to ask a `CommitBuffer` for information about the relevant shard and where it is stored.
+This shard information can be aggressively cached, as shards are moved rarely.
+If a Storage server turns out to no longer be serving the requested shard (i.e. the cache is stale),
+the client will ask the `CommitBuffer` for updated shard information.
+
+To perform reads, the client sends read requests to `Storage` servers.
+The requests include the read version as well as the Transaction Plane generation,
+which is needed to prevent reads of uncommitted data following a recovery.
+Reads are effectively performed against a consistent snapshot of the database at the read version.
+
+Reads which cross shard boundaries are split and sent to their respective Storage servers,
+and the results are joined.
+
+When reads are performed, the keys or key ranges read are tracked internally as read conflicts.
+If and when the transaction is committed, these read conflicts will be used to perform concurrency control.
+
+### Performing writes
+
+Writes are simply buffered internally so that they can be sent when the transaction is committed.
+
+Write conflicts are also tracked in a manner similar to read conflicts, again to be used for concurrency control.
+This may seem redundant (we already have the writes themselves),
+but in some cases advanced clients may choose to manipulate the write conflicts independently to improve performance or selectively weaken consistency guarantees.
+However, such manipulation is dangerous and not generally recommended.
```
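The per-node topology caching described under Transaction Path could be sketched with Erlang's `:persistent_term`, which is designed for data that is read often and updated rarely. This is a minimal, hypothetical sketch: the module name, the cache key and `fetch_fun` (standing in for the request to the Coordinators) are assumptions, not part of Hobbes.

```elixir
defmodule TopologyCacheSketch do
  # Hypothetical node-local cache for the cluster topology obtained from the
  # Coordinators. `fetch_fun` stands in for that request.
  @key {__MODULE__, :topology}

  # Return the cached topology, fetching and caching it on first use.
  def get(fetch_fun) do
    case :persistent_term.get(@key, nil) do
      nil ->
        topology = fetch_fun.()
        :persistent_term.put(@key, topology)
        topology

      topology ->
        topology
    end
  end

  # Drop the cached topology, e.g. after learning that a recovery happened.
  def invalidate, do: :persistent_term.erase(@key)
end
```

Reads from `:persistent_term` are very cheap, while updating or erasing an entry can trigger a global garbage-collection pass; that trade-off only makes sense because the topology changes rarely (after a recovery).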
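The batching step of the Read version path can be illustrated with a small pure-function sketch: pending requests accumulate, and a single `Sequencer` round trip then answers the whole batch with one read version. `ReadVersionBatchSketch` and `get_version_fun` (standing in for the request to the `Sequencer`) are hypothetical, and the `TLog` lock check is deliberately left out.

```elixir
defmodule ReadVersionBatchSketch do
  # Hypothetical BeginBuffer batching. `pending` is a list of client
  # references (e.g. the `from` of a GenServer.call), newest first.

  # Queue another client's get_read_version() request.
  def enqueue(pending, from), do: [from | pending]

  # One request to the Sequencer (stood in for by `get_version_fun`) serves
  # the whole batch; every client receives the same read version, in
  # arrival order.
  def flush(pending, get_version_fun) do
    read_version = get_version_fun.()
    pending |> Enum.reverse() |> Enum.map(&{&1, read_version})
  end
end
```

In a GenServer, each `{from, read_version}` pair would then be answered with `GenServer.reply/2`. Because every client in a batch shares one version, the `Sequencer` is contacted once per batch rather than once per transaction.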
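Locating a key's shard in the cached shard information, and splitting a read that crosses shard boundaries, might look roughly like the following. This is a hypothetical sketch: the `{start_key, end_key, storage}` layout, half-open ranges, `nil` as an unbounded end, and the module name are all assumptions made for illustration.

```elixir
defmodule ShardMapSketch do
  # Hypothetical cached shard map: a list of {start_key, end_key, storage}
  # entries, sorted by start_key and non-overlapping. Each shard covers the
  # half-open range [start_key, end_key); an end_key of nil means the shard
  # extends to the end of the keyspace. Keys are binaries, compared bytewise.

  # Find the shard (and therefore the Storage server) holding `key`.
  def lookup(shards, key) do
    Enum.find(shards, fn {start_key, end_key, _storage} ->
      key >= start_key and below_end?(key, end_key)
    end)
  end

  # Split a range read [from, to) into one {storage, sub_from, sub_to}
  # request per shard that the range overlaps.
  def split_range(shards, from, to) do
    shards
    |> Enum.filter(fn {start_key, end_key, _storage} ->
      start_key < to and below_end?(from, end_key)
    end)
    |> Enum.map(fn {start_key, end_key, storage} ->
      {storage, max(from, start_key), clamp_end(to, end_key)}
    end)
  end

  # Each shard returns its results in key order and the shards are sorted,
  # so concatenating in shard order yields a key-ordered result.
  def join(results_per_shard), do: Enum.concat(results_per_shard)

  defp below_end?(_key, nil), do: true
  defp below_end?(key, end_key), do: key < end_key

  defp clamp_end(to, nil), do: to
  defp clamp_end(to, end_key), do: min(to, end_key)
end
```

On a stale-cache error from a Storage server, a client would refresh `shards` from a `CommitBuffer` and retry; that retry loop is omitted here.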
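The read- and write-conflict bookkeeping described under Performing reads and Performing writes could look roughly like this client-side sketch. `TxnSketch`, its fields, `commit_payload/1`, and the `read_fun`/`read_range_fun` callbacks (standing in for requests to Storage servers) are hypothetical. Representing a single key `k` as the half-open range `[k, k <> <<0>>)` is an assumption of this sketch, not something the overview specifies.

```elixir
defmodule TxnSketch do
  # Hypothetical client-side transaction state. Conflicts are recorded as
  # half-open key ranges [from, to); a single key `k` is the range
  # [k, k <> <<0>>), since k <> <<0>> is the smallest binary greater than k.
  defstruct read_version: nil, writes: %{}, read_conflicts: [], write_conflicts: []

  def new(read_version), do: %__MODULE__{read_version: read_version}

  # Point read: `read_fun` stands in for the request to a Storage server,
  # made at the transaction's read version. The key becomes a read conflict.
  def get(txn, key, read_fun) do
    value = read_fun.(key, txn.read_version)
    {value, %{txn | read_conflicts: [single_key(key) | txn.read_conflicts]}}
  end

  # Range read: the whole range [from, to) becomes one read conflict.
  def get_range(txn, from, to, read_range_fun) do
    values = read_range_fun.(from, to, txn.read_version)
    {values, %{txn | read_conflicts: [{from, to} | txn.read_conflicts]}}
  end

  # Writes are only buffered locally; nothing is sent until commit.
  def set(txn, key, value) do
    %{txn |
      writes: Map.put(txn.writes, key, value),
      write_conflicts: [single_key(key) | txn.write_conflicts]}
  end

  # What a commit request might carry for concurrency control.
  def commit_payload(txn) do
    %{
      read_version: txn.read_version,
      writes: txn.writes,
      read_conflicts: Enum.reverse(txn.read_conflicts),
      write_conflicts: Enum.reverse(txn.write_conflicts)
    }
  end

  defp single_key(key), do: {key, key <> <<0>>}
end
```

Keeping `write_conflicts` separate from `writes` is what would allow the independent manipulation mentioned above; the sketch deliberately exposes no such operation.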

lib/servers/begin_buffer.ex

```diff
@@ -141,6 +141,7 @@
     # We use the replication_factor for the current generation to do the liveness check
     %TLogGeneration{replication_factor: replication_factor} = hd(state.cluster.tlog_generations)
 
+    # TODO: handle teams
     case length(state.check_locked_reply_ids) >= replication_factor do
       true ->
         read_version = state.batch_read_version
```
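The liveness check in this hunk compares the number of lock-check replies against the current generation's `replication_factor`. Below is a hypothetical sketch of that bookkeeping, assuming each reply id records a `TLog` that answered "not locked". `LockCheckSketch` is an illustrative name. Storing ids in a `MapSet`, so that a duplicated reply is not counted twice, is an assumption and differs from the list used in `begin_buffer.ex`. The teams handling left as a TODO is not attempted.

```elixir
defmodule LockCheckSketch do
  # Hypothetical bookkeeping for the BeginBuffer lock check. `replied` holds
  # the ids of TLogs (current generation) that answered "not locked".

  def new, do: MapSet.new()

  # A TLog reporting itself locked signals a recovery: abandon the batch.
  def record_reply(_replied, _tlog_id, true = _locked?), do: {:error, :locked}
  def record_reply(replied, tlog_id, false = _locked?), do: {:ok, MapSet.put(replied, tlog_id)}

  # Mirrors `length(state.check_locked_reply_ids) >= replication_factor`,
  # but a MapSet ignores duplicate replies from the same TLog.
  def release?(replied, replication_factor), do: MapSet.size(replied) >= replication_factor
end
```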
