Fill in missing cluster database documentation · recaptime.dev/phorge@bab3690

+86 -10

1 changed file

src/docs/user/cluster/cluster_databases.diviner

@@ -6,31 +6,76 @@
 Overview
 ========
 
-WARNING: This feature is a very early prototype; the features this document
-describes are mostly speculative fantasy.
-
 You can deploy Phabricator with multiple database hosts, configured as a master
 and a set of replicas. The advantages of doing this are:
 
   - faster recovery from disasters by promoting a replica;
-  - graceful degradation if the master fails;
-  - reduced load on the master; and
+  - graceful degradation if the master fails; and
   - some tools to help monitor and manage replica health.
 
 This configuration is complex, and many installs do not need to pursue it.
 
-Phabricator can not currently be configured into a multi-master mode, nor can
-it be configured to automatically promote a replica to become the new master.
-
 If you lose the master, Phabricator can degrade automatically into read-only
 mode and remain available, but can not fully recover without operational
 intervention unless the master recovers on its own.
 
+Phabricator will not currently send read traffic to replicas unless the master
+has failed, so configuring a replica will not currently spread any load away
+from the master. Future versions of Phabricator are expected to be able to
+distribute some read traffic to replicas.
+
+Phabricator can not currently be configured into a multi-master mode, nor can
+it be configured to automatically promote a replica to become the new master.
+There are no current plans to support multi-master mode or autonomous failover,
+although this may change in the future.
+
 
 Setting up MySQL Replication
 ============================
 
-TODO: Write this section.
+To begin, set up a replica database server and configure MySQL replication.
+
+If you aren't sure how to do this, refer to the MySQL manual for instructions.
+The MySQL documentation is comprehensive and walks through the steps and
+options in good detail. You should understand MySQL replication before
+deploying it in production: Phabricator layers on top of it, and does not
+attempt to abstract it away.
+
+Some useful notes for configuring replication for Phabricator:
+
+**Binlog Format**: Phabricator issues some queries which MySQL will detect as
+unsafe if you use the `STATEMENT` binlog format (the default). Instead, use
+`MIXED` (recommended) or `ROW` as the `binlog_format`.
+
+**Grant `REPLICATION CLIENT` Privilege**: If you give the user that Phabricator
+will use to connect to the replica database server the `REPLICATION CLIENT`
+privilege, Phabricator's status console can give you more information about
+replica health and state.
+
+**Copying Data to Replicas**: Phabricator currently uses a mixture of MyISAM
+and InnoDB tables, so it can be difficult to guarantee that a dump is wholly
+consistent and suitable for loading into a replica because MySQL uses different
+consistency mechanisms for the different storage engines.
+
+An approach you may want to consider to limit downtime but still produce a
+consistent dump is to leave Phabricator running but configured in read-only
+mode while dumping:
+
+  - Stop all the daemons.
+  - Set `cluster.read-only` to `true` and deploy the new configuration. The
+    web UI should now show that Phabricator is in "Read Only" mode.
+  - Dump the database. You can do this with `bin/storage dump --for-replica`
+    to add the `--master-data` flag to the underlying command and include a
+    `CHANGE MASTER ...` statement in the dump.
+  - Once the dump finishes, turn `cluster.read-only` off again to restore
+    service. Continue loading the dump into the replica normally.
+
+**Log Expiration**: You can configure MySQL to automatically clean up old
+binary logs on startup with the `expire_logs_days` option. If you do not
+configure this and do not explicitly purge old logs with `PURGE BINARY LOGS`,
+the binary logs on disk will grow unboundedly and relatively quickly.
+
+Once you have a working replica, continue below to tell Phabricator about it.
 
 
 Configuring Replicas
@@ -207,7 +252,38 @@
 Promoting a Replica
 ===================
 
-TODO: Write this section.
+If you lose access to the master database, Phabricator will degrade into
+read-only mode. This is described in greater detail below.
+
+The easiest way to get out of read-only mode is to restore the master database.
+If the database recovers on its own or operations staff can revive it,
+Phabricator will return to full working order after a few moments.
+
+If you can't restore the master or are unsure you will be able to restore the
+master quickly, you can promote a replica to become the new master instead.
+
+Before doing this, you should first assess how far behind the master the
+replica was when the link died. Any data which was not replicated will either
+be lost or become very difficult to recover after you promote a replica.
+
+For example, if some `T1234` had been created on the master but had not yet
+replicated and you promote the replica, a new `T1234` may be created on the
+replica after promotion. Even if you can recover the master later, merging
+the data will be difficult because each database may have conflicting changes
+which can not be merged easily.
+
+If there was a significant replication delay at the time of the failure, you
+may want to try harder or spend more time attempting to recover the master
+before choosing to promote.
+
+If you have made a choice to promote, disable replication on the replica and
+mark it as the `master` in `cluster.databases`. Remove the original master and
+deploy the configuration change to all surviving hosts.
+
+Once write service is restored, you should provision, deploy, and configure a
+new replica by following the steps you took the first time around. You are
+critically vulnerable to a second disruption until you have restored the
+redundancy.
 
 
 Unreachable Masters
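The **Log Expiration** note in the diff suggests purging old binary logs explicitly when `expire_logs_days` is not set. As a minimal sketch, assuming you want to keep a fixed number of days of logs, the helper below builds a MySQL `PURGE BINARY LOGS BEFORE` statement from a cutoff timestamp. The function name `purge_binary_logs_sql` and the retention window are illustrative and are not part of Phabricator.

```python
from datetime import datetime, timedelta


def purge_binary_logs_sql(keep_days: int, now: datetime) -> str:
    """Build a PURGE BINARY LOGS statement keeping the last `keep_days` days.

    Hypothetical helper: the statement is meant to be run on the master,
    for example from a scheduled job.
    """
    if keep_days < 1:
        raise ValueError("keep_days must be at least 1")
    cutoff = now - timedelta(days=keep_days)
    # datetime supports strftime codes directly inside an f-string format spec.
    return f"PURGE BINARY LOGS BEFORE '{cutoff:%Y-%m-%d %H:%M:%S}';"


print(purge_binary_logs_sql(7, datetime(2016, 4, 10, 12, 0, 0)))
# prints: PURGE BINARY LOGS BEFORE '2016-04-03 12:00:00';
```

Pick a window comfortably longer than the worst replica lag you expect: if a disconnected replica has not yet read a log file that you purge, it cannot resume replication when it reconnects.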
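The new "Promoting a Replica" text asks you to assess how far behind the master the replica was before promoting it. One way to reason about this is to compare positions in the master's binary log. The sketch assumes you can read `SHOW SLAVE STATUS` on the replica. Optionally, it also uses the old master's final binlog coordinates, taken from `SHOW MASTER STATUS` if the host is reachable or from its surviving binlog files. The `BinlogCoord` and `assess_replica` names are hypothetical. The status field names are MySQL's, although newer MySQL versions also report them under `Source`-style names in `SHOW REPLICA STATUS`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True, order=True)
class BinlogCoord:
    """A position in the master's binary log, ordered by (file number, offset)."""
    seq: int
    pos: int

    @classmethod
    def parse(cls, filename: str, pos: int) -> "BinlogCoord":
        # Binlog files look like "mysql-bin.000123"; the numeric suffix orders them.
        return cls(int(filename.rsplit(".", 1)[1]), int(pos))


def assess_replica(replica: dict, master: Optional[dict] = None) -> List[str]:
    """Return warnings about writes a promotion could lose (hypothetical helper).

    `replica` holds SHOW SLAVE STATUS fields from the replica. `master`, if the
    old master's final coordinates are known, holds SHOW MASTER STATUS fields.
    """
    warnings = []
    # How far the replica's I/O thread got in the master's binlog...
    received = BinlogCoord.parse(replica["Master_Log_File"], replica["Read_Master_Log_Pos"])
    # ...and how far its SQL thread has actually applied.
    applied = BinlogCoord.parse(replica["Relay_Master_Log_File"], replica["Exec_Master_Log_Pos"])
    if applied < received:
        warnings.append("received events are not all applied yet; let the replica catch up")
    if master is None:
        warnings.append("master coordinates unknown; unreplicated writes cannot be ruled out")
    else:
        written = BinlogCoord.parse(master["File"], master["Position"])
        if received < written:
            warnings.append("the master wrote events the replica never received")
    return warnings
```

If the old master's coordinates cannot be recovered, this sketch always warns. That matches the point in the text that unreplicated data may be lost or become very hard to merge back.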
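The final promotion step in the diff rewrites `cluster.databases`. Below is a minimal sketch of that edit. It assumes each entry is an object with `host` and `role` keys, where the role is `master` or `replica`. The helper name `promote_replica` and the example hostnames are hypothetical. The sketch only produces the new configuration value. You still stop replication on the replica in MySQL itself (for example, with `STOP SLAVE`), and you still deploy the result to every surviving host.

```python
import json


def promote_replica(databases: list, new_master_host: str) -> list:
    """Return a new cluster.databases value with `new_master_host` as master.

    Hypothetical helper: the failed master's entry is dropped, the chosen
    replica's role becomes "master", and other replicas are kept unchanged.
    The input list is not modified.
    """
    roles = {entry["host"]: entry.get("role") for entry in databases}
    if roles.get(new_master_host) != "replica":
        raise ValueError(f"{new_master_host!r} is not a configured replica")

    promoted = []
    for entry in databases:
        if entry.get("role") == "master":
            continue  # remove the original (failed) master
        entry = dict(entry)  # shallow copy so the caller's data is untouched
        if entry["host"] == new_master_host:
            entry["role"] = "master"
        promoted.append(entry)
    return promoted


if __name__ == "__main__":
    current = [
        {"host": "db001.example.com", "role": "master"},
        {"host": "db002.example.com", "role": "replica"},
        {"host": "db003.example.com", "role": "replica"},
    ]
    print(json.dumps(promote_replica(current, "db002.example.com"), indent=2))
```

Returning a fresh list rather than editing in place makes it easy to diff the old and new values before deploying, which matters here because a mistake leaves you without a working master.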
