@recaptime-dev's working patches + fork for Phorge, a community fork of Phabricator. (Upstream dev and stable branches are at upstream/main and upstream/stable respectively.) hq.recaptime.dev/wiki/Phorge

Fill in missing cluster database documentation

Summary:
Ref T10751. Provide some guidance on replicas and promotion.

I'm not trying to walk administrators through the gritty details of this. It's not too complex, they should understand it, and the MySQL documentation is pretty thorough.

Test Plan: Read documentation.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10751

Differential Revision: https://secure.phabricator.com/D15763

+86 -10
src/docs/user/cluster/cluster_databases.diviner
···
 Overview
 ========
 
-WARNING: This feature is a very early prototype; the features this document
-describes are mostly speculative fantasy.
-
 You can deploy Phabricator with multiple database hosts, configured as a master
 and a set of replicas. The advantages of doing this are:
 
   - faster recovery from disasters by promoting a replica;
-  - graceful degradation if the master fails;
-  - reduced load on the master; and
+  - graceful degradation if the master fails; and
   - some tools to help monitor and manage replica health.
 
 This configuration is complex, and many installs do not need to pursue it.
 
-Phabricator can not currently be configured into a multi-master mode, nor can
-it be configured to automatically promote a replica to become the new master.
-
 If you lose the master, Phabricator can degrade automatically into read-only
 mode and remain available, but can not fully recover without operational
 intervention unless the master recovers on its own.
 
+Phabricator will not currently send read traffic to replicas unless the master
+has failed, so configuring a replica will not currently spread any load away
+from the master. Future versions of Phabricator are expected to be able to
+distribute some read traffic to replicas.
+
+Phabricator can not currently be configured into a multi-master mode, nor can
+it be configured to automatically promote a replica to become the new master.
+There are no current plans to support multi-master mode or autonomous failover,
+although this may change in the future.
+
 
 Setting up MySQL Replication
 ============================
 
-TODO: Write this section.
+To begin, set up a replica database server and configure MySQL replication.
+
+If you aren't sure how to do this, refer to the MySQL manual for instructions.
+The MySQL documentation is comprehensive and walks through the steps and
+options in good detail. You should understand MySQL replication before
+deploying it in production: Phabricator layers on top of it, and does not
+attempt to abstract it away.
+
+Some useful notes for configuring replication for Phabricator:
+
+**Binlog Format**: Phabricator issues some queries which MySQL will detect as
+unsafe if you use the `STATEMENT` binlog format (the default). Instead, use
+`MIXED` (recommended) or `ROW` as the `binlog_format`.
+
+**Grant `REPLICATION CLIENT` Privilege**: If you give the user that Phabricator
+will use to connect to the replica database server the `REPLICATION CLIENT`
+privilege, Phabricator's status console can give you more information about
+replica health and state.
+
+**Copying Data to Replicas**: Phabricator currently uses a mixture of MyISAM
+and InnoDB tables, so it can be difficult to guarantee that a dump is wholly
+consistent and suitable for loading into a replica because MySQL uses different
+consistency mechanisms for the different storage engines.
+
+An approach you may want to consider to limit downtime but still produce a
+consistent dump is to leave Phabricator running but configured in read-only
+mode while dumping:
+
+  - Stop all the daemons.
+  - Set `cluster.read-only` to `true` and deploy the new configuration. The
+    web UI should now show that Phabricator is in "Read Only" mode.
+  - Dump the database. You can do this with `bin/storage dump --for-replica`
+    to add the `--master-data` flag to the underlying command and include a
+    `CHANGE MASTER ...` statement in the dump.
+  - Once the dump finishes, turn `cluster.read-only` off again to restore
+    service. Continue loading the dump into the replica normally.
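Editor's sketch (not part of the diff): the read-only dump steps above can be written down as an ordered command plan. This is a minimal illustration under stated assumptions, not the project's tooling. It assumes a single host where `bin/config set` is how the configuration change is deployed, that the daemons are managed with `bin/phd stop` and `bin/phd start` (the restart at the end is implied by "restore service" rather than spelled out in the documentation), and that the dump is captured by redirecting standard output. The function names `dump_plan`, `describe`, and `run` are hypothetical.

```python
import subprocess


def dump_plan(dump_path):
    """Return the read-only dump steps as (argv, stdout_path) pairs, in order."""
    return [
        # 1. Stop all the daemons.
        (["./bin/phd", "stop"], None),
        # 2. Enter read-only mode (assumes config is deployed with bin/config).
        (["./bin/config", "set", "cluster.read-only", "true"], None),
        # 3. Dump with replication coordinates; stdout goes to dump_path.
        (["./bin/storage", "dump", "--for-replica"], dump_path),
        # 4. Leave read-only mode to restore service.
        (["./bin/config", "set", "cluster.read-only", "false"], None),
        # 5. Restart the daemons (assumed; implied by "restore service").
        (["./bin/phd", "start"], None),
    ]


def describe(plan):
    """Render the plan as shell-like strings for review before running it."""
    return [" ".join(argv) + (f" > {out}" if out else "") for argv, out in plan]


def run(plan, phabricator_root):
    """Execute each step from the install root, stopping at the first failure."""
    for argv, out in plan:
        if out is None:
            subprocess.run(argv, cwd=phabricator_root, check=True)
        else:
            # Use an absolute dump path: open() resolves relative paths against
            # this process's working directory, not phabricator_root.
            with open(out, "wb") as handle:
                subprocess.run(argv, cwd=phabricator_root, stdout=handle, check=True)
```

Reviewing `describe(dump_plan("/tmp/replica-dump.sql"))` before calling `run` keeps the ordering visible. Because `run` stops at the first failure, a failed dump leaves the install in read-only mode until `cluster.read-only` is turned off by hand.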
+
+**Log Expiration**: You can configure MySQL to automatically clean up old
+binary logs on startup with the `expire_logs_days` option. If you do not
+configure this and do not explicitly purge old logs with `PURGE BINARY LOGS`,
+the binary logs on disk will grow unboundedly and relatively quickly.
+
+Once you have a working replica, continue below to tell Phabricator about it.
 
 
 Configuring Replicas
···
 Promoting a Replica
 ===================
 
-TODO: Write this section.
+If you lose access to the master database, Phabricator will degrade into
+read-only mode. This is described in greater detail below.
+
+The easiest way to get out of read-only mode is to restore the master database.
+If the database recovers on its own or operations staff can revive it,
+Phabricator will return to full working order after a few moments.
+
+If you can't restore the master or are unsure you will be able to restore the
+master quickly, you can promote a replica to become the new master instead.
+
+Before doing this, you should first assess how far behind the master the
+replica was when the link died. Any data which was not replicated will either
+be lost or become very difficult to recover after you promote a replica.
+
+For example, if some `T1234` had been created on the master but had not yet
+replicated and you promote the replica, a new `T1234` may be created on the
+replica after promotion. Even if you can recover the master later, merging
+the data will be difficult because each database may have conflicting changes
+which can not be merged easily.
+
+If there was a significant replication delay at the time of the failure, you
+may want to try harder or spend more time attempting to recover the master
+before choosing to promote.
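Editor's sketch (not part of the diff): one way to start assessing how far behind the replica was is to compare the binary log coordinates reported by MySQL's `SHOW SLAVE STATUS` on the replica. The field names below are the ones MySQL reports. The helper name `unapplied_bytes` and the sample values are hypothetical. This only measures events the replica had fetched but not yet executed. Writes the master accepted but never sent are invisible from the replica, so treat the result as a lower bound on the gap.

```python
def unapplied_bytes(status):
    """Bytes of master binlog fetched by the replica but not yet executed.

    `status` is one row of SHOW SLAVE STATUS as a dict. Returns None when the
    I/O and SQL threads are in different binlog files, because the positions
    are then not directly comparable.
    """
    # Master_Log_File / Read_Master_Log_Pos: how far the I/O thread has read.
    # Relay_Master_Log_File / Exec_Master_Log_Pos: how far the SQL thread has executed.
    if status["Master_Log_File"] != status["Relay_Master_Log_File"]:
        return None
    return int(status["Read_Master_Log_Pos"]) - int(status["Exec_Master_Log_Pos"])


# Illustrative row; the values are made up for the example.
example = {
    "Master_Log_File": "mysql-bin.000042",
    "Read_Master_Log_Pos": 1500,
    "Relay_Master_Log_File": "mysql-bin.000042",
    "Exec_Master_Log_Pos": 1200,
}
print(unapplied_bytes(example))  # 300
```

A result of `0` means the replica has applied everything it received. A positive number or `None` suggests waiting for the SQL thread to catch up before disabling replication.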
+
+If you have made a choice to promote, disable replication on the replica and
+mark it as the `master` in `cluster.databases`. Remove the original master and
+deploy the configuration change to all surviving hosts.
+
+Once write service is restored, you should provision, deploy, and configure a
+new replica by following the steps you took the first time around. You are
+critically vulnerable to a second disruption until you have restored the
+redundancy.
 
 
 Unreachable Masters