@recaptime-dev's working patches + fork for Phorge, a community fork of Phabricator. (Upstream dev and stable branches are at upstream/main and upstream/stable respectively.) hq.recaptime.dev/wiki/Phorge

Make "bin/repository thaw" workflow more clear when devices are disabled

Summary:
Ref T13216. See PHI943. If autoscale lightning strikes all your servers at once and destroys them, the path to recovery can be unclear. You're "supposed" to:

- demote all the devices;
- disable the bindings;
- bind the new servers;
- put whatever working copies you can scrape up back on disk;
- promote one of the new servers.

However, the documentation is a bit misleading: it was written with "you lost one or two devices" in mind, not "you lost every device". Demote-before-disable is also unnecessary, and slightly risky if servers come back online. Finally, there is no guardrail before the promote step, so you can accidentally skip the demotion step and end up in a confusing state. Instead:

- Add a guardrail: when you try to promote a new server, fail with an error if inactive devices still have versions, and tell the user to demote them.
- Allow demotion of inactive devices: the order "disable, demote" is safer and more intuitive than "demote, disable" and there's no reason to require the unintuitive order.
- Make the "cluster already has leaders" message clearer.
- Make the documentation clearer.
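
The two promote-time checks described above can be sketched in isolation. This is a minimal illustration, not the workflow's actual code: `check_promote()` and the `PHID-A` / `PHID-B` identifiers are hypothetical, plain arrays stand in for the real binding and version objects, and it assumes that demoting a device clears that device's version record.

```php
<?php

/**
 * Hypothetical stand-in for the promote guard.
 *
 * @param string   $device_phid  Device being promoted.
 * @param string[] $active_phids Devices with active bindings.
 * @param array    $versions     Map of device PHID => working copy version.
 * @return string|null Reason promotion is blocked, or null if it may proceed.
 */
function check_promote(
  string $device_phid,
  array $active_phids,
  array $versions): ?string {

  $active = array_fill_keys($active_phids, true);

  // Only actively bound devices can be promoted.
  if (!isset($active[$device_phid])) {
    return 'device has no active binding';
  }

  // Versions held by devices without an active binding block promotion
  // until those devices are demoted (or reactivated).
  $inactive = array_diff_key($versions, $active);
  if ($inactive) {
    return 'demote inactive devices first: '.
      implode(', ', array_keys($inactive));
  }

  // Any remaining version belongs to an active device, so the cluster
  // already has a leader.
  $leaders = array_intersect_key($versions, $active);
  if ($leaders) {
    return 'cluster already has leaders: '.
      implode(', ', array_keys($leaders));
  }

  return null;
}

// A wrote to the repository, then its binding was disabled.
$versions = array('PHID-A' => 1);
echo check_promote('PHID-B', array('PHID-B'), $versions), "\n";

// Assumption: after demoting A its version record is gone, so the
// promotion of B is no longer blocked.
var_dump(check_promote('PHID-B', array('PHID-B'), array()));
```

Checking inactive devices before active ones means the operator is told about stale leaders first, which is the state a "lost every device" recovery is most likely to be in.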

Test Plan:
- Bound a repository to two devices.
- Wrote to A to make it a leader, then disabled it (simulating a lightning strike).
- Tried to promote B. Got a new, useful error ("demote A first").
- Demoted A (before: error about demoting inactive devices; now: works fine).
- Promoted B. This worked.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13216

Differential Revision: https://secure.phabricator.com/D19793

+71 -37
+61 -23
src/applications/repository/management/PhabricatorRepositoryManagementThawWorkflow.php
@@ -120,33 +120,71 @@
             $repository->getDisplayName()));
       }
 
-      $bindings = $service->getActiveBindings();
-      $bindings = mpull($bindings, null, 'getDevicePHID');
-      if (empty($bindings[$device->getPHID()])) {
-        throw new PhutilArgumentUsageException(
-          pht(
-            'Repository "%s" has no active binding to device "%s". Only '.
-            'actively bound devices can be promoted or demoted.',
-            $repository->getDisplayName(),
-            $device->getName()));
-      }
+      if ($promote) {
+        // You can only promote active devices. (You may demote active or
+        // inactive devices.)
+        $bindings = $service->getActiveBindings();
+        $bindings = mpull($bindings, null, 'getDevicePHID');
+        if (empty($bindings[$device->getPHID()])) {
+          throw new PhutilArgumentUsageException(
+            pht(
+              'Repository "%s" has no active binding to device "%s". Only '.
+              'actively bound devices can be promoted.',
+              $repository->getDisplayName(),
+              $device->getName()));
+        }
 
-      $versions = PhabricatorRepositoryWorkingCopyVersion::loadVersions(
-        $repository->getPHID());
+        $versions = PhabricatorRepositoryWorkingCopyVersion::loadVersions(
+          $repository->getPHID());
+        $versions = mpull($versions, null, 'getDevicePHID');
 
-      $versions = mpull($versions, null, 'getDevicePHID');
-      $versions = array_select_keys($versions, array_keys($bindings));
+        // Before we promote, make sure there are no outstanding versions on
+        // devices with inactive bindings. If there are, you need to demote
+        // these first.
+        $inactive = array();
+        foreach ($versions as $device_phid => $version) {
+          if (isset($bindings[$device_phid])) {
+            continue;
+          }
+          $inactive[$device_phid] = $version;
+        }
 
-      if ($versions && $promote) {
-        throw new PhutilArgumentUsageException(
-          pht(
-            'Unable to promote "%s" for repository "%s": the leaders for '.
-            'this cluster are not ambiguous.',
-            $device->getName(),
-            $repository->getDisplayName()));
-      }
+        if ($inactive) {
+          $handles = $viewer->loadHandles(array_keys($inactive));
+
+          $handle_list = iterator_to_array($handles);
+          $handle_list = mpull($handle_list, 'getName');
+          $handle_list = implode(', ', $handle_list);
 
-      if ($promote) {
+          throw new PhutilArgumentUsageException(
+            pht(
+              'Repository "%s" has versions on inactive devices. Demote '.
+              '(or reactivate) these devices before promoting a new '.
+              'leader: %s.',
+              $repository->getDisplayName(),
+              $handle_list));
+        }
+
+        // Now, make sure there are no outstanding versions on devices with
+        // active bindings. These also need to be demoted (or promoting is a
+        // mistake or already happened).
+        $active = array_select_keys($versions, array_keys($bindings));
+        if ($active) {
+          $handles = $viewer->loadHandles(array_keys($active));
+
+          $handle_list = iterator_to_array($handles);
+          $handle_list = mpull($handle_list, 'getName');
+          $handle_list = implode(', ', $handle_list);
+
+          throw new PhutilArgumentUsageException(
+            pht(
+              'Unable to promote "%s" for repository "%s" because this '.
+              'cluster already has one or more unambiguous leaders: %s.',
+              $device->getName(),
+              $repository->getDisplayName(),
+              $handle_list));
+        }
+
         PhabricatorRepositoryWorkingCopyVersion::updateVersion(
           $repository->getPHID(),
           $device->getPHID(),
+10 -14
src/docs/user/cluster/cluster_repositories.diviner
@@ -414,28 +414,24 @@
 push logs.
 
 If you are comfortable discarding these changes, you can instruct Phabricator
-that it can forget about the leaders in two ways: disable the service bindings
-to all of the leader devices so they are no longer part of the cluster, or use
-`bin/repository thaw` to `--demote` the leaders explicitly.
-
-If you do this, **you will lose data**. Either action will discard any changes
-on the affected leaders which have not replicated to other devices in the
-cluster.
-
-To remove a device from the cluster, disable all of the bindings to it
-in Almanac, using the web UI.
+that it can forget about the leaders by doing this:
 
-{icon exclamation-triangle, color="red"} Any data which is only present on
-the disabled device will be lost.
+  - Disable the service bindings to all of the leader devices so they are no
+    longer part of the cluster.
+  - Then, use `bin/repository thaw` to `--demote` the leaders explicitly.
 
-To demote a device without removing it from the cluster, run this command:
+To demote a device, run this command:
 
 ```
 phabricator/ $ ./bin/repository thaw rXYZ --demote repo002.corp.net
 ```
 
 {icon exclamation-triangle, color="red"} Any data which is only present on
-**this** device will be lost.
+the demoted device will be lost.
+
+If you do this, **you will lose unreplicated data**. You will discard any
+changes on the affected leaders which have not replicated to other devices
+in the cluster.
 
 
 Ambiguous Leaders