@recaptime-dev's working patches + fork for Phorge, a community fork of Phabricator. (Upstream dev and stable branches are at upstream/main and upstream/stable respectively.) hq.recaptime.dev/wiki/Phorge

Show "Last Writer" and "Last Write At" in the UI, add more documentation

Summary:
Ref T10751. Make the UI more useful and explain what failure states mean and how to get out of them.

The `bin/repository thaw` command does not exist yet; I'll write that soon.

Test Plan: {F1238241}

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10751

Differential Revision: https://secure.phabricator.com/D15766

+112 -7
+29
src/applications/diffusion/management/DiffusionRepositoryClusterManagementPanel.php
```diff
···
             ->setIcon('fa-pencil grey');
         }

+        $write_properties = null;
+        if ($version) {
+          $write_properties = $version->getWriteProperties();
+          if ($write_properties) {
+            try {
+              $write_properties = phutil_json_decode($write_properties);
+            } catch (Exception $ex) {
+              $write_properties = null;
+            }
+          }
+        }
+
+        if ($write_properties) {
+          $writer_phid = idx($write_properties, 'userPHID');
+          $last_writer = $viewer->renderHandle($writer_phid);
+
+          $writer_epoch = idx($write_properties, 'epoch');
+          $writer_epoch = phabricator_datetime($writer_epoch, $viewer);
+        } else {
+          $last_writer = null;
+          $writer_epoch = null;
+        }
+
         $rows[] = array(
           $binding_icon,
           phutil_tag(
···
             $device->getName()),
           $version_number,
           $is_writing,
+          $last_writer,
+          $writer_epoch,
         );
       }
     }
···
           pht('Device'),
           pht('Version'),
           pht('Writing'),
+          pht('Last Writer'),
+          pht('Last Write At'),
         ))
       ->setColumnClasses(
         array(
···
           null,
           null,
           'right wide',
+          null,
+          'date',
         ));

     $doc_href = PhabricatorEnv::getDoclink('Cluster: Repositories');
```
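The decode-and-fallback logic in the hunk above can be exercised outside Phorge with a small standalone sketch. It assumes the stored `writeProperties` value is a JSON object carrying the `userPHID` and `epoch` keys (the two keys the patch reads), and it swaps libphutil's `phutil_json_decode()` and `idx()` for PHP's built-in `json_decode()` and `isset()` checks, which are not identical. `parse_write_properties()` is a hypothetical helper name, and the PHID and timestamp in the example are made up.

```php
<?php
// Standalone sketch of the patch's decode-with-fallback behavior.
// Assumption: json_decode() and isset() stand in for libphutil's
// phutil_json_decode() and idx(); the stored value is assumed to be a JSON
// object with the 'userPHID' and 'epoch' keys the panel reads.

/**
 * Hypothetical helper: extract the last writer and write time from a raw
 * writeProperties value. Both fields are null if the value is missing or
 * cannot be decoded, so the UI can leave the two columns blank.
 */
function parse_write_properties(?string $raw): array {
  $empty = array('writer' => null, 'epoch' => null);
  if ($raw === null || $raw === '') {
    return $empty;
  }

  $props = json_decode($raw, true);
  if (!is_array($props)) {
    // Invalid JSON (or a bare scalar): behave like the patch's catch block.
    return $empty;
  }

  return array(
    'writer' => isset($props['userPHID']) ? (string)$props['userPHID'] : null,
    'epoch' => isset($props['epoch']) ? (int)$props['epoch'] : null,
  );
}

// Example with an illustrative (made-up) PHID and timestamp.
$props = parse_write_properties('{"userPHID":"PHID-USER-abcd","epoch":1461600000}');
echo $props['writer'], ' at ', gmdate('Y-m-d H:i', $props['epoch']), " UTC\n";
```

Returning nulls instead of throwing mirrors the patch's choice: a malformed or missing value only leaves "Last Writer" and "Last Write At" empty for that device rather than breaking the whole status table.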
+1 -2
src/applications/repository/storage/PhabricatorRepositoryWorkingCopyVersion.php
```diff
···
       $conn_w,
       'UPDATE %T SET
         repositoryVersion = %d,
-        isWriting = 0,
-        writeProperties = null
+        isWriting = 0
       WHERE
         repositoryPHID = %s AND
         devicePHID = %s AND
```
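The effect of dropping `writeProperties = null` from the release query can be illustrated with a toy model of a single device's version row. Only the three column names (`repositoryVersion`, `isWriting`, `writeProperties`) come from the query above; the `WorkingCopyVersionRow` class and its `acquireWriteLock()` / `releaseWriteLock()` methods are hypothetical stand-ins, not the real `PhabricatorRepositoryWorkingCopyVersion` API, and the JSON shape of `writeProperties` is assumed from the keys the management panel reads.

```php
<?php
// Hypothetical toy model of one device's working copy version row.
// Only the column names come from the UPDATE query; the class and method
// names are made up for illustration.

final class WorkingCopyVersionRow {
  public $repositoryVersion = 0;
  public $isWriting = false;
  public $writeProperties = null;

  // Take the write lock and record who is writing and when (assumed JSON shape).
  public function acquireWriteLock(string $user_phid, int $epoch): void {
    $this->isWriting = true;
    $this->writeProperties = json_encode(array(
      'userPHID' => $user_phid,
      'epoch' => $epoch,
    ));
  }

  // Release the lock after a successful write.
  public function releaseWriteLock(int $new_version): void {
    $this->repositoryVersion = $new_version;
    $this->isWriting = false;
    // As in the patched query, writeProperties is deliberately left alone so
    // the "Last Writer" and "Last Write At" columns still have data afterward.
  }
}

$row = new WorkingCopyVersionRow();
$row->acquireWriteLock('PHID-USER-abcd', 1461600000);
$row->releaseWriteLock(1);
var_dump($row->isWriting); // bool(false)
echo $row->writeProperties, "\n"; // {"userPHID":"PHID-USER-abcd","epoch":1461600000}
```

If the writing process dies between the two calls, `isWriting` stays set and `writeProperties` still names the writer, which is exactly the frozen state the new documentation asks operators to investigate.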
+82 -5
src/docs/user/cluster/cluster_repositories.diviner
````diff
···
 normally means that it is actively receiving a push, but can also mean that
 there was a write interruption. See "Write Interruptions" below for details.

+**Last Writer**: This column identifies the user who most recently pushed a
+change to this device. If the write lock is currently held, this user is
+the user whose change is holding the lock.
+
+**Last Write At**: When the most recent write started. If the write lock is
+currently held, this shows when the lock was acquired.
+

 Write Interruptions
 ===================

 A repository cluster can be put into an inconsistent state by an interruption
-in a brief window immediately after a write.
+in a brief window during and immediately after a write.

 Phabricator can not commit changes to a working copy (stored on disk) and to
 the global state (stored in a database) atomically, so there is a narrow window
 between committing these two different states when some tragedy (like a
 lightning strike) can befall a server, leaving the global and local views of
-the repository state divergent.
+the repository state possibly divergent.

-In these cases, Phabricator fails into a "frozen" state where further writes
+In these cases, Phabricator fails into a frozen state where further writes
 are not permitted until the failure is investigated and resolved.

-TODO: Complete the support tooling and provide recovery instructions.
+You can use the monitoring console to review the state of a frozen repository
+with a held write lock. The **Writing** column will show which node is holding
+the lock, and whoever is named in the **Last Writer** column may be able to
+help you figure out what happened by providing more information about what they
+were doing and what they observed.
+
+Because the push was not acknowledged, it is normally safe to demote the node:
+the user should have received an error anyway, and should not expect their push
+to have worked. However, data is technically at risk and you may want to
+investigate further and try to understand the issue in more detail before
+continuing.
+
+There is no way to explicitly keep the write, but if it was committed to disk
+you can recover it manually from the working copy on the device and then push
+it again.
+
+If you demote the node, the in-process write will be thrown away, even if it
+was complete on disk. To demote the node and release the write lock, run this
+command:
+
+```
+phabricator/ $ ./bin/repository thaw rXYZ --demote repo002.corp.net
+```
+
+{icon exclamation-triangle, color="yellow"} Any committed but unacknowledged
+data on the device will be lost.


 Loss of Leaders
···
 present on the leaders but not present on the followers by examining the
 push logs.

-TODO: Complete the support tooling and provide recovery instructions.
+If you are comfortable discarding these changes, you can instruct Phabricator
+that it can forget about the leaders in two ways: disable the service bindings
+to all of the leader nodes so they are no longer part of the cluster, or
+use `bin/repository thaw` to `--demote` the leaders explicitly.
+
+If you do this, **you will lose data**. Either action will discard any changes
+on the affected leaders which have not replicated to other nodes in the cluster.
+
+To demote a device, run this command:
+
+```
+phabricator/ $ ./bin/repository thaw rXYZ --demote repo002.corp.net
+```
+
+{icon exclamation-triangle, color="red"} Any data which is only present on
+**this** device will be lost.
+
+
+Ambiguous Leaders
+=================
+
+Repository clusters can also freeze if the leader nodes are ambiguous. This
+can happen if you replace an entire cluster with new devices suddenly, or
+make a mistake with the `--demote` flag.
+
+When Phabricator can not tell which node in a cluster is a leader, it freezes
+the cluster because it is possible that some nodes have less data and others
+have more, and if it chooses a leader arbitrarily it may destroy some data
+which you would prefer to retain.
+
+To resolve this, you need to tell Phabricator which node has the most
+up-to-date data and promote that node to become a leader. If you do this,
+**you may lose data** if you promote the wrong node, and some other node
+really had more up-to-date data. If you want to double check, you can examine
+the working copies on disk before promoting, by connecting to the machines and
+using commands like `git log` to inspect state.
+
+Once you have identified a node which has data you're happy with, use
+`bin/repository thaw` to `--promote` the device:
+
+```
+phabricator/ $ ./bin/repository thaw rXYZ --promote repo002.corp.net
+```
+
+{icon exclamation-triangle, color="red"} Any data which is only present on
+**other** devices will be lost.


 Backups
````
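The demote and promote trade-offs described in the new documentation can be reasoned about with a small model in which each device carries a version number and the leaders are the devices holding the highest one. That definition of "leader" is an assumption made for illustration, and `find_leaders()`, `demote_device()` and `promote_device()` are hypothetical helpers; the sketch does not reproduce what `bin/repository thaw` actually writes to the database.

```php
<?php
// Hypothetical model of cluster leadership, for illustration only.
// Assumption: a device's version grows with the writes it has received, and
// leaders are the devices holding the highest version. null means "no
// version recorded" (for example, after a demotion in this model).

/**
 * Return the names of the leader devices, in input order.
 *
 * @param array<string, int|null> $versions Device name => version.
 * @return string[]
 */
function find_leaders(array $versions): array {
  $known = array_filter($versions, function ($version) {
    return $version !== null;
  });
  if (!$known) {
    // No device has a recorded version, so no leader can be chosen.
    return array();
  }
  $max = max($known);
  $leaders = array_filter($known, function ($version) use ($max) {
    return $version === $max;
  });
  return array_keys($leaders);
}

// Demote: forget the device's version, so data only it had is not protected.
function demote_device(array $versions, string $device): array {
  $versions[$device] = null;
  return $versions;
}

// Promote: lift the device above every other version, making it sole leader.
function promote_device(array $versions, string $device): array {
  $known = array_filter($versions, function ($version) {
    return $version !== null;
  });
  $max = $known ? max($known) : 0;
  $versions[$device] = $max + 1;
  return $versions;
}

// Example: two devices share the highest version, so both lead.
$versions = array(
  'repo001.corp.net' => 7,
  'repo002.corp.net' => 8,
  'repo003.corp.net' => 8,
);
echo implode(', ', find_leaders($versions)), "\n"; // repo002.corp.net, repo003.corp.net
```

In this model, demoting a device drops its claim, so anything only it held stops being protected, and promoting a device places it above everyone else, so anything only the other devices held stops being protected. These are the same trade-offs the two red warnings describe.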