Add slightly more cluster repository documentation

+90 -4

1 changed file

src/docs/user/cluster/cluster_repositories.diviner

@@ -19,19 +19,19 @@

 This configuration is complex, and many installs do not need to pursue it.

-This configuration is not currently supported with Subversion.
+This configuration is not currently supported with Subversion or Mercurial.


 Repository Hosts
 ================

 Repository hosts must run a complete, fully configured copy of Phabricator,
-including a webserver. If you make repositories available over SSH, they must
-also run a properly configured `sshd`.
+including a webserver. They must also run a properly configured `sshd`.

 Generally, these hosts will run the same set of services and configuration that
 web hosts run. If you prefer, you can overlay these services and put web and
-repository services on the same hosts.
+repository services on the same hosts. See @{article:Clustering Introduction}
+for some guidance on overlaying services.

 When a user requests information about a repository that can only be satisfied
 by examining a repository working copy, the webserver receiving the request
@@ -57,6 +57,17 @@
 Before responding to a write, replicas obtain a global lock, perform the same
 version check and fetch if necessary, then allow the write to continue.

+Additionally, repositories passively check other nodes for updates and
+replicate changes in the background. After you push a change to a repository,
+it will usually spread passively to all other repository nodes within a few
+minutes.
+
+Even if passive replication is slow, the active replication makes acknowledged
+changes sequential to all observers: after a write is acknowledged, all
+subsequent reads are guaranteed to see it. The system does not permit stale
+reads, and you do not need to wait for a replication delay to see a consistent
+view of the repository no matter which node you ask.
+

 HTTP vs HTTPS
 =============
@@ -82,6 +93,81 @@

 Other mitigations are possible, but securing a network against the NSA and
 similar agents of other rogue nations is beyond the scope of this document.
+
+
+Monitoring Replication
+======================
+
+You can review the current status of a repository on cluster nodes in
+{nav Diffusion > (Repository) > Manage Repository > Cluster Configuration}.
+
+This screen shows all the configured devices which are hosting the repository
+and the available version.
+
+**Version**: When a repository is mutated by a push, Phabricator increases
+an internal version number for the repository. This column shows which version
+is on disk on the corresponding node.
+
+After a change is pushed, the node which received the change will have a larger
+version number than the other nodes. The change should be passively replicated
+to the remaining nodes after a brief period of time, although this can take
+a while if the change was large or the network connection between nodes is
+slow or unreliable.
+
+You can click the version number to see the corresponding push logs for that
+change. The logs contain details about what was changed, and can help you
+identify if replication is slow because a change is large or for some other
+reason.
+
+**Writing**: This shows that the node is currently holding a write lock. This
+normally means that it is actively receiving a push, but can also mean that
+there was a write interruption. See "Write Interruptions" below for details.
+
+
+Write Interruptions
+===================
+
+A repository cluster can be put into an inconsistent state by an interruption
+in a brief window immediately after a write.
+
+Phabricator can not commit changes to a working copy (stored on disk) and to
+the global state (stored in a database) atomically, so there is a narrow window
+between committing these two different states when some tragedy (like a
+lightning strike) can befall a server, leaving the global and local views of
+the repository state divergent.
+
+In these cases, Phabricator fails into a "frozen" state where further writes
+are not permitted until the failure is investigated and resolved.
+
+TODO: Complete the support tooling and provide recovery instructions.
+
+
+Loss of Leaders
+===============
+
+A more straightforward failure condition is the loss of all servers in a
+cluster which have the most up-to-date copy of a repository. This looks like
+this:
+
+- There is a cluster setup with two nodes, X and Y.
+- A new change is pushed to server X.
+- Before the change can propagate to server Y, lightning strikes server X
+  and destroys it.
+
+Here, all of the "leader" nodes with the most up-to-date copy of the repository
+have been lost. Phabricator will refuse to serve this repository because it
+can not serve it consistently, and can not accept writes without data loss.
+
+The most straightforward way to resolve this issue is to restore any leader to
+service. The change will be able to replicate to other nodes once a leader
+comes back online.
+
+If you are unable to restore a leader or unsure that you can restore one
+quickly, you can use the monitoring console to review which changes are
+present on the leaders but not present on the followers by examining the
+push logs.
+
+TODO: Complete the support tooling and provide recovery instructions.


 Backups
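
Reviewer note: the version check, write lock, and "Loss of Leaders" behavior described in the added documentation can be illustrated with a small toy model. The Python sketch below is only an illustration under stated assumptions, not Phabricator's implementation: the class `ClusterModel` and all of its method and attribute names are hypothetical, the "global state" is a plain integer rather than a database row, and a fetch is modeled as copying a version number.

```python
import threading


class ClusterModel:
    """Toy model of versioned repository replicas (hypothetical names)."""

    def __init__(self, node_names):
        # Version currently on disk for each node.
        self.node_versions = {name: 0 for name in node_names}
        # Nodes that are currently reachable.
        self.online = set(node_names)
        # Newest acknowledged version; stands in for the global state.
        self.global_version = 0
        # Stands in for the global write lock described in the document.
        self._write_lock = threading.Lock()

    def leaders(self):
        """Return the online nodes that hold the newest acknowledged version."""
        return sorted(
            name for name in self.online
            if self.node_versions[name] == self.global_version
        )

    def _synchronize(self, node):
        """Version check: fetch from a leader if this node is behind."""
        if node not in self.online:
            raise RuntimeError(f"node {node} is offline")
        if self.node_versions[node] == self.global_version:
            return
        if not self.leaders():
            # Every node with the newest version is gone: refuse to serve.
            raise RuntimeError("no online node holds the newest version")
        # Model a fetch from a leader by adopting the newest version.
        self.node_versions[node] = self.global_version

    def read(self, node):
        """Serve a read only after the node has caught up."""
        self._synchronize(node)
        return self.node_versions[node]

    def write(self, node):
        """Take the lock, catch up if necessary, then record a new version."""
        with self._write_lock:
            self._synchronize(node)
            self.node_versions[node] += 1
            self.global_version = self.node_versions[node]
            return self.global_version

    def fail(self, node):
        """Mark a node as lost."""
        self.online.discard(node)
```

In this model, after `write("X")` on a two-node cluster, `read("Y")` first brings Y up to date and so never returns a stale version. If X is marked as failed after a write and before Y has caught up, `read("Y")` raises instead of serving old data, which mirrors the refusal described under "Loss of Leaders". The model ignores passive background replication and the write-interruption window entirely.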