@recaptime-dev's working patches + fork for Phorge, a community fork of Phabricator. (Upstream dev and stable branches are at upstream/main and upstream/stable respectively.) hq.recaptime.dev/wiki/Phorge

Support Aphlict clustering

Summary:
Ref T6915. This allows multiple notification servers to talk to each other:

- Every server has a list of every other server, including itself.
- Every server generates a unique fingerprint at startup, like "XjeHuPKPBKHUmXkB".
- Every time a server gets a message, it marks it with its personal fingerprint, then sends it to every other server.
- Servers do not retransmit messages that they've already seen (already marked with their fingerprint).
- Servers learn other servers' fingerprints after they send them a message, and stop sending them messages they've already seen.

This is pretty crude, and the first message to a cluster will transmit N^2 times, but N is going to be like 3 or 4 in even the most extreme cases for a very long time.

The fingerprinting stops cycles, and stops servers from sending themselves copies of messages.

We don't need to do anything more sophisticated than this because it's fine if some notifications get lost when a server dies. Clients will reconnect after a short period of time and life will continue.
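
As a rough sketch of the relay rule described above (illustrative only; the function and peer names here are made up, and the actual logic lands in `support/aphlict/server/lib/AphlictPeerList.js` below):

```javascript
// Illustrative sketch of the relay/deduplication rule described above.
// Not the code in this revision; see AphlictPeerList.js for the real thing.
function relay(message, myFingerprint, peers) {
  var touched = message.touched || [];

  // If our fingerprint is already on the message, we've seen it: drop it.
  if (touched.indexOf(myFingerprint) !== -1) {
    return;
  }

  // Mark the message with our fingerprint, then send it to every peer
  // that we don't already know has seen it.
  touched.push(myFingerprint);
  message.touched = touched;

  peers.forEach(function(peer) {
    if (peer.fingerprint && touched.indexOf(peer.fingerprint) !== -1) {
      return;
    }
    peer.send(message);
  });
}
```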

Test Plan:
- Wrote two server configs.
- Started two servers.
- Told Phabricator about all four services.
- Loaded Chrome and Safari.
- Saw them connect to different servers.
- Sent messages in one, got notifications in the other (magic!).
- Saw the fingerprinting stuff work on the console, no infinite retransmission of messages, etc.

(This pretty much just worked when I ran it the first time so I probably missed something?)

{F1218835}

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T6915

Differential Revision: https://secure.phabricator.com/D15711

+507 -24
+50 -1
src/applications/aphlict/management/PhabricatorAphlictManagementWorkflow.php
```diff
···
         array(
           'servers' => 'list<wild>',
           'logs' => 'optional list<wild>',
+          'cluster' => 'optional list<wild>',
           'pidfile' => 'string',
         ));
     } catch (Exception $ex) {
···
           'admin'));
     }

-    $logs = $data['logs'];
+    $logs = idx($data, 'logs', array());
     foreach ($logs as $index => $log) {
       PhutilTypeSpec::checkMap(
         $log,
···
             'choose a different logfile location. %s',
             $dir,
             $ex->getMessage()));
+      }
+    }
+
+    $peer_map = array();
+
+    $cluster = idx($data, 'cluster', array());
+    foreach ($cluster as $index => $peer) {
+      PhutilTypeSpec::checkMap(
+        $peer,
+        array(
+          'host' => 'string',
+          'port' => 'int',
+          'protocol' => 'string',
+        ));
+
+      $host = $peer['host'];
+      $port = $peer['port'];
+      $protocol = $peer['protocol'];
+
+      switch ($protocol) {
+        case 'http':
+        case 'https':
+          break;
+        default:
+          throw new PhutilArgumentUsageException(
+            pht(
+              'Configuration file specifies cluster peer ("%s", at index '.
+              '"%s") with an invalid protocol, "%s". Valid protocols are '.
+              '"%s" or "%s".',
+              $host,
+              $index,
+              $protocol,
+              'http',
+              'https'));
+      }
+
+      $peer_key = "{$host}:{$port}";
+      if (!isset($peer_map[$peer_key])) {
+        $peer_map[$peer_key] = $index;
+      } else {
+        throw new PhutilArgumentUsageException(
+          pht(
+            'Configuration file specifies cluster peer "%s" more than '.
+            'once (at indexes "%s" and "%s"). Each peer must have a '.
+            'unique host and port combination.',
+            $peer_key,
+            $peer_map[$peer_key],
+            $index));
       }
     }
```
+3
src/applications/notification/client/PhabricatorNotificationClient.php
```diff
···
   public static function tryToPostMessage(array $data) {
     $servers = PhabricatorNotificationServerRef::getEnabledAdminServers();
+
+    shuffle($servers);
+
     foreach ($servers as $server) {
       try {
         $server->postMessage($data);
```
+49 -11
src/docs/user/cluster/cluster.diviner
```diff
···
 naturally somewhat resistant to data loss: every clone of a repository includes
 the entire history.

+Repositories may become a scalability bottleneck, although this is rare unless
+your install has an unusually heavy repository read volume. Slow clones/fetches
+may hint at a repository capacity problem. Adding more repository hosts will
+provide an approximately linear increase in capacity.
+
 For details, see @{article:Cluster: Repositories}.

···
 at least one host remains alive. Daemons are stateless, so spreading daemons
 across multiple hosts provides no resistance to data loss.

+Daemons can become a bottleneck, particularly if your install sees a large
+volume of write traffic to repositories. If the daemon task queue has a
+backlog, that hints at a capacity problem. If existing hosts have unused
+resources, increase `phd.taskmasters` until they are fully utilized. From
+there, adding more daemon hosts will provide an approximately linear increase
+in capacity.
+
 For details, see @{article:Cluster: Daemons}.

···

 With multiple web hosts, you can transparently survive the loss of any subset
 of hosts as long as at least one host remains alive. Web hosts are stateless,
-so putting multiple hosts in service provides no resistance to data loss.
+so putting multiple hosts in service provides no resistance to data loss
+because no data is at risk.
+
+Web hosts can become a bottleneck, particularly if you have a workload that is
+heavily focused on reads from the web UI (like a public install with many
+anonymous users). Slow responses to web requests may hint at a web capacity
+problem. Adding more hosts will provide an approximately linear increase in
+capacity.

 For details, see @{article:Cluster: Web Servers}.


+Cluster: Notifications
+======================
+
+Configuring multiple notification hosts is simple and has no prerequisites.
+
+With multiple notification hosts, you can survive the loss of any subset of
+hosts as long as at least one host remains alive. Service may be briefly
+disrupted directly after the incident which destroys the other hosts.
+
+Notifications are noncritical, so this normally has little practical impact
+on service availability. Notifications are also stateless, so clustering this
+service provides no resistance to data loss because no data is at risk.
+
+Notification delivery normally requires very few resources, so adding more
+hosts is unlikely to have much impact on scalability.
+
+For details, see @{article:Cluster: Notifications}.
+
+
 Overlaying Services
 ===================
···

 In planning a cluster, consider these blended host types:

-**Everything**: Run HTTP, SSH, MySQL, repositories and daemons on a single
-host. This is the starting point for single-node setups, and usually also the
-best configuration when adding the second node.
+**Everything**: Run HTTP, SSH, MySQL, notifications, repositories and daemons
+on a single host. This is the starting point for single-node setups, and
+usually also the best configuration when adding the second node.

-**Everything Except Databases**: Run HTTP, SSH, repositories and daemons on one
-host, and MySQL on a different host. MySQL uses many of the same resources that
-other services use. It's also simpler to separate than other services, and
-tends to benefit the most from dedicated hardware.
+**Everything Except Databases**: Run HTTP, SSH, notifications, repositories and
+daemons on one host, and MySQL on a different host. MySQL uses many of the same
+resources that other services use. It's also simpler to separate than other
+services, and tends to benefit the most from dedicated hardware.

 **Repositories and Daemons**: Run repositories and daemons on the same host.
 Repository hosts //must// run daemons, and it normally makes sense to
···
 This section provides some guidance on reasonable ways to scale up a cluster.

 The smallest possible cluster is **two hosts**. Run everything (web, ssh,
-database, repositories, and daemons) on each host. One host will serve as the
-master; the other will serve as a replica.
+database, notifications, repositories, and daemons) on each host. One host will
+serve as the master; the other will serve as a replica.

 Ideally, you should physically separate these hosts to reduce the chance that a
 natural disaster or infrastructure disruption could disable or destroy both
···
 onto its own host).

 After separating databases, separating repository + daemon nodes is likely
-the next step.
+the next step to consider.

 To improve **availability**, add another copy of everything you run in one
 datacenter to a new datacenter. For example, if you have a two-node cluster,
```
+174
src/docs/user/cluster/cluster_notifications.diviner
````
@title Cluster: Notifications
@group intro

Configuring Phabricator to use multiple notification servers.

Overview
========

WARNING: This feature is a very early prototype; the features this document
describes are mostly speculative fantasy.

You can run multiple notification servers. The advantages of doing this
are:

  - you can completely survive the loss of any subset so long as one
    remains standing; and
  - performance and capacity may improve.

This configuration is relatively simple, but has a small impact on availability
and does nothing to increase resistance to data loss.


Clustering Design Goals
=======================

Notification clustering aims to restore service automatically after the loss
of some nodes. It does **not** attempt to guarantee that every message is
delivered.

Notification messages provide timely information about events, but they are
never authoritative and never the only way for users to learn about events.
For example, if a notification about a task update is not delivered, the next
page you load will still show the notification in your notification menu.

Generally, Phabricator works fine without notifications configured at all, so
clustering assumes that losing some messages during a disruption is acceptable.


How Clustering Works
====================

Notification clustering is very simple: notification servers relay every
message they receive to a list of peers.

When you configure clustering, you'll run multiple servers and tell them that
the other servers exist. When any server receives a message, it retransmits it
to all the servers it knows about.

When a server is lost, clients will automatically reconnect after a brief
delay. They may lose some notifications while reconnecting, but normally this
should only last for a few seconds.


Configuring Aphlict
===================

To configure clustering on the server side, add a `cluster` key to your
Aphlict configuration file. For more details about configuring Aphlict,
see @{article:Notifications User Guide: Setup and Configuration}.

The `cluster` key should contain a list of `"admin"` server locations. Every
message the server receives will be retransmitted to all nodes in the list.

The server is smart enough to avoid sending messages in a cycle, and to avoid
sending messages to itself. You can safely list every server you run in the
configuration file, including the current server.

You do not need to configure servers in an acyclic graph or only list //other//
servers: just list everything on every server and Aphlict will figure things
out from there.

A simple example with two servers might look like this:

```lang=json, name="aphlict.json (Cluster)"
{
  ...
  "cluster": [
    {
      "host": "notify001.mycompany.com",
      "port": 22281,
      "protocol": "http"
    },
    {
      "host": "notify002.mycompany.com",
      "port": 22281,
      "protocol": "http"
    }
  ]
  ...
}
```


Configuring Phabricator
=======================

To configure clustering on the client side, add every service you run to
`notification.servers`. Generally, this will be twice as many entries as
you run actual servers, since each server runs a `"client"` service and an
`"admin"` service.

A simple example with the two servers above (providing four total services)
might look like this:

```lang=json, name="notification.servers (Cluster)"
[
  {
    "type": "client",
    "host": "notify001.mycompany.com",
    "port": 22280,
    "protocol": "https"
  },
  {
    "type": "client",
    "host": "notify002.mycompany.com",
    "port": 22280,
    "protocol": "https"
  },
  {
    "type": "admin",
    "host": "notify001.mycompany.com",
    "port": 22281,
    "protocol": "http"
  },
  {
    "type": "admin",
    "host": "notify002.mycompany.com",
    "port": 22281,
    "protocol": "http"
  }
]
```

If you put all of the `"client"` servers behind a load balancer, you would
just list the load balancer and let it handle pulling nodes in and out of
service.

```lang=json, name="notification.servers (Cluster + Load Balancer)"
[
  {
    "type": "client",
    "host": "notify-lb.mycompany.com",
    "port": 22280,
    "protocol": "https"
  },
  {
    "type": "admin",
    "host": "notify001.mycompany.com",
    "port": 22281,
    "protocol": "http"
  },
  {
    "type": "admin",
    "host": "notify002.mycompany.com",
    "port": 22281,
    "protocol": "http"
  }
]
```

Notification hosts do not need to run any additional services, although they
are free to do so. The notification server generally consumes few resources
and is resistant to most other loads on the machine, so it's reasonable to
overlay these on top of other services wherever it is convenient.


Next Steps
==========

Continue by:

  - reviewing notification configuration with
    @{article:Notifications User Guide: Setup and Configuration}; or
  - returning to @{article:Clustering Introduction}.
````
+13 -1
src/docs/user/configuration/notifications.diviner
```diff
···

 - `servers`: //Required list.// A list of servers to start.
 - `logs`: //Optional list.// A list of logs to write to.
+- `cluster`: //Optional list.// A list of cluster peers. This is an advanced
+  feature.
 - `pidfile`: //Required string.// Path to a PID file.

 Each server in the `servers` list should be an object with these keys:
···

 - `path`: //Required string.// Path to the log file.

+Each peer in the `cluster` list should be an object with these keys:
+
+- `host`: //Required string.// The peer host address.
+- `port`: //Required int.// The peer port.
+- `protocol`: //Required string.// The protocol to connect with, one of
+  `"http"` or `"https"`.
+
+Cluster configuration is an advanced topic and can be omitted for most
+installs. For more information on how to configure a cluster, see
+@{article:Clustering Introduction} and @{article:Cluster: Notifications}.
+
 The defaults are appropriate for simple cases, but you may need to adjust them
 if you are running a more complex configuration.
-

 Configuring Phabricator
 =======================
```
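
For reference, a single entry in the `cluster` list following the keys documented above might look like this (the hostname and port are illustrative; a complete example appears in the new Cluster: Notifications document added in this revision):

```json
{
  "host": "notify001.mycompany.com",
  "port": 22281,
  "protocol": "http"
}
```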
+2
src/infrastructure/storage/management/workflow/PhabricatorStorageManagementDumpWorkflow.php
```diff
···

     list($host, $port) = $this->getBareHostAndPort($api->getHost());

+    $has_password = false;
+
     $password = $api->getPassword();
     if ($password) {
       if (strlen($password->openEnvelope())) {
```
+3 -2
src/view/page/PhabricatorStandardPageView.php
```diff
···

     if ($servers) {
       if ($user && $user->isLoggedIn()) {
-        // TODO: We could be smarter about selecting a server if there are
-        // multiple options available.
+        // TODO: We could tell the browser about all the servers and let it
+        // do random reconnects to improve reliability.
+        shuffle($servers);
         $server = head($servers);

         $client_uri = $server->getWebsocketURI();
```
+21 -1
support/aphlict/server/aphlict_server.js
```diff
···

 require('./lib/AphlictAdminServer');
 require('./lib/AphlictClientServer');
-
+require('./lib/AphlictPeerList');
+require('./lib/AphlictPeer');

 var ii;

···
   }
 }

+var peer_list = new JX.AphlictPeerList();
+
+debug.log(
+  'This server has fingerprint "%s".',
+  peer_list.getFingerprint());
+
+var cluster = config.cluster || [];
+for (ii = 0; ii < cluster.length; ii++) {
+  var peer = cluster[ii];
+
+  var peer_client = new JX.AphlictPeer()
+    .setHost(peer.host)
+    .setPort(peer.port)
+    .setProtocol(peer.protocol);
+
+  peer_list.addPeer(peer_client);
+}
+
 for (ii = 0; ii < aphlict_admins.length; ii++) {
   var admin_server = aphlict_admins[ii];
   admin_server.setClientServers(aphlict_clients);
+  admin_server.setPeerList(peer_list);
 }
```
+26 -8
support/aphlict/server/lib/AphlictAdminServer.js
```diff
···
   properties: {
     clientServers: null,
     logger: null,
+    peerList: null
   },

   members: {
···
       ++self._messagesIn;

       try {
-        self._transmit(instance, msg);
-        response.writeHead(200, {'Content-Type': 'text/plain'});
+        self._transmit(instance, msg, response);
       } catch (err) {
         self.log(
           '<%s> Internal Server Error! %s',
···
     /**
      * Transmits a message to all subscribed listeners.
      */
-    _transmit: function(instance, message) {
-      var lists = this.getListenerLists(instance);
+    _transmit: function(instance, message, response) {
+      var peer_list = this.getPeerList();
+
+      message = peer_list.addFingerprint(message);
+      if (message) {
+        var lists = this.getListenerLists(instance);

-      for (var ii = 0; ii < lists.length; ii++) {
-        var list = lists[ii];
-        var listeners = list.getListeners();
-        this._transmitToListeners(list, listeners, message);
+        for (var ii = 0; ii < lists.length; ii++) {
+          var list = lists[ii];
+          var listeners = list.getListeners();
+          this._transmitToListeners(list, listeners, message);
+        }
+
+        peer_list.broadcastMessage(instance, message);
       }
+
+      // Respond to the caller with our fingerprint so it can stop sending
+      // us traffic we don't need to know about if it's a peer. In particular,
+      // this stops us from broadcasting messages to ourselves if we appear
+      // in the cluster list.
+      var receipt = {
+        fingerprint: this.getPeerList().getFingerprint()
+      };
+
+      response.writeHead(200, {'Content-Type': 'application/json'});
+      response.write(JSON.stringify(receipt));
     },

     _transmitToListeners: function(list, listeners, message) {
```
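
For illustration, the receipt written back to the caller is a small JSON body. Using the example fingerprint from the summary above, it would look something like this:

```json
{
  "fingerprint": "XjeHuPKPBKHUmXkB"
}
```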
+80
support/aphlict/server/lib/AphlictPeer.js
```javascript
'use strict';

var JX = require('./javelin').JX;

var http = require('http');
var https = require('https');

JX.install('AphlictPeer', {

  construct: function() {
  },

  properties: {
    host: null,
    port: null,
    protocol: null,
    fingerprint: null
  },

  members: {
    broadcastMessage: function(instance, message) {
      var data;
      try {
        data = JSON.stringify(message);
      } catch (error) {
        return;
      }

      // TODO: Maybe use "agent" stuff to pool connections?

      var options = {
        hostname: this.getHost(),
        port: this.getPort(),
        method: 'POST',
        path: '/?instance=' + instance,
        headers: {
          'Content-Type': 'application/json',
          'Content-Length': data.length
        }
      };

      var onresponse = JX.bind(this, this._onresponse);

      var request;
      if (this.getProtocol() == 'https') {
        request = https.request(options, onresponse);
      } else {
        request = http.request(options, onresponse);
      }

      request.write(data);
      request.end();
    },

    _onresponse: function(response) {
      var peer = this;
      var data = '';

      response.on('data', function(bytes) {
        data += bytes;
      });

      response.on('end', function() {
        var message;
        try {
          message = JSON.parse(data);
        } catch (error) {
          return;
        }

        // If we got a valid receipt, update the fingerprint for this server.
        var fingerprint = message.fingerprint;
        if (fingerprint) {
          peer.setFingerprint(fingerprint);
        }
      });
    }
  }

});
```
+86
support/aphlict/server/lib/AphlictPeerList.js
```javascript
'use strict';

var JX = require('./javelin').JX;

JX.install('AphlictPeerList', {

  construct: function() {
    this._peers = [];

    // Generate a new unique identity for this server. We just use this to
    // identify messages we have already seen and figure out which peer is
    // actually us, so we don't bounce messages around the cluster forever.
    this._fingerprint = this._generateFingerprint();
  },

  properties: {
  },

  members: {
    _peers: null,
    _fingerprint: null,

    addPeer: function(peer) {
      this._peers.push(peer);
      return this;
    },

    addFingerprint: function(message) {
      var fingerprint = this.getFingerprint();

      // Check if we've already touched this message. If we have, we do not
      // broadcast it again. If we haven't, we add our fingerprint and then
      // broadcast the modified version.
      var touched = message.touched || [];
      for (var ii = 0; ii < touched.length; ii++) {
        if (touched[ii] == fingerprint) {
          return null;
        }
      }
      touched.push(fingerprint);

      message.touched = touched;
      return message;
    },

    broadcastMessage: function(instance, message) {
      var ii;

      var touches = {};
      var touched = message.touched;
      for (ii = 0; ii < touched.length; ii++) {
        touches[touched[ii]] = true;
      }

      var peers = this._peers;
      for (ii = 0; ii < peers.length; ii++) {
        var peer = peers[ii];

        // If we know the peer's fingerprint and it has already touched
        // this message, don't broadcast it.
        var fingerprint = peer.getFingerprint();
        if (fingerprint && touches[fingerprint]) {
          continue;
        }

        peer.broadcastMessage(instance, message);
      }
    },

    getFingerprint: function() {
      return this._fingerprint;
    },

    _generateFingerprint: function() {
      var src = '23456789abcdefghjkmnpqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ';
      var len = 16;
      var out = [];
      for (var ii = 0; ii < len; ii++) {
        var idx = Math.floor(Math.random() * src.length);
        out.push(src[idx]);
      }
      return out.join('');
    }
  }

});
```
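
To illustrate the deduplication bookkeeping above: after passing through one server, a message carries that server's fingerprint in its `touched` list, and a second server appends its own fingerprint before relaying further. The surrounding field below is hypothetical; only `touched` is the structure `addFingerprint()` actually maintains, and the second fingerprint shown is an invented example.

```json
{
  "data": "(original message fields, unchanged)",
  "touched": ["XjeHuPKPBKHUmXkB", "Hkb2mPqW7tXj9RbE"]
}
```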