@recaptime-dev's working patches + fork for Phorge, a community fork of Phabricator. (Upstream dev and stable branches are at upstream/main and upstream/stable respectively.) hq.recaptime.dev/wiki/Phorge

When acquiring a GlobalLock, put good connections that just got unlucky back in the pool

Summary:
See PHI1794, which describes a connection exhaustion issue with a large number of webhook tasks in queue.

The "GlobalLock" mechanism manages a separate connection pool from the main pool, and webhook workers immediately try to grab a webhook lock with a 0-second wait when they start. So far, this is fine.

Prior to this change, good connections which fail to acquire a lock are discarded. This can lead to connection exhaustion as the worker rapidly cycles through lock attempts: the connections will remain open for at least 60 seconds (since D16389) in an effort to avoid outbound port exhaustion, but they're effectively orphaned because they aren't part of the main pool and aren't part of the lock pool. We're basically leaking a connection every time we fail to lock.

Failing to lock doesn't mean we need to discard the connection: it's a completely suitable connection for reuse. Instead of dropping it on the floor, put it into the lock pool.
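
The effect of returning the connection to the pool can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the actual PhabricatorGlobalLock code: `FakeConnection`, `SketchLock`, and `count_connections_opened()` are hypothetical names invented for illustration, and the sketch does not model the 60-second connection lifetime or real MySQL locks. It only counts how many connections get opened when one worker holds a lock and other attempts fail repeatedly.

```php
<?php

// Hypothetical stand-in for a database connection. It only counts how many
// connections have been opened.
final class FakeConnection {
  public static $opened = 0;

  public function __construct() {
    self::$opened++;
  }
}

// Hypothetical, simplified lock helper. It models only the pool handling
// described above, not the real lock implementation.
final class SketchLock {
  private static $pool = array();
  private static $held = array();

  public static function reset() {
    self::$pool = array();
    self::$held = array();
  }

  public static function tryLock($name, $return_on_failure) {
    // Reuse a pooled connection if one is available, else open a new one.
    $conn = array_pop(self::$pool);
    if ($conn === null) {
      $conn = new FakeConnection();
    }

    if (isset(self::$held[$name])) {
      // We failed to acquire the lock, but the connection is still good.
      // With the fix, it goes back into the pool; without it, it is dropped.
      if ($return_on_failure) {
        self::$pool[] = $conn;
      }
      return false;
    }

    self::$held[$name] = $conn;
    return true;
  }

  public static function unlock($name) {
    // Releasing the lock returns its connection to the pool.
    self::$pool[] = self::$held[$name];
    unset(self::$held[$name]);
  }
}

// One caller holds the lock while $attempts further callers fail to get it.
function count_connections_opened($attempts, $return_on_failure) {
  FakeConnection::$opened = 0;
  SketchLock::reset();

  SketchLock::tryLock('webhook', $return_on_failure);
  for ($i = 0; $i < $attempts; $i++) {
    SketchLock::tryLock('webhook', $return_on_failure);
  }

  return FakeConnection::$opened;
}
```

In this sketch, 100 failed attempts open 101 connections when failed connections are discarded, and 2 connections when they are returned to the pool: one for the lock holder and one that every failed attempt reuses.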

Test Plan:
- Used "bin/webhook call ... --count 10000 --background" to queue a large number of webhook calls against a slow ("sleep(15);") webhook.
- Used "bin/phd launch 32 taskmaster" to start taskmasters.
- Observed MySQL connection behavior:
  - Before change: 2048 configured connections immediately exhausted.
  - After change: connections stable at ~160.
- Ran queue for a while, saw expected single-threaded calls to webhook.

Differential Revision: https://secure.phabricator.com/D21369

src/infrastructure/util/PhabricatorGlobalLock.php (+12)

     $ok = head($result);
     if (!$ok) {
+
+      // See PHI1794. We failed to acquire the lock, but the connection itself
+      // is still good. We're done with it, so add it to the pool, just as we
+      // would if we were releasing the lock.
+
+      // If we don't do this, we may establish a huge number of connections
+      // very rapidly if many workers try to acquire a lock at once. For
+      // example, this can happen if there are a large number of webhook tasks
+      // in the queue.
+
+      self::$pool[] = $conn;
+
       throw id(new PhutilLockException($lock_name))
         ->setHint($this->newHint($lock_name, $wait));
     }