Use "git ls-remote" to guess if "git fetch" is a no-op

Summary:
Ref T12296. Ref T12392. Currently, when we're observing a remote repository, we periodically run `git fetch ...`.

Instead, periodically run `git ls-remote` (to list refs in the remote) and `git for-each-ref` (to list local refs) and only continue if the two lists are different.

The motivations for this are:

- In T12296, it appears that doing this is //faster// than doing a no-op `git fetch`. This effect seems to reproduce locally in a clean environment (900ms for `ls-remote` + 100ms for `for-each-ref` vs about 1.4s for `fetch`). I don't have any explanation for why this is, but there it is. This isn't a huge change, although the time we're saving does appear to mostly be local CPU time, which is good for us.
- Because we control all writes, we could cache `git for-each-ref` in the future and do fewer disk operations. This doesn't necessarily seem too valuable, though.
- This allows us to tell if a fetch will do anything or not, and make better decisions around clustering (in particular, simplify how observed repository versioning works). With `git fetch`, we can't easily distinguish between "fetch, but nothing changed" and "legitimate fetch".

If a repository updates very regularly we end up doing slightly more work this way (that is, if `ls-remote` always comes back with changes, we do a little extra work), but this is normally very rare.

This might not get non-bare repositories quite right in some cases (i.e., incorrectly detect them as changed when they are unchanged) but we haven't created non-bare repositories for many years.

Test Plan: Ran `bin/repository update --trace --verbose PHABX`, saw sensible construction of local and remote maps and accurate detection of whether a fetch would do anything or not.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12392, T12296

Differential Revision: https://secure.phabricator.com/D17497

epriestley 9 years ago 2b0ad243 aef2a39a

+85 -3

1 changed file

expand all

src

applications

repository

engine

PhabricatorRepositoryPullEngine.php

+85 -3

src/applications/repository/engine/PhabricatorRepositoryPullEngine.php

··· 120 120 pht( 121 121 'Updating the working copy for repository "%s".', 122 122 $repository->getDisplayName())); 123 + 123 124 if ($is_git) { 124 125 $this->verifyGitOrigin($repository); 125 126 $this->executeGitUpdate(); ··· 157 158 } 158 159 159 160 private function skipPull($message) { 160 - $this->log('%s', $message); 161 + $this->log($message); 161 162 $this->donePull(); 162 163 } 163 164 ··· 172 173 } 173 174 174 175 private function logPull($message) { 175 - $this->log('%s', $message); 176 + $this->log($message); 176 177 } 177 178 178 179 private function donePull() { ··· 190 191 } 191 192 192 193 private function installHook($path, array $hook_argv = array()) { 193 - $this->log('%s', pht('Installing commit hook to "%s"...', $path)); 194 + $this->log(pht('Installing commit hook to "%s"...', $path)); 194 195 195 196 $repository = $this->getRepository(); 196 197 $identifier = $this->getHookContextIdentifier($repository); ··· 339 340 throw new Exception($message); 340 341 } 341 342 343 + $remote_refs = $this->loadGitRemoteRefs($repository); 344 + $local_refs = $this->loadGitLocalRefs($repository); 345 + if ($remote_refs === $local_refs) { 346 + $this->log( 347 + pht( 348 + 'Skipping fetch because local and remote refs are already '. 349 + 'identical.')); 350 + return false; 351 + } 352 + 353 + $this->logRefDifferences($remote_refs, $local_refs); 354 + 342 355 $retry = false; 343 356 do { 344 357 // This is a local command, but needs credentials. ··· 395 408 396 409 $this->installHook($root.$path); 397 410 } 411 + 412 + private function loadGitRemoteRefs(PhabricatorRepository $repository) { 413 + $remote_envelope = $repository->getRemoteURIEnvelope(); 414 + 415 + list($stdout) = $repository->execxRemoteCommand( 416 + 'ls-remote -- %P', 417 + $remote_envelope); 418 + 419 + $map = array(); 420 + $lines = phutil_split_lines($stdout, false); 421 + foreach ($lines as $line) { 422 + list($hash, $name) = preg_split('/\s+/', $line, 2); 423 + 424 + // If the remote has a HEAD, just ignore it. 425 + if ($name == 'HEAD') { 426 + continue; 427 + } 428 + 429 + // If the remote ref is itself a remote ref, ignore it. 430 + if (preg_match('(^refs/remotes/)', $name)) { 431 + continue; 432 + } 433 + 434 + $map[$name] = $hash; 435 + } 436 + 437 + ksort($map); 438 + 439 + return $map; 440 + } 441 + 442 + private function loadGitLocalRefs(PhabricatorRepository $repository) { 443 + $refs = id(new DiffusionLowLevelGitRefQuery()) 444 + ->setRepository($repository) 445 + ->execute(); 446 + 447 + $map = array(); 448 + foreach ($refs as $ref) { 449 + $fields = $ref->getRawFields(); 450 + $map[idx($fields, 'refname')] = $ref->getCommitIdentifier(); 451 + } 452 + 453 + ksort($map); 454 + 455 + return $map; 456 + } 457 + 458 + private function logRefDifferences(array $remote, array $local) { 459 + $all = $local + $remote; 460 + 461 + $differences = array(); 462 + foreach ($all as $key => $ignored) { 463 + $remote_ref = idx($remote, $key, pht('<null>')); 464 + $local_ref = idx($local, $key, pht('<null>')); 465 + if ($remote_ref !== $local_ref) { 466 + $differences[] = pht( 467 + '%s (remote: "%s", local: "%s")', 468 + $key, 469 + $remote_ref, 470 + $local_ref); 471 + } 472 + } 473 + 474 + $this->log( 475 + pht( 476 + "Updating repository after detecting ref differences:\n%s", 477 + implode("\n", $differences))); 478 + } 479 + 398 480 399 481 400 482 /* -( Pulling Mercurial Working Copies )----------------------------------- */

Configure Feed

Configure Feed