@recaptime-dev's working patches + fork for Phorge, a community fork of Phabricator. (Upstream dev and stable branches are at upstream/main and upstream/stable respectively.) hq.recaptime.dev/wiki/Phorge
phorge phabricator
1
fork

Configure Feed

Select the types of activity you want to include in your feed.

Address some New Search Configuration Errata

Summary:
[ ] Write an "Upgrading: ..." guidance task with narrow instructions for installs that are upgrading.
[ ] Do we need to add an indexing activity (T11932) for installs with ElasticSearch?
[ ] We should more clearly detail exactly which versions of ElasticSearch are supported (for example, is ElasticSearch <2 no longer supported)? From T9893 it seems like we may //only// have supported ElasticSearch <2 before, so are the two regions of support totally nonoverlapping and all ElasticSearch users will need to upgrade?
[ ] Documentation should provide stronger guidance toward MySQL and away from Elastic for the vast majority of installs, because we've historically seen users choosing Elastic when they aren't actually trying to solve any specific problem.
[ ] When users search for fulltext results in Maniphest and hit too many documents, the current behavior is approximately silent failure (see T12443). D17384 has also lowered the ceiling for ElasticSearch, although previous changes lowered it for MySQL search. We should not fail silently, and ideally should build toward T12003.
[ ] D17384 added a new "keywords" field, but MySQL does not search it (I think?). The behavior should be as consistent across MySQL and Elastic as we can make it. Likely cleaner is giving "Project" objects a body, with "slugs" and "description" separated by newlines?
[ ] `PhabricatorSearchEngineTestCase` is now pointless and only detects local misconfigurations.
[ ] It would be nice to build a practical test suite instead, where we put specific documents into the index and then search for them. The upstream test could run against MySQL, and some `bin/search test` could run against a configured engine like ElasticSearch. This would make it easier to make sure that behavior was as uniform as possible across engine implementations.
[ ] Does every assigned task now match "user" in ElasticSearch?
[x] `PhabricatorElasticFulltextStorageEngine` has a `json_encode()` which should be `phutil_json_encode()`.
[ ] `PhabricatorSearchService` throws an untranslated exception.
[ ] When a search cluster is down, we probably don't degrade with much grace (unhandled exception)?
[ ] I haven't run bin/search init, but bin/search index doesn't warn me that I may want to. This might be worth adding. The UI does warn me.
[ ] bin/search init warns me that the index is "incorrect". It might be more clear to distinguish between "missing" and "incorrect", since it's more comforting to users to see "everything is as we expect, doing normal first-time setup now" than "something is wrong, fixing it".
[ ] CLI message "Initializing search service "ElasticSearch"" does not end with a period, which is inconsistent with other UI messages.
[ ] It might be nice to let bin/search commands like init and index select a specific service (or even service + host) to act on, as bin/storage --ref ... now does. You can generally get the result you want by fiddling with config.
[ ] When a service isn't writable, bin/search init reports "Search cluster has no hosts for role "write".". This is accurate but does not provide guidance: it might be more useful to the user to explain "This service is not writable, so we're skipping index check for it.".
[x] Even with write off for MySQL, bin/search index --type task --trace still updates MySQL, I think? I may be misreading the trace output. But this behavior doesn't make sense if it is the actual behavior, and it seems like reindexAbstractDocument() uses "all services", not "writable services", and the MySQL engine doesn't make sure it's writable before indexing.
[x] Searching or user fails to find task Grant users tokens when a mention is created, suggesting that stemming is not working.
[x] Searching for users finds that task, but fails to find a task containing "per user per month" in a comment, also suggesting that stemming is not working.
[x] Searching for maniphest fails to find task maniphest.query elephant, suggesting that tokenization in ElasticSearch is not as good as the MySQL tokenization for these words (see D17330).
[x] The "index incorrect" warning UI uses inconsistent title case.
[x] The "index incorrect" warning UI could format the command to be run more cleanly (with addCommand(), I think).

refs T12450

Test Plan:
* Stared blankly at the code.
* Disabled 'write' role on mysql fulltext service.
* Edited a task, ran search indexer, verified that the mysql index wasn't being updated.

Reviewers: epriestley, #blessed_reviewers

Reviewed By: epriestley, #blessed_reviewers

Subscribers: Korvin

Maniphest Tasks: T12450

Differential Revision: https://secure.phabricator.com/D17564

authored by

Mukunda Modell and committed by
20after4
699228c7 2fbc9a52

+80 -65
+9 -7
src/applications/config/check/PhabricatorElasticSearchSetupCheck.php
··· 49 49 50 50 $message = pht( 51 51 'You likely enabled cluster.search without creating the '. 52 - 'index. Run `./bin/search init` to correct the index.'); 52 + 'index. Use the following command to create a new index.'); 53 53 54 54 $this 55 55 ->newIssue('elastic.missing-index') 56 - ->setName(pht('Elasticsearch index Not Found')) 56 + ->setName(pht('Elasticsearch Index Not Found')) 57 + ->addCommand('./bin/search init') 57 58 ->setSummary($summary) 58 - ->setMessage($message) 59 - ->addRelatedPhabricatorConfig('cluster.search'); 59 + ->setMessage($message); 60 + 60 61 } else if (!$index_sane) { 61 62 $summary = pht( 62 63 'Elasticsearch index exists but needs correction.'); 63 64 64 65 $message = pht( 65 66 'Either the Phabricator schema for Elasticsearch has changed '. 66 - 'or Elasticsearch created the index automatically. Run '. 67 - '`./bin/search init` to correct the index.'); 67 + 'or Elasticsearch created the index automatically. '. 68 + 'Use the following command to rebuild the index.'); 68 69 69 70 $this 70 71 ->newIssue('elastic.broken-index') 71 - ->setName(pht('Elasticsearch index Incorrect')) 72 + ->setName(pht('Elasticsearch Index Schema Mismatch')) 73 + ->addCommand('./bin/search init') 72 74 ->setSummary($summary) 73 75 ->setMessage($message); 74 76 }
+49 -6
src/applications/search/fulltextstorage/PhabricatorElasticFulltextStorageEngine.php
··· 161 161 'simple_query_string' => array( 162 162 'query' => $query_string, 163 163 'fields' => array( 164 - '_all', 164 + PhabricatorSearchDocumentFieldType::FIELD_TITLE.'.*', 165 + PhabricatorSearchDocumentFieldType::FIELD_BODY.'.*', 166 + PhabricatorSearchDocumentFieldType::FIELD_COMMENT.'.*', 165 167 ), 166 - 'default_operator' => 'OR', 168 + 'default_operator' => 'AND', 167 169 ), 168 170 )); 169 171 ··· 175 177 'simple_query_string' => array( 176 178 'query' => $query_string, 177 179 'fields' => array( 180 + '*.raw', 178 181 PhabricatorSearchDocumentFieldType::FIELD_TITLE.'^4', 179 182 PhabricatorSearchDocumentFieldType::FIELD_BODY.'^3', 180 183 PhabricatorSearchDocumentFieldType::FIELD_COMMENT.'^1.2', ··· 332 335 'index' => array( 333 336 'auto_expand_replicas' => '0-2', 334 337 'analysis' => array( 338 + 'filter' => array( 339 + 'english_stop' => array( 340 + 'type' => 'stop', 341 + 'stopwords' => '_english_', 342 + ), 343 + 'english_stemmer' => array( 344 + 'type' => 'stemmer', 345 + 'language' => 'english', 346 + ), 347 + 'english_possessive_stemmer' => array( 348 + 'type' => 'stemmer', 349 + 'language' => 'possessive_english', 350 + ), 351 + ), 335 352 'analyzer' => array( 336 353 'english_exact' => array( 337 354 'tokenizer' => 'standard', 338 355 'filter' => array('lowercase'), 339 356 ), 357 + 'letter_stop' => array( 358 + 'tokenizer' => 'letter', 359 + 'filter' => array('lowercase', 'english_stop'), 360 + ), 361 + 'english_stem' => array( 362 + 'tokenizer' => 'standard', 363 + 'filter' => array( 364 + 'english_possessive_stemmer', 365 + 'lowercase', 366 + 'english_stop', 367 + 'english_stemmer', 368 + ), 369 + ), 340 370 ), 341 371 ), 342 372 ), ··· 356 386 // Use the custom analyzer for the corpus of text 357 387 $properties[$field] = array( 358 388 'type' => $text_type, 359 - 'analyzer' => 'english_exact', 360 - 'search_analyzer' => 'english', 361 - 'search_quote_analyzer' => 'english_exact', 389 + 'fields' => array( 390 + 'raw' => array( 391 + 'type' => $text_type, 392 + 'analyzer' => 'english_exact', 393 + 'search_analyzer' => 'english', 394 + 'search_quote_analyzer' => 'english_exact', 395 + ), 396 + 'keywords' => array( 397 + 'type' => $text_type, 398 + 'analyzer' => 'letter_stop', 399 + ), 400 + 'stems' => array( 401 + 'type' => $text_type, 402 + 'analyzer' => 'english_stem', 403 + ), 404 + ), 362 405 ); 363 406 } 364 407 ··· 505 548 array $data, $method = 'GET') { 506 549 507 550 $uri = $host->getURI($path); 508 - $data = json_encode($data); 551 + $data = phutil_json_encode($data); 509 552 $future = new HTTPSFuture($uri, $data); 510 553 if ($method != 'GET') { 511 554 $future->setMethod($method);
+9
src/infrastructure/cluster/search/PhabricatorMySQLSearchHost.php
··· 24 24 return 'mysql'; 25 25 } 26 26 27 + public function getHealthRecord() { 28 + if (!$this->healthRecord) { 29 + $ref = PhabricatorDatabaseRef::getMasterDatabaseRefForApplication( 30 + 'search'); 31 + $this->healthRecord = $ref->getHealthRecord(); 32 + } 33 + return $this->healthRecord; 34 + } 35 + 27 36 public function getConnectionStatus() { 28 37 PhabricatorDatabaseRef::queryAll(); 29 38 $ref = PhabricatorDatabaseRef::getMasterDatabaseRefForApplication('search');
-40
src/infrastructure/cluster/search/PhabricatorSearchHost.php
··· 13 13 protected $disabled; 14 14 protected $host; 15 15 protected $port; 16 - protected $hostRefs = array(); 17 16 18 17 const STATUS_OKAY = 'okay'; 19 18 const STATUS_FAIL = 'fail'; ··· 120 119 abstract public function getStatusViewColumns(); 121 120 122 121 abstract public function getConnectionStatus(); 123 - 124 - public static function reindexAbstractDocument( 125 - PhabricatorSearchAbstractDocument $doc) { 126 - 127 - $services = self::getAllServices(); 128 - $indexed = 0; 129 - foreach (self::getWritableHostForEachService() as $host) { 130 - $host->getEngine()->reindexAbstractDocument($doc); 131 - $indexed++; 132 - } 133 - if ($indexed == 0) { 134 - throw new PhabricatorClusterNoHostForRoleException('write'); 135 - } 136 - } 137 - 138 - public static function executeSearch(PhabricatorSavedQuery $query) { 139 - $services = self::getAllServices(); 140 - foreach ($services as $service) { 141 - $hosts = $service->getAllHostsForRole('read'); 142 - // try all hosts until one succeeds 143 - foreach ($hosts as $host) { 144 - $last_exception = null; 145 - try { 146 - $res = $host->getEngine()->executeSearch($query); 147 - // return immediately if we get results without an exception 148 - $host->didHealthCheck(true); 149 - return $res; 150 - } catch (Exception $ex) { 151 - // try each server in turn, only throw if none succeed 152 - $last_exception = $ex; 153 - $host->didHealthCheck(false); 154 - } 155 - } 156 - } 157 - if ($last_exception) { 158 - throw $last_exception; 159 - } 160 - return $res; 161 - } 162 122 163 123 }
+13 -12
src/infrastructure/cluster/search/PhabricatorSearchService.php
··· 46 46 47 47 public function setConfig($config) { 48 48 $this->config = $config; 49 + $this->setRoles(idx($config, 'roles', array())); 49 50 50 51 if (!isset($config['hosts'])) { 51 52 $config['hosts'] = array( ··· 67 68 return $this->config; 68 69 } 69 70 70 - public function setDisabled($disabled) { 71 - $this->disabled = $disabled; 72 - return $this; 73 - } 74 - 75 - public function getDisabled() { 76 - return $this->disabled; 77 - } 78 - 79 71 public static function getConnectionStatusMap() { 80 72 return array( 81 73 self::STATUS_OKAY => array( ··· 100 92 } 101 93 102 94 public function hasRole($role) { 103 - return isset($this->roles[$role]) && $this->roles[$role] === true; 95 + return isset($this->roles[$role]) && $this->roles[$role] !== false; 104 96 } 105 97 106 98 public function setRoles(array $roles) { ··· 160 152 * @return PhabricatorSearchHost[] 161 153 */ 162 154 public function getAllHostsForRole($role) { 155 + // if the role is explicitly set to false at the top level, then all hosts 156 + // have the role disabled. 157 + if (idx($this->config, $role) === false) { 158 + return array(); 159 + } 160 + 163 161 $hosts = array(); 164 162 foreach ($this->hosts as $host) { 165 163 if ($host->hasRole($role)) { ··· 225 223 PhabricatorSearchAbstractDocument $doc) { 226 224 $indexed = 0; 227 225 foreach (self::getAllServices() as $service) { 228 - $service->getEngine()->reindexAbstractDocument($doc); 229 - $indexed++; 226 + $hosts = $service->getAllHostsForRole('write'); 227 + if (count($hosts)) { 228 + $service->getEngine()->reindexAbstractDocument($doc); 229 + $indexed++; 230 + } 230 231 } 231 232 if ($indexed == 0) { 232 233 throw new PhabricatorClusterNoHostForRoleException('write');