@recaptime-dev's working patches + fork for Phorge, a community fork of Phabricator. (Upstream dev and stable branches are at upstream/main and upstream/stable respectively.) hq.recaptime.dev/wiki/Phorge

Remove broken and unfixable "prefix" ngram behavior

Summary:
Ref T13501. The older ngram code has some "prefix" behavior that tries to handle cases where a user issues a very short (one or two character) query.

This code doesn't work, presumably never worked, and can not be made to work (or, at least, I don't see a way, and am fairly sure one does not exist).

If the user searches for "xy", we can find trigrams in the form "xy*" using the index, but not in the form "*xy". The code makes a misguided effort to look for " xy", but this will only find "xy" in words that begin with "xy", like "xylophone".

For example, searching Files for "om" does not currently find "random.txt".
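
The mismatch described above can be reproduced with a minimal sketch. It assumes the padding modes visible in the diff (spaces on both sides for 'index', a single leading space for 'prefix', no padding for 'query') and treats the whole filename as one token. The helper name `sketch_trigrams` is hypothetical and is not part of Phorge.

```php
<?php

// Hypothetical helper, not Phorge's implementation: extract the 3-character
// ngrams of a single token using the padding modes shown in the diff.
function sketch_trigrams(string $token, string $mode): array {
  $token = strtolower($token);

  switch ($mode) {
    case 'index':
      // Indexed tokens are padded on both sides.
      $token = ' '.$token.' ';
      break;
    case 'prefix':
      // The removed mode: pad only on the left.
      $token = ' '.$token;
      break;
    default:
      // 'query' mode: no padding.
      break;
  }

  $ngrams = array();
  $len = strlen($token) - 2;
  for ($ii = 0; $ii < $len; $ii++) {
    $ngrams[] = substr($token, $ii, 3);
  }

  return array_values(array_unique($ngrams));
}

// "om" in prefix mode yields only " om", which "random.txt" never indexes.
$indexed = sketch_trigrams('random.txt', 'index');
$lookup = sketch_trigrams('om', 'prefix');
$hits = array_intersect($lookup, $indexed);

// "xy" in prefix mode yields " xy", which matches only because
// "xylophone" happens to begin with "xy".
$indexed_xy = sketch_trigrams('xylophone', 'index');
$lookup_xy = sketch_trigrams('xy', 'prefix');
$hits_xy = array_intersect($lookup_xy, $indexed_xy);
```

Under these assumptions `$hits` is empty while `$hits_xy` contains the single trigram " xy", which is why the prefix lookup only ever finds matches at the start of a word.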

Remove this behavior. Without engaging the trigram index, these queries fall back to an unindexed "LIKE" table scan, but that's about the best we can do.

Test Plan: Searched for "om", hit "random.txt".

Maniphest Tasks: T13501

Differential Revision: https://secure.phabricator.com/D21127

+20 -37
-3
src/applications/search/ngrams/PhabricatorSearchNgrams.php
···
         case 'index':
           $token = ' '.$token.' ';
           break;
-        case 'prefix':
-          $token = ' '.$token;
-          break;
       }

       $len = (strlen($token) - 2);
+20 -34
src/infrastructure/query/policy/PhabricatorCursorPagedPolicyAwareQuery.php
···
   protected function buildNgramsJoinClause(AphrontDatabaseConnection $conn) {
     $flat = array();
     foreach ($this->ngrams as $spec) {
+      $length = $spec['length'];
+
+      if ($length < 3) {
+        continue;
+      }
+
       $index = $spec['index'];
       $value = $spec['value'];
-      $length = $spec['length'];

-      if ($length >= 3) {
-        $ngrams = $index->getNgramsFromString($value, 'query');
-        $prefix = false;
-      } else if ($length == 2) {
-        $ngrams = $index->getNgramsFromString($value, 'prefix');
-        $prefix = false;
-      } else {
-        $ngrams = array(' '.$value);
-        $prefix = true;
-      }
+      $ngrams = $index->getNgramsFromString($value, 'query');

       foreach ($ngrams as $ngram) {
         $flat[] = array(
           'table' => $index->getTableName(),
           'ngram' => $ngram,
-          'prefix' => $prefix,
         );
       }
+    }
+
+    if (!$flat) {
+      return array();
     }

     // MySQL only allows us to join a maximum of 61 tables per query. Each
···
     foreach ($flat as $spec) {
       $table = $spec['table'];
       $ngram = $spec['ngram'];
-      $prefix = $spec['prefix'];

       $alias = 'ngm'.$idx++;

-      if ($prefix) {
-        $joins[] = qsprintf(
-          $conn,
-          'JOIN %T %T ON %T.objectID = %Q AND %T.ngram LIKE %>',
-          $table,
-          $alias,
-          $alias,
-          $id_column,
-          $alias,
-          $ngram);
-      } else {
-        $joins[] = qsprintf(
-          $conn,
-          'JOIN %T %T ON %T.objectID = %Q AND %T.ngram = %s',
-          $table,
-          $alias,
-          $alias,
-          $id_column,
-          $alias,
-          $ngram);
-      }
+      $joins[] = qsprintf(
+        $conn,
+        'JOIN %T %T ON %T.objectID = %Q AND %T.ngram = %s',
+        $table,
+        $alias,
+        $alias,
+        $id_column,
+        $alias,
+        $ngram);
     }

     return $joins;
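
The control flow after this change can be approximated with a standalone sketch that needs no database connection. It is an illustration under assumptions, not the method itself. `sketch_ngram_joins`, the `example_ngrams` table name, and the alias numbering starting at 1 are all hypothetical. It returns plain arrays where the real code builds SQL fragments with `qsprintf`, and it omits the 61-table join limit handling.

```php
<?php

// Hypothetical stand-in for the post-change flow: skip terms too short for
// the trigram index, then describe one exact-match join per ngram.
function sketch_ngram_joins(array $specs): array {
  $flat = array();
  foreach ($specs as $spec) {
    $value = strtolower($spec['value']);

    // Terms shorter than three characters cannot engage the trigram index.
    if (strlen($value) < 3) {
      continue;
    }

    // 'query' mode: unpadded 3-character windows over the term.
    $len = strlen($value) - 2;
    for ($ii = 0; $ii < $len; $ii++) {
      $flat[] = array(
        'table' => $spec['table'],
        'ngram' => substr($value, $ii, 3),
      );
    }
  }

  // Nothing usable: emit no joins at all.
  if (!$flat) {
    return array();
  }

  $joins = array();
  $idx = 1; // Assumed starting index; not shown in the diff.
  foreach ($flat as $item) {
    $joins[] = array(
      'alias' => 'ngm'.$idx++,
      'table' => $item['table'],
      'ngram' => $item['ngram'],
    );
  }

  return $joins;
}

// A two-character term produces no joins; a four-character term produces two.
$short = sketch_ngram_joins(array(
  array('table' => 'example_ngrams', 'value' => 'om'),
));
$long = sketch_ngram_joins(array(
  array('table' => 'example_ngrams', 'value' => 'rand'),
));
```

In this sketch `$short` is an empty array, so a search for "om" adds no ngram joins and is left to whatever unindexed matching the rest of the query performs, while `$long` describes joins for "ran" and "and".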