@recaptime-dev's working patches + fork for Phorge, a community fork of Phabricator. (Upstream dev and stable branches are at upstream/main and upstream/stable respectively.) hq.recaptime.dev/wiki/Phorge

Stem fulltext tokens before filtering them for stopwords

Summary:
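Fixes T12596. A query for a token (like "having") that stems to a stopword (like "have") currently survives stopword filtering: the filter tests the unstemmed token, but the query itself uses the stemmed form. Stem unquoted tokens first so they get caught.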
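The ordering can be illustrated with a minimal standalone PHP sketch. `toy_stem()` and `is_stopword_token()` are hypothetical helpers written for illustration. The lookup-table "stemmer" stands in for the real stemmer (`$stemmer` in the diff). Only the order of operations mirrors this change: stem unquoted tokens, then test them against the stopword set.

```php
<?php

// Hypothetical toy stand-in for a real stemmer, for illustration only.
// It knows just a couple of words, enough to show why stemming must
// happen before the stopword check.
function toy_stem(string $token): string {
  static $stems = array(
    'having' => 'have',
    'searching' => 'search',
  );
  return $stems[$token] ?? $token;
}

// Hypothetical helper: returns true if the token should be dropped as a
// stopword. Unquoted tokens are stemmed before the lookup (as in this
// change); quoted tokens are matched exactly as typed.
function is_stopword_token(string $value, bool $is_quoted, array $stopwords): bool {
  if (!$is_quoted) {
    $value = toy_stem($value);
  }
  return isset($stopwords[$value]);
}

// Stopword set keyed by word (the diff builds it with array_fuse()).
$list = array('have', 'the', 'and');
$stopwords = array_combine($list, $list);

var_dump(is_stopword_token('having', false, $stopwords));    // bool(true)
var_dump(is_stopword_token('having', true, $stopwords));     // bool(false)
var_dump(is_stopword_token('searching', false, $stopwords)); // bool(false)
```

Testing the unstemmed value would let "having" through, because only "have" appears in the stopword set.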

Also, InnoDB lets you configure a custom stopword table (via `innodb_ft_server_stopword_table`). If one is configured, read it instead of the default stopword list. (I configured one locally, but the default list is reasonable, so we never formally recommended that installs configure it.)
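The table-selection rule is shown below as a minimal sketch. It assumes the value of `@@innodb_ft_server_stopword_table` has already been fetched. MySQL expects that setting in `db_name/table_name` form and leaves it NULL when unset. `pick_stopword_table()` is a hypothetical helper, not part of Phorge. It uses `strpos()` where the commit uses `preg_match('(/)', ...)`, but it makes the same decision.

```php
<?php

// Hypothetical helper: given the raw value of
// @@innodb_ft_server_stopword_table (NULL when unset), return
// array(database, table) naming the stopword table to read.
function pick_stopword_table(?string $config): array {
  if ($config !== null && strpos($config, '/') !== false) {
    // A custom table is configured: split "db_name/table_name".
    list($database, $table) = explode('/', $config);
    return array($database, $table);
  }
  // Otherwise fall back to InnoDB's built-in default stopword list.
  return array('INFORMATION_SCHEMA', 'INNODB_FT_DEFAULT_STOPWORD');
}

echo implode('.', pick_stopword_table('mydb/my_stopwords')), "\n";
// mydb.my_stopwords
echo implode('.', pick_stopword_table(null)), "\n";
// INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD
```

In the diff, the two names go through queryfx's `%T` conversions rather than being pasted into the SQL string. This means the configured identifiers are escaped as table names.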

Test Plan:
Queried for words that stem to stopwords, saw them filtered:

{F4915843}

Ran the query from the original report and saw "having" caught, since "have" is in the stopword list:

{F4915844}

Fiddled with local InnoDB stopword table config and saw the stopword list get loaded correctly.

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12596

Differential Revision: https://secure.phabricator.com/D17728

+25 -2
src/applications/search/fulltextstorage/PhabricatorMySQLFulltextStorageEngine.php
```diff
@@ -228,6 +228,13 @@
       $fulltext_tokens[$key] = $fulltext_token;
 
       $value = $token->getValue();
+
+      // If the value is unquoted, we'll stem it in the query, so stem it
+      // here before performing filtering tests. See T12596.
+      if (!$token->isQuoted()) {
+        $value = $stemmer->stemToken($value);
+      }
+
       if (phutil_utf8_strlen($value) < $min_length) {
         $fulltext_token->setIsShort(true);
         continue;
@@ -479,16 +486,32 @@
     try {
       $result = queryfx_one(
         $conn,
-        'SELECT @@innodb_ft_min_token_size innodb_max');
+        'SELECT @@innodb_ft_min_token_size innodb_max,
+          @@innodb_ft_server_stopword_table innodb_stopword_config');
     } catch (AphrontQueryException $ex) {
       $result = null;
     }
 
     if ($result) {
       $min_len = $result['innodb_max'];
+
+      $stopword_config = $result['innodb_stopword_config'];
+      if (preg_match('(/)', $stopword_config)) {
+        // If the setting is nonempty and contains a slash, query the
+        // table the user has configured.
+        $parts = explode('/', $stopword_config);
+        list($stopword_database, $stopword_table) = $parts;
+      } else {
+        // Otherwise, query the InnoDB default stopword table.
+        $stopword_database = 'INFORMATION_SCHEMA';
+        $stopword_table = 'INNODB_FT_DEFAULT_STOPWORD';
+      }
+
       $stopwords = queryfx_all(
         $conn,
-        'SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD');
+        'SELECT * FROM %T.%T',
+        $stopword_database,
+        $stopword_table);
       $stopwords = ipull($stopwords, 'value');
       $stopwords = array_fuse($stopwords);
 
```