@recaptime-dev's working patches + fork for Phorge, a community fork of Phabricator. (Upstream dev and stable branches are at upstream/main and upstream/stable respectively.) hq.recaptime.dev/wiki/Phorge

Split Ferret engine strings for tokenization on any sequence of whitespace

Summary:
Ref T12819. Currently, strings are split only on spaces, but newlines (and, if they exist, tabs) should also split strings.

Without this, we can fail to get the proper term boundary tokens for words which begin at the start of a line or end at the end of a line.
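The difference between the two patterns can be seen in a small standalone sketch. The helper names `split_on_spaces` and `split_on_whitespace` are hypothetical and exist only for this illustration; only the two regular expressions and the `trim()` call are taken from the change itself.

```php
<?php
// Hypothetical helpers for illustration. Only the regex patterns and the
// trim() call mirror the change; the function names are made up.

// Old behavior: split only on runs of literal spaces.
function split_on_spaces($value) {
  return preg_split('/ +/', trim($value, ' '));
}

// New behavior: split on any run of whitespace (spaces, newlines, tabs).
function split_on_whitespace($value) {
  return preg_split('/\s+/u', trim($value, ' '));
}

$input = "xyz\nabc\tdef ghi";

// Two tokens: the newline and tab stay embedded in the first one.
echo json_encode(split_on_spaces($input)), "\n";

// Four tokens: "xyz", "abc", "def", "ghi".
echo json_encode(split_on_whitespace($input)), "\n";
```

With the space-only pattern, `xyz`, `abc`, and `def` are never seen as separate words, which is why their boundaries are lost.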

Test Plan: Reindexed a document with "xyz\nabc", saw `"yz "` and `" ab"` term boundary tokens generate properly.
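To see why the split matters for the boundary tokens named in the test plan, here is a simplified sketch. It assumes that indexing pads each token with one space on each side and then emits every three-character window; the function `sketch_index_ngrams` is hypothetical and is not the engine's actual n-gram code. It is also byte-based, so it only behaves sensibly for ASCII input.

```php
<?php
// Simplified, hypothetical sketch of boundary n-gram generation.
// Assumption: each token is padded with a single space on both sides and
// split into 3-character windows. Byte-based, so ASCII input only.
function sketch_index_ngrams(array $tokens) {
  $ngrams = array();
  foreach ($tokens as $token) {
    $padded = ' '.$token.' ';
    $length = strlen($padded);
    for ($i = 0; $i + 3 <= $length; $i++) {
      $ngrams[] = substr($padded, $i, 3);
    }
  }
  return $ngrams;
}

$value = "xyz\nabc";

// Old split: one token, so no "yz " or " ab" boundary n-grams appear.
$old = sketch_index_ngrams(preg_split('/ +/', $value));

// New split: two tokens, each with its own leading and trailing boundary.
$new = sketch_index_ngrams(preg_split('/\s+/u', $value));

echo json_encode($old), "\n";
echo json_encode($new), "\n";
```

Under these assumptions, the old split yields n-grams that straddle the newline (such as `"yz\n"`), while the new split yields the `"yz "` and `" ab"` tokens described in the test plan.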

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T12819

Differential Revision: https://secure.phabricator.com/D18579

+1 -1
src/applications/search/ferret/PhabricatorFerretEngine.php
@@ -76,5 +76,5 @@
   public function tokenizeString($value) {
     $value = trim($value, ' ');
-    $value = preg_split('/ +/', $value);
+    $value = preg_split('/\s+/u', $value);
     return $value;
   }