@recaptime-dev's working patches + fork for Phorge, a community fork of Phabricator. (Upstream dev and stable branches are at upstream/main and upstream/stable respectively.) hq.recaptime.dev/wiki/Phorge
phorge phabricator
1
fork

Configure Feed

Select the types of activity you want to include in your feed.

Separate internal and external Query Cursors more cleanly, to fix pagination against broken objects

Summary:
Ref T13259.

(NOTE) This is "infrastructure/guts" only and breaks some stuff in Query subclasses. I'll fix that stuff in a followup, it's just going to be a larger diff that's mostly mechanical.

When a user clicks "Next Page" on a tasks view and gets `?after=100`, we want to show them the next 100 //visible// tasks. It's possible that tasks 1-100 are visible, but tasks 101-788 are not, and the next visible task is 789.

We load task ID `100` first, to make sure they can actually see it: you aren't allowed to page based on objects you can't see. If we let you, you could use "order=title&after=100", plus creative retitling of tasks, to discover the title of task 100: create tasks named "A", "B", etc., and see which one is returned first "after" task 100. If it's "D", you know task 100 must start with "C".

Assume the user can see task 100. We run a query like `id > 100` to get the next 100 tasks.

However, it's possible that few (or none) of these tasks can be seen. If the next visible task is 789, none of the tasks in the next page of results will survive policy filtering.

So, for queries after the initial query, we need to be able to page based on tasks that the user can not see: we want to be able to issue `id > 100`, then `id > 200`, and so on, until we overheat or find a page of results (if 789-889 are visible, we'll make it there before overheating).

Currently, we do this in a not-so-great way:

- We pass the external cursor (`100`) directly to the subquery.
- We query for that object using `getPagingViewer()`, which is a piece of magic that returns the real viewer on the first page and the omnipotent viewer on the 2nd..nth page. This is very sketchy.
- The subquery builds paging values based on that object (`array('id' => 100)`).
- We turn the last result from the subquery back into an external cursor (`200`) and save it for the next time.

Note that the last step happens BEFORE policy (and other) filtering.

The problems with this are:

- The phantom-schrodinger's-omnipotent-viewer thing isn't explicity bad, but it's sketchy and generally not good. It feels like it could easily lead to a mistake or bug eventually.
- We issue an extra query each time we page results, to convert the external cursor back into a map (`100`, `200`, `300`, etc).
- In T13259, there's a new problem: this only works if the object is filtered out for policy reasons and the omnipotent viewer can still see it. It doesn't work if the object is filtered for some other reason.

To expand on the third point: in T13259, we hit a case where 100+ consecutive objects are broken (they point to a nonexistent `repositoryID`). These objects get filtered unconditionally. It doesn't matter if the viewer is omnipotent or not.

In that case: we set the next external cursor from the raw results (e.g., `200`). Then we try to load it (using the omnipotent viewer) to turn it into a map of values for paging. This fails because the object isn't loadable, even as the omnipotent viewer.

---

To fix this stuff, the new approach steps back a little bit. Primarily, I'm separating "external cursors" from "internal cursors".

An "External Cursor" is a string that we can pass in `?after=X` URIs. It generally identifies an object which the user can see.

An "Internal Cursor" is a raw result from `loadPage()`, i.e. before policy filtering. Usually, (but not always) this is a `LiskDAO` object that doesn't have anything attached yet and hasn't been policy filtered.

We now do this, broadly:

- Convert the external cursor to an internal cursor.
- Execute the query using internal cursors.
- If necessary, convert the last visible result back into an external cursor at the very end.

This fixes all the problems:

- Sketchy Omnipotent Viewer: We no longer ever use an omnipotent viewer. (We pick cursors out of the result set earlier, instead.)
- Too Many Queries: We only issue one query at the beginning, when going from "external" to "internal". This query is generally unavoidable since we need to make sure the viewer can see the object and that it's a real / legitimate object. We no longer have to query an extra time for each page.
- Total Failure on Invalid Objects: we now page directly with objects out of `loadPage()`, before any filtering, so we can page over invisible or invalid objects without issues.

This change switches us over to internal/external cursors, and makes simple cases (ID-based ordering) work correctly. It doesn't work for complex cases yet since subclasses don't know how to get paging values out of an internal cursor yet. I'll update those in a followup.

Test Plan: For now, poked around a bit. Some stuff is broken, but normal ID-based lists load correctly and page properly. See next diff for a more detailed test plan.

Reviewers: amckinley

Reviewed By: amckinley

Maniphest Tasks: T13259

Differential Revision: https://secure.phabricator.com/D20291

+221 -102
+2
src/__phutil_library_map__.php
··· 4195 4195 'PhabricatorPygmentSetupCheck' => 'applications/config/check/PhabricatorPygmentSetupCheck.php', 4196 4196 'PhabricatorQuery' => 'infrastructure/query/PhabricatorQuery.php', 4197 4197 'PhabricatorQueryConstraint' => 'infrastructure/query/constraint/PhabricatorQueryConstraint.php', 4198 + 'PhabricatorQueryCursor' => 'infrastructure/query/policy/PhabricatorQueryCursor.php', 4198 4199 'PhabricatorQueryIterator' => 'infrastructure/storage/lisk/PhabricatorQueryIterator.php', 4199 4200 'PhabricatorQueryOrderItem' => 'infrastructure/query/order/PhabricatorQueryOrderItem.php', 4200 4201 'PhabricatorQueryOrderTestCase' => 'infrastructure/query/order/__tests__/PhabricatorQueryOrderTestCase.php', ··· 10294 10295 'PhabricatorPygmentSetupCheck' => 'PhabricatorSetupCheck', 10295 10296 'PhabricatorQuery' => 'Phobject', 10296 10297 'PhabricatorQueryConstraint' => 'Phobject', 10298 + 'PhabricatorQueryCursor' => 'Phobject', 10297 10299 'PhabricatorQueryIterator' => 'PhutilBufferedIterator', 10298 10300 'PhabricatorQueryOrderItem' => 'Phobject', 10299 10301 'PhabricatorQueryOrderTestCase' => 'PhabricatorTestCase',
+195 -100
src/infrastructure/query/policy/PhabricatorCursorPagedPolicyAwareQuery.php
··· 4 4 * A query class which uses cursor-based paging. This paging is much more 5 5 * performant than offset-based paging in the presence of policy filtering. 6 6 * 7 + * @task cursors Query Cursors 7 8 * @task clauses Building Query Clauses 8 9 * @task appsearch Integration with ApplicationSearch 9 10 * @task customfield Integration with CustomField ··· 15 16 abstract class PhabricatorCursorPagedPolicyAwareQuery 16 17 extends PhabricatorPolicyAwareQuery { 17 18 18 - private $afterID; 19 - private $beforeID; 19 + private $externalCursorString; 20 + private $internalCursorObject; 21 + private $isQueryOrderReversed = false; 22 + 20 23 private $applicationSearchConstraints = array(); 21 24 private $internalPaging; 22 25 private $orderVector; ··· 33 36 private $ferretQuery; 34 37 private $ferretMetadata = array(); 35 38 36 - protected function getPageCursors(array $page) { 39 + /* -( Cursors )------------------------------------------------------------ */ 40 + 41 + protected function newExternalCursorStringForResult($object) { 42 + if (!($object instanceof LiskDAO)) { 43 + throw new Exception( 44 + pht( 45 + 'Expected to be passed a result object of class "LiskDAO" in '. 46 + '"newExternalCursorStringForResult()", actually passed "%s". '. 47 + 'Return storage objects from "loadPage()" or override '. 48 + '"newExternalCursorStringForResult()".', 49 + phutil_describe_type($object))); 50 + } 51 + 52 + return (string)$object->getID(); 53 + } 54 + 55 + protected function newInternalCursorFromExternalCursor($cursor) { 56 + return $this->newInternalCursorObjectFromID($cursor); 57 + } 58 + 59 + protected function newPagingMapFromCursorObject( 60 + PhabricatorQueryCursor $cursor, 61 + array $keys) { 62 + 63 + $object = $cursor->getObject(); 64 + 37 65 return array( 38 - $this->getResultCursor(head($page)), 39 - $this->getResultCursor(last($page)), 66 + 'id' => (int)$object->getID(), 40 67 ); 41 68 } 42 69 43 - protected function getResultCursor($object) { 44 - if (!is_object($object)) { 70 + final protected function newInternalCursorObjectFromID($id) { 71 + $viewer = $this->getViewer(); 72 + 73 + $query = newv(get_class($this), array()); 74 + 75 + $query 76 + ->setParentQuery($this) 77 + ->setViewer($viewer) 78 + ->withIDs(array((int)$id)); 79 + 80 + // We're copying our order vector to the subquery so that the subquery 81 + // knows it should generate any supplemental information required by the 82 + // ordering. 83 + 84 + // For example, Phriction documents may be ordered by title, but the title 85 + // isn't a column in the "document" table: the query must JOIN the 86 + // "content" table to perform the ordering. Passing the ordering to the 87 + // subquery tells it that we need it to do that JOIN and attach relevant 88 + // paging information to the internal cursor object. 89 + 90 + // We only expect to load a single result, so the actual result order does 91 + // not matter. We only want the internal cursor for that result to look 92 + // like a cursor this parent query would generate. 93 + $query->setOrderVector($this->getOrderVector()); 94 + 95 + // We're executing the subquery normally to make sure the viewer can 96 + // actually see the object, and that it's a completely valid object which 97 + // passes all filtering and policy checks. You aren't allowed to use an 98 + // object you can't see as a cursor, since this can leak information. 99 + $result = $query->executeOne(); 100 + if (!$result) { 101 + // TODO: Raise a more tailored exception here and make the UI a little 102 + // prettier? 45 103 throw new Exception( 46 104 pht( 47 - 'Expected object, got "%s".', 48 - gettype($object))); 105 + 'Cursor "%s" does not identify a valid object in query "%s".', 106 + $id, 107 + get_class($this))); 49 108 } 50 109 51 - return $object->getID(); 110 + // Now that we made sure the viewer can actually see the object the 111 + // external cursor identifies, return the internal cursor the query 112 + // generated as a side effect while loading the object. 113 + return $query->getInternalCursorObject(); 52 114 } 53 115 54 - protected function nextPage(array $page) { 55 - // See getPagingViewer() for a description of this flag. 56 - $this->internalPaging = true; 116 + final private function getExternalCursorStringForResult($object) { 117 + $cursor = $this->newExternalCursorStringForResult($object); 57 118 58 - if ($this->beforeID !== null) { 59 - $page = array_reverse($page, $preserve_keys = true); 60 - list($before, $after) = $this->getPageCursors($page); 61 - $this->beforeID = $before; 62 - } else { 63 - list($before, $after) = $this->getPageCursors($page); 64 - $this->afterID = $after; 119 + if (!is_string($cursor)) { 120 + throw new Exception( 121 + pht( 122 + 'Expected "newExternalCursorStringForResult()" in class "%s" to '. 123 + 'return a string, but got "%s".', 124 + get_class($this), 125 + phutil_describe_type($cursor))); 65 126 } 127 + 128 + return $cursor; 129 + } 130 + 131 + final private function getExternalCursorString() { 132 + return $this->externalCursorString; 66 133 } 67 134 68 - final public function setAfterID($object_id) { 69 - $this->afterID = $object_id; 135 + final private function setExternalCursorString($external_cursor) { 136 + $this->externalCursorString = $external_cursor; 70 137 return $this; 71 138 } 72 139 73 - final protected function getAfterID() { 74 - return $this->afterID; 140 + final private function getIsQueryOrderReversed() { 141 + return $this->isQueryOrderReversed; 142 + } 143 + 144 + final private function setIsQueryOrderReversed($is_reversed) { 145 + $this->isQueryOrderReversed = $is_reversed; 146 + return $this; 147 + } 148 + 149 + final private function getInternalCursorObject() { 150 + return $this->internalCursorObject; 75 151 } 76 152 77 - final public function setBeforeID($object_id) { 78 - $this->beforeID = $object_id; 153 + final private function setInternalCursorObject( 154 + PhabricatorQueryCursor $cursor) { 155 + $this->internalCursorObject = $cursor; 79 156 return $this; 80 157 } 81 158 82 - final protected function getBeforeID() { 83 - return $this->beforeID; 159 + final private function getInternalCursorFromExternalCursor( 160 + $cursor_string) { 161 + 162 + $cursor_object = $this->newInternalCursorFromExternalCursor($cursor_string); 163 + 164 + if (!($cursor_object instanceof PhabricatorQueryCursor)) { 165 + throw new Exception( 166 + pht( 167 + 'Expected "newInternalCursorFromExternalCursor()" to return an '. 168 + 'object of class "PhabricatorQueryCursor", but got "%s" (in '. 169 + 'class "%s").', 170 + phutil_describe_type($cursor_object), 171 + get_class($this))); 172 + } 173 + 174 + return $cursor_object; 175 + } 176 + 177 + final private function getPagingMapFromCursorObject( 178 + PhabricatorQueryCursor $cursor, 179 + array $keys) { 180 + 181 + $map = $this->newPagingMapFromCursorObject($cursor, $keys); 182 + 183 + if (!is_array($map)) { 184 + throw new Exception( 185 + pht( 186 + 'Expected "newPagingMapFromCursorObject()" to return a map of '. 187 + 'paging values, but got "%s" (in class "%s").', 188 + phutil_describe_type($map), 189 + get_class($this))); 190 + } 191 + 192 + foreach ($keys as $key) { 193 + if (!array_key_exists($key, $map)) { 194 + throw new Exception( 195 + pht( 196 + 'Map returned by "newPagingMapFromCursorObject()" in class "%s" '. 197 + 'omits required key "%s".', 198 + get_class($this), 199 + $key)); 200 + } 201 + } 202 + 203 + return $map; 204 + } 205 + 206 + final protected function nextPage(array $page) { 207 + if (!$page) { 208 + return; 209 + } 210 + 211 + $cursor = id(new PhabricatorQueryCursor()) 212 + ->setObject(last($page)); 213 + 214 + $this->setInternalCursorObject($cursor); 84 215 } 85 216 86 217 final public function getFerretMetadata() { ··· 218 349 } 219 350 220 351 final protected function didLoadResults(array $results) { 221 - if ($this->beforeID) { 352 + if ($this->getIsQueryOrderReversed()) { 222 353 $results = array_reverse($results, $preserve_keys = true); 223 354 } 224 355 ··· 230 361 231 362 $this->setLimit($limit + 1); 232 363 233 - if ($pager->getAfterID()) { 234 - $this->setAfterID($pager->getAfterID()); 364 + if (strlen($pager->getAfterID())) { 365 + $this->setExternalCursorString($pager->getAfterID()); 235 366 } else if ($pager->getBeforeID()) { 236 - $this->setBeforeID($pager->getBeforeID()); 367 + $this->setExternalCursorString($pager->getBeforeID()); 368 + $this->setIsQueryOrderReversed(true); 237 369 } 238 370 239 371 $results = $this->execute(); ··· 241 373 242 374 $sliced_results = $pager->sliceResults($results); 243 375 if ($sliced_results) { 244 - list($before, $after) = $this->getPageCursors($sliced_results); 376 + 377 + // If we have results, generate external-facing cursors from the visible 378 + // results. This stops us from leaking any internal details about objects 379 + // which we loaded but which were not visible to the viewer. 245 380 246 381 if ($pager->getBeforeID() || ($count > $limit)) { 247 - $pager->setNextPageID($after); 382 + $last_object = last($sliced_results); 383 + $cursor = $this->getExternalCursorStringForResult($last_object); 384 + $pager->setNextPageID($cursor); 248 385 } 249 386 250 387 if ($pager->getAfterID() || 251 388 ($pager->getBeforeID() && ($count > $limit))) { 252 - $pager->setPrevPageID($before); 389 + $head_object = head($sliced_results); 390 + $cursor = $this->getExternalCursorStringForResult($head_object); 391 + $pager->setPrevPageID($cursor); 253 392 } 254 393 } 255 394 ··· 423 562 $orderable = $this->getOrderableColumns(); 424 563 $vector = $this->getOrderVector(); 425 564 426 - if ($this->beforeID !== null) { 427 - $cursor = $this->beforeID; 428 - $reversed = true; 429 - } else if ($this->afterID !== null) { 430 - $cursor = $this->afterID; 431 - $reversed = false; 432 - } else { 433 - // No paging is being applied to this query so we do not need to 434 - // construct a paging clause. 565 + // If we don't have a cursor object yet, it means we're trying to load 566 + // the first result page. We may need to build a cursor object from the 567 + // external string, or we may not need a paging clause yet. 568 + $cursor_object = $this->getInternalCursorObject(); 569 + if (!$cursor_object) { 570 + $external_cursor = $this->getExternalCursorString(); 571 + if ($external_cursor !== null) { 572 + $cursor_object = $this->getInternalCursorFromExternalCursor( 573 + $external_cursor); 574 + } 575 + } 576 + 577 + // If we still don't have a cursor object, this is the first result page 578 + // and we aren't paging it. We don't need to build a paging clause. 579 + if (!$cursor_object) { 435 580 return qsprintf($conn, ''); 436 581 } 437 582 583 + $reversed = $this->getIsQueryOrderReversed(); 584 + 438 585 $keys = array(); 439 586 foreach ($vector as $order) { 440 587 $keys[] = $order->getOrderKey(); 441 588 } 442 589 443 - $value_map = $this->getPagingValueMap($cursor, $keys); 590 + $value_map = $this->getPagingMapFromCursorObject( 591 + $cursor_object, 592 + $keys); 444 593 445 594 $columns = array(); 446 595 foreach ($vector as $order) { 447 596 $key = $order->getOrderKey(); 448 597 449 - if (!array_key_exists($key, $value_map)) { 450 - throw new Exception( 451 - pht( 452 - 'Query "%s" failed to return a value from getPagingValueMap() '. 453 - 'for column "%s".', 454 - get_class($this), 455 - $key)); 456 - } 457 - 458 598 $column = $orderable[$key]; 459 599 $column['value'] = $value_map[$key]; 460 600 ··· 473 613 array( 474 614 'reversed' => $reversed, 475 615 )); 476 - } 477 - 478 - 479 - /** 480 - * @task paging 481 - */ 482 - protected function getPagingValueMap($cursor, array $keys) { 483 - return array( 484 - 'id' => $cursor, 485 - ); 486 - } 487 - 488 - 489 - /** 490 - * @task paging 491 - */ 492 - protected function loadCursorObject($cursor) { 493 - $query = newv(get_class($this), array()) 494 - ->setViewer($this->getPagingViewer()) 495 - ->withIDs(array((int)$cursor)); 496 - 497 - $this->willExecuteCursorQuery($query); 498 - 499 - $object = $query->executeOne(); 500 - if (!$object) { 501 - throw new Exception( 502 - pht( 503 - 'Cursor "%s" does not identify a valid object in query "%s".', 504 - $cursor, 505 - get_class($this))); 506 - } 507 - 508 - return $object; 509 - } 510 - 511 - 512 - /** 513 - * @task paging 514 - */ 515 - protected function willExecuteCursorQuery( 516 - PhabricatorCursorPagedPolicyAwareQuery $query) { 517 - return; 518 616 } 519 617 520 618 ··· 1072 1170 array $parts, 1073 1171 $for_union = false) { 1074 1172 1075 - $is_query_reversed = false; 1076 - if ($this->getBeforeID()) { 1077 - $is_query_reversed = !$is_query_reversed; 1078 - } 1173 + $is_query_reversed = $this->getIsQueryOrderReversed(); 1079 1174 1080 1175 $sql = array(); 1081 1176 foreach ($parts as $key => $part) {
+7 -2
src/infrastructure/query/policy/PhabricatorPolicyAwareQuery.php
··· 282 282 283 283 $this->didFilterResults($removed); 284 284 285 + // NOTE: We call "nextPage()" before checking if we've found enough 286 + // results because we want to build the internal cursor object even 287 + // if we don't need to execute another query: the internal cursor may 288 + // be used by a parent query that is using this query to translate an 289 + // external cursor into an internal cursor. 290 + $this->nextPage($page); 291 + 285 292 foreach ($visible as $key => $result) { 286 293 ++$count; 287 294 ··· 311 318 // to load another page because we can deduce it would be empty. 312 319 break; 313 320 } 314 - 315 - $this->nextPage($page); 316 321 317 322 if ($overheat_limit && ($total_seen >= $overheat_limit)) { 318 323 $this->isOverheated = true;
+17
src/infrastructure/query/policy/PhabricatorQueryCursor.php
··· 1 + <?php 2 + 3 + final class PhabricatorQueryCursor 4 + extends Phobject { 5 + 6 + private $object; 7 + 8 + public function setObject($object) { 9 + $this->object = $object; 10 + return $this; 11 + } 12 + 13 + public function getObject() { 14 + return $this->object; 15 + } 16 + 17 + }