Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

liveupdate: luo_file: remember retrieve() status

LUO keeps track of successful retrieve attempts on a LUO file. It does so
to avoid multiple retrievals of the same file. Multiple retrievals cause
problems because once the file is retrieved, the serialized data
structures are likely freed and the file is likely in a very different
state from what the code expects.

The retrieved boolean in struct luo_file keeps track of this, and is passed
to the finish callback so it knows what work was already done and what it
has left to do.

All this works well when retrieve succeeds. When it fails,
luo_retrieve_file() returns the error immediately, without ever storing
anywhere that a retrieve was attempted or what its error code was. This
results in an errored LIVEUPDATE_SESSION_RETRIEVE_FD ioctl to userspace,
but nothing prevents it from trying this again.

The retry is problematic for much the same reasons as listed above. The
file is likely in a very different state from what the retrieve logic
normally expects, and the failed attempt might even have freed some
serialization data structures. Attempting to access them or free them
again is going to break things.

For example, if memfd managed to restore 8 of its 10 folios, but fails on
the 9th, a subsequent retrieve attempt will try to call
kho_restore_folio() on the first folio again, and that will fail with a
warning since it is an invalid operation.
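
The folio scenario above can be modelled in userspace. This is only a
sketch: fake_folio, fake_restore_folio() and fake_retrieve() are hypothetical
stand-ins, not kernel APIs, and the -EINVAL return merely represents the
"invalid operation" that the real restore call warns about.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_FOLIOS 10

/* Hypothetical stand-in for a preserved folio; not a kernel type. */
struct fake_folio {
	bool restored;	/* set once the one-shot restore has consumed it */
	bool broken;	/* simulates a folio whose restore fails */
};

/* One-shot restore: calling it twice on the same folio is invalid. */
static int fake_restore_folio(struct fake_folio *f)
{
	if (f->restored)
		return -EINVAL;	/* already consumed by an earlier attempt */
	if (f->broken)
		return -EIO;
	f->restored = true;
	return 0;
}

/* Naive retrieve: always starts again from the first folio. */
static int fake_retrieve(struct fake_folio *folios, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		int err = fake_restore_folio(&folios[i]);

		if (err)
			return err;
	}
	return 0;
}
```

With the 9th folio marked broken, the first call fails with -EIO after
restoring 8 folios, and a second call fails on the very first folio with
-EINVAL because that folio was already consumed.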

Apart from the retry, finish() also breaks. Since on failure the
retrieved bool in luo_file is never touched, the finish() call on session
close will tell the file handler that retrieve was never attempted, and it
will try to access or free the data structures that might not exist, much
in the same way as the retry attempt.

There is no sane way of attempting the retrieve again. Remember the error
retrieve returned and directly return it on a retry. Also pass this
status code to finish() so it can make the right decision on the work it
needs to do.

This is done by changing the bool to an integer. A value of 0 means
retrieve was never attempted, a positive value means it succeeded, and a
negative value means it failed and the error code is the value.
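
A minimal userspace sketch of this three-state convention, assuming a
simplified model: fake_file, fake_handler_retrieve() and fake_retrieve_file()
are illustrative names only, and reference counting and locking are left out.

```c
#include <errno.h>

/* Hypothetical model of the per-file state; not the kernel struct. */
struct fake_file {
	int retrieve_status;	/* 0: not attempted, >0: success, <0: -errno */
	int handler_calls;	/* how often the handler was really invoked */
	int handler_result;	/* what the simulated handler returns */
};

/* Simulated handler retrieve() callback. */
static int fake_handler_retrieve(struct fake_file *f)
{
	f->handler_calls++;
	return f->handler_result;
}

static int fake_retrieve_file(struct fake_file *f)
{
	int err;

	/* A previous attempt failed: replay its error, do not retry. */
	if (f->retrieve_status < 0)
		return f->retrieve_status;

	/* A previous attempt succeeded: nothing left to restore. */
	if (f->retrieve_status > 0)
		return 0;

	err = fake_handler_retrieve(f);
	if (err) {
		f->retrieve_status = err;	/* remember the error code */
		return err;
	}

	f->retrieve_status = 1;
	return 0;
}
```

In this model the handler runs at most once per file. A failing file keeps
returning the same error code on every later call, and a successful one
returns 0 without invoking the handler again.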

Link: https://lkml.kernel.org/r/20260216132221.987987-1-pratyush@kernel.org
Fixes: 7c722a7f44e0 ("liveupdate: luo_file: implement file systems callbacks")
Signed-off-by: Pratyush Yadav (Google) <pratyush@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Pratyush Yadav (Google) and committed by Andrew Morton
f85b1c6a dd085fe9

+37 -20
+6 -3
include/linux/liveupdate.h
···
 /**
  * struct liveupdate_file_op_args - Arguments for file operation callbacks.
  * @handler: The file handler being called.
- * @retrieved: The retrieve status for the 'can_finish / finish'
- *             operation.
+ * @retrieve_status: The retrieve status for the 'can_finish / finish'
+ *             operation. A value of 0 means the retrieve has not been
+ *             attempted, a positive value means the retrieve was
+ *             successful, and a negative value means the retrieve failed,
+ *             and the value is the error code of the call.
  * @file: The file object. For retrieve: [OUT] The callback sets
  *             this to the new file. For other ops: [IN] The caller sets
  *             this to the file being operated on.
···
  */
 struct liveupdate_file_op_args {
 	struct liveupdate_file_handler *handler;
-	bool retrieved;
+	int retrieve_status;
 	struct file *file;
 	u64 serialized_data;
 	void *private_data;
+25 -16
kernel/liveupdate/luo_file.c
···
  *             state that is not preserved. Set by the handler's .preserve()
  *             callback, and must be freed in the handler's .unpreserve()
  *             callback.
- * @retrieved: A flag indicating whether a user/kernel in the new kernel has
+ * @retrieve_status: Status code indicating whether a user/kernel in the new kernel has
  *             successfully called retrieve() on this file. This prevents
- *             multiple retrieval attempts.
+ *             multiple retrieval attempts. A value of 0 means a retrieve()
+ *             has not been attempted, a positive value means the retrieve()
+ *             was successful, and a negative value means the retrieve()
+ *             failed, and the value is the error code of the call.
  * @mutex: A mutex that protects the fields of this specific instance
  *             (e.g., @retrieved, @file), ensuring that operations like
  *             retrieving or finishing a file are atomic.
···
 	struct file *file;
 	u64 serialized_data;
 	void *private_data;
-	bool retrieved;
+	int retrieve_status;
 	struct mutex mutex;
 	struct list_head list;
 	u64 token;
···
 	luo_file->file = file;
 	luo_file->fh = fh;
 	luo_file->token = token;
-	luo_file->retrieved = false;
 	mutex_init(&luo_file->mutex);

 	args.handler = fh;
···
 		return -ENOENT;

 	guard(mutex)(&luo_file->mutex);
-	if (luo_file->retrieved) {
+	if (luo_file->retrieve_status < 0) {
+		/* Retrieve was attempted and it failed. Return the error code. */
+		return luo_file->retrieve_status;
+	}
+
+	if (luo_file->retrieve_status > 0) {
 		/*
 		 * Someone is asking for this file again, so get a reference
 		 * for them.
···
 	args.handler = luo_file->fh;
 	args.serialized_data = luo_file->serialized_data;
 	err = luo_file->fh->ops->retrieve(&args);
-	if (!err) {
-		luo_file->file = args.file;
-
-		/* Get reference so we can keep this file in LUO until finish */
-		get_file(luo_file->file);
-		*filep = luo_file->file;
-		luo_file->retrieved = true;
+	if (err) {
+		/* Keep the error code for later use. */
+		luo_file->retrieve_status = err;
+		return err;
 	}

-	return err;
+	luo_file->file = args.file;
+	/* Get reference so we can keep this file in LUO until finish */
+	get_file(luo_file->file);
+	*filep = luo_file->file;
+	luo_file->retrieve_status = 1;
+
+	return 0;
 }

 static int luo_file_can_finish_one(struct luo_file_set *file_set,
···
 		args.handler = luo_file->fh;
 		args.file = luo_file->file;
 		args.serialized_data = luo_file->serialized_data;
-		args.retrieved = luo_file->retrieved;
+		args.retrieve_status = luo_file->retrieve_status;
 		can_finish = luo_file->fh->ops->can_finish(&args);
 	}

···
 	args.handler = luo_file->fh;
 	args.file = luo_file->file;
 	args.serialized_data = luo_file->serialized_data;
-	args.retrieved = luo_file->retrieved;
+	args.retrieve_status = luo_file->retrieve_status;

 	luo_file->fh->ops->finish(&args);
 	luo_flb_file_finish(luo_file->fh);
···
 		luo_file->file = NULL;
 		luo_file->serialized_data = file_ser[i].data;
 		luo_file->token = file_ser[i].token;
-		luo_file->retrieved = false;
 		mutex_init(&luo_file->mutex);
 		list_add_tail(&luo_file->list, &file_set->files_list);
 	}
+6 -1
mm/memfd_luo.c
···
 	struct memfd_luo_folio_ser *folios_ser;
 	struct memfd_luo_ser *ser;

-	if (args->retrieved)
+	/*
+	 * If retrieve was successful, nothing to do. If it failed, retrieve()
+	 * already cleaned up everything it could. So nothing to do there
+	 * either. Only need to clean up when retrieve was not called.
+	 */
+	if (args->retrieve_status)
 		return;

 	ser = phys_to_virt(args->serialized_data);
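
The finish-side decision in the memfd hunk can be sketched in isolation as
follows. This is a simplified userspace model: fake_finish_args and
fake_finish() are hypothetical names, and the ser_freed flag merely stands in
for whatever cleanup of the serialized data the real handler performs.

```c
#include <stdbool.h>

/* Hypothetical model of the arguments passed to a finish callback. */
struct fake_finish_args {
	int retrieve_status;	/* 0: not attempted, >0: success, <0: -errno */
	bool ser_freed;		/* set when finish cleaned up serialized data */
};

static void fake_finish(struct fake_finish_args *args)
{
	/*
	 * Success: retrieve consumed the serialized data.
	 * Failure: retrieve already cleaned up what it could.
	 * Either way there is nothing left for finish to do.
	 */
	if (args->retrieve_status)
		return;

	/* Retrieve was never called, so the data is still intact. */
	args->ser_freed = true;
}
```

Only the "never attempted" case (status 0) reaches the cleanup path. A plain
boolean could not express this, since it cannot distinguish a failed attempt
from no attempt at all.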