Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

xen/x86: restore (fix) xen_set_pte_init() behavior

Commit f7c90c2aa400 ("x86/xen: don't write ptes directly in 32-bit PV
guests") needlessly (and heavily) penalized 64-bit guests here: The
majority of the early page table updates is to writable pages (which get
converted to r/o only after all the writes are done), in particular
those involved in building the direct map (which consists of all 4k
mappings in PV). On my test system this accounts for almost 16 million
hypercalls when each could simply have been a plain memory write.

Switch back to using native_set_pte(), except for updates of early
ioremap tables (where a suitable accessor exists to recognize them).
With 32-bit PV support gone, this doesn't need to be further
conditionalized (albeit backports thereof may need adjustment).

To avoid a fair number (almost 256k on my test system) of trap-and-
emulate cases appearing as a result, switch the hook in
xen_pagetable_init().

Finally commit d6b186c1e2d8 ("x86/xen: avoid m2p lookup when setting
early page table entries") inserted a function ahead of
xen_set_pte_init(), separating it from its comment (which may have been
part of the reason why the performance regression wasn't anticipated /
recognized while coding / reviewing the change mentioned further up).
Move the function up and adjust that comment to describe the new
behavior.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/57ce1289-0297-e96e-79e1-cedafb5d9bf6@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Authored by Jan Beulich and committed by Boris Ostrovsky
cae73951 dc4bd2a2

+17 -7
arch/x86/xen/mmu_pv.c
@@ -1194,6 +1194,13 @@
 
 static void __init xen_pagetable_init(void)
 {
+	/*
+	 * The majority of further PTE writes is to pagetables already
+	 * announced as such to Xen. Hence it is more efficient to use
+	 * hypercalls for these updates.
+	 */
+	pv_ops.mmu.set_pte = __xen_set_pte;
+
 	paging_init();
 	xen_post_allocator_init();
 
@@ -1423,10 +1430,18 @@
  *
  * Many of these PTE updates are done on unpinned and writable pages
  * and doing a hypercall for these is unnecessary and expensive.  At
- * this point it is not possible to tell if a page is pinned or not,
- * so always write the PTE directly and rely on Xen trapping and
+ * this point it is rarely possible to tell if a page is pinned, so
+ * mostly write the PTE directly and rely on Xen trapping and
  * emulating any updates as necessary.
  */
+static void __init xen_set_pte_init(pte_t *ptep, pte_t pte)
+{
+	if (unlikely(is_early_ioremap_ptep(ptep)))
+		__xen_set_pte(ptep, pte);
+	else
+		native_set_pte(ptep, pte);
+}
+
 __visible pte_t xen_make_pte_init(pteval_t pte)
 {
 	unsigned long pfn;
@@ -1447,11 +1462,6 @@
 	return native_make_pte(pte);
 }
 PV_CALLEE_SAVE_REGS_THUNK(xen_make_pte_init);
-
-static void __init xen_set_pte_init(pte_t *ptep, pte_t pte)
-{
-	__xen_set_pte(ptep, pte);
-}
 
 /* Early in boot, while setting up the initial pagetable, assume
    everything is pinned. */
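
For readers who want to see the effect of the new xen_set_pte_init() dispatch outside the kernel, the following is a minimal userspace sketch, not kernel code. Every name prefixed with model_ is hypothetical and exists only for illustration: the "hypercall" is just a counter increment, and the early ioremap check is modeled as an address-range test against one static array, which is an assumption about how such an accessor could behave rather than a copy of is_early_ioremap_ptep().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for pte_t; not the kernel's definition. */
typedef struct { uint64_t val; } model_pte_t;

#define MODEL_TABLE_ENTRIES 512

/* One array plays the early ioremap table, the other a direct-map table. */
static model_pte_t model_ioremap_table[MODEL_TABLE_ENTRIES];
static model_pte_t model_direct_map_table[MODEL_TABLE_ENTRIES];

/* Counters so the cost of each path can be observed. */
static unsigned long model_hypercalls;
static unsigned long model_plain_writes;

/* Assumed behavior: true only for entries inside the ioremap table. */
static bool model_is_early_ioremap_ptep(const model_pte_t *ptep)
{
	uintptr_t p  = (uintptr_t)ptep;
	uintptr_t lo = (uintptr_t)&model_ioremap_table[0];
	uintptr_t hi = (uintptr_t)&model_ioremap_table[MODEL_TABLE_ENTRIES];

	return p >= lo && p < hi;
}

/* Models the expensive path (__xen_set_pte in the patch). */
static void model_hypercall_set_pte(model_pte_t *ptep, model_pte_t pte)
{
	model_hypercalls++;
	*ptep = pte;
}

/* Models the cheap path (native_set_pte in the patch): a plain store. */
static void model_native_set_pte(model_pte_t *ptep, model_pte_t pte)
{
	model_plain_writes++;
	*ptep = pte;
}

/* Same shape as the patched xen_set_pte_init(): hypercall only for
 * early ioremap entries, plain memory write for everything else. */
static void model_set_pte_init(model_pte_t *ptep, model_pte_t pte)
{
	if (model_is_early_ioremap_ptep(ptep))
		model_hypercall_set_pte(ptep, pte);
	else
		model_native_set_pte(ptep, pte);
}
```

Calling model_set_pte_init() on entries of model_direct_map_table only bumps model_plain_writes, while entries of model_ioremap_table go through the counted "hypercall" path. That split is what the commit message describes for direct-map construction.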