Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm: introduce copy-on-fork VMAs and make VM_MAYBE_GUARD one

Gather all the VMA flags whose presence implies that page tables must be
copied on fork into a single bitmap - VM_COPY_ON_FORK - and use this
rather than specifying individual flags in vma_needs_copy().

We also add VM_MAYBE_GUARD to this list, as it being set on a VMA implies
that there may be metadata contained in the page tables (that is - guard
markers) which otherwise would not and could not be propagated upon fork.

This was already being done manually previously in vma_needs_copy(), but
this makes it very explicit, alongside VM_PFNMAP, VM_MIXEDMAP and
VM_UFFD_WP all of which imply the same.
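
The mask-based check described above can be modelled in user space. This is a minimal sketch, not the kernel code: the bit positions, the VM_READ_DEMO flag, struct demo_vma and demo_vma_needs_copy() are hypothetical stand-ins, and the remaining logic of vma_needs_copy() is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit positions, chosen only for this sketch. */
#define VM_PFNMAP      (1UL << 0)
#define VM_MIXEDMAP    (1UL << 1)
#define VM_UFFD_WP     (1UL << 2)
#define VM_MAYBE_GUARD (1UL << 3)
#define VM_READ_DEMO   (1UL << 4) /* stand-in for a flag outside the mask */

/* Same composition as the mask introduced by the patch. */
#define VM_COPY_ON_FORK (VM_PFNMAP | VM_MIXEDMAP | VM_UFFD_WP | VM_MAYBE_GUARD)

/* Simplified stand-in for struct vm_area_struct. */
struct demo_vma {
	unsigned long vm_flags;
	void *anon_vma;
};

/* Models only the first two checks of the patched vma_needs_copy(). */
static bool demo_vma_needs_copy(const struct demo_vma *src_vma)
{
	/* Any copy-on-fork flag forces page table copying. */
	if (src_vma->vm_flags & VM_COPY_ON_FORK)
		return true;

	/* An anon_vma implies page tables that a fault cannot rebuild. */
	if (src_vma->anon_vma)
		return true;

	return false;
}
```

In this model a VMA carrying only VM_READ_DEMO and no anon_vma is not copied, while setting any single bit of VM_COPY_ON_FORK, or a non-NULL anon_vma, makes the function return true.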

Note that VM_STICKY flags ought generally to be marked VM_COPY_ON_FORK too
- because equally a flag being VM_STICKY indicates that the VMA contains
metadata that is not propagated by being faulted in - i.e. that the VMA
metadata does not fully describe the VMA alone, and thus we must propagate
whatever metadata there is on a fork.

However, for maximum flexibility, we do not make this necessarily the case
here.
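
If one did want to check that a set of sticky flags is covered by VM_COPY_ON_FORK, a subset test on the masks would be enough. The sketch below is illustrative only: the bit positions, DEMO_OTHER, the two DEMO_STICKY_* sets and flags_subset() are hypothetical and do not reflect the kernel's definition of VM_STICKY, and the patch deliberately does not enforce this relationship.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit positions, chosen only for this sketch. */
#define VM_PFNMAP      (1UL << 0)
#define VM_MIXEDMAP    (1UL << 1)
#define VM_UFFD_WP     (1UL << 2)
#define VM_MAYBE_GUARD (1UL << 3)
#define DEMO_OTHER     (1UL << 4) /* a flag that is not copy-on-fork */

#define VM_COPY_ON_FORK (VM_PFNMAP | VM_MIXEDMAP | VM_UFFD_WP | VM_MAYBE_GUARD)

/* Hypothetical sticky sets; not the kernel's VM_STICKY. */
#define DEMO_STICKY_COVERED   (VM_MAYBE_GUARD)
#define DEMO_STICKY_UNCOVERED (VM_MAYBE_GUARD | DEMO_OTHER)

/* True if every bit set in subset is also set in superset. */
static bool flags_subset(unsigned long subset, unsigned long superset)
{
	return (subset & ~superset) == 0;
}
```

With these definitions, flags_subset(DEMO_STICKY_COVERED, VM_COPY_ON_FORK) holds, while the DEMO_STICKY_UNCOVERED set fails the test because DEMO_OTHER lies outside the mask.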

Link: https://lkml.kernel.org/r/5d41b24e7bc622cda0af92b6d558d7f4c0d1bc8c.1763460113.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Lorenzo Stoakes and committed by Andrew Morton
ab04b530 64212ba0

+56 -14
+26
include/linux/mm.h
@@ -556,6 +556,32 @@
 #define VM_IGNORE_MERGE (VM_SOFTDIRTY | VM_STICKY)

 /*
+ * Flags which should result in page tables being copied on fork. These are
+ * flags which indicate that the VMA maps page tables which cannot be
+ * reconstituted upon page fault, so necessitate page table copying upon
+ * fork.
+ *
+ * VM_PFNMAP / VM_MIXEDMAP - These contain kernel-mapped data which cannot be
+ *                           reasonably reconstructed on page fault.
+ *
+ *              VM_UFFD_WP - Encodes metadata about an installed uffd
+ *                           write protect handler, which cannot be
+ *                           reconstructed on page fault.
+ *
+ *                           We always copy pgtables when dst_vma has uffd-wp
+ *                           enabled even if it's file-backed
+ *                           (e.g. shmem). Because when uffd-wp is enabled,
+ *                           pgtable contains uffd-wp protection information,
+ *                           that's something we can't retrieve from page cache,
+ *                           and skip copying will lose those info.
+ *
+ *          VM_MAYBE_GUARD - Could contain page guard region markers which
+ *                           by design are a property of the page tables
+ *                           only and thus cannot be reconstructed on page
+ *                           fault.
+ */
+#define VM_COPY_ON_FORK (VM_PFNMAP | VM_MIXEDMAP | VM_UFFD_WP | VM_MAYBE_GUARD)
+
+/*
  * mapping from the currently active vm_flags protection bits (the
  * low four bits) to a page protection mask..
  */
+4 -14
mm/memory.c
@@ -1463,23 +1463,13 @@
 static bool
 vma_needs_copy(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
 {
+	if (src_vma->vm_flags & VM_COPY_ON_FORK)
+		return true;
 	/*
-	 * Always copy pgtables when dst_vma has uffd-wp enabled even if it's
-	 * file-backed (e.g. shmem). Because when uffd-wp is enabled, pgtable
-	 * contains uffd-wp protection information, that's something we can't
-	 * retrieve from page cache, and skip copying will lose those info.
+	 * The presence of an anon_vma indicates an anonymous VMA has page
+	 * tables which naturally cannot be reconstituted on page fault.
 	 */
-	if (userfaultfd_wp(dst_vma))
-		return true;
-
-	if (src_vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
-		return true;
-
 	if (src_vma->anon_vma)
-		return true;
-
-	/* Guard regions have modified page tables that require copying. */
-	if (src_vma->vm_flags & VM_MAYBE_GUARD)
 		return true;

 	/*
+26
tools/testing/vma/vma_internal.h
@@ -145,6 +145,32 @@
  */
 #define VM_IGNORE_MERGE (VM_SOFTDIRTY | VM_STICKY)

+/*
+ * Flags which should result in page tables being copied on fork. These are
+ * flags which indicate that the VMA maps page tables which cannot be
+ * reconstituted upon page fault, so necessitate page table copying upon
+ * fork.
+ *
+ * VM_PFNMAP / VM_MIXEDMAP - These contain kernel-mapped data which cannot be
+ *                           reasonably reconstructed on page fault.
+ *
+ *              VM_UFFD_WP - Encodes metadata about an installed uffd
+ *                           write protect handler, which cannot be
+ *                           reconstructed on page fault.
+ *
+ *                           We always copy pgtables when dst_vma has uffd-wp
+ *                           enabled even if it's file-backed
+ *                           (e.g. shmem). Because when uffd-wp is enabled,
+ *                           pgtable contains uffd-wp protection information,
+ *                           that's something we can't retrieve from page cache,
+ *                           and skip copying will lose those info.
+ *
+ *          VM_MAYBE_GUARD - Could contain page guard region markers which
+ *                           by design are a property of the page tables
+ *                           only and thus cannot be reconstructed on page
+ *                           fault.
+ */
+#define VM_COPY_ON_FORK (VM_PFNMAP | VM_MIXEDMAP | VM_UFFD_WP | VM_MAYBE_GUARD)
+
 #define FIRST_USER_ADDRESS	0UL
 #define USER_PGTABLES_CEILING	0UL
