Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm: add a ptdesc flag to mark kernel page tables

The page tables used to map the kernel and userspace often have very
different handling rules. There are frequently *_kernel() variants of
functions just for kernel page tables. That's not great and has led to
code duplication.

Instead of having completely separate call paths, allow a 'ptdesc' to be
marked as being for kernel mappings. Introduce helpers to set and clear
this status.

Note: this uses the PG_referenced bit. Page flags are a great fit for
this since it is truly a single bit of information. Use PG_referenced
itself because it's a fairly benign flag (as opposed to things like
PG_lock). It's also (according to Willy) unlikely to go away any time
soon.

PG_referenced is not in PAGE_FLAGS_CHECK_AT_FREE. It does not need to be
cleared before freeing the page, and pages coming out of the allocator
should have it cleared. Regardless, introduce an API to clear it anyway.
Having symmetry in the API makes it easier to change the underlying
implementation later, for example if there is ever a need to move to a
PAGE_FLAGS_CHECK_AT_FREE bit.

Link: https://lkml.kernel.org/r/20251022082635.2462433-3-baolu.lu@linux.intel.com
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vasant Hegde <vasant.hegde@amd.com>
Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

authored by Dave Hansen and committed by Andrew Morton
27bfafac 72f98ef9

+41
include/linux/mm.h
···
 #endif /* CONFIG_MMU */

 enum pt_flags {
+	PT_kernel = PG_referenced,
 	PT_reserved = PG_reserved,
 	/* High bits are used for zone/node/section */
 };
···
 static inline bool pagetable_is_reserved(struct ptdesc *pt)
 {
 	return test_bit(PT_reserved, &pt->pt_flags.f);
+}
+
+/**
+ * ptdesc_set_kernel - Mark a ptdesc used to map the kernel
+ * @ptdesc: The ptdesc to be marked
+ *
+ * Kernel page tables often need special handling. Set a flag so that
+ * the handling code knows this ptdesc will not be used for userspace.
+ */
+static inline void ptdesc_set_kernel(struct ptdesc *ptdesc)
+{
+	set_bit(PT_kernel, &ptdesc->pt_flags.f);
+}
+
+/**
+ * ptdesc_clear_kernel - Mark a ptdesc as no longer used to map the kernel
+ * @ptdesc: The ptdesc to be unmarked
+ *
+ * Use when the ptdesc is no longer used to map the kernel and no longer
+ * needs special handling.
+ */
+static inline void ptdesc_clear_kernel(struct ptdesc *ptdesc)
+{
+	/*
+	 * Note: the 'PG_referenced' bit does not strictly need to be
+	 * cleared before freeing the page. But this is nice for
+	 * symmetry.
+	 */
+	clear_bit(PT_kernel, &ptdesc->pt_flags.f);
+}
+
+/**
+ * ptdesc_test_kernel - Check if a ptdesc is used to map the kernel
+ * @ptdesc: The ptdesc being tested
+ *
+ * Call to tell if the ptdesc used to map the kernel.
+ */
+static inline bool ptdesc_test_kernel(const struct ptdesc *ptdesc)
+{
+	return test_bit(PT_kernel, &ptdesc->pt_flags.f);
 }

 /**
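
The set/test/clear pattern in this patch can be tried outside the kernel with a small userspace mock. The sketch below is not kernel code and rests on several assumptions: every mock_* and MOCK_* name is hypothetical, the bit positions are arbitrary stand-ins rather than the real PG_referenced and PG_reserved values, plain non-atomic bit operations replace set_bit()/clear_bit()/test_bit(), and mock_pagetable_free() is an invented caller that only illustrates how a single call path might branch on the flag instead of needing a separate *_kernel() variant.

```c
#include <assert.h>
#include <stdbool.h>

/* Arbitrary stand-in bit positions (assumption, not the kernel's values). */
enum mock_pt_flags {
	MOCK_PT_kernel = 2,
	MOCK_PT_reserved = 10,
};

/* Hypothetical, heavily reduced stand-in for struct ptdesc. */
struct mock_ptdesc {
	unsigned long pt_flags;
};

/* Mark the descriptor as mapping the kernel (non-atomic, unlike set_bit()). */
static inline void mock_ptdesc_set_kernel(struct mock_ptdesc *pt)
{
	pt->pt_flags |= 1UL << MOCK_PT_kernel;
}

/* Drop the kernel marking; kept for symmetry with the setter. */
static inline void mock_ptdesc_clear_kernel(struct mock_ptdesc *pt)
{
	pt->pt_flags &= ~(1UL << MOCK_PT_kernel);
}

/* Report whether the descriptor is marked as mapping the kernel. */
static inline bool mock_ptdesc_test_kernel(const struct mock_ptdesc *pt)
{
	return (pt->pt_flags >> MOCK_PT_kernel) & 1UL;
}

enum mock_free_path {
	MOCK_FREE_USER,
	MOCK_FREE_KERNEL,
};

/*
 * Hypothetical unified free path: one entry point that checks the flag.
 * Returns which branch was taken so a caller can observe the decision.
 */
static inline enum mock_free_path mock_pagetable_free(struct mock_ptdesc *pt)
{
	if (mock_ptdesc_test_kernel(pt)) {
		/* Kernel-only handling would go here, then unmark. */
		mock_ptdesc_clear_kernel(pt);
		return MOCK_FREE_KERNEL;
	}
	return MOCK_FREE_USER;
}
```

A caller would mark the descriptor with mock_ptdesc_set_kernel() when the table is set up for kernel mappings and then hand it to the shared free path. In this mock the kernel branch clears the bit, so a second call on the same descriptor takes the user branch.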