Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm: add remap_pfn_range_prepare(), remap_pfn_range_complete()

We need the ability to split PFN remap between updating the VMA and
performing the actual remap, in order to do away with the legacy f_op->mmap
hook.

To do so, update the PFN remap code to provide shared logic, and also make
remap_pfn_range_notrack() static, as its one user, io_mapping_map_user()
was removed in commit 9a4f90e24661 ("mm: remove mm/io-mapping.c").

Then, introduce remap_pfn_range_prepare(), which accepts a VMA descriptor
and a PFN, and remap_pfn_range_complete(), which accepts the same
parameters as remap_pfn_range().
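
The prepare/complete split can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code. `model_desc` and `model_vma` are hypothetical stand-ins for `struct vm_area_desc` and `struct vm_area_struct`. The flag bit values are invented, and "mapping" a page only increments a counter. The model shows the intended ordering: the prepare step edits only the descriptor, the VMA is then built from that descriptor, and the complete step populates the already-configured VMA.

```c
#include <assert.h>

/* Invented flag bits for this model; the kernel's VM_* values differ. */
#define MODEL_VM_IO         (1UL << 0)
#define MODEL_VM_PFNMAP     (1UL << 1)
#define MODEL_VM_DONTEXPAND (1UL << 2)
#define MODEL_VM_DONTDUMP   (1UL << 3)
#define MODEL_REMAP_FLAGS (MODEL_VM_IO | MODEL_VM_PFNMAP | \
			   MODEL_VM_DONTEXPAND | MODEL_VM_DONTDUMP)
#define MODEL_PAGE_SIZE 4096UL

/* Hypothetical stand-in for struct vm_area_desc: a VMA not yet created. */
struct model_desc {
	unsigned long start, end, vm_flags;
};

/* Hypothetical stand-in for struct vm_area_struct, plus a page counter. */
struct model_vma {
	unsigned long vm_start, vm_end, vm_flags;
	unsigned long pages_mapped;
};

/* Phase 1, like remap_pfn_range_prepare(): only the descriptor changes. */
static void model_prepare(struct model_desc *desc)
{
	desc->vm_flags |= MODEL_REMAP_FLAGS;
}

/* Between the phases, the core mm builds the VMA from the descriptor. */
static struct model_vma model_instantiate(const struct model_desc *desc)
{
	struct model_vma vma = { desc->start, desc->end, desc->vm_flags, 0 };
	return vma;
}

/* Phase 2, like remap_pfn_range_complete(): populate the page range. */
static int model_complete(struct model_vma *vma, unsigned long addr,
			  unsigned long size)
{
	assert(addr % MODEL_PAGE_SIZE == 0);
	if (addr < vma->vm_start || addr + size > vma->vm_end)
		return -1; /* outside the VMA; the real code's checks differ */
	vma->pages_mapped += (size + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
	return 0;
}
```

In the kernel, the core mm performs the step between the two phases, not the caller. `model_instantiate()` only stands in for that step.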

For CoW mappings, remap_pfn_range_prepare() sets the page offset
(desc->pgoff, which becomes vma->vm_pgoff) to the PFN, so it must be
supplied with the correct PFN.
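
The CoW special case reduces to a small pure function. The sketch below mirrors the logic of `get_remap_pgoff()` from the diff, in userspace. `model_get_remap_pgoff()`, `model_prepare_pgoff()` and the flag values are hypothetical. The CoW test itself matches the condition used by the kernel's `is_cow_mapping()`: a mapping that is private but may be written. The prepare path passes the VMA's own start and end as the range, so the range check cannot fail there. That is why the PFN it is handed must already be correct.

```c
#include <assert.h>
#include <errno.h>

/* Invented bit values for this model; not the kernel's VM_* constants. */
#define MODEL_VM_SHARED   (1UL << 0)
#define MODEL_VM_MAYWRITE (1UL << 1)

/* Same condition as is_cow_mapping(): private (not shared) but writable. */
static int model_is_cow(unsigned long vm_flags)
{
	return (vm_flags & (MODEL_VM_SHARED | MODEL_VM_MAYWRITE)) ==
	       MODEL_VM_MAYWRITE;
}

/* Follows get_remap_pgoff(): validate the range, then record the PFN. */
static int model_get_remap_pgoff(unsigned long vm_flags, unsigned long addr,
				 unsigned long end, unsigned long vm_start,
				 unsigned long vm_end, unsigned long pfn,
				 unsigned long *vm_pgoff_p)
{
	assert(vm_pgoff_p);
	if (model_is_cow(vm_flags)) {
		/* A CoW remap must cover the whole VMA... */
		if (addr != vm_start || end != vm_end)
			return -EINVAL;
		/* ...and the base PFN marks the original, un-COW'ed pages. */
		*vm_pgoff_p = pfn;
	}
	return 0;
}

/* Like remap_pfn_range_prepare(): the range is the VMA itself. */
static void model_prepare_pgoff(unsigned long vm_flags, unsigned long start,
				unsigned long end, unsigned long pfn,
				unsigned long *pgoff)
{
	/* Cannot return -EINVAL here: the range equals the VMA by construction. */
	(void)model_get_remap_pgoff(vm_flags, start, end, start, end, pfn, pgoff);
}
```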

While we're here, also clean up the duplicated #ifdef
__HAVE_PFNMAP_TRACKING check and put it into a single #ifdef/#else block.
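
The shape of the consolidated conditional can be sketched in isolation. Here `MODEL_HAVE_TRACKING` is a hypothetical stand-in for `__HAVE_PFNMAP_TRACKING`, and the `model_*` functions are illustrative only; they are not the kernel's code. Only the `do_*` dispatcher differs between the two branches, so the exported entry points are written once, outside the `#ifdef`.

```c
#include <assert.h>

/* Records whether the tracking path ran; for illustration only. */
static int model_tracked;

/* Always present: the untracked remap (remap_pfn_range_notrack's role). */
static int model_remap_notrack(unsigned long pfn, unsigned long npages)
{
	assert(npages > 0);
	(void)pfn; /* a real implementation would install page-table entries */
	return 0;
}

#ifdef MODEL_HAVE_TRACKING
#define MODEL_TRACKING_ENABLED 1
/* Tracking variant: note the range, then do the plain remap. */
static int model_remap_track(unsigned long pfn, unsigned long npages)
{
	model_tracked = 1;
	return model_remap_notrack(pfn, npages);
}

static int model_do_remap(unsigned long pfn, unsigned long npages)
{
	return model_remap_track(pfn, npages);
}
#else
#define MODEL_TRACKING_ENABLED 0
static int model_do_remap(unsigned long pfn, unsigned long npages)
{
	return model_remap_notrack(pfn, npages);
}
#endif

/* Entry points are defined once, outside the conditional block. */
static int model_remap_pfn_range(unsigned long pfn, unsigned long npages)
{
	return model_do_remap(pfn, npages);
}

static int model_remap_complete(unsigned long pfn, unsigned long npages)
{
	return model_do_remap(pfn, npages);
}
```

Building with `-DMODEL_HAVE_TRACKING` selects the tracking variant, and the callers do not change.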

We keep these internal to mm as they should only be used by internal
helpers.

Link: https://lkml.kernel.org/r/75b55de63249b3aa0fd5b3b08ed1d3ff19255d0d.1760959442.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Pedro Falcato <pfalcato@suse.de>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chatre, Reinette <reinette.chatre@intel.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Dave Martin <dave.martin@arm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Robin Murohy <robin.murphy@arm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Lorenzo Stoakes and committed by Andrew Morton
51e38e7d 2bcd9207

3 files changed, 110 insertions(+), 48 deletions(-)

include/linux/mm.h (+18 -4)

```diff
@@ -489,6 +489,21 @@
  */
 #define VM_SPECIAL (VM_IO | VM_DONTEXPAND | VM_PFNMAP | VM_MIXEDMAP)
 
+/*
+ * Physically remapped pages are special. Tell the
+ * rest of the world about it:
+ *   VM_IO tells people not to look at these pages
+ *      (accesses can have side effects).
+ *   VM_PFNMAP tells the core MM that the base pages are just
+ *      raw PFN mappings, and do not have a "struct page" associated
+ *      with them.
+ *   VM_DONTEXPAND
+ *      Disable vma merging and expanding with mremap().
+ *   VM_DONTDUMP
+ *      Omit vma from core dump, even when VM_IO turned off.
+ */
+#define VM_REMAP_FLAGS (VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP)
+
 /* This mask prevents VMA from being scanned with khugepaged */
 #define VM_NO_KHUGEPAGED (VM_SPECIAL | VM_HUGETLB)
 
@@ -3634,10 +3649,9 @@
 
 struct vm_area_struct *find_extend_vma_locked(struct mm_struct *,
 		unsigned long addr);
-int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
-		unsigned long pfn, unsigned long size, pgprot_t);
-int remap_pfn_range_notrack(struct vm_area_struct *vma, unsigned long addr,
-		unsigned long pfn, unsigned long size, pgprot_t prot);
+int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
+		unsigned long pfn, unsigned long size, pgprot_t pgprot);
+
 int vm_insert_page(struct vm_area_struct *, unsigned long addr, struct page *);
 int vm_insert_pages(struct vm_area_struct *vma, unsigned long addr,
 			struct page **pages, unsigned long *num);
```

mm/internal.h (+4)

```diff
@@ -1677,4 +1677,8 @@
 void dup_mm_exe_file(struct mm_struct *mm, struct mm_struct *oldmm);
 int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm);
 
+void remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn);
+int remap_pfn_range_complete(struct vm_area_struct *vma, unsigned long addr,
+		unsigned long pfn, unsigned long size, pgprot_t pgprot);
+
 #endif	/* __MM_INTERNAL_H */
```

mm/memory.c (+88 -44)

```diff
@@ -2900,6 +2900,25 @@
 	return 0;
 }
 
+static int get_remap_pgoff(vm_flags_t vm_flags, unsigned long addr,
+		unsigned long end, unsigned long vm_start, unsigned long vm_end,
+		unsigned long pfn, pgoff_t *vm_pgoff_p)
+{
+	/*
+	 * There's a horrible special case to handle copy-on-write
+	 * behaviour that some programs depend on. We mark the "original"
+	 * un-COW'ed pages by matching them up with "vma->vm_pgoff".
+	 * See vm_normal_page() for details.
+	 */
+	if (is_cow_mapping(vm_flags)) {
+		if (addr != vm_start || end != vm_end)
+			return -EINVAL;
+		*vm_pgoff_p = pfn;
+	}
+
+	return 0;
+}
+
 static int remap_pfn_range_internal(struct vm_area_struct *vma, unsigned long addr,
 		unsigned long pfn, unsigned long size, pgprot_t prot)
 {
@@ -2912,31 +2931,7 @@
 	if (WARN_ON_ONCE(!PAGE_ALIGNED(addr)))
 		return -EINVAL;
 
-	/*
-	 * Physically remapped pages are special. Tell the
-	 * rest of the world about it:
-	 *   VM_IO tells people not to look at these pages
-	 *      (accesses can have side effects).
-	 *   VM_PFNMAP tells the core MM that the base pages are just
-	 *      raw PFN mappings, and do not have a "struct page" associated
-	 *      with them.
-	 *   VM_DONTEXPAND
-	 *      Disable vma merging and expanding with mremap().
-	 *   VM_DONTDUMP
-	 *      Omit vma from core dump, even when VM_IO turned off.
-	 *
-	 * There's a horrible special case to handle copy-on-write
-	 * behaviour that some programs depend on. We mark the "original"
-	 * un-COW'ed pages by matching them up with "vma->vm_pgoff".
-	 * See vm_normal_page() for details.
-	 */
-	if (is_cow_mapping(vma->vm_flags)) {
-		if (addr != vma->vm_start || end != vma->vm_end)
-			return -EINVAL;
-		vma->vm_pgoff = pfn;
-	}
-
-	vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
+	VM_WARN_ON_ONCE((vma->vm_flags & VM_REMAP_FLAGS) != VM_REMAP_FLAGS);
 
 	BUG_ON(addr >= end);
 	pfn -= addr >> PAGE_SHIFT;
@@ -2957,7 +2952,7 @@
  * Variant of remap_pfn_range that does not call track_pfn_remap. The caller
  * must have pre-validated the caching bits of the pgprot_t.
  */
-int remap_pfn_range_notrack(struct vm_area_struct *vma, unsigned long addr,
+static int remap_pfn_range_notrack(struct vm_area_struct *vma, unsigned long addr,
 		unsigned long pfn, unsigned long size, pgprot_t prot)
 {
 	int error = remap_pfn_range_internal(vma, addr, pfn, size, prot);
@@ -3002,23 +2997,9 @@
 	pfnmap_untrack(ctx->pfn, ctx->size);
 	kfree(ctx);
 }
-#endif /* __HAVE_PFNMAP_TRACKING */
 
-/**
- * remap_pfn_range - remap kernel memory to userspace
- * @vma: user vma to map to
- * @addr: target page aligned user address to start at
- * @pfn: page frame number of kernel physical memory address
- * @size: size of mapping area
- * @prot: page protection flags for this mapping
- *
- * Note: this is only safe if the mm semaphore is held when called.
- *
- * Return: %0 on success, negative error code otherwise.
- */
-#ifdef __HAVE_PFNMAP_TRACKING
-int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
-		unsigned long pfn, unsigned long size, pgprot_t prot)
+static int remap_pfn_range_track(struct vm_area_struct *vma, unsigned long addr,
+		unsigned long pfn, unsigned long size, pgprot_t prot)
 {
 	struct pfnmap_track_ctx *ctx = NULL;
 	int err;
@@ -3054,14 +3035,77 @@
 	return err;
 }
 
+static int do_remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
+		unsigned long pfn, unsigned long size, pgprot_t prot)
+{
+	return remap_pfn_range_track(vma, addr, pfn, size, prot);
+}
 #else
-int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
-		unsigned long pfn, unsigned long size, pgprot_t prot)
+static int do_remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
+		unsigned long pfn, unsigned long size, pgprot_t prot)
 {
 	return remap_pfn_range_notrack(vma, addr, pfn, size, prot);
 }
 #endif
+
+void remap_pfn_range_prepare(struct vm_area_desc *desc, unsigned long pfn)
+{
+	/*
+	 * We set addr=VMA start, end=VMA end here, so this won't fail, but we
+	 * check it again on complete and will fail there if specified addr is
+	 * invalid.
+	 */
+	get_remap_pgoff(desc->vm_flags, desc->start, desc->end,
+			desc->start, desc->end, pfn, &desc->pgoff);
+	desc->vm_flags |= VM_REMAP_FLAGS;
+}
+
+static int remap_pfn_range_prepare_vma(struct vm_area_struct *vma, unsigned long addr,
+		unsigned long pfn, unsigned long size)
+{
+	unsigned long end = addr + PAGE_ALIGN(size);
+	int err;
+
+	err = get_remap_pgoff(vma->vm_flags, addr, end,
+			      vma->vm_start, vma->vm_end,
+			      pfn, &vma->vm_pgoff);
+	if (err)
+		return err;
+
+	vm_flags_set(vma, VM_REMAP_FLAGS);
+	return 0;
+}
+
+/**
+ * remap_pfn_range - remap kernel memory to userspace
+ * @vma: user vma to map to
+ * @addr: target page aligned user address to start at
+ * @pfn: page frame number of kernel physical memory address
+ * @size: size of mapping area
+ * @prot: page protection flags for this mapping
+ *
+ * Note: this is only safe if the mm semaphore is held when called.
+ *
+ * Return: %0 on success, negative error code otherwise.
+ */
+int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
+		    unsigned long pfn, unsigned long size, pgprot_t prot)
+{
+	int err;
+
+	err = remap_pfn_range_prepare_vma(vma, addr, pfn, size);
+	if (err)
+		return err;
+
+	return do_remap_pfn_range(vma, addr, pfn, size, prot);
+}
 EXPORT_SYMBOL(remap_pfn_range);
+
+int remap_pfn_range_complete(struct vm_area_struct *vma, unsigned long addr,
+		unsigned long pfn, unsigned long size, pgprot_t prot)
+{
+	return do_remap_pfn_range(vma, addr, pfn, size, prot);
+}
 
 /**
  * vm_iomap_memory - remap memory to userspace
```
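
VM_REMAP_FLAGS lets remap_pfn_range_internal() check the remap flags instead of setting them. The new `VM_WARN_ON_ONCE((vma->vm_flags & VM_REMAP_FLAGS) != VM_REMAP_FLAGS)` fires if any one of the four bits is missing, not only when all are absent. Below is a minimal userspace sketch of that all-bits test and of the idempotent flag update. The bit values and the `model_*` names are hypothetical.

```c
#include <assert.h>

/* Invented bit values; the kernel's VM_* constants differ. */
#define MODEL_VM_IO         (1UL << 0)
#define MODEL_VM_PFNMAP     (1UL << 1)
#define MODEL_VM_DONTEXPAND (1UL << 2)
#define MODEL_VM_DONTDUMP   (1UL << 3)
#define MODEL_VM_SHARED     (1UL << 4)

/* Same composition as VM_REMAP_FLAGS. */
#define MODEL_REMAP_FLAGS (MODEL_VM_IO | MODEL_VM_PFNMAP | \
			   MODEL_VM_DONTEXPAND | MODEL_VM_DONTDUMP)

/* True only if every remap flag is set: the negation of the warning's test. */
static int model_has_remap_flags(unsigned long vm_flags)
{
	return (vm_flags & MODEL_REMAP_FLAGS) == MODEL_REMAP_FLAGS;
}

/* Like vm_flags_set(vma, VM_REMAP_FLAGS): OR-ing in the mask is idempotent. */
static unsigned long model_set_remap_flags(unsigned long vm_flags)
{
	assert(MODEL_REMAP_FLAGS != 0); /* sanity check on the mask itself */
	return vm_flags | MODEL_REMAP_FLAGS;
}
```

Testing `flags & MASK` for non-zero would accept a VMA with only some of the bits set. Comparing against the whole mask is what makes the check strict.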