Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm: add vm_ops->mapped hook

Previously, when a driver needed to do something like establish a
reference count, it could do so in the mmap hook in the knowledge that the
mapping would succeed.

With the introduction of f_op->mmap_prepare this is no longer the case, as
it is invoked prior to actually establishing the mapping.

mmap_prepare is not appropriate for this kind of thing, as it is called
before any merge might take place, and an error might occur after it
returns, meaning resources could be leaked.

To take this into account, introduce a new vm_ops->mapped callback which
is invoked when the VMA is first mapped (though notably - not when it is
merged - which is correct and mirrors existing mmap/open/close behaviour).

We do better than vm_ops->open() here, as this callback can return an
error, at which point the VMA will be unmapped.

Note that vm_ops->mapped() is invoked after any mmap action is complete
(such as I/O remapping).

We intentionally do not expose the VMA at this point, exposing only the
fields that could be used, and an output parameter in case the operation
needs to update the vma->vm_private_data field.
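
As a rough illustration of the hook's shape, the userspace sketch below mocks the kernel types and shows a driver-style mapped callback that takes a reference and publishes per-mapping state through the output parameter. Only the callback signature follows the patch; `struct demo_dev`, `struct demo_mapping`, `demo_mapped()` and the mocked `struct file` are hypothetical names introduced for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Userspace stand-ins for kernel types; assumptions for illustration only. */
typedef unsigned long pgoff_t;

struct demo_dev {
	int refcount;   /* references held by live mappings */
	int fail_alloc; /* set to simulate an allocation failure */
};

struct file {
	void *private_data; /* points at a struct demo_dev in this sketch */
};

/* Hypothetical per-mapping state published through vm_private_data. */
struct demo_mapping {
	unsigned long start, end;
	pgoff_t pgoff;
};

/* Same signature as the vm_ops->mapped hook added by the patch. */
static int demo_mapped(unsigned long start, unsigned long end, pgoff_t pgoff,
		       const struct file *file, void **vm_private_data)
{
	struct demo_dev *dev = file->private_data;
	struct demo_mapping *map;

	assert(end > start); /* sanity check on the range we were handed */

	map = dev->fail_alloc ? NULL : malloc(sizeof(*map));
	if (!map)
		return -ENOMEM; /* per the commit, the VMA is then unmapped */

	map->start = start;
	map->end = end;
	map->pgoff = pgoff;

	dev->refcount++;        /* the mapping is already established here */
	*vm_private_data = map; /* the caller copies this into the VMA */
	return 0;
}
```

On failure the sketch leaves both the reference count and the output parameter untouched, so there is nothing for the caller to undo.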

In order to deal with stacked filesystems which invoke an inner
filesystem's mmap() hook, add __compat_vma_mapped() and invoke it on
vfs_mmap() (via compat_vma_mmap()) to ensure that the mapped callback is
handled when an mmap() caller invokes a nested filesystem's mmap_prepare()
callback.

Update the mmap_prepare documentation to describe the mapped hook and make
it clear what its intended use is.

The vm_ops->mapped() call is handled by the mmap complete logic to ensure
the same code paths are handled by both the compatibility and VMA layers.
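
The ordering described above can be sketched in userspace as follows. This is a simplified mock, not the kernel code: the hook signature is reduced to the output parameter, the `unmapped` flag merely stands in for the real teardown, and every `mock_*` name and helper is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct vm_area_struct {
	int (*mapped)(void **vm_private_data); /* simplified hook */
	void *vm_private_data;
	int unmapped;                          /* mock: set when torn down */
};

struct mmap_action {
	int (*success_hook)(struct vm_area_struct *vma);
};

/* Mirrors call_vma_mapped(): write back private data only on success. */
static int mock_call_mapped(struct vm_area_struct *vma)
{
	void *priv = vma->vm_private_data;
	int err;

	if (!vma->mapped)
		return 0;
	err = vma->mapped(&priv);
	if (err)
		return err;
	vma->vm_private_data = priv;
	return 0;
}

/* Mirrors the ordering in mmap_action_finish(); the unmap is mocked. */
static int mock_action_finish(struct vm_area_struct *vma,
			      struct mmap_action *action, int err)
{
	assert(vma && action);
	if (!err)
		err = mock_call_mapped(vma);
	if (!err && action->success_hook)
		err = action->success_hook(vma);
	if (err)
		vma->unmapped = 1; /* assumption: stands in for the real unmap */
	return err;
}

/* Hypothetical hooks used to exercise the two paths. */
static int token;
static int hook_calls;
static int ok_mapped(void **priv) { *priv = &token; return 0; }
static int failing_mapped(void **priv) { *priv = &token; return -ENOMEM; }
static int count_hook(struct vm_area_struct *vma)
{
	(void)vma;
	hook_calls++;
	return 0;
}
```

Because the mapped hook runs before the success hook, a failing mapped hook short-circuits the success hook and leaves vm_private_data unchanged.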

Additionally, update VMA userland test headers to reflect the change.

Link: https://lkml.kernel.org/r/4c5e98297eb0aae9565c564e1c296a112702f144.1774045440.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bodo Stroesser <bostroesser@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: David Hildenbrand <david@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Long Li <longli@microsoft.com>
Cc: Marc Dionne <marc.dionne@auristor.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vignesh Raghavendra <vigneshr@ti.com>
Cc: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Lorenzo Stoakes (Oracle) and committed by Andrew Morton
c50ca15d 382c0f28

+120 -29
+15
Documentation/filesystems/mmap_prepare.rst
···
 mapping has been established, as the mapping may either be merged, or fail to be
 mapped after the callback is complete.

+Mapped callback
+---------------
+
+If resources need to be allocated per-mapping, or state such as a reference
+count needs to be manipulated, this should be done using the ``vm_ops->mapped``
+hook, which itself should be set by the >mmap_prepare hook.
+
+This callback is only invoked if a new mapping has been established and was not
+merged with any other, and is invoked at a point where no error may occur before
+the mapping is established.
+
+You may return an error to the callback itself, which will cause the mapping to
+become unmapped and an error returned to the mmap() caller. This is useful if
+resources need to be allocated, and that allocation might fail.
+
 How To Use
 ==========

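
The split the documentation describes, where mmap_prepare only installs the hook and the hook performs the side effects, might look like the userspace sketch below. The structures are trimmed mocks, and `demo_refs`, `demo_mmap_prepare()` and `mock_mmap()` are hypothetical; in particular `mock_mmap()` is only a toy stand-in for the core mmap path.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long pgoff_t;

struct file { int unused; }; /* mock: contents are irrelevant here */

struct vm_operations_struct {
	int (*mapped)(unsigned long start, unsigned long end, pgoff_t pgoff,
		      const struct file *file, void **vm_private_data);
};

/* Mock of the descriptor handed to mmap_prepare; only one field kept. */
struct vm_area_desc {
	const struct vm_operations_struct *vm_ops;
};

static int demo_refs; /* hypothetical driver reference count */

static int demo_mapped(unsigned long start, unsigned long end, pgoff_t pgoff,
		       const struct file *file, void **vm_private_data)
{
	(void)start; (void)end; (void)pgoff; (void)file; (void)vm_private_data;
	demo_refs++; /* side effects belong here, not in mmap_prepare */
	return 0;
}

static const struct vm_operations_struct demo_vm_ops = {
	.mapped = demo_mapped,
};

/* Prepare only describes the mapping, so there is nothing to undo later. */
static int demo_mmap_prepare(struct vm_area_desc *desc)
{
	desc->vm_ops = &demo_vm_ops;
	return 0;
}

/* Toy core path: the hook runs only if a new VMA was actually established. */
static int mock_mmap(struct file *file, int merged_or_failed)
{
	struct vm_area_desc desc = { NULL };
	void *priv = NULL;
	int err = demo_mmap_prepare(&desc);

	if (err || merged_or_failed)
		return err;
	assert(desc.vm_ops && desc.vm_ops->mapped);
	return desc.vm_ops->mapped(0x1000, 0x2000, 0, file, &priv);
}
```

When the mapping is merged or never established, the reference count stays at zero, which is the leak the hook is meant to avoid.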
+8 -1
include/linux/fs.h
···
 }

 int compat_vma_mmap(struct file *file, struct vm_area_struct *vma);
+int __vma_check_mmap_hook(struct vm_area_struct *vma);

 static inline int vfs_mmap(struct file *file, struct vm_area_struct *vma)
 {
+	int err;
+
 	if (file->f_op->mmap_prepare)
 		return compat_vma_mmap(file, vma);

-	return file->f_op->mmap(file, vma);
+	err = file->f_op->mmap(file, vma);
+	if (err)
+		return err;
+
+	return __vma_check_mmap_hook(vma);
 }

 static inline int vfs_mmap_prepare(struct file *file, struct vm_area_desc *desc)
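
To make the new vfs_mmap() control flow easier to follow, here is a userspace mock of the dispatch. The structures are cut down to the fields the sketch needs, the hook signatures are simplified, WARN_ON_ONCE() is omitted, and all `mock_*` and `legacy_*` names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Minimal userspace mocks; only the fields this sketch needs. */
struct vm_operations_struct {
	int (*mapped)(void); /* signature simplified for the sketch */
};

struct vm_area_struct {
	const struct vm_operations_struct *vm_ops;
};

struct file;

struct file_operations {
	int (*mmap)(struct file *file, struct vm_area_struct *vma);
	int (*mmap_prepare)(void); /* only its presence matters here */
};

struct file {
	const struct file_operations *f_op;
};

/* Stand-in for compat_vma_mmap(); the real one runs the prepare path. */
static int mock_compat_vma_mmap(struct file *file, struct vm_area_struct *vma)
{
	(void)file; (void)vma;
	return 0;
}

/* Mirrors __vma_check_mmap_hook(), minus the WARN_ON_ONCE(). */
static int mock_check_mmap_hook(struct vm_area_struct *vma)
{
	if (vma->vm_ops && vma->vm_ops->mapped)
		return -EINVAL;
	return 0;
}

/* Mirrors the control flow of the patched vfs_mmap(). */
static int mock_vfs_mmap(struct file *file, struct vm_area_struct *vma)
{
	int err;

	assert(file && file->f_op);
	if (file->f_op->mmap_prepare)
		return mock_compat_vma_mmap(file, vma);

	err = file->f_op->mmap(file, vma);
	if (err)
		return err;

	return mock_check_mmap_hook(vma);
}

/* A legacy .mmap handler that (incorrectly) installs a mapped hook. */
static int noop_mapped(void) { return 0; }
static const struct vm_operations_struct bad_ops = { .mapped = noop_mapped };

static int legacy_mmap_sets_mapped(struct file *file, struct vm_area_struct *vma)
{
	(void)file;
	vma->vm_ops = &bad_ops;
	return 0;
}

/* A legacy .mmap handler that leaves vm_ops alone. */
static int legacy_mmap_plain(struct file *file, struct vm_area_struct *vma)
{
	(void)file; (void)vma;
	return 0;
}
```

A legacy .mmap handler that installs vm_ops->mapped is rejected with -EINVAL, matching the "ONLY valid if set from f_op->mmap_prepare" rule in the hook's kernel-doc comment.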
+17
include/linux/mm.h
···
 	 * Context: User context. May sleep. Caller holds mmap_lock.
 	 */
 	void (*close)(struct vm_area_struct *vma);
+	/**
+	 * @mapped: Called when the VMA is first mapped in the MM. Not called if
+	 * the new VMA is merged with an adjacent VMA.
+	 *
+	 * The @vm_private_data field is an output field allowing the user to
+	 * modify vma->vm_private_data as necessary.
+	 *
+	 * ONLY valid if set from f_op->mmap_prepare. Will result in an error if
+	 * set from f_op->mmap.
+	 *
+	 * Returns %0 on success, or an error otherwise. On error, the VMA will
+	 * be unmapped.
+	 *
+	 * Context: User context. May sleep. Caller holds mmap_lock.
+	 */
+	int (*mapped)(unsigned long start, unsigned long end, pgoff_t pgoff,
+		      const struct file *file, void **vm_private_data);
 	/* Called any time before splitting to check if it's allowed */
 	int (*may_split)(struct vm_area_struct *vma, unsigned long addr);
 	int (*mremap)(struct vm_area_struct *vma);
+63 -27
mm/util.c
···
 EXPORT_SYMBOL(flush_dcache_folio);
 #endif

-/**
- * compat_vma_mmap() - Apply the file's .mmap_prepare() hook to an
- * existing VMA and execute any requested actions.
- * @file: The file which possesss an f_op->mmap_prepare() hook.
- * @vma: The VMA to apply the .mmap_prepare() hook to.
- *
- * Ordinarily, .mmap_prepare() is invoked directly upon mmap(). However, certain
- * stacked filesystems invoke a nested mmap hook of an underlying file.
- *
- * Until all filesystems are converted to use .mmap_prepare(), we must be
- * conservative and continue to invoke these stacked filesystems using the
- * deprecated .mmap() hook.
- *
- * However we have a problem if the underlying file system possesses an
- * .mmap_prepare() hook, as we are in a different context when we invoke the
- * .mmap() hook, already having a VMA to deal with.
- *
- * compat_vma_mmap() is a compatibility function that takes VMA state,
- * establishes a struct vm_area_desc descriptor, passes to the underlying
- * .mmap_prepare() hook and applies any changes performed by it.
- *
- * Once the conversion of filesystems is complete this function will no longer
- * be required and will be removed.
- *
- * Returns: 0 on success or error.
- */
-int compat_vma_mmap(struct file *file, struct vm_area_struct *vma)
+static int __compat_vma_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	struct vm_area_desc desc = {
 		.mm = vma->vm_mm,
···
 	set_vma_from_desc(vma, &desc);
 	return mmap_action_complete(vma, action);
 }
+
+/**
+ * compat_vma_mmap() - Apply the file's .mmap_prepare() hook to an
+ * existing VMA and execute any requested actions.
+ * @file: The file which possesss an f_op->mmap_prepare() hook.
+ * @vma: The VMA to apply the .mmap_prepare() hook to.
+ *
+ * Ordinarily, .mmap_prepare() is invoked directly upon mmap(). However, certain
+ * stacked filesystems invoke a nested mmap hook of an underlying file.
+ *
+ * Until all filesystems are converted to use .mmap_prepare(), we must be
+ * conservative and continue to invoke these stacked filesystems using the
+ * deprecated .mmap() hook.
+ *
+ * However we have a problem if the underlying file system possesses an
+ * .mmap_prepare() hook, as we are in a different context when we invoke the
+ * .mmap() hook, already having a VMA to deal with.
+ *
+ * compat_vma_mmap() is a compatibility function that takes VMA state,
+ * establishes a struct vm_area_desc descriptor, passes to the underlying
+ * .mmap_prepare() hook and applies any changes performed by it.
+ *
+ * Once the conversion of filesystems is complete this function will no longer
+ * be required and will be removed.
+ *
+ * Returns: 0 on success or error.
+ */
+int compat_vma_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	return __compat_vma_mmap(file, vma);
+}
 EXPORT_SYMBOL(compat_vma_mmap);
+
+int __vma_check_mmap_hook(struct vm_area_struct *vma)
+{
+	/* vm_ops->mapped is not valid if mmap() is specified. */
+	if (vma->vm_ops && WARN_ON_ONCE(vma->vm_ops->mapped))
+		return -EINVAL;
+
+	return 0;
+}
+EXPORT_SYMBOL(__vma_check_mmap_hook);

 static void set_ps_flags(struct page_snapshot *ps, const struct folio *folio,
 			 const struct page *page)
···
 	}
 }

+static int call_vma_mapped(struct vm_area_struct *vma)
+{
+	const struct vm_operations_struct *vm_ops = vma->vm_ops;
+	void *vm_private_data = vma->vm_private_data;
+	int err;
+
+	if (!vm_ops || !vm_ops->mapped)
+		return 0;
+
+	err = vm_ops->mapped(vma->vm_start, vma->vm_end, vma->vm_pgoff,
+			     vma->vm_file, &vm_private_data);
+	if (err)
+		return err;
+
+	if (vm_private_data != vma->vm_private_data)
+		vma->vm_private_data = vm_private_data;
+	return 0;
+}
+
 static int mmap_action_finish(struct vm_area_struct *vma,
 			      struct mmap_action *action, int err)
 {
 	size_t len;

+	if (!err)
+		err = call_vma_mapped(vma);
 	if (!err && action->success_hook)
 		err = action->success_hook(vma);

-1
mm/vma.c
···

 	if (have_mmap_prepare && allocated_new) {
 		error = mmap_action_complete(vma, &desc.action);
-
 		if (error)
 			return error;
 	}
+17
tools/testing/vma/include/dup.h
···
 	 * Context: User context. May sleep. Caller holds mmap_lock.
 	 */
 	void (*close)(struct vm_area_struct *vma);
+	/**
+	 * @mapped: Called when the VMA is first mapped in the MM. Not called if
+	 * the new VMA is merged with an adjacent VMA.
+	 *
+	 * The @vm_private_data field is an output field allowing the user to
+	 * modify vma->vm_private_data as necessary.
+	 *
+	 * ONLY valid if set from f_op->mmap_prepare. Will result in an error if
+	 * set from f_op->mmap.
+	 *
+	 * Returns %0 on success, or an error otherwise. On error, the VMA will
+	 * be unmapped.
+	 *
+	 * Context: User context. May sleep. Caller holds mmap_lock.
+	 */
+	int (*mapped)(unsigned long start, unsigned long end, pgoff_t pgoff,
+		      const struct file *file, void **vm_private_data);
 	/* Called any time before splitting to check if it's allowed */
 	int (*may_split)(struct vm_area_struct *vma, unsigned long addr);
 	int (*mremap)(struct vm_area_struct *vma);