Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm: add documentation for the mmap_prepare file operation callback

This documentation makes it easier for a driver/file system implementer to
correctly use this callback.

It covers the fundamentals, whilst intentionally leaving the less lovely
possible actions one might take undocumented (for instance - the
success_hook, error_hook fields in mmap_action).

The document also covers the new VMA flags implementation which is the
only one which will work correctly with mmap_prepare.

Link: https://lkml.kernel.org/r/3aebf918c213fa2aecf00a31a444119b5bdd7801.1774045440.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bodo Stroesser <bostroesser@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: David Hildenbrand <david@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Long Li <longli@microsoft.com>
Cc: Marc Dionne <marc.dionne@auristor.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vignesh Raghavendra <vigneshr@ti.com>
Cc: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Lorenzo Stoakes (Oracle) and committed by Andrew Morton
fdd24784 3e4bb270

+143
+1
Documentation/filesystems/index.rst
@@ -29,6 +29,7 @@
    fiemap
    files
    locks
+   mmap_prepare
    multigrain-ts
    mount_api
    quota
+142
Documentation/filesystems/mmap_prepare.rst
@@ -0,0 +1,142 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========================
+mmap_prepare callback HOWTO
+===========================
+
+Introduction
+============
+
+The ``struct file->f_op->mmap()`` callback has been deprecated as it is both a
+stability and security risk, and doesn't always permit the merging of adjacent
+mappings resulting in unnecessary memory fragmentation.
+
+It has been replaced with the ``file->f_op->mmap_prepare()`` callback which
+solves these problems.
+
+This hook is called right at the beginning of setting up the mapping, and
+importantly it is invoked *before* any merging of adjacent mappings has taken
+place.
+
+If an error arises upon mapping, it might arise after this callback has been
+invoked, therefore it should be treated as effectively stateless.
+
+That is - no resources should be allocated nor state updated to reflect that a
+mapping has been established, as the mapping may either be merged, or fail to be
+mapped after the callback is complete.
+
+How To Use
+==========
+
+In your driver's struct file_operations struct, specify an ``mmap_prepare``
+callback rather than an ``mmap`` one, e.g. for ext4:
+
+.. code-block:: C
+
+    const struct file_operations ext4_file_operations = {
+        ...
+        .mmap_prepare = ext4_file_mmap_prepare,
+    };
+
+This has a signature of ``int (*mmap_prepare)(struct vm_area_desc *)``.
+
+Examining the struct vm_area_desc type:
+
+.. code-block:: C
+
+    struct vm_area_desc {
+        /* Immutable state. */
+        const struct mm_struct *const mm;
+        struct file *const file; /* May vary from vm_file in stacked callers. */
+        unsigned long start;
+        unsigned long end;
+
+        /* Mutable fields. Populated with initial state. */
+        pgoff_t pgoff;
+        struct file *vm_file;
+        vma_flags_t vma_flags;
+        pgprot_t page_prot;
+
+        /* Write-only fields. */
+        const struct vm_operations_struct *vm_ops;
+        void *private_data;
+
+        /* Take further action? */
+        struct mmap_action action;
+    };
+
+This is straightforward - you have all the fields you need to set up the
+mapping, and you can update the mutable and writable fields, for instance:
+
+.. code-block:: C
+
+    static int ext4_file_mmap_prepare(struct vm_area_desc *desc)
+    {
+        int ret;
+        struct file *file = desc->file;
+        struct inode *inode = file->f_mapping->host;
+
+        ...
+
+        file_accessed(file);
+        if (IS_DAX(file_inode(file))) {
+            desc->vm_ops = &ext4_dax_vm_ops;
+            vma_desc_set_flags(desc, VMA_HUGEPAGE_BIT);
+        } else {
+            desc->vm_ops = &ext4_file_vm_ops;
+        }
+        return 0;
+    }
+
+Importantly, you no longer have to dance around with reference counts or locks
+when updating these fields - **you can simply go ahead and change them**.
+
+Everything is taken care of by the mapping code.
+
+VMA Flags
+---------
+
+Along with ``mmap_prepare``, VMA flags have undergone an overhaul. Where before
+you would invoke one of vm_flags_init(), vm_flags_reset(), vm_flags_set(),
+vm_flags_clear(), and vm_flags_mod() to modify flags (and to have the
+locking done correctly for you), this is no longer necessary.
+
+Also, the legacy approach of specifying VMA flags via ``VM_READ``, ``VM_WRITE``,
+etc. - i.e. using a ``VM_xxx`` macro - has changed too.
+
+When implementing mmap_prepare(), reference flags by their bit number, defined
+as a ``VMA_xxx_BIT`` macro, e.g. ``VMA_READ_BIT``, ``VMA_WRITE_BIT`` etc.,
+and use one of (where ``desc`` is a pointer to struct vm_area_desc):
+
+* ``vma_desc_test_any(desc, ...)`` - Specify a comma-separated list of flags
+  you wish to test for (whether *any* are set), e.g. - ``vma_desc_test_any(
+  desc, VMA_WRITE_BIT, VMA_MAYWRITE_BIT)`` - returns ``true`` if either are set,
+  otherwise ``false``.
+* ``vma_desc_set_flags(desc, ...)`` - Update the VMA descriptor flags to set
+  additional flags specified by a comma-separated list,
+  e.g. - ``vma_desc_set_flags(desc, VMA_PFNMAP_BIT, VMA_IO_BIT)``.
+* ``vma_desc_clear_flags(desc, ...)`` - Update the VMA descriptor flags to clear
+  flags specified by a comma-separated list, e.g. - ``vma_desc_clear_flags(
+  desc, VMA_WRITE_BIT, VMA_MAYWRITE_BIT)``.
+
+Actions
+=======
+
+You can now very easily have actions be performed upon a mapping once set up by
+utilising simple helper functions invoked upon the struct vm_area_desc
+pointer. These are:
+
+* mmap_action_remap() - Remaps a range consisting only of PFNs for a specific
+  range starting at a virtual address and PFN, of a set size.
+
+* mmap_action_remap_full() - Same as mmap_action_remap(), only remaps the
+  entire mapping from ``start_pfn`` onward.
+
+* mmap_action_ioremap() - Same as mmap_action_remap(), only performs an I/O
+  remap.
+
+* mmap_action_ioremap_full() - Same as mmap_action_ioremap(), only remaps
+  the entire mapping from ``start_pfn`` onward.
+
+**NOTE:** The ``action`` field should never normally be manipulated directly,
+rather you ought to use one of these helpers.
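
The bit-number flag helpers described in the "VMA Flags" section of the added document can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel implementation: every name prefixed with ``model_`` or ``MODEL_`` is hypothetical, the flag word is modelled as a plain ``unsigned long``, and the bit values are arbitrary. It shows how helpers taking a comma-separated list of bit numbers might fold that list into a mask and then test, set or clear it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit numbers; the kernel's VMA_xxx_BIT values differ. */
enum model_vma_bit {
    MODEL_VMA_READ_BIT,
    MODEL_VMA_WRITE_BIT,
    MODEL_VMA_MAYWRITE_BIT,
    MODEL_VMA_IO_BIT,
    MODEL_VMA_PFNMAP_BIT,
    MODEL_VMA_NR_BITS,
};

/* Hypothetical stand-in for the descriptor: only the flag word is modelled. */
struct model_desc {
    unsigned long vma_flags;
};

/* Fold an array of bit numbers into a single mask. */
static unsigned long model_mask(const int *bits, size_t n)
{
    unsigned long mask = 0;

    for (size_t i = 0; i < n; i++) {
        assert(bits[i] >= 0 && bits[i] < MODEL_VMA_NR_BITS);
        mask |= 1UL << bits[i];
    }
    return mask;
}

/* Turn a comma-separated list of bit numbers into a mask. */
#define MODEL_MASK(...)                                         \
    model_mask((const int[]){ __VA_ARGS__ },                    \
               sizeof((const int[]){ __VA_ARGS__ }) / sizeof(int))

/* True if any of the listed bits is set. */
#define model_desc_test_any(desc, ...) \
    (((desc)->vma_flags & MODEL_MASK(__VA_ARGS__)) != 0)

/* Set every listed bit. */
#define model_desc_set_flags(desc, ...) \
    ((void)((desc)->vma_flags |= MODEL_MASK(__VA_ARGS__)))

/* Clear every listed bit. */
#define model_desc_clear_flags(desc, ...) \
    ((void)((desc)->vma_flags &= ~MODEL_MASK(__VA_ARGS__)))

/*
 * Hypothetical callback for a read-only device: refuse writable mappings,
 * forbid later upgrades to writable, and mark the mapping as raw PFN I/O.
 */
static int model_mmap_prepare(struct model_desc *desc)
{
    if (model_desc_test_any(desc, MODEL_VMA_WRITE_BIT))
        return -1; /* a real driver would return a negative errno value */

    model_desc_clear_flags(desc, MODEL_VMA_MAYWRITE_BIT);
    model_desc_set_flags(desc, MODEL_VMA_PFNMAP_BIT, MODEL_VMA_IO_BIT);
    return 0;
}
```

In this model the callback only edits the descriptor's flag word and allocates nothing, which mirrors the document's advice to treat the callback as effectively stateless.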