Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm: establish mm/vma_exec.c for shared exec/mm VMA functionality

Patch series "move all VMA allocation, freeing and duplication logic to
mm", v3.

Currently VMA allocation, freeing and duplication exist in kernel/fork.c,
which is a violation of separation of concerns, and leaves these functions
exposed to the rest of the kernel when they are in fact internal
implementation details.

Resolve this by moving this logic to mm, and making it internal to vma.c,
vma.h.

This also allows us, in future, to provide userland testing around this
functionality.

We additionally abstract dup_mmap() to mm, being careful to ensure
kernel/fork.c accesses this via the mm internal header so it is not exposed
elsewhere in the kernel.

As part of this change, also abstract the initial stack allocation performed
in __bprm_mm_init() out of fs code into mm via create_init_stack_vma(), as
this code uses vm_area_alloc() and vm_area_free().

In order to do so sensibly, we introduce a new mm/vma_exec.c file, which
contains the code that is shared by mm and exec. This file is added to
both memory mapping and exec sections in MAINTAINERS so both sets of
maintainers can maintain oversight.

As part of this change, we also move relocate_vma_down() to mm/vma_exec.c
so all shared mm/exec functionality is kept in one place.

We add code shared between nommu and mmu-enabled configurations in order
to share VMA allocation, freeing and duplication code correctly while also
keeping these functions available in userland VMA testing.

This is achieved by adding a mm/vma_init.c file which is also compiled by
the userland tests.


This patch (of 4):

There is functionality that overlaps the exec and memory mapping
subsystems. While it properly belongs in mm, it is important that exec
maintainers maintain oversight of this functionality correctly.

We can achieve both goals by adding a new mm/vma_exec.c file which
contains these 'glue' functions, and having fs/exec.c import them.

As a part of this change, to ensure that proper oversight is achieved, add
the file to both the MEMORY MAPPING and EXEC & BINFMT API, ELF sections.

scripts/get_maintainer.pl can correctly handle files in multiple entries
and this neatly handles the cross-over.

[akpm@linux-foundation.org: fix comment typo]
Link: https://lkml.kernel.org/r/80f0d0c6-0b68-47f9-ab78-0ab7f74677fc@lucifer.local
Link: https://lkml.kernel.org/r/cover.1745853549.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/91f2cee8f17d65214a9d83abb7011aa15f1ea690.1745853549.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Lorenzo Stoakes and committed by Andrew Morton
6c36ac1e 0f428676

+145 -86
+2
MAINTAINERS
 F:	include/uapi/linux/auxvec.h
 F:	include/uapi/linux/binfmts.h
 F:	include/uapi/linux/elf.h
+F:	mm/vma_exec.c
 F:	tools/testing/selftests/exec/
 N:	asm/elf.h
 N:	binfmt
···
 F:	mm/mseal.c
 F:	mm/vma.c
 F:	mm/vma.h
+F:	mm/vma_exec.c
 F:	mm/vma_internal.h
 F:	tools/testing/selftests/mm/merge.c
 F:	tools/testing/vma/
+3
fs/exec.c
 #include <trace/events/sched.h>

+/* For vma exec functions. */
+#include "../mm/internal.h"
+
 static int bprm_creds_from_file(struct linux_binprm *bprm);

 int suid_dumpable = 0;
-1
include/linux/mm.h
 extern int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin);
 extern int insert_vm_struct(struct mm_struct *, struct vm_area_struct *);
 extern void exit_mmap(struct mm_struct *);
-int relocate_vma_down(struct vm_area_struct *vma, unsigned long shift);
 bool mmap_read_lock_maybe_expand(struct mm_struct *mm, struct vm_area_struct *vma,
 				 unsigned long addr, bool write);
+1 -1
mm/Makefile
 mmu-$(CONFIG_MMU)	:= highmem.o memory.o mincore.o \
 			   mlock.o mmap.o mmu_gather.o mprotect.o mremap.o \
 			   msync.o page_vma_mapped.o pagewalk.o \
-			   pgtable-generic.o rmap.o vmalloc.o vma.o
+			   pgtable-generic.o rmap.o vmalloc.o vma.o vma_exec.o


 ifdef CONFIG_CROSS_MEMORY_ATTACH
-83
mm/mmap.c
 }
 subsys_initcall(init_reserve_notifier);

-/*
- * Relocate a VMA downwards by shift bytes. There cannot be any VMAs between
- * this VMA and its relocated range, which will now reside at [vma->vm_start -
- * shift, vma->vm_end - shift).
- *
- * This function is almost certainly NOT what you want for anything other than
- * early executable temporary stack relocation.
- */
-int relocate_vma_down(struct vm_area_struct *vma, unsigned long shift)
-{
-	/*
-	 * The process proceeds as follows:
-	 *
-	 * 1) Use shift to calculate the new vma endpoints.
-	 * 2) Extend vma to cover both the old and new ranges.  This ensures the
-	 *    arguments passed to subsequent functions are consistent.
-	 * 3) Move vma's page tables to the new range.
-	 * 4) Free up any cleared pgd range.
-	 * 5) Shrink the vma to cover only the new range.
-	 */
-
-	struct mm_struct *mm = vma->vm_mm;
-	unsigned long old_start = vma->vm_start;
-	unsigned long old_end = vma->vm_end;
-	unsigned long length = old_end - old_start;
-	unsigned long new_start = old_start - shift;
-	unsigned long new_end = old_end - shift;
-	VMA_ITERATOR(vmi, mm, new_start);
-	VMG_STATE(vmg, mm, &vmi, new_start, old_end, 0, vma->vm_pgoff);
-	struct vm_area_struct *next;
-	struct mmu_gather tlb;
-	PAGETABLE_MOVE(pmc, vma, vma, old_start, new_start, length);
-
-	BUG_ON(new_start > new_end);
-
-	/*
-	 * ensure there are no vmas between where we want to go
-	 * and where we are
-	 */
-	if (vma != vma_next(&vmi))
-		return -EFAULT;
-
-	vma_iter_prev_range(&vmi);
-	/*
-	 * cover the whole range: [new_start, old_end)
-	 */
-	vmg.middle = vma;
-	if (vma_expand(&vmg))
-		return -ENOMEM;
-
-	/*
-	 * move the page tables downwards, on failure we rely on
-	 * process cleanup to remove whatever mess we made.
-	 */
-	pmc.for_stack = true;
-	if (length != move_page_tables(&pmc))
-		return -ENOMEM;
-
-	tlb_gather_mmu(&tlb, mm);
-	next = vma_next(&vmi);
-	if (new_end > old_start) {
-		/*
-		 * when the old and new regions overlap clear from new_end.
-		 */
-		free_pgd_range(&tlb, new_end, old_end, new_end,
-			next ? next->vm_start : USER_PGTABLES_CEILING);
-	} else {
-		/*
-		 * otherwise, clean from old_start; this is done to not touch
-		 * the address space in [new_end, old_start) some architectures
-		 * have constraints on va-space that make this illegal (IA64) -
-		 * for the others its just a little faster.
-		 */
-		free_pgd_range(&tlb, old_start, old_end, new_end,
-			next ? next->vm_start : USER_PGTABLES_CEILING);
-	}
-	tlb_finish_mmu(&tlb);
-
-	vma_prev(&vmi);
-	/* Shrink the vma to just the new range */
-	return vma_shrink(&vmi, vma, new_start, new_end, vma->vm_pgoff);
-}
-
 #ifdef CONFIG_MMU
 /*
  * Obtain a read lock on mm->mmap_lock, if the specified address is below the
+5
mm/vma.h
 int __vm_munmap(unsigned long start, size_t len, bool unlock);

+/* vma_exec.c */
+#ifdef CONFIG_MMU
+int relocate_vma_down(struct vm_area_struct *vma, unsigned long shift);
+#endif
+
 #endif /* __MM_VMA_H */
+92
mm/vma_exec.c
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Functions explicitly implemented for exec functionality which however are
+ * explicitly VMA-only logic.
+ */
+
+#include "vma_internal.h"
+#include "vma.h"
+
+/*
+ * Relocate a VMA downwards by shift bytes. There cannot be any VMAs between
+ * this VMA and its relocated range, which will now reside at [vma->vm_start -
+ * shift, vma->vm_end - shift).
+ *
+ * This function is almost certainly NOT what you want for anything other than
+ * early executable temporary stack relocation.
+ */
+int relocate_vma_down(struct vm_area_struct *vma, unsigned long shift)
+{
+	/*
+	 * The process proceeds as follows:
+	 *
+	 * 1) Use shift to calculate the new vma endpoints.
+	 * 2) Extend vma to cover both the old and new ranges.  This ensures the
+	 *    arguments passed to subsequent functions are consistent.
+	 * 3) Move vma's page tables to the new range.
+	 * 4) Free up any cleared pgd range.
+	 * 5) Shrink the vma to cover only the new range.
+	 */
+
+	struct mm_struct *mm = vma->vm_mm;
+	unsigned long old_start = vma->vm_start;
+	unsigned long old_end = vma->vm_end;
+	unsigned long length = old_end - old_start;
+	unsigned long new_start = old_start - shift;
+	unsigned long new_end = old_end - shift;
+	VMA_ITERATOR(vmi, mm, new_start);
+	VMG_STATE(vmg, mm, &vmi, new_start, old_end, 0, vma->vm_pgoff);
+	struct vm_area_struct *next;
+	struct mmu_gather tlb;
+	PAGETABLE_MOVE(pmc, vma, vma, old_start, new_start, length);
+
+	BUG_ON(new_start > new_end);
+
+	/*
+	 * ensure there are no vmas between where we want to go
+	 * and where we are
+	 */
+	if (vma != vma_next(&vmi))
+		return -EFAULT;
+
+	vma_iter_prev_range(&vmi);
+	/*
+	 * cover the whole range: [new_start, old_end)
+	 */
+	vmg.middle = vma;
+	if (vma_expand(&vmg))
+		return -ENOMEM;
+
+	/*
+	 * move the page tables downwards, on failure we rely on
+	 * process cleanup to remove whatever mess we made.
+	 */
+	pmc.for_stack = true;
+	if (length != move_page_tables(&pmc))
+		return -ENOMEM;
+
+	tlb_gather_mmu(&tlb, mm);
+	next = vma_next(&vmi);
+	if (new_end > old_start) {
+		/*
+		 * when the old and new regions overlap clear from new_end.
+		 */
+		free_pgd_range(&tlb, new_end, old_end, new_end,
+			next ? next->vm_start : USER_PGTABLES_CEILING);
+	} else {
+		/*
+		 * otherwise, clean from old_start; this is done to not touch
+		 * the address space in [new_end, old_start) some architectures
+		 * have constraints on va-space that make this illegal (IA64) -
+		 * for the others its just a little faster.
+		 */
+		free_pgd_range(&tlb, old_start, old_end, new_end,
+			next ? next->vm_start : USER_PGTABLES_CEILING);
+	}
+	tlb_finish_mmu(&tlb);
+
+	vma_prev(&vmi);
+	/* Shrink the vma to just the new range */
+	return vma_shrink(&vmi, vma, new_start, new_end, vma->vm_pgoff);
+}
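
The address arithmetic behind steps 1, 2 and 4 of relocate_vma_down() can be sketched in userland C. This is only an illustrative model, not kernel code: struct relocate_plan and plan_relocate_down() are hypothetical names introduced here, and the sketch merely computes the ranges that the function would hand to vma_expand() and free_pgd_range(), without touching any VMA or page table.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userland model of the ranges relocate_vma_down() works with.
 * None of these names exist in the kernel.
 */
struct relocate_plan {
	unsigned long new_start;	/* step 1: old_start - shift */
	unsigned long new_end;		/* step 1: old_end - shift */
	unsigned long expand_start;	/* step 2: temporary range start */
	unsigned long expand_end;	/* step 2: temporary range end */
	unsigned long free_addr;	/* step 4: addr passed to free_pgd_range() */
	unsigned long free_end;		/* step 4: end passed to free_pgd_range() */
	unsigned long free_floor;	/* step 4: floor passed to free_pgd_range() */
	bool overlaps;			/* do old and new ranges overlap? */
};

static struct relocate_plan plan_relocate_down(unsigned long old_start,
					       unsigned long old_end,
					       unsigned long shift)
{
	struct relocate_plan p;

	/* Step 1: the relocated range is the old one moved down by shift. */
	p.new_start = old_start - shift;
	p.new_end = old_end - shift;

	/* Mirrors BUG_ON(new_start > new_end): catches unsigned wrap-around. */
	assert(p.new_start <= p.new_end);

	/* Step 2: the VMA is first expanded to cover [new_start, old_end). */
	p.expand_start = p.new_start;
	p.expand_end = old_end;

	/*
	 * Step 4: if the ranges overlap, freeing starts at new_end;
	 * otherwise it starts at old_start, leaving [new_end, old_start)
	 * untouched. The floor is new_end in both cases.
	 */
	p.overlaps = p.new_end > old_start;
	p.free_addr = p.overlaps ? p.new_end : old_start;
	p.free_end = old_end;
	p.free_floor = p.new_end;

	return p;
}
```

For instance, moving [0x7000, 0x9000) down by 0x1000 yields the overlapping range [0x6000, 0x8000), so freeing starts at new_end (0x8000). Moving it down by 0x4000 yields the disjoint range [0x3000, 0x5000), so freeing starts at old_start (0x7000).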
+1 -1
tools/testing/vma/Makefile
 OFILES = $(SHARED_OFILES) vma.o maple-shim.o
 TARGETS = vma

-vma.o:	vma.c vma_internal.h ../../../mm/vma.c ../../../mm/vma.h
+vma.o:	vma.c vma_internal.h ../../../mm/vma.c ../../../mm/vma_exec.c ../../../mm/vma.h

 vma:	$(OFILES)
 	$(CC) $(CFLAGS) -o $@ $(OFILES) $(LDLIBS)
+1
tools/testing/vma/vma.c
  * Directly import the VMA implementation here. Our vma_internal.h wrapper
  * provides userland-equivalent functionality for everything vma.c uses.
  */
+#include "../../../mm/vma_exec.c"
 #include "../../../mm/vma.c"

 const struct vm_operations_struct vma_dummy_vm_ops;
+40
tools/testing/vma/vma_internal.h
 	unsigned long start_gap;
 };

+struct pagetable_move_control {
+	struct vm_area_struct *old; /* Source VMA. */
+	struct vm_area_struct *new; /* Destination VMA. */
+	unsigned long old_addr; /* Address from which the move begins. */
+	unsigned long old_end; /* Exclusive address at which old range ends. */
+	unsigned long new_addr; /* Address to move page tables to. */
+	unsigned long len_in; /* Bytes to remap specified by user. */
+
+	bool need_rmap_locks; /* Do rmap locks need to be taken? */
+	bool for_stack; /* Is this an early temp stack being moved? */
+};
+
+#define PAGETABLE_MOVE(name, old_, new_, old_addr_, new_addr_, len_)	\
+	struct pagetable_move_control name = {				\
+		.old = old_,						\
+		.new = new_,						\
+		.old_addr = old_addr_,					\
+		.old_end = (old_addr_) + (len_),			\
+		.new_addr = new_addr_,					\
+		.len_in = len_,						\
+	}
+
 static inline void vma_iter_invalidate(struct vma_iterator *vmi)
 {
 	mas_pause(&vmi->mas);
···
 	} while (!__sync_bool_compare_and_swap(&mapping->i_mmap_writable, c, c+1));

 	return 0;
+}
+
+static inline unsigned long move_page_tables(struct pagetable_move_control *pmc)
+{
+	(void)pmc;
+
+	return 0;
+}
+
+static inline void free_pgd_range(struct mmu_gather *tlb,
+				  unsigned long addr, unsigned long end,
+				  unsigned long floor, unsigned long ceiling)
+{
+	(void)tlb;
+	(void)addr;
+	(void)end;
+	(void)floor;
+	(void)ceiling;
 }

 #endif /* __MM_VMA_INTERNAL_H */
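
One consequence of the stubs above is worth spelling out with a small standalone sketch. Since the userland move_page_tables() stub always returns 0, the `length != move_page_tables(&pmc)` check in relocate_vma_down() would fail for any non-empty VMA if the function were exercised under this harness as-is. The code below is an assumption-laden illustration rather than part of the patch: the struct vm_area_struct here is a hypothetical two-field stand-in, and move_step() is a hypothetical helper that mirrors only the page-table move step.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical minimal stand-in; the real structure has many more fields. */
struct vm_area_struct {
	unsigned long vm_start;
	unsigned long vm_end;
};

/* Same layout as the structure added to the test header in this patch. */
struct pagetable_move_control {
	struct vm_area_struct *old;	/* Source VMA. */
	struct vm_area_struct *new;	/* Destination VMA. */
	unsigned long old_addr;		/* Address from which the move begins. */
	unsigned long old_end;		/* Exclusive end of the old range. */
	unsigned long new_addr;		/* Address to move page tables to. */
	unsigned long len_in;		/* Bytes to remap. */

	bool need_rmap_locks;
	bool for_stack;
};

#define PAGETABLE_MOVE(name, old_, new_, old_addr_, new_addr_, len_)	\
	struct pagetable_move_control name = {				\
		.old = old_,						\
		.new = new_,						\
		.old_addr = old_addr_,					\
		.old_end = (old_addr_) + (len_),			\
		.new_addr = new_addr_,					\
		.len_in = len_,						\
	}

/* Userland stub as in the test header: reports that nothing was moved. */
static inline unsigned long move_page_tables(struct pagetable_move_control *pmc)
{
	(void)pmc;

	return 0;
}

/*
 * Hypothetical helper mirroring only the move step of relocate_vma_down():
 * build the control structure, mark it as a stack move, and compare the
 * number of bytes moved against the VMA length.
 */
static int move_step(struct vm_area_struct *vma, unsigned long shift)
{
	unsigned long length;

	assert(vma->vm_end >= vma->vm_start);
	length = vma->vm_end - vma->vm_start;

	PAGETABLE_MOVE(pmc, vma, vma, vma->vm_start,
		       vma->vm_start - shift, length);

	pmc.for_stack = true;
	if (length != move_page_tables(&pmc))
		return -ENOMEM;

	return 0;
}
```

Under these assumptions, move_step() returns -ENOMEM for a VMA spanning [0x7000, 0x9000) and 0 only for an empty one, so a userland test that wants to drive the function past this step would presumably need a stub that returns the requested length.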