Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

Merge tag 'mm-hotfixes-stable-2023-04-07-16-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM fixes from Andrew Morton:
"28 hotfixes.

23 are cc:stable and the other five address issues which were
introduced during this merge cycle.

20 are for MM and the remainder are for other subsystems"

* tag 'mm-hotfixes-stable-2023-04-07-16-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (28 commits)
maple_tree: fix a potential concurrency bug in RCU mode
maple_tree: fix get wrong data_end in mtree_lookup_walk()
mm/swap: fix swap_info_struct race between swapoff and get_swap_pages()
nilfs2: fix sysfs interface lifetime
mm: take a page reference when removing device exclusive entries
mm: vmalloc: avoid warn_alloc noise caused by fatal signal
nilfs2: initialize "struct nilfs_binfo_dat"->bi_pad field
nilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread()
zsmalloc: document freeable stats
zsmalloc: document new fullness grouping
fsdax: force clear dirty mark if CoW
mm/hugetlb: fix uffd wr-protection for CoW optimization path
mm: enable maple tree RCU mode by default
maple_tree: add RCU lock checking to rcu callback functions
maple_tree: add smp_rmb() to dead node detection
maple_tree: fix write memory barrier of nodes once dead for RCU mode
maple_tree: remove extra smp_wmb() from mas_dead_leaves()
maple_tree: fix freeing of nodes in rcu mode
maple_tree: detect dead nodes in mas_start()
maple_tree: be more cautious about dead nodes
...

+401 -194
+2
.mailmap
@@ -265,7 +265,9 @@
 Krzysztof Kozlowski <krzk@kernel.org> <krzysztof.kozlowski@canonical.com>
 Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
 Kuogee Hsieh <quic_khsieh@quicinc.com> <khsieh@codeaurora.org>
+Leonard Crestez <leonard.crestez@nxp.com> Leonard Crestez <cdleonard@gmail.com>
 Leonardo Bras <leobras.c@gmail.com> <leonardo@linux.ibm.com>
+Leonard Göhrs <l.goehrs@pengutronix.de>
 Leonid I Ananiev <leonid.i.ananiev@intel.com>
 Leon Romanovsky <leon@kernel.org> <leon@leon.nu>
 Leon Romanovsky <leon@kernel.org> <leonro@mellanox.com>
+76 -59
Documentation/mm/zsmalloc.rst
@@ -39,13 +39,12 @@

 	# cat /sys/kernel/debug/zsmalloc/zram0/classes

-	class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage
+	class size 10% 20% 30% 40% 50% 60% 70% 80% 90% 99% 100% obj_allocated obj_used pages_used pages_per_zspage freeable
 	...
 	...
-	9 176 0 1 186 129 8 4
-	10 192 1 0 2880 2872 135 3
-	11 208 0 1 819 795 42 2
-	12 224 0 1 219 159 12 4
+	30 512 0 12 4 1 0 1 0 0 1 0 414 3464 3346 433 1 14
+	31 528 2 7 2 2 1 0 1 0 0 2 117 4154 3793 536 4 44
+	32 544 6 3 4 1 2 1 0 0 0 1 260 4170 3965 556 2 26
 	...
 	...

@@ -54,10 +53,28 @@
 	index
 size
 	object size zspage stores
-almost_empty
-	the number of ZS_ALMOST_EMPTY zspages(see below)
-almost_full
-	the number of ZS_ALMOST_FULL zspages(see below)
+10%
+	the number of zspages with usage ratio less than 10% (see below)
+20%
+	the number of zspages with usage ratio between 10% and 20%
+30%
+	the number of zspages with usage ratio between 20% and 30%
+40%
+	the number of zspages with usage ratio between 30% and 40%
+50%
+	the number of zspages with usage ratio between 40% and 50%
+60%
+	the number of zspages with usage ratio between 50% and 60%
+70%
+	the number of zspages with usage ratio between 60% and 70%
+80%
+	the number of zspages with usage ratio between 70% and 80%
+90%
+	the number of zspages with usage ratio between 80% and 90%
+99%
+	the number of zspages with usage ratio between 90% and 99%
+100%
+	the number of zspages with usage ratio 100%
 obj_allocated
 	the number of objects allocated
 obj_used
@@ -66,19 +83,14 @@
 	the number of pages allocated for the class
 pages_per_zspage
 	the number of 0-order pages to make a zspage
+freeable
+	the approximate number of pages class compaction can free

-We assign a zspage to ZS_ALMOST_EMPTY fullness group when n <= N / f, where
-
-* n = number of allocated objects
-* N = total number of objects zspage can store
-* f = fullness_threshold_frac(ie, 4 at the moment)
-
-Similarly, we assign zspage to:
-
-* ZS_ALMOST_FULL when n > N / f
-* ZS_EMPTY when n == 0
-* ZS_FULL when n == N
-
+Each zspage maintains inuse counter which keeps track of the number of
+objects stored in the zspage. The inuse counter determines the zspage's
+"fullness group" which is calculated as the ratio of the "inuse" objects to
+the total number of objects the zspage can hold (objs_per_zspage). The
+closer the inuse counter is to objs_per_zspage, the better.

 Internals
 =========
@@ -94,10 +106,10 @@

 For instance, consider the following size classes:::

-	class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable
+	class size 10% .... 100% obj_allocated obj_used pages_used pages_per_zspage freeable
 	...
-	94 1536 0 0 0 0 0 3 0
-	100 1632 0 0 0 0 0 2 0
+	94 1536 0 .... 0 0 0 0 3 0
+	100 1632 0 .... 0 0 0 0 2 0
 	...


@@ -134,10 +146,11 @@

 Let's take a closer look at the bottom of `/sys/kernel/debug/zsmalloc/zramX/classes`:::

-	class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable
+	class size 10% .... 100% obj_allocated obj_used pages_used pages_per_zspage freeable
+
 	...
-	202 3264 0 0 0 0 0 4 0
-	254 4096 0 0 0 0 0 1 0
+	202 3264 0 .. 0 0 0 0 4 0
+	254 4096 0 .. 0 0 0 0 1 0
 	...

 Size class #202 stores objects of size 3264 bytes and has a maximum of 4 pages
@@ -151,40 +164,42 @@

 For zspage chain size of 8, huge class watermark becomes 3632 bytes:::

-	class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable
+	class size 10% .... 100% obj_allocated obj_used pages_used pages_per_zspage freeable
+
 	...
-	202 3264 0 0 0 0 0 4 0
-	211 3408 0 0 0 0 0 5 0
-	217 3504 0 0 0 0 0 6 0
-	222 3584 0 0 0 0 0 7 0
-	225 3632 0 0 0 0 0 8 0
-	254 4096 0 0 0 0 0 1 0
+	202 3264 0 .. 0 0 0 0 4 0
+	211 3408 0 .. 0 0 0 0 5 0
+	217 3504 0 .. 0 0 0 0 6 0
+	222 3584 0 .. 0 0 0 0 7 0
+	225 3632 0 .. 0 0 0 0 8 0
+	254 4096 0 .. 0 0 0 0 1 0
 	...

 For zspage chain size of 16, huge class watermark becomes 3840 bytes:::

-	class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable
+	class size 10% .... 100% obj_allocated obj_used pages_used pages_per_zspage freeable
+
 	...
-	202 3264 0 0 0 0 0 4 0
-	206 3328 0 0 0 0 0 13 0
-	207 3344 0 0 0 0 0 9 0
-	208 3360 0 0 0 0 0 14 0
-	211 3408 0 0 0 0 0 5 0
-	212 3424 0 0 0 0 0 16 0
-	214 3456 0 0 0 0 0 11 0
-	217 3504 0 0 0 0 0 6 0
-	219 3536 0 0 0 0 0 13 0
-	222 3584 0 0 0 0 0 7 0
-	223 3600 0 0 0 0 0 15 0
-	225 3632 0 0 0 0 0 8 0
-	228 3680 0 0 0 0 0 9 0
-	230 3712 0 0 0 0 0 10 0
-	232 3744 0 0 0 0 0 11 0
-	234 3776 0 0 0 0 0 12 0
-	235 3792 0 0 0 0 0 13 0
-	236 3808 0 0 0 0 0 14 0
-	238 3840 0 0 0 0 0 15 0
-	254 4096 0 0 0 0 0 1 0
+	202 3264 0 .. 0 0 0 0 4 0
+	206 3328 0 .. 0 0 0 0 13 0
+	207 3344 0 .. 0 0 0 0 9 0
+	208 3360 0 .. 0 0 0 0 14 0
+	211 3408 0 .. 0 0 0 0 5 0
+	212 3424 0 .. 0 0 0 0 16 0
+	214 3456 0 .. 0 0 0 0 11 0
+	217 3504 0 .. 0 0 0 0 6 0
+	219 3536 0 .. 0 0 0 0 13 0
+	222 3584 0 .. 0 0 0 0 7 0
+	223 3600 0 .. 0 0 0 0 15 0
+	225 3632 0 .. 0 0 0 0 8 0
+	228 3680 0 .. 0 0 0 0 9 0
+	230 3712 0 .. 0 0 0 0 10 0
+	232 3744 0 .. 0 0 0 0 11 0
+	234 3776 0 .. 0 0 0 0 12 0
+	235 3792 0 .. 0 0 0 0 13 0
+	236 3808 0 .. 0 0 0 0 14 0
+	238 3840 0 .. 0 0 0 0 15 0
+	254 4096 0 .. 0 0 0 0 1 0
 	...

 Overall the combined zspage chain size effect on zsmalloc pool configuration:::
@@ -214,9 +229,10 @@

 zsmalloc classes stats:::

-	class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable
+	class size 10% .... 100% obj_allocated obj_used pages_used pages_per_zspage freeable
+
 	...
-	Total 13 51 413836 412973 159955 3
+	Total 13 .. 51 413836 412973 159955 3

 zram mm_stat:::

@@ -227,9 +243,10 @@

 zsmalloc classes stats:::

-	class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable
+	class size 10% .... 100% obj_allocated obj_used pages_used pages_per_zspage freeable
+
 	...
-	Total 18 87 414852 412978 156666 0
+	Total 18 .. 87 414852 412978 156666 0

 zram mm_stat:::

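The per-column definitions added to zsmalloc.rst above amount to a bucketing rule on the ratio of inuse objects to objs_per_zspage. What follows is only a userspace sketch of that rule, not the kernel's implementation: `usage_bucket()` is a hypothetical helper, returning 0 for an empty zspage is an assumption, and so is treating each "between X% and Y%" range as including its lower bound.

```c
#include <assert.h>

/*
 * Hypothetical helper (not kernel code): map a zspage's inuse count to the
 * label of the stats column it would be counted under: 10, 20, ..., 90,
 * 99 or 100. Returns 0 for an empty zspage (assumption).
 */
static int usage_bucket(unsigned int inuse, unsigned int objs_per_zspage)
{
	unsigned int ratio;

	assert(objs_per_zspage > 0 && inuse <= objs_per_zspage);

	if (inuse == 0)
		return 0;
	if (inuse == objs_per_zspage)
		return 100;

	/* Integer percentage in the range 0..99. */
	ratio = 100 * inuse / objs_per_zspage;
	if (ratio >= 90)
		return 99;

	/* 0..9% -> 10, 10..19% -> 20, ..., 80..89% -> 90. */
	return (int)(ratio / 10 + 1) * 10;
}
```

Under these assumptions a zspage holding 1 of 127 possible objects lands in the 10% column even though its integer percentage is 0, and a half-full zspage lands in the 60% column.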
+47 -5
fs/dax.c
@@ -781,6 +781,33 @@
 	return ret;
 }

+static int __dax_clear_dirty_range(struct address_space *mapping,
+		pgoff_t start, pgoff_t end)
+{
+	XA_STATE(xas, &mapping->i_pages, start);
+	unsigned int scanned = 0;
+	void *entry;
+
+	xas_lock_irq(&xas);
+	xas_for_each(&xas, entry, end) {
+		entry = get_unlocked_entry(&xas, 0);
+		xas_clear_mark(&xas, PAGECACHE_TAG_DIRTY);
+		xas_clear_mark(&xas, PAGECACHE_TAG_TOWRITE);
+		put_unlocked_entry(&xas, entry, WAKE_NEXT);
+
+		if (++scanned % XA_CHECK_SCHED)
+			continue;
+
+		xas_pause(&xas);
+		xas_unlock_irq(&xas);
+		cond_resched();
+		xas_lock_irq(&xas);
+	}
+	xas_unlock_irq(&xas);
+
+	return 0;
+}
+
 /*
  * Delete DAX entry at @index from @mapping. Wait for it
  * to be unlocked before deleting it.
@@ -1258,14 +1285,19 @@
 	/* don't bother with blocks that are not shared to start with */
 	if (!(iomap->flags & IOMAP_F_SHARED))
 		return length;
-	/* don't bother with holes or unwritten extents */
-	if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
-		return length;

 	id = dax_read_lock();
 	ret = dax_iomap_direct_access(iomap, pos, length, &daddr, NULL);
 	if (ret < 0)
 		goto out_unlock;
+
+	/* zero the distance if srcmap is HOLE or UNWRITTEN */
+	if (srcmap->flags & IOMAP_F_SHARED || srcmap->type == IOMAP_UNWRITTEN) {
+		memset(daddr, 0, length);
+		dax_flush(iomap->dax_dev, daddr, length);
+		ret = length;
+		goto out_unlock;
+	}

 	ret = dax_iomap_direct_access(srcmap, pos, length, &saddr, NULL);
 	if (ret < 0)
@@ -1435,6 +1467,16 @@
 	 * written by write(2) is visible in mmap.
 	 */
 	if (iomap->flags & IOMAP_F_NEW || cow) {
+		/*
+		 * Filesystem allows CoW on non-shared extents. The src extents
+		 * may have been mmapped with dirty mark before. To be able to
+		 * invalidate its dax entries, we need to clear the dirty mark
+		 * in advance.
+		 */
+		if (cow)
+			__dax_clear_dirty_range(iomi->inode->i_mapping,
+						pos >> PAGE_SHIFT,
+						(end - 1) >> PAGE_SHIFT);
 		invalidate_inode_pages2_range(iomi->inode->i_mapping,
 					      pos >> PAGE_SHIFT,
 					      (end - 1) >> PAGE_SHIFT);
@@ -2022,8 +2064,8 @@

 	while ((ret = iomap_iter(&src_iter, ops)) > 0 &&
 	       (ret = iomap_iter(&dst_iter, ops)) > 0) {
-		compared = dax_range_compare_iter(&src_iter, &dst_iter, len,
-						  same);
+		compared = dax_range_compare_iter(&src_iter, &dst_iter,
+				min(src_iter.len, dst_iter.len), same);
 		if (compared < 0)
 			return ret;
 		src_iter.processed = dst_iter.processed = compared;
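Both the new `__dax_clear_dirty_range()` call and the existing `invalidate_inode_pages2_range()` call convert the half-open byte range [pos, end) into an inclusive page index range with `pos >> PAGE_SHIFT` and `(end - 1) >> PAGE_SHIFT`. The arithmetic can be sketched in userspace as below; `EXAMPLE_PAGE_SHIFT`, `struct page_range` and `byte_range_to_pages()` are names made up for this illustration, and 4 KiB pages are an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Inclusive range of page indexes, as used by the calls in the diff. */
struct page_range {
	uint64_t first;
	uint64_t last;
};

/*
 * Hypothetical helper: convert the byte range [pos, end) to page indexes.
 * Subtracting 1 from end keeps a range that stops exactly on a page
 * boundary from spilling into the following page.
 */
static struct page_range byte_range_to_pages(uint64_t pos, uint64_t end)
{
	struct page_range r;

	assert(end > pos);
	r.first = pos >> EXAMPLE_PAGE_SHIFT;
	r.last = (end - 1) >> EXAMPLE_PAGE_SHIFT;
	return r;
}
```

With these assumptions, bytes 4096 to 8191 map to page 1 only, while extending the range by a single byte to end at 8193 also pulls in page 2.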
+1
fs/nilfs2/btree.c
@@ -2219,6 +2219,7 @@
 	/* on-disk format */
 	binfo->bi_dat.bi_blkoff = cpu_to_le64(key);
 	binfo->bi_dat.bi_level = level;
+	memset(binfo->bi_dat.bi_pad, 0, sizeof(binfo->bi_dat.bi_pad));

 	return 0;
 }
+1
fs/nilfs2/direct.c
@@ -314,6 +314,7 @@

 	binfo->bi_dat.bi_blkoff = cpu_to_le64(key);
 	binfo->bi_dat.bi_level = 0;
+	memset(binfo->bi_dat.bi_pad, 0, sizeof(binfo->bi_dat.bi_pad));

 	return 0;
 }
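The two nilfs2 hunks above add the same line: the padding bytes of the on-disk record are zeroed explicitly, so that whatever happened to be in that memory is not written out with the record. The sketch below shows the pattern in userspace; `struct example_binfo_dat` and `example_fill_binfo()` are hypothetical, the field names are borrowed from the diff, and the 7-byte pad size is an assumption.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the on-disk record touched by the diff. */
struct example_binfo_dat {
	uint64_t bi_blkoff;
	uint8_t bi_level;
	uint8_t bi_pad[7];	/* pad size is an assumption */
};

/*
 * Fill every field of the record, padding included. The kernel code also
 * converts bi_blkoff with cpu_to_le64(); that step is omitted here.
 */
static void example_fill_binfo(struct example_binfo_dat *b, uint64_t key,
			       uint8_t level)
{
	b->bi_blkoff = key;
	b->bi_level = level;
	memset(b->bi_pad, 0, sizeof(b->bi_pad));
}
```

Filling a buffer that previously held arbitrary bytes leaves `bi_pad` all zero afterwards, which is the property the fix is after.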
+1 -2
fs/nilfs2/segment.c
@@ -2609,11 +2609,10 @@
 	goto loop;

 end_thread:
-	spin_unlock(&sci->sc_state_lock);
-
 	/* end sync. */
 	sci->sc_task = NULL;
 	wake_up(&sci->sc_wait_task); /* for nilfs_segctor_kill_thread() */
+	spin_unlock(&sci->sc_state_lock);
 	return 0;
 }

+2
fs/nilfs2/super.c
@@ -482,6 +482,7 @@
 		up_write(&nilfs->ns_sem);
 	}

+	nilfs_sysfs_delete_device_group(nilfs);
 	iput(nilfs->ns_sufile);
 	iput(nilfs->ns_cpfile);
 	iput(nilfs->ns_dat);
@@ -1105,6 +1106,7 @@
 	nilfs_put_root(fsroot);

 failed_unload:
+	nilfs_sysfs_delete_device_group(nilfs);
 	iput(nilfs->ns_sufile);
 	iput(nilfs->ns_cpfile);
 	iput(nilfs->ns_dat);
+7 -5
fs/nilfs2/the_nilfs.c
@@ -87,7 +87,6 @@
 {
 	might_sleep();
 	if (nilfs_init(nilfs)) {
-		nilfs_sysfs_delete_device_group(nilfs);
 		brelse(nilfs->ns_sbh[0]);
 		brelse(nilfs->ns_sbh[1]);
 	}
@@ -305,6 +304,10 @@
 		goto failed;
 	}

+	err = nilfs_sysfs_create_device_group(sb);
+	if (unlikely(err))
+		goto sysfs_error;
+
 	if (valid_fs)
 		goto skip_recovery;

@@ -366,6 +369,9 @@
 	goto failed;

 failed_unload:
+	nilfs_sysfs_delete_device_group(nilfs);
+
+sysfs_error:
 	iput(nilfs->ns_cpfile);
 	iput(nilfs->ns_sufile);
 	iput(nilfs->ns_dat);
@@ -694,10 +700,6 @@
 	nilfs->ns_mount_state = le16_to_cpu(sbp->s_state);

 	err = nilfs_store_log_cursor(nilfs, sbp);
-	if (err)
-		goto failed_sbh;
-
-	err = nilfs_sysfs_create_device_group(sb);
 	if (err)
 		goto failed_sbh;

+2 -1
include/linux/mm_types.h
@@ -774,7 +774,8 @@
 	unsigned long cpu_bitmap[];
 };

-#define MM_MT_FLAGS	(MT_FLAGS_ALLOC_RANGE | MT_FLAGS_LOCK_EXTERN)
+#define MM_MT_FLAGS	(MT_FLAGS_ALLOC_RANGE | MT_FLAGS_LOCK_EXTERN | \
+			 MT_FLAGS_USE_RCU)
 extern struct mm_struct init_mm;

 /* Pointer magic because the dynamic array size confuses some compilers. */
+3
kernel/fork.c
@@ -617,6 +617,7 @@
 	if (retval)
 		goto out;

+	mt_clear_in_rcu(vmi.mas.tree);
 	for_each_vma(old_vmi, mpnt) {
 		struct file *file;

@@ -700,6 +701,8 @@
 	retval = arch_dup_mmap(oldmm, mm);
 loop_out:
 	vma_iter_free(&vmi);
+	if (!retval)
+		mt_set_in_rcu(vmi.mas.tree);
 out:
 	mmap_write_unlock(mm);
 	flush_tlb_mm(oldmm);
+2 -2
lib/Kconfig.debug
@@ -1143,7 +1143,7 @@

 config SCHED_DEBUG
 	bool "Collect scheduler debugging info"
-	depends on DEBUG_KERNEL && PROC_FS
+	depends on DEBUG_KERNEL && DEBUG_FS
 	default y
 	help
 	  If you say Y here, the /sys/kernel/debug/sched file will be provided
@@ -1392,7 +1392,7 @@
 	range 10 30
 	default 14
 	help
-	  Try increasing this value if you need large MAX_STACK_TRACE_ENTRIES.
+	  Try increasing this value if you need large STACK_TRACE_HASH_SIZE.

 config LOCKDEP_CIRCULAR_QUEUE_BITS
 	int "Bitsize for elements in circular_queue struct"
+189 -96
lib/maple_tree.c
@@ -185,7 +185,7 @@
  */
 static void ma_free_rcu(struct maple_node *node)
 {
-	node->parent = ma_parent_ptr(node);
+	WARN_ON(node->parent != ma_parent_ptr(node));
 	call_rcu(&node->rcu, mt_free_rcu);
 }

@@ -539,11 +539,14 @@
  */
 static inline bool ma_dead_node(const struct maple_node *node)
 {
-	struct maple_node *parent = (void *)((unsigned long)
-					     node->parent & ~MAPLE_NODE_MASK);
+	struct maple_node *parent;

+	/* Do not reorder reads from the node prior to the parent check */
+	smp_rmb();
+	parent = (void *)((unsigned long) node->parent & ~MAPLE_NODE_MASK);
 	return (parent == node);
 }
+
 /*
  * mte_dead_node() - check if the @enode is dead.
  * @enode: The encoded maple node
@@ -555,6 +558,8 @@
 	struct maple_node *parent, *node;

 	node = mte_to_node(enode);
+	/* Do not reorder reads from the node prior to the parent check */
+	smp_rmb();
 	parent = mte_parent(enode);
 	return (parent == node);
 }
@@ -624,6 +629,8 @@
  * ma_pivots() - Get a pointer to the maple node pivots.
  * @node - the maple node
  * @type - the node type
+ *
+ * In the event of a dead node, this array may be %NULL
  *
  * Return: A pointer to the maple node pivots
  */
@@ -817,6 +824,11 @@
 	return rcu_dereference_check(slots[offset], mt_locked(mt));
 }

+static inline void *mt_slot_locked(struct maple_tree *mt, void __rcu **slots,
+				   unsigned char offset)
+{
+	return rcu_dereference_protected(slots[offset], mt_locked(mt));
+}
 /*
  * mas_slot_locked() - Get the slot value when holding the maple tree lock.
  * @mas: The maple state
@@ -828,7 +840,7 @@
 static inline void *mas_slot_locked(struct ma_state *mas, void __rcu **slots,
 				       unsigned char offset)
 {
-	return rcu_dereference_protected(slots[offset], mt_locked(mas->tree));
+	return mt_slot_locked(mas->tree, slots, offset);
 }

 /*
@@ -897,6 +909,45 @@

 	meta->gap = offset;
 	meta->end = end;
+}
+
+/*
+ * mt_clear_meta() - clear the metadata information of a node, if it exists
+ * @mt: The maple tree
+ * @mn: The maple node
+ * @type: The maple node type
+ * @offset: The offset of the highest sub-gap in this node.
+ * @end: The end of the data in this node.
+ */
+static inline void mt_clear_meta(struct maple_tree *mt, struct maple_node *mn,
+				  enum maple_type type)
+{
+	struct maple_metadata *meta;
+	unsigned long *pivots;
+	void __rcu **slots;
+	void *next;
+
+	switch (type) {
+	case maple_range_64:
+		pivots = mn->mr64.pivot;
+		if (unlikely(pivots[MAPLE_RANGE64_SLOTS - 2])) {
+			slots = mn->mr64.slot;
+			next = mt_slot_locked(mt, slots,
+					      MAPLE_RANGE64_SLOTS - 1);
+			if (unlikely((mte_to_node(next) &&
+				      mte_node_type(next))))
+				return; /* no metadata, could be node */
+		}
+		fallthrough;
+	case maple_arange_64:
+		meta = ma_meta(mn, type);
+		break;
+	default:
+		return;
+	}
+
+	meta->gap = 0;
+	meta->end = 0;
 }

 /*
@@ -1096,8 +1147,11 @@
 		a_type = mas_parent_enum(mas, p_enode);
 		a_node = mte_parent(p_enode);
 		a_slot = mte_parent_slot(p_enode);
-		pivots = ma_pivots(a_node, a_type);
 		a_enode = mt_mk_node(a_node, a_type);
+		pivots = ma_pivots(a_node, a_type);
+
+		if (unlikely(ma_dead_node(a_node)))
+			return 1;

 		if (!set_min && a_slot) {
 			set_min = true;
@@ -1354,12 +1408,16 @@
 		mas->max = ULONG_MAX;
 		mas->depth = 0;

+retry:
 		root = mas_root(mas);
 		/* Tree with nodes */
 		if (likely(xa_is_node(root))) {
 			mas->depth = 1;
 			mas->node = mte_safe_root(root);
 			mas->offset = 0;
+			if (mte_dead_node(mas->node))
+				goto retry;
+
 			return NULL;
 		}

@@ -1401,6 +1459,9 @@
 {
 	unsigned char offset;

+	if (!pivots)
+		return 0;
+
 	if (type == maple_arange_64)
 		return ma_meta_end(node, type);

@@ -1436,6 +1497,9 @@
 		return ma_meta_end(node, type);

 	pivots = ma_pivots(node, type);
+	if (unlikely(ma_dead_node(node)))
+		return 0;
+
 	offset = mt_pivots[type] - 1;
 	if (likely(!pivots[offset]))
 		return ma_meta_end(node, type);
@@ -1724,8 +1788,10 @@
 		rcu_assign_pointer(slots[offset], mas->node);
 	}

-	if (!advanced)
+	if (!advanced) {
+		mte_set_node_dead(old_enode);
 		mas_free(mas, old_enode);
+	}
 }

 /*
@@ -3659,10 +3725,9 @@
 		slot++;
 	mas->depth = 1;
 	mas_set_height(mas);
-
+	ma_set_meta(node, maple_leaf_64, 0, slot);
 	/* swap the new root into the tree */
 	rcu_assign_pointer(mas->tree->ma_root, mte_mk_root(mas->node));
-	ma_set_meta(node, maple_leaf_64, 0, slot);
 	return slot;
 }

@@ -3875,18 +3940,13 @@
 		end = ma_data_end(node, type, pivots, max);
 		if (unlikely(ma_dead_node(node)))
 			goto dead_node;
-
-		if (pivots[offset] >= mas->index)
-			goto next;
-
 		do {
-			offset++;
-		} while ((offset < end) && (pivots[offset] < mas->index));
+			if (pivots[offset] >= mas->index) {
+				max = pivots[offset];
+				break;
+			}
+		} while (++offset < end);

-		if (likely(offset > end))
-			max = pivots[offset];
-
-next:
 		slots = ma_slots(node, type);
 		next = mt_slot(mas->tree, slots, offset);
 		if (unlikely(ma_dead_node(node)))
@@ -4164,6 +4224,7 @@
 done:
 	mas_leaf_set_meta(mas, newnode, dst_pivots, maple_leaf_64, new_end);
 	if (in_rcu) {
+		mte_set_node_dead(mas->node);
 		mas->node = mt_mk_node(newnode, wr_mas->type);
 		mas_replace(mas, false);
 	} else {
@@ -4505,6 +4566,9 @@
 	node = mas_mn(mas);
 	slots = ma_slots(node, mt);
 	pivots = ma_pivots(node, mt);
+	if (unlikely(ma_dead_node(node)))
+		return 1;
+
 	mas->max = pivots[offset];
 	if (offset)
 		mas->min = pivots[offset - 1] + 1;
@@ -4526,6 +4590,9 @@
 		slots = ma_slots(node, mt);
 		pivots = ma_pivots(node, mt);
 		offset = ma_data_end(node, mt, pivots, mas->max);
+		if (unlikely(ma_dead_node(node)))
+			return 1;
+
 		if (offset)
 			mas->min = pivots[offset - 1] + 1;

@@ -4574,6 +4641,7 @@
 	struct maple_enode *enode;
 	int level = 0;
 	unsigned char offset;
+	unsigned char node_end;
 	enum maple_type mt;
 	void __rcu **slots;

@@ -4597,7 +4665,11 @@
 		node = mas_mn(mas);
 		mt = mte_node_type(mas->node);
 		pivots = ma_pivots(node, mt);
-	} while (unlikely(offset == ma_data_end(node, mt, pivots, mas->max)));
+		node_end = ma_data_end(node, mt, pivots, mas->max);
+		if (unlikely(ma_dead_node(node)))
+			return 1;
+
+	} while (unlikely(offset == node_end));

 	slots = ma_slots(node, mt);
 	pivot = mas_safe_pivot(mas, pivots, ++offset, mt);
@@ -4613,6 +4685,9 @@
 		mt = mte_node_type(mas->node);
 		slots = ma_slots(node, mt);
 		pivots = ma_pivots(node, mt);
+		if (unlikely(ma_dead_node(node)))
+			return 1;
+
 		offset = 0;
 		pivot = pivots[0];
 	}
@@ -4659,11 +4734,14 @@
 		return NULL;
 	}

-	pivots = ma_pivots(node, type);
 	slots = ma_slots(node, type);
-	mas->index = mas_safe_min(mas, pivots, mas->offset);
+	pivots = ma_pivots(node, type);
 	count = ma_data_end(node, type, pivots, mas->max);
-	if (ma_dead_node(node))
+	if (unlikely(ma_dead_node(node)))
+		return NULL;
+
+	mas->index = mas_safe_min(mas, pivots, mas->offset);
+	if (unlikely(ma_dead_node(node)))
 		return NULL;

 	if (mas->index > max)
@@ -4817,6 +4895,11 @@

 	slots = ma_slots(mn, mt);
 	pivots = ma_pivots(mn, mt);
+	if (unlikely(ma_dead_node(mn))) {
+		mas_rewalk(mas, index);
+		goto retry;
+	}
+
 	if (offset == mt_pivots[mt])
 		pivot = mas->max;
 	else
@@ -5400,24 +5483,26 @@
 }

 /*
- * mas_dead_leaves() - Mark all leaves of a node as dead.
+ * mte_dead_leaves() - Mark all leaves of a node as dead.
  * @mas: The maple state
  * @slots: Pointer to the slot array
+ * @type: The maple node type
  *
  * Must hold the write lock.
  *
  * Return: The number of leaves marked as dead.
  */
 static inline
-unsigned char mas_dead_leaves(struct ma_state *mas, void __rcu **slots)
+unsigned char mte_dead_leaves(struct maple_enode *enode, struct maple_tree *mt,
+			      void __rcu **slots)
 {
 	struct maple_node *node;
 	enum maple_type type;
 	void *entry;
 	int offset;

-	for (offset = 0; offset < mt_slot_count(mas->node); offset++) {
-		entry = mas_slot_locked(mas, slots, offset);
+	for (offset = 0; offset < mt_slot_count(enode); offset++) {
+		entry = mt_slot(mt, slots, offset);
 		type = mte_node_type(entry);
 		node = mte_to_node(entry);
 		/* Use both node and type to catch LE & BE metadata */
@@ -5425,7 +5510,6 @@
 			break;

 		mte_set_node_dead(entry);
-		smp_wmb(); /* Needed for RCU */
 		node->type = type;
 		rcu_assign_pointer(slots[offset], node);
 	}
@@ -5433,151 +5517,160 @@
 	return offset;
 }

-static void __rcu **mas_dead_walk(struct ma_state *mas, unsigned char offset)
+/**
+ * mte_dead_walk() - Walk down a dead tree to just before the leaves
+ * @enode: The maple encoded node
+ * @offset: The starting offset
+ *
+ * Note: This can only be used from the RCU callback context.
+ */
+static void __rcu **mte_dead_walk(struct maple_enode **enode, unsigned char offset)
 {
 	struct maple_node *node, *next;
 	void __rcu **slots = NULL;

-	next = mas_mn(mas);
+	next = mte_to_node(*enode);
 	do {
-		mas->node = ma_enode_ptr(next);
-		node = mas_mn(mas);
+		*enode = ma_enode_ptr(next);
+		node = mte_to_node(*enode);
 		slots = ma_slots(node, node->type);
-		next = mas_slot_locked(mas, slots, offset);
+		next = rcu_dereference_protected(slots[offset],
+					lock_is_held(&rcu_callback_map));
 		offset = 0;
 	} while (!ma_is_leaf(next->type));

 	return slots;
 }

+/**
+ * mt_free_walk() - Walk & free a tree in the RCU callback context
+ * @head: The RCU head that's within the node.
+ *
+ * Note: This can only be used from the RCU callback context.
+ */
 static void mt_free_walk(struct rcu_head *head)
 {
 	void __rcu **slots;
 	struct maple_node *node, *start;
-	struct maple_tree mt;
+	struct maple_enode *enode;
 	unsigned char offset;
 	enum maple_type type;
-	MA_STATE(mas, &mt, 0, 0);

 	node = container_of(head, struct maple_node, rcu);

 	if (ma_is_leaf(node->type))
 		goto free_leaf;

-	mt_init_flags(&mt, node->ma_flags);
-	mas_lock(&mas);
 	start = node;
-	mas.node = mt_mk_node(node, node->type);
-	slots = mas_dead_walk(&mas, 0);
-	node = mas_mn(&mas);
+	enode = mt_mk_node(node, node->type);
+	slots = mte_dead_walk(&enode, 0);
+	node = mte_to_node(enode);
 	do {
 		mt_free_bulk(node->slot_len, slots);
 		offset = node->parent_slot + 1;
-		mas.node = node->piv_parent;
-		if (mas_mn(&mas) == node)
-			goto start_slots_free;
+		enode = node->piv_parent;
+		if (mte_to_node(enode) == node)
+			goto free_leaf;

-		type = mte_node_type(mas.node);
-		slots = ma_slots(mte_to_node(mas.node), type);
-		if ((offset < mt_slots[type]) && (slots[offset]))
-			slots = mas_dead_walk(&mas, offset);
-
-		node = mas_mn(&mas);
+		type = mte_node_type(enode);
+		slots = ma_slots(mte_to_node(enode), type);
+		if ((offset < mt_slots[type]) &&
+		    rcu_dereference_protected(slots[offset],
+					      lock_is_held(&rcu_callback_map)))
+			slots = mte_dead_walk(&enode, offset);
+		node = mte_to_node(enode);
 	} while ((node != start) || (node->slot_len < offset));

 	slots = ma_slots(node, node->type);
 	mt_free_bulk(node->slot_len, slots);

-start_slots_free:
-	mas_unlock(&mas);
 free_leaf:
 	mt_free_rcu(&node->rcu);
 }

-static inline void __rcu **mas_destroy_descend(struct ma_state *mas,
-			struct maple_enode *prev, unsigned char offset)
+static inline void __rcu **mte_destroy_descend(struct maple_enode **enode,
+	struct maple_tree *mt, struct maple_enode *prev, unsigned char offset)
 {
 	struct maple_node *node;
-	struct maple_enode *next = mas->node;
+	struct maple_enode *next = *enode;
 	void __rcu **slots = NULL;
+	enum maple_type type;
+	unsigned char next_offset = 0;

 	do {
-		mas->node = next;
-		node = mas_mn(mas);
-		slots = ma_slots(node, mte_node_type(mas->node));
-		next = mas_slot_locked(mas, slots, 0);
+		*enode = next;
+		node = mte_to_node(*enode);
+		type = mte_node_type(*enode);
+		slots = ma_slots(node, type);
+		next = mt_slot_locked(mt, slots, next_offset);
 		if ((mte_dead_node(next)))
-			next = mas_slot_locked(mas, slots, 1);
+			next = mt_slot_locked(mt, slots, ++next_offset);

-		mte_set_node_dead(mas->node);
-		node->type = mte_node_type(mas->node);
+		mte_set_node_dead(*enode);
+		node->type = type;
 		node->piv_parent = prev;
 		node->parent_slot = offset;
-		offset = 0;
-		prev = mas->node;
+		offset = next_offset;
+		next_offset = 0;
+		prev = *enode;
 	} while (!mte_is_leaf(next));

 	return slots;
 }

-static void mt_destroy_walk(struct maple_enode *enode, unsigned char ma_flags,
+static void mt_destroy_walk(struct maple_enode *enode, struct maple_tree *mt,
 			    bool free)
 {
 	void __rcu **slots;
 	struct maple_node *node = mte_to_node(enode);
 	struct maple_enode *start;
-	struct maple_tree mt;

-	MA_STATE(mas, &mt, 0, 0);
-
-	if (mte_is_leaf(enode))
+	if (mte_is_leaf(enode)) {
+		node->type = mte_node_type(enode);
 		goto free_leaf;
+	}

-	mt_init_flags(&mt, ma_flags);
-	mas_lock(&mas);
-
-	mas.node = start = enode;
-	slots = mas_destroy_descend(&mas, start, 0);
-	node = mas_mn(&mas);
+	start = enode;
+	slots = mte_destroy_descend(&enode, mt, start, 0);
+	node = mte_to_node(enode); // Updated in the above call.
 	do {
 		enum maple_type type;
 		unsigned char offset;
 		struct maple_enode *parent, *tmp;

-		node->slot_len = mas_dead_leaves(&mas, slots);
+		node->slot_len = mte_dead_leaves(enode, mt, slots);
 		if (free)
 			mt_free_bulk(node->slot_len, slots);
 		offset = node->parent_slot + 1;
-		mas.node = node->piv_parent;
-		if (mas_mn(&mas) == node)
-			goto start_slots_free;
+		enode = node->piv_parent;
+		if (mte_to_node(enode) == node)
+			goto free_leaf;

-		type = mte_node_type(mas.node);
-		slots = ma_slots(mte_to_node(mas.node), type);
+		type = mte_node_type(enode);
+		slots = ma_slots(mte_to_node(enode), type);
 		if (offset >= mt_slots[type])
 			goto next;

-		tmp = mas_slot_locked(&mas, slots, offset);
+		tmp = mt_slot_locked(mt, slots, offset);
 		if (mte_node_type(tmp) && mte_to_node(tmp)) {
-			parent = mas.node;
-			mas.node = tmp;
-			slots = mas_destroy_descend(&mas, parent, offset);
+			parent = enode;
+			enode = tmp;
+			slots = mte_destroy_descend(&enode, mt, parent, offset);
 		}
 next:
-		node = mas_mn(&mas);
-	} while (start != mas.node);
+		node = mte_to_node(enode);
+	} while (start != enode);

-	node = mas_mn(&mas);
-	node->slot_len = mas_dead_leaves(&mas, slots);
+	node = mte_to_node(enode);
+	node->slot_len = mte_dead_leaves(enode, mt, slots);
 	if (free)
 		mt_free_bulk(node->slot_len, slots);
-
-start_slots_free:
-	mas_unlock(&mas);

 free_leaf:
 	if (free)
 		mt_free_rcu(&node->rcu);
+	else
+		mt_clear_meta(mt, node, node->type);
 }

 /*
@@ -5593,10 +5686,10 @@
 	struct maple_node *node = mte_to_node(enode);

 	if (mt_in_rcu(mt)) {
-		mt_destroy_walk(enode, mt->ma_flags, false);
+		mt_destroy_walk(enode, mt, false);
 		call_rcu(&node->rcu, mt_free_walk);
 	} else {
-		mt_destroy_walk(enode, mt->ma_flags, true);
+		mt_destroy_walk(enode, mt, true);
 	}
 }

@@ -6617,11 +6710,11 @@
 	while (likely(!ma_is_leaf(mt))) {
 		MT_BUG_ON(mas->tree, mte_dead_node(mas->node));
 		slots = ma_slots(mn, mt);
-		pivots = ma_pivots(mn, mt);
-		max = pivots[0];
 		entry = mas_slot(mas, slots, 0);
+		pivots = ma_pivots(mn, mt);
 		if (unlikely(ma_dead_node(mn)))
 			return NULL;
+		max = pivots[0];
 		mas->node = entry;
 		mn = mas_mn(mas);
 		mt = mte_node_type(mas->node);
@@ -6641,13 +6734,13 @@
 	if (likely(entry))
 		return entry;

-	pivots = ma_pivots(mn, mt);
-	mas->index = pivots[0] + 1;
 	mas->offset = 1;
 	entry = mas_slot(mas, slots, 1);
+	pivots = ma_pivots(mn, mt);
 	if (unlikely(ma_dead_node(mn)))
 		return NULL;

+	mas->index = pivots[0] + 1;
 	if (mas->index > limit)
 		goto none;

+12 -2
mm/hugetlb.c
···
 		       struct folio *pagecache_folio, spinlock_t *ptl)
 {
 	const bool unshare = flags & FAULT_FLAG_UNSHARE;
-	pte_t pte;
+	pte_t pte = huge_ptep_get(ptep);
 	struct hstate *h = hstate_vma(vma);
 	struct page *old_page;
 	struct folio *new_folio;
···
 	vm_fault_t ret = 0;
 	unsigned long haddr = address & huge_page_mask(h);
 	struct mmu_notifier_range range;
+
+	/*
+	 * Never handle CoW for uffd-wp protected pages. It should be only
+	 * handled when the uffd-wp protection is removed.
+	 *
+	 * Note that only the CoW optimization path (in hugetlb_no_page())
+	 * can trigger this, because hugetlb_fault() will always resolve
+	 * uffd-wp bit first.
+	 */
+	if (!unshare && huge_pte_uffd_wp(pte))
+		return 0;
 
 	/*
 	 * hugetlb does not support FOLL_FORCE-style write faults that keep the
···
 		return 0;
 	}
 
-	pte = huge_ptep_get(ptep);
 	old_page = pte_page(pte);
 
 	delayacct_wpcopy_start();
+16 -16
mm/kfence/core.c
···
 	 * enters __slab_free() slow-path.
 	 */
 	for (i = 0; i < KFENCE_POOL_SIZE / PAGE_SIZE; i++) {
-		struct slab *slab = page_slab(&pages[i]);
+		struct slab *slab = page_slab(nth_page(pages, i));
 
 		if (!i || (i % 2))
 			continue;
-
-		/* Verify we do not have a compound head page. */
-		if (WARN_ON(compound_head(&pages[i]) != &pages[i]))
-			return addr;
 
 		__folio_set_slab(slab_folio(slab));
 #ifdef CONFIG_MEMCG
···
 
 		/* Protect the right redzone. */
 		if (unlikely(!kfence_protect(addr + PAGE_SIZE)))
-			return addr;
+			goto reset_slab;
 
 		addr += 2 * PAGE_SIZE;
 	}
 
 	return 0;
+
+reset_slab:
+	for (i = 0; i < KFENCE_POOL_SIZE / PAGE_SIZE; i++) {
+		struct slab *slab = page_slab(nth_page(pages, i));
+
+		if (!i || (i % 2))
+			continue;
+#ifdef CONFIG_MEMCG
+		slab->memcg_data = 0;
+#endif
+		__folio_clear_slab(slab_folio(slab));
+	}
+
+	return addr;
 }
 
 static bool __init kfence_init_pool_early(void)
···
 	 * fails for the first page, and therefore expect addr==__kfence_pool in
 	 * most failure cases.
 	 */
-	for (char *p = (char *)addr; p < __kfence_pool + KFENCE_POOL_SIZE; p += PAGE_SIZE) {
-		struct slab *slab = virt_to_slab(p);
-
-		if (!slab)
-			continue;
-#ifdef CONFIG_MEMCG
-		slab->memcg_data = 0;
-#endif
-		__folio_clear_slab(slab_folio(slab));
-	}
 	memblock_free_late(__pa(addr), KFENCE_POOL_SIZE - (addr - (unsigned long)__kfence_pool));
 	__kfence_pool = NULL;
 	return false;
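The kfence change above moves the slab-flag cleanup into the function that set the flags, behind a reset_slab label, so that a partial failure is undone where it happened. A simplified userspace sketch of that shape follows. It is not kernel code: init_pool(), POOL_PAGES, the marked[] array and the fail_at parameter are all hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define POOL_PAGES 8	/* hypothetical pool size, in pages */

/*
 * Mark every even, non-zero page, then "protect" each page in turn.
 * If protecting page 'fail_at' fails, undo all marks before returning.
 * Returns -1 on success, or the index of the page that failed.
 * Pass fail_at = -1 to simulate a run with no failure.
 */
static int init_pool(bool marked[POOL_PAGES], int fail_at)
{
	int i;

	assert(fail_at < POOL_PAGES);

	for (i = 0; i < POOL_PAGES; i++) {
		if (!i || (i % 2))
			continue;
		marked[i] = true;
	}

	for (i = 0; i < POOL_PAGES; i++) {
		if (i == fail_at)
			goto reset_marks;
	}

	return -1;

reset_marks:
	/* Walk the same pages as the setup loop and clear what it set. */
	for (int j = 0; j < POOL_PAGES; j++) {
		if (!j || (j % 2))
			continue;
		marked[j] = false;
	}

	return i;
}
```

Keeping setup and teardown in one function means both loops use the same skip condition, so the caller does not need to rediscover which pages were touched.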
+15 -1
mm/memory.c
···
 	struct vm_area_struct *vma = vmf->vma;
 	struct mmu_notifier_range range;
 
-	if (!folio_lock_or_retry(folio, vma->vm_mm, vmf->flags))
+	/*
+	 * We need a reference to lock the folio because we don't hold
+	 * the PTL so a racing thread can remove the device-exclusive
+	 * entry and unmap it. If the folio is free the entry must
+	 * have been removed already. If it happens to have already
+	 * been re-allocated after being freed all we do is lock and
+	 * unlock it.
+	 */
+	if (!folio_try_get(folio))
+		return 0;
+
+	if (!folio_lock_or_retry(folio, vma->vm_mm, vmf->flags)) {
+		folio_put(folio);
 		return VM_FAULT_RETRY;
+	}
 	mmu_notifier_range_init_owner(&range, MMU_NOTIFY_EXCLUSIVE, 0,
 				vma->vm_mm, vmf->address & PAGE_MASK,
 				(vmf->address & PAGE_MASK) + PAGE_SIZE, NULL);
···
 
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
 	folio_unlock(folio);
+	folio_put(folio);
 
 	mmu_notifier_invalidate_range_end(&range);
 	return 0;
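The mm/memory.c change above takes a reference with a try-get before sleeping on the folio lock, and drops it on every exit path. A rough userspace sketch of the "increment unless zero" idea behind such a try-get follows. It is not the kernel implementation: struct object, object_try_get() and object_put() are hypothetical names, and C11 atomics stand in for the kernel's refcount helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted object such as a folio. */
struct object {
	atomic_int refcount;	/* 0 means the object has been freed */
};

/*
 * Take a reference only if the object is still live. Once the count
 * has dropped to zero it must never be raised again, so the increment
 * is done with a compare-and-swap loop rather than a plain add.
 */
static bool object_try_get(struct object *obj)
{
	int old = atomic_load(&obj->refcount);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&obj->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the last reference went away. */
static bool object_put(struct object *obj)
{
	int old = atomic_fetch_sub(&obj->refcount, 1);

	assert(old > 0);
	return old == 1;
}
```

A caller in this sketch would bail out early when object_try_get() fails, much as the patched function returns 0 when the try-get fails, and would pair every successful try-get with exactly one object_put().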
+2 -1
mm/mmap.c
···
 	int count = 0;
 	int error = -ENOMEM;
 	MA_STATE(mas_detach, &mt_detach, 0, 0);
-	mt_init_flags(&mt_detach, MT_FLAGS_LOCK_EXTERN);
+	mt_init_flags(&mt_detach, vmi->mas.tree->ma_flags & MT_FLAGS_LOCK_MASK);
 	mt_set_external_lock(&mt_detach, &mm->mmap_lock);
 
 	/*
···
 	 */
 	set_bit(MMF_OOM_SKIP, &mm->flags);
 	mmap_write_lock(mm);
+	mt_clear_in_rcu(&mm->mm_mt);
 	free_pgtables(&tlb, &mm->mm_mt, vma, FIRST_USER_ADDRESS,
 		      USER_PGTABLES_CEILING);
 	tlb_finish_mmu(&tlb);
+2 -1
mm/swapfile.c
···
 {
 	int nid;
 
+	assert_spin_locked(&p->lock);
 	for_each_node(nid)
 		plist_del(&p->avail_lists[nid], &swap_avail_heads[nid]);
 }
···
 		spin_unlock(&swap_lock);
 		goto out_dput;
 	}
-	del_from_avail_list(p);
 	spin_lock(&p->lock);
+	del_from_avail_list(p);
 	if (p->prio < 0) {
 		struct swap_info_struct *si = p;
 		int nid;
+5 -3
mm/vmalloc.c
···
 	 * allocation request, free them via vfree() if any.
 	 */
 	if (area->nr_pages != nr_small_pages) {
-		warn_alloc(gfp_mask, NULL,
-			"vmalloc error: size %lu, page order %u, failed to allocate pages",
-			area->nr_pages * PAGE_SIZE, page_order);
+		/* vm_area_alloc_pages() can also fail due to a fatal signal */
+		if (!fatal_signal_pending(current))
+			warn_alloc(gfp_mask, NULL,
+				"vmalloc error: size %lu, page order %u, failed to allocate pages",
+				area->nr_pages * PAGE_SIZE, page_order);
 		goto fail;
 	}
 
+16
tools/testing/radix-tree/maple.c
···
 	MT_BUG_ON(mt, mn->slot[1] != NULL);
 	MT_BUG_ON(mt, mas_allocated(&mas) != 0);
 
+	mn->parent = ma_parent_ptr(mn);
 	ma_free_rcu(mn);
 	mas.node = MAS_START;
 	mas_nomem(&mas, GFP_KERNEL);
···
 		MT_BUG_ON(mt, mas_allocated(&mas) != i);
 		MT_BUG_ON(mt, !mn);
 		MT_BUG_ON(mt, not_empty(mn));
+		mn->parent = ma_parent_ptr(mn);
 		ma_free_rcu(mn);
 	}
 
···
 		MT_BUG_ON(mt, not_empty(mn));
 		MT_BUG_ON(mt, mas_allocated(&mas) != i - 1);
 		MT_BUG_ON(mt, !mn);
+		mn->parent = ma_parent_ptr(mn);
 		ma_free_rcu(mn);
 	}
 
···
 		mn = mas_pop_node(&mas);
 		MT_BUG_ON(mt, not_empty(mn));
 		MT_BUG_ON(mt, mas_allocated(&mas) != j - 1);
+		mn->parent = ma_parent_ptr(mn);
 		ma_free_rcu(mn);
 	}
 	MT_BUG_ON(mt, mas_allocated(&mas) != 0);
···
 		MT_BUG_ON(mt, mas_allocated(&mas) != i - j);
 		mn = mas_pop_node(&mas);
 		MT_BUG_ON(mt, not_empty(mn));
+		mn->parent = ma_parent_ptr(mn);
 		ma_free_rcu(mn);
 		MT_BUG_ON(mt, mas_allocated(&mas) != i - j - 1);
 	}
···
 		mn = mas_pop_node(&mas); /* get the next node. */
 		MT_BUG_ON(mt, mn == NULL);
 		MT_BUG_ON(mt, not_empty(mn));
+		mn->parent = ma_parent_ptr(mn);
 		ma_free_rcu(mn);
 	}
 	MT_BUG_ON(mt, mas_allocated(&mas) != 0);
···
 		mn = mas_pop_node(&mas2); /* get the next node. */
 		MT_BUG_ON(mt, mn == NULL);
 		MT_BUG_ON(mt, not_empty(mn));
+		mn->parent = ma_parent_ptr(mn);
 		ma_free_rcu(mn);
 	}
 	MT_BUG_ON(mt, mas_allocated(&mas2) != 0);
···
 	MT_BUG_ON(mt, mas_allocated(&mas) != MAPLE_ALLOC_SLOTS + 2);
 	mn = mas_pop_node(&mas);
 	MT_BUG_ON(mt, not_empty(mn));
+	mn->parent = ma_parent_ptr(mn);
 	ma_free_rcu(mn);
 	for (i = 1; i <= MAPLE_ALLOC_SLOTS + 1; i++) {
 		mn = mas_pop_node(&mas);
 		MT_BUG_ON(mt, not_empty(mn));
+		mn->parent = ma_parent_ptr(mn);
 		ma_free_rcu(mn);
 	}
 	MT_BUG_ON(mt, mas_allocated(&mas) != 0);
···
 		mas_node_count(&mas, i); /* Request */
 		mas_nomem(&mas, GFP_KERNEL); /* Fill request */
 		mn = mas_pop_node(&mas); /* get the next node. */
+		mn->parent = ma_parent_ptr(mn);
 		ma_free_rcu(mn);
 		mas_destroy(&mas);
 
···
 		mas_node_count(&mas, i); /* Request */
 		mas_nomem(&mas, GFP_KERNEL); /* Fill request */
 		mn = mas_pop_node(&mas); /* get the next node. */
+		mn->parent = ma_parent_ptr(mn);
 		ma_free_rcu(mn);
 		mn = mas_pop_node(&mas); /* get the next node. */
+		mn->parent = ma_parent_ptr(mn);
 		ma_free_rcu(mn);
 		mn = mas_pop_node(&mas); /* get the next node. */
+		mn->parent = ma_parent_ptr(mn);
 		ma_free_rcu(mn);
 		mas_destroy(&mas);
 	}
···
 	MT_BUG_ON(mt, allocated != 1 + height * 3);
 	mn = mas_pop_node(&mas);
 	MT_BUG_ON(mt, mas_allocated(&mas) != allocated - 1);
+	mn->parent = ma_parent_ptr(mn);
 	ma_free_rcu(mn);
 	MT_BUG_ON(mt, mas_preallocate(&mas, GFP_KERNEL) != 0);
 	mas_destroy(&mas);
···
 	mas_destroy(&mas);
 	allocated = mas_allocated(&mas);
 	MT_BUG_ON(mt, allocated != 0);
+	mn->parent = ma_parent_ptr(mn);
 	ma_free_rcu(mn);
 
 	MT_BUG_ON(mt, mas_preallocate(&mas, GFP_KERNEL) != 0);
···
 	tree.ma_root = mt_mk_node(node, maple_leaf_64);
 	mt_dump(&tree);
 
+	node->parent = ma_parent_ptr(node);
 	ma_free_rcu(node);
 
 	/* Check things that will make lockdep angry */