mm, swap: use the swap table for the swap cache and switch API

Introduce the basic swap table infrastructure, which for now is just a
fixed-size flat array inside each swap cluster, with access wrappers.

Each cluster contains a swap table of 512 entries. Each table entry is an
opaque atomic long and can hold one of three types: a shadow (XA_VALUE), a
folio (pointer), or NULL.
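
For reference, the encoding can be sketched as below (a simplified view of
the helpers this patch adds in mm/swap_table.h; shadows reuse the XArray
value tagging, so a set low bit distinguishes a shadow from a folio
pointer, and 0 means the slot is empty):

	/* Sketch only: NULL (0), shadow (xa_is_value), or folio pointer. */
	static inline bool swp_tb_is_shadow(unsigned long swp_tb)
	{
		return xa_is_value((void *)swp_tb);
	}

	static inline bool swp_tb_is_folio(unsigned long swp_tb)
	{
		return swp_tb && !xa_is_value((void *)swp_tb);
	}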

In this first step, the table only supports storing a folio or a shadow,
and it is a drop-in replacement for the current swap cache. Convert all
swap cache users to the new set of APIs. Chris Li has long suggested a new
infrastructure for the swap cache for better performance, and that idea
combines well with the swap table as the new backing structure. The lock
contention range is now reduced to a 2M cluster, which is much smaller
than the 64M address_space, and the multiple address_space design can be
dropped as well.
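
For scale (assuming 4K pages): one cluster covers SWAPFILE_CLUSTER = 512
slots, i.e. 512 * 4K = 2M of swap space per cluster lock, while one swap
address_space covers 1 << SWAP_ADDRESS_SPACE_SHIFT = 16384 slots, i.e. 64M
of swap space per XArray lock.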

All the internal work is done through the swap_cache_get_* helpers. Swap
cache lookup remains lockless as before, and the helpers' context
requirements match those of the original swap cache helpers: callers still
need a pin on the swap device to prevent the backing data from being freed.
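
The lockless reader side looks roughly like the following (a minimal
sketch of swap_cache_get_folio() as added by this patch; the caller is
assumed to hold a reference on the swap device, e.g. via get_swap_device()):

	unsigned long swp_tb;
	struct folio *folio;

	for (;;) {
		swp_tb = __swap_table_get(__swap_entry_to_cluster(entry),
					  swp_cluster_offset(entry));
		if (!swp_tb_is_folio(swp_tb))
			return NULL;		/* empty slot or shadow */
		folio = swp_tb_to_folio(swp_tb);
		if (likely(folio_try_get(folio)))
			return folio;		/* else re-read the table */
	}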

Swap cache updates are now protected by the swap cluster lock instead of
the XArray lock. This is mostly handled internally, but the new
__swap_cache_* helpers require the caller to lock the cluster, so a few
new cluster access and locking helpers are also introduced.
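
The caller-locked update pattern is sketched below (modelled on the
mm/vmscan.c and mm/shmem.c hunks in this patch; the folio must be locked
and in the swap cache so its entries and their cluster stay stable):

	struct swap_cluster_info *ci;

	ci = swap_cluster_get_and_lock_irq(folio);
	__swap_cache_del_folio(ci, folio, entry, shadow);
	swap_cluster_unlock_irq(ci);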

A fully cluster-based unified swap table can be implemented on top of this
to take care of all count tracking and synchronization work, with dynamic
allocation. It should reduce memory usage while improving performance
further.

Link: https://lkml.kernel.org/r/20250916160100.31545-12-ryncsn@gmail.com
Co-developed-by: Chris Li <chrisl@kernel.org>
Signed-off-by: Chris Li <chrisl@kernel.org>
Signed-off-by: Kairui Song <kasong@tencent.com>
Acked-by: Chris Li <chrisl@kernel.org>
Suggested-by: Chris Li <chrisl@kernel.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: kernel test robot <oliver.sang@intel.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Kairui Song, committed by Andrew Morton (8578e0c0, 094dc8b0)

+454 -243
MAINTAINERS (+1)
···
 F: mm/page_io.c
 F: mm/swap.c
 F: mm/swap.h
+F: mm/swap_table.h
 F: mm/swap_state.c
 F: mm/swapfile.c

include/linux/swap.h (-2)
···
 extern bool swap_entry_swapped(struct swap_info_struct *si, swp_entry_t entry);
 extern int swp_swapcount(swp_entry_t entry);
 struct backing_dev_info;
-extern int init_swap_address_space(unsigned int type, unsigned long nr_pages);
-extern void exit_swap_address_space(unsigned int type);
 extern struct swap_info_struct *get_swap_device(swp_entry_t entry);
 sector_t swap_folio_sector(struct folio *folio);

mm/huge_memory.c (+6 -7)
···
 	/* Prevent deferred_split_scan() touching ->_refcount */
 	spin_lock(&ds_queue->split_queue_lock);
 	if (folio_ref_freeze(folio, 1 + extra_pins)) {
-		struct address_space *swap_cache = NULL;
+		struct swap_cluster_info *ci = NULL;
 		struct lruvec *lruvec;
 		int expected_refs;
···
 			goto fail;
 		}
 
-		swap_cache = swap_address_space(folio->swap);
-		xa_lock(&swap_cache->i_pages);
+		ci = swap_cluster_get_and_lock(folio);
 	}
 
 	/* lock lru list/PageCompound, ref frozen by page_ref_freeze */
···
 			 * Anonymous folio with swap cache.
 			 * NOTE: shmem in swap cache is not supported yet.
 			 */
-			if (swap_cache) {
-				__swap_cache_replace_folio(folio, new_folio);
+			if (ci) {
+				__swap_cache_replace_folio(ci, folio, new_folio);
 				continue;
 			}
···
 
 		unlock_page_lruvec(lruvec);
 
-		if (swap_cache)
-			xa_unlock(&swap_cache->i_pages);
+		if (ci)
+			swap_cluster_unlock(ci);
 	} else {
 		spin_unlock(&ds_queue->split_queue_lock);
 		ret = -EAGAIN;

mm/migrate.c (+15 -4)
···
 		struct folio *newfolio, struct folio *folio, int expected_count)
 {
 	XA_STATE(xas, &mapping->i_pages, folio_index(folio));
+	struct swap_cluster_info *ci = NULL;
 	struct zone *oldzone, *newzone;
 	int dirty;
 	long nr = folio_nr_pages(folio);
···
 	oldzone = folio_zone(folio);
 	newzone = folio_zone(newfolio);
 
-	xas_lock_irq(&xas);
+	if (folio_test_swapcache(folio))
+		ci = swap_cluster_get_and_lock_irq(folio);
+	else
+		xas_lock_irq(&xas);
+
 	if (!folio_ref_freeze(folio, expected_count)) {
-		xas_unlock_irq(&xas);
+		if (ci)
+			swap_cluster_unlock_irq(ci);
+		else
+			xas_unlock_irq(&xas);
 		return -EAGAIN;
 	}
···
 	}
 
 	if (folio_test_swapcache(folio))
-		__swap_cache_replace_folio(folio, newfolio);
+		__swap_cache_replace_folio(ci, folio, newfolio);
 	else
 		xas_store(&xas, newfolio);
···
 	 */
 	folio_ref_unfreeze(folio, expected_count - nr);
 
-	xas_unlock(&xas);
 	/* Leave irq disabled to prevent preemption while updating stats */
+	if (ci)
+		swap_cluster_unlock(ci);
+	else
+		xas_unlock(&xas);
 
 	/*
 	 * If moved to a different zone then also account

mm/shmem.c (+4 -4)
···
 		struct shmem_inode_info *info, pgoff_t index,
 		struct vm_area_struct *vma)
 {
+	struct swap_cluster_info *ci;
 	struct folio *new, *old = *foliop;
 	swp_entry_t entry = old->swap;
-	struct address_space *swap_mapping = swap_address_space(entry);
 	int nr_pages = folio_nr_pages(old);
 	int error = 0;
···
 	new->swap = entry;
 	folio_set_swapcache(new);
 
-	xa_lock_irq(&swap_mapping->i_pages);
-	__swap_cache_replace_folio(old, new);
+	ci = swap_cluster_get_and_lock_irq(old);
+	__swap_cache_replace_folio(ci, old, new);
 	mem_cgroup_replace_folio(old, new);
 	shmem_update_stats(new, nr_pages);
 	shmem_update_stats(old, -nr_pages);
-	xa_unlock_irq(&swap_mapping->i_pages);
+	swap_cluster_unlock_irq(ci);
 
 	folio_add_lru(new);
 	*foliop = new;

mm/swap.h (+119 -35)
···
 #ifndef _MM_SWAP_H
 #define _MM_SWAP_H
 
+#include <linux/atomic.h> /* for atomic_long_t */
 struct mempolicy;
 struct swap_iocb;
 
···
 	u16 count;
 	u8 flags;
 	u8 order;
+	atomic_long_t *table;	/* Swap table entries, see mm/swap_table.h */
 	struct list_head list;
 };
 
···
 #ifdef CONFIG_SWAP
 #include <linux/swapops.h> /* for swp_offset */
 #include <linux/blk_types.h> /* for bio_end_io_t */
+
+static inline unsigned int swp_cluster_offset(swp_entry_t entry)
+{
+	return swp_offset(entry) % SWAPFILE_CLUSTER;
+}
 
 /*
  * Callers of all helpers below must ensure the entry, type, or offset is
···
 	return &si->cluster_info[offset / SWAPFILE_CLUSTER];
 }
 
+static inline struct swap_cluster_info *__swap_entry_to_cluster(swp_entry_t entry)
+{
+	return __swap_offset_to_cluster(__swap_entry_to_info(entry),
+					swp_offset(entry));
+}
+
+static __always_inline struct swap_cluster_info *__swap_cluster_lock(
+		struct swap_info_struct *si, unsigned long offset, bool irq)
+{
+	struct swap_cluster_info *ci = __swap_offset_to_cluster(si, offset);
+
+	VM_WARN_ON_ONCE(percpu_ref_is_zero(&si->users)); /* race with swapoff */
+	if (irq)
+		spin_lock_irq(&ci->lock);
+	else
+		spin_lock(&ci->lock);
+	return ci;
+}
+
 /**
  * swap_cluster_lock - Lock and return the swap cluster of given offset.
  * @si: swap device the cluster belongs to.
···
 static inline struct swap_cluster_info *swap_cluster_lock(
 		struct swap_info_struct *si, unsigned long offset)
 {
-	struct swap_cluster_info *ci = __swap_offset_to_cluster(si, offset);
-
-	VM_WARN_ON_ONCE(percpu_ref_is_zero(&si->users)); /* race with swapoff */
-	spin_lock(&ci->lock);
-	return ci;
+	return __swap_cluster_lock(si, offset, false);
+}
+
+static inline struct swap_cluster_info *__swap_cluster_get_and_lock(
+		const struct folio *folio, bool irq)
+{
+	VM_WARN_ON_ONCE_FOLIO(!folio_test_locked(folio), folio);
+	VM_WARN_ON_ONCE_FOLIO(!folio_test_swapcache(folio), folio);
+	return __swap_cluster_lock(__swap_entry_to_info(folio->swap),
+				   swp_offset(folio->swap), irq);
+}
+
+/*
+ * swap_cluster_get_and_lock - Locks the cluster that holds a folio's entries.
+ * @folio: The folio.
+ *
+ * This locks and returns the swap cluster that contains a folio's swap
+ * entries. The swap entries of a folio are always in one single cluster.
+ * The folio has to be locked so its swap entries won't change and the
+ * cluster won't be freed.
+ *
+ * Context: Caller must ensure the folio is locked and in the swap cache.
+ * Return: Pointer to the swap cluster.
+ */
+static inline struct swap_cluster_info *swap_cluster_get_and_lock(
+		const struct folio *folio)
+{
+	return __swap_cluster_get_and_lock(folio, false);
+}
+
+/*
+ * swap_cluster_get_and_lock_irq - Locks the cluster that holds a folio's entries.
+ * @folio: The folio.
+ *
+ * Same as swap_cluster_get_and_lock but also disable IRQ.
+ *
+ * Context: Caller must ensure the folio is locked and in the swap cache.
+ * Return: Pointer to the swap cluster.
+ */
+static inline struct swap_cluster_info *swap_cluster_get_and_lock_irq(
+		const struct folio *folio)
+{
+	return __swap_cluster_get_and_lock(folio, true);
 }
 
 static inline void swap_cluster_unlock(struct swap_cluster_info *ci)
 {
 	spin_unlock(&ci->lock);
+}
+
+static inline void swap_cluster_unlock_irq(struct swap_cluster_info *ci)
+{
+	spin_unlock_irq(&ci->lock);
 }
 
 /* linux/mm/page_io.c */
···
 #define SWAP_ADDRESS_SPACE_SHIFT 14
 #define SWAP_ADDRESS_SPACE_PAGES (1 << SWAP_ADDRESS_SPACE_SHIFT)
 #define SWAP_ADDRESS_SPACE_MASK (SWAP_ADDRESS_SPACE_PAGES - 1)
-extern struct address_space *swapper_spaces[];
-#define swap_address_space(entry) \
-	(&swapper_spaces[swp_type(entry)][swp_offset(entry) \
-		>> SWAP_ADDRESS_SPACE_SHIFT])
+extern struct address_space swap_space;
+static inline struct address_space *swap_address_space(swp_entry_t entry)
+{
+	return &swap_space;
+}
 
 /*
  * Return the swap device position of the swap entry.
···
 static inline loff_t swap_dev_pos(swp_entry_t entry)
 {
 	return ((loff_t)swp_offset(entry)) << PAGE_SHIFT;
-}
-
-/*
- * Return the swap cache index of the swap entry.
- */
-static inline pgoff_t swap_cache_index(swp_entry_t entry)
-{
-	BUILD_BUG_ON((SWP_OFFSET_MASK | SWAP_ADDRESS_SPACE_MASK) != SWP_OFFSET_MASK);
-	return swp_offset(entry) & SWAP_ADDRESS_SPACE_MASK;
 }
 
 /**
···
  */
 struct folio *swap_cache_get_folio(swp_entry_t entry);
 void *swap_cache_get_shadow(swp_entry_t entry);
-int swap_cache_add_folio(struct folio *folio, swp_entry_t entry,
-			 gfp_t gfp, void **shadow);
+void swap_cache_add_folio(struct folio *folio, swp_entry_t entry, void **shadow);
 void swap_cache_del_folio(struct folio *folio);
-void __swap_cache_del_folio(struct folio *folio,
-			    swp_entry_t entry, void *shadow);
-void __swap_cache_replace_folio(struct folio *old, struct folio *new);
-void swap_cache_clear_shadow(int type, unsigned long begin,
-			     unsigned long end);
+/* Below helpers require the caller to lock and pass in the swap cluster. */
+void __swap_cache_del_folio(struct swap_cluster_info *ci,
+			    struct folio *folio, swp_entry_t entry, void *shadow);
+void __swap_cache_replace_folio(struct swap_cluster_info *ci,
+				struct folio *old, struct folio *new);
+void __swap_cache_clear_shadow(swp_entry_t entry, int nr_ents);
 
 void show_swap_cache_info(void);
 void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry, int nr);
···
 
 #else /* CONFIG_SWAP */
 struct swap_iocb;
+static inline struct swap_cluster_info *swap_cluster_lock(
+		struct swap_info_struct *si, pgoff_t offset, bool irq)
+{
+	return NULL;
+}
+
+static inline struct swap_cluster_info *swap_cluster_get_and_lock(
+		struct folio *folio)
+{
+	return NULL;
+}
+
+static inline struct swap_cluster_info *swap_cluster_get_and_lock_irq(
+		struct folio *folio)
+{
+	return NULL;
+}
+
+static inline void swap_cluster_unlock(struct swap_cluster_info *ci)
+{
+}
+
+static inline void swap_cluster_unlock_irq(struct swap_cluster_info *ci)
+{
+}
+
 static inline struct swap_info_struct *__swap_entry_to_info(swp_entry_t entry)
 {
 	return NULL;
 }
···
 static inline struct address_space *swap_address_space(swp_entry_t entry)
 {
 	return NULL;
-}
-
-static inline pgoff_t swap_cache_index(swp_entry_t entry)
-{
-	return 0;
 }
 
 static inline bool folio_matches_swap_entry(const struct folio *folio, swp_entry_t entry)
···
 	return NULL;
 }
 
-static inline int swap_cache_add_folio(swp_entry_t entry, struct folio *folio,
-					gfp_t gfp, void **shadow)
+static inline void swap_cache_add_folio(struct folio *folio, swp_entry_t entry, void **shadow)
 {
-	return -EINVAL;
 }
 
 static inline void swap_cache_del_folio(struct folio *folio)
 {
 }
 
-static inline void __swap_cache_del_folio(struct folio *folio, swp_entry_t entry, void *shadow)
+static inline void __swap_cache_del_folio(struct swap_cluster_info *ci,
+		struct folio *folio, swp_entry_t entry, void *shadow)
 {
 }
 
-static inline void __swap_cache_replace_folio(struct folio *old, struct folio *new)
+static inline void __swap_cache_replace_folio(struct swap_cluster_info *ci,
+		struct folio *old, struct folio *new)
 {
 }
···
  */
 static inline pgoff_t folio_index(struct folio *folio)
 {
+#ifdef CONFIG_SWAP
 	if (unlikely(folio_test_swapcache(folio)))
-		return swap_cache_index(folio->swap);
+		return swp_offset(folio->swap);
+#endif
 	return folio->index;
 }

mm/swap_state.c (+123 -160)
···
 #include <linux/huge_mm.h>
 #include <linux/shmem_fs.h>
 #include "internal.h"
+#include "swap_table.h"
 #include "swap.h"
 
 /*
···
 #endif
 };
 
-struct address_space *swapper_spaces[MAX_SWAPFILES] __read_mostly;
-static unsigned int nr_swapper_spaces[MAX_SWAPFILES] __read_mostly;
+struct address_space swap_space __read_mostly = {
+	.a_ops = &swap_aops,
+};
+
 static bool enable_vma_readahead __read_mostly = true;
 
 #define SWAP_RA_ORDER_CEILING	5
···
  */
 struct folio *swap_cache_get_folio(swp_entry_t entry)
 {
-	struct folio *folio = filemap_get_folio(swap_address_space(entry),
-						swap_cache_index(entry));
-	if (IS_ERR(folio))
-		return NULL;
-	return folio;
+	unsigned long swp_tb;
+	struct folio *folio;
+
+	for (;;) {
+		swp_tb = __swap_table_get(__swap_entry_to_cluster(entry),
+					  swp_cluster_offset(entry));
+		if (!swp_tb_is_folio(swp_tb))
+			return NULL;
+		folio = swp_tb_to_folio(swp_tb);
+		if (likely(folio_try_get(folio)))
+			return folio;
+	}
+
+	return NULL;
 }
 
 /**
···
  */
 void *swap_cache_get_shadow(swp_entry_t entry)
 {
-	struct address_space *address_space = swap_address_space(entry);
-	pgoff_t idx = swap_cache_index(entry);
-	void *shadow;
+	unsigned long swp_tb;
 
-	shadow = xa_load(&address_space->i_pages, idx);
-	if (xa_is_value(shadow))
-		return shadow;
+	swp_tb = __swap_table_get(__swap_entry_to_cluster(entry),
+				  swp_cluster_offset(entry));
+	if (swp_tb_is_shadow(swp_tb))
+		return swp_tb_to_shadow(swp_tb);
+
 	return NULL;
 }
 
···
  *
  * Context: Caller must ensure @entry is valid and protect the swap device
  * with reference count or locks.
- * The caller also needs to mark the corresponding swap_map slots with
- * SWAP_HAS_CACHE to avoid race or conflict.
- * Return: Returns 0 on success, error code otherwise.
+ * The caller also needs to update the corresponding swap_map slots with
+ * SWAP_HAS_CACHE bit to avoid race or conflict.
  */
-int swap_cache_add_folio(struct folio *folio, swp_entry_t entry,
-			 gfp_t gfp, void **shadowp)
+void swap_cache_add_folio(struct folio *folio, swp_entry_t entry, void **shadowp)
 {
-	struct address_space *address_space = swap_address_space(entry);
-	pgoff_t idx = swap_cache_index(entry);
-	XA_STATE_ORDER(xas, &address_space->i_pages, idx, folio_order(folio));
-	unsigned long i, nr = folio_nr_pages(folio);
-	void *old;
+	void *shadow = NULL;
+	unsigned long old_tb, new_tb;
+	struct swap_cluster_info *ci;
+	unsigned int ci_start, ci_off, ci_end;
+	unsigned long nr_pages = folio_nr_pages(folio);
 
-	xas_set_update(&xas, workingset_update_node);
+	VM_WARN_ON_ONCE_FOLIO(!folio_test_locked(folio), folio);
+	VM_WARN_ON_ONCE_FOLIO(folio_test_swapcache(folio), folio);
+	VM_WARN_ON_ONCE_FOLIO(!folio_test_swapbacked(folio), folio);
 
-	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
-	VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
-	VM_BUG_ON_FOLIO(!folio_test_swapbacked(folio), folio);
+	new_tb = folio_to_swp_tb(folio);
+	ci_start = swp_cluster_offset(entry);
+	ci_end = ci_start + nr_pages;
+	ci_off = ci_start;
+	ci = swap_cluster_lock(__swap_entry_to_info(entry), swp_offset(entry));
+	do {
+		old_tb = __swap_table_xchg(ci, ci_off, new_tb);
+		WARN_ON_ONCE(swp_tb_is_folio(old_tb));
+		if (swp_tb_is_shadow(old_tb))
+			shadow = swp_tb_to_shadow(old_tb);
+	} while (++ci_off < ci_end);
 
-	folio_ref_add(folio, nr);
+	folio_ref_add(folio, nr_pages);
 	folio_set_swapcache(folio);
 	folio->swap = entry;
+	swap_cluster_unlock(ci);
 
-	do {
-		xas_lock_irq(&xas);
-		xas_create_range(&xas);
-		if (xas_error(&xas))
-			goto unlock;
-		for (i = 0; i < nr; i++) {
-			VM_BUG_ON_FOLIO(xas.xa_index != idx + i, folio);
-			if (shadowp) {
-				old = xas_load(&xas);
-				if (xa_is_value(old))
-					*shadowp = old;
-			}
-			xas_store(&xas, folio);
-			xas_next(&xas);
-		}
-		address_space->nrpages += nr;
-		__node_stat_mod_folio(folio, NR_FILE_PAGES, nr);
-		__lruvec_stat_mod_folio(folio, NR_SWAPCACHE, nr);
-unlock:
-		xas_unlock_irq(&xas);
-	} while (xas_nomem(&xas, gfp));
+	node_stat_mod_folio(folio, NR_FILE_PAGES, nr_pages);
+	lruvec_stat_mod_folio(folio, NR_SWAPCACHE, nr_pages);
 
-	if (!xas_error(&xas))
-		return 0;
-
-	folio_clear_swapcache(folio);
-	folio_ref_sub(folio, nr);
-	return xas_error(&xas);
+	if (shadowp)
+		*shadowp = shadow;
 }
 
 /**
  * __swap_cache_del_folio - Removes a folio from the swap cache.
+ * @ci: The locked swap cluster.
  * @folio: The folio.
  * @entry: The first swap entry that the folio corresponds to.
  * @shadow: shadow value to be filled in the swap cache.
···
  * Removes a folio from the swap cache and fills a shadow in place.
  * This won't put the folio's refcount. The caller has to do that.
  *
- * Context: Caller must hold the xa_lock, ensure the folio is
- * locked and in the swap cache, using the index of @entry.
+ * Context: Caller must ensure the folio is locked and in the swap cache
+ * using the index of @entry, and lock the cluster that holds the entries.
  */
-void __swap_cache_del_folio(struct folio *folio,
+void __swap_cache_del_folio(struct swap_cluster_info *ci, struct folio *folio,
 			    swp_entry_t entry, void *shadow)
 {
-	struct address_space *address_space = swap_address_space(entry);
-	int i;
-	long nr = folio_nr_pages(folio);
-	pgoff_t idx = swap_cache_index(entry);
-	XA_STATE(xas, &address_space->i_pages, idx);
+	unsigned long old_tb, new_tb;
+	unsigned int ci_start, ci_off, ci_end;
+	unsigned long nr_pages = folio_nr_pages(folio);
 
-	xas_set_update(&xas, workingset_update_node);
+	VM_WARN_ON_ONCE(__swap_entry_to_cluster(entry) != ci);
+	VM_WARN_ON_ONCE_FOLIO(!folio_test_locked(folio), folio);
+	VM_WARN_ON_ONCE_FOLIO(!folio_test_swapcache(folio), folio);
+	VM_WARN_ON_ONCE_FOLIO(folio_test_writeback(folio), folio);
 
-	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
-	VM_BUG_ON_FOLIO(!folio_test_swapcache(folio), folio);
-	VM_BUG_ON_FOLIO(folio_test_writeback(folio), folio);
+	new_tb = shadow_swp_to_tb(shadow);
+	ci_start = swp_cluster_offset(entry);
+	ci_end = ci_start + nr_pages;
+	ci_off = ci_start;
+	do {
+		/* If shadow is NULL, we sets an empty shadow */
+		old_tb = __swap_table_xchg(ci, ci_off, new_tb);
+		WARN_ON_ONCE(!swp_tb_is_folio(old_tb) ||
+			     swp_tb_to_folio(old_tb) != folio);
+	} while (++ci_off < ci_end);
 
-	for (i = 0; i < nr; i++) {
-		void *entry = xas_store(&xas, shadow);
-		VM_BUG_ON_PAGE(entry != folio, entry);
-		xas_next(&xas);
-	}
 	folio->swap.val = 0;
 	folio_clear_swapcache(folio);
-	address_space->nrpages -= nr;
-	__node_stat_mod_folio(folio, NR_FILE_PAGES, -nr);
-	__lruvec_stat_mod_folio(folio, NR_SWAPCACHE, -nr);
+	node_stat_mod_folio(folio, NR_FILE_PAGES, -nr_pages);
+	lruvec_stat_mod_folio(folio, NR_SWAPCACHE, -nr_pages);
 }
 
 /**
···
  */
 void swap_cache_del_folio(struct folio *folio)
 {
+	struct swap_cluster_info *ci;
 	swp_entry_t entry = folio->swap;
-	struct address_space *address_space = swap_address_space(entry);
 
-	xa_lock_irq(&address_space->i_pages);
-	__swap_cache_del_folio(folio, entry, NULL);
-	xa_unlock_irq(&address_space->i_pages);
+	ci = swap_cluster_lock(__swap_entry_to_info(entry), swp_offset(entry));
+	__swap_cache_del_folio(ci, folio, entry, NULL);
+	swap_cluster_unlock(ci);
 
 	put_swap_folio(folio, entry);
 	folio_ref_sub(folio, folio_nr_pages(folio));
···
 
 /**
  * __swap_cache_replace_folio - Replace a folio in the swap cache.
+ * @ci: The locked swap cluster.
  * @old: The old folio to be replaced.
  * @new: The new folio.
  *
···
  * entries. Replacement will take the new folio's swap entry value as
  * the starting offset to override all slots covered by the new folio.
  *
- * Context: Caller must ensure both folios are locked, also lock the
- * swap address_space that holds the old folio to avoid races.
+ * Context: Caller must ensure both folios are locked, and lock the
+ * cluster that holds the old folio to be replaced.
  */
-void __swap_cache_replace_folio(struct folio *old, struct folio *new)
+void __swap_cache_replace_folio(struct swap_cluster_info *ci,
+				struct folio *old, struct folio *new)
 {
 	swp_entry_t entry = new->swap;
 	unsigned long nr_pages = folio_nr_pages(new);
-	unsigned long offset = swap_cache_index(entry);
-	unsigned long end = offset + nr_pages;
-
-	XA_STATE(xas, &swap_address_space(entry)->i_pages, offset);
+	unsigned int ci_off = swp_cluster_offset(entry);
+	unsigned int ci_end = ci_off + nr_pages;
+	unsigned long old_tb, new_tb;
 
 	VM_WARN_ON_ONCE(!folio_test_swapcache(old) || !folio_test_swapcache(new));
 	VM_WARN_ON_ONCE(!folio_test_locked(old) || !folio_test_locked(new));
 	VM_WARN_ON_ONCE(!entry.val);
 
 	/* Swap cache still stores N entries instead of a high-order entry */
+	new_tb = folio_to_swp_tb(new);
 	do {
-		WARN_ON_ONCE(xas_store(&xas, new) != old);
-		xas_next(&xas);
-	} while (++offset < end);
+		old_tb = __swap_table_xchg(ci, ci_off, new_tb);
+		WARN_ON_ONCE(!swp_tb_is_folio(old_tb) || swp_tb_to_folio(old_tb) != old);
+	} while (++ci_off < ci_end);
+
+	/*
+	 * If the old folio is partially replaced (e.g., splitting a large
+	 * folio, the old folio is shrunk, and new split sub folios replace
+	 * the shrunk part), ensure the new folio doesn't overlap it.
+	 */
+	if (IS_ENABLED(CONFIG_DEBUG_VM) &&
+	    folio_order(old) != folio_order(new)) {
+		ci_off = swp_cluster_offset(old->swap);
+		ci_end = ci_off + folio_nr_pages(old);
+		while (ci_off++ < ci_end)
+			WARN_ON_ONCE(swp_tb_to_folio(__swap_table_get(ci, ci_off)) != old);
+	}
 }
 
 /**
  * swap_cache_clear_shadow - Clears a set of shadows in the swap cache.
- * @type: Indicates the swap device.
- * @begin: Beginning offset of the range.
- * @end: Ending offset of the range.
+ * @entry: The starting index entry.
+ * @nr_ents: How many slots need to be cleared.
  *
- * Context: Caller must ensure the range is valid and hold a reference to
- * the swap device.
+ * Context: Caller must ensure the range is valid, all in one single cluster,
+ * not occupied by any folio, and lock the cluster.
  */
-void swap_cache_clear_shadow(int type, unsigned long begin,
-			     unsigned long end)
+void __swap_cache_clear_shadow(swp_entry_t entry, int nr_ents)
 {
-	unsigned long curr = begin;
-	void *old;
+	struct swap_cluster_info *ci = __swap_entry_to_cluster(entry);
+	unsigned int ci_off = swp_cluster_offset(entry), ci_end;
+	unsigned long old;
 
-	for (;;) {
-		swp_entry_t entry = swp_entry(type, curr);
-		unsigned long index = curr & SWAP_ADDRESS_SPACE_MASK;
-		struct address_space *address_space = swap_address_space(entry);
-		XA_STATE(xas, &address_space->i_pages, index);
-
-		xas_set_update(&xas, workingset_update_node);
-
-		xa_lock_irq(&address_space->i_pages);
-		xas_for_each(&xas, old, min(index + (end - curr), SWAP_ADDRESS_SPACE_PAGES)) {
-			if (!xa_is_value(old))
-				continue;
-			xas_store(&xas, NULL);
-		}
-		xa_unlock_irq(&address_space->i_pages);
-
-		/* search the next swapcache until we meet end */
-		curr = ALIGN((curr + 1), SWAP_ADDRESS_SPACE_PAGES);
-		if (curr > end)
-			break;
-	}
+	ci_end = ci_off + nr_ents;
+	do {
+		old = __swap_table_xchg(ci, ci_off, null_to_swp_tb());
+		WARN_ON_ONCE(swp_tb_is_folio(old));
+	} while (++ci_off < ci_end);
 }
 
 /*
···
 	if (mem_cgroup_swapin_charge_folio(new_folio, NULL, gfp_mask, entry))
 		goto fail_unlock;
 
-	/* May fail (-ENOMEM) if XArray node allocation failed. */
-	if (swap_cache_add_folio(new_folio, entry, gfp_mask & GFP_RECLAIM_MASK, &shadow))
-		goto fail_unlock;
-
+	swap_cache_add_folio(new_folio, entry, &shadow);
 	memcg1_swapin(entry, 1);
 
 	if (shadow)
···
 	return folio;
 }
 
-int init_swap_address_space(unsigned int type, unsigned long nr_pages)
-{
-	struct address_space *spaces, *space;
-	unsigned int i, nr;
-
-	nr = DIV_ROUND_UP(nr_pages, SWAP_ADDRESS_SPACE_PAGES);
-	spaces = kvcalloc(nr, sizeof(struct address_space), GFP_KERNEL);
-	if (!spaces)
-		return -ENOMEM;
-	for (i = 0; i < nr; i++) {
-		space = spaces + i;
-		xa_init_flags(&space->i_pages, XA_FLAGS_LOCK_IRQ);
-		atomic_set(&space->i_mmap_writable, 0);
-		space->a_ops = &swap_aops;
-		/* swap cache doesn't use writeback related tags */
-		mapping_set_no_writeback_tags(space);
-	}
-	nr_swapper_spaces[type] = nr;
-	swapper_spaces[type] = spaces;
-
-	return 0;
-}
-
-void exit_swap_address_space(unsigned int type)
-{
-	int i;
-	struct address_space *spaces = swapper_spaces[type];
-
-	for (i = 0; i < nr_swapper_spaces[type]; i++)
-		VM_WARN_ON_ONCE(!mapping_empty(&spaces[i]));
-	kvfree(spaces);
-	nr_swapper_spaces[type] = 0;
-	swapper_spaces[type] = NULL;
-}
-
 static int swap_vma_ra_win(struct vm_fault *vmf, unsigned long *start,
 			   unsigned long *end)
 {
···
 	.attrs = swap_attrs,
 };
 
-static int __init swap_init_sysfs(void)
+static int __init swap_init(void)
 {
 	int err;
 	struct kobject *swap_kobj;
···
 		pr_err("failed to register swap group\n");
 		goto delete_obj;
 	}
+	/* Swap cache writeback is LRU based, no tags for it */
+	mapping_set_no_writeback_tags(&swap_space);
 	return 0;
 
 delete_obj:
 	kobject_put(swap_kobj);
 	return err;
 }
-subsys_initcall(swap_init_sysfs);
+subsys_initcall(swap_init);
 #endif

mm/swap_table.h (new file, +97)
···
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _MM_SWAP_TABLE_H
+#define _MM_SWAP_TABLE_H
+
+#include "swap.h"
+
+/*
+ * A swap table entry represents the status of a swap slot on a swap
+ * (physical or virtual) device. The swap table in each cluster is a
+ * 1:1 map of the swap slots in this cluster.
+ *
+ * Each swap table entry could be a pointer (folio), a XA_VALUE
+ * (shadow), or NULL.
+ */
+
+/*
+ * Helpers for casting one type of info into a swap table entry.
+ */
+static inline unsigned long null_to_swp_tb(void)
+{
+	BUILD_BUG_ON(sizeof(unsigned long) != sizeof(atomic_long_t));
+	return 0;
+}
+
+static inline unsigned long folio_to_swp_tb(struct folio *folio)
+{
+	BUILD_BUG_ON(sizeof(unsigned long) != sizeof(void *));
+	return (unsigned long)folio;
+}
+
+static inline unsigned long shadow_swp_to_tb(void *shadow)
+{
+	BUILD_BUG_ON((BITS_PER_XA_VALUE + 1) !=
+		     BITS_PER_BYTE * sizeof(unsigned long));
+	VM_WARN_ON_ONCE(shadow && !xa_is_value(shadow));
+	return (unsigned long)shadow;
+}
+
+/*
+ * Helpers for swap table entry type checking.
+ */
+static inline bool swp_tb_is_null(unsigned long swp_tb)
+{
+	return !swp_tb;
+}
+
+static inline bool swp_tb_is_folio(unsigned long swp_tb)
+{
+	return !xa_is_value((void *)swp_tb) && !swp_tb_is_null(swp_tb);
+}
+
+static inline bool swp_tb_is_shadow(unsigned long swp_tb)
+{
+	return xa_is_value((void *)swp_tb);
+}
+
+/*
+ * Helpers for retrieving info from swap table.
+ */
+static inline struct folio *swp_tb_to_folio(unsigned long swp_tb)
+{
+	VM_WARN_ON(!swp_tb_is_folio(swp_tb));
+	return (void *)swp_tb;
+}
+
+static inline void *swp_tb_to_shadow(unsigned long swp_tb)
+{
+	VM_WARN_ON(!swp_tb_is_shadow(swp_tb));
+	return (void *)swp_tb;
+}
+
+/*
+ * Helpers for accessing or modifying the swap table of a cluster,
+ * the swap cluster must be locked.
+ */
+static inline void __swap_table_set(struct swap_cluster_info *ci,
+				    unsigned int off, unsigned long swp_tb)
+{
+	VM_WARN_ON_ONCE(off >= SWAPFILE_CLUSTER);
+	atomic_long_set(&ci->table[off], swp_tb);
+}
+
+static inline unsigned long __swap_table_xchg(struct swap_cluster_info *ci,
+					      unsigned int off, unsigned long swp_tb)
+{
+	VM_WARN_ON_ONCE(off >= SWAPFILE_CLUSTER);
+	/* Ordering is guaranteed by cluster lock, relax */
+	return atomic_long_xchg_relaxed(&ci->table[off], swp_tb);
+}
+
+static inline unsigned long __swap_table_get(struct swap_cluster_info *ci,
+					     unsigned int off)
+{
+	VM_WARN_ON_ONCE(off >= SWAPFILE_CLUSTER);
+	return atomic_long_read(&ci->table[off]);
+}
+#endif

mm/swapfile.c (+75 -25)
···
 #include <asm/tlbflush.h>
 #include <linux/swapops.h>
 #include <linux/swap_cgroup.h>
+#include "swap_table.h"
 #include "internal.h"
 #include "swap.h"
 
···
 	return cluster_index(si, ci) * SWAPFILE_CLUSTER;
 }
 
+static int swap_cluster_alloc_table(struct swap_cluster_info *ci)
+{
+	WARN_ON(ci->table);
+	ci->table = kzalloc(sizeof(unsigned long) * SWAPFILE_CLUSTER, GFP_KERNEL);
+	if (!ci->table)
+		return -ENOMEM;
+	return 0;
+}
+
+static void swap_cluster_free_table(struct swap_cluster_info *ci)
+{
+	unsigned int ci_off;
+	unsigned long swp_tb;
+
+	if (!ci->table)
+		return;
+
+	for (ci_off = 0; ci_off < SWAPFILE_CLUSTER; ci_off++) {
+		swp_tb = __swap_table_get(ci, ci_off);
+		if (!swp_tb_is_null(swp_tb))
+			pr_err_once("swap: unclean swap space on swapoff: 0x%lx",
+				    swp_tb);
+	}
+
+	kfree(ci->table);
+	ci->table = NULL;
+}
+
 static void move_cluster(struct swap_info_struct *si,
 			 struct swap_cluster_info *ci, struct list_head *list,
 			 enum swap_cluster_flags new_flags)
···
 	return true;
 }
 
+/*
+ * Currently, the swap table is not used for count tracking, just
+ * do a sanity check here to ensure nothing leaked, so the swap
+ * table should be empty upon freeing.
+ */
+static void swap_cluster_assert_table_empty(struct swap_cluster_info *ci,
+					    unsigned int start, unsigned int nr)
+{
+	unsigned int ci_off = start % SWAPFILE_CLUSTER;
+	unsigned int ci_end = ci_off + nr;
+	unsigned long swp_tb;
+
+	if (IS_ENABLED(CONFIG_DEBUG_VM)) {
+		do {
+			swp_tb = __swap_table_get(ci, ci_off);
+			VM_WARN_ON_ONCE(!swp_tb_is_null(swp_tb));
+		} while (++ci_off < ci_end);
+	}
+}
+
 static bool cluster_alloc_range(struct swap_info_struct *si, struct swap_cluster_info *ci,
 				unsigned int start, unsigned char usage,
 				unsigned int order)
···
 	ci->order = order;
 
 	memset(si->swap_map + start, usage, nr_pages);
+	swap_cluster_assert_table_empty(ci, start, nr_pages);
 	swap_range_alloc(si, nr_pages);
 	ci->count += nr_pages;
 
···
 			swap_slot_free_notify(si->bdev, offset);
 		offset++;
 	}
-	swap_cache_clear_shadow(si->type, begin, end);
+	__swap_cache_clear_shadow(swp_entry(si->type, begin), nr_entries);
 
 	/*
 	 * Make sure that try_to_unuse() observes si->inuse_pages reaching 0
···
 	if (!entry.val)
 		return -ENOMEM;
 
-	/*
-	 * XArray node allocations from PF_MEMALLOC contexts could
-	 * completely exhaust the page allocator. __GFP_NOMEMALLOC
-	 * stops emergency reserves from being allocated.
-	 *
-	 * TODO: this could cause a theoretical memory reclaim
-	 * deadlock in the swap out path.
-	 */
-	if (swap_cache_add_folio(folio, entry, gfp | __GFP_NOMEMALLOC, NULL))
-		goto out_free;
+	swap_cache_add_folio(folio, entry, NULL);
 
 	return 0;
 
···
 
 	mem_cgroup_uncharge_swap(entry, nr_pages);
 	swap_range_free(si, offset, nr_pages);
+	swap_cluster_assert_table_empty(ci, offset, nr_pages);
 
 	if (!ci->count)
 		free_cluster(si, ci);
···
 	}
 }
 
+static void free_cluster_info(struct swap_cluster_info *cluster_info,
+			      unsigned long maxpages)
+{
+	int i, nr_clusters = DIV_ROUND_UP(maxpages, SWAPFILE_CLUSTER);
+
+	if (!cluster_info)
+		return;
+	for (i = 0; i < nr_clusters; i++)
+		swap_cluster_free_table(&cluster_info[i]);
+	kvfree(cluster_info);
+}
+
 /*
  * Called after swap device's reference count is dead, so
  * neither scan nor allocation will use it.
···
 
 	swap_file = p->swap_file;
 	p->swap_file = NULL;
-	p->max = 0;
 	swap_map = p->swap_map;
 	p->swap_map = NULL;
 	zeromap = p->zeromap;
 	p->zeromap = NULL;
 	cluster_info = p->cluster_info;
+	free_cluster_info(cluster_info, p->max);
+	p->max = 0;
 	p->cluster_info = NULL;
 	spin_unlock(&p->lock);
 	spin_unlock(&swap_lock);
···
 	p->global_cluster = NULL;
 	vfree(swap_map);
 	kvfree(zeromap);
-	kvfree(cluster_info);
 	/* Destroy swap account information */
 	swap_cgroup_swapoff(p->type);
-	exit_swap_address_space(p->type);
 
 	inode = mapping->host;
···
 	if (!cluster_info)
 		goto err;
 
-	for (i = 0; i < nr_clusters; i++)
+	for (i = 0; i < nr_clusters; i++) {
 		spin_lock_init(&cluster_info[i].lock);
+		if (swap_cluster_alloc_table(&cluster_info[i]))
+			goto err_free;
+	}
 
 	if (!(si->flags & SWP_SOLIDSTATE)) {
 		si->global_cluster = kmalloc(sizeof(*si->global_cluster),
···
 	}
 
 	return cluster_info;
-
 err_free:
-	kvfree(cluster_info);
+	free_cluster_info(cluster_info, maxpages);
 err:
 	return ERR_PTR(err);
 }
···
 		}
 	}
 
-	error = init_swap_address_space(si->type, maxpages);
-	if (error)
-		goto bad_swap_unlock_inode;
-
 	error = zswap_swapon(si->type, maxpages);
 	if (error)
-		goto free_swap_address_space;
+		goto bad_swap_unlock_inode;
 
 	/*
 	 * Flush any pending IO and dirty mappings before we start using this
···
 	goto out;
 free_swap_zswap:
 	zswap_swapoff(si->type);
-free_swap_address_space:
-	exit_swap_address_space(si->type);
 bad_swap_unlock_inode:
 	inode_unlock(inode);
 bad_swap:
···
 	spin_unlock(&swap_lock);
 	vfree(swap_map);
 	kvfree(zeromap);
-	kvfree(cluster_info);
+	if (cluster_info)
+		free_cluster_info(cluster_info, maxpages);
 	if (inced_nr_rotate_swap)
 		atomic_dec(&nr_rotate_swap);
 	if (swap_file)

mm/vmscan.c (+14 -6)
···
 {
 	int refcount;
 	void *shadow = NULL;
+	struct swap_cluster_info *ci;
 
 	BUG_ON(!folio_test_locked(folio));
 	BUG_ON(mapping != folio_mapping(folio));
 
-	if (!folio_test_swapcache(folio))
+	if (folio_test_swapcache(folio)) {
+		ci = swap_cluster_get_and_lock_irq(folio);
+	} else {
 		spin_lock(&mapping->host->i_lock);
-	xa_lock_irq(&mapping->i_pages);
+		xa_lock_irq(&mapping->i_pages);
+	}
+
 	/*
 	 * The non racy check for a busy folio.
 	 *
···
 
 		if (reclaimed && !mapping_exiting(mapping))
 			shadow = workingset_eviction(folio, target_memcg);
-		__swap_cache_del_folio(folio, swap, shadow);
+		__swap_cache_del_folio(ci, folio, swap, shadow);
 		memcg1_swapout(folio, swap);
-		xa_unlock_irq(&mapping->i_pages);
+		swap_cluster_unlock_irq(ci);
 		put_swap_folio(folio, swap);
 	} else {
 		void (*free_folio)(struct folio *);
···
 	return 1;
 
 cannot_free:
-	xa_unlock_irq(&mapping->i_pages);
-	if (!folio_test_swapcache(folio))
+	if (folio_test_swapcache(folio)) {
+		swap_cluster_unlock_irq(ci);
+	} else {
+		xa_unlock_irq(&mapping->i_pages);
 		spin_unlock(&mapping->host->i_lock);
+	}
 	return 0;
 }