Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm, swap: simplify checking if a folio is swapped

Clean up and simplify how we check if a folio is swapped. The helper
already requires the folio to be in swap cache and locked. That's enough
to pin the swap cluster from being freed, so there is no need to lock
anything else to avoid UAF.

And besides, we have cleaned up and defined the swap operation to be
mostly folio based, and now the only place a folio will have any of its
swap slots' count increased from 0 to 1 is folio_dup_swap, which also
requires the folio lock. So as we are holding the folio lock here, a
folio can't change its swap status from not swapped (all swap slots have a
count of 0) to swapped (any slot has a swap count larger than 0).

So there won't be any false negatives of this helper if we simply depend
on the folio lock to stabilize the cluster.

We are only using this helper to determine if we can and should release
the swap cache. So false positives are completely harmless, and they
already existed before this change: depending on timing, a racing thread
could drop the swap count right after the ci lock was released and
before the helper returned. In any case, the worst that can happen is
that we leave a clean swap cache behind, which will still be reclaimed
under memory pressure just fine.

So, in conclusion, we can make the check much simpler and lockless.
Also, rename it to folio_maybe_swapped to reflect the design.

Link: https://lkml.kernel.org/r/20260218-swap-table-p3-v3-11-f4e34be021a7@tencent.com
Signed-off-by: Kairui Song <kasong@tencent.com>
Acked-by: Chris Li <chrisl@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kairui Song <ryncsn@gmail.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Kairui Song and committed by Andrew Morton
a0f79916 45711d44

+49 -40
+3 -2
mm/swap.h
···
  *
  * folio_alloc_swap(): the entry point for a folio to be swapped
  * out. It allocates swap slots and pins the slots with swap cache.
- * The slots start with a swap count of zero.
+ * The slots start with a swap count of zero. The slots are pinned
+ * by swap cache reference which doesn't contribute to swap count.
  *
  * folio_dup_swap(): increases the swap count of a folio, usually
  * during it gets unmapped and a swap entry is installed to replace
  * it (e.g., swap entry in page table). A swap slot with swap
- * count == 0 should only be increasd by this helper.
+ * count == 0 can only be increased by this helper.
  *
  * folio_put_swap(): does the opposite thing of folio_dup_swap().
  */
+46 -38
mm/swapfile.c
···
  * @subpage: if not NULL, only increase the swap count of this subpage.
  *
  * Typically called when the folio is unmapped and have its swap entry to
- * take its palce.
+ * take its place: Swap entries allocated to a folio has count == 0 and pinned
+ * by swap cache. The swap cache pin doesn't increase the swap count. This
+ * helper sets the initial count == 1 and increases the count as the folio is
+ * unmapped and swap entries referencing the slots are generated to replace
+ * the folio.
  *
  * Context: Caller must ensure the folio is locked and in the swap cache.
  * NOTE: The caller also has to ensure there is no raced call to
···
 	return count < 0 ? 0 : count;
 }
 
-static bool swap_page_trans_huge_swapped(struct swap_info_struct *si,
-					 swp_entry_t entry, int order)
-{
-	struct swap_cluster_info *ci;
-	unsigned int nr_pages = 1 << order;
-	unsigned long roffset = swp_offset(entry);
-	unsigned long offset = round_down(roffset, nr_pages);
-	unsigned int ci_off;
-	int i;
-	bool ret = false;
-
-	ci = swap_cluster_lock(si, offset);
-	if (nr_pages == 1) {
-		ci_off = roffset % SWAPFILE_CLUSTER;
-		if (swp_tb_get_count(__swap_table_get(ci, ci_off)))
-			ret = true;
-		goto unlock_out;
-	}
-	for (i = 0; i < nr_pages; i++) {
-		ci_off = (offset + i) % SWAPFILE_CLUSTER;
-		if (swp_tb_get_count(__swap_table_get(ci, ci_off))) {
-			ret = true;
-			break;
-		}
-	}
-unlock_out:
-	swap_cluster_unlock(ci);
-	return ret;
-}
-
-static bool folio_swapped(struct folio *folio)
+/*
+ * folio_maybe_swapped - Test if a folio covers any swap slot with count > 0.
+ *
+ * Check if a folio is swapped. Holding the folio lock ensures the folio won't
+ * go from not-swapped to swapped because the initial swap count increment can
+ * only be done by folio_dup_swap, which also locks the folio. But a concurrent
+ * decrease of swap count is possible through swap_put_entries_direct, so this
+ * may return a false positive.
+ *
+ * Context: Caller must ensure the folio is locked and in the swap cache.
+ */
+static bool folio_maybe_swapped(struct folio *folio)
 {
 	swp_entry_t entry = folio->swap;
-	struct swap_info_struct *si;
+	struct swap_cluster_info *ci;
+	unsigned int ci_off, ci_end;
+	bool ret = false;
 
 	VM_WARN_ON_ONCE_FOLIO(!folio_test_locked(folio), folio);
 	VM_WARN_ON_ONCE_FOLIO(!folio_test_swapcache(folio), folio);
 
-	si = __swap_entry_to_info(entry);
-	if (!IS_ENABLED(CONFIG_THP_SWAP) || likely(!folio_test_large(folio)))
-		return swap_entry_swapped(si, entry);
+	ci = __swap_entry_to_cluster(entry);
+	ci_off = swp_cluster_offset(entry);
+	ci_end = ci_off + folio_nr_pages(folio);
+	/*
+	 * Extra locking not needed, folio lock ensures its swap entries
+	 * won't be released, the backing data won't be gone either.
+	 */
+	rcu_read_lock();
+	do {
+		if (__swp_tb_get_count(__swap_table_get(ci, ci_off))) {
+			ret = true;
+			break;
+		}
+	} while (++ci_off < ci_end);
+	rcu_read_unlock();
 
-	return swap_page_trans_huge_swapped(si, entry, folio_order(folio));
+	return ret;
 }
 
 static bool folio_swapcache_freeable(struct folio *folio)
···
 {
 	if (!folio_swapcache_freeable(folio))
 		return false;
-	if (folio_swapped(folio))
+	if (folio_maybe_swapped(folio))
 		return false;
 
 	swap_cache_del_folio(folio);
···
  *
  * Context: Caller must ensure there is no race condition on the reference
  * owner. e.g., locking the PTL of a PTE containing the entry being increased.
+ * Also the swap entry must have a count >= 1. Otherwise folio_dup_swap should
+ * be used.
  */
 int swap_dup_entry_direct(swp_entry_t entry)
 {
···
 		pr_err("%s%08lx\n", Bad_file, entry.val);
 		return -EINVAL;
 	}
+
+	/*
+	 * The caller must be increasing the swap count from a direct
+	 * reference of the swap slot (e.g. a swap entry in page table).
+	 * So the swap count must be >= 1.
+	 */
+	VM_WARN_ON_ONCE(!swap_entry_swapped(si, entry));
 
 	return swap_dup_entries_cluster(si, swp_offset(entry), 1);
 }
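
The scan that folio_maybe_swapped performs over a folio's slots can be modeled in user space roughly as follows. This is only a sketch under stated assumptions, not kernel code: struct model_cluster, MODEL_CLUSTER_SLOTS and model_maybe_swapped are hypothetical names, the swap table is reduced to a plain array of per-slot counts, and neither the folio lock nor the RCU read section is modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cluster size, chosen only for this model. */
#define MODEL_CLUSTER_SLOTS 512

/* Hypothetical stand-in for a swap cluster: one swap count per slot. */
struct model_cluster {
	unsigned char count[MODEL_CLUSTER_SLOTS];
};

/*
 * Mirrors the do/while scan in the diff: report true as soon as any slot
 * in [off, off + nr) has a non-zero swap count. In the kernel the caller
 * holds the folio lock; here the caller simply must not modify the array
 * concurrently.
 */
static bool model_maybe_swapped(const struct model_cluster *ci,
				unsigned int off, unsigned int nr)
{
	unsigned int end = off + nr;

	/* A folio covers at least one slot and never crosses its cluster. */
	assert(nr > 0 && end <= MODEL_CLUSTER_SLOTS);

	do {
		if (ci->count[off])
			return true;
	} while (++off < end);

	return false;
}
```

With all counts at zero, a 4-slot range starting at offset 8 reports false, so the swap cache would be freeable. Setting the count of slot 10 to 1 makes the same range report true, while the neighbouring range starting at offset 12 still reports false.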