
mm: introduce a new page type for page pool

Currently, the condition 'page->pp_magic == PP_SIGNATURE' is used to
determine if a page belongs to a page pool. However, with the planned
removal of @pp_magic, we should instead leverage the page_type in struct
page, such as PGTY_netpp, for this purpose.
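For context, page->page_type overlays _mapcount and holds UINT_MAX when no type is set; in recent kernels the PGTY_* tag lives in its top byte. A paraphrased sketch of what the new check amounts to (the real helper is generated by PAGE_TYPE_OPS(), see the include/linux/page-flags.h hunk below):

	/* Sketch only; the actual PageNetpp() is generated by
	 * PAGE_TYPE_OPS(Netpp, netpp, netpp). page_type overlays
	 * _mapcount and reads UINT_MAX when no type is set.
	 */
	static inline bool sketch_page_is_netpp(const struct page *page)
	{
		return (page->page_type >> 24) == 0xf9;	/* PGTY_netpp */
	}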

Introduce and use the page type APIs, e.g. PageNetpp(), __SetPageNetpp(),
and __ClearPageNetpp(), instead, and remove the existing APIs that access
@pp_magic, e.g. page_pool_page_is_pp(), netmem_or_pp_magic(), and
netmem_clear_pp_magic().
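In a caller, the conversion is mechanical. A minimal before/after sketch
(recycle_to_page_pool() is a made-up caller for illustration):

	/* before: signature matching on the pp_magic field */
	if ((page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE)
		recycle_to_page_pool(page);

	/* after: a dedicated page type */
	if (PageNetpp(page))
		recycle_to_page_pool(page);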

Plus, add @page_type to struct net_iov at the same offset as in struct
page, so that the page_type APIs can be used for struct net_iov as well.
While at it, reorder @type and @owner in struct net_iov to avoid
introducing a hole and increasing the struct size.
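The shared offset is enforced at compile time in the include/net/netmem.h
hunk below. The technique, shown standalone with simplified stand-in
structs (pageish/net_iovish are illustrative, not the real layouts):

	#include <stddef.h>
	#include <assert.h>

	struct pageish    { unsigned long flags;    unsigned int page_type; };
	struct net_iovish { unsigned long pp_magic; unsigned int page_type; };

	/* If the offsets ever diverge, the build fails rather than the
	 * page_type helpers silently reading the wrong field.
	 */
	static_assert(offsetof(struct pageish, page_type) ==
		      offsetof(struct net_iovish, page_type),
		      "page_type must alias across both structs");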

This work was inspired by the following link:

https://lore.kernel.org/all/582f41c0-2742-4400-9c81-0d46bf4e8314@gmail.com/

Also, move the page pool sanity check to the free path.
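Condensed from the mm/page_alloc.c hunk below (the hunk appears to be in
free_pages_prepare()): a page reaching the buddy free path with its netpp
type still set is reported as a leak, gated behind the page sanity checks:

	if (unlikely(page_has_type(page))) {
		/* networking expects to clear its page type before releasing */
		if (is_check_pages_enabled() && unlikely(PageNetpp(page))) {
			bad_page(page, "page_pool leak");
			return false;
		}
		/* Reset the page_type (which overlays _mapcount) */
		page->page_type = UINT_MAX;
	}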

[byungchul@sk.com: gate the sanity check, per Johannes]
Link: https://lkml.kernel.org/r/20260316223113.20097-1-byungchul@sk.com
Link: https://lkml.kernel.org/r/20260224051347.19621-1-byungchul@sk.com
Co-developed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Byungchul Park <byungchul@sk.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: David Wei <dw@davidwei.uk>
Cc: Dragos Tatulea <dtatulea@nvidia.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Mark Bloch <mbloch@nvidia.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Simon Horman <horms@kernel.org>
Cc: Stanislav Fomichev <sdf@fomichev.me>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Taehee Yoo <ap420073@gmail.com>
Cc: Tariq Toukan <tariqt@nvidia.com>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Byungchul Park and committed by Andrew Morton (db359fcc, 92a9cf97).

7 files changed: +64 -46

drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c (+1 -1)

···
 		xdpi = mlx5e_xdpi_fifo_pop(xdpi_fifo);
 		page = xdpi.page.page;

-		/* No need to check page_pool_page_is_pp() as we
+		/* No need to check PageNetpp() as we
 		 * know this is a page_pool page.
 		 */
 		page_pool_recycle_direct(pp_page_to_nmdesc(page)->pp,
include/linux/mm.h (+3 -24)

···
  * DMA mapping IDs for page_pool
  *
  * When DMA-mapping a page, page_pool allocates an ID (from an xarray) and
- * stashes it in the upper bits of page->pp_magic. We always want to be able to
- * unambiguously identify page pool pages (using page_pool_page_is_pp()). Non-PP
- * pages can have arbitrary kernel pointers stored in the same field as pp_magic
- * (since it overlaps with page->lru.next), so we must ensure that we cannot
+ * stashes it in the upper bits of page->pp_magic. Non-PP pages can have
+ * arbitrary kernel pointers stored in the same field as pp_magic (since
+ * it overlaps with page->lru.next), so we must ensure that we cannot
  * mistake a valid kernel pointer with any of the values we write into this
  * field.
  *
···

 #define PP_DMA_INDEX_MASK	GENMASK(PP_DMA_INDEX_BITS + PP_DMA_INDEX_SHIFT - 1, \
 				PP_DMA_INDEX_SHIFT)
-
-/* Mask used for checking in page_pool_page_is_pp() below. page->pp_magic is
- * OR'ed with PP_SIGNATURE after the allocation in order to preserve bit 0 for
- * the head page of compound page and bit 1 for pfmemalloc page, as well as the
- * bits used for the DMA index. page_is_pfmemalloc() is checked in
- * __page_pool_put_page() to avoid recycling the pfmemalloc page.
- */
-#define PP_MAGIC_MASK	~(PP_DMA_INDEX_MASK | 0x3UL)
-
-#ifdef CONFIG_PAGE_POOL
-static inline bool page_pool_page_is_pp(const struct page *page)
-{
-	return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
-}
-#else
-static inline bool page_pool_page_is_pp(const struct page *page)
-{
-	return false;
-}
-#endif

 #define PAGE_SNAPSHOT_FAITHFUL (1 << 0)
 #define PAGE_SNAPSHOT_PG_BUDDY (1 << 1)
include/linux/page-flags.h (+6)

···
 	PGTY_zsmalloc		= 0xf6,
 	PGTY_unaccepted		= 0xf7,
 	PGTY_large_kmalloc	= 0xf8,
+	PGTY_netpp		= 0xf9,

 	PGTY_mapcount_underflow = 0xff
 };
···
  */
 PAGE_TYPE_OPS(Unaccepted, unaccepted, unaccepted)
 PAGE_TYPE_OPS(LargeKmalloc, large_kmalloc, large_kmalloc)
+
+/*
+ * Marks page_pool allocated pages.
+ */
+PAGE_TYPE_OPS(Netpp, netpp, netpp)

 /**
  * PageHuge - Determine if the page belongs to hugetlbfs
include/net/netmem.h (+13 -2)

···
 			atomic_long_t pp_ref_count;
 		};
 	};
-	struct net_iov_area	*owner;
+
+	unsigned int		page_type;
 	enum net_iov_type	type;
+	struct net_iov_area	*owner;
 };
+
+/* Make sure 'the offset of page_type in struct page == the offset of
+ * type in struct net_iov'.
+ */
+#define NET_IOV_ASSERT_OFFSET(pg, iov)			\
+	static_assert(offsetof(struct page, pg) ==	\
+		      offsetof(struct net_iov, iov))
+NET_IOV_ASSERT_OFFSET(page_type, page_type);
+#undef NET_IOV_ASSERT_OFFSET

 struct net_iov_area {
 	/* Array of net_iovs for this area. */
···
  */
 #define pp_page_to_nmdesc(p)					\
 ({								\
-	DEBUG_NET_WARN_ON_ONCE(!page_pool_page_is_pp(p));	\
+	DEBUG_NET_WARN_ON_ONCE(!PageNetpp(p));			\
 	__pp_page_to_nmdesc(p);					\
 })
mm/page_alloc.c (+9 -4)

···
 #ifdef CONFIG_MEMCG
 			page->memcg_data |
 #endif
-			page_pool_page_is_pp(page) |
 			(page->flags.f & check_flags)))
 		return false;

···
 	if (unlikely(page->memcg_data))
 		bad_reason = "page still charged to cgroup";
 #endif
-	if (unlikely(page_pool_page_is_pp(page)))
-		bad_reason = "page_pool leak";
 	return bad_reason;
 }

···
 		mod_mthp_stat(order, MTHP_STAT_NR_ANON, -1);
 		folio->mapping = NULL;
 	}
-	if (unlikely(page_has_type(page)))
+	if (unlikely(page_has_type(page))) {
+		/* networking expects to clear its page type before releasing */
+		if (is_check_pages_enabled()) {
+			if (unlikely(PageNetpp(page))) {
+				bad_page(page, "page_pool leak");
+				return false;
+			}
+		}
 		/* Reset the page_type (which overlays _mapcount) */
 		page->page_type = UINT_MAX;
+	}

 	if (is_check_pages_enabled()) {
 		if (free_page_is_bad(page))
net/core/netmem_priv.h (+10 -13)

···
 	return netmem_to_nmdesc(netmem)->pp_magic & ~PP_DMA_INDEX_MASK;
 }

-static inline void netmem_or_pp_magic(netmem_ref netmem, unsigned long pp_magic)
-{
-	netmem_to_nmdesc(netmem)->pp_magic |= pp_magic;
-}
-
-static inline void netmem_clear_pp_magic(netmem_ref netmem)
-{
-	WARN_ON_ONCE(netmem_to_nmdesc(netmem)->pp_magic & PP_DMA_INDEX_MASK);
-
-	netmem_to_nmdesc(netmem)->pp_magic = 0;
-}
-
 static inline bool netmem_is_pp(netmem_ref netmem)
 {
-	return (netmem_get_pp_magic(netmem) & PP_MAGIC_MASK) == PP_SIGNATURE;
+	struct page *page;
+
+	/* XXX: Now that the offset of page_type is shared between
+	 * struct page and net_iov, just cast the netmem to struct page
+	 * unconditionally by clearing NET_IOV if any, no matter whether
+	 * it comes from struct net_iov or struct page. This should be
+	 * adjusted once the offset is no longer shared.
+	 */
+	page = (struct page *)((__force unsigned long)netmem & ~NET_IOV);
+	return PageNetpp(page);
 }

 static inline void netmem_set_pp(netmem_ref netmem, struct page_pool *pool)
net/core/page_pool.c (+22 -2)

···

 void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem)
 {
+	struct page *page;
+
 	netmem_set_pp(netmem, pool);
-	netmem_or_pp_magic(netmem, PP_SIGNATURE);
+
+	/* XXX: Now that the offset of page_type is shared between
+	 * struct page and net_iov, just cast the netmem to struct page
+	 * unconditionally by clearing NET_IOV if any, no matter whether
+	 * it comes from struct net_iov or struct page. This should be
+	 * adjusted once the offset is no longer shared.
+	 */
+	page = (struct page *)((__force unsigned long)netmem & ~NET_IOV);
+	__SetPageNetpp(page);

 	/* Ensuring all pages have been split into one fragment initially:
 	 * page_pool_set_pp_info() is only called once for every page when it
···

 void page_pool_clear_pp_info(netmem_ref netmem)
 {
-	netmem_clear_pp_magic(netmem);
+	struct page *page;
+
+	/* XXX: Now that the offset of page_type is shared between
+	 * struct page and net_iov, just cast the netmem to struct page
+	 * unconditionally by clearing NET_IOV if any, no matter whether
+	 * it comes from struct net_iov or struct page. This should be
+	 * adjusted once the offset is no longer shared.
+	 */
+	page = (struct page *)((__force unsigned long)netmem & ~NET_IOV);
+	__ClearPageNetpp(page);
+
 	netmem_set_pp(netmem, NULL);
 }