Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm: hugetlb: fix incorrect fallback for subpool

During our testing with hugetlb subpool enabled, we observed that
hstate->resv_huge_pages may underflow into negative values. Root cause
analysis reveals a race condition in subpool reservation fallback handling
as follows:

hugetlb_reserve_pages()
    /* Attempt subpool reservation */
    gbl_reserve = hugepage_subpool_get_pages(spool, chg);

    /* Global reservation may fail after subpool allocation */
    if (hugetlb_acct_memory(h, gbl_reserve) < 0)
        goto out_put_pages;

  out_put_pages:
    /* This incorrectly restores reservation to subpool */
    hugepage_subpool_put_pages(spool, chg);

When hugetlb_acct_memory() fails after subpool allocation, the current
implementation over-commits subpool reservations by returning the full
'chg' value to the subpool instead of only the portion that was actually
taken from it (chg - gbl_reserve). This discrepancy propagates to global
reservations during subsequent releases, eventually causing
resv_huge_pages underflow.

This problem can be triggered easily with the following steps:
1. reserve hugepages for hugetlb allocation
2. mount hugetlbfs with min_size to enable hugetlb subpool
3. alloc hugepages with two tasks (make sure the second will fail due to
   insufficient amount of hugepages)
4. wait for a few seconds and repeat step 3, which will make
   hstate->resv_huge_pages go below zero.
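
When repeating step 4, the underflow can be watched from user space. The helper below is a minimal sketch, not part of the patch. It assumes the counter is observed through the HugePages_Rsvd field of /proc/meminfo, that the field is printed as an unsigned value (so a counter that went "below zero" shows up as a very large number), and that long is 64 bits wide. The function name parse_hugepages_rsvd is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extract HugePages_Rsvd from meminfo-formatted text
 * and reinterpret it as a signed value.
 * Returns 0 on success, -1 if the field is missing or unparsable.
 */
static int parse_hugepages_rsvd(const char *meminfo, long *rsvd)
{
	const char *key = "HugePages_Rsvd:";
	const char *p = strstr(meminfo, key);
	unsigned long val;

	if (!p || sscanf(p + strlen(key), "%lu", &val) != 1)
		return -1;

	/*
	 * Assumption: on Linux with gcc/clang this conversion wraps, so an
	 * underflowed unsigned counter becomes a negative number here.
	 */
	*rsvd = (long)val;
	return 0;
}
```

A caller would read /proc/meminfo into a buffer after each round of step 3, pass it to the helper, and treat a negative result as a reproduction of the bug.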

To fix this problem, return the correct amount of pages to the subpool
during the fallback after hugepage_subpool_get_pages() is called.
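
The accounting difference can be traced with a toy user-space model. This is a simplified sketch, not the kernel code. It assumes a subpool with only min_size set (no maximum), ignores locking, and drops the value returned by the put operation, which the real fix passes on to hugetlb_acct_memory(). All toy_* names are hypothetical. In the scenario, task A already holds 3 of the subpool's 4 reserved pages when task B asks for 5 and the global reservation fails.

```c
#include <stdbool.h>

/* Toy stand-in for the subpool; not the kernel's struct hugepage_subpool. */
struct toy_subpool {
	long min_hpages;	/* minimum size guaranteed by min_size */
	long rsv_hpages;	/* subpool reservations not yet handed out */
};

/* Take 'delta' pages; returns how many must also be reserved globally. */
static long toy_subpool_get_pages(struct toy_subpool *sp, long delta)
{
	long ret = delta;

	if (sp->rsv_hpages) {
		if (delta > sp->rsv_hpages) {
			ret = delta - sp->rsv_hpages;
			sp->rsv_hpages = 0;
		} else {
			ret = 0;
			sp->rsv_hpages -= delta;
		}
	}
	return ret;
}

/* Give 'delta' pages back; returns the excess over the subpool minimum. */
static long toy_subpool_put_pages(struct toy_subpool *sp, long delta)
{
	long ret = 0;

	if (sp->rsv_hpages + delta > sp->min_hpages)
		ret = sp->rsv_hpages + delta - sp->min_hpages;
	sp->rsv_hpages += delta;
	if (sp->rsv_hpages > sp->min_hpages)
		sp->rsv_hpages = sp->min_hpages;
	return ret;
}

/*
 * Returns the subpool's rsv_hpages after task B's failed reservation.
 * Task A still holds 3 pages, so the consistent value is 1.
 */
static long toy_rsv_after_failed_reserve(bool fixed)
{
	struct toy_subpool sp = { .min_hpages = 4, .rsv_hpages = 4 };
	long chg = 5, gbl_reserve;

	(void)toy_subpool_get_pages(&sp, 3);		/* task A: 4 -> 1 */
	gbl_reserve = toy_subpool_get_pages(&sp, chg);	/* task B: 1 -> 0 */

	/* Assume the global reservation of gbl_reserve pages fails here. */
	if (fixed) {
		long spool_resv = chg - gbl_reserve;	/* taken from subpool */

		if (spool_resv)
			(void)toy_subpool_put_pages(&sp, spool_resv);
	} else {
		(void)toy_subpool_put_pages(&sp, chg);	/* old fallback */
	}
	return sp.rsv_hpages;
}
```

With the old fallback the model ends with rsv_hpages at 4 even though task A still holds 3 pages. Returning only chg - gbl_reserve leaves it at 1, which matches what task B actually took from the subpool.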

Link: https://lkml.kernel.org/r/20250410062633.3102457-1-mawupeng1@huawei.com
Fixes: 1c5ecae3a93f ("hugetlbfs: add minimum size accounting to subpools")
Signed-off-by: Wupeng Ma <mawupeng1@huawei.com>
Tested-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Ma Wupeng <mawupeng1@huawei.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Wupeng Ma and committed by Andrew Morton
a833a693 82f2b0b9

+22 -6
mm/hugetlb.c
···
 	struct hugepage_subpool *spool = subpool_vma(vma);
 	struct hstate *h = hstate_vma(vma);
 	struct folio *folio;
-	long retval, gbl_chg;
+	long retval, gbl_chg, gbl_reserve;
 	map_chg_state map_chg;
 	int ret, idx;
 	struct hugetlb_cgroup *h_cg = NULL;
···
 		hugetlb_cgroup_uncharge_cgroup_rsvd(idx, pages_per_huge_page(h),
 						    h_cg);
 out_subpool_put:
-	if (map_chg)
-		hugepage_subpool_put_pages(spool, 1);
+	/*
+	 * put page to subpool iff the quota of subpool's rsv_hpages is used
+	 * during hugepage_subpool_get_pages.
+	 */
+	if (map_chg && !gbl_chg) {
+		gbl_reserve = hugepage_subpool_put_pages(spool, 1);
+		hugetlb_acct_memory(h, -gbl_reserve);
+	}
+
+
 out_end_reservation:
 	if (map_chg != MAP_CHG_ENFORCED)
 		vma_end_reservation(h, vma, addr);
···
 					struct vm_area_struct *vma,
 					vm_flags_t vm_flags)
 {
-	long chg = -1, add = -1;
+	long chg = -1, add = -1, spool_resv, gbl_resv;
 	struct hstate *h = hstate_inode(inode);
 	struct hugepage_subpool *spool = subpool_inode(inode);
 	struct resv_map *resv_map;
···
 	return true;
 
 out_put_pages:
-	/* put back original number of pages, chg */
-	(void)hugepage_subpool_put_pages(spool, chg);
+	spool_resv = chg - gbl_reserve;
+	if (spool_resv) {
+		/* put sub pool's reservation back, chg - gbl_reserve */
+		gbl_resv = hugepage_subpool_put_pages(spool, spool_resv);
+		/*
+		 * subpool's reserved pages can not be put back due to race,
+		 * return to hstate.
+		 */
+		hugetlb_acct_memory(h, -gbl_resv);
+	}
 out_uncharge_cgroup:
 	hugetlb_cgroup_uncharge_cgroup_rsvd(hstate_index(h),
 					    chg * pages_per_huge_page(h), h_cg);