selftests/mm/uffd-stress: make test operate on less hugetlb memory

Patch series "selftests/mm: uffd-stress fixes", v2.

This patchset ensures that the number of hugepages is correctly set in the
system so that the uffd-stress test does not fail due to the racy nature
of the test. Patch 1 changes the hugepage constraint in the
run_vmtests.sh script, whereas patch 2 changes the constraint in the test
itself.

This patch (of 2):

We observed uffd-stress selftest failure on arm64 and intermittent
failures on x86 too:

running ./uffd-stress hugetlb-private 128 32

bounces: 17, mode: rnd read, ERROR: UFFDIO_COPY error: -12 (errno=12, @uffd-common.c:617) [FAIL]
not ok 18 uffd-stress hugetlb-private 128 32 # exit=1

For this particular case, the number of free hugepages from run_vmtests.sh
will be 128, and the test will allocate 64 hugepages in the source
location. The stress() function will start spawning threads which will
operate on the destination location, triggering uffd-operations like
UFFDIO_COPY from src to dst, which means that we will require 64 more
hugepages for the dst location.
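
The arithmetic behind this case can be sketched in shell. This is only an illustration: the helper name region_hugepages is hypothetical, and a hugepage size of 2048 kB is assumed for the example, since the message above does not state it.

```shell
#!/bin/bash
# Hypothetical helper: hugepages needed for one region (src or dst)
# when uffd-stress is given a region size in MiB.
# $1 = region size in MiB, $2 = hugepage size in kB
region_hugepages() {
    local size_mb=$1 hpgsize_kb=$2
    echo $(( size_mb * 1024 / hpgsize_kb ))
}

freepgs=128        # free hugepages, as in the failing run
hpgsize_kb=2048    # assumption for illustration only

# Pre-patch computation: half of the free hugetlb memory, in MiB.
half_mb=$(( (freepgs * hpgsize_kb) / 1024 / 2 ))
per_region=$(region_hugepages "$half_mb" "$hpgsize_kb")

echo "size per region: ${half_mb} MiB"
echo "hugepages per region: ${per_region}"
echo "src + dst: $(( 2 * per_region )) of ${freepgs} free"
```

Under these assumptions src and dst together consume every free hugepage, leaving no slack for a transient extra allocation.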

Let us observe the locking_thread() function. It will lock the mutex kept
at dst, triggering uffd-copy. Suppose that 127 (64 for src and 63 for
dst) hugepages have been reserved. In case of BOUNCE_RANDOM, it may
happen that two threads trying to lock the mutex at dst do so on the
same hugepage number. If one thread succeeds in reserving the last
hugepage, then the other thread may fail in alloc_hugetlb_folio(),
returning -ENOMEM. I can confirm that this is indeed the case by this
hacky patch:

:--- a/mm/hugetlb.c
:+++ b/mm/hugetlb.c
:@@ -6929,6 +6929,11 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
:
:         folio = alloc_hugetlb_folio(dst_vma, dst_addr, false);
:         if (IS_ERR(folio)) {
:+                pte_t *actual_pte = hugetlb_walk(dst_vma, dst_addr, PMD_SIZE);
:+                if (actual_pte) {
:+                        ret = -EEXIST;
:+                        goto out;
:+                }
:                 ret = -ENOMEM;
:                 goto out;
:         }

This code path gets triggered, indicating that the PMD at which one thread
is trying to map a hugepage gets filled by a racing thread.
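
The shortfall in this scenario can be modelled with a few lines of shell. This is a simplified, hypothetical model and not kernel code: it assumes that every thread racing on the same destination hugepage may attempt its own allocation before noticing that another thread has already filled the mapping. The name transient_demand is made up for the sketch.

```shell
#!/bin/bash
# Hypothetical model of the race, not kernel code.
# $1 = hugepages already in use, $2 = threads racing on one hugepage
transient_demand() {
    local in_use=$1 racers=$2
    echo $(( in_use + racers ))
}

pool=128     # hugepages made available to the test
in_use=127   # 64 for src + 63 for dst, as in the scenario above
racers=2     # two threads hitting the same dst hugepage

demand=$(transient_demand "$in_use" "$racers")
if [ "$demand" -gt "$pool" ]; then
    echo "shortfall: $(( demand - pool )) hugepage(s); a racer may see -ENOMEM"
else
    echo "enough hugepages for all racers"
fi
```

With the numbers from the scenario, the model reports a shortfall of one hugepage, which matches the single failing allocation described above.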

Therefore, instead of using freepgs to compute the amount of memory, use
freepgs - (min(32, nr_cpus) - 1), so that the test still has some extra
hugepages to use. The adjustment is a function of min(32, nr_cpus) - the
value of nr_parallel in the test - because in the worst case, nr_parallel
number of threads will try to map a hugepage on the same PMD, one will win
the allocation race, and the other nr_parallel - 1 threads will fail, so
we need extra nr_parallel - 1 hugepages to satisfy this request. Note
that, in case the adjusted value underflows, there is a check for the
number of free hugepages in the test itself, which will fail:
get_free_hugepages() < bytes / page_size. A negative value will be passed
on to bytes, which is of type size_t, thus the RHS will become a large
value and the check will fail, so we are safe.
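
The adjusted computation can be tried out in isolation with a small shell sketch. The function name adjusted_half_mb and the 2048 kB hugepage size are assumptions made for illustration; only the formula itself comes from the description above.

```shell
#!/bin/bash
# Hypothetical helper mirroring the adjusted computation described above.
# $1 = free hugepages, $2 = hugepage size in kB, $3 = number of CPUs
adjusted_half_mb() {
    local freepgs=$1 hpgsize_kb=$2 nr_cpus=$3
    # nr_parallel in the test is min(32, nr_cpus).
    local nr_parallel=$(( nr_cpus < 32 ? nr_cpus : 32 ))
    local adjustment=$(( nr_parallel - 1 ))
    echo $(( ((freepgs - adjustment) * hpgsize_kb) / 1024 / 2 ))
}

adjusted_half_mb 128 2048 8     # 8 CPUs: adjustment is 7
adjusted_half_mb 128 2048 64    # 64 CPUs: adjustment is capped at 31
adjusted_half_mb 10 2048 64     # too few hugepages: result goes negative
```

Under these assumptions the three calls print 121, 97 and -21. The last case is the underflow mentioned above, which the free-hugepage check in the test is expected to catch.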

Link: https://lkml.kernel.org/r/20250909061531.57272-1-dev.jain@arm.com
Link: https://lkml.kernel.org/r/20250909061531.57272-2-dev.jain@arm.com
Signed-off-by: Dev Jain <dev.jain@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

authored by Dev Jain and committed by Andrew Morton 9 months ago 060b6c72 6d11dec1

+12 -3

2 changed files

mm/hugetlb.c

         folio = alloc_hugetlb_folio(dst_vma, dst_addr, false);
         if (IS_ERR(folio)) {
+                pte_t *actual_pte = hugetlb_walk(dst_vma, dst_addr, PMD_SIZE);
+                if (actual_pte) {
+                        ret = -EEXIST;
+                        goto out;
+                }
                 ret = -ENOMEM;
                 goto out;
         }

+7 -3

tools/testing/selftests/mm/run_vmtests.sh

 CATEGORY="userfaultfd" run_test ./uffd-unit-tests
 uffd_stress_bin=./uffd-stress
 CATEGORY="userfaultfd" run_test ${uffd_stress_bin} anon 20 16
-# Hugetlb tests require source and destination huge pages. Pass in half
-# the size of the free pages we have, which is used for *each*.
+# Hugetlb tests require source and destination huge pages. Pass in almost half
+# the size of the free pages we have, which is used for *each*. An adjustment
+# of (nr_parallel - 1) is done (see nr_parallel in uffd-stress.c) to have some
+# extra hugepages - this is done to prevent the test from failing by racily
+# reserving more hugepages than strictly required.
 # uffd-stress expects a region expressed in MiB, so we adjust
 # half_ufd_size_MB accordingly.
-half_ufd_size_MB=$(((freepgs * hpgsize_KB) / 1024 / 2))
+adjustment=$(( (31 < (nr_cpus - 1)) ? 31 : (nr_cpus - 1) ))
+half_ufd_size_MB=$((((freepgs - adjustment) * hpgsize_KB) / 1024 / 2))
 CATEGORY="userfaultfd" run_test ${uffd_stress_bin} hugetlb "$half_ufd_size_MB" 32
 CATEGORY="userfaultfd" run_test ${uffd_stress_bin} hugetlb-private "$half_ufd_size_MB" 32
 CATEGORY="userfaultfd" run_test ${uffd_stress_bin} shmem 20 16
