Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm/thp: fix deferred split queue not partially_mapped

Recent changes are putting more pressure on THP deferred split queues:
under load revealing long-standing races, causing list_del corruptions,
"Bad page state"s and worse (I keep BUGs in both of those, so usually
don't get to see how badly they end up without). The relevant recent
changes being 6.8's mTHP, 6.10's mTHP swapout, and 6.12's mTHP swapin,
improved swap allocation, and underused THP splitting.

The new unlocked list_del_init() in deferred_split_scan() is buggy. I
gave bad advice: it looks plausible since that's a local on-stack list,
but the fact is that it can race with a third party freeing or migrating
the preceding folio (properly unqueueing it with refcount 0 while holding
split_queue_lock), thereby corrupting the list linkage.
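
To see why an on-stack list head does not make the unlocked removal safe, it may help to recall what unlinking a list entry actually writes. The following is a minimal userspace sketch, not kernel code: struct list_node and the *_node helpers are hypothetical stand-ins modeled on the usual doubly linked circular list, assumed here purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a circular doubly linked list entry. */
struct list_node {
	struct list_node *prev, *next;
};

/* Make an entry (or a list head) point at itself, i.e. "empty". */
static void list_init_node(struct list_node *n)
{
	n->prev = n;
	n->next = n;
}

/* Append n just before head, i.e. at the tail of the list. */
static void list_add_tail_node(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/*
 * Unlink n and reinitialize it. The first two stores go into the
 * NEIGHBOURING entries, not into n itself: this is why removing one
 * entry is only safe if nobody else is concurrently unlinking the
 * entry before or after it.
 */
static void list_del_init_node(struct list_node *n)
{
	n->prev->next = n->next;	/* store into the preceding entry */
	n->next->prev = n->prev;	/* store into the following entry */
	list_init_node(n);
}
```

With entries a, b, c queued in that order, unlinking b rewrites a.next and c.prev. If another CPU is unlinking a at the same moment under split_queue_lock, the two removals touch the same pointers without any common lock, which is the corruption described above.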

The obvious answer would be to take split_queue_lock there: but it has a
long history of contention, so I'm reluctant to add to that. Instead,
make sure that there is always one safe (raised refcount) folio before, by
delaying its folio_put(). (And of course I was wrong to suggest updating
split_queue_len without the lock: leave that until the splice.)

And remove two over-eager partially_mapped checks, restoring those tests
to how they were before: if uncharge_folio() or free_tail_page_prepare()
finds _deferred_list non-empty, it's in trouble whether or not that folio
is partially_mapped (and the flag was already cleared in the latter case).

Link: https://lkml.kernel.org/r/81e34a8b-113a-0701-740e-2135c97eb1d7@google.com
Fixes: dafff3f4c850 ("mm: split underused THPs")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Usama Arif <usamaarif642@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Chris Li <chrisl@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

authored by Hugh Dickins and committed by Andrew Morton
e66f3185 59b723cd

+20 -9
+17 -4
mm/huge_memory.c
@@ -3718,8 +3718,8 @@
 	struct deferred_split *ds_queue = &pgdata->deferred_split_queue;
 	unsigned long flags;
 	LIST_HEAD(list);
-	struct folio *folio, *next;
-	int split = 0;
+	struct folio *folio, *next, *prev = NULL;
+	int split = 0, removed = 0;
 
 #ifdef CONFIG_MEMCG
 	if (sc->memcg)
@@ -3775,14 +3775,27 @@
 		 */
 		if (!did_split && !folio_test_partially_mapped(folio)) {
 			list_del_init(&folio->_deferred_list);
-			ds_queue->split_queue_len--;
+			removed++;
+		} else {
+			/*
+			 * That unlocked list_del_init() above would be unsafe,
+			 * unless its folio is separated from any earlier folios
+			 * left on the list (which may be concurrently unqueued)
+			 * by one safe folio with refcount still raised.
+			 */
+			swap(folio, prev);
 		}
-		folio_put(folio);
+		if (folio)
+			folio_put(folio);
 	}
 
 	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
 	list_splice_tail(&list, &ds_queue->split_queue);
+	ds_queue->split_queue_len -= removed;
 	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
+
+	if (prev)
+		folio_put(prev);
 
 	/*
 	 * Stop shrinker if we didn't split any page, but the queue is empty.
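
The reference handling in the mm/huge_memory.c change can be modeled in userspace to check the bookkeeping. This is only a sketch under stated assumptions: struct item, item_put() and scan_items() are hypothetical names, an array stands in for the on-stack list, and each item is assumed to enter the scan with one reference held by the scanner (refcount 1). It is not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a folio: only the fields the sketch needs. */
struct item {
	int refcount;	/* assumed to be 1 on entry: the scanner's reference */
	int keep;	/* nonzero: the item stays on the list */
};

static void item_put(struct item *it)
{
	it->refcount--;
}

/*
 * Walk the items in list order. An item that stays on the list becomes
 * the new pinned "prev", and the previously pinned one is released in
 * its place. An item that is removed is released at once, which is safe
 * because the nearest earlier item still on the list remains pinned.
 * Returns the number removed; *held receives the item still pinned, if any.
 */
static int scan_items(struct item **items, size_t n, struct item **held)
{
	struct item *prev = NULL;
	int removed = 0;

	for (size_t i = 0; i < n; i++) {
		struct item *it = items[i];

		if (!it->keep) {
			removed++;	/* counter is applied later, under the lock */
		} else {
			/* equivalent of swap(it, prev) */
			struct item *tmp = it;
			it = prev;
			prev = tmp;
		}
		if (it)
			item_put(it);
	}

	*held = prev;	/* caller drops this only after the locked splice */
	return removed;
}
```

In this model the caller would subtract the returned count from the queue length while holding the lock, then call item_put() on the held item afterwards, mirroring the order of operations in the hunk above.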
+1 -2
mm/memcontrol.c
@@ -4631,8 +4631,7 @@
 	VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
 	VM_BUG_ON_FOLIO(folio_order(folio) > 1 &&
 			!folio_test_hugetlb(folio) &&
-			!list_empty(&folio->_deferred_list) &&
-			folio_test_partially_mapped(folio), folio);
+			!list_empty(&folio->_deferred_list), folio);
 
 	/*
 	 * Nobody should be changing or seriously looking at
+2 -3
mm/page_alloc.c
@@ -961,9 +961,8 @@
 		break;
 	case 2:
 		/* the second tail page: deferred_list overlaps ->mapping */
-		if (unlikely(!list_empty(&folio->_deferred_list) &&
-		    folio_test_partially_mapped(folio))) {
-			bad_page(page, "partially mapped folio on deferred list");
+		if (unlikely(!list_empty(&folio->_deferred_list))) {
+			bad_page(page, "on deferred list");
 			goto out;
 		}
 		break;