Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
mm: vmscan: fix dirty folios throttling on cgroup v1 for MGLRU

balance_dirty_pages() does not throttle dirty folios on cgroup v1. See
commit 9badce000e2c ("cgroup, writeback: don't enable cgroup writeback on
traditional hierarchies").

Moreover, after commit 6b0dfabb3555 ("fs: Remove aops->writepage"), we no
longer attempt to write back filesystem folios through reclaim.

On large memory systems, the flusher may not be able to write back quickly
enough. Consequently, MGLRU will encounter many folios that are already
under writeback. Since we cannot reclaim these dirty folios, the system
may run out of memory and trigger the OOM killer.

Hence, for cgroup v1, let's throttle reclaim after waking up the flusher,
which is similar to commit 81a70c21d917 ("mm/cgroup/reclaim: fix dirty
pages throttling on cgroup v1"), to avoid unnecessary OOM.

The following test program can easily reproduce the OOM issue. With this
patch applied, the test passes successfully.

$ mkdir /sys/fs/cgroup/memory/test
$ echo 256M > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
$ echo $$ > /sys/fs/cgroup/memory/test/cgroup.procs
$ dd if=/dev/zero of=/mnt/data.bin bs=1M count=800

Link: https://lore.kernel.org/3445af0f09e8ca945492e052e82594f8c4f2e2f6.1774606060.git.baolin.wang@linux.alibaba.com
Fixes: ac35a4902374 ("mm: multi-gen LRU: minimal implementation")
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Barry Song <baohua@kernel.org>
Reviewed-by: Kairui Song <kasong@tencent.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Baolin Wang and committed by Andrew Morton
13b6b620 e3e613a3

+16 -1
mm/vmscan.c
@@ -5027,8 +5027,23 @@
 	 * If too many file cache in the coldest generation can't be evicted
 	 * due to being dirty, wake up the flusher.
 	 */
-	if (sc->nr.unqueued_dirty && sc->nr.unqueued_dirty == sc->nr.file_taken)
+	if (sc->nr.unqueued_dirty && sc->nr.unqueued_dirty == sc->nr.file_taken) {
+		struct pglist_data *pgdat = lruvec_pgdat(lruvec);
+
 		wakeup_flusher_threads(WB_REASON_VMSCAN);
+
+		/*
+		 * For cgroupv1 dirty throttling is achieved by waking up
+		 * the kernel flusher here and later waiting on folios
+		 * which are in writeback to finish (see shrink_folio_list()).
+		 *
+		 * Flusher may not be able to issue writeback quickly
+		 * enough for cgroupv1 writeback throttling to work
+		 * on a large system.
+		 */
+		if (!writeback_throttling_sane(sc))
+			reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK);
+	}
 
 	/* whether this lruvec should be rotated */
 	return nr_to_scan < 0;
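
The decision this hunk adds can be modelled in userspace as a small pure function. This is only an illustrative sketch, not kernel code: struct reclaim_stats, enum reclaim_action and decide_action() are hypothetical names, and the throttling_sane flag stands in for writeback_throttling_sane(sc), assumed here to be false for cgroup v1 memcg reclaim.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two sc->nr counters tested in the hunk. */
struct reclaim_stats {
	unsigned long unqueued_dirty; /* dirty folios not yet queued for writeback */
	unsigned long file_taken;     /* file folios taken in this pass */
};

/* Hypothetical result type describing what the reclaim path would do. */
enum reclaim_action {
	ACTION_NONE,              /* condition not met, nothing extra happens */
	ACTION_WAKE_FLUSHER,      /* wake the flusher only */
	ACTION_WAKE_AND_THROTTLE  /* wake the flusher, then throttle reclaim */
};

/*
 * Mirrors the control flow of the patched condition:
 * - act only if every file folio taken was dirty and unqueued;
 * - always wake the flusher in that case;
 * - additionally throttle when writeback throttling is not "sane"
 *   (assumption: this models the cgroup v1 case).
 */
static enum reclaim_action decide_action(const struct reclaim_stats *st,
					 bool throttling_sane)
{
	if (!st->unqueued_dirty || st->unqueued_dirty != st->file_taken)
		return ACTION_NONE;

	if (throttling_sane)
		return ACTION_WAKE_FLUSHER;

	return ACTION_WAKE_AND_THROTTLE;
}
```

With counters of 32 unqueued dirty folios out of 32 taken, decide_action() yields ACTION_WAKE_AND_THROTTLE when throttling_sane is false and ACTION_WAKE_FLUSHER when it is true. The second result matches the pre-patch behaviour, where the flusher was woken but reclaim was never throttled.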