Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm: vmscan: Reduce throttling due to a failure to make progress

Mike Galbraith, Alexey Avramov and Darrick Wong all reported similar
problems due to reclaim throttling for excessive lengths of time. In
Alexey's case, a memory hog that should go OOM quickly stalls for
several minutes before reaching OOM. In Mike and Darrick's cases, a small
memcg environment stalled excessively even though the system had enough
memory overall.

Commit 69392a403f49 ("mm/vmscan: throttle reclaim when no progress is
being made") introduced the problem, and commit a19594ca4a8b
("mm/vmscan: increase the timeout if page reclaim is not making
progress") made it worse. Systems at or near an OOM state that cannot
be recovered must reach OOM quickly, and memcg should kill tasks if a
memcg is near OOM.

To address this, only consider stalling for the node of the first zone
in the zonelist, reduce the VMSCAN_THROTTLE_NOPROGRESS timeout to 1
tick, and only stall if reclaim has reached priority 1 with the scan
control's nr_reclaimed still 0, kswapd is still active and more than
half of the node's reclaimable pages are pending writeback. Neither
kswapd nor cgroup reclaim throttles on NOPROGRESS. If kswapd has
stopped reclaiming due to excessive failures (kswapd_failures has
reached MAX_RECLAIM_RETRIES), do not stall at all so that OOM triggers
relatively quickly. Similarly, if an LRU is merely congested, throttle
only lightly, in the same way as NOPROGRESS.

Alexey's original case was the most straightforward:

for i in {1..3}; do tail /dev/zero; done

On vanilla 5.16-rc1, this test stalled heavily; after the patch, the test
completes in a few seconds, similar to 5.15.

Alexey's second test case added watching a YouTube video while tail runs
10 times. On 5.15, playback only jitters slightly; on 5.16-rc1, it stalls
heavily, with many missing frames and numerous audio glitches. With this
patch applied, the video plays similarly to 5.15.

[lkp@intel.com: Fix W=1 build warning]

Link: https://lore.kernel.org/r/99e779783d6c7fce96448a3402061b9dc1b3b602.camel@gmx.de
Link: https://lore.kernel.org/r/20211124011954.7cab9bb4@mail.inbox.lv
Link: https://lore.kernel.org/r/20211022144651.19914-1-mgorman@techsingularity.net
Link: https://lore.kernel.org/r/20211202150614.22440-1-mgorman@techsingularity.net
Link: https://linux-regtracking.leemhuis.info/regzbot/regression/20211124011954.7cab9bb4@mail.inbox.lv/
Reported-and-tested-by: Alexey Avramov <hakavlad@inbox.lv>
Reported-and-tested-by: Mike Galbraith <efault@gmx.de>
Reported-and-tested-by: Darrick J. Wong <djwong@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Hugh Dickins <hughd@google.com>
Tracked-by: Thorsten Leemhuis <regressions@leemhuis.info>
Fixes: 69392a403f49 ("mm/vmscan: throttle reclaim when no progress is being made")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Authored by Mel Gorman and committed by Linus Torvalds.
Commit 1b4e3f26 (parent f87bcc88)

3 files changed, 59 insertions(+), 10 deletions(-)

include/linux/mmzone.h (+1)
···
 	VMSCAN_THROTTLE_WRITEBACK,
 	VMSCAN_THROTTLE_ISOLATED,
 	VMSCAN_THROTTLE_NOPROGRESS,
+	VMSCAN_THROTTLE_CONGESTED,
 	NR_VMSCAN_THROTTLE,
 };
include/trace/events/vmscan.h (+3 -1)
···
 #define _VMSCAN_THROTTLE_WRITEBACK (1 << VMSCAN_THROTTLE_WRITEBACK)
 #define _VMSCAN_THROTTLE_ISOLATED (1 << VMSCAN_THROTTLE_ISOLATED)
 #define _VMSCAN_THROTTLE_NOPROGRESS (1 << VMSCAN_THROTTLE_NOPROGRESS)
+#define _VMSCAN_THROTTLE_CONGESTED (1 << VMSCAN_THROTTLE_CONGESTED)
 
 #define show_throttle_flags(flags) \
 	(flags) ? __print_flags(flags, "|", \
 		{_VMSCAN_THROTTLE_WRITEBACK, "VMSCAN_THROTTLE_WRITEBACK"}, \
 		{_VMSCAN_THROTTLE_ISOLATED, "VMSCAN_THROTTLE_ISOLATED"}, \
-		{_VMSCAN_THROTTLE_NOPROGRESS, "VMSCAN_THROTTLE_NOPROGRESS"} \
+		{_VMSCAN_THROTTLE_NOPROGRESS, "VMSCAN_THROTTLE_NOPROGRESS"}, \
+		{_VMSCAN_THROTTLE_CONGESTED, "VMSCAN_THROTTLE_CONGESTED"} \
 		) : "VMSCAN_THROTTLE_NONE"
mm/vmscan.c (+55 -9)
···
 	unlock_page(page);
 }
 
+static bool skip_throttle_noprogress(pg_data_t *pgdat)
+{
+	int reclaimable = 0, write_pending = 0;
+	int i;
+
+	/*
+	 * If kswapd is disabled, reschedule if necessary but do not
+	 * throttle as the system is likely near OOM.
+	 */
+	if (pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES)
+		return true;
+
+	/*
+	 * If there are a lot of dirty/writeback pages then do not
+	 * throttle as throttling will occur when the pages cycle
+	 * towards the end of the LRU if still under writeback.
+	 */
+	for (i = 0; i < MAX_NR_ZONES; i++) {
+		struct zone *zone = pgdat->node_zones + i;
+
+		if (!populated_zone(zone))
+			continue;
+
+		reclaimable += zone_reclaimable_pages(zone);
+		write_pending += zone_page_state_snapshot(zone,
+						  NR_ZONE_WRITE_PENDING);
+	}
+	if (2 * write_pending <= reclaimable)
+		return true;
+
+	return false;
+}
+
 void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason)
 {
 	wait_queue_head_t *wqh = &pgdat->reclaim_wait[reason];
···
 		}
 
 		break;
+	case VMSCAN_THROTTLE_CONGESTED:
+		fallthrough;
 	case VMSCAN_THROTTLE_NOPROGRESS:
-		timeout = HZ/2;
+		if (skip_throttle_noprogress(pgdat)) {
+			cond_resched();
+			return;
+		}
+
+		timeout = 1;
+
 		break;
 	case VMSCAN_THROTTLE_ISOLATED:
 		timeout = HZ/50;
···
 	if (!current_is_kswapd() && current_may_throttle() &&
 	    !sc->hibernation_mode &&
 	    test_bit(LRUVEC_CONGESTED, &target_lruvec->flags))
-		reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK);
+		reclaim_throttle(pgdat, VMSCAN_THROTTLE_CONGESTED);
 
 	if (should_continue_reclaim(pgdat, sc->nr_reclaimed - nr_reclaimed,
 				    sc))
···
 	}
 
 	/*
-	 * Do not throttle kswapd on NOPROGRESS as it will throttle on
-	 * VMSCAN_THROTTLE_WRITEBACK if there are too many pages under
-	 * writeback and marked for immediate reclaim at the tail of
-	 * the LRU.
+	 * Do not throttle kswapd or cgroup reclaim on NOPROGRESS as it will
+	 * throttle on VMSCAN_THROTTLE_WRITEBACK if there are too many pages
+	 * under writeback and marked for immediate reclaim at the tail of the
+	 * LRU.
 	 */
-	if (current_is_kswapd())
+	if (current_is_kswapd() || cgroup_reclaim(sc))
 		return;
 
 	/* Throttle if making no progress at high prioities. */
-	if (sc->priority < DEF_PRIORITY - 2)
+	if (sc->priority == 1 && !sc->nr_reclaimed)
 		reclaim_throttle(pgdat, VMSCAN_THROTTLE_NOPROGRESS);
 }
 
···
 	unsigned long nr_soft_scanned;
 	gfp_t orig_mask;
 	pg_data_t *last_pgdat = NULL;
+	pg_data_t *first_pgdat = NULL;
 
 	/*
 	 * If the number of buffer_heads in the machine exceeds the maximum
···
 			/* need some check for avoid more shrink_zone() */
 		}
 
+		if (!first_pgdat)
+			first_pgdat = zone->zone_pgdat;
+
 		/* See comment about same check for global reclaim above */
 		if (zone->zone_pgdat == last_pgdat)
 			continue;
 		last_pgdat = zone->zone_pgdat;
 		shrink_node(zone->zone_pgdat, sc);
-		consider_reclaim_throttle(zone->zone_pgdat, sc);
 	}
+
+	consider_reclaim_throttle(first_pgdat, sc);
 
 	/*
 	 * Restore to original mask to avoid the impact on the caller if we