Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux
1
fork

Configure Feed

Select the types of activity you want to include in your feed.

drm/msm/gem: Prevent blocking within shrinker loop

Consider this scenario:

1. APP1 continuously creates lots of small GEMs
2. APP2 triggers `drop_caches`
3. Shrinker starts to evict APP1 GEMs, while APP1 produces new purgeable
GEMs
4. msm_gem_shrinker_scan() returns non-zero number of freed pages
and causes shrinker to try shrink more
5. msm_gem_shrinker_scan() returns non-zero number of freed pages again,
goto 4
6. The APP2 is blocked in `drop_caches` until APP1 stops producing
purgeable GEMs

To prevent this blocking scenario, check number of remaining pages
that GPU shrinker couldn't release due to a GEM locking contention
or shrinking rejection. If there are no remaining pages left to shrink,
then there is no need to free up more pages and shrinker may break out
from the loop.

This problem was found during shrinker/madvise IOCTL testing of
virtio-gpu driver. The MSM driver is affected in the same way.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: b352ba54a820 ("drm/msm/gem: Convert to using drm_gem_lru")
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Link: https://lore.kernel.org/all/20230108210445.3948344-2-dmitry.osipenko@collabora.com/

+19 -5
+7 -2
drivers/gpu/drm/drm_gem.c
··· 1375 1375 * 1376 1376 * @lru: The LRU to scan 1377 1377 * @nr_to_scan: The number of pages to try to reclaim 1378 + * @remaining: The number of pages left to reclaim, should be initialized by caller 1378 1379 * @shrink: Callback to try to shrink/reclaim the object. 1379 1380 */ 1380 1381 unsigned long 1381 - drm_gem_lru_scan(struct drm_gem_lru *lru, unsigned nr_to_scan, 1382 + drm_gem_lru_scan(struct drm_gem_lru *lru, 1383 + unsigned int nr_to_scan, 1384 + unsigned long *remaining, 1382 1385 bool (*shrink)(struct drm_gem_object *obj)) 1383 1386 { 1384 1387 struct drm_gem_lru still_in_lru; ··· 1420 1417 * hit shrinker in response to trying to get backing pages 1421 1418 * for this obj (ie. while it's lock is already held) 1422 1419 */ 1423 - if (!dma_resv_trylock(obj->resv)) 1420 + if (!dma_resv_trylock(obj->resv)) { 1421 + *remaining += obj->size >> PAGE_SHIFT; 1424 1422 goto tail; 1423 + } 1425 1424 1426 1425 if (shrink(obj)) { 1427 1426 freed += obj->size >> PAGE_SHIFT;
+9 -2
drivers/gpu/drm/msm/msm_gem_shrinker.c
··· 107 107 bool (*shrink)(struct drm_gem_object *obj); 108 108 bool cond; 109 109 unsigned long freed; 110 + unsigned long remaining; 110 111 } stages[] = { 111 112 /* Stages of progressively more aggressive/expensive reclaim: */ 112 113 { &priv->lru.dontneed, purge, true }, ··· 117 116 }; 118 117 long nr = sc->nr_to_scan; 119 118 unsigned long freed = 0; 119 + unsigned long remaining = 0; 120 120 121 121 for (unsigned i = 0; (nr > 0) && (i < ARRAY_SIZE(stages)); i++) { 122 122 if (!stages[i].cond) 123 123 continue; 124 124 stages[i].freed = 125 - drm_gem_lru_scan(stages[i].lru, nr, stages[i].shrink); 125 + drm_gem_lru_scan(stages[i].lru, nr, 126 + &stages[i].remaining, 127 + stages[i].shrink); 126 128 nr -= stages[i].freed; 127 129 freed += stages[i].freed; 130 + remaining += stages[i].remaining; 128 131 } 129 132 130 133 if (freed) { ··· 137 132 stages[3].freed); 138 133 } 139 134 140 - return (freed > 0) ? freed : SHRINK_STOP; 135 + return (freed > 0 && remaining > 0) ? freed : SHRINK_STOP; 141 136 } 142 137 143 138 #ifdef CONFIG_DEBUG_FS ··· 187 182 NULL, 188 183 }; 189 184 unsigned idx, unmapped = 0; 185 + unsigned long remaining = 0; 190 186 191 187 for (idx = 0; lrus[idx] && unmapped < vmap_shrink_limit; idx++) { 192 188 unmapped += drm_gem_lru_scan(lrus[idx], 193 189 vmap_shrink_limit - unmapped, 190 + &remaining, 194 191 vmap_shrink); 195 192 } 196 193
+3 -1
include/drm/drm_gem.h
··· 475 475 void drm_gem_lru_init(struct drm_gem_lru *lru, struct mutex *lock); 476 476 void drm_gem_lru_remove(struct drm_gem_object *obj); 477 477 void drm_gem_lru_move_tail(struct drm_gem_lru *lru, struct drm_gem_object *obj); 478 - unsigned long drm_gem_lru_scan(struct drm_gem_lru *lru, unsigned nr_to_scan, 478 + unsigned long drm_gem_lru_scan(struct drm_gem_lru *lru, 479 + unsigned int nr_to_scan, 480 + unsigned long *remaining, 479 481 bool (*shrink)(struct drm_gem_object *obj)); 480 482 481 483 #endif /* __DRM_GEM_H__ */