Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux
1
fork

Configure Feed

Select the types of activity you want to include in your feed.

mm/damon/reclaim: detect and use fresh enabled and kdamond_pid values

Patch series "mm/damon/modules: detect and use fresh status", v3.

DAMON modules including DAMON_RECLAIM, DAMON_LRU_SORT and DAMON_STAT
commonly expose the kdamond running status via their parameters. Under
certain scenarios including wrong user inputs and memory allocation
failures, those parameter values can be stale. It can confuse users. For
DAMON_RECLAIM and DAMON_LRU_SORT, it even makes the kdamond unable to be
restarted before the system reboot.

The problem comes from the fact that there are multiple events for the
status changes and it is difficult to follow up all the scenarios. Fix
the issue by detecting and using the status on demand, instead of using a
cached status that is difficult to be updated.

Patches 1-3 fix the bugs in DAMON_RECLAIM, DAMON_LRU_SORT and DAMON_STAT
in the order.


This patch (of 3):

DAMON_RECLAIM updates 'enabled' and 'kdamond_pid' parameter values, which
represents the running status of its kdamond, when the user explicitly
requests start/stop of the kdamond. The kdamond can, however, be stopped
in events other than the explicit user request in the following three
events.

1. ctx->regions_score_histogram allocation failure at beginning of the
execution,
2. damon_commit_ctx() failure due to invalid user input, and
3. damon_commit_ctx() failure due to its internal allocation failures.

Hence, if the kdamond is stopped by the above three events, the values of
the status parameters can be stale. Users could show the stale values and
be confused. This is already bad, but the real consequence is worse.
DAMON_RECLAIM avoids unnecessary damon_start() and damon_stop() calls
based on the 'enabled' parameter value. And the update of 'enabled'
parameter value depends on the damon_start() and damon_stop() call
results. Hence, once the kdamond has stopped by the unintentional events,
the user cannot restart the kdamond before the system reboot. For
example, the issue can be reproduced via below steps.

# cd /sys/module/damon_reclaim/parameters
#
# # start DAMON_RECLAIM
# echo Y > enabled
# ps -ef | grep kdamond
root 806 2 0 17:53 ? 00:00:00 [kdamond.0]
root 808 803 0 17:53 pts/4 00:00:00 grep kdamond
#
# # commit wrong input to stop kdamond withou explicit stop request
# echo 3 > addr_unit
# echo Y > commit_inputs
bash: echo: write error: Invalid argument
#
# # confirm kdamond is stopped
# ps -ef | grep kdamond
root 811 803 0 17:53 pts/4 00:00:00 grep kdamond
#
# # users casn now show stable status
# cat enabled
Y
# cat kdamond_pid
806
#
# # even after fixing the wrong parameter,
# # kdamond cannot be restarted.
# echo 1 > addr_unit
# echo Y > enabled
# ps -ef | grep kdamond
root 815 803 0 17:54 pts/4 00:00:00 grep kdamond

The problem will only rarely happen in real and common setups for the
following reasons. The allocation failures are unlikely in such setups
since those allocations are arguably too small to fail. Also sane users
on real production environments may not commit wrong input parameters.
But once it happens, the consequence is quite bad. And the bug is a bug.

The issue stems from the fact that there are multiple events that can
change the status, and following all the events is challenging.
Dynamically detect and use the fresh status for the parameters when those
are requested.

Link: https://lore.kernel.org/20260419161003.79176-1-sj@kernel.org
Link: https://lore.kernel.org/20260419161003.79176-2-sj@kernel.org
Fixes: e035c280f6df ("mm/damon/reclaim: support online inputs update")
Co-developed-by: Liew Rui Yan <aethernet65535@gmail.com>
Signed-off-by: Liew Rui Yan <aethernet65535@gmail.com>
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> # 5.19.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

authored by

SeongJae Park and committed by
Andrew Morton
64a140af a3907a31

+55 -30
+55 -30
mm/damon/reclaim.c
··· 144 144 static bool skip_anon __read_mostly; 145 145 module_param(skip_anon, bool, 0600); 146 146 147 - /* 148 - * PID of the DAMON thread 149 - * 150 - * If DAMON_RECLAIM is enabled, this becomes the PID of the worker thread. 151 - * Else, -1. 152 - */ 153 - static int kdamond_pid __read_mostly = -1; 154 - module_param(kdamond_pid, int, 0400); 155 - 156 147 static struct damos_stat damon_reclaim_stat; 157 148 DEFINE_DAMON_MODULES_DAMOS_STATS_PARAMS(damon_reclaim_stat, 158 149 reclaim_tried_regions, reclaimed_regions, quota_exceeds); ··· 279 288 { 280 289 int err; 281 290 282 - if (!on) { 283 - err = damon_stop(&ctx, 1); 284 - if (!err) 285 - kdamond_pid = -1; 286 - return err; 287 - } 291 + if (!on) 292 + return damon_stop(&ctx, 1); 288 293 289 294 err = damon_reclaim_apply_parameters(); 290 295 if (err) ··· 289 302 err = damon_start(&ctx, 1, true); 290 303 if (err) 291 304 return err; 292 - kdamond_pid = damon_kdamond_pid(ctx); 293 - if (kdamond_pid < 0) 294 - return kdamond_pid; 295 305 return damon_call(ctx, &call_control); 296 306 } 297 307 ··· 316 332 MODULE_PARM_DESC(addr_unit, 317 333 "Scale factor for DAMON_RECLAIM to ops address conversion (default: 1)"); 318 334 335 + static bool damon_reclaim_enabled(void) 336 + { 337 + if (!ctx) 338 + return false; 339 + return damon_is_running(ctx); 340 + } 341 + 319 342 static int damon_reclaim_enabled_store(const char *val, 320 343 const struct kernel_param *kp) 321 344 { 322 - bool is_enabled = enabled; 323 - bool enable; 324 345 int err; 325 346 326 - err = kstrtobool(val, &enable); 347 + err = kstrtobool(val, &enabled); 327 348 if (err) 328 349 return err; 329 350 330 - if (is_enabled == enable) 351 + if (damon_reclaim_enabled() == enabled) 331 352 return 0; 332 353 333 354 /* Called before init function. The function will handle this. */ 334 355 if (!damon_initialized()) 335 - goto set_param_out; 356 + return 0; 336 357 337 - err = damon_reclaim_turn(enable); 338 - if (err) 339 - return err; 358 + return damon_reclaim_turn(enabled); 359 + } 340 360 341 - set_param_out: 342 - enabled = enable; 343 - return err; 361 + static int damon_reclaim_enabled_load(char *buffer, 362 + const struct kernel_param *kp) 363 + { 364 + return sprintf(buffer, "%c\n", damon_reclaim_enabled() ? 'Y' : 'N'); 344 365 } 345 366 346 367 static const struct kernel_param_ops enabled_param_ops = { 347 368 .set = damon_reclaim_enabled_store, 348 - .get = param_get_bool, 369 + .get = damon_reclaim_enabled_load, 349 370 }; 350 371 351 372 module_param_cb(enabled, &enabled_param_ops, &enabled, 0600); 352 373 MODULE_PARM_DESC(enabled, 353 374 "Enable or disable DAMON_RECLAIM (default: disabled)"); 375 + 376 + static int damon_reclaim_kdamond_pid_store(const char *val, 377 + const struct kernel_param *kp) 378 + { 379 + /* 380 + * kdamond_pid is read-only, but kernel command line could write it. 381 + * Do nothing here. 382 + */ 383 + return 0; 384 + } 385 + 386 + static int damon_reclaim_kdamond_pid_load(char *buffer, 387 + const struct kernel_param *kp) 388 + { 389 + int kdamond_pid = -1; 390 + 391 + if (ctx) { 392 + kdamond_pid = damon_kdamond_pid(ctx); 393 + if (kdamond_pid < 0) 394 + kdamond_pid = -1; 395 + } 396 + return sprintf(buffer, "%d\n", kdamond_pid); 397 + } 398 + 399 + static const struct kernel_param_ops kdamond_pid_param_ops = { 400 + .set = damon_reclaim_kdamond_pid_store, 401 + .get = damon_reclaim_kdamond_pid_load, 402 + }; 403 + 404 + /* 405 + * PID of the DAMON thread 406 + * 407 + * If DAMON_RECLAIM is enabled, this becomes the PID of the worker thread. 408 + * Else, -1. 409 + */ 410 + module_param_cb(kdamond_pid, &kdamond_pid_param_ops, NULL, 0400); 354 411 355 412 static int __init damon_reclaim_init(void) 356 413 {