Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'pm-cpuidle'

Merge cpuidle updates for 6.20-rc1/7.0-rc1:

- Add a command line option to adjust the C-states table in the
intel_idle driver, remove the 'preferred_cstates' module parameter
from it, add C-states validation to it and clean it up (Artem
Bityutskiy)

- Make the menu cpuidle governor always check the time till the closest
timer event when the scheduler tick has been stopped to prevent it
from mistakenly selecting the deepest available idle state (Rafael
Wysocki)

- Update the teo cpuidle governor to avoid making suboptimal decisions
in certain corner cases and generally improve idle state selection
accuracy (Rafael Wysocki)

- Remove an unlikely() annotation on the early-return condition in
menu_select() that leads to branch misprediction 100% of the time
on systems with only 1 idle state enabled, like ARM64 servers (Breno
Leitao)

- Add Christian Loehle to MAINTAINERS as a cpuidle reviewer (Christian
Loehle)

* pm-cpuidle:
cpuidle: governors: teo: Refine intercepts-based idle state lookup
cpuidle: governors: teo: Adjust the classification of wakeup events
cpuidle: governors: teo: Refine tick_intercepts vs total events check
cpuidle: governors: teo: Avoid fake intercepts produced by tick
cpuidle: governors: teo: Avoid selecting states with zero-size bins
cpuidle: governors: menu: Always check timers with tick stopped
MAINTAINERS: Add myself as cpuidle reviewer
cpuidle: menu: Remove incorrect unlikely() annotation
intel_idle: Add C-states validation
intel_idle: Add cmdline option to adjust C-states table
intel_idle: Initialize sysfs after cpuidle driver initialization
intel_idle: Remove the 'preferred_cstates' parameter
intel_idle: Remove unused driver version constant

+317 -74
+1
MAINTAINERS
···
 CPU IDLE TIME MANAGEMENT FRAMEWORK
 M:	"Rafael J. Wysocki" <rafael@kernel.org>
 M:	Daniel Lezcano <daniel.lezcano@linaro.org>
+R:	Christian Loehle <christian.loehle@arm.com>
 L:	linux-pm@vger.kernel.org
 S:	Maintained
 B:	https://bugzilla.kernel.org
+12 -12
drivers/cpuidle/governors/menu.c
···
 
 	/* Find the shortest expected idle interval. */
 	predicted_ns = get_typical_interval(data) * NSEC_PER_USEC;
-	if (predicted_ns > RESIDENCY_THRESHOLD_NS) {
+	if (predicted_ns > RESIDENCY_THRESHOLD_NS || tick_nohz_tick_stopped()) {
 		unsigned int timer_us;
 
 		/* Determine the time till the closest timer. */
···
 				RESOLUTION * DECAY * NSEC_PER_USEC);
 		/* Use the lowest expected idle interval to pick the idle state. */
 		predicted_ns = min((u64)timer_us * NSEC_PER_USEC, predicted_ns);
+		/*
+		 * If the tick is already stopped, the cost of possible short
+		 * idle duration misprediction is much higher, because the CPU
+		 * may be stuck in a shallow idle state for a long time as a
+		 * result of it.  In that case, say we might mispredict and use
+		 * the known time till the closest timer event for the idle
+		 * state selection.
+		 */
+		if (tick_nohz_tick_stopped() && predicted_ns < TICK_NSEC)
+			predicted_ns = data->next_timer_ns;
 	} else {
 		/*
 		 * Because the next timer event is not going to be determined
···
 		data->bucket = BUCKETS - 1;
 	}
 
-	if (unlikely(drv->state_count <= 1 || latency_req == 0) ||
+	if (drv->state_count <= 1 || latency_req == 0 ||
 	    ((data->next_timer_ns < drv->states[1].target_residency_ns ||
 	      latency_req < drv->states[1].exit_latency_ns) &&
 	     !dev->states_usage[0].disable)) {
···
 		*stop_tick = !(drv->states[0].flags & CPUIDLE_FLAG_POLLING);
 		return 0;
 	}
-
-	/*
-	 * If the tick is already stopped, the cost of possible short idle
-	 * duration misprediction is much higher, because the CPU may be stuck
-	 * in a shallow idle state for a long time as a result of it.  In that
-	 * case, say we might mispredict and use the known time till the closest
-	 * timer event for the idle state selection.
-	 */
-	if (tick_nohz_tick_stopped() && predicted_ns < TICK_NSEC)
-		predicted_ns = data->next_timer_ns;
 
 	/*
 	 * Find the idle state with the lowest power while satisfying
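The effect of the menu.c change on the predicted idle duration can be illustrated with a small userspace sketch. It is not the kernel code: the function name, the two constants and their values are assumptions made for illustration, and the timer correction factor that menu_select() applies before taking the minimum is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel constants; the values are assumed. */
#define SKETCH_TICK_NSEC               4000000ULL /* tick period, assuming HZ=250 */
#define SKETCH_RESIDENCY_THRESHOLD_NS  15000ULL   /* assumed threshold */

/*
 * Simplified model of the prediction step after this merge.
 * typical_ns:    interval predicted from recent idle history
 * next_timer_ns: known time till the closest timer event
 * tick_stopped:  whether the scheduler tick is already stopped
 */
uint64_t sketch_menu_predict(uint64_t typical_ns, uint64_t next_timer_ns,
			     bool tick_stopped)
{
	uint64_t predicted_ns = typical_ns;

	/* The timer is now always consulted when the tick is stopped. */
	if (predicted_ns > SKETCH_RESIDENCY_THRESHOLD_NS || tick_stopped) {
		if (next_timer_ns < predicted_ns)
			predicted_ns = next_timer_ns;

		/*
		 * With the tick stopped, a short prediction is not trusted:
		 * fall back to the known time till the closest timer.
		 */
		if (tick_stopped && predicted_ns < SKETCH_TICK_NSEC)
			predicted_ns = next_timer_ns;
	}

	return predicted_ns;
}
```

In this model, a short history-based prediction with the tick stopped is replaced by the timer distance, while the same prediction with the tick running is returned unchanged.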
+79 -19
drivers/cpuidle/governors/teo.c
···
  * in accordance with what happened last time.
  *
  * The "hits" metric reflects the relative frequency of situations in which the
- * sleep length and the idle duration measured after CPU wakeup fall into the
- * same bin (that is, the CPU appears to wake up "on time" relative to the sleep
- * length). In turn, the "intercepts" metric reflects the relative frequency of
- * non-timer wakeup events for which the measured idle duration falls into a bin
- * that corresponds to an idle state shallower than the one whose bin is fallen
- * into by the sleep length (these events are also referred to as "intercepts"
+ * sleep length and the idle duration measured after CPU wakeup are close enough
+ * (that is, the CPU appears to wake up "on time" relative to the sleep length).
+ * In turn, the "intercepts" metric reflects the relative frequency of non-timer
+ * wakeup events for which the measured idle duration is significantly different
+ * from the sleep length (these events are also referred to as "intercepts"
  * below).
  *
  * The governor also counts "intercepts" with the measured idle duration below
···
  *    than the candidate one (it represents the cases in which the CPU was
  *    likely woken up by a non-timer wakeup source).
  *
+ *    Also find the idle state with the maximum intercepts metric (if there are
+ *    multiple states with the maximum intercepts metric, choose the one with
+ *    the highest index).
+ *
  * 2. If the second sum computed in step 1 is greater than a half of the sum of
  *    both metrics for the candidate state bin and all subsequent bins (if any),
  *    a shallower idle state is likely to be more suitable, so look for it.
  *
  *    - Traverse the enabled idle states shallower than the candidate one in the
- *      descending order.
+ *      descending order, starting at the state with the maximum intercepts
+ *      metric found in step 1.
  *
  *    - For each of them compute the sum of the "intercepts" metrics over all
  *      of the idle states between it and the candidate one (including the
···
  */
 static void teo_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 {
+	s64 lat_ns = drv->states[dev->last_state_idx].exit_latency_ns;
 	struct teo_cpu *cpu_data = this_cpu_ptr(&teo_cpus);
 	int i, idx_timer = 0, idx_duration = 0;
 	s64 target_residency_ns, measured_ns;
···
 		 */
 		measured_ns = S64_MAX;
 	} else {
-		s64 lat_ns = drv->states[dev->last_state_idx].exit_latency_ns;
-
 		measured_ns = dev->last_residency_ns;
 		/*
 		 * The delay between the wakeup and the first instruction
···
 			cpu_data->state_bins[drv->state_count-1].hits += PULSE;
 			return;
 		}
+		/*
+		 * If intercepts within the tick period range are not frequent
+		 * enough, count this wakeup as a hit, since it is likely that
+		 * the tick has woken up the CPU because an expected intercept
+		 * was not there.  Otherwise, one of the intercepts may have
+		 * been incidentally preceded by the tick wakeup.
+		 */
+		if (3 * cpu_data->tick_intercepts < 2 * total) {
+			cpu_data->state_bins[idx_timer].hits += PULSE;
+			return;
+		}
 	}
 
 	/*
-	 * If the measured idle duration falls into the same bin as the sleep
-	 * length, this is a "hit", so update the "hits" metric for that bin.
+	 * If the measured idle duration (adjusted for the entered state exit
+	 * latency) falls into the same bin as the sleep length and the latter
+	 * is less than the "raw" measured idle duration (so the wakeup appears
+	 * to have occurred after the anticipated timer event), this is a "hit",
+	 * so update the "hits" metric for that bin.
+	 *
 	 * Otherwise, update the "intercepts" metric for the bin fallen into by
 	 * the measured idle duration.
 	 */
-	if (idx_timer == idx_duration) {
+	if (idx_timer == idx_duration &&
+	    cpu_data->sleep_length_ns - measured_ns < lat_ns / 2) {
 		cpu_data->state_bins[idx_timer].hits += PULSE;
 	} else {
 		cpu_data->state_bins[idx_duration].intercepts += PULSE;
···
 	ktime_t delta_tick = TICK_NSEC / 2;
 	unsigned int idx_intercept_sum = 0;
 	unsigned int intercept_sum = 0;
+	unsigned int intercept_max = 0;
 	unsigned int idx_hit_sum = 0;
 	unsigned int hit_sum = 0;
+	int intercept_max_idx = -1;
 	int constraint_idx = 0;
 	int idx0 = 0, idx = -1;
 	s64 duration_ns;
···
 	if (!dev->states_usage[0].disable)
 		idx = 0;
 
-	/* Compute the sums of metrics for early wakeup pattern detection. */
+	/*
+	 * Compute the sums of metrics for early wakeup pattern detection and
+	 * look for the state bin with the maximum intercepts metric below the
+	 * deepest enabled one (if there are multiple states with the maximum
+	 * intercepts metric, choose the one with the highest index).
+	 */
 	for (i = 1; i < drv->state_count; i++) {
 		struct teo_bin *prev_bin = &cpu_data->state_bins[i-1];
+		unsigned int prev_intercepts = prev_bin->intercepts;
 		struct cpuidle_state *s = &drv->states[i];
 
 		/*
 		 * Update the sums of idle state metrics for all of the states
 		 * shallower than the current one.
 		 */
-		intercept_sum += prev_bin->intercepts;
 		hit_sum += prev_bin->hits;
+		intercept_sum += prev_intercepts;
+		/*
+		 * Check if this is the bin with the maximum number of
+		 * intercepts so far and in that case update the index of
+		 * the state with the maximum intercepts metric.
+		 */
+		if (prev_intercepts >= intercept_max) {
+			intercept_max = prev_intercepts;
+			intercept_max_idx = i - 1;
+		}
 
 		if (dev->states_usage[i].disable)
 			continue;
···
 			while (min_idx < idx &&
 			       drv->states[min_idx].target_residency_ns < TICK_NSEC)
 				min_idx++;
+
+			/*
+			 * Avoid selecting a state with a lower index, but with
+			 * the same target residency as the current candidate
+			 * one.
+			 */
+			if (drv->states[min_idx].target_residency_ns ==
+			    drv->states[idx].target_residency_ns)
+				goto constraint;
 		}
 
 		/*
-		 * Look for the deepest idle state whose target residency had
-		 * not exceeded the idle duration in over a half of the relevant
-		 * cases in the past.
+		 * If the minimum state index is greater than or equal to the
+		 * index of the state with the maximum intercepts metric and
+		 * the corresponding state is enabled, there is no need to look
+		 * at the deeper states.
+		 */
+		if (min_idx >= intercept_max_idx &&
+		    !dev->states_usage[min_idx].disable) {
+			idx = min_idx;
+			goto constraint;
+		}
+
+		/*
+		 * Look for the deepest enabled idle state, at most as deep as
+		 * the one with the maximum intercepts metric, whose target
+		 * residency had not been greater than the idle duration in over
+		 * a half of the relevant cases in the past.
 		 *
 		 * Take the possible duration limitation present if the tick
 		 * has been stopped already into account.
···
 				continue;
 
 			idx = i;
-			if (2 * intercept_sum > idx_intercept_sum)
+			if (2 * intercept_sum > idx_intercept_sum &&
+			    i <= intercept_max_idx)
 				break;
 		}
 	}
 
+constraint:
 	/*
 	 * If there is a latency constraint, it may be necessary to select an
 	 * idle state shallower than the current candidate one.
···
 	 * total wakeup events, do not stop the tick.
 	 */
 	if (drv->states[idx].target_residency_ns < TICK_NSEC &&
-	    cpu_data->tick_intercepts > cpu_data->total / 2 + cpu_data->total / 8)
+	    3 * cpu_data->tick_intercepts >= 2 * cpu_data->total)
 		duration_ns = TICK_NSEC / 2;
 
 end:
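Three of the teo.c changes are small arithmetic rules, which the following userspace sketch isolates: the stricter "hit" test, the tick-intercepts threshold moving from 5/8 to 2/3 of total events, and the maximum-intercepts lookup whose `>=` comparison makes ties resolve to the highest index. All function names are hypothetical, the bins are modelled as a plain array, and the arithmetic is widened to 64 bits in the threshold checks as a precaution that the kernel code does not take.

```c
#include <stdbool.h>
#include <stdint.h>

/* New "hit" test: same bin, and the wakeup is not too early vs. the timer. */
bool sketch_teo_is_hit(int idx_timer, int idx_duration,
		       int64_t sleep_length_ns, int64_t measured_ns,
		       int64_t lat_ns)
{
	return idx_timer == idx_duration &&
	       sleep_length_ns - measured_ns < lat_ns / 2;
}

/* Old threshold: tick intercepts above 5/8 of all events. */
bool sketch_tick_dominates_old(unsigned int tick_intercepts, unsigned int total)
{
	return tick_intercepts > total / 2 + total / 8;
}

/* New threshold: tick intercepts at least 2/3 of all events. */
bool sketch_tick_dominates_new(unsigned int tick_intercepts, unsigned int total)
{
	return 3ULL * tick_intercepts >= 2ULL * total;
}

/*
 * Index of the bin with the maximum intercepts metric among n bins.
 * Because the comparison is ">=", the highest index wins on ties.
 * Returns -1 when n is 0.
 */
int sketch_intercept_max_idx(const unsigned int *intercepts, int n)
{
	unsigned int intercept_max = 0;
	int intercept_max_idx = -1;
	int i;

	for (i = 0; i < n; i++) {
		if (intercepts[i] >= intercept_max) {
			intercept_max = intercepts[i];
			intercept_max_idx = i;
		}
	}

	return intercept_max_idx;
}
```

With 1024 total events and 650 tick intercepts, for example, the old check passes (650 > 640) and the new one does not (1950 < 2048), so the new rule keeps the tick running in fewer cases.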
+225 -43
drivers/idle/intel_idle.c
···
 #include <linux/kernel.h>
 #include <linux/cpuidle.h>
 #include <linux/tick.h>
+#include <linux/time64.h>
 #include <trace/events/power.h>
 #include <linux/sched.h>
 #include <linux/sched/smt.h>
···
 #include <asm/fpu/api.h>
 #include <asm/smp.h>
 
-#define INTEL_IDLE_VERSION "0.5.1"
-
 static struct cpuidle_driver intel_idle_driver = {
 	.name = "intel_idle",
 	.owner = THIS_MODULE,
···
 /* intel_idle.max_cstate=0 disables driver */
 static int max_cstate = CPUIDLE_STATE_MAX - 1;
 static unsigned int disabled_states_mask __read_mostly;
-static unsigned int preferred_states_mask __read_mostly;
 static bool force_irq_on __read_mostly;
 static bool ibrs_off __read_mostly;
+
+/* The maximum allowed length for the 'table' module parameter */
+#define MAX_CMDLINE_TABLE_LEN 256
+/* Maximum allowed C-state latency */
+#define MAX_CMDLINE_LATENCY_US (5 * USEC_PER_MSEC)
+/* Maximum allowed C-state target residency */
+#define MAX_CMDLINE_RESIDENCY_US (100 * USEC_PER_MSEC)
+
+static char cmdline_table_str[MAX_CMDLINE_TABLE_LEN] __read_mostly;
 
 static struct cpuidle_device __percpu *intel_idle_cpuidle_devices;
 
···
 
 static const struct idle_cpu *icpu __initdata;
 static struct cpuidle_state *cpuidle_state_table __initdata;
+
+/* C-states data from the 'intel_idle.table' cmdline parameter */
+static struct cpuidle_state cmdline_states[CPUIDLE_STATE_MAX] __initdata;
 
 static unsigned int mwait_substates __initdata;
 
···
 }
 
 /**
- * adl_idle_state_table_update - Adjust AlderLake idle states table.
- */
-static void __init adl_idle_state_table_update(void)
-{
-	/* Check if user prefers C1 over C1E. */
-	if (preferred_states_mask & BIT(1) && !(preferred_states_mask & BIT(2))) {
-		cpuidle_state_table[0].flags &= ~CPUIDLE_FLAG_UNUSABLE;
-		cpuidle_state_table[1].flags |= CPUIDLE_FLAG_UNUSABLE;
-
-		/* Disable C1E by clearing the "C1E promotion" bit. */
-		c1e_promotion = C1E_PROMOTION_DISABLE;
-		return;
-	}
-
-	/* Make sure C1E is enabled by default */
-	c1e_promotion = C1E_PROMOTION_ENABLE;
-}
-
-/**
  * spr_idle_state_table_update - Adjust Sapphire Rapids idle states table.
  */
 static void __init spr_idle_state_table_update(void)
···
 	case INTEL_SAPPHIRERAPIDS_X:
 	case INTEL_EMERALDRAPIDS_X:
 		spr_idle_state_table_update();
-		break;
-	case INTEL_ALDERLAKE:
-	case INTEL_ALDERLAKE_L:
-	case INTEL_ATOM_GRACEMONT:
-		adl_idle_state_table_update();
 		break;
 	case INTEL_ATOM_SILVERMONT:
 	case INTEL_ATOM_AIRMONT:
···
 	put_device(sysfs_root);
 }
 
+/**
+ * get_cmdline_field - Get the current field from a cmdline string.
+ * @args: The cmdline string to get the current field from.
+ * @field: Pointer to the current field upon return.
+ * @sep: The fields separator character.
+ *
+ * Examples:
+ *   Input: args="C1:1:1,C1E:2:10", sep=':'
+ *   Output: field="C1", return "1:1,C1E:2:10"
+ *   Input: args="C1:1:1,C1E:2:10", sep=','
+ *   Output: field="C1:1:1", return "C1E:2:10"
+ *   Ipnut: args="::", sep=':'
+ *   Output: field="", return ":"
+ *
+ * Return: The continuation of the cmdline string after the field or NULL.
+ */
+static char *get_cmdline_field(char *args, char **field, char sep)
+{
+	unsigned int i;
+
+	for (i = 0; args[i] && !isspace(args[i]); i++) {
+		if (args[i] == sep)
+			break;
+	}
+
+	*field = args;
+
+	if (args[i] != sep)
+		return NULL;
+
+	args[i] = '\0';
+	return args + i + 1;
+}
+
+/**
+ * validate_cmdline_cstate - Validate a C-state from cmdline.
+ * @state: The C-state to validate.
+ * @prev_state: The previous C-state in the table or NULL.
+ *
+ * Return: 0 if the C-state is valid or -EINVAL otherwise.
+ */
+static int validate_cmdline_cstate(struct cpuidle_state *state,
+				   struct cpuidle_state *prev_state)
+{
+	if (state->exit_latency == 0)
+		/* Exit latency 0 can only be used for the POLL state */
+		return -EINVAL;
+
+	if (state->exit_latency > MAX_CMDLINE_LATENCY_US)
+		return -EINVAL;
+
+	if (state->target_residency > MAX_CMDLINE_RESIDENCY_US)
+		return -EINVAL;
+
+	if (state->target_residency < state->exit_latency)
+		return -EINVAL;
+
+	if (!prev_state)
+		return 0;
+
+	if (state->exit_latency <= prev_state->exit_latency)
+		return -EINVAL;
+
+	if (state->target_residency <= prev_state->target_residency)
+		return -EINVAL;
+
+	return 0;
+}
+
+/**
+ * cmdline_table_adjust - Adjust the C-states table with data from cmdline.
+ * @drv: cpuidle driver (assumed to point to intel_idle_driver).
+ *
+ * Adjust the C-states table with data from the 'intel_idle.table' module
+ * parameter (if specified).
+ */
+static void __init cmdline_table_adjust(struct cpuidle_driver *drv)
+{
+	char *args = cmdline_table_str;
+	struct cpuidle_state *state;
+	int i;
+
+	if (args[0] == '\0')
+		/* The 'intel_idle.table' module parameter was not specified */
+		return;
+
+	/* Create a copy of the C-states table */
+	for (i = 0; i < drv->state_count; i++)
+		cmdline_states[i] = drv->states[i];
+
+	/*
+	 * Adjust the C-states table copy with data from the 'intel_idle.table'
+	 * module parameter.
+	 */
+	while (args) {
+		char *fields, *name, *val;
+
+		/*
+		 * Get the next C-state definition, which is expected to be
+		 * '<name>:<latency_us>:<target_residency_us>'. Treat "empty"
+		 * fields as unchanged. For example,
+		 * '<name>::<target_residency_us>' leaves the latency unchanged.
+		 */
+		args = get_cmdline_field(args, &fields, ',');
+
+		/* name */
+		fields = get_cmdline_field(fields, &name, ':');
+		if (!fields)
+			goto error;
+
+		if (!strcmp(name, "POLL")) {
+			pr_err("Cannot adjust POLL\n");
+			continue;
+		}
+
+		/* Find the C-state by its name */
+		state = NULL;
+		for (i = 0; i < drv->state_count; i++) {
+			if (!strcmp(name, drv->states[i].name)) {
+				state = &cmdline_states[i];
+				break;
+			}
+		}
+
+		if (!state) {
+			pr_err("C-state '%s' was not found\n", name);
+			continue;
+		}
+
+		/* Latency */
+		fields = get_cmdline_field(fields, &val, ':');
+		if (!fields)
+			goto error;
+
+		if (*val) {
+			if (kstrtouint(val, 0, &state->exit_latency))
+				goto error;
+		}
+
+		/* Target residency */
+		fields = get_cmdline_field(fields, &val, ':');
+
+		if (*val) {
+			if (kstrtouint(val, 0, &state->target_residency))
+				goto error;
+		}
+
+		/*
+		 * Allow for 3 more fields, but ignore them. Helps to make
+		 * possible future extensions of the cmdline format backward
+		 * compatible.
+		 */
+		for (i = 0; fields && i < 3; i++) {
+			fields = get_cmdline_field(fields, &val, ':');
+			if (!fields)
+				break;
+		}
+
+		if (fields) {
+			pr_err("Too many fields for C-state '%s'\n", state->name);
+			goto error;
+		}
+
+		pr_info("C-state from cmdline: name=%s, latency=%u, residency=%u\n",
+			state->name, state->exit_latency, state->target_residency);
+	}
+
+	/* Validate the adjusted C-states, start with index 1 to skip POLL */
+	for (i = 1; i < drv->state_count; i++) {
+		struct cpuidle_state *prev_state;
+
+		state = &cmdline_states[i];
+		prev_state = &cmdline_states[i - 1];
+
+		if (validate_cmdline_cstate(state, prev_state)) {
+			pr_err("C-state '%s' validation failed\n", state->name);
+			goto error;
+		}
+	}
+
+	/* Copy the adjusted C-states table back */
+	for (i = 1; i < drv->state_count; i++)
+		drv->states[i] = cmdline_states[i];
+
+	pr_info("Adjusted C-states with data from 'intel_idle.table'\n");
+	return;
+
+error:
+	pr_info("Failed to adjust C-states with data from 'intel_idle.table'\n");
+}
+
 static int __init intel_idle_init(void)
 {
 	const struct x86_cpu_id *id;
···
 		return -ENODEV;
 	}
 
-	pr_debug("v" INTEL_IDLE_VERSION " model 0x%X\n",
-		 boot_cpu_data.x86_model);
-
 	intel_idle_cpuidle_devices = alloc_percpu(struct cpuidle_device);
 	if (!intel_idle_cpuidle_devices)
 		return -ENOMEM;
 
+	intel_idle_cpuidle_driver_init(&intel_idle_driver);
+	cmdline_table_adjust(&intel_idle_driver);
+
 	retval = intel_idle_sysfs_init();
 	if (retval)
 		pr_warn("failed to initialized sysfs");
-
-	intel_idle_cpuidle_driver_init(&intel_idle_driver);
 
 	retval = cpuidle_register_driver(&intel_idle_driver);
 	if (retval) {
···
 module_param_named(states_off, disabled_states_mask, uint, 0444);
 MODULE_PARM_DESC(states_off, "Mask of disabled idle states");
 /*
- * Some platforms come with mutually exclusive C-states, so that if one is
- * enabled, the other C-states must not be used. Example: C1 and C1E on
- * Sapphire Rapids platform. This parameter allows for selecting the
- * preferred C-states among the groups of mutually exclusive C-states - the
- * selected C-states will be registered, the other C-states from the mutually
- * exclusive group won't be registered. If the platform has no mutually
- * exclusive C-states, this parameter has no effect.
- */
-module_param_named(preferred_cstates, preferred_states_mask, uint, 0444);
-MODULE_PARM_DESC(preferred_cstates, "Mask of preferred idle states");
-/*
  * Debugging option that forces the driver to enter all C-states with
  * interrupts enabled. Does not apply to C-states with
  * 'CPUIDLE_FLAG_INIT_XSTATE' and 'CPUIDLE_FLAG_IBRS' flags.
···
  */
 module_param(ibrs_off, bool, 0444);
 MODULE_PARM_DESC(ibrs_off, "Disable IBRS when idle");
+
+/*
+ * Define the C-states table from a user input string. Expected format is
+ * 'name:latency:residency', where:
+ *   - name: The C-state name.
+ *   - latency: The C-state exit latency in us.
+ *   - residency: The C-state target residency in us.
+ *
+ * Multiple C-states can be defined by separating them with commas:
+ * 'name1:latency1:residency1,name2:latency2:residency2'
+ *
+ * Example: intel_idle.table=C1:1:1,C1E:5:10,C6:100:600
+ *
+ * To leave latency or residency unchanged, use an empty field, for example:
+ * 'C1:1:1,C1E::10' - leaves C1E latency unchanged.
+ */
+module_param_string(table, cmdline_table_str, MAX_CMDLINE_TABLE_LEN, 0444);
+MODULE_PARM_DESC(table, "Build the C-states table from a user input string");
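To experiment with the 'intel_idle.table' format outside the kernel, the field splitter and the validation rules from the diff above can be approximated in userspace. The sketch below is an adaptation, not the driver code: the `sketch_` names and the two-field struct are hypothetical, the limits are written out as plain microsecond values (5 ms and 100 ms, as in the diff), and `isspace()` gets an `unsigned char` cast, which is the safe form in hosted C.

```c
#include <ctype.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_MAX_LATENCY_US    5000u   /* 5 ms, as in the diff */
#define SKETCH_MAX_RESIDENCY_US  100000u /* 100 ms, as in the diff */

/* Hypothetical reduced stand-in for struct cpuidle_state. */
struct sketch_cstate {
	unsigned int exit_latency;     /* us */
	unsigned int target_residency; /* us */
};

/*
 * Split off the current field at 'sep'. The buffer is modified in place.
 * Returns the remainder after the separator, or NULL if the field was the
 * last one (no separator before the end of the string or whitespace).
 */
char *sketch_get_field(char *args, char **field, char sep)
{
	size_t i;

	for (i = 0; args[i] && !isspace((unsigned char)args[i]); i++) {
		if (args[i] == sep)
			break;
	}

	*field = args;

	if (args[i] != sep)
		return NULL;

	args[i] = '\0';
	return args + i + 1;
}

/* Same ordering and bounds rules as validate_cmdline_cstate() above. */
int sketch_validate(const struct sketch_cstate *state,
		    const struct sketch_cstate *prev_state)
{
	if (state->exit_latency == 0)
		return -EINVAL;
	if (state->exit_latency > SKETCH_MAX_LATENCY_US)
		return -EINVAL;
	if (state->target_residency > SKETCH_MAX_RESIDENCY_US)
		return -EINVAL;
	if (state->target_residency < state->exit_latency)
		return -EINVAL;
	if (!prev_state)
		return 0;
	/* Each deeper state must be strictly costlier than the previous one. */
	if (state->exit_latency <= prev_state->exit_latency)
		return -EINVAL;
	if (state->target_residency <= prev_state->target_residency)
		return -EINVAL;
	return 0;
}
```

Splitting "C1:1:1,C1E:2:10" first on ',' and then on ':' reproduces the examples given in the kernel-doc comment of get_cmdline_field(); the input must live in a writable buffer, because separators are overwritten with NUL bytes.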