Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'pm-5.7-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more power management updates from Rafael Wysocki:
"Additional power management updates.

These fix a corner-case suspend-to-idle wakeup issue on systems where
the ACPI SCI is shared with another wakeup source, add a kernel
command line option to set pm_debug_messages, add a document
describing system-wide suspend and resume code flows, modify cpufreq
Kconfig to choose schedutil as the preferred governor by default in
a couple of cases and do some assorted cleanups.

Specifics:

- Fix corner-case suspend-to-idle wakeup issue on systems where the
ACPI SCI is shared with another wakeup source (Hans de Goede).

- Add document describing system-wide suspend and resume code flows
to the admin guide (Rafael Wysocki).

- Add kernel command line option to set pm_debug_messages (Chen Yu).

- Choose schedutil as the preferred scaling governor by default on
ARM big.LITTLE systems and on x86 systems using the intel_pstate
driver in the passive mode (Linus Walleij, Rafael Wysocki).

- Drop racy and redundant checks from the PM core's device_prepare()
routine (Rafael Wysocki).

- Make resume from hibernation take the hibernation_restore() return
value into account (Dexuan Cui)"

* tag 'pm-5.7-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
platform/x86: intel_int0002_vgpio: Use acpi_register_wakeup_handler()
ACPI: PM: Add acpi_[un]register_wakeup_handler()
Documentation: PM: sleep: Document system-wide suspend code flows
cpufreq: Select schedutil when using big.LITTLE
PM: sleep: Add pm_debug_messages kernel command line option
PM: sleep: core: Drop racy and redundant checks from device_prepare()
PM: hibernate: Propagate the return value of hibernation_restore()
cpufreq: intel_pstate: Select schedutil as the default governor

+389 -8
+3
Documentation/admin-guide/kernel-parameters.txt
···
 			Override pmtimer IOPort with a hex value.
 			e.g. pmtmr=0x508
 
+	pm_debug_messages	[SUSPEND,KNL]
+			Enable suspend/resume debug messages during boot up.
+
 	pnp.debug=1	[PNP]
 			Enable PNP debug messages (depends on the
 			CONFIG_PNP_DEBUG_MESSAGES option). Change at run-time
+270
Documentation/admin-guide/pm/suspend-flows.rst
···
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
+
+=========================
+System Suspend Code Flows
+=========================
+
+:Copyright: |copy| 2020 Intel Corporation
+
+:Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
+
+At least one global system-wide transition needs to be carried out for the
+system to get from the working state into one of the supported
+:doc:`sleep states <sleep-states>`. Hibernation requires more than one
+transition to occur for this purpose, but the other sleep states, commonly
+referred to as *system-wide suspend* (or simply *system suspend*) states, need
+only one.
+
+For those sleep states, the transition from the working state of the system into
+the target sleep state is referred to as *system suspend* too (in the majority
+of cases, whether this means a transition or a sleep state of the system should
+be clear from the context) and the transition back from the sleep state into the
+working state is referred to as *system resume*.
+
+The kernel code flows associated with the suspend and resume transitions for
+different sleep states of the system are quite similar, but there are some
+significant differences between the :ref:`suspend-to-idle <s2idle>` code flows
+and the code flows related to the :ref:`suspend-to-RAM <s2ram>` and
+:ref:`standby <standby>` sleep states.
+
+The :ref:`suspend-to-RAM <s2ram>` and :ref:`standby <standby>` sleep states
+cannot be implemented without platform support and the difference between them
+boils down to the platform-specific actions carried out by the suspend and
+resume hooks that need to be provided by the platform driver to make them
+available. Apart from that, the suspend and resume code flows for these sleep
+states are mostly identical, so they both together will be referred to as
+*platform-dependent suspend* states in what follows.
+
+
+.. _s2idle_suspend:
+
+Suspend-to-idle Suspend Code Flow
+=================================
+
+The following steps are taken in order to transition the system from the working
+state to the :ref:`suspend-to-idle <s2idle>` sleep state:
+
+ 1. Invoking system-wide suspend notifiers.
+
+    Kernel subsystems can register callbacks to be invoked when the suspend
+    transition is about to occur and when the resume transition has finished.
+
+    That allows them to prepare for the change of the system state and to clean
+    up after getting back to the working state.
+
+ 2. Freezing tasks.
+
+    Tasks are frozen primarily in order to avoid unchecked hardware accesses
+    from user space through MMIO regions or I/O registers exposed directly to
+    it and to prevent user space from entering the kernel while the next step
+    of the transition is in progress (which might have been problematic for
+    various reasons).
+
+    All user space tasks are intercepted as though they were sent a signal and
+    put into uninterruptible sleep until the end of the subsequent system resume
+    transition.
+
+    The kernel threads that choose to be frozen during system suspend for
+    specific reasons are frozen subsequently, but they are not intercepted.
+    Instead, they are expected to periodically check whether or not they need
+    to be frozen and to put themselves into uninterruptible sleep if so. [Note,
+    however, that kernel threads can use locking and other concurrency controls
+    available in kernel space to synchronize themselves with system suspend and
+    resume, which can be much more precise than the freezing, so the latter is
+    not a recommended option for kernel threads.]
+
+ 3. Suspending devices and reconfiguring IRQs.
+
+    Devices are suspended in four phases called *prepare*, *suspend*,
+    *late suspend* and *noirq suspend* (see :ref:`driverapi_pm_devices` for more
+    information on what exactly happens in each phase).
+
+    Every device is visited in each phase, but typically it is not physically
+    accessed in more than two of them.
+
+    The runtime PM API is disabled for every device during the *late* suspend
+    phase and high-level ("action") interrupt handlers are prevented from being
+    invoked before the *noirq* suspend phase.
+
+    Interrupts are still handled after that, but they are only acknowledged to
+    interrupt controllers without performing any device-specific actions that
+    would be triggered in the working state of the system (those actions are
+    deferred till the subsequent system resume transition as described
+    `below <s2idle_resume_>`_).
+
+    IRQs associated with system wakeup devices are "armed" so that the resume
+    transition of the system is started when one of them signals an event.
+
+ 4. Freezing the scheduler tick and suspending timekeeping.
+
+    When all devices have been suspended, CPUs enter the idle loop and are put
+    into the deepest available idle state. While doing that, each of them
+    "freezes" its own scheduler tick so that the timer events associated with
+    the tick do not occur until the CPU is woken up by another interrupt source.
+
+    The last CPU to enter the idle state also stops the timekeeping which
+    (among other things) prevents high resolution timers from triggering going
+    forward until the first CPU that is woken up restarts the timekeeping.
+    That allows the CPUs to stay in the deep idle state relatively long in one
+    go.
+
+    From this point on, the CPUs can only be woken up by non-timer hardware
+    interrupts. If that happens, they go back to the idle state unless the
+    interrupt that woke up one of them comes from an IRQ that has been armed for
+    system wakeup, in which case the system resume transition is started.
+
+
+.. _s2idle_resume:
+
+Suspend-to-idle Resume Code Flow
+================================
+
+The following steps are taken in order to transition the system from the
+:ref:`suspend-to-idle <s2idle>` sleep state into the working state:
+
+ 1. Resuming timekeeping and unfreezing the scheduler tick.
+
+    When one of the CPUs is woken up (by a non-timer hardware interrupt), it
+    leaves the idle state entered in the last step of the preceding suspend
+    transition, restarts the timekeeping (unless it has been restarted already
+    by another CPU that woke up earlier) and the scheduler tick on that CPU is
+    unfrozen.
+
+    If the interrupt that has woken up the CPU was armed for system wakeup,
+    the system resume transition begins.
+
+ 2. Resuming devices and restoring the working-state configuration of IRQs.
+
+    Devices are resumed in four phases called *noirq resume*, *early resume*,
+    *resume* and *complete* (see :ref:`driverapi_pm_devices` for more
+    information on what exactly happens in each phase).
+
+    Every device is visited in each phase, but typically it is not physically
+    accessed in more than two of them.
+
+    The working-state configuration of IRQs is restored after the *noirq* resume
+    phase and the runtime PM API is re-enabled for every device whose driver
+    supports it during the *early* resume phase.
+
+ 3. Thawing tasks.
+
+    Tasks frozen in step 2 of the preceding `suspend <s2idle_suspend_>`_
+    transition are "thawed", which means that they are woken up from the
+    uninterruptible sleep that they went into at that time and user space tasks
+    are allowed to exit the kernel.
+
+ 4. Invoking system-wide resume notifiers.
+
+    This is analogous to step 1 of the `suspend <s2idle_suspend_>`_ transition
+    and the same set of callbacks is invoked at this point, but a different
+    "notification type" parameter value is passed to them.
+
+
+Platform-dependent Suspend Code Flow
+====================================
+
+The following steps are taken in order to transition the system from the working
+state to a platform-dependent suspend state:
+
+ 1. Invoking system-wide suspend notifiers.
+
+    This step is the same as step 1 of the suspend-to-idle suspend transition
+    described `above <s2idle_suspend_>`_.
+
+ 2. Freezing tasks.
+
+    This step is the same as step 2 of the suspend-to-idle suspend transition
+    described `above <s2idle_suspend_>`_.
+
+ 3. Suspending devices and reconfiguring IRQs.
+
+    This step is analogous to step 3 of the suspend-to-idle suspend transition
+    described `above <s2idle_suspend_>`_, but the arming of IRQs for system
+    wakeup generally does not have any effect on the platform.
+
+    There are platforms that can go into a very deep low-power state internally
+    when all CPUs in them are in sufficiently deep idle states and all I/O
+    devices have been put into low-power states. On those platforms,
+    suspend-to-idle can reduce system power very effectively.
+
+    On the other platforms, however, low-level components (like interrupt
+    controllers) need to be turned off in a platform-specific way (implemented
+    in the hooks provided by the platform driver) to achieve comparable power
+    reduction.
+
+    That usually prevents in-band hardware interrupts from waking up the system,
+    which must be done in a special platform-dependent way. Then, the
+    configuration of system wakeup sources usually starts when system wakeup
+    devices are suspended and is finalized by the platform suspend hooks later
+    on.
+
+ 4. Disabling non-boot CPUs.
+
+    On some platforms the suspend hooks mentioned above must run in a one-CPU
+    configuration of the system (in particular, the hardware cannot be accessed
+    by any code running in parallel with the platform suspend hooks that may,
+    and often do, trap into the platform firmware in order to finalize the
+    suspend transition).
+
+    For this reason, the CPU offline/online (CPU hotplug) framework is used
+    to take all of the CPUs in the system, except for one (the boot CPU),
+    offline (typically, the CPUs that have been taken offline go into deep idle
+    states).
+
+    This means that all tasks are migrated away from those CPUs and all IRQs are
+    rerouted to the only CPU that remains online.
+
+ 5. Suspending core system components.
+
+    This prepares the core system components for (possibly) losing power going
+    forward and suspends the timekeeping.
+
+ 6. Platform-specific power removal.
+
+    This is expected to remove power from all of the system components except
+    for the memory controller and RAM (in order to preserve the contents of the
+    latter) and some devices designated for system wakeup.
+
+    In many cases control is passed to the platform firmware which is expected
+    to finalize the suspend transition as needed.
+
+
+Platform-dependent Resume Code Flow
+===================================
+
+The following steps are taken in order to transition the system from a
+platform-dependent suspend state into the working state:
+
+ 1. Platform-specific system wakeup.
+
+    The platform is woken up by a signal from one of the designated system
+    wakeup devices (which need not be an in-band hardware interrupt) and
+    control is passed back to the kernel (the working configuration of the
+    platform may need to be restored by the platform firmware before the
+    kernel gets control again).
+
+ 2. Resuming core system components.
+
+    The suspend-time configuration of the core system components is restored and
+    the timekeeping is resumed.
+
+ 3. Re-enabling non-boot CPUs.
+
+    The CPUs disabled in step 4 of the preceding suspend transition are taken
+    back online and their suspend-time configuration is restored.
+
+ 4. Resuming devices and restoring the working-state configuration of IRQs.
+
+    This step is the same as step 2 of the suspend-to-idle resume transition
+    described `above <s2idle_resume_>`_.
+
+ 5. Thawing tasks.
+
+    This step is the same as step 3 of the suspend-to-idle resume transition
+    described `above <s2idle_resume_>`_.
+
+ 6. Invoking system-wide resume notifiers.
+
+    This step is the same as step 4 of the suspend-to-idle resume transition
+    described `above <s2idle_resume_>`_.
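As an illustration of the device phase ordering described in the new document, the following user-space C sketch lists the four suspend phases and the four resume phases and pairs each suspend phase with a resume phase in reverse order. The array names, the `resume_counterpart()` helper and the pairing itself are assumptions made for this sketch; they are not kernel interfaces.

```c
#include <stddef.h>
#include <string.h>

/* Device suspend phases in the order the document lists them. */
static const char *const suspend_phases[] = {
	"prepare", "suspend", "late suspend", "noirq suspend",
};

/* Device resume phases in the order the document lists them. */
static const char *const resume_phases[] = {
	"noirq resume", "early resume", "resume", "complete",
};

#define NUM_PHASES (sizeof(suspend_phases) / sizeof(suspend_phases[0]))

/*
 * Return the resume phase that mirrors a given suspend phase, or NULL
 * for an unknown name. The mirroring (last suspend phase pairs with
 * the first resume phase) is this sketch's reading of the two lists.
 */
const char *resume_counterpart(const char *suspend_phase)
{
	size_t i;

	for (i = 0; i < NUM_PHASES; i++) {
		if (strcmp(suspend_phases[i], suspend_phase) == 0)
			return resume_phases[NUM_PHASES - 1 - i];
	}
	return NULL;
}
```

Calling `resume_counterpart("late suspend")` yields `"early resume"`, which matches the document's statement that the runtime PM API is disabled during the late suspend phase and re-enabled during the early resume phase.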
+1
Documentation/admin-guide/pm/system-wide.rst
···
    :maxdepth: 2
 
    sleep-states
+   suspend-flows
+4
drivers/acpi/sleep.c
···
 		if (acpi_any_fixed_event_status_set())
 			return true;
 
+		/* Check wakeups from drivers sharing the SCI. */
+		if (acpi_check_wakeup_handlers())
+			return true;
+
 		/*
 		 * If the status bit is set for any enabled GPE other than the
 		 * EC one, the wakeup is regarded as a genuine one.
+1
drivers/acpi/sleep.h
···
 
 extern void acpi_enable_wakeup_devices(u8 sleep_state);
 extern void acpi_disable_wakeup_devices(u8 sleep_state);
+extern bool acpi_check_wakeup_handlers(void);
 
 extern struct list_head acpi_wakeup_device_list;
 extern struct mutex acpi_device_lock;
+81
drivers/acpi/wakeup.c
···
 #include "internal.h"
 #include "sleep.h"
 
+struct acpi_wakeup_handler {
+	struct list_head list_node;
+	bool (*wakeup)(void *context);
+	void *context;
+};
+
+static LIST_HEAD(acpi_wakeup_handler_head);
+static DEFINE_MUTEX(acpi_wakeup_handler_mutex);
+
 /*
  * We didn't lock acpi_device_lock in the file, because it invokes oops in
  * suspend/resume and isn't really required as this is called in S-state. At
···
 	}
 	mutex_unlock(&acpi_device_lock);
 	return 0;
+}
+
+/**
+ * acpi_register_wakeup_handler - Register wakeup handler
+ * @wake_irq: The IRQ through which the device may receive wakeups
+ * @wakeup: Wakeup-handler to call when the SCI has triggered a wakeup
+ * @context: Context to pass to the handler when calling it
+ *
+ * Drivers which may share an IRQ with the SCI can use this to register
+ * a handler which returns true when the device they are managing wants
+ * to trigger a wakeup.
+ */
+int acpi_register_wakeup_handler(int wake_irq, bool (*wakeup)(void *context),
+				 void *context)
+{
+	struct acpi_wakeup_handler *handler;
+
+	/*
+	 * If the device is not sharing its IRQ with the SCI, there is no
+	 * need to register the handler.
+	 */
+	if (!acpi_sci_irq_valid() || wake_irq != acpi_sci_irq)
+		return 0;
+
+	handler = kmalloc(sizeof(*handler), GFP_KERNEL);
+	if (!handler)
+		return -ENOMEM;
+
+	handler->wakeup = wakeup;
+	handler->context = context;
+
+	mutex_lock(&acpi_wakeup_handler_mutex);
+	list_add(&handler->list_node, &acpi_wakeup_handler_head);
+	mutex_unlock(&acpi_wakeup_handler_mutex);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(acpi_register_wakeup_handler);
+
+/**
+ * acpi_unregister_wakeup_handler - Unregister wakeup handler
+ * @wakeup: Wakeup-handler passed to acpi_register_wakeup_handler()
+ * @context: Context passed to acpi_register_wakeup_handler()
+ */
+void acpi_unregister_wakeup_handler(bool (*wakeup)(void *context),
+				    void *context)
+{
+	struct acpi_wakeup_handler *handler;
+
+	mutex_lock(&acpi_wakeup_handler_mutex);
+	list_for_each_entry(handler, &acpi_wakeup_handler_head, list_node) {
+		if (handler->wakeup == wakeup && handler->context == context) {
+			list_del(&handler->list_node);
+			kfree(handler);
+			break;
+		}
+	}
+	mutex_unlock(&acpi_wakeup_handler_mutex);
+}
+EXPORT_SYMBOL_GPL(acpi_unregister_wakeup_handler);
+
+bool acpi_check_wakeup_handlers(void)
+{
+	struct acpi_wakeup_handler *handler;
+
+	/* No need to lock, nothing else is running when we're called. */
+	list_for_each_entry(handler, &acpi_wakeup_handler_head, list_node) {
+		if (handler->wakeup(handler->context))
+			return true;
+	}
+
+	return false;
 }
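The register / unregister / check pattern added to drivers/acpi/wakeup.c can be tried out in user space with a rough analogue. The sketch below is not the kernel code: it swaps the kernel's `list_head` for a plain singly linked list, uses `malloc()` instead of `kmalloc()`, omits the mutex and the SCI IRQ comparison, and all function names in it are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

/* User-space stand-in for struct acpi_wakeup_handler (hypothetical name). */
struct wakeup_handler {
	struct wakeup_handler *next;
	bool (*wakeup)(void *context);
	void *context;
};

static struct wakeup_handler *handler_head;

/* Returns 0 on success, -1 if the allocation fails. */
int register_wakeup_handler(bool (*wakeup)(void *context), void *context)
{
	struct wakeup_handler *h = malloc(sizeof(*h));

	if (!h)
		return -1;

	h->wakeup = wakeup;
	h->context = context;
	h->next = handler_head;	/* push at the head of the list */
	handler_head = h;
	return 0;
}

/* Removes the first entry matching both the callback and its context. */
void unregister_wakeup_handler(bool (*wakeup)(void *context), void *context)
{
	struct wakeup_handler **pp;

	for (pp = &handler_head; *pp; pp = &(*pp)->next) {
		struct wakeup_handler *h = *pp;

		if (h->wakeup == wakeup && h->context == context) {
			*pp = h->next;
			free(h);
			break;
		}
	}
}

/* True as soon as any registered handler reports a pending wakeup. */
bool check_wakeup_handlers(void)
{
	struct wakeup_handler *h;

	for (h = handler_head; h; h = h->next) {
		if (h->wakeup(h->context))
			return true;
	}
	return false;
}

/* Example callback: the context points at a flag standing in for a status bit. */
bool flag_is_set(void *context)
{
	return *(bool *)context;
}
```

As in the hunk, an entry is identified by the pair of callback and context, so the same callback can be registered more than once with different contexts and removed individually.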
+1 -6
drivers/base/power/main.c
···
 	if (dev->power.syscore)
 		return 0;
 
-	WARN_ON(!pm_runtime_enabled(dev) &&
-		dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND |
-					      DPM_FLAG_LEAVE_SUSPENDED));
-
 	/*
 	 * If a device's parent goes into runtime suspend at the wrong time,
 	 * it won't be possible to resume the device. To prevent this we
···
 	 */
 	spin_lock_irq(&dev->power.lock);
 	dev->power.direct_complete = state.event == PM_EVENT_SUSPEND &&
-		((pm_runtime_suspended(dev) && ret > 0) ||
-		 dev->power.no_pm_callbacks) &&
+		(ret > 0 || dev->power.no_pm_callbacks) &&
 		!dev_pm_test_driver_flags(dev, DPM_FLAG_NEVER_SKIP);
 	spin_unlock_irq(&dev->power.lock);
 	return 0;
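To make the change to the `direct_complete` condition easier to compare, the sketch below restates the old and new expressions as two plain C predicates. The struct and function names are hypothetical and the kernel fields are reduced to simple values; only the boolean structure is taken from the hunk.

```c
#include <stdbool.h>

/* Inputs of the decision, reduced to plain values for illustration. */
struct prepare_result {
	bool event_is_suspend;	/* state.event == PM_EVENT_SUSPEND */
	int ret;		/* value returned by the prepare callback */
	bool no_pm_callbacks;	/* dev->power.no_pm_callbacks */
	bool never_skip;	/* DPM_FLAG_NEVER_SKIP is set */
};

/* Condition before the change; it also sampled the runtime PM status. */
bool direct_complete_old(const struct prepare_result *r, bool runtime_suspended)
{
	return r->event_is_suspend &&
	       ((runtime_suspended && r->ret > 0) || r->no_pm_callbacks) &&
	       !r->never_skip;
}

/* Condition after the change, as it appears in the hunk. */
bool direct_complete_new(const struct prepare_result *r)
{
	return r->event_is_suspend &&
	       (r->ret > 0 || r->no_pm_callbacks) &&
	       !r->never_skip;
}
```

The two predicates differ only in the `pm_runtime_suspended()` term, which is the check the commit message describes as racy and redundant.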
+3 -1
drivers/cpufreq/Kconfig
···
 choice
 	prompt "Default CPUFreq governor"
 	default CPU_FREQ_DEFAULT_GOV_USERSPACE if ARM_SA1100_CPUFREQ || ARM_SA1110_CPUFREQ
+	default CPU_FREQ_DEFAULT_GOV_SCHEDUTIL if BIG_LITTLE
+	default CPU_FREQ_DEFAULT_GOV_SCHEDUTIL if X86_INTEL_PSTATE && SMP
 	default CPU_FREQ_DEFAULT_GOV_PERFORMANCE
 	help
 	  This option sets which CPUFreq governor shall be loaded at
-	  startup. If in doubt, select 'performance'.
+	  startup. If in doubt, use the default setting.
 
 config CPU_FREQ_DEFAULT_GOV_PERFORMANCE
 	bool "performance"
+2
drivers/cpufreq/Kconfig.x86
···
 	depends on X86
 	select ACPI_PROCESSOR if ACPI
 	select ACPI_CPPC_LIB if X86_64 && ACPI && SCHED_MC_PRIO
+	select CPU_FREQ_GOV_PERFORMANCE
+	select CPU_FREQ_GOV_SCHEDUTIL if SMP
 	help
 	  This driver provides a P state for Intel core processors.
 	  The driver implements an internal governor and will become
+10
drivers/platform/x86/intel_int0002_vgpio.c
···
 	return IRQ_HANDLED;
 }
 
+static bool int0002_check_wake(void *data)
+{
+	u32 gpe_sts_reg;
+
+	gpe_sts_reg = inl(GPE0A_STS_PORT);
+	return (gpe_sts_reg & GPE0A_PME_B0_STS_BIT);
+}
+
 static struct irq_chip int0002_byt_irqchip = {
 	.name = DRV_NAME,
 	.irq_ack = int0002_irq_ack,
···
 		return ret;
 	}
 
+	acpi_register_wakeup_handler(irq, int0002_check_wake, NULL);
 	device_init_wakeup(dev, true);
 	return 0;
 }
···
 static int int0002_remove(struct platform_device *pdev)
 {
 	device_init_wakeup(&pdev->dev, false);
+	acpi_unregister_wakeup_handler(int0002_check_wake, NULL);
 	return 0;
 }
+5
include/linux/acpi.h
···
 void __init acpi_sleep_no_blacklist(void);
 #endif /* CONFIG_PM_SLEEP */
 
+int acpi_register_wakeup_handler(
+	int wake_irq, bool (*wakeup)(void *context), void *context);
+void acpi_unregister_wakeup_handler(
+	bool (*wakeup)(void *context), void *context);
+
 struct acpi_osc_context {
 	char *uuid_str; /* UUID string */
 	int rev;
+1 -1
kernel/power/hibernate.c
···
 	error = swsusp_read(&flags);
 	swsusp_close(FMODE_READ);
 	if (!error)
-		hibernation_restore(flags & SF_PLATFORM_MODE);
+		error = hibernation_restore(flags & SF_PLATFORM_MODE);
 
 	pr_err("Failed to load image, recovering.\n");
 	swsusp_free();
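The effect of this one-line change can be shown with a small user-space sketch that contrasts dropping and propagating the return value of a restore step. The `restore_fn` type, both `load_and_restore_*()` functions and the `-5` error value are hypothetical; they only stand in for the roles of `swsusp_read()` and `hibernation_restore()` in the hunk.

```c
/* Hypothetical restore hook: returns 0 on success or a negative error code. */
typedef int (*restore_fn)(void);

/* Before the change: the result of the restore step is dropped. */
int load_and_restore_old(int read_error, restore_fn restore)
{
	int error = read_error;

	if (!error)
		restore();	/* return value ignored */

	return error;
}

/* After the change: a failing restore step is reported to the caller. */
int load_and_restore_new(int read_error, restore_fn restore)
{
	int error = read_error;

	if (!error)
		error = restore();

	return error;
}

/* Example hook that always fails with an arbitrary error value. */
int failing_restore(void)
{
	return -5;
}
```

With `failing_restore`, the old variant returns 0 after a successful read even though the restore failed, while the new variant returns the restore error.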
+7
kernel/power/main.c
···
 
 power_attr(pm_debug_messages);
 
+static int __init pm_debug_messages_setup(char *str)
+{
+	pm_debug_messages_on = true;
+	return 1;
+}
+__setup("pm_debug_messages", pm_debug_messages_setup);
+
 /**
  * __pm_pr_dbg - Print a suspend debug message to the kernel log.
  * @defer: Whether or not to use printk_deferred() to print the message.
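The setup handler above takes no value: the mere presence of `pm_debug_messages` on the command line turns the flag on. A simplified user-space sketch of that behavior follows. It is not the kernel's parameter parser; it only matches whole space-separated tokens, and `cmdline_has_flag()`, `apply_cmdline()` and `debug_messages_enabled()` are hypothetical helpers.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's pm_debug_messages_on variable. */
static bool debug_messages_on;

/*
 * Scan a space-separated command line for a bare flag token.
 * Simplified: quoting and "name=value" forms are not handled here.
 */
bool cmdline_has_flag(const char *cmdline, const char *flag)
{
	size_t flag_len = strlen(flag);
	const char *p = cmdline;

	while (*p) {
		size_t tok_len;

		p += strspn(p, " ");		/* skip separators */
		tok_len = strcspn(p, " ");	/* length of the next token */
		if (tok_len == flag_len && strncmp(p, flag, flag_len) == 0)
			return true;
		p += tok_len;
	}
	return false;
}

/* Mirrors the effect of the setup handler: presence alone enables the flag. */
void apply_cmdline(const char *cmdline)
{
	if (cmdline_has_flag(cmdline, "pm_debug_messages"))
		debug_messages_on = true;
}

bool debug_messages_enabled(void)
{
	return debug_messages_on;
}
```

For example, `apply_cmdline("quiet pm_debug_messages")` enables the flag, while a command line without the token leaves it unchanged.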