Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'pm-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
"Once again, cpufreq is the most active development area, mostly
because of the new feature additions and documentation updates in the
amd-pstate driver, but there are also changes in the cpufreq core
related to boost support and other assorted updates elsewhere.

Next up are power capping changes due to the major cleanup of the
Intel RAPL driver.

On the cpuidle front, a new C-states table for Intel Panther Lake is
added to the intel_idle driver, the stopped tick handling in the menu
and teo governors is updated, and there are a couple of cleanups.

Apart from the above, support for Tegra114 is added to devfreq and
there are assorted cleanups of that code, there are also two updates
of the operating performance points (OPP) library, two minor updates
related to hibernation, and cpupower utility man pages updates and
cleanups.

Specifics:

- Update qcom-hw DT bindings to include Eliza hardware (Abel Vesa)

- Update cpufreq-dt-platdev blocklist (Faruque Ansari)

- Minor updates to driver and dt-bindings for Tegra (Thierry Reding,
Rosen Penev)

- Add MAINTAINERS entry for CPPC driver (Viresh Kumar)

- Add support for new features: CPPC performance priority, Dynamic
EPP, Raw EPP, and new unit tests for them to amd-pstate (Gautham
Shenoy, Mario Limonciello)

- Fix sysfs files being present when the hardware support is missing,
and fix broken/outdated documentation in the amd-pstate driver (Ninad
Naik, Gautham Shenoy)

- Pass the policy to cpufreq_driver->adjust_perf() to avoid using
cpufreq_cpu_get() in the .adjust_perf() callback in amd-pstate
which leads to a scheduling-while-atomic bug (K Prateek Nayak)

- Clean up dead code in Kconfig for cpufreq (Julian Braha)

- Remove max_freq_req update for pre-existing cpufreq policy and add
a boost_freq_req QoS request to save the boost constraint instead
of overwriting the last scaling_max_freq constraint (Pierre
Gondois)

- Embed cpufreq QoS freq_req objects in cpufreq policy so they all
are allocated in one go along with the policy to simplify lifetime
rules and avoid error handling issues (Viresh Kumar)

- Use DMI max speed when CPPC is unavailable in the acpi-cpufreq
scaling driver (Henry Tseng)

- Switch policy_is_shared() in cpufreq to using cpumask_nth() instead
of cpumask_weight() because the former is more efficient (Yury
Norov)

- Use sysfs_emit() in sysfs show functions for cpufreq governor
attributes (Thorsten Blum)

- Update intel_pstate to stop returning an error when "off" is
written to its status sysfs attribute while the driver is already
off (Fabio De Francesco)

- Include current frequency in the debug message printed by
__cpufreq_driver_target() (Pengjie Zhang)

- Refine stopped tick handling in the menu cpuidle governor and
rearrange stopped tick handling in the teo cpuidle governor (Rafael
Wysocki)

- Add Panther Lake C-states table to the intel_idle driver (Artem
Bityutskiy)

- Clean up dead dependencies on CPU_IDLE in Kconfig (Julian Braha)

- Simplify cpuidle_register_device() with guard() (Huisong Li)

- Use performance level if available to distinguish between rates in
OPP debugfs (Manivannan Sadhasivam)

- Fix scoped_guard in dev_pm_opp_xlate_required_opp() (Viresh Kumar)

- Return -ENODATA if the snapshot image is not loaded (Alberto
Garcia)

- Remove inclusion of crypto/hash.h from hibernate_64.c on x86 (Eric
Biggers)

- Clean up and rearrange the intel_rapl power capping driver to make
the respective interface drivers (TPMI, MSR, and MMIO) hold their
own settings and primitives and consolidate PL4 and PMU support
flags into rapl_defaults (Kuppuswamy Sathyanarayanan)

- Correct kernel-doc function parameter names in the power capping
core code (Randy Dunlap)

- Remove unneeded casting for HZ_PER_KHZ in devfreq (Andy Shevchenko)

- Use _visible attribute to replace create/remove_sysfs_files() in
devfreq (Pengjie Zhang)

- Add Tegra114 support to activity monitor device in tegra30-devfreq
as a preparation to upcoming EMC controller support (Svyatoslav
Ryhel)

- Fix mistakes in cpupower man pages, add the boost and epp options
to the cpupower-frequency-info man page, and add the perf-bias
option to the cpupower-info man page (Roberto Ricci)

- Remove unnecessary extern declarations from getopt.h in arguments
parsing functions in cpufreq-set, cpuidle-info, cpuidle-set,
cpupower-info, and cpupower-set utilities (Kaushlendra Kumar)"

* tag 'pm-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (74 commits)
cpufreq/amd-pstate: Add POWER_SUPPLY select for dynamic EPP
cpupower: remove extern declarations in cmd functions
cpuidle: Simplify cpuidle_register_device() with guard()
PM / devfreq: tegra30-devfreq: add support for Tegra114
PM / devfreq: use _visible attribute to replace create/remove_sysfs_files()
PM / devfreq: Remove unneeded casting for HZ_PER_KHZ
MAINTAINERS: amd-pstate: Step down as maintainer, add Prateek as reviewer
cpufreq: Pass the policy to cpufreq_driver->adjust_perf()
cpufreq/amd-pstate: Pass the policy to amd_pstate_update()
cpufreq/amd-pstate-ut: Add a unit test for raw EPP
cpufreq/amd-pstate: Add support for raw EPP writes
cpufreq/amd-pstate: Add support for platform profile class
cpufreq/amd-pstate: add kernel command line to override dynamic epp
cpufreq/amd-pstate: Add dynamic energy performance preference
Documentation: amd-pstate: fix dead links in the reference section
cpufreq/amd-pstate: Cache the max frequency in cpudata
Documentation/amd-pstate: Add documentation for amd_pstate_floor_{freq,count}
Documentation/amd-pstate: List amd_pstate_prefcore_ranking sysfs file
Documentation/amd-pstate: List amd_pstate_hw_prefcore sysfs file
amd-pstate-ut: Add a testcase to validate the visibility of driver attributes
...

+1950 -902
+7
Documentation/admin-guide/kernel-parameters.txt
···
 	disable
 	  Disable amd-pstate preferred core.
 
+	amd_dynamic_epp=
+			[X86]
+			disable
+			  Disable amd-pstate dynamic EPP.
+			enable
+			  Enable amd-pstate dynamic EPP.
+
 	amijoy.map=	[HW,JOY] Amiga joystick support
 			Map of devices attached to JOY0DAT and JOY1DAT
 			Format: <a>,<b>
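
For illustration, the new `amd_dynamic_epp=` token can be picked out of a command line the same way as any `key=value` kernel parameter. The cmdline string here is a made-up example; at runtime you would read `/proc/cmdline`:

```shell
cmdline="quiet splash amd_dynamic_epp=enable amd_prefcore=disable"

# Split on whitespace and strip the key, leaving "enable" or "disable".
mode=$(printf '%s\n' $cmdline | sed -n 's/^amd_dynamic_epp=//p')
echo "$mode"
```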
+77 -10
Documentation/admin-guide/pm/amd-pstate.rst
···
 
 	root@hr-test1:/home/ray# ls /sys/devices/system/cpu/cpufreq/policy0/*amd*
 	/sys/devices/system/cpu/cpufreq/policy0/amd_pstate_highest_perf
+	/sys/devices/system/cpu/cpufreq/policy0/amd_pstate_hw_prefcore
 	/sys/devices/system/cpu/cpufreq/policy0/amd_pstate_lowest_nonlinear_freq
 	/sys/devices/system/cpu/cpufreq/policy0/amd_pstate_max_freq
+	/sys/devices/system/cpu/cpufreq/policy0/amd_pstate_floor_freq
+	/sys/devices/system/cpu/cpufreq/policy0/amd_pstate_floor_count
+	/sys/devices/system/cpu/cpufreq/policy0/amd_pstate_prefcore_ranking
 
 
 ``amd_pstate_highest_perf / amd_pstate_max_freq``
···
 
 ``amd_pstate_hw_prefcore``
 
-	Whether the platform supports the preferred core feature and it has been
-	enabled. This attribute is read-only.
+	Whether the platform supports the preferred core feature and it has
+	been enabled. This attribute is read-only. This file is only visible
+	on platforms which support the preferred core feature.
 
 ``amd_pstate_prefcore_ranking``
 
 	The performance ranking of the core. This number doesn't have any unit, but
 	larger numbers are preferred at the time of reading. This can change at
-	runtime based on platform conditions. This attribute is read-only.
+	runtime based on platform conditions. This attribute is read-only. This file
+	is only visible on platforms which support the preferred core feature.
+
+``amd_pstate_floor_freq``
+
+	The floor frequency associated with each CPU. Userspace can write any
+	value between ``cpuinfo_min_freq`` and ``scaling_max_freq`` into this
+	file. When the system is under power or thermal constraints, the
+	platform firmware will attempt to throttle the CPU frequency to the
+	value specified in ``amd_pstate_floor_freq`` before throttling it
+	further. This allows userspace to specify different floor frequencies
+	to different CPUs. For optimal results, threads of the same core
+	should have the same floor frequency value. This file is only visible
+	on platforms that support the CPPC Performance Priority feature.
+
+
+``amd_pstate_floor_count``
+
+	The number of distinct Floor Performance levels supported by the
+	platform. For example, if this value is 2, then the number of unique
+	values obtained from the command ``cat
+	/sys/devices/system/cpu/cpufreq/policy*/amd_pstate_floor_freq |
+	sort -n | uniq`` should be at most this number for the behavior
+	described in ``amd_pstate_floor_freq`` to take effect. A zero value
+	implies that the platform supports unlimited floor performance levels.
+	This file is only visible on platforms that support the CPPC
+	Performance Priority feature.
+
+	**Note**: When ``amd_pstate_floor_count`` is non-zero, the frequency to
+	which the CPU is throttled under power or thermal constraints is
+	undefined when the number of unique values of ``amd_pstate_floor_freq``
+	across all CPUs in the system exceeds ``amd_pstate_floor_count``.
 
 ``energy_performance_available_preferences``
···
 	These profiles represent different hints that are provided
 	to the low-level firmware about the user's desired energy vs efficiency
 	tradeoff. ``default`` represents the epp value is set by platform
-	firmware. This attribute is read-only.
+	firmware. ``custom`` designates that integer values 0-255 may be written
+	as well. This attribute is read-only.
 
 ``energy_performance_preference``
 
 	The current energy performance preference can be read from this attribute.
 	and user can change current preference according to energy or performance needs
-	Please get all support profiles list from
-	``energy_performance_available_preferences`` attribute, all the profiles are
-	integer values defined between 0 to 255 when EPP feature is enabled by platform
-	firmware, if EPP feature is disabled, driver will ignore the written value
+	Coarse named profiles are available in the attribute
+	``energy_performance_available_preferences``.
+	Users can also write individual integer values between 0 to 255.
+	When dynamic EPP is enabled, writes to energy_performance_preference are blocked
+	even when EPP feature is enabled by platform firmware. Lower epp values shift the bias
+	towards improved performance while a higher epp value shifts the bias towards
+	power-savings. The exact impact can change from one platform to the other.
+	If a valid integer was last written, then a number will be returned on future reads.
+	If a valid string was last written then a string will be returned on future reads.
 	This attribute is read-write.
 
 ``boost``
···
 Other performance and frequency values can be read back from
 ``/sys/devices/system/cpu/cpuX/acpi_cppc/``, see :ref:`cppc_sysfs`.
 
+Dynamic energy performance profile
+==================================
+The amd-pstate driver supports dynamically selecting the energy performance
+profile based on whether the machine is running on AC or DC power.
+
+Whether this behavior is enabled by default depends on the kernel
+config option `CONFIG_X86_AMD_PSTATE_DYNAMIC_EPP`. This behavior can also be overridden
+at runtime by the sysfs file ``/sys/devices/system/cpu/cpufreq/policyX/dynamic_epp``.
+
+When set to enabled, the driver will select a different energy performance
+profile when the machine is running on battery or AC power. The driver will
+also register with the platform profile handler to receive notifications of
+user desired power state and react to those.
+When set to disabled, the driver will not change the energy performance profile
+based on the power source and will not react to user desired power state.
+
+Attempting to manually write to the ``energy_performance_preference`` sysfs
+file will fail when ``dynamic_epp`` is enabled.
 
 ``amd-pstate`` vs ``acpi-cpufreq``
 ======================================
···
 For systems that support ``amd-pstate`` preferred core, the core rankings will
 always be advertised by the platform. But OS can choose to ignore that via the
 kernel parameter ``amd_prefcore=disable``.
+
+``amd_dynamic_epp``
+
+When AMD pstate is in auto mode, dynamic EPP will control whether the kernel
+autonomously changes the EPP mode. The default is configured by
+``CONFIG_X86_AMD_PSTATE_DYNAMIC_EPP`` but can be explicitly enabled with
+``amd_dynamic_epp=enable`` or disabled with ``amd_dynamic_epp=disable``.
 
 User Space Interface in ``sysfs`` - General
 ===========================================
···
 ===========
 
 .. [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming,
-       https://www.amd.com/system/files/TechDocs/24593.pdf
+       https://docs.amd.com/v/u/en-US/24593_3.44_APM_Vol2
 
 .. [2] Advanced Configuration and Power Interface Specification,
        https://uefi.org/sites/default/files/resources/ACPI_Spec_6_4_Jan22.pdf
 
 .. [3] Processor Programming Reference (PPR) for AMD Family 19h Model 51h, Revision A1 Processors
-       https://www.amd.com/system/files/TechDocs/56569-A1-PUB.zip
+       https://docs.amd.com/v/u/en-US/56569-A1-PUB_3.03
 
 .. [4] Linux Kernel Selftests,
        https://www.kernel.org/doc/html/latest/dev-tools/kselftest.html
+1
Documentation/devicetree/bindings/arm/tegra/nvidia,tegra-ccplex-cluster.yaml
···
   enum:
     - nvidia,tegra186-ccplex-cluster
     - nvidia,tegra234-ccplex-cluster
+    - nvidia,tegra238-ccplex-cluster
 
   reg:
     maxItems: 1
+1
Documentation/devicetree/bindings/cpufreq/cpufreq-qcom-hw.yaml
···
   - description: v2 of CPUFREQ HW (EPSS)
     items:
       - enum:
+          - qcom,eliza-cpufreq-epss
           - qcom,milos-cpufreq-epss
           - qcom,qcs8300-cpufreq-epss
           - qcom,qdu1000-cpufreq-epss
+18 -7
MAINTAINERS
···
 AMD PSTATE DRIVER
 M:	Huang Rui <ray.huang@amd.com>
-M:	Gautham R. Shenoy <gautham.shenoy@amd.com>
 M:	Mario Limonciello <mario.limonciello@amd.com>
 R:	Perry Yuan <perry.yuan@amd.com>
+R:	K Prateek Nayak <kprateek.nayak@amd.com>
 L:	linux-pm@vger.kernel.org
 S:	Supported
 F:	Documentation/admin-guide/pm/amd-pstate.rst
···
 S:	Maintained
 F:	drivers/i2c/busses/i2c-cp2615.c
 
+CPU FREQUENCY DRIVERS - CPPC CPUFREQ
+M:	"Rafael J. Wysocki" <rafael@kernel.org>
+M:	Viresh Kumar <viresh.kumar@linaro.org>
+R:	Jie Zhan <zhanjie9@hisilicon.com>
+R:	Lifeng Zheng <zhenglifeng1@huawei.com>
+R:	Pierre Gondois <pierre.gondois@arm.com>
+R:	Sumit Gupta <sumitg@nvidia.com>
+L:	linux-pm@vger.kernel.org
+S:	Maintained
+F:	drivers/cpufreq/cppc_cpufreq.c
+
 CPU FREQUENCY DRIVERS - VEXPRESS SPC ARM BIG LITTLE
 M:	Viresh Kumar <viresh.kumar@linaro.org>
 M:	Sudeep Holla <sudeep.holla@kernel.org>
···
 S:	Maintained
 W:	http://www.arm.com/products/processors/technologies/biglittleprocessing.php
 F:	drivers/cpufreq/vexpress-spc-cpufreq.c
+
+CPU FREQUENCY DRIVERS - VIRTUAL MACHINE CPUFREQ
+M:	Saravana Kannan <saravanak@kernel.org>
+L:	linux-pm@vger.kernel.org
+S:	Maintained
+F:	drivers/cpufreq/virtual-cpufreq.c
 
 CPU FREQUENCY SCALING FRAMEWORK
 M:	"Rafael J. Wysocki" <rafael@kernel.org>
···
 F:	kernel/sched/cpufreq*.c
 F:	rust/kernel/cpufreq.rs
 F:	tools/testing/selftests/cpufreq/
-
-CPU FREQUENCY DRIVERS - VIRTUAL MACHINE CPUFREQ
-M:	Saravana Kannan <saravanak@kernel.org>
-L:	linux-pm@vger.kernel.org
-S:	Maintained
-F:	drivers/cpufreq/virtual-cpufreq.c
 
 CPU HOTPLUG
 M:	Thomas Gleixner <tglx@kernel.org>
+1 -1
arch/x86/include/asm/cpufeatures.h
···
  */
 #define X86_FEATURE_OVERFLOW_RECOV	(17*32+ 0) /* "overflow_recov" MCA overflow recovery support */
 #define X86_FEATURE_SUCCOR		(17*32+ 1) /* "succor" Uncorrectable error containment and recovery */
-
+#define X86_FEATURE_CPPC_PERF_PRIO	(17*32+ 2) /* CPPC Floor Perf support */
 #define X86_FEATURE_SMCA		(17*32+ 3) /* "smca" Scalable MCA */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (EDX), word 18 */
+5
arch/x86/include/asm/msr-index.h
···
 #define MSR_AMD_CPPC_CAP2	0xc00102b2
 #define MSR_AMD_CPPC_REQ	0xc00102b3
 #define MSR_AMD_CPPC_STATUS	0xc00102b4
+#define MSR_AMD_CPPC_REQ2	0xc00102b5
 
 /* Masks for use with MSR_AMD_CPPC_CAP1 */
 #define AMD_CPPC_LOWEST_PERF_MASK	GENMASK(7, 0)
 #define AMD_CPPC_LOWNONLIN_PERF_MASK	GENMASK(15, 8)
 #define AMD_CPPC_NOMINAL_PERF_MASK	GENMASK(23, 16)
 #define AMD_CPPC_HIGHEST_PERF_MASK	GENMASK(31, 24)
+#define AMD_CPPC_FLOOR_PERF_CNT_MASK	GENMASK_ULL(39, 32)
 
 /* Masks for use with MSR_AMD_CPPC_REQ */
 #define AMD_CPPC_MAX_PERF_MASK	GENMASK(7, 0)
 #define AMD_CPPC_MIN_PERF_MASK	GENMASK(15, 8)
 #define AMD_CPPC_DES_PERF_MASK	GENMASK(23, 16)
 #define AMD_CPPC_EPP_PERF_MASK	GENMASK(31, 24)
+
+/* Masks for use with MSR_AMD_CPPC_REQ2 */
+#define AMD_CPPC_FLOOR_PERF_MASK	GENMASK(7, 0)
 
 /* AMD Performance Counter Global Status and Control MSRs */
 #define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS	0xc0000300
+1
arch/x86/kernel/cpu/scattered.c
···
 { X86_FEATURE_CPB,		CPUID_EDX,  9, 0x80000007, 0 },
 { X86_FEATURE_PROC_FEEDBACK,	CPUID_EDX, 11, 0x80000007, 0 },
 { X86_FEATURE_AMD_FAST_CPPC,	CPUID_EDX, 15, 0x80000007, 0 },
+{ X86_FEATURE_CPPC_PERF_PRIO,	CPUID_EDX, 16, 0x80000007, 0 },
 { X86_FEATURE_MBA,		CPUID_EBX,  6, 0x80000008, 0 },
 { X86_FEATURE_X2AVIC_EXT,	CPUID_ECX,  6, 0x8000000a, 0 },
 { X86_FEATURE_COHERENCY_SFW_NO,	CPUID_EBX, 31, 0x8000001f, 0 },
-2
arch/x86/power/hibernate_64.c
···
 #include <linux/kdebug.h>
 #include <linux/pgtable.h>
 
-#include <crypto/hash.h>
-
 #include <asm/e820/api.h>
 #include <asm/init.h>
 #include <asm/proto.h>
+2 -1
drivers/acpi/cppc_acpi.c
···
 }
 
 /* Look up the max frequency in DMI */
-static u64 cppc_get_dmi_max_khz(void)
+u64 cppc_get_dmi_max_khz(void)
 {
 	u16 mhz = 0;
 
···
 
 	return KHZ_PER_MHZ * mhz;
 }
+EXPORT_SYMBOL_GPL(cppc_get_dmi_max_khz);
 
 /*
  * If CPPC lowest_freq and nominal_freq registers are exposed then we can
+2 -3
drivers/cpufreq/Kconfig
···
 
 config CPU_FREQ_GOV_CONSERVATIVE
 	tristate "'conservative' cpufreq governor"
-	depends on CPU_FREQ
 	select CPU_FREQ_GOV_COMMON
 	help
 	  'conservative' - this driver is rather similar to the 'ondemand'
···
 
 config CPU_FREQ_GOV_SCHEDUTIL
 	bool "'schedutil' cpufreq policy governor"
-	depends on CPU_FREQ && SMP
+	depends on SMP
 	select CPU_FREQ_GOV_ATTR_SET
 	select IRQ_WORK
 	help
···
 
 	  If in doubt, say N.
 
-endif
+endif # CPU_FREQ
 
 endmenu
+1 -1
drivers/cpufreq/Kconfig.arm
···
 
 config ARM_TEGRA194_CPUFREQ
 	tristate "Tegra194 CPUFreq support"
-	depends on ARCH_TEGRA_194_SOC || ARCH_TEGRA_234_SOC || (64BIT && COMPILE_TEST)
+	depends on ARCH_TEGRA_194_SOC || ARCH_TEGRA_234_SOC
 	depends on TEGRA_BPMP
 	default ARCH_TEGRA_194_SOC || ARCH_TEGRA_234_SOC
 	help
+14
drivers/cpufreq/Kconfig.x86
···
 	select ACPI_PROCESSOR
 	select ACPI_CPPC_LIB if X86_64
 	select CPU_FREQ_GOV_SCHEDUTIL if SMP
+	select ACPI_PLATFORM_PROFILE
+	select POWER_SUPPLY
 	help
 	  This driver adds a CPUFreq driver which utilizes a fine grain
 	  processor performance frequency control range instead of legacy
···
 
 	  For details, take a look at:
 	  <file:Documentation/admin-guide/pm/amd-pstate.rst>.
+
+config X86_AMD_PSTATE_DYNAMIC_EPP
+	bool "AMD Processor P-State dynamic EPP support"
+	depends on X86_AMD_PSTATE
+	default n
+	help
+	  Allow the kernel to dynamically change the energy performance
+	  value from events like ACPI platform profile and AC adapter plug
+	  events.
+
+	  This feature can also be changed at runtime, this configuration
+	  option only sets the kernel default value behavior.
 
 config X86_AMD_PSTATE_UT
 	tristate "selftest for AMD Processor P-State driver"
+24 -7
drivers/cpufreq/acpi-cpufreq.c
···
 }
 #endif
 
+static void acpi_cpufreq_resolve_max_freq(struct cpufreq_policy *policy,
+					  unsigned int pss_max_freq)
+{
+#ifdef CONFIG_ACPI_CPPC_LIB
+	u64 max_speed = cppc_get_dmi_max_khz();
+
+	/*
+	 * Use DMI "Max Speed" if it looks plausible: must be
+	 * above _PSS P0 frequency and within 2x of it.
+	 */
+	if (max_speed > pss_max_freq && max_speed < pss_max_freq * 2) {
+		policy->cpuinfo.max_freq = max_speed;
+		return;
+	}
+#endif
+	/*
+	 * If the maximum "boost" frequency is unknown, ask the arch
+	 * scale-invariance code to use the "nominal" performance for
+	 * CPU utilization scaling so as to prevent the schedutil
+	 * governor from selecting inadequate CPU frequencies.
+	 */
+	arch_set_max_freq_ratio(true);
+}
+
 static int acpi_cpufreq_cpu_init(struct cpufreq_policy *policy)
 {
 	struct cpufreq_frequency_table *freq_table;
···
 
 		policy->cpuinfo.max_freq = freq * max_boost_ratio >> SCHED_CAPACITY_SHIFT;
 	} else {
-		/*
-		 * If the maximum "boost" frequency is unknown, ask the arch
-		 * scale-invariance code to use the "nominal" performance for
-		 * CPU utilization scaling so as to prevent the schedutil
-		 * governor from selecting inadequate CPU frequencies.
-		 */
-		arch_set_max_freq_ratio(true);
+		acpi_cpufreq_resolve_max_freq(policy, freq_table[0].frequency);
 	}
 
 	policy->freq_table = freq_table;
+35
drivers/cpufreq/amd-pstate-trace.h
···
 	)
 );
 
+TRACE_EVENT(amd_pstate_cppc_req2,
+
+	TP_PROTO(unsigned int cpu_id,
+		 u8 floor_perf,
+		 bool changed,
+		 int err_code
+		 ),
+
+	TP_ARGS(cpu_id,
+		floor_perf,
+		changed,
+		err_code),
+
+	TP_STRUCT__entry(
+		__field(unsigned int, cpu_id)
+		__field(u8, floor_perf)
+		__field(bool, changed)
+		__field(int, err_code)
+		),
+
+	TP_fast_assign(
+		__entry->cpu_id = cpu_id;
+		__entry->floor_perf = floor_perf;
+		__entry->changed = changed;
+		__entry->err_code = err_code;
+		),
+
+	TP_printk("cpu%u: floor_perf=%u, changed=%u (error = %d)",
+		  __entry->cpu_id,
+		  __entry->floor_perf,
+		  __entry->changed,
+		  __entry->err_code
+		  )
+);
+
 #endif /* _AMD_PSTATE_TRACE_H */
 
 /* This part must be outside protection */
+273 -6
drivers/cpufreq/amd-pstate-ut.c
···
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include <linux/bitfield.h>
+#include <linux/cpufeature.h>
+#include <linux/cpufreq.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/moduleparam.h>
+#include <linux/mm.h>
 #include <linux/fs.h>
 #include <linux/cleanup.h>
 
···
 
 #include "amd-pstate.h"
 
+static char *test_list;
+module_param(test_list, charp, 0444);
+MODULE_PARM_DESC(test_list,
+		 "Comma-delimited list of tests to run (empty means run all tests)");
+DEFINE_FREE(cleanup_page, void *, if (_T) free_page((unsigned long)_T))
 
 struct amd_pstate_ut_struct {
 	const char *name;
···
 static int amd_pstate_ut_check_enabled(u32 index);
 static int amd_pstate_ut_check_perf(u32 index);
 static int amd_pstate_ut_check_freq(u32 index);
+static int amd_pstate_ut_epp(u32 index);
 static int amd_pstate_ut_check_driver(u32 index);
+static int amd_pstate_ut_check_freq_attrs(u32 index);
 
 static struct amd_pstate_ut_struct amd_pstate_ut_cases[] = {
-	{"amd_pstate_ut_acpi_cpc_valid",   amd_pstate_ut_acpi_cpc_valid },
-	{"amd_pstate_ut_check_enabled",    amd_pstate_ut_check_enabled },
-	{"amd_pstate_ut_check_perf",       amd_pstate_ut_check_perf },
-	{"amd_pstate_ut_check_freq",       amd_pstate_ut_check_freq },
-	{"amd_pstate_ut_check_driver",     amd_pstate_ut_check_driver }
+	{"amd_pstate_ut_acpi_cpc_valid",    amd_pstate_ut_acpi_cpc_valid },
+	{"amd_pstate_ut_check_enabled",     amd_pstate_ut_check_enabled },
+	{"amd_pstate_ut_check_perf",        amd_pstate_ut_check_perf },
+	{"amd_pstate_ut_check_freq",        amd_pstate_ut_check_freq },
+	{"amd_pstate_ut_epp",               amd_pstate_ut_epp },
+	{"amd_pstate_ut_check_driver",      amd_pstate_ut_check_driver },
+	{"amd_pstate_ut_check_freq_attrs",  amd_pstate_ut_check_freq_attrs },
 };
+
+static bool test_in_list(const char *list, const char *name)
+{
+	size_t name_len = strlen(name);
+	const char *p = list;
+
+	while (*p) {
+		const char *sep = strchr(p, ',');
+		size_t token_len = sep ? sep - p : strlen(p);
+
+		if (token_len == name_len && !strncmp(p, name, token_len))
+			return true;
+		if (!sep)
+			break;
+		p = sep + 1;
+	}
+
+	return false;
+}
 
 static bool get_shared_mem(void)
 {
···
 	return amd_pstate_update_status(mode_str, strlen(mode_str));
 }
 
+static int amd_pstate_ut_epp(u32 index)
+{
+	struct cpufreq_policy *policy __free(put_cpufreq_policy) = NULL;
+	char *buf __free(cleanup_page) = NULL;
+	static const char * const epp_strings[] = {
+		"performance",
+		"balance_performance",
+		"balance_power",
+		"power",
+	};
+	struct amd_cpudata *cpudata;
+	enum amd_pstate_mode orig_mode;
+	bool orig_dynamic_epp;
+	int ret, cpu = 0;
+	int i;
+	u16 epp;
+
+	policy = cpufreq_cpu_get(cpu);
+	if (!policy)
+		return -ENODEV;
+
+	cpudata = policy->driver_data;
+	orig_mode = amd_pstate_get_status();
+	orig_dynamic_epp = cpudata->dynamic_epp;
+
+	/* disable dynamic EPP before running test */
+	if (cpudata->dynamic_epp) {
+		pr_debug("Dynamic EPP is enabled, disabling it\n");
+		amd_pstate_clear_dynamic_epp(policy);
+	}
+
+	buf = (char *)__get_free_page(GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	ret = amd_pstate_set_mode(AMD_PSTATE_ACTIVE);
+	if (ret)
+		goto out;
+
+	for (epp = 0; epp <= U8_MAX; epp++) {
+		u8 val;
+
+		/* write all EPP values */
+		memset(buf, 0, PAGE_SIZE);
+		snprintf(buf, PAGE_SIZE, "%d", epp);
+		ret = store_energy_performance_preference(policy, buf, strlen(buf));
+		if (ret < 0)
+			goto out;
+
+		/* check if the EPP value reads back correctly for raw numbers */
+		memset(buf, 0, PAGE_SIZE);
+		ret = show_energy_performance_preference(policy, buf);
+		if (ret < 0)
+			goto out;
+		strreplace(buf, '\n', '\0');
+		ret = kstrtou8(buf, 0, &val);
+		if (!ret && epp != val) {
+			pr_err("Raw EPP value mismatch: %d != %d\n", epp, val);
+			ret = -EINVAL;
+			goto out;
+		}
+	}
+
+	for (i = 0; i < ARRAY_SIZE(epp_strings); i++) {
+		memset(buf, 0, PAGE_SIZE);
+		snprintf(buf, PAGE_SIZE, "%s", epp_strings[i]);
+		ret = store_energy_performance_preference(policy, buf, strlen(buf));
+		if (ret < 0)
+			goto out;
+
+		memset(buf, 0, PAGE_SIZE);
+		ret = show_energy_performance_preference(policy, buf);
+		if (ret < 0)
+			goto out;
+		strreplace(buf, '\n', '\0');
+
+		if (strcmp(buf, epp_strings[i])) {
+			pr_err("String EPP value mismatch: %s != %s\n", buf, epp_strings[i]);
+			ret = -EINVAL;
+			goto out;
+		}
+	}
+
+	ret = 0;
+
+out:
+	if (orig_dynamic_epp) {
+		int ret2;
+
+		ret2 = amd_pstate_set_mode(AMD_PSTATE_DISABLE);
+		if (!ret && ret2)
+			ret = ret2;
+	}
+
+	if (orig_mode != amd_pstate_get_status()) {
+		int ret2;
+
+		ret2 = amd_pstate_set_mode(orig_mode);
+		if (!ret && ret2)
+			ret = ret2;
+	}
+
+	return ret;
+}
+
 static int amd_pstate_ut_check_driver(u32 index)
 {
 	enum amd_pstate_mode mode1, mode2 = AMD_PSTATE_DISABLE;
···
 	return ret;
 }
 
+enum attr_category {
+	ATTR_ALWAYS,
+	ATTR_PREFCORE,
+	ATTR_EPP,
+	ATTR_FLOOR_FREQ,
+};
+
+static const struct {
+	const char *name;
+	enum attr_category category;
+} expected_freq_attrs[] = {
+	{"amd_pstate_max_freq",			ATTR_ALWAYS},
+	{"amd_pstate_lowest_nonlinear_freq",	ATTR_ALWAYS},
+	{"amd_pstate_highest_perf",		ATTR_ALWAYS},
+	{"amd_pstate_prefcore_ranking",		ATTR_PREFCORE},
+	{"amd_pstate_hw_prefcore",		ATTR_PREFCORE},
+	{"energy_performance_preference",	ATTR_EPP},
+	{"energy_performance_available_preferences", ATTR_EPP},
+	{"amd_pstate_floor_freq",		ATTR_FLOOR_FREQ},
+	{"amd_pstate_floor_count",		ATTR_FLOOR_FREQ},
+};
+
+static bool attr_in_driver(struct freq_attr **driver_attrs, const char *name)
+{
+	int j;
+
+	for (j = 0; driver_attrs[j]; j++) {
+		if (!strcmp(driver_attrs[j]->attr.name, name))
+			return true;
+	}
+	return false;
+}
+
+/*
+ * Verify that for each mode the driver's live ->attr array contains exactly
+ * the attributes that should be visible. Expected visibility is derived
+ * independently from hw_prefcore, cpu features, and the current mode —
+ * not from the driver's own visibility functions.
+ */
+static int amd_pstate_ut_check_freq_attrs(u32 index)
+{
+	enum amd_pstate_mode orig_mode = amd_pstate_get_status();
+	static const enum amd_pstate_mode modes[] = {
+		AMD_PSTATE_PASSIVE, AMD_PSTATE_ACTIVE, AMD_PSTATE_GUIDED,
+	};
+	bool has_prefcore, has_floor_freq;
+	int m, i, ret;
+
+	has_floor_freq = cpu_feature_enabled(X86_FEATURE_CPPC_PERF_PRIO);
+
+	/*
+	 * Determine prefcore support from any online CPU's cpudata.
+	 * hw_prefcore reflects the platform-wide decision made at init.
+	 */
+	has_prefcore = false;
+	for_each_online_cpu(i) {
+		struct cpufreq_policy *policy __free(put_cpufreq_policy) = NULL;
+		struct amd_cpudata *cpudata;
+
+		policy = cpufreq_cpu_get(i);
+		if (!policy)
+			continue;
+		cpudata = policy->driver_data;
+		has_prefcore = cpudata->hw_prefcore;
+		break;
+	}
+
+	for (m = 0; m < ARRAY_SIZE(modes); m++) {
+		struct freq_attr **driver_attrs;
+
+		ret = amd_pstate_set_mode(modes[m]);
+		if (ret)
+			goto out;
+
+		driver_attrs = amd_pstate_get_current_attrs();
+		if (!driver_attrs) {
+			pr_err("%s: no driver attrs in mode %s\n",
+			       __func__, amd_pstate_get_mode_string(modes[m]));
+			ret = -EINVAL;
+			goto out;
+		}
+
+		for (i = 0; i < ARRAY_SIZE(expected_freq_attrs); i++) {
+			bool expected, found;
+
+			switch (expected_freq_attrs[i].category) {
+			case ATTR_ALWAYS:
+				expected = true;
+				break;
+			case ATTR_PREFCORE:
+				expected = has_prefcore;
+				break;
+			case ATTR_EPP:
+				expected = (modes[m] == AMD_PSTATE_ACTIVE);
+				break;
+			case ATTR_FLOOR_FREQ:
+				expected = has_floor_freq;
+				break;
+			default:
+				expected = false;
+				break;
+			}
+
+			found = attr_in_driver(driver_attrs,
+					       expected_freq_attrs[i].name);
+
+			if (expected != found) {
+				pr_err("%s: mode %s: attr %s expected %s but is %s\n",
+				       __func__,
+				       amd_pstate_get_mode_string(modes[m]),
+				       expected_freq_attrs[i].name,
+				       expected ? "visible" : "hidden",
+				       found ? "visible" : "hidden");
+				ret = -EINVAL;
+				goto out;
+			}
+		}
+	}
+
+	ret = 0;
+out:
+	amd_pstate_set_mode(orig_mode);
+	return ret;
+}
+
 static int __init amd_pstate_ut_init(void)
 {
 	u32 i = 0, arr_size = ARRAY_SIZE(amd_pstate_ut_cases);
 
 	for (i = 0; i < arr_size; i++) {
-		int ret = amd_pstate_ut_cases[i].func(i);
+		int ret;
+
+		if (test_list && *test_list &&
+		    !test_in_list(test_list, amd_pstate_ut_cases[i].name))
+			continue;
+
+		ret = amd_pstate_ut_cases[i].func(i);
 
 		if (ret)
 			pr_err("%-4d %-20s\t fail: %d!\n", i+1, amd_pstate_ut_cases[i].name, ret);
+543 -84
drivers/cpufreq/amd-pstate.c
···
36 36    #include <linux/io.h>
37 37    #include <linux/delay.h>
38 38    #include <linux/uaccess.h>
39 +     #include <linux/power_supply.h>
39 40    #include <linux/static_call.h>
40 41    #include <linux/topology.h>
41 42
···
87 86    static struct cpufreq_driver amd_pstate_epp_driver;
88 87    static int cppc_state = AMD_PSTATE_UNDEFINED;
89 88    static bool amd_pstate_prefcore = true;
89 +     #ifdef CONFIG_X86_AMD_PSTATE_DYNAMIC_EPP
90 +     static bool dynamic_epp = CONFIG_X86_AMD_PSTATE_DYNAMIC_EPP;
91 +     #else
92 +     static bool dynamic_epp;
93 +     #endif
90 94    static struct quirk_entry *quirks;
91 95
···
109 103   * 2        balance_performance
110 104   * 3        balance_power
111 105   * 4        power
106 +      * 5        custom (for raw EPP values)
112 107   */
113 108  enum energy_perf_value_index {
114 109      EPP_INDEX_DEFAULT = 0,
···
117 110      EPP_INDEX_BALANCE_PERFORMANCE,
118 111      EPP_INDEX_BALANCE_POWERSAVE,
119 112      EPP_INDEX_POWERSAVE,
113 +         EPP_INDEX_CUSTOM,
120 114      EPP_INDEX_MAX,
121 115  };
122 116
···
127 119      [EPP_INDEX_BALANCE_PERFORMANCE] = "balance_performance",
128 120      [EPP_INDEX_BALANCE_POWERSAVE] = "balance_power",
129 121      [EPP_INDEX_POWERSAVE] = "power",
122 +         [EPP_INDEX_CUSTOM] = "custom",
130 123  };
131 124  static_assert(ARRAY_SIZE(energy_perf_strings) == EPP_INDEX_MAX);
132 125
···
138 129      [EPP_INDEX_BALANCE_POWERSAVE] = AMD_CPPC_EPP_BALANCE_POWERSAVE,
139 130      [EPP_INDEX_POWERSAVE] = AMD_CPPC_EPP_POWERSAVE,
140 131  };
141 -   static_assert(ARRAY_SIZE(epp_values) == EPP_INDEX_MAX);
132 +   static_assert(ARRAY_SIZE(epp_values) == EPP_INDEX_MAX - 1);
142 133
143 134  typedef int (*cppc_mode_transition_fn)(int);
144 135
···
270 261
271 262      if (fast_switch) {
272 263          wrmsrq(MSR_AMD_CPPC_REQ, value);
273 -             return 0;
274 264      } else {
275 265          int ret = wrmsrq_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, value);
276 266
···
336 328  static inline int amd_pstate_set_epp(struct cpufreq_policy *policy, u8 epp)
337 329  {
338 330      return static_call(amd_pstate_set_epp)(policy, epp);
331 +   }
332 +
333 +   static int amd_pstate_set_floor_perf(struct cpufreq_policy *policy, u8 perf)
334 +   {
335 +       struct amd_cpudata *cpudata = policy->driver_data;
336 +       u64 value, prev;
337 +       bool changed;
338 +       int ret;
339 +
340 +       if (!cpu_feature_enabled(X86_FEATURE_CPPC_PERF_PRIO))
341 +           return 0;
342 +
343 +       value = prev = READ_ONCE(cpudata->cppc_req2_cached);
344 +       FIELD_MODIFY(AMD_CPPC_FLOOR_PERF_MASK, &value, perf);
345 +
346 +       changed = value != prev;
347 +       if (!changed) {
348 +           ret = 0;
349 +           goto out_trace;
350 +       }
351 +
352 +       ret = wrmsrq_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ2, value);
353 +       if (ret) {
354 +           changed = false;
355 +           pr_err("failed to set CPPC REQ2 value. Error (%d)\n", ret);
356 +           goto out_trace;
357 +       }
358 +
359 +       WRITE_ONCE(cpudata->cppc_req2_cached, value);
360 +
361 +   out_trace:
362 +       if (trace_amd_pstate_cppc_req2_enabled())
363 +           trace_amd_pstate_cppc_req2(cpudata->cpu, perf, changed, ret);
364 +       return ret;
365 +   }
366 +
367 +   static int amd_pstate_init_floor_perf(struct cpufreq_policy *policy)
368 +   {
369 +       struct amd_cpudata *cpudata = policy->driver_data;
370 +       u8 floor_perf;
371 +       u64 value;
372 +       int ret;
373 +
374 +       if (!cpu_feature_enabled(X86_FEATURE_CPPC_PERF_PRIO))
375 +           return 0;
376 +
377 +       ret = rdmsrq_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ2, &value);
378 +       if (ret) {
379 +           pr_err("failed to read CPPC REQ2 value. Error (%d)\n", ret);
380 +           return ret;
381 +       }
382 +
383 +       WRITE_ONCE(cpudata->cppc_req2_cached, value);
384 +       floor_perf = FIELD_GET(AMD_CPPC_FLOOR_PERF_MASK,
385 +                              cpudata->cppc_req2_cached);
386 +
387 +       /* Set a sane value for floor_perf if the default value is invalid */
388 +       if (floor_perf < cpudata->perf.lowest_perf) {
389 +           floor_perf = cpudata->perf.nominal_perf;
390 +           ret = amd_pstate_set_floor_perf(policy, floor_perf);
391 +           if (ret)
392 +               return ret;
393 +       }
394 +
395 +
396 +       cpudata->bios_floor_perf = floor_perf;
397 +       cpudata->floor_freq = perf_to_freq(cpudata->perf, cpudata->nominal_freq,
398 +                                          floor_perf);
399 +       return 0;
339 400  }
340 401
341 402  static int shmem_set_epp(struct cpufreq_policy *policy, u8 epp)
···
504 427      perf.lowest_perf = FIELD_GET(AMD_CPPC_LOWEST_PERF_MASK, cap1);
505 428      WRITE_ONCE(cpudata->perf, perf);
506 429      WRITE_ONCE(cpudata->prefcore_ranking, FIELD_GET(AMD_CPPC_HIGHEST_PERF_MASK, cap1));
430 +       WRITE_ONCE(cpudata->floor_perf_cnt, FIELD_GET(AMD_CPPC_FLOOR_PERF_CNT_MASK, cap1));
507 431
508 432      return 0;
509 433  }
···
643 565      return true;
644 566  }
645 567
646 -   static void amd_pstate_update(struct amd_cpudata *cpudata, u8 min_perf,
568 +   static void amd_pstate_update(struct cpufreq_policy *policy, u8 min_perf,
647 569                                  u8 des_perf, u8 max_perf, bool fast_switch, int gov_flags)
648 570  {
649 -       struct cpufreq_policy *policy __free(put_cpufreq_policy) = cpufreq_cpu_get(cpudata->cpu);
571 +       struct amd_cpudata *cpudata = policy->driver_data;
650 572      union perf_cached perf = READ_ONCE(cpudata->perf);
651 -
652 -       if (!policy)
653 -           return;
654 573
655 574      /* limit the max perf when core performance boost feature is disabled */
656 575      if (!cpudata->boost_supported)
···
763 688      if (!fast_switch)
764 689          cpufreq_freq_transition_begin(policy, &freqs);
765 690
766 -       amd_pstate_update(cpudata, perf.min_limit_perf, des_perf,
691 +       amd_pstate_update(policy, perf.min_limit_perf, des_perf,
767 692                          perf.max_limit_perf, fast_switch,
768 693                          policy->governor->flags);
769 694
···
788 713      return policy->cur;
789 714  }
790 715
791 -   static void amd_pstate_adjust_perf(unsigned int cpu,
716 +   static void amd_pstate_adjust_perf(struct cpufreq_policy *policy,
792 717                                       unsigned long _min_perf,
793 718                                       unsigned long target_perf,
794 719                                       unsigned long capacity)
795 720  {
796 721      u8 max_perf, min_perf, des_perf, cap_perf;
797 -       struct cpufreq_policy *policy __free(put_cpufreq_policy) = cpufreq_cpu_get(cpu);
798 722      struct amd_cpudata *cpudata;
799 723      union perf_cached perf;
800 724
···
824 750      if (max_perf < min_perf)
825 751          max_perf = min_perf;
826 752
827 -       amd_pstate_update(cpudata, min_perf, des_perf, max_perf, true,
753 +       amd_pstate_update(policy, min_perf, des_perf, max_perf, true,
828 754                          policy->governor->flags);
829 755  }
830 756
831 757  static int amd_pstate_cpu_boost_update(struct cpufreq_policy *policy, bool on)
832 758  {
833 759      struct amd_cpudata *cpudata = policy->driver_data;
834 -       union perf_cached perf = READ_ONCE(cpudata->perf);
835 -       u32 nominal_freq, max_freq;
760 +       u32 nominal_freq;
836 761      int ret = 0;
837 762
838 763      nominal_freq = READ_ONCE(cpudata->nominal_freq);
839 -       max_freq = perf_to_freq(perf, cpudata->nominal_freq, perf.highest_perf);
840 764
841 765      if (on)
842 -           policy->cpuinfo.max_freq = max_freq;
766 +           policy->cpuinfo.max_freq = cpudata->max_freq;
843 767      else if (policy->cpuinfo.max_freq > nominal_freq)
844 768          policy->cpuinfo.max_freq = nominal_freq;
845 -
846 -       policy->max = policy->cpuinfo.max_freq;
847 769
848 770      if (cppc_state == AMD_PSTATE_PASSIVE) {
849 771          ret = freq_qos_update_request(&cpudata->req[1], policy->cpuinfo.max_freq);
···
1022 952
1023 953      WRITE_ONCE(cpudata->nominal_freq, nominal_freq);
1024 954
955 +        /* max_freq is calculated according to (nominal_freq * highest_perf)/nominal_perf */
1025 956      max_freq = perf_to_freq(perf, nominal_freq, perf.highest_perf);
957 +        WRITE_ONCE(cpudata->max_freq, max_freq);
958 +
1026 959      lowest_nonlinear_freq = perf_to_freq(perf, nominal_freq, perf.lowest_nonlinear_perf);
1027 960      WRITE_ONCE(cpudata->lowest_nonlinear_freq, lowest_nonlinear_freq);
1028 961
1029 962      /**
1030 963       * Below values need to be initialized correctly, otherwise driver will fail to load
1031 -        * max_freq is calculated according to (nominal_freq * highest_perf)/nominal_perf
1032 964       * lowest_nonlinear_freq is a value between [min_freq, nominal_freq]
1033 965       * Check _CPC in ACPI table objects if any values are incorrect
1034 966       */
···
1093 1021     policy->cpuinfo.min_freq = policy->min = perf_to_freq(perf,
1094 1022                                                           cpudata->nominal_freq,
1095 1023                                                           perf.lowest_perf);
1096 -        policy->cpuinfo.max_freq = policy->max = perf_to_freq(perf,
1097 -                                                              cpudata->nominal_freq,
1098 -                                                              perf.highest_perf);
1024 +        policy->cpuinfo.max_freq = policy->max = cpudata->max_freq;
1099 1025
1026 +        policy->driver_data = cpudata;
1100 1027     ret = amd_pstate_cppc_enable(policy);
1101 1028     if (ret)
1102 1029         goto free_cpudata1;
···
1107 1036
1108 1037     if (cpu_feature_enabled(X86_FEATURE_CPPC))
1109 1038         policy->fast_switch_possible = true;
1039 +
1040 +        ret = amd_pstate_init_floor_perf(policy);
1041 +        if (ret) {
1042 +            dev_err(dev, "Failed to initialize Floor Perf (%d)\n", ret);
1043 +            goto free_cpudata1;
1044 +        }
1110 1045
1111 1046     ret = freq_qos_add_request(&policy->constraints, &cpudata->req[0],
1112 1047                                 FREQ_QOS_MIN, FREQ_QOS_MIN_DEFAULT_VALUE);
···
1128 1051         goto free_cpudata2;
1129 1052     }
1130 1053
1131 -        policy->driver_data = cpudata;
1132 1054
1133 1055     if (!current_pstate_driver->adjust_perf)
1134 1056         current_pstate_driver->adjust_perf = amd_pstate_adjust_perf;
···
1139 1063 free_cpudata1:
1140 1064     pr_warn("Failed to initialize CPU %d: %d\n", policy->cpu, ret);
1141 1065     kfree(cpudata);
1066 +        policy->driver_data = NULL;
1142 1067     return ret;
1143 1068 }
1144 1069
···
1150 1073
1151 1074     /* Reset CPPC_REQ MSR to the BIOS value */
1152 1075     amd_pstate_update_perf(policy, perf.bios_min_perf, 0U, 0U, 0U, false);
1076 +        amd_pstate_set_floor_perf(policy, cpudata->bios_floor_perf);
1153 1077
1154 1078     freq_qos_remove_request(&cpudata->req[1]);
1155 1079     freq_qos_remove_request(&cpudata->req[0]);
1156 1080     policy->fast_switch_possible = false;
1157 1081     kfree(cpudata);
1082 +    }
1083 +
1084 +    static int amd_pstate_get_balanced_epp(struct cpufreq_policy *policy)
1085 +    {
1086 +        struct amd_cpudata *cpudata = policy->driver_data;
1087 +
1088 +        if (power_supply_is_system_supplied())
1089 +            return cpudata->epp_default_ac;
1090 +        else
1091 +            return cpudata->epp_default_dc;
1092 +    }
1093 +
1094 +    static int amd_pstate_power_supply_notifier(struct notifier_block *nb,
1095 +                                                unsigned long event, void *data)
1096 +    {
1097 +        struct amd_cpudata *cpudata = container_of(nb, struct amd_cpudata, power_nb);
1098 +        struct cpufreq_policy *policy __free(put_cpufreq_policy) = cpufreq_cpu_get(cpudata->cpu);
1099 +        u8 epp;
1100 +        int ret;
1101 +
1102 +        if (event != PSY_EVENT_PROP_CHANGED)
1103 +            return NOTIFY_OK;
1104 +
1105 +        /* dynamic actions are only applied while platform profile is in balanced */
1106 +        if (cpudata->current_profile != PLATFORM_PROFILE_BALANCED)
1107 +            return 0;
1108 +
1109 +        epp = amd_pstate_get_balanced_epp(policy);
1110 +
1111 +        ret = amd_pstate_set_epp(policy, epp);
1112 +        if (ret)
1113 +            pr_warn("Failed to set CPU %d EPP %u: %d\n", cpudata->cpu, epp, ret);
1114 +
1115 +        return NOTIFY_OK;
1116 +    }
1117 +
1118 +    static int amd_pstate_profile_probe(void *drvdata, unsigned long *choices)
1119 +    {
1120 +        set_bit(PLATFORM_PROFILE_LOW_POWER, choices);
1121 +        set_bit(PLATFORM_PROFILE_BALANCED, choices);
1122 +        set_bit(PLATFORM_PROFILE_PERFORMANCE, choices);
1123 +
1124 +        return 0;
1125 +    }
1126 +
1127 +    static int amd_pstate_profile_get(struct device *dev,
1128 +                                      enum platform_profile_option *profile)
1129 +    {
1130 +        struct amd_cpudata *cpudata = dev_get_drvdata(dev);
1131 +
1132 +        *profile = cpudata->current_profile;
1133 +
1134 +        return 0;
1135 +    }
1136 +
1137 +    static int amd_pstate_profile_set(struct device *dev,
1138 +                                      enum platform_profile_option profile)
1139 +    {
1140 +        struct amd_cpudata *cpudata = dev_get_drvdata(dev);
1141 +        struct cpufreq_policy *policy __free(put_cpufreq_policy) = cpufreq_cpu_get(cpudata->cpu);
1142 +        int ret;
1143 +
1144 +        switch (profile) {
1145 +        case PLATFORM_PROFILE_LOW_POWER:
1146 +            ret = amd_pstate_set_epp(policy, AMD_CPPC_EPP_POWERSAVE);
1147 +            if (ret)
1148 +                return ret;
1149 +            break;
1150 +        case PLATFORM_PROFILE_BALANCED:
1151 +            ret = amd_pstate_set_epp(policy,
1152 +                                     amd_pstate_get_balanced_epp(policy));
1153 +            if (ret)
1154 +                return ret;
1155 +            break;
1156 +        case PLATFORM_PROFILE_PERFORMANCE:
1157 +            ret = amd_pstate_set_epp(policy, AMD_CPPC_EPP_PERFORMANCE);
1158 +            if (ret)
1159 +                return ret;
1160 +            break;
1161 +        default:
1162 +            pr_err("Unknown Platform Profile %d\n", profile);
1163 +            return -EOPNOTSUPP;
1164 +        }
1165 +
1166 +        cpudata->current_profile = profile;
1167 +
1168 +        return 0;
1169 +    }
1170 +
1171 +    static const struct platform_profile_ops amd_pstate_profile_ops = {
1172 +        .probe = amd_pstate_profile_probe,
1173 +        .profile_set = amd_pstate_profile_set,
1174 +        .profile_get = amd_pstate_profile_get,
1175 +    };
1176 +
1177 +    void amd_pstate_clear_dynamic_epp(struct cpufreq_policy *policy)
1178 +    {
1179 +        struct amd_cpudata *cpudata = policy->driver_data;
1180 +
1181 +        if (cpudata->power_nb.notifier_call)
1182 +            power_supply_unreg_notifier(&cpudata->power_nb);
1183 +        if (cpudata->ppdev) {
1184 +            platform_profile_remove(cpudata->ppdev);
1185 +            cpudata->ppdev = NULL;
1186 +        }
1187 +        kfree(cpudata->profile_name);
1188 +        cpudata->dynamic_epp = false;
1189 +    }
1190 +    EXPORT_SYMBOL_GPL(amd_pstate_clear_dynamic_epp);
1191 +
1192 +    static int amd_pstate_set_dynamic_epp(struct cpufreq_policy *policy)
1193 +    {
1194 +        struct amd_cpudata *cpudata = policy->driver_data;
1195 +        int ret;
1196 +        u8 epp;
1197 +
1198 +        switch (cpudata->current_profile) {
1199 +        case PLATFORM_PROFILE_PERFORMANCE:
1200 +            epp = AMD_CPPC_EPP_PERFORMANCE;
1201 +            break;
1202 +        case PLATFORM_PROFILE_LOW_POWER:
1203 +            epp = AMD_CPPC_EPP_POWERSAVE;
1204 +            break;
1205 +        case PLATFORM_PROFILE_BALANCED:
1206 +            epp = amd_pstate_get_balanced_epp(policy);
1207 +            break;
1208 +        default:
1209 +            pr_err("Unknown Platform Profile %d\n", cpudata->current_profile);
1210 +            return -EOPNOTSUPP;
1211 +        }
1212 +        ret = amd_pstate_set_epp(policy, epp);
1213 +        if (ret)
1214 +            return ret;
1215 +
1216 +        cpudata->profile_name = kasprintf(GFP_KERNEL, "amd-pstate-epp-cpu%d", cpudata->cpu);
1217 +
1218 +        cpudata->ppdev = platform_profile_register(get_cpu_device(policy->cpu),
1219 +                                                   cpudata->profile_name,
1220 +                                                   policy->driver_data,
1221 +                                                   &amd_pstate_profile_ops);
1222 +        if (IS_ERR(cpudata->ppdev)) {
1223 +            ret = PTR_ERR(cpudata->ppdev);
1224 +            goto cleanup;
1225 +        }
1226 +
1227 +        /* only enable notifier if things will actually change */
1228 +        if (cpudata->epp_default_ac != cpudata->epp_default_dc) {
1229 +            cpudata->power_nb.notifier_call = amd_pstate_power_supply_notifier;
1230 +            ret = power_supply_reg_notifier(&cpudata->power_nb);
1231 +            if (ret)
1232 +                goto cleanup;
1233 +        }
1234 +
1235 +        cpudata->dynamic_epp = true;
1236 +
1237 +        return 0;
1238 +
1239 +    cleanup:
1240 +        amd_pstate_clear_dynamic_epp(policy);
1241 +
1242 +        return ret;
1158 1243 }
1159 1244
1160 1245 /* Sysfs attributes */
···
1329 1090 static ssize_t show_amd_pstate_max_freq(struct cpufreq_policy *policy,
1330 1091                                         char *buf)
1331 1092 {
1332 -        struct amd_cpudata *cpudata;
1333 -        union perf_cached perf;
1093 +        struct amd_cpudata *cpudata = policy->driver_data;
1334 1094
1335 -        cpudata = policy->driver_data;
1336 -        perf = READ_ONCE(cpudata->perf);
1337 -
1338 -        return sysfs_emit(buf, "%u\n",
1339 -                          perf_to_freq(perf, cpudata->nominal_freq, perf.highest_perf));
1095 +        return sysfs_emit(buf, "%u\n", cpudata->max_freq);
1340 1096 }
1341 1097
1342 1098 static ssize_t show_amd_pstate_lowest_nonlinear_freq(struct cpufreq_policy *policy,
···
1401 1167     return offset;
1402 1168 }
1403 1169
1404 -    static ssize_t store_energy_performance_preference(
1405 -            struct cpufreq_policy *policy, const char *buf, size_t count)
1170 +    ssize_t store_energy_performance_preference(struct cpufreq_policy *policy,
1171 +                                                const char *buf, size_t count)
1406 1172 {
1407 1173     struct amd_cpudata *cpudata = policy->driver_data;
1408 1174     ssize_t ret;
1175 +        bool raw_epp = false;
1409 1176     u8 epp;
1410 1177
1411 -        ret = sysfs_match_string(energy_perf_strings, buf);
1412 -        if (ret < 0)
1413 -            return -EINVAL;
1178 +        if (cpudata->dynamic_epp) {
1179 +            pr_debug("EPP cannot be set when dynamic EPP is enabled\n");
1180 +            return -EBUSY;
1181 +        }
1414 1182
1415 -        if (!ret)
1416 -            epp = cpudata->epp_default;
1417 -        else
1418 -            epp = epp_values[ret];
1183 +        /*
1184 +         * if the value matches a number, use that, otherwise see if
1185 +         * matches an index in the energy_perf_strings array
1186 +         */
1187 +        ret = kstrtou8(buf, 0, &epp);
1188 +        raw_epp = !ret;
1189 +        if (ret) {
1190 +            ret = sysfs_match_string(energy_perf_strings, buf);
1191 +            if (ret < 0 || ret == EPP_INDEX_CUSTOM)
1192 +                return -EINVAL;
1193 +            if (ret)
1194 +                epp = epp_values[ret];
1195 +            else
1196 +                epp = amd_pstate_get_balanced_epp(policy);
1197 +        }
1419 1198
1420 -        if (epp > 0 && policy->policy == CPUFREQ_POLICY_PERFORMANCE) {
1199 +        if (cpudata->policy == CPUFREQ_POLICY_PERFORMANCE) {
1421 1200         pr_debug("EPP cannot be set under performance policy\n");
1422 1201         return -EBUSY;
1423 1202     }
1424 1203
1425 1204     ret = amd_pstate_set_epp(policy, epp);
1205 +        if (ret)
1206 +            return ret;
1426 1207
1427 -        return ret ? ret : count;
1208 +        cpudata->raw_epp = raw_epp;
1209 +
1210 +        return count;
1428 1211 }
1212 +    EXPORT_SYMBOL_GPL(store_energy_performance_preference);
1429 1213
1430 -    static ssize_t show_energy_performance_preference(
1431 -            struct cpufreq_policy *policy, char *buf)
1214 +    ssize_t show_energy_performance_preference(struct cpufreq_policy *policy, char *buf)
1432 1215 {
1433 1216     struct amd_cpudata *cpudata = policy->driver_data;
1434 1217     u8 preference, epp;
1435 1218
1436 1219     epp = FIELD_GET(AMD_CPPC_EPP_PERF_MASK, cpudata->cppc_req_cached);
1220 +
1221 +        if (cpudata->raw_epp)
1222 +            return sysfs_emit(buf, "%u\n", epp);
1437 1223
1438 1224     switch (epp) {
1439 1225     case AMD_CPPC_EPP_PERFORMANCE:
···
1474 1220
1475 1221     return sysfs_emit(buf, "%s\n", energy_perf_strings[preference]);
1476 1222 }
1223 +    EXPORT_SYMBOL_GPL(show_energy_performance_preference);
1224 +
1225 +    static ssize_t store_amd_pstate_floor_freq(struct cpufreq_policy *policy,
1226 +                                               const char *buf, size_t count)
1227 +    {
1228 +        struct amd_cpudata *cpudata = policy->driver_data;
1229 +        union perf_cached perf = READ_ONCE(cpudata->perf);
1230 +        unsigned int freq;
1231 +        u8 floor_perf;
1232 +        int ret;
1233 +
1234 +        ret = kstrtouint(buf, 0, &freq);
1235 +        if (ret)
1236 +            return ret;
1237 +
1238 +        if (freq < policy->cpuinfo.min_freq || freq > policy->max)
1239 +            return -EINVAL;
1240 +
1241 +        floor_perf = freq_to_perf(perf, cpudata->nominal_freq, freq);
1242 +        ret = amd_pstate_set_floor_perf(policy, floor_perf);
1243 +
1244 +        if (!ret)
1245 +            cpudata->floor_freq = freq;
1246 +
1247 +        return ret ?: count;
1248 +    }
1249 +
1250 +    static ssize_t show_amd_pstate_floor_freq(struct cpufreq_policy *policy, char *buf)
1251 +    {
1252 +        struct amd_cpudata *cpudata = policy->driver_data;
1253 +
1254 +        return sysfs_emit(buf, "%u\n", cpudata->floor_freq);
1255 +    }
1256 +
1257 +    static ssize_t show_amd_pstate_floor_count(struct cpufreq_policy *policy, char *buf)
1258 +    {
1259 +        struct amd_cpudata *cpudata = policy->driver_data;
1260 +        u8 count = cpudata->floor_perf_cnt;
1261 +
1262 +        return sysfs_emit(buf, "%u\n", count);
1263 +    }
1264 +
1265 +    cpufreq_freq_attr_ro(amd_pstate_max_freq);
1266 +    cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq);
1267 +
1268 +    cpufreq_freq_attr_ro(amd_pstate_highest_perf);
1269 +    cpufreq_freq_attr_ro(amd_pstate_prefcore_ranking);
1270 +    cpufreq_freq_attr_ro(amd_pstate_hw_prefcore);
1271 +    cpufreq_freq_attr_rw(energy_performance_preference);
1272 +    cpufreq_freq_attr_ro(energy_performance_available_preferences);
1273 +    cpufreq_freq_attr_rw(amd_pstate_floor_freq);
1274 +    cpufreq_freq_attr_ro(amd_pstate_floor_count);
1275 +
1276 +    struct freq_attr_visibility {
1277 +        struct freq_attr *attr;
1278 +        bool (*visibility_fn)(void);
1279 +    };
1280 +
1281 +    /* For attributes which are always visible */
1282 +    static bool always_visible(void)
1283 +    {
1284 +        return true;
1285 +    }
1286 +
1287 +    /* Determines whether prefcore related attributes should be visible */
1288 +    static bool prefcore_visibility(void)
1289 +    {
1290 +        return amd_pstate_prefcore;
1291 +    }
1292 +
1293 +    /* Determines whether energy performance preference should be visible */
1294 +    static bool epp_visibility(void)
1295 +    {
1296 +        return cppc_state == AMD_PSTATE_ACTIVE;
1297 +    }
1298 +
1299 +    /* Determines whether amd_pstate_floor_freq related attributes should be visible */
1300 +    static bool floor_freq_visibility(void)
1301 +    {
1302 +        return cpu_feature_enabled(X86_FEATURE_CPPC_PERF_PRIO);
1303 +    }
1304 +
1305 +    static struct freq_attr_visibility amd_pstate_attr_visibility[] = {
1306 +        {&amd_pstate_max_freq, always_visible},
1307 +        {&amd_pstate_lowest_nonlinear_freq, always_visible},
1308 +        {&amd_pstate_highest_perf, always_visible},
1309 +        {&amd_pstate_prefcore_ranking, prefcore_visibility},
1310 +        {&amd_pstate_hw_prefcore, prefcore_visibility},
1311 +        {&energy_performance_preference, epp_visibility},
1312 +        {&energy_performance_available_preferences, epp_visibility},
1313 +        {&amd_pstate_floor_freq, floor_freq_visibility},
1314 +        {&amd_pstate_floor_count, floor_freq_visibility},
1315 +    };
1316 +
1317 +    struct freq_attr **amd_pstate_get_current_attrs(void)
1318 +    {
1319 +        if (!current_pstate_driver)
1320 +            return NULL;
1321 +        return current_pstate_driver->attr;
1322 +    }
1323 +    EXPORT_SYMBOL_GPL(amd_pstate_get_current_attrs);
1324 +
1325 +    static struct freq_attr **get_freq_attrs(void)
1326 +    {
1327 +        bool attr_visible[ARRAY_SIZE(amd_pstate_attr_visibility)];
1328 +        struct freq_attr **attrs;
1329 +        int i, j, count;
1330 +
1331 +        for (i = 0, count = 0; i < ARRAY_SIZE(amd_pstate_attr_visibility); i++) {
1332 +            struct freq_attr_visibility *v = &amd_pstate_attr_visibility[i];
1333 +
1334 +            attr_visible[i] = v->visibility_fn();
1335 +            if (attr_visible[i])
1336 +                count++;
1337 +        }
1338 +
1339 +        /* amd_pstate_{max_freq, lowest_nonlinear_freq, highest_perf} should always be visible */
1340 +        BUG_ON(!count);
1341 +
1342 +        attrs = kcalloc(count + 1, sizeof(struct freq_attr *), GFP_KERNEL);
1343 +        if (!attrs)
1344 +            return ERR_PTR(-ENOMEM);
1345 +
1346 +        for (i = 0, j = 0; i < ARRAY_SIZE(amd_pstate_attr_visibility); i++) {
1347 +            if (!attr_visible[i])
1348 +                continue;
1349 +
1350 +            attrs[j++] = amd_pstate_attr_visibility[i].attr;
1351 +        }
1352 +
1353 +        return attrs;
1354 +    }
1477 1355
1478 1356 static void amd_pstate_driver_cleanup(void)
1479 1357 {
···
1613 1227     sched_clear_itmt_support();
1614 1228
1615 1229     cppc_state = AMD_PSTATE_DISABLE;
1230 +        kfree(current_pstate_driver->attr);
1231 +        current_pstate_driver->attr = NULL;
1616 1232     current_pstate_driver = NULL;
1617 1233 }
1618 1234
···
1639 1251
1640 1252 static int amd_pstate_register_driver(int mode)
1641 1253 {
1254 +        struct freq_attr **attr = NULL;
1642 1255     int ret;
1643 1256
1644 1257     ret = amd_pstate_set_driver(mode);
···
1647 1258         return ret;
1648 1259
1649 1260     cppc_state = mode;
1261 +
1262 +        /*
1263 +         * Note: It is important to compute the attrs _after_
1264 +         * re-initializing the cppc_state. Some attributes become
1265 +         * visible only when cppc_state is AMD_PSTATE_ACTIVE.
1266 +         */
1267 +        attr = get_freq_attrs();
1268 +        if (IS_ERR(attr)) {
1269 +            ret = (int) PTR_ERR(attr);
1270 +            pr_err("Couldn't compute freq_attrs for current mode %s [%d]\n",
1271 +                   amd_pstate_get_mode_string(cppc_state), ret);
1272 +            amd_pstate_driver_cleanup();
1273 +            return ret;
1274 +        }
1275 +
1276 +        current_pstate_driver->attr = attr;
1650 1277
1651 1278     /* at least one CPU supports CPB */
1652 1279     current_pstate_driver->boost_enabled = cpu_feature_enabled(X86_FEATURE_CPB);
···
1805 1400     return sysfs_emit(buf, "%s\n", str_enabled_disabled(amd_pstate_prefcore));
1806 1401 }
1807 1402
1808 -    cpufreq_freq_attr_ro(amd_pstate_max_freq);
1809 -    cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq);
1403 +    static ssize_t dynamic_epp_show(struct device *dev,
1404 +                                    struct device_attribute *attr, char *buf)
1405 +    {
1406 +        return sysfs_emit(buf, "%s\n", str_enabled_disabled(dynamic_epp));
1407 +    }
1810 1408
1811 -    cpufreq_freq_attr_ro(amd_pstate_highest_perf);
1812 -    cpufreq_freq_attr_ro(amd_pstate_prefcore_ranking);
1813 -    cpufreq_freq_attr_ro(amd_pstate_hw_prefcore);
1814 -    cpufreq_freq_attr_rw(energy_performance_preference);
1815 -    cpufreq_freq_attr_ro(energy_performance_available_preferences);
1409 +    static ssize_t dynamic_epp_store(struct device *a, struct device_attribute *b,
1410 +                                     const char *buf, size_t count)
1411 +    {
1412 +        bool enabled;
1413 +        int ret;
1414 +
1415 +        ret = kstrtobool(buf, &enabled);
1416 +        if (ret)
1417 +            return ret;
1418 +
1419 +        if (dynamic_epp == enabled)
1420 +            return -EINVAL;
1421 +
1422 +        /* reinitialize with desired dynamic EPP value */
1423 +        dynamic_epp = enabled;
1424 +        ret = amd_pstate_change_driver_mode(cppc_state);
1425 +        if (ret)
1426 +            dynamic_epp = false;
1427 +
1428 +        return ret ? ret : count;
1429 +    }
1430 +
1816 1431 static DEVICE_ATTR_RW(status);
1817 1432 static DEVICE_ATTR_RO(prefcore);
1818 -
1819 -    static struct freq_attr *amd_pstate_attr[] = {
1820 -        &amd_pstate_max_freq,
1821 -        &amd_pstate_lowest_nonlinear_freq,
1822 -        &amd_pstate_highest_perf,
1823 -        &amd_pstate_prefcore_ranking,
1824 -        &amd_pstate_hw_prefcore,
1825 -        NULL,
1826 -    };
1827 -
1828 -    static struct freq_attr *amd_pstate_epp_attr[] = {
1829 -        &amd_pstate_max_freq,
1830 -        &amd_pstate_lowest_nonlinear_freq,
1831 -        &amd_pstate_highest_perf,
1832 -        &amd_pstate_prefcore_ranking,
1833 -        &amd_pstate_hw_prefcore,
1834 -        &energy_performance_preference,
1835 -        &energy_performance_available_preferences,
1836 -        NULL,
1837 -    };
1433 +    static DEVICE_ATTR_RW(dynamic_epp);
1838 1434
1839 1435 static struct attribute *pstate_global_attributes[] = {
1840 1436     &dev_attr_status.attr,
1841 1437     &dev_attr_prefcore.attr,
1438 +        &dev_attr_dynamic_epp.attr,
1842 1439     NULL
1843 1440 };
···
1910 1503     policy->cpuinfo.min_freq = policy->min = perf_to_freq(perf,
1911 1504                                                           cpudata->nominal_freq,
1912 1505                                                           perf.lowest_perf);
1913 -        policy->cpuinfo.max_freq = policy->max = perf_to_freq(perf,
1914 -                                                              cpudata->nominal_freq,
1915 -                                                              perf.highest_perf);
1506 +        policy->cpuinfo.max_freq = policy->max = cpudata->max_freq;
1916 1507     policy->driver_data = cpudata;
1917 1508
1918 1509     ret = amd_pstate_cppc_enable(policy);
···
1930 1525     if (amd_pstate_acpi_pm_profile_server() ||
1931 1526         amd_pstate_acpi_pm_profile_undefined()) {
1932 1527         policy->policy = CPUFREQ_POLICY_PERFORMANCE;
1933 -            cpudata->epp_default = amd_pstate_get_epp(cpudata);
1528 +            cpudata->epp_default_ac = cpudata->epp_default_dc = amd_pstate_get_epp(cpudata);
1529 +            cpudata->current_profile = PLATFORM_PROFILE_PERFORMANCE;
1934 1530     } else {
1935 1531         policy->policy = CPUFREQ_POLICY_POWERSAVE;
1936 -            cpudata->epp_default = AMD_CPPC_EPP_BALANCE_PERFORMANCE;
1532 +            cpudata->epp_default_ac = AMD_CPPC_EPP_PERFORMANCE;
1533 +            cpudata->epp_default_dc = AMD_CPPC_EPP_BALANCE_PERFORMANCE;
1534 +            cpudata->current_profile = PLATFORM_PROFILE_BALANCED;
1937 1535     }
1938 1536
1939 -        ret = amd_pstate_set_epp(policy, cpudata->epp_default);
1537 +        if (dynamic_epp)
1538 +            ret = amd_pstate_set_dynamic_epp(policy);
1539 +        else
1540 +            ret = amd_pstate_set_epp(policy, amd_pstate_get_balanced_epp(policy));
1940 1541     if (ret)
1941 -            return ret;
1542 +            goto free_cpudata1;
1543 +
1544 +        ret = amd_pstate_init_floor_perf(policy);
1545 +        if (ret) {
1546 +            dev_err(dev, "Failed to initialize Floor Perf (%d)\n", ret);
1547 +            goto free_cpudata1;
1548 +        }
1942 1549
1943 1550     current_pstate_driver->adjust_perf = NULL;
1944 1551
···
1959 1542 free_cpudata1:
1960 1543     pr_warn("Failed to initialize CPU %d: %d\n", policy->cpu, ret);
1961 1544     kfree(cpudata);
1545 +        policy->driver_data = NULL;
1962 1546     return ret;
1963 1547 }
1964 1548
···
1972 1554
1973 1555     /* Reset CPPC_REQ MSR to the BIOS value */
1974 1556     amd_pstate_update_perf(policy, perf.bios_min_perf, 0U, 0U, 0U, false);
1557 +        amd_pstate_set_floor_perf(policy, cpudata->bios_floor_perf);
1975 1558
1559 +        if (cpudata->dynamic_epp)
1560 +            amd_pstate_clear_dynamic_epp(policy);
1976 1561     kfree(cpudata);
1977 1562     policy->driver_data = NULL;
1978 1563 }
···
2030 1609
2031 1610 static int amd_pstate_cpu_online(struct cpufreq_policy *policy)
2032 1611 {
2033 -        return amd_pstate_cppc_enable(policy);
1612 +        struct amd_cpudata *cpudata = policy->driver_data;
1613 +        union perf_cached perf = READ_ONCE(cpudata->perf);
1614 +        u8 cached_floor_perf;
1615 +        int ret;
1616 +
1617 +        ret = amd_pstate_cppc_enable(policy);
1618 +        if (ret)
1619 +            return ret;
1620 +
1621 +        cached_floor_perf = freq_to_perf(perf, cpudata->nominal_freq, cpudata->floor_freq);
1622 +        return amd_pstate_set_floor_perf(policy, cached_floor_perf);
2034 1623 }
2035 1624
2036 1625 static int amd_pstate_cpu_offline(struct cpufreq_policy *policy)
2037 1626 {
2038 1627     struct amd_cpudata *cpudata = policy->driver_data;
2039 1628     union perf_cached perf = READ_ONCE(cpudata->perf);
1629 +        int ret;
2040 1630
2041 1631     /*
2042 1632      * Reset CPPC_REQ MSR to the BIOS value, this will allow us to retain the BIOS specified
2043 1633      * min_perf value across kexec reboots. If this CPU is just onlined normally after this, the
2044 1634      * limits, epp and desired perf will get reset to the cached values in cpudata struct
2045 1635      */
2046 -        return amd_pstate_update_perf(policy, perf.bios_min_perf,
1636 +        ret = amd_pstate_update_perf(policy, perf.bios_min_perf,
2047 1637                      FIELD_GET(AMD_CPPC_DES_PERF_MASK, cpudata->cppc_req_cached),
2048 1638                      FIELD_GET(AMD_CPPC_MAX_PERF_MASK, cpudata->cppc_req_cached),
2049 1639                      FIELD_GET(AMD_CPPC_EPP_PERF_MASK, cpudata->cppc_req_cached),
2050 1640                      false);
1641 +        if (ret)
1642 +            return ret;
1643 +
1644 +        return amd_pstate_set_floor_perf(policy, cpudata->bios_floor_perf);
2051 1645 }
2052 1646
2053 1647 static int amd_pstate_suspend(struct cpufreq_policy *policy)
···
2084 1648     if (ret)
2085 1649         return ret;
2086 1650
1651 +        ret = amd_pstate_set_floor_perf(policy, cpudata->bios_floor_perf);
1652 +        if (ret)
1653 +            return ret;
1654 +
2087 1655     /* set this flag to avoid setting core offline*/
2088 1656     cpudata->suspended = true;
2089 1657
···
2099 1659     struct amd_cpudata *cpudata = policy->driver_data;
2100 1660     union perf_cached perf = READ_ONCE(cpudata->perf);
2101 1661     int cur_perf = freq_to_perf(perf, cpudata->nominal_freq, policy->cur);
1662 +        u8 cached_floor_perf;
1663 +        int ret;
2102 1664
2103 1665     /* Set CPPC_REQ to last sane value until the governor updates it */
2104 -        return amd_pstate_update_perf(policy, perf.min_limit_perf, cur_perf, perf.max_limit_perf,
2105 -                                      0U, false);
1666 +        ret = amd_pstate_update_perf(policy, perf.min_limit_perf, cur_perf, perf.max_limit_perf,
1667 +                                     0U, false);
1668 +        if (ret)
1669 +            return ret;
1670 +
1671 +        cached_floor_perf = freq_to_perf(perf, cpudata->nominal_freq, cpudata->floor_freq);
1672 +        return amd_pstate_set_floor_perf(policy, cached_floor_perf);
2106 1673 }
2107 1674
2108 1675 static int amd_pstate_epp_resume(struct cpufreq_policy *policy)
2109 1676 {
2110 1677     struct amd_cpudata *cpudata = policy->driver_data;
1678 +        union perf_cached perf = READ_ONCE(cpudata->perf);
1679 +        u8 cached_floor_perf;
2111 1680
2112 1681     if (cpudata->suspended) {
2113 1682         int ret;
···
2129 1680         cpudata->suspended = false;
2130 1681     }
2131 1682
2132 -        return 0;
1683 +        cached_floor_perf = freq_to_perf(perf, cpudata->nominal_freq, cpudata->floor_freq);
1684 +        return amd_pstate_set_floor_perf(policy, cached_floor_perf);
2133 1685 }
2134 1686
2135 1687 static struct cpufreq_driver amd_pstate_driver = {
···
2147 1697     .set_boost = amd_pstate_set_boost,
2148 1698     .update_limits = amd_pstate_update_limits,
2149 1699     .name = "amd-pstate",
2150 -        .attr = amd_pstate_attr,
2151 1700 };
2152 1701
2153 1702 static struct cpufreq_driver amd_pstate_epp_driver = {
···
2162 1713     .update_limits = amd_pstate_update_limits,
2163 1714     .set_boost = amd_pstate_set_boost,
2164 1715     .name = "amd-pstate-epp",
2165 -        .attr = amd_pstate_epp_attr,
2166 1716 };
2167 1717
2168 1718 /*
···
2307 1859         return ret;
2308 1860
2309 1861 global_attr_free:
2310 -        cpufreq_unregister_driver(current_pstate_driver);
1862 +        amd_pstate_unregister_driver(0);
2311 1863     return ret;
2312 1864 }
2313 1865 device_initcall(amd_pstate_init);
···
2334 1886     return 0;
2335 1887 }
2336 1888
1889 +    static int __init amd_dynamic_epp_param(char *str)
1890 +    {
1891 +        if (!strcmp(str, "disable"))
1892 +            dynamic_epp = false;
1893 +        if (!strcmp(str, "enable"))
1894 +            dynamic_epp = true;
1895 +
1896 +        return 0;
1897 +    }
1898 +
2337 1899 early_param("amd_pstate", amd_pstate_param);
2338 1900 early_param("amd_prefcore", amd_prefcore_param);
1901 +    early_param("amd_dynamic_epp", amd_dynamic_epp_param);
2339 1902
2340 1903 MODULE_AUTHOR("Huang Rui <ray.huang@amd.com>");
2341 1904 MODULE_DESCRIPTION("AMD Processor P-state Frequency Driver");
+36 -1
drivers/cpufreq/amd-pstate.h
··· #define _LINUX_AMD_PSTATE_H

 #include <linux/pm_qos.h>
+#include <linux/platform_profile.h>

 /*********************************************************************
 *                        AMD P-state INTERFACE                       *
···
 * @cpu: CPU number
 * @req: constraint request to apply
 * @cppc_req_cached: cached performance request hints
+* @cppc_req2_cached: cached value of MSR_AMD_CPPC_REQ2
 * @perf: cached performance-related data
 * @prefcore_ranking: the preferred core ranking, the higher value indicates a higher
 *		  priority.
+* @floor_perf_cnt: Cached value of the number of distinct floor
+*		  performance levels supported
+* @bios_floor_perf: Cached value of the boot-time floor performance level from
+*		  MSR_AMD_CPPC_REQ2
 * @min_limit_freq: Cached value of policy->min (in khz)
 * @max_limit_freq: Cached value of policy->max (in khz)
 * @nominal_freq: the frequency (in khz) that mapped to nominal_perf
+* @max_freq: the maximum frequency (in khz) possible under ideal conditions
 * @lowest_nonlinear_freq: the frequency (in khz) that mapped to lowest_nonlinear_perf
+* @floor_freq: Cached value of the user requested floor_freq
 * @cur: Difference of Aperf/Mperf/tsc count between last and current sample
 * @prev: Last Aperf/Mperf/tsc count value read from register
 * @freq: current cpu frequency value (in khz)
···
 *	  AMD P-State driver supports preferred core featue.
 * @epp_cached: Cached CPPC energy-performance preference value
 * @policy: Cpufreq policy value
+* @suspended: Whether the CPU core has been offlined
+* @epp_default_ac: Default EPP value for AC power source
+* @epp_default_dc: Default EPP value for DC power source
+* @dynamic_epp: Whether dynamic EPP is enabled
+* @power_nb: Notifier block for power events
 *
 * The amd_cpudata is key private data for each CPU thread in AMD P-State, and
 * represents all the attributes and goals that AMD P-State requests at runtime.
···

 	struct freq_qos_request req[2];
 	u64	cppc_req_cached;
+	u64	cppc_req2_cached;

 	union perf_cached perf;

 	u8	prefcore_ranking;
+	u8	floor_perf_cnt;
+	u8	bios_floor_perf;
 	u32	min_limit_freq;
 	u32	max_limit_freq;
 	u32	nominal_freq;
+	u32	max_freq;
 	u32	lowest_nonlinear_freq;
+	u32	floor_freq;

 	struct amd_aperf_mperf cur;
 	struct amd_aperf_mperf prev;
···
 	/* EPP feature related attributes*/
 	u32	policy;
 	bool	suspended;
-	u8	epp_default;
+	u8	epp_default_ac;
+	u8	epp_default_dc;
+	bool	dynamic_epp;
+	bool	raw_epp;
+	struct notifier_block power_nb;
+
+	/* platform profile */
+	enum platform_profile_option current_profile;
+	struct device *ppdev;
+	char *profile_name;
 };

 /*
···
 const char *amd_pstate_get_mode_string(enum amd_pstate_mode mode);
 int amd_pstate_get_status(void);
 int amd_pstate_update_status(const char *buf, size_t size);
+ssize_t store_energy_performance_preference(struct cpufreq_policy *policy,
+					    const char *buf, size_t count);
+ssize_t show_energy_performance_preference(struct cpufreq_policy *policy, char *buf);
+void amd_pstate_clear_dynamic_epp(struct cpufreq_policy *policy);
+
+struct freq_attr;
+
+struct freq_attr **amd_pstate_get_current_attrs(void);

 #endif /* _LINUX_AMD_PSTATE_H */
+2 -8
drivers/cpufreq/cppc_cpufreq.c
··· {
 	struct cppc_cpudata *cpu_data = policy->driver_data;
 	struct cppc_perf_caps *caps = &cpu_data->perf_caps;
-	int ret;

 	if (state)
-		policy->max = cppc_perf_to_khz(caps, caps->highest_perf);
+		policy->cpuinfo.max_freq = cppc_perf_to_khz(caps, caps->highest_perf);
 	else
-		policy->max = cppc_perf_to_khz(caps, caps->nominal_perf);
-	policy->cpuinfo.max_freq = policy->max;
-
-	ret = freq_qos_update_request(policy->max_freq_req, policy->max);
-	if (ret < 0)
-		return ret;
+		policy->cpuinfo.max_freq = cppc_perf_to_khz(caps, caps->nominal_perf);

 	return 0;
 }
+1
drivers/cpufreq/cpufreq-dt-platdev.c
··· { .compatible = "qcom,qcm2290", },
 	{ .compatible = "qcom,qcm6490", },
 	{ .compatible = "qcom,qcs404", },
+	{ .compatible = "qcom,qcs8300", },
 	{ .compatible = "qcom,qdu1000", },
 	{ .compatible = "qcom,sa8155p" },
 	{ .compatible = "qcom,sa8540p" },
+36 -49
drivers/cpufreq/cpufreq.c
··· 	policy->boost_enabled = enable;

 	ret = cpufreq_driver->set_boost(policy, enable);
-	if (ret)
+	if (ret) {
 		policy->boost_enabled = !policy->boost_enabled;
+		return ret;
+	}

-	return ret;
+	ret = freq_qos_update_request(&policy->boost_freq_req, policy->cpuinfo.max_freq);
+	if (ret < 0) {
+		policy->boost_enabled = !policy->boost_enabled;
+		cpufreq_driver->set_boost(policy, policy->boost_enabled);
+		return ret;
+	}
+
+	return 0;
 }

 static ssize_t store_local_boost(struct cpufreq_policy *policy,
···
 	if (ret)							\
 		return ret;						\
 									\
-	ret = freq_qos_update_request(policy->object##_freq_req, val);	\
+	ret = freq_qos_update_request(&policy->object##_freq_req, val);	\
 	return ret >= 0 ? count : ret;					\
 }
···
 	/* Cancel any pending policy->update work before freeing the policy. */
 	cancel_work_sync(&policy->update);

-	if (policy->max_freq_req) {
+	if (freq_qos_request_active(&policy->max_freq_req)) {
 		/*
 		 * Remove max_freq_req after sending CPUFREQ_REMOVE_POLICY
 		 * notification, since CPUFREQ_CREATE_POLICY notification was
···
 		 */
 		blocking_notifier_call_chain(&cpufreq_policy_notifier_list,
 					     CPUFREQ_REMOVE_POLICY, policy);
-		freq_qos_remove_request(policy->max_freq_req);
+		freq_qos_remove_request(&policy->max_freq_req);
 	}

-	freq_qos_remove_request(policy->min_freq_req);
-	kfree(policy->min_freq_req);
+	if (freq_qos_request_active(&policy->min_freq_req))
+		freq_qos_remove_request(&policy->min_freq_req);
+	if (freq_qos_request_active(&policy->boost_freq_req))
+		freq_qos_remove_request(&policy->boost_freq_req);

 	cpufreq_policy_put_kobj(policy);
 	free_cpumask_var(policy->real_cpus);
···
 			add_cpu_dev_symlink(policy, j, get_cpu_device(j));
 		}

-		policy->min_freq_req = kzalloc(2 * sizeof(*policy->min_freq_req),
-					       GFP_KERNEL);
-		if (!policy->min_freq_req) {
-			ret = -ENOMEM;
-			goto out_destroy_policy;
+		if (policy->boost_supported) {
+			ret = freq_qos_add_request(&policy->constraints,
+						   &policy->boost_freq_req,
+						   FREQ_QOS_MAX,
+						   policy->cpuinfo.max_freq);
+			if (ret < 0)
+				goto out_destroy_policy;
 		}

 		ret = freq_qos_add_request(&policy->constraints,
-					   policy->min_freq_req, FREQ_QOS_MIN,
+					   &policy->min_freq_req, FREQ_QOS_MIN,
 					   FREQ_QOS_MIN_DEFAULT_VALUE);
-		if (ret < 0) {
-			/*
-			 * So we don't call freq_qos_remove_request() for an
-			 * uninitialized request.
-			 */
-			kfree(policy->min_freq_req);
-			policy->min_freq_req = NULL;
+		if (ret < 0)
 			goto out_destroy_policy;
-		}
-
-		/*
-		 * This must be initialized right here to avoid calling
-		 * freq_qos_remove_request() on uninitialized request in case
-		 * of errors.
-		 */
-		policy->max_freq_req = policy->min_freq_req + 1;

 		ret = freq_qos_add_request(&policy->constraints,
-					   policy->max_freq_req, FREQ_QOS_MAX,
+					   &policy->max_freq_req, FREQ_QOS_MAX,
 					   FREQ_QOS_MAX_DEFAULT_VALUE);
-		if (ret < 0) {
-			policy->max_freq_req = NULL;
+		if (ret < 0)
 			goto out_destroy_policy;
-		}

 		blocking_notifier_call_chain(&cpufreq_policy_notifier_list,
 					     CPUFREQ_CREATE_POLICY, policy);
-	} else {
-		ret = freq_qos_update_request(policy->max_freq_req, policy->max);
-		if (ret < 0)
-			goto out_destroy_policy;
 	}

 	if (cpufreq_driver->get && has_target()) {
···

 /**
  * cpufreq_driver_adjust_perf - Adjust CPU performance level in one go.
- * @cpu: Target CPU.
+ * @policy: cpufreq policy object of the target CPU.
  * @min_perf: Minimum (required) performance level (units of @capacity).
  * @target_perf: Target (desired) performance level (units of @capacity).
  * @capacity: Capacity of the target CPU.
···
  * parallel with either ->target() or ->target_index() or ->fast_switch() for
  * the same CPU.
  */
-void cpufreq_driver_adjust_perf(unsigned int cpu,
+void cpufreq_driver_adjust_perf(struct cpufreq_policy *policy,
 				unsigned long min_perf,
 				unsigned long target_perf,
 				unsigned long capacity)
 {
-	cpufreq_driver->adjust_perf(cpu, min_perf, target_perf, capacity);
+	cpufreq_driver->adjust_perf(policy, min_perf, target_perf, capacity);
 }

 /**
···
 	target_freq = __resolve_freq(policy, target_freq, policy->min,
 				     policy->max, relation);

-	pr_debug("target for CPU %u: %u kHz, relation %u, requested %u kHz\n",
-		 policy->cpu, target_freq, relation, old_target_freq);
+	pr_debug("CPU %u: cur %u kHz -> target %u kHz (req %u kHz, rel %u)\n",
+		 policy->cpu, policy->cur, target_freq, old_target_freq, relation);

 	/*
 	 * This might look like a redundant call as we are checking it again
···
 		return -ENXIO;

 	ret = cpufreq_frequency_table_cpuinfo(policy);
-	if (ret) {
+	if (ret)
 		pr_err("%s: Policy frequency update failed\n", __func__);
-		return ret;
-	}

-	ret = freq_qos_update_request(policy->max_freq_req, policy->max);
-	if (ret < 0)
-		return ret;
-
-	return 0;
+	return ret;
 }
 EXPORT_SYMBOL_GPL(cpufreq_boost_set_sw);
+3 -2
drivers/cpufreq/cpufreq_governor.h
··· #include <linux/kernel_stat.h>
 #include <linux/module.h>
 #include <linux/mutex.h>
+#include <linux/sysfs.h>

 /* Ondemand Sampling types */
 enum {OD_NORMAL_SAMPLE, OD_SUB_SAMPLE};
···
 {									\
 	struct dbs_data *dbs_data = to_dbs_data(attr_set);		\
 	struct _gov##_dbs_tuners *tuners = dbs_data->tuners;		\
-	return sprintf(buf, "%u\n", tuners->file_name);			\
+	return sysfs_emit(buf, "%u\n", tuners->file_name);		\
 }

 #define gov_show_one_common(file_name)					\
···
 (struct gov_attr_set *attr_set, char *buf)				\
 {									\
 	struct dbs_data *dbs_data = to_dbs_data(attr_set);		\
-	return sprintf(buf, "%u\n", dbs_data->file_name);		\
+	return sysfs_emit(buf, "%u\n", dbs_data->file_name);		\
 }

 #define gov_attr_ro(_name)						\
+3 -3
drivers/cpufreq/intel_pstate.c
··· 	return target_pstate * cpu->pstate.scaling;
 }

-static void intel_cpufreq_adjust_perf(unsigned int cpunum,
+static void intel_cpufreq_adjust_perf(struct cpufreq_policy *policy,
 				      unsigned long min_perf,
 				      unsigned long target_perf,
 				      unsigned long capacity)
 {
-	struct cpudata *cpu = all_cpu_data[cpunum];
+	struct cpudata *cpu = all_cpu_data[policy->cpu];
 	u64 hwp_cap = READ_ONCE(cpu->hwp_cap_cached);
 	int old_pstate = cpu->pstate.current_pstate;
 	int cap_pstate, min_pstate, max_pstate, target_pstate;
···
 {
 	if (size == 3 && !strncmp(buf, "off", size)) {
 		if (!intel_pstate_driver)
-			return -EINVAL;
+			return 0;

 		if (hwp_active)
 			return -EBUSY;
+2 -2
drivers/cpufreq/tegra194-cpufreq.c
··· 	.refclk_delta_min = 16000,
 };

-static const struct tegra_cpufreq_soc tegra239_cpufreq_soc = {
+static const struct tegra_cpufreq_soc tegra238_cpufreq_soc = {
 	.ops = &tegra234_cpufreq_ops,
 	.actmon_cntr_base = 0x4000,
 	.maxcpus_per_cluster = 8,
···
 static const struct of_device_id tegra194_cpufreq_of_match[] = {
 	{ .compatible = "nvidia,tegra194-ccplex", .data = &tegra194_cpufreq_soc },
 	{ .compatible = "nvidia,tegra234-ccplex-cluster", .data = &tegra234_cpufreq_soc },
-	{ .compatible = "nvidia,tegra239-ccplex-cluster", .data = &tegra239_cpufreq_soc },
+	{ .compatible = "nvidia,tegra238-ccplex-cluster", .data = &tegra238_cpufreq_soc },
 	{ /* sentinel */ }
 };
 MODULE_DEVICE_TABLE(of, tegra194_cpufreq_of_match);
+1 -1
drivers/cpuidle/Kconfig
··· 	  before halting in the guest (more efficient than polling in the
 	  host via halt_poll_ns for some scenarios).

-endif
+endif # CPU_IDLE

 config ARCH_NEEDS_CPU_IDLE_COUPLED
 	def_bool n
+1 -1
drivers/cpuidle/Kconfig.mips
··· #
 config MIPS_CPS_CPUIDLE
 	bool "CPU Idle driver for MIPS CPS platforms"
-	depends on CPU_IDLE && MIPS_CPS
+	depends on MIPS_CPS
 	depends on SYS_SUPPORTS_MIPS_CPS
 	select ARCH_NEEDS_CPU_IDLE_COUPLED if MIPS_MT || CPU_MIPSR6
 	select GENERIC_CLOCKEVENTS_BROADCAST if SMP
-2
drivers/cpuidle/Kconfig.powerpc
··· #
 config PSERIES_CPUIDLE
 	bool "Cpuidle driver for pSeries platforms"
-	depends on CPU_IDLE
 	depends on PPC_PSERIES
 	default y
 	help
···
 config POWERNV_CPUIDLE
 	bool "Cpuidle driver for powernv platforms"
-	depends on CPU_IDLE
 	depends on PPC_POWERNV
 	default y
 	help
+5 -7
drivers/cpuidle/cpuidle.c
··· 	if (!dev)
 		return -EINVAL;

-	mutex_lock(&cpuidle_lock);
+	guard(mutex)(&cpuidle_lock);

 	if (dev->registered)
-		goto out_unlock;
+		return ret;

 	__cpuidle_device_init(dev);

 	ret = __cpuidle_register_device(dev);
 	if (ret)
-		goto out_unlock;
+		return ret;

 	ret = cpuidle_add_sysfs(dev);
 	if (ret)
···

 	cpuidle_install_idle_handler();

-out_unlock:
-	mutex_unlock(&cpuidle_lock);
-
 	return ret;

 out_sysfs:
 	cpuidle_remove_sysfs(dev);
 out_unregister:
 	__cpuidle_unregister_device(dev);
-	goto out_unlock;
+
+	return ret;
 }

 EXPORT_SYMBOL_GPL(cpuidle_register_device);
+5
drivers/cpuidle/governors/gov.h
··· * check the time till the closest expected timer event.
 */
 #define RESIDENCY_THRESHOLD_NS	(15 * NSEC_PER_USEC)
+/*
+ * If the closest timer is in this range, the governor idle state selection need
+ * not be adjusted after the scheduler tick has been stopped.
+ */
+#define SAFE_TIMER_RANGE_NS	(2 * TICK_NSEC)

 #endif /* __CPUIDLE_GOVERNOR_H */
+9 -6
drivers/cpuidle/governors/menu.c
··· 		predicted_ns = min((u64)timer_us * NSEC_PER_USEC, predicted_ns);
 		/*
 		 * If the tick is already stopped, the cost of possible short
-		 * idle duration misprediction is much higher, because the CPU
-		 * may be stuck in a shallow idle state for a long time as a
-		 * result of it.  In that case, say we might mispredict and use
-		 * the known time till the closest timer event for the idle
-		 * state selection.
+		 * idle duration misprediction is higher because the CPU may get
+		 * stuck in a shallow idle state then.  To avoid that, if
+		 * predicted_ns is small enough, say it might be mispredicted
+		 * and use the known time till the closest timer for idle state
+		 * selection unless that timer is going to trigger within
+		 * SAFE_TIMER_RANGE_NS in which case it can be regarded as a
+		 * sufficient safety net.
 		 */
-		if (tick_nohz_tick_stopped() && predicted_ns < TICK_NSEC)
+		if (tick_nohz_tick_stopped() && predicted_ns < TICK_NSEC &&
+		    data->next_timer_ns > SAFE_TIMER_RANGE_NS)
 			predicted_ns = data->next_timer_ns;
 	} else {
 		/*
+34 -47
drivers/cpuidle/governors/teo.c
··· 	 * better choice.
 	 */
 	if (2 * idx_intercept_sum > cpu_data->total - idx_hit_sum) {
-		int min_idx = idx0;
-
-		if (tick_nohz_tick_stopped()) {
-			/*
-			 * Look for the shallowest idle state below the current
-			 * candidate one whose target residency is at least
-			 * equal to the tick period length.
-			 */
-			while (min_idx < idx &&
-			       drv->states[min_idx].target_residency_ns < TICK_NSEC)
-				min_idx++;
-
-			/*
-			 * Avoid selecting a state with a lower index, but with
-			 * the same target residency as the current candidate
-			 * one.
-			 */
-			if (drv->states[min_idx].target_residency_ns ==
-			    drv->states[idx].target_residency_ns)
-				goto constraint;
-		}
-
-		/*
-		 * If the minimum state index is greater than or equal to the
-		 * index of the state with the maximum intercepts metric and
-		 * the corresponding state is enabled, there is no need to look
-		 * at the deeper states.
-		 */
-		if (min_idx >= intercept_max_idx &&
-		    !dev->states_usage[min_idx].disable) {
-			idx = min_idx;
-			goto constraint;
-		}
-
 		/*
 		 * Look for the deepest enabled idle state, at most as deep as
 		 * the one with the maximum intercepts metric, whose target
 		 * residency had not been greater than the idle duration in over
 		 * a half of the relevant cases in the past.
-		 *
-		 * Take the possible duration limitation present if the tick
-		 * has been stopped already into account.
 		 */
-		for (i = idx - 1, intercept_sum = 0; i >= min_idx; i--) {
+		for (i = idx - 1, intercept_sum = 0; i >= idx0; i--) {
 			intercept_sum += cpu_data->state_bins[i].intercepts;

 			if (dev->states_usage[i].disable)
···
 		}
 	}

-constraint:
 	/*
 	 * If there is a latency constraint, it may be necessary to select an
 	 * idle state shallower than the current candidate one.
···
 		idx = constraint_idx;

 	/*
-	 * If either the candidate state is state 0 or its target residency is
-	 * low enough, there is basically nothing more to do, but if the sleep
-	 * length is not updated, the subsequent wakeup will be counted as an
-	 * "intercept" which may be problematic in the cases when timer wakeups
-	 * are dominant.  Namely, it may effectively prevent deeper idle states
-	 * from being selected at one point even if no imminent timers are
-	 * scheduled.
+	 * If the tick has not been stopped and either the candidate state is
+	 * state 0 or its target residency is low enough, there is basically
+	 * nothing more to do, but if the sleep length is not updated, the
+	 * subsequent wakeup will be counted as an "intercept".  That may be
+	 * problematic in the cases when timer wakeups are dominant because it
+	 * may effectively prevent deeper idle states from being selected at one
+	 * point even if no imminent timers are scheduled.
 	 *
 	 * However, frequent timers in the RESIDENCY_THRESHOLD_NS range on one
 	 * CPU are unlikely (user space has a default 50 us slack value for
···
 	 * shallow idle states regardless of the wakeup type, so the sleep
 	 * length need not be known in that case.
 	 */
-	if ((!idx || drv->states[idx].target_residency_ns < RESIDENCY_THRESHOLD_NS) &&
+	if (!tick_nohz_tick_stopped() && (!idx ||
+	    drv->states[idx].target_residency_ns < RESIDENCY_THRESHOLD_NS) &&
 	    (2 * cpu_data->short_idles >= cpu_data->total ||
 	     latency_req < LATENCY_THRESHOLD_NS))
 		goto out_tick;

 	duration_ns = tick_nohz_get_sleep_length(&delta_tick);
 	cpu_data->sleep_length_ns = duration_ns;
+
+	/*
+	 * If the tick has been stopped and the closest timer is too far away,
+	 * update the selection to prevent the CPU from getting stuck in a
+	 * shallow idle state for too long.
+	 */
+	if (tick_nohz_tick_stopped() && duration_ns > SAFE_TIMER_RANGE_NS &&
+	    drv->states[idx].target_residency_ns < TICK_NSEC) {
+		/*
+		 * Look for the deepest enabled idle state with exit latency
+		 * within the PM QoS limit and with target residency within
+		 * duration_ns.
+		 */
+		for (i = constraint_idx; i > idx; i--) {
+			if (dev->states_usage[i].disable)
+				continue;
+
+			if (drv->states[i].target_residency_ns <= duration_ns) {
+				idx = i;
+				break;
+			}
+		}
+		return idx;
+	}

 	if (!idx)
 		goto out_tick;
+62 -46
drivers/devfreq/devfreq.c
··· 
 static struct class *devfreq_class;
 static struct dentry *devfreq_debugfs;
+static const struct attribute_group gov_attr_group;

 /*
  * devfreq core provides delayed work based load monitoring helper
···
 					 DEV_PM_QOS_MIN_FREQUENCY);
 	qos_max_freq = dev_pm_qos_read_value(devfreq->dev.parent,
 					 DEV_PM_QOS_MAX_FREQUENCY);
-	*min_freq = max(*min_freq, (unsigned long)HZ_PER_KHZ * qos_min_freq);
+	*min_freq = max(*min_freq, HZ_PER_KHZ * qos_min_freq);
 	if (qos_max_freq != PM_QOS_MAX_FREQUENCY_DEFAULT_VALUE)
-		*max_freq = min(*max_freq,
-				(unsigned long)HZ_PER_KHZ * qos_max_freq);
+		*max_freq = min(*max_freq, HZ_PER_KHZ * qos_max_freq);

 	/* Apply constraints from OPP interface */
 	*max_freq = clamp(*max_freq, devfreq->scaling_min_freq, devfreq->scaling_max_freq);
···
 	kfree(devfreq);
 }

-static void create_sysfs_files(struct devfreq *devfreq,
-			       const struct devfreq_governor *gov);
-static void remove_sysfs_files(struct devfreq *devfreq,
-			       const struct devfreq_governor *gov);
-
 /**
  * devfreq_add_device() - Add devfreq feature to the device
  * @dev:	the device to add devfreq feature.
···
 			__func__);
 		goto err_init;
 	}
-	create_sysfs_files(devfreq, devfreq->governor);
+
+	err = sysfs_update_group(&devfreq->dev.kobj, &gov_attr_group);
+	if (err)
+		goto err_init;

 	list_add(&devfreq->node, &devfreq_list);
···

 	devfreq_cooling_unregister(devfreq->cdev);

-	if (devfreq->governor) {
+	if (devfreq->governor)
 		devfreq->governor->event_handler(devfreq,
 						 DEVFREQ_GOV_STOP, NULL);
-		remove_sysfs_files(devfreq, devfreq->governor);
-	}
-
 	device_unregister(&devfreq->dev);

 	return 0;
···
 			__func__, df->governor->name, ret);
 		goto out;
 	}
-	remove_sysfs_files(df, df->governor);

 	/*
 	 * Start the new governor and create the specific sysfs files
···
 	 * Create the sysfs files for the new governor. But if failed to start
 	 * the new governor, restore the sysfs files of previous governor.
 	 */
-	create_sysfs_files(df, df->governor);
+	ret = sysfs_update_group(&df->dev.kobj, &gov_attr_group);

 out:
 	mutex_unlock(&devfreq_list_lock);
···
 	&dev_attr_trans_stat.attr,
 	NULL,
 };
-ATTRIBUTE_GROUPS(devfreq);

 static ssize_t polling_interval_show(struct device *dev,
 				     struct device_attribute *attr, char *buf)
 {
 	struct devfreq *df = to_devfreq(dev);

-	if (!df->profile)
+	/* Protect against race between sysfs attrs update and read/write */
+	guard(mutex)(&devfreq_list_lock);
+
+	if (!df->profile || !df->governor ||
+	    !IS_SUPPORTED_ATTR(df->governor->attrs, POLLING_INTERVAL))
 		return -EINVAL;

 	return sprintf(buf, "%d\n", df->profile->polling_ms);
···
 	unsigned int value;
 	int ret;

-	if (!df->governor)
+	guard(mutex)(&devfreq_list_lock);
+
+	if (!df->governor ||
+	    !IS_SUPPORTED_ATTR(df->governor->attrs, POLLING_INTERVAL))
 		return -EINVAL;

 	ret = sscanf(buf, "%u", &value);
···
 {
 	struct devfreq *df = to_devfreq(dev);

-	if (!df->profile)
+	guard(mutex)(&devfreq_list_lock);
+
+	if (!df->profile || !df->governor ||
+	    !IS_SUPPORTED_ATTR(df->governor->attrs, TIMER))
 		return -EINVAL;

 	return sprintf(buf, "%s\n", timer_name[df->profile->timer]);
···
 	int timer = -1;
 	int ret = 0, i;

-	if (!df->governor || !df->profile)
+	guard(mutex)(&devfreq_list_lock);
+
+	if (!df->governor || !df->profile ||
+	    !IS_SUPPORTED_ATTR(df->governor->attrs, TIMER))
 		return -EINVAL;

 	ret = sscanf(buf, "%16s", str_timer);
···
 }
 static DEVICE_ATTR_RW(timer);

-#define CREATE_SYSFS_FILE(df, name)					\
-{									\
-	int ret;							\
-	ret = sysfs_create_file(&df->dev.kobj, &dev_attr_##name.attr);	\
-	if (ret < 0) {							\
-		dev_warn(&df->dev,					\
-			"Unable to create attr(%s)\n", "##name");	\
-	}								\
-}									\
+static struct attribute *governor_attrs[] = {
+	&dev_attr_polling_interval.attr,
+	&dev_attr_timer.attr,
+	NULL
+};

-/* Create the specific sysfs files which depend on each governor. */
-static void create_sysfs_files(struct devfreq *devfreq,
-			       const struct devfreq_governor *gov)
+static umode_t gov_attr_visible(struct kobject *kobj,
+				struct attribute *attr, int n)
 {
-	if (IS_SUPPORTED_ATTR(gov->attrs, POLLING_INTERVAL))
-		CREATE_SYSFS_FILE(devfreq, polling_interval);
-	if (IS_SUPPORTED_ATTR(gov->attrs, TIMER))
-		CREATE_SYSFS_FILE(devfreq, timer);
+	struct device *dev = kobj_to_dev(kobj);
+	struct devfreq *df = to_devfreq(dev);
+
+	if (!df->governor || !df->governor->attrs)
+		return 0;
+
+	if (attr == &dev_attr_polling_interval.attr &&
+	    IS_SUPPORTED_ATTR(df->governor->attrs, POLLING_INTERVAL))
+		return attr->mode;
+
+	if (attr == &dev_attr_timer.attr &&
+	    IS_SUPPORTED_ATTR(df->governor->attrs, TIMER))
+		return attr->mode;
+
+	return 0;
 }

-/* Remove the specific sysfs files which depend on each governor. */
-static void remove_sysfs_files(struct devfreq *devfreq,
-			       const struct devfreq_governor *gov)
-{
-	if (IS_SUPPORTED_ATTR(gov->attrs, POLLING_INTERVAL))
-		sysfs_remove_file(&devfreq->dev.kobj,
-				  &dev_attr_polling_interval.attr);
-	if (IS_SUPPORTED_ATTR(gov->attrs, TIMER))
-		sysfs_remove_file(&devfreq->dev.kobj, &dev_attr_timer.attr);
-}
+static const struct attribute_group devfreq_group = {
+	.attrs = devfreq_attrs,
+};
+
+static const struct attribute_group gov_attr_group = {
+	.attrs = governor_attrs,
+	.is_visible = gov_attr_visible,
+};
+
+static const struct attribute_group *devfreq_groups[] = {
+	&devfreq_group,
+	&gov_attr_group,
+	NULL
+};

 /**
  * devfreq_summary_show() - Show the summary of the devfreq devices
+12 -5
drivers/devfreq/tegra30-devfreq.c
··· 	return 0;
 }

+/*
+ * The activity counter is incremented every 256 memory transactions. However,
+ * the number of clock cycles required for each transaction varies across
+ * different SoC generations. For instance, a single transaction takes 2 EMC
+ * clocks on Tegra30, 1 EMC clock on Tegra114, and 4 EMC clocks on Tegra124.
+ */
 static const struct tegra_devfreq_soc_data tegra124_soc = {
 	.configs = tegra124_device_configs,
-
-	/*
-	 * Activity counter is incremented every 256 memory transactions,
-	 * and each transaction takes 4 EMC clocks.
-	 */
 	.count_weight = 4 * 256,
+};
+
+static const struct tegra_devfreq_soc_data tegra114_soc = {
+	.configs = tegra124_device_configs,
+	.count_weight = 256,
 };

 static const struct tegra_devfreq_soc_data tegra30_soc = {
···

 static const struct of_device_id tegra_devfreq_of_match[] = {
 	{ .compatible = "nvidia,tegra30-actmon", .data = &tegra30_soc, },
+	{ .compatible = "nvidia,tegra114-actmon", .data = &tegra114_soc, },
 	{ .compatible = "nvidia,tegra124-actmon", .data = &tegra124_soc, },
 	{ },
 };
+42
drivers/idle/intel_idle.c
··· 		.enter = NULL }
 };

+static struct cpuidle_state ptl_cstates[] __initdata = {
+	{
+		.name = "C1",
+		.desc = "MWAIT 0x00",
+		.flags = MWAIT2flg(0x00),
+		.exit_latency = 1,
+		.target_residency = 1,
+		.enter = &intel_idle,
+		.enter_s2idle = intel_idle_s2idle, },
+	{
+		.name = "C1E",
+		.desc = "MWAIT 0x01",
+		.flags = MWAIT2flg(0x01) | CPUIDLE_FLAG_ALWAYS_ENABLE,
+		.exit_latency = 10,
+		.target_residency = 10,
+		.enter = &intel_idle,
+		.enter_s2idle = intel_idle_s2idle, },
+	{
+		.name = "C6S",
+		.desc = "MWAIT 0x21",
+		.flags = MWAIT2flg(0x21) | CPUIDLE_FLAG_TLB_FLUSHED,
+		.exit_latency = 300,
+		.target_residency = 300,
+		.enter = &intel_idle,
+		.enter_s2idle = intel_idle_s2idle, },
+	{
+		.name = "C10",
+		.desc = "MWAIT 0x60",
+		.flags = MWAIT2flg(0x60) | CPUIDLE_FLAG_TLB_FLUSHED,
+		.exit_latency = 370,
+		.target_residency = 2500,
+		.enter = &intel_idle,
+		.enter_s2idle = intel_idle_s2idle, },
+	{
+		.enter = NULL }
+};
+
 static struct cpuidle_state gmt_cstates[] __initdata = {
 	{
 		.name = "C1",
···
 	.state_table = mtl_l_cstates,
 };

+static const struct idle_cpu idle_cpu_ptl __initconst = {
+	.state_table = ptl_cstates,
+};
+
 static const struct idle_cpu idle_cpu_gmt __initconst = {
 	.state_table = gmt_cstates,
 };
···
 	X86_MATCH_VFM(INTEL_ALDERLAKE,		&idle_cpu_adl),
 	X86_MATCH_VFM(INTEL_ALDERLAKE_L,	&idle_cpu_adl_l),
 	X86_MATCH_VFM(INTEL_METEORLAKE_L,	&idle_cpu_mtl_l),
+	X86_MATCH_VFM(INTEL_PANTHERLAKE_L,	&idle_cpu_ptl),
 	X86_MATCH_VFM(INTEL_ATOM_GRACEMONT,	&idle_cpu_gmt),
 	X86_MATCH_VFM(INTEL_SAPPHIRERAPIDS_X,	&idle_cpu_spr),
 	X86_MATCH_VFM(INTEL_EMERALDRAPIDS_X,	&idle_cpu_spr),
+1 -1
drivers/opp/core.c
··· 				break;
 			}
 		}
-		break;
 	}
+	break;
 }

 if (IS_ERR(dest_opp)) {
+11 -9
drivers/opp/debugfs.c
··· {
 	struct dentry *pdentry = opp_table->dentry;
 	struct dentry *d;
-	unsigned long id;
-	char name[25];	/* 20 chars for 64 bit value + 5 (opp:\0) */
+	char name[36];	/* "opp:"(4) + u64(20) + "-"(1) + u32(10) + NULL(1) */

 	/*
 	 * Get directory name for OPP.
 	 *
-	 * - Normally rate is unique to each OPP, use it to get unique opp-name.
+	 * - Normally rate is unique to each OPP, use it to get unique opp-name,
+	 *   together with performance level if available.
 	 * - For some devices rate isn't available or there are multiple, use
 	 *   index instead for them.
 	 */
-	if (likely(opp_table->clk_count == 1 && opp->rates[0]))
-		id = opp->rates[0];
-	else
-		id = _get_opp_count(opp_table);
-
-	snprintf(name, sizeof(name), "opp:%lu", id);
+	if (likely(opp_table->clk_count == 1 && opp->rates[0])) {
+		if (opp->level == OPP_LEVEL_UNSET)
+			snprintf(name, sizeof(name), "opp:%lu", opp->rates[0]);
+		else
+			snprintf(name, sizeof(name), "opp:%lu-%u", opp->rates[0], opp->level);
+	} else {
+		snprintf(name, sizeof(name), "opp:%u", _get_opp_count(opp_table));
+	}

 	/* Create per-opp directory */
 	d = debugfs_create_dir(name, pdentry);
+41 -522
drivers/powercap/intel_rapl_common.c
··· 24 24 #include <linux/suspend.h> 25 25 #include <linux/sysfs.h> 26 26 #include <linux/types.h> 27 + #include <linux/units.h> 27 28 28 29 #include <asm/cpu_device_id.h> 29 30 #include <asm/intel-family.h> 30 - #include <asm/iosf_mbi.h> 31 31 #include <asm/msr.h> 32 32 33 - /* bitmasks for RAPL MSRs, used by primitive access functions */ 34 - #define ENERGY_STATUS_MASK 0xffffffff 33 + #define ENERGY_STATUS_MASK GENMASK(31, 0) 35 34 36 - #define POWER_LIMIT1_MASK 0x7FFF 37 - #define POWER_LIMIT1_ENABLE BIT(15) 38 - #define POWER_LIMIT1_CLAMP BIT(16) 35 + #define POWER_UNIT_OFFSET 0x00 36 + #define POWER_UNIT_MASK GENMASK(3, 0) 39 37 40 - #define POWER_LIMIT2_MASK (0x7FFFULL<<32) 41 - #define POWER_LIMIT2_ENABLE BIT_ULL(47) 42 - #define POWER_LIMIT2_CLAMP BIT_ULL(48) 43 - #define POWER_HIGH_LOCK BIT_ULL(63) 44 - #define POWER_LOW_LOCK BIT(31) 38 + #define ENERGY_UNIT_OFFSET 0x08 39 + #define ENERGY_UNIT_MASK GENMASK(12, 8) 45 40 46 - #define POWER_LIMIT4_MASK 0x1FFF 47 - 48 - #define TIME_WINDOW1_MASK (0x7FULL<<17) 49 - #define TIME_WINDOW2_MASK (0x7FULL<<49) 50 - 51 - #define POWER_UNIT_OFFSET 0 52 - #define POWER_UNIT_MASK 0x0F 53 - 54 - #define ENERGY_UNIT_OFFSET 0x08 55 - #define ENERGY_UNIT_MASK 0x1F00 56 - 57 - #define TIME_UNIT_OFFSET 0x10 58 - #define TIME_UNIT_MASK 0xF0000 59 - 60 - #define POWER_INFO_MAX_MASK (0x7fffULL<<32) 61 - #define POWER_INFO_MIN_MASK (0x7fffULL<<16) 62 - #define POWER_INFO_MAX_TIME_WIN_MASK (0x3fULL<<48) 63 - #define POWER_INFO_THERMAL_SPEC_MASK 0x7fff 64 - 65 - #define PERF_STATUS_THROTTLE_TIME_MASK 0xffffffff 66 - #define PP_POLICY_MASK 0x1F 67 - 68 - /* 69 - * SPR has different layout for Psys Domain PowerLimit registers. 70 - * There are 17 bits of PL1 and PL2 instead of 15 bits. 71 - * The Enable bits and TimeWindow bits are also shifted as a result. 
72 - */ 73 - #define PSYS_POWER_LIMIT1_MASK 0x1FFFF 74 - #define PSYS_POWER_LIMIT1_ENABLE BIT(17) 75 - 76 - #define PSYS_POWER_LIMIT2_MASK (0x1FFFFULL<<32) 77 - #define PSYS_POWER_LIMIT2_ENABLE BIT_ULL(49) 78 - 79 - #define PSYS_TIME_WINDOW1_MASK (0x7FULL<<19) 80 - #define PSYS_TIME_WINDOW2_MASK (0x7FULL<<51) 81 - 82 - /* bitmasks for RAPL TPMI, used by primitive access functions */ 83 - #define TPMI_POWER_LIMIT_MASK 0x3FFFF 84 - #define TPMI_POWER_LIMIT_ENABLE BIT_ULL(62) 85 - #define TPMI_TIME_WINDOW_MASK (0x7FULL<<18) 86 - #define TPMI_INFO_SPEC_MASK 0x3FFFF 87 - #define TPMI_INFO_MIN_MASK (0x3FFFFULL << 18) 88 - #define TPMI_INFO_MAX_MASK (0x3FFFFULL << 36) 89 - #define TPMI_INFO_MAX_TIME_WIN_MASK (0x7FULL << 54) 41 + #define TIME_UNIT_OFFSET 0x10 42 + #define TIME_UNIT_MASK GENMASK(19, 16) 90 43 91 44 /* Non HW constants */ 92 - #define RAPL_PRIMITIVE_DERIVED BIT(1) /* not from raw data */ 93 - #define RAPL_PRIMITIVE_DUMMY BIT(2) 45 + #define RAPL_PRIMITIVE_DUMMY BIT(2) 94 46 95 - #define TIME_WINDOW_MAX_MSEC 40000 96 - #define TIME_WINDOW_MIN_MSEC 250 97 - #define ENERGY_UNIT_SCALE 1000 /* scale from driver unit to powercap unit */ 98 - enum unit_type { 99 - ARBITRARY_UNIT, /* no translation */ 100 - POWER_UNIT, 101 - ENERGY_UNIT, 102 - TIME_UNIT, 103 - }; 47 + #define ENERGY_UNIT_SCALE 1000 /* scale from driver unit to powercap unit */ 104 48 105 49 /* per domain data, some are optional */ 106 - #define NR_RAW_PRIMITIVES (NR_RAPL_PRIMITIVES - 2) 50 + #define NR_RAW_PRIMITIVES (NR_RAPL_PRIMITIVES - 2) 107 51 108 - #define DOMAIN_STATE_INACTIVE BIT(0) 109 - #define DOMAIN_STATE_POWER_LIMIT_SET BIT(1) 52 + #define PACKAGE_PLN_INT_SAVED BIT(0) 53 + 54 + #define RAPL_EVENT_MASK GENMASK(7, 0) 110 55 111 56 static const char *pl_names[NR_POWER_LIMITS] = { 112 57 [POWER_LIMIT1] = "long_term", ··· 149 204 #define power_zone_to_rapl_domain(_zone) \ 150 205 container_of(_zone, struct rapl_domain, power_zone) 151 206 152 - struct rapl_defaults { 153 - u8 
floor_freq_reg_addr; 154 - int (*check_unit)(struct rapl_domain *rd); 155 - void (*set_floor_freq)(struct rapl_domain *rd, bool mode); 156 - u64 (*compute_time_window)(struct rapl_domain *rd, u64 val, 157 - bool to_raw); 158 - unsigned int dram_domain_energy_unit; 159 - unsigned int psys_domain_energy_unit; 160 - bool spr_psys_bits; 161 - }; 162 - static struct rapl_defaults *defaults_msr; 163 - static const struct rapl_defaults defaults_tpmi; 164 - 165 - static struct rapl_defaults *get_defaults(struct rapl_package *rp) 207 + static const struct rapl_defaults *get_defaults(struct rapl_package *rp) 166 208 { 167 209 return rp->priv->defaults; 168 210 } 169 - 170 - /* Sideband MBI registers */ 171 - #define IOSF_CPU_POWER_BUDGET_CTL_BYT (0x2) 172 - #define IOSF_CPU_POWER_BUDGET_CTL_TNG (0xdf) 173 - 174 - #define PACKAGE_PLN_INT_SAVED BIT(0) 175 - #define MAX_PRIM_NAME (32) 176 - 177 - /* per domain data. used to describe individual knobs such that access function 178 - * can be consolidated into one instead of many inline functions. 
179 - */ 180 - struct rapl_primitive_info { 181 - const char *name; 182 - u64 mask; 183 - int shift; 184 - enum rapl_domain_reg_id id; 185 - enum unit_type unit; 186 - u32 flag; 187 - }; 188 - 189 - #define PRIMITIVE_INFO_INIT(p, m, s, i, u, f) { \ 190 - .name = #p, \ 191 - .mask = m, \ 192 - .shift = s, \ 193 - .id = i, \ 194 - .unit = u, \ 195 - .flag = f \ 196 - } 197 211 198 212 static void rapl_init_domains(struct rapl_package *rp); 199 213 static int rapl_read_data_raw(struct rapl_domain *rd, ··· 245 341 static int set_domain_enable(struct powercap_zone *power_zone, bool mode) 246 342 { 247 343 struct rapl_domain *rd = power_zone_to_rapl_domain(power_zone); 248 - struct rapl_defaults *defaults = get_defaults(rd->rp); 344 + const struct rapl_defaults *defaults = get_defaults(rd->rp); 249 345 u64 val; 250 346 int ret; 251 347 ··· 534 630 u64 value, int to_raw) 535 631 { 536 632 u64 units = 1; 537 - struct rapl_defaults *defaults = get_defaults(rd->rp); 633 + const struct rapl_defaults *defaults = get_defaults(rd->rp); 538 634 u64 scale = 1; 539 635 540 636 switch (type) { ··· 560 656 return div64_u64(value, scale); 561 657 } 562 658 563 - /* RAPL primitives for MSR and MMIO I/F */ 564 - static struct rapl_primitive_info rpi_msr[NR_RAPL_PRIMITIVES] = { 565 - /* name, mask, shift, msr index, unit divisor */ 566 - [POWER_LIMIT1] = PRIMITIVE_INFO_INIT(POWER_LIMIT1, POWER_LIMIT1_MASK, 0, 567 - RAPL_DOMAIN_REG_LIMIT, POWER_UNIT, 0), 568 - [POWER_LIMIT2] = PRIMITIVE_INFO_INIT(POWER_LIMIT2, POWER_LIMIT2_MASK, 32, 569 - RAPL_DOMAIN_REG_LIMIT, POWER_UNIT, 0), 570 - [POWER_LIMIT4] = PRIMITIVE_INFO_INIT(POWER_LIMIT4, POWER_LIMIT4_MASK, 0, 571 - RAPL_DOMAIN_REG_PL4, POWER_UNIT, 0), 572 - [ENERGY_COUNTER] = PRIMITIVE_INFO_INIT(ENERGY_COUNTER, ENERGY_STATUS_MASK, 0, 573 - RAPL_DOMAIN_REG_STATUS, ENERGY_UNIT, 0), 574 - [FW_LOCK] = PRIMITIVE_INFO_INIT(FW_LOCK, POWER_LOW_LOCK, 31, 575 - RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0), 576 - [FW_HIGH_LOCK] = 
PRIMITIVE_INFO_INIT(FW_LOCK, POWER_HIGH_LOCK, 63, 577 - RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0), 578 - [PL1_ENABLE] = PRIMITIVE_INFO_INIT(PL1_ENABLE, POWER_LIMIT1_ENABLE, 15, 579 - RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0), 580 - [PL1_CLAMP] = PRIMITIVE_INFO_INIT(PL1_CLAMP, POWER_LIMIT1_CLAMP, 16, 581 - RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0), 582 - [PL2_ENABLE] = PRIMITIVE_INFO_INIT(PL2_ENABLE, POWER_LIMIT2_ENABLE, 47, 583 - RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0), 584 - [PL2_CLAMP] = PRIMITIVE_INFO_INIT(PL2_CLAMP, POWER_LIMIT2_CLAMP, 48, 585 - RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0), 586 - [TIME_WINDOW1] = PRIMITIVE_INFO_INIT(TIME_WINDOW1, TIME_WINDOW1_MASK, 17, 587 - RAPL_DOMAIN_REG_LIMIT, TIME_UNIT, 0), 588 - [TIME_WINDOW2] = PRIMITIVE_INFO_INIT(TIME_WINDOW2, TIME_WINDOW2_MASK, 49, 589 - RAPL_DOMAIN_REG_LIMIT, TIME_UNIT, 0), 590 - [THERMAL_SPEC_POWER] = PRIMITIVE_INFO_INIT(THERMAL_SPEC_POWER, POWER_INFO_THERMAL_SPEC_MASK, 591 - 0, RAPL_DOMAIN_REG_INFO, POWER_UNIT, 0), 592 - [MAX_POWER] = PRIMITIVE_INFO_INIT(MAX_POWER, POWER_INFO_MAX_MASK, 32, 593 - RAPL_DOMAIN_REG_INFO, POWER_UNIT, 0), 594 - [MIN_POWER] = PRIMITIVE_INFO_INIT(MIN_POWER, POWER_INFO_MIN_MASK, 16, 595 - RAPL_DOMAIN_REG_INFO, POWER_UNIT, 0), 596 - [MAX_TIME_WINDOW] = PRIMITIVE_INFO_INIT(MAX_TIME_WINDOW, POWER_INFO_MAX_TIME_WIN_MASK, 48, 597 - RAPL_DOMAIN_REG_INFO, TIME_UNIT, 0), 598 - [THROTTLED_TIME] = PRIMITIVE_INFO_INIT(THROTTLED_TIME, PERF_STATUS_THROTTLE_TIME_MASK, 0, 599 - RAPL_DOMAIN_REG_PERF, TIME_UNIT, 0), 600 - [PRIORITY_LEVEL] = PRIMITIVE_INFO_INIT(PRIORITY_LEVEL, PP_POLICY_MASK, 0, 601 - RAPL_DOMAIN_REG_POLICY, ARBITRARY_UNIT, 0), 602 - [PSYS_POWER_LIMIT1] = PRIMITIVE_INFO_INIT(PSYS_POWER_LIMIT1, PSYS_POWER_LIMIT1_MASK, 0, 603 - RAPL_DOMAIN_REG_LIMIT, POWER_UNIT, 0), 604 - [PSYS_POWER_LIMIT2] = PRIMITIVE_INFO_INIT(PSYS_POWER_LIMIT2, PSYS_POWER_LIMIT2_MASK, 32, 605 - RAPL_DOMAIN_REG_LIMIT, POWER_UNIT, 0), 606 - [PSYS_PL1_ENABLE] = PRIMITIVE_INFO_INIT(PSYS_PL1_ENABLE, 
PSYS_POWER_LIMIT1_ENABLE, 17, 607 - RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0), 608 - [PSYS_PL2_ENABLE] = PRIMITIVE_INFO_INIT(PSYS_PL2_ENABLE, PSYS_POWER_LIMIT2_ENABLE, 49, 609 - RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0), 610 - [PSYS_TIME_WINDOW1] = PRIMITIVE_INFO_INIT(PSYS_TIME_WINDOW1, PSYS_TIME_WINDOW1_MASK, 19, 611 - RAPL_DOMAIN_REG_LIMIT, TIME_UNIT, 0), 612 - [PSYS_TIME_WINDOW2] = PRIMITIVE_INFO_INIT(PSYS_TIME_WINDOW2, PSYS_TIME_WINDOW2_MASK, 51, 613 - RAPL_DOMAIN_REG_LIMIT, TIME_UNIT, 0), 614 - /* non-hardware */ 615 - [AVERAGE_POWER] = PRIMITIVE_INFO_INIT(AVERAGE_POWER, 0, 0, 0, POWER_UNIT, 616 - RAPL_PRIMITIVE_DERIVED), 617 - }; 618 - 619 - /* RAPL primitives for TPMI I/F */ 620 - static struct rapl_primitive_info rpi_tpmi[NR_RAPL_PRIMITIVES] = { 621 - /* name, mask, shift, msr index, unit divisor */ 622 - [POWER_LIMIT1] = PRIMITIVE_INFO_INIT(POWER_LIMIT1, TPMI_POWER_LIMIT_MASK, 0, 623 - RAPL_DOMAIN_REG_LIMIT, POWER_UNIT, 0), 624 - [POWER_LIMIT2] = PRIMITIVE_INFO_INIT(POWER_LIMIT2, TPMI_POWER_LIMIT_MASK, 0, 625 - RAPL_DOMAIN_REG_PL2, POWER_UNIT, 0), 626 - [POWER_LIMIT4] = PRIMITIVE_INFO_INIT(POWER_LIMIT4, TPMI_POWER_LIMIT_MASK, 0, 627 - RAPL_DOMAIN_REG_PL4, POWER_UNIT, 0), 628 - [ENERGY_COUNTER] = PRIMITIVE_INFO_INIT(ENERGY_COUNTER, ENERGY_STATUS_MASK, 0, 629 - RAPL_DOMAIN_REG_STATUS, ENERGY_UNIT, 0), 630 - [PL1_LOCK] = PRIMITIVE_INFO_INIT(PL1_LOCK, POWER_HIGH_LOCK, 63, 631 - RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0), 632 - [PL2_LOCK] = PRIMITIVE_INFO_INIT(PL2_LOCK, POWER_HIGH_LOCK, 63, 633 - RAPL_DOMAIN_REG_PL2, ARBITRARY_UNIT, 0), 634 - [PL4_LOCK] = PRIMITIVE_INFO_INIT(PL4_LOCK, POWER_HIGH_LOCK, 63, 635 - RAPL_DOMAIN_REG_PL4, ARBITRARY_UNIT, 0), 636 - [PL1_ENABLE] = PRIMITIVE_INFO_INIT(PL1_ENABLE, TPMI_POWER_LIMIT_ENABLE, 62, 637 - RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0), 638 - [PL2_ENABLE] = PRIMITIVE_INFO_INIT(PL2_ENABLE, TPMI_POWER_LIMIT_ENABLE, 62, 639 - RAPL_DOMAIN_REG_PL2, ARBITRARY_UNIT, 0), 640 - [PL4_ENABLE] = PRIMITIVE_INFO_INIT(PL4_ENABLE, 
TPMI_POWER_LIMIT_ENABLE, 62, 641 - RAPL_DOMAIN_REG_PL4, ARBITRARY_UNIT, 0), 642 - [TIME_WINDOW1] = PRIMITIVE_INFO_INIT(TIME_WINDOW1, TPMI_TIME_WINDOW_MASK, 18, 643 - RAPL_DOMAIN_REG_LIMIT, TIME_UNIT, 0), 644 - [TIME_WINDOW2] = PRIMITIVE_INFO_INIT(TIME_WINDOW2, TPMI_TIME_WINDOW_MASK, 18, 645 - RAPL_DOMAIN_REG_PL2, TIME_UNIT, 0), 646 - [THERMAL_SPEC_POWER] = PRIMITIVE_INFO_INIT(THERMAL_SPEC_POWER, TPMI_INFO_SPEC_MASK, 0, 647 - RAPL_DOMAIN_REG_INFO, POWER_UNIT, 0), 648 - [MAX_POWER] = PRIMITIVE_INFO_INIT(MAX_POWER, TPMI_INFO_MAX_MASK, 36, 649 - RAPL_DOMAIN_REG_INFO, POWER_UNIT, 0), 650 - [MIN_POWER] = PRIMITIVE_INFO_INIT(MIN_POWER, TPMI_INFO_MIN_MASK, 18, 651 - RAPL_DOMAIN_REG_INFO, POWER_UNIT, 0), 652 - [MAX_TIME_WINDOW] = PRIMITIVE_INFO_INIT(MAX_TIME_WINDOW, TPMI_INFO_MAX_TIME_WIN_MASK, 54, 653 - RAPL_DOMAIN_REG_INFO, TIME_UNIT, 0), 654 - [THROTTLED_TIME] = PRIMITIVE_INFO_INIT(THROTTLED_TIME, PERF_STATUS_THROTTLE_TIME_MASK, 0, 655 - RAPL_DOMAIN_REG_PERF, TIME_UNIT, 0), 656 - /* non-hardware */ 657 - [AVERAGE_POWER] = PRIMITIVE_INFO_INIT(AVERAGE_POWER, 0, 0, 0, 658 - POWER_UNIT, RAPL_PRIMITIVE_DERIVED), 659 - }; 660 - 661 659 static struct rapl_primitive_info *get_rpi(struct rapl_package *rp, int prim) 662 660 { 663 661 struct rapl_primitive_info *rpi = rp->priv->rpi; ··· 572 766 573 767 static int rapl_config(struct rapl_package *rp) 574 768 { 575 - switch (rp->priv->type) { 576 - /* MMIO I/F shares the same register layout as MSR registers */ 577 - case RAPL_IF_MMIO: 578 - case RAPL_IF_MSR: 579 - rp->priv->defaults = (void *)defaults_msr; 580 - rp->priv->rpi = (void *)rpi_msr; 581 - break; 582 - case RAPL_IF_TPMI: 583 - rp->priv->defaults = (void *)&defaults_tpmi; 584 - rp->priv->rpi = (void *)rpi_tpmi; 585 - break; 586 - default: 587 - return -EINVAL; 588 - } 589 - 590 769 /* defaults_msr can be NULL on unsupported platforms */ 591 770 if (!rp->priv->defaults || !rp->priv->rpi) 592 771 return -ENODEV; ··· 582 791 static enum rapl_primitives 583 792 
prim_fixups(struct rapl_domain *rd, enum rapl_primitives prim) 584 793 { 585 - struct rapl_defaults *defaults = get_defaults(rd->rp); 794 + const struct rapl_defaults *defaults = get_defaults(rd->rp); 586 795 587 796 if (!defaults->spr_psys_bits) 588 797 return prim; ··· 636 845 ra.reg = rd->regs[rpi->id]; 637 846 if (!ra.reg.val) 638 847 return -EINVAL; 639 - 640 - /* non-hardware data are collected by the polling thread */ 641 - if (rpi->flag & RAPL_PRIMITIVE_DERIVED) { 642 - *data = rd->rdd.primitives[prim]; 643 - return 0; 644 - } 645 848 646 849 ra.mask = rpi->mask; 647 850 ··· 721 936 * power unit : microWatts : Represented in milliWatts by default 722 937 * time unit : microseconds: Represented in seconds by default 723 938 */ 724 - static int rapl_check_unit_core(struct rapl_domain *rd) 939 + int rapl_default_check_unit(struct rapl_domain *rd) 725 940 { 726 941 struct reg_action ra; 727 942 u32 value; ··· 735 950 } 736 951 737 952 value = (ra.value & ENERGY_UNIT_MASK) >> ENERGY_UNIT_OFFSET; 738 - rd->energy_unit = ENERGY_UNIT_SCALE * 1000000 / (1 << value); 953 + rd->energy_unit = (ENERGY_UNIT_SCALE * MICROJOULE_PER_JOULE) >> value; 739 954 740 955 value = (ra.value & POWER_UNIT_MASK) >> POWER_UNIT_OFFSET; 741 - rd->power_unit = 1000000 / (1 << value); 956 + rd->power_unit = MICROWATT_PER_WATT >> value; 742 957 743 958 value = (ra.value & TIME_UNIT_MASK) >> TIME_UNIT_OFFSET; 744 - rd->time_unit = 1000000 / (1 << value); 959 + rd->time_unit = USEC_PER_SEC >> value; 745 960 746 961 pr_debug("Core CPU %s:%s energy=%dpJ, time=%dus, power=%duW\n", 747 962 rd->rp->name, rd->name, rd->energy_unit, rd->time_unit, rd->power_unit); 748 963 749 964 return 0; 750 965 } 751 - 752 - static int rapl_check_unit_atom(struct rapl_domain *rd) 753 - { 754 - struct reg_action ra; 755 - u32 value; 756 - 757 - ra.reg = rd->regs[RAPL_DOMAIN_REG_UNIT]; 758 - ra.mask = ~0; 759 - if (rd->rp->priv->read_raw(get_rid(rd->rp), &ra, false)) { 760 - pr_err("Failed to read power unit REG 
0x%llx on %s:%s, exit.\n", 761 - ra.reg.val, rd->rp->name, rd->name); 762 - return -ENODEV; 763 - } 764 - 765 - value = (ra.value & ENERGY_UNIT_MASK) >> ENERGY_UNIT_OFFSET; 766 - rd->energy_unit = ENERGY_UNIT_SCALE * 1 << value; 767 - 768 - value = (ra.value & POWER_UNIT_MASK) >> POWER_UNIT_OFFSET; 769 - rd->power_unit = (1 << value) * 1000; 770 - 771 - value = (ra.value & TIME_UNIT_MASK) >> TIME_UNIT_OFFSET; 772 - rd->time_unit = 1000000 / (1 << value); 773 - 774 - pr_debug("Atom %s:%s energy=%dpJ, time=%dus, power=%duW\n", 775 - rd->rp->name, rd->name, rd->energy_unit, rd->time_unit, rd->power_unit); 776 - 777 - return 0; 778 - } 966 + EXPORT_SYMBOL_NS_GPL(rapl_default_check_unit, "INTEL_RAPL"); 779 967 780 968 static void power_limit_irq_save_cpu(void *info) 781 969 { ··· 814 1056 wrmsr_safe(MSR_IA32_PACKAGE_THERM_INTERRUPT, l, h); 815 1057 } 816 1058 817 - static void set_floor_freq_default(struct rapl_domain *rd, bool mode) 1059 + void rapl_default_set_floor_freq(struct rapl_domain *rd, bool mode) 818 1060 { 819 1061 int i; 820 1062 ··· 828 1070 rapl_write_pl_data(rd, i, PL_CLAMP, mode); 829 1071 } 830 1072 } 1073 + EXPORT_SYMBOL_NS_GPL(rapl_default_set_floor_freq, "INTEL_RAPL"); 831 1074 832 - static void set_floor_freq_atom(struct rapl_domain *rd, bool enable) 833 - { 834 - static u32 power_ctrl_orig_val; 835 - struct rapl_defaults *defaults = get_defaults(rd->rp); 836 - u32 mdata; 837 - 838 - if (!defaults->floor_freq_reg_addr) { 839 - pr_err("Invalid floor frequency config register\n"); 840 - return; 841 - } 842 - 843 - if (!power_ctrl_orig_val) 844 - iosf_mbi_read(BT_MBI_UNIT_PMC, MBI_CR_READ, 845 - defaults->floor_freq_reg_addr, 846 - &power_ctrl_orig_val); 847 - mdata = power_ctrl_orig_val; 848 - if (enable) { 849 - mdata &= ~(0x7f << 8); 850 - mdata |= 1 << 8; 851 - } 852 - iosf_mbi_write(BT_MBI_UNIT_PMC, MBI_CR_WRITE, 853 - defaults->floor_freq_reg_addr, mdata); 854 - } 855 - 856 - static u64 rapl_compute_time_window_core(struct rapl_domain *rd, u64 
value, 857 - bool to_raw) 1075 + u64 rapl_default_compute_time_window(struct rapl_domain *rd, u64 value, bool to_raw) 858 1076 { 859 1077 u64 f, y; /* fraction and exp. used for time unit */ 860 1078 ··· 841 1107 if (!to_raw) { 842 1108 f = (value & 0x60) >> 5; 843 1109 y = value & 0x1f; 844 - value = (1 << y) * (4 + f) * rd->time_unit / 4; 1110 + value = (1ULL << y) * (4 + f) * rd->time_unit / 4; 845 1111 } else { 846 1112 if (value < rd->time_unit) 847 1113 return 0; ··· 856 1122 if (y > 0x1f) 857 1123 return 0x7f; 858 1124 859 - f = div64_u64(4 * (value - (1ULL << y)), 1ULL << y); 1125 + f = div64_u64(4 * (value - BIT_ULL(y)), BIT_ULL(y)); 860 1126 value = (y & 0x1f) | ((f & 0x3) << 5); 861 1127 } 862 1128 return value; 863 1129 } 864 - 865 - static u64 rapl_compute_time_window_atom(struct rapl_domain *rd, u64 value, 866 - bool to_raw) 867 - { 868 - /* 869 - * Atom time unit encoding is straight forward val * time_unit, 870 - * where time_unit is default to 1 sec. Never 0. 871 - */ 872 - if (!to_raw) 873 - return (value) ? 
value * rd->time_unit : rd->time_unit; 874 - 875 - value = div64_u64(value, rd->time_unit); 876 - 877 - return value; 878 - } 879 - 880 - /* TPMI Unit register has different layout */ 881 - #define TPMI_POWER_UNIT_OFFSET POWER_UNIT_OFFSET 882 - #define TPMI_POWER_UNIT_MASK POWER_UNIT_MASK 883 - #define TPMI_ENERGY_UNIT_OFFSET 0x06 884 - #define TPMI_ENERGY_UNIT_MASK 0x7C0 885 - #define TPMI_TIME_UNIT_OFFSET 0x0C 886 - #define TPMI_TIME_UNIT_MASK 0xF000 887 - 888 - static int rapl_check_unit_tpmi(struct rapl_domain *rd) 889 - { 890 - struct reg_action ra; 891 - u32 value; 892 - 893 - ra.reg = rd->regs[RAPL_DOMAIN_REG_UNIT]; 894 - ra.mask = ~0; 895 - if (rd->rp->priv->read_raw(get_rid(rd->rp), &ra, false)) { 896 - pr_err("Failed to read power unit REG 0x%llx on %s:%s, exit.\n", 897 - ra.reg.val, rd->rp->name, rd->name); 898 - return -ENODEV; 899 - } 900 - 901 - value = (ra.value & TPMI_ENERGY_UNIT_MASK) >> TPMI_ENERGY_UNIT_OFFSET; 902 - rd->energy_unit = ENERGY_UNIT_SCALE * 1000000 / (1 << value); 903 - 904 - value = (ra.value & TPMI_POWER_UNIT_MASK) >> TPMI_POWER_UNIT_OFFSET; 905 - rd->power_unit = 1000000 / (1 << value); 906 - 907 - value = (ra.value & TPMI_TIME_UNIT_MASK) >> TPMI_TIME_UNIT_OFFSET; 908 - rd->time_unit = 1000000 / (1 << value); 909 - 910 - pr_debug("Core CPU %s:%s energy=%dpJ, time=%dus, power=%duW\n", 911 - rd->rp->name, rd->name, rd->energy_unit, rd->time_unit, rd->power_unit); 912 - 913 - return 0; 914 - } 915 - 916 - static const struct rapl_defaults defaults_tpmi = { 917 - .check_unit = rapl_check_unit_tpmi, 918 - /* Reuse existing logic, ignore the PL_CLAMP failures and enable all Power Limits */ 919 - .set_floor_freq = set_floor_freq_default, 920 - .compute_time_window = rapl_compute_time_window_core, 921 - }; 922 - 923 - static const struct rapl_defaults rapl_defaults_core = { 924 - .floor_freq_reg_addr = 0, 925 - .check_unit = rapl_check_unit_core, 926 - .set_floor_freq = set_floor_freq_default, 927 - .compute_time_window = 
rapl_compute_time_window_core, 928 - }; 929 - 930 - static const struct rapl_defaults rapl_defaults_hsw_server = { 931 - .check_unit = rapl_check_unit_core, 932 - .set_floor_freq = set_floor_freq_default, 933 - .compute_time_window = rapl_compute_time_window_core, 934 - .dram_domain_energy_unit = 15300, 935 - }; 936 - 937 - static const struct rapl_defaults rapl_defaults_spr_server = { 938 - .check_unit = rapl_check_unit_core, 939 - .set_floor_freq = set_floor_freq_default, 940 - .compute_time_window = rapl_compute_time_window_core, 941 - .psys_domain_energy_unit = 1000000000, 942 - .spr_psys_bits = true, 943 - }; 944 - 945 - static const struct rapl_defaults rapl_defaults_byt = { 946 - .floor_freq_reg_addr = IOSF_CPU_POWER_BUDGET_CTL_BYT, 947 - .check_unit = rapl_check_unit_atom, 948 - .set_floor_freq = set_floor_freq_atom, 949 - .compute_time_window = rapl_compute_time_window_atom, 950 - }; 951 - 952 - static const struct rapl_defaults rapl_defaults_tng = { 953 - .floor_freq_reg_addr = IOSF_CPU_POWER_BUDGET_CTL_TNG, 954 - .check_unit = rapl_check_unit_atom, 955 - .set_floor_freq = set_floor_freq_atom, 956 - .compute_time_window = rapl_compute_time_window_atom, 957 - }; 958 - 959 - static const struct rapl_defaults rapl_defaults_ann = { 960 - .floor_freq_reg_addr = 0, 961 - .check_unit = rapl_check_unit_atom, 962 - .set_floor_freq = NULL, 963 - .compute_time_window = rapl_compute_time_window_atom, 964 - }; 965 - 966 - static const struct rapl_defaults rapl_defaults_cht = { 967 - .floor_freq_reg_addr = 0, 968 - .check_unit = rapl_check_unit_atom, 969 - .set_floor_freq = NULL, 970 - .compute_time_window = rapl_compute_time_window_atom, 971 - }; 972 - 973 - static const struct rapl_defaults rapl_defaults_amd = { 974 - .check_unit = rapl_check_unit_core, 975 - }; 976 - 977 - static const struct x86_cpu_id rapl_ids[] __initconst = { 978 - X86_MATCH_VFM(INTEL_SANDYBRIDGE, &rapl_defaults_core), 979 - X86_MATCH_VFM(INTEL_SANDYBRIDGE_X, &rapl_defaults_core), 980 - 981 - 
X86_MATCH_VFM(INTEL_IVYBRIDGE, &rapl_defaults_core), 982 - X86_MATCH_VFM(INTEL_IVYBRIDGE_X, &rapl_defaults_core), 983 - 984 - X86_MATCH_VFM(INTEL_HASWELL, &rapl_defaults_core), 985 - X86_MATCH_VFM(INTEL_HASWELL_L, &rapl_defaults_core), 986 - X86_MATCH_VFM(INTEL_HASWELL_G, &rapl_defaults_core), 987 - X86_MATCH_VFM(INTEL_HASWELL_X, &rapl_defaults_hsw_server), 988 - 989 - X86_MATCH_VFM(INTEL_BROADWELL, &rapl_defaults_core), 990 - X86_MATCH_VFM(INTEL_BROADWELL_G, &rapl_defaults_core), 991 - X86_MATCH_VFM(INTEL_BROADWELL_D, &rapl_defaults_core), 992 - X86_MATCH_VFM(INTEL_BROADWELL_X, &rapl_defaults_hsw_server), 993 - 994 - X86_MATCH_VFM(INTEL_SKYLAKE, &rapl_defaults_core), 995 - X86_MATCH_VFM(INTEL_SKYLAKE_L, &rapl_defaults_core), 996 - X86_MATCH_VFM(INTEL_SKYLAKE_X, &rapl_defaults_hsw_server), 997 - X86_MATCH_VFM(INTEL_KABYLAKE_L, &rapl_defaults_core), 998 - X86_MATCH_VFM(INTEL_KABYLAKE, &rapl_defaults_core), 999 - X86_MATCH_VFM(INTEL_CANNONLAKE_L, &rapl_defaults_core), 1000 - X86_MATCH_VFM(INTEL_ICELAKE_L, &rapl_defaults_core), 1001 - X86_MATCH_VFM(INTEL_ICELAKE, &rapl_defaults_core), 1002 - X86_MATCH_VFM(INTEL_ICELAKE_NNPI, &rapl_defaults_core), 1003 - X86_MATCH_VFM(INTEL_ICELAKE_X, &rapl_defaults_hsw_server), 1004 - X86_MATCH_VFM(INTEL_ICELAKE_D, &rapl_defaults_hsw_server), 1005 - X86_MATCH_VFM(INTEL_COMETLAKE_L, &rapl_defaults_core), 1006 - X86_MATCH_VFM(INTEL_COMETLAKE, &rapl_defaults_core), 1007 - X86_MATCH_VFM(INTEL_TIGERLAKE_L, &rapl_defaults_core), 1008 - X86_MATCH_VFM(INTEL_TIGERLAKE, &rapl_defaults_core), 1009 - X86_MATCH_VFM(INTEL_ROCKETLAKE, &rapl_defaults_core), 1010 - X86_MATCH_VFM(INTEL_ALDERLAKE, &rapl_defaults_core), 1011 - X86_MATCH_VFM(INTEL_ALDERLAKE_L, &rapl_defaults_core), 1012 - X86_MATCH_VFM(INTEL_ATOM_GRACEMONT, &rapl_defaults_core), 1013 - X86_MATCH_VFM(INTEL_RAPTORLAKE, &rapl_defaults_core), 1014 - X86_MATCH_VFM(INTEL_RAPTORLAKE_P, &rapl_defaults_core), 1015 - X86_MATCH_VFM(INTEL_RAPTORLAKE_S, &rapl_defaults_core), 1016 - 
X86_MATCH_VFM(INTEL_BARTLETTLAKE, &rapl_defaults_core), 1017 - X86_MATCH_VFM(INTEL_METEORLAKE, &rapl_defaults_core), 1018 - X86_MATCH_VFM(INTEL_METEORLAKE_L, &rapl_defaults_core), 1019 - X86_MATCH_VFM(INTEL_SAPPHIRERAPIDS_X, &rapl_defaults_spr_server), 1020 - X86_MATCH_VFM(INTEL_EMERALDRAPIDS_X, &rapl_defaults_spr_server), 1021 - X86_MATCH_VFM(INTEL_LUNARLAKE_M, &rapl_defaults_core), 1022 - X86_MATCH_VFM(INTEL_PANTHERLAKE_L, &rapl_defaults_core), 1023 - X86_MATCH_VFM(INTEL_WILDCATLAKE_L, &rapl_defaults_core), 1024 - X86_MATCH_VFM(INTEL_NOVALAKE, &rapl_defaults_core), 1025 - X86_MATCH_VFM(INTEL_NOVALAKE_L, &rapl_defaults_core), 1026 - X86_MATCH_VFM(INTEL_ARROWLAKE_H, &rapl_defaults_core), 1027 - X86_MATCH_VFM(INTEL_ARROWLAKE, &rapl_defaults_core), 1028 - X86_MATCH_VFM(INTEL_ARROWLAKE_U, &rapl_defaults_core), 1029 - X86_MATCH_VFM(INTEL_LAKEFIELD, &rapl_defaults_core), 1030 - 1031 - X86_MATCH_VFM(INTEL_ATOM_SILVERMONT, &rapl_defaults_byt), 1032 - X86_MATCH_VFM(INTEL_ATOM_AIRMONT, &rapl_defaults_cht), 1033 - X86_MATCH_VFM(INTEL_ATOM_SILVERMONT_MID, &rapl_defaults_tng), 1034 - X86_MATCH_VFM(INTEL_ATOM_SILVERMONT_MID2,&rapl_defaults_ann), 1035 - X86_MATCH_VFM(INTEL_ATOM_GOLDMONT, &rapl_defaults_core), 1036 - X86_MATCH_VFM(INTEL_ATOM_GOLDMONT_PLUS, &rapl_defaults_core), 1037 - X86_MATCH_VFM(INTEL_ATOM_GOLDMONT_D, &rapl_defaults_core), 1038 - X86_MATCH_VFM(INTEL_ATOM_TREMONT, &rapl_defaults_core), 1039 - X86_MATCH_VFM(INTEL_ATOM_TREMONT_D, &rapl_defaults_core), 1040 - X86_MATCH_VFM(INTEL_ATOM_TREMONT_L, &rapl_defaults_core), 1041 - 1042 - X86_MATCH_VFM(INTEL_XEON_PHI_KNL, &rapl_defaults_hsw_server), 1043 - X86_MATCH_VFM(INTEL_XEON_PHI_KNM, &rapl_defaults_hsw_server), 1044 - 1045 - X86_MATCH_VENDOR_FAM(AMD, 0x17, &rapl_defaults_amd), 1046 - X86_MATCH_VENDOR_FAM(AMD, 0x19, &rapl_defaults_amd), 1047 - X86_MATCH_VENDOR_FAM(AMD, 0x1A, &rapl_defaults_amd), 1048 - X86_MATCH_VENDOR_FAM(HYGON, 0x18, &rapl_defaults_amd), 1049 - {} 1050 - }; 1051 - MODULE_DEVICE_TABLE(x86cpu, 
rapl_ids); 1130 + EXPORT_SYMBOL_NS_GPL(rapl_default_compute_time_window, "INTEL_RAPL"); 1052 1131 1053 1132 /* Read once for all raw primitive data for domains */ 1054 1133 static void rapl_update_domain_data(struct rapl_package *rp) ··· 990 1443 */ 991 1444 static int rapl_get_domain_unit(struct rapl_domain *rd) 992 1445 { 993 - struct rapl_defaults *defaults = get_defaults(rd->rp); 1446 + const struct rapl_defaults *defaults = get_defaults(rd->rp); 994 1447 int ret; 995 1448 996 1449 if (!rd->regs[RAPL_DOMAIN_REG_UNIT].val) { ··· 1324 1777 PERF_RAPL_PSYS, /* psys */ 1325 1778 PERF_RAPL_MAX 1326 1779 }; 1327 - #define RAPL_EVENT_MASK GENMASK(7, 0) 1328 1780 1329 1781 static const int event_to_domain[PERF_RAPL_MAX] = { 1330 1782 [PERF_RAPL_PP0] = RAPL_DOMAIN_PP0, ··· 1629 2083 1630 2084 return rapl_pmu_update(rp); 1631 2085 } 1632 - EXPORT_SYMBOL_GPL(rapl_package_add_pmu_locked); 2086 + EXPORT_SYMBOL_NS_GPL(rapl_package_add_pmu_locked, "INTEL_RAPL"); 1633 2087 1634 2088 int rapl_package_add_pmu(struct rapl_package *rp) 1635 2089 { ··· 1637 2091 1638 2092 return rapl_package_add_pmu_locked(rp); 1639 2093 } 1640 - EXPORT_SYMBOL_GPL(rapl_package_add_pmu); 2094 + EXPORT_SYMBOL_NS_GPL(rapl_package_add_pmu, "INTEL_RAPL"); 1641 2095 1642 2096 void rapl_package_remove_pmu_locked(struct rapl_package *rp) 1643 2097 { ··· 1655 2109 perf_pmu_unregister(&rapl_pmu.pmu); 1656 2110 memset(&rapl_pmu, 0, sizeof(struct rapl_pmu)); 1657 2111 } 1658 - EXPORT_SYMBOL_GPL(rapl_package_remove_pmu_locked); 2112 + EXPORT_SYMBOL_NS_GPL(rapl_package_remove_pmu_locked, "INTEL_RAPL"); 1659 2113 1660 2114 void rapl_package_remove_pmu(struct rapl_package *rp) 1661 2115 { ··· 1663 2117 1664 2118 rapl_package_remove_pmu_locked(rp); 1665 2119 } 1666 - EXPORT_SYMBOL_GPL(rapl_package_remove_pmu); 2120 + EXPORT_SYMBOL_NS_GPL(rapl_package_remove_pmu, "INTEL_RAPL"); 1667 2121 #endif 1668 2122 1669 2123 /* called from CPU hotplug notifier, hotplug lock held */ ··· 1696 2150 list_del(&rp->plist); 1697 2151 
kfree(rp); 1698 2152 } 1699 - EXPORT_SYMBOL_GPL(rapl_remove_package_cpuslocked); 2153 + EXPORT_SYMBOL_NS_GPL(rapl_remove_package_cpuslocked, "INTEL_RAPL"); 1700 2154 1701 2155 void rapl_remove_package(struct rapl_package *rp) 1702 2156 { 1703 2157 guard(cpus_read_lock)(); 1704 2158 rapl_remove_package_cpuslocked(rp); 1705 2159 } 1706 - EXPORT_SYMBOL_GPL(rapl_remove_package); 2160 + EXPORT_SYMBOL_NS_GPL(rapl_remove_package, "INTEL_RAPL"); 1707 2161 1708 2162 /* 1709 2163 * RAPL Package energy counter scope: ··· 1746 2200 1747 2201 return NULL; 1748 2202 } 1749 - EXPORT_SYMBOL_GPL(rapl_find_package_domain_cpuslocked); 2203 + EXPORT_SYMBOL_NS_GPL(rapl_find_package_domain_cpuslocked, "INTEL_RAPL"); 1750 2204 1751 2205 struct rapl_package *rapl_find_package_domain(int id, struct rapl_if_priv *priv, bool id_is_cpu) 1752 2206 { 1753 2207 guard(cpus_read_lock)(); 1754 2208 return rapl_find_package_domain_cpuslocked(id, priv, id_is_cpu); 1755 2209 } 1756 - EXPORT_SYMBOL_GPL(rapl_find_package_domain); 2210 + EXPORT_SYMBOL_NS_GPL(rapl_find_package_domain, "INTEL_RAPL"); 1757 2211 1758 2212 /* called from CPU hotplug notifier, hotplug lock held */ 1759 2213 struct rapl_package *rapl_add_package_cpuslocked(int id, struct rapl_if_priv *priv, bool id_is_cpu) ··· 1807 2261 kfree(rp); 1808 2262 return ERR_PTR(ret); 1809 2263 } 1810 - EXPORT_SYMBOL_GPL(rapl_add_package_cpuslocked); 2264 + EXPORT_SYMBOL_NS_GPL(rapl_add_package_cpuslocked, "INTEL_RAPL"); 1811 2265 1812 2266 struct rapl_package *rapl_add_package(int id, struct rapl_if_priv *priv, bool id_is_cpu) 1813 2267 { 1814 2268 guard(cpus_read_lock)(); 1815 2269 return rapl_add_package_cpuslocked(id, priv, id_is_cpu); 1816 2270 } 1817 - EXPORT_SYMBOL_GPL(rapl_add_package); 2271 + EXPORT_SYMBOL_NS_GPL(rapl_add_package, "INTEL_RAPL"); 1818 2272 1819 2273 static void power_limit_state_save(void) 1820 2274 { ··· 1874 2328 .notifier_call = rapl_pm_callback, 1875 2329 }; 1876 2330 1877 - static struct platform_device *rapl_msr_platdev; 
1878 - 1879 2331 static int __init rapl_init(void) 1880 2332 { 1881 - const struct x86_cpu_id *id; 1882 - int ret; 1883 - 1884 - id = x86_match_cpu(rapl_ids); 1885 - if (id) { 1886 - defaults_msr = (struct rapl_defaults *)id->driver_data; 1887 - 1888 - rapl_msr_platdev = platform_device_alloc("intel_rapl_msr", 0); 1889 - if (!rapl_msr_platdev) 1890 - return -ENOMEM; 1891 - 1892 - ret = platform_device_add(rapl_msr_platdev); 1893 - if (ret) { 1894 - platform_device_put(rapl_msr_platdev); 1895 - return ret; 1896 - } 1897 - } 1898 - 1899 - ret = register_pm_notifier(&rapl_pm_notifier); 1900 - if (ret && rapl_msr_platdev) { 1901 - platform_device_del(rapl_msr_platdev); 1902 - platform_device_put(rapl_msr_platdev); 1903 - } 1904 - 1905 - return ret; 2333 + return register_pm_notifier(&rapl_pm_notifier); 1906 2334 } 1907 2335 1908 2336 static void __exit rapl_exit(void) 1909 2337 { 1910 - platform_device_unregister(rapl_msr_platdev); 1911 2338 unregister_pm_notifier(&rapl_pm_notifier); 1912 2339 } 1913 2340
+365 -28
drivers/powercap/intel_rapl_msr.c
··· 21 21 #include <linux/intel_rapl.h> 22 22 #include <linux/processor.h> 23 23 #include <linux/platform_device.h> 24 + #include <linux/units.h> 25 + #include <linux/bits.h> 24 26 25 27 #include <asm/cpu_device_id.h> 26 28 #include <asm/intel-family.h> 29 + #include <asm/iosf_mbi.h> 27 30 #include <asm/msr.h> 28 31 29 32 /* Local defines */ 30 33 #define MSR_PLATFORM_POWER_LIMIT 0x0000065C 31 34 #define MSR_VR_CURRENT_CONFIG 0x00000601 35 + 36 + #define ENERGY_UNIT_SCALE 1000 /* scale from driver unit to powercap unit */ 37 + 38 + #define POWER_UNIT_OFFSET 0x00 39 + #define POWER_UNIT_MASK GENMASK(3, 0) 40 + 41 + #define ENERGY_UNIT_OFFSET 0x08 42 + #define ENERGY_UNIT_MASK GENMASK(12, 8) 43 + 44 + #define TIME_UNIT_OFFSET 0x10 45 + #define TIME_UNIT_MASK GENMASK(19, 16) 46 + 47 + /* bitmasks for RAPL MSRs, used by primitive access functions */ 48 + #define ENERGY_STATUS_MASK GENMASK(31, 0) 49 + 50 + #define POWER_LIMIT1_MASK GENMASK(14, 0) 51 + #define POWER_LIMIT1_ENABLE BIT(15) 52 + #define POWER_LIMIT1_CLAMP BIT(16) 53 + 54 + #define POWER_LIMIT2_MASK GENMASK_ULL(46, 32) 55 + #define POWER_LIMIT2_ENABLE BIT_ULL(47) 56 + #define POWER_LIMIT2_CLAMP BIT_ULL(48) 57 + #define POWER_HIGH_LOCK BIT_ULL(63) 58 + #define POWER_LOW_LOCK BIT(31) 59 + 60 + #define POWER_LIMIT4_MASK GENMASK(12, 0) 61 + 62 + #define TIME_WINDOW1_MASK GENMASK_ULL(23, 17) 63 + #define TIME_WINDOW2_MASK GENMASK_ULL(55, 49) 64 + 65 + #define POWER_INFO_MAX_MASK GENMASK_ULL(46, 32) 66 + #define POWER_INFO_MIN_MASK GENMASK_ULL(30, 16) 67 + #define POWER_INFO_MAX_TIME_WIN_MASK GENMASK_ULL(53, 48) 68 + #define POWER_INFO_THERMAL_SPEC_MASK GENMASK(14, 0) 69 + 70 + #define PERF_STATUS_THROTTLE_TIME_MASK GENMASK(31, 0) 71 + #define PP_POLICY_MASK GENMASK(4, 0) 72 + 73 + /* 74 + * SPR has different layout for Psys Domain PowerLimit registers. 75 + * There are 17 bits of PL1 and PL2 instead of 15 bits. 76 + * The Enable bits and TimeWindow bits are also shifted as a result. 
77 + */ 78 + #define PSYS_POWER_LIMIT1_MASK GENMASK_ULL(16, 0) 79 + #define PSYS_POWER_LIMIT1_ENABLE BIT(17) 80 + 81 + #define PSYS_POWER_LIMIT2_MASK GENMASK_ULL(48, 32) 82 + #define PSYS_POWER_LIMIT2_ENABLE BIT_ULL(49) 83 + 84 + #define PSYS_TIME_WINDOW1_MASK GENMASK_ULL(25, 19) 85 + #define PSYS_TIME_WINDOW2_MASK GENMASK_ULL(57, 51) 86 + 87 + /* Sideband MBI registers */ 88 + #define IOSF_CPU_POWER_BUDGET_CTL_BYT 0x02 89 + #define IOSF_CPU_POWER_BUDGET_CTL_TNG 0xDF 32 90 33 91 /* private data for RAPL MSR Interface */ 34 92 static struct rapl_if_priv *rapl_msr_priv; ··· 216 158 return ra->err; 217 159 } 218 160 219 - /* List of verified CPUs. */ 220 - static const struct x86_cpu_id pl4_support_ids[] = { 221 - X86_MATCH_VFM(INTEL_ICELAKE_L, NULL), 222 - X86_MATCH_VFM(INTEL_TIGERLAKE_L, NULL), 223 - X86_MATCH_VFM(INTEL_ALDERLAKE, NULL), 224 - X86_MATCH_VFM(INTEL_ALDERLAKE_L, NULL), 225 - X86_MATCH_VFM(INTEL_ATOM_GRACEMONT, NULL), 226 - X86_MATCH_VFM(INTEL_RAPTORLAKE, NULL), 227 - X86_MATCH_VFM(INTEL_RAPTORLAKE_P, NULL), 228 - X86_MATCH_VFM(INTEL_METEORLAKE, NULL), 229 - X86_MATCH_VFM(INTEL_METEORLAKE_L, NULL), 230 - X86_MATCH_VFM(INTEL_ARROWLAKE_U, NULL), 231 - X86_MATCH_VFM(INTEL_ARROWLAKE_H, NULL), 232 - X86_MATCH_VFM(INTEL_PANTHERLAKE_L, NULL), 233 - X86_MATCH_VFM(INTEL_WILDCATLAKE_L, NULL), 234 - X86_MATCH_VFM(INTEL_NOVALAKE, NULL), 235 - X86_MATCH_VFM(INTEL_NOVALAKE_L, NULL), 236 - {} 161 + static int rapl_check_unit_atom(struct rapl_domain *rd) 162 + { 163 + struct reg_action ra; 164 + u32 value; 165 + 166 + ra.reg = rd->regs[RAPL_DOMAIN_REG_UNIT]; 167 + ra.mask = ~0; 168 + if (rapl_msr_read_raw(rd->rp->lead_cpu, &ra, false)) { 169 + pr_err("Failed to read power unit REG 0x%llx on %s:%s, exit.\n", 170 + ra.reg.val, rd->rp->name, rd->name); 171 + return -ENODEV; 172 + } 173 + 174 + value = (ra.value & ENERGY_UNIT_MASK) >> ENERGY_UNIT_OFFSET; 175 + rd->energy_unit = ENERGY_UNIT_SCALE * (1ULL << value); 176 + 177 + value = (ra.value & POWER_UNIT_MASK) >> 
POWER_UNIT_OFFSET; 178 + rd->power_unit = (1ULL << value) * MILLIWATT_PER_WATT; 179 + 180 + value = (ra.value & TIME_UNIT_MASK) >> TIME_UNIT_OFFSET; 181 + rd->time_unit = USEC_PER_SEC >> value; 182 + 183 + pr_debug("Atom %s:%s energy=%dpJ, time=%dus, power=%duW\n", 184 + rd->rp->name, rd->name, rd->energy_unit, rd->time_unit, rd->power_unit); 185 + 186 + return 0; 187 + } 188 + 189 + static void set_floor_freq_atom(struct rapl_domain *rd, bool enable) 190 + { 191 + static u32 power_ctrl_orig_val; 192 + const struct rapl_defaults *defaults = rd->rp->priv->defaults; 193 + u32 mdata; 194 + 195 + if (!defaults->floor_freq_reg_addr) { 196 + pr_err("Invalid floor frequency config register\n"); 197 + return; 198 + } 199 + 200 + if (!power_ctrl_orig_val) 201 + iosf_mbi_read(BT_MBI_UNIT_PMC, MBI_CR_READ, 202 + defaults->floor_freq_reg_addr, 203 + &power_ctrl_orig_val); 204 + mdata = power_ctrl_orig_val; 205 + if (enable) { 206 + mdata &= ~GENMASK(14, 8); 207 + mdata |= BIT(8); 208 + } 209 + iosf_mbi_write(BT_MBI_UNIT_PMC, MBI_CR_WRITE, 210 + defaults->floor_freq_reg_addr, mdata); 211 + } 212 + 213 + static u64 rapl_compute_time_window_atom(struct rapl_domain *rd, u64 value, 214 + bool to_raw) 215 + { 216 + if (to_raw) 217 + return div64_u64(value, rd->time_unit); 218 + 219 + /* 220 + * Atom time unit encoding is straight forward val * time_unit, 221 + * where time_unit is default to 1 sec. Never 0. 222 + */ 223 + return value ? 
value * rd->time_unit : rd->time_unit; 224 + } 225 + 226 + /* RAPL primitives for MSR I/F */ 227 + static struct rapl_primitive_info rpi_msr[NR_RAPL_PRIMITIVES] = { 228 + /* name, mask, shift, msr index, unit divisor */ 229 + [POWER_LIMIT1] = PRIMITIVE_INFO_INIT(POWER_LIMIT1, POWER_LIMIT1_MASK, 0, 230 + RAPL_DOMAIN_REG_LIMIT, POWER_UNIT, 0), 231 + [POWER_LIMIT2] = PRIMITIVE_INFO_INIT(POWER_LIMIT2, POWER_LIMIT2_MASK, 32, 232 + RAPL_DOMAIN_REG_LIMIT, POWER_UNIT, 0), 233 + [POWER_LIMIT4] = PRIMITIVE_INFO_INIT(POWER_LIMIT4, POWER_LIMIT4_MASK, 0, 234 + RAPL_DOMAIN_REG_PL4, POWER_UNIT, 0), 235 + [ENERGY_COUNTER] = PRIMITIVE_INFO_INIT(ENERGY_COUNTER, ENERGY_STATUS_MASK, 0, 236 + RAPL_DOMAIN_REG_STATUS, ENERGY_UNIT, 0), 237 + [FW_LOCK] = PRIMITIVE_INFO_INIT(FW_LOCK, POWER_LOW_LOCK, 31, 238 + RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0), 239 + [FW_HIGH_LOCK] = PRIMITIVE_INFO_INIT(FW_LOCK, POWER_HIGH_LOCK, 63, 240 + RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0), 241 + [PL1_ENABLE] = PRIMITIVE_INFO_INIT(PL1_ENABLE, POWER_LIMIT1_ENABLE, 15, 242 + RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0), 243 + [PL1_CLAMP] = PRIMITIVE_INFO_INIT(PL1_CLAMP, POWER_LIMIT1_CLAMP, 16, 244 + RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0), 245 + [PL2_ENABLE] = PRIMITIVE_INFO_INIT(PL2_ENABLE, POWER_LIMIT2_ENABLE, 47, 246 + RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0), 247 + [PL2_CLAMP] = PRIMITIVE_INFO_INIT(PL2_CLAMP, POWER_LIMIT2_CLAMP, 48, 248 + RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0), 249 + [TIME_WINDOW1] = PRIMITIVE_INFO_INIT(TIME_WINDOW1, TIME_WINDOW1_MASK, 17, 250 + RAPL_DOMAIN_REG_LIMIT, TIME_UNIT, 0), 251 + [TIME_WINDOW2] = PRIMITIVE_INFO_INIT(TIME_WINDOW2, TIME_WINDOW2_MASK, 49, 252 + RAPL_DOMAIN_REG_LIMIT, TIME_UNIT, 0), 253 + [THERMAL_SPEC_POWER] = PRIMITIVE_INFO_INIT(THERMAL_SPEC_POWER, 254 + POWER_INFO_THERMAL_SPEC_MASK, 0, 255 + RAPL_DOMAIN_REG_INFO, POWER_UNIT, 0), 256 + [MAX_POWER] = PRIMITIVE_INFO_INIT(MAX_POWER, POWER_INFO_MAX_MASK, 32, 257 + RAPL_DOMAIN_REG_INFO, POWER_UNIT, 0), 258 + [MIN_POWER] 
= PRIMITIVE_INFO_INIT(MIN_POWER, POWER_INFO_MIN_MASK, 16, 259 + RAPL_DOMAIN_REG_INFO, POWER_UNIT, 0), 260 + [MAX_TIME_WINDOW] = PRIMITIVE_INFO_INIT(MAX_TIME_WINDOW, 261 + POWER_INFO_MAX_TIME_WIN_MASK, 48, 262 + RAPL_DOMAIN_REG_INFO, TIME_UNIT, 0), 263 + [THROTTLED_TIME] = PRIMITIVE_INFO_INIT(THROTTLED_TIME, 264 + PERF_STATUS_THROTTLE_TIME_MASK, 0, 265 + RAPL_DOMAIN_REG_PERF, TIME_UNIT, 0), 266 + [PRIORITY_LEVEL] = PRIMITIVE_INFO_INIT(PRIORITY_LEVEL, PP_POLICY_MASK, 0, 267 + RAPL_DOMAIN_REG_POLICY, ARBITRARY_UNIT, 0), 268 + [PSYS_POWER_LIMIT1] = PRIMITIVE_INFO_INIT(PSYS_POWER_LIMIT1, PSYS_POWER_LIMIT1_MASK, 0, 269 + RAPL_DOMAIN_REG_LIMIT, POWER_UNIT, 0), 270 + [PSYS_POWER_LIMIT2] = PRIMITIVE_INFO_INIT(PSYS_POWER_LIMIT2, PSYS_POWER_LIMIT2_MASK, 271 + 32, RAPL_DOMAIN_REG_LIMIT, POWER_UNIT, 0), 272 + [PSYS_PL1_ENABLE] = PRIMITIVE_INFO_INIT(PSYS_PL1_ENABLE, PSYS_POWER_LIMIT1_ENABLE, 273 + 17, RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 274 + 0), 275 + [PSYS_PL2_ENABLE] = PRIMITIVE_INFO_INIT(PSYS_PL2_ENABLE, PSYS_POWER_LIMIT2_ENABLE, 276 + 49, RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 277 + 0), 278 + [PSYS_TIME_WINDOW1] = PRIMITIVE_INFO_INIT(PSYS_TIME_WINDOW1, PSYS_TIME_WINDOW1_MASK, 279 + 19, RAPL_DOMAIN_REG_LIMIT, TIME_UNIT, 0), 280 + [PSYS_TIME_WINDOW2] = PRIMITIVE_INFO_INIT(PSYS_TIME_WINDOW2, PSYS_TIME_WINDOW2_MASK, 281 + 51, RAPL_DOMAIN_REG_LIMIT, TIME_UNIT, 0), 237 282 }; 238 283 239 - /* List of MSR-based RAPL PMU support CPUs */ 240 - static const struct x86_cpu_id pmu_support_ids[] = { 241 - X86_MATCH_VFM(INTEL_PANTHERLAKE_L, NULL), 242 - X86_MATCH_VFM(INTEL_WILDCATLAKE_L, NULL), 284 + static const struct rapl_defaults rapl_defaults_core = { 285 + .floor_freq_reg_addr = 0, 286 + .check_unit = rapl_default_check_unit, 287 + .set_floor_freq = rapl_default_set_floor_freq, 288 + .compute_time_window = rapl_default_compute_time_window, 289 + }; 290 + 291 + static const struct rapl_defaults rapl_defaults_hsw_server = { 292 + .check_unit = rapl_default_check_unit, 293 + 
.set_floor_freq = rapl_default_set_floor_freq, 294 + .compute_time_window = rapl_default_compute_time_window, 295 + .dram_domain_energy_unit = 15300, 296 + }; 297 + 298 + static const struct rapl_defaults rapl_defaults_spr_server = { 299 + .check_unit = rapl_default_check_unit, 300 + .set_floor_freq = rapl_default_set_floor_freq, 301 + .compute_time_window = rapl_default_compute_time_window, 302 + .psys_domain_energy_unit = NANOJOULE_PER_JOULE, 303 + .spr_psys_bits = true, 304 + }; 305 + 306 + static const struct rapl_defaults rapl_defaults_byt = { 307 + .floor_freq_reg_addr = IOSF_CPU_POWER_BUDGET_CTL_BYT, 308 + .check_unit = rapl_check_unit_atom, 309 + .set_floor_freq = set_floor_freq_atom, 310 + .compute_time_window = rapl_compute_time_window_atom, 311 + }; 312 + 313 + static const struct rapl_defaults rapl_defaults_tng = { 314 + .floor_freq_reg_addr = IOSF_CPU_POWER_BUDGET_CTL_TNG, 315 + .check_unit = rapl_check_unit_atom, 316 + .set_floor_freq = set_floor_freq_atom, 317 + .compute_time_window = rapl_compute_time_window_atom, 318 + }; 319 + 320 + static const struct rapl_defaults rapl_defaults_ann = { 321 + .floor_freq_reg_addr = 0, 322 + .check_unit = rapl_check_unit_atom, 323 + .set_floor_freq = NULL, 324 + .compute_time_window = rapl_compute_time_window_atom, 325 + }; 326 + 327 + static const struct rapl_defaults rapl_defaults_cht = { 328 + .floor_freq_reg_addr = 0, 329 + .check_unit = rapl_check_unit_atom, 330 + .set_floor_freq = NULL, 331 + .compute_time_window = rapl_compute_time_window_atom, 332 + }; 333 + 334 + static const struct rapl_defaults rapl_defaults_amd = { 335 + .check_unit = rapl_default_check_unit, 336 + }; 337 + 338 + static const struct rapl_defaults rapl_defaults_core_pl4 = { 339 + .floor_freq_reg_addr = 0, 340 + .check_unit = rapl_default_check_unit, 341 + .set_floor_freq = rapl_default_set_floor_freq, 342 + .compute_time_window = rapl_default_compute_time_window, 343 + .msr_pl4_support = 1, 344 + }; 345 + 346 + static const struct 
rapl_defaults rapl_defaults_core_pl4_pmu = { 347 + .floor_freq_reg_addr = 0, 348 + .check_unit = rapl_default_check_unit, 349 + .set_floor_freq = rapl_default_set_floor_freq, 350 + .compute_time_window = rapl_default_compute_time_window, 351 + .msr_pl4_support = 1, 352 + .msr_pmu_support = 1, 353 + }; 354 + 355 + static const struct x86_cpu_id rapl_ids[] = { 356 + X86_MATCH_VFM(INTEL_SANDYBRIDGE, &rapl_defaults_core), 357 + X86_MATCH_VFM(INTEL_SANDYBRIDGE_X, &rapl_defaults_core), 358 + 359 + X86_MATCH_VFM(INTEL_IVYBRIDGE, &rapl_defaults_core), 360 + X86_MATCH_VFM(INTEL_IVYBRIDGE_X, &rapl_defaults_core), 361 + 362 + X86_MATCH_VFM(INTEL_HASWELL, &rapl_defaults_core), 363 + X86_MATCH_VFM(INTEL_HASWELL_L, &rapl_defaults_core), 364 + X86_MATCH_VFM(INTEL_HASWELL_G, &rapl_defaults_core), 365 + X86_MATCH_VFM(INTEL_HASWELL_X, &rapl_defaults_hsw_server), 366 + 367 + X86_MATCH_VFM(INTEL_BROADWELL, &rapl_defaults_core), 368 + X86_MATCH_VFM(INTEL_BROADWELL_G, &rapl_defaults_core), 369 + X86_MATCH_VFM(INTEL_BROADWELL_D, &rapl_defaults_core), 370 + X86_MATCH_VFM(INTEL_BROADWELL_X, &rapl_defaults_hsw_server), 371 + 372 + X86_MATCH_VFM(INTEL_SKYLAKE, &rapl_defaults_core), 373 + X86_MATCH_VFM(INTEL_SKYLAKE_L, &rapl_defaults_core), 374 + X86_MATCH_VFM(INTEL_SKYLAKE_X, &rapl_defaults_hsw_server), 375 + X86_MATCH_VFM(INTEL_KABYLAKE_L, &rapl_defaults_core), 376 + X86_MATCH_VFM(INTEL_KABYLAKE, &rapl_defaults_core), 377 + X86_MATCH_VFM(INTEL_CANNONLAKE_L, &rapl_defaults_core), 378 + X86_MATCH_VFM(INTEL_ICELAKE_L, &rapl_defaults_core_pl4), 379 + X86_MATCH_VFM(INTEL_ICELAKE, &rapl_defaults_core), 380 + X86_MATCH_VFM(INTEL_ICELAKE_NNPI, &rapl_defaults_core), 381 + X86_MATCH_VFM(INTEL_ICELAKE_X, &rapl_defaults_hsw_server), 382 + X86_MATCH_VFM(INTEL_ICELAKE_D, &rapl_defaults_hsw_server), 383 + X86_MATCH_VFM(INTEL_COMETLAKE_L, &rapl_defaults_core), 384 + X86_MATCH_VFM(INTEL_COMETLAKE, &rapl_defaults_core), 385 + X86_MATCH_VFM(INTEL_TIGERLAKE_L, &rapl_defaults_core_pl4), 386 + 
X86_MATCH_VFM(INTEL_TIGERLAKE, &rapl_defaults_core), 387 + X86_MATCH_VFM(INTEL_ROCKETLAKE, &rapl_defaults_core), 388 + X86_MATCH_VFM(INTEL_ALDERLAKE, &rapl_defaults_core_pl4), 389 + X86_MATCH_VFM(INTEL_ALDERLAKE_L, &rapl_defaults_core_pl4), 390 + X86_MATCH_VFM(INTEL_ATOM_GRACEMONT, &rapl_defaults_core_pl4), 391 + X86_MATCH_VFM(INTEL_RAPTORLAKE, &rapl_defaults_core_pl4), 392 + X86_MATCH_VFM(INTEL_RAPTORLAKE_P, &rapl_defaults_core_pl4), 393 + X86_MATCH_VFM(INTEL_RAPTORLAKE_S, &rapl_defaults_core), 394 + X86_MATCH_VFM(INTEL_BARTLETTLAKE, &rapl_defaults_core), 395 + X86_MATCH_VFM(INTEL_METEORLAKE, &rapl_defaults_core_pl4), 396 + X86_MATCH_VFM(INTEL_METEORLAKE_L, &rapl_defaults_core_pl4), 397 + X86_MATCH_VFM(INTEL_SAPPHIRERAPIDS_X, &rapl_defaults_spr_server), 398 + X86_MATCH_VFM(INTEL_EMERALDRAPIDS_X, &rapl_defaults_spr_server), 399 + X86_MATCH_VFM(INTEL_LUNARLAKE_M, &rapl_defaults_core), 400 + X86_MATCH_VFM(INTEL_PANTHERLAKE_L, &rapl_defaults_core_pl4_pmu), 401 + X86_MATCH_VFM(INTEL_WILDCATLAKE_L, &rapl_defaults_core_pl4_pmu), 402 + X86_MATCH_VFM(INTEL_NOVALAKE, &rapl_defaults_core_pl4), 403 + X86_MATCH_VFM(INTEL_NOVALAKE_L, &rapl_defaults_core_pl4), 404 + X86_MATCH_VFM(INTEL_ARROWLAKE_H, &rapl_defaults_core_pl4), 405 + X86_MATCH_VFM(INTEL_ARROWLAKE, &rapl_defaults_core), 406 + X86_MATCH_VFM(INTEL_ARROWLAKE_U, &rapl_defaults_core_pl4), 407 + X86_MATCH_VFM(INTEL_LAKEFIELD, &rapl_defaults_core), 408 + 409 + X86_MATCH_VFM(INTEL_ATOM_SILVERMONT, &rapl_defaults_byt), 410 + X86_MATCH_VFM(INTEL_ATOM_AIRMONT, &rapl_defaults_cht), 411 + X86_MATCH_VFM(INTEL_ATOM_SILVERMONT_MID, &rapl_defaults_tng), 412 + X86_MATCH_VFM(INTEL_ATOM_SILVERMONT_MID2, &rapl_defaults_ann), 413 + X86_MATCH_VFM(INTEL_ATOM_GOLDMONT, &rapl_defaults_core), 414 + X86_MATCH_VFM(INTEL_ATOM_GOLDMONT_PLUS, &rapl_defaults_core), 415 + X86_MATCH_VFM(INTEL_ATOM_GOLDMONT_D, &rapl_defaults_core), 416 + X86_MATCH_VFM(INTEL_ATOM_TREMONT, &rapl_defaults_core), 417 + X86_MATCH_VFM(INTEL_ATOM_TREMONT_D, 
&rapl_defaults_core), 418 + X86_MATCH_VFM(INTEL_ATOM_TREMONT_L, &rapl_defaults_core), 419 + 420 + X86_MATCH_VFM(INTEL_XEON_PHI_KNL, &rapl_defaults_hsw_server), 421 + X86_MATCH_VFM(INTEL_XEON_PHI_KNM, &rapl_defaults_hsw_server), 422 + 423 + X86_MATCH_VENDOR_FAM(AMD, 0x17, &rapl_defaults_amd), 424 + X86_MATCH_VENDOR_FAM(AMD, 0x19, &rapl_defaults_amd), 425 + X86_MATCH_VENDOR_FAM(AMD, 0x1A, &rapl_defaults_amd), 426 + X86_MATCH_VENDOR_FAM(HYGON, 0x18, &rapl_defaults_amd), 243 427 {} 244 428 }; 429 + MODULE_DEVICE_TABLE(x86cpu, rapl_ids); 245 430 246 431 static int rapl_msr_probe(struct platform_device *pdev) 247 432 { 248 - const struct x86_cpu_id *id = x86_match_cpu(pl4_support_ids); 249 433 int ret; 250 434 251 435 switch (boot_cpu_data.x86_vendor) { ··· 504 204 } 505 205 rapl_msr_priv->read_raw = rapl_msr_read_raw; 506 206 rapl_msr_priv->write_raw = rapl_msr_write_raw; 207 + rapl_msr_priv->defaults = (const struct rapl_defaults *)pdev->dev.platform_data; 208 + rapl_msr_priv->rpi = rpi_msr; 507 209 508 - if (id) { 210 + if (rapl_msr_priv->defaults->msr_pl4_support) { 509 211 rapl_msr_priv->limits[RAPL_DOMAIN_PACKAGE] |= BIT(POWER_LIMIT4); 510 212 rapl_msr_priv->regs[RAPL_DOMAIN_PACKAGE][RAPL_DOMAIN_REG_PL4].msr = 511 213 MSR_VR_CURRENT_CONFIG; 512 214 pr_info("PL4 support detected.\n"); 513 215 } 514 216 515 - if (x86_match_cpu(pmu_support_ids)) { 217 + if (rapl_msr_priv->defaults->msr_pmu_support) { 516 218 rapl_msr_pmu = true; 517 219 pr_info("MSR-based RAPL PMU support enabled\n"); 518 220 } 519 221 520 222 rapl_msr_priv->control_type = powercap_register_control_type(NULL, "intel-rapl", NULL); ··· 560 258 }, 561 259 }, 562 260 }; 563 - module_platform_driver(intel_rapl_msr_driver); 261 + static struct platform_device *rapl_msr_platdev; 262 + 263 + static int intel_rapl_msr_init(void) 264 + { 265 + const struct rapl_defaults *def; 266 + const struct
x86_cpu_id *id; 267 + int ret; 268 + 269 + ret = platform_driver_register(&intel_rapl_msr_driver); 270 + if (ret) 271 + return ret; 272 + 273 + /* Create the MSR RAPL platform device for supported platforms */ 274 + id = x86_match_cpu(rapl_ids); 275 + if (!id) 276 + return 0; 277 + 278 + def = (const struct rapl_defaults *)id->driver_data; 279 + 280 + rapl_msr_platdev = platform_device_register_data(NULL, "intel_rapl_msr", 0, def, 281 + sizeof(*def)); 282 + if (IS_ERR(rapl_msr_platdev)) 283 + pr_debug("intel_rapl_msr device register failed, ret:%ld\n", 284 + PTR_ERR(rapl_msr_platdev)); 285 + 286 + return 0; 287 + } 288 + module_init(intel_rapl_msr_init); 289 + 290 + static void intel_rapl_msr_exit(void) 291 + { 292 + platform_device_unregister(rapl_msr_platdev); 293 + platform_driver_unregister(&intel_rapl_msr_driver); 294 + } 295 + module_exit(intel_rapl_msr_exit); 564 296 565 297 MODULE_DESCRIPTION("Driver for Intel RAPL (Running Average Power Limit) control via MSR interface"); 566 298 MODULE_AUTHOR("Zhang Rui <rui.zhang@intel.com>"); 567 299 MODULE_LICENSE("GPL v2"); 300 + MODULE_IMPORT_NS("INTEL_RAPL");
+101
drivers/powercap/intel_rapl_tpmi.c
··· 9 9 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt 10 10 11 11 #include <linux/auxiliary_bus.h> 12 + #include <linux/bits.h> 12 13 #include <linux/intel_rapl.h> 13 14 #include <linux/intel_tpmi.h> 14 15 #include <linux/intel_vsec.h> 15 16 #include <linux/io.h> 16 17 #include <linux/module.h> 17 18 #include <linux/slab.h> 19 + #include <linux/units.h> 18 20 19 21 #define TPMI_RAPL_MAJOR_VERSION 0 20 22 #define TPMI_RAPL_MINOR_VERSION 1 ··· 61 59 static DEFINE_MUTEX(tpmi_rapl_lock); 62 60 63 61 static struct powercap_control_type *tpmi_control_type; 62 + 63 + /* bitmasks for RAPL TPMI, used by primitive access functions */ 64 + #define TPMI_POWER_LIMIT_MASK GENMASK_ULL(17, 0) 65 + #define TPMI_POWER_LIMIT_ENABLE BIT_ULL(62) 66 + #define TPMI_POWER_HIGH_LOCK BIT_ULL(63) 67 + #define TPMI_TIME_WINDOW_MASK GENMASK_ULL(24, 18) 68 + #define TPMI_INFO_SPEC_MASK GENMASK_ULL(17, 0) 69 + #define TPMI_INFO_MIN_MASK GENMASK_ULL(35, 18) 70 + #define TPMI_INFO_MAX_MASK GENMASK_ULL(53, 36) 71 + #define TPMI_INFO_MAX_TIME_WIN_MASK GENMASK_ULL(60, 54) 72 + #define TPMI_ENERGY_STATUS_MASK GENMASK(31, 0) 73 + #define TPMI_PERF_STATUS_THROTTLE_TIME_MASK GENMASK(31, 0) 74 + 75 + /* RAPL primitives for TPMI I/F */ 76 + static struct rapl_primitive_info rpi_tpmi[NR_RAPL_PRIMITIVES] = { 77 + /* name, mask, shift, msr index, unit divisor */ 78 + [POWER_LIMIT1] = PRIMITIVE_INFO_INIT(POWER_LIMIT1, TPMI_POWER_LIMIT_MASK, 0, 79 + RAPL_DOMAIN_REG_LIMIT, POWER_UNIT, 0), 80 + [POWER_LIMIT2] = PRIMITIVE_INFO_INIT(POWER_LIMIT2, TPMI_POWER_LIMIT_MASK, 0, 81 + RAPL_DOMAIN_REG_PL2, POWER_UNIT, 0), 82 + [POWER_LIMIT4] = PRIMITIVE_INFO_INIT(POWER_LIMIT4, TPMI_POWER_LIMIT_MASK, 0, 83 + RAPL_DOMAIN_REG_PL4, POWER_UNIT, 0), 84 + [ENERGY_COUNTER] = PRIMITIVE_INFO_INIT(ENERGY_COUNTER, TPMI_ENERGY_STATUS_MASK, 0, 85 + RAPL_DOMAIN_REG_STATUS, ENERGY_UNIT, 0), 86 + [PL1_LOCK] = PRIMITIVE_INFO_INIT(PL1_LOCK, TPMI_POWER_HIGH_LOCK, 63, 87 + RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0), 88 + [PL2_LOCK] = 
PRIMITIVE_INFO_INIT(PL2_LOCK, TPMI_POWER_HIGH_LOCK, 63, 89 + RAPL_DOMAIN_REG_PL2, ARBITRARY_UNIT, 0), 90 + [PL4_LOCK] = PRIMITIVE_INFO_INIT(PL4_LOCK, TPMI_POWER_HIGH_LOCK, 63, 91 + RAPL_DOMAIN_REG_PL4, ARBITRARY_UNIT, 0), 92 + [PL1_ENABLE] = PRIMITIVE_INFO_INIT(PL1_ENABLE, TPMI_POWER_LIMIT_ENABLE, 62, 93 + RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0), 94 + [PL2_ENABLE] = PRIMITIVE_INFO_INIT(PL2_ENABLE, TPMI_POWER_LIMIT_ENABLE, 62, 95 + RAPL_DOMAIN_REG_PL2, ARBITRARY_UNIT, 0), 96 + [PL4_ENABLE] = PRIMITIVE_INFO_INIT(PL4_ENABLE, TPMI_POWER_LIMIT_ENABLE, 62, 97 + RAPL_DOMAIN_REG_PL4, ARBITRARY_UNIT, 0), 98 + [TIME_WINDOW1] = PRIMITIVE_INFO_INIT(TIME_WINDOW1, TPMI_TIME_WINDOW_MASK, 18, 99 + RAPL_DOMAIN_REG_LIMIT, TIME_UNIT, 0), 100 + [TIME_WINDOW2] = PRIMITIVE_INFO_INIT(TIME_WINDOW2, TPMI_TIME_WINDOW_MASK, 18, 101 + RAPL_DOMAIN_REG_PL2, TIME_UNIT, 0), 102 + [THERMAL_SPEC_POWER] = PRIMITIVE_INFO_INIT(THERMAL_SPEC_POWER, TPMI_INFO_SPEC_MASK, 0, 103 + RAPL_DOMAIN_REG_INFO, POWER_UNIT, 0), 104 + [MAX_POWER] = PRIMITIVE_INFO_INIT(MAX_POWER, TPMI_INFO_MAX_MASK, 36, 105 + RAPL_DOMAIN_REG_INFO, POWER_UNIT, 0), 106 + [MIN_POWER] = PRIMITIVE_INFO_INIT(MIN_POWER, TPMI_INFO_MIN_MASK, 18, 107 + RAPL_DOMAIN_REG_INFO, POWER_UNIT, 0), 108 + [MAX_TIME_WINDOW] = PRIMITIVE_INFO_INIT(MAX_TIME_WINDOW, TPMI_INFO_MAX_TIME_WIN_MASK, 109 + 54, RAPL_DOMAIN_REG_INFO, TIME_UNIT, 0), 110 + [THROTTLED_TIME] = PRIMITIVE_INFO_INIT(THROTTLED_TIME, 111 + TPMI_PERF_STATUS_THROTTLE_TIME_MASK, 112 + 0, RAPL_DOMAIN_REG_PERF, TIME_UNIT, 0), 113 + }; 64 114 65 115 static int tpmi_rapl_read_raw(int id, struct reg_action *ra, bool atomic) 66 116 { ··· 304 250 return 0; 305 251 } 306 252 253 + /* TPMI Unit register has different layout */ 254 + #define TPMI_ENERGY_UNIT_SCALE 1000 255 + #define TPMI_POWER_UNIT_OFFSET 0x00 256 + #define TPMI_POWER_UNIT_MASK GENMASK(3, 0) 257 + #define TPMI_ENERGY_UNIT_OFFSET 0x06 258 + #define TPMI_ENERGY_UNIT_MASK GENMASK_ULL(10, 6) 259 + #define TPMI_TIME_UNIT_OFFSET 0x0C 260 + 
#define TPMI_TIME_UNIT_MASK GENMASK_ULL(15, 12) 261 + 262 + static int rapl_check_unit_tpmi(struct rapl_domain *rd) 263 + { 264 + struct reg_action ra; 265 + u32 value; 266 + 267 + ra.reg = rd->regs[RAPL_DOMAIN_REG_UNIT]; 268 + ra.mask = ~0; 269 + if (tpmi_rapl_read_raw(rd->rp->id, &ra, false)) { 270 + pr_err("Failed to read power unit REG 0x%llx on %s:%s, exit.\n", 271 + ra.reg.val, rd->rp->name, rd->name); 272 + return -ENODEV; 273 + } 274 + 275 + value = (ra.value & TPMI_ENERGY_UNIT_MASK) >> TPMI_ENERGY_UNIT_OFFSET; 276 + rd->energy_unit = (TPMI_ENERGY_UNIT_SCALE * MICROJOULE_PER_JOULE) >> value; 277 + 278 + value = (ra.value & TPMI_POWER_UNIT_MASK) >> TPMI_POWER_UNIT_OFFSET; 279 + rd->power_unit = MICROWATT_PER_WATT >> value; 280 + 281 + value = (ra.value & TPMI_TIME_UNIT_MASK) >> TPMI_TIME_UNIT_OFFSET; 282 + rd->time_unit = USEC_PER_SEC >> value; 283 + 284 + pr_debug("Core CPU %s:%s energy=%dpJ, time=%dus, power=%duW\n", 285 + rd->rp->name, rd->name, rd->energy_unit, rd->time_unit, rd->power_unit); 286 + 287 + return 0; 288 + } 289 + 290 + static const struct rapl_defaults defaults_tpmi = { 291 + .check_unit = rapl_check_unit_tpmi, 292 + /* Reuse existing logic, ignore the PL_CLAMP failures and enable all Power Limits */ 293 + .set_floor_freq = rapl_default_set_floor_freq, 294 + .compute_time_window = rapl_default_compute_time_window, 295 + }; 296 + 307 297 static int intel_rapl_tpmi_probe(struct auxiliary_device *auxdev, 308 298 const struct auxiliary_device_id *id) 309 299 { ··· 395 297 trp->priv.read_raw = tpmi_rapl_read_raw; 396 298 trp->priv.write_raw = tpmi_rapl_write_raw; 397 299 trp->priv.control_type = tpmi_control_type; 300 + trp->priv.defaults = &defaults_tpmi; 301 + trp->priv.rpi = rpi_tpmi; 398 302 399 303 /* RAPL TPMI I/F is per physical package */ 400 304 trp->rp = rapl_find_package_domain(info->package_id, &trp->priv, false); ··· 448 348 449 349 module_auxiliary_driver(intel_rapl_tpmi_driver) 450 350 351 + MODULE_IMPORT_NS("INTEL_RAPL"); 451 
352 MODULE_IMPORT_NS("INTEL_TPMI"); 452 353 453 354 MODULE_DESCRIPTION("Intel RAPL TPMI Driver");
+81
drivers/thermal/intel/int340x_thermal/processor_thermal_rapl.c
··· 11 11 12 12 static struct rapl_if_priv rapl_mmio_priv; 13 13 14 + /* bitmasks for RAPL MSRs, used by primitive access functions */ 15 + #define MMIO_ENERGY_STATUS_MASK GENMASK(31, 0) 16 + 17 + #define MMIO_POWER_LIMIT1_MASK GENMASK(14, 0) 18 + #define MMIO_POWER_LIMIT1_ENABLE BIT(15) 19 + #define MMIO_POWER_LIMIT1_CLAMP BIT(16) 20 + 21 + #define MMIO_POWER_LIMIT2_MASK GENMASK_ULL(46, 32) 22 + #define MMIO_POWER_LIMIT2_ENABLE BIT_ULL(47) 23 + #define MMIO_POWER_LIMIT2_CLAMP BIT_ULL(48) 24 + 25 + #define MMIO_POWER_LOW_LOCK BIT(31) 26 + #define MMIO_POWER_HIGH_LOCK BIT_ULL(63) 27 + 28 + #define MMIO_POWER_LIMIT4_MASK GENMASK(12, 0) 29 + 30 + #define MMIO_TIME_WINDOW1_MASK GENMASK_ULL(23, 17) 31 + #define MMIO_TIME_WINDOW2_MASK GENMASK_ULL(55, 49) 32 + 33 + #define MMIO_POWER_INFO_MAX_MASK GENMASK_ULL(46, 32) 34 + #define MMIO_POWER_INFO_MIN_MASK GENMASK_ULL(30, 16) 35 + #define MMIO_POWER_INFO_MAX_TIME_WIN_MASK GENMASK_ULL(53, 48) 36 + #define MMIO_POWER_INFO_THERMAL_SPEC_MASK GENMASK(14, 0) 37 + 38 + #define MMIO_PERF_STATUS_THROTTLE_TIME_MASK GENMASK(31, 0) 39 + #define MMIO_PP_POLICY_MASK GENMASK(4, 0) 40 + 41 + /* RAPL primitives for MMIO I/F */ 42 + static struct rapl_primitive_info rpi_mmio[NR_RAPL_PRIMITIVES] = { 43 + /* name, mask, shift, msr index, unit divisor */ 44 + [POWER_LIMIT1] = PRIMITIVE_INFO_INIT(POWER_LIMIT1, MMIO_POWER_LIMIT1_MASK, 0, 45 + RAPL_DOMAIN_REG_LIMIT, POWER_UNIT, 0), 46 + [POWER_LIMIT2] = PRIMITIVE_INFO_INIT(POWER_LIMIT2, MMIO_POWER_LIMIT2_MASK, 32, 47 + RAPL_DOMAIN_REG_LIMIT, POWER_UNIT, 0), 48 + [POWER_LIMIT4] = PRIMITIVE_INFO_INIT(POWER_LIMIT4, MMIO_POWER_LIMIT4_MASK, 0, 49 + RAPL_DOMAIN_REG_PL4, POWER_UNIT, 0), 50 + [ENERGY_COUNTER] = PRIMITIVE_INFO_INIT(ENERGY_COUNTER, MMIO_ENERGY_STATUS_MASK, 0, 51 + RAPL_DOMAIN_REG_STATUS, ENERGY_UNIT, 0), 52 + [FW_LOCK] = PRIMITIVE_INFO_INIT(FW_LOCK, MMIO_POWER_LOW_LOCK, 31, 53 + RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0), 54 + [FW_HIGH_LOCK] = PRIMITIVE_INFO_INIT(FW_LOCK, 
MMIO_POWER_HIGH_LOCK, 63, 55 + RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0), 56 + [PL1_ENABLE] = PRIMITIVE_INFO_INIT(PL1_ENABLE, MMIO_POWER_LIMIT1_ENABLE, 15, 57 + RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0), 58 + [PL1_CLAMP] = PRIMITIVE_INFO_INIT(PL1_CLAMP, MMIO_POWER_LIMIT1_CLAMP, 16, 59 + RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0), 60 + [PL2_ENABLE] = PRIMITIVE_INFO_INIT(PL2_ENABLE, MMIO_POWER_LIMIT2_ENABLE, 47, 61 + RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0), 62 + [PL2_CLAMP] = PRIMITIVE_INFO_INIT(PL2_CLAMP, MMIO_POWER_LIMIT2_CLAMP, 48, 63 + RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0), 64 + [TIME_WINDOW1] = PRIMITIVE_INFO_INIT(TIME_WINDOW1, MMIO_TIME_WINDOW1_MASK, 17, 65 + RAPL_DOMAIN_REG_LIMIT, TIME_UNIT, 0), 66 + [TIME_WINDOW2] = PRIMITIVE_INFO_INIT(TIME_WINDOW2, MMIO_TIME_WINDOW2_MASK, 49, 67 + RAPL_DOMAIN_REG_LIMIT, TIME_UNIT, 0), 68 + [THERMAL_SPEC_POWER] = PRIMITIVE_INFO_INIT(THERMAL_SPEC_POWER, 69 + MMIO_POWER_INFO_THERMAL_SPEC_MASK, 0, 70 + RAPL_DOMAIN_REG_INFO, POWER_UNIT, 0), 71 + [MAX_POWER] = PRIMITIVE_INFO_INIT(MAX_POWER, MMIO_POWER_INFO_MAX_MASK, 32, 72 + RAPL_DOMAIN_REG_INFO, POWER_UNIT, 0), 73 + [MIN_POWER] = PRIMITIVE_INFO_INIT(MIN_POWER, MMIO_POWER_INFO_MIN_MASK, 16, 74 + RAPL_DOMAIN_REG_INFO, POWER_UNIT, 0), 75 + [MAX_TIME_WINDOW] = PRIMITIVE_INFO_INIT(MAX_TIME_WINDOW, 76 + MMIO_POWER_INFO_MAX_TIME_WIN_MASK, 48, 77 + RAPL_DOMAIN_REG_INFO, TIME_UNIT, 0), 78 + [THROTTLED_TIME] = PRIMITIVE_INFO_INIT(THROTTLED_TIME, 79 + MMIO_PERF_STATUS_THROTTLE_TIME_MASK, 0, 80 + RAPL_DOMAIN_REG_PERF, TIME_UNIT, 0), 81 + [PRIORITY_LEVEL] = PRIMITIVE_INFO_INIT(PRIORITY_LEVEL, MMIO_PP_POLICY_MASK, 0, 82 + RAPL_DOMAIN_REG_POLICY, ARBITRARY_UNIT, 0), 83 + }; 84 + 14 85 static const struct rapl_mmio_regs rapl_mmio_default = { 15 86 .reg_unit = 0x5938, 16 87 .regs[RAPL_DOMAIN_PACKAGE] = { 0x59a0, 0x593c, 0x58f0, 0, 0x5930, 0x59b0}, 17 88 .regs[RAPL_DOMAIN_DRAM] = { 0x58e0, 0x58e8, 0x58ec, 0, 0}, 18 89 .limits[RAPL_DOMAIN_PACKAGE] = BIT(POWER_LIMIT2) | BIT(POWER_LIMIT4), 19 90 
.limits[RAPL_DOMAIN_DRAM] = BIT(POWER_LIMIT2), 91 + }; 92 + 93 + static const struct rapl_defaults rapl_defaults_mmio = { 94 + .floor_freq_reg_addr = 0, 95 + .check_unit = rapl_default_check_unit, 96 + .set_floor_freq = rapl_default_set_floor_freq, 97 + .compute_time_window = rapl_default_compute_time_window, 20 98 }; 21 99 22 100 static int rapl_mmio_read_raw(int cpu, struct reg_action *ra, bool atomic) ··· 145 67 146 68 rapl_mmio_priv.read_raw = rapl_mmio_read_raw; 147 69 rapl_mmio_priv.write_raw = rapl_mmio_write_raw; 70 + rapl_mmio_priv.defaults = &rapl_defaults_mmio; 71 + rapl_mmio_priv.rpi = rpi_mmio; 148 72 149 73 rapl_mmio_priv.control_type = powercap_register_control_type(NULL, "intel-rapl-mmio", NULL); 150 74 if (IS_ERR(rapl_mmio_priv.control_type)) { ··· 191 111 EXPORT_SYMBOL_GPL(proc_thermal_rapl_remove); 192 112 193 113 MODULE_LICENSE("GPL v2"); 114 + MODULE_IMPORT_NS("INTEL_RAPL"); 194 115 MODULE_DESCRIPTION("RAPL interface using MMIO");
+1
include/acpi/cppc_acpi.h
··· 162 162 extern int cppc_get_perf_caps(int cpu, struct cppc_perf_caps *caps); 163 163 extern bool cppc_perf_ctrs_in_pcc_cpu(unsigned int cpu); 164 164 extern bool cppc_perf_ctrs_in_pcc(void); 165 + extern u64 cppc_get_dmi_max_khz(void); 165 166 extern unsigned int cppc_perf_to_khz(struct cppc_perf_caps *caps, unsigned int perf); 166 167 extern unsigned int cppc_khz_to_perf(struct cppc_perf_caps *caps, unsigned int freq); 167 168 extern bool acpi_cpc_valid(void);
+6 -5
include/linux/cpufreq.h
··· 79 79 * called, but you're in IRQ context */ 80 80 81 81 struct freq_constraints constraints; 82 - struct freq_qos_request *min_freq_req; 83 - struct freq_qos_request *max_freq_req; 82 + struct freq_qos_request min_freq_req; 83 + struct freq_qos_request max_freq_req; 84 + struct freq_qos_request boost_freq_req; 84 85 85 86 struct cpufreq_frequency_table *freq_table; 86 87 enum cpufreq_table_sorting freq_table_sorted; ··· 233 232 234 233 static inline bool policy_is_shared(struct cpufreq_policy *policy) 235 234 { 236 - return cpumask_weight(policy->cpus) > 1; 235 + return cpumask_nth(1, policy->cpus) < nr_cpumask_bits; 237 236 } 238 237 239 238 #ifdef CONFIG_CPU_FREQ ··· 373 372 * conditions) scale invariance can be disabled, which causes the 374 373 * schedutil governor to fall back to the latter. 375 374 */ 376 - void (*adjust_perf)(unsigned int cpu, 375 + void (*adjust_perf)(struct cpufreq_policy *policy, 377 376 unsigned long min_perf, 378 377 unsigned long target_perf, 379 378 unsigned long capacity); ··· 618 617 /* Pass a target to the cpufreq driver */ 619 618 unsigned int cpufreq_driver_fast_switch(struct cpufreq_policy *policy, 620 619 unsigned int target_freq); 621 - void cpufreq_driver_adjust_perf(unsigned int cpu, 620 + void cpufreq_driver_adjust_perf(struct cpufreq_policy *policy, 622 621 unsigned long min_perf, 623 622 unsigned long target_perf, 624 623 unsigned long capacity);
+47 -5
include/linux/intel_rapl.h
··· 77 77 PSYS_TIME_WINDOW1, 78 78 PSYS_TIME_WINDOW2, 79 79 /* below are not raw primitive data */ 80 - AVERAGE_POWER, 81 80 NR_RAPL_PRIMITIVES, 82 81 }; 83 82 ··· 127 128 int err; 128 129 }; 129 130 131 + struct rapl_defaults { 132 + u8 floor_freq_reg_addr; 133 + int (*check_unit)(struct rapl_domain *rd); 134 + void (*set_floor_freq)(struct rapl_domain *rd, bool mode); 135 + u64 (*compute_time_window)(struct rapl_domain *rd, u64 val, bool to_raw); 136 + unsigned int dram_domain_energy_unit; 137 + unsigned int psys_domain_energy_unit; 138 + bool spr_psys_bits; 139 + bool msr_pl4_support; 140 + bool msr_pmu_support; 141 + }; 142 + 143 + #define PRIMITIVE_INFO_INIT(p, m, s, i, u, f) { \ 144 + .name = #p, \ 145 + .mask = m, \ 146 + .shift = s, \ 147 + .id = i, \ 148 + .unit = u, \ 149 + .flag = f \ 150 + } 151 + 152 + enum unit_type { 153 + ARBITRARY_UNIT, /* no translation */ 154 + POWER_UNIT, 155 + ENERGY_UNIT, 156 + TIME_UNIT, 157 + }; 158 + 159 + /* per domain data. used to describe individual knobs such that access function 160 + * can be consolidated into one instead of many inline functions. 161 + */ 162 + struct rapl_primitive_info { 163 + const char *name; 164 + u64 mask; 165 + int shift; 166 + enum rapl_domain_reg_id id; 167 + enum unit_type unit; 168 + u32 flag; 169 + }; 170 + 130 171 /** 131 172 * struct rapl_if_priv: private data for different RAPL interfaces 132 173 * @control_type: Each RAPL interface must have its own powercap ··· 181 142 * registers. 182 143 * @write_raw: Callback for writing RAPL interface specific 183 144 * registers. 
184 - * @defaults: internal pointer to interface default settings 185 - * @rpi: internal pointer to interface primitive info 145 + * @defaults: pointer to default settings 146 + * @rpi: pointer to interface primitive info 186 147 */ 187 148 struct rapl_if_priv { 188 149 enum rapl_if_type type; ··· 193 154 int limits[RAPL_DOMAIN_MAX]; 194 155 int (*read_raw)(int id, struct reg_action *ra, bool pmu_ctx); 195 156 int (*write_raw)(int id, struct reg_action *ra); 196 - void *defaults; 197 - void *rpi; 157 + const struct rapl_defaults *defaults; 158 + struct rapl_primitive_info *rpi; 198 159 }; 199 160 200 161 #ifdef CONFIG_PERF_EVENTS ··· 250 211 struct rapl_package *rapl_find_package_domain(int id, struct rapl_if_priv *priv, bool id_is_cpu); 251 212 struct rapl_package *rapl_add_package(int id, struct rapl_if_priv *priv, bool id_is_cpu); 252 213 void rapl_remove_package(struct rapl_package *rp); 214 + int rapl_default_check_unit(struct rapl_domain *rd); 215 + void rapl_default_set_floor_freq(struct rapl_domain *rd, bool mode); 216 + u64 rapl_default_compute_time_window(struct rapl_domain *rd, u64 value, bool to_raw); 253 217 254 218 #ifdef CONFIG_PERF_EVENTS 255 219 int rapl_package_add_pmu(struct rapl_package *rp);
+2 -2
include/linux/powercap.h
···
  *		Advantage of this parameter is that client can embed
  *		this data in its data structures and allocate in a
  *		single call, preventing multiple allocations.
- * @control_type_name:	The Name of this control_type, which will be shown
+ * @name:		The Name of this control_type, which will be shown
  *		in the sysfs Interface.
  * @ops:	Callbacks for control type. This parameter is optional.
  *
···
  * @name:	A name for this zone.
  * @parent:	A pointer to the parent power zone instance if any or NULL
  * @ops:	Pointer to zone operation callback structure.
- * @no_constraints:	Number of constraints for this zone
+ * @nr_constraints:	Number of constraints for this zone
  * @const_ops:	Pointer to constraint callback structure
  *
  * Register a power zone under a given control type. A power zone must register
+3
include/linux/units.h
···
 #define MICROWATT_PER_MILLIWATT	1000UL
 #define MICROWATT_PER_WATT	1000000UL

+#define MICROJOULE_PER_JOULE	1000000UL
+#define NANOJOULE_PER_JOULE	1000000000UL
+
 #define BYTES_PER_KBIT	(KILO / BITS_PER_BYTE)
 #define BYTES_PER_MBIT	(MEGA / BITS_PER_BYTE)
 #define BYTES_PER_GBIT	(GIGA / BITS_PER_BYTE)
+5 -2
kernel/power/user.c
···
 		error = snapshot_write_finalize(&data->handle);
 		if (error)
 			break;
-		if (data->mode != O_WRONLY || !data->frozen ||
-		    !snapshot_image_loaded(&data->handle)) {
+		if (data->mode != O_WRONLY || !data->frozen) {
 			error = -EPERM;
+			break;
+		}
+		if (!snapshot_image_loaded(&data->handle)) {
+			error = -ENODATA;
 			break;
 		}
 		error = hibernation_restore(data->platform_support);
+3 -2
kernel/sched/cpufreq_schedutil.c
···
 				     unsigned int flags)
 {
 	struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_util);
+	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
 	unsigned long prev_util = sg_cpu->util;
 	unsigned long max_cap;

···
 	if (sugov_hold_freq(sg_cpu) && sg_cpu->util < prev_util)
 		sg_cpu->util = prev_util;

-	cpufreq_driver_adjust_perf(sg_cpu->cpu, sg_cpu->bw_min,
+	cpufreq_driver_adjust_perf(sg_policy->policy, sg_cpu->bw_min,
 				   sg_cpu->util, max_cap);

-	sg_cpu->sg_policy->last_freq_update_time = time;
+	sg_policy->last_freq_update_time = time;
 }

 static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
+6 -7
rust/kernel/cpufreq.rs
···
     /// # Safety
     ///
     /// - This function may only be called from the cpufreq C infrastructure.
+    /// - The pointer arguments must be valid pointers.
     unsafe extern "C" fn adjust_perf_callback(
-        cpu: c_uint,
+        ptr: *mut bindings::cpufreq_policy,
         min_perf: c_ulong,
         target_perf: c_ulong,
         capacity: c_ulong,
     ) {
-        // SAFETY: The C API guarantees that `cpu` refers to a valid CPU number.
-        let cpu_id = unsafe { CpuId::from_u32_unchecked(cpu) };
-
-        if let Ok(mut policy) = PolicyCpu::from_cpu(cpu_id) {
-            T::adjust_perf(&mut policy, min_perf, target_perf, capacity);
-        }
+        // SAFETY: The `ptr` is guaranteed to be valid by the contract with the C code for the
+        // lifetime of `policy`.
+        let policy = unsafe { Policy::from_raw_mut(ptr) };
+        T::adjust_perf(policy, min_perf, target_perf, capacity);
     }

     /// Driver's `get_intermediate` callback.
+1 -1
tools/arch/x86/include/asm/cpufeatures.h
···
  */
 #define X86_FEATURE_OVERFLOW_RECOV	(17*32+ 0) /* "overflow_recov" MCA overflow recovery support */
 #define X86_FEATURE_SUCCOR		(17*32+ 1) /* "succor" Uncorrectable error containment and recovery */
-
+#define X86_FEATURE_CPPC_PERF_PRIO	(17*32+ 2) /* CPPC Floor Perf support */
 #define X86_FEATURE_SMCA		(17*32+ 3) /* "smca" Scalable MCA */

 /* Intel-defined CPU features, CPUID level 0x00000007:0 (EDX), word 18 */
+7 -1
tools/power/cpupower/man/cpupower-frequency-info.1
···
 \fB\-g\fR \fB\-\-governors\fR
 Determines available cpufreq governors.
 .TP
+\fB\-b\fR \fB\-\-boost\fR
+Gets the current boost state support.
+.TP
+\fB\-z\fR \fB\-\-epp\fR
+Gets the current EPP (energy performance preference).
+.TP
 \fB\-r\fR \fB\-\-related\-cpus\fR
 Determines which CPUs run at the same hardware frequency.
 .TP
···
 \fB\-n\fR \fB\-\-no-rounding\fR
 Output frequencies and latencies without rounding off values.
 .TP
-\fB\-c\fR \fB\-\-perf\fR
+\fB\-c\fR \fB\-\-performance\fR
 Get performances and frequencies capabilities of CPPC, by reading it from hardware (only available on the hardware with CPPC).
 .TP
 .SH "REMARKS"
+2 -2
tools/power/cpupower/man/cpupower-idle-info.1
···
 .SH "OPTIONS"
 .LP
 .TP
-\fB\-f\fR \fB\-\-silent\fR
+\fB\-s\fR \fB\-\-silent\fR
 Only print a summary of all available C-states in the system.
 .TP
-\fB\-e\fR \fB\-\-proc\fR
+\fB\-o\fR \fB\-\-proc\fR
 deprecated.
 Prints out idle information in old /proc/acpi/processor/*/power format. This
 interface has been removed from the kernel for quite some time, do not let
+8 -1
tools/power/cpupower/man/cpupower-info.1
···
 cpupower\-info \- Shows processor power related kernel or hardware configurations
 .SH SYNOPSIS
 .ft B
-.B cpupower info [ \-b ]
+.B cpupower info [\fIoptions\fP]

 .SH DESCRIPTION
 \fBcpupower info \fP shows kernel configurations or processor hardware
···
 Some options are platform wide, some affect single cores. By default values
 of core zero are displayed only. cpupower --cpu all cpuinfo will show the
 settings of all cores, see cpupower(1) how to choose specific cores.
+
+.SH "OPTIONS"
+.LP
+.TP
+\fB\-b\fR \fB\-\-perf-bias\fR
+Gets the current performance bias value.
+.TP

 .SH "SEE ALSO"
 Options are described in detail in:
-2
tools/power/cpupower/utils/cpufreq-info.c
···

 int cmd_freq_info(int argc, char **argv)
 {
-	extern char *optarg;
-	extern int optind, opterr, optopt;
 	int ret = 0, cont = 1;
 	unsigned int cpu = 0;
 	unsigned int human = 0;
-2
tools/power/cpupower/utils/cpufreq-set.c
···

 int cmd_freq_set(int argc, char **argv)
 {
-	extern char *optarg;
-	extern int optind, opterr, optopt;
 	int ret = 0, cont = 1;
 	int double_parm = 0, related = 0, policychange = 0;
 	unsigned long freq = 0;
-2
tools/power/cpupower/utils/cpuidle-info.c
···

 int cmd_idle_info(int argc, char **argv)
 {
-	extern char *optarg;
-	extern int optind, opterr, optopt;
 	int ret = 0, cont = 1, output_param = 0, verbose = 1;
 	unsigned int cpu = 0;

-2
tools/power/cpupower/utils/cpuidle-set.c
···

 int cmd_idle_set(int argc, char **argv)
 {
-	extern char *optarg;
-	extern int optind, opterr, optopt;
 	int ret = 0, cont = 1, param = 0, disabled;
 	unsigned long long latency = 0, state_latency;
 	unsigned int cpu = 0, idlestate = 0, idlestates = 0;
-2
tools/power/cpupower/utils/cpupower-info.c
···

 int cmd_info(int argc, char **argv)
 {
-	extern char *optarg;
-	extern int optind, opterr, optopt;
 	unsigned int cpu;
 	struct utsname uts;

-2
tools/power/cpupower/utils/cpupower-set.c
···

 int cmd_set(int argc, char **argv)
 {
-	extern char *optarg;
-	extern int optind, opterr, optopt;
 	unsigned int cpu;
 	struct utsname uts;
