Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'v6.11-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux

Pull turbostat updates from Len Brown:

- Enable turbostat extensions to add both perf and PMT (Intel
Platform Monitoring Technology) counters via the cmdline

- Demonstrate PMT access with built-in support for Meteor Lake's
Die C6 counter

* tag 'v6.11-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power turbostat: version 2024.07.26
tools/power turbostat: Include umask=%x in perf counter's config
tools/power turbostat: Document PMT in turbostat.8
tools/power turbostat: Add MTL's PMT DC6 builtin counter
tools/power turbostat: Add early support for PMT counters
tools/power turbostat: Add selftests for added perf counters
tools/power turbostat: Add selftests for SMI, APERF and MPERF counters
tools/power turbostat: Move verbose counter messages to level 2
tools/power turbostat: Move debug prints from stdout to stderr
tools/power turbostat: Fix typo in turbostat.8
tools/power turbostat: Add perf added counter example to turbostat.8
tools/power turbostat: Fix formatting in turbostat.8
tools/power turbostat: Extend --add option with perf counters
tools/power turbostat: Group SMI counter with APERF and MPERF
tools/power turbostat: Add ZERO_ARRAY for zero initializing builtin array
tools/power turbostat: Replace enum rapl_source and cstate_source with counter_source
tools/power turbostat: Remove anonymous union from rapl_counter_info_t
tools/power/turbostat: Switch to new Intel CPU model defines

+2275 -496
+1
tools/power/x86/turbostat/Makefile
···
 	@echo "#define GENMASK_ULL(h, l) (((~0ULL) << (l)) & (~0ULL >> (sizeof(long long) * 8 - 1 - (h))))" >> $(SNAPSHOT)/bits.h
 
 	@echo '#define BUILD_BUG_ON(cond) do { enum { compile_time_check ## __COUNTER__ = 1/(!(cond)) }; } while (0)' > $(SNAPSHOT)/build_bug.h
+	@echo '#define __must_be_array(arr) 0' >> $(SNAPSHOT)/build_bug.h
 
 	@echo PWD=. > $(SNAPSHOT)/Makefile
 	@echo "CFLAGS += -DMSRHEADER='\"msr-index.h\"'" >> $(SNAPSHOT)/Makefile
+92 -6
tools/power/x86/turbostat/turbostat.8
··· 28 28 .PP 29 29 \fB--add attributes\fP add column with counter having specified 'attributes'. The 'location' attribute is required, all others are optional. 30 30 .nf 31 - location: {\fBmsrDDD\fP | \fBmsr0xXXX\fP | \fB/sys/path...\fP} 31 + location: {\fBmsrDDD\fP | \fBmsr0xXXX\fP | \fB/sys/path...\fP | \fBperf/<device>/<event>\fP} 32 32 msrDDD is a decimal offset, eg. msr16 33 33 msr0xXXX is a hex offset, eg. msr0x10 34 34 /sys/path... is an absolute path to a sysfs attribute 35 + <device> is a perf device from /sys/bus/event_source/devices/<device> eg. cstate_core 36 + <event> is a perf event for given device from /sys/bus/event_source/devices/<device>/events/<event> eg. c1-residency 37 + perf/cstate_core/c1-residency would then use /sys/bus/event_source/devices/cstate_core/events/c1-residency 35 38 36 39 scope: {\fBcpu\fP | \fBcore\fP | \fBpackage\fP} 37 40 sample and print the counter for every cpu, core, or package. ··· 55 52 as the column header. 56 53 .fi 57 54 .PP 55 + \fB--add pmt,[attr_name=attr_value, ...]\fP add column with a PMT (Intel Platform Monitoring Technology) counter in a similar way to --add option above, but require PMT metadata to be supplied to correctly read and display the counter. The metadata can be found in the Intel PMT XML files, hosted at https://github.com/intel/Intel-PMT. For a complete example see "ADD PMT COUNTER EXAMPLE". 56 + .nf 57 + name="name_string" 58 + For column header. 59 + 60 + type={\fBraw\fP} 61 + 'raw' shows the counter contents in hex. 62 + default: raw 63 + 64 + format={\fBraw\fP | \fBdelta\fP} 65 + 'raw' shows the counter contents in hex. 66 + 'delta' shows the difference in values during the measurement interval. 67 + default: raw 68 + 69 + domain={\fBcpu%u\fP | \fBcore%u\fP | \fBpackage%u\fP} 70 + 'cpu' per cpu/thread counter. 71 + 'core' per core counter. 72 + 'package' per package counter. 73 + '%u' denotes id of the domain that the counter is associated with. 
For example core4 would mean that the counter is associated with core number 4. 74 + 75 + offset=\fB%u\fP 76 + '%u' offset within the PMT MMIO region. 77 + 78 + lsb=\fB%u\fP 79 + '%u' least significant bit within the 64 bit value read from 'offset'. Together with 'msb', used to form a read mask. 80 + 81 + msb=\fB%u\fP 82 + '%u' most significant bit within the 64 bit value read from 'offset'. Together with 'lsb', used to form a read mask. 83 + 84 + guid=\fB%x\fP 85 + '%x' hex identifier of the PMT MMIO region. 86 + .fi 87 + .PP 58 88 \fB--cpu cpu-set\fP limit output to system summary plus the specified cpu-set. If cpu-set is the string "core", then the system summary plus the first CPU in each core are printed -- eg. subsequent HT siblings are not printed. Or if cpu-set is the string "package", then the system summary plus the first CPU in each package is printed. Otherwise, the system summary plus the specified set of CPUs are printed. The cpu-set is ordered from low to high, comma delimited with ".." and "-" permitted to denote a range. eg. 1,2,8,14..17,21-44 59 89 .PP 60 90 \fB--hide column\fP do not show the specified built-in columns. May be invoked multiple times, or with a comma-separated list of column names. ··· 103 67 .PP 104 68 \fB--quiet\fP Do not decode and print the system configuration header information. 105 69 .PP 106 - +\fB--no-msr\fP Disable all the uses of the MSR driver. 107 - +.PP 108 - +\fB--no-perf\fP Disable all the uses of the perf API. 109 - +.PP 70 + \fB--no-msr\fP Disable all the uses of the MSR driver. 71 + .PP 72 + \fB--no-perf\fP Disable all the uses of the perf API. 73 + .PP 110 74 \fB--interval seconds\fP overrides the default 5.0 second measurement interval. 111 75 .PP 112 76 \fB--num_iterations num\fP number of the measurement iterations. ··· 356 320 Here we limit turbostat to showing just the CPU number for cpu0 - cpu3. 
357 321 We add a counter showing the 32-bit raw value of MSR 0x199 (MSR_IA32_PERF_CTL), 358 322 labeling it with the column header, "PRF_CTRL", and display it only once, 359 - afte the conclusion of a 0.1 second sleep. 323 + after the conclusion of a 0.1 second sleep. 360 324 .nf 361 325 sudo ./turbostat --quiet --cpu 0-3 --show CPU --add msr0x199,u32,raw,PRF_CTRL sleep .1 362 326 0.101604 sec ··· 367 331 2 0x00000a00 368 332 3 0x00000800 369 333 334 + .fi 335 + 336 + .SH ADD PERF COUNTER EXAMPLE 337 + Here we limit turbostat to showing just the CPU number for cpu0 - cpu3. 338 + We add a counter showing time spent in C1 core cstate, 339 + labeling it with the column header, "pCPU%c1", and display it only once, 340 + after the conclusion of 0.1 second sleep. 341 + We also show CPU%c1 built-in counter that should show similar values. 342 + .nf 343 + sudo ./turbostat --quiet --cpu 0-3 --show CPU,CPU%c1 --add perf/cstate_core/c1-residency,cpu,delta,percent,pCPU%c1 sleep .1 344 + 0.102448 sec 345 + CPU pCPU%c1 CPU%c1 346 + - 34.89 34.89 347 + 0 45.99 45.99 348 + 1 45.94 45.94 349 + 2 23.83 23.83 350 + 3 23.84 23.84 351 + 352 + .fi 353 + 354 + .SH ADD PMT COUNTER EXAMPLE 355 + Here we limit turbostat to showing just the CPU number 0. 356 + We add two counters, showing crystal clock count and the DC6 residency. 357 + All the parameters passed are based on the metadata found in the PMT XML files. 
358 + 359 + For the crystal clock count, we 360 + label it with the column header, "XTAL", 361 + we set the type to 'raw', to read the number of clock ticks in hex, 362 + we set the format to 'delta', to display the difference in ticks during the measurement interval, 363 + we set the domain to 'package0', to collect it and associate it with the whole package number 0, 364 + we set the offset to '0', which is a offset of the counter within the PMT MMIO region, 365 + we set the lsb and msb to cover all 64 bits of the read 64 bit value, 366 + and finally we set the guid to '0x1a067102', that identifies the PMT MMIO region to which the 'offset' is applied to read the counter value. 367 + 368 + For the DC6 residency counter, we 369 + label it with the column header, "Die%c6", 370 + we set the type to 'txtal_time', to obtain the percent residency value 371 + we set the format to 'delta', to display the difference in ticks during the measurement interval, 372 + we set the domain to 'package0', to collect it and associate it with the whole package number 0, 373 + we set the offset to '0', which is a offset of the counter within the PMT MMIO region, 374 + we set the lsb and msb to cover all 64 bits of the read 64 bit value, 375 + and finally we set the guid to '0x1a067102', that identifies the PMT MMIO region to which the 'offset' is applied to read the counter value. 376 + 377 + .nf 378 + sudo ./turbostat --quiet --cpu 0 --show CPU --add pmt,name=XTAL,type=raw,format=delta,domain=package0,offset=0,lsb=0,msb=63,guid=0x1a067102 --add pmt,name=Die%c6,type=txtal_time,format=delta,domain=package0,offset=120,lsb=0,msb=63,guid=0x1a067102 379 + 0.104352 sec 380 + CPU XTAL Die%c6 381 + - 0x0000006d4d957ca7 0.00 382 + 0 0x0000006d4d957ca7 0.00 383 + 0.102448 sec 370 384 .fi 371 385 372 386 .SH INPUT
+1847 -490
tools/power/x86/turbostat/turbostat.c
··· 9 9 10 10 #define _GNU_SOURCE 11 11 #include MSRHEADER 12 + 13 + // copied from arch/x86/include/asm/cpu_device_id.h 14 + #define VFM_MODEL_BIT 0 15 + #define VFM_FAMILY_BIT 8 16 + #define VFM_VENDOR_BIT 16 17 + #define VFM_RSVD_BIT 24 18 + 19 + #define VFM_MODEL_MASK GENMASK(VFM_FAMILY_BIT - 1, VFM_MODEL_BIT) 20 + #define VFM_FAMILY_MASK GENMASK(VFM_VENDOR_BIT - 1, VFM_FAMILY_BIT) 21 + #define VFM_VENDOR_MASK GENMASK(VFM_RSVD_BIT - 1, VFM_VENDOR_BIT) 22 + 23 + #define VFM_MODEL(vfm) (((vfm) & VFM_MODEL_MASK) >> VFM_MODEL_BIT) 24 + #define VFM_FAMILY(vfm) (((vfm) & VFM_FAMILY_MASK) >> VFM_FAMILY_BIT) 25 + #define VFM_VENDOR(vfm) (((vfm) & VFM_VENDOR_MASK) >> VFM_VENDOR_BIT) 26 + 27 + #define VFM_MAKE(_vendor, _family, _model) ( \ 28 + ((_model) << VFM_MODEL_BIT) | \ 29 + ((_family) << VFM_FAMILY_BIT) | \ 30 + ((_vendor) << VFM_VENDOR_BIT) \ 31 + ) 32 + // end copied section 33 + 34 + #define X86_VENDOR_INTEL 0 35 + 12 36 #include INTEL_FAMILY_HEADER 13 37 #include BUILD_BUG_HEADER 14 38 #include <stdarg.h> ··· 44 20 #include <sys/stat.h> 45 21 #include <sys/select.h> 46 22 #include <sys/resource.h> 23 + #include <sys/mman.h> 47 24 #include <fcntl.h> 48 25 #include <signal.h> 49 26 #include <sys/time.h> ··· 80 55 */ 81 56 #define NAME_BYTES 20 82 57 #define PATH_BYTES 128 58 + #define PERF_NAME_BYTES 128 83 59 84 60 #define MAX_NOFILE 0x8000 61 + 62 + #define COUNTER_KIND_PERF_PREFIX "perf/" 63 + #define COUNTER_KIND_PERF_PREFIX_LEN strlen(COUNTER_KIND_PERF_PREFIX) 64 + #define PERF_DEV_NAME_BYTES 32 65 + #define PERF_EVT_NAME_BYTES 32 85 66 86 67 enum counter_scope { SCOPE_CPU, SCOPE_CORE, SCOPE_PACKAGE }; 87 68 enum counter_type { COUNTER_ITEMS, COUNTER_CYCLES, COUNTER_SECONDS, COUNTER_USEC, COUNTER_K2M }; 88 69 enum counter_format { FORMAT_RAW, FORMAT_DELTA, FORMAT_PERCENT, FORMAT_AVERAGE }; 89 - enum amperf_source { AMPERF_SOURCE_PERF, AMPERF_SOURCE_MSR }; 90 - enum rapl_source { RAPL_SOURCE_NONE, RAPL_SOURCE_PERF, RAPL_SOURCE_MSR }; 91 - enum cstate_source 
{ CSTATE_SOURCE_NONE, CSTATE_SOURCE_PERF, CSTATE_SOURCE_MSR }; 70 + enum counter_source { COUNTER_SOURCE_NONE, COUNTER_SOURCE_PERF, COUNTER_SOURCE_MSR }; 71 + 72 + struct perf_counter_info { 73 + struct perf_counter_info *next; 74 + 75 + /* How to open the counter / What counter it is. */ 76 + char device[PERF_DEV_NAME_BYTES]; 77 + char event[PERF_EVT_NAME_BYTES]; 78 + 79 + /* How to show/format the counter. */ 80 + char name[PERF_NAME_BYTES]; 81 + unsigned int width; 82 + enum counter_scope scope; 83 + enum counter_type type; 84 + enum counter_format format; 85 + double scale; 86 + 87 + /* For reading the counter. */ 88 + int *fd_perf_per_domain; 89 + size_t num_domains; 90 + }; 92 91 93 92 struct sysfs_path { 94 93 char path[PATH_BYTES]; ··· 193 144 { 0x0, "SAM%mc6", NULL, 0, 0, 0, NULL, 0 }, 194 145 { 0x0, "SAMMHz", NULL, 0, 0, 0, NULL, 0 }, 195 146 { 0x0, "SAMAMHz", NULL, 0, 0, 0, NULL, 0 }, 147 + { 0x0, "Die%c6", NULL, 0, 0, 0, NULL, 0 }, 196 148 }; 197 149 198 150 #define MAX_BIC (sizeof(bic) / sizeof(struct msr_counter)) ··· 255 205 #define BIC_SAM_mc6 (1ULL << 55) 256 206 #define BIC_SAMMHz (1ULL << 56) 257 207 #define BIC_SAMACTMHz (1ULL << 57) 208 + #define BIC_Diec6 (1ULL << 58) 258 209 259 210 #define BIC_TOPOLOGY (BIC_Package | BIC_Node | BIC_CoreCnt | BIC_PkgCnt | BIC_Core | BIC_CPU | BIC_Die ) 260 211 #define BIC_THERMAL_PWR ( BIC_CoreTmp | BIC_PkgTmp | BIC_PkgWatt | BIC_CorWatt | BIC_GFXWatt | BIC_RAMWatt | BIC_PKG__ | BIC_RAM__) 261 212 #define BIC_FREQUENCY (BIC_Avg_MHz | BIC_Busy | BIC_Bzy_MHz | BIC_TSC_MHz | BIC_GFXMHz | BIC_GFXACTMHz | BIC_SAMMHz | BIC_SAMACTMHz | BIC_UNCORE_MHZ) 262 - #define BIC_IDLE (BIC_sysfs | BIC_CPU_c1 | BIC_CPU_c3 | BIC_CPU_c6 | BIC_CPU_c7 | BIC_GFX_rc6 | BIC_Pkgpc2 | BIC_Pkgpc3 | BIC_Pkgpc6 | BIC_Pkgpc7 | BIC_Pkgpc8 | BIC_Pkgpc9 | BIC_Pkgpc10 | BIC_CPU_LPI | BIC_SYS_LPI | BIC_Mod_c6 | BIC_Totl_c0 | BIC_Any_c0 | BIC_GFX_c0 | BIC_CPUGFX | BIC_SAM_mc6) 213 + #define BIC_IDLE (BIC_sysfs | BIC_CPU_c1 | BIC_CPU_c3 | 
BIC_CPU_c6 | BIC_CPU_c7 | BIC_GFX_rc6 | BIC_Pkgpc2 | BIC_Pkgpc3 | BIC_Pkgpc6 | BIC_Pkgpc7 | BIC_Pkgpc8 | BIC_Pkgpc9 | BIC_Pkgpc10 | BIC_CPU_LPI | BIC_SYS_LPI | BIC_Mod_c6 | BIC_Totl_c0 | BIC_Any_c0 | BIC_GFX_c0 | BIC_CPUGFX | BIC_SAM_mc6 | BIC_Diec6) 263 214 #define BIC_OTHER ( BIC_IRQ | BIC_SMI | BIC_ThreadC | BIC_CoreTmp | BIC_IPC) 264 215 265 216 #define BIC_DISABLED_BY_DEFAULT (BIC_USEC | BIC_TOD | BIC_APIC | BIC_X2APIC) ··· 303 252 FILE *outf; 304 253 int *fd_percpu; 305 254 int *fd_instr_count_percpu; 306 - struct amperf_group_fd *fd_amperf_percpu; /* File descriptors for perf group with APERF and MPERF counters. */ 307 255 struct timeval interval_tv = { 5, 0 }; 308 256 struct timespec interval_ts = { 5, 0 }; 309 257 ··· 317 267 unsigned int list_header_only; 318 268 unsigned int dump_only; 319 269 unsigned int has_aperf; 270 + unsigned int has_aperf_access; 320 271 unsigned int has_epb; 321 272 unsigned int has_turbo; 322 273 unsigned int is_hybrid; ··· 358 307 int ignore_stdin; 359 308 bool no_msr; 360 309 bool no_perf; 361 - enum amperf_source amperf_source; 362 310 363 311 enum gfx_sysfs_idx { 364 312 GFX_rc6, ··· 417 367 }; 418 368 419 369 struct platform_data { 420 - unsigned int model; 370 + unsigned int vfm; 421 371 const struct platform_features *features; 422 372 }; 423 373 ··· 960 910 }; 961 911 962 912 static const struct platform_data turbostat_pdata[] = { 963 - { INTEL_FAM6_NEHALEM, &nhm_features }, 964 - { INTEL_FAM6_NEHALEM_G, &nhm_features }, 965 - { INTEL_FAM6_NEHALEM_EP, &nhm_features }, 966 - { INTEL_FAM6_NEHALEM_EX, &nhx_features }, 967 - { INTEL_FAM6_WESTMERE, &nhm_features }, 968 - { INTEL_FAM6_WESTMERE_EP, &nhm_features }, 969 - { INTEL_FAM6_WESTMERE_EX, &nhx_features }, 970 - { INTEL_FAM6_SANDYBRIDGE, &snb_features }, 971 - { INTEL_FAM6_SANDYBRIDGE_X, &snx_features }, 972 - { INTEL_FAM6_IVYBRIDGE, &ivb_features }, 973 - { INTEL_FAM6_IVYBRIDGE_X, &ivx_features }, 974 - { INTEL_FAM6_HASWELL, &hsw_features }, 975 - { 
INTEL_FAM6_HASWELL_X, &hsx_features }, 976 - { INTEL_FAM6_HASWELL_L, &hswl_features }, 977 - { INTEL_FAM6_HASWELL_G, &hswg_features }, 978 - { INTEL_FAM6_BROADWELL, &bdw_features }, 979 - { INTEL_FAM6_BROADWELL_G, &bdwg_features }, 980 - { INTEL_FAM6_BROADWELL_X, &bdx_features }, 981 - { INTEL_FAM6_BROADWELL_D, &bdx_features }, 982 - { INTEL_FAM6_SKYLAKE_L, &skl_features }, 983 - { INTEL_FAM6_SKYLAKE, &skl_features }, 984 - { INTEL_FAM6_SKYLAKE_X, &skx_features }, 985 - { INTEL_FAM6_KABYLAKE_L, &skl_features }, 986 - { INTEL_FAM6_KABYLAKE, &skl_features }, 987 - { INTEL_FAM6_COMETLAKE, &skl_features }, 988 - { INTEL_FAM6_COMETLAKE_L, &skl_features }, 989 - { INTEL_FAM6_CANNONLAKE_L, &cnl_features }, 990 - { INTEL_FAM6_ICELAKE_X, &icx_features }, 991 - { INTEL_FAM6_ICELAKE_D, &icx_features }, 992 - { INTEL_FAM6_ICELAKE_L, &cnl_features }, 993 - { INTEL_FAM6_ICELAKE_NNPI, &cnl_features }, 994 - { INTEL_FAM6_ROCKETLAKE, &cnl_features }, 995 - { INTEL_FAM6_TIGERLAKE_L, &cnl_features }, 996 - { INTEL_FAM6_TIGERLAKE, &cnl_features }, 997 - { INTEL_FAM6_SAPPHIRERAPIDS_X, &spr_features }, 998 - { INTEL_FAM6_EMERALDRAPIDS_X, &spr_features }, 999 - { INTEL_FAM6_GRANITERAPIDS_X, &spr_features }, 1000 - { INTEL_FAM6_LAKEFIELD, &cnl_features }, 1001 - { INTEL_FAM6_ALDERLAKE, &adl_features }, 1002 - { INTEL_FAM6_ALDERLAKE_L, &adl_features }, 1003 - { INTEL_FAM6_RAPTORLAKE, &adl_features }, 1004 - { INTEL_FAM6_RAPTORLAKE_P, &adl_features }, 1005 - { INTEL_FAM6_RAPTORLAKE_S, &adl_features }, 1006 - { INTEL_FAM6_METEORLAKE, &cnl_features }, 1007 - { INTEL_FAM6_METEORLAKE_L, &cnl_features }, 1008 - { INTEL_FAM6_ARROWLAKE_H, &arl_features }, 1009 - { INTEL_FAM6_ARROWLAKE_U, &arl_features }, 1010 - { INTEL_FAM6_ARROWLAKE, &arl_features }, 1011 - { INTEL_FAM6_LUNARLAKE_M, &arl_features }, 1012 - { INTEL_FAM6_ATOM_SILVERMONT, &slv_features }, 1013 - { INTEL_FAM6_ATOM_SILVERMONT_D, &slvd_features }, 1014 - { INTEL_FAM6_ATOM_AIRMONT, &amt_features }, 1015 - { INTEL_FAM6_ATOM_GOLDMONT, 
&gmt_features }, 1016 - { INTEL_FAM6_ATOM_GOLDMONT_D, &gmtd_features }, 1017 - { INTEL_FAM6_ATOM_GOLDMONT_PLUS, &gmtp_features }, 1018 - { INTEL_FAM6_ATOM_TREMONT_D, &tmtd_features }, 1019 - { INTEL_FAM6_ATOM_TREMONT, &tmt_features }, 1020 - { INTEL_FAM6_ATOM_TREMONT_L, &tmt_features }, 1021 - { INTEL_FAM6_ATOM_GRACEMONT, &adl_features }, 1022 - { INTEL_FAM6_ATOM_CRESTMONT_X, &srf_features }, 1023 - { INTEL_FAM6_ATOM_CRESTMONT, &grr_features }, 1024 - { INTEL_FAM6_XEON_PHI_KNL, &knl_features }, 1025 - { INTEL_FAM6_XEON_PHI_KNM, &knl_features }, 913 + { INTEL_NEHALEM, &nhm_features }, 914 + { INTEL_NEHALEM_G, &nhm_features }, 915 + { INTEL_NEHALEM_EP, &nhm_features }, 916 + { INTEL_NEHALEM_EX, &nhx_features }, 917 + { INTEL_WESTMERE, &nhm_features }, 918 + { INTEL_WESTMERE_EP, &nhm_features }, 919 + { INTEL_WESTMERE_EX, &nhx_features }, 920 + { INTEL_SANDYBRIDGE, &snb_features }, 921 + { INTEL_SANDYBRIDGE_X, &snx_features }, 922 + { INTEL_IVYBRIDGE, &ivb_features }, 923 + { INTEL_IVYBRIDGE_X, &ivx_features }, 924 + { INTEL_HASWELL, &hsw_features }, 925 + { INTEL_HASWELL_X, &hsx_features }, 926 + { INTEL_HASWELL_L, &hswl_features }, 927 + { INTEL_HASWELL_G, &hswg_features }, 928 + { INTEL_BROADWELL, &bdw_features }, 929 + { INTEL_BROADWELL_G, &bdwg_features }, 930 + { INTEL_BROADWELL_X, &bdx_features }, 931 + { INTEL_BROADWELL_D, &bdx_features }, 932 + { INTEL_SKYLAKE_L, &skl_features }, 933 + { INTEL_SKYLAKE, &skl_features }, 934 + { INTEL_SKYLAKE_X, &skx_features }, 935 + { INTEL_KABYLAKE_L, &skl_features }, 936 + { INTEL_KABYLAKE, &skl_features }, 937 + { INTEL_COMETLAKE, &skl_features }, 938 + { INTEL_COMETLAKE_L, &skl_features }, 939 + { INTEL_CANNONLAKE_L, &cnl_features }, 940 + { INTEL_ICELAKE_X, &icx_features }, 941 + { INTEL_ICELAKE_D, &icx_features }, 942 + { INTEL_ICELAKE_L, &cnl_features }, 943 + { INTEL_ICELAKE_NNPI, &cnl_features }, 944 + { INTEL_ROCKETLAKE, &cnl_features }, 945 + { INTEL_TIGERLAKE_L, &cnl_features }, 946 + { INTEL_TIGERLAKE, 
&cnl_features }, 947 + { INTEL_SAPPHIRERAPIDS_X, &spr_features }, 948 + { INTEL_EMERALDRAPIDS_X, &spr_features }, 949 + { INTEL_GRANITERAPIDS_X, &spr_features }, 950 + { INTEL_LAKEFIELD, &cnl_features }, 951 + { INTEL_ALDERLAKE, &adl_features }, 952 + { INTEL_ALDERLAKE_L, &adl_features }, 953 + { INTEL_RAPTORLAKE, &adl_features }, 954 + { INTEL_RAPTORLAKE_P, &adl_features }, 955 + { INTEL_RAPTORLAKE_S, &adl_features }, 956 + { INTEL_METEORLAKE, &cnl_features }, 957 + { INTEL_METEORLAKE_L, &cnl_features }, 958 + { INTEL_ARROWLAKE_H, &arl_features }, 959 + { INTEL_ARROWLAKE_U, &arl_features }, 960 + { INTEL_ARROWLAKE, &arl_features }, 961 + { INTEL_LUNARLAKE_M, &arl_features }, 962 + { INTEL_ATOM_SILVERMONT, &slv_features }, 963 + { INTEL_ATOM_SILVERMONT_D, &slvd_features }, 964 + { INTEL_ATOM_AIRMONT, &amt_features }, 965 + { INTEL_ATOM_GOLDMONT, &gmt_features }, 966 + { INTEL_ATOM_GOLDMONT_D, &gmtd_features }, 967 + { INTEL_ATOM_GOLDMONT_PLUS, &gmtp_features }, 968 + { INTEL_ATOM_TREMONT_D, &tmtd_features }, 969 + { INTEL_ATOM_TREMONT, &tmt_features }, 970 + { INTEL_ATOM_TREMONT_L, &tmt_features }, 971 + { INTEL_ATOM_GRACEMONT, &adl_features }, 972 + { INTEL_ATOM_CRESTMONT_X, &srf_features }, 973 + { INTEL_ATOM_CRESTMONT, &grr_features }, 974 + { INTEL_XEON_PHI_KNL, &knl_features }, 975 + { INTEL_XEON_PHI_KNM, &knl_features }, 1026 976 /* 1027 977 * Missing support for 1028 - * INTEL_FAM6_ICELAKE 1029 - * INTEL_FAM6_ATOM_SILVERMONT_MID 1030 - * INTEL_FAM6_ATOM_AIRMONT_MID 1031 - * INTEL_FAM6_ATOM_AIRMONT_NP 978 + * INTEL_ICELAKE 979 + * INTEL_ATOM_SILVERMONT_MID 980 + * INTEL_ATOM_AIRMONT_MID 981 + * INTEL_ATOM_AIRMONT_NP 1032 982 */ 1033 983 { 0, NULL }, 1034 984 }; ··· 1053 1003 return; 1054 1004 } 1055 1005 1056 - if (!genuine_intel || family != 6) 1006 + if (!genuine_intel) 1057 1007 return; 1058 1008 1059 1009 for (i = 0; turbostat_pdata[i].features; i++) { 1060 - if (turbostat_pdata[i].model == model) { 1010 + if (VFM_FAMILY(turbostat_pdata[i].vfm) == family 
&& VFM_MODEL(turbostat_pdata[i].vfm) == model) { 1061 1011 platform = turbostat_pdata[i].features; 1062 1012 return; 1063 1013 } ··· 1084 1034 #define MAX_ADDED_THREAD_COUNTERS 24 1085 1035 #define MAX_ADDED_CORE_COUNTERS 8 1086 1036 #define MAX_ADDED_PACKAGE_COUNTERS 16 1037 + #define PMT_MAX_ADDED_THREAD_COUNTERS 24 1038 + #define PMT_MAX_ADDED_CORE_COUNTERS 8 1039 + #define PMT_MAX_ADDED_PACKAGE_COUNTERS 16 1087 1040 #define BITMASK_SIZE 32 1041 + 1042 + #define ZERO_ARRAY(arr) (memset(arr, 0, sizeof(arr)) + __must_be_array(arr)) 1088 1043 1089 1044 /* Indexes used to map data read from perf and MSRs into global variables */ 1090 1045 enum rapl_rci_index { ··· 1111 1056 1112 1057 struct rapl_counter_info_t { 1113 1058 unsigned long long data[NUM_RAPL_COUNTERS]; 1114 - enum rapl_source source[NUM_RAPL_COUNTERS]; 1059 + enum counter_source source[NUM_RAPL_COUNTERS]; 1115 1060 unsigned long long flags[NUM_RAPL_COUNTERS]; 1116 1061 double scale[NUM_RAPL_COUNTERS]; 1117 1062 enum rapl_unit unit[NUM_RAPL_COUNTERS]; 1118 - 1119 - union { 1120 - /* Active when source == RAPL_SOURCE_MSR */ 1121 - struct { 1122 - unsigned long long msr[NUM_RAPL_COUNTERS]; 1123 - unsigned long long msr_mask[NUM_RAPL_COUNTERS]; 1124 - int msr_shift[NUM_RAPL_COUNTERS]; 1125 - }; 1126 - }; 1063 + unsigned long long msr[NUM_RAPL_COUNTERS]; 1064 + unsigned long long msr_mask[NUM_RAPL_COUNTERS]; 1065 + int msr_shift[NUM_RAPL_COUNTERS]; 1127 1066 1128 1067 int fd_perf; 1129 1068 }; ··· 1273 1224 1274 1225 struct cstate_counter_info_t { 1275 1226 unsigned long long data[NUM_CSTATE_COUNTERS]; 1276 - enum cstate_source source[NUM_CSTATE_COUNTERS]; 1227 + enum counter_source source[NUM_CSTATE_COUNTERS]; 1277 1228 unsigned long long msr[NUM_CSTATE_COUNTERS]; 1278 1229 int fd_perf_core; 1279 1230 int fd_perf_pkg; ··· 1410 1361 }, 1411 1362 }; 1412 1363 1364 + /* Indexes used to map data read from perf and MSRs into global variables */ 1365 + enum msr_rci_index { 1366 + MSR_RCI_INDEX_APERF = 0, 1367 + 
MSR_RCI_INDEX_MPERF = 1, 1368 + MSR_RCI_INDEX_SMI = 2, 1369 + NUM_MSR_COUNTERS, 1370 + }; 1371 + 1372 + struct msr_counter_info_t { 1373 + unsigned long long data[NUM_MSR_COUNTERS]; 1374 + enum counter_source source[NUM_MSR_COUNTERS]; 1375 + unsigned long long msr[NUM_MSR_COUNTERS]; 1376 + unsigned long long msr_mask[NUM_MSR_COUNTERS]; 1377 + int fd_perf; 1378 + }; 1379 + 1380 + struct msr_counter_info_t *msr_counter_info; 1381 + unsigned int msr_counter_info_size; 1382 + 1383 + struct msr_counter_arch_info { 1384 + const char *perf_subsys; 1385 + const char *perf_name; 1386 + unsigned long long msr; 1387 + unsigned long long msr_mask; 1388 + unsigned int rci_index; /* Maps data from perf counters to global variables */ 1389 + bool needed; 1390 + bool present; 1391 + }; 1392 + 1393 + enum msr_arch_info_index { 1394 + MSR_ARCH_INFO_APERF_INDEX = 0, 1395 + MSR_ARCH_INFO_MPERF_INDEX = 1, 1396 + MSR_ARCH_INFO_SMI_INDEX = 2, 1397 + }; 1398 + 1399 + static struct msr_counter_arch_info msr_counter_arch_infos[] = { 1400 + [MSR_ARCH_INFO_APERF_INDEX] = { 1401 + .perf_subsys = "msr", 1402 + .perf_name = "aperf", 1403 + .msr = MSR_IA32_APERF, 1404 + .msr_mask = 0xFFFFFFFFFFFFFFFF, 1405 + .rci_index = MSR_RCI_INDEX_APERF, 1406 + }, 1407 + 1408 + [MSR_ARCH_INFO_MPERF_INDEX] = { 1409 + .perf_subsys = "msr", 1410 + .perf_name = "mperf", 1411 + .msr = MSR_IA32_MPERF, 1412 + .msr_mask = 0xFFFFFFFFFFFFFFFF, 1413 + .rci_index = MSR_RCI_INDEX_MPERF, 1414 + }, 1415 + 1416 + [MSR_ARCH_INFO_SMI_INDEX] = { 1417 + .perf_subsys = "msr", 1418 + .perf_name = "smi", 1419 + .msr = MSR_SMI_COUNT, 1420 + .msr_mask = 0xFFFFFFFF, 1421 + .rci_index = MSR_RCI_INDEX_SMI, 1422 + }, 1423 + }; 1424 + 1425 + /* Can be redefined when compiling, useful for testing. 
*/ 1426 + #ifndef SYSFS_TELEM_PATH 1427 + #define SYSFS_TELEM_PATH "/sys/class/intel_pmt" 1428 + #endif 1429 + 1430 + #define PMT_COUNTER_MTL_DC6_OFFSET 120 1431 + #define PMT_COUNTER_MTL_DC6_LSB 0 1432 + #define PMT_COUNTER_MTL_DC6_MSB 63 1433 + #define PMT_MTL_DC6_GUID 0x1a067102 1434 + 1435 + #define PMT_COUNTER_NAME_SIZE_BYTES 16 1436 + #define PMT_COUNTER_TYPE_NAME_SIZE_BYTES 32 1437 + 1438 + struct pmt_mmio { 1439 + struct pmt_mmio *next; 1440 + 1441 + unsigned int guid; 1442 + unsigned int size; 1443 + 1444 + /* Base pointer to the mmaped memory. */ 1445 + void *mmio_base; 1446 + 1447 + /* 1448 + * Offset to be applied to the mmio_base 1449 + * to get the beginning of the PMT counters for given GUID. 1450 + */ 1451 + unsigned long pmt_offset; 1452 + } *pmt_mmios; 1453 + 1454 + enum pmt_datatype { 1455 + PMT_TYPE_RAW, 1456 + PMT_TYPE_XTAL_TIME, 1457 + }; 1458 + 1459 + struct pmt_domain_info { 1460 + /* 1461 + * Pointer to the MMIO obtained by applying a counter offset 1462 + * to the mmio_base of the mmaped region for the given GUID. 1463 + * 1464 + * This is where to read the raw value of the counter from. 
1465 + */ 1466 + unsigned long *pcounter; 1467 + }; 1468 + 1469 + struct pmt_counter { 1470 + struct pmt_counter *next; 1471 + 1472 + /* PMT metadata */ 1473 + char name[PMT_COUNTER_NAME_SIZE_BYTES]; 1474 + enum pmt_datatype type; 1475 + enum counter_scope scope; 1476 + unsigned int lsb; 1477 + unsigned int msb; 1478 + 1479 + /* BIC-like metadata */ 1480 + enum counter_format format; 1481 + 1482 + unsigned int num_domains; 1483 + struct pmt_domain_info *domains; 1484 + }; 1485 + 1486 + unsigned int pmt_counter_get_width(const struct pmt_counter *p) 1487 + { 1488 + return (p->msb - p->lsb) + 1; 1489 + } 1490 + 1491 + void pmt_counter_resize_(struct pmt_counter *pcounter, unsigned int new_size) 1492 + { 1493 + struct pmt_domain_info *new_mem; 1494 + 1495 + new_mem = (struct pmt_domain_info *)reallocarray(pcounter->domains, new_size, sizeof(*pcounter->domains)); 1496 + if (!new_mem) { 1497 + fprintf(stderr, "%s: failed to allocate memory for PMT counters\n", __func__); 1498 + exit(1); 1499 + } 1500 + 1501 + /* Zero initialize just allocated memory. */ 1502 + const size_t num_new_domains = new_size - pcounter->num_domains; 1503 + 1504 + memset(&new_mem[pcounter->num_domains], 0, num_new_domains * sizeof(*pcounter->domains)); 1505 + 1506 + pcounter->num_domains = new_size; 1507 + pcounter->domains = new_mem; 1508 + } 1509 + 1510 + void pmt_counter_resize(struct pmt_counter *pcounter, unsigned int new_size) 1511 + { 1512 + /* 1513 + * Allocate more memory ahead of time. 1514 + * 1515 + * Always allocate space for at least 8 elements 1516 + * and double the size when growing. 
1517 + */ 1518 + if (new_size < 8) 1519 + new_size = 8; 1520 + new_size = MAX(new_size, pcounter->num_domains * 2); 1521 + 1522 + pmt_counter_resize_(pcounter, new_size); 1523 + } 1524 + 1413 1525 struct thread_data { 1414 1526 struct timeval tv_begin; 1415 1527 struct timeval tv_end; ··· 1588 1378 unsigned int flags; 1589 1379 bool is_atom; 1590 1380 unsigned long long counter[MAX_ADDED_THREAD_COUNTERS]; 1381 + unsigned long long perf_counter[MAX_ADDED_THREAD_COUNTERS]; 1382 + unsigned long long pmt_counter[PMT_MAX_ADDED_THREAD_COUNTERS]; 1591 1383 } *thread_even, *thread_odd; 1592 1384 1593 1385 struct core_data { ··· 1603 1391 unsigned int core_id; 1604 1392 unsigned long long core_throt_cnt; 1605 1393 unsigned long long counter[MAX_ADDED_CORE_COUNTERS]; 1394 + unsigned long long perf_counter[MAX_ADDED_CORE_COUNTERS]; 1395 + unsigned long long pmt_counter[PMT_MAX_ADDED_CORE_COUNTERS]; 1606 1396 } *core_even, *core_odd; 1607 1397 1608 1398 struct pkg_data { ··· 1637 1423 struct rapl_counter rapl_dram_perf_status; /* MSR_DRAM_PERF_STATUS */ 1638 1424 unsigned int pkg_temp_c; 1639 1425 unsigned int uncore_mhz; 1426 + unsigned long long die_c6; 1640 1427 unsigned long long counter[MAX_ADDED_PACKAGE_COUNTERS]; 1428 + unsigned long long perf_counter[MAX_ADDED_PACKAGE_COUNTERS]; 1429 + unsigned long long pmt_counter[PMT_MAX_ADDED_PACKAGE_COUNTERS]; 1641 1430 } *package_even, *package_odd; 1642 1431 1643 1432 #define ODD_COUNTERS thread_odd, core_odd, package_odd ··· 1775 1558 } 1776 1559 1777 1560 struct sys_counters { 1561 + /* MSR added counters */ 1778 1562 unsigned int added_thread_counters; 1779 1563 unsigned int added_core_counters; 1780 1564 unsigned int added_package_counters; 1781 1565 struct msr_counter *tp; 1782 1566 struct msr_counter *cp; 1783 1567 struct msr_counter *pp; 1568 + 1569 + /* perf added counters */ 1570 + unsigned int added_thread_perf_counters; 1571 + unsigned int added_core_perf_counters; 1572 + unsigned int added_package_perf_counters; 1573 
+ struct perf_counter_info *perf_tp; 1574 + struct perf_counter_info *perf_cp; 1575 + struct perf_counter_info *perf_pp; 1576 + 1577 + struct pmt_counter *pmt_tp; 1578 + struct pmt_counter *pmt_cp; 1579 + struct pmt_counter *pmt_pp; 1784 1580 } sys; 1785 1581 1786 1582 static size_t free_msr_counters_(struct msr_counter **pp) ··· 1977 1747 1978 1748 static void bic_disable_msr_access(void) 1979 1749 { 1980 - const unsigned long bic_msrs = BIC_SMI | BIC_Mod_c6 | BIC_CoreTmp | 1750 + const unsigned long bic_msrs = BIC_Mod_c6 | BIC_CoreTmp | 1981 1751 BIC_Totl_c0 | BIC_Any_c0 | BIC_GFX_c0 | BIC_CPUGFX | BIC_PkgTmp; 1982 1752 1983 1753 bic_enabled &= ~bic_msrs; ··· 2053 1823 return 0; 2054 1824 } 2055 1825 1826 + /* Convert CPU ID to domain ID for given added perf counter. */ 1827 + unsigned int cpu_to_domain(const struct perf_counter_info *pc, int cpu) 1828 + { 1829 + switch (pc->scope) { 1830 + case SCOPE_CPU: 1831 + return cpu; 1832 + 1833 + case SCOPE_CORE: 1834 + return cpus[cpu].physical_core_id; 1835 + 1836 + case SCOPE_PACKAGE: 1837 + return cpus[cpu].physical_package_id; 1838 + } 1839 + 1840 + __builtin_unreachable(); 1841 + } 1842 + 2056 1843 #define MAX_DEFERRED 16 2057 1844 char *deferred_add_names[MAX_DEFERRED]; 2058 1845 char *deferred_skip_names[MAX_DEFERRED]; ··· 2093 1846 "to print statistics, until interrupted.\n" 2094 1847 " -a, --add add a counter\n" 2095 1848 " eg. --add msr0x10,u64,cpu,delta,MY_TSC\n" 1849 + " eg. --add perf/cstate_pkg/c2-residency,package,delta,percent,perfPC2\n" 1850 + " eg. 
--add pmt,name=XTAL,type=raw,domain=package0,offset=0,lsb=0,msb=63,guid=0x1a067102\n" 2096 1851 " -c, --cpu cpu-set limit output to summary plus cpu-set:\n" 2097 1852 " {core | package | j,k,l..m,n-p }\n" 2098 1853 " -d, --debug displays usec, Time_Of_Day_Seconds and more debugging\n" 1854 + " debug messages are printed to stderr\n" 2099 1855 " -D, --Dump displays the raw counter values\n" 2100 1856 " -e, --enable [all | column]\n" 2101 1857 " shows all or the specified disabled column\n" ··· 2205 1955 void print_header(char *delim) 2206 1956 { 2207 1957 struct msr_counter *mp; 1958 + struct perf_counter_info *pp; 1959 + struct pmt_counter *ppmt; 2208 1960 int printed = 0; 2209 1961 2210 1962 if (DO_BIC(BIC_USEC)) ··· 2264 2012 } 2265 2013 } 2266 2014 2015 + for (pp = sys.perf_tp; pp; pp = pp->next) { 2016 + 2017 + if (pp->format == FORMAT_RAW) { 2018 + if (pp->width == 64) 2019 + outp += sprintf(outp, "%s%18.18s", (printed++ ? delim : ""), pp->name); 2020 + else 2021 + outp += sprintf(outp, "%s%10.10s", (printed++ ? delim : ""), pp->name); 2022 + } else { 2023 + if ((pp->type == COUNTER_ITEMS) && sums_need_wide_columns) 2024 + outp += sprintf(outp, "%s%8s", (printed++ ? delim : ""), pp->name); 2025 + else 2026 + outp += sprintf(outp, "%s%s", (printed++ ? delim : ""), pp->name); 2027 + } 2028 + } 2029 + 2030 + ppmt = sys.pmt_tp; 2031 + while (ppmt) { 2032 + switch (ppmt->type) { 2033 + case PMT_TYPE_RAW: 2034 + if (pmt_counter_get_width(ppmt) <= 32) 2035 + outp += sprintf(outp, "%s%10.10s", (printed++ ? delim : ""), ppmt->name); 2036 + else 2037 + outp += sprintf(outp, "%s%18.18s", (printed++ ? delim : ""), ppmt->name); 2038 + 2039 + break; 2040 + 2041 + case PMT_TYPE_XTAL_TIME: 2042 + outp += sprintf(outp, "%s%s", delim, ppmt->name); 2043 + break; 2044 + } 2045 + 2046 + ppmt = ppmt->next; 2047 + } 2048 + 2267 2049 if (DO_BIC(BIC_CPU_c1)) 2268 2050 outp += sprintf(outp, "%sCPU%%c1", (printed++ ? 
delim : "")); 2269 2051 if (DO_BIC(BIC_CPU_c3)) ··· 2336 2050 else 2337 2051 outp += sprintf(outp, "%s%s", delim, mp->name); 2338 2052 } 2053 + } 2054 + 2055 + for (pp = sys.perf_cp; pp; pp = pp->next) { 2056 + 2057 + if (pp->format == FORMAT_RAW) { 2058 + if (pp->width == 64) 2059 + outp += sprintf(outp, "%s%18.18s", (printed++ ? delim : ""), pp->name); 2060 + else 2061 + outp += sprintf(outp, "%s%10.10s", (printed++ ? delim : ""), pp->name); 2062 + } else { 2063 + if ((pp->type == COUNTER_ITEMS) && sums_need_wide_columns) 2064 + outp += sprintf(outp, "%s%8s", (printed++ ? delim : ""), pp->name); 2065 + else 2066 + outp += sprintf(outp, "%s%s", (printed++ ? delim : ""), pp->name); 2067 + } 2068 + } 2069 + 2070 + ppmt = sys.pmt_cp; 2071 + while (ppmt) { 2072 + switch (ppmt->type) { 2073 + case PMT_TYPE_RAW: 2074 + if (pmt_counter_get_width(ppmt) <= 32) 2075 + outp += sprintf(outp, "%s%10.10s", (printed++ ? delim : ""), ppmt->name); 2076 + else 2077 + outp += sprintf(outp, "%s%18.18s", (printed++ ? delim : ""), ppmt->name); 2078 + 2079 + break; 2080 + 2081 + case PMT_TYPE_XTAL_TIME: 2082 + outp += sprintf(outp, "%s%s", delim, ppmt->name); 2083 + break; 2084 + } 2085 + 2086 + ppmt = ppmt->next; 2339 2087 } 2340 2088 2341 2089 if (DO_BIC(BIC_PkgTmp)) ··· 2416 2096 outp += sprintf(outp, "%sPkg%%pc9", (printed++ ? delim : "")); 2417 2097 if (DO_BIC(BIC_Pkgpc10)) 2418 2098 outp += sprintf(outp, "%sPk%%pc10", (printed++ ? delim : "")); 2099 + if (DO_BIC(BIC_Diec6)) 2100 + outp += sprintf(outp, "%sDie%%c6", (printed++ ? delim : "")); 2419 2101 if (DO_BIC(BIC_CPU_LPI)) 2420 2102 outp += sprintf(outp, "%sCPU%%LPI", (printed++ ? delim : "")); 2421 2103 if (DO_BIC(BIC_SYS_LPI)) ··· 2467 2145 else 2468 2146 outp += sprintf(outp, "%s%7.7s", delim, mp->name); 2469 2147 } 2148 + } 2149 + 2150 + for (pp = sys.perf_pp; pp; pp = pp->next) { 2151 + 2152 + if (pp->format == FORMAT_RAW) { 2153 + if (pp->width == 64) 2154 + outp += sprintf(outp, "%s%18.18s", (printed++ ? 
delim : ""), pp->name); 2155 + else 2156 + outp += sprintf(outp, "%s%10.10s", (printed++ ? delim : ""), pp->name); 2157 + } else { 2158 + if ((pp->type == COUNTER_ITEMS) && sums_need_wide_columns) 2159 + outp += sprintf(outp, "%s%8s", (printed++ ? delim : ""), pp->name); 2160 + else 2161 + outp += sprintf(outp, "%s%s", (printed++ ? delim : ""), pp->name); 2162 + } 2163 + } 2164 + 2165 + ppmt = sys.pmt_pp; 2166 + while (ppmt) { 2167 + switch (ppmt->type) { 2168 + case PMT_TYPE_RAW: 2169 + if (pmt_counter_get_width(ppmt) <= 32) 2170 + outp += sprintf(outp, "%s%10.10s", (printed++ ? delim : ""), ppmt->name); 2171 + else 2172 + outp += sprintf(outp, "%s%18.18s", (printed++ ? delim : ""), ppmt->name); 2173 + 2174 + break; 2175 + 2176 + case PMT_TYPE_XTAL_TIME: 2177 + outp += sprintf(outp, "%s%s", delim, ppmt->name); 2178 + break; 2179 + } 2180 + 2181 + ppmt = ppmt->next; 2470 2182 } 2471 2183 2472 2184 outp += sprintf(outp, "\n"); ··· 2623 2267 char *fmt8; 2624 2268 int i; 2625 2269 struct msr_counter *mp; 2270 + struct perf_counter_info *pp; 2271 + struct pmt_counter *ppmt; 2626 2272 char *delim = "\t"; 2627 2273 int printed = 0; 2628 2274 ··· 2762 2404 } 2763 2405 } 2764 2406 2407 + /* Added perf counters */ 2408 + for (i = 0, pp = sys.perf_tp; pp; ++i, pp = pp->next) { 2409 + if (pp->format == FORMAT_RAW) { 2410 + if (pp->width == 32) 2411 + outp += 2412 + sprintf(outp, "%s0x%08x", (printed++ ? delim : ""), 2413 + (unsigned int)t->perf_counter[i]); 2414 + else 2415 + outp += sprintf(outp, "%s0x%016llx", (printed++ ? delim : ""), t->perf_counter[i]); 2416 + } else if (pp->format == FORMAT_DELTA) { 2417 + if ((pp->type == COUNTER_ITEMS) && sums_need_wide_columns) 2418 + outp += sprintf(outp, "%s%8lld", (printed++ ? delim : ""), t->perf_counter[i]); 2419 + else 2420 + outp += sprintf(outp, "%s%lld", (printed++ ? 
delim : ""), t->perf_counter[i]); 2421 + } else if (pp->format == FORMAT_PERCENT) { 2422 + if (pp->type == COUNTER_USEC) 2423 + outp += 2424 + sprintf(outp, "%s%.2f", (printed++ ? delim : ""), 2425 + t->perf_counter[i] / interval_float / 10000); 2426 + else 2427 + outp += 2428 + sprintf(outp, "%s%.2f", (printed++ ? delim : ""), 100.0 * t->perf_counter[i] / tsc); 2429 + } 2430 + } 2431 + 2432 + for (i = 0, ppmt = sys.pmt_tp; ppmt; i++, ppmt = ppmt->next) { 2433 + switch (ppmt->type) { 2434 + case PMT_TYPE_RAW: 2435 + if (pmt_counter_get_width(ppmt) <= 32) 2436 + outp += sprintf(outp, "%s0x%08x", (printed++ ? delim : ""), 2437 + (unsigned int)t->pmt_counter[i]); 2438 + else 2439 + outp += sprintf(outp, "%s0x%016llx", (printed++ ? delim : ""), t->pmt_counter[i]); 2440 + 2441 + break; 2442 + 2443 + case PMT_TYPE_XTAL_TIME: 2444 + const unsigned long value_raw = t->pmt_counter[i]; 2445 + const double value_converted = 100.0 * value_raw / crystal_hz / interval_float; 2446 + 2447 + outp += sprintf(outp, "%s%.2f", (printed++ ? delim : ""), value_converted); 2448 + break; 2449 + } 2450 + } 2451 + 2765 2452 /* C1 */ 2766 2453 if (DO_BIC(BIC_CPU_c1)) 2767 2454 outp += sprintf(outp, "%s%.2f", (printed++ ? delim : ""), 100.0 * t->c1 / tsc); ··· 2847 2444 outp += sprintf(outp, "%s%lld", (printed++ ? delim : ""), c->counter[i]); 2848 2445 } else if (mp->format == FORMAT_PERCENT) { 2849 2446 outp += sprintf(outp, "%s%.2f", (printed++ ? delim : ""), 100.0 * c->counter[i] / tsc); 2447 + } 2448 + } 2449 + 2450 + for (i = 0, pp = sys.perf_cp; pp; i++, pp = pp->next) { 2451 + if (pp->format == FORMAT_RAW) { 2452 + if (pp->width == 32) 2453 + outp += 2454 + sprintf(outp, "%s0x%08x", (printed++ ? delim : ""), 2455 + (unsigned int)c->perf_counter[i]); 2456 + else 2457 + outp += sprintf(outp, "%s0x%016llx", (printed++ ? 
delim : ""), c->perf_counter[i]); 2458 + } else if (pp->format == FORMAT_DELTA) { 2459 + if ((pp->type == COUNTER_ITEMS) && sums_need_wide_columns) 2460 + outp += sprintf(outp, "%s%8lld", (printed++ ? delim : ""), c->perf_counter[i]); 2461 + else 2462 + outp += sprintf(outp, "%s%lld", (printed++ ? delim : ""), c->perf_counter[i]); 2463 + } else if (pp->format == FORMAT_PERCENT) { 2464 + outp += sprintf(outp, "%s%.2f", (printed++ ? delim : ""), 100.0 * c->perf_counter[i] / tsc); 2465 + } 2466 + } 2467 + 2468 + for (i = 0, ppmt = sys.pmt_cp; ppmt; i++, ppmt = ppmt->next) { 2469 + switch (ppmt->type) { 2470 + case PMT_TYPE_RAW: 2471 + if (pmt_counter_get_width(ppmt) <= 32) 2472 + outp += sprintf(outp, "%s0x%08x", (printed++ ? delim : ""), 2473 + (unsigned int)c->pmt_counter[i]); 2474 + else 2475 + outp += sprintf(outp, "%s0x%016llx", (printed++ ? delim : ""), c->pmt_counter[i]); 2476 + 2477 + break; 2478 + 2479 + case PMT_TYPE_XTAL_TIME: 2480 + const unsigned long value_raw = c->pmt_counter[i]; 2481 + const double value_converted = 100.0 * value_raw / crystal_hz / interval_float; 2482 + 2483 + outp += sprintf(outp, "%s%.2f", (printed++ ? delim : ""), value_converted); 2484 + break; 2850 2485 } 2851 2486 } 2852 2487 ··· 2967 2526 if (DO_BIC(BIC_Pkgpc10)) 2968 2527 outp += sprintf(outp, "%s%.2f", (printed++ ? delim : ""), 100.0 * p->pc10 / tsc); 2969 2528 2529 + if (DO_BIC(BIC_Diec6)) 2530 + outp += 2531 + sprintf(outp, "%s%.2f", (printed++ ? delim : ""), 100.0 * p->die_c6 / crystal_hz / interval_float); 2532 + 2970 2533 if (DO_BIC(BIC_CPU_LPI)) { 2971 2534 if (p->cpu_lpi >= 0) 2972 2535 outp += ··· 3046 2601 outp += sprintf(outp, "%s%d", (printed++ ? delim : ""), (unsigned int)p->counter[i] / 1000); 3047 2602 } 3048 2603 2604 + for (i = 0, pp = sys.perf_pp; pp; i++, pp = pp->next) { 2605 + if (pp->format == FORMAT_RAW) { 2606 + if (pp->width == 32) 2607 + outp += 2608 + sprintf(outp, "%s0x%08x", (printed++ ? 
delim : ""), 2609 + (unsigned int)p->perf_counter[i]); 2610 + else 2611 + outp += sprintf(outp, "%s0x%016llx", (printed++ ? delim : ""), p->perf_counter[i]); 2612 + } else if (pp->format == FORMAT_DELTA) { 2613 + if ((pp->type == COUNTER_ITEMS) && sums_need_wide_columns) 2614 + outp += sprintf(outp, "%s%8lld", (printed++ ? delim : ""), p->perf_counter[i]); 2615 + else 2616 + outp += sprintf(outp, "%s%lld", (printed++ ? delim : ""), p->perf_counter[i]); 2617 + } else if (pp->format == FORMAT_PERCENT) { 2618 + outp += sprintf(outp, "%s%.2f", (printed++ ? delim : ""), 100.0 * p->perf_counter[i] / tsc); 2619 + } else if (pp->type == COUNTER_K2M) { 2620 + outp += 2621 + sprintf(outp, "%s%d", (printed++ ? delim : ""), (unsigned int)p->perf_counter[i] / 1000); 2622 + } 2623 + } 2624 + 2625 + for (i = 0, ppmt = sys.pmt_pp; ppmt; i++, ppmt = ppmt->next) { 2626 + switch (ppmt->type) { 2627 + case PMT_TYPE_RAW: 2628 + if (pmt_counter_get_width(ppmt) <= 32) 2629 + outp += sprintf(outp, "%s0x%08x", (printed++ ? delim : ""), 2630 + (unsigned int)p->pmt_counter[i]); 2631 + else 2632 + outp += sprintf(outp, "%s0x%016llx", (printed++ ? delim : ""), p->pmt_counter[i]); 2633 + 2634 + break; 2635 + 2636 + case PMT_TYPE_XTAL_TIME: 2637 + const unsigned long value_raw = p->pmt_counter[i]; 2638 + const double value_converted = 100.0 * value_raw / crystal_hz / interval_float; 2639 + 2640 + outp += sprintf(outp, "%s%.2f", (printed++ ? 
delim : ""), value_converted); 2641 + break; 2642 + } 2643 + } 2644 + 3049 2645 done: 3050 2646 if (*(outp - 1) != '\n') 3051 2647 outp += sprintf(outp, "\n"); ··· 3140 2654 { 3141 2655 int i; 3142 2656 struct msr_counter *mp; 2657 + struct perf_counter_info *pp; 2658 + struct pmt_counter *ppmt; 3143 2659 3144 2660 if (DO_BIC(BIC_Totl_c0)) 3145 2661 old->pkg_wtd_core_c0 = new->pkg_wtd_core_c0 - old->pkg_wtd_core_c0; ··· 3162 2674 old->pc8 = new->pc8 - old->pc8; 3163 2675 old->pc9 = new->pc9 - old->pc9; 3164 2676 old->pc10 = new->pc10 - old->pc10; 2677 + old->die_c6 = new->die_c6 - old->die_c6; 3165 2678 old->cpu_lpi = new->cpu_lpi - old->cpu_lpi; 3166 2679 old->sys_lpi = new->sys_lpi - old->sys_lpi; 3167 2680 old->pkg_temp_c = new->pkg_temp_c; ··· 3203 2714 old->counter[i] = new->counter[i] - old->counter[i]; 3204 2715 } 3205 2716 2717 + for (i = 0, pp = sys.perf_pp; pp; i++, pp = pp->next) { 2718 + if (pp->format == FORMAT_RAW) 2719 + old->perf_counter[i] = new->perf_counter[i]; 2720 + else if (pp->format == FORMAT_AVERAGE) 2721 + old->perf_counter[i] = new->perf_counter[i]; 2722 + else 2723 + old->perf_counter[i] = new->perf_counter[i] - old->perf_counter[i]; 2724 + } 2725 + 2726 + for (i = 0, ppmt = sys.pmt_pp; ppmt; i++, ppmt = ppmt->next) { 2727 + if (ppmt->format == FORMAT_RAW) 2728 + old->pmt_counter[i] = new->pmt_counter[i]; 2729 + else 2730 + old->pmt_counter[i] = new->pmt_counter[i] - old->pmt_counter[i]; 2731 + } 2732 + 3206 2733 return 0; 3207 2734 } 3208 2735 ··· 3226 2721 { 3227 2722 int i; 3228 2723 struct msr_counter *mp; 2724 + struct perf_counter_info *pp; 2725 + struct pmt_counter *ppmt; 3229 2726 3230 2727 old->c3 = new->c3 - old->c3; 3231 2728 old->c6 = new->c6 - old->c6; ··· 3243 2736 old->counter[i] = new->counter[i]; 3244 2737 else 3245 2738 old->counter[i] = new->counter[i] - old->counter[i]; 2739 + } 2740 + 2741 + for (i = 0, pp = sys.perf_cp; pp; i++, pp = pp->next) { 2742 + if (pp->format == FORMAT_RAW) 2743 + old->perf_counter[i] = 
new->perf_counter[i]; 2744 + else 2745 + old->perf_counter[i] = new->perf_counter[i] - old->perf_counter[i]; 2746 + } 2747 + 2748 + for (i = 0, ppmt = sys.pmt_cp; ppmt; i++, ppmt = ppmt->next) { 2749 + if (ppmt->format == FORMAT_RAW) 2750 + old->pmt_counter[i] = new->pmt_counter[i]; 2751 + else 2752 + old->pmt_counter[i] = new->pmt_counter[i] - old->pmt_counter[i]; 3246 2753 } 3247 2754 } 3248 2755 ··· 3275 2754 { 3276 2755 int i; 3277 2756 struct msr_counter *mp; 2757 + struct perf_counter_info *pp; 2758 + struct pmt_counter *ppmt; 3278 2759 3279 2760 /* we run cpuid just the 1st time, copy the results */ 3280 2761 if (DO_BIC(BIC_APIC)) ··· 3355 2832 else 3356 2833 old->counter[i] = new->counter[i] - old->counter[i]; 3357 2834 } 2835 + 2836 + for (i = 0, pp = sys.perf_tp; pp; i++, pp = pp->next) { 2837 + if (pp->format == FORMAT_RAW) 2838 + old->perf_counter[i] = new->perf_counter[i]; 2839 + else 2840 + old->perf_counter[i] = new->perf_counter[i] - old->perf_counter[i]; 2841 + } 2842 + 2843 + for (i = 0, ppmt = sys.pmt_tp; ppmt; i++, ppmt = ppmt->next) { 2844 + if (ppmt->format == FORMAT_RAW) 2845 + old->pmt_counter[i] = new->pmt_counter[i]; 2846 + else 2847 + old->pmt_counter[i] = new->pmt_counter[i] - old->pmt_counter[i]; 2848 + } 2849 + 3358 2850 return 0; 3359 2851 } 3360 2852 ··· 3446 2908 p->pc8 = 0; 3447 2909 p->pc9 = 0; 3448 2910 p->pc10 = 0; 2911 + p->die_c6 = 0; 3449 2912 p->cpu_lpi = 0; 3450 2913 p->sys_lpi = 0; 3451 2914 ··· 3473 2934 3474 2935 for (i = 0, mp = sys.pp; mp; i++, mp = mp->next) 3475 2936 p->counter[i] = 0; 2937 + 2938 + memset(&t->perf_counter[0], 0, sizeof(t->perf_counter)); 2939 + memset(&c->perf_counter[0], 0, sizeof(c->perf_counter)); 2940 + memset(&p->perf_counter[0], 0, sizeof(p->perf_counter)); 2941 + 2942 + memset(&t->pmt_counter[0], 0, ARRAY_SIZE(t->pmt_counter)); 2943 + memset(&c->pmt_counter[0], 0, ARRAY_SIZE(c->pmt_counter)); 2944 + memset(&p->pmt_counter[0], 0, ARRAY_SIZE(p->pmt_counter)); 3476 2945 } 3477 2946 3478 2947 
void rapl_counter_accumulate(struct rapl_counter *dst, const struct rapl_counter *src) ··· 3501 2954 { 3502 2955 int i; 3503 2956 struct msr_counter *mp; 2957 + struct perf_counter_info *pp; 2958 + struct pmt_counter *ppmt; 3504 2959 3505 2960 /* copy un-changing apic_id's */ 3506 2961 if (DO_BIC(BIC_APIC)) ··· 3533 2984 average.threads.counter[i] += t->counter[i]; 3534 2985 } 3535 2986 2987 + for (i = 0, pp = sys.perf_tp; pp; i++, pp = pp->next) { 2988 + if (pp->format == FORMAT_RAW) 2989 + continue; 2990 + average.threads.perf_counter[i] += t->perf_counter[i]; 2991 + } 2992 + 2993 + for (i = 0, ppmt = sys.pmt_tp; ppmt; i++, ppmt = ppmt->next) { 2994 + average.threads.pmt_counter[i] += t->pmt_counter[i]; 2995 + } 2996 + 3536 2997 /* sum per-core values only for 1st thread in core */ 3537 2998 if (!is_cpu_first_thread_in_core(t, c, p)) 3538 2999 return 0; ··· 3561 3002 if (mp->format == FORMAT_RAW) 3562 3003 continue; 3563 3004 average.cores.counter[i] += c->counter[i]; 3005 + } 3006 + 3007 + for (i = 0, pp = sys.perf_cp; pp; i++, pp = pp->next) { 3008 + if (pp->format == FORMAT_RAW) 3009 + continue; 3010 + average.cores.perf_counter[i] += c->perf_counter[i]; 3011 + } 3012 + 3013 + for (i = 0, ppmt = sys.pmt_cp; ppmt; i++, ppmt = ppmt->next) { 3014 + average.cores.pmt_counter[i] += c->pmt_counter[i]; 3564 3015 } 3565 3016 3566 3017 /* sum per-pkg values only for 1st core in pkg */ ··· 3596 3027 average.packages.pc8 += p->pc8; 3597 3028 average.packages.pc9 += p->pc9; 3598 3029 average.packages.pc10 += p->pc10; 3030 + average.packages.die_c6 += p->die_c6; 3599 3031 3600 3032 average.packages.cpu_lpi = p->cpu_lpi; 3601 3033 average.packages.sys_lpi = p->sys_lpi; ··· 3625 3055 else 3626 3056 average.packages.counter[i] += p->counter[i]; 3627 3057 } 3058 + 3059 + for (i = 0, pp = sys.perf_pp; pp; i++, pp = pp->next) { 3060 + if ((pp->format == FORMAT_RAW) && (topo.num_packages == 0)) 3061 + average.packages.perf_counter[i] = p->perf_counter[i]; 3062 + else 3063 + 
average.packages.perf_counter[i] += p->perf_counter[i]; 3064 + } 3065 + 3066 + for (i = 0, ppmt = sys.pmt_pp; ppmt; i++, ppmt = ppmt->next) { 3067 + average.packages.pmt_counter[i] += p->pmt_counter[i]; 3068 + } 3069 + 3628 3070 return 0; 3629 3071 } 3630 3072 ··· 3648 3066 { 3649 3067 int i; 3650 3068 struct msr_counter *mp; 3069 + struct perf_counter_info *pp; 3070 + struct pmt_counter *ppmt; 3651 3071 3652 3072 clear_counters(&average.threads, &average.cores, &average.packages); 3653 3073 ··· 3692 3108 average.packages.pc8 /= topo.allowed_packages; 3693 3109 average.packages.pc9 /= topo.allowed_packages; 3694 3110 average.packages.pc10 /= topo.allowed_packages; 3111 + average.packages.die_c6 /= topo.allowed_packages; 3695 3112 3696 3113 for (i = 0, mp = sys.tp; mp; i++, mp = mp->next) { 3697 3114 if (mp->format == FORMAT_RAW) ··· 3721 3136 sums_need_wide_columns = 1; 3722 3137 } 3723 3138 average.packages.counter[i] /= topo.allowed_packages; 3139 + } 3140 + 3141 + for (i = 0, pp = sys.perf_tp; pp; i++, pp = pp->next) { 3142 + if (pp->format == FORMAT_RAW) 3143 + continue; 3144 + if (pp->type == COUNTER_ITEMS) { 3145 + if (average.threads.perf_counter[i] > 9999999) 3146 + sums_need_wide_columns = 1; 3147 + continue; 3148 + } 3149 + average.threads.perf_counter[i] /= topo.allowed_cpus; 3150 + } 3151 + for (i = 0, pp = sys.perf_cp; pp; i++, pp = pp->next) { 3152 + if (pp->format == FORMAT_RAW) 3153 + continue; 3154 + if (pp->type == COUNTER_ITEMS) { 3155 + if (average.cores.perf_counter[i] > 9999999) 3156 + sums_need_wide_columns = 1; 3157 + } 3158 + average.cores.perf_counter[i] /= topo.allowed_cores; 3159 + } 3160 + for (i = 0, pp = sys.perf_pp; pp; i++, pp = pp->next) { 3161 + if (pp->format == FORMAT_RAW) 3162 + continue; 3163 + if (pp->type == COUNTER_ITEMS) { 3164 + if (average.packages.perf_counter[i] > 9999999) 3165 + sums_need_wide_columns = 1; 3166 + } 3167 + average.packages.perf_counter[i] /= topo.allowed_packages; 3168 + } 3169 + 3170 + for (i = 0, 
ppmt = sys.pmt_tp; ppmt; i++, ppmt = ppmt->next) { 3171 + average.threads.pmt_counter[i] /= topo.allowed_cpus; 3172 + } 3173 + for (i = 0, ppmt = sys.pmt_cp; ppmt; i++, ppmt = ppmt->next) { 3174 + average.cores.pmt_counter[i] /= topo.allowed_cores; 3175 + } 3176 + for (i = 0, ppmt = sys.pmt_pp; ppmt; i++, ppmt = ppmt->next) { 3177 + average.packages.pmt_counter[i] /= topo.allowed_packages; 3724 3178 } 3725 3179 } 3726 3180 ··· 4006 3382 return v; 4007 3383 } 4008 3384 4009 - static unsigned int read_msr_type(void) 4010 - { 4011 - const char *const path = "/sys/bus/event_source/devices/msr/type"; 4012 - const char *const format = "%u"; 4013 - 4014 - return read_perf_counter_info_n(path, format); 4015 - } 4016 - 4017 - static unsigned int read_aperf_config(void) 4018 - { 4019 - const char *const path = "/sys/bus/event_source/devices/msr/events/aperf"; 4020 - const char *const format = "event=%x"; 4021 - 4022 - return read_perf_counter_info_n(path, format); 4023 - } 4024 - 4025 - static unsigned int read_mperf_config(void) 4026 - { 4027 - const char *const path = "/sys/bus/event_source/devices/msr/events/mperf"; 4028 - const char *const format = "event=%x"; 4029 - 4030 - return read_perf_counter_info_n(path, format); 4031 - } 4032 - 4033 3385 static unsigned int read_perf_type(const char *subsys) 4034 3386 { 4035 3387 const char *const path_format = "/sys/bus/event_source/devices/%s/type"; ··· 4017 3417 return read_perf_counter_info_n(path, format); 4018 3418 } 4019 3419 4020 - static unsigned int read_rapl_config(const char *subsys, const char *event_name) 3420 + static unsigned int read_perf_config(const char *subsys, const char *event_name) 4021 3421 { 4022 3422 const char *const path_format = "/sys/bus/event_source/devices/%s/events/%s"; 4023 - const char *const format = "event=%x"; 3423 + FILE *fconfig = NULL; 4024 3424 char path[128]; 3425 + char config_str[64]; 3426 + unsigned int config; 3427 + unsigned int umask; 3428 + bool has_config = false; 3429 + bool 
has_umask = false; 3430 + unsigned int ret = -1; 4025 3431 4026 3432 snprintf(path, sizeof(path), path_format, subsys, event_name); 4027 3433 4028 - return read_perf_counter_info_n(path, format); 3434 + fconfig = fopen(path, "r"); 3435 + if (!fconfig) 3436 + return -1; 3437 + 3438 + if (fgets(config_str, ARRAY_SIZE(config_str), fconfig) != config_str) 3439 + goto cleanup_and_exit; 3440 + 3441 + for (char *pconfig_str = &config_str[0]; pconfig_str;) { 3442 + if (sscanf(pconfig_str, "event=%x", &config) == 1) { 3443 + has_config = true; 3444 + goto next; 3445 + } 3446 + 3447 + if (sscanf(pconfig_str, "umask=%x", &umask) == 1) { 3448 + has_umask = true; 3449 + goto next; 3450 + } 3451 + 3452 + next: 3453 + pconfig_str = strchr(pconfig_str, ','); 3454 + if (pconfig_str) { 3455 + *pconfig_str = '\0'; 3456 + ++pconfig_str; 3457 + } 3458 + } 3459 + 3460 + if (!has_umask) 3461 + umask = 0; 3462 + 3463 + if (has_config) 3464 + ret = (umask << 8) | config; 3465 + 3466 + cleanup_and_exit: 3467 + fclose(fconfig); 3468 + return ret; 4029 3469 } 4030 3470 4031 3471 static unsigned int read_perf_rapl_unit(const char *subsys, const char *event_name) ··· 4084 3444 return RAPL_UNIT_INVALID; 4085 3445 } 4086 3446 4087 - static double read_perf_rapl_scale(const char *subsys, const char *event_name) 3447 + static double read_perf_scale(const char *subsys, const char *event_name) 4088 3448 { 4089 3449 const char *const path_format = "/sys/bus/event_source/devices/%s/events/%s.scale"; 4090 3450 const char *const format = "%lf"; ··· 4099 3459 return scale; 4100 3460 } 4101 3461 4102 - static struct amperf_group_fd open_amperf_fd(int cpu) 4103 - { 4104 - const unsigned int msr_type = read_msr_type(); 4105 - const unsigned int aperf_config = read_aperf_config(); 4106 - const unsigned int mperf_config = read_mperf_config(); 4107 - struct amperf_group_fd fds = {.aperf = -1, .mperf = -1 }; 4108 - 4109 - fds.aperf = open_perf_counter(cpu, msr_type, aperf_config, -1, PERF_FORMAT_GROUP); 4110 - 
fds.mperf = open_perf_counter(cpu, msr_type, mperf_config, fds.aperf, PERF_FORMAT_GROUP); 4111 - 4112 - return fds; 4113 - } 4114 - 4115 - static int get_amperf_fd(int cpu) 4116 - { 4117 - assert(fd_amperf_percpu); 4118 - 4119 - if (fd_amperf_percpu[cpu].aperf) 4120 - return fd_amperf_percpu[cpu].aperf; 4121 - 4122 - fd_amperf_percpu[cpu] = open_amperf_fd(cpu); 4123 - 4124 - return fd_amperf_percpu[cpu].aperf; 4125 - } 4126 - 4127 - /* Read APERF, MPERF and TSC using the perf API. */ 4128 - static int read_aperf_mperf_tsc_perf(struct thread_data *t, int cpu) 4129 - { 4130 - union { 4131 - struct { 4132 - unsigned long nr_entries; 4133 - unsigned long aperf; 4134 - unsigned long mperf; 4135 - }; 4136 - 4137 - unsigned long as_array[3]; 4138 - } cnt; 4139 - 4140 - const int fd_amperf = get_amperf_fd(cpu); 4141 - 4142 - /* 4143 - * Read the TSC with rdtsc, because we want the absolute value and not 4144 - * the offset from the start of the counter. 4145 - */ 4146 - t->tsc = rdtsc(); 4147 - 4148 - const int n = read(fd_amperf, &cnt.as_array[0], sizeof(cnt.as_array)); 4149 - 4150 - if (n != sizeof(cnt.as_array)) 4151 - return -2; 4152 - 4153 - t->aperf = cnt.aperf * aperf_mperf_multiplier; 4154 - t->mperf = cnt.mperf * aperf_mperf_multiplier; 4155 - 4156 - return 0; 4157 - } 4158 - 4159 - /* Read APERF, MPERF and TSC using the MSR driver and rdtsc instruction. */ 4160 - static int read_aperf_mperf_tsc_msr(struct thread_data *t, int cpu) 4161 - { 4162 - unsigned long long tsc_before, tsc_between, tsc_after, aperf_time, mperf_time; 4163 - int aperf_mperf_retry_count = 0; 4164 - 4165 - /* 4166 - * The TSC, APERF and MPERF must be read together for 4167 - * APERF/MPERF and MPERF/TSC to give accurate results. 4168 - * 4169 - * Unfortunately, APERF and MPERF are read by 4170 - * individual system call, so delays may occur 4171 - * between them. If the time to read them 4172 - * varies by a large amount, we re-read them. 
4173 - */ 4174 - 4175 - /* 4176 - * This initial dummy APERF read has been seen to 4177 - * reduce jitter in the subsequent reads. 4178 - */ 4179 - 4180 - if (get_msr(cpu, MSR_IA32_APERF, &t->aperf)) 4181 - return -3; 4182 - 4183 - retry: 4184 - t->tsc = rdtsc(); /* re-read close to APERF */ 4185 - 4186 - tsc_before = t->tsc; 4187 - 4188 - if (get_msr(cpu, MSR_IA32_APERF, &t->aperf)) 4189 - return -3; 4190 - 4191 - tsc_between = rdtsc(); 4192 - 4193 - if (get_msr(cpu, MSR_IA32_MPERF, &t->mperf)) 4194 - return -4; 4195 - 4196 - tsc_after = rdtsc(); 4197 - 4198 - aperf_time = tsc_between - tsc_before; 4199 - mperf_time = tsc_after - tsc_between; 4200 - 4201 - /* 4202 - * If the system call latency to read APERF and MPERF 4203 - * differ by more than 2x, then try again. 4204 - */ 4205 - if ((aperf_time > (2 * mperf_time)) || (mperf_time > (2 * aperf_time))) { 4206 - aperf_mperf_retry_count++; 4207 - if (aperf_mperf_retry_count < 5) 4208 - goto retry; 4209 - else 4210 - warnx("cpu%d jitter %lld %lld", cpu, aperf_time, mperf_time); 4211 - } 4212 - aperf_mperf_retry_count = 0; 4213 - 4214 - t->aperf = t->aperf * aperf_mperf_multiplier; 4215 - t->mperf = t->mperf * aperf_mperf_multiplier; 4216 - 4217 - return 0; 4218 - } 4219 - 4220 3462 size_t rapl_counter_info_count_perf(const struct rapl_counter_info_t *rci) 4221 3463 { 4222 3464 size_t ret = 0; 4223 3465 4224 3466 for (int i = 0; i < NUM_RAPL_COUNTERS; ++i) 4225 - if (rci->source[i] == RAPL_SOURCE_PERF) 3467 + if (rci->source[i] == COUNTER_SOURCE_PERF) 4226 3468 ++ret; 4227 3469 4228 3470 return ret; ··· 4115 3593 size_t ret = 0; 4116 3594 4117 3595 for (int i = 0; i < NUM_CSTATE_COUNTERS; ++i) 4118 - if (cci->source[i] == CSTATE_SOURCE_PERF) 3596 + if (cci->source[i] == COUNTER_SOURCE_PERF) 4119 3597 ++ret; 4120 3598 4121 3599 return ret; ··· 4133 3611 unsigned long long perf_data[NUM_RAPL_COUNTERS + 1]; 4134 3612 struct rapl_counter_info_t *rci; 4135 3613 4136 - if (debug) 3614 + if (debug >= 2) 4137 3615 
fprintf(stderr, "%s: cpu%d domain%d\n", __func__, cpu, domain); 4138 3616 4139 3617 assert(rapl_counter_info_perdomain); ··· 4156 3634 4157 3635 for (unsigned int i = 0, pi = 1; i < NUM_RAPL_COUNTERS; ++i) { 4158 3636 switch (rci->source[i]) { 4159 - case RAPL_SOURCE_NONE: 3637 + case COUNTER_SOURCE_NONE: 4160 3638 break; 4161 3639 4162 - case RAPL_SOURCE_PERF: 3640 + case COUNTER_SOURCE_PERF: 4163 3641 assert(pi < ARRAY_SIZE(perf_data)); 4164 3642 assert(rci->fd_perf != -1); 4165 3643 4166 - if (debug) 3644 + if (debug >= 2) 4167 3645 fprintf(stderr, "Reading rapl counter via perf at %u (%llu %e %lf)\n", 4168 3646 i, perf_data[pi], rci->scale[i], perf_data[pi] * rci->scale[i]); 4169 3647 ··· 4172 3650 ++pi; 4173 3651 break; 4174 3652 4175 - case RAPL_SOURCE_MSR: 4176 - if (debug) 3653 + case COUNTER_SOURCE_MSR: 3654 + if (debug >= 2) 4177 3655 fprintf(stderr, "Reading rapl counter via msr at %u\n", i); 4178 3656 4179 3657 assert(!no_msr); ··· 4231 3709 4232 3710 struct cstate_counter_info_t *cci; 4233 3711 4234 - if (debug) 3712 + if (debug >= 2) 4235 3713 fprintf(stderr, "%s: cpu%d\n", __func__, cpu); 4236 3714 4237 3715 assert(ccstate_counter_info); 4238 3716 assert(cpu <= ccstate_counter_info_size); 4239 3717 4240 - memset(perf_data, 0, sizeof(perf_data)); 4241 - memset(perf_data_core, 0, sizeof(perf_data_core)); 4242 - memset(perf_data_pkg, 0, sizeof(perf_data_pkg)); 3718 + ZERO_ARRAY(perf_data); 3719 + ZERO_ARRAY(perf_data_core); 3720 + ZERO_ARRAY(perf_data_pkg); 4243 3721 4244 3722 cci = &ccstate_counter_info[cpu]; 4245 3723 ··· 4294 3772 4295 3773 for (unsigned int i = 0, pi = 0; i < NUM_CSTATE_COUNTERS; ++i) { 4296 3774 switch (cci->source[i]) { 4297 - case CSTATE_SOURCE_NONE: 3775 + case COUNTER_SOURCE_NONE: 4298 3776 break; 4299 3777 4300 - case CSTATE_SOURCE_PERF: 3778 + case COUNTER_SOURCE_PERF: 4301 3779 assert(pi < ARRAY_SIZE(perf_data)); 4302 3780 assert(cci->fd_perf_core != -1 || cci->fd_perf_pkg != -1); 4303 3781 4304 - if (debug) { 3782 + if 
(debug >= 2) 4305 3783 fprintf(stderr, "cstate via %s %u: %llu\n", "perf", i, perf_data[pi]); 4306 - } 4307 3784 4308 3785 cci->data[i] = perf_data[pi]; 4309 3786 4310 3787 ++pi; 4311 3788 break; 4312 3789 4313 - case CSTATE_SOURCE_MSR: 3790 + case COUNTER_SOURCE_MSR: 4314 3791 assert(!no_msr); 4315 3792 if (get_msr(cpu, cci->msr[i], &cci->data[i])) 4316 3793 return -13 - i; 4317 3794 4318 - if (debug) { 3795 + if (debug >= 2) 4319 3796 fprintf(stderr, "cstate via %s0x%llx %u: %llu\n", "msr", cci->msr[i], i, cci->data[i]); 4320 - } 4321 3797 4322 3798 break; 4323 3799 } ··· 4329 3809 * when invoked for the thread sibling. 4330 3810 */ 4331 3811 #define PERF_COUNTER_WRITE_DATA(out_counter, index) do { \ 4332 - if (cci->source[index] != CSTATE_SOURCE_NONE) \ 3812 + if (cci->source[index] != COUNTER_SOURCE_NONE) \ 4333 3813 out_counter = cci->data[index]; \ 4334 3814 } while (0) 4335 3815 ··· 4353 3833 return 0; 4354 3834 } 4355 3835 3836 + size_t msr_counter_info_count_perf(const struct msr_counter_info_t *mci) 3837 + { 3838 + size_t ret = 0; 3839 + 3840 + for (int i = 0; i < NUM_MSR_COUNTERS; ++i) 3841 + if (mci->source[i] == COUNTER_SOURCE_PERF) 3842 + ++ret; 3843 + 3844 + return ret; 3845 + } 3846 + 3847 + int get_smi_aperf_mperf(unsigned int cpu, struct thread_data *t) 3848 + { 3849 + unsigned long long perf_data[NUM_MSR_COUNTERS + 1]; 3850 + 3851 + struct msr_counter_info_t *mci; 3852 + 3853 + if (debug >= 2) 3854 + fprintf(stderr, "%s: cpu%d\n", __func__, cpu); 3855 + 3856 + assert(msr_counter_info); 3857 + assert(cpu <= msr_counter_info_size); 3858 + 3859 + mci = &msr_counter_info[cpu]; 3860 + 3861 + ZERO_ARRAY(perf_data); 3862 + ZERO_ARRAY(mci->data); 3863 + 3864 + if (mci->fd_perf != -1) { 3865 + const size_t num_perf_counters = msr_counter_info_count_perf(mci); 3866 + const ssize_t expected_read_size = (num_perf_counters + 1) * sizeof(unsigned long long); 3867 + const ssize_t actual_read_size = read(mci->fd_perf, &perf_data[0], sizeof(perf_data)); 3868 + 
3869 + if (actual_read_size != expected_read_size) 3870 + err(-1, "%s: failed to read perf_data (%zu %zu)", __func__, expected_read_size, 3871 + actual_read_size); 3872 + } 3873 + 3874 + for (unsigned int i = 0, pi = 1; i < NUM_MSR_COUNTERS; ++i) { 3875 + switch (mci->source[i]) { 3876 + case COUNTER_SOURCE_NONE: 3877 + break; 3878 + 3879 + case COUNTER_SOURCE_PERF: 3880 + assert(pi < ARRAY_SIZE(perf_data)); 3881 + assert(mci->fd_perf != -1); 3882 + 3883 + if (debug >= 2) 3884 + fprintf(stderr, "Reading msr counter via perf at %u: %llu\n", i, perf_data[pi]); 3885 + 3886 + mci->data[i] = perf_data[pi]; 3887 + 3888 + ++pi; 3889 + break; 3890 + 3891 + case COUNTER_SOURCE_MSR: 3892 + assert(!no_msr); 3893 + 3894 + if (get_msr(cpu, mci->msr[i], &mci->data[i])) 3895 + return -2 - i; 3896 + 3897 + mci->data[i] &= mci->msr_mask[i]; 3898 + 3899 + if (debug >= 2) 3900 + fprintf(stderr, "Reading msr counter via msr at %u: %llu\n", i, mci->data[i]); 3901 + 3902 + break; 3903 + } 3904 + } 3905 + 3906 + BUILD_BUG_ON(NUM_MSR_COUNTERS != 3); 3907 + t->aperf = mci->data[MSR_RCI_INDEX_APERF]; 3908 + t->mperf = mci->data[MSR_RCI_INDEX_MPERF]; 3909 + t->smi_count = mci->data[MSR_RCI_INDEX_SMI]; 3910 + 3911 + return 0; 3912 + } 3913 + 3914 + int perf_counter_info_read_values(struct perf_counter_info *pp, int cpu, unsigned long long *out, size_t out_size) 3915 + { 3916 + unsigned int domain; 3917 + unsigned long long value; 3918 + int fd_counter; 3919 + 3920 + for (size_t i = 0; pp; ++i, pp = pp->next) { 3921 + domain = cpu_to_domain(pp, cpu); 3922 + assert(domain < pp->num_domains); 3923 + 3924 + fd_counter = pp->fd_perf_per_domain[domain]; 3925 + 3926 + if (fd_counter == -1) 3927 + continue; 3928 + 3929 + if (read(fd_counter, &value, sizeof(value)) != sizeof(value)) 3930 + return 1; 3931 + 3932 + assert(i < out_size); 3933 + out[i] = value * pp->scale; 3934 + } 3935 + 3936 + return 0; 3937 + } 3938 + 3939 + unsigned long pmt_gen_value_mask(unsigned int lsb, unsigned int msb) 3940 + { 
3941 + unsigned long mask; 3942 + 3943 + if (msb == 63) 3944 + mask = 0xffffffffffffffff; 3945 + else 3946 + mask = ((1 << (msb + 1)) - 1); 3947 + 3948 + mask -= (1 << lsb) - 1; 3949 + 3950 + return mask; 3951 + } 3952 + 3953 + unsigned long pmt_read_counter(struct pmt_counter *ppmt, unsigned int domain_id) 3954 + { 3955 + assert(domain_id < ppmt->num_domains); 3956 + 3957 + const unsigned long *pmmio = ppmt->domains[domain_id].pcounter; 3958 + const unsigned long value = pmmio ? *pmmio : 0; 3959 + const unsigned long value_mask = pmt_gen_value_mask(ppmt->lsb, ppmt->msb); 3960 + const unsigned long value_shift = ppmt->lsb; 3961 + 3962 + return (value & value_mask) >> value_shift; 3963 + } 3964 + 4356 3965 /* 4357 3966 * get_counters(...) 4358 3967 * migrate to cpu ··· 4492 3843 int cpu = t->cpu_id; 4493 3844 unsigned long long msr; 4494 3845 struct msr_counter *mp; 3846 + struct pmt_counter *pp; 4495 3847 int i; 4496 3848 int status; 4497 3849 ··· 4508 3858 4509 3859 t->tsc = rdtsc(); /* we are running on local CPU of interest */ 4510 3860 4511 - if (DO_BIC(BIC_Avg_MHz) || DO_BIC(BIC_Busy) || DO_BIC(BIC_Bzy_MHz) || DO_BIC(BIC_IPC) 4512 - || soft_c1_residency_display(BIC_Avg_MHz)) { 4513 - int status = -1; 4514 - 4515 - assert(!no_perf || !no_msr); 4516 - 4517 - switch (amperf_source) { 4518 - case AMPERF_SOURCE_PERF: 4519 - status = read_aperf_mperf_tsc_perf(t, cpu); 4520 - break; 4521 - case AMPERF_SOURCE_MSR: 4522 - status = read_aperf_mperf_tsc_msr(t, cpu); 4523 - break; 4524 - } 4525 - 4526 - if (status != 0) 4527 - return status; 4528 - } 3861 + get_smi_aperf_mperf(cpu, t); 4529 3862 4530 3863 if (DO_BIC(BIC_IPC)) 4531 3864 if (read(get_instr_count_fd(cpu), &t->instr_count, sizeof(long long)) != sizeof(long long)) ··· 4516 3883 4517 3884 if (DO_BIC(BIC_IRQ)) 4518 3885 t->irq_count = irqs_per_cpu[cpu]; 4519 - if (DO_BIC(BIC_SMI)) { 4520 - if (get_msr(cpu, MSR_SMI_COUNT, &msr)) 4521 - return -5; 4522 - t->smi_count = msr & 0xFFFFFFFF; 4523 - } 4524 3886 4525 
3887 get_cstate_counters(cpu, t, c, p); 4526 3888 ··· 4523 3895 if (get_mp(cpu, mp, &t->counter[i], mp->sp->path)) 4524 3896 return -10; 4525 3897 } 3898 + 3899 + if (perf_counter_info_read_values(sys.perf_tp, cpu, t->perf_counter, MAX_ADDED_THREAD_COUNTERS)) 3900 + return -10; 3901 + 3902 + for (i = 0, pp = sys.pmt_tp; pp; i++, pp = pp->next) 3903 + t->pmt_counter[i] = pmt_read_counter(pp, t->cpu_id); 4526 3904 4527 3905 /* collect core counters only for 1st thread in core */ 4528 3906 if (!is_cpu_first_thread_in_core(t, c, p)) ··· 4567 3933 if (get_mp(cpu, mp, &c->counter[i], mp->sp->path)) 4568 3934 return -10; 4569 3935 } 3936 + 3937 + if (perf_counter_info_read_values(sys.perf_cp, cpu, c->perf_counter, MAX_ADDED_CORE_COUNTERS)) 3938 + return -10; 3939 + 3940 + for (i = 0, pp = sys.pmt_cp; pp; i++, pp = pp->next) 3941 + c->pmt_counter[i] = pmt_read_counter(pp, c->core_id); 4570 3942 4571 3943 /* collect package counters only for 1st core in package */ 4572 3944 if (!is_cpu_first_core_in_package(t, c, p)) ··· 4646 4006 if (get_mp(cpu, mp, &p->counter[i], path)) 4647 4007 return -10; 4648 4008 } 4009 + 4010 + if (perf_counter_info_read_values(sys.perf_pp, cpu, p->perf_counter, MAX_ADDED_PACKAGE_COUNTERS)) 4011 + return -10; 4012 + 4013 + for (i = 0, pp = sys.pmt_pp; pp; i++, pp = pp->next) 4014 + p->pmt_counter[i] = pmt_read_counter(pp, p->package_id); 4015 + 4649 4016 done: 4650 4017 gettimeofday(&t->tv_end, (struct timezone *)NULL); 4651 4018 ··· 5116 4469 fd_percpu = NULL; 5117 4470 } 5118 4471 5119 - void free_fd_amperf_percpu(void) 5120 - { 5121 - int i; 5122 - 5123 - if (!fd_amperf_percpu) 5124 - return; 5125 - 5126 - for (i = 0; i < topo.max_cpu_num + 1; ++i) { 5127 - if (fd_amperf_percpu[i].mperf != 0) 5128 - close(fd_amperf_percpu[i].mperf); 5129 - 5130 - if (fd_amperf_percpu[i].aperf != 0) 5131 - close(fd_amperf_percpu[i].aperf); 5132 - } 5133 - 5134 - free(fd_amperf_percpu); 5135 - fd_amperf_percpu = NULL; 5136 - } 5137 - 5138 4472 void 
free_fd_instr_count_percpu(void) 5139 4473 { 5140 4474 if (!fd_instr_count_percpu) ··· 5150 4522 ccstate_counter_info_size = 0; 5151 4523 } 5152 4524 4525 + void free_fd_msr(void) 4526 + { 4527 + if (!msr_counter_info) 4528 + return; 4529 + 4530 + for (int cpu = 0; cpu < topo.max_cpu_num; ++cpu) { 4531 + if (msr_counter_info[cpu].fd_perf != -1) 4532 + close(msr_counter_info[cpu].fd_perf); 4533 + } 4534 + 4535 + free(msr_counter_info); 4536 + msr_counter_info = NULL; 4537 + msr_counter_info_size = 0; 4538 + } 4539 + 5153 4540 void free_fd_rapl_percpu(void) 5154 4541 { 5155 4542 if (!rapl_counter_info_perdomain) ··· 5180 4537 free(rapl_counter_info_perdomain); 5181 4538 rapl_counter_info_perdomain = NULL; 5182 4539 rapl_counter_info_perdomain_size = 0; 4540 + } 4541 + 4542 + void free_fd_added_perf_counters_(struct perf_counter_info *pp) 4543 + { 4544 + if (!pp) 4545 + return; 4546 + 4547 + if (!pp->fd_perf_per_domain) 4548 + return; 4549 + 4550 + while (pp) { 4551 + for (size_t domain = 0; domain < pp->num_domains; ++domain) { 4552 + if (pp->fd_perf_per_domain[domain] != -1) { 4553 + close(pp->fd_perf_per_domain[domain]); 4554 + pp->fd_perf_per_domain[domain] = -1; 4555 + } 4556 + } 4557 + 4558 + free(pp->fd_perf_per_domain); 4559 + pp->fd_perf_per_domain = NULL; 4560 + 4561 + pp = pp->next; 4562 + } 4563 + } 4564 + 4565 + void free_fd_added_perf_counters(void) 4566 + { 4567 + free_fd_added_perf_counters_(sys.perf_tp); 4568 + free_fd_added_perf_counters_(sys.perf_cp); 4569 + free_fd_added_perf_counters_(sys.perf_pp); 5183 4570 } 5184 4571 5185 4572 void free_all_buffers(void) ··· 5254 4581 5255 4582 free_fd_percpu(); 5256 4583 free_fd_instr_count_percpu(); 5257 - free_fd_amperf_percpu(); 4584 + free_fd_msr(); 5258 4585 free_fd_rapl_percpu(); 5259 4586 free_fd_cstate(); 4587 + free_fd_added_perf_counters(); 5260 4588 5261 4589 free(irq_column_2_cpu); 5262 4590 free(irqs_per_cpu); ··· 5592 4918 } 5593 4919 5594 4920 void linux_perf_init(void); 4921 + void 
msr_perf_init(void); 5595 4922 void rapl_perf_init(void); 5596 4923 void cstate_perf_init(void); 4924 + void added_perf_counters_init(void); 4925 + void pmt_init(void); 5597 4926 5598 4927 void re_initialize(void) 5599 4928 { 5600 4929 free_all_buffers(); 5601 4930 setup_all_buffers(false); 5602 4931 linux_perf_init(); 4932 + msr_perf_init(); 5603 4933 rapl_perf_init(); 5604 4934 cstate_perf_init(); 4935 + added_perf_counters_init(); 4936 + pmt_init(); 5605 4937 fprintf(outf, "turbostat: re-initialized with num_cpus %d, allowed_cpus %d\n", topo.num_cpus, 5606 4938 topo.allowed_cpus); 5607 4939 } ··· 7459 6779 return has_access; 7460 6780 } 7461 6781 7462 - bool is_aperf_access_required(void) 7463 - { 7464 - return BIC_IS_ENABLED(BIC_Avg_MHz) 7465 - || BIC_IS_ENABLED(BIC_Busy) 7466 - || BIC_IS_ENABLED(BIC_Bzy_MHz) 7467 - || BIC_IS_ENABLED(BIC_IPC) 7468 - || BIC_IS_ENABLED(BIC_CPU_c1); 7469 - } 7470 - 7471 6782 int add_rapl_perf_counter_(int cpu, struct rapl_counter_info_t *rci, const struct rapl_counter_arch_info *cai, 7472 6783 double *scale_, enum rapl_unit *unit_) 7473 6784 { 7474 6785 if (no_perf) 7475 6786 return -1; 7476 6787 7477 - const double scale = read_perf_rapl_scale(cai->perf_subsys, cai->perf_name); 6788 + const double scale = read_perf_scale(cai->perf_subsys, cai->perf_name); 7478 6789 7479 6790 if (scale == 0.0) 7480 6791 return -1; ··· 7476 6805 return -1; 7477 6806 7478 6807 const unsigned int rapl_type = read_perf_type(cai->perf_subsys); 7479 - const unsigned int rapl_energy_pkg_config = read_rapl_config(cai->perf_subsys, cai->perf_name); 6808 + const unsigned int rapl_energy_pkg_config = read_perf_config(cai->perf_subsys, cai->perf_name); 7480 6809 7481 6810 const int fd_counter = 7482 6811 open_perf_counter(cpu, rapl_type, rapl_energy_pkg_config, rci->fd_perf, PERF_FORMAT_GROUP); ··· 7497 6826 { 7498 6827 int ret = add_rapl_perf_counter_(cpu, rci, cai, scale, unit); 7499 6828 7500 - if (debug) 6829 + if (debug >= 2) 7501 6830 fprintf(stderr, 
"%s: %d (cpu: %d)\n", __func__, ret, cpu); 7502 6831 7503 6832 return ret; ··· 7516 6845 fd_instr_count_percpu = calloc(topo.max_cpu_num + 1, sizeof(int)); 7517 6846 if (fd_instr_count_percpu == NULL) 7518 6847 err(-1, "calloc fd_instr_count_percpu"); 7519 - } 7520 - 7521 - const bool aperf_required = is_aperf_access_required(); 7522 - 7523 - if (aperf_required && has_aperf && amperf_source == AMPERF_SOURCE_PERF) { 7524 - fd_amperf_percpu = calloc(topo.max_cpu_num + 1, sizeof(*fd_amperf_percpu)); 7525 - if (fd_amperf_percpu == NULL) 7526 - err(-1, "calloc fd_amperf_percpu"); 7527 6848 } 7528 6849 } 7529 6850 ··· 7538 6875 rci->fd_perf = -1; 7539 6876 for (size_t i = 0; i < NUM_RAPL_COUNTERS; ++i) { 7540 6877 rci->data[i] = 0; 7541 - rci->source[i] = RAPL_SOURCE_NONE; 6878 + rci->source[i] = COUNTER_SOURCE_NONE; 7542 6879 } 7543 6880 } 7544 6881 ··· 7580 6917 /* Use perf API for this counter */ 7581 6918 if (!no_perf && cai->perf_name 7582 6919 && add_rapl_perf_counter(cpu, rci, cai, &scale, &unit) != -1) { 7583 - rci->source[cai->rci_index] = RAPL_SOURCE_PERF; 6920 + rci->source[cai->rci_index] = COUNTER_SOURCE_PERF; 7584 6921 rci->scale[cai->rci_index] = scale * cai->compat_scale; 7585 6922 rci->unit[cai->rci_index] = unit; 7586 6923 rci->flags[cai->rci_index] = cai->flags; 7587 6924 7588 6925 /* Use MSR for this counter */ 7589 6926 } else if (!no_msr && cai->msr && probe_msr(cpu, cai->msr) == 0) { 7590 - rci->source[cai->rci_index] = RAPL_SOURCE_MSR; 6927 + rci->source[cai->rci_index] = COUNTER_SOURCE_MSR; 7591 6928 rci->msr[cai->rci_index] = cai->msr; 7592 6929 rci->msr_mask[cai->rci_index] = cai->msr_mask; 7593 6930 rci->msr_shift[cai->rci_index] = cai->msr_shift; ··· 7597 6934 } 7598 6935 } 7599 6936 7600 - if (rci->source[cai->rci_index] != RAPL_SOURCE_NONE) 6937 + if (rci->source[cai->rci_index] != COUNTER_SOURCE_NONE) 7601 6938 has_counter = 1; 7602 6939 } 7603 6940 ··· 7609 6946 free(domain_visited); 7610 6947 } 7611 6948 7612 - static int 
has_amperf_access_via_msr(void) 7613 - { 7614 - if (no_msr) 7615 - return 0; 7616 - 7617 - if (probe_msr(base_cpu, MSR_IA32_APERF)) 7618 - return 0; 7619 - 7620 - if (probe_msr(base_cpu, MSR_IA32_MPERF)) 7621 - return 0; 7622 - 7623 - return 1; 7624 - } 7625 - 7626 - static int has_amperf_access_via_perf(void) 7627 - { 7628 - struct amperf_group_fd fds; 7629 - 7630 - /* 7631 - * Cache the last result, so we don't warn the user multiple times 7632 - * 7633 - * Negative means cached, no access 7634 - * Zero means not cached 7635 - * Positive means cached, has access 7636 - */ 7637 - static int has_access_cached; 7638 - 7639 - if (no_perf) 7640 - return 0; 7641 - 7642 - if (has_access_cached != 0) 7643 - return has_access_cached > 0; 7644 - 7645 - fds = open_amperf_fd(base_cpu); 7646 - has_access_cached = (fds.aperf != -1) && (fds.mperf != -1); 7647 - 7648 - if (fds.aperf == -1) 7649 - warnx("Failed to access %s. Some of the counters may not be available\n" 7650 - "\tRun as root to enable them or use %s to disable the access explicitly", 7651 - "APERF perf counter", "--no-perf"); 7652 - else 7653 - close(fds.aperf); 7654 - 7655 - if (fds.mperf == -1) 7656 - warnx("Failed to access %s. 
Some of the counters may not be available\n" 7657 - "\tRun as root to enable them or use %s to disable the access explicitly", 7658 - "MPERF perf counter", "--no-perf"); 7659 - else 7660 - close(fds.mperf); 7661 - 7662 - if (has_access_cached == 0) 7663 - has_access_cached = -1; 7664 - 7665 - return has_access_cached > 0; 7666 - } 7667 - 7668 - /* Check if we can access APERF and MPERF */ 6949 + /* Assumes msr_counter_info is populated */ 7669 6950 static int has_amperf_access(void) 7670 6951 { 7671 - if (!is_aperf_access_required()) 7672 - return 0; 7673 - 7674 - if (!no_msr && has_amperf_access_via_msr()) 7675 - return 1; 7676 - 7677 - if (!no_perf && has_amperf_access_via_perf()) 7678 - return 1; 7679 - 7680 - return 0; 6952 + return msr_counter_arch_infos[MSR_ARCH_INFO_APERF_INDEX].present && 6953 + msr_counter_arch_infos[MSR_ARCH_INFO_MPERF_INDEX].present; 7681 6954 } 7682 6955 7683 6956 int *get_cstate_perf_group_fd(struct cstate_counter_info_t *cci, const char *group_name) ··· 7638 7039 return -1; 7639 7040 7640 7041 const unsigned int type = read_perf_type(cai->perf_subsys); 7641 - const unsigned int config = read_rapl_config(cai->perf_subsys, cai->perf_name); 7042 + const unsigned int config = read_perf_config(cai->perf_subsys, cai->perf_name); 7642 7043 7643 7044 const int fd_counter = open_perf_counter(cpu, type, config, *pfd_group, PERF_FORMAT_GROUP); 7644 7045 ··· 7656 7057 { 7657 7058 int ret = add_cstate_perf_counter_(cpu, cci, cai); 7658 7059 7659 - if (debug) 7060 + if (debug >= 2) 7660 7061 fprintf(stderr, "%s: %d (cpu: %d)\n", __func__, ret, cpu); 7661 7062 7662 7063 return ret; 7064 + } 7065 + 7066 + int add_msr_perf_counter_(int cpu, struct msr_counter_info_t *cci, const struct msr_counter_arch_info *cai) 7067 + { 7068 + if (no_perf) 7069 + return -1; 7070 + 7071 + const unsigned int type = read_perf_type(cai->perf_subsys); 7072 + const unsigned int config = read_perf_config(cai->perf_subsys, cai->perf_name); 7073 + 7074 + const int fd_counter 
= open_perf_counter(cpu, type, config, cci->fd_perf, PERF_FORMAT_GROUP); 7075 + 7076 + if (fd_counter == -1) 7077 + return -1; 7078 + 7079 + /* If it's the first counter opened, make it a group descriptor */ 7080 + if (cci->fd_perf == -1) 7081 + cci->fd_perf = fd_counter; 7082 + 7083 + return fd_counter; 7084 + } 7085 + 7086 + int add_msr_perf_counter(int cpu, struct msr_counter_info_t *cci, const struct msr_counter_arch_info *cai) 7087 + { 7088 + int ret = add_msr_perf_counter_(cpu, cci, cai); 7089 + 7090 + if (debug) 7091 + fprintf(stderr, "%s: %s/%s: %d (cpu: %d)\n", __func__, cai->perf_subsys, cai->perf_name, ret, cpu); 7092 + 7093 + return ret; 7094 + } 7095 + 7096 + void msr_perf_init_(void) 7097 + { 7098 + const int mci_num = topo.max_cpu_num + 1; 7099 + 7100 + msr_counter_info = calloc(mci_num, sizeof(*msr_counter_info)); 7101 + if (!msr_counter_info) 7102 + err(1, "calloc msr_counter_info"); 7103 + msr_counter_info_size = mci_num; 7104 + 7105 + for (int cpu = 0; cpu < mci_num; ++cpu) 7106 + msr_counter_info[cpu].fd_perf = -1; 7107 + 7108 + for (int cidx = 0; cidx < NUM_MSR_COUNTERS; ++cidx) { 7109 + 7110 + struct msr_counter_arch_info *cai = &msr_counter_arch_infos[cidx]; 7111 + 7112 + cai->present = false; 7113 + 7114 + for (int cpu = 0; cpu < mci_num; ++cpu) { 7115 + 7116 + struct msr_counter_info_t *const cci = &msr_counter_info[cpu]; 7117 + 7118 + if (cpu_is_not_allowed(cpu)) 7119 + continue; 7120 + 7121 + if (cai->needed) { 7122 + /* Use perf API for this counter */ 7123 + if (!no_perf && cai->perf_name && add_msr_perf_counter(cpu, cci, cai) != -1) { 7124 + cci->source[cai->rci_index] = COUNTER_SOURCE_PERF; 7125 + cai->present = true; 7126 + 7127 + /* User MSR for this counter */ 7128 + } else if (!no_msr && cai->msr && probe_msr(cpu, cai->msr) == 0) { 7129 + cci->source[cai->rci_index] = COUNTER_SOURCE_MSR; 7130 + cci->msr[cai->rci_index] = cai->msr; 7131 + cci->msr_mask[cai->rci_index] = cai->msr_mask; 7132 + cai->present = true; 7133 + } 7134 + } 
7135 + } 7136 + } 7137 + } 7138 + 7139 + /* Initialize data for reading perf counters from the MSR group. */ 7140 + void msr_perf_init(void) 7141 + { 7142 + bool need_amperf = false, need_smi = false; 7143 + const bool need_soft_c1 = (!platform->has_msr_core_c1_res) && (platform->supported_cstates & CC1); 7144 + 7145 + need_amperf = BIC_IS_ENABLED(BIC_Avg_MHz) || BIC_IS_ENABLED(BIC_Busy) || BIC_IS_ENABLED(BIC_Bzy_MHz) 7146 + || BIC_IS_ENABLED(BIC_IPC) || need_soft_c1; 7147 + 7148 + if (BIC_IS_ENABLED(BIC_SMI)) 7149 + need_smi = true; 7150 + 7151 + /* Enable needed counters */ 7152 + msr_counter_arch_infos[MSR_ARCH_INFO_APERF_INDEX].needed = need_amperf; 7153 + msr_counter_arch_infos[MSR_ARCH_INFO_MPERF_INDEX].needed = need_amperf; 7154 + msr_counter_arch_infos[MSR_ARCH_INFO_SMI_INDEX].needed = need_smi; 7155 + 7156 + msr_perf_init_(); 7157 + 7158 + const bool has_amperf = has_amperf_access(); 7159 + const bool has_smi = msr_counter_arch_infos[MSR_ARCH_INFO_SMI_INDEX].present; 7160 + 7161 + has_aperf_access = has_amperf; 7162 + 7163 + if (has_amperf) { 7164 + BIC_PRESENT(BIC_Avg_MHz); 7165 + BIC_PRESENT(BIC_Busy); 7166 + BIC_PRESENT(BIC_Bzy_MHz); 7167 + BIC_PRESENT(BIC_SMI); 7168 + } 7169 + 7170 + if (has_smi) 7171 + BIC_PRESENT(BIC_SMI); 7663 7172 } 7664 7173 7665 7174 void cstate_perf_init_(bool soft_c1) ··· 7834 7127 /* Use perf API for this counter */ 7835 7128 if (!no_perf && cai->perf_name && add_cstate_perf_counter(cpu, cci, cai) != -1) { 7836 7129 7837 - cci->source[cai->rci_index] = CSTATE_SOURCE_PERF; 7130 + cci->source[cai->rci_index] = COUNTER_SOURCE_PERF; 7838 7131 7839 7132 /* User MSR for this counter */ 7840 7133 } else if (!no_msr && cai->msr && pkg_cstate_limit >= cai->pkg_cstate_limit 7841 7134 && probe_msr(cpu, cai->msr) == 0) { 7842 - cci->source[cai->rci_index] = CSTATE_SOURCE_MSR; 7135 + cci->source[cai->rci_index] = COUNTER_SOURCE_MSR; 7843 7136 cci->msr[cai->rci_index] = cai->msr; 7844 7137 } 7845 7138 } 7846 7139 7847 - if 
(cci->source[cai->rci_index] != CSTATE_SOURCE_NONE) { 7140 + if (cci->source[cai->rci_index] != COUNTER_SOURCE_NONE) { 7848 7141 has_counter = true; 7849 7142 cores_visited[core_id] = true; 7850 7143 pkg_visited[pkg_id] = true; ··· 8027 7320 8028 7321 __cpuid(0x6, eax, ebx, ecx, edx); 8029 7322 has_aperf = ecx & (1 << 0); 8030 - if (has_aperf && has_amperf_access()) { 8031 - BIC_PRESENT(BIC_Avg_MHz); 8032 - BIC_PRESENT(BIC_Busy); 8033 - BIC_PRESENT(BIC_Bzy_MHz); 8034 - BIC_PRESENT(BIC_IPC); 8035 - } 8036 7323 do_dts = eax & (1 << 0); 8037 7324 if (do_dts) 8038 7325 BIC_PRESENT(BIC_CoreTmp); ··· 8142 7441 8143 7442 if (platform->has_msr_atom_pkg_c6_residency && cai->msr == MSR_PKG_C6_RESIDENCY) 8144 7443 cai->msr = MSR_ATOM_PKG_C6_RESIDENCY; 7444 + } 7445 + 7446 + for (int i = 0; i < NUM_MSR_COUNTERS; ++i) { 7447 + msr_counter_arch_infos[i].present = false; 7448 + msr_counter_arch_infos[i].needed = false; 8145 7449 } 8146 7450 } 8147 7451 ··· 8523 7817 err(-ENODEV, "No valid cpus found"); 8524 7818 } 8525 7819 8526 - static void set_amperf_source(void) 8527 - { 8528 - amperf_source = AMPERF_SOURCE_PERF; 8529 - 8530 - const bool aperf_required = is_aperf_access_required(); 8531 - 8532 - if (no_perf || !aperf_required || !has_amperf_access_via_perf()) 8533 - amperf_source = AMPERF_SOURCE_MSR; 8534 - 8535 - if (quiet || !debug) 8536 - return; 8537 - 8538 - fprintf(outf, "aperf/mperf source preference: %s\n", amperf_source == AMPERF_SOURCE_MSR ? 
"msr" : "perf"); 8539 - } 8540 - 8541 7820 bool has_added_counters(void) 8542 7821 { 8543 7822 /* ··· 8533 7842 return sys.added_core_counters | sys.added_thread_counters | sys.added_package_counters; 8534 7843 } 8535 7844 8536 - bool is_msr_access_required(void) 8537 - { 8538 - if (no_msr) 8539 - return false; 8540 - 8541 - if (has_added_counters()) 8542 - return true; 8543 - 8544 - return BIC_IS_ENABLED(BIC_SMI) 8545 - || BIC_IS_ENABLED(BIC_CPU_c1) 8546 - || BIC_IS_ENABLED(BIC_CPU_c3) 8547 - || BIC_IS_ENABLED(BIC_CPU_c6) 8548 - || BIC_IS_ENABLED(BIC_CPU_c7) 8549 - || BIC_IS_ENABLED(BIC_Mod_c6) 8550 - || BIC_IS_ENABLED(BIC_CoreTmp) 8551 - || BIC_IS_ENABLED(BIC_Totl_c0) 8552 - || BIC_IS_ENABLED(BIC_Any_c0) 8553 - || BIC_IS_ENABLED(BIC_GFX_c0) 8554 - || BIC_IS_ENABLED(BIC_CPUGFX) 8555 - || BIC_IS_ENABLED(BIC_Pkgpc3) 8556 - || BIC_IS_ENABLED(BIC_Pkgpc6) 8557 - || BIC_IS_ENABLED(BIC_Pkgpc2) 8558 - || BIC_IS_ENABLED(BIC_Pkgpc7) 8559 - || BIC_IS_ENABLED(BIC_Pkgpc8) 8560 - || BIC_IS_ENABLED(BIC_Pkgpc9) 8561 - || BIC_IS_ENABLED(BIC_Pkgpc10) 8562 - /* TODO: Multiplex access with perf */ 8563 - || BIC_IS_ENABLED(BIC_CorWatt) 8564 - || BIC_IS_ENABLED(BIC_Cor_J) 8565 - || BIC_IS_ENABLED(BIC_PkgWatt) 8566 - || BIC_IS_ENABLED(BIC_CorWatt) 8567 - || BIC_IS_ENABLED(BIC_GFXWatt) 8568 - || BIC_IS_ENABLED(BIC_RAMWatt) 8569 - || BIC_IS_ENABLED(BIC_Pkg_J) 8570 - || BIC_IS_ENABLED(BIC_Cor_J) 8571 - || BIC_IS_ENABLED(BIC_GFX_J) 8572 - || BIC_IS_ENABLED(BIC_RAM_J) 8573 - || BIC_IS_ENABLED(BIC_PKG__) 8574 - || BIC_IS_ENABLED(BIC_RAM__) 8575 - || BIC_IS_ENABLED(BIC_PkgTmp) 8576 - || (is_aperf_access_required() && !has_amperf_access_via_perf()); 8577 - } 8578 - 8579 7845 void check_msr_access(void) 8580 7846 { 8581 - if (!is_msr_access_required()) 8582 - no_msr = 1; 8583 - 8584 7847 check_dev_msr(); 8585 7848 check_msr_permission(); 8586 7849 ··· 8544 7899 8545 7900 void check_perf_access(void) 8546 7901 { 8547 - const bool intrcount_required = BIC_IS_ENABLED(BIC_IPC); 8548 - 8549 - if 
(no_perf || !intrcount_required || !has_instr_count_access()) 7902 + if (no_perf || !BIC_IS_ENABLED(BIC_IPC) || !has_instr_count_access()) 8550 7903 bic_enabled &= ~BIC_IPC; 7904 + } 8551 7905 8552 - const bool aperf_required = is_aperf_access_required(); 7906 + int added_perf_counters_init_(struct perf_counter_info *pinfo) 7907 + { 7908 + size_t num_domains = 0; 7909 + unsigned int next_domain; 7910 + bool *domain_visited; 7911 + unsigned int perf_type, perf_config; 7912 + double perf_scale; 7913 + int fd_perf; 8553 7914 8554 - if (!aperf_required || !has_amperf_access()) { 8555 - bic_enabled &= ~BIC_Avg_MHz; 8556 - bic_enabled &= ~BIC_Busy; 8557 - bic_enabled &= ~BIC_Bzy_MHz; 8558 - bic_enabled &= ~BIC_IPC; 7915 + if (!pinfo) 7916 + return 0; 7917 + 7918 + const size_t max_num_domains = MAX(topo.max_cpu_num + 1, MAX(topo.max_core_id + 1, topo.max_package_id + 1)); 7919 + 7920 + domain_visited = calloc(max_num_domains, sizeof(*domain_visited)); 7921 + 7922 + while (pinfo) { 7923 + switch (pinfo->scope) { 7924 + case SCOPE_CPU: 7925 + num_domains = topo.max_cpu_num + 1; 7926 + break; 7927 + 7928 + case SCOPE_CORE: 7929 + num_domains = topo.max_core_id + 1; 7930 + break; 7931 + 7932 + case SCOPE_PACKAGE: 7933 + num_domains = topo.max_package_id + 1; 7934 + break; 7935 + } 7936 + 7937 + /* Allocate buffer for file descriptor for each domain. 
*/ 7938 + pinfo->fd_perf_per_domain = calloc(num_domains, sizeof(*pinfo->fd_perf_per_domain)); 7939 + if (!pinfo->fd_perf_per_domain) 7940 + errx(1, "%s: alloc %s", __func__, "fd_perf_per_domain"); 7941 + 7942 + for (size_t i = 0; i < num_domains; ++i) 7943 + pinfo->fd_perf_per_domain[i] = -1; 7944 + 7945 + pinfo->num_domains = num_domains; 7946 + pinfo->scale = 1.0; 7947 + 7948 + memset(domain_visited, 0, max_num_domains * sizeof(*domain_visited)); 7949 + 7950 + for (int cpu = 0; cpu < topo.max_cpu_num + 1; ++cpu) { 7951 + 7952 + next_domain = cpu_to_domain(pinfo, cpu); 7953 + 7954 + assert(next_domain < num_domains); 7955 + 7956 + if (cpu_is_not_allowed(cpu)) 7957 + continue; 7958 + 7959 + if (domain_visited[next_domain]) 7960 + continue; 7961 + 7962 + perf_type = read_perf_type(pinfo->device); 7963 + if (perf_type == (unsigned int)-1) { 7964 + warnx("%s: perf/%s/%s: failed to read %s", 7965 + __func__, pinfo->device, pinfo->event, "type"); 7966 + continue; 7967 + } 7968 + 7969 + perf_config = read_perf_config(pinfo->device, pinfo->event); 7970 + if (perf_config == (unsigned int)-1) { 7971 + warnx("%s: perf/%s/%s: failed to read %s", 7972 + __func__, pinfo->device, pinfo->event, "config"); 7973 + continue; 7974 + } 7975 + 7976 + /* Scale is not required, some counters just don't have it. 
*/ 7977 + perf_scale = read_perf_scale(pinfo->device, pinfo->event); 7978 + if (perf_scale == 0.0) 7979 + perf_scale = 1.0; 7980 + 7981 + fd_perf = open_perf_counter(cpu, perf_type, perf_config, -1, 0); 7982 + if (fd_perf == -1) { 7983 + warnx("%s: perf/%s/%s: failed to open counter on cpu%d", 7984 + __func__, pinfo->device, pinfo->event, cpu); 7985 + continue; 7986 + } 7987 + 7988 + domain_visited[next_domain] = 1; 7989 + pinfo->fd_perf_per_domain[next_domain] = fd_perf; 7990 + pinfo->scale = perf_scale; 7991 + 7992 + if (debug) 7993 + fprintf(stderr, "Add perf/%s/%s cpu%d: %d\n", 7994 + pinfo->device, pinfo->event, cpu, pinfo->fd_perf_per_domain[next_domain]); 7995 + } 7996 + 7997 + pinfo = pinfo->next; 7998 + } 7999 + 8000 + free(domain_visited); 8001 + 8002 + return 0; 8003 + } 8004 + 8005 + void added_perf_counters_init(void) 8006 + { 8007 + if (added_perf_counters_init_(sys.perf_tp)) 8008 + errx(1, "%s: %s", __func__, "thread"); 8009 + 8010 + if (added_perf_counters_init_(sys.perf_cp)) 8011 + errx(1, "%s: %s", __func__, "core"); 8012 + 8013 + if (added_perf_counters_init_(sys.perf_pp)) 8014 + errx(1, "%s: %s", __func__, "package"); 8015 + } 8016 + 8017 + int parse_telem_info_file(int fd_dir, const char *info_filename, const char *format, unsigned long *output) 8018 + { 8019 + int fd_telem_info; 8020 + FILE *file_telem_info; 8021 + unsigned long value; 8022 + 8023 + fd_telem_info = openat(fd_dir, info_filename, O_RDONLY); 8024 + if (fd_telem_info == -1) 8025 + return -1; 8026 + 8027 + file_telem_info = fdopen(fd_telem_info, "r"); 8028 + if (file_telem_info == NULL) { 8029 + close(fd_telem_info); 8030 + return -1; 8031 + } 8032 + 8033 + if (fscanf(file_telem_info, format, &value) != 1) { 8034 + fclose(file_telem_info); 8035 + return -1; 8036 + } 8037 + 8038 + fclose(file_telem_info); 8039 + 8040 + *output = value; 8041 + 8042 + return 0; 8043 + } 8044 + 8045 + struct pmt_mmio *pmt_mmio_open(unsigned int target_guid) 8046 + { 8047 + DIR *dirp; 8048 + struct 
dirent *entry; 8049 + struct stat st; 8050 + unsigned int telem_idx; 8051 + int fd_telem_dir, fd_pmt; 8052 + unsigned long guid, size, offset; 8053 + size_t mmap_size; 8054 + void *mmio; 8055 + struct pmt_mmio *ret = NULL; 8056 + 8057 + if (stat(SYSFS_TELEM_PATH, &st) == -1) 8058 + return NULL; 8059 + 8060 + dirp = opendir(SYSFS_TELEM_PATH); 8061 + if (dirp == NULL) 8062 + return NULL; 8063 + 8064 + for (;;) { 8065 + entry = readdir(dirp); 8066 + 8067 + if (entry == NULL) 8068 + break; 8069 + 8070 + if (strcmp(entry->d_name, ".") == 0) 8071 + continue; 8072 + 8073 + if (strcmp(entry->d_name, "..") == 0) 8074 + continue; 8075 + 8076 + if (sscanf(entry->d_name, "telem%u", &telem_idx) != 1) 8077 + continue; 8078 + 8079 + if (fstatat(dirfd(dirp), entry->d_name, &st, 0) == -1) { 8080 + break; 8081 + } 8082 + 8083 + if (!S_ISDIR(st.st_mode)) 8084 + continue; 8085 + 8086 + fd_telem_dir = openat(dirfd(dirp), entry->d_name, O_RDONLY); 8087 + if (fd_telem_dir == -1) { 8088 + break; 8089 + } 8090 + 8091 + if (parse_telem_info_file(fd_telem_dir, "guid", "%lx", &guid)) { 8092 + close(fd_telem_dir); 8093 + break; 8094 + } 8095 + 8096 + if (parse_telem_info_file(fd_telem_dir, "size", "%lu", &size)) { 8097 + close(fd_telem_dir); 8098 + break; 8099 + } 8100 + 8101 + if (guid != target_guid) { 8102 + close(fd_telem_dir); 8103 + continue; 8104 + } 8105 + 8106 + if (parse_telem_info_file(fd_telem_dir, "offset", "%lu", &offset)) { 8107 + close(fd_telem_dir); 8108 + break; 8109 + } 8110 + 8111 + assert(offset == 0); 8112 + 8113 + fd_pmt = openat(fd_telem_dir, "telem", O_RDONLY); 8114 + if (fd_pmt == -1) 8115 + goto loop_cleanup_and_break; 8116 + 8117 + mmap_size = (size + 0x1000UL) & (~0x1000UL); 8118 + mmio = mmap(0, mmap_size, PROT_READ, MAP_SHARED, fd_pmt, 0); 8119 + if (mmio != MAP_FAILED) { 8120 + 8121 + if (debug) 8122 + fprintf(stderr, "%s: 0x%lx mmaped at: %p\n", __func__, guid, mmio); 8123 + 8124 + ret = calloc(1, sizeof(*ret)); 8125 + 8126 + if (!ret) { 8127 + fprintf(stderr, 
"%s: Failed to allocate pmt_mmio\n", __func__); 8128 + exit(1); 8129 + } 8130 + 8131 + ret->guid = guid; 8132 + ret->mmio_base = mmio; 8133 + ret->pmt_offset = offset; 8134 + ret->size = size; 8135 + 8136 + ret->next = pmt_mmios; 8137 + pmt_mmios = ret; 8138 + } 8139 + 8140 + loop_cleanup_and_break: 8141 + close(fd_pmt); 8142 + close(fd_telem_dir); 8143 + break; 8144 + } 8145 + 8146 + closedir(dirp); 8147 + 8148 + return ret; 8149 + } 8150 + 8151 + struct pmt_mmio *pmt_mmio_find(unsigned int guid) 8152 + { 8153 + struct pmt_mmio *pmmio = pmt_mmios; 8154 + 8155 + while (pmmio) { 8156 + if (pmmio->guid == guid) 8157 + return pmmio; 8158 + 8159 + pmmio = pmmio->next; 8160 + } 8161 + 8162 + return NULL; 8163 + } 8164 + 8165 + void *pmt_get_counter_pointer(struct pmt_mmio *pmmio, unsigned long counter_offset) 8166 + { 8167 + char *ret; 8168 + 8169 + /* Get base of mmaped PMT file. */ 8170 + ret = (char *)pmmio->mmio_base; 8171 + 8172 + /* 8173 + * Apply PMT MMIO offset to obtain beginning of the mmaped telemetry data. 8174 + * It's not guaranteed that the mmaped memory begins with the telemetry data 8175 + * - we might have to apply the offset first. 8176 + */ 8177 + ret += pmmio->pmt_offset; 8178 + 8179 + /* Apply the counter offset to get the address to the mmaped counter. */ 8180 + ret += counter_offset; 8181 + 8182 + return ret; 8183 + } 8184 + 8185 + struct pmt_mmio *pmt_add_guid(unsigned int guid) 8186 + { 8187 + struct pmt_mmio *ret; 8188 + 8189 + ret = pmt_mmio_find(guid); 8190 + if (!ret) 8191 + ret = pmt_mmio_open(guid); 8192 + 8193 + return ret; 8194 + } 8195 + 8196 + enum pmt_open_mode { 8197 + PMT_OPEN_TRY, /* Open failure is not an error. */ 8198 + PMT_OPEN_REQUIRED, /* Open failure is a fatal error. 
*/ 8199 + }; 8200 + 8201 + struct pmt_counter *pmt_find_counter(struct pmt_counter *pcounter, const char *name) 8202 + { 8203 + while (pcounter) { 8204 + if (strcmp(pcounter->name, name) == 0) 8205 + break; 8206 + 8207 + pcounter = pcounter->next; 8208 + } 8209 + 8210 + return pcounter; 8211 + } 8212 + 8213 + struct pmt_counter **pmt_get_scope_root(enum counter_scope scope) 8214 + { 8215 + switch (scope) { 8216 + case SCOPE_CPU: 8217 + return &sys.pmt_tp; 8218 + case SCOPE_CORE: 8219 + return &sys.pmt_cp; 8220 + case SCOPE_PACKAGE: 8221 + return &sys.pmt_pp; 8222 + } 8223 + 8224 + __builtin_unreachable(); 8225 + } 8226 + 8227 + void pmt_counter_add_domain(struct pmt_counter *pcounter, unsigned long *pmmio, unsigned int domain_id) 8228 + { 8229 + /* Make sure the new domain fits. */ 8230 + if (domain_id >= pcounter->num_domains) 8231 + pmt_counter_resize(pcounter, domain_id + 1); 8232 + 8233 + assert(pcounter->domains); 8234 + assert(domain_id < pcounter->num_domains); 8235 + 8236 + pcounter->domains[domain_id].pcounter = pmmio; 8237 + } 8238 + 8239 + int pmt_add_counter(unsigned int guid, const char *name, enum pmt_datatype type, 8240 + unsigned int lsb, unsigned int msb, unsigned int offset, enum counter_scope scope, 8241 + enum counter_format format, unsigned int domain_id, enum pmt_open_mode mode) 8242 + { 8243 + struct pmt_mmio *mmio; 8244 + struct pmt_counter *pcounter; 8245 + struct pmt_counter **const pmt_root = pmt_get_scope_root(scope); 8246 + bool new_counter = false; 8247 + int conflict = 0; 8248 + 8249 + if (lsb > msb) { 8250 + fprintf(stderr, "%s: %s: `%s` must be satisfied\n", __func__, "lsb <= msb", name); 8251 + exit(1); 8252 + } 8253 + 8254 + if (msb >= 64) { 8255 + fprintf(stderr, "%s: %s: `%s` must be satisfied\n", __func__, "msb < 64", name); 8256 + exit(1); 8257 + } 8258 + 8259 + mmio = pmt_add_guid(guid); 8260 + if (!mmio) { 8261 + if (mode != PMT_OPEN_TRY) { 8262 + fprintf(stderr, "%s: failed to map PMT MMIO for guid %x\n", __func__, guid); 
+			exit(1);
+		}
+
+		return 1;
+	}
+
+	if (offset >= mmio->size) {
+		if (mode != PMT_OPEN_TRY) {
+			fprintf(stderr, "%s: offset %u outside of PMT MMIO size %u\n", __func__, offset, mmio->size);
+			exit(1);
+		}
+
+		return 1;
+	}
+
+	pcounter = pmt_find_counter(*pmt_root, name);
+	if (!pcounter) {
+		pcounter = calloc(1, sizeof(*pcounter));
+		new_counter = true;
+	}
+
+	if (new_counter) {
+		strncpy(pcounter->name, name, ARRAY_SIZE(pcounter->name) - 1);
+		pcounter->type = type;
+		pcounter->scope = scope;
+		pcounter->lsb = lsb;
+		pcounter->msb = msb;
+		pcounter->format = format;
+	} else {
+		conflict += pcounter->type != type;
+		conflict += pcounter->scope != scope;
+		conflict += pcounter->lsb != lsb;
+		conflict += pcounter->msb != msb;
+		conflict += pcounter->format != format;
+	}
+
+	if (conflict) {
+		fprintf(stderr, "%s: conflicting parameters for the PMT counter with the same name %s\n",
+			__func__, name);
+		exit(1);
+	}
+
+	pmt_counter_add_domain(pcounter, pmt_get_counter_pointer(mmio, offset), domain_id);
+
+	if (new_counter) {
+		pcounter->next = *pmt_root;
+		*pmt_root = pcounter;
+	}
+
+	return 0;
+}
+
+void pmt_init(void)
+{
+	if (BIC_IS_ENABLED(BIC_Diec6)) {
+		pmt_add_counter(PMT_MTL_DC6_GUID, "Die%c6", PMT_TYPE_XTAL_TIME, PMT_COUNTER_MTL_DC6_LSB,
+				PMT_COUNTER_MTL_DC6_MSB, PMT_COUNTER_MTL_DC6_OFFSET, SCOPE_PACKAGE, FORMAT_DELTA,
+				0, PMT_OPEN_TRY);
 	}
 }

···
 	process_cpuid();
 	counter_info_init();
 	probe_pm_features();
-	set_amperf_source();
+	msr_perf_init();
 	linux_perf_init();
 	rapl_perf_init();
 	cstate_perf_init();
+	added_perf_counters_init();
+	pmt_init();

 	for_all_cpus(get_cpu_type, ODD_COUNTERS);
 	for_all_cpus(get_cpu_type, EVEN_COUNTERS);

-	if (DO_BIC(BIC_IPC))
-		(void)get_instr_count_fd(base_cpu);
+	if (BIC_IS_ENABLED(BIC_IPC) && has_aperf_access && get_instr_count_fd(base_cpu) != -1)
+		BIC_PRESENT(BIC_IPC);

 	/*
 	 * If TSC tweak is needed, but couldn't get it,
···

 void print_version()
 {
-	fprintf(outf, "turbostat version 2024.05.10 - Len Brown <lenb@kernel.org>\n");
+	fprintf(outf, "turbostat version 2024.07.26 - Len Brown <lenb@kernel.org>\n");
 }

 #define COMMAND_LINE_SIZE 2048
···

 	for (mp = head; mp; mp = mp->next) {
 		if (debug)
-			printf("%s: %s %s\n", __func__, name, mp->name);
+			fprintf(stderr, "%s: %s %s\n", __func__, name, mp->name);
 		if (!strncmp(name, mp->name, strlen(mp->name)))
 			return mp;
 	}
···
 		errx(1, "Requested MSR counter 0x%x, but in --no-msr mode", msr_num);

 	if (debug)
-		printf("%s(msr%d, %s, %s, width%d, scope%d, type%d, format%d, flags%x, id%d)\n", __func__, msr_num,
-		       path, name, width, scope, type, format, flags, id);
+		fprintf(stderr, "%s(msr%d, %s, %s, width%d, scope%d, type%d, format%d, flags%x, id%d)\n",
+			__func__, msr_num, path, name, width, scope, type, format, flags, id);

 	switch (scope) {

···
 		msrp = find_msrp_by_name(sys.tp, name);
 		if (msrp) {
 			if (debug)
-				printf("%s: %s FOUND\n", __func__, name);
+				fprintf(stderr, "%s: %s FOUND\n", __func__, name);
 			break;
 		}
 		if (sys.added_thread_counters++ >= MAX_ADDED_THREAD_COUNTERS) {
···
 		msrp = find_msrp_by_name(sys.cp, name);
 		if (msrp) {
 			if (debug)
-				printf("%s: %s FOUND\n", __func__, name);
+				fprintf(stderr, "%s: %s FOUND\n", __func__, name);
 			break;
 		}
 		if (sys.added_core_counters++ >= MAX_ADDED_CORE_COUNTERS) {
···
 		msrp = find_msrp_by_name(sys.pp, name);
 		if (msrp) {
 			if (debug)
-				printf("%s: %s FOUND\n", __func__, name);
+				fprintf(stderr, "%s: %s FOUND\n", __func__, name);
 			break;
 		}
 		if (sys.added_package_counters++ >= MAX_ADDED_PACKAGE_COUNTERS) {
···
 	msrp = calloc(1, sizeof(struct msr_counter));
 	if (msrp == NULL)
 		err(-1, "calloc msr_counter");
+
 	msrp->msr_num = msr_num;
 	strncpy(msrp->name, name, NAME_BYTES - 1);
 	msrp->width = width;
···
 	return 0;
 }

-void parse_add_command(char *add_command)
+/*
+ * Initialize the fields used for identifying and opening the counter.
+ *
+ * Defer the initialization of any runtime buffers for actually reading
+ * the counters for when we initialize all perf counters, so we can later
+ * easily call re_initialize().
+ */
+struct perf_counter_info *make_perf_counter_info(const char *perf_device,
+						 const char *perf_event,
+						 const char *name,
+						 unsigned int width,
+						 enum counter_scope scope,
+						 enum counter_type type, enum counter_format format)
+{
+	struct perf_counter_info *pinfo;
+
+	pinfo = calloc(1, sizeof(*pinfo));
+	if (!pinfo)
+		errx(1, "%s: Failed to allocate %s/%s\n", __func__, perf_device, perf_event);
+
+	strncpy(pinfo->device, perf_device, ARRAY_SIZE(pinfo->device) - 1);
+	strncpy(pinfo->event, perf_event, ARRAY_SIZE(pinfo->event) - 1);
+
+	strncpy(pinfo->name, name, ARRAY_SIZE(pinfo->name) - 1);
+	pinfo->width = width;
+	pinfo->scope = scope;
+	pinfo->type = type;
+	pinfo->format = format;
+
+	return pinfo;
+}
+
+int add_perf_counter(const char *perf_device, const char *perf_event, const char *name_buffer, unsigned int width,
+		     enum counter_scope scope, enum counter_type type, enum counter_format format)
+{
+	struct perf_counter_info *pinfo;
+
+	switch (scope) {
+	case SCOPE_CPU:
+		if (sys.added_thread_perf_counters >= MAX_ADDED_THREAD_COUNTERS) {
+			warnx("ignoring thread counter perf/%s/%s", perf_device, perf_event);
+			return -1;
+		}
+		break;
+
+	case SCOPE_CORE:
+		if (sys.added_core_perf_counters >= MAX_ADDED_CORE_COUNTERS) {
+			warnx("ignoring core counter perf/%s/%s", perf_device, perf_event);
+			return -1;
+		}
+		break;
+
+	case SCOPE_PACKAGE:
+		if (sys.added_package_perf_counters >= MAX_ADDED_PACKAGE_COUNTERS) {
+			warnx("ignoring package counter perf/%s/%s", perf_device, perf_event);
+			return -1;
+		}
+		break;
+	}
+
+	pinfo = make_perf_counter_info(perf_device, perf_event, name_buffer, width, scope, type, format);
+
+	if (!pinfo)
+		return -1;
+
+	switch (scope) {
+	case SCOPE_CPU:
+		pinfo->next = sys.perf_tp;
+		sys.perf_tp = pinfo;
+		++sys.added_thread_perf_counters;
+		break;
+
+	case SCOPE_CORE:
+		pinfo->next = sys.perf_cp;
+		sys.perf_cp = pinfo;
+		++sys.added_core_perf_counters;
+		break;
+
+	case SCOPE_PACKAGE:
+		pinfo->next = sys.perf_pp;
+		sys.perf_pp = pinfo;
+		++sys.added_package_perf_counters;
+		break;
+	}
+
+	// FIXME: we might not have debug here yet
+	if (debug)
+		fprintf(stderr, "%s: %s/%s, name: %s, scope%d\n",
+			__func__, pinfo->device, pinfo->event, pinfo->name, pinfo->scope);
+
+	return 0;
+}
+
+void parse_add_command_msr(char *add_command)
 {
 	int msr_num = 0;
 	char *path = NULL;
-	char name_buffer[NAME_BYTES] = "";
+	char perf_device[PERF_DEV_NAME_BYTES] = "";
+	char perf_event[PERF_EVT_NAME_BYTES] = "";
+	char name_buffer[PERF_NAME_BYTES] = "";
 	int width = 64;
 	int fail = 0;
 	enum counter_scope scope = SCOPE_CPU;
···
 			goto next;

 		if (sscanf(add_command, "msr%d", &msr_num) == 1)
+			goto next;
+
+		BUILD_BUG_ON(ARRAY_SIZE(perf_device) <= 31);
+		BUILD_BUG_ON(ARRAY_SIZE(perf_event) <= 31);
+		if (sscanf(add_command, "perf/%31[^/]/%31[^,]", &perf_device[0], &perf_event[0]) == 2)
 			goto next;

 		if (*add_command == '/') {
···
 			goto next;
 		}

-		if (sscanf(add_command, "%18s,%*s", name_buffer) == 1) {	/* 18 < NAME_BYTES */
+		BUILD_BUG_ON(ARRAY_SIZE(name_buffer) <= 18);
+		if (sscanf(add_command, "%18s,%*s", name_buffer) == 1) {
 			char *eos;

 			eos = strchr(name_buffer, ',');
···
 		}

 	}
-	if ((msr_num == 0) && (path == NULL)) {
-		fprintf(stderr, "--add: (msrDDD | msr0xXXX | /path_to_counter ) required\n");
+	if ((msr_num == 0) && (path == NULL) && (perf_device[0] == '\0' || perf_event[0] == '\0')) {
+		fprintf(stderr, "--add: (msrDDD | msr0xXXX | /path_to_counter | perf/device/event ) required\n");
 		fail++;
 	}
+
+	/* Test for non-empty perf_device and perf_event */
+	const bool is_perf_counter = perf_device[0] && perf_event[0];

 	/* generate default column header */
 	if (*name_buffer == '\0') {
-		if (width == 32)
-			sprintf(name_buffer, "M0x%x%s", msr_num, format == FORMAT_PERCENT ? "%" : "");
-		else
-			sprintf(name_buffer, "M0X%x%s", msr_num, format == FORMAT_PERCENT ? "%" : "");
+		if (is_perf_counter) {
+			snprintf(name_buffer, ARRAY_SIZE(name_buffer), "perf/%s", perf_event);
+		} else {
+			if (width == 32)
+				sprintf(name_buffer, "M0x%x%s", msr_num, format == FORMAT_PERCENT ? "%" : "");
+			else
+				sprintf(name_buffer, "M0X%x%s", msr_num, format == FORMAT_PERCENT ? "%" : "");
+		}
 	}

-	if (add_counter(msr_num, path, name_buffer, width, scope, type, format, 0, 0))
-		fail++;
+	if (is_perf_counter) {
+		if (add_perf_counter(perf_device, perf_event, name_buffer, width, scope, type, format))
+			fail++;
+	} else {
+		if (add_counter(msr_num, path, name_buffer, width, scope, type, format, 0, 0))
+			fail++;
+	}

 	if (fail) {
 		help();
 		exit(1);
 	}
+}
+
+bool starts_with(const char *str, const char *prefix)
+{
+	return strncmp(prefix, str, strlen(prefix)) == 0;
+}
+
+void parse_add_command_pmt(char *add_command)
+{
+	char *name = NULL;
+	char *type_name = NULL;
+	char *format_name = NULL;
+	unsigned int offset;
+	unsigned int lsb;
+	unsigned int msb;
+	unsigned int guid;
+	unsigned int domain_id;
+	enum counter_scope scope = 0;
+	enum pmt_datatype type = PMT_TYPE_RAW;
+	enum counter_format format = FORMAT_RAW;
+	bool has_offset = false;
+	bool has_lsb = false;
+	bool has_msb = false;
+	bool has_format = true;	/* Format has a default value. */
+	bool has_guid = false;
+	bool has_scope = false;
+	bool has_type = true;	/* Type has a default value. */
+
+	/* Consume the "pmt," prefix. */
+	add_command = strchr(add_command, ',');
+	if (!add_command) {
+		help();
+		exit(1);
+	}
+	++add_command;
+
+	while (add_command) {
+		if (starts_with(add_command, "name=")) {
+			name = add_command + strlen("name=");
+			goto next;
+		}
+
+		if (starts_with(add_command, "type=")) {
+			type_name = add_command + strlen("type=");
+			goto next;
+		}
+
+		if (starts_with(add_command, "domain=")) {
+			const size_t prefix_len = strlen("domain=");
+
+			if (sscanf(add_command + prefix_len, "cpu%u", &domain_id) == 1) {
+				scope = SCOPE_CPU;
+				has_scope = true;
+			} else if (sscanf(add_command + prefix_len, "core%u", &domain_id) == 1) {
+				scope = SCOPE_CORE;
+				has_scope = true;
+			} else if (sscanf(add_command + prefix_len, "package%u", &domain_id) == 1) {
+				scope = SCOPE_PACKAGE;
+				has_scope = true;
+			}
+
+			if (!has_scope) {
+				printf("%s: invalid value for scope. Expected cpu%%u, core%%u or package%%u.\n",
+				       __func__);
+				exit(1);
+			}
+
+			goto next;
+		}
+
+		if (starts_with(add_command, "format=")) {
+			format_name = add_command + strlen("format=");
+			goto next;
+		}
+
+		if (sscanf(add_command, "offset=%u", &offset) == 1) {
+			has_offset = true;
+			goto next;
+		}
+
+		if (sscanf(add_command, "lsb=%u", &lsb) == 1) {
+			has_lsb = true;
+			goto next;
+		}
+
+		if (sscanf(add_command, "msb=%u", &msb) == 1) {
+			has_msb = true;
+			goto next;
+		}
+
+		if (sscanf(add_command, "guid=%x", &guid) == 1) {
+			has_guid = true;
+			goto next;
+		}
+
+next:
+		add_command = strchr(add_command, ',');
+		if (add_command) {
+			*add_command = '\0';
+			add_command++;
+		}
+	}
+
+	if (!name) {
+		printf("%s: missing %s\n", __func__, "name");
+		exit(1);
+	}
+
+	if (strlen(name) >= PMT_COUNTER_NAME_SIZE_BYTES) {
+		printf("%s: name has to be at most %d characters long\n", __func__, PMT_COUNTER_NAME_SIZE_BYTES);
+		exit(1);
+	}
+
+	if (format_name) {
+		has_format = false;
+
+		if (strcmp("raw", format_name) == 0) {
+			format = FORMAT_RAW;
+			has_format = true;
+		}
+
+		if (strcmp("delta", format_name) == 0) {
+			format = FORMAT_DELTA;
+			has_format = true;
+		}
+
+		if (!has_format) {
+			fprintf(stderr, "%s: Invalid format %s. Expected raw or delta\n", __func__, format_name);
+			exit(1);
+		}
+	}
+
+	if (type_name) {
+		has_type = false;
+
+		if (strcmp("raw", type_name) == 0) {
+			type = PMT_TYPE_RAW;
+			has_type = true;
+		}
+
+		if (strcmp("txtal_time", type_name) == 0) {
+			type = PMT_TYPE_XTAL_TIME;
+			has_type = true;
+		}
+
+		if (!has_type) {
+			printf("%s: invalid %s: %s\n", __func__, "type", type_name);
+			exit(1);
+		}
+	}
+
+	if (!has_offset) {
+		printf("%s : missing %s\n", __func__, "offset");
+		exit(1);
+	}
+
+	if (!has_lsb) {
+		printf("%s: missing %s\n", __func__, "lsb");
+		exit(1);
+	}
+
+	if (!has_msb) {
+		printf("%s: missing %s\n", __func__, "msb");
+		exit(1);
+	}
+
+	if (!has_guid) {
+		printf("%s: missing %s\n", __func__, "guid");
+		exit(1);
+	}
+
+	if (!has_scope) {
+		printf("%s: missing %s\n", __func__, "scope");
+		exit(1);
+	}
+
+	if (lsb > msb) {
+		printf("%s: lsb > msb doesn't make sense\n", __func__);
+		exit(1);
+	}
+
+	pmt_add_counter(guid, name, type, lsb, msb, offset, scope, format, domain_id, PMT_OPEN_REQUIRED);
+}
+
+void parse_add_command(char *add_command)
+{
+	if (strncmp(add_command, "pmt", strlen("pmt")) == 0)
+		return parse_add_command_pmt(add_command);
+	return parse_add_command_msr(add_command);
 }

 int is_deferred_add(char *name)
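The checks that parse_add_command_pmt() applies to a `--add pmt,...` argument can be sketched in Python, the language of the selftests in this series. This is only an illustration of the option grammar as it reads from the diff above, not turbostat code: `parse_pmt_spec` is a hypothetical helper, the sample values in the usage note are made up, and the sketch is stricter than the C parser (which relies on sscanf() prefix matching and silently skips unknown tokens).

```python
import re

# Keys that the C parser reports as "missing" when absent.
REQUIRED = ("name", "offset", "lsb", "msb", "guid", "domain")


def parse_pmt_spec(spec):
    """Parse 'pmt,key=value,...' into a dict; raise ValueError on bad input.

    Hypothetical helper mirroring the validation order of
    parse_add_command_pmt(); it is not part of turbostat.
    """
    prefix, _, rest = spec.partition(",")
    if not prefix.startswith("pmt") or not rest:
        raise ValueError("expected 'pmt,<key>=<value>,...'")

    # type and format are the two fields with default values.
    out = {"type": "raw", "format": "raw"}

    for field in rest.split(","):
        key, sep, value = field.partition("=")
        if not sep:
            continue  # tokens without '=' are skipped
        if key in ("offset", "lsb", "msb"):
            out[key] = int(value, 10)      # "%u" in the C code
        elif key == "guid":
            out[key] = int(value, 16)      # "%x" in the C code
        elif key == "domain":
            m = re.fullmatch(r"(cpu|core|package)(\d+)", value)
            if not m:
                raise ValueError("domain must be cpuN, coreN or packageN")
            out[key] = (m.group(1), int(m.group(2)))
        elif key in ("name", "type", "format"):
            out[key] = value

    for key in REQUIRED:
        if key not in out:
            raise ValueError(f"missing {key}")
    if out["format"] not in ("raw", "delta"):
        raise ValueError(f"invalid format: {out['format']}")
    if out["type"] not in ("raw", "txtal_time"):
        raise ValueError(f"invalid type: {out['type']}")
    if out["lsb"] > out["msb"]:
        raise ValueError("lsb > msb doesn't make sense")
    return out
```

For example, `parse_pmt_spec("pmt,name=X,domain=package0,format=delta,offset=8,lsb=0,msb=63,guid=1a2b")` yields a dict with `type` left at its default `raw`, while swapping `lsb` and `msb` raises `ValueError`.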
+178
tools/testing/selftests/turbostat/added_perf_counters.py
···
+#!/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+import subprocess
+from shutil import which
+from os import pread
+
+class PerfCounterInfo:
+	def __init__(self, subsys, event):
+		self.subsys = subsys
+		self.event = event
+
+	def get_perf_event_name(self):
+		return f'{self.subsys}/{self.event}/'
+
+	def get_turbostat_perf_id(self, counter_scope, counter_type, column_name):
+		return f'perf/{self.subsys}/{self.event},{counter_scope},{counter_type},{column_name}'
+
+PERF_COUNTERS_CANDIDATES = [
+	PerfCounterInfo('msr', 'mperf'),
+	PerfCounterInfo('msr', 'aperf'),
+	PerfCounterInfo('msr', 'tsc'),
+	PerfCounterInfo('cstate_core', 'c1-residency'),
+	PerfCounterInfo('cstate_core', 'c6-residency'),
+	PerfCounterInfo('cstate_core', 'c7-residency'),
+	PerfCounterInfo('cstate_pkg', 'c2-residency'),
+	PerfCounterInfo('cstate_pkg', 'c3-residency'),
+	PerfCounterInfo('cstate_pkg', 'c6-residency'),
+	PerfCounterInfo('cstate_pkg', 'c7-residency'),
+	PerfCounterInfo('cstate_pkg', 'c8-residency'),
+	PerfCounterInfo('cstate_pkg', 'c9-residency'),
+	PerfCounterInfo('cstate_pkg', 'c10-residency'),
+]
+present_perf_counters = []
+
+def check_perf_access():
+	perf = which('perf')
+	if perf is None:
+		print('SKIP: Could not find perf binary, thus could not determine perf access.')
+		return False
+
+	def has_perf_counter_access(counter_name):
+		proc_perf = subprocess.run([perf, 'stat', '-e', counter_name, '--timeout', '10'],
+					    capture_output = True)
+
+		if proc_perf.returncode != 0:
+			print(f'SKIP: Could not read {counter_name} perf counter.')
+			return False
+
+		if b'<not supported>' in proc_perf.stderr:
+			print(f'SKIP: Could not read {counter_name} perf counter.')
+			return False
+
+		return True
+
+	for counter in PERF_COUNTERS_CANDIDATES:
+		if has_perf_counter_access(counter.get_perf_event_name()):
+			present_perf_counters.append(counter)
+
+	if len(present_perf_counters) == 0:
+		print('SKIP: Could not read any perf counter.')
+		return False
+
+	if len(present_perf_counters) != len(PERF_COUNTERS_CANDIDATES):
+		print(f'WARN: Could not access all of the counters - some will be left untested')
+
+	return True
+
+if not check_perf_access():
+	exit(0)
+
+turbostat_counter_source_opts = ['']
+
+turbostat = which('turbostat')
+if turbostat is None:
+	print('Could not find turbostat binary')
+	exit(1)
+
+timeout = which('timeout')
+if timeout is None:
+	print('Could not find timeout binary')
+	exit(1)
+
+proc_turbostat = subprocess.run([turbostat, '--list'], capture_output = True)
+if proc_turbostat.returncode != 0:
+	print(f'turbostat failed with {proc_turbostat.returncode}')
+	exit(1)
+
+EXPECTED_COLUMNS_DEBUG_DEFAULT = [b'usec', b'Time_Of_Day_Seconds', b'APIC', b'X2APIC']
+
+expected_columns = [b'CPU']
+counters_argv = []
+for counter in present_perf_counters:
+	if counter.subsys == 'cstate_core':
+		counter_scope = 'core'
+	elif counter.subsys == 'cstate_pkg':
+		counter_scope = 'package'
+	else:
+		counter_scope = 'cpu'
+
+	counter_type = 'delta'
+	column_name = counter.event
+
+	cparams = counter.get_turbostat_perf_id(
+		counter_scope = counter_scope,
+		counter_type = counter_type,
+		column_name = column_name
+	)
+	expected_columns.append(column_name.encode())
+	counters_argv.extend(['--add', cparams])
+
+expected_columns_debug = EXPECTED_COLUMNS_DEBUG_DEFAULT + expected_columns
+
+def gen_user_friendly_cmdline(argv_):
+	argv = argv_[:]
+	ret = ''
+
+	while len(argv) != 0:
+		arg = argv.pop(0)
+		arg_next = ''
+
+		if arg in ('-i', '--show', '--add'):
+			arg_next = argv.pop(0) if len(argv) > 0 else ''
+
+		ret += f'{arg} {arg_next} \\\n\t'
+
+	# Remove the last separator and return
+	return ret[:-4]
+
+#
+# Run turbostat for some time and send SIGINT
+#
+timeout_argv = [timeout, '--preserve-status', '-s', 'SIGINT', '-k', '3', '0.2s']
+turbostat_argv = [turbostat, '-i', '0.50', '--show', 'CPU'] + counters_argv
+
+def check_columns_or_fail(expected_columns: list, actual_columns: list):
+	if len(actual_columns) != len(expected_columns):
+		print(f'turbostat column check failed\n{expected_columns=}\n{actual_columns=}')
+		exit(1)
+
+	failed = False
+	for expected_column in expected_columns:
+		if expected_column not in actual_columns:
+			print(f'turbostat column check failed: missing column {expected_column.decode()}')
+			failed = True
+
+	if failed:
+		exit(1)
+
+cmdline = gen_user_friendly_cmdline(turbostat_argv)
+print(f'Running turbostat with:\n\t{cmdline}\n... ', end = '', flush = True)
+proc_turbostat = subprocess.run(timeout_argv + turbostat_argv, capture_output = True)
+if proc_turbostat.returncode != 0:
+	print(f'turbostat failed with {proc_turbostat.returncode}')
+	exit(1)
+
+actual_columns = proc_turbostat.stdout.split(b'\n')[0].split(b'\t')
+check_columns_or_fail(expected_columns, actual_columns)
+print('OK')
+
+#
+# Same, but with --debug
+#
+# We explicitly specify '--show CPU' to make sure turbostat
+# don't show a bunch of default counters instead.
+#
+turbostat_argv.append('--debug')
+
+cmdline = gen_user_friendly_cmdline(turbostat_argv)
+print(f'Running turbostat (in debug mode) with:\n\t{cmdline}\n... ', end = '', flush = True)
+proc_turbostat = subprocess.run(timeout_argv + turbostat_argv, capture_output = True)
+if proc_turbostat.returncode != 0:
+	print(f'turbostat failed with {proc_turbostat.returncode}')
+	exit(1)
+
+actual_columns = proc_turbostat.stdout.split(b'\n')[0].split(b'\t')
+check_columns_or_fail(expected_columns_debug, actual_columns)
+print('OK')
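The pass/fail decision in added_perf_counters.py comes down to comparing the first line of turbostat's output, split on tabs, against the list of expected column names. A minimal sketch of that comparison, runnable without turbostat, is below. `header_columns` and `column_problems` are hypothetical names, and the sample output bytes are invented for illustration. Unlike the selftest, which exits as soon as the column counts differ, this version collects every problem it finds.

```python
def header_columns(stdout: bytes) -> list:
    """Return the tab-separated column names from the first output line."""
    return stdout.split(b'\n')[0].split(b'\t')


def column_problems(expected: list, actual: list) -> list:
    """List human-readable mismatches between expected and actual columns."""
    problems = []
    if len(actual) != len(expected):
        problems.append(f'expected {len(expected)} columns, got {len(actual)}')
    for column in expected:
        if column not in actual:
            problems.append(f'missing column {column.decode()}')
    return problems


# Invented sample: a header line followed by one data row.
sample_stdout = b'CPU\tmperf\taperf\n-\t123\t456\n'
actual = header_columns(sample_stdout)

print(column_problems([b'CPU', b'mperf', b'aperf'], actual))
print(column_problems([b'CPU', b'mperf', b'aperf', b'tsc'], actual))
```

The membership test is order-insensitive, as in the selftest, so a run that emits the right columns in a different order still passes.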
+157
tools/testing/selftests/turbostat/smi_aperf_mperf.py
···
+#!/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+import subprocess
+from shutil import which
+from os import pread
+
+# CDLL calls dlopen underneath.
+# Calling it with None (null), we get handle to the our own image (python interpreter).
+# We hope to find sched_getcpu() inside ;]
+# This is a bit ugly, but helps shipping working software, so..
+try:
+	import ctypes
+
+	this_image = ctypes.CDLL(None)
+	BASE_CPU = this_image.sched_getcpu()
+except:
+	BASE_CPU = 0 # If we fail, set to 0 and pray it's not offline.
+
+MSR_IA32_MPERF = 0x000000e7
+MSR_IA32_APERF = 0x000000e8
+
+def check_perf_access():
+	perf = which('perf')
+	if perf is None:
+		print('SKIP: Could not find perf binary, thus could not determine perf access.')
+		return False
+
+	def has_perf_counter_access(counter_name):
+		proc_perf = subprocess.run([perf, 'stat', '-e', counter_name, '--timeout', '10'],
+					    capture_output = True)
+
+		if proc_perf.returncode != 0:
+			print(f'SKIP: Could not read {counter_name} perf counter, assuming no access.')
+			return False
+
+		if b'<not supported>' in proc_perf.stderr:
+			print(f'SKIP: Could not read {counter_name} perf counter, assuming no access.')
+			return False
+
+		return True
+
+	if not has_perf_counter_access('msr/mperf/'):
+		return False
+	if not has_perf_counter_access('msr/aperf/'):
+		return False
+	if not has_perf_counter_access('msr/smi/'):
+		return False
+
+	return True
+
+def check_msr_access():
+	try:
+		file_msr = open(f'/dev/cpu/{BASE_CPU}/msr', 'rb')
+	except:
+		return False
+
+	if len(pread(file_msr.fileno(), 8, MSR_IA32_MPERF)) != 8:
+		return False
+
+	if len(pread(file_msr.fileno(), 8, MSR_IA32_APERF)) != 8:
+		return False
+
+	return True
+
+has_perf_access = check_perf_access()
+has_msr_access = check_msr_access()
+
+turbostat_counter_source_opts = ['']
+
+if has_msr_access:
+	turbostat_counter_source_opts.append('--no-perf')
+else:
+	print('SKIP: doesn\'t have MSR access, skipping run with --no-perf')
+
+if has_perf_access:
+	turbostat_counter_source_opts.append('--no-msr')
+else:
+	print('SKIP: doesn\'t have perf access, skipping run with --no-msr')
+
+if not has_msr_access and not has_perf_access:
+	print('SKIP: No MSR nor perf access detected. Skipping the tests entirely')
+	exit(0)
+
+turbostat = which('turbostat')
+if turbostat is None:
+	print('Could not find turbostat binary')
+	exit(1)
+
+timeout = which('timeout')
+if timeout is None:
+	print('Could not find timeout binary')
+	exit(1)
+
+proc_turbostat = subprocess.run([turbostat, '--list'], capture_output = True)
+if proc_turbostat.returncode != 0:
+	print(f'turbostat failed with {proc_turbostat.returncode}')
+	exit(1)
+
+EXPECTED_COLUMNS_DEBUG_DEFAULT = b'usec\tTime_Of_Day_Seconds\tAPIC\tX2APIC'
+
+SMI_APERF_MPERF_DEPENDENT_BICS = [
+	'SMI',
+	'Avg_MHz',
+	'Busy%',
+	'Bzy_MHz',
+]
+if has_perf_access:
+	SMI_APERF_MPERF_DEPENDENT_BICS.append('IPC')
+
+for bic in SMI_APERF_MPERF_DEPENDENT_BICS:
+	for counter_source_opt in turbostat_counter_source_opts:
+
+		# Ugly special case, but it is what it is..
+		if counter_source_opt == '--no-perf' and bic == 'IPC':
+			continue
+
+		expected_columns = bic.encode()
+		expected_columns_debug = EXPECTED_COLUMNS_DEBUG_DEFAULT + f'\t{bic}'.encode()
+
+		#
+		# Run turbostat for some time and send SIGINT
+		#
+		timeout_argv = [timeout, '--preserve-status', '-s', 'SIGINT', '-k', '3', '0.2s']
+		turbostat_argv = [turbostat, '-i', '0.50', '--show', bic]
+
+		if counter_source_opt:
+			turbostat_argv.append(counter_source_opt)
+
+		print(f'Running turbostat with {turbostat_argv=}... ', end = '', flush = True)
+		proc_turbostat = subprocess.run(timeout_argv + turbostat_argv, capture_output = True)
+		if proc_turbostat.returncode != 0:
+			print(f'turbostat failed with {proc_turbostat.returncode}')
+			exit(1)
+
+		actual_columns = proc_turbostat.stdout.split(b'\n')[0]
+		if expected_columns != actual_columns:
+			print(f'turbostat column check failed\n{expected_columns=}\n{actual_columns=}')
+			exit(1)
+		print('OK')
+
+		#
+		# Same, but with --debug
+		#
+		turbostat_argv.append('--debug')
+
+		print(f'Running turbostat with {turbostat_argv=}... ', end = '', flush = True)
+		proc_turbostat = subprocess.run(timeout_argv + turbostat_argv, capture_output = True)
+		if proc_turbostat.returncode != 0:
+			print(f'turbostat failed with {proc_turbostat.returncode}')
+			exit(1)
+
+		actual_columns = proc_turbostat.stdout.split(b'\n')[0]
+		if expected_columns_debug != actual_columns:
+			print(f'turbostat column check failed\n{expected_columns_debug=}\n{actual_columns=}')
+			exit(1)
+		print('OK')
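The set of turbostat invocations that smi_aperf_mperf.py ends up running depends on which counter sources are reachable. The sketch below restates that selection as a pure function so it can be inspected without root access or a turbostat binary. `plan_runs` is a hypothetical name, the binary is written as the literal string `'turbostat'` instead of the path found by `which()`, and the `--debug` reruns are left out for brevity.

```python
def plan_runs(has_msr_access: bool, has_perf_access: bool) -> list:
    """Return the turbostat argv lists the selftest would try, in order."""
    if not has_msr_access and not has_perf_access:
        return []  # the selftest prints SKIP and exits 0 here

    # '' means "let turbostat pick the counter source itself".
    source_opts = ['']
    if has_msr_access:
        source_opts.append('--no-perf')
    if has_perf_access:
        source_opts.append('--no-msr')

    bics = ['SMI', 'Avg_MHz', 'Busy%', 'Bzy_MHz']
    if has_perf_access:
        bics.append('IPC')

    runs = []
    for bic in bics:
        for opt in source_opts:
            # IPC needs perf, so it is never combined with --no-perf.
            if opt == '--no-perf' and bic == 'IPC':
                continue
            argv = ['turbostat', '-i', '0.50', '--show', bic]
            if opt:
                argv.append(opt)
            runs.append(argv)
    return runs


for argv in plan_runs(has_msr_access=True, has_perf_access=False):
    print(' '.join(argv))
```

With both sources available this gives five columns times three source options, minus the excluded IPC and `--no-perf` pairing.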