Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'turbostat-2025.02.02' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux

Pull turbostat updates from Len Brown:

- Fix regression that affinitized forked child in one-shot mode.

- Harden one-shot mode against hotplug online/offline

- Enable RAPL SysWatt column by default

- Add initial PTL, CWF platform support

- Harden initial PMT code in response to early use

- Enable first built-in PMT counter: CWF c1e residency

- Refuse to run on unsupported platforms without --force, to encourage
updating to a version that supports the system, and to avoid
not-so-useful measurement results
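The --force behavior described above can be sketched as a table lookup with an opt-in fallback. This is a minimal illustration, not turbostat's code: `struct features`, `known_models`, `lookup_platform()` and the family/model values are hypothetical placeholders. It mirrors the shape of probe_platform_features() in the turbostat.c diff below, where an unknown model falls back to default features only when force_load is set, and otherwise the tool prints "Unsupported platform detected." and exits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical feature descriptor standing in for turbostat's per-model features. */
struct features {
	const char *name;
};

static const struct features default_features = { "default" };
static const struct features model_a_features = { "model-a" };

/* Hypothetical (family, model) -> features table, terminated by a NULL entry. */
struct model_entry {
	unsigned int family, model;
	const struct features *features;
};

static const struct model_entry known_models[] = {
	{ 6, 0x42, &model_a_features },	/* placeholder values, not a real CPU model */
	{ 0, 0, NULL },
};

/*
 * Return the features for a known model. For an unknown model, fall back to
 * minimal defaults only when 'force' is set; otherwise return NULL so the
 * caller can refuse to run.
 */
const struct features *lookup_platform(unsigned int family, unsigned int model, bool force)
{
	for (size_t i = 0; known_models[i].features; i++)
		if (known_models[i].family == family && known_models[i].model == model)
			return known_models[i].features;

	if (force) {
		fprintf(stderr, "Forced to run on unsupported platform!\n");
		return &default_features;
	}
	return NULL;
}
```

Returning NULL instead of exiting keeps the policy (turbostat prints a message and calls exit(1)) in the caller.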

* tag 'turbostat-2025.02.02' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (25 commits)
tools/power turbostat: version 2025.02.02
tools/power turbostat: Add CPU%c1e BIC for CWF
tools/power turbostat: Harden one-shot mode against cpu offline
tools/power turbostat: Fix forked child affinity regression
tools/power turbostat: Add tcore clock PMT type
tools/power turbostat: version 2025.01.14
tools/power turbostat: Allow adding PMT counters directly by sysfs path
tools/power turbostat: Allow mapping multiple PMT files with the same GUID
tools/power turbostat: Add PMT directory iterator helper
tools/power turbostat: Extend PMT identification with a sequence number
tools/power turbostat: Return default value for unmapped PMT domains
tools/power turbostat: Check for non-zero value when MSR probing
tools/power turbostat: Enhance turbostat self-performance visibility
tools/power turbostat: Add fixed RAPL PSYS divisor for SPR
tools/power turbostat: Fix PMT mmaped file size rounding
tools/power turbostat: Remove SysWatt from DISABLED_BY_DEFAULT
tools/power turbostat: Add an NMI column
tools/power turbostat: add Busy% to "show idle"
tools/power turbostat: Introduce --force parameter
tools/power turbostat: Improve --help output
...

+575 -109
+29 -3
tools/power/x86/turbostat/turbostat.8
···
 The system configuration dump (if --quiet is not used) is followed by statistics. The first row of the statistics labels the content of each column (below). The second row of statistics is the system summary line. The system summary line has a '-' in the columns for the Package, Core, and CPU. The contents of the system summary line depends on the type of column. Columns that count items (eg. IRQ) show the sum across all CPUs in the system. Columns that show a percentage show the average across all CPUs in the system. Columns that dump raw MSR values simply show 0 in the summary. After the system summary row, each row describes a specific Package/Core/CPU. Note that if the --cpu parameter is used to limit which specific CPUs are displayed, turbostat will still collect statistics for all CPUs in the system and will still show the system summary for all CPUs in the system.
 .SH COLUMN DESCRIPTIONS
 .PP
-\fBusec\fP For each CPU, the number of microseconds elapsed during counter collection, including thread migration -- if any. This counter is disabled by default, and is enabled with "--enable usec", or --debug. On the summary row, usec refers to the total elapsed time to collect the counters on all cpus.
+\fBusec\fP For each CPU, the number of microseconds elapsed during counter collection, including thread migration -- if any. This counter is disabled by default, and is enabled with "--enable usec", or --debug. On the summary row, usec refers to the total elapsed time to snapshot the procfs/sysfs and collect the counters on all cpus.
 .PP
 \fBTime_Of_Day_Seconds\fP For each CPU, the gettimeofday(2) value (seconds.subsec since Epoch) when the counters ending the measurement interval were collected. This column is disabled by default, and can be enabled with "--enable Time_Of_Day_Seconds" or "--debug". On the summary row, Time_Of_Day_Seconds refers to the timestamp following collection of counters on the last CPU.
 .PP
···
 .PP
 \fBRAMWatt\fP Watts consumed by the DRAM DIMMS -- available only on server processors.
 .PP
-\fBSysWatt\fP Watts consumed by the whole platform (RAPL PSYS). Disabled by default. Enable with --enable SysWatt.
+\fBSysWatt\fP Watts consumed by the whole platform (RAPL PSYS).
 .PP
 \fBPKG_%\fP percent of the interval that RAPL throttling was active on the Package. Note that the system summary is the sum of the package throttling time, and thus may be higher than 100% on a multi-package system. Note that the meaning of this field is model specific. For example, some hardware increments this counter when RAPL responds to thermal limits, but does not increment this counter when RAPL responds to power limits. Comparing PkgWatt and PkgTmp to system limits is necessary.
 .PP
···
 Volume 3B: System Programming Guide"
 https://www.intel.com/products/processor/manuals/
 
+.SH RUN THE LATEST VERSION
+If turbostat complains that it doesn't recognize your processor,
+please try the latest version.
+
+The latest version of turbostat does not require the latest version of the Linux kernel.
+However, some features, such as perf(1) counters, do require kernel support.
+
+The latest turbostat release is available in the upstream Linux Kernel source tree.
+eg. "git pull https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git"
+and run make in tools/power/x86/turbostat/.
+
+n.b. "make install" will update your system manually, but a distro update may subsequently downgrade your turbostat to an older version.
+For this reason, manually installing to /usr/local/bin may be what you want.
+
+Note that turbostat/Makefile has a "make snapshot" target, which will create a tar file
+that can build without a local kernel source tree.
+
+If the upstream version isn't new enough, the development tree can be found here:
+"git pull https://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux.git turbostat"
+
+If the development tree doesn't work, please contact the author via chat,
+or via email with the word "turbostat" on the Subject line.
+
 .SH FILES
 .ta
 .nf
+/sys/bus/event_source/devices/
 /dev/cpu/*/msr
+/sys/class/intel_pmt/
+/sys/devices/system/cpu/
 .fi
 
 .SH "SEE ALSO"
-msr(4), vmstat(8)
+perf(1), msr(4), vmstat(8)
 .PP
 .SH AUTHOR
 .nf
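The updated usec description counts from the start of the procfs/sysfs snapshot. In the turbostat.c diff, snapshot_proc_sysfs_files() records procsysfs_tv_begin with gettimeofday(2), and sum_counters() uses it as the summary tv_begin. A minimal sketch of turning two such timestamps into elapsed microseconds, assuming glibc's timersub() macro from <sys/time.h>; `elapsed_usec()` is a hypothetical helper name.

```c
#define _DEFAULT_SOURCE		/* expose timersub() in glibc's <sys/time.h> */
#include <assert.h>
#include <sys/time.h>

/*
 * Elapsed microseconds from 'begin' to 'end', e.g. from the procfs/sysfs
 * snapshot start to the end of counter collection on the last CPU.
 * timersub() borrows from tv_sec when end->tv_usec < begin->tv_usec.
 */
long long elapsed_usec(const struct timeval *begin, const struct timeval *end)
{
	struct timeval delta;

	timersub(end, begin, &delta);
	return (long long)delta.tv_sec * 1000000LL + delta.tv_usec;
}
```

For example, from 10.900000 s to 12.100000 s the helper returns 1200000.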
+546 -106
tools/power/x86/turbostat/turbostat.c
···
  * turbostat -- show CPU frequency and C-state residency
  * on modern Intel and AMD processors.
  *
- * Copyright (c) 2024 Intel Corporation.
+ * Copyright (c) 2025 Intel Corporation.
  * Len Brown <len.brown@intel.com>
  */
 
···
 
 #define INTEL_ECORE_TYPE 0x20
 #define INTEL_PCORE_TYPE 0x40
+
+#define ROUND_UP_TO_PAGE_SIZE(n) (((n) + 0x1000UL-1UL) & ~(0x1000UL-1UL))
 
 enum counter_scope { SCOPE_CPU, SCOPE_CORE, SCOPE_PACKAGE };
 enum counter_type { COUNTER_ITEMS, COUNTER_CYCLES, COUNTER_SECONDS, COUNTER_USEC, COUNTER_K2M };
···
 	{ 0x0, "Die%c6", NULL, 0, 0, 0, NULL, 0 },
 	{ 0x0, "SysWatt", NULL, 0, 0, 0, NULL, 0 },
 	{ 0x0, "Sys_J", NULL, 0, 0, 0, NULL, 0 },
+	{ 0x0, "NMI", NULL, 0, 0, 0, NULL, 0 },
+	{ 0x0, "CPU%c1e", NULL, 0, 0, 0, NULL, 0 },
 };
 
 #define MAX_BIC (sizeof(bic) / sizeof(struct msr_counter))
···
 #define BIC_Diec6 (1ULL << 58)
 #define BIC_SysWatt (1ULL << 59)
 #define BIC_Sys_J (1ULL << 60)
+#define BIC_NMI (1ULL << 61)
+#define BIC_CPU_c1e (1ULL << 62)
 
-#define BIC_TOPOLOGY (BIC_Package | BIC_Node | BIC_CoreCnt | BIC_PkgCnt | BIC_Core | BIC_CPU | BIC_Die )
-#define BIC_THERMAL_PWR ( BIC_CoreTmp | BIC_PkgTmp | BIC_PkgWatt | BIC_CorWatt | BIC_GFXWatt | BIC_RAMWatt | BIC_PKG__ | BIC_RAM__)
+#define BIC_TOPOLOGY (BIC_Package | BIC_Node | BIC_CoreCnt | BIC_PkgCnt | BIC_Core | BIC_CPU | BIC_Die)
+#define BIC_THERMAL_PWR (BIC_CoreTmp | BIC_PkgTmp | BIC_PkgWatt | BIC_CorWatt | BIC_GFXWatt | BIC_RAMWatt | BIC_PKG__ | BIC_RAM__ | BIC_SysWatt)
 #define BIC_FREQUENCY (BIC_Avg_MHz | BIC_Busy | BIC_Bzy_MHz | BIC_TSC_MHz | BIC_GFXMHz | BIC_GFXACTMHz | BIC_SAMMHz | BIC_SAMACTMHz | BIC_UNCORE_MHZ)
-#define BIC_IDLE (BIC_sysfs | BIC_CPU_c1 | BIC_CPU_c3 | BIC_CPU_c6 | BIC_CPU_c7 | BIC_GFX_rc6 | BIC_Pkgpc2 | BIC_Pkgpc3 | BIC_Pkgpc6 | BIC_Pkgpc7 | BIC_Pkgpc8 | BIC_Pkgpc9 | BIC_Pkgpc10 | BIC_CPU_LPI | BIC_SYS_LPI | BIC_Mod_c6 | BIC_Totl_c0 | BIC_Any_c0 | BIC_GFX_c0 | BIC_CPUGFX | BIC_SAM_mc6 | BIC_Diec6)
-#define BIC_OTHER ( BIC_IRQ | BIC_SMI | BIC_ThreadC | BIC_CoreTmp | BIC_IPC)
+#define BIC_IDLE (BIC_Busy | BIC_sysfs | BIC_CPU_c1 | BIC_CPU_c3 | BIC_CPU_c6 | BIC_CPU_c7 | BIC_GFX_rc6 | BIC_Pkgpc2 | BIC_Pkgpc3 | BIC_Pkgpc6 | BIC_Pkgpc7 | BIC_Pkgpc8 | BIC_Pkgpc9 | BIC_Pkgpc10 | BIC_CPU_LPI | BIC_SYS_LPI | BIC_Mod_c6 | BIC_Totl_c0 | BIC_Any_c0 | BIC_GFX_c0 | BIC_CPUGFX | BIC_SAM_mc6 | BIC_Diec6)
+#define BIC_OTHER (BIC_IRQ | BIC_NMI | BIC_SMI | BIC_ThreadC | BIC_CoreTmp | BIC_IPC)
 
-#define BIC_DISABLED_BY_DEFAULT (BIC_USEC | BIC_TOD | BIC_APIC | BIC_X2APIC | BIC_SysWatt | BIC_Sys_J)
+#define BIC_DISABLED_BY_DEFAULT (BIC_USEC | BIC_TOD | BIC_APIC | BIC_X2APIC)
 
 unsigned long long bic_enabled = (0xFFFFFFFFFFFFFFFFULL & ~BIC_DISABLED_BY_DEFAULT);
 unsigned long long bic_present = BIC_USEC | BIC_TOD | BIC_sysfs | BIC_APIC | BIC_X2APIC;
···
 unsigned int summary_only;
 unsigned int list_header_only;
 unsigned int dump_only;
+unsigned int force_load;
 unsigned int has_aperf;
 unsigned int has_aperf_access;
 unsigned int has_epb;
···
 unsigned int tj_max;
 unsigned int tj_max_override;
 double rapl_power_units, rapl_time_units;
-double rapl_dram_energy_units, rapl_energy_units;
+double rapl_dram_energy_units, rapl_energy_units, rapl_psys_energy_units;
 double rapl_joule_counter_range;
 unsigned int crystal_hz;
 unsigned long long tsc_hz;
···
 unsigned int has_hwp_epp; /* IA32_HWP_REQUEST[bits 31:24] */
 unsigned int has_hwp_pkg; /* IA32_HWP_REQUEST_PKG */
 unsigned int first_counter_read = 1;
+
+static struct timeval procsysfs_tv_begin;
+
 int ignore_stdin;
 bool no_msr;
 bool no_perf;
···
 	bool has_per_core_rapl; /* Indicates cores energy collection is per-core, not per-package. AMD specific for now */
 	bool has_rapl_divisor; /* Divisor for Energy unit raw value from MSR_RAPL_POWER_UNIT */
 	bool has_fixed_rapl_unit; /* Fixed Energy Unit used for DRAM RAPL Domain */
+	bool has_fixed_rapl_psys_unit; /* Fixed Energy Unit used for PSYS RAPL Domain */
 	int rapl_quirk_tdp; /* Hardcoded TDP value when cannot be retrieved from hardware */
 	int tcc_offset_bits; /* TCC Offset bits in MSR_IA32_TEMPERATURE_TARGET */
 	bool enable_tsc_tweak; /* Use CPU Base freq instead of TSC freq for aperf/mperf counter */
···
 	.has_msr_core_c1_res = 1,
 	.has_irtl_msrs = 1,
 	.has_cst_prewake_bit = 1,
+	.has_fixed_rapl_psys_unit = 1,
 	.trl_msrs = TRL_BASE | TRL_CORECOUNT,
 	.rapl_msrs = RAPL_PKG_ALL | RAPL_DRAM_ALL | RAPL_PSYS,
 };
···
 	{ INTEL_ARROWLAKE_U, &adl_features },
 	{ INTEL_ARROWLAKE, &adl_features },
 	{ INTEL_LUNARLAKE_M, &lnl_features },
+	{ INTEL_PANTHERLAKE_L, &lnl_features },
 	{ INTEL_ATOM_SILVERMONT, &slv_features },
 	{ INTEL_ATOM_SILVERMONT_D, &slvd_features },
 	{ INTEL_ATOM_AIRMONT, &amt_features },
···
 	{ INTEL_ATOM_GRACEMONT, &adl_features },
 	{ INTEL_ATOM_CRESTMONT_X, &srf_features },
 	{ INTEL_ATOM_CRESTMONT, &grr_features },
+	{ INTEL_ATOM_DARKMONT_X, &srf_features },
 	{ INTEL_XEON_PHI_KNL, &knl_features },
 	{ INTEL_XEON_PHI_KNM, &knl_features },
 	/*
···
 {
 	int i;
 
-	platform = &default_features;
 
 	if (authentic_amd || hygon_genuine) {
+		/* fallback to default features on unsupported models */
+		force_load++;
 		if (max_extended_level >= 0x80000007) {
 			unsigned int eax, ebx, ecx, edx;
 
···
 			if ((edx & (1 << 14)) && family >= 0x17)
 				platform = &amd_features_with_rapl;
 		}
-		return;
+		goto end;
 	}
 
 	if (!genuine_intel)
-		return;
+		goto end;
 
 	for (i = 0; turbostat_pdata[i].features; i++) {
 		if (VFM_FAMILY(turbostat_pdata[i].vfm) == family && VFM_MODEL(turbostat_pdata[i].vfm) == model) {
···
 			return;
 		}
 	}
+
+end:
+	if (force_load && !platform) {
+		fprintf(outf, "Forced to run on unsupported platform!\n");
+		platform = &default_features;
+	}
+
+	if (platform)
+		return;
+
+	fprintf(stderr, "Unsupported platform detected.\n"
+		"\tSee RUN THE LATEST VERSION on turbostat(8)\n");
+	exit(1);
 }
 
 /* Model specific support End */
···
 char *progname;
 
 #define CPU_SUBSET_MAXCPUS 1024 /* need to use before probe... */
-cpu_set_t *cpu_present_set, *cpu_effective_set, *cpu_allowed_set, *cpu_affinity_set, *cpu_subset;
-size_t cpu_present_setsize, cpu_effective_setsize, cpu_allowed_setsize, cpu_affinity_setsize, cpu_subset_size;
+cpu_set_t *cpu_present_set, *cpu_possible_set, *cpu_effective_set, *cpu_allowed_set, *cpu_affinity_set, *cpu_subset;
+size_t cpu_present_setsize, cpu_possible_setsize, cpu_effective_setsize, cpu_allowed_setsize, cpu_affinity_setsize, cpu_subset_size;
 #define MAX_ADDED_THREAD_COUNTERS 24
 #define MAX_ADDED_CORE_COUNTERS 8
 #define MAX_ADDED_PACKAGE_COUNTERS 16
···
 		.msr = MSR_PLATFORM_ENERGY_STATUS,
 		.msr_mask = 0x00000000FFFFFFFF,
 		.msr_shift = 0,
-		.platform_rapl_msr_scale = &rapl_energy_units,
+		.platform_rapl_msr_scale = &rapl_psys_energy_units,
 		.rci_index = RAPL_RCI_INDEX_ENERGY_PLATFORM,
 		.bic = BIC_SysWatt | BIC_Sys_J,
 		.compat_scale = 1.0,
···
 #define PMT_COUNTER_MTL_DC6_LSB 0
 #define PMT_COUNTER_MTL_DC6_MSB 63
 #define PMT_MTL_DC6_GUID 0x1a067102
+#define PMT_MTL_DC6_SEQ 0
+
+#define PMT_COUNTER_CWF_MC1E_OFFSET_BASE 20936
+#define PMT_COUNTER_CWF_MC1E_OFFSET_INCREMENT 24
+#define PMT_COUNTER_CWF_MC1E_NUM_MODULES_PER_FILE 12
+#define PMT_COUNTER_CWF_CPUS_PER_MODULE 4
+#define PMT_COUNTER_CWF_MC1E_LSB 0
+#define PMT_COUNTER_CWF_MC1E_MSB 63
+#define PMT_CWF_MC1E_GUID 0x14421519
+
+unsigned long long tcore_clock_freq_hz = 800000000;
 
 #define PMT_COUNTER_NAME_SIZE_BYTES 16
 #define PMT_COUNTER_TYPE_NAME_SIZE_BYTES 32
···
 enum pmt_datatype {
 	PMT_TYPE_RAW,
 	PMT_TYPE_XTAL_TIME,
+	PMT_TYPE_TCORE_CLOCK,
 };
 
 struct pmt_domain_info {
···
 	unsigned int num_domains;
 	struct pmt_domain_info *domains;
 };
+
+/*
+ * PMT telemetry directory iterator.
+ * Used to iterate telemetry files in sysfs in correct order.
+ */
+struct pmt_diriter_t {
+	DIR *dir;
+	struct dirent **namelist;
+	unsigned int num_names;
+	unsigned int current_name_idx;
+};
+
+int pmt_telemdir_filter(const struct dirent *e)
+{
+	unsigned int dummy;
+
+	return sscanf(e->d_name, "telem%u", &dummy);
+}
+
+int pmt_telemdir_sort(const struct dirent **a, const struct dirent **b)
+{
+	unsigned int aidx = 0, bidx = 0;
+
+	sscanf((*a)->d_name, "telem%u", &aidx);
+	sscanf((*b)->d_name, "telem%u", &bidx);
+
+	return aidx >= bidx;
+}
+
+const struct dirent *pmt_diriter_next(struct pmt_diriter_t *iter)
+{
+	const struct dirent *ret = NULL;
+
+	if (!iter->dir)
+		return NULL;
+
+	if (iter->current_name_idx >= iter->num_names)
+		return NULL;
+
+	ret = iter->namelist[iter->current_name_idx];
+	++iter->current_name_idx;
+
+	return ret;
+}
+
+const struct dirent *pmt_diriter_begin(struct pmt_diriter_t *iter, const char *pmt_root_path)
+{
+	int num_names = iter->num_names;
+
+	if (!iter->dir) {
+		iter->dir = opendir(pmt_root_path);
+		if (iter->dir == NULL)
+			return NULL;
+
+		num_names = scandir(pmt_root_path, &iter->namelist, pmt_telemdir_filter, pmt_telemdir_sort);
+		if (num_names == -1)
+			return NULL;
+	}
+
+	iter->current_name_idx = 0;
+	iter->num_names = num_names;
+
+	return pmt_diriter_next(iter);
+}
+
+void pmt_diriter_init(struct pmt_diriter_t *iter)
+{
+	memset(iter, 0, sizeof(*iter));
+}
+
+void pmt_diriter_remove(struct pmt_diriter_t *iter)
+{
+	if (iter->namelist) {
+		for (unsigned int i = 0; i < iter->num_names; i++) {
+			free(iter->namelist[i]);
+			iter->namelist[i] = NULL;
+		}
+	}
+
+	free(iter->namelist);
+	iter->namelist = NULL;
+	iter->num_names = 0;
+	iter->current_name_idx = 0;
+
+	closedir(iter->dir);
+	iter->dir = NULL;
+}
 
 unsigned int pmt_counter_get_width(const struct pmt_counter *p)
 {
···
 	unsigned long long c1;
 	unsigned long long instr_count;
 	unsigned long long irq_count;
+	unsigned long long nmi_count;
 	unsigned int smi_count;
 	unsigned int cpu_id;
 	unsigned int apic_id;
···
 
 int *irq_column_2_cpu; /* /proc/interrupts column numbers */
 int *irqs_per_cpu; /* indexed by cpu_num */
+int *nmi_per_cpu; /* indexed by cpu_num */
 
 void setup_all_buffers(bool startup);
 
···
 {
 	int retval, pkg_no, core_no, thread_no, node_no;
 
+	retval = 0;
+
 	for (pkg_no = 0; pkg_no < topo.num_packages; ++pkg_no) {
 		for (node_no = 0; node_no < topo.nodes_per_pkg; node_no++) {
 			for (core_no = 0; core_no < topo.cores_per_node; ++core_no) {
···
 					c = GET_CORE(core_base, core_no, node_no, pkg_no);
 					p = GET_PKG(pkg_base, pkg_no);
 
-					retval = func(t, c, p);
-					if (retval)
-						return retval;
+					retval |= func(t, c, p);
 				}
 			}
 		}
 	}
-	return 0;
+	return retval;
 }
 
 int is_cpu_first_thread_in_core(struct thread_data *t, struct core_data *c, struct pkg_data *p)
···
 int probe_msr(int cpu, off_t offset)
 {
 	ssize_t retval;
-	unsigned long long dummy;
+	unsigned long long value;
 
 	assert(!no_msr);
 
-	retval = pread(get_msr_fd(cpu), &dummy, sizeof(dummy), offset);
+	retval = pread(get_msr_fd(cpu), &value, sizeof(value), offset);
 
-	if (retval != sizeof(dummy))
+	/*
+	 * Expect MSRs to accumulate some non-zero value since the system was powered on.
+	 * Treat zero as a read failure.
+	 */
+	if (retval != sizeof(value) || value == 0)
 		return 1;
 
 	return 0;
···
 		"when COMMAND completes.\n"
 		"If no COMMAND is specified, turbostat wakes every 5-seconds\n"
 		"to print statistics, until interrupted.\n"
-		" -a, --add add a counter\n"
+		" -a, --add counter\n"
+		" add a counter\n"
 		" eg. --add msr0x10,u64,cpu,delta,MY_TSC\n"
 		" eg. --add perf/cstate_pkg/c2-residency,package,delta,percent,perfPC2\n"
 		" eg. --add pmt,name=XTAL,type=raw,domain=package0,offset=0,lsb=0,msb=63,guid=0x1a067102\n"
-		" -c, --cpu cpu-set limit output to summary plus cpu-set:\n"
+		" -c, --cpu cpu-set\n"
+		" limit output to summary plus cpu-set:\n"
 		" {core | package | j,k,l..m,n-p }\n"
-		" -d, --debug displays usec, Time_Of_Day_Seconds and more debugging\n"
+		" -d, --debug\n"
+		" displays usec, Time_Of_Day_Seconds and more debugging\n"
 		" debug messages are printed to stderr\n"
-		" -D, --Dump displays the raw counter values\n"
-		" -e, --enable [all | column]\n"
+		" -D, --Dump\n"
+		" displays the raw counter values\n"
+		" -e, --enable [all | column]\n"
 		" shows all or the specified disabled column\n"
-		" -H, --hide [column|column,column,...]\n"
+		" -f, --force\n"
+		" force load turbostat with minimum default features on unsupported platforms.\n"
+		" -H, --hide [column | column,column,...]\n"
 		" hide the specified column(s)\n"
 		" -i, --interval sec.subsec\n"
-		" Override default 5-second measurement interval\n"
-		" -J, --Joules displays energy in Joules instead of Watts\n"
-		" -l, --list list column headers only\n"
-		" -M, --no-msr Disable all uses of the MSR driver\n"
-		" -P, --no-perf Disable all uses of the perf API\n"
+		" override default 5-second measurement interval\n"
+		" -J, --Joules\n"
+		" displays energy in Joules instead of Watts\n"
+		" -l, --list\n"
+		" list column headers only\n"
+		" -M, --no-msr\n"
+		" disable all uses of the MSR driver\n"
+		" -P, --no-perf\n"
+		" disable all uses of the perf API\n"
 		" -n, --num_iterations num\n"
 		" number of the measurement iterations\n"
 		" -N, --header_iterations num\n"
 		" print header every num iterations\n"
 		" -o, --out file\n"
 		" create or truncate \"file\" for all output\n"
-		" -q, --quiet skip decoding system configuration header\n"
-		" -s, --show [column|column,column,...]\n"
+		" -q, --quiet\n"
+		" skip decoding system configuration header\n"
+		" -s, --show [column | column,column,...]\n"
 		" show only the specified column(s)\n"
 		" -S, --Summary\n"
 		" limits output to 1-line system summary per interval\n"
 		" -T, --TCC temperature\n"
 		" sets the Thermal Control Circuit temperature in\n"
 		" degrees Celsius\n"
-		" -h, --help print this help message\n"
-		" -v, --version print version information\n" "\n" "For more help, run \"man turbostat\"\n");
+		" -h, --help\n"
+		" print this help message\n"
+		" -v, --version\n"
+		" print version information\n\nFor more help, run \"man turbostat\"\n");
 }
 
 /*
···
 		else
 			outp += sprintf(outp, "%sIRQ", (printed++ ? delim : ""));
 	}
+	if (DO_BIC(BIC_NMI)) {
+		if (sums_need_wide_columns)
+			outp += sprintf(outp, "%s NMI", (printed++ ? delim : ""));
+		else
+			outp += sprintf(outp, "%sNMI", (printed++ ? delim : ""));
+	}
 
 	if (DO_BIC(BIC_SMI))
 		outp += sprintf(outp, "%sSMI", (printed++ ? delim : ""));
···
 			break;
 
 		case PMT_TYPE_XTAL_TIME:
+		case PMT_TYPE_TCORE_CLOCK:
 			outp += sprintf(outp, "%s%s", (printed++ ? delim : ""), ppmt->name);
 			break;
 		}
···
 			break;
 
 		case PMT_TYPE_XTAL_TIME:
+		case PMT_TYPE_TCORE_CLOCK:
 			outp += sprintf(outp, "%s%s", (printed++ ? delim : ""), ppmt->name);
 			break;
 		}
···
 			break;
 
 		case PMT_TYPE_XTAL_TIME:
+		case PMT_TYPE_TCORE_CLOCK:
 			outp += sprintf(outp, "%s%s", (printed++ ? delim : ""), ppmt->name);
 			break;
 		}
···
 
 		if (DO_BIC(BIC_IRQ))
 			outp += sprintf(outp, "IRQ: %lld\n", t->irq_count);
+		if (DO_BIC(BIC_NMI))
+			outp += sprintf(outp, "IRQ: %lld\n", t->nmi_count);
 		if (DO_BIC(BIC_SMI))
 			outp += sprintf(outp, "SMI: %d\n", t->smi_count);
 
···
 			outp += sprintf(outp, "%s%lld", (printed++ ? delim : ""), t->irq_count);
 	}
 
+	/* NMI */
+	if (DO_BIC(BIC_NMI)) {
+		if (sums_need_wide_columns)
+			outp += sprintf(outp, "%s%8lld", (printed++ ? delim : ""), t->nmi_count);
+		else
+			outp += sprintf(outp, "%s%lld", (printed++ ? delim : ""), t->nmi_count);
+	}
+
 	/* SMI */
 	if (DO_BIC(BIC_SMI))
 		outp += sprintf(outp, "%s%d", (printed++ ? delim : ""), t->smi_count);
···
 
 	for (i = 0, ppmt = sys.pmt_tp; ppmt; i++, ppmt = ppmt->next) {
 		const unsigned long value_raw = t->pmt_counter[i];
-		const double value_converted = 100.0 * value_raw / crystal_hz / interval_float;
+		double value_converted;
 		switch (ppmt->type) {
 		case PMT_TYPE_RAW:
 			if (pmt_counter_get_width(ppmt) <= 32)
···
 			break;
 
 		case PMT_TYPE_XTAL_TIME:
+			value_converted = 100.0 * value_raw / crystal_hz / interval_float;
 			outp += sprintf(outp, "%s%.2f", (printed++ ? delim : ""), value_converted);
 			break;
+
+		case PMT_TYPE_TCORE_CLOCK:
+			value_converted = 100.0 * value_raw / tcore_clock_freq_hz / interval_float;
+			outp += sprintf(outp, "%s%.2f", (printed++ ? delim : ""), value_converted);
 		}
 	}
 
···
 
 	for (i = 0, ppmt = sys.pmt_cp; ppmt; i++, ppmt = ppmt->next) {
 		const unsigned long value_raw = c->pmt_counter[i];
-		const double value_converted = 100.0 * value_raw / crystal_hz / interval_float;
+		double value_converted;
 		switch (ppmt->type) {
 		case PMT_TYPE_RAW:
 			if (pmt_counter_get_width(ppmt) <= 32)
···
 			break;
 
 		case PMT_TYPE_XTAL_TIME:
+			value_converted = 100.0 * value_raw / crystal_hz / interval_float;
 			outp += sprintf(outp, "%s%.2f", (printed++ ? delim : ""), value_converted);
 			break;
+
+		case PMT_TYPE_TCORE_CLOCK:
+			value_converted = 100.0 * value_raw / tcore_clock_freq_hz / interval_float;
+			outp += sprintf(outp, "%s%.2f", (printed++ ? delim : ""), value_converted);
 		}
 	}
 
···
 
 	for (i = 0, ppmt = sys.pmt_pp; ppmt; i++, ppmt = ppmt->next) {
 		const unsigned long value_raw = p->pmt_counter[i];
-		const double value_converted = 100.0 * value_raw / crystal_hz / interval_float;
+		double value_converted;
 		switch (ppmt->type) {
 		case PMT_TYPE_RAW:
 			if (pmt_counter_get_width(ppmt) <= 32)
···
 			break;
 
 		case PMT_TYPE_XTAL_TIME:
+			value_converted = 100.0 * value_raw / crystal_hz / interval_float;
 			outp += sprintf(outp, "%s%.2f", (printed++ ? delim : ""), value_converted);
 			break;
+
+		case PMT_TYPE_TCORE_CLOCK:
+			value_converted = 100.0 * value_raw / tcore_clock_freq_hz / interval_float;
+			outp += sprintf(outp, "%s%.2f", (printed++ ? delim : ""), value_converted);
 		}
 	}
 
···
 	if (DO_BIC(BIC_IRQ))
 		old->irq_count = new->irq_count - old->irq_count;
 
+	if (DO_BIC(BIC_NMI))
+		old->nmi_count = new->nmi_count - old->nmi_count;
+
 	if (DO_BIC(BIC_SMI))
 		old->smi_count = new->smi_count - old->smi_count;
 
···
 
 	/* always calculate thread delta */
 	retval = delta_thread(t, t2, c2); /* c2 is core delta */
-	if (retval)
-		return retval;
 
 	/* calculate package delta only for 1st core in package */
 	if (is_cpu_first_core_in_package(t, c, p))
-		retval = delta_package(p, p2);
+		retval |= delta_package(p, p2);
 
 	return retval;
 }
···
 	t->instr_count = 0;
 
 	t->irq_count = 0;
+	t->nmi_count = 0;
 	t->smi_count = 0;
 
 	c->c3 = 0;
···
 
 	/* remember first tv_begin */
 	if (average.threads.tv_begin.tv_sec == 0)
-		average.threads.tv_begin = t->tv_begin;
+		average.threads.tv_begin = procsysfs_tv_begin;
 
 	/* remember last tv_end */
 	average.threads.tv_end = t->tv_end;
···
 	average.threads.instr_count += t->instr_count;
 
 	average.threads.irq_count += t->irq_count;
+	average.threads.nmi_count += t->nmi_count;
 	average.threads.smi_count += t->smi_count;
 
 	for (i = 0, mp = sys.tp; mp; i++, mp = mp->next) {
···
 
 	if (average.threads.irq_count > 9999999)
 		sums_need_wide_columns = 1;
+	if (average.threads.nmi_count > 9999999)
+		sums_need_wide_columns = 1;
+
 
 	average.cores.c3 /= topo.allowed_cores;
 	average.cores.c6 /= topo.allowed_cores;
···
 
 unsigned long pmt_read_counter(struct pmt_counter *ppmt, unsigned int domain_id)
 {
-	assert(domain_id < ppmt->num_domains);
+	if (domain_id >= ppmt->num_domains)
+		return 0;
 
 	const unsigned long *pmmio = ppmt->domains[domain_id].pcounter;
 	const unsigned long value = pmmio ? *pmmio : 0;
···
 
 	if (DO_BIC(BIC_IRQ))
 		t->irq_count = irqs_per_cpu[cpu];
+	if (DO_BIC(BIC_NMI))
+		t->nmi_count = nmi_per_cpu[cpu];
 
 	get_cstate_counters(cpu, t, c, p);
 
···
 
 	free(irq_column_2_cpu);
 	free(irqs_per_cpu);
+	free(nmi_per_cpu);
 
 	for (i = 0; i <= topo.max_cpu_num; ++i) {
 		if (cpus[i].put_ids)
···
 {
 	int retval, pkg_no, node_no, core_no, thread_no;
 
+	retval = 0;
+
 	for (pkg_no = 0; pkg_no < topo.num_packages; ++pkg_no) {
 		for (node_no = 0; node_no < topo.nodes_per_pkg; ++node_no) {
 			for (core_no = 0; core_no < topo.cores_per_node; ++core_no) {
···
 					p = GET_PKG(pkg_base, pkg_no);
 					p2 = GET_PKG(pkg_base2, pkg_no);
 
-					retval = func(t, c, p, t2, c2, p2);
-					if (retval)
-						return retval;
+					retval |= func(t, c, p, t2, c2, p2);
 				}
 			}
 		}
 	}
-	return 0;
+	return retval;
 }
 
 /*
···
 
 		irq_column_2_cpu[column] = cpu_number;
 		irqs_per_cpu[cpu_number] = 0;
+		nmi_per_cpu[cpu_number] = 0;
 	}
 
 	/* read /proc/interrupt count lines and sum up irqs per cpu */
 	while (1) {
 		int column;
 		char buf[64];
+		int this_row_is_nmi = 0;
 
-		retval = fscanf(fp, " %s:", buf); /* flush irq# "N:" */
+		retval = fscanf(fp, " %s:", buf); /* irq# "N:" */
 		if (retval != 1)
 			break;
+
+		if (strncmp(buf, "NMI", strlen("NMI")) == 0)
+			this_row_is_nmi = 1;
 
 		/* read the count per cpu */
 		for (column = 0; column < topo.num_cpus; ++column) {
···
 			int cpu_number, irq_count;
 
 			retval = fscanf(fp, " %d", &irq_count);
+
 			if (retval != 1)
 				break;
 
 			cpu_number = irq_column_2_cpu[column];
 			irqs_per_cpu[cpu_number] += irq_count;
-
+			if (this_row_is_nmi)
+				nmi_per_cpu[cpu_number] += irq_count;
 		}
-
 		while (getc(fp) != '\n') ; /* flush interrupt description */
 
 	}
···
  */
 int snapshot_proc_sysfs_files(void)
 {
-	if (DO_BIC(BIC_IRQ))
+	gettimeofday(&procsysfs_tv_begin, (struct timezone *)NULL);
+
+	if (DO_BIC(BIC_IRQ) || DO_BIC(BIC_NMI))
 		if (snapshot_proc_interrupts())
 			return 1;
 
···
 		rapl_dram_energy_units = (15.3 / 1000000);
 	else
 		rapl_dram_energy_units = rapl_energy_units;
+
+	if (platform->has_fixed_rapl_psys_unit)
+		rapl_psys_energy_units = 1.0;
+	else
+		rapl_psys_energy_units = rapl_energy_units;
 
 	time_unit = msr >> 16 & 0xF;
 	if (time_unit == 0)
···
 	aperf_mperf_multiplier = platform->need_perf_multiplier ? 1024 : 1;
 
 	BIC_PRESENT(BIC_IRQ);
+	BIC_PRESENT(BIC_NMI);
 	BIC_PRESENT(BIC_TSC_MHz);
 }
···
 	return 0;
 }
 
+char *possible_file = "/sys/devices/system/cpu/possible";
+char possible_buf[1024];
+
+int initialize_cpu_possible_set(void)
+{
+	FILE *fp;
+
+	fp = fopen(possible_file, "r");
+	if (!fp) {
+		warn("open %s", possible_file);
+		return -1;
+	}
+	if (fread(possible_buf, sizeof(char), 1024, fp) == 0) {
+		warn("read %s", possible_file);
+		goto err;
+	}
+	if (parse_cpu_str(possible_buf, cpu_possible_set, cpu_possible_setsize)) {
+		warnx("%s: cpu str malformat %s\n", possible_file, cpu_effective_str);
+		goto err;
+	}
+	return 0;
+
+err:
+	fclose(fp);
+	return -1;
+}
+
 void topology_probe(bool startup)
 {
 	int i;
···
 	cpu_present_setsize = CPU_ALLOC_SIZE((topo.max_cpu_num + 1));
 	CPU_ZERO_S(cpu_present_setsize, cpu_present_set);
 	for_all_proc_cpus(mark_cpu_present);
+
+	/*
+	 * Allocate and initialize cpu_possible_set
+	 */
+	cpu_possible_set = CPU_ALLOC((topo.max_cpu_num + 1));
+	if (cpu_possible_set == NULL)
+		err(3, "CPU_ALLOC");
+	cpu_possible_setsize = CPU_ALLOC_SIZE((topo.max_cpu_num + 1));
+	CPU_ZERO_S(cpu_possible_setsize, cpu_possible_set);
+	initialize_cpu_possible_set();
 
 	/*
 	 * Allocate and initialize cpu_effective_set
···
 
 	irqs_per_cpu = calloc(topo.max_cpu_num + 1, sizeof(int));
 	if (irqs_per_cpu == NULL)
-		err(-1, "calloc %d", topo.max_cpu_num + 1);
+		err(-1, "calloc %d IRQ", topo.max_cpu_num + 1);
+
+	nmi_per_cpu = calloc(topo.max_cpu_num + 1, sizeof(int));
+	if (nmi_per_cpu == NULL)
+		err(-1, "calloc %d NMI", topo.max_cpu_num + 1);
 }
8830 8593 int update_topo(struct thread_data *t, struct core_data *c, struct pkg_data *p) ··· 9099 8854 9100 8855 struct pmt_mmio *pmt_mmio_open(unsigned int target_guid) 9101 8856 { 9102 - DIR *dirp; 9103 - struct dirent *entry; 8857 + struct pmt_diriter_t pmt_iter; 8858 + const struct dirent *entry; 9104 8859 struct stat st; 9105 - unsigned int telem_idx; 9106 8860 int fd_telem_dir, fd_pmt; 9107 8861 unsigned long guid, size, offset; 9108 8862 size_t mmap_size; 9109 8863 void *mmio; 9110 - struct pmt_mmio *ret = NULL; 8864 + struct pmt_mmio *head = NULL, *last = NULL; 8865 + struct pmt_mmio *new_pmt = NULL; 9111 8866 9112 8867 if (stat(SYSFS_TELEM_PATH, &st) == -1) 9113 8868 return NULL; 9114 8869 9115 - dirp = opendir(SYSFS_TELEM_PATH); 9116 - if (dirp == NULL) 8870 + pmt_diriter_init(&pmt_iter); 8871 + entry = pmt_diriter_begin(&pmt_iter, SYSFS_TELEM_PATH); 8872 + if (!entry) { 8873 + pmt_diriter_remove(&pmt_iter); 9117 8874 return NULL; 8875 + } 9118 8876 9119 - for (;;) { 9120 - entry = readdir(dirp); 9121 - 9122 - if (entry == NULL) 8877 + for ( ; entry != NULL; entry = pmt_diriter_next(&pmt_iter)) { 8878 + if (fstatat(dirfd(pmt_iter.dir), entry->d_name, &st, 0) == -1) 9123 8879 break; 9124 - 9125 - if (strcmp(entry->d_name, ".") == 0) 9126 - continue; 9127 - 9128 - if (strcmp(entry->d_name, "..") == 0) 9129 - continue; 9130 - 9131 - if (sscanf(entry->d_name, "telem%u", &telem_idx) != 1) 9132 - continue; 9133 - 9134 - if (fstatat(dirfd(dirp), entry->d_name, &st, 0) == -1) { 9135 - break; 9136 - } 9137 8880 9138 8881 if (!S_ISDIR(st.st_mode)) 9139 8882 continue; 9140 8883 9141 - fd_telem_dir = openat(dirfd(dirp), entry->d_name, O_RDONLY); 9142 - if (fd_telem_dir == -1) { 8884 + fd_telem_dir = openat(dirfd(pmt_iter.dir), entry->d_name, O_RDONLY); 8885 + if (fd_telem_dir == -1) 9143 8886 break; 9144 - } 9145 8887 9146 8888 if (parse_telem_info_file(fd_telem_dir, "guid", "%lx", &guid)) { 9147 8889 close(fd_telem_dir); ··· 9156 8924 if (fd_pmt == -1) 9157 8925 
goto loop_cleanup_and_break; 9158 8926 9159 - mmap_size = (size + 0x1000UL) & (~0x1000UL); 8927 + mmap_size = ROUND_UP_TO_PAGE_SIZE(size); 9160 8928 mmio = mmap(0, mmap_size, PROT_READ, MAP_SHARED, fd_pmt, 0); 9161 8929 if (mmio != MAP_FAILED) { 9162 - 9163 8930 if (debug) 9164 8931 fprintf(stderr, "%s: 0x%lx mmaped at: %p\n", __func__, guid, mmio); 9165 8932 9166 - ret = calloc(1, sizeof(*ret)); 8933 + new_pmt = calloc(1, sizeof(*new_pmt)); 9167 8934 9168 - if (!ret) { 8935 + if (!new_pmt) { 9169 8936 fprintf(stderr, "%s: Failed to allocate pmt_mmio\n", __func__); 9170 8937 exit(1); 9171 8938 } 9172 8939 9173 - ret->guid = guid; 9174 - ret->mmio_base = mmio; 9175 - ret->pmt_offset = offset; 9176 - ret->size = size; 8940 + /* 8941 + * Create linked list of mmaped regions, 8942 + * but preserve the ordering from sysfs. 8943 + * Ordering is important for the user to 8944 + * use the seq=%u parameter when adding a counter. 8945 + */ 8946 + new_pmt->guid = guid; 8947 + new_pmt->mmio_base = mmio; 8948 + new_pmt->pmt_offset = offset; 8949 + new_pmt->size = size; 8950 + new_pmt->next = pmt_mmios; 9177 8951 9178 - ret->next = pmt_mmios; 9179 - pmt_mmios = ret; 8952 + if (last) 8953 + last->next = new_pmt; 8954 + else 8955 + head = new_pmt; 8956 + 8957 + last = new_pmt; 9180 8958 } 9181 8959 9182 8960 loop_cleanup_and_break: 9183 8961 close(fd_pmt); 9184 8962 close(fd_telem_dir); 9185 - break; 9186 8963 } 9187 8964 9188 - closedir(dirp); 8965 + pmt_diriter_remove(&pmt_iter); 9189 8966 9190 - return ret; 8967 + /* 8968 + * If we found something, stick just 8969 + * created linked list to the front. 
8970 + */ 8971 + if (head) 8972 + pmt_mmios = head; 8973 + 8974 + return head; 9191 8975 } 9192 8976 9193 8977 struct pmt_mmio *pmt_mmio_find(unsigned int guid) ··· 9240 8992 return ret; 9241 8993 } 9242 8994 9243 - struct pmt_mmio *pmt_add_guid(unsigned int guid) 8995 + struct pmt_mmio *pmt_add_guid(unsigned int guid, unsigned int seq) 9244 8996 { 9245 8997 struct pmt_mmio *ret; 9246 8998 9247 8999 ret = pmt_mmio_find(guid); 9248 9000 if (!ret) 9249 9001 ret = pmt_mmio_open(guid); 9002 + 9003 + while (ret && seq) { 9004 + ret = ret->next; 9005 + --seq; 9006 + } 9250 9007 9251 9008 return ret; 9252 9009 } ··· 9299 9046 pcounter->domains[domain_id].pcounter = pmmio; 9300 9047 } 9301 9048 9302 - int pmt_add_counter(unsigned int guid, const char *name, enum pmt_datatype type, 9049 + int pmt_add_counter(unsigned int guid, unsigned int seq, const char *name, enum pmt_datatype type, 9303 9050 unsigned int lsb, unsigned int msb, unsigned int offset, enum counter_scope scope, 9304 9051 enum counter_format format, unsigned int domain_id, enum pmt_open_mode mode) 9305 9052 { ··· 9319 9066 exit(1); 9320 9067 } 9321 9068 9322 - mmio = pmt_add_guid(guid); 9069 + mmio = pmt_add_guid(guid, seq); 9323 9070 if (!mmio) { 9324 9071 if (mode != PMT_OPEN_TRY) { 9325 - fprintf(stderr, "%s: failed to map PMT MMIO for guid %x\n", __func__, guid); 9072 + fprintf(stderr, "%s: failed to map PMT MMIO for guid %x, seq %u\n", __func__, guid, seq); 9326 9073 exit(1); 9327 9074 } 9328 9075 ··· 9377 9124 9378 9125 void pmt_init(void) 9379 9126 { 9127 + int cpu_num; 9128 + unsigned long seq, offset, mod_num; 9129 + 9380 9130 if (BIC_IS_ENABLED(BIC_Diec6)) { 9381 - pmt_add_counter(PMT_MTL_DC6_GUID, "Die%c6", PMT_TYPE_XTAL_TIME, PMT_COUNTER_MTL_DC6_LSB, 9382 - PMT_COUNTER_MTL_DC6_MSB, PMT_COUNTER_MTL_DC6_OFFSET, SCOPE_PACKAGE, FORMAT_DELTA, 9383 - 0, PMT_OPEN_TRY); 9131 + pmt_add_counter(PMT_MTL_DC6_GUID, PMT_MTL_DC6_SEQ, "Die%c6", PMT_TYPE_XTAL_TIME, 9132 + PMT_COUNTER_MTL_DC6_LSB, 
PMT_COUNTER_MTL_DC6_MSB, PMT_COUNTER_MTL_DC6_OFFSET, 9133 + SCOPE_PACKAGE, FORMAT_DELTA, 0, PMT_OPEN_TRY); 9134 + } 9135 + 9136 + if (BIC_IS_ENABLED(BIC_CPU_c1e)) { 9137 + seq = 0; 9138 + offset = PMT_COUNTER_CWF_MC1E_OFFSET_BASE; 9139 + mod_num = 0; /* Relative module number for current PMT file. */ 9140 + 9141 + /* Open the counter for each CPU. */ 9142 + for (cpu_num = 0; cpu_num < topo.max_cpu_num;) { 9143 + 9144 + if (cpu_is_not_allowed(cpu_num)) 9145 + goto next_loop_iter; 9146 + 9147 + /* 9148 + * Set the scope to CPU, even though CWF report the counter per module. 9149 + * CPUs inside the same module will read from the same location, instead of reporting zeros. 9150 + * 9151 + * CWF with newer firmware might require a PMT_TYPE_XTAL_TIME intead of PMT_TYPE_TCORE_CLOCK. 9152 + */ 9153 + pmt_add_counter(PMT_CWF_MC1E_GUID, seq, "CPU%c1e", PMT_TYPE_TCORE_CLOCK, 9154 + PMT_COUNTER_CWF_MC1E_LSB, PMT_COUNTER_CWF_MC1E_MSB, offset, SCOPE_CPU, 9155 + FORMAT_DELTA, cpu_num, PMT_OPEN_TRY); 9156 + 9157 + /* 9158 + * Rather complex logic for each time we go to the next loop iteration, 9159 + * so keep it as a label. 9160 + */ 9161 + next_loop_iter: 9162 + /* 9163 + * Advance the cpu number and check if we should also advance offset to 9164 + * the next counter inside the PMT file. 9165 + * 9166 + * On Clearwater Forest platform, the counter is reported per module, 9167 + * so open the same counter for all of the CPUs inside the module. 9168 + * That way, reported table show the correct value for all of the CPUs inside the module, 9169 + * instead of zeros. 9170 + */ 9171 + ++cpu_num; 9172 + if (cpu_num % PMT_COUNTER_CWF_CPUS_PER_MODULE == 0) { 9173 + offset += PMT_COUNTER_CWF_MC1E_OFFSET_INCREMENT; 9174 + ++mod_num; 9175 + } 9176 + 9177 + /* 9178 + * There are PMT_COUNTER_CWF_MC1E_NUM_MODULES_PER_FILE in each PMT file. 9179 + * 9180 + * If that number is reached, seq must be incremented to advance to the next file in a sequence. 
9181 + * Offset inside that file and a module counter has to be reset. 9182 + */ 9183 + if (mod_num == PMT_COUNTER_CWF_MC1E_NUM_MODULES_PER_FILE) { 9184 + ++seq; 9185 + offset = PMT_COUNTER_CWF_MC1E_OFFSET_BASE; 9186 + mod_num = 0; 9187 + } 9188 + } 9384 9189 } 9385 9190 } 9386 9191 ··· 9474 9163 } 9475 9164 } 9476 9165 9166 + void affinitize_child(void) 9167 + { 9168 + /* Prefer cpu_possible_set, if available */ 9169 + if (sched_setaffinity(0, cpu_possible_setsize, cpu_possible_set)) { 9170 + warn("sched_setaffinity cpu_possible_set"); 9171 + 9172 + /* Otherwise, allow child to run on same cpu set as turbostat */ 9173 + if (sched_setaffinity(0, cpu_allowed_setsize, cpu_allowed_set)) 9174 + warn("sched_setaffinity cpu_allowed_set"); 9175 + } 9176 + } 9177 + 9477 9178 int fork_it(char **argv) 9478 9179 { 9479 9180 pid_t child_pid; ··· 9501 9178 child_pid = fork(); 9502 9179 if (!child_pid) { 9503 9180 /* child */ 9181 + affinitize_child(); 9504 9182 execvp(argv[0], argv); 9505 9183 err(errno, "exec %s", argv[0]); 9506 9184 } else { ··· 9528 9204 timersub(&tv_odd, &tv_even, &tv_delta); 9529 9205 if (for_all_cpus_2(delta_cpu, ODD_COUNTERS, EVEN_COUNTERS)) 9530 9206 fprintf(outf, "%s: Counter reset detected\n", progname); 9531 - else { 9532 - compute_average(EVEN_COUNTERS); 9533 - format_all_counters(EVEN_COUNTERS); 9534 - } 9207 + 9208 + compute_average(EVEN_COUNTERS); 9209 + format_all_counters(EVEN_COUNTERS); 9535 9210 9536 9211 fprintf(outf, "%.6f sec\n", tv_delta.tv_sec + tv_delta.tv_usec / 1000000.0); 9537 9212 ··· 9559 9236 9560 9237 void print_version() 9561 9238 { 9562 - fprintf(outf, "turbostat version 2024.11.30 - Len Brown <lenb@kernel.org>\n"); 9239 + fprintf(outf, "turbostat version 2025.02.02 - Len Brown <lenb@kernel.org>\n"); 9563 9240 } 9564 9241 9565 9242 #define COMMAND_LINE_SIZE 2048 ··· 9884 9561 9885 9562 } 9886 9563 if ((msr_num == 0) && (path == NULL) && (perf_device[0] == '\0' || perf_event[0] == '\0')) { 9887 - fprintf(stderr, "--add: (msrDDD 
| msr0xXXX | /path_to_counter | perf/device/event ) required\n"); 9564 + fprintf(stderr, "--add: (msrDDD | msr0xXXX | /path_to_counter | perf/device/event) required\n"); 9888 9565 fail++; 9889 9566 } 9890 9567 ··· 9922 9599 return strncmp(prefix, str, strlen(prefix)) == 0; 9923 9600 } 9924 9601 9602 + int pmt_parse_from_path(const char *target_path, unsigned int *out_guid, unsigned int *out_seq) 9603 + { 9604 + struct pmt_diriter_t pmt_iter; 9605 + const struct dirent *dirname; 9606 + struct stat stat, target_stat; 9607 + int fd_telem_dir = -1; 9608 + int fd_target_dir; 9609 + unsigned int seq = 0; 9610 + unsigned long guid, target_guid; 9611 + int ret = -1; 9612 + 9613 + fd_target_dir = open(target_path, O_RDONLY | O_DIRECTORY); 9614 + if (fd_target_dir == -1) { 9615 + return -1; 9616 + } 9617 + 9618 + if (fstat(fd_target_dir, &target_stat) == -1) { 9619 + fprintf(stderr, "%s: Failed to stat the target: %s", __func__, strerror(errno)); 9620 + exit(1); 9621 + } 9622 + 9623 + if (parse_telem_info_file(fd_target_dir, "guid", "%lx", &target_guid)) { 9624 + fprintf(stderr, "%s: Failed to parse the target guid file: %s", __func__, strerror(errno)); 9625 + exit(1); 9626 + } 9627 + 9628 + close(fd_target_dir); 9629 + 9630 + pmt_diriter_init(&pmt_iter); 9631 + 9632 + for (dirname = pmt_diriter_begin(&pmt_iter, SYSFS_TELEM_PATH); dirname != NULL; 9633 + dirname = pmt_diriter_next(&pmt_iter)) { 9634 + 9635 + fd_telem_dir = openat(dirfd(pmt_iter.dir), dirname->d_name, O_RDONLY | O_DIRECTORY); 9636 + if (fd_telem_dir == -1) 9637 + continue; 9638 + 9639 + if (parse_telem_info_file(fd_telem_dir, "guid", "%lx", &guid)) { 9640 + fprintf(stderr, "%s: Failed to parse the guid file: %s", __func__, strerror(errno)); 9641 + continue; 9642 + } 9643 + 9644 + if (fstat(fd_telem_dir, &stat) == -1) { 9645 + fprintf(stderr, "%s: Failed to stat %s directory: %s", __func__, 9646 + dirname->d_name, strerror(errno)); 9647 + continue; 9648 + } 9649 + 9650 + /* 9651 + * If reached the same 
directory as target, exit the loop. 9652 + * Seq has the correct value now. 9653 + */ 9654 + if (stat.st_dev == target_stat.st_dev && stat.st_ino == target_stat.st_ino) { 9655 + ret = 0; 9656 + break; 9657 + } 9658 + 9659 + /* 9660 + * If reached directory with the same guid, 9661 + * but it's not the target directory yet, 9662 + * increment seq and continue the search. 9663 + */ 9664 + if (guid == target_guid) 9665 + ++seq; 9666 + 9667 + close(fd_telem_dir); 9668 + fd_telem_dir = -1; 9669 + } 9670 + 9671 + pmt_diriter_remove(&pmt_iter); 9672 + 9673 + if (fd_telem_dir != -1) 9674 + close(fd_telem_dir); 9675 + 9676 + if (!ret) { 9677 + *out_guid = target_guid; 9678 + *out_seq = seq; 9679 + } 9680 + 9681 + return ret; 9682 + } 9683 + 9925 9684 void parse_add_command_pmt(char *add_command) 9926 9685 { 9927 9686 char *name = NULL; 9928 9687 char *type_name = NULL; 9929 9688 char *format_name = NULL; 9689 + char *direct_path = NULL; 9690 + static const char direct_path_prefix[] = "path="; 9930 9691 unsigned int offset; 9931 9692 unsigned int lsb; 9932 9693 unsigned int msb; 9933 9694 unsigned int guid; 9695 + unsigned int seq = 0; /* By default, pick first file in a sequence with a given GUID. 
*/ 9934 9696 unsigned int domain_id; 9935 9697 enum counter_scope scope = 0; 9936 9698 enum pmt_datatype type = PMT_TYPE_RAW; ··· 10095 9687 goto next; 10096 9688 } 10097 9689 9690 + if (sscanf(add_command, "seq=%x", &seq) == 1) 9691 + goto next; 9692 + 9693 + if (strncmp(add_command, direct_path_prefix, strlen(direct_path_prefix)) == 0) { 9694 + direct_path = add_command + strlen(direct_path_prefix); 9695 + goto next; 9696 + } 10098 9697 next: 10099 9698 add_command = strchr(add_command, ','); 10100 9699 if (add_command) { ··· 10152 9737 has_type = true; 10153 9738 } 10154 9739 9740 + if (strcmp("tcore_clock", type_name) == 0) { 9741 + type = PMT_TYPE_TCORE_CLOCK; 9742 + has_type = true; 9743 + } 9744 + 10155 9745 if (!has_type) { 10156 9746 printf("%s: invalid %s: %s\n", __func__, "type", type_name); 10157 9747 exit(1); ··· 10178 9758 exit(1); 10179 9759 } 10180 9760 9761 + if (direct_path && has_guid) { 9762 + printf("%s: path and guid+seq parameters are mutually exclusive\n" 9763 + "notice: passed guid=0x%x and path=%s\n", __func__, guid, direct_path); 9764 + exit(1); 9765 + } 9766 + 9767 + if (direct_path) { 9768 + if (pmt_parse_from_path(direct_path, &guid, &seq)) { 9769 + printf("%s: failed to parse PMT file from %s\n", __func__, direct_path); 9770 + exit(1); 9771 + } 9772 + 9773 + /* GUID was just infered from the direct path. 
*/ 9774 + has_guid = true; 9775 + } 9776 + 10181 9777 if (!has_guid) { 10182 - printf("%s: missing %s\n", __func__, "guid"); 9778 + printf("%s: missing %s\n", __func__, "guid or path"); 10183 9779 exit(1); 10184 9780 } 10185 9781 ··· 10209 9773 exit(1); 10210 9774 } 10211 9775 10212 - pmt_add_counter(guid, name, type, lsb, msb, offset, scope, format, domain_id, PMT_OPEN_REQUIRED); 9776 + pmt_add_counter(guid, seq, name, type, lsb, msb, offset, scope, format, domain_id, PMT_OPEN_REQUIRED); 10213 9777 } 10214 9778 10215 9779 void parse_add_command(char *add_command) ··· 10357 9921 { "Dump", no_argument, 0, 'D' }, 10358 9922 { "debug", no_argument, 0, 'd' }, /* internal, not documented */ 10359 9923 { "enable", required_argument, 0, 'e' }, 9924 + { "force", no_argument, 0, 'f' }, 10360 9925 { "interval", required_argument, 0, 'i' }, 10361 9926 { "IPC", no_argument, 0, 'I' }, 10362 9927 { "num_iterations", required_argument, 0, 'n' }, ··· 10417 9980 case 'e': 10418 9981 /* --enable specified counter */ 10419 9982 bic_enabled = bic_enabled | bic_lookup(optarg, SHOW_LIST); 9983 + break; 9984 + case 'f': 9985 + force_load++; 10420 9986 break; 10421 9987 case 'd': 10422 9988 debug++;