Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

perf vendor events: Add/update tigerlake events/metrics

Update events from v1.15 to v1.16.
Update TMA metrics from v4.7 to v4.8.

Bring in the event updates v1.16:
https://github.com/intel/perfmon/commit/43f3b8d6f82f3174bd3bffe8587e2179f086d2ce

The TMA 4.8 information was added in:
https://github.com/intel/perfmon/commit/59194d4d90ca50a3fcb2de0d82b9f6fc0c9a5736

Add counter information. The most recent RFC patch set using this
information:
https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/

Co-authored-by: Weilin Wang <weilin.wang@intel.com>
Co-authored-by: Caleb Biggers <caleb.biggers@intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240620181752.3945845-35-irogers@google.com

Authored by Ian Rogers, Weilin Wang, and Caleb Biggers; committed by Namhyung Kim
321e0ffa 7c79eb5c

+446 -80
+1 -1
tools/perf/pmu-events/arch/x86/mapfile.csv
@@ -32,7 +32,7 @@
 GenuineIntel-6-(4E|5E|8E|9E|A5|A6),v59,skylake,core
 GenuineIntel-6-55-[01234],v1.35,skylakex,core
 GenuineIntel-6-86,v1.23,snowridgex,core
-GenuineIntel-6-8[CD],v1.15,tigerlake,core
+GenuineIntel-6-8[CD],v1.16,tigerlake,core
 GenuineIntel-6-2C,v5,westmereep-dp,core
 GenuineIntel-6-25,v4,westmereep-sp,core
 GenuineIntel-6-2F,v4,westmereex,core
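To illustrate what the mapfile.csv row controls, here is a minimal Python sketch. It assumes the first column is a regular expression that must match a whole vendor-family-model string; the `lookup` helper and the shortened row list are hypothetical and are not perf's actual matching code.

```python
import re

# Rows copied from the mapfile.csv hunk above (post-patch state).
MAPFILE_ROWS = [
    "GenuineIntel-6-86,v1.23,snowridgex,core",
    "GenuineIntel-6-8[CD],v1.16,tigerlake,core",
    "GenuineIntel-6-2C,v5,westmereep-dp,core",
]

def lookup(cpuid):
    """Return (version, directory) for the first row whose pattern matches.

    Assumption: the first CSV column is a regex that must match the whole
    CPUID string. This helper is illustrative only.
    """
    for row in MAPFILE_ROWS:
        pattern, version, directory, _kind = row.split(",")
        if re.fullmatch(pattern, cpuid):
            return version, directory
    return None

print(lookup("GenuineIntel-6-8C"))
print(lookup("GenuineIntel-6-8D"))
```

Under that assumption, both Tiger Lake model numbers (8C and 8D) resolve to the `tigerlake` directory at event version v1.16 after this patch.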
+73
tools/perf/pmu-events/arch/x86/tigerlake/cache.json
··· 1 1 [ 2 2 { 3 3 "BriefDescription": "Counts the number of cache lines replaced in L1 data cache.", 4 + "Counter": "0,1,2,3", 4 5 "EventCode": "0x51", 5 6 "EventName": "L1D.REPLACEMENT", 6 7 "PublicDescription": "Counts L1D data line replacements including opportunistic replacements, and replacements that require stall-for-replace or block-for-replace.", ··· 10 9 }, 11 10 { 12 11 "BriefDescription": "Number of cycles a demand request has waited due to L1D Fill Buffer (FB) unavailability.", 12 + "Counter": "0,1,2,3", 13 13 "EventCode": "0x48", 14 14 "EventName": "L1D_PEND_MISS.FB_FULL", 15 15 "PublicDescription": "Counts number of cycles a demand request has waited due to L1D Fill Buffer (FB) unavailability. Demand requests include cacheable/uncacheable demand load, store, lock or SW prefetch accesses.", ··· 19 17 }, 20 18 { 21 19 "BriefDescription": "Number of phases a demand request has waited due to L1D Fill Buffer (FB) unavailability.", 20 + "Counter": "0,1,2,3", 22 21 "CounterMask": "1", 23 22 "EdgeDetect": "1", 24 23 "EventCode": "0x48", ··· 30 27 }, 31 28 { 32 29 "BriefDescription": "Number of cycles a demand request has waited due to L1D due to lack of L2 resources.", 30 + "Counter": "0,1,2,3", 33 31 "EventCode": "0x48", 34 32 "EventName": "L1D_PEND_MISS.L2_STALL", 35 33 "PublicDescription": "Counts number of cycles a demand request has waited due to L1D due to lack of L2 resources. Demand requests include cacheable/uncacheable demand load, store, lock or SW prefetch accesses.", ··· 39 35 }, 40 36 { 41 37 "BriefDescription": "Number of L1D misses that are outstanding", 38 + "Counter": "0,1,2,3", 42 39 "EventCode": "0x48", 43 40 "EventName": "L1D_PEND_MISS.PENDING", 44 41 "PublicDescription": "Counts number of L1D misses that are outstanding in each cycle, that is each cycle the number of Fill Buffers (FB) outstanding required by Demand Reads. FB either is held by demand loads, or it is held by non-demand loads and gets hit at least once by demand. 
The valid outstanding interval is defined until the FB deallocation by one of the following ways: from FB allocation, if FB is allocated by demand from the demand Hit FB, if it is allocated by hardware or software prefetch. Note: In the L1D, a Demand Read contains cacheable or noncacheable demand loads, including ones causing cache-line splits and reads due to page walks resulted from any request type.", ··· 48 43 }, 49 44 { 50 45 "BriefDescription": "Cycles with L1D load Misses outstanding.", 46 + "Counter": "0,1,2,3", 51 47 "CounterMask": "1", 52 48 "EventCode": "0x48", 53 49 "EventName": "L1D_PEND_MISS.PENDING_CYCLES", ··· 58 52 }, 59 53 { 60 54 "BriefDescription": "L2 cache lines filling L2", 55 + "Counter": "0,1,2,3", 61 56 "EventCode": "0xf1", 62 57 "EventName": "L2_LINES_IN.ALL", 63 58 "PublicDescription": "Counts the number of L2 cache lines filling the L2. Counting does not cover rejects.", ··· 67 60 }, 68 61 { 69 62 "BriefDescription": "Modified cache lines that are evicted by L2 cache when triggered by an L2 cache fill.", 63 + "Counter": "0,1,2,3", 70 64 "EventCode": "0xf2", 71 65 "EventName": "L2_LINES_OUT.NON_SILENT", 72 66 "PublicDescription": "Counts the number of lines that are evicted by L2 cache when triggered by an L2 cache fill. Those lines are in Modified state. Modified lines are written back to L3", ··· 76 68 }, 77 69 { 78 70 "BriefDescription": "Non-modified cache lines that are silently dropped by L2 cache when triggered by an L2 cache fill.", 71 + "Counter": "0,1,2,3", 79 72 "EventCode": "0xf2", 80 73 "EventName": "L2_LINES_OUT.SILENT", 81 74 "PublicDescription": "Counts the number of lines that are silently dropped by L2 cache when triggered by an L2 cache fill. These lines are typically in Shared or Exclusive state. 
A non-threaded event.", ··· 85 76 }, 86 77 { 87 78 "BriefDescription": "L2 code requests", 79 + "Counter": "0,1,2,3", 88 80 "EventCode": "0x24", 89 81 "EventName": "L2_RQSTS.ALL_CODE_RD", 90 82 "PublicDescription": "Counts the total number of L2 code requests.", ··· 94 84 }, 95 85 { 96 86 "BriefDescription": "Demand Data Read access L2 cache", 87 + "Counter": "0,1,2,3", 97 88 "EventCode": "0x24", 98 89 "EventName": "L2_RQSTS.ALL_DEMAND_DATA_RD", 99 90 "PublicDescription": "Counts Demand Data Read requests accessing the L2 cache. These requests may hit or miss L2 cache. True-miss exclude misses that were merged with ongoing L2 misses. An access is counted once.", ··· 103 92 }, 104 93 { 105 94 "BriefDescription": "RFO requests to L2 cache", 95 + "Counter": "0,1,2,3", 106 96 "EventCode": "0x24", 107 97 "EventName": "L2_RQSTS.ALL_RFO", 108 98 "PublicDescription": "Counts the total number of RFO (read for ownership) requests to L2 cache. L2 RFO requests include both L1D demand RFO misses as well as L1D RFO prefetches.", ··· 112 100 }, 113 101 { 114 102 "BriefDescription": "L2 cache hits when fetching instructions, code reads.", 103 + "Counter": "0,1,2,3", 115 104 "EventCode": "0x24", 116 105 "EventName": "L2_RQSTS.CODE_RD_HIT", 117 106 "PublicDescription": "Counts L2 cache hits when fetching instructions, code reads.", ··· 121 108 }, 122 109 { 123 110 "BriefDescription": "L2 cache misses when fetching instructions", 111 + "Counter": "0,1,2,3", 124 112 "EventCode": "0x24", 125 113 "EventName": "L2_RQSTS.CODE_RD_MISS", 126 114 "PublicDescription": "Counts L2 cache misses when fetching instructions.", ··· 130 116 }, 131 117 { 132 118 "BriefDescription": "Demand Data Read requests that hit L2 cache", 119 + "Counter": "0,1,2,3", 133 120 "EventCode": "0x24", 134 121 "EventName": "L2_RQSTS.DEMAND_DATA_RD_HIT", 135 122 "PublicDescription": "Counts the number of demand Data Read requests initiated by load instructions that hit L2 cache.", ··· 139 124 }, 140 125 { 141 126 
"BriefDescription": "Demand Data Read miss L2 cache", 127 + "Counter": "0,1,2,3", 142 128 "EventCode": "0x24", 143 129 "EventName": "L2_RQSTS.DEMAND_DATA_RD_MISS", 144 130 "PublicDescription": "Counts demand Data Read requests with true-miss in the L2 cache. True-miss excludes misses that were merged with ongoing L2 misses. An access is counted once.", ··· 148 132 }, 149 133 { 150 134 "BriefDescription": "Read requests with true-miss in L2 cache", 135 + "Counter": "0,1,2,3", 151 136 "EventCode": "0x24", 152 137 "EventName": "L2_RQSTS.MISS", 153 138 "PublicDescription": "Counts read requests of any type with true-miss in the L2 cache. True-miss excludes L2 misses that were merged with ongoing L2 misses.", ··· 157 140 }, 158 141 { 159 142 "BriefDescription": "All accesses to L2 cache", 143 + "Counter": "0,1,2,3", 160 144 "EventCode": "0x24", 161 145 "EventName": "L2_RQSTS.REFERENCES", 162 146 "PublicDescription": "Counts all requests that were hit or true misses in L2 cache. True-miss excludes misses that were merged with ongoing L2 misses.", ··· 166 148 }, 167 149 { 168 150 "BriefDescription": "RFO requests that hit L2 cache", 151 + "Counter": "0,1,2,3", 169 152 "EventCode": "0x24", 170 153 "EventName": "L2_RQSTS.RFO_HIT", 171 154 "PublicDescription": "Counts the RFO (Read-for-Ownership) requests that hit L2 cache.", ··· 175 156 }, 176 157 { 177 158 "BriefDescription": "RFO requests that miss L2 cache", 159 + "Counter": "0,1,2,3", 178 160 "EventCode": "0x24", 179 161 "EventName": "L2_RQSTS.RFO_MISS", 180 162 "PublicDescription": "Counts the RFO (Read-for-Ownership) requests that miss L2 cache.", ··· 184 164 }, 185 165 { 186 166 "BriefDescription": "SW prefetch requests that hit L2 cache.", 167 + "Counter": "0,1,2,3", 187 168 "EventCode": "0x24", 188 169 "EventName": "L2_RQSTS.SWPF_HIT", 189 170 "PublicDescription": "Counts Software prefetch requests that hit the L2 cache. 
Accounts for PREFETCHNTA and PREFETCHT0/1/2 instructions when FB is not full.", ··· 193 172 }, 194 173 { 195 174 "BriefDescription": "SW prefetch requests that miss L2 cache.", 175 + "Counter": "0,1,2,3", 196 176 "EventCode": "0x24", 197 177 "EventName": "L2_RQSTS.SWPF_MISS", 198 178 "PublicDescription": "Counts Software prefetch requests that miss the L2 cache. Accounts for PREFETCHNTA and PREFETCHT0/1/2 instructions when FB is not full.", ··· 202 180 }, 203 181 { 204 182 "BriefDescription": "L2 writebacks that access L2 cache", 183 + "Counter": "0,1,2,3", 205 184 "EventCode": "0xf0", 206 185 "EventName": "L2_TRANS.L2_WB", 207 186 "PublicDescription": "Counts L2 writebacks that access L2 cache.", ··· 211 188 }, 212 189 { 213 190 "BriefDescription": "Cycles when L1D is locked", 191 + "Counter": "0,1,2,3", 214 192 "EventCode": "0x63", 215 193 "EventName": "LOCK_CYCLES.CACHE_LOCK_DURATION", 216 194 "PublicDescription": "This event counts the number of cycles when the L1D is locked. It is a superset of the 0x1 mask (BUS_LOCK_CLOCKS.BUS_LOCK_DURATION).", ··· 220 196 }, 221 197 { 222 198 "BriefDescription": "Core-originated cacheable requests that missed L3 (Except hardware prefetches to the L3)", 199 + "Counter": "0,1,2,3,4,5,6,7", 223 200 "EventCode": "0x2e", 224 201 "EventName": "LONGEST_LAT_CACHE.MISS", 225 202 "PublicDescription": "Counts core-originated cacheable requests that miss the L3 cache (Longest Latency cache). Requests include data and code reads, Reads-for-Ownership (RFOs), speculative accesses and hardware prefetches to the L1 and L2. 
It does not include hardware prefetches to the L3, and may not count other types of requests to the L3.", ··· 229 204 }, 230 205 { 231 206 "BriefDescription": "Retired load instructions.", 207 + "Counter": "0,1,2,3", 232 208 "Data_LA": "1", 233 209 "EventCode": "0xd0", 234 210 "EventName": "MEM_INST_RETIRED.ALL_LOADS", ··· 240 214 }, 241 215 { 242 216 "BriefDescription": "Retired store instructions.", 217 + "Counter": "0,1,2,3", 243 218 "Data_LA": "1", 244 219 "EventCode": "0xd0", 245 220 "EventName": "MEM_INST_RETIRED.ALL_STORES", ··· 251 224 }, 252 225 { 253 226 "BriefDescription": "All retired memory instructions.", 227 + "Counter": "0,1,2,3", 254 228 "Data_LA": "1", 255 229 "EventCode": "0xd0", 256 230 "EventName": "MEM_INST_RETIRED.ANY", ··· 262 234 }, 263 235 { 264 236 "BriefDescription": "Retired load instructions with locked access.", 237 + "Counter": "0,1,2,3", 265 238 "Data_LA": "1", 266 239 "EventCode": "0xd0", 267 240 "EventName": "MEM_INST_RETIRED.LOCK_LOADS", ··· 273 244 }, 274 245 { 275 246 "BriefDescription": "Retired load instructions that split across a cacheline boundary.", 247 + "Counter": "0,1,2,3", 276 248 "Data_LA": "1", 277 249 "EventCode": "0xd0", 278 250 "EventName": "MEM_INST_RETIRED.SPLIT_LOADS", ··· 284 254 }, 285 255 { 286 256 "BriefDescription": "Retired store instructions that split across a cacheline boundary.", 257 + "Counter": "0,1,2,3", 287 258 "Data_LA": "1", 288 259 "EventCode": "0xd0", 289 260 "EventName": "MEM_INST_RETIRED.SPLIT_STORES", ··· 295 264 }, 296 265 { 297 266 "BriefDescription": "Retired load instructions that miss the STLB.", 267 + "Counter": "0,1,2,3", 298 268 "Data_LA": "1", 299 269 "EventCode": "0xd0", 300 270 "EventName": "MEM_INST_RETIRED.STLB_MISS_LOADS", ··· 306 274 }, 307 275 { 308 276 "BriefDescription": "Retired store instructions that miss the STLB.", 277 + "Counter": "0,1,2,3", 309 278 "Data_LA": "1", 310 279 "EventCode": "0xd0", 311 280 "EventName": "MEM_INST_RETIRED.STLB_MISS_STORES", ··· 317 284 }, 
318 285 { 319 286 "BriefDescription": "Snoop hit a modified(HITM) or clean line(HIT_W_FWD) in another on-pkg core which forwarded the data back due to a retired load instruction.", 287 + "Counter": "0,1,2,3", 320 288 "Data_LA": "1", 321 289 "EventCode": "0xd2", 322 290 "EventName": "MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD", ··· 328 294 }, 329 295 { 330 296 "BriefDescription": "Retired load instructions whose data sources were L3 hit and cross-core snoop missed in on-pkg core cache.", 297 + "Counter": "0,1,2,3", 331 298 "Data_LA": "1", 332 299 "EventCode": "0xd2", 333 300 "EventName": "MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS", ··· 339 304 }, 340 305 { 341 306 "BriefDescription": "Retired load instructions whose data sources were hits in L3 without snoops required", 307 + "Counter": "0,1,2,3", 342 308 "Data_LA": "1", 343 309 "EventCode": "0xd2", 344 310 "EventName": "MEM_LOAD_L3_HIT_RETIRED.XSNP_NONE", ··· 350 314 }, 351 315 { 352 316 "BriefDescription": "Snoop hit without forwarding in another on-pkg core due to a retired load instruction, data was supplied by the L3.", 317 + "Counter": "0,1,2,3", 353 318 "Data_LA": "1", 354 319 "EventCode": "0xd2", 355 320 "EventName": "MEM_LOAD_L3_HIT_RETIRED.XSNP_NO_FWD", ··· 361 324 }, 362 325 { 363 326 "BriefDescription": "Retired instructions with at least 1 uncacheable load or lock.", 327 + "Counter": "0,1,2,3", 364 328 "Data_LA": "1", 365 329 "EventCode": "0xd4", 366 330 "EventName": "MEM_LOAD_MISC_RETIRED.UC", ··· 372 334 }, 373 335 { 374 336 "BriefDescription": "Number of completed demand load requests that missed the L1, but hit the FB(fill buffer), because a preceding miss to the same cacheline initiated the line to be brought into L1, but data is not yet ready in L1.", 337 + "Counter": "0,1,2,3", 375 338 "Data_LA": "1", 376 339 "EventCode": "0xd1", 377 340 "EventName": "MEM_LOAD_RETIRED.FB_HIT", ··· 383 344 }, 384 345 { 385 346 "BriefDescription": "Retired load instructions with L1 cache hits as data sources", 347 + "Counter": 
"0,1,2,3", 386 348 "Data_LA": "1", 387 349 "EventCode": "0xd1", 388 350 "EventName": "MEM_LOAD_RETIRED.L1_HIT", ··· 394 354 }, 395 355 { 396 356 "BriefDescription": "Retired load instructions missed L1 cache as data sources", 357 + "Counter": "0,1,2,3", 397 358 "Data_LA": "1", 398 359 "EventCode": "0xd1", 399 360 "EventName": "MEM_LOAD_RETIRED.L1_MISS", ··· 405 364 }, 406 365 { 407 366 "BriefDescription": "Retired load instructions with L2 cache hits as data sources", 367 + "Counter": "0,1,2,3", 408 368 "Data_LA": "1", 409 369 "EventCode": "0xd1", 410 370 "EventName": "MEM_LOAD_RETIRED.L2_HIT", ··· 416 374 }, 417 375 { 418 376 "BriefDescription": "Retired load instructions missed L2 cache as data sources", 377 + "Counter": "0,1,2,3", 419 378 "Data_LA": "1", 420 379 "EventCode": "0xd1", 421 380 "EventName": "MEM_LOAD_RETIRED.L2_MISS", ··· 427 384 }, 428 385 { 429 386 "BriefDescription": "Retired load instructions with L3 cache hits as data sources", 387 + "Counter": "0,1,2,3", 430 388 "Data_LA": "1", 431 389 "EventCode": "0xd1", 432 390 "EventName": "MEM_LOAD_RETIRED.L3_HIT", ··· 438 394 }, 439 395 { 440 396 "BriefDescription": "Retired load instructions missed L3 cache as data sources", 397 + "Counter": "0,1,2,3", 441 398 "Data_LA": "1", 442 399 "EventCode": "0xd1", 443 400 "EventName": "MEM_LOAD_RETIRED.L3_MISS", ··· 449 404 }, 450 405 { 451 406 "BriefDescription": "Counts demand data reads that hit a cacheline in the L3 where a snoop hit in another cores caches, data forwarding is required as the data is modified.", 407 + "Counter": "0,1,2,3", 452 408 "EventCode": "0xB7, 0xBB", 453 409 "EventName": "OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM", 454 410 "MSRIndex": "0x1a6,0x1a7", ··· 459 413 }, 460 414 { 461 415 "BriefDescription": "OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_WITH_FWD", 416 + "Counter": "0,1,2,3", 462 417 "EventCode": "0xB7, 0xBB", 463 418 "EventName": "OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_WITH_FWD", 464 419 "MSRIndex": "0x1a6,0x1a7", ··· 469 422 }, 470 423 { 471 
424 "BriefDescription": "Counts demand reads for ownership (RFO) requests and software prefetches for exclusive ownership (PREFETCHW) that hit a cacheline in the L3 where a snoop hit in another cores caches, data forwarding is required as the data is modified.", 425 + "Counter": "0,1,2,3", 472 426 "EventCode": "0xB7, 0xBB", 473 427 "EventName": "OCR.DEMAND_RFO.L3_HIT.SNOOP_HITM", 474 428 "MSRIndex": "0x1a6,0x1a7", ··· 479 431 }, 480 432 { 481 433 "BriefDescription": "Demand and prefetch data reads", 434 + "Counter": "0,1,2,3", 482 435 "EventCode": "0xb0", 483 436 "EventName": "OFFCORE_REQUESTS.ALL_DATA_RD", 484 437 "PublicDescription": "Counts the demand and prefetch data reads. All Core Data Reads include cacheable 'Demands' and L2 prefetchers (not L3 prefetchers). Counting also covers reads due to page walks resulted from any request type.", ··· 488 439 }, 489 440 { 490 441 "BriefDescription": "Any memory transaction that reached the SQ.", 442 + "Counter": "0,1,2,3", 491 443 "EventCode": "0xb0", 492 444 "EventName": "OFFCORE_REQUESTS.ALL_REQUESTS", 493 445 "PublicDescription": "Counts memory transactions reached the super queue including requests initiated by the core, all L3 prefetches, page walks, etc..", ··· 497 447 }, 498 448 { 499 449 "BriefDescription": "Demand Data Read requests sent to uncore", 450 + "Counter": "0,1,2,3", 500 451 "EventCode": "0xb0", 501 452 "EventName": "OFFCORE_REQUESTS.DEMAND_DATA_RD", 502 453 "PublicDescription": "Counts the Demand Data Read requests sent to uncore. 
Use it in conjunction with OFFCORE_REQUESTS_OUTSTANDING to determine average latency in the uncore.", ··· 506 455 }, 507 456 { 508 457 "BriefDescription": "Demand RFO requests including regular RFOs, locks, ItoM", 458 + "Counter": "0,1,2,3", 509 459 "EventCode": "0xb0", 510 460 "EventName": "OFFCORE_REQUESTS.DEMAND_RFO", 511 461 "PublicDescription": "Counts the demand RFO (read for ownership) requests including regular RFOs, locks, ItoM.", ··· 515 463 }, 516 464 { 517 465 "BriefDescription": "Offcore outstanding cacheable Core Data Read transactions in SuperQueue (SQ), queue to uncore", 466 + "Counter": "0,1,2,3", 518 467 "EventCode": "0x60", 519 468 "EventName": "OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD", 520 469 "PublicDescription": "Counts the number of offcore outstanding cacheable Core Data Read transactions in the super queue every cycle. A transaction is considered to be in the Offcore outstanding state between L2 miss and transaction completion sent to requestor (SQ de-allocation). 
See corresponding Umask under OFFCORE_REQUESTS.", ··· 524 471 }, 525 472 { 526 473 "BriefDescription": "Cycles when offcore outstanding cacheable Core Data Read transactions are present in SuperQueue (SQ), queue to uncore.", 474 + "Counter": "0,1,2,3", 527 475 "CounterMask": "1", 528 476 "EventCode": "0x60", 529 477 "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD", ··· 534 480 }, 535 481 { 536 482 "BriefDescription": "Cycles when offcore outstanding Demand Data Read transactions are present in SuperQueue (SQ), queue to uncore", 483 + "Counter": "0,1,2,3", 537 484 "CounterMask": "1", 538 485 "EventCode": "0x60", 539 486 "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_DATA_RD", ··· 544 489 }, 545 490 { 546 491 "BriefDescription": "Cycles with offcore outstanding demand rfo reads transactions in SuperQueue (SQ), queue to uncore.", 492 + "Counter": "0,1,2,3", 547 493 "CounterMask": "1", 548 494 "EventCode": "0x60", 549 495 "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO", ··· 554 498 }, 555 499 { 556 500 "BriefDescription": "Demand Data Read transactions pending for off-core. Highly correlated.", 501 + "Counter": "0,1,2,3", 557 502 "EventCode": "0x60", 558 503 "EventName": "OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD", 559 504 "PublicDescription": "Counts the number of off-core outstanding Demand Data Read transactions every cycle. A transaction is considered to be in the Off-core outstanding state between L2 cache miss and data-return to the core.", ··· 563 506 }, 564 507 { 565 508 "BriefDescription": "Cycles with at least 6 offcore outstanding Demand Data Read transactions in uncore queue.", 509 + "Counter": "0,1,2,3", 566 510 "CounterMask": "6", 567 511 "EventCode": "0x60", 568 512 "EventName": "OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD_GE_6", ··· 572 514 }, 573 515 { 574 516 "BriefDescription": "Store Read transactions pending for off-core. 
Highly correlated.", 517 + "Counter": "0,1,2,3", 575 518 "EventCode": "0x60", 576 519 "EventName": "OFFCORE_REQUESTS_OUTSTANDING.DEMAND_RFO", 577 520 "PublicDescription": "Counts the number of off-core outstanding read-for-ownership (RFO) store transactions every cycle. An RFO transaction is considered to be in the Off-core outstanding state between L2 cache miss and transaction completion.", ··· 581 522 }, 582 523 { 583 524 "BriefDescription": "Counts bus locks, accounts for cache line split locks and UC locks.", 525 + "Counter": "0,1,2,3", 584 526 "EventCode": "0xf4", 585 527 "EventName": "SQ_MISC.BUS_LOCK", 586 528 "PublicDescription": "Counts the more expensive bus lock needed to enforce cache coherency for certain memory accesses that need to be done atomically. Can be created by issuing an atomic instruction (via the LOCK prefix) which causes a cache line split or accesses uncacheable memory.", ··· 590 530 }, 591 531 { 592 532 "BriefDescription": "Cycles the superQ cannot take any more entries.", 533 + "Counter": "0,1,2,3", 593 534 "EventCode": "0xf4", 594 535 "EventName": "SQ_MISC.SQ_FULL", 595 536 "PublicDescription": "Counts the cycles for which the thread is active and the superQ cannot take any more entries.", ··· 598 537 "UMask": "0x4" 599 538 }, 600 539 { 540 + "BriefDescription": "Counts the number of PREFETCHNTA, PREFETCHW, PREFETCHT0, PREFETCHT1 or PREFETCHT2 instructions executed.", 541 + "Counter": "0,1,2,3", 542 + "EventCode": "0x32", 543 + "EventName": "SW_PREFETCH_ACCESS.ANY", 544 + "SampleAfterValue": "100003", 545 + "UMask": "0xf" 546 + }, 547 + { 601 548 "BriefDescription": "Number of PREFETCHNTA instructions executed.", 549 + "Counter": "0,1,2,3", 602 550 "EventCode": "0x32", 603 551 "EventName": "SW_PREFETCH_ACCESS.NTA", 604 552 "PublicDescription": "Counts the number of PREFETCHNTA instructions executed.", ··· 616 546 }, 617 547 { 618 548 "BriefDescription": "Number of PREFETCHW instructions executed.", 549 + "Counter": "0,1,2,3", 619 550 
"EventCode": "0x32", 620 551 "EventName": "SW_PREFETCH_ACCESS.PREFETCHW", 621 552 "PublicDescription": "Counts the number of PREFETCHW instructions executed.", ··· 625 554 }, 626 555 { 627 556 "BriefDescription": "Number of PREFETCHT0 instructions executed.", 557 + "Counter": "0,1,2,3", 628 558 "EventCode": "0x32", 629 559 "EventName": "SW_PREFETCH_ACCESS.T0", 630 560 "PublicDescription": "Counts the number of PREFETCHT0 instructions executed.", ··· 634 562 }, 635 563 { 636 564 "BriefDescription": "Number of PREFETCHT1 or PREFETCHT2 instructions executed.", 565 + "Counter": "0,1,2,3", 637 566 "EventCode": "0x32", 638 567 "EventName": "SW_PREFETCH_ACCESS.T1_T2", 639 568 "PublicDescription": "Counts the number of PREFETCHT1 or PREFETCHT2 instructions executed.",
+17
tools/perf/pmu-events/arch/x86/tigerlake/counter.json
@@ -0,0 +1,17 @@
+[
+    {
+        "Unit": "core",
+        "CountersNumFixed": "4",
+        "CountersNumGeneric": "8"
+    },
+    {
+        "Unit": "ARB",
+        "CountersNumFixed": "0",
+        "CountersNumGeneric": "2"
+    },
+    {
+        "Unit": "CLOCK",
+        "CountersNumFixed": 1,
+        "CountersNumGeneric": "0"
+    }
+]
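As a rough illustration of how the new counter information could be consumed, the Python sketch below loads the counter.json contents shown above and checks an event's "Counter" string against the generic counter count of a unit. The helper names are hypothetical, and reading the "Counter" values as generic counter indices is an assumption; this is not the code from the RFC patch set.

```python
import json

# Contents of counter.json as added by this patch. The CLOCK entry stores
# CountersNumFixed as a bare number while the other values are strings.
COUNTER_JSON = """
[
    {"Unit": "core", "CountersNumFixed": "4", "CountersNumGeneric": "8"},
    {"Unit": "ARB", "CountersNumFixed": "0", "CountersNumGeneric": "2"},
    {"Unit": "CLOCK", "CountersNumFixed": 1, "CountersNumGeneric": "0"}
]
"""

def counters_per_unit(text):
    """Map each unit name to (fixed, generic) counter counts as integers."""
    return {
        entry["Unit"]: (int(entry["CountersNumFixed"]),
                        int(entry["CountersNumGeneric"]))
        for entry in json.loads(text)
    }

def parse_counter_field(value):
    """Turn an event's "Counter" string such as "0,1,2,3" into a set of ints."""
    return {int(tok) for tok in value.split(",") if tok.strip()}

def fits_generic(value, num_generic):
    """Check that every listed index is below the unit's generic counter count.

    Assumption: the indices in "Counter" refer to generic counters.
    """
    return all(0 <= idx < num_generic for idx in parse_counter_field(value))

units = counters_per_unit(COUNTER_JSON)
_fixed, generic = units["core"]
print(fits_generic("0,1,2,3", generic))
print(fits_generic("0,1,2,3,4,5,6,7", generic))
```

Calling `int()` on each value keeps the sketch tolerant of the mixed string and number encoding in the file.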
+13
tools/perf/pmu-events/arch/x86/tigerlake/floating-point.json
··· 1 1 [ 2 2 { 3 3 "BriefDescription": "Counts all microcode FP assists.", 4 + "Counter": "0,1,2,3,4,5,6,7", 4 5 "EventCode": "0xc1", 5 6 "EventName": "ASSISTS.FP", 6 7 "PublicDescription": "Counts all microcode Floating Point assists.", ··· 10 9 }, 11 10 { 12 11 "BriefDescription": "Counts number of SSE/AVX computational 128-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 2 computation operations, one for each element. Applies to SSE* and AVX* packed double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.", 12 + "Counter": "0,1,2,3,4,5,6,7", 13 13 "EventCode": "0xc7", 14 14 "EventName": "FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE", 15 15 "PublicDescription": "Number of SSE/AVX computational 128-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 2 computation operations, one for each element. Applies to SSE* and AVX* packed double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR register need to be set when using these events.", ··· 19 17 }, 20 18 { 21 19 "BriefDescription": "Number of SSE/AVX computational 128-bit packed single precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 4 computation operations, one for each element. Applies to SSE* and AVX* packed single precision floating-point instructions: ADD SUB MUL DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. 
DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.", 20 + "Counter": "0,1,2,3,4,5,6,7", 22 21 "EventCode": "0xc7", 23 22 "EventName": "FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE", 24 23 "PublicDescription": "Number of SSE/AVX computational 128-bit packed single precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 4 computation operations, one for each element. Applies to SSE* and AVX* packed single precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT RSQRT RCP DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR register need to be set when using these events.", ··· 28 25 }, 29 26 { 30 27 "BriefDescription": "Counts number of SSE/AVX computational 256-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 4 computation operations, one for each element. Applies to SSE* and AVX* packed double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.", 28 + "Counter": "0,1,2,3,4,5,6,7", 31 29 "EventCode": "0xc7", 32 30 "EventName": "FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE", 33 31 "PublicDescription": "Number of SSE/AVX computational 256-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 4 computation operations, one for each element. Applies to SSE* and AVX* packed double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. 
The DAZ and FTZ flags in the MXCSR register need to be set when using these events.", ··· 37 33 }, 38 34 { 39 35 "BriefDescription": "Counts number of SSE/AVX computational 256-bit packed single precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 8 computation operations, one for each element. Applies to SSE* and AVX* packed single precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT RSQRT RCP DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.", 36 + "Counter": "0,1,2,3,4,5,6,7", 40 37 "EventCode": "0xc7", 41 38 "EventName": "FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE", 42 39 "PublicDescription": "Number of SSE/AVX computational 256-bit packed single precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 8 computation operations, one for each element. Applies to SSE* and AVX* packed single precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT RSQRT RCP DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR register need to be set when using these events.", ··· 46 41 }, 47 42 { 48 43 "BriefDescription": "Number of SSE/AVX computational 128-bit packed single and 256-bit packed double precision FP instructions retired; some instructions will count twice as noted below. Each count represents 2 or/and 4 computation operations, 1 for each element. Applies to SSE* and AVX* packed single precision and packed double precision FP instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. 
DPP and FM(N)ADD/SUB count twice as they perform 2 calculations per element.", 44 + "Counter": "0,1,2,3,4,5,6,7", 49 45 "EventCode": "0xc7", 50 46 "EventName": "FP_ARITH_INST_RETIRED.4_FLOPS", 51 47 "PublicDescription": "Number of SSE/AVX computational 128-bit packed single precision and 256-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 2 or/and 4 computation operations, one for each element. Applies to SSE* and AVX* packed single precision floating-point and packed double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR register need to be set when using these events.", ··· 55 49 }, 56 50 { 57 51 "BriefDescription": "Counts number of SSE/AVX computational 512-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 8 computation operations, one for each element. Applies to SSE* and AVX* packed double precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT14 RCP14 FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.", 52 + "Counter": "0,1,2,3,4,5,6,7", 58 53 "EventCode": "0xc7", 59 54 "EventName": "FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE", 60 55 "PublicDescription": "Number of SSE/AVX computational 512-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 8 computation operations, one for each element. Applies to SSE* and AVX* packed double precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT14 RCP14 FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. 
The DAZ and FTZ flags in the MXCSR register need to be set when using these events.", ··· 64 57 }, 65 58 { 66 59 "BriefDescription": "Counts number of SSE/AVX computational 512-bit packed single precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 16 computation operations, one for each element. Applies to SSE* and AVX* packed single precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT14 RCP14 FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.", 60 + "Counter": "0,1,2,3,4,5,6,7", 67 61 "EventCode": "0xc7", 68 62 "EventName": "FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE", 69 63 "PublicDescription": "Number of SSE/AVX computational 512-bit packed single precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 16 computation operations, one for each element. Applies to SSE* and AVX* packed single precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT14 RCP14 FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR register need to be set when using these events.", ··· 73 65 }, 74 66 { 75 67 "BriefDescription": "Number of SSE/AVX computational 256-bit packed single precision and 512-bit packed double precision FP instructions retired; some instructions will count twice as noted below. Each count represents 8 computation operations, 1 for each element. Applies to SSE* and AVX* packed single precision and double precision FP instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT RSQRT RSQRT14 RCP RCP14 DPP FM(N)ADD/SUB. 
DPP and FM(N)ADD/SUB count twice as they perform 2 calculations per element.", 68 + "Counter": "0,1,2,3,4,5,6,7", 76 69 "EventCode": "0xc7", 77 70 "EventName": "FP_ARITH_INST_RETIRED.8_FLOPS", 78 71 "PublicDescription": "Number of SSE/AVX computational 256-bit packed single precision and 512-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 8 computation operations, one for each element. Applies to SSE* and AVX* packed single precision and double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT RSQRT RSQRT14 RCP RCP14 DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR register need to be set when using these events.", ··· 82 73 }, 83 74 { 84 75 "BriefDescription": "Number of SSE/AVX computational scalar floating-point instructions retired; some instructions will count twice as noted below. Applies to SSE* and AVX* scalar, double and single precision floating-point: ADD SUB MUL DIV MIN MAX RCP14 RSQRT14 RANGE SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform multiple calculations per element.", 76 + "Counter": "0,1,2,3,4,5,6,7", 85 77 "EventCode": "0xc7", 86 78 "EventName": "FP_ARITH_INST_RETIRED.SCALAR", 87 79 "PublicDescription": "Number of SSE/AVX computational scalar single precision and double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 1 computational operation. Applies to SSE* and AVX* scalar single precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. 
The DAZ and FTZ flags in the MXCSR register need to be set when using these events.", ··· 91 81 }, 92 82 { 93 83 "BriefDescription": "Counts number of SSE/AVX computational scalar double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 1 computational operation. Applies to SSE* and AVX* scalar double precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.", 84 + "Counter": "0,1,2,3,4,5,6,7", 94 85 "EventCode": "0xc7", 95 86 "EventName": "FP_ARITH_INST_RETIRED.SCALAR_DOUBLE", 96 87 "PublicDescription": "Number of SSE/AVX computational scalar double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 1 computational operation. Applies to SSE* and AVX* scalar double precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR register need to be set when using these events.", ··· 100 89 }, 101 90 { 102 91 "BriefDescription": "Counts number of SSE/AVX computational scalar single precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 1 computational operation. Applies to SSE* and AVX* scalar single precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.", 92 + "Counter": "0,1,2,3,4,5,6,7", 103 93 "EventCode": "0xc7", 104 94 "EventName": "FP_ARITH_INST_RETIRED.SCALAR_SINGLE", 105 95 "PublicDescription": "Number of SSE/AVX computational scalar single precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 1 computational operation. 
Applies to SSE* and AVX* scalar single precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR register need to be set when using these events.", ··· 109 97 }, 110 98 { 111 99 "BriefDescription": "Number of any Vector retired FP arithmetic instructions", 100 + "Counter": "0,1,2,3,4,5,6,7", 112 101 "EventCode": "0xc7", 113 102 "EventName": "FP_ARITH_INST_RETIRED.VECTOR", 114 103 "SampleAfterValue": "1000003",
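Note on the floating-point event descriptions above: each description states how many computation operations one count represents, and says that FM(N)ADD/SUB (and DPP where listed) already count twice in hardware. A minimal sketch of turning raw counts into an operation estimate under those stated weights follows. The helper name `estimate_flops` and the input dictionary are hypothetical and not part of perf; only events whose weight appears in this hunk are included, and it is assumed that the aggregate events (SCALAR, 4_FLOPS, 8_FLOPS, VECTOR) overlap with the per-width events and so should not be summed with them.

```python
# Operations represented by one count of each event, taken from the
# BriefDescription strings in the hunk above. Fused multiply-add style
# instructions are already counted twice by the hardware, so no extra
# factor is applied here.
OPS_PER_COUNT = {
    "FP_ARITH_INST_RETIRED.SCALAR_SINGLE": 1,
    "FP_ARITH_INST_RETIRED.SCALAR_DOUBLE": 1,
    "FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE": 8,
    "FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE": 8,
    "FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE": 16,
}


def estimate_flops(counts):
    """Sum weighted counts; `counts` maps event name -> raw count (hypothetical input)."""
    total = 0
    for name, count in counts.items():
        if name not in OPS_PER_COUNT:
            # Refuse unknown events rather than guessing a weight.
            raise KeyError(f"no per-count weight known for {name}")
        total += OPS_PER_COUNT[name] * count
    return total


if __name__ == "__main__":
    sample = {
        "FP_ARITH_INST_RETIRED.SCALAR_SINGLE": 10,
        "FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE": 2,
    }
    print(estimate_flops(sample))
```

The descriptions also note that the DAZ and FTZ flags in MXCSR need to be set when using these events, so any estimate of this kind only holds under that condition.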
+40 -1
tools/perf/pmu-events/arch/x86/tigerlake/frontend.json
··· 1 1 [ 2 2 { 3 3 "BriefDescription": "Counts the total number when the front end is resteered, mainly when the BPU cannot provide a correct prediction and this is corrected by other branch handling mechanisms at the front end.", 4 + "Counter": "0,1,2,3", 4 5 "EventCode": "0xe6", 5 6 "EventName": "BACLEARS.ANY", 6 7 "PublicDescription": "Counts the number of times the front-end is resteered when it finds a branch instruction in a fetch line. This occurs for the first time a branch instruction is fetched or when the branch is not tracked by the BPU (Branch Prediction Unit) anymore.", ··· 10 9 }, 11 10 { 12 11 "BriefDescription": "Stalls caused by changing prefix length of the instruction. [This event is alias to ILD_STALL.LCP]", 12 + "Counter": "0,1,2,3", 13 13 "EventCode": "0x87", 14 14 "EventName": "DECODE.LCP", 15 15 "PublicDescription": "Counts cycles that the Instruction Length decoder (ILD) stalls occurred due to dynamically changing prefix length of the decoded instruction (by operand size prefix instruction 0x66, address size prefix instruction 0x67 or REX.W for Intel64). Count is proportional to the number of prefixes in a 16B-line. This may result in a three-cycle penalty for each LCP (Length changing prefix) in a 16-byte chunk. [This event is alias to ILD_STALL.LCP]", ··· 19 17 }, 20 18 { 21 19 "BriefDescription": "Decode Stream Buffer (DSB)-to-MITE transitions count.", 20 + "Counter": "0,1,2,3", 22 21 "CounterMask": "1", 23 22 "EdgeDetect": "1", 24 23 "EventCode": "0xab", ··· 30 27 }, 31 28 { 32 29 "BriefDescription": "DSB-to-MITE switch true penalty cycles.", 30 + "Counter": "0,1,2,3", 33 31 "EventCode": "0xab", 34 32 "EventName": "DSB2MITE_SWITCHES.PENALTY_CYCLES", 35 33 "PublicDescription": "Decode Stream Buffer (DSB) is a Uop-cache that holds translations of previously fetched instructions that were decoded by the legacy x86 decode pipeline (MITE). 
This event counts fetch penalty cycles when a transition occurs from DSB to MITE.", ··· 39 35 }, 40 36 { 41 37 "BriefDescription": "Retired Instructions who experienced DSB miss.", 38 + "Counter": "0,1,2,3,4,5,6,7", 42 39 "EventCode": "0xc6", 43 40 "EventName": "FRONTEND_RETIRED.ANY_DSB_MISS", 44 41 "MSRIndex": "0x3F7", ··· 51 46 }, 52 47 { 53 48 "BriefDescription": "Retired Instructions who experienced a critical DSB miss.", 49 + "Counter": "0,1,2,3,4,5,6,7", 54 50 "EventCode": "0xc6", 55 51 "EventName": "FRONTEND_RETIRED.DSB_MISS", 56 52 "MSRIndex": "0x3F7", ··· 63 57 }, 64 58 { 65 59 "BriefDescription": "Retired Instructions who experienced iTLB true miss.", 60 + "Counter": "0,1,2,3,4,5,6,7", 66 61 "EventCode": "0xc6", 67 62 "EventName": "FRONTEND_RETIRED.ITLB_MISS", 68 63 "MSRIndex": "0x3F7", ··· 75 68 }, 76 69 { 77 70 "BriefDescription": "Retired Instructions who experienced Instruction L1 Cache true miss.", 71 + "Counter": "0,1,2,3,4,5,6,7", 78 72 "EventCode": "0xc6", 79 73 "EventName": "FRONTEND_RETIRED.L1I_MISS", 80 74 "MSRIndex": "0x3F7", ··· 87 79 }, 88 80 { 89 81 "BriefDescription": "Retired Instructions who experienced Instruction L2 Cache true miss.", 82 + "Counter": "0,1,2,3,4,5,6,7", 90 83 "EventCode": "0xc6", 91 84 "EventName": "FRONTEND_RETIRED.L2_MISS", 92 85 "MSRIndex": "0x3F7", ··· 99 90 }, 100 91 { 101 92 "BriefDescription": "Retired instructions after front-end starvation of at least 1 cycle", 93 + "Counter": "0,1,2,3,4,5,6,7", 102 94 "EventCode": "0xc6", 103 95 "EventName": "FRONTEND_RETIRED.LATENCY_GE_1", 104 96 "MSRIndex": "0x3F7", ··· 111 101 }, 112 102 { 113 103 "BriefDescription": "Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 128 cycles which was not interrupted by a back-end stall.", 104 + "Counter": "0,1,2,3,4,5,6,7", 114 105 "EventCode": "0xc6", 115 106 "EventName": "FRONTEND_RETIRED.LATENCY_GE_128", 116 107 "MSRIndex": "0x3F7", ··· 123 112 }, 124 113 { 125 114 
"BriefDescription": "Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 16 cycles which was not interrupted by a back-end stall.", 115 + "Counter": "0,1,2,3,4,5,6,7", 126 116 "EventCode": "0xc6", 127 117 "EventName": "FRONTEND_RETIRED.LATENCY_GE_16", 128 118 "MSRIndex": "0x3F7", ··· 135 123 }, 136 124 { 137 125 "BriefDescription": "Retired instructions after front-end starvation of at least 2 cycles", 126 + "Counter": "0,1,2,3,4,5,6,7", 138 127 "EventCode": "0xc6", 139 128 "EventName": "FRONTEND_RETIRED.LATENCY_GE_2", 140 129 "MSRIndex": "0x3F7", ··· 147 134 }, 148 135 { 149 136 "BriefDescription": "Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 256 cycles which was not interrupted by a back-end stall.", 137 + "Counter": "0,1,2,3,4,5,6,7", 150 138 "EventCode": "0xc6", 151 139 "EventName": "FRONTEND_RETIRED.LATENCY_GE_256", 152 140 "MSRIndex": "0x3F7", ··· 159 145 }, 160 146 { 161 147 "BriefDescription": "Retired instructions that are fetched after an interval where the front-end had at least 1 bubble-slot for a period of 2 cycles which was not interrupted by a back-end stall.", 148 + "Counter": "0,1,2,3,4,5,6,7", 162 149 "EventCode": "0xc6", 163 150 "EventName": "FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1", 164 151 "MSRIndex": "0x3F7", ··· 171 156 }, 172 157 { 173 158 "BriefDescription": "Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 32 cycles which was not interrupted by a back-end stall.", 159 + "Counter": "0,1,2,3,4,5,6,7", 174 160 "EventCode": "0xc6", 175 161 "EventName": "FRONTEND_RETIRED.LATENCY_GE_32", 176 162 "MSRIndex": "0x3F7", ··· 183 167 }, 184 168 { 185 169 "BriefDescription": "Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 4 cycles which was not interrupted by a back-end stall.", 170 + "Counter": 
"0,1,2,3,4,5,6,7", 186 171 "EventCode": "0xc6", 187 172 "EventName": "FRONTEND_RETIRED.LATENCY_GE_4", 188 173 "MSRIndex": "0x3F7", ··· 195 178 }, 196 179 { 197 180 "BriefDescription": "Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 512 cycles which was not interrupted by a back-end stall.", 181 + "Counter": "0,1,2,3,4,5,6,7", 198 182 "EventCode": "0xc6", 199 183 "EventName": "FRONTEND_RETIRED.LATENCY_GE_512", 200 184 "MSRIndex": "0x3F7", ··· 207 189 }, 208 190 { 209 191 "BriefDescription": "Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 64 cycles which was not interrupted by a back-end stall.", 192 + "Counter": "0,1,2,3,4,5,6,7", 210 193 "EventCode": "0xc6", 211 194 "EventName": "FRONTEND_RETIRED.LATENCY_GE_64", 212 195 "MSRIndex": "0x3F7", ··· 219 200 }, 220 201 { 221 202 "BriefDescription": "Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 8 cycles which was not interrupted by a back-end stall.", 203 + "Counter": "0,1,2,3,4,5,6,7", 222 204 "EventCode": "0xc6", 223 205 "EventName": "FRONTEND_RETIRED.LATENCY_GE_8", 224 206 "MSRIndex": "0x3F7", ··· 231 211 }, 232 212 { 233 213 "BriefDescription": "Retired Instructions who experienced STLB (2nd level TLB) true miss.", 214 + "Counter": "0,1,2,3,4,5,6,7", 234 215 "EventCode": "0xc6", 235 216 "EventName": "FRONTEND_RETIRED.STLB_MISS", 236 217 "MSRIndex": "0x3F7", ··· 243 222 }, 244 223 { 245 224 "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache miss. [This event is alias to ICACHE_DATA.STALLS]", 225 + "Counter": "0,1,2,3", 246 226 "EventCode": "0x80", 247 227 "EventName": "ICACHE_16B.IFDATA_STALL", 248 228 "PublicDescription": "Counts cycles where a code line fetch is stalled due to an L1 instruction cache miss. The legacy decode pipeline works at a 16 Byte granularity. 
[This event is alias to ICACHE_DATA.STALLS]", ··· 252 230 }, 253 231 { 254 232 "BriefDescription": "Instruction fetch tag lookups that hit in the instruction cache (L1I). Counts at 64-byte cache-line granularity.", 233 + "Counter": "0,1,2,3", 255 234 "EventCode": "0x83", 256 235 "EventName": "ICACHE_64B.IFTAG_HIT", 257 236 "PublicDescription": "Counts instruction fetch tag lookups that hit in the instruction cache (L1I). Counts at 64-byte cache-line granularity. Accounts for both cacheable and uncacheable accesses.", ··· 261 238 }, 262 239 { 263 240 "BriefDescription": "Instruction fetch tag lookups that miss in the instruction cache (L1I). Counts at 64-byte cache-line granularity.", 241 + "Counter": "0,1,2,3", 264 242 "EventCode": "0x83", 265 243 "EventName": "ICACHE_64B.IFTAG_MISS", 266 244 "PublicDescription": "Counts instruction fetch tag lookups that miss in the instruction cache (L1I). Counts at 64-byte cache-line granularity. Accounts for both cacheable and uncacheable accesses.", ··· 270 246 }, 271 247 { 272 248 "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_TAG.STALLS]", 249 + "Counter": "0,1,2,3", 273 250 "EventCode": "0x83", 274 251 "EventName": "ICACHE_64B.IFTAG_STALL", 275 252 "PublicDescription": "Counts cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_TAG.STALLS]", ··· 279 254 }, 280 255 { 281 256 "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache miss. [This event is alias to ICACHE_16B.IFDATA_STALL]", 257 + "Counter": "0,1,2,3", 282 258 "EventCode": "0x80", 283 259 "EventName": "ICACHE_DATA.STALLS", 284 260 "PublicDescription": "Counts cycles where a code line fetch is stalled due to an L1 instruction cache miss. The legacy decode pipeline works at a 16 Byte granularity. 
[This event is alias to ICACHE_16B.IFDATA_STALL]", ··· 288 262 }, 289 263 { 290 264 "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_64B.IFTAG_STALL]", 265 + "Counter": "0,1,2,3", 291 266 "EventCode": "0x83", 292 267 "EventName": "ICACHE_TAG.STALLS", 293 268 "PublicDescription": "Counts cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_64B.IFTAG_STALL]", ··· 297 270 }, 298 271 { 299 272 "BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering any Uop", 273 + "Counter": "0,1,2,3", 300 274 "CounterMask": "1", 301 275 "EventCode": "0x79", 302 276 "EventName": "IDQ.DSB_CYCLES_ANY", ··· 307 279 }, 308 280 { 309 281 "BriefDescription": "Cycles DSB is delivering optimal number of Uops", 282 + "Counter": "0,1,2,3", 310 283 "CounterMask": "5", 311 284 "EventCode": "0x79", 312 285 "EventName": "IDQ.DSB_CYCLES_OK", 313 - "PublicDescription": "Counts the number of cycles where optimal number of uops was delivered to the Instruction Decode Queue (IDQ) from the MITE (legacy decode pipeline) path. During these cycles uops are not being delivered from the Decode Stream Buffer (DSB).", 286 + "PublicDescription": "Counts the number of cycles where optimal number of uops was delivered to the Instruction Decode Queue (IDQ) from the DSB (Decode Stream Buffer) path. 
Count includes uops that may 'bypass' the IDQ.", 314 287 "SampleAfterValue": "2000003", 315 288 "UMask": "0x8" 316 289 }, 317 290 { 318 291 "BriefDescription": "Uops delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path", 292 + "Counter": "0,1,2,3", 319 293 "EventCode": "0x79", 320 294 "EventName": "IDQ.DSB_UOPS", 321 295 "PublicDescription": "Counts the number of uops delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path.", ··· 326 296 }, 327 297 { 328 298 "BriefDescription": "Cycles MITE is delivering any Uop", 299 + "Counter": "0,1,2,3", 329 300 "CounterMask": "1", 330 301 "EventCode": "0x79", 331 302 "EventName": "IDQ.MITE_CYCLES_ANY", ··· 336 305 }, 337 306 { 338 307 "BriefDescription": "Cycles MITE is delivering optimal number of Uops", 308 + "Counter": "0,1,2,3", 339 309 "CounterMask": "5", 340 310 "EventCode": "0x79", 341 311 "EventName": "IDQ.MITE_CYCLES_OK", ··· 346 314 }, 347 315 { 348 316 "BriefDescription": "Uops delivered to Instruction Decode Queue (IDQ) from MITE path", 317 + "Counter": "0,1,2,3", 349 318 "EventCode": "0x79", 350 319 "EventName": "IDQ.MITE_UOPS", 351 320 "PublicDescription": "Counts the number of uops delivered to Instruction Decode Queue (IDQ) from the MITE path. 
This also means that uops are not being delivered from the Decode Stream Buffer (DSB).", ··· 355 322 }, 356 323 { 357 324 "BriefDescription": "Cycles when uops are being delivered to IDQ while MS is busy", 325 + "Counter": "0,1,2,3", 358 326 "CounterMask": "1", 359 327 "EventCode": "0x79", 360 328 "EventName": "IDQ.MS_CYCLES_ANY", ··· 365 331 }, 366 332 { 367 333 "BriefDescription": "Number of switches from DSB or MITE to the MS", 334 + "Counter": "0,1,2,3", 368 335 "CounterMask": "1", 369 336 "EdgeDetect": "1", 370 337 "EventCode": "0x79", ··· 376 341 }, 377 342 { 378 343 "BriefDescription": "Uops delivered to IDQ while MS is busy", 344 + "Counter": "0,1,2,3", 379 345 "EventCode": "0x79", 380 346 "EventName": "IDQ.MS_UOPS", 381 347 "PublicDescription": "Counts the total number of uops delivered by the Microcode Sequencer (MS). Any instruction over 4 uops will be delivered by the MS. Some instructions such as transcendentals may additionally generate uops from the MS.", ··· 385 349 }, 386 350 { 387 351 "BriefDescription": "Uops not delivered by IDQ when backend of the machine is not stalled", 352 + "Counter": "0,1,2,3,4,5,6,7", 388 353 "EventCode": "0x9c", 389 354 "EventName": "IDQ_UOPS_NOT_DELIVERED.CORE", 390 355 "PublicDescription": "Counts the number of uops not delivered to by the Instruction Decode Queue (IDQ) to the back-end of the pipeline when there was no back-end stalls. 
This event counts for one SMT thread in a given cycle.", ··· 394 357 }, 395 358 { 396 359 "BriefDescription": "Cycles when no uops are not delivered by the IDQ when backend of the machine is not stalled", 360 + "Counter": "0,1,2,3,4,5,6,7", 397 361 "CounterMask": "5", 398 362 "EventCode": "0x9c", 399 363 "EventName": "IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE", ··· 404 366 }, 405 367 { 406 368 "BriefDescription": "Cycles when optimal number of uops was delivered to the back-end when the back-end is not stalled", 369 + "Counter": "0,1,2,3,4,5,6,7", 407 370 "CounterMask": "1", 408 371 "EventCode": "0x9c", 409 372 "EventName": "IDQ_UOPS_NOT_DELIVERED.CYCLES_FE_WAS_OK",
+24
tools/perf/pmu-events/arch/x86/tigerlake/memory.json
··· 1 1 [ 2 2 { 3 3 "BriefDescription": "Execution stalls while L3 cache miss demand load is outstanding.", 4 + "Counter": "0,1,2,3", 4 5 "CounterMask": "6", 5 6 "EventCode": "0xa3", 6 7 "EventName": "CYCLE_ACTIVITY.STALLS_L3_MISS", ··· 10 9 }, 11 10 { 12 11 "BriefDescription": "Number of machine clears due to memory ordering conflicts.", 12 + "Counter": "0,1,2,3,4,5,6,7", 13 13 "EventCode": "0xc3", 14 14 "EventName": "MACHINE_CLEARS.MEMORY_ORDERING", 15 15 "PublicDescription": "Counts the number of Machine Clears detected dye to memory ordering. Memory Ordering Machine Clears may apply when a memory read may not conform to the memory ordering rules of the x86 architecture", ··· 19 17 }, 20 18 { 21 19 "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 128 cycles.", 20 + "Counter": "0,1,2,3,4,5,6,7", 22 21 "Data_LA": "1", 23 22 "EventCode": "0xcd", 24 23 "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_128", ··· 32 29 }, 33 30 { 34 31 "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 16 cycles.", 32 + "Counter": "0,1,2,3,4,5,6,7", 35 33 "Data_LA": "1", 36 34 "EventCode": "0xcd", 37 35 "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_16", ··· 45 41 }, 46 42 { 47 43 "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 256 cycles.", 44 + "Counter": "0,1,2,3,4,5,6,7", 48 45 "Data_LA": "1", 49 46 "EventCode": "0xcd", 50 47 "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_256", ··· 58 53 }, 59 54 { 60 55 "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 32 cycles.", 56 + "Counter": "0,1,2,3,4,5,6,7", 61 57 "Data_LA": "1", 62 58 "EventCode": "0xcd", 63 59 "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_32", ··· 71 65 }, 72 66 { 73 67 "BriefDescription": "Counts randomly selected loads when the latency from 
first dispatch to completion is greater than 4 cycles.", 68 + "Counter": "0,1,2,3,4,5,6,7", 74 69 "Data_LA": "1", 75 70 "EventCode": "0xcd", 76 71 "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_4", ··· 84 77 }, 85 78 { 86 79 "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 512 cycles.", 80 + "Counter": "0,1,2,3,4,5,6,7", 87 81 "Data_LA": "1", 88 82 "EventCode": "0xcd", 89 83 "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_512", ··· 97 89 }, 98 90 { 99 91 "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 64 cycles.", 92 + "Counter": "0,1,2,3,4,5,6,7", 100 93 "Data_LA": "1", 101 94 "EventCode": "0xcd", 102 95 "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_64", ··· 110 101 }, 111 102 { 112 103 "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 8 cycles.", 104 + "Counter": "0,1,2,3,4,5,6,7", 113 105 "Data_LA": "1", 114 106 "EventCode": "0xcd", 115 107 "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_8", ··· 123 113 }, 124 114 { 125 115 "BriefDescription": "Demand Data Read requests who miss L3 cache", 116 + "Counter": "0,1,2,3", 126 117 "EventCode": "0xb0", 127 118 "EventName": "OFFCORE_REQUESTS.L3_MISS_DEMAND_DATA_RD", 128 119 "PublicDescription": "Demand Data Read requests who miss L3 cache.", ··· 132 121 }, 133 122 { 134 123 "BriefDescription": "Number of times an RTM execution aborted.", 124 + "Counter": "0,1,2,3,4,5,6,7", 135 125 "EventCode": "0xc9", 136 126 "EventName": "RTM_RETIRED.ABORTED", 127 + "PEBS": "1", 137 128 "PublicDescription": "Counts the number of times RTM abort was triggered.", 138 129 "SampleAfterValue": "100003", 139 130 "UMask": "0x4" 140 131 }, 141 132 { 142 133 "BriefDescription": "Number of times an RTM execution aborted due to none of the previous 4 categories (e.g. 
interrupt)", 134 + "Counter": "0,1,2,3,4,5,6,7", 143 135 "EventCode": "0xc9", 144 136 "EventName": "RTM_RETIRED.ABORTED_EVENTS", 145 137 "PublicDescription": "Counts the number of times an RTM execution aborted due to none of the previous 4 categories (e.g. interrupt).", ··· 151 137 }, 152 138 { 153 139 "BriefDescription": "Number of times an RTM execution aborted due to various memory events (e.g. read/write capacity and conflicts)", 140 + "Counter": "0,1,2,3,4,5,6,7", 154 141 "EventCode": "0xc9", 155 142 "EventName": "RTM_RETIRED.ABORTED_MEM", 156 143 "PublicDescription": "Counts the number of times an RTM execution aborted due to various memory events (e.g. read/write capacity and conflicts).", ··· 160 145 }, 161 146 { 162 147 "BriefDescription": "Number of times an RTM execution aborted due to incompatible memory type", 148 + "Counter": "0,1,2,3,4,5,6,7", 163 149 "EventCode": "0xc9", 164 150 "EventName": "RTM_RETIRED.ABORTED_MEMTYPE", 165 151 "PublicDescription": "Counts the number of times an RTM execution aborted due to incompatible memory type.", ··· 169 153 }, 170 154 { 171 155 "BriefDescription": "Number of times an RTM execution aborted due to HLE-unfriendly instructions", 156 + "Counter": "0,1,2,3,4,5,6,7", 172 157 "EventCode": "0xc9", 173 158 "EventName": "RTM_RETIRED.ABORTED_UNFRIENDLY", 174 159 "PublicDescription": "Counts the number of times an RTM execution aborted due to HLE-unfriendly instructions.", ··· 178 161 }, 179 162 { 180 163 "BriefDescription": "Number of times an RTM execution successfully committed", 164 + "Counter": "0,1,2,3,4,5,6,7", 181 165 "EventCode": "0xc9", 182 166 "EventName": "RTM_RETIRED.COMMIT", 183 167 "PublicDescription": "Counts the number of times RTM commit succeeded.", ··· 187 169 }, 188 170 { 189 171 "BriefDescription": "Number of times an RTM execution started.", 172 + "Counter": "0,1,2,3,4,5,6,7", 190 173 "EventCode": "0xc9", 191 174 "EventName": "RTM_RETIRED.START", 192 175 "PublicDescription": "Counts the number of 
times we entered an RTM region. Does not count nested transactions.", ··· 196 177 }, 197 178 { 198 179 "BriefDescription": "Counts the number of times a class of instructions that may cause a transactional abort was executed inside a transactional region", 180 + "Counter": "0,1,2,3,4,5,6,7", 199 181 "EventCode": "0x5d", 200 182 "EventName": "TX_EXEC.MISC2", 201 183 "PublicDescription": "Counts Unfriendly TSX abort triggered by a vzeroupper instruction.", ··· 205 185 }, 206 186 { 207 187 "BriefDescription": "Number of times an instruction execution caused the transactional nest count supported to be exceeded", 188 + "Counter": "0,1,2,3,4,5,6,7", 208 189 "EventCode": "0x5d", 209 190 "EventName": "TX_EXEC.MISC3", 210 191 "PublicDescription": "Counts Unfriendly TSX abort triggered by a nest count that is too deep.", ··· 214 193 }, 215 194 { 216 195 "BriefDescription": "Speculatively counts the number of TSX aborts due to a data capacity limitation for transactional reads", 196 + "Counter": "0,1,2,3", 217 197 "EventCode": "0x54", 218 198 "EventName": "TX_MEM.ABORT_CAPACITY_READ", 219 199 "PublicDescription": "Speculatively counts the number of Transactional Synchronization Extensions (TSX) aborts due to a data capacity limitation for transactional reads", ··· 223 201 }, 224 202 { 225 203 "BriefDescription": "Speculatively counts the number of TSX aborts due to a data capacity limitation for transactional writes.", 204 + "Counter": "0,1,2,3", 226 205 "EventCode": "0x54", 227 206 "EventName": "TX_MEM.ABORT_CAPACITY_WRITE", 228 207 "PublicDescription": "Speculatively counts the number of Transactional Synchronization Extensions (TSX) aborts due to a data capacity limitation for transactional writes.", ··· 232 209 }, 233 210 { 234 211 "BriefDescription": "Number of times a transactional abort was signaled due to a data conflict on a transactionally accessed address", 212 + "Counter": "0,1,2,3", 235 213 "EventCode": "0x54", 236 214 "EventName": "TX_MEM.ABORT_CONFLICT", 237 
215 "PublicDescription": "Counts the number of times a TSX line had a cache conflict.",
+13
tools/perf/pmu-events/arch/x86/tigerlake/metricgroups.json
···
5 5 "BigFootprint": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
6 6 "BrMispredicts": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
7 7 "Branches": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
8 + "BvBC": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
9 + "BvBO": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
10 + "BvCB": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
11 + "BvFB": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
12 + "BvIO": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
13 + "BvMB": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
14 + "BvML": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
15 + "BvMP": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
16 + "BvMS": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
17 + "BvMT": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
18 + "BvOB": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
19 + "BvUW": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
8 20 "CacheHits": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
21 + "CacheMisses": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
9 22 "CodeGen": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
10 23 "Compute": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
11 24 "Cor": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
+4
tools/perf/pmu-events/arch/x86/tigerlake/other.json
···
1 1 [
2 2 {
3 3 "BriefDescription": "Core cycles where the core was running in a manner where Turbo may be clipped to the Non-AVX turbo schedule.",
4 + "Counter": "0,1,2,3",
4 5 "EventCode": "0x28",
5 6 "EventName": "CORE_POWER.LVL0_TURBO_LICENSE",
6 7 "PublicDescription": "Counts Core cycles where the core was running with power-delivery for baseline license level 0. This includes non-AVX codes, SSE, AVX 128-bit, and low-current AVX 256-bit codes.",
···
10 9 },
11 10 {
12 11 "BriefDescription": "Core cycles where the core was running in a manner where Turbo may be clipped to the AVX2 turbo schedule.",
12 + "Counter": "0,1,2,3",
13 13 "EventCode": "0x28",
14 14 "EventName": "CORE_POWER.LVL1_TURBO_LICENSE",
15 15 "PublicDescription": "Counts Core cycles where the core was running with power-delivery for license level 1. This includes high current AVX 256-bit instructions as well as low current AVX 512-bit instructions.",
···
19 17 },
20 18 {
21 19 "BriefDescription": "Core cycles where the core was running in a manner where Turbo may be clipped to the AVX512 turbo schedule.",
20 + "Counter": "0,1,2,3",
22 21 "EventCode": "0x28",
23 22 "EventName": "CORE_POWER.LVL2_TURBO_LICENSE",
24 23 "PublicDescription": "Core cycles where the core was running with power-delivery for license level 2 (introduced in Skylake Server microarchitecture). This includes high current AVX 512-bit instructions.",
···
28 25 },
29 26 {
30 27 "BriefDescription": "Counts streaming stores that have any type of response.",
28 + "Counter": "0,1,2,3",
31 29 "EventCode": "0xB7, 0xBB",
32 30 "EventName": "OCR.STREAMING_WR.ANY_RESPONSE",
33 31 "MSRIndex": "0x1a6,0x1a7",
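Note on the added "Counter" field: throughout this commit the field is a comma-separated string such as "0,1,2,3" or "0,1,2,3,4,5,6,7". A minimal sketch of reading it from an event entry is shown below, treating the values as the indices of the counters the event may use. That reading is an assumption based on the commit message ("Add counter information"); how perf itself consumes the field is not shown in this diff, and `parse_counters` is a hypothetical helper, not a perf or jevents function.

```python
import json


def parse_counters(event):
    """Return the counter indices listed in an event's "Counter" field.

    `event` is one JSON object from a pmu-events file; an entry without
    the field yields an empty list.
    """
    raw = event.get("Counter", "")
    return [int(tok) for tok in raw.split(",") if tok.strip()]


# One entry as it appears in the other.json hunk above, trimmed to the
# fields that matter for this sketch.
sample = json.loads("""
{
    "Counter": "0,1,2,3",
    "EventCode": "0x28",
    "EventName": "CORE_POWER.LVL0_TURBO_LICENSE"
}
""")

if __name__ == "__main__":
    print(sample["EventName"], parse_counters(sample))
```

Keeping the result as a list of integers makes it straightforward to intersect the sets of two events when checking whether they could be placed in the same group.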
+95
tools/perf/pmu-events/arch/x86/tigerlake/pipeline.json
··· 1 1 [ 2 2 { 3 3 "BriefDescription": "Cycles when divide unit is busy executing divide or square root operations.", 4 + "Counter": "0,1,2,3,4,5,6,7", 4 5 "CounterMask": "1", 5 6 "EventCode": "0x14", 6 7 "EventName": "ARITH.DIVIDER_ACTIVE", ··· 11 10 }, 12 11 { 13 12 "BriefDescription": "Number of occurrences where a microcode assist is invoked by hardware.", 13 + "Counter": "0,1,2,3,4,5,6,7", 14 14 "EventCode": "0xc1", 15 15 "EventName": "ASSISTS.ANY", 16 16 "PublicDescription": "Counts the number of occurrences where a microcode assist is invoked by hardware Examples include AD (page Access Dirty), FP and AVX related assists.", ··· 20 18 }, 21 19 { 22 20 "BriefDescription": "All branch instructions retired.", 21 + "Counter": "0,1,2,3,4,5,6,7", 23 22 "EventCode": "0xc4", 24 23 "EventName": "BR_INST_RETIRED.ALL_BRANCHES", 25 24 "PEBS": "1", ··· 29 26 }, 30 27 { 31 28 "BriefDescription": "Conditional branch instructions retired.", 29 + "Counter": "0,1,2,3,4,5,6,7", 32 30 "EventCode": "0xc4", 33 31 "EventName": "BR_INST_RETIRED.COND", 34 32 "PEBS": "1", ··· 39 35 }, 40 36 { 41 37 "BriefDescription": "Not taken branch instructions retired.", 38 + "Counter": "0,1,2,3,4,5,6,7", 42 39 "EventCode": "0xc4", 43 40 "EventName": "BR_INST_RETIRED.COND_NTAKEN", 44 41 "PEBS": "1", ··· 49 44 }, 50 45 { 51 46 "BriefDescription": "Taken conditional branch instructions retired.", 47 + "Counter": "0,1,2,3,4,5,6,7", 52 48 "EventCode": "0xc4", 53 49 "EventName": "BR_INST_RETIRED.COND_TAKEN", 54 50 "PEBS": "1", ··· 59 53 }, 60 54 { 61 55 "BriefDescription": "Far branch instructions retired.", 56 + "Counter": "0,1,2,3,4,5,6,7", 62 57 "EventCode": "0xc4", 63 58 "EventName": "BR_INST_RETIRED.FAR_BRANCH", 64 59 "PEBS": "1", ··· 69 62 }, 70 63 { 71 64 "BriefDescription": "Indirect near branch instructions retired (excluding returns)", 65 + "Counter": "0,1,2,3,4,5,6,7", 72 66 "EventCode": "0xc4", 73 67 "EventName": "BR_INST_RETIRED.INDIRECT", 74 68 "PEBS": "1", ··· 79 71 }, 80 72 { 81 73 
"BriefDescription": "Direct and indirect near call instructions retired.", 74 + "Counter": "0,1,2,3,4,5,6,7", 82 75 "EventCode": "0xc4", 83 76 "EventName": "BR_INST_RETIRED.NEAR_CALL", 84 77 "PEBS": "1", ··· 89 80 }, 90 81 { 91 82 "BriefDescription": "Return instructions retired.", 83 + "Counter": "0,1,2,3,4,5,6,7", 92 84 "EventCode": "0xc4", 93 85 "EventName": "BR_INST_RETIRED.NEAR_RETURN", 94 86 "PEBS": "1", ··· 99 89 }, 100 90 { 101 91 "BriefDescription": "Taken branch instructions retired.", 92 + "Counter": "0,1,2,3,4,5,6,7", 102 93 "EventCode": "0xc4", 103 94 "EventName": "BR_INST_RETIRED.NEAR_TAKEN", 104 95 "PEBS": "1", ··· 109 98 }, 110 99 { 111 100 "BriefDescription": "All mispredicted branch instructions retired.", 101 + "Counter": "0,1,2,3,4,5,6,7", 112 102 "EventCode": "0xc5", 113 103 "EventName": "BR_MISP_RETIRED.ALL_BRANCHES", 114 104 "PEBS": "1", ··· 118 106 }, 119 107 { 120 108 "BriefDescription": "Mispredicted conditional branch instructions retired.", 109 + "Counter": "0,1,2,3,4,5,6,7", 121 110 "EventCode": "0xc5", 122 111 "EventName": "BR_MISP_RETIRED.COND", 123 112 "PEBS": "1", ··· 128 115 }, 129 116 { 130 117 "BriefDescription": "Mispredicted non-taken conditional branch instructions retired.", 118 + "Counter": "0,1,2,3,4,5,6,7", 131 119 "EventCode": "0xc5", 132 120 "EventName": "BR_MISP_RETIRED.COND_NTAKEN", 133 121 "PEBS": "1", ··· 138 124 }, 139 125 { 140 126 "BriefDescription": "number of branch instructions retired that were mispredicted and taken.", 127 + "Counter": "0,1,2,3,4,5,6,7", 141 128 "EventCode": "0xc5", 142 129 "EventName": "BR_MISP_RETIRED.COND_TAKEN", 143 130 "PEBS": "1", ··· 148 133 }, 149 134 { 150 135 "BriefDescription": "All miss-predicted indirect branch instructions retired (excluding RETs. 
TSX aborts is considered indirect branch).", 136 + "Counter": "0,1,2,3,4,5,6,7", 151 137 "EventCode": "0xc5", 152 138 "EventName": "BR_MISP_RETIRED.INDIRECT", 153 139 "PEBS": "1", ··· 158 142 }, 159 143 { 160 144 "BriefDescription": "Mispredicted indirect CALL instructions retired.", 145 + "Counter": "0,1,2,3,4,5,6,7", 161 146 "EventCode": "0xc5", 162 147 "EventName": "BR_MISP_RETIRED.INDIRECT_CALL", 163 148 "PEBS": "1", ··· 168 151 }, 169 152 { 170 153 "BriefDescription": "Number of near branch instructions retired that were mispredicted and taken.", 154 + "Counter": "0,1,2,3,4,5,6,7", 171 155 "EventCode": "0xc5", 172 156 "EventName": "BR_MISP_RETIRED.NEAR_TAKEN", 173 157 "PEBS": "1", ··· 178 160 }, 179 161 { 180 162 "BriefDescription": "This event counts the number of mispredicted ret instructions retired. Non PEBS", 163 + "Counter": "0,1,2,3,4,5,6,7", 181 164 "EventCode": "0xc5", 182 165 "EventName": "BR_MISP_RETIRED.RET", 183 166 "PEBS": "1", ··· 188 169 }, 189 170 { 190 171 "BriefDescription": "Cycle counts are evenly distributed between active threads in the Core.", 172 + "Counter": "0,1,2,3,4,5,6,7", 191 173 "EventCode": "0xec", 192 174 "EventName": "CPU_CLK_UNHALTED.DISTRIBUTED", 193 175 "PublicDescription": "This event distributes cycle counts between active hyperthreads, i.e., those in C0. A hyperthread becomes inactive when it executes the HLT or MWAIT instructions. If all other hyperthreads are inactive (or disabled or do not exist), all counts are attributed to this hyperthread. 
To obtain the full count when the Core is active, sum the counts from each hyperthread.", ··· 197 177 }, 198 178 { 199 179 "BriefDescription": "Core crystal clock cycles when this thread is unhalted and the other thread is halted.", 180 + "Counter": "0,1,2,3,4,5,6,7", 200 181 "EventCode": "0x3c", 201 182 "EventName": "CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE", 202 183 "PublicDescription": "Counts Core crystal clock cycles when current thread is unhalted and the other thread is halted.", ··· 206 185 }, 207 186 { 208 187 "BriefDescription": "Core crystal clock cycles. Cycle counts are evenly distributed between active threads in the Core.", 188 + "Counter": "0,1,2,3,4,5,6,7", 209 189 "EventCode": "0x3c", 210 190 "EventName": "CPU_CLK_UNHALTED.REF_DISTRIBUTED", 211 191 "PublicDescription": "This event distributes Core crystal clock cycle counts between active hyperthreads, i.e., those in C0 sleep-state. A hyperthread becomes inactive when it executes the HLT or MWAIT instructions. If one thread is active in a core, all counts are attributed to this hyperthread. To obtain the full count when the Core is active, sum the counts from each hyperthread.", ··· 215 193 }, 216 194 { 217 195 "BriefDescription": "Reference cycles when the core is not in halt state.", 196 + "Counter": "Fixed counter 2", 218 197 "EventName": "CPU_CLK_UNHALTED.REF_TSC", 219 198 "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events. 
Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'. The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'. After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.", 220 199 "SampleAfterValue": "2000003", ··· 223 200 }, 224 201 { 225 202 "BriefDescription": "Core crystal clock cycles when the thread is unhalted.", 203 + "Counter": "0,1,2,3,4,5,6,7", 226 204 "EventCode": "0x3c", 227 205 "EventName": "CPU_CLK_UNHALTED.REF_XCLK", 228 206 "PublicDescription": "Counts core crystal clock cycles when the thread is unhalted.", ··· 232 208 }, 233 209 { 234 210 "BriefDescription": "Core cycles when the thread is not in halt state", 211 + "Counter": "Fixed counter 1", 235 212 "EventName": "CPU_CLK_UNHALTED.THREAD", 236 213 "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. 
It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events.", 237 214 "SampleAfterValue": "2000003", ··· 240 215 }, 241 216 { 242 217 "BriefDescription": "Thread cycles when thread is not in halt state", 218 + "Counter": "0,1,2,3,4,5,6,7", 243 219 "EventCode": "0x3c", 244 220 "EventName": "CPU_CLK_UNHALTED.THREAD_P", 245 221 "PublicDescription": "This is an architectural event that counts the number of thread cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. The core frequency may change from time to time due to power or thermal throttling. For this reason, this event may have a changing ratio with regards to wall clock time.", ··· 248 222 }, 249 223 { 250 224 "BriefDescription": "Cycles while L1 cache miss demand load is outstanding.", 225 + "Counter": "0,1,2,3", 251 226 "CounterMask": "8", 252 227 "EventCode": "0xa3", 253 228 "EventName": "CYCLE_ACTIVITY.CYCLES_L1D_MISS", ··· 257 230 }, 258 231 { 259 232 "BriefDescription": "Cycles while L2 cache miss demand load is outstanding.", 233 + "Counter": "0,1,2,3", 260 234 "CounterMask": "1", 261 235 "EventCode": "0xa3", 262 236 "EventName": "CYCLE_ACTIVITY.CYCLES_L2_MISS", ··· 266 238 }, 267 239 { 268 240 "BriefDescription": "Cycles while memory subsystem has an outstanding load.", 241 + "Counter": "0,1,2,3,4,5,6,7", 269 242 "CounterMask": "16", 270 243 "EventCode": "0xa3", 271 244 "EventName": "CYCLE_ACTIVITY.CYCLES_MEM_ANY", ··· 275 246 }, 276 247 { 277 248 "BriefDescription": "Execution stalls while L1 cache miss demand load is outstanding.", 249 + "Counter": "0,1,2,3", 278 250 "CounterMask": "12", 279 251 "EventCode": "0xa3", 280 252 "EventName": "CYCLE_ACTIVITY.STALLS_L1D_MISS", ··· 284 254 }, 285 255 { 286 256 "BriefDescription": "Execution stalls while L2 cache miss demand load is outstanding.", 257 + "Counter": "0,1,2,3", 287 258 "CounterMask": "5", 288 259 "EventCode": "0xa3", 289 260 
"EventName": "CYCLE_ACTIVITY.STALLS_L2_MISS", ··· 293 262 }, 294 263 { 295 264 "BriefDescription": "Execution stalls while memory subsystem has an outstanding load.", 265 + "Counter": "0,1,2,3,4,5,6,7", 296 266 "CounterMask": "20", 297 267 "EventCode": "0xa3", 298 268 "EventName": "CYCLE_ACTIVITY.STALLS_MEM_ANY", ··· 302 270 }, 303 271 { 304 272 "BriefDescription": "Total execution stalls.", 273 + "Counter": "0,1,2,3,4,5,6,7", 305 274 "CounterMask": "4", 306 275 "EventCode": "0xa3", 307 276 "EventName": "CYCLE_ACTIVITY.STALLS_TOTAL", ··· 311 278 }, 312 279 { 313 280 "BriefDescription": "Cycles total of 1 uop is executed on all ports and Reservation Station was not empty.", 281 + "Counter": "0,1,2,3,4,5,6,7", 314 282 "EventCode": "0xa6", 315 283 "EventName": "EXE_ACTIVITY.1_PORTS_UTIL", 316 284 "PublicDescription": "Counts cycles during which a total of 1 uop was executed on all ports and Reservation Station (RS) was not empty.", ··· 320 286 }, 321 287 { 322 288 "BriefDescription": "Cycles total of 2 uops are executed on all ports and Reservation Station was not empty.", 289 + "Counter": "0,1,2,3,4,5,6,7", 323 290 "EventCode": "0xa6", 324 291 "EventName": "EXE_ACTIVITY.2_PORTS_UTIL", 325 292 "PublicDescription": "Counts cycles during which a total of 2 uops were executed on all ports and Reservation Station (RS) was not empty.", ··· 329 294 }, 330 295 { 331 296 "BriefDescription": "Cycles total of 3 uops are executed on all ports and Reservation Station was not empty.", 297 + "Counter": "0,1,2,3,4,5,6,7", 332 298 "EventCode": "0xa6", 333 299 "EventName": "EXE_ACTIVITY.3_PORTS_UTIL", 334 300 "PublicDescription": "Cycles total of 3 uops are executed on all ports and Reservation Station (RS) was not empty.", ··· 338 302 }, 339 303 { 340 304 "BriefDescription": "Cycles total of 4 uops are executed on all ports and Reservation Station was not empty.", 305 + "Counter": "0,1,2,3,4,5,6,7", 341 306 "EventCode": "0xa6", 342 307 "EventName": "EXE_ACTIVITY.4_PORTS_UTIL", 343 
308 "PublicDescription": "Cycles total of 4 uops are executed on all ports and Reservation Station (RS) was not empty.", ··· 347 310 }, 348 311 { 349 312 "BriefDescription": "Cycles when the memory subsystem has an outstanding load. Increments by 4 for every such cycle.", 313 + "Counter": "0,1,2,3,4,5,6,7", 350 314 "CounterMask": "5", 351 315 "EventCode": "0xa6", 352 316 "EventName": "EXE_ACTIVITY.BOUND_ON_LOADS", ··· 357 319 }, 358 320 { 359 321 "BriefDescription": "Cycles where the Store Buffer was full and no loads caused an execution stall.", 322 + "Counter": "0,1,2,3,4,5,6,7", 360 323 "CounterMask": "2", 361 324 "EventCode": "0xa6", 362 325 "EventName": "EXE_ACTIVITY.BOUND_ON_STORES", ··· 367 328 }, 368 329 { 369 330 "BriefDescription": "Cycles no uop executed while RS was not empty, the SB was not full and there was no outstanding load.", 331 + "Counter": "0,1,2,3,4,5,6,7", 370 332 "EventCode": "0xa6", 371 333 "EventName": "EXE_ACTIVITY.EXE_BOUND_0_PORTS", 372 334 "PublicDescription": "Number of cycles total of 0 uops executed on all ports, Reservation Station (RS) was not empty, the Store Buffer (SB) was not full and there was no outstanding load.", ··· 376 336 }, 377 337 { 378 338 "BriefDescription": "Stalls caused by changing prefix length of the instruction. [This event is alias to DECODE.LCP]", 339 + "Counter": "0,1,2,3", 379 340 "EventCode": "0x87", 380 341 "EventName": "ILD_STALL.LCP", 381 342 "PublicDescription": "Counts cycles that the Instruction Length decoder (ILD) stalls occurred due to dynamically changing prefix length of the decoded instruction (by operand size prefix instruction 0x66, address size prefix instruction 0x67 or REX.W for Intel64). Count is proportional to the number of prefixes in a 16B-line. This may result in a three-cycle penalty for each LCP (Length changing prefix) in a 16-byte chunk. 
[This event is alias to DECODE.LCP]", ··· 385 344 }, 386 345 { 387 346 "BriefDescription": "Instruction decoders utilized in a cycle", 347 + "Counter": "0,1,2,3", 388 348 "EventCode": "0x55", 389 349 "EventName": "INST_DECODED.DECODERS", 390 350 "PublicDescription": "Number of decoders utilized in a cycle when the MITE (legacy decode pipeline) fetches instructions.", ··· 394 352 }, 395 353 { 396 354 "BriefDescription": "Number of instructions retired. Fixed Counter - architectural event", 355 + "Counter": "Fixed counter 0", 397 356 "EventName": "INST_RETIRED.ANY", 398 357 "PEBS": "1", 399 358 "PublicDescription": "Counts the number of X86 instructions retired - an Architectural PerfMon event. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter freeing up programmable counters to count other events. INST_RETIRED.ANY_P is counted by a programmable counter.", ··· 403 360 }, 404 361 { 405 362 "BriefDescription": "Number of instructions retired. General Counter - architectural event", 363 + "Counter": "0,1,2,3,4,5,6,7", 406 364 "EventCode": "0xc0", 407 365 "EventName": "INST_RETIRED.ANY_P", 408 366 "PEBS": "1", ··· 412 368 }, 413 369 { 414 370 "BriefDescription": "Retired NOP instructions.", 371 + "Counter": "0,1,2,3,4,5,6,7", 415 372 "EventCode": "0xc0", 416 373 "EventName": "INST_RETIRED.NOP", 417 374 "PEBS": "1", ··· 422 377 }, 423 378 { 424 379 "BriefDescription": "Precise instruction retired event with a reduced effect of PEBS shadow in IP distribution", 380 + "Counter": "Fixed counter 0", 425 381 "EventName": "INST_RETIRED.PREC_DIST", 426 382 "PEBS": "1", 427 383 "PublicDescription": "A version of INST_RETIRED that allows for a more unbiased distribution of samples across instructions retired. It utilizes the Precise Distribution of Instructions Retired (PDIR) feature to mitigate some bias in how retired instructions get sampled. 
Use on Fixed Counter 0.", ··· 431 385 }, 432 386 { 433 387 "BriefDescription": "Cycles the Backend cluster is recovering after a miss-speculation or a Store Buffer or Load Buffer drain stall.", 388 + "Counter": "0,1,2,3,4,5,6,7", 434 389 "CounterMask": "1", 435 390 "EventCode": "0x0d", 436 391 "EventName": "INT_MISC.ALL_RECOVERY_CYCLES", ··· 441 394 }, 442 395 { 443 396 "BriefDescription": "Clears speculative count", 397 + "Counter": "0,1,2,3,4,5,6,7", 444 398 "CounterMask": "1", 445 399 "EdgeDetect": "1", 446 400 "EventCode": "0x0d", ··· 452 404 }, 453 405 { 454 406 "BriefDescription": "Counts cycles after recovery from a branch misprediction or machine clear till the first uop is issued from the resteered path.", 407 + "Counter": "0,1,2,3,4,5,6,7", 455 408 "EventCode": "0x0d", 456 409 "EventName": "INT_MISC.CLEAR_RESTEER_CYCLES", 457 410 "PublicDescription": "Cycles after recovery from a branch misprediction or machine clear till the first uop is issued from the resteered path.", ··· 461 412 }, 462 413 { 463 414 "BriefDescription": "Core cycles the allocator was stalled due to recovery from earlier clear event for this thread", 415 + "Counter": "0,1,2,3,4,5,6,7", 464 416 "EventCode": "0x0d", 465 417 "EventName": "INT_MISC.RECOVERY_CYCLES", 466 418 "PublicDescription": "Counts core cycles when the Resource allocator was stalled due to recovery from an earlier branch misprediction or machine clear event.", ··· 470 420 }, 471 421 { 472 422 "BriefDescription": "TMA slots where uops got dropped", 423 + "Counter": "0,1,2,3,4,5,6,7", 473 424 "EventCode": "0x0d", 474 425 "EventName": "INT_MISC.UOP_DROPPING", 475 426 "PublicDescription": "Estimated number of Top-down Microarchitecture Analysis slots that got dropped due to non front-end reasons", ··· 479 428 }, 480 429 { 481 430 "BriefDescription": "The number of times that split load operations are temporarily blocked because all resources for handling the split accesses are in use.", 431 + "Counter": "0,1,2,3", 482 432 
"EventCode": "0x03", 483 433 "EventName": "LD_BLOCKS.NO_SR", 484 434 "PublicDescription": "Counts the number of times that split load operations are temporarily blocked because all resources for handling the split accesses are in use.", ··· 488 436 }, 489 437 { 490 438 "BriefDescription": "Loads blocked due to overlapping with a preceding store that cannot be forwarded.", 439 + "Counter": "0,1,2,3", 491 440 "EventCode": "0x03", 492 441 "EventName": "LD_BLOCKS.STORE_FORWARD", 493 442 "PublicDescription": "Counts the number of times where store forwarding was prevented for a load operation. The most common case is a load blocked due to the address of memory access (partially) overlapping with a preceding uncompleted store. Note: See the table of not supported store forwards in the Optimization Guide.", ··· 497 444 }, 498 445 { 499 446 "BriefDescription": "False dependencies in MOB due to partial compare on address.", 447 + "Counter": "0,1,2,3", 500 448 "EventCode": "0x07", 501 449 "EventName": "LD_BLOCKS_PARTIAL.ADDRESS_ALIAS", 502 450 "PublicDescription": "Counts the number of times a load got blocked due to false dependencies in MOB due to partial compare on address.", ··· 506 452 }, 507 453 { 508 454 "BriefDescription": "Counts the number of demand load dispatches that hit L1D fill buffer (FB) allocated for software prefetch.", 455 + "Counter": "0,1,2,3", 509 456 "EventCode": "0x4c", 510 457 "EventName": "LOAD_HIT_PREFETCH.SWPF", 511 458 "PublicDescription": "Counts all not software-prefetch load dispatches that hit the fill buffer (FB) allocated for the software prefetch. It can also be incremented by some lock instructions. 
So it should only be used with profiling so that the locks can be excluded by ASM (Assembly File) inspection of the nearby instructions.", ··· 515 460 }, 516 461 { 517 462 "BriefDescription": "Cycles Uops delivered by the LSD, but didn't come from the decoder.", 463 + "Counter": "0,1,2,3", 518 464 "CounterMask": "1", 519 465 "EventCode": "0xa8", 520 466 "EventName": "LSD.CYCLES_ACTIVE", ··· 525 469 }, 526 470 { 527 471 "BriefDescription": "Cycles optimal number of Uops delivered by the LSD, but did not come from the decoder.", 472 + "Counter": "0,1,2,3", 528 473 "CounterMask": "5", 529 474 "EventCode": "0xa8", 530 475 "EventName": "LSD.CYCLES_OK", ··· 535 478 }, 536 479 { 537 480 "BriefDescription": "Number of Uops delivered by the LSD.", 481 + "Counter": "0,1,2,3", 538 482 "EventCode": "0xa8", 539 483 "EventName": "LSD.UOPS", 540 484 "PublicDescription": "Counts the number of uops delivered to the back-end by the LSD(Loop Stream Detector).", ··· 544 486 }, 545 487 { 546 488 "BriefDescription": "Number of machine clears (nukes) of any type.", 489 + "Counter": "0,1,2,3,4,5,6,7", 547 490 "CounterMask": "1", 548 491 "EdgeDetect": "1", 549 492 "EventCode": "0xc3", ··· 555 496 }, 556 497 { 557 498 "BriefDescription": "Self-modifying code (SMC) detected.", 499 + "Counter": "0,1,2,3,4,5,6,7", 558 500 "EventCode": "0xc3", 559 501 "EventName": "MACHINE_CLEARS.SMC", 560 502 "PublicDescription": "Counts self-modifying code (SMC) detected, which causes a machine clear.", ··· 564 504 }, 565 505 { 566 506 "BriefDescription": "Increments whenever there is an update to the LBR array.", 507 + "Counter": "0,1,2,3,4,5,6,7", 567 508 "EventCode": "0xcc", 568 509 "EventName": "MISC_RETIRED.LBR_INSERTS", 569 510 "PublicDescription": "Increments when an entry is added to the Last Branch Record (LBR) array (or removed from the array in case of RETURNs in call stack mode). 
The event requires LBR enable via IA32_DEBUGCTL MSR and branch type selection via MSR_LBR_SELECT.", ··· 573 512 }, 574 513 { 575 514 "BriefDescription": "Number of retired PAUSE instructions. This event is not supported on first SKL and KBL products.", 515 + "Counter": "0,1,2,3,4,5,6,7", 576 516 "EventCode": "0xcc", 577 517 "EventName": "MISC_RETIRED.PAUSE_INST", 578 518 "PublicDescription": "Counts number of retired PAUSE instructions. This event is not supported on first SKL and KBL products.", ··· 582 520 }, 583 521 { 584 522 "BriefDescription": "Cycles stalled due to no store buffers available. (not including draining form sync).", 523 + "Counter": "0,1,2,3,4,5,6,7", 585 524 "EventCode": "0xa2", 586 525 "EventName": "RESOURCE_STALLS.SB", 587 526 "PublicDescription": "Counts allocation stall cycles caused by the store buffer (SB) being full. This counts cycles that the pipeline back-end blocked uop delivery from the front-end.", ··· 591 528 }, 592 529 { 593 530 "BriefDescription": "Counts cycles where the pipeline is stalled due to serializing operations.", 531 + "Counter": "0,1,2,3,4,5,6,7", 594 532 "EventCode": "0xa2", 595 533 "EventName": "RESOURCE_STALLS.SCOREBOARD", 596 534 "SampleAfterValue": "100003", ··· 599 535 }, 600 536 { 601 537 "BriefDescription": "Cycles when Reservation Station (RS) is empty for the thread", 538 + "Counter": "0,1,2,3,4,5,6,7", 602 539 "EventCode": "0x5e", 603 540 "EventName": "RS_EVENTS.EMPTY_CYCLES", 604 541 "PublicDescription": "Counts cycles during which the reservation station (RS) is empty for this logical processor. This is usually caused when the front-end pipeline runs into starvation periods (e.g. 
branch mispredictions or i-cache misses)", ··· 608 543 }, 609 544 { 610 545 "BriefDescription": "Counts end of periods where the Reservation Station (RS) was empty.", 546 + "Counter": "0,1,2,3,4,5,6,7", 611 547 "CounterMask": "1", 612 548 "EdgeDetect": "1", 613 549 "EventCode": "0x5e", ··· 620 554 }, 621 555 { 622 556 "BriefDescription": "TMA slots where no uops were being issued due to lack of back-end resources.", 557 + "Counter": "0,1,2,3,4,5,6,7", 623 558 "EventCode": "0xa4", 624 559 "EventName": "TOPDOWN.BACKEND_BOUND_SLOTS", 625 560 "PublicDescription": "Counts the number of Top-down Microarchitecture Analysis (TMA) method's slots where no micro-operations were being issued from front-end to back-end of the machine due to lack of back-end resources.", ··· 629 562 }, 630 563 { 631 564 "BriefDescription": "TMA slots available for an unhalted logical processor. Fixed counter - architectural event", 565 + "Counter": "Fixed counter 3", 632 566 "EventName": "TOPDOWN.SLOTS", 633 567 "PublicDescription": "Number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method (TMA). The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core. Software can use this event as the denominator for the top-level metrics of the TMA method. This architectural event is counted on a designated fixed counter (Fixed Counter 3).", 634 568 "SampleAfterValue": "10000003", ··· 637 569 }, 638 570 { 639 571 "BriefDescription": "TMA slots available for an unhalted logical processor. General counter - architectural event", 572 + "Counter": "0,1,2,3,4,5,6,7", 640 573 "EventCode": "0xa4", 641 574 "EventName": "TOPDOWN.SLOTS_P", 642 575 "PublicDescription": "Counts the number of available slots for an unhalted logical processor. 
The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method. The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core.", ··· 646 577 }, 647 578 { 648 579 "BriefDescription": "Number of uops decoded out of instructions exclusively fetched by decoder 0", 580 + "Counter": "0,1,2,3", 649 581 "EventCode": "0x56", 650 582 "EventName": "UOPS_DECODED.DEC0", 651 583 "PublicDescription": "Uops exclusively fetched by decoder 0", ··· 655 585 }, 656 586 { 657 587 "BriefDescription": "Number of uops executed on port 0", 588 + "Counter": "0,1,2,3,4,5,6,7", 658 589 "EventCode": "0xa1", 659 590 "EventName": "UOPS_DISPATCHED.PORT_0", 660 591 "PublicDescription": "Counts, on the per-thread basis, cycles during which at least one uop is dispatched from the Reservation Station (RS) to port 0.", ··· 664 593 }, 665 594 { 666 595 "BriefDescription": "Number of uops executed on port 1", 596 + "Counter": "0,1,2,3,4,5,6,7", 667 597 "EventCode": "0xa1", 668 598 "EventName": "UOPS_DISPATCHED.PORT_1", 669 599 "PublicDescription": "Counts, on the per-thread basis, cycles during which at least one uop is dispatched from the Reservation Station (RS) to port 1.", ··· 673 601 }, 674 602 { 675 603 "BriefDescription": "Number of uops executed on port 2 and 3", 604 + "Counter": "0,1,2,3,4,5,6,7", 676 605 "EventCode": "0xa1", 677 606 "EventName": "UOPS_DISPATCHED.PORT_2_3", 678 607 "PublicDescription": "Counts, on the per-thread basis, cycles during which at least one uop is dispatched from the Reservation Station (RS) to ports 2 and 3.", ··· 682 609 }, 683 610 { 684 611 "BriefDescription": "Number of uops executed on port 4 and 9", 612 + "Counter": "0,1,2,3,4,5,6,7", 685 613 "EventCode": "0xa1", 686 614 "EventName": "UOPS_DISPATCHED.PORT_4_9", 687 615 "PublicDescription": "Counts, on the per-thread basis, cycles during which at least one uop is dispatched from the Reservation Station 
(RS) to ports 5 and 9.", ··· 691 617 }, 692 618 { 693 619 "BriefDescription": "Number of uops executed on port 5", 620 + "Counter": "0,1,2,3,4,5,6,7", 694 621 "EventCode": "0xa1", 695 622 "EventName": "UOPS_DISPATCHED.PORT_5", 696 623 "PublicDescription": "Counts, on the per-thread basis, cycles during which at least one uop is dispatched from the Reservation Station (RS) to port 5.", ··· 700 625 }, 701 626 { 702 627 "BriefDescription": "Number of uops executed on port 6", 628 + "Counter": "0,1,2,3,4,5,6,7", 703 629 "EventCode": "0xa1", 704 630 "EventName": "UOPS_DISPATCHED.PORT_6", 705 631 "PublicDescription": "Counts, on the per-thread basis, cycles during which at least one uop is dispatched from the Reservation Station (RS) to port 6.", ··· 709 633 }, 710 634 { 711 635 "BriefDescription": "Number of uops executed on port 7 and 8", 636 + "Counter": "0,1,2,3,4,5,6,7", 712 637 "EventCode": "0xa1", 713 638 "EventName": "UOPS_DISPATCHED.PORT_7_8", 714 639 "PublicDescription": "Counts, on the per-thread basis, cycles during which at least one uop is dispatched from the Reservation Station (RS) to ports 7 and 8.", ··· 718 641 }, 719 642 { 720 643 "BriefDescription": "Number of uops executed on the core.", 644 + "Counter": "0,1,2,3,4,5,6,7", 721 645 "EventCode": "0xb1", 722 646 "EventName": "UOPS_EXECUTED.CORE", 723 647 "PublicDescription": "Counts the number of uops executed from any thread.", ··· 727 649 }, 728 650 { 729 651 "BriefDescription": "Cycles at least 1 micro-op is executed from any thread on physical core.", 652 + "Counter": "0,1,2,3,4,5,6,7", 730 653 "CounterMask": "1", 731 654 "EventCode": "0xb1", 732 655 "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_1", ··· 737 658 }, 738 659 { 739 660 "BriefDescription": "Cycles at least 2 micro-op is executed from any thread on physical core.", 661 + "Counter": "0,1,2,3,4,5,6,7", 740 662 "CounterMask": "2", 741 663 "EventCode": "0xb1", 742 664 "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_2", ··· 747 667 }, 748 668 { 749 
669 "BriefDescription": "Cycles at least 3 micro-op is executed from any thread on physical core.", 670 + "Counter": "0,1,2,3,4,5,6,7", 750 671 "CounterMask": "3", 751 672 "EventCode": "0xb1", 752 673 "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_3", ··· 757 676 }, 758 677 { 759 678 "BriefDescription": "Cycles at least 4 micro-op is executed from any thread on physical core.", 679 + "Counter": "0,1,2,3,4,5,6,7", 760 680 "CounterMask": "4", 761 681 "EventCode": "0xb1", 762 682 "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_4", ··· 767 685 }, 768 686 { 769 687 "BriefDescription": "Cycles where at least 1 uop was executed per-thread", 688 + "Counter": "0,1,2,3,4,5,6,7", 770 689 "CounterMask": "1", 771 690 "EventCode": "0xb1", 772 691 "EventName": "UOPS_EXECUTED.CYCLES_GE_1", ··· 777 694 }, 778 695 { 779 696 "BriefDescription": "Cycles where at least 2 uops were executed per-thread", 697 + "Counter": "0,1,2,3,4,5,6,7", 780 698 "CounterMask": "2", 781 699 "EventCode": "0xb1", 782 700 "EventName": "UOPS_EXECUTED.CYCLES_GE_2", ··· 787 703 }, 788 704 { 789 705 "BriefDescription": "Cycles where at least 3 uops were executed per-thread", 706 + "Counter": "0,1,2,3,4,5,6,7", 790 707 "CounterMask": "3", 791 708 "EventCode": "0xb1", 792 709 "EventName": "UOPS_EXECUTED.CYCLES_GE_3", ··· 797 712 }, 798 713 { 799 714 "BriefDescription": "Cycles where at least 4 uops were executed per-thread", 715 + "Counter": "0,1,2,3,4,5,6,7", 800 716 "CounterMask": "4", 801 717 "EventCode": "0xb1", 802 718 "EventName": "UOPS_EXECUTED.CYCLES_GE_4", ··· 807 721 }, 808 722 { 809 723 "BriefDescription": "Counts number of cycles no uops were dispatched to be executed on this thread.", 724 + "Counter": "0,1,2,3,4,5,6,7", 810 725 "CounterMask": "1", 811 726 "EventCode": "0xb1", 812 727 "EventName": "UOPS_EXECUTED.STALL_CYCLES", ··· 818 731 }, 819 732 { 820 733 "BriefDescription": "Counts the number of uops to be executed per-thread each cycle.", 734 + "Counter": "0,1,2,3,4,5,6,7", 821 735 "EventCode": "0xb1", 
822 736 "EventName": "UOPS_EXECUTED.THREAD", 823 737 "SampleAfterValue": "2000003", ··· 826 738 }, 827 739 { 828 740 "BriefDescription": "Counts the number of x87 uops dispatched.", 741 + "Counter": "0,1,2,3,4,5,6,7", 829 742 "EventCode": "0xb1", 830 743 "EventName": "UOPS_EXECUTED.X87", 831 744 "PublicDescription": "Counts the number of x87 uops executed.", ··· 835 746 }, 836 747 { 837 748 "BriefDescription": "Uops that RAT issues to RS", 749 + "Counter": "0,1,2,3,4,5,6,7", 838 750 "EventCode": "0x0e", 839 751 "EventName": "UOPS_ISSUED.ANY", 840 752 "PublicDescription": "Counts the number of uops that the Resource Allocation Table (RAT) issues to the Reservation Station (RS).", ··· 844 754 }, 845 755 { 846 756 "BriefDescription": "Cycles when RAT does not issue Uops to RS for the thread", 757 + "Counter": "0,1,2,3,4,5,6,7", 847 758 "CounterMask": "1", 848 759 "EventCode": "0x0e", 849 760 "EventName": "UOPS_ISSUED.STALL_CYCLES", ··· 855 764 }, 856 765 { 857 766 "BriefDescription": "Uops inserted at issue-stage in order to preserve upper bits of vector registers.", 767 + "Counter": "0,1,2,3,4,5,6,7", 858 768 "EventCode": "0x0e", 859 769 "EventName": "UOPS_ISSUED.VECTOR_WIDTH_MISMATCH", 860 770 "PublicDescription": "Counts the number of Blend Uops issued by the Resource Allocation Table (RAT) to the reservation station (RS) in order to preserve upper bits of vector registers. Starting with the Skylake microarchitecture, these Blend uops are needed since every Intel SSE instruction executed in Dirty Upper State needs to preserve bits 128-255 of the destination register. 
For more information, refer to Mixing Intel AVX and Intel SSE Code section of the Optimization Guide.", ··· 864 772 }, 865 773 { 866 774 "BriefDescription": "Retirement slots used.", 775 + "Counter": "0,1,2,3,4,5,6,7", 867 776 "EventCode": "0xc2", 868 777 "EventName": "UOPS_RETIRED.SLOTS", 869 778 "PublicDescription": "Counts the retirement slots used each cycle.", ··· 873 780 }, 874 781 { 875 782 "BriefDescription": "Cycles without actually retired uops.", 783 + "Counter": "0,1,2,3,4,5,6,7", 876 784 "CounterMask": "1", 877 785 "EventCode": "0xc2", 878 786 "EventName": "UOPS_RETIRED.STALL_CYCLES", ··· 884 790 }, 885 791 { 886 792 "BriefDescription": "Cycles with less than 10 actually retired uops.", 793 + "Counter": "0,1,2,3,4,5,6,7", 887 794 "CounterMask": "10", 888 795 "EventCode": "0xc2", 889 796 "EventName": "UOPS_RETIRED.TOTAL_CYCLES",
+120 -78
tools/perf/pmu-events/arch/x86/tigerlake/tgl-metrics.json
··· 104 104 { 105 105 "BriefDescription": "This metric estimates fraction of slots the CPU retired uops delivered by the Microcode_Sequencer as a result of Assists", 106 106 "MetricExpr": "34 * ASSISTS.ANY / tma_info_thread_slots", 107 - "MetricGroup": "TopdownL4;tma_L4_group;tma_microcode_sequencer_group", 107 + "MetricGroup": "BvIO;TopdownL4;tma_L4_group;tma_microcode_sequencer_group", 108 108 "MetricName": "tma_assists", 109 109 "MetricThreshold": "tma_assists > 0.1 & (tma_microcode_sequencer > 0.05 & tma_heavy_operations > 0.1)", 110 110 "PublicDescription": "This metric estimates fraction of slots the CPU retired uops delivered by the Microcode_Sequencer as a result of Assists. Assists are long sequences of uops that are required in certain corner-cases for operations that cannot be handled natively by the execution pipeline. For example; when working with very small floating point values (so-called Denormals); the FP units are not set up to perform these operations natively. Instead; a sequence of instructions to perform the computation on the Denormals is injected into the pipeline. Since these microcode sequences might be dozens of uops long; Assists can be extremely deleterious to performance and they can be avoided in many cases. 
Sample with: ASSISTS.ANY", ··· 114 114 "BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend", 115 115 "DefaultMetricgroupName": "TopdownL1", 116 116 "MetricExpr": "topdown\\-be\\-bound / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 5 * INT_MISC.CLEARS_COUNT / tma_info_thread_slots", 117 - "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group", 117 + "MetricGroup": "BvOB;Default;TmaL1;TopdownL1;tma_L1_group", 118 118 "MetricName": "tma_backend_bound", 119 119 "MetricThreshold": "tma_backend_bound > 0.2", 120 120 "MetricgroupNoGroup": "TopdownL1;Default", ··· 135 135 { 136 136 "BriefDescription": "This metric represents fraction of slots where the CPU was retiring branch instructions.", 137 137 "MetricExpr": "tma_light_operations * BR_INST_RETIRED.ALL_BRANCHES / (tma_retiring * tma_info_thread_slots)", 138 - "MetricGroup": "Branches;Pipeline;TopdownL3;tma_L3_group;tma_light_operations_group", 138 + "MetricGroup": "Branches;BvBO;Pipeline;TopdownL3;tma_L3_group;tma_light_operations_group", 139 139 "MetricName": "tma_branch_instructions", 140 140 "MetricThreshold": "tma_branch_instructions > 0.1 & tma_light_operations > 0.6", 141 141 "ScaleUnit": "100%" ··· 143 143 { 144 144 "BriefDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction", 145 145 "MetricExpr": "BR_MISP_RETIRED.ALL_BRANCHES / (BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT) * tma_bad_speculation", 146 - "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM", 146 + "MetricGroup": "BadSpec;BrMispredicts;BvMP;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM", 147 147 "MetricName": "tma_branch_mispredicts", 148 148 "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15", 149 149 "MetricgroupNoGroup": 
"TopdownL2", ··· 181 181 "BriefDescription": "This metric estimates fraction of cycles while the memory subsystem was handling synchronizations due to contested accesses", 182 182 "MetricConstraint": "NO_GROUP_EVENTS", 183 183 "MetricExpr": "(49 * tma_info_system_core_frequency * (MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD * (OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM / (OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM + OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_WITH_FWD))) + 48 * tma_info_system_core_frequency * MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS) * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS / 2) / tma_info_thread_clks", 184 - "MetricGroup": "DataSharing;Offcore;Snoop;TopdownL4;tma_L4_group;tma_issueSyncxn;tma_l3_bound_group", 184 + "MetricGroup": "BvMS;DataSharing;Offcore;Snoop;TopdownL4;tma_L4_group;tma_issueSyncxn;tma_l3_bound_group", 185 185 "MetricName": "tma_contested_accesses", 186 186 "MetricThreshold": "tma_contested_accesses > 0.05 & (tma_l3_bound > 0.05 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", 187 187 "PublicDescription": "This metric estimates fraction of cycles while the memory subsystem was handling synchronizations due to contested accesses. Contested accesses occur when data written by one Logical Processor are read by another Logical Processor on a different Physical Core. Examples of contested accesses include synchronizations such as locks; true data sharing such as modified locked variables; and false sharing. Sample with: MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD;MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS. 
Related metrics: tma_data_sharing, tma_false_sharing, tma_machine_clears, tma_remote_cache", ··· 201 201 "BriefDescription": "This metric estimates fraction of cycles while the memory subsystem was handling synchronizations due to data-sharing accesses", 202 202 "MetricConstraint": "NO_GROUP_EVENTS", 203 203 "MetricExpr": "48 * tma_info_system_core_frequency * (MEM_LOAD_L3_HIT_RETIRED.XSNP_NO_FWD + MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD * (1 - OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM / (OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM + OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_WITH_FWD))) * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS / 2) / tma_info_thread_clks", 204 - "MetricGroup": "Offcore;Snoop;TopdownL4;tma_L4_group;tma_issueSyncxn;tma_l3_bound_group", 204 + "MetricGroup": "BvMS;Offcore;Snoop;TopdownL4;tma_L4_group;tma_issueSyncxn;tma_l3_bound_group", 205 205 "MetricName": "tma_data_sharing", 206 206 "MetricThreshold": "tma_data_sharing > 0.05 & (tma_l3_bound > 0.05 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", 207 207 "PublicDescription": "This metric estimates fraction of cycles while the memory subsystem was handling synchronizations due to data-sharing accesses. Data shared by multiple Logical Processors (even just read shared) may cause increased access latency due to cache coherency. Excessive data sharing can drastically harm multithreaded performance. Sample with: MEM_LOAD_L3_HIT_RETIRED.XSNP_NO_FWD. 
Related metrics: tma_contested_accesses, tma_false_sharing, tma_machine_clears, tma_remote_cache", ··· 219 219 { 220 220 "BriefDescription": "This metric represents fraction of cycles where the Divider unit was active", 221 221 "MetricExpr": "ARITH.DIVIDER_ACTIVE / tma_info_thread_clks", 222 - "MetricGroup": "TopdownL3;tma_L3_group;tma_core_bound_group", 222 + "MetricGroup": "BvCB;TopdownL3;tma_L3_group;tma_core_bound_group", 223 223 "MetricName": "tma_divider", 224 224 "MetricThreshold": "tma_divider > 0.2 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2)", 225 225 "PublicDescription": "This metric represents fraction of cycles where the Divider unit was active. Divide and square root instructions are performed by the Divider unit and can take considerably longer latency than integer or Floating Point addition; subtraction; or multiplication. Sample with: ARITH.DIVIDER_ACTIVE", ··· 250 250 "MetricGroup": "DSBmiss;FetchLat;TopdownL3;tma_L3_group;tma_fetch_latency_group;tma_issueFB", 251 251 "MetricName": "tma_dsb_switches", 252 252 "MetricThreshold": "tma_dsb_switches > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15)", 253 - "PublicDescription": "This metric represents fraction of cycles the CPU was stalled due to switches from DSB to MITE pipelines. The DSB (decoded i-cache) is a Uop Cache where the front-end directly delivers Uops (micro operations) avoiding heavy x86 decoding. The DSB pipeline has shorter latency and delivered higher bandwidth than the MITE (legacy instruction decode pipeline). Switching between the two pipelines can cause penalties hence this metric measures the exposed penalty. Sample with: FRONTEND_RETIRED.DSB_MISS_PS. Related metrics: tma_fetch_bandwidth, tma_info_botlnk_l2_dsb_misses, tma_info_frontend_dsb_coverage, tma_info_inst_mix_iptb, tma_lcp", 253 + "PublicDescription": "This metric represents fraction of cycles the CPU was stalled due to switches from DSB to MITE pipelines. 
The DSB (decoded i-cache) is a Uop Cache where the front-end directly delivers Uops (micro operations) avoiding heavy x86 decoding. The DSB pipeline has shorter latency and delivered higher bandwidth than the MITE (legacy instruction decode pipeline). Switching between the two pipelines can cause penalties hence this metric measures the exposed penalty. Sample with: FRONTEND_RETIRED.DSB_MISS_PS. Related metrics: tma_fetch_bandwidth, tma_info_botlnk_l2_dsb_bandwidth, tma_info_botlnk_l2_dsb_misses, tma_info_frontend_dsb_coverage, tma_info_inst_mix_iptb, tma_lcp", 254 254 "ScaleUnit": "100%" 255 255 }, 256 256 { 257 257 "BriefDescription": "This metric roughly estimates the fraction of cycles where the Data TLB (DTLB) was missed by load accesses", 258 258 "MetricExpr": "min(7 * cpu@DTLB_LOAD_MISSES.STLB_HIT\\,cmask\\=1@ + DTLB_LOAD_MISSES.WALK_ACTIVE, max(CYCLE_ACTIVITY.CYCLES_MEM_ANY - CYCLE_ACTIVITY.CYCLES_L1D_MISS, 0)) / tma_info_thread_clks", 259 - "MetricGroup": "MemoryTLB;TopdownL4;tma_L4_group;tma_issueTLB;tma_l1_bound_group", 259 + "MetricGroup": "BvMT;MemoryTLB;TopdownL4;tma_L4_group;tma_issueTLB;tma_l1_bound_group", 260 260 "MetricName": "tma_dtlb_load", 261 261 "MetricThreshold": "tma_dtlb_load > 0.1 & (tma_l1_bound > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", 262 262 "PublicDescription": "This metric roughly estimates the fraction of cycles where the Data TLB (DTLB) was missed by load accesses. TLBs (Translation Look-aside Buffers) are processor caches for recently used entries out of the Page Tables that are used to map virtual- to physical-addresses by the operating system. This metric approximates the potential delay of demand loads missing the first-level data TLB (assuming worst case scenario with back to back misses to different pages). This includes hitting in the second-level TLB (STLB) as well as performing a hardware page walk on an STLB miss. Sample with: MEM_INST_RETIRED.STLB_MISS_LOADS_PS. 
Related metrics: tma_dtlb_store, tma_info_bottleneck_memory_data_tlbs, tma_info_bottleneck_memory_synchronization", ··· 265 265 { 266 266 "BriefDescription": "This metric roughly estimates the fraction of cycles spent handling first-level data TLB store misses", 267 267 "MetricExpr": "(7 * cpu@DTLB_STORE_MISSES.STLB_HIT\\,cmask\\=1@ + DTLB_STORE_MISSES.WALK_ACTIVE) / tma_info_core_core_clks", 268 - "MetricGroup": "MemoryTLB;TopdownL4;tma_L4_group;tma_issueTLB;tma_store_bound_group", 268 + "MetricGroup": "BvMT;MemoryTLB;TopdownL4;tma_L4_group;tma_issueTLB;tma_store_bound_group", 269 269 "MetricName": "tma_dtlb_store", 270 270 "MetricThreshold": "tma_dtlb_store > 0.05 & (tma_store_bound > 0.2 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", 271 271 "PublicDescription": "This metric roughly estimates the fraction of cycles spent handling first-level data TLB store misses. As with ordinary data caching; focus on improving data locality and reducing working-set size to reduce DTLB overhead. Additionally; consider using profile-guided optimization (PGO) to collocate frequently-used data on the same page. Try using larger page sizes for large amounts of frequently-used data. Sample with: MEM_INST_RETIRED.STLB_MISS_STORES_PS. 
Related metrics: tma_dtlb_load, tma_info_bottleneck_memory_data_tlbs, tma_info_bottleneck_memory_synchronization", ··· 274 274 { 275 275 "BriefDescription": "This metric roughly estimates how often CPU was handling synchronizations due to False Sharing", 276 276 "MetricExpr": "54 * tma_info_system_core_frequency * OCR.DEMAND_RFO.L3_HIT.SNOOP_HITM / tma_info_thread_clks", 277 - "MetricGroup": "DataSharing;Offcore;Snoop;TopdownL4;tma_L4_group;tma_issueSyncxn;tma_store_bound_group", 277 + "MetricGroup": "BvMS;DataSharing;Offcore;Snoop;TopdownL4;tma_L4_group;tma_issueSyncxn;tma_store_bound_group", 278 278 "MetricName": "tma_false_sharing", 279 279 "MetricThreshold": "tma_false_sharing > 0.05 & (tma_store_bound > 0.2 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", 280 280 "PublicDescription": "This metric roughly estimates how often CPU was handling synchronizations due to False Sharing. False Sharing is a multithreading hiccup; where multiple Logical Processors contend on different data-elements mapped into the same cache line. Sample with: OCR.DEMAND_RFO.L3_HIT.SNOOP_HITM. Related metrics: tma_contested_accesses, tma_data_sharing, tma_machine_clears, tma_remote_cache", ··· 283 283 { 284 284 "BriefDescription": "This metric does a *rough estimation* of how often L1D Fill Buffer unavailability limited additional L1D miss memory access requests to proceed", 285 285 "MetricExpr": "L1D_PEND_MISS.FB_FULL / tma_info_thread_clks", 286 - "MetricGroup": "MemoryBW;TopdownL4;tma_L4_group;tma_issueBW;tma_issueSL;tma_issueSmSt;tma_l1_bound_group", 286 + "MetricGroup": "BvMS;MemoryBW;TopdownL4;tma_L4_group;tma_issueBW;tma_issueSL;tma_issueSmSt;tma_l1_bound_group", 287 287 "MetricName": "tma_fb_full", 288 288 "MetricThreshold": "tma_fb_full > 0.3", 289 289 "PublicDescription": "This metric does a *rough estimation* of how often L1D Fill Buffer unavailability limited additional L1D miss memory access requests to proceed. 
The higher the metric value; the deeper the memory hierarchy level the misses are satisfied from (metric values >1 are valid). Often it hints on approaching bandwidth limits (to L2 cache; L3 cache or external memory). Related metrics: tma_info_bottleneck_cache_memory_bandwidth, tma_info_system_dram_bw_use, tma_mem_bandwidth, tma_sq_full, tma_store_latency, tma_streaming_stores", ··· 296 296 "MetricName": "tma_fetch_bandwidth", 297 297 "MetricThreshold": "tma_fetch_bandwidth > 0.2", 298 298 "MetricgroupNoGroup": "TopdownL2", 299 - "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues. For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Sample with: FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_2_PS. Related metrics: tma_dsb_switches, tma_info_botlnk_l2_dsb_misses, tma_info_frontend_dsb_coverage, tma_info_inst_mix_iptb, tma_lcp", 299 + "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues. For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Sample with: FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_2_PS. 
Related metrics: tma_dsb_switches, tma_info_botlnk_l2_dsb_bandwidth, tma_info_botlnk_l2_dsb_misses, tma_info_frontend_dsb_coverage, tma_info_inst_mix_iptb, tma_lcp", 300 300 "ScaleUnit": "100%" 301 301 }, 302 302 { ··· 338 338 }, 339 339 { 340 340 "BriefDescription": "This metric approximates arithmetic floating-point (FP) scalar uops fraction the CPU has retired", 341 - "MetricExpr": "cpu@FP_ARITH_INST_RETIRED.SCALAR_SINGLE\\,umask\\=0x03@ / (tma_retiring * tma_info_thread_slots)", 341 + "MetricExpr": "FP_ARITH_INST_RETIRED.SCALAR / (tma_retiring * tma_info_thread_slots)", 342 342 "MetricGroup": "Compute;Flops;TopdownL4;tma_L4_group;tma_fp_arith_group;tma_issue2P", 343 343 "MetricName": "tma_fp_scalar", 344 344 "MetricThreshold": "tma_fp_scalar > 0.1 & (tma_fp_arith > 0.2 & tma_light_operations > 0.6)", ··· 347 347 }, 348 348 { 349 349 "BriefDescription": "This metric approximates arithmetic floating-point (FP) vector uops fraction the CPU has retired aggregated across all vector widths", 350 - "MetricExpr": "cpu@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE\\,umask\\=0xfc@ / (tma_retiring * tma_info_thread_slots)", 350 + "MetricExpr": "FP_ARITH_INST_RETIRED.VECTOR / (tma_retiring * tma_info_thread_slots)", 351 351 "MetricGroup": "Compute;Flops;TopdownL4;tma_L4_group;tma_fp_arith_group;tma_issue2P", 352 352 "MetricName": "tma_fp_vector", 353 353 "MetricThreshold": "tma_fp_vector > 0.1 & (tma_fp_arith > 0.2 & tma_light_operations > 0.6)", ··· 385 385 "BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend", 386 386 "DefaultMetricgroupName": "TopdownL1", 387 387 "MetricExpr": "topdown\\-fe\\-bound / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) - INT_MISC.UOP_DROPPING / tma_info_thread_slots", 388 - "MetricGroup": "Default;PGO;TmaL1;TopdownL1;tma_L1_group", 388 + "MetricGroup": "BvFB;BvIO;Default;PGO;TmaL1;TopdownL1;tma_L1_group", 389 389 "MetricName": 
"tma_frontend_bound", 390 390 "MetricThreshold": "tma_frontend_bound > 0.15", 391 391 "MetricgroupNoGroup": "TopdownL1;Default", ··· 405 405 { 406 406 "BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to instruction cache misses", 407 407 "MetricExpr": "ICACHE_DATA.STALLS / tma_info_thread_clks", 408 - "MetricGroup": "BigFootprint;FetchLat;IcMiss;TopdownL3;tma_L3_group;tma_fetch_latency_group", 408 + "MetricGroup": "BigFootprint;BvBC;FetchLat;IcMiss;TopdownL3;tma_L3_group;tma_fetch_latency_group", 409 409 "MetricName": "tma_icache_misses", 410 410 "MetricThreshold": "tma_icache_misses > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15)", 411 411 "PublicDescription": "This metric represents fraction of cycles the CPU was stalled due to instruction cache misses. Sample with: FRONTEND_RETIRED.L2_MISS_PS;FRONTEND_RETIRED.L1I_MISS_PS", ··· 469 469 "MetricThreshold": "tma_info_botlnk_l0_core_bound_likely > 0.5" 470 470 }, 471 471 { 472 + "BriefDescription": "Total pipeline cost of DSB (uop cache) hits - subset of the Instruction_Fetch_BW Bottleneck", 473 + "MetricExpr": "100 * (tma_frontend_bound * (tma_fetch_bandwidth / (tma_fetch_bandwidth + tma_fetch_latency)) * (tma_dsb / (tma_dsb + tma_lsd + tma_mite)))", 474 + "MetricGroup": "DSB;FetchBW;tma_issueFB", 475 + "MetricName": "tma_info_botlnk_l2_dsb_bandwidth", 476 + "MetricThreshold": "tma_info_botlnk_l2_dsb_bandwidth > 10", 477 + "PublicDescription": "Total pipeline cost of DSB (uop cache) hits - subset of the Instruction_Fetch_BW Bottleneck. 
Related metrics: tma_dsb_switches, tma_fetch_bandwidth, tma_info_botlnk_l2_dsb_misses, tma_info_frontend_dsb_coverage, tma_info_inst_mix_iptb, tma_lcp" 478 + }, 479 + { 472 480 "BriefDescription": "Total pipeline cost of DSB (uop cache) misses - subset of the Instruction_Fetch_BW Bottleneck", 473 481 "MetricConstraint": "NO_GROUP_EVENTS", 474 482 "MetricExpr": "100 * (tma_fetch_latency * tma_dsb_switches / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) + tma_fetch_bandwidth * tma_mite / (tma_dsb + tma_lsd + tma_mite))", 475 483 "MetricGroup": "DSBmiss;Fed;tma_issueFB", 476 484 "MetricName": "tma_info_botlnk_l2_dsb_misses", 477 485 "MetricThreshold": "tma_info_botlnk_l2_dsb_misses > 10", 478 - "PublicDescription": "Total pipeline cost of DSB (uop cache) misses - subset of the Instruction_Fetch_BW Bottleneck. Related metrics: tma_dsb_switches, tma_fetch_bandwidth, tma_info_frontend_dsb_coverage, tma_info_inst_mix_iptb, tma_lcp" 486 + "PublicDescription": "Total pipeline cost of DSB (uop cache) misses - subset of the Instruction_Fetch_BW Bottleneck. Related metrics: tma_dsb_switches, tma_fetch_bandwidth, tma_info_botlnk_l2_dsb_bandwidth, tma_info_frontend_dsb_coverage, tma_info_inst_mix_iptb, tma_lcp" 479 487 }, 480 488 { 481 489 "BriefDescription": "Total pipeline cost of Instruction Cache misses - subset of the Big_Code Bottleneck", ··· 495 487 "PublicDescription": "Total pipeline cost of Instruction Cache misses - subset of the Big_Code Bottleneck. 
Related metrics: " 496 488 }, 497 489 { 498 - "BriefDescription": "Total pipeline cost of \"useful operations\" - the baseline operations not covered by Branching_Overhead nor Irregular_Overhead.", 499 - "MetricExpr": "100 * (tma_retiring - (BR_INST_RETIRED.ALL_BRANCHES + BR_INST_RETIRED.NEAR_CALL) / tma_info_thread_slots - tma_microcode_sequencer / (tma_few_uops_instructions + tma_microcode_sequencer) * (tma_assists / tma_microcode_sequencer) * tma_heavy_operations)", 500 - "MetricGroup": "Ret", 501 - "MetricName": "tma_info_bottleneck_base_non_br", 502 - "MetricThreshold": "tma_info_bottleneck_base_non_br > 20" 503 - }, 504 - { 505 490 "BriefDescription": "Total pipeline cost of instruction fetch related bottlenecks by large code footprint programs (i-side cache; TLB and BTB misses)", 506 491 "MetricConstraint": "NO_GROUP_EVENTS", 507 492 "MetricExpr": "100 * tma_fetch_latency * (tma_itlb_misses + tma_icache_misses + tma_unknown_branches) / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)", 508 - "MetricGroup": "BigFootprint;Fed;Frontend;IcMiss;MemoryTLB", 493 + "MetricGroup": "BigFootprint;BvBC;Fed;Frontend;IcMiss;MemoryTLB", 509 494 "MetricName": "tma_info_bottleneck_big_code", 510 495 "MetricThreshold": "tma_info_bottleneck_big_code > 20" 511 496 }, 512 497 { 513 - "BriefDescription": "Total pipeline cost of branch related instructions (used for program control-flow including function calls)", 514 - "MetricExpr": "100 * ((BR_INST_RETIRED.ALL_BRANCHES + BR_INST_RETIRED.NEAR_CALL) / tma_info_thread_slots)", 515 - "MetricGroup": "Ret", 498 + "BriefDescription": "Total pipeline cost of instructions used for program control-flow - a subset of the Retiring category in TMA", 499 + "MetricExpr": "100 * ((BR_INST_RETIRED.ALL_BRANCHES + 2 * BR_INST_RETIRED.NEAR_CALL + INST_RETIRED.NOP) / tma_info_thread_slots)", 500 + "MetricGroup": "BvBO;Ret", 516 501 "MetricName": "tma_info_bottleneck_branching_overhead", 517 - 
"MetricThreshold": "tma_info_bottleneck_branching_overhead > 5" 502 + "MetricThreshold": "tma_info_bottleneck_branching_overhead > 5", 503 + "PublicDescription": "Total pipeline cost of instructions used for program control-flow - a subset of the Retiring category in TMA. Examples include function calls; loops and alignments. (A lower bound)" 518 504 }, 519 505 { 520 506 "BriefDescription": "Total pipeline cost of external Memory- or Cache-Bandwidth related bottlenecks", 521 - "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_bound * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_sq_full / (tma_contested_accesses + tma_data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_fb_full / (tma_4k_aliasing + tma_dtlb_load + tma_fb_full + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)))", 522 - "MetricGroup": "Mem;MemoryBW;Offcore;tma_issueBW", 507 + "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_bound * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_sq_full / (tma_contested_accesses + tma_data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_fb_full / (tma_4k_aliasing + tma_dtlb_load + tma_fb_full + tma_l1_hit_latency + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)))", 508 + "MetricGroup": "BvMB;Mem;MemoryBW;Offcore;tma_issueBW", 523 509 "MetricName": 
"tma_info_bottleneck_cache_memory_bandwidth", 524 510 "MetricThreshold": "tma_info_bottleneck_cache_memory_bandwidth > 20", 525 511 "PublicDescription": "Total pipeline cost of external Memory- or Cache-Bandwidth related bottlenecks. Related metrics: tma_fb_full, tma_info_system_dram_bw_use, tma_mem_bandwidth, tma_sq_full" 526 512 }, 527 513 { 528 514 "BriefDescription": "Total pipeline cost of external Memory- or Cache-Latency related bottlenecks", 529 - "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_bound * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_l3_hit_latency / (tma_contested_accesses + tma_data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound * tma_l2_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) + tma_memory_bound * (tma_store_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_store_latency / (tma_dtlb_store + tma_false_sharing + tma_split_stores + tma_store_latency + tma_streaming_stores)))", 530 - "MetricGroup": "Mem;MemoryLat;Offcore;tma_issueLat", 515 + "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_bound * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_l3_hit_latency / (tma_contested_accesses + tma_data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound * tma_l2_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) + tma_memory_bound * (tma_store_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_store_latency / 
(tma_dtlb_store + tma_false_sharing + tma_split_stores + tma_store_latency + tma_streaming_stores)) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_l1_hit_latency / (tma_4k_aliasing + tma_dtlb_load + tma_fb_full + tma_l1_hit_latency + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)))", 516 + "MetricGroup": "BvML;Mem;MemoryLat;Offcore;tma_issueLat", 531 517 "MetricName": "tma_info_bottleneck_cache_memory_latency", 532 518 "MetricThreshold": "tma_info_bottleneck_cache_memory_latency > 20", 533 519 "PublicDescription": "Total pipeline cost of external Memory- or Cache-Latency related bottlenecks. Related metrics: tma_l3_hit_latency, tma_mem_latency" ··· 529 527 { 530 528 "BriefDescription": "Total pipeline cost when the execution is compute-bound - an estimation", 531 529 "MetricExpr": "100 * (tma_core_bound * tma_divider / (tma_divider + tma_ports_utilization + tma_serializing_operation) + tma_core_bound * (tma_ports_utilization / (tma_divider + tma_ports_utilization + tma_serializing_operation)) * (tma_ports_utilized_3m / (tma_ports_utilized_0 + tma_ports_utilized_1 + tma_ports_utilized_2 + tma_ports_utilized_3m)))", 532 - "MetricGroup": "Cor;tma_issueComp", 530 + "MetricGroup": "BvCB;Cor;tma_issueComp", 533 531 "MetricName": "tma_info_bottleneck_compute_bound_est", 534 532 "MetricThreshold": "tma_info_bottleneck_compute_bound_est > 20", 535 533 "PublicDescription": "Total pipeline cost when the execution is compute-bound - an estimation. Covers Core Bound when High ILP as well as when long-latency execution units are busy. 
Related metrics: " 536 534 }, 537 535 { 538 - "BriefDescription": "Total pipeline cost of instruction fetch bandwidth related bottlenecks", 536 + "BriefDescription": "Total pipeline cost of instruction fetch bandwidth related bottlenecks (when the front-end could not sustain operations delivery to the back-end)", 539 537 "MetricConstraint": "NO_GROUP_EVENTS", 540 538 "MetricExpr": "100 * (tma_frontend_bound - (1 - 10 * tma_microcode_sequencer * tma_other_mispredicts / tma_branch_mispredicts) * tma_fetch_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) - tma_microcode_sequencer / (tma_few_uops_instructions + tma_microcode_sequencer) * (tma_assists / tma_microcode_sequencer) * tma_fetch_latency * (tma_ms_switches + tma_branch_resteers * (tma_clears_resteers + tma_mispredicts_resteers * (10 * tma_microcode_sequencer * tma_other_mispredicts / tma_branch_mispredicts)) / (tma_clears_resteers + tma_mispredicts_resteers + tma_unknown_branches)) / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)) - tma_info_bottleneck_big_code", 541 - "MetricGroup": "Fed;FetchBW;Frontend", 539 + "MetricGroup": "BvFB;Fed;FetchBW;Frontend", 542 540 "MetricName": "tma_info_bottleneck_instruction_fetch_bw", 543 541 "MetricThreshold": "tma_info_bottleneck_instruction_fetch_bw > 20" 544 542 }, 545 543 { 546 544 "BriefDescription": "Total pipeline cost of irregular execution (e.g", 547 545 "MetricExpr": "100 * (tma_microcode_sequencer / (tma_few_uops_instructions + tma_microcode_sequencer) * (tma_assists / tma_microcode_sequencer) * tma_fetch_latency * (tma_ms_switches + tma_branch_resteers * (tma_clears_resteers + tma_mispredicts_resteers * (10 * tma_microcode_sequencer * tma_other_mispredicts / tma_branch_mispredicts)) / (tma_clears_resteers + tma_mispredicts_resteers + tma_unknown_branches)) / (tma_branch_resteers + tma_dsb_switches + 
tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) + 10 * tma_microcode_sequencer * tma_other_mispredicts / tma_branch_mispredicts * tma_branch_mispredicts + tma_machine_clears * tma_other_nukes / tma_other_nukes + tma_core_bound * (tma_serializing_operation + tma_core_bound * RS_EVENTS.EMPTY_CYCLES / tma_info_thread_clks * tma_ports_utilized_0) / (tma_divider + tma_ports_utilization + tma_serializing_operation) + tma_microcode_sequencer / (tma_few_uops_instructions + tma_microcode_sequencer) * (tma_assists / tma_microcode_sequencer) * tma_heavy_operations)", 548 - "MetricGroup": "Bad;Cor;Ret;tma_issueMS", 546 + "MetricGroup": "Bad;BvIO;Cor;Ret;tma_issueMS", 549 547 "MetricName": "tma_info_bottleneck_irregular_overhead", 550 548 "MetricThreshold": "tma_info_bottleneck_irregular_overhead > 10", 551 549 "PublicDescription": "Total pipeline cost of irregular execution (e.g. FP-assists in HPC, Wait time with work imbalance multithreaded workloads, overhead in system services or virtualized environments). 
Related metrics: tma_microcode_sequencer, tma_ms_switches" ··· 553 551 { 554 552 "BriefDescription": "Total pipeline cost of Memory Address Translation related bottlenecks (data-side TLBs)", 555 553 "MetricConstraint": "NO_GROUP_EVENTS", 556 - "MetricExpr": "100 * (tma_memory_bound * (tma_l1_bound / max(tma_memory_bound, tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_dtlb_load / max(tma_l1_bound, tma_4k_aliasing + tma_dtlb_load + tma_fb_full + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)) + tma_memory_bound * (tma_store_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_dtlb_store / (tma_dtlb_store + tma_false_sharing + tma_split_stores + tma_store_latency + tma_streaming_stores)))", 557 - "MetricGroup": "Mem;MemoryTLB;Offcore;tma_issueTLB", 554 + "MetricExpr": "100 * (tma_memory_bound * (tma_l1_bound / max(tma_memory_bound, tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_dtlb_load / max(tma_l1_bound, tma_4k_aliasing + tma_dtlb_load + tma_fb_full + tma_l1_hit_latency + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)) + tma_memory_bound * (tma_store_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_dtlb_store / (tma_dtlb_store + tma_false_sharing + tma_split_stores + tma_store_latency + tma_streaming_stores)))", 555 + "MetricGroup": "BvMT;Mem;MemoryTLB;Offcore;tma_issueTLB", 558 556 "MetricName": "tma_info_bottleneck_memory_data_tlbs", 559 557 "MetricThreshold": "tma_info_bottleneck_memory_data_tlbs > 20", 560 558 "PublicDescription": "Total pipeline cost of Memory Address Translation related bottlenecks (data-side TLBs). 
Related metrics: tma_dtlb_load, tma_dtlb_store, tma_info_bottleneck_memory_synchronization" ··· 562 560 { 563 561 "BriefDescription": "Total pipeline cost of Memory Synchronization related bottlenecks (data transfers and coherency updates across processors)", 564 562 "MetricExpr": "100 * (tma_memory_bound * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_contested_accesses + tma_data_sharing) / (tma_contested_accesses + tma_data_sharing + tma_l3_hit_latency + tma_sq_full) + tma_store_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * tma_false_sharing / (tma_dtlb_store + tma_false_sharing + tma_split_stores + tma_store_latency + tma_streaming_stores - tma_store_latency)) + tma_machine_clears * (1 - tma_other_nukes / tma_other_nukes))", 565 - "MetricGroup": "Mem;Offcore;tma_issueTLB", 563 + "MetricGroup": "BvMS;Mem;Offcore;tma_issueTLB", 566 564 "MetricName": "tma_info_bottleneck_memory_synchronization", 567 565 "MetricThreshold": "tma_info_bottleneck_memory_synchronization > 10", 568 566 "PublicDescription": "Total pipeline cost of Memory Synchronization related bottlenecks (data transfers and coherency updates across processors). 
Related metrics: tma_dtlb_load, tma_dtlb_store, tma_info_bottleneck_memory_data_tlbs" ··· 571 569 "BriefDescription": "Total pipeline cost of Branch Misprediction related bottlenecks", 572 570 "MetricConstraint": "NO_GROUP_EVENTS", 573 571 "MetricExpr": "100 * (1 - 10 * tma_microcode_sequencer * tma_other_mispredicts / tma_branch_mispredicts) * (tma_branch_mispredicts + tma_fetch_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches))", 574 - "MetricGroup": "Bad;BadSpec;BrMispredicts;tma_issueBM", 572 + "MetricGroup": "Bad;BadSpec;BrMispredicts;BvMP;tma_issueBM", 575 573 "MetricName": "tma_info_bottleneck_mispredictions", 576 574 "MetricThreshold": "tma_info_bottleneck_mispredictions > 20", 577 575 "PublicDescription": "Total pipeline cost of Branch Misprediction related bottlenecks. Related metrics: tma_branch_mispredicts, tma_info_bad_spec_branch_misprediction_cost, tma_mispredicts_resteers" 578 576 }, 579 577 { 580 - "BriefDescription": "Total pipeline cost of remaining bottlenecks (apart from those listed in the Info.Bottlenecks metrics class)", 581 - "MetricExpr": "100 - (tma_info_bottleneck_big_code + tma_info_bottleneck_instruction_fetch_bw + tma_info_bottleneck_mispredictions + tma_info_bottleneck_cache_memory_bandwidth + tma_info_bottleneck_cache_memory_latency + tma_info_bottleneck_memory_data_tlbs + tma_info_bottleneck_memory_synchronization + tma_info_bottleneck_compute_bound_est + tma_info_bottleneck_irregular_overhead + tma_info_bottleneck_branching_overhead + tma_info_bottleneck_base_non_br)", 582 - "MetricGroup": "Cor;Offcore", 578 + "BriefDescription": "Total pipeline cost of remaining bottlenecks in the back-end", 579 + "MetricExpr": "100 - (tma_info_bottleneck_big_code + tma_info_bottleneck_instruction_fetch_bw + tma_info_bottleneck_mispredictions + tma_info_bottleneck_cache_memory_bandwidth + tma_info_bottleneck_cache_memory_latency + 
tma_info_bottleneck_memory_data_tlbs + tma_info_bottleneck_memory_synchronization + tma_info_bottleneck_compute_bound_est + tma_info_bottleneck_irregular_overhead + tma_info_bottleneck_branching_overhead + tma_info_bottleneck_useful_work)", 580 + "MetricGroup": "BvOB;Cor;Offcore", 583 581 "MetricName": "tma_info_bottleneck_other_bottlenecks", 584 582 "MetricThreshold": "tma_info_bottleneck_other_bottlenecks > 20", 585 - "PublicDescription": "Total pipeline cost of remaining bottlenecks (apart from those listed in the Info.Bottlenecks metrics class). Examples include data-dependencies (Core Bound when Low ILP) and other unlisted memory-related stalls." 583 + "PublicDescription": "Total pipeline cost of remaining bottlenecks in the back-end. Examples include data-dependencies (Core Bound when Low ILP) and other unlisted memory-related stalls." 584 + }, 585 + { 586 + "BriefDescription": "Total pipeline cost of \"useful operations\" - the portion of Retiring category not covered by Branching_Overhead nor Irregular_Overhead.", 587 + "MetricExpr": "100 * (tma_retiring - (BR_INST_RETIRED.ALL_BRANCHES + 2 * BR_INST_RETIRED.NEAR_CALL + INST_RETIRED.NOP) / tma_info_thread_slots - tma_microcode_sequencer / (tma_few_uops_instructions + tma_microcode_sequencer) * (tma_assists / tma_microcode_sequencer) * tma_heavy_operations)", 588 + "MetricGroup": "BvUW;Ret", 589 + "MetricName": "tma_info_bottleneck_useful_work", 590 + "MetricThreshold": "tma_info_bottleneck_useful_work > 20" 586 591 }, 587 592 { 588 593 "BriefDescription": "Fraction of branches that are CALL or RET", ··· 647 638 }, 648 639 { 649 640 "BriefDescription": "Actual per-core usage of the Floating Point non-X87 execution units (regardless of precision or vector-width)", 650 - "MetricExpr": "(cpu@FP_ARITH_INST_RETIRED.SCALAR_SINGLE\\,umask\\=0x03@ + cpu@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE\\,umask\\=0xfc@) / (2 * tma_info_core_core_clks)", 641 + "MetricExpr": "(FP_ARITH_INST_RETIRED.SCALAR + 
FP_ARITH_INST_RETIRED.VECTOR) / (2 * tma_info_core_core_clks)", 651 642 "MetricGroup": "Cor;Flops;HPC", 652 643 "MetricName": "tma_info_core_fp_arith_utilization", 653 644 "PublicDescription": "Actual per-core usage of the Floating Point non-X87 execution units (regardless of precision or vector-width). Values > 1 are possible due to ([BDW+] Fused-Multiply Add (FMA) counting - common; [ADL+] use all of ADD/MUL/FMA in Scalar or 128/256-bit vectors - less common)." ··· 664 655 "MetricGroup": "DSB;Fed;FetchBW;tma_issueFB", 665 656 "MetricName": "tma_info_frontend_dsb_coverage", 666 657 "MetricThreshold": "tma_info_frontend_dsb_coverage < 0.7 & tma_info_thread_ipc / 5 > 0.35", 667 - "PublicDescription": "Fraction of Uops delivered by the DSB (aka Decoded ICache; or Uop Cache). Related metrics: tma_dsb_switches, tma_fetch_bandwidth, tma_info_botlnk_l2_dsb_misses, tma_info_inst_mix_iptb, tma_lcp" 658 + "PublicDescription": "Fraction of Uops delivered by the DSB (aka Decoded ICache; or Uop Cache). 
Related metrics: tma_dsb_switches, tma_fetch_bandwidth, tma_info_botlnk_l2_dsb_bandwidth, tma_info_botlnk_l2_dsb_misses, tma_info_inst_mix_iptb, tma_lcp" 668 659 }, 669 660 { 670 661 "BriefDescription": "Average number of cycles of a switch from the DSB fetch-unit to MITE fetch unit - see DSB_Switches tree node for details.", ··· 730 721 }, 731 722 { 732 723 "BriefDescription": "Instructions per FP Arithmetic instruction (lower number means higher occurrence rate)", 733 - "MetricExpr": "INST_RETIRED.ANY / (cpu@FP_ARITH_INST_RETIRED.SCALAR_SINGLE\\,umask\\=0x03@ + cpu@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE\\,umask\\=0xfc@)", 724 + "MetricExpr": "INST_RETIRED.ANY / (FP_ARITH_INST_RETIRED.SCALAR + FP_ARITH_INST_RETIRED.VECTOR)", 734 725 "MetricGroup": "Flops;InsType", 735 726 "MetricName": "tma_info_inst_mix_iparith", 736 727 "MetricThreshold": "tma_info_inst_mix_iparith < 10", ··· 825 816 "MetricThreshold": "tma_info_inst_mix_ipswpf < 100" 826 817 }, 827 818 { 828 - "BriefDescription": "Instruction per taken branch", 819 + "BriefDescription": "Instructions per taken branch", 829 820 "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN", 830 821 "MetricGroup": "Branches;Fed;FetchBW;Frontend;PGO;tma_issueFB", 831 822 "MetricName": "tma_info_inst_mix_iptb", 832 823 "MetricThreshold": "tma_info_inst_mix_iptb < 11", 833 - "PublicDescription": "Instruction per taken branch. Related metrics: tma_dsb_switches, tma_fetch_bandwidth, tma_info_botlnk_l2_dsb_misses, tma_info_frontend_dsb_coverage, tma_lcp" 824 + "PublicDescription": "Instructions per taken branch. 
Related metrics: tma_dsb_switches, tma_fetch_bandwidth, tma_info_botlnk_l2_dsb_bandwidth, tma_info_botlnk_l2_dsb_misses, tma_info_frontend_dsb_coverage, tma_lcp" 834 825 }, 835 826 { 836 827 "BriefDescription": "Average per-core data fill bandwidth to the L1 data cache [GB / sec]", ··· 863 854 "MetricName": "tma_info_memory_fb_hpki" 864 855 }, 865 856 { 866 - "BriefDescription": "", 857 + "BriefDescription": "Average per-thread data fill bandwidth to the L1 data cache [GB / sec]", 867 858 "MetricExpr": "64 * L1D.REPLACEMENT / 1e9 / duration_time", 868 859 "MetricGroup": "Mem;MemoryBW", 869 860 "MetricName": "tma_info_memory_l1d_cache_fill_bw" ··· 881 872 "MetricName": "tma_info_memory_l1mpki_load" 882 873 }, 883 874 { 884 - "BriefDescription": "", 875 + "BriefDescription": "Average per-thread data fill bandwidth to the L2 cache [GB / sec]", 885 876 "MetricExpr": "64 * L2_LINES_IN.ALL / 1e9 / duration_time", 886 877 "MetricGroup": "Mem;MemoryBW", 887 878 "MetricName": "tma_info_memory_l2_cache_fill_bw" ··· 917 908 "MetricName": "tma_info_memory_l2mpki_load" 918 909 }, 919 910 { 920 - "BriefDescription": "", 911 + "BriefDescription": "Offcore requests (L2 cache miss) per kilo instruction for demand RFOs", 912 + "MetricExpr": "1e3 * L2_RQSTS.RFO_MISS / INST_RETIRED.ANY", 913 + "MetricGroup": "CacheMisses;Offcore", 914 + "MetricName": "tma_info_memory_l2mpki_rfo" 915 + }, 916 + { 917 + "BriefDescription": "Average per-thread data access bandwidth to the L3 cache [GB / sec]", 921 918 "MetricExpr": "64 * OFFCORE_REQUESTS.ALL_REQUESTS / 1e9 / duration_time", 922 919 "MetricGroup": "Mem;MemoryBW;Offcore", 923 920 "MetricName": "tma_info_memory_l3_cache_access_bw" 924 921 }, 925 922 { 926 - "BriefDescription": "", 923 + "BriefDescription": "Average per-thread data fill bandwidth to the L3 cache [GB / sec]", 927 924 "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1e9 / duration_time", 928 925 "MetricGroup": "Mem;MemoryBW", 929 926 "MetricName": 
"tma_info_memory_l3_cache_fill_bw" ··· 1015 1000 "MetricName": "tma_info_memory_tlb_store_stlb_mpki" 1016 1001 }, 1017 1002 { 1018 - "BriefDescription": "", 1003 + "BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is execution) per core", 1019 1004 "MetricExpr": "UOPS_EXECUTED.THREAD / (UOPS_EXECUTED.CORE_CYCLES_GE_1 / 2 if #SMT_on else cpu@UOPS_EXECUTED.THREAD\\,cmask\\=1@)", 1020 1005 "MetricGroup": "Cor;Pipeline;PortsUtil;SMT", 1021 1006 "MetricName": "tma_info_pipeline_execute" 1007 + }, 1008 + { 1009 + "BriefDescription": "Average number of uops fetched from DSB per cycle", 1010 + "MetricExpr": "IDQ.DSB_UOPS / IDQ.DSB_CYCLES_ANY", 1011 + "MetricGroup": "Fed;FetchBW", 1012 + "MetricName": "tma_info_pipeline_fetch_dsb" 1013 + }, 1014 + { 1015 + "BriefDescription": "Average number of uops fetched from LSD per cycle", 1016 + "MetricExpr": "LSD.UOPS / LSD.CYCLES_ACTIVE", 1017 + "MetricGroup": "Fed;FetchBW", 1018 + "MetricName": "tma_info_pipeline_fetch_lsd" 1019 + }, 1020 + { 1021 + "BriefDescription": "Average number of uops fetched from MITE per cycle", 1022 + "MetricExpr": "IDQ.MITE_UOPS / IDQ.MITE_CYCLES_ANY", 1023 + "MetricGroup": "Fed;FetchBW", 1024 + "MetricName": "tma_info_pipeline_fetch_mite" 1022 1025 }, 1023 1026 { 1024 1027 "BriefDescription": "Instructions per a microcode Assist invocation", ··· 1060 1027 }, 1061 1028 { 1062 1029 "BriefDescription": "Average CPU Utilization (percentage)", 1063 - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / TSC", 1030 + "MetricExpr": "tma_info_system_cpus_utilized / #num_cpus_online", 1064 1031 "MetricGroup": "HPC;Summary", 1065 1032 "MetricName": "tma_info_system_cpu_utilization" 1066 1033 }, 1067 1034 { 1068 1035 "BriefDescription": "Average number of utilized CPUs", 1069 - "MetricExpr": "#num_cpus_online * tma_info_system_cpu_utilization", 1036 + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / TSC", 1070 1037 "MetricGroup": "Summary", 1071 1038 "MetricName": 
"tma_info_system_cpus_utilized" 1072 1039 }, ··· 1204 1171 "MetricThreshold": "tma_info_thread_uoppi > 1.05" 1205 1172 }, 1206 1173 { 1207 - "BriefDescription": "Instruction per taken branch", 1174 + "BriefDescription": "Uops per taken branch", 1208 1175 "MetricExpr": "tma_retiring * tma_info_thread_slots / BR_INST_RETIRED.NEAR_TAKEN", 1209 1176 "MetricGroup": "Branches;Fed;FetchBW", 1210 1177 "MetricName": "tma_info_thread_uptb", ··· 1213 1180 { 1214 1181 "BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to Instruction TLB (ITLB) misses", 1215 1182 "MetricExpr": "ICACHE_TAG.STALLS / tma_info_thread_clks", 1216 - "MetricGroup": "BigFootprint;FetchLat;MemoryTLB;TopdownL3;tma_L3_group;tma_fetch_latency_group", 1183 + "MetricGroup": "BigFootprint;BvBC;FetchLat;MemoryTLB;TopdownL3;tma_L3_group;tma_fetch_latency_group", 1217 1184 "MetricName": "tma_itlb_misses", 1218 1185 "MetricThreshold": "tma_itlb_misses > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15)", 1219 1186 "PublicDescription": "This metric represents fraction of cycles the CPU was stalled due to Instruction TLB (ITLB) misses. 
Sample with: FRONTEND_RETIRED.STLB_MISS_PS;FRONTEND_RETIRED.ITLB_MISS_PS", ··· 1229 1196 "ScaleUnit": "100%" 1230 1197 }, 1231 1198 { 1199 + "BriefDescription": "This metric roughly estimates fraction of cycles with demand load accesses that hit the L1 cache", 1200 + "MetricExpr": "min(2 * (MEM_INST_RETIRED.ALL_LOADS - MEM_LOAD_RETIRED.FB_HIT - MEM_LOAD_RETIRED.L1_MISS) * 20 / 100, max(CYCLE_ACTIVITY.CYCLES_MEM_ANY - CYCLE_ACTIVITY.CYCLES_L1D_MISS, 0)) / tma_info_thread_clks", 1201 + "MetricGroup": "BvML;MemoryLat;TopdownL4;tma_L4_group;tma_l1_bound_group", 1202 + "MetricName": "tma_l1_hit_latency", 1203 + "MetricThreshold": "tma_l1_hit_latency > 0.1 & (tma_l1_bound > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", 1204 + "PublicDescription": "This metric roughly estimates fraction of cycles with demand load accesses that hit the L1 cache. The short latency of the L1 data cache may be exposed in pointer-chasing memory access patterns as an example. Sample with: MEM_LOAD_RETIRED.L1_HIT", 1205 + "ScaleUnit": "100%" 1206 + }, 1207 + { 1232 1208 "BriefDescription": "This metric estimates how often the CPU was stalled due to L2 cache accesses by loads", 1233 1209 "MetricConstraint": "NO_GROUP_EVENTS", 1234 1210 "MetricExpr": "MEM_LOAD_RETIRED.L2_HIT * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS) / (MEM_LOAD_RETIRED.L2_HIT * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS) + L1D_PEND_MISS.FB_FULL_PERIODS) * ((CYCLE_ACTIVITY.STALLS_L1D_MISS - CYCLE_ACTIVITY.STALLS_L2_MISS) / tma_info_thread_clks)", 1235 - "MetricGroup": "CacheHits;MemoryBound;TmaL3mem;TopdownL3;tma_L3_group;tma_memory_bound_group", 1211 + "MetricGroup": "BvML;CacheHits;MemoryBound;TmaL3mem;TopdownL3;tma_L3_group;tma_memory_bound_group", 1236 1212 "MetricName": "tma_l2_bound", 1237 1213 "MetricThreshold": "tma_l2_bound > 0.05 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2)", 1238 1214 "PublicDescription": "This metric estimates how often the CPU was stalled due to L2 
cache accesses by loads. Avoiding cache misses (i.e. L1 misses/L2 hits) can improve the latency and increase performance. Sample with: MEM_LOAD_RETIRED.L2_HIT_PS", ··· 1260 1218 { 1261 1219 "BriefDescription": "This metric estimates fraction of cycles with demand load accesses that hit the L3 cache under unloaded scenarios (possibly L3 latency limited)", 1262 1220 "MetricExpr": "17.5 * tma_info_system_core_frequency * (MEM_LOAD_RETIRED.L3_HIT * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS / 2)) / tma_info_thread_clks", 1263 - "MetricGroup": "MemoryLat;TopdownL4;tma_L4_group;tma_issueLat;tma_l3_bound_group", 1221 + "MetricGroup": "BvML;MemoryLat;TopdownL4;tma_L4_group;tma_issueLat;tma_l3_bound_group", 1264 1222 "MetricName": "tma_l3_hit_latency", 1265 1223 "MetricThreshold": "tma_l3_hit_latency > 0.1 & (tma_l3_bound > 0.05 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", 1266 1224 "PublicDescription": "This metric estimates fraction of cycles with demand load accesses that hit the L3 cache under unloaded scenarios (possibly L3 latency limited). Avoiding private cache misses (i.e. L2 misses/L3 hits) will improve the latency; reduce contention with sibling physical cores and increase performance. Note the value of this node may overlap with its siblings. Sample with: MEM_LOAD_RETIRED.L3_HIT_PS. Related metrics: tma_info_bottleneck_cache_memory_latency, tma_mem_latency", ··· 1272 1230 "MetricGroup": "FetchLat;TopdownL3;tma_L3_group;tma_fetch_latency_group;tma_issueFB", 1273 1231 "MetricName": "tma_lcp", 1274 1232 "MetricThreshold": "tma_lcp > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15)", 1275 - "PublicDescription": "This metric represents fraction of cycles CPU was stalled due to Length Changing Prefixes (LCPs). Using proper compiler flags or Intel Compiler by default will certainly avoid this. #Link: Optimization Guide about LCP BKMs. 
Related metrics: tma_dsb_switches, tma_fetch_bandwidth, tma_info_botlnk_l2_dsb_misses, tma_info_frontend_dsb_coverage, tma_info_inst_mix_iptb", 1233 + "PublicDescription": "This metric represents fraction of cycles CPU was stalled due to Length Changing Prefixes (LCPs). Using proper compiler flags or Intel Compiler by default will certainly avoid this. #Link: Optimization Guide about LCP BKMs. Related metrics: tma_dsb_switches, tma_fetch_bandwidth, tma_info_botlnk_l2_dsb_bandwidth, tma_info_botlnk_l2_dsb_misses, tma_info_frontend_dsb_coverage, tma_info_inst_mix_iptb", 1276 1234 "ScaleUnit": "100%" 1277 1235 }, 1278 1236 { ··· 1317 1275 "MetricGroup": "Offcore;TopdownL4;tma_L4_group;tma_issueRFO;tma_l1_bound_group", 1318 1276 "MetricName": "tma_lock_latency", 1319 1277 "MetricThreshold": "tma_lock_latency > 0.2 & (tma_l1_bound > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", 1320 - "PublicDescription": "This metric represents fraction of cycles the CPU spent handling cache misses due to lock operations. Due to the microarchitecture handling of locks; they are classified as L1_Bound regardless of what memory source satisfied them. Sample with: MEM_INST_RETIRED.LOCK_LOADS_PS. Related metrics: tma_store_latency", 1278 + "PublicDescription": "This metric represents fraction of cycles the CPU spent handling cache misses due to lock operations. Due to the microarchitecture handling of locks; they are classified as L1_Bound regardless of what memory source satisfied them. Sample with: MEM_INST_RETIRED.LOCK_LOADS. 
Related metrics: tma_store_latency", 1321 1279 "ScaleUnit": "100%" 1322 1280 }, 1323 1281 { ··· 1332 1290 { 1333 1291 "BriefDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears", 1334 1292 "MetricExpr": "max(0, tma_bad_speculation - tma_branch_mispredicts)", 1335 - "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn", 1293 + "MetricGroup": "BadSpec;BvMS;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn", 1336 1294 "MetricName": "tma_machine_clears", 1337 1295 "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15", 1338 1296 "MetricgroupNoGroup": "TopdownL2", ··· 1342 1300 { 1343 1301 "BriefDescription": "This metric estimates fraction of cycles where the core's performance was likely hurt due to approaching bandwidth limits of external memory - DRAM ([SPR-HBM] and/or HBM)", 1344 1302 "MetricExpr": "min(CPU_CLK_UNHALTED.THREAD, cpu@OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD\\,cmask\\=4@) / tma_info_thread_clks", 1345 - "MetricGroup": "MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_dram_bound_group;tma_issueBW", 1303 + "MetricGroup": "BvMS;MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_dram_bound_group;tma_issueBW", 1346 1304 "MetricName": "tma_mem_bandwidth", 1347 1305 "MetricThreshold": "tma_mem_bandwidth > 0.2 & (tma_dram_bound > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", 1348 1306 "PublicDescription": "This metric estimates fraction of cycles where the core's performance was likely hurt due to approaching bandwidth limits of external memory - DRAM ([SPR-HBM] and/or HBM). The underlying heuristic assumes that a similar off-core traffic is generated by all IA cores. 
This metric does not aggregate non-data-read requests by this logical processor; requests from other IA Logical Processors/Physical Cores/sockets; or other non-IA devices like GPU; hence the maximum external memory bandwidth limits may or may not be approached when this metric is flagged (see Uncore counters for that). Related metrics: tma_fb_full, tma_info_bottleneck_cache_memory_bandwidth, tma_info_system_dram_bw_use, tma_sq_full", ··· 1351 1309 { 1352 1310 "BriefDescription": "This metric estimates fraction of cycles where the performance was likely hurt due to latency from external memory - DRAM ([SPR-HBM] and/or HBM)", 1353 1311 "MetricExpr": "min(CPU_CLK_UNHALTED.THREAD, OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD) / tma_info_thread_clks - tma_mem_bandwidth", 1354 - "MetricGroup": "MemoryLat;Offcore;TopdownL4;tma_L4_group;tma_dram_bound_group;tma_issueLat", 1312 + "MetricGroup": "BvML;MemoryLat;Offcore;TopdownL4;tma_L4_group;tma_dram_bound_group;tma_issueLat", 1355 1313 "MetricName": "tma_mem_latency", 1356 1314 "MetricThreshold": "tma_mem_latency > 0.1 & (tma_dram_bound > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", 1357 1315 "PublicDescription": "This metric estimates fraction of cycles where the performance was likely hurt due to latency from external memory - DRAM ([SPR-HBM] and/or HBM). This metric does not aggregate requests from other Logical Processors/Physical Cores/sockets (see Uncore counters for that). 
Related metrics: tma_info_bottleneck_cache_memory_latency, tma_l3_hit_latency", ··· 1388 1346 { 1389 1347 "BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to Branch Resteers as a result of Branch Misprediction at execution stage", 1390 1348 "MetricExpr": "BR_MISP_RETIRED.ALL_BRANCHES / (BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT) * INT_MISC.CLEAR_RESTEER_CYCLES / tma_info_thread_clks", 1391 - "MetricGroup": "BadSpec;BrMispredicts;TopdownL4;tma_L4_group;tma_branch_resteers_group;tma_issueBM", 1349 + "MetricGroup": "BadSpec;BrMispredicts;BvMP;TopdownL4;tma_L4_group;tma_branch_resteers_group;tma_issueBM", 1392 1350 "MetricName": "tma_mispredicts_resteers", 1393 1351 "MetricThreshold": "tma_mispredicts_resteers > 0.05 & (tma_branch_resteers > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15))", 1394 1352 "PublicDescription": "This metric represents fraction of cycles the CPU was stalled due to Branch Resteers as a result of Branch Misprediction at execution stage. Sample with: INT_MISC.CLEAR_RESTEER_CYCLES. Related metrics: tma_branch_mispredicts, tma_info_bad_spec_branch_misprediction_cost, tma_info_bottleneck_mispredictions", ··· 1432 1390 { 1433 1391 "BriefDescription": "This metric represents fraction of slots where the CPU was retiring NOP (no op) instructions", 1434 1392 "MetricExpr": "tma_light_operations * INST_RETIRED.NOP / (tma_retiring * tma_info_thread_slots)", 1435 - "MetricGroup": "Pipeline;TopdownL4;tma_L4_group;tma_other_light_ops_group", 1393 + "MetricGroup": "BvBO;Pipeline;TopdownL4;tma_L4_group;tma_other_light_ops_group", 1436 1394 "MetricName": "tma_nop_instructions", 1437 1395 "MetricThreshold": "tma_nop_instructions > 0.1 & (tma_other_light_ops > 0.3 & tma_light_operations > 0.6)", 1438 1396 "PublicDescription": "This metric represents fraction of slots where the CPU was retiring NOP (no op) instructions. Compilers often use NOPs for certain address alignments - e.g. 
start address of a function or loop body. Sample with: INST_RETIRED.NOP", ··· 1451 1409 { 1452 1410 "BriefDescription": "This metric estimates fraction of slots the CPU was stalled due to other cases of misprediction (non-retired x86 branches or other types).", 1453 1411 "MetricExpr": "max(tma_branch_mispredicts * (1 - BR_MISP_RETIRED.ALL_BRANCHES / (INT_MISC.CLEARS_COUNT - MACHINE_CLEARS.COUNT)), 0.0001)", 1454 - "MetricGroup": "BrMispredicts;TopdownL3;tma_L3_group;tma_branch_mispredicts_group", 1412 + "MetricGroup": "BrMispredicts;BvIO;TopdownL3;tma_L3_group;tma_branch_mispredicts_group", 1455 1413 "MetricName": "tma_other_mispredicts", 1456 1414 "MetricThreshold": "tma_other_mispredicts > 0.05 & (tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15)", 1457 1415 "ScaleUnit": "100%" ··· 1459 1417 { 1460 1418 "BriefDescription": "This metric represents fraction of slots the CPU has wasted due to Nukes (Machine Clears) not related to memory ordering.", 1461 1419 "MetricExpr": "max(tma_machine_clears * (1 - MACHINE_CLEARS.MEMORY_ORDERING / MACHINE_CLEARS.COUNT), 0.0001)", 1462 - "MetricGroup": "Machine_Clears;TopdownL3;tma_L3_group;tma_machine_clears_group", 1420 + "MetricGroup": "BvIO;Machine_Clears;TopdownL3;tma_L3_group;tma_machine_clears_group", 1463 1421 "MetricName": "tma_other_nukes", 1464 1422 "MetricThreshold": "tma_other_nukes > 0.05 & (tma_machine_clears > 0.1 & tma_bad_speculation > 0.15)", 1465 1423 "ScaleUnit": "100%" ··· 1511 1469 }, 1512 1470 { 1513 1471 "BriefDescription": "This metric represents fraction of cycles CPU executed no uops on any execution port (Logical Processor cycles since ICL, Physical Core cycles otherwise)", 1514 - "MetricExpr": "(cpu@EXE_ACTIVITY.3_PORTS_UTIL\\,umask\\=0x80@ + tma_core_bound * RS_EVENTS.EMPTY_CYCLES) / tma_info_thread_clks * (CYCLE_ACTIVITY.STALLS_TOTAL - CYCLE_ACTIVITY.STALLS_MEM_ANY) / tma_info_thread_clks", 1472 + "MetricExpr": "EXE_ACTIVITY.EXE_BOUND_0_PORTS / tma_info_thread_clks", 1515 1473 
"MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_ports_utilization_group", 1516 1474 "MetricName": "tma_ports_utilized_0", 1517 1475 "MetricThreshold": "tma_ports_utilized_0 > 0.2 & (tma_ports_utilization > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))", ··· 1539 1497 { 1540 1498 "BriefDescription": "This metric represents fraction of cycles CPU executed total of 3 or more uops per cycle on all execution ports (Logical Processor cycles since ICL, Physical Core cycles otherwise)", 1541 1499 "MetricExpr": "UOPS_EXECUTED.CYCLES_GE_3 / tma_info_thread_clks", 1542 - "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_ports_utilization_group", 1500 + "MetricGroup": "BvCB;PortsUtil;TopdownL4;tma_L4_group;tma_ports_utilization_group", 1543 1501 "MetricName": "tma_ports_utilized_3m", 1544 1502 "MetricThreshold": "tma_ports_utilized_3m > 0.4 & (tma_ports_utilization > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))", 1545 1503 "PublicDescription": "This metric represents fraction of cycles CPU executed total of 3 or more uops per cycle on all execution ports (Logical Processor cycles since ICL, Physical Core cycles otherwise). Sample with: UOPS_EXECUTED.CYCLES_GE_3", ··· 1549 1507 "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. 
issued uops that eventually get retired",
"DefaultMetricgroupName": "TopdownL1",
"MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_thread_slots",
- "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group",
+ "MetricGroup": "BvUW;Default;TmaL1;TopdownL1;tma_L1_group",
"MetricName": "tma_retiring",
"MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1",
"MetricgroupNoGroup": "TopdownL1;Default",
···
{
"BriefDescription": "This metric represents fraction of cycles the CPU issue-pipeline was stalled due to serializing operations",
"MetricExpr": "RESOURCE_STALLS.SCOREBOARD / tma_info_thread_clks",
- "MetricGroup": "PortsUtil;TopdownL3;tma_L3_group;tma_core_bound_group;tma_issueSO",
+ "MetricGroup": "BvIO;PortsUtil;TopdownL3;tma_L3_group;tma_core_bound_group;tma_issueSO",
"MetricName": "tma_serializing_operation",
"MetricThreshold": "tma_serializing_operation > 0.1 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2)",
"PublicDescription": "This metric represents fraction of cycles the CPU issue-pipeline was stalled due to serializing operations. Instructions like CPUID; WRMSR or LFENCE serialize the out-of-order execution which may limit performance. Sample with: RESOURCE_STALLS.SCOREBOARD. Related metrics: tma_ms_switches",
···
{
"BriefDescription": "This metric measures fraction of cycles where the Super Queue (SQ) was full taking into account all request-types and both hardware SMT threads (Logical Processors)",
"MetricExpr": "L1D_PEND_MISS.L2_STALL / tma_info_thread_clks",
- "MetricGroup": "MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_issueBW;tma_l3_bound_group",
+ "MetricGroup": "BvMS;MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_issueBW;tma_l3_bound_group",
"MetricName": "tma_sq_full",
"MetricThreshold": "tma_sq_full > 0.3 & (tma_l3_bound > 0.05 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))",
"PublicDescription": "This metric measures fraction of cycles where the Super Queue (SQ) was full taking into account all request-types and both hardware SMT threads (Logical Processors). Related metrics: tma_fb_full, tma_info_bottleneck_cache_memory_bandwidth, tma_info_system_dram_bw_use, tma_mem_bandwidth",
···
{
"BriefDescription": "This metric estimates fraction of cycles the CPU spent handling L1D store misses",
"MetricExpr": "(L2_RQSTS.RFO_HIT * 10 * (1 - MEM_INST_RETIRED.LOCK_LOADS / MEM_INST_RETIRED.ALL_STORES) + (1 - MEM_INST_RETIRED.LOCK_LOADS / MEM_INST_RETIRED.ALL_STORES) * min(CPU_CLK_UNHALTED.THREAD, OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO)) / tma_info_thread_clks",
- "MetricGroup": "MemoryLat;Offcore;TopdownL4;tma_L4_group;tma_issueRFO;tma_issueSL;tma_store_bound_group",
+ "MetricGroup": "BvML;MemoryLat;Offcore;TopdownL4;tma_L4_group;tma_issueRFO;tma_issueSL;tma_store_bound_group",
"MetricName": "tma_store_latency",
"MetricThreshold": "tma_store_latency > 0.1 & (tma_store_bound > 0.2 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))",
"PublicDescription": "This metric estimates fraction of cycles the CPU spent handling L1D store misses. Store accesses usually less impact out-of-order core performance; however; holding resources for longer time can lead into undesired implications (e.g. contention on L1D fill-buffer entries - see FB_Full). Related metrics: tma_fb_full, tma_lock_latency",
···
{
"BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to new branch address clears",
"MetricExpr": "10 * BACLEARS.ANY / tma_info_thread_clks",
- "MetricGroup": "BigFootprint;FetchLat;TopdownL4;tma_L4_group;tma_branch_resteers_group",
+ "MetricGroup": "BigFootprint;BvBC;FetchLat;TopdownL4;tma_L4_group;tma_branch_resteers_group",
"MetricName": "tma_unknown_branches",
"MetricThreshold": "tma_unknown_branches > 0.05 & (tma_branch_resteers > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15))",
"PublicDescription": "This metric represents fraction of cycles the CPU was stalled due to new branch address clears. These are fetched branches the Branch Prediction Unit was unable to recognize (e.g. first time the branch is fetched or hitting BTB capacity limit) hence called Unknown Branches. Sample with: BACLEARS.ANY",
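As a reading aid for the metric hunks above, here is a minimal sketch of how the `tma_sq_full` entry's `MetricExpr` and `MetricThreshold` combine. It is not how perf evaluates metrics; the function names and sample values are hypothetical, and `tma_info_thread_clks` (itself a metric) is assumed to be passed in as an already-computed number.

```python
def tma_sq_full(l2_stall_cycles: float, thread_clks: float) -> float:
    """MetricExpr from the diff: L1D_PEND_MISS.L2_STALL / tma_info_thread_clks."""
    return l2_stall_cycles / thread_clks


def sq_full_flagged(sq_full: float, l3_bound: float,
                    memory_bound: float, backend_bound: float) -> bool:
    """MetricThreshold from the diff, with '&' written as Python 'and'.

    The parent metrics (tma_l3_bound, tma_memory_bound, tma_backend_bound)
    are taken as plain inputs here; computing them is out of scope.
    """
    return sq_full > 0.3 and (
        l3_bound > 0.05 and (memory_bound > 0.2 and backend_bound > 0.2)
    )


# Hypothetical counter readings, for illustration only.
ratio = tma_sq_full(l2_stall_cycles=400_000, thread_clks=1_000_000)
print(ratio, sq_full_flagged(ratio, l3_bound=0.1, memory_bound=0.3, backend_bound=0.4))
```

The nested threshold means a metric is only flagged when each of its parents in the top-down tree is also above its own limit.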
+19
tools/perf/pmu-events/arch/x86/tigerlake/uncore-interconnect.json
···
[
{
"BriefDescription": "UNC_ARB_COH_TRK_REQUESTS.ALL",
+ "Counter": "0,1",
"EventCode": "0x84",
"EventName": "UNC_ARB_COH_TRK_REQUESTS.ALL",
"PerPkg": "1",
···
},
{
"BriefDescription": "Each cycle counts number of any coherent request at memory controller that were issued by any core.",
+ "Counter": "0",
"EventCode": "0x85",
"EventName": "UNC_ARB_DAT_OCCUPANCY.ALL",
+ "Experimental": "1",
"PerPkg": "1",
"UMask": "0x1",
"Unit": "ARB"
},
{
"BriefDescription": "Each cycle counts number of coherent reads pending on data return from memory controller that were issued by any core.",
+ "Counter": "0",
"EventCode": "0x85",
"EventName": "UNC_ARB_DAT_OCCUPANCY.RD",
+ "Experimental": "1",
"PerPkg": "1",
"UMask": "0x2",
"Unit": "ARB"
},
{
"BriefDescription": "This event is deprecated. Refer to new event UNC_ARB_REQ_TRK_REQUEST.DRD",
+ "Counter": "0,1",
"Deprecated": "1",
"EventCode": "0x81",
"EventName": "UNC_ARB_DAT_REQUESTS.RD",
+ "Experimental": "1",
"PerPkg": "1",
"UMask": "0x2",
"Unit": "ARB"
},
{
"BriefDescription": "This event is deprecated. Refer to new event UNC_ARB_DAT_OCCUPANCY.ALL",
+ "Counter": "0",
"Deprecated": "1",
"EventCode": "0x85",
"EventName": "UNC_ARB_IFA_OCCUPANCY.ALL",
+ "Experimental": "1",
"PerPkg": "1",
"UMask": "0x1",
"Unit": "ARB"
},
{
"BriefDescription": "Each cycle count number of 'valid' coherent Data Read entries . Such entry is defined as valid when it is allocated till deallocation. Doesn't include prefetches [This event is alias to UNC_ARB_TRK_OCCUPANCY.RD]",
+ "Counter": "0",
"EventCode": "0x80",
"EventName": "UNC_ARB_REQ_TRK_OCCUPANCY.DRD",
+ "Experimental": "1",
"PerPkg": "1",
"UMask": "0x2",
"Unit": "ARB"
},
{
"BriefDescription": "Number of all coherent Data Read entries. Doesn't include prefetches [This event is alias to UNC_ARB_TRK_REQUESTS.RD]",
+ "Counter": "0,1",
"EventCode": "0x81",
"EventName": "UNC_ARB_REQ_TRK_REQUEST.DRD",
+ "Experimental": "1",
"PerPkg": "1",
"UMask": "0x2",
"Unit": "ARB"
},
{
"BriefDescription": "Each cycle count number of all outgoing valid entries in ReqTrk. Such entry is defined as valid from it's allocation in ReqTrk till deallocation. Accounts for Coherent and non-coherent traffic.",
+ "Counter": "0",
"EventCode": "0x80",
"EventName": "UNC_ARB_TRK_OCCUPANCY.ALL",
"PerPkg": "1",
···
},
{
"BriefDescription": "Each cycle count number of 'valid' coherent Data Read entries . Such entry is defined as valid when it is allocated till deallocation. Doesn't include prefetches [This event is alias to UNC_ARB_REQ_TRK_OCCUPANCY.DRD]",
+ "Counter": "0",
"EventCode": "0x80",
"EventName": "UNC_ARB_TRK_OCCUPANCY.RD",
+ "Experimental": "1",
"PerPkg": "1",
"UMask": "0x2",
"Unit": "ARB"
},
{
"BriefDescription": "UNC_ARB_TRK_REQUESTS.ALL",
+ "Counter": "0,1",
"EventCode": "0x81",
"EventName": "UNC_ARB_TRK_REQUESTS.ALL",
"PerPkg": "1",
···
},
{
"BriefDescription": "Number of all coherent Data Read entries. Doesn't include prefetches [This event is alias to UNC_ARB_REQ_TRK_REQUEST.DRD]",
+ "Counter": "0,1",
"EventCode": "0x81",
"EventName": "UNC_ARB_TRK_REQUESTS.RD",
+ "Experimental": "1",
"PerPkg": "1",
"UMask": "0x2",
"Unit": "ARB"
+6
tools/perf/pmu-events/arch/x86/tigerlake/uncore-memory.json
···
[
{
"BriefDescription": "Counts every read (RdCAS) issued by the Memory Controller to DRAM (sum of all channels). All requests result in 64 byte data transfers from DRAM.",
+ "Counter": "1",
"EventCode": "0xff",
"EventName": "UNC_MC0_RDCAS_COUNT_FREERUN",
"PerPkg": "1",
···
},
{
"BriefDescription": "Counts every 64B read and write request entering the Memory Controller to DRAM (sum of all channels). Each write request counts as a new request incrementing this counter. However, same cache line write requests (both full and partial) are combined to a single 64 byte data transfer to DRAM.",
+ "Counter": "0",
"EventCode": "0xff",
"EventName": "UNC_MC0_TOTAL_REQCOUNT_FREERUN",
"PerPkg": "1",
···
},
{
"BriefDescription": "Counts every write (WrCAS) issued by the Memory Controller to DRAM (sum of all channels). All requests result in 64 byte data transfers from DRAM.",
+ "Counter": "2",
"EventCode": "0xff",
"EventName": "UNC_MC0_WRCAS_COUNT_FREERUN",
"PerPkg": "1",
···
},
{
"BriefDescription": "Counts every read (RdCAS) issued by the Memory Controller to DRAM (sum of all channels). All requests result in 64 byte data transfers from DRAM.",
+ "Counter": "4",
"EventCode": "0xff",
"EventName": "UNC_MC1_RDCAS_COUNT_FREERUN",
"PerPkg": "1",
···
},
{
"BriefDescription": "Counts every 64B read and write request entering the Memory Controller to DRAM (sum of all channels). Each write request counts as a new request incrementing this counter. However, same cache line write requests (both full and partial) are combined to a single 64 byte data transfer to DRAM.",
+ "Counter": "3",
"EventCode": "0xff",
"EventName": "UNC_MC1_TOTAL_REQCOUNT_FREERUN",
"PerPkg": "1",
···
},
{
"BriefDescription": "Counts every write (WrCAS) issued by the Memory Controller to DRAM (sum of all channels). All requests result in 64 byte data transfers from DRAM.",
+ "Counter": "5",
"EventCode": "0xff",
"EventName": "UNC_MC1_WRCAS_COUNT_FREERUN",
"PerPkg": "1",
+1
tools/perf/pmu-events/arch/x86/tigerlake/uncore-other.json
···
[
{
"BriefDescription": "UNC_CLOCK.SOCKET",
+ "Counter": "FIXED",
"EventCode": "0xff",
"EventName": "UNC_CLOCK.SOCKET",
"PerPkg": "1",
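The new `"Counter"` field appears in this commit in three shapes: a comma-separated list such as `"0,1,2,3"`, a single index such as `"0"`, and the literal `"FIXED"`. The sketch below shows one way a script could normalize those values. The helper name `parse_counter_field` and its return shape are hypothetical, and this does not describe how perf itself consumes the field.

```python
from typing import FrozenSet, Tuple


def parse_counter_field(value: str) -> Tuple[bool, FrozenSet[int]]:
    """Split a "Counter" string into (is_fixed, set of counter indices).

    Only the value shapes visible in this diff are handled:
    "FIXED", a single index, or comma-separated indices.
    """
    text = value.strip()
    if text == "FIXED":
        # No general-purpose counter indices are listed for fixed counters.
        return True, frozenset()
    return False, frozenset(int(part) for part in text.split(","))


print(parse_counter_field("0,1,2,3"))
print(parse_counter_field("FIXED"))
```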
+20
tools/perf/pmu-events/arch/x86/tigerlake/virtual-memory.json
···
[
{
"BriefDescription": "Loads that miss the DTLB and hit the STLB.",
+ "Counter": "0,1,2,3",
"EventCode": "0x08",
"EventName": "DTLB_LOAD_MISSES.STLB_HIT",
"PublicDescription": "Counts loads that miss the DTLB (Data TLB) and hit the STLB (Second level TLB).",
···
},
{
"BriefDescription": "Cycles when at least one PMH is busy with a page walk for a demand load.",
+ "Counter": "0,1,2,3",
"CounterMask": "1",
"EventCode": "0x08",
"EventName": "DTLB_LOAD_MISSES.WALK_ACTIVE",
···
},
{
"BriefDescription": "Load miss in all TLB levels causes a page walk that completes. (All page sizes)",
+ "Counter": "0,1,2,3",
"EventCode": "0x08",
"EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED",
"PublicDescription": "Counts completed page walks (all page sizes) caused by demand data loads. This implies it missed in the DTLB and further levels of TLB. The page walk can end with or without a fault.",
···
},
{
"BriefDescription": "Page walks completed due to a demand data load to a 2M/4M page.",
+ "Counter": "0,1,2,3",
"EventCode": "0x08",
"EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED_2M_4M",
"PublicDescription": "Counts completed page walks (2M/4M sizes) caused by demand data loads. This implies address translations missed in the DTLB and further levels of TLB. The page walk can end with or without a fault.",
···
},
{
"BriefDescription": "Page walks completed due to a demand data load to a 4K page.",
+ "Counter": "0,1,2,3",
"EventCode": "0x08",
"EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED_4K",
"PublicDescription": "Counts completed page walks (4K sizes) caused by demand data loads. This implies address translations missed in the DTLB and further levels of TLB. The page walk can end with or without a fault.",
···
},
{
"BriefDescription": "Number of page walks outstanding for a demand load in the PMH each cycle.",
+ "Counter": "0,1,2,3",
"EventCode": "0x08",
"EventName": "DTLB_LOAD_MISSES.WALK_PENDING",
"PublicDescription": "Counts the number of page walks outstanding for a demand load in the PMH (Page Miss Handler) each cycle.",
···
},
{
"BriefDescription": "Stores that miss the DTLB and hit the STLB.",
+ "Counter": "0,1,2,3",
"EventCode": "0x49",
"EventName": "DTLB_STORE_MISSES.STLB_HIT",
"PublicDescription": "Counts stores that miss the DTLB (Data TLB) and hit the STLB (2nd Level TLB).",
···
},
{
"BriefDescription": "Cycles when at least one PMH is busy with a page walk for a store.",
+ "Counter": "0,1,2,3",
"CounterMask": "1",
"EventCode": "0x49",
"EventName": "DTLB_STORE_MISSES.WALK_ACTIVE",
···
},
{
"BriefDescription": "Store misses in all TLB levels causes a page walk that completes. (All page sizes)",
+ "Counter": "0,1,2,3",
"EventCode": "0x49",
"EventName": "DTLB_STORE_MISSES.WALK_COMPLETED",
"PublicDescription": "Counts completed page walks (all page sizes) caused by demand data stores. This implies it missed in the DTLB and further levels of TLB. The page walk can end with or without a fault.",
···
},
{
"BriefDescription": "Page walks completed due to a demand data store to a 2M/4M page.",
+ "Counter": "0,1,2,3",
"EventCode": "0x49",
"EventName": "DTLB_STORE_MISSES.WALK_COMPLETED_2M_4M",
"PublicDescription": "Counts page walks completed due to demand data stores whose address translations missed in the TLB and were mapped to 2M/4M pages. The page walks can end with or without a page fault.",
···
},
{
"BriefDescription": "Page walks completed due to a demand data store to a 4K page.",
+ "Counter": "0,1,2,3",
"EventCode": "0x49",
"EventName": "DTLB_STORE_MISSES.WALK_COMPLETED_4K",
"PublicDescription": "Counts page walks completed due to demand data stores whose address translations missed in the TLB and were mapped to 4K pages. The page walks can end with or without a page fault.",
···
},
{
"BriefDescription": "Number of page walks outstanding for a store in the PMH each cycle.",
+ "Counter": "0,1,2,3",
"EventCode": "0x49",
"EventName": "DTLB_STORE_MISSES.WALK_PENDING",
"PublicDescription": "Counts the number of page walks outstanding for a store in the PMH (Page Miss Handler) each cycle.",
···
},
{
"BriefDescription": "Instruction fetch requests that miss the ITLB and hit the STLB.",
+ "Counter": "0,1,2,3",
"EventCode": "0x85",
"EventName": "ITLB_MISSES.STLB_HIT",
"PublicDescription": "Counts instruction fetch requests that miss the ITLB (Instruction TLB) and hit the STLB (Second-level TLB).",
···
},
{
"BriefDescription": "Cycles when at least one PMH is busy with a page walk for code (instruction fetch) request.",
+ "Counter": "0,1,2,3",
"CounterMask": "1",
"EventCode": "0x85",
"EventName": "ITLB_MISSES.WALK_ACTIVE",
···
},
{
"BriefDescription": "Code miss in all TLB levels causes a page walk that completes. (All page sizes)",
+ "Counter": "0,1,2,3",
"EventCode": "0x85",
"EventName": "ITLB_MISSES.WALK_COMPLETED",
"PublicDescription": "Counts completed page walks (all page sizes) caused by a code fetch. This implies it missed in the ITLB (Instruction TLB) and further levels of TLB. The page walk can end with or without a fault.",
···
},
{
"BriefDescription": "Code miss in all TLB levels causes a page walk that completes. (2M/4M)",
+ "Counter": "0,1,2,3",
"EventCode": "0x85",
"EventName": "ITLB_MISSES.WALK_COMPLETED_2M_4M",
"PublicDescription": "Counts completed page walks (2M/4M page sizes) caused by a code fetch. This implies it missed in the ITLB (Instruction TLB) and further levels of TLB. The page walk can end with or without a fault.",
···
},
{
"BriefDescription": "Code miss in all TLB levels causes a page walk that completes. (4K)",
+ "Counter": "0,1,2,3",
"EventCode": "0x85",
"EventName": "ITLB_MISSES.WALK_COMPLETED_4K",
"PublicDescription": "Counts completed page walks (4K page sizes) caused by a code fetch. This implies it missed in the ITLB (Instruction TLB) and further levels of TLB. The page walk can end with or without a fault.",
···
},
{
"BriefDescription": "Number of page walks outstanding for an outstanding code request in the PMH each cycle.",
+ "Counter": "0,1,2,3",
"EventCode": "0x85",
"EventName": "ITLB_MISSES.WALK_PENDING",
"PublicDescription": "Counts the number of page walks outstanding for an outstanding code (instruction fetch) request in the PMH (Page Miss Handler) each cycle.",
···
},
{
"BriefDescription": "DTLB flush attempts of the thread-specific entries",
+ "Counter": "0,1,2,3",
"EventCode": "0xbd",
"EventName": "TLB_FLUSH.DTLB_THREAD",
"PublicDescription": "Counts the number of DTLB flush attempts of the thread-specific entries.",
···
},
{
"BriefDescription": "STLB flush attempts",
+ "Counter": "0,1,2,3",
"EventCode": "0xbd",
"EventName": "TLB_FLUSH.STLB_ANY",
"PublicDescription": "Counts the number of any STLB flush attempts (such as entire, VPID, PCID, InvPage, CR3 write, etc.).",