Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

perf vendor events: Add/update rocketlake events/metrics

Update events from v1.02 to v1.03.
Update TMA metrics from v4.7 to v4.8.

Bring in the event updates v1.03:
https://github.com/intel/perfmon/commit/a7c75ffd56c7056494cd3acc2749336cd6363b90

The TMA 4.8 information was added in:
https://github.com/intel/perfmon/commit/59194d4d90ca50a3fcb2de0d82b9f6fc0c9a5736

Add counter information. The most recent RFC patch set using this
information:
https://lore.kernel.org/lkml/20240412210756.309828-1-weilin.wang@intel.com/

Adds the event SW_PREFETCH_ACCESS.ANY.

Co-authored-by: Weilin Wang <weilin.wang@intel.com>
Co-authored-by: Caleb Biggers <caleb.biggers@intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240620181752.3945845-27-irogers@google.com
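
The counter information mentioned above appears in the event JSON as a "Counter" field holding a comma-separated list of indices. As a minimal sketch of how a consumer might read it, the Python below parses the SW_PREFETCH_ACCESS.ANY entry added by this commit. The helper names `allowed_counters` and `perf_spec` are hypothetical, and reading the field as "the counters this event may be scheduled on" is an assumption, not something the commit message states.

```python
import json

# Entry copied from the rocketlake cache.json hunk in this commit.
EVENT_JSON = """
[
  {
    "BriefDescription": "Counts the number of PREFETCHNTA, PREFETCHW, PREFETCHT0, PREFETCHT1 or PREFETCHT2 instructions executed.",
    "Counter": "0,1,2,3",
    "EventCode": "0x32",
    "EventName": "SW_PREFETCH_ACCESS.ANY",
    "SampleAfterValue": "100003",
    "UMask": "0xf"
  }
]
"""


def allowed_counters(event):
    """Return the indices listed in an event's "Counter" field.

    Hypothetical helper. Returns an empty list when the field is
    absent, as it was for these events before this commit.
    """
    raw = event.get("Counter", "")
    return [int(tok) for tok in raw.split(",") if tok.strip()]


def perf_spec(event):
    """Build a raw perf event string from EventCode and UMask.

    Hypothetical helper that ignores every other field
    (CounterMask, EdgeDetect, MSRIndex, ...).
    """
    return "cpu/event={},umask={}/".format(event["EventCode"], event["UMask"])


events = json.loads(EVENT_JSON)
by_name = {e["EventName"]: e for e in events}
any_prefetch = by_name["SW_PREFETCH_ACCESS.ANY"]

counters = allowed_counters(any_prefetch)
spec = perf_spec(any_prefetch)
print(any_prefetch["EventName"], counters, spec)
```

Keeping the field as a string in the JSON and splitting it on demand mirrors how the other numeric fields in these files are stored as strings.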

Authored by Ian Rogers, Weilin Wang, and Caleb Biggers; committed by Namhyung Kim
bf0dd1f4 d6977722

+629 -88
+1 -1
tools/perf/pmu-events/arch/x86/mapfile.csv
···
 GenuineIntel-6-A[AC],v1.10,meteorlake,core
 GenuineIntel-6-1[AEF],v4,nehalemep,core
 GenuineIntel-6-2E,v4,nehalemex,core
-GenuineIntel-6-A7,v1.02,rocketlake,core
+GenuineIntel-6-A7,v1.03,rocketlake,core
 GenuineIntel-6-2A,v19,sandybridge,core
 GenuineIntel-6-8F,v1.20,sapphirerapids,core
 GenuineIntel-6-AF,v1.02,sierraforest,core
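
To illustrate what the one-line mapfile.csv change does, the sketch below looks up a CPU identifier against the rows from the hunk above and returns the event-list version and directory. It assumes the first column is a regular expression (the `[AC]` and `[AEF]` entries suggest this) and that the identifier has no stepping suffix. The function name `lookup` is hypothetical, and this is a simplified illustration rather than perf's actual matching code.

```python
import csv
import io
import re

# Rows copied from the mapfile.csv hunk (state after this commit).
MAPFILE = """\
GenuineIntel-6-A[AC],v1.10,meteorlake,core
GenuineIntel-6-1[AEF],v4,nehalemep,core
GenuineIntel-6-2E,v4,nehalemex,core
GenuineIntel-6-A7,v1.03,rocketlake,core
GenuineIntel-6-2A,v19,sandybridge,core
GenuineIntel-6-8F,v1.20,sapphirerapids,core
GenuineIntel-6-AF,v1.02,sierraforest,core
"""


def lookup(cpuid):
    """Return (version, directory, type) for the first matching row.

    Assumption: column one is a regex that must match the whole
    identifier. Returns None when no row matches.
    """
    for pattern, version, directory, kind in csv.reader(io.StringIO(MAPFILE)):
        if re.fullmatch(pattern, cpuid):
            return version, directory, kind
    return None


rocketlake = lookup("GenuineIntel-6-A7")
print(rocketlake)
```

With the updated row, a model A7 part resolves to the rocketlake directory at version v1.03, which is the only change this hunk makes.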
+109
tools/perf/pmu-events/arch/x86/rocketlake/cache.json
··· 1 1 [ 2 2 { 3 3 "BriefDescription": "Counts the number of cache lines replaced in L1 data cache.", 4 + "Counter": "0,1,2,3", 4 5 "EventCode": "0x51", 5 6 "EventName": "L1D.REPLACEMENT", 6 7 "PublicDescription": "Counts L1D data line replacements including opportunistic replacements, and replacements that require stall-for-replace or block-for-replace.", ··· 10 9 }, 11 10 { 12 11 "BriefDescription": "Number of cycles a demand request has waited due to L1D Fill Buffer (FB) unavailability.", 12 + "Counter": "0,1,2,3", 13 13 "EventCode": "0x48", 14 14 "EventName": "L1D_PEND_MISS.FB_FULL", 15 15 "PublicDescription": "Counts number of cycles a demand request has waited due to L1D Fill Buffer (FB) unavailability. Demand requests include cacheable/uncacheable demand load, store, lock or SW prefetch accesses.", ··· 19 17 }, 20 18 { 21 19 "BriefDescription": "Number of phases a demand request has waited due to L1D Fill Buffer (FB) unavailability.", 20 + "Counter": "0,1,2,3", 22 21 "CounterMask": "1", 23 22 "EdgeDetect": "1", 24 23 "EventCode": "0x48", ··· 30 27 }, 31 28 { 32 29 "BriefDescription": "Number of cycles a demand request has waited due to L1D due to lack of L2 resources.", 30 + "Counter": "0,1,2,3", 33 31 "EventCode": "0x48", 34 32 "EventName": "L1D_PEND_MISS.L2_STALL", 35 33 "PublicDescription": "Counts number of cycles a demand request has waited due to L1D due to lack of L2 resources. Demand requests include cacheable/uncacheable demand load, store, lock or SW prefetch accesses.", ··· 39 35 }, 40 36 { 41 37 "BriefDescription": "Number of L1D misses that are outstanding", 38 + "Counter": "0,1,2,3", 42 39 "EventCode": "0x48", 43 40 "EventName": "L1D_PEND_MISS.PENDING", 44 41 "PublicDescription": "Counts number of L1D misses that are outstanding in each cycle, that is each cycle the number of Fill Buffers (FB) outstanding required by Demand Reads. FB either is held by demand loads, or it is held by non-demand loads and gets hit at least once by demand. 
The valid outstanding interval is defined until the FB deallocation by one of the following ways: from FB allocation, if FB is allocated by demand from the demand Hit FB, if it is allocated by hardware or software prefetch. Note: In the L1D, a Demand Read contains cacheable or noncacheable demand loads, including ones causing cache-line splits and reads due to page walks resulted from any request type.", ··· 48 43 }, 49 44 { 50 45 "BriefDescription": "Cycles with L1D load Misses outstanding.", 46 + "Counter": "0,1,2,3", 51 47 "CounterMask": "1", 52 48 "EventCode": "0x48", 53 49 "EventName": "L1D_PEND_MISS.PENDING_CYCLES", ··· 58 52 }, 59 53 { 60 54 "BriefDescription": "L2 cache lines filling L2", 55 + "Counter": "0,1,2,3", 61 56 "EventCode": "0xF1", 62 57 "EventName": "L2_LINES_IN.ALL", 63 58 "PublicDescription": "Counts the number of L2 cache lines filling the L2. Counting does not cover rejects.", ··· 67 60 }, 68 61 { 69 62 "BriefDescription": "Modified cache lines that are evicted by L2 cache when triggered by an L2 cache fill.", 63 + "Counter": "0,1,2,3", 70 64 "EventCode": "0xF2", 71 65 "EventName": "L2_LINES_OUT.NON_SILENT", 72 66 "PublicDescription": "Counts the number of lines that are evicted by L2 cache when triggered by an L2 cache fill. Those lines are in Modified state. Modified lines are written back to L3", ··· 76 68 }, 77 69 { 78 70 "BriefDescription": "Non-modified cache lines that are silently dropped by L2 cache when triggered by an L2 cache fill.", 71 + "Counter": "0,1,2,3", 79 72 "EventCode": "0xF2", 80 73 "EventName": "L2_LINES_OUT.SILENT", 81 74 "PublicDescription": "Counts the number of lines that are silently dropped by L2 cache when triggered by an L2 cache fill. These lines are typically in Shared or Exclusive state. 
A non-threaded event.", ··· 85 76 }, 86 77 { 87 78 "BriefDescription": "Cache lines that have been L2 hardware prefetched but not used by demand accesses", 79 + "Counter": "0,1,2,3", 88 80 "EventCode": "0xf2", 89 81 "EventName": "L2_LINES_OUT.USELESS_HWPF", 90 82 "PublicDescription": "Counts the number of cache lines that have been prefetched by the L2 hardware prefetcher but not used by demand access when evicted from the L2 cache", ··· 94 84 }, 95 85 { 96 86 "BriefDescription": "L2 code requests", 87 + "Counter": "0,1,2,3", 97 88 "EventCode": "0x24", 98 89 "EventName": "L2_RQSTS.ALL_CODE_RD", 99 90 "PublicDescription": "Counts the total number of L2 code requests.", ··· 103 92 }, 104 93 { 105 94 "BriefDescription": "Demand Data Read requests", 95 + "Counter": "0,1,2,3", 106 96 "EventCode": "0x24", 107 97 "EventName": "L2_RQSTS.ALL_DEMAND_DATA_RD", 108 98 "PublicDescription": "Counts the number of demand Data Read requests (including requests from L1D hardware prefetchers). These loads may hit or miss L2 cache. Only non rejected loads are counted.", ··· 112 100 }, 113 101 { 114 102 "BriefDescription": "Demand requests that miss L2 cache", 103 + "Counter": "0,1,2,3", 115 104 "EventCode": "0x24", 116 105 "EventName": "L2_RQSTS.ALL_DEMAND_MISS", 117 106 "PublicDescription": "Counts demand requests that miss L2 cache.", ··· 121 108 }, 122 109 { 123 110 "BriefDescription": "Demand requests to L2 cache", 111 + "Counter": "0,1,2,3", 124 112 "EventCode": "0x24", 125 113 "EventName": "L2_RQSTS.ALL_DEMAND_REFERENCES", 126 114 "PublicDescription": "Counts demand requests to L2 cache.", ··· 130 116 }, 131 117 { 132 118 "BriefDescription": "RFO requests to L2 cache", 119 + "Counter": "0,1,2,3", 133 120 "EventCode": "0x24", 134 121 "EventName": "L2_RQSTS.ALL_RFO", 135 122 "PublicDescription": "Counts the total number of RFO (read for ownership) requests to L2 cache. 
L2 RFO requests include both L1D demand RFO misses as well as L1D RFO prefetches.", ··· 139 124 }, 140 125 { 141 126 "BriefDescription": "L2 cache hits when fetching instructions, code reads.", 127 + "Counter": "0,1,2,3", 142 128 "EventCode": "0x24", 143 129 "EventName": "L2_RQSTS.CODE_RD_HIT", 144 130 "PublicDescription": "Counts L2 cache hits when fetching instructions, code reads.", ··· 148 132 }, 149 133 { 150 134 "BriefDescription": "L2 cache misses when fetching instructions", 135 + "Counter": "0,1,2,3", 151 136 "EventCode": "0x24", 152 137 "EventName": "L2_RQSTS.CODE_RD_MISS", 153 138 "PublicDescription": "Counts L2 cache misses when fetching instructions.", ··· 157 140 }, 158 141 { 159 142 "BriefDescription": "Demand Data Read requests that hit L2 cache", 143 + "Counter": "0,1,2,3", 160 144 "EventCode": "0x24", 161 145 "EventName": "L2_RQSTS.DEMAND_DATA_RD_HIT", 162 146 "PublicDescription": "Counts the number of demand Data Read requests initiated by load instructions that hit L2 cache.", ··· 166 148 }, 167 149 { 168 150 "BriefDescription": "Demand Data Read miss L2, no rejects", 151 + "Counter": "0,1,2,3", 169 152 "EventCode": "0x24", 170 153 "EventName": "L2_RQSTS.DEMAND_DATA_RD_MISS", 171 154 "PublicDescription": "Counts the number of demand Data Read requests that miss L2 cache. 
Only not rejected loads are counted.", ··· 175 156 }, 176 157 { 177 158 "BriefDescription": "All requests that miss L2 cache.", 159 + "Counter": "0,1,2,3", 178 160 "EventCode": "0x24", 179 161 "EventName": "L2_RQSTS.MISS", 180 162 "PublicDescription": "Counts all requests that miss L2 cache.", ··· 184 164 }, 185 165 { 186 166 "BriefDescription": "All L2 requests.", 167 + "Counter": "0,1,2,3", 187 168 "EventCode": "0x24", 188 169 "EventName": "L2_RQSTS.REFERENCES", 189 170 "PublicDescription": "Counts all L2 requests.", ··· 193 172 }, 194 173 { 195 174 "BriefDescription": "RFO requests that hit L2 cache", 175 + "Counter": "0,1,2,3", 196 176 "EventCode": "0x24", 197 177 "EventName": "L2_RQSTS.RFO_HIT", 198 178 "PublicDescription": "Counts the RFO (Read-for-Ownership) requests that hit L2 cache.", ··· 202 180 }, 203 181 { 204 182 "BriefDescription": "RFO requests that miss L2 cache", 183 + "Counter": "0,1,2,3", 205 184 "EventCode": "0x24", 206 185 "EventName": "L2_RQSTS.RFO_MISS", 207 186 "PublicDescription": "Counts the RFO (Read-for-Ownership) requests that miss L2 cache.", ··· 211 188 }, 212 189 { 213 190 "BriefDescription": "SW prefetch requests that hit L2 cache.", 191 + "Counter": "0,1,2,3", 214 192 "EventCode": "0x24", 215 193 "EventName": "L2_RQSTS.SWPF_HIT", 216 194 "PublicDescription": "Counts Software prefetch requests that hit the L2 cache. Accounts for PREFETCHNTA and PREFETCHT0/1/2 instructions when FB is not full.", ··· 220 196 }, 221 197 { 222 198 "BriefDescription": "SW prefetch requests that miss L2 cache.", 199 + "Counter": "0,1,2,3", 223 200 "EventCode": "0x24", 224 201 "EventName": "L2_RQSTS.SWPF_MISS", 225 202 "PublicDescription": "Counts Software prefetch requests that miss the L2 cache. 
Accounts for PREFETCHNTA and PREFETCHT0/1/2 instructions when FB is not full.", ··· 229 204 }, 230 205 { 231 206 "BriefDescription": "L2 writebacks that access L2 cache", 207 + "Counter": "0,1,2,3", 232 208 "EventCode": "0xF0", 233 209 "EventName": "L2_TRANS.L2_WB", 234 210 "PublicDescription": "Counts L2 writebacks that access L2 cache.", ··· 238 212 }, 239 213 { 240 214 "BriefDescription": "Core-originated cacheable requests that missed L3 (Except hardware prefetches to the L3)", 215 + "Counter": "0,1,2,3,4,5,6,7", 241 216 "EventCode": "0x2e", 242 217 "EventName": "LONGEST_LAT_CACHE.MISS", 243 218 "PublicDescription": "Counts core-originated cacheable requests that miss the L3 cache (Longest Latency cache). Requests include data and code reads, Reads-for-Ownership (RFOs), speculative accesses and hardware prefetches to the L1 and L2. It does not include hardware prefetches to the L3, and may not count other types of requests to the L3.", ··· 247 220 }, 248 221 { 249 222 "BriefDescription": "Retired load instructions.", 223 + "Counter": "0,1,2,3", 250 224 "Data_LA": "1", 251 225 "EventCode": "0xd0", 252 226 "EventName": "MEM_INST_RETIRED.ALL_LOADS", ··· 258 230 }, 259 231 { 260 232 "BriefDescription": "Retired store instructions.", 233 + "Counter": "0,1,2,3", 261 234 "Data_LA": "1", 262 235 "EventCode": "0xd0", 263 236 "EventName": "MEM_INST_RETIRED.ALL_STORES", ··· 269 240 }, 270 241 { 271 242 "BriefDescription": "All retired memory instructions.", 243 + "Counter": "0,1,2,3", 272 244 "Data_LA": "1", 273 245 "EventCode": "0xd0", 274 246 "EventName": "MEM_INST_RETIRED.ANY", ··· 280 250 }, 281 251 { 282 252 "BriefDescription": "Retired load instructions with locked access.", 253 + "Counter": "0,1,2,3", 283 254 "Data_LA": "1", 284 255 "EventCode": "0xd0", 285 256 "EventName": "MEM_INST_RETIRED.LOCK_LOADS", ··· 291 260 }, 292 261 { 293 262 "BriefDescription": "Retired load instructions that split across a cacheline boundary.", 263 + "Counter": "0,1,2,3", 294 264 
"Data_LA": "1", 295 265 "EventCode": "0xd0", 296 266 "EventName": "MEM_INST_RETIRED.SPLIT_LOADS", ··· 302 270 }, 303 271 { 304 272 "BriefDescription": "Retired store instructions that split across a cacheline boundary.", 273 + "Counter": "0,1,2,3", 305 274 "Data_LA": "1", 306 275 "EventCode": "0xd0", 307 276 "EventName": "MEM_INST_RETIRED.SPLIT_STORES", ··· 313 280 }, 314 281 { 315 282 "BriefDescription": "Retired load instructions that miss the STLB.", 283 + "Counter": "0,1,2,3", 316 284 "Data_LA": "1", 317 285 "EventCode": "0xd0", 318 286 "EventName": "MEM_INST_RETIRED.STLB_MISS_LOADS", ··· 324 290 }, 325 291 { 326 292 "BriefDescription": "Retired store instructions that miss the STLB.", 293 + "Counter": "0,1,2,3", 327 294 "Data_LA": "1", 328 295 "EventCode": "0xd0", 329 296 "EventName": "MEM_INST_RETIRED.STLB_MISS_STORES", ··· 335 300 }, 336 301 { 337 302 "BriefDescription": "Retired load instructions whose data sources were L3 and cross-core snoop hits in on-pkg core cache", 303 + "Counter": "0,1,2,3", 338 304 "Data_LA": "1", 339 305 "EventCode": "0xd2", 340 306 "EventName": "MEM_LOAD_L3_HIT_RETIRED.XSNP_HIT", ··· 346 310 }, 347 311 { 348 312 "BriefDescription": "Retired load instructions whose data sources were HitM responses from shared L3", 313 + "Counter": "0,1,2,3", 349 314 "Data_LA": "1", 350 315 "EventCode": "0xd2", 351 316 "EventName": "MEM_LOAD_L3_HIT_RETIRED.XSNP_HITM", ··· 357 320 }, 358 321 { 359 322 "BriefDescription": "Retired load instructions whose data sources were L3 hit and cross-core snoop missed in on-pkg core cache.", 323 + "Counter": "0,1,2,3", 360 324 "Data_LA": "1", 361 325 "EventCode": "0xd2", 362 326 "EventName": "MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS", ··· 368 330 }, 369 331 { 370 332 "BriefDescription": "Retired load instructions whose data sources were hits in L3 without snoops required", 333 + "Counter": "0,1,2,3", 371 334 "Data_LA": "1", 372 335 "EventCode": "0xd2", 373 336 "EventName": "MEM_LOAD_L3_HIT_RETIRED.XSNP_NONE", ··· 379 
340 }, 380 341 { 381 342 "BriefDescription": "Retired instructions with at least 1 uncacheable load or Bus Lock.", 343 + "Counter": "0,1,2,3", 382 344 "Data_LA": "1", 383 345 "EventCode": "0xd4", 384 346 "EventName": "MEM_LOAD_MISC_RETIRED.UC", ··· 390 350 }, 391 351 { 392 352 "BriefDescription": "Number of completed demand load requests that missed the L1, but hit the FB(fill buffer), because a preceding miss to the same cacheline initiated the line to be brought into L1, but data is not yet ready in L1.", 353 + "Counter": "0,1,2,3", 393 354 "Data_LA": "1", 394 355 "EventCode": "0xd1", 395 356 "EventName": "MEM_LOAD_RETIRED.FB_HIT", ··· 401 360 }, 402 361 { 403 362 "BriefDescription": "Retired load instructions with L1 cache hits as data sources", 363 + "Counter": "0,1,2,3", 404 364 "Data_LA": "1", 405 365 "EventCode": "0xd1", 406 366 "EventName": "MEM_LOAD_RETIRED.L1_HIT", ··· 412 370 }, 413 371 { 414 372 "BriefDescription": "Retired load instructions missed L1 cache as data sources", 373 + "Counter": "0,1,2,3", 415 374 "Data_LA": "1", 416 375 "EventCode": "0xd1", 417 376 "EventName": "MEM_LOAD_RETIRED.L1_MISS", ··· 423 380 }, 424 381 { 425 382 "BriefDescription": "Retired load instructions with L2 cache hits as data sources", 383 + "Counter": "0,1,2,3", 426 384 "Data_LA": "1", 427 385 "EventCode": "0xd1", 428 386 "EventName": "MEM_LOAD_RETIRED.L2_HIT", ··· 434 390 }, 435 391 { 436 392 "BriefDescription": "Retired load instructions missed L2 cache as data sources", 393 + "Counter": "0,1,2,3", 437 394 "Data_LA": "1", 438 395 "EventCode": "0xd1", 439 396 "EventName": "MEM_LOAD_RETIRED.L2_MISS", ··· 445 400 }, 446 401 { 447 402 "BriefDescription": "Retired load instructions with L3 cache hits as data sources", 403 + "Counter": "0,1,2,3", 448 404 "Data_LA": "1", 449 405 "EventCode": "0xd1", 450 406 "EventName": "MEM_LOAD_RETIRED.L3_HIT", ··· 456 410 }, 457 411 { 458 412 "BriefDescription": "Retired load instructions missed L3 cache as data sources", 413 + "Counter": 
"0,1,2,3", 459 414 "Data_LA": "1", 460 415 "EventCode": "0xd1", 461 416 "EventName": "MEM_LOAD_RETIRED.L3_MISS", ··· 467 420 }, 468 421 { 469 422 "BriefDescription": "Counts demand instruction fetches and L1 instruction cache prefetches that hit a cacheline in the L3 where a snoop was sent or not.", 423 + "Counter": "0,1,2,3", 470 424 "EventCode": "0xB7, 0xBB", 471 425 "EventName": "OCR.DEMAND_CODE_RD.L3_HIT.ANY", 472 426 "MSRIndex": "0x1a6,0x1a7", ··· 477 429 }, 478 430 { 479 431 "BriefDescription": "Counts demand instruction fetches and L1 instruction cache prefetches that hit a cacheline in the L3 where a snoop hit in another cores caches, data forwarding is required as the data is modified.", 432 + "Counter": "0,1,2,3", 480 433 "EventCode": "0xB7, 0xBB", 481 434 "EventName": "OCR.DEMAND_CODE_RD.L3_HIT.SNOOP_HITM", 482 435 "MSRIndex": "0x1a6,0x1a7", ··· 487 438 }, 488 439 { 489 440 "BriefDescription": "Counts demand instruction fetches and L1 instruction cache prefetches that hit a cacheline in the L3 where a snoop hit in another core, data forwarding is not required.", 441 + "Counter": "0,1,2,3", 490 442 "EventCode": "0xB7, 0xBB", 491 443 "EventName": "OCR.DEMAND_CODE_RD.L3_HIT.SNOOP_HIT_NO_FWD", 492 444 "MSRIndex": "0x1a6,0x1a7", ··· 497 447 }, 498 448 { 499 449 "BriefDescription": "Counts demand instruction fetches and L1 instruction cache prefetches that hit a cacheline in the L3 where a snoop was sent but no other cores had the data.", 450 + "Counter": "0,1,2,3", 500 451 "EventCode": "0xB7, 0xBB", 501 452 "EventName": "OCR.DEMAND_CODE_RD.L3_HIT.SNOOP_MISS", 502 453 "MSRIndex": "0x1a6,0x1a7", ··· 507 456 }, 508 457 { 509 458 "BriefDescription": "Counts demand instruction fetches and L1 instruction cache prefetches that hit a cacheline in the L3 where a snoop was not needed to satisfy the request.", 459 + "Counter": "0,1,2,3", 510 460 "EventCode": "0xB7, 0xBB", 511 461 "EventName": "OCR.DEMAND_CODE_RD.L3_HIT.SNOOP_NOT_NEEDED", 512 462 "MSRIndex": 
"0x1a6,0x1a7", ··· 517 465 }, 518 466 { 519 467 "BriefDescription": "Counts demand instruction fetches and L1 instruction cache prefetches that hit a cacheline in the L3 where a snoop was sent.", 468 + "Counter": "0,1,2,3", 520 469 "EventCode": "0xB7, 0xBB", 521 470 "EventName": "OCR.DEMAND_CODE_RD.L3_HIT.SNOOP_SENT", 522 471 "MSRIndex": "0x1a6,0x1a7", ··· 527 474 }, 528 475 { 529 476 "BriefDescription": "Counts demand data reads that hit a cacheline in the L3 where a snoop was sent or not.", 477 + "Counter": "0,1,2,3", 530 478 "EventCode": "0xB7, 0xBB", 531 479 "EventName": "OCR.DEMAND_DATA_RD.L3_HIT.ANY", 532 480 "MSRIndex": "0x1a6,0x1a7", ··· 537 483 }, 538 484 { 539 485 "BriefDescription": "Counts demand data reads that hit a cacheline in the L3 where a snoop hit in another cores caches, data forwarding is required as the data is modified.", 486 + "Counter": "0,1,2,3", 540 487 "EventCode": "0xB7, 0xBB", 541 488 "EventName": "OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM", 542 489 "MSRIndex": "0x1a6,0x1a7", ··· 547 492 }, 548 493 { 549 494 "BriefDescription": "Counts demand data reads that hit a cacheline in the L3 where a snoop hit in another core, data forwarding is not required.", 495 + "Counter": "0,1,2,3", 550 496 "EventCode": "0xB7, 0xBB", 551 497 "EventName": "OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_NO_FWD", 552 498 "MSRIndex": "0x1a6,0x1a7", ··· 557 501 }, 558 502 { 559 503 "BriefDescription": "Counts demand data reads that hit a cacheline in the L3 where a snoop was sent but no other cores had the data.", 504 + "Counter": "0,1,2,3", 560 505 "EventCode": "0xB7, 0xBB", 561 506 "EventName": "OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_MISS", 562 507 "MSRIndex": "0x1a6,0x1a7", ··· 567 510 }, 568 511 { 569 512 "BriefDescription": "Counts demand data reads that hit a cacheline in the L3 where a snoop was not needed to satisfy the request.", 513 + "Counter": "0,1,2,3", 570 514 "EventCode": "0xB7, 0xBB", 571 515 "EventName": "OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_NOT_NEEDED", 572 516 
"MSRIndex": "0x1a6,0x1a7", ··· 577 519 }, 578 520 { 579 521 "BriefDescription": "Counts demand data reads that hit a cacheline in the L3 where a snoop was sent.", 522 + "Counter": "0,1,2,3", 580 523 "EventCode": "0xB7, 0xBB", 581 524 "EventName": "OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_SENT", 582 525 "MSRIndex": "0x1a6,0x1a7", ··· 587 528 }, 588 529 { 589 530 "BriefDescription": "Counts demand reads for ownership (RFO) requests and software prefetches for exclusive ownership (PREFETCHW) that hit a cacheline in the L3 where a snoop was sent or not.", 531 + "Counter": "0,1,2,3", 590 532 "EventCode": "0xB7, 0xBB", 591 533 "EventName": "OCR.DEMAND_RFO.L3_HIT.ANY", 592 534 "MSRIndex": "0x1a6,0x1a7", ··· 597 537 }, 598 538 { 599 539 "BriefDescription": "Counts demand reads for ownership (RFO) requests and software prefetches for exclusive ownership (PREFETCHW) that hit a cacheline in the L3 where a snoop hit in another cores caches, data forwarding is required as the data is modified.", 540 + "Counter": "0,1,2,3", 600 541 "EventCode": "0xB7, 0xBB", 601 542 "EventName": "OCR.DEMAND_RFO.L3_HIT.SNOOP_HITM", 602 543 "MSRIndex": "0x1a6,0x1a7", ··· 607 546 }, 608 547 { 609 548 "BriefDescription": "Counts demand reads for ownership (RFO) requests and software prefetches for exclusive ownership (PREFETCHW) that hit a cacheline in the L3 where a snoop hit in another core, data forwarding is not required.", 549 + "Counter": "0,1,2,3", 610 550 "EventCode": "0xB7, 0xBB", 611 551 "EventName": "OCR.DEMAND_RFO.L3_HIT.SNOOP_HIT_NO_FWD", 612 552 "MSRIndex": "0x1a6,0x1a7", ··· 617 555 }, 618 556 { 619 557 "BriefDescription": "Counts demand reads for ownership (RFO) requests and software prefetches for exclusive ownership (PREFETCHW) that hit a cacheline in the L3 where a snoop was sent but no other cores had the data.", 558 + "Counter": "0,1,2,3", 620 559 "EventCode": "0xB7, 0xBB", 621 560 "EventName": "OCR.DEMAND_RFO.L3_HIT.SNOOP_MISS", 622 561 "MSRIndex": "0x1a6,0x1a7", ··· 627 564 }, 628 565 
{ 629 566 "BriefDescription": "Counts demand reads for ownership (RFO) requests and software prefetches for exclusive ownership (PREFETCHW) that hit a cacheline in the L3 where a snoop was not needed to satisfy the request.", 567 + "Counter": "0,1,2,3", 630 568 "EventCode": "0xB7, 0xBB", 631 569 "EventName": "OCR.DEMAND_RFO.L3_HIT.SNOOP_NOT_NEEDED", 632 570 "MSRIndex": "0x1a6,0x1a7", ··· 637 573 }, 638 574 { 639 575 "BriefDescription": "Counts demand reads for ownership (RFO) requests and software prefetches for exclusive ownership (PREFETCHW) that hit a cacheline in the L3 where a snoop was sent.", 576 + "Counter": "0,1,2,3", 640 577 "EventCode": "0xB7, 0xBB", 641 578 "EventName": "OCR.DEMAND_RFO.L3_HIT.SNOOP_SENT", 642 579 "MSRIndex": "0x1a6,0x1a7", ··· 647 582 }, 648 583 { 649 584 "BriefDescription": "Counts L1 data cache prefetch requests and software prefetches (except PREFETCHW) that hit a cacheline in the L3 where a snoop was sent or not.", 585 + "Counter": "0,1,2,3", 650 586 "EventCode": "0xB7, 0xBB", 651 587 "EventName": "OCR.HWPF_L1D_AND_SWPF.L3_HIT.ANY", 652 588 "MSRIndex": "0x1a6,0x1a7", ··· 657 591 }, 658 592 { 659 593 "BriefDescription": "Counts L1 data cache prefetch requests and software prefetches (except PREFETCHW) that hit a cacheline in the L3 where a snoop was sent but no other cores had the data.", 594 + "Counter": "0,1,2,3", 660 595 "EventCode": "0xB7, 0xBB", 661 596 "EventName": "OCR.HWPF_L1D_AND_SWPF.L3_HIT.SNOOP_MISS", 662 597 "MSRIndex": "0x1a6,0x1a7", ··· 667 600 }, 668 601 { 669 602 "BriefDescription": "Counts L1 data cache prefetch requests and software prefetches (except PREFETCHW) that hit a cacheline in the L3 where a snoop was not needed to satisfy the request.", 603 + "Counter": "0,1,2,3", 670 604 "EventCode": "0xB7, 0xBB", 671 605 "EventName": "OCR.HWPF_L1D_AND_SWPF.L3_HIT.SNOOP_NOT_NEEDED", 672 606 "MSRIndex": "0x1a6,0x1a7", ··· 677 609 }, 678 610 { 679 611 "BriefDescription": "Counts hardware prefetch data reads (which bring 
data to L2) that hit a cacheline in the L3 where a snoop was sent or not.", 612 + "Counter": "0,1,2,3", 680 613 "EventCode": "0xB7, 0xBB", 681 614 "EventName": "OCR.HWPF_L2_DATA_RD.L3_HIT.ANY", 682 615 "MSRIndex": "0x1a6,0x1a7", ··· 687 618 }, 688 619 { 689 620 "BriefDescription": "Counts hardware prefetch data reads (which bring data to L2) that hit a cacheline in the L3 where a snoop hit in another cores caches, data forwarding is required as the data is modified.", 621 + "Counter": "0,1,2,3", 690 622 "EventCode": "0xB7, 0xBB", 691 623 "EventName": "OCR.HWPF_L2_DATA_RD.L3_HIT.SNOOP_HITM", 692 624 "MSRIndex": "0x1a6,0x1a7", ··· 697 627 }, 698 628 { 699 629 "BriefDescription": "Counts hardware prefetch data reads (which bring data to L2) that hit a cacheline in the L3 where a snoop hit in another core, data forwarding is not required.", 630 + "Counter": "0,1,2,3", 700 631 "EventCode": "0xB7, 0xBB", 701 632 "EventName": "OCR.HWPF_L2_DATA_RD.L3_HIT.SNOOP_HIT_NO_FWD", 702 633 "MSRIndex": "0x1a6,0x1a7", ··· 707 636 }, 708 637 { 709 638 "BriefDescription": "Counts hardware prefetch data reads (which bring data to L2) that hit a cacheline in the L3 where a snoop was sent but no other cores had the data.", 639 + "Counter": "0,1,2,3", 710 640 "EventCode": "0xB7, 0xBB", 711 641 "EventName": "OCR.HWPF_L2_DATA_RD.L3_HIT.SNOOP_MISS", 712 642 "MSRIndex": "0x1a6,0x1a7", ··· 717 645 }, 718 646 { 719 647 "BriefDescription": "Counts hardware prefetch data reads (which bring data to L2) that hit a cacheline in the L3 where a snoop was not needed to satisfy the request.", 648 + "Counter": "0,1,2,3", 720 649 "EventCode": "0xB7, 0xBB", 721 650 "EventName": "OCR.HWPF_L2_DATA_RD.L3_HIT.SNOOP_NOT_NEEDED", 722 651 "MSRIndex": "0x1a6,0x1a7", ··· 727 654 }, 728 655 { 729 656 "BriefDescription": "Counts hardware prefetch data reads (which bring data to L2) that hit a cacheline in the L3 where a snoop was sent.", 657 + "Counter": "0,1,2,3", 730 658 "EventCode": "0xB7, 0xBB", 731 659 
"EventName": "OCR.HWPF_L2_DATA_RD.L3_HIT.SNOOP_SENT", 732 660 "MSRIndex": "0x1a6,0x1a7", ··· 737 663 }, 738 664 { 739 665 "BriefDescription": "Counts hardware prefetch RFOs (which bring data to L2) that hit a cacheline in the L3 where a snoop was sent or not.", 666 + "Counter": "0,1,2,3", 740 667 "EventCode": "0xB7, 0xBB", 741 668 "EventName": "OCR.HWPF_L2_RFO.L3_HIT.ANY", 742 669 "MSRIndex": "0x1a6,0x1a7", ··· 747 672 }, 748 673 { 749 674 "BriefDescription": "Counts hardware prefetch RFOs (which bring data to L2) that hit a cacheline in the L3 where a snoop hit in another cores caches, data forwarding is required as the data is modified.", 675 + "Counter": "0,1,2,3", 750 676 "EventCode": "0xB7, 0xBB", 751 677 "EventName": "OCR.HWPF_L2_RFO.L3_HIT.SNOOP_HITM", 752 678 "MSRIndex": "0x1a6,0x1a7", ··· 757 681 }, 758 682 { 759 683 "BriefDescription": "Counts hardware prefetch RFOs (which bring data to L2) that hit a cacheline in the L3 where a snoop hit in another core, data forwarding is not required.", 684 + "Counter": "0,1,2,3", 760 685 "EventCode": "0xB7, 0xBB", 761 686 "EventName": "OCR.HWPF_L2_RFO.L3_HIT.SNOOP_HIT_NO_FWD", 762 687 "MSRIndex": "0x1a6,0x1a7", ··· 767 690 }, 768 691 { 769 692 "BriefDescription": "Counts hardware prefetch RFOs (which bring data to L2) that hit a cacheline in the L3 where a snoop was sent but no other cores had the data.", 693 + "Counter": "0,1,2,3", 770 694 "EventCode": "0xB7, 0xBB", 771 695 "EventName": "OCR.HWPF_L2_RFO.L3_HIT.SNOOP_MISS", 772 696 "MSRIndex": "0x1a6,0x1a7", ··· 777 699 }, 778 700 { 779 701 "BriefDescription": "Counts hardware prefetch RFOs (which bring data to L2) that hit a cacheline in the L3 where a snoop was not needed to satisfy the request.", 702 + "Counter": "0,1,2,3", 780 703 "EventCode": "0xB7, 0xBB", 781 704 "EventName": "OCR.HWPF_L2_RFO.L3_HIT.SNOOP_NOT_NEEDED", 782 705 "MSRIndex": "0x1a6,0x1a7", ··· 787 708 }, 788 709 { 789 710 "BriefDescription": "Counts hardware prefetch RFOs (which bring data to L2) 
that hit a cacheline in the L3 where a snoop was sent.", 711 + "Counter": "0,1,2,3", 790 712 "EventCode": "0xB7, 0xBB", 791 713 "EventName": "OCR.HWPF_L2_RFO.L3_HIT.SNOOP_SENT", 792 714 "MSRIndex": "0x1a6,0x1a7", ··· 797 717 }, 798 718 { 799 719 "BriefDescription": "Counts hardware prefetches to the L3 only that hit a cacheline in the L3 where a snoop was sent or not.", 720 + "Counter": "0,1,2,3", 800 721 "EventCode": "0xB7, 0xBB", 801 722 "EventName": "OCR.HWPF_L3.L3_HIT.ANY", 802 723 "MSRIndex": "0x1a6,0x1a7", ··· 807 726 }, 808 727 { 809 728 "BriefDescription": "Counts miscellaneous requests, such as I/O and un-cacheable accesses that hit a cacheline in the L3 where a snoop hit in another core, data forwarding is not required.", 729 + "Counter": "0,1,2,3", 810 730 "EventCode": "0xB7, 0xBB", 811 731 "EventName": "OCR.OTHER.L3_HIT.SNOOP_HIT_NO_FWD", 812 732 "MSRIndex": "0x1a6,0x1a7", ··· 817 735 }, 818 736 { 819 737 "BriefDescription": "Counts miscellaneous requests, such as I/O and un-cacheable accesses that hit a cacheline in the L3 where a snoop was sent but no other cores had the data.", 738 + "Counter": "0,1,2,3", 820 739 "EventCode": "0xB7, 0xBB", 821 740 "EventName": "OCR.OTHER.L3_HIT.SNOOP_MISS", 822 741 "MSRIndex": "0x1a6,0x1a7", ··· 827 744 }, 828 745 { 829 746 "BriefDescription": "Counts miscellaneous requests, such as I/O and un-cacheable accesses that hit a cacheline in the L3 where a snoop was not needed to satisfy the request.", 747 + "Counter": "0,1,2,3", 830 748 "EventCode": "0xB7, 0xBB", 831 749 "EventName": "OCR.OTHER.L3_HIT.SNOOP_NOT_NEEDED", 832 750 "MSRIndex": "0x1a6,0x1a7", ··· 837 753 }, 838 754 { 839 755 "BriefDescription": "Counts miscellaneous requests, such as I/O and un-cacheable accesses that hit a cacheline in the L3 where a snoop was sent.", 756 + "Counter": "0,1,2,3", 840 757 "EventCode": "0xB7, 0xBB", 841 758 "EventName": "OCR.OTHER.L3_HIT.SNOOP_SENT", 842 759 "MSRIndex": "0x1a6,0x1a7", ··· 847 762 }, 848 763 { 849 764 
"BriefDescription": "Counts streaming stores that hit a cacheline in the L3 where a snoop was sent or not.", 765 + "Counter": "0,1,2,3", 850 766 "EventCode": "0xB7, 0xBB", 851 767 "EventName": "OCR.STREAMING_WR.L3_HIT.ANY", 852 768 "MSRIndex": "0x1a6,0x1a7", ··· 857 771 }, 858 772 { 859 773 "BriefDescription": "Demand and prefetch data reads", 774 + "Counter": "0,1,2,3", 860 775 "EventCode": "0xB0", 861 776 "EventName": "OFFCORE_REQUESTS.ALL_DATA_RD", 862 777 "PublicDescription": "Counts the demand and prefetch data reads. All Core Data Reads include cacheable 'Demands' and L2 prefetchers (not L3 prefetchers). Counting also covers reads due to page walks resulted from any request type.", ··· 866 779 }, 867 780 { 868 781 "BriefDescription": "Counts memory transactions sent to the uncore.", 782 + "Counter": "0,1,2,3", 869 783 "EventCode": "0xB0", 870 784 "EventName": "OFFCORE_REQUESTS.ALL_REQUESTS", 871 785 "PublicDescription": "Counts memory transactions sent to the uncore including requests initiated by the core, all L3 prefetches, reads resulting from page walks, and snoop responses.", ··· 875 787 }, 876 788 { 877 789 "BriefDescription": "Demand Data Read requests sent to uncore", 790 + "Counter": "0,1,2,3", 878 791 "EventCode": "0xb0", 879 792 "EventName": "OFFCORE_REQUESTS.DEMAND_DATA_RD", 880 793 "PublicDescription": "Counts the Demand Data Read requests sent to uncore. 
Use it in conjunction with OFFCORE_REQUESTS_OUTSTANDING to determine average latency in the uncore.", ··· 884 795 }, 885 796 { 886 797 "BriefDescription": "Demand RFO requests including regular RFOs, locks, ItoM", 798 + "Counter": "0,1,2,3", 887 799 "EventCode": "0xb0", 888 800 "EventName": "OFFCORE_REQUESTS.DEMAND_RFO", 889 801 "PublicDescription": "Counts the demand RFO (read for ownership) requests including regular RFOs, locks, ItoM.", ··· 893 803 }, 894 804 { 895 805 "BriefDescription": "For every cycle, increments by the number of outstanding data read requests pending.", 806 + "Counter": "0,1,2,3", 896 807 "EventCode": "0x60", 897 808 "EventName": "OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD", 898 809 "PublicDescription": "For every cycle, increments by the number of outstanding data read requests pending. Data read requests include cacheable demand reads and L2 prefetches, but do not include RFOs, code reads or prefetches to the L3. Reads due to page walks resulting from any request type will also be counted. 
Requests are considered outstanding from the time they miss the core's L2 cache until the transaction completion message is sent to the requestor.", ··· 902 811 }, 903 812 { 904 813 "BriefDescription": "Cycles where at least 1 outstanding data read request is pending.", 814 + "Counter": "0,1,2,3", 905 815 "CounterMask": "1", 906 816 "EventCode": "0x60", 907 817 "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD", ··· 912 820 }, 913 821 { 914 822 "BriefDescription": "Cycles where at least 1 outstanding Demand RFO request is pending.", 823 + "Counter": "0,1,2,3", 915 824 "CounterMask": "1", 916 825 "EventCode": "0x60", 917 826 "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO", ··· 922 829 }, 923 830 { 924 831 "BriefDescription": "For every cycle, increments by the number of outstanding demand data read requests pending.", 832 + "Counter": "0,1,2,3", 925 833 "EventCode": "0x60", 926 834 "EventName": "OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD", 927 835 "PublicDescription": "For every cycle, increments by the number of outstanding demand data read requests pending. Requests are considered outstanding from the time they miss the core's L2 cache until the transaction completion message is sent to the requestor.", ··· 931 837 }, 932 838 { 933 839 "BriefDescription": "Store Read transactions pending for off-core. Highly correlated.", 840 + "Counter": "0,1,2,3", 934 841 "EventCode": "0x60", 935 842 "EventName": "OFFCORE_REQUESTS_OUTSTANDING.DEMAND_RFO", 936 843 "PublicDescription": "Counts the number of off-core outstanding read-for-ownership (RFO) store transactions every cycle. 
An RFO transaction is considered to be in the Off-core outstanding state between L2 cache miss and transaction completion.", ··· 940 845 }, 941 846 { 942 847 "BriefDescription": "Counts bus locks, accounts for cache line split locks and UC locks.", 848 + "Counter": "0,1,2,3", 943 849 "EventCode": "0xF4", 944 850 "EventName": "SQ_MISC.BUS_LOCK", 945 851 "PublicDescription": "Counts the more expensive bus lock needed to enforce cache coherency for certain memory accesses that need to be done atomically. Can be created by issuing an atomic instruction (via the LOCK prefix) which causes a cache line split or accesses uncacheable memory.", ··· 949 853 }, 950 854 { 951 855 "BriefDescription": "Cycles the queue waiting for offcore responses is full.", 856 + "Counter": "0,1,2,3", 952 857 "EventCode": "0xf4", 953 858 "EventName": "SQ_MISC.SQ_FULL", 954 859 "PublicDescription": "Counts the cycles for which the thread is active and the queue waiting for responses from the uncore cannot take any more entries.", ··· 957 860 "UMask": "0x4" 958 861 }, 959 862 { 863 + "BriefDescription": "Counts the number of PREFETCHNTA, PREFETCHW, PREFETCHT0, PREFETCHT1 or PREFETCHT2 instructions executed.", 864 + "Counter": "0,1,2,3", 865 + "EventCode": "0x32", 866 + "EventName": "SW_PREFETCH_ACCESS.ANY", 867 + "SampleAfterValue": "100003", 868 + "UMask": "0xf" 869 + }, 870 + { 960 871 "BriefDescription": "Number of PREFETCHNTA instructions executed.", 872 + "Counter": "0,1,2,3", 961 873 "EventCode": "0x32", 962 874 "EventName": "SW_PREFETCH_ACCESS.NTA", 963 875 "PublicDescription": "Counts the number of PREFETCHNTA instructions executed.", ··· 975 869 }, 976 870 { 977 871 "BriefDescription": "Number of PREFETCHW instructions executed.", 872 + "Counter": "0,1,2,3", 978 873 "EventCode": "0x32", 979 874 "EventName": "SW_PREFETCH_ACCESS.PREFETCHW", 980 875 "PublicDescription": "Counts the number of PREFETCHW instructions executed.", ··· 984 877 }, 985 878 { 986 879 "BriefDescription": "Number of 
PREFETCHT0 instructions executed.", 880 + "Counter": "0,1,2,3", 987 881 "EventCode": "0x32", 988 882 "EventName": "SW_PREFETCH_ACCESS.T0", 989 883 "PublicDescription": "Counts the number of PREFETCHT0 instructions executed.", ··· 993 885 }, 994 886 { 995 887 "BriefDescription": "Number of PREFETCHT1 or PREFETCHT2 instructions executed.", 888 + "Counter": "0,1,2,3", 996 889 "EventCode": "0x32", 997 890 "EventName": "SW_PREFETCH_ACCESS.T1_T2", 998 891 "PublicDescription": "Counts the number of PREFETCHT1 or PREFETCHT2 instructions executed.",
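For readers mapping these JSON fields onto a raw event, here is a minimal sketch that assumes the standard x86 core PMU event-select layout: event code in bits 0-7, unit mask in bits 8-15, edge detect in bit 18, counter mask in bits 24-31. The helper names `raw_config` and `perf_event_string` are hypothetical and are not part of perf. The values used are those of the `SW_PREFETCH_ACCESS.ANY` entry added above.

```python
def raw_config(event_code, umask, cmask=0, edge=False):
    """Pack JSON event fields into an x86 event-select style value.

    Assumed layout: event code in bits 0-7, unit mask in bits 8-15,
    edge detect in bit 18, counter mask in bits 24-31.
    """
    if not 0 <= event_code <= 0xFF:
        raise ValueError("event code must fit in 8 bits")
    if not 0 <= umask <= 0xFF:
        raise ValueError("umask must fit in 8 bits")
    if not 0 <= cmask <= 0xFF:
        raise ValueError("cmask must fit in 8 bits")
    value = event_code | (umask << 8) | (cmask << 24)
    if edge:
        value |= 1 << 18
    return value


def perf_event_string(event_code, umask, pmu="cpu"):
    """Format the fields as a perf event term list for the given PMU."""
    return f"{pmu}/event={event_code:#x},umask={umask:#x}/"


# SW_PREFETCH_ACCESS.ANY from the hunk above: EventCode 0x32, UMask 0xf.
sw_prefetch_any = raw_config(0x32, 0xF)
sw_prefetch_any_term = perf_event_string(0x32, 0xF)
```

The resulting term string can be passed to `perf stat -e`, although on a kernel that carries this patch the symbolic event name is the more readable choice.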
+17
tools/perf/pmu-events/arch/x86/rocketlake/counter.json
···
+ [
+     {
+         "Unit": "core",
+         "CountersNumFixed": "4",
+         "CountersNumGeneric": "8"
+     },
+     {
+         "Unit": "ARB",
+         "CountersNumFixed": "0",
+         "CountersNumGeneric": "2"
+     },
+     {
+         "Unit": "CLOCK",
+         "CountersNumFixed": 1,
+         "CountersNumGeneric": "0"
+     }
+ ]
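The new counter.json can be cross-checked against the per-event "Counter" fields added elsewhere in this patch. The following is a rough sketch only, not how perf itself consumes the file. `load_counter_limits` and `counters_fit` are hypothetical helper names. The JSON text is copied from the hunk above, including the one bare integer value, which is why every count is normalized with `int()`.

```python
import json

# Contents of rocketlake/counter.json as added by this patch.
COUNTER_JSON = """
[
    {"Unit": "core", "CountersNumFixed": "4", "CountersNumGeneric": "8"},
    {"Unit": "ARB", "CountersNumFixed": "0", "CountersNumGeneric": "2"},
    {"Unit": "CLOCK", "CountersNumFixed": 1, "CountersNumGeneric": "0"}
]
"""


def load_counter_limits(text):
    """Map each unit name to its fixed and generic counter counts."""
    limits = {}
    for unit in json.loads(text):
        limits[unit["Unit"]] = {
            # Values may be strings or bare integers, so normalize both.
            "fixed": int(unit["CountersNumFixed"]),
            "generic": int(unit["CountersNumGeneric"]),
        }
    return limits


def counters_fit(counter_field, num_generic):
    """Check that every index in an event's "Counter" list exists."""
    indices = [int(i) for i in counter_field.split(",")]
    return all(0 <= i < num_generic for i in indices)


limits = load_counter_limits(COUNTER_JSON)
core_generic = limits["core"]["generic"]

# "Counter" values as they appear on the core events in this patch.
fits_low_four = counters_fit("0,1,2,3", core_generic)
fits_all_eight = counters_fit("0,1,2,3,4,5,6,7", core_generic)
```

Both "Counter" lists used on the core events in this patch, "0,1,2,3" and "0,1,2,3,4,5,6,7", stay within the eight generic counters declared for the core unit.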
+13
tools/perf/pmu-events/arch/x86/rocketlake/floating-point.json
··· 1 1 [ 2 2 { 3 3 "BriefDescription": "Counts all microcode FP assists.", 4 + "Counter": "0,1,2,3,4,5,6,7", 4 5 "EventCode": "0xc1", 5 6 "EventName": "ASSISTS.FP", 6 7 "PublicDescription": "Counts all microcode Floating Point assists.", ··· 10 9 }, 11 10 { 12 11 "BriefDescription": "Counts number of SSE/AVX computational 128-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 2 computation operations, one for each element. Applies to SSE* and AVX* packed double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.", 12 + "Counter": "0,1,2,3,4,5,6,7", 13 13 "EventCode": "0xc7", 14 14 "EventName": "FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE", 15 15 "PublicDescription": "Number of SSE/AVX computational 128-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 2 computation operations, one for each element. Applies to SSE* and AVX* packed double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR register need to be set when using these events.", ··· 19 17 }, 20 18 { 21 19 "BriefDescription": "Number of SSE/AVX computational 128-bit packed single precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 4 computation operations, one for each element. Applies to SSE* and AVX* packed single precision floating-point instructions: ADD SUB MUL DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. 
DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.", 20 + "Counter": "0,1,2,3,4,5,6,7", 22 21 "EventCode": "0xc7", 23 22 "EventName": "FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE", 24 23 "PublicDescription": "Number of SSE/AVX computational 128-bit packed single precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 4 computation operations, one for each element. Applies to SSE* and AVX* packed single precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT RSQRT RCP DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR register need to be set when using these events.", ··· 28 25 }, 29 26 { 30 27 "BriefDescription": "Counts number of SSE/AVX computational 256-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 4 computation operations, one for each element. Applies to SSE* and AVX* packed double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.", 28 + "Counter": "0,1,2,3,4,5,6,7", 31 29 "EventCode": "0xc7", 32 30 "EventName": "FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE", 33 31 "PublicDescription": "Number of SSE/AVX computational 256-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 4 computation operations, one for each element. Applies to SSE* and AVX* packed double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. 
The DAZ and FTZ flags in the MXCSR register need to be set when using these events.", ··· 37 33 }, 38 34 { 39 35 "BriefDescription": "Counts number of SSE/AVX computational 256-bit packed single precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 8 computation operations, one for each element. Applies to SSE* and AVX* packed single precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT RSQRT RCP DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.", 36 + "Counter": "0,1,2,3,4,5,6,7", 40 37 "EventCode": "0xc7", 41 38 "EventName": "FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE", 42 39 "PublicDescription": "Number of SSE/AVX computational 256-bit packed single precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 8 computation operations, one for each element. Applies to SSE* and AVX* packed single precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT RSQRT RCP DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR register need to be set when using these events.", ··· 46 41 }, 47 42 { 48 43 "BriefDescription": "Number of SSE/AVX computational 128-bit packed single and 256-bit packed double precision FP instructions retired; some instructions will count twice as noted below. Each count represents 2 or/and 4 computation operations, 1 for each element. Applies to SSE* and AVX* packed single precision and packed double precision FP instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. 
DPP and FM(N)ADD/SUB count twice as they perform 2 calculations per element.", 44 + "Counter": "0,1,2,3,4,5,6,7", 49 45 "EventCode": "0xc7", 50 46 "EventName": "FP_ARITH_INST_RETIRED.4_FLOPS", 51 47 "PublicDescription": "Number of SSE/AVX computational 128-bit packed single precision and 256-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 2 or/and 4 computation operations, one for each element. Applies to SSE* and AVX* packed single precision floating-point and packed double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR register need to be set when using these events.", ··· 55 49 }, 56 50 { 57 51 "BriefDescription": "Counts number of SSE/AVX computational 512-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 8 computation operations, one for each element. Applies to SSE* and AVX* packed double precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT14 RCP14 FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.", 52 + "Counter": "0,1,2,3,4,5,6,7", 58 53 "EventCode": "0xc7", 59 54 "EventName": "FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE", 60 55 "PublicDescription": "Number of SSE/AVX computational 512-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 8 computation operations, one for each element. Applies to SSE* and AVX* packed double precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT14 RCP14 FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. 
The DAZ and FTZ flags in the MXCSR register need to be set when using these events.", ··· 64 57 }, 65 58 { 66 59 "BriefDescription": "Counts number of SSE/AVX computational 512-bit packed single precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 16 computation operations, one for each element. Applies to SSE* and AVX* packed single precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT14 RCP14 FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.", 60 + "Counter": "0,1,2,3,4,5,6,7", 67 61 "EventCode": "0xc7", 68 62 "EventName": "FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE", 69 63 "PublicDescription": "Number of SSE/AVX computational 512-bit packed single precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 16 computation operations, one for each element. Applies to SSE* and AVX* packed single precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT14 RCP14 FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR register need to be set when using these events.", ··· 73 65 }, 74 66 { 75 67 "BriefDescription": "Number of SSE/AVX computational 256-bit packed single precision and 512-bit packed double precision FP instructions retired; some instructions will count twice as noted below. Each count represents 8 computation operations, 1 for each element. Applies to SSE* and AVX* packed single precision and double precision FP instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT RSQRT RSQRT14 RCP RCP14 DPP FM(N)ADD/SUB. 
DPP and FM(N)ADD/SUB count twice as they perform 2 calculations per element.", 68 + "Counter": "0,1,2,3,4,5,6,7", 76 69 "EventCode": "0xc7", 77 70 "EventName": "FP_ARITH_INST_RETIRED.8_FLOPS", 78 71 "PublicDescription": "Number of SSE/AVX computational 256-bit packed single precision and 512-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 8 computation operations, one for each element. Applies to SSE* and AVX* packed single precision and double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT RSQRT RSQRT14 RCP RCP14 DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR register need to be set when using these events.", ··· 82 73 }, 83 74 { 84 75 "BriefDescription": "Number of SSE/AVX computational scalar floating-point instructions retired; some instructions will count twice as noted below. Applies to SSE* and AVX* scalar, double and single precision floating-point: ADD SUB MUL DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform multiple calculations per element.", 76 + "Counter": "0,1,2,3,4,5,6,7", 85 77 "EventCode": "0xc7", 86 78 "EventName": "FP_ARITH_INST_RETIRED.SCALAR", 87 79 "PublicDescription": "Number of SSE/AVX computational scalar single precision and double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 1 computational operation. Applies to SSE* and AVX* scalar single precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. 
The DAZ and FTZ flags in the MXCSR register need to be set when using these events.", ··· 91 81 }, 92 82 { 93 83 "BriefDescription": "Counts number of SSE/AVX computational scalar double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 1 computational operation. Applies to SSE* and AVX* scalar double precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.", 84 + "Counter": "0,1,2,3,4,5,6,7", 94 85 "EventCode": "0xc7", 95 86 "EventName": "FP_ARITH_INST_RETIRED.SCALAR_DOUBLE", 96 87 "PublicDescription": "Number of SSE/AVX computational scalar double precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 1 computational operation. Applies to SSE* and AVX* scalar double precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR register need to be set when using these events.", ··· 100 89 }, 101 90 { 102 91 "BriefDescription": "Counts number of SSE/AVX computational scalar single precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 1 computational operation. Applies to SSE* and AVX* scalar single precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.", 92 + "Counter": "0,1,2,3,4,5,6,7", 103 93 "EventCode": "0xc7", 104 94 "EventName": "FP_ARITH_INST_RETIRED.SCALAR_SINGLE", 105 95 "PublicDescription": "Number of SSE/AVX computational scalar single precision floating-point instructions retired; some instructions will count twice as noted below. Each count represents 1 computational operation. 
Applies to SSE* and AVX* scalar single precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element. The DAZ and FTZ flags in the MXCSR register need to be set when using these events.", ··· 109 97 }, 110 98 { 111 99 "BriefDescription": "Number of any Vector retired FP arithmetic instructions", 100 + "Counter": "0,1,2,3,4,5,6,7", 112 101 "EventCode": "0xc7", 113 102 "EventName": "FP_ARITH_INST_RETIRED.VECTOR", 114 103 "SampleAfterValue": "1000003",
+40 -1
tools/perf/pmu-events/arch/x86/rocketlake/frontend.json
··· 1 1 [ 2 2 { 3 3 "BriefDescription": "Counts the total number when the front end is resteered, mainly when the BPU cannot provide a correct prediction and this is corrected by other branch handling mechanisms at the front end.", 4 + "Counter": "0,1,2,3", 4 5 "EventCode": "0xe6", 5 6 "EventName": "BACLEARS.ANY", 6 7 "PublicDescription": "Counts the number of times the front-end is resteered when it finds a branch instruction in a fetch line. This occurs for the first time a branch instruction is fetched or when the branch is not tracked by the BPU (Branch Prediction Unit) anymore.", ··· 10 9 }, 11 10 { 12 11 "BriefDescription": "Stalls caused by changing prefix length of the instruction. [This event is alias to ILD_STALL.LCP]", 12 + "Counter": "0,1,2,3", 13 13 "EventCode": "0x87", 14 14 "EventName": "DECODE.LCP", 15 15 "PublicDescription": "Counts cycles that the Instruction Length decoder (ILD) stalls occurred due to dynamically changing prefix length of the decoded instruction (by operand size prefix instruction 0x66, address size prefix instruction 0x67 or REX.W for Intel64). Count is proportional to the number of prefixes in a 16B-line. This may result in a three-cycle penalty for each LCP (Length changing prefix) in a 16-byte chunk. [This event is alias to ILD_STALL.LCP]", ··· 19 17 }, 20 18 { 21 19 "BriefDescription": "Decode Stream Buffer (DSB)-to-MITE transitions count.", 20 + "Counter": "0,1,2,3", 22 21 "CounterMask": "1", 23 22 "EdgeDetect": "1", 24 23 "EventCode": "0xab", ··· 30 27 }, 31 28 { 32 29 "BriefDescription": "DSB-to-MITE switch true penalty cycles.", 30 + "Counter": "0,1,2,3", 33 31 "EventCode": "0xab", 34 32 "EventName": "DSB2MITE_SWITCHES.PENALTY_CYCLES", 35 33 "PublicDescription": "Decode Stream Buffer (DSB) is a Uop-cache that holds translations of previously fetched instructions that were decoded by the legacy x86 decode pipeline (MITE). 
This event counts fetch penalty cycles when a transition occurs from DSB to MITE.", ··· 39 35 }, 40 36 { 41 37 "BriefDescription": "Retired Instructions who experienced DSB miss.", 38 + "Counter": "0,1,2,3,4,5,6,7", 42 39 "EventCode": "0xc6", 43 40 "EventName": "FRONTEND_RETIRED.ANY_DSB_MISS", 44 41 "MSRIndex": "0x3F7", ··· 51 46 }, 52 47 { 53 48 "BriefDescription": "Retired Instructions who experienced a critical DSB miss.", 49 + "Counter": "0,1,2,3,4,5,6,7", 54 50 "EventCode": "0xc6", 55 51 "EventName": "FRONTEND_RETIRED.DSB_MISS", 56 52 "MSRIndex": "0x3F7", ··· 63 57 }, 64 58 { 65 59 "BriefDescription": "Retired Instructions who experienced iTLB true miss.", 60 + "Counter": "0,1,2,3,4,5,6,7", 66 61 "EventCode": "0xc6", 67 62 "EventName": "FRONTEND_RETIRED.ITLB_MISS", 68 63 "MSRIndex": "0x3F7", ··· 75 68 }, 76 69 { 77 70 "BriefDescription": "Retired Instructions who experienced Instruction L1 Cache true miss.", 71 + "Counter": "0,1,2,3,4,5,6,7", 78 72 "EventCode": "0xc6", 79 73 "EventName": "FRONTEND_RETIRED.L1I_MISS", 80 74 "MSRIndex": "0x3F7", ··· 87 79 }, 88 80 { 89 81 "BriefDescription": "Retired Instructions who experienced Instruction L2 Cache true miss.", 82 + "Counter": "0,1,2,3,4,5,6,7", 90 83 "EventCode": "0xc6", 91 84 "EventName": "FRONTEND_RETIRED.L2_MISS", 92 85 "MSRIndex": "0x3F7", ··· 99 90 }, 100 91 { 101 92 "BriefDescription": "Retired instructions after front-end starvation of at least 1 cycle", 93 + "Counter": "0,1,2,3,4,5,6,7", 102 94 "EventCode": "0xc6", 103 95 "EventName": "FRONTEND_RETIRED.LATENCY_GE_1", 104 96 "MSRIndex": "0x3F7", ··· 111 101 }, 112 102 { 113 103 "BriefDescription": "Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 128 cycles which was not interrupted by a back-end stall.", 104 + "Counter": "0,1,2,3,4,5,6,7", 114 105 "EventCode": "0xc6", 115 106 "EventName": "FRONTEND_RETIRED.LATENCY_GE_128", 116 107 "MSRIndex": "0x3F7", ··· 123 112 }, 124 113 { 125 114 
"BriefDescription": "Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 16 cycles which was not interrupted by a back-end stall.", 115 + "Counter": "0,1,2,3,4,5,6,7", 126 116 "EventCode": "0xc6", 127 117 "EventName": "FRONTEND_RETIRED.LATENCY_GE_16", 128 118 "MSRIndex": "0x3F7", ··· 135 123 }, 136 124 { 137 125 "BriefDescription": "Retired instructions after front-end starvation of at least 2 cycles", 126 + "Counter": "0,1,2,3,4,5,6,7", 138 127 "EventCode": "0xc6", 139 128 "EventName": "FRONTEND_RETIRED.LATENCY_GE_2", 140 129 "MSRIndex": "0x3F7", ··· 147 134 }, 148 135 { 149 136 "BriefDescription": "Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 256 cycles which was not interrupted by a back-end stall.", 137 + "Counter": "0,1,2,3,4,5,6,7", 150 138 "EventCode": "0xc6", 151 139 "EventName": "FRONTEND_RETIRED.LATENCY_GE_256", 152 140 "MSRIndex": "0x3F7", ··· 159 145 }, 160 146 { 161 147 "BriefDescription": "Retired instructions that are fetched after an interval where the front-end had at least 1 bubble-slot for a period of 2 cycles which was not interrupted by a back-end stall.", 148 + "Counter": "0,1,2,3,4,5,6,7", 162 149 "EventCode": "0xc6", 163 150 "EventName": "FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1", 164 151 "MSRIndex": "0x3F7", ··· 171 156 }, 172 157 { 173 158 "BriefDescription": "Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 32 cycles which was not interrupted by a back-end stall.", 159 + "Counter": "0,1,2,3,4,5,6,7", 174 160 "EventCode": "0xc6", 175 161 "EventName": "FRONTEND_RETIRED.LATENCY_GE_32", 176 162 "MSRIndex": "0x3F7", ··· 183 167 }, 184 168 { 185 169 "BriefDescription": "Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 4 cycles which was not interrupted by a back-end stall.", 170 + "Counter": 
"0,1,2,3,4,5,6,7", 186 171 "EventCode": "0xc6", 187 172 "EventName": "FRONTEND_RETIRED.LATENCY_GE_4", 188 173 "MSRIndex": "0x3F7", ··· 195 178 }, 196 179 { 197 180 "BriefDescription": "Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 512 cycles which was not interrupted by a back-end stall.", 181 + "Counter": "0,1,2,3,4,5,6,7", 198 182 "EventCode": "0xc6", 199 183 "EventName": "FRONTEND_RETIRED.LATENCY_GE_512", 200 184 "MSRIndex": "0x3F7", ··· 207 189 }, 208 190 { 209 191 "BriefDescription": "Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 64 cycles which was not interrupted by a back-end stall.", 192 + "Counter": "0,1,2,3,4,5,6,7", 210 193 "EventCode": "0xc6", 211 194 "EventName": "FRONTEND_RETIRED.LATENCY_GE_64", 212 195 "MSRIndex": "0x3F7", ··· 219 200 }, 220 201 { 221 202 "BriefDescription": "Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 8 cycles which was not interrupted by a back-end stall.", 203 + "Counter": "0,1,2,3,4,5,6,7", 222 204 "EventCode": "0xc6", 223 205 "EventName": "FRONTEND_RETIRED.LATENCY_GE_8", 224 206 "MSRIndex": "0x3F7", ··· 231 211 }, 232 212 { 233 213 "BriefDescription": "Retired Instructions who experienced STLB (2nd level TLB) true miss.", 214 + "Counter": "0,1,2,3,4,5,6,7", 234 215 "EventCode": "0xc6", 235 216 "EventName": "FRONTEND_RETIRED.STLB_MISS", 236 217 "MSRIndex": "0x3F7", ··· 243 222 }, 244 223 { 245 224 "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache miss. [This event is alias to ICACHE_DATA.STALLS]", 225 + "Counter": "0,1,2,3", 246 226 "EventCode": "0x80", 247 227 "EventName": "ICACHE_16B.IFDATA_STALL", 248 228 "PublicDescription": "Counts cycles where a code line fetch is stalled due to an L1 instruction cache miss. The legacy decode pipeline works at a 16 Byte granularity. 
[This event is alias to ICACHE_DATA.STALLS]", ··· 252 230 }, 253 231 { 254 232 "BriefDescription": "Instruction fetch tag lookups that hit in the instruction cache (L1I). Counts at 64-byte cache-line granularity.", 233 + "Counter": "0,1,2,3", 255 234 "EventCode": "0x83", 256 235 "EventName": "ICACHE_64B.IFTAG_HIT", 257 236 "PublicDescription": "Counts instruction fetch tag lookups that hit in the instruction cache (L1I). Counts at 64-byte cache-line granularity. Accounts for both cacheable and uncacheable accesses.", ··· 261 238 }, 262 239 { 263 240 "BriefDescription": "Instruction fetch tag lookups that miss in the instruction cache (L1I). Counts at 64-byte cache-line granularity.", 241 + "Counter": "0,1,2,3", 264 242 "EventCode": "0x83", 265 243 "EventName": "ICACHE_64B.IFTAG_MISS", 266 244 "PublicDescription": "Counts instruction fetch tag lookups that miss in the instruction cache (L1I). Counts at 64-byte cache-line granularity. Accounts for both cacheable and uncacheable accesses.", ··· 270 246 }, 271 247 { 272 248 "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_TAG.STALLS]", 249 + "Counter": "0,1,2,3", 273 250 "EventCode": "0x83", 274 251 "EventName": "ICACHE_64B.IFTAG_STALL", 275 252 "PublicDescription": "Counts cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_TAG.STALLS]", ··· 279 254 }, 280 255 { 281 256 "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache miss. [This event is alias to ICACHE_16B.IFDATA_STALL]", 257 + "Counter": "0,1,2,3", 282 258 "EventCode": "0x80", 283 259 "EventName": "ICACHE_DATA.STALLS", 284 260 "PublicDescription": "Counts cycles where a code line fetch is stalled due to an L1 instruction cache miss. The legacy decode pipeline works at a 16 Byte granularity. 
[This event is alias to ICACHE_16B.IFDATA_STALL]", ··· 288 262 }, 289 263 { 290 264 "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_64B.IFTAG_STALL]", 265 + "Counter": "0,1,2,3", 291 266 "EventCode": "0x83", 292 267 "EventName": "ICACHE_TAG.STALLS", 293 268 "PublicDescription": "Counts cycles where a code fetch is stalled due to L1 instruction cache tag miss. [This event is alias to ICACHE_64B.IFTAG_STALL]", ··· 297 270 }, 298 271 { 299 272 "BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering any Uop", 273 + "Counter": "0,1,2,3", 300 274 "CounterMask": "1", 301 275 "EventCode": "0x79", 302 276 "EventName": "IDQ.DSB_CYCLES_ANY", ··· 307 279 }, 308 280 { 309 281 "BriefDescription": "Cycles DSB is delivering optimal number of Uops", 282 + "Counter": "0,1,2,3", 310 283 "CounterMask": "5", 311 284 "EventCode": "0x79", 312 285 "EventName": "IDQ.DSB_CYCLES_OK", 313 - "PublicDescription": "Counts the number of cycles where optimal number of uops was delivered to the Instruction Decode Queue (IDQ) from the MITE (legacy decode pipeline) path. During these cycles uops are not being delivered from the Decode Stream Buffer (DSB).", 286 + "PublicDescription": "Counts the number of cycles where optimal number of uops was delivered to the Instruction Decode Queue (IDQ) from the DSB (Decode Stream Buffer) path. 
Count includes uops that may 'bypass' the IDQ.", 314 287 "SampleAfterValue": "2000003", 315 288 "UMask": "0x8" 316 289 }, 317 290 { 318 291 "BriefDescription": "Uops delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path", 292 + "Counter": "0,1,2,3", 319 293 "EventCode": "0x79", 320 294 "EventName": "IDQ.DSB_UOPS", 321 295 "PublicDescription": "Counts the number of uops delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path.", ··· 326 296 }, 327 297 { 328 298 "BriefDescription": "Cycles MITE is delivering any Uop", 299 + "Counter": "0,1,2,3", 329 300 "CounterMask": "1", 330 301 "EventCode": "0x79", 331 302 "EventName": "IDQ.MITE_CYCLES_ANY", ··· 336 305 }, 337 306 { 338 307 "BriefDescription": "Cycles MITE is delivering optimal number of Uops", 308 + "Counter": "0,1,2,3", 339 309 "CounterMask": "5", 340 310 "EventCode": "0x79", 341 311 "EventName": "IDQ.MITE_CYCLES_OK", ··· 346 314 }, 347 315 { 348 316 "BriefDescription": "Uops delivered to Instruction Decode Queue (IDQ) from MITE path", 317 + "Counter": "0,1,2,3", 349 318 "EventCode": "0x79", 350 319 "EventName": "IDQ.MITE_UOPS", 351 320 "PublicDescription": "Counts the number of uops delivered to Instruction Decode Queue (IDQ) from the MITE path. 
This also means that uops are not being delivered from the Decode Stream Buffer (DSB).", ··· 355 322 }, 356 323 { 357 324 "BriefDescription": "Cycles when uops are being delivered to IDQ while MS is busy", 325 + "Counter": "0,1,2,3", 358 326 "CounterMask": "1", 359 327 "EventCode": "0x79", 360 328 "EventName": "IDQ.MS_CYCLES_ANY", ··· 365 331 }, 366 332 { 367 333 "BriefDescription": "Number of switches from DSB or MITE to the MS", 334 + "Counter": "0,1,2,3", 368 335 "CounterMask": "1", 369 336 "EdgeDetect": "1", 370 337 "EventCode": "0x79", ··· 376 341 }, 377 342 { 378 343 "BriefDescription": "Uops delivered to IDQ while MS is busy", 344 + "Counter": "0,1,2,3", 379 345 "EventCode": "0x79", 380 346 "EventName": "IDQ.MS_UOPS", 381 347 "PublicDescription": "Counts the total number of uops delivered by the Microcode Sequencer (MS). Any instruction over 4 uops will be delivered by the MS. Some instructions such as transcendentals may additionally generate uops from the MS.", ··· 385 349 }, 386 350 { 387 351 "BriefDescription": "Uops not delivered by IDQ when backend of the machine is not stalled", 352 + "Counter": "0,1,2,3,4,5,6,7", 388 353 "EventCode": "0x9c", 389 354 "EventName": "IDQ_UOPS_NOT_DELIVERED.CORE", 390 355 "PublicDescription": "Counts the number of uops not delivered to by the Instruction Decode Queue (IDQ) to the back-end of the pipeline when there was no back-end stalls. 
This event counts for one SMT thread in a given cycle.", ··· 394 357 }, 395 358 { 396 359 "BriefDescription": "Cycles when no uops are not delivered by the IDQ when backend of the machine is not stalled", 360 + "Counter": "0,1,2,3,4,5,6,7", 397 361 "CounterMask": "5", 398 362 "EventCode": "0x9c", 399 363 "EventName": "IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE", ··· 404 366 }, 405 367 { 406 368 "BriefDescription": "Cycles when optimal number of uops was delivered to the back-end when the back-end is not stalled", 369 + "Counter": "0,1,2,3,4,5,6,7", 407 370 "CounterMask": "1", 408 371 "EventCode": "0x9C", 409 372 "EventName": "IDQ_UOPS_NOT_DELIVERED.CYCLES_FE_WAS_OK",
+44
tools/perf/pmu-events/arch/x86/rocketlake/memory.json
··· 1 1 [ 2 2 { 3 3 "BriefDescription": "Cycles while L3 cache miss demand load is outstanding.", 4 + "Counter": "0,1,2,3", 4 5 "CounterMask": "2", 5 6 "EventCode": "0xA3", 6 7 "EventName": "CYCLE_ACTIVITY.CYCLES_L3_MISS", ··· 10 9 }, 11 10 { 12 11 "BriefDescription": "Execution stalls while L3 cache miss demand load is outstanding.", 12 + "Counter": "0,1,2,3", 13 13 "CounterMask": "6", 14 14 "EventCode": "0xa3", 15 15 "EventName": "CYCLE_ACTIVITY.STALLS_L3_MISS", ··· 19 17 }, 20 18 { 21 19 "BriefDescription": "Number of times an HLE execution aborted due to any reasons (multiple categories may count as one).", 20 + "Counter": "0,1,2,3,4,5,6,7", 22 21 "EventCode": "0xc8", 23 22 "EventName": "HLE_RETIRED.ABORTED", 24 23 "PublicDescription": "Counts the number of times HLE abort was triggered.", ··· 28 25 }, 29 26 { 30 27 "BriefDescription": "Number of times an HLE execution aborted due to unfriendly events (such as interrupts).", 28 + "Counter": "0,1,2,3,4,5,6,7", 31 29 "EventCode": "0xc8", 32 30 "EventName": "HLE_RETIRED.ABORTED_EVENTS", 33 31 "PublicDescription": "Counts the number of times an HLE execution aborted due to unfriendly events (such as interrupts).", ··· 37 33 }, 38 34 { 39 35 "BriefDescription": "Number of times an HLE execution aborted due to various memory events (e.g., read/write capacity and conflicts).", 36 + "Counter": "0,1,2,3,4,5,6,7", 40 37 "EventCode": "0xc8", 41 38 "EventName": "HLE_RETIRED.ABORTED_MEM", 42 39 "PublicDescription": "Counts the number of times an HLE execution aborted due to various memory events (e.g., read/write capacity and conflicts).", ··· 46 41 }, 47 42 { 48 43 "BriefDescription": "Number of times an HLE execution aborted due to HLE-unfriendly instructions and certain unfriendly events (such as AD assists etc.).", 44 + "Counter": "0,1,2,3,4,5,6,7", 49 45 "EventCode": "0xc8", 50 46 "EventName": "HLE_RETIRED.ABORTED_UNFRIENDLY", 51 47 "PublicDescription": "Counts the number of times an HLE execution aborted due to 
HLE-unfriendly instructions and certain unfriendly events (such as AD assists etc.).", ··· 55 49 }, 56 50 { 57 51 "BriefDescription": "Number of times an HLE execution successfully committed", 52 + "Counter": "0,1,2,3,4,5,6,7", 58 53 "EventCode": "0xc8", 59 54 "EventName": "HLE_RETIRED.COMMIT", 60 55 "PublicDescription": "Counts the number of times HLE commit succeeded.", ··· 64 57 }, 65 58 { 66 59 "BriefDescription": "Number of times an HLE execution started.", 60 + "Counter": "0,1,2,3,4,5,6,7", 67 61 "EventCode": "0xc8", 68 62 "EventName": "HLE_RETIRED.START", 69 63 "PublicDescription": "Counts the number of times we entered an HLE region. Does not count nested transactions.", ··· 73 65 }, 74 66 { 75 67 "BriefDescription": "Number of machine clears due to memory ordering conflicts.", 68 + "Counter": "0,1,2,3,4,5,6,7", 76 69 "EventCode": "0xc3", 77 70 "EventName": "MACHINE_CLEARS.MEMORY_ORDERING", 78 71 "PublicDescription": "Counts the number of Machine Clears detected dye to memory ordering. 
Memory Ordering Machine Clears may apply when a memory read may not conform to the memory ordering rules of the x86 architecture", ··· 82 73 }, 83 74 { 84 75 "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 128 cycles.", 76 + "Counter": "0,1,2,3,4,5,6,7", 85 77 "Data_LA": "1", 86 78 "EventCode": "0xcd", 87 79 "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_128", ··· 95 85 }, 96 86 { 97 87 "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 16 cycles.", 88 + "Counter": "0,1,2,3,4,5,6,7", 98 89 "Data_LA": "1", 99 90 "EventCode": "0xcd", 100 91 "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_16", ··· 108 97 }, 109 98 { 110 99 "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 256 cycles.", 100 + "Counter": "0,1,2,3,4,5,6,7", 111 101 "Data_LA": "1", 112 102 "EventCode": "0xcd", 113 103 "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_256", ··· 121 109 }, 122 110 { 123 111 "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 32 cycles.", 112 + "Counter": "0,1,2,3,4,5,6,7", 124 113 "Data_LA": "1", 125 114 "EventCode": "0xcd", 126 115 "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_32", ··· 134 121 }, 135 122 { 136 123 "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 4 cycles.", 124 + "Counter": "0,1,2,3,4,5,6,7", 137 125 "Data_LA": "1", 138 126 "EventCode": "0xcd", 139 127 "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_4", ··· 147 133 }, 148 134 { 149 135 "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 512 cycles.", 136 + "Counter": "0,1,2,3,4,5,6,7", 150 137 "Data_LA": "1", 151 138 "EventCode": "0xcd", 152 139 "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_512", 
··· 160 145 }, 161 146 { 162 147 "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 64 cycles.", 148 + "Counter": "0,1,2,3,4,5,6,7", 163 149 "Data_LA": "1", 164 150 "EventCode": "0xcd", 165 151 "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_64", ··· 173 157 }, 174 158 { 175 159 "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 8 cycles.", 160 + "Counter": "0,1,2,3,4,5,6,7", 176 161 "Data_LA": "1", 177 162 "EventCode": "0xcd", 178 163 "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_8", ··· 186 169 }, 187 170 { 188 171 "BriefDescription": "Counts demand instruction fetches and L1 instruction cache prefetches that was not supplied by the L3 cache.", 172 + "Counter": "0,1,2,3", 189 173 "EventCode": "0xB7, 0xBB", 190 174 "EventName": "OCR.DEMAND_CODE_RD.L3_MISS", 191 175 "MSRIndex": "0x1a6,0x1a7", ··· 196 178 }, 197 179 { 198 180 "BriefDescription": "Counts demand data reads that was not supplied by the L3 cache.", 181 + "Counter": "0,1,2,3", 199 182 "EventCode": "0xB7, 0xBB", 200 183 "EventName": "OCR.DEMAND_DATA_RD.L3_MISS", 201 184 "MSRIndex": "0x1a6,0x1a7", ··· 206 187 }, 207 188 { 208 189 "BriefDescription": "Counts demand reads for ownership (RFO) requests and software prefetches for exclusive ownership (PREFETCHW) that was not supplied by the L3 cache.", 190 + "Counter": "0,1,2,3", 209 191 "EventCode": "0xB7, 0xBB", 210 192 "EventName": "OCR.DEMAND_RFO.L3_MISS", 211 193 "MSRIndex": "0x1a6,0x1a7", ··· 216 196 }, 217 197 { 218 198 "BriefDescription": "Counts L1 data cache prefetch requests and software prefetches (except PREFETCHW) that was not supplied by the L3 cache.", 199 + "Counter": "0,1,2,3", 219 200 "EventCode": "0xB7, 0xBB", 220 201 "EventName": "OCR.HWPF_L1D_AND_SWPF.L3_MISS", 221 202 "MSRIndex": "0x1a6,0x1a7", ··· 226 205 }, 227 206 { 228 207 "BriefDescription": "Counts hardware prefetch data reads (which bring data to 
L2) that was not supplied by the L3 cache.", 208 + "Counter": "0,1,2,3", 229 209 "EventCode": "0xB7, 0xBB", 230 210 "EventName": "OCR.HWPF_L2_DATA_RD.L3_MISS", 231 211 "MSRIndex": "0x1a6,0x1a7", ··· 236 214 }, 237 215 { 238 216 "BriefDescription": "Counts hardware prefetch RFOs (which bring data to L2) that was not supplied by the L3 cache.", 217 + "Counter": "0,1,2,3", 239 218 "EventCode": "0xB7, 0xBB", 240 219 "EventName": "OCR.HWPF_L2_RFO.L3_MISS", 241 220 "MSRIndex": "0x1a6,0x1a7", ··· 246 223 }, 247 224 { 248 225 "BriefDescription": "Counts miscellaneous requests, such as I/O and un-cacheable accesses that was not supplied by the L3 cache.", 226 + "Counter": "0,1,2,3", 249 227 "EventCode": "0xB7, 0xBB", 250 228 "EventName": "OCR.OTHER.L3_MISS", 251 229 "MSRIndex": "0x1a6,0x1a7", ··· 256 232 }, 257 233 { 258 234 "BriefDescription": "Counts streaming stores that was not supplied by the L3 cache.", 235 + "Counter": "0,1,2,3", 259 236 "EventCode": "0xB7, 0xBB", 260 237 "EventName": "OCR.STREAMING_WR.L3_MISS", 261 238 "MSRIndex": "0x1a6,0x1a7", ··· 266 241 }, 267 242 { 268 243 "BriefDescription": "Counts demand data read requests that miss the L3 cache.", 244 + "Counter": "0,1,2,3", 269 245 "EventCode": "0xb0", 270 246 "EventName": "OFFCORE_REQUESTS.L3_MISS_DEMAND_DATA_RD", 271 247 "SampleAfterValue": "100003", ··· 274 248 }, 275 249 { 276 250 "BriefDescription": "Cycles where at least one demand data read request known to have missed the L3 cache is pending.", 251 + "Counter": "0,1,2,3", 277 252 "CounterMask": "1", 278 253 "EventCode": "0x60", 279 254 "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_L3_MISS_DEMAND_DATA_RD", ··· 284 257 }, 285 258 { 286 259 "BriefDescription": "Number of times an RTM execution aborted.", 260 + "Counter": "0,1,2,3,4,5,6,7", 287 261 "EventCode": "0xc9", 288 262 "EventName": "RTM_RETIRED.ABORTED", 289 263 "PEBS": "1", ··· 294 266 }, 295 267 { 296 268 "BriefDescription": "Number of times an RTM execution aborted due to none of 
the previous 4 categories (e.g. interrupt)", 269 + "Counter": "0,1,2,3,4,5,6,7", 297 270 "EventCode": "0xc9", 298 271 "EventName": "RTM_RETIRED.ABORTED_EVENTS", 299 272 "PublicDescription": "Counts the number of times an RTM execution aborted due to none of the previous 4 categories (e.g. interrupt).", ··· 303 274 }, 304 275 { 305 276 "BriefDescription": "Number of times an RTM execution aborted due to various memory events (e.g. read/write capacity and conflicts)", 277 + "Counter": "0,1,2,3,4,5,6,7", 306 278 "EventCode": "0xc9", 307 279 "EventName": "RTM_RETIRED.ABORTED_MEM", 308 280 "PublicDescription": "Counts the number of times an RTM execution aborted due to various memory events (e.g. read/write capacity and conflicts).", ··· 312 282 }, 313 283 { 314 284 "BriefDescription": "Number of times an RTM execution aborted due to incompatible memory type", 285 + "Counter": "0,1,2,3,4,5,6,7", 315 286 "EventCode": "0xc9", 316 287 "EventName": "RTM_RETIRED.ABORTED_MEMTYPE", 317 288 "PublicDescription": "Counts the number of times an RTM execution aborted due to incompatible memory type.", ··· 321 290 }, 322 291 { 323 292 "BriefDescription": "Number of times an RTM execution aborted due to HLE-unfriendly instructions", 293 + "Counter": "0,1,2,3,4,5,6,7", 324 294 "EventCode": "0xc9", 325 295 "EventName": "RTM_RETIRED.ABORTED_UNFRIENDLY", 326 296 "PublicDescription": "Counts the number of times an RTM execution aborted due to HLE-unfriendly instructions.", ··· 330 298 }, 331 299 { 332 300 "BriefDescription": "Number of times an RTM execution successfully committed", 301 + "Counter": "0,1,2,3,4,5,6,7", 333 302 "EventCode": "0xc9", 334 303 "EventName": "RTM_RETIRED.COMMIT", 335 304 "PublicDescription": "Counts the number of times RTM commit succeeded.", ··· 339 306 }, 340 307 { 341 308 "BriefDescription": "Number of times an RTM execution started.", 309 + "Counter": "0,1,2,3,4,5,6,7", 342 310 "EventCode": "0xc9", 343 311 "EventName": "RTM_RETIRED.START", 344 312 
"PublicDescription": "Counts the number of times we entered an RTM region. Does not count nested transactions.", ··· 348 314 }, 349 315 { 350 316 "BriefDescription": "Counts the number of times a class of instructions that may cause a transactional abort was executed inside a transactional region", 317 + "Counter": "0,1,2,3,4,5,6,7", 351 318 "EventCode": "0x5d", 352 319 "EventName": "TX_EXEC.MISC2", 353 320 "PublicDescription": "Counts Unfriendly TSX abort triggered by a vzeroupper instruction.", ··· 357 322 }, 358 323 { 359 324 "BriefDescription": "Number of times an instruction execution caused the transactional nest count supported to be exceeded", 325 + "Counter": "0,1,2,3,4,5,6,7", 360 326 "EventCode": "0x5d", 361 327 "EventName": "TX_EXEC.MISC3", 362 328 "PublicDescription": "Counts Unfriendly TSX abort triggered by a nest count that is too deep.", ··· 366 330 }, 367 331 { 368 332 "BriefDescription": "Speculatively counts the number of TSX aborts due to a data capacity limitation for transactional reads", 333 + "Counter": "0,1,2,3", 369 334 "EventCode": "0x54", 370 335 "EventName": "TX_MEM.ABORT_CAPACITY_READ", 371 336 "PublicDescription": "Speculatively counts the number of Transactional Synchronization Extensions (TSX) aborts due to a data capacity limitation for transactional reads", ··· 375 338 }, 376 339 { 377 340 "BriefDescription": "Speculatively counts the number of TSX aborts due to a data capacity limitation for transactional writes.", 341 + "Counter": "0,1,2,3", 378 342 "EventCode": "0x54", 379 343 "EventName": "TX_MEM.ABORT_CAPACITY_WRITE", 380 344 "PublicDescription": "Speculatively counts the number of Transactional Synchronization Extensions (TSX) aborts due to a data capacity limitation for transactional writes.", ··· 384 346 }, 385 347 { 386 348 "BriefDescription": "Number of times a transactional abort was signaled due to a data conflict on a transactionally accessed address", 349 + "Counter": "0,1,2,3", 387 350 "EventCode": "0x54", 388 351 
"EventName": "TX_MEM.ABORT_CONFLICT", 389 352 "PublicDescription": "Counts the number of times a TSX line had a cache conflict.", ··· 393 354 }, 394 355 { 395 356 "BriefDescription": "Number of times an HLE transactional execution aborted due to XRELEASE lock not satisfying the address and value requirements in the elision buffer", 357 + "Counter": "0,1,2,3", 396 358 "EventCode": "0x54", 397 359 "EventName": "TX_MEM.ABORT_HLE_ELISION_BUFFER_MISMATCH", 398 360 "PublicDescription": "Counts the number of times a TSX Abort was triggered due to release/commit but data and address mismatch.", ··· 402 362 }, 403 363 { 404 364 "BriefDescription": "Number of times an HLE transactional execution aborted due to NoAllocatedElisionBuffer being non-zero.", 365 + "Counter": "0,1,2,3", 405 366 "EventCode": "0x54", 406 367 "EventName": "TX_MEM.ABORT_HLE_ELISION_BUFFER_NOT_EMPTY", 407 368 "PublicDescription": "Counts the number of times a TSX Abort was triggered due to commit but Lock Buffer not empty.", ··· 411 370 }, 412 371 { 413 372 "BriefDescription": "Number of times an HLE transactional execution aborted due to an unsupported read alignment from the elision buffer.", 373 + "Counter": "0,1,2,3", 414 374 "EventCode": "0x54", 415 375 "EventName": "TX_MEM.ABORT_HLE_ELISION_BUFFER_UNSUPPORTED_ALIGNMENT", 416 376 "PublicDescription": "Counts the number of times a TSX Abort was triggered due to attempting an unsupported alignment from Lock Buffer.", ··· 420 378 }, 421 379 { 422 380 "BriefDescription": "Number of times a HLE transactional region aborted due to a non XRELEASE prefixed instruction writing to an elided lock in the elision buffer", 381 + "Counter": "0,1,2,3", 423 382 "EventCode": "0x54", 424 383 "EventName": "TX_MEM.ABORT_HLE_STORE_TO_ELIDED_LOCK", 425 384 "PublicDescription": "Counts the number of times a TSX Abort was triggered due to a non-release/commit store to lock.", ··· 429 386 }, 430 387 { 431 388 "BriefDescription": "Number of times HLE lock could not be elided 
due to ElisionBufferAvailable being zero.", 389 + "Counter": "0,1,2,3", 432 390 "EventCode": "0x54", 433 391 "EventName": "TX_MEM.HLE_ELISION_BUFFER_FULL", 434 392 "PublicDescription": "Counts the number of times we could not allocate Lock Buffer.",
+13
tools/perf/pmu-events/arch/x86/rocketlake/metricgroups.json
···
5 5 "BigFootprint": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
6 6 "BrMispredicts": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
7 7 "Branches": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
8 + "BvBC": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
9 + "BvBO": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
10 + "BvCB": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
11 + "BvFB": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
12 + "BvIO": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
13 + "BvMB": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
14 + "BvML": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
15 + "BvMP": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
16 + "BvMS": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
17 + "BvMT": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
18 + "BvOB": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
19 + "BvUW": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
8 20 "CacheHits": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
21 + "CacheMisses": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
9 22 "CodeGen": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
10 23 "Compute": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
11 24 "Cor": "Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet",
+27
tools/perf/pmu-events/arch/x86/rocketlake/other.json
··· 1 1 [ 2 2 { 3 3 "BriefDescription": "Core cycles where the core was running in a manner where Turbo may be clipped to the Non-AVX turbo schedule.", 4 + "Counter": "0,1,2,3", 4 5 "EventCode": "0x28", 5 6 "EventName": "CORE_POWER.LVL0_TURBO_LICENSE", 6 7 "PublicDescription": "Counts Core cycles where the core was running with power-delivery for baseline license level 0. This includes non-AVX codes, SSE, AVX 128-bit, and low-current AVX 256-bit codes.", ··· 10 9 }, 11 10 { 12 11 "BriefDescription": "Core cycles where the core was running in a manner where Turbo may be clipped to the AVX2 turbo schedule.", 12 + "Counter": "0,1,2,3", 13 13 "EventCode": "0x28", 14 14 "EventName": "CORE_POWER.LVL1_TURBO_LICENSE", 15 15 "PublicDescription": "Counts Core cycles where the core was running with power-delivery for license level 1. This includes high current AVX 256-bit instructions as well as low current AVX 512-bit instructions.", ··· 19 17 }, 20 18 { 21 19 "BriefDescription": "Core cycles where the core was running in a manner where Turbo may be clipped to the AVX512 turbo schedule.", 20 + "Counter": "0,1,2,3", 22 21 "EventCode": "0x28", 23 22 "EventName": "CORE_POWER.LVL2_TURBO_LICENSE", 24 23 "PublicDescription": "Core cycles where the core was running with power-delivery for license level 2 (introduced in Skylake Server microarchitecture). 
This includes high current AVX 512-bit instructions.", ··· 28 25 }, 29 26 { 30 27 "BriefDescription": "Counts demand instruction fetches and L1 instruction cache prefetches that have any type of response.", 28 + "Counter": "0,1,2,3", 31 29 "EventCode": "0xB7, 0xBB", 32 30 "EventName": "OCR.DEMAND_CODE_RD.ANY_RESPONSE", 33 31 "MSRIndex": "0x1a6,0x1a7", ··· 38 34 }, 39 35 { 40 36 "BriefDescription": "Counts demand instruction fetches and L1 instruction cache prefetches that DRAM supplied the request.", 37 + "Counter": "0,1,2,3", 41 38 "EventCode": "0xB7, 0xBB", 42 39 "EventName": "OCR.DEMAND_CODE_RD.DRAM", 43 40 "MSRIndex": "0x1a6,0x1a7", ··· 48 43 }, 49 44 { 50 45 "BriefDescription": "Counts demand instruction fetches and L1 instruction cache prefetches that DRAM supplied the request.", 46 + "Counter": "0,1,2,3", 51 47 "EventCode": "0xB7, 0xBB", 52 48 "EventName": "OCR.DEMAND_CODE_RD.LOCAL_DRAM", 53 49 "MSRIndex": "0x1a6,0x1a7", ··· 58 52 }, 59 53 { 60 54 "BriefDescription": "Counts demand data reads that have any type of response.", 55 + "Counter": "0,1,2,3", 61 56 "EventCode": "0xB7, 0xBB", 62 57 "EventName": "OCR.DEMAND_DATA_RD.ANY_RESPONSE", 63 58 "MSRIndex": "0x1a6,0x1a7", ··· 68 61 }, 69 62 { 70 63 "BriefDescription": "Counts demand data reads that DRAM supplied the request.", 64 + "Counter": "0,1,2,3", 71 65 "EventCode": "0xB7, 0xBB", 72 66 "EventName": "OCR.DEMAND_DATA_RD.DRAM", 73 67 "MSRIndex": "0x1a6,0x1a7", ··· 78 70 }, 79 71 { 80 72 "BriefDescription": "Counts demand data reads that DRAM supplied the request.", 73 + "Counter": "0,1,2,3", 81 74 "EventCode": "0xB7, 0xBB", 82 75 "EventName": "OCR.DEMAND_DATA_RD.LOCAL_DRAM", 83 76 "MSRIndex": "0x1a6,0x1a7", ··· 88 79 }, 89 80 { 90 81 "BriefDescription": "Counts demand reads for ownership (RFO) requests and software prefetches for exclusive ownership (PREFETCHW) that have any type of response.", 82 + "Counter": "0,1,2,3", 91 83 "EventCode": "0xB7, 0xBB", 92 84 "EventName": "OCR.DEMAND_RFO.ANY_RESPONSE", 93 
85 "MSRIndex": "0x1a6,0x1a7", ··· 98 88 }, 99 89 { 100 90 "BriefDescription": "Counts demand reads for ownership (RFO) requests and software prefetches for exclusive ownership (PREFETCHW) that DRAM supplied the request.", 91 + "Counter": "0,1,2,3", 101 92 "EventCode": "0xB7, 0xBB", 102 93 "EventName": "OCR.DEMAND_RFO.DRAM", 103 94 "MSRIndex": "0x1a6,0x1a7", ··· 108 97 }, 109 98 { 110 99 "BriefDescription": "Counts demand reads for ownership (RFO) requests and software prefetches for exclusive ownership (PREFETCHW) that DRAM supplied the request.", 100 + "Counter": "0,1,2,3", 111 101 "EventCode": "0xB7, 0xBB", 112 102 "EventName": "OCR.DEMAND_RFO.LOCAL_DRAM", 113 103 "MSRIndex": "0x1a6,0x1a7", ··· 118 106 }, 119 107 { 120 108 "BriefDescription": "Counts L1 data cache prefetch requests and software prefetches (except PREFETCHW) that have any type of response.", 109 + "Counter": "0,1,2,3", 121 110 "EventCode": "0xB7, 0xBB", 122 111 "EventName": "OCR.HWPF_L1D_AND_SWPF.ANY_RESPONSE", 123 112 "MSRIndex": "0x1a6,0x1a7", ··· 128 115 }, 129 116 { 130 117 "BriefDescription": "Counts L1 data cache prefetch requests and software prefetches (except PREFETCHW) that DRAM supplied the request.", 118 + "Counter": "0,1,2,3", 131 119 "EventCode": "0xB7, 0xBB", 132 120 "EventName": "OCR.HWPF_L1D_AND_SWPF.DRAM", 133 121 "MSRIndex": "0x1a6,0x1a7", ··· 138 124 }, 139 125 { 140 126 "BriefDescription": "Counts L1 data cache prefetch requests and software prefetches (except PREFETCHW) that DRAM supplied the request.", 127 + "Counter": "0,1,2,3", 141 128 "EventCode": "0xB7, 0xBB", 142 129 "EventName": "OCR.HWPF_L1D_AND_SWPF.LOCAL_DRAM", 143 130 "MSRIndex": "0x1a6,0x1a7", ··· 148 133 }, 149 134 { 150 135 "BriefDescription": "Counts hardware prefetch data reads (which bring data to L2) that have any type of response.", 136 + "Counter": "0,1,2,3", 151 137 "EventCode": "0xB7, 0xBB", 152 138 "EventName": "OCR.HWPF_L2_DATA_RD.ANY_RESPONSE", 153 139 "MSRIndex": "0x1a6,0x1a7", ··· 158 142 }, 159 143 
{ 160 144 "BriefDescription": "Counts hardware prefetch data reads (which bring data to L2) that DRAM supplied the request.", 145 + "Counter": "0,1,2,3", 161 146 "EventCode": "0xB7, 0xBB", 162 147 "EventName": "OCR.HWPF_L2_DATA_RD.DRAM", 163 148 "MSRIndex": "0x1a6,0x1a7", ··· 168 151 }, 169 152 { 170 153 "BriefDescription": "Counts hardware prefetch data reads (which bring data to L2) that DRAM supplied the request.", 154 + "Counter": "0,1,2,3", 171 155 "EventCode": "0xB7, 0xBB", 172 156 "EventName": "OCR.HWPF_L2_DATA_RD.LOCAL_DRAM", 173 157 "MSRIndex": "0x1a6,0x1a7", ··· 178 160 }, 179 161 { 180 162 "BriefDescription": "Counts hardware prefetch RFOs (which bring data to L2) that have any type of response.", 163 + "Counter": "0,1,2,3", 181 164 "EventCode": "0xB7, 0xBB", 182 165 "EventName": "OCR.HWPF_L2_RFO.ANY_RESPONSE", 183 166 "MSRIndex": "0x1a6,0x1a7", ··· 188 169 }, 189 170 { 190 171 "BriefDescription": "Counts hardware prefetch RFOs (which bring data to L2) that DRAM supplied the request.", 172 + "Counter": "0,1,2,3", 191 173 "EventCode": "0xB7, 0xBB", 192 174 "EventName": "OCR.HWPF_L2_RFO.DRAM", 193 175 "MSRIndex": "0x1a6,0x1a7", ··· 198 178 }, 199 179 { 200 180 "BriefDescription": "Counts hardware prefetch RFOs (which bring data to L2) that DRAM supplied the request.", 181 + "Counter": "0,1,2,3", 201 182 "EventCode": "0xB7, 0xBB", 202 183 "EventName": "OCR.HWPF_L2_RFO.LOCAL_DRAM", 203 184 "MSRIndex": "0x1a6,0x1a7", ··· 208 187 }, 209 188 { 210 189 "BriefDescription": "Counts miscellaneous requests, such as I/O and un-cacheable accesses that have any type of response.", 190 + "Counter": "0,1,2,3", 211 191 "EventCode": "0xB7, 0xBB", 212 192 "EventName": "OCR.OTHER.ANY_RESPONSE", 213 193 "MSRIndex": "0x1a6,0x1a7", ··· 218 196 }, 219 197 { 220 198 "BriefDescription": "Counts miscellaneous requests, such as I/O and un-cacheable accesses that DRAM supplied the request.", 199 + "Counter": "0,1,2,3", 221 200 "EventCode": "0xB7, 0xBB", 222 201 "EventName": 
"OCR.OTHER.DRAM", 223 202 "MSRIndex": "0x1a6,0x1a7", ··· 228 205 }, 229 206 { 230 207 "BriefDescription": "Counts miscellaneous requests, such as I/O and un-cacheable accesses that DRAM supplied the request.", 208 + "Counter": "0,1,2,3", 231 209 "EventCode": "0xB7, 0xBB", 232 210 "EventName": "OCR.OTHER.LOCAL_DRAM", 233 211 "MSRIndex": "0x1a6,0x1a7", ··· 238 214 }, 239 215 { 240 216 "BriefDescription": "Counts streaming stores that have any type of response.", 217 + "Counter": "0,1,2,3", 241 218 "EventCode": "0xB7, 0xBB", 242 219 "EventName": "OCR.STREAMING_WR.ANY_RESPONSE", 243 220 "MSRIndex": "0x1a6,0x1a7", ··· 248 223 }, 249 224 { 250 225 "BriefDescription": "Counts streaming stores that DRAM supplied the request.", 226 + "Counter": "0,1,2,3", 251 227 "EventCode": "0xB7, 0xBB", 252 228 "EventName": "OCR.STREAMING_WR.DRAM", 253 229 "MSRIndex": "0x1a6,0x1a7", ··· 258 232 }, 259 233 { 260 234 "BriefDescription": "Counts streaming stores that DRAM supplied the request.", 235 + "Counter": "0,1,2,3", 261 236 "EventCode": "0xB7, 0xBB", 262 237 "EventName": "OCR.STREAMING_WR.LOCAL_DRAM", 263 238 "MSRIndex": "0x1a6,0x1a7",
+94
tools/perf/pmu-events/arch/x86/rocketlake/pipeline.json
··· 1 1 [ 2 2 { 3 3 "BriefDescription": "Cycles when divide unit is busy executing divide or square root operations.", 4 + "Counter": "0,1,2,3,4,5,6,7", 4 5 "CounterMask": "1", 5 6 "EventCode": "0x14", 6 7 "EventName": "ARITH.DIVIDER_ACTIVE", ··· 11 10 }, 12 11 { 13 12 "BriefDescription": "Number of occurrences where a microcode assist is invoked by hardware.", 13 + "Counter": "0,1,2,3,4,5,6,7", 14 14 "EventCode": "0xc1", 15 15 "EventName": "ASSISTS.ANY", 16 16 "PublicDescription": "Counts the number of occurrences where a microcode assist is invoked by hardware Examples include AD (page Access Dirty), FP and AVX related assists.", ··· 20 18 }, 21 19 { 22 20 "BriefDescription": "All branch instructions retired.", 21 + "Counter": "0,1,2,3,4,5,6,7", 23 22 "EventCode": "0xc4", 24 23 "EventName": "BR_INST_RETIRED.ALL_BRANCHES", 25 24 "PEBS": "1", ··· 29 26 }, 30 27 { 31 28 "BriefDescription": "Conditional branch instructions retired.", 29 + "Counter": "0,1,2,3,4,5,6,7", 32 30 "EventCode": "0xc4", 33 31 "EventName": "BR_INST_RETIRED.COND", 34 32 "PEBS": "1", ··· 39 35 }, 40 36 { 41 37 "BriefDescription": "Not taken branch instructions retired.", 38 + "Counter": "0,1,2,3,4,5,6,7", 42 39 "EventCode": "0xc4", 43 40 "EventName": "BR_INST_RETIRED.COND_NTAKEN", 44 41 "PEBS": "1", ··· 49 44 }, 50 45 { 51 46 "BriefDescription": "Taken conditional branch instructions retired.", 47 + "Counter": "0,1,2,3,4,5,6,7", 52 48 "EventCode": "0xc4", 53 49 "EventName": "BR_INST_RETIRED.COND_TAKEN", 54 50 "PEBS": "1", ··· 59 53 }, 60 54 { 61 55 "BriefDescription": "Far branch instructions retired.", 56 + "Counter": "0,1,2,3,4,5,6,7", 62 57 "EventCode": "0xc4", 63 58 "EventName": "BR_INST_RETIRED.FAR_BRANCH", 64 59 "PEBS": "1", ··· 69 62 }, 70 63 { 71 64 "BriefDescription": "Indirect near branch instructions retired (excluding returns)", 65 + "Counter": "0,1,2,3,4,5,6,7", 72 66 "EventCode": "0xc4", 73 67 "EventName": "BR_INST_RETIRED.INDIRECT", 74 68 "PEBS": "1", ··· 79 71 }, 80 72 { 81 73 
"BriefDescription": "Direct and indirect near call instructions retired.", 74 + "Counter": "0,1,2,3,4,5,6,7", 82 75 "EventCode": "0xc4", 83 76 "EventName": "BR_INST_RETIRED.NEAR_CALL", 84 77 "PEBS": "1", ··· 89 80 }, 90 81 { 91 82 "BriefDescription": "Return instructions retired.", 83 + "Counter": "0,1,2,3,4,5,6,7", 92 84 "EventCode": "0xc4", 93 85 "EventName": "BR_INST_RETIRED.NEAR_RETURN", 94 86 "PEBS": "1", ··· 99 89 }, 100 90 { 101 91 "BriefDescription": "Taken branch instructions retired.", 92 + "Counter": "0,1,2,3,4,5,6,7", 102 93 "EventCode": "0xc4", 103 94 "EventName": "BR_INST_RETIRED.NEAR_TAKEN", 104 95 "PEBS": "1", ··· 109 98 }, 110 99 { 111 100 "BriefDescription": "All mispredicted branch instructions retired.", 101 + "Counter": "0,1,2,3,4,5,6,7", 112 102 "EventCode": "0xc5", 113 103 "EventName": "BR_MISP_RETIRED.ALL_BRANCHES", 114 104 "PEBS": "1", ··· 118 106 }, 119 107 { 120 108 "BriefDescription": "Mispredicted conditional branch instructions retired.", 109 + "Counter": "0,1,2,3,4,5,6,7", 121 110 "EventCode": "0xc5", 122 111 "EventName": "BR_MISP_RETIRED.COND", 123 112 "PEBS": "1", ··· 128 115 }, 129 116 { 130 117 "BriefDescription": "Mispredicted non-taken conditional branch instructions retired.", 118 + "Counter": "0,1,2,3,4,5,6,7", 131 119 "EventCode": "0xc5", 132 120 "EventName": "BR_MISP_RETIRED.COND_NTAKEN", 133 121 "PEBS": "1", ··· 138 124 }, 139 125 { 140 126 "BriefDescription": "number of branch instructions retired that were mispredicted and taken.", 127 + "Counter": "0,1,2,3,4,5,6,7", 141 128 "EventCode": "0xc5", 142 129 "EventName": "BR_MISP_RETIRED.COND_TAKEN", 143 130 "PEBS": "1", ··· 148 133 }, 149 134 { 150 135 "BriefDescription": "All miss-predicted indirect branch instructions retired (excluding RETs. 
TSX aborts is considered indirect branch).", 136 + "Counter": "0,1,2,3,4,5,6,7", 151 137 "EventCode": "0xc5", 152 138 "EventName": "BR_MISP_RETIRED.INDIRECT", 153 139 "PEBS": "1", ··· 158 142 }, 159 143 { 160 144 "BriefDescription": "Mispredicted indirect CALL instructions retired.", 145 + "Counter": "0,1,2,3,4,5,6,7", 161 146 "EventCode": "0xc5", 162 147 "EventName": "BR_MISP_RETIRED.INDIRECT_CALL", 163 148 "PEBS": "1", ··· 168 151 }, 169 152 { 170 153 "BriefDescription": "Number of near branch instructions retired that were mispredicted and taken.", 154 + "Counter": "0,1,2,3,4,5,6,7", 171 155 "EventCode": "0xc5", 172 156 "EventName": "BR_MISP_RETIRED.NEAR_TAKEN", 173 157 "PEBS": "1", ··· 178 160 }, 179 161 { 180 162 "BriefDescription": "This event counts the number of mispredicted ret instructions retired. Non PEBS", 163 + "Counter": "0,1,2,3,4,5,6,7", 181 164 "EventCode": "0xc5", 182 165 "EventName": "BR_MISP_RETIRED.RET", 183 166 "PEBS": "1", ··· 188 169 }, 189 170 { 190 171 "BriefDescription": "Cycle counts are evenly distributed between active threads in the Core.", 172 + "Counter": "0,1,2,3,4,5,6,7", 191 173 "EventCode": "0xec", 192 174 "EventName": "CPU_CLK_UNHALTED.DISTRIBUTED", 193 175 "PublicDescription": "This event distributes cycle counts between active hyperthreads, i.e., those in C0. A hyperthread becomes inactive when it executes the HLT or MWAIT instructions. If all other hyperthreads are inactive (or disabled or do not exist), all counts are attributed to this hyperthread. 
To obtain the full count when the Core is active, sum the counts from each hyperthread.", ··· 197 177 }, 198 178 { 199 179 "BriefDescription": "Core crystal clock cycles when this thread is unhalted and the other thread is halted.", 180 + "Counter": "0,1,2,3,4,5,6,7", 200 181 "EventCode": "0x3C", 201 182 "EventName": "CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE", 202 183 "PublicDescription": "Counts Core crystal clock cycles when current thread is unhalted and the other thread is halted.", ··· 206 185 }, 207 186 { 208 187 "BriefDescription": "Core crystal clock cycles. Cycle counts are evenly distributed between active threads in the Core.", 188 + "Counter": "0,1,2,3,4,5,6,7", 209 189 "EventCode": "0x3c", 210 190 "EventName": "CPU_CLK_UNHALTED.REF_DISTRIBUTED", 211 191 "PublicDescription": "This event distributes Core crystal clock cycle counts between active hyperthreads, i.e., those in C0 sleep-state. A hyperthread becomes inactive when it executes the HLT or MWAIT instructions. If one thread is active in a core, all counts are attributed to this hyperthread. To obtain the full count when the Core is active, sum the counts from each hyperthread.", ··· 215 193 }, 216 194 { 217 195 "BriefDescription": "Reference cycles when the core is not in halt state.", 196 + "Counter": "Fixed counter 2", 218 197 "EventName": "CPU_CLK_UNHALTED.REF_TSC", 219 198 "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events. 
Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'. The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'. After the counter has overflowed and software clears the overflow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.", 220 199 "SampleAfterValue": "2000003", ··· 223 200 }, 224 201 { 225 202 "BriefDescription": "Core crystal clock cycles when the thread is unhalted.", 203 + "Counter": "0,1,2,3,4,5,6,7", 226 204 "EventCode": "0x3C", 227 205 "EventName": "CPU_CLK_UNHALTED.REF_XCLK", 228 206 "PublicDescription": "Counts core crystal clock cycles when the thread is unhalted.", ··· 232 208 }, 233 209 { 234 210 "BriefDescription": "Core cycles when the thread is not in halt state", 211 + "Counter": "Fixed counter 1", 235 212 "EventName": "CPU_CLK_UNHALTED.THREAD", 236 213 "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. 
It is counted on a dedicated fixed counter, leaving the eight programmable counters available for other events.", 237 214 "SampleAfterValue": "2000003", ··· 240 215 }, 241 216 { 242 217 "BriefDescription": "Thread cycles when thread is not in halt state", 218 + "Counter": "0,1,2,3,4,5,6,7", 243 219 "EventCode": "0x3C", 244 220 "EventName": "CPU_CLK_UNHALTED.THREAD_P", 245 221 "PublicDescription": "This is an architectural event that counts the number of thread cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. The core frequency may change from time to time due to power or thermal throttling. For this reason, this event may have a changing ratio with regards to wall clock time.", ··· 248 222 }, 249 223 { 250 224 "BriefDescription": "Cycles while L1 cache miss demand load is outstanding.", 225 + "Counter": "0,1,2,3", 251 226 "CounterMask": "8", 252 227 "EventCode": "0xA3", 253 228 "EventName": "CYCLE_ACTIVITY.CYCLES_L1D_MISS", ··· 257 230 }, 258 231 { 259 232 "BriefDescription": "Cycles while L2 cache miss demand load is outstanding.", 233 + "Counter": "0,1,2,3", 260 234 "CounterMask": "1", 261 235 "EventCode": "0xA3", 262 236 "EventName": "CYCLE_ACTIVITY.CYCLES_L2_MISS", ··· 266 238 }, 267 239 { 268 240 "BriefDescription": "Cycles while memory subsystem has an outstanding load.", 241 + "Counter": "0,1,2,3,4,5,6,7", 269 242 "CounterMask": "16", 270 243 "EventCode": "0xA3", 271 244 "EventName": "CYCLE_ACTIVITY.CYCLES_MEM_ANY", ··· 275 246 }, 276 247 { 277 248 "BriefDescription": "Execution stalls while L1 cache miss demand load is outstanding.", 249 + "Counter": "0,1,2,3", 278 250 "CounterMask": "12", 279 251 "EventCode": "0xA3", 280 252 "EventName": "CYCLE_ACTIVITY.STALLS_L1D_MISS", ··· 284 254 }, 285 255 { 286 256 "BriefDescription": "Execution stalls while L2 cache miss demand load is outstanding.", 257 + "Counter": "0,1,2,3", 287 258 "CounterMask": "5", 288 259 "EventCode": "0xa3", 289 260 
"EventName": "CYCLE_ACTIVITY.STALLS_L2_MISS", ··· 293 262 }, 294 263 { 295 264 "BriefDescription": "Execution stalls while memory subsystem has an outstanding load.", 265 + "Counter": "0,1,2,3,4,5,6,7", 296 266 "CounterMask": "20", 297 267 "EventCode": "0xa3", 298 268 "EventName": "CYCLE_ACTIVITY.STALLS_MEM_ANY", ··· 302 270 }, 303 271 { 304 272 "BriefDescription": "Total execution stalls.", 273 + "Counter": "0,1,2,3,4,5,6,7", 305 274 "CounterMask": "4", 306 275 "EventCode": "0xa3", 307 276 "EventName": "CYCLE_ACTIVITY.STALLS_TOTAL", ··· 311 278 }, 312 279 { 313 280 "BriefDescription": "Cycles total of 1 uop is executed on all ports and Reservation Station was not empty.", 281 + "Counter": "0,1,2,3,4,5,6,7", 314 282 "EventCode": "0xa6", 315 283 "EventName": "EXE_ACTIVITY.1_PORTS_UTIL", 316 284 "PublicDescription": "Counts cycles during which a total of 1 uop was executed on all ports and Reservation Station (RS) was not empty.", ··· 320 286 }, 321 287 { 322 288 "BriefDescription": "Cycles total of 2 uops are executed on all ports and Reservation Station was not empty.", 289 + "Counter": "0,1,2,3,4,5,6,7", 323 290 "EventCode": "0xa6", 324 291 "EventName": "EXE_ACTIVITY.2_PORTS_UTIL", 325 292 "PublicDescription": "Counts cycles during which a total of 2 uops were executed on all ports and Reservation Station (RS) was not empty.", ··· 329 294 }, 330 295 { 331 296 "BriefDescription": "Cycles total of 3 uops are executed on all ports and Reservation Station was not empty.", 297 + "Counter": "0,1,2,3,4,5,6,7", 332 298 "EventCode": "0xa6", 333 299 "EventName": "EXE_ACTIVITY.3_PORTS_UTIL", 334 300 "PublicDescription": "Cycles total of 3 uops are executed on all ports and Reservation Station (RS) was not empty.", ··· 338 302 }, 339 303 { 340 304 "BriefDescription": "Cycles total of 4 uops are executed on all ports and Reservation Station was not empty.", 305 + "Counter": "0,1,2,3,4,5,6,7", 341 306 "EventCode": "0xa6", 342 307 "EventName": "EXE_ACTIVITY.4_PORTS_UTIL", 343 
308 "PublicDescription": "Cycles total of 4 uops are executed on all ports and Reservation Station (RS) was not empty.", ··· 347 310 }, 348 311 { 349 312 "BriefDescription": "Cycles where the Store Buffer was full and no loads caused an execution stall.", 313 + "Counter": "0,1,2,3,4,5,6,7", 350 314 "CounterMask": "2", 351 315 "EventCode": "0xA6", 352 316 "EventName": "EXE_ACTIVITY.BOUND_ON_STORES", ··· 357 319 }, 358 320 { 359 321 "BriefDescription": "Stalls caused by changing prefix length of the instruction. [This event is alias to DECODE.LCP]", 322 + "Counter": "0,1,2,3", 360 323 "EventCode": "0x87", 361 324 "EventName": "ILD_STALL.LCP", 362 325 "PublicDescription": "Counts cycles that the Instruction Length decoder (ILD) stalls occurred due to dynamically changing prefix length of the decoded instruction (by operand size prefix instruction 0x66, address size prefix instruction 0x67 or REX.W for Intel64). Count is proportional to the number of prefixes in a 16B-line. This may result in a three-cycle penalty for each LCP (Length changing prefix) in a 16-byte chunk. [This event is alias to DECODE.LCP]", ··· 366 327 }, 367 328 { 368 329 "BriefDescription": "Instruction decoders utilized in a cycle", 330 + "Counter": "0,1,2,3", 369 331 "EventCode": "0x55", 370 332 "EventName": "INST_DECODED.DECODERS", 371 333 "PublicDescription": "Number of decoders utilized in a cycle when the MITE (legacy decode pipeline) fetches instructions.", ··· 375 335 }, 376 336 { 377 337 "BriefDescription": "Number of instructions retired. Fixed Counter - architectural event", 338 + "Counter": "Fixed counter 0", 378 339 "EventName": "INST_RETIRED.ANY", 379 340 "PEBS": "1", 380 341 "PublicDescription": "Counts the number of instructions retired - an Architectural PerfMon event. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter freeing up programmable counters to count other events. 
INST_RETIRED.ANY_P is counted by a programmable counter.", ··· 384 343 }, 385 344 { 386 345 "BriefDescription": "Number of instructions retired. General Counter - architectural event", 346 + "Counter": "0,1,2,3,4,5,6,7", 387 347 "EventCode": "0xc0", 388 348 "EventName": "INST_RETIRED.ANY_P", 389 349 "PEBS": "1", ··· 393 351 }, 394 352 { 395 353 "BriefDescription": "Number of all retired NOP instructions.", 354 + "Counter": "0,1,2,3,4,5,6,7", 396 355 "EventCode": "0xc0", 397 356 "EventName": "INST_RETIRED.NOP", 398 357 "PEBS": "1", ··· 402 359 }, 403 360 { 404 361 "BriefDescription": "Precise instruction retired event with a reduced effect of PEBS shadow in IP distribution", 362 + "Counter": "Fixed counter 0", 405 363 "EventName": "INST_RETIRED.PREC_DIST", 406 364 "PEBS": "1", 407 365 "PublicDescription": "A version of INST_RETIRED that allows for a more unbiased distribution of samples across instructions retired. It utilizes the Precise Distribution of Instructions Retired (PDIR) feature to mitigate some bias in how retired instructions get sampled. 
Use on Fixed Counter 0.", ··· 411 367 }, 412 368 { 413 369 "BriefDescription": "Cycles without actually retired instructions.", 370 + "Counter": "0,1,2,3,4,5,6,7", 414 371 "CounterMask": "1", 415 372 "EventCode": "0xc0", 416 373 "EventName": "INST_RETIRED.STALL_CYCLES", ··· 422 377 }, 423 378 { 424 379 "BriefDescription": "Cycles the Backend cluster is recovering after a miss-speculation or a Store Buffer or Load Buffer drain stall.", 380 + "Counter": "0,1,2,3,4,5,6,7", 425 381 "CounterMask": "1", 426 382 "EventCode": "0x0D", 427 383 "EventName": "INT_MISC.ALL_RECOVERY_CYCLES", ··· 432 386 }, 433 387 { 434 388 "BriefDescription": "Clears speculative count", 389 + "Counter": "0,1,2,3,4,5,6,7", 435 390 "CounterMask": "1", 436 391 "EdgeDetect": "1", 437 392 "EventCode": "0x0D", ··· 443 396 }, 444 397 { 445 398 "BriefDescription": "Counts cycles after recovery from a branch misprediction or machine clear till the first uop is issued from the resteered path.", 399 + "Counter": "0,1,2,3,4,5,6,7", 446 400 "EventCode": "0x0d", 447 401 "EventName": "INT_MISC.CLEAR_RESTEER_CYCLES", 448 402 "PublicDescription": "Cycles after recovery from a branch misprediction or machine clear till the first uop is issued from the resteered path.", ··· 452 404 }, 453 405 { 454 406 "BriefDescription": "Core cycles the allocator was stalled due to recovery from earlier clear event for this thread", 407 + "Counter": "0,1,2,3,4,5,6,7", 455 408 "EventCode": "0x0D", 456 409 "EventName": "INT_MISC.RECOVERY_CYCLES", 457 410 "PublicDescription": "Counts core cycles when the Resource allocator was stalled due to recovery from an earlier branch misprediction or machine clear event.", ··· 461 412 }, 462 413 { 463 414 "BriefDescription": "TMA slots where uops got dropped", 415 + "Counter": "0,1,2,3,4,5,6,7", 464 416 "EventCode": "0x0d", 465 417 "EventName": "INT_MISC.UOP_DROPPING", 466 418 "PublicDescription": "Estimated number of Top-down Microarchitecture Analysis slots that got dropped due to non 
front-end reasons", ··· 470 420 }, 471 421 { 472 422 "BriefDescription": "The number of times that split load operations are temporarily blocked because all resources for handling the split accesses are in use.", 423 + "Counter": "0,1,2,3", 473 424 "EventCode": "0x03", 474 425 "EventName": "LD_BLOCKS.NO_SR", 475 426 "PublicDescription": "Counts the number of times that split load operations are temporarily blocked because all resources for handling the split accesses are in use.", ··· 479 428 }, 480 429 { 481 430 "BriefDescription": "Loads blocked due to overlapping with a preceding store that cannot be forwarded.", 431 + "Counter": "0,1,2,3", 482 432 "EventCode": "0x03", 483 433 "EventName": "LD_BLOCKS.STORE_FORWARD", 484 434 "PublicDescription": "Counts the number of times where store forwarding was prevented for a load operation. The most common case is a load blocked due to the address of memory access (partially) overlapping with a preceding uncompleted store. Note: See the table of not supported store forwards in the Optimization Guide.", ··· 488 436 }, 489 437 { 490 438 "BriefDescription": "False dependencies due to partial compare on address.", 439 + "Counter": "0,1,2,3", 491 440 "EventCode": "0x07", 492 441 "EventName": "LD_BLOCKS_PARTIAL.ADDRESS_ALIAS", 493 442 "PublicDescription": "Counts the number of times a load got blocked due to false dependencies due to partial compare on address.", ··· 497 444 }, 498 445 { 499 446 "BriefDescription": "Counts the number of demand load dispatches that hit L1D fill buffer (FB) allocated for software prefetch.", 447 + "Counter": "0,1,2,3", 500 448 "EventCode": "0x4c", 501 449 "EventName": "LOAD_HIT_PREFETCH.SWPF", 502 450 "PublicDescription": "Counts all not software-prefetch load dispatches that hit the fill buffer (FB) allocated for the software prefetch. It can also be incremented by some lock instructions. 
So it should only be used with profiling so that the locks can be excluded by ASM (Assembly File) inspection of the nearby instructions.", ··· 506 452 }, 507 453 { 508 454 "BriefDescription": "Cycles Uops delivered by the LSD, but didn't come from the decoder.", 455 + "Counter": "0,1,2,3", 509 456 "CounterMask": "1", 510 457 "EventCode": "0xA8", 511 458 "EventName": "LSD.CYCLES_ACTIVE", ··· 516 461 }, 517 462 { 518 463 "BriefDescription": "Cycles optimal number of Uops delivered by the LSD, but did not come from the decoder.", 464 + "Counter": "0,1,2,3", 519 465 "CounterMask": "5", 520 466 "EventCode": "0xa8", 521 467 "EventName": "LSD.CYCLES_OK", ··· 526 470 }, 527 471 { 528 472 "BriefDescription": "Number of Uops delivered by the LSD.", 473 + "Counter": "0,1,2,3", 529 474 "EventCode": "0xa8", 530 475 "EventName": "LSD.UOPS", 531 476 "PublicDescription": "Counts the number of uops delivered to the back-end by the LSD(Loop Stream Detector).", ··· 535 478 }, 536 479 { 537 480 "BriefDescription": "Number of machine clears (nukes) of any type.", 481 + "Counter": "0,1,2,3,4,5,6,7", 538 482 "CounterMask": "1", 539 483 "EdgeDetect": "1", 540 484 "EventCode": "0xc3", ··· 546 488 }, 547 489 { 548 490 "BriefDescription": "Self-modifying code (SMC) detected.", 491 + "Counter": "0,1,2,3,4,5,6,7", 549 492 "EventCode": "0xc3", 550 493 "EventName": "MACHINE_CLEARS.SMC", 551 494 "PublicDescription": "Counts self-modifying code (SMC) detected, which causes a machine clear.", ··· 555 496 }, 556 497 { 557 498 "BriefDescription": "Increments whenever there is an update to the LBR array.", 499 + "Counter": "0,1,2,3,4,5,6,7", 558 500 "EventCode": "0xcc", 559 501 "EventName": "MISC_RETIRED.LBR_INSERTS", 560 502 "PublicDescription": "Increments when an entry is added to the Last Branch Record (LBR) array (or removed from the array in case of RETURNs in call stack mode). 
The event requires LBR to be enabled properly.", ··· 564 504 }, 565 505 { 566 506 "BriefDescription": "Number of retired PAUSE instructions. This event is not supported on first SKL and KBL products.", 507 + "Counter": "0,1,2,3,4,5,6,7", 567 508 "EventCode": "0xcc", 568 509 "EventName": "MISC_RETIRED.PAUSE_INST", 569 510 "PublicDescription": "Counts number of retired PAUSE instructions. This event is not supported on first SKL and KBL products.", ··· 573 512 }, 574 513 { 575 514 "BriefDescription": "Cycles stalled due to no store buffers available. (not including draining form sync).", 515 + "Counter": "0,1,2,3,4,5,6,7", 576 516 "EventCode": "0xa2", 577 517 "EventName": "RESOURCE_STALLS.SB", 578 518 "PublicDescription": "Counts allocation stall cycles caused by the store buffer (SB) being full. This counts cycles that the pipeline back-end blocked uop delivery from the front-end.", ··· 582 520 }, 583 521 { 584 522 "BriefDescription": "Counts cycles where the pipeline is stalled due to serializing operations.", 523 + "Counter": "0,1,2,3,4,5,6,7", 585 524 "EventCode": "0xa2", 586 525 "EventName": "RESOURCE_STALLS.SCOREBOARD", 587 526 "SampleAfterValue": "100003", ··· 590 527 }, 591 528 { 592 529 "BriefDescription": "Cycles when Reservation Station (RS) is empty for the thread", 530 + "Counter": "0,1,2,3,4,5,6,7", 593 531 "EventCode": "0x5e", 594 532 "EventName": "RS_EVENTS.EMPTY_CYCLES", 595 533 "PublicDescription": "Counts cycles during which the reservation station (RS) is empty for this logical processor. This is usually caused when the front-end pipeline runs into starvation periods (e.g. 
branch mispredictions or i-cache misses)", ··· 599 535 }, 600 536 { 601 537 "BriefDescription": "Counts end of periods where the Reservation Station (RS) was empty.", 538 + "Counter": "0,1,2,3,4,5,6,7", 602 539 "CounterMask": "1", 603 540 "EdgeDetect": "1", 604 541 "EventCode": "0x5E", ··· 611 546 }, 612 547 { 613 548 "BriefDescription": "TMA slots where no uops were being issued due to lack of back-end resources.", 549 + "Counter": "0,1,2,3,4,5,6,7", 614 550 "EventCode": "0xa4", 615 551 "EventName": "TOPDOWN.BACKEND_BOUND_SLOTS", 616 552 "PublicDescription": "Counts the number of Top-down Microarchitecture Analysis (TMA) method's slots where no micro-operations were being issued from front-end to back-end of the machine due to lack of back-end resources.", ··· 620 554 }, 621 555 { 622 556 "BriefDescription": "TMA slots available for an unhalted logical processor. Fixed counter - architectural event", 557 + "Counter": "Fixed counter 3", 623 558 "EventName": "TOPDOWN.SLOTS", 624 559 "PublicDescription": "Number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method (TMA). The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core. Software can use this event as the denominator for the top-level metrics of the TMA method. This architectural event is counted on a designated fixed counter (Fixed Counter 3).", 625 560 "SampleAfterValue": "10000003", ··· 628 561 }, 629 562 { 630 563 "BriefDescription": "TMA slots available for an unhalted logical processor. General counter - architectural event", 564 + "Counter": "0,1,2,3,4,5,6,7", 631 565 "EventCode": "0xa4", 632 566 "EventName": "TOPDOWN.SLOTS_P", 633 567 "PublicDescription": "Counts the number of available slots for an unhalted logical processor. 
The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method. The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core.", ··· 637 569 }, 638 570 { 639 571 "BriefDescription": "Number of uops decoded out of instructions exclusively fetched by decoder 0", 572 + "Counter": "0,1,2,3", 640 573 "EventCode": "0x56", 641 574 "EventName": "UOPS_DECODED.DEC0", 642 575 "PublicDescription": "Uops exclusively fetched by decoder 0", ··· 646 577 }, 647 578 { 648 579 "BriefDescription": "Number of uops executed on port 0", 580 + "Counter": "0,1,2,3,4,5,6,7", 649 581 "EventCode": "0xa1", 650 582 "EventName": "UOPS_DISPATCHED.PORT_0", 651 583 "PublicDescription": "Counts, on the per-thread basis, cycles during which at least one uop is dispatched from the Reservation Station (RS) to port 0.", ··· 655 585 }, 656 586 { 657 587 "BriefDescription": "Number of uops executed on port 1", 588 + "Counter": "0,1,2,3,4,5,6,7", 658 589 "EventCode": "0xa1", 659 590 "EventName": "UOPS_DISPATCHED.PORT_1", 660 591 "PublicDescription": "Counts, on the per-thread basis, cycles during which at least one uop is dispatched from the Reservation Station (RS) to port 1.", ··· 664 593 }, 665 594 { 666 595 "BriefDescription": "Number of uops executed on port 2 and 3", 596 + "Counter": "0,1,2,3,4,5,6,7", 667 597 "EventCode": "0xa1", 668 598 "EventName": "UOPS_DISPATCHED.PORT_2_3", 669 599 "PublicDescription": "Counts, on the per-thread basis, cycles during which at least one uop is dispatched from the Reservation Station (RS) to ports 2 and 3.", ··· 673 601 }, 674 602 { 675 603 "BriefDescription": "Number of uops executed on port 4 and 9", 604 + "Counter": "0,1,2,3,4,5,6,7", 676 605 "EventCode": "0xa1", 677 606 "EventName": "UOPS_DISPATCHED.PORT_4_9", 678 607 "PublicDescription": "Counts, on the per-thread basis, cycles during which at least one uop is dispatched from the Reservation Station 
(RS) to ports 5 and 9.", ··· 682 609 }, 683 610 { 684 611 "BriefDescription": "Number of uops executed on port 5", 612 + "Counter": "0,1,2,3,4,5,6,7", 685 613 "EventCode": "0xa1", 686 614 "EventName": "UOPS_DISPATCHED.PORT_5", 687 615 "PublicDescription": "Counts, on the per-thread basis, cycles during which at least one uop is dispatched from the Reservation Station (RS) to port 5.", ··· 691 617 }, 692 618 { 693 619 "BriefDescription": "Number of uops executed on port 6", 620 + "Counter": "0,1,2,3,4,5,6,7", 694 621 "EventCode": "0xa1", 695 622 "EventName": "UOPS_DISPATCHED.PORT_6", 696 623 "PublicDescription": "Counts, on the per-thread basis, cycles during which at least one uop is dispatched from the Reservation Station (RS) to port 6.", ··· 700 625 }, 701 626 { 702 627 "BriefDescription": "Number of uops executed on port 7 and 8", 628 + "Counter": "0,1,2,3,4,5,6,7", 703 629 "EventCode": "0xa1", 704 630 "EventName": "UOPS_DISPATCHED.PORT_7_8", 705 631 "PublicDescription": "Counts, on the per-thread basis, cycles during which at least one uop is dispatched from the Reservation Station (RS) to ports 7 and 8.", ··· 709 633 }, 710 634 { 711 635 "BriefDescription": "Number of uops executed on the core.", 636 + "Counter": "0,1,2,3,4,5,6,7", 712 637 "EventCode": "0xB1", 713 638 "EventName": "UOPS_EXECUTED.CORE", 714 639 "PublicDescription": "Counts the number of uops executed from any thread.", ··· 718 641 }, 719 642 { 720 643 "BriefDescription": "Cycles at least 1 micro-op is executed from any thread on physical core.", 644 + "Counter": "0,1,2,3,4,5,6,7", 721 645 "CounterMask": "1", 722 646 "EventCode": "0xB1", 723 647 "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_1", ··· 728 650 }, 729 651 { 730 652 "BriefDescription": "Cycles at least 2 micro-op is executed from any thread on physical core.", 653 + "Counter": "0,1,2,3,4,5,6,7", 731 654 "CounterMask": "2", 732 655 "EventCode": "0xB1", 733 656 "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_2", ··· 738 659 }, 739 660 { 740 
661 "BriefDescription": "Cycles at least 3 micro-op is executed from any thread on physical core.", 662 + "Counter": "0,1,2,3,4,5,6,7", 741 663 "CounterMask": "3", 742 664 "EventCode": "0xB1", 743 665 "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_3", ··· 748 668 }, 749 669 { 750 670 "BriefDescription": "Cycles at least 4 micro-op is executed from any thread on physical core.", 671 + "Counter": "0,1,2,3,4,5,6,7", 751 672 "CounterMask": "4", 752 673 "EventCode": "0xB1", 753 674 "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_4", ··· 758 677 }, 759 678 { 760 679 "BriefDescription": "Cycles where at least 1 uop was executed per-thread", 680 + "Counter": "0,1,2,3,4,5,6,7", 761 681 "CounterMask": "1", 762 682 "EventCode": "0xb1", 763 683 "EventName": "UOPS_EXECUTED.CYCLES_GE_1", ··· 768 686 }, 769 687 { 770 688 "BriefDescription": "Cycles where at least 2 uops were executed per-thread", 689 + "Counter": "0,1,2,3,4,5,6,7", 771 690 "CounterMask": "2", 772 691 "EventCode": "0xb1", 773 692 "EventName": "UOPS_EXECUTED.CYCLES_GE_2", ··· 778 695 }, 779 696 { 780 697 "BriefDescription": "Cycles where at least 3 uops were executed per-thread", 698 + "Counter": "0,1,2,3,4,5,6,7", 781 699 "CounterMask": "3", 782 700 "EventCode": "0xb1", 783 701 "EventName": "UOPS_EXECUTED.CYCLES_GE_3", ··· 788 704 }, 789 705 { 790 706 "BriefDescription": "Cycles where at least 4 uops were executed per-thread", 707 + "Counter": "0,1,2,3,4,5,6,7", 791 708 "CounterMask": "4", 792 709 "EventCode": "0xb1", 793 710 "EventName": "UOPS_EXECUTED.CYCLES_GE_4", ··· 798 713 }, 799 714 { 800 715 "BriefDescription": "Counts number of cycles no uops were dispatched to be executed on this thread.", 716 + "Counter": "0,1,2,3,4,5,6,7", 801 717 "CounterMask": "1", 802 718 "EventCode": "0xB1", 803 719 "EventName": "UOPS_EXECUTED.STALL_CYCLES", ··· 809 723 }, 810 724 { 811 725 "BriefDescription": "Counts the number of uops to be executed per-thread each cycle.", 726 + "Counter": "0,1,2,3,4,5,6,7", 812 727 "EventCode": "0xb1", 
813 728 "EventName": "UOPS_EXECUTED.THREAD", 814 729 "SampleAfterValue": "2000003", ··· 817 730 }, 818 731 { 819 732 "BriefDescription": "Counts the number of x87 uops dispatched.", 733 + "Counter": "0,1,2,3,4,5,6,7", 820 734 "EventCode": "0xB1", 821 735 "EventName": "UOPS_EXECUTED.X87", 822 736 "PublicDescription": "Counts the number of x87 uops executed.", ··· 826 738 }, 827 739 { 828 740 "BriefDescription": "Uops that RAT issues to RS", 741 + "Counter": "0,1,2,3,4,5,6,7", 829 742 "EventCode": "0x0e", 830 743 "EventName": "UOPS_ISSUED.ANY", 831 744 "PublicDescription": "Counts the number of uops that the Resource Allocation Table (RAT) issues to the Reservation Station (RS).", ··· 835 746 }, 836 747 { 837 748 "BriefDescription": "Cycles when RAT does not issue Uops to RS for the thread", 749 + "Counter": "0,1,2,3,4,5,6,7", 838 750 "CounterMask": "1", 839 751 "EventCode": "0x0E", 840 752 "EventName": "UOPS_ISSUED.STALL_CYCLES", ··· 846 756 }, 847 757 { 848 758 "BriefDescription": "Uops inserted at issue-stage in order to preserve upper bits of vector registers.", 759 + "Counter": "0,1,2,3,4,5,6,7", 849 760 "EventCode": "0x0e", 850 761 "EventName": "UOPS_ISSUED.VECTOR_WIDTH_MISMATCH", 851 762 "PublicDescription": "Counts the number of Blend Uops issued by the Resource Allocation Table (RAT) to the reservation station (RS) in order to preserve upper bits of vector registers. Starting with the Skylake microarchitecture, these Blend uops are needed since every Intel SSE instruction executed in Dirty Upper State needs to preserve bits 128-255 of the destination register. 
For more information, refer to 'Mixing Intel AVX and Intel SSE Code' section of the Optimization Guide.", ··· 855 764 }, 856 765 { 857 766 "BriefDescription": "Retirement slots used.", 767 + "Counter": "0,1,2,3,4,5,6,7", 858 768 "EventCode": "0xc2", 859 769 "EventName": "UOPS_RETIRED.SLOTS", 860 770 "PublicDescription": "Counts the retirement slots used each cycle.", ··· 864 772 }, 865 773 { 866 774 "BriefDescription": "Cycles without actually retired uops.", 775 + "Counter": "0,1,2,3,4,5,6,7", 867 776 "CounterMask": "1", 868 777 "EventCode": "0xc2", 869 778 "EventName": "UOPS_RETIRED.STALL_CYCLES", ··· 875 782 }, 876 783 { 877 784 "BriefDescription": "Cycles with less than 10 actually retired uops.", 785 + "Counter": "0,1,2,3,4,5,6,7", 878 786 "CounterMask": "10", 879 787 "EventCode": "0xc2", 880 788 "EventName": "UOPS_RETIRED.TOTAL_CYCLES",
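Note on the "Counter" field added throughout the pipeline.json hunks above: it appears in two shapes, a comma-separated list of general-purpose counter indices ("0,1,2,3" or "0,1,2,3,4,5,6,7") and a fixed-counter label ("Fixed counter 0" through "Fixed counter 3"). The sketch below shows one way a reader might split the two shapes when inspecting the JSON. It is only an illustration: parse_counter is a hypothetical helper, the three entries are abridged from the hunks above, and nothing here is meant to describe how perf or its build scripts actually consume the field.

```python
import json


def parse_counter(field):
    """Split a "Counter" string into (kind, indices).

    Hypothetical helper: handles only the two shapes visible in this diff.
    """
    prefix = "Fixed counter "
    if field.startswith(prefix):
        # e.g. "Fixed counter 2" -> ("fixed", [2])
        return ("fixed", [int(field[len(prefix):])])
    # e.g. "0,1,2,3" -> ("general", [0, 1, 2, 3])
    return ("general", [int(tok) for tok in field.split(",")])


# Entries abridged from the pipeline.json hunks; other keys are omitted.
events = json.loads("""
[
  {"EventName": "CPU_CLK_UNHALTED.REF_TSC", "Counter": "Fixed counter 2"},
  {"EventName": "CYCLE_ACTIVITY.STALLS_L2_MISS", "Counter": "0,1,2,3"},
  {"EventName": "UOPS_RETIRED.SLOTS", "Counter": "0,1,2,3,4,5,6,7"}
]
""")

for ev in events:
    kind, indices = parse_counter(ev["Counter"])
    print(ev["EventName"], kind, indices)
```

Read this way, the hunks above show some events (for example the CYCLE_ACTIVITY.*_L1D_MISS and *_L2_MISS variants) listed only for counters 0 to 3, while most others list all eight general-purpose counters.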
+228 -80
tools/perf/pmu-events/arch/x86/rocketlake/rkl-metrics.json
··· 104 104 { 105 105 "BriefDescription": "This metric estimates fraction of slots the CPU retired uops delivered by the Microcode_Sequencer as a result of Assists", 106 106 "MetricExpr": "34 * ASSISTS.ANY / tma_info_thread_slots", 107 - "MetricGroup": "TopdownL4;tma_L4_group;tma_microcode_sequencer_group", 107 + "MetricGroup": "BvIO;TopdownL4;tma_L4_group;tma_microcode_sequencer_group", 108 108 "MetricName": "tma_assists", 109 109 "MetricThreshold": "tma_assists > 0.1 & (tma_microcode_sequencer > 0.05 & tma_heavy_operations > 0.1)", 110 110 "PublicDescription": "This metric estimates fraction of slots the CPU retired uops delivered by the Microcode_Sequencer as a result of Assists. Assists are long sequences of uops that are required in certain corner-cases for operations that cannot be handled natively by the execution pipeline. For example; when working with very small floating point values (so-called Denormals); the FP units are not set up to perform these operations natively. Instead; a sequence of instructions to perform the computation on the Denormals is injected into the pipeline. Since these microcode sequences might be dozens of uops long; Assists can be extremely deleterious to performance and they can be avoided in many cases. 
Sample with: ASSISTS.ANY", ··· 114 114 "BriefDescription": "This category represents fraction of slots where no uops are being delivered due to a lack of required resources for accepting new uops in the Backend", 115 115 "DefaultMetricgroupName": "TopdownL1", 116 116 "MetricExpr": "topdown\\-be\\-bound / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 5 * INT_MISC.CLEARS_COUNT / tma_info_thread_slots", 117 - "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group", 117 + "MetricGroup": "BvOB;Default;TmaL1;TopdownL1;tma_L1_group", 118 118 "MetricName": "tma_backend_bound", 119 119 "MetricThreshold": "tma_backend_bound > 0.2", 120 120 "MetricgroupNoGroup": "TopdownL1;Default", ··· 135 135 { 136 136 "BriefDescription": "This metric represents fraction of slots where the CPU was retiring branch instructions.", 137 137 "MetricExpr": "tma_light_operations * BR_INST_RETIRED.ALL_BRANCHES / (tma_retiring * tma_info_thread_slots)", 138 - "MetricGroup": "Branches;Pipeline;TopdownL3;tma_L3_group;tma_light_operations_group", 138 + "MetricGroup": "Branches;BvBO;Pipeline;TopdownL3;tma_L3_group;tma_light_operations_group", 139 139 "MetricName": "tma_branch_instructions", 140 140 "MetricThreshold": "tma_branch_instructions > 0.1 & tma_light_operations > 0.6", 141 141 "ScaleUnit": "100%" ··· 143 143 { 144 144 "BriefDescription": "This metric represents fraction of slots the CPU has wasted due to Branch Misprediction", 145 145 "MetricExpr": "BR_MISP_RETIRED.ALL_BRANCHES / (BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT) * tma_bad_speculation", 146 - "MetricGroup": "BadSpec;BrMispredicts;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM", 146 + "MetricGroup": "BadSpec;BrMispredicts;BvMP;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueBM", 147 147 "MetricName": "tma_branch_mispredicts", 148 148 "MetricThreshold": "tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15", 149 149 "MetricgroupNoGroup": 
"TopdownL2", ··· 181 181 "BriefDescription": "This metric estimates fraction of cycles while the memory subsystem was handling synchronizations due to contested accesses", 182 182 "MetricConstraint": "NO_GROUP_EVENTS", 183 183 "MetricExpr": "(29 * tma_info_system_core_frequency * MEM_LOAD_L3_HIT_RETIRED.XSNP_HITM + 23.5 * tma_info_system_core_frequency * MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS) * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS / 2) / tma_info_thread_clks", 184 - "MetricGroup": "DataSharing;Offcore;Snoop;TopdownL4;tma_L4_group;tma_issueSyncxn;tma_l3_bound_group", 184 + "MetricGroup": "BvMS;DataSharing;Offcore;Snoop;TopdownL4;tma_L4_group;tma_issueSyncxn;tma_l3_bound_group", 185 185 "MetricName": "tma_contested_accesses", 186 186 "MetricThreshold": "tma_contested_accesses > 0.05 & (tma_l3_bound > 0.05 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", 187 187 "PublicDescription": "This metric estimates fraction of cycles while the memory subsystem was handling synchronizations due to contested accesses. Contested accesses occur when data written by one Logical Processor are read by another Logical Processor on a different Physical Core. Examples of contested accesses include synchronizations such as locks; true data sharing such as modified locked variables; and false sharing. Sample with: MEM_LOAD_L3_HIT_RETIRED.XSNP_HITM_PS;MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS_PS. 
Related metrics: tma_data_sharing, tma_false_sharing, tma_machine_clears, tma_remote_cache", ··· 201 201 "BriefDescription": "This metric estimates fraction of cycles while the memory subsystem was handling synchronizations due to data-sharing accesses", 202 202 "MetricConstraint": "NO_GROUP_EVENTS", 203 203 "MetricExpr": "23.5 * tma_info_system_core_frequency * MEM_LOAD_L3_HIT_RETIRED.XSNP_HIT * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS / 2) / tma_info_thread_clks", 204 - "MetricGroup": "Offcore;Snoop;TopdownL4;tma_L4_group;tma_issueSyncxn;tma_l3_bound_group", 204 + "MetricGroup": "BvMS;Offcore;Snoop;TopdownL4;tma_L4_group;tma_issueSyncxn;tma_l3_bound_group", 205 205 "MetricName": "tma_data_sharing", 206 206 "MetricThreshold": "tma_data_sharing > 0.05 & (tma_l3_bound > 0.05 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", 207 207 "PublicDescription": "This metric estimates fraction of cycles while the memory subsystem was handling synchronizations due to data-sharing accesses. Data shared by multiple Logical Processors (even just read shared) may cause increased access latency due to cache coherency. Excessive data sharing can drastically harm multithreaded performance. Sample with: MEM_LOAD_L3_HIT_RETIRED.XSNP_HIT_PS. Related metrics: tma_contested_accesses, tma_false_sharing, tma_machine_clears, tma_remote_cache", ··· 219 219 { 220 220 "BriefDescription": "This metric represents fraction of cycles where the Divider unit was active", 221 221 "MetricExpr": "ARITH.DIVIDER_ACTIVE / tma_info_thread_clks", 222 - "MetricGroup": "TopdownL3;tma_L3_group;tma_core_bound_group", 222 + "MetricGroup": "BvCB;TopdownL3;tma_L3_group;tma_core_bound_group", 223 223 "MetricName": "tma_divider", 224 224 "MetricThreshold": "tma_divider > 0.2 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2)", 225 225 "PublicDescription": "This metric represents fraction of cycles where the Divider unit was active. 
Divide and square root instructions are performed by the Divider unit and can take considerably longer latency than integer or Floating Point addition; subtraction; or multiplication. Sample with: ARITH.DIVIDER_ACTIVE", ··· 250 250 "MetricGroup": "DSBmiss;FetchLat;TopdownL3;tma_L3_group;tma_fetch_latency_group;tma_issueFB", 251 251 "MetricName": "tma_dsb_switches", 252 252 "MetricThreshold": "tma_dsb_switches > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15)", 253 - "PublicDescription": "This metric represents fraction of cycles the CPU was stalled due to switches from DSB to MITE pipelines. The DSB (decoded i-cache) is a Uop Cache where the front-end directly delivers Uops (micro operations) avoiding heavy x86 decoding. The DSB pipeline has shorter latency and delivered higher bandwidth than the MITE (legacy instruction decode pipeline). Switching between the two pipelines can cause penalties hence this metric measures the exposed penalty. Sample with: FRONTEND_RETIRED.DSB_MISS_PS. Related metrics: tma_fetch_bandwidth, tma_info_botlnk_l2_dsb_misses, tma_info_frontend_dsb_coverage, tma_info_inst_mix_iptb, tma_lcp", 253 + "PublicDescription": "This metric represents fraction of cycles the CPU was stalled due to switches from DSB to MITE pipelines. The DSB (decoded i-cache) is a Uop Cache where the front-end directly delivers Uops (micro operations) avoiding heavy x86 decoding. The DSB pipeline has shorter latency and delivered higher bandwidth than the MITE (legacy instruction decode pipeline). Switching between the two pipelines can cause penalties hence this metric measures the exposed penalty. Sample with: FRONTEND_RETIRED.DSB_MISS_PS. 
Related metrics: tma_fetch_bandwidth, tma_info_botlnk_l2_dsb_bandwidth, tma_info_botlnk_l2_dsb_misses, tma_info_frontend_dsb_coverage, tma_info_inst_mix_iptb, tma_lcp", 254 254 "ScaleUnit": "100%" 255 255 }, 256 256 { 257 257 "BriefDescription": "This metric roughly estimates the fraction of cycles where the Data TLB (DTLB) was missed by load accesses", 258 258 "MetricExpr": "min(7 * cpu@DTLB_LOAD_MISSES.STLB_HIT\\,cmask\\=1@ + DTLB_LOAD_MISSES.WALK_ACTIVE, max(CYCLE_ACTIVITY.CYCLES_MEM_ANY - CYCLE_ACTIVITY.CYCLES_L1D_MISS, 0)) / tma_info_thread_clks", 259 - "MetricGroup": "MemoryTLB;TopdownL4;tma_L4_group;tma_issueTLB;tma_l1_bound_group", 259 + "MetricGroup": "BvMT;MemoryTLB;TopdownL4;tma_L4_group;tma_issueTLB;tma_l1_bound_group", 260 260 "MetricName": "tma_dtlb_load", 261 261 "MetricThreshold": "tma_dtlb_load > 0.1 & (tma_l1_bound > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", 262 262 "PublicDescription": "This metric roughly estimates the fraction of cycles where the Data TLB (DTLB) was missed by load accesses. TLBs (Translation Look-aside Buffers) are processor caches for recently used entries out of the Page Tables that are used to map virtual- to physical-addresses by the operating system. This metric approximates the potential delay of demand loads missing the first-level data TLB (assuming worst case scenario with back to back misses to different pages). This includes hitting in the second-level TLB (STLB) as well as performing a hardware page walk on an STLB miss. Sample with: MEM_INST_RETIRED.STLB_MISS_LOADS_PS. 
Related metrics: tma_dtlb_store, tma_info_bottleneck_memory_data_tlbs, tma_info_bottleneck_memory_synchronization", ··· 265 265 { 266 266 "BriefDescription": "This metric roughly estimates the fraction of cycles spent handling first-level data TLB store misses", 267 267 "MetricExpr": "(7 * cpu@DTLB_STORE_MISSES.STLB_HIT\\,cmask\\=1@ + DTLB_STORE_MISSES.WALK_ACTIVE) / tma_info_core_core_clks", 268 - "MetricGroup": "MemoryTLB;TopdownL4;tma_L4_group;tma_issueTLB;tma_store_bound_group", 268 + "MetricGroup": "BvMT;MemoryTLB;TopdownL4;tma_L4_group;tma_issueTLB;tma_store_bound_group", 269 269 "MetricName": "tma_dtlb_store", 270 270 "MetricThreshold": "tma_dtlb_store > 0.05 & (tma_store_bound > 0.2 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", 271 271 "PublicDescription": "This metric roughly estimates the fraction of cycles spent handling first-level data TLB store misses. As with ordinary data caching; focus on improving data locality and reducing working-set size to reduce DTLB overhead. Additionally; consider using profile-guided optimization (PGO) to collocate frequently-used data on the same page. Try using larger page sizes for large amounts of frequently-used data. Sample with: MEM_INST_RETIRED.STLB_MISS_STORES_PS. 
Related metrics: tma_dtlb_load, tma_info_bottleneck_memory_data_tlbs, tma_info_bottleneck_memory_synchronization", ··· 274 274 { 275 275 "BriefDescription": "This metric roughly estimates how often CPU was handling synchronizations due to False Sharing", 276 276 "MetricExpr": "32.5 * tma_info_system_core_frequency * OCR.DEMAND_RFO.L3_HIT.SNOOP_HITM / tma_info_thread_clks", 277 - "MetricGroup": "DataSharing;Offcore;Snoop;TopdownL4;tma_L4_group;tma_issueSyncxn;tma_store_bound_group", 277 + "MetricGroup": "BvMS;DataSharing;Offcore;Snoop;TopdownL4;tma_L4_group;tma_issueSyncxn;tma_store_bound_group", 278 278 "MetricName": "tma_false_sharing", 279 279 "MetricThreshold": "tma_false_sharing > 0.05 & (tma_store_bound > 0.2 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", 280 280 "PublicDescription": "This metric roughly estimates how often CPU was handling synchronizations due to False Sharing. False Sharing is a multithreading hiccup; where multiple Logical Processors contend on different data-elements mapped into the same cache line. Sample with: OCR.DEMAND_RFO.L3_HIT.SNOOP_HITM. Related metrics: tma_contested_accesses, tma_data_sharing, tma_machine_clears, tma_remote_cache", ··· 283 283 { 284 284 "BriefDescription": "This metric does a *rough estimation* of how often L1D Fill Buffer unavailability limited additional L1D miss memory access requests to proceed", 285 285 "MetricExpr": "L1D_PEND_MISS.FB_FULL / tma_info_thread_clks", 286 - "MetricGroup": "MemoryBW;TopdownL4;tma_L4_group;tma_issueBW;tma_issueSL;tma_issueSmSt;tma_l1_bound_group", 286 + "MetricGroup": "BvMS;MemoryBW;TopdownL4;tma_L4_group;tma_issueBW;tma_issueSL;tma_issueSmSt;tma_l1_bound_group", 287 287 "MetricName": "tma_fb_full", 288 288 "MetricThreshold": "tma_fb_full > 0.3", 289 289 "PublicDescription": "This metric does a *rough estimation* of how often L1D Fill Buffer unavailability limited additional L1D miss memory access requests to proceed. 
The higher the metric value; the deeper the memory hierarchy level the misses are satisfied from (metric values >1 are valid). Often it hints on approaching bandwidth limits (to L2 cache; L3 cache or external memory). Related metrics: tma_info_bottleneck_cache_memory_bandwidth, tma_info_system_dram_bw_use, tma_mem_bandwidth, tma_sq_full, tma_store_latency, tma_streaming_stores", ··· 296 296 "MetricName": "tma_fetch_bandwidth", 297 297 "MetricThreshold": "tma_fetch_bandwidth > 0.2", 298 298 "MetricgroupNoGroup": "TopdownL2", 299 - "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues. For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Sample with: FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_2_PS. Related metrics: tma_dsb_switches, tma_info_botlnk_l2_dsb_misses, tma_info_frontend_dsb_coverage, tma_info_inst_mix_iptb, tma_lcp", 299 + "PublicDescription": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues. For example; inefficiencies at the instruction decoders; or restrictions for caching in the DSB (decoded uops cache) are categorized under Fetch Bandwidth. In such cases; the Frontend typically delivers suboptimal amount of uops to the Backend. Sample with: FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_1_PS;FRONTEND_RETIRED.LATENCY_GE_2_PS. 
Related metrics: tma_dsb_switches, tma_info_botlnk_l2_dsb_bandwidth, tma_info_botlnk_l2_dsb_misses, tma_info_frontend_dsb_coverage, tma_info_inst_mix_iptb, tma_lcp", 300 300 "ScaleUnit": "100%" 301 301 }, 302 302 { ··· 338 338 }, 339 339 { 340 340 "BriefDescription": "This metric approximates arithmetic floating-point (FP) scalar uops fraction the CPU has retired", 341 - "MetricExpr": "cpu@FP_ARITH_INST_RETIRED.SCALAR_SINGLE\\,umask\\=0x03@ / (tma_retiring * tma_info_thread_slots)", 341 + "MetricExpr": "FP_ARITH_INST_RETIRED.SCALAR / (tma_retiring * tma_info_thread_slots)", 342 342 "MetricGroup": "Compute;Flops;TopdownL4;tma_L4_group;tma_fp_arith_group;tma_issue2P", 343 343 "MetricName": "tma_fp_scalar", 344 344 "MetricThreshold": "tma_fp_scalar > 0.1 & (tma_fp_arith > 0.2 & tma_light_operations > 0.6)", ··· 347 347 }, 348 348 { 349 349 "BriefDescription": "This metric approximates arithmetic floating-point (FP) vector uops fraction the CPU has retired aggregated across all vector widths", 350 - "MetricExpr": "cpu@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE\\,umask\\=0xfc@ / (tma_retiring * tma_info_thread_slots)", 350 + "MetricExpr": "FP_ARITH_INST_RETIRED.VECTOR / (tma_retiring * tma_info_thread_slots)", 351 351 "MetricGroup": "Compute;Flops;TopdownL4;tma_L4_group;tma_fp_arith_group;tma_issue2P", 352 352 "MetricName": "tma_fp_vector", 353 353 "MetricThreshold": "tma_fp_vector > 0.1 & (tma_fp_arith > 0.2 & tma_light_operations > 0.6)", ··· 385 385 "BriefDescription": "This category represents fraction of slots where the processor's Frontend undersupplies its Backend", 386 386 "DefaultMetricgroupName": "TopdownL1", 387 387 "MetricExpr": "topdown\\-fe\\-bound / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) - INT_MISC.UOP_DROPPING / tma_info_thread_slots", 388 - "MetricGroup": "Default;PGO;TmaL1;TopdownL1;tma_L1_group", 388 + "MetricGroup": "BvFB;BvIO;Default;PGO;TmaL1;TopdownL1;tma_L1_group", 389 389 "MetricName": 
"tma_frontend_bound", 390 390 "MetricThreshold": "tma_frontend_bound > 0.15", 391 391 "MetricgroupNoGroup": "TopdownL1;Default", ··· 405 405 { 406 406 "BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to instruction cache misses", 407 407 "MetricExpr": "ICACHE_DATA.STALLS / tma_info_thread_clks", 408 - "MetricGroup": "BigFootprint;FetchLat;IcMiss;TopdownL3;tma_L3_group;tma_fetch_latency_group", 408 + "MetricGroup": "BigFootprint;BvBC;FetchLat;IcMiss;TopdownL3;tma_L3_group;tma_fetch_latency_group", 409 409 "MetricName": "tma_icache_misses", 410 410 "MetricThreshold": "tma_icache_misses > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15)", 411 411 "PublicDescription": "This metric represents fraction of cycles the CPU was stalled due to instruction cache misses. Sample with: FRONTEND_RETIRED.L2_MISS_PS;FRONTEND_RETIRED.L1I_MISS_PS", ··· 462 462 }, 463 463 { 464 464 "BriefDescription": "Probability of Core Bound bottleneck hidden by SMT-profiling artifacts", 465 + "MetricExpr": "tma_info_botlnk_l0_core_bound_likely", 466 + "MetricGroup": "Cor;Metric;SMT", 467 + "MetricName": "tma_info_botlnk_core_bound_likely", 468 + "MetricThreshold": "tma_info_botlnk_core_bound_likely > 0.5" 469 + }, 470 + { 471 + "BriefDescription": "Total pipeline cost of DSB (uop cache) misses - subset of the Instruction_Fetch_BW Bottleneck.", 472 + "MetricExpr": "100 * (tma_fetch_latency * tma_dsb_switches / (tma_icache_misses + tma_itlb_misses + tma_branch_resteers + tma_ms_switches + tma_lcp + tma_dsb_switches) + tma_fetch_bandwidth * tma_mite / (tma_mite + tma_dsb + tma_lsd))", 473 + "MetricGroup": "DSBmiss;Fed;Scaled_Slots;tma_issueFB", 474 + "MetricName": "tma_info_botlnk_dsb_misses", 475 + "MetricThreshold": "tma_info_botlnk_dsb_misses > 10" 476 + }, 477 + { 478 + "BriefDescription": "Total pipeline cost of Instruction Cache misses - subset of the Big_Code Bottleneck.", 479 + "MetricExpr": "100 * (tma_fetch_latency * tma_icache_misses / 
(tma_icache_misses + tma_itlb_misses + tma_branch_resteers + tma_ms_switches + tma_lcp + tma_dsb_switches))", 480 + "MetricGroup": "Fed;FetchLat;IcMiss;Scaled_Slots;tma_issueFL", 481 + "MetricName": "tma_info_botlnk_ic_misses", 482 + "MetricThreshold": "tma_info_botlnk_ic_misses > 5" 483 + }, 484 + { 485 + "BriefDescription": "Probability of Core Bound bottleneck hidden by SMT-profiling artifacts", 465 486 "MetricConstraint": "NO_GROUP_EVENTS", 466 487 "MetricExpr": "(100 * (1 - tma_core_bound / tma_ports_utilization if tma_core_bound < tma_ports_utilization else 1) if tma_info_system_smt_2t_utilization > 0.5 else 0)", 467 488 "MetricGroup": "Cor;SMT", 468 489 "MetricName": "tma_info_botlnk_l0_core_bound_likely", 469 490 "MetricThreshold": "tma_info_botlnk_l0_core_bound_likely > 0.5" 491 + }, 492 + { 493 + "BriefDescription": "Total pipeline cost of DSB (uop cache) hits - subset of the Instruction_Fetch_BW Bottleneck", 494 + "MetricExpr": "100 * (tma_frontend_bound * (tma_fetch_bandwidth / (tma_fetch_bandwidth + tma_fetch_latency)) * (tma_dsb / (tma_dsb + tma_lsd + tma_mite)))", 495 + "MetricGroup": "DSB;FetchBW;tma_issueFB", 496 + "MetricName": "tma_info_botlnk_l2_dsb_bandwidth", 497 + "MetricThreshold": "tma_info_botlnk_l2_dsb_bandwidth > 10", 498 + "PublicDescription": "Total pipeline cost of DSB (uop cache) hits - subset of the Instruction_Fetch_BW Bottleneck. Related metrics: tma_dsb_switches, tma_fetch_bandwidth, tma_info_botlnk_l2_dsb_misses, tma_info_frontend_dsb_coverage, tma_info_inst_mix_iptb, tma_lcp" 470 499 }, 471 500 { 472 501 "BriefDescription": "Total pipeline cost of DSB (uop cache) misses - subset of the Instruction_Fetch_BW Bottleneck", ··· 504 475 "MetricGroup": "DSBmiss;Fed;tma_issueFB", 505 476 "MetricName": "tma_info_botlnk_l2_dsb_misses", 506 477 "MetricThreshold": "tma_info_botlnk_l2_dsb_misses > 10", 507 - "PublicDescription": "Total pipeline cost of DSB (uop cache) misses - subset of the Instruction_Fetch_BW Bottleneck. 
Related metrics: tma_dsb_switches, tma_fetch_bandwidth, tma_info_frontend_dsb_coverage, tma_info_inst_mix_iptb, tma_lcp" 478 + "PublicDescription": "Total pipeline cost of DSB (uop cache) misses - subset of the Instruction_Fetch_BW Bottleneck. Related metrics: tma_dsb_switches, tma_fetch_bandwidth, tma_info_botlnk_l2_dsb_bandwidth, tma_info_frontend_dsb_coverage, tma_info_inst_mix_iptb, tma_lcp" 508 479 }, 509 480 { 510 481 "BriefDescription": "Total pipeline cost of Instruction Cache misses - subset of the Big_Code Bottleneck", ··· 516 487 "PublicDescription": "Total pipeline cost of Instruction Cache misses - subset of the Big_Code Bottleneck. Related metrics: " 517 488 }, 518 489 { 519 - "BriefDescription": "Total pipeline cost of \"useful operations\" - the baseline operations not covered by Branching_Overhead nor Irregular_Overhead.", 520 - "MetricExpr": "100 * (tma_retiring - (BR_INST_RETIRED.ALL_BRANCHES + BR_INST_RETIRED.NEAR_CALL) / tma_info_thread_slots - tma_microcode_sequencer / (tma_few_uops_instructions + tma_microcode_sequencer) * (tma_assists / tma_microcode_sequencer) * tma_heavy_operations)", 521 - "MetricGroup": "Ret", 522 - "MetricName": "tma_info_bottleneck_base_non_br", 523 - "MetricThreshold": "tma_info_bottleneck_base_non_br > 20" 524 - }, 525 - { 526 490 "BriefDescription": "Total pipeline cost of instruction fetch related bottlenecks by large code footprint programs (i-side cache; TLB and BTB misses)", 527 491 "MetricConstraint": "NO_GROUP_EVENTS", 528 492 "MetricExpr": "100 * tma_fetch_latency * (tma_itlb_misses + tma_icache_misses + tma_unknown_branches) / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)", 529 - "MetricGroup": "BigFootprint;Fed;Frontend;IcMiss;MemoryTLB", 493 + "MetricGroup": "BigFootprint;BvBC;Fed;Frontend;IcMiss;MemoryTLB", 530 494 "MetricName": "tma_info_bottleneck_big_code", 531 495 "MetricThreshold": "tma_info_bottleneck_big_code > 20" 532 496 }, 533 497 { 
534 - "BriefDescription": "Total pipeline cost of branch related instructions (used for program control-flow including function calls)", 535 - "MetricExpr": "100 * ((BR_INST_RETIRED.ALL_BRANCHES + BR_INST_RETIRED.NEAR_CALL) / tma_info_thread_slots)", 536 - "MetricGroup": "Ret", 498 + "BriefDescription": "Total pipeline cost of instructions used for program control-flow - a subset of the Retiring category in TMA", 499 + "MetricExpr": "100 * ((BR_INST_RETIRED.ALL_BRANCHES + 2 * BR_INST_RETIRED.NEAR_CALL + INST_RETIRED.NOP) / tma_info_thread_slots)", 500 + "MetricGroup": "BvBO;Ret", 537 501 "MetricName": "tma_info_bottleneck_branching_overhead", 538 - "MetricThreshold": "tma_info_bottleneck_branching_overhead > 5" 502 + "MetricThreshold": "tma_info_bottleneck_branching_overhead > 5", 503 + "PublicDescription": "Total pipeline cost of instructions used for program control-flow - a subset of the Retiring category in TMA. Examples include function calls; loops and alignments. (A lower bound)" 539 504 }, 540 505 { 541 506 "BriefDescription": "Total pipeline cost of external Memory- or Cache-Bandwidth related bottlenecks", 542 - "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_bound * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_sq_full / (tma_contested_accesses + tma_data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_fb_full / (tma_4k_aliasing + tma_dtlb_load + tma_fb_full + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)))", 543 - "MetricGroup": "Mem;MemoryBW;Offcore;tma_issueBW", 507 + "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + 
tma_store_bound)) * (tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_bound * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_sq_full / (tma_contested_accesses + tma_data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_fb_full / (tma_4k_aliasing + tma_dtlb_load + tma_fb_full + tma_l1_hit_latency + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)))", 508 + "MetricGroup": "BvMB;Mem;MemoryBW;Offcore;tma_issueBW", 544 509 "MetricName": "tma_info_bottleneck_cache_memory_bandwidth", 545 510 "MetricThreshold": "tma_info_bottleneck_cache_memory_bandwidth > 20", 546 511 "PublicDescription": "Total pipeline cost of external Memory- or Cache-Bandwidth related bottlenecks. Related metrics: tma_fb_full, tma_info_system_dram_bw_use, tma_mem_bandwidth, tma_sq_full" 547 512 }, 548 513 { 549 514 "BriefDescription": "Total pipeline cost of external Memory- or Cache-Latency related bottlenecks", 550 - "MetricExpr": "100 * (tma_memory_bound * (tma_dram_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_bound * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_l3_hit_latency / (tma_contested_accesses + tma_data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound * tma_l2_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) + tma_memory_bound * (tma_store_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_store_latency / (tma_dtlb_store + tma_false_sharing + tma_split_stores + tma_store_latency + tma_streaming_stores)))", 551 - "MetricGroup": "Mem;MemoryLat;Offcore;tma_issueLat", 515 + "MetricExpr": "100 * (tma_memory_bound * 
(tma_dram_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) + tma_memory_bound * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_l3_hit_latency / (tma_contested_accesses + tma_data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_memory_bound * tma_l2_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) + tma_memory_bound * (tma_store_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_store_latency / (tma_dtlb_store + tma_false_sharing + tma_split_stores + tma_store_latency + tma_streaming_stores)) + tma_memory_bound * (tma_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_l1_hit_latency / (tma_4k_aliasing + tma_dtlb_load + tma_fb_full + tma_l1_hit_latency + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)))", 516 + "MetricGroup": "BvML;Mem;MemoryLat;Offcore;tma_issueLat", 552 517 "MetricName": "tma_info_bottleneck_cache_memory_latency", 553 518 "MetricThreshold": "tma_info_bottleneck_cache_memory_latency > 20", 554 519 "PublicDescription": "Total pipeline cost of external Memory- or Cache-Latency related bottlenecks. 
Related metrics: tma_l3_hit_latency, tma_mem_latency" ··· 550 527 { 551 528 "BriefDescription": "Total pipeline cost when the execution is compute-bound - an estimation", 552 529 "MetricExpr": "100 * (tma_core_bound * tma_divider / (tma_divider + tma_ports_utilization + tma_serializing_operation) + tma_core_bound * (tma_ports_utilization / (tma_divider + tma_ports_utilization + tma_serializing_operation)) * (tma_ports_utilized_3m / (tma_ports_utilized_0 + tma_ports_utilized_1 + tma_ports_utilized_2 + tma_ports_utilized_3m)))", 553 - "MetricGroup": "Cor;tma_issueComp", 530 + "MetricGroup": "BvCB;Cor;tma_issueComp", 554 531 "MetricName": "tma_info_bottleneck_compute_bound_est", 555 532 "MetricThreshold": "tma_info_bottleneck_compute_bound_est > 20", 556 533 "PublicDescription": "Total pipeline cost when the execution is compute-bound - an estimation. Covers Core Bound when High ILP as well as when long-latency execution units are busy. Related metrics: " 557 534 }, 558 535 { 559 - "BriefDescription": "Total pipeline cost of instruction fetch bandwidth related bottlenecks", 536 + "BriefDescription": "Total pipeline cost of instruction fetch bandwidth related bottlenecks (when the front-end could not sustain operations delivery to the back-end)", 560 537 "MetricConstraint": "NO_GROUP_EVENTS", 561 538 "MetricExpr": "100 * (tma_frontend_bound - (1 - 10 * tma_microcode_sequencer * tma_other_mispredicts / tma_branch_mispredicts) * tma_fetch_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) - tma_microcode_sequencer / (tma_few_uops_instructions + tma_microcode_sequencer) * (tma_assists / tma_microcode_sequencer) * tma_fetch_latency * (tma_ms_switches + tma_branch_resteers * (tma_clears_resteers + tma_mispredicts_resteers * (10 * tma_microcode_sequencer * tma_other_mispredicts / tma_branch_mispredicts)) / (tma_clears_resteers + tma_mispredicts_resteers + tma_unknown_branches)) / 
(tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)) - tma_info_bottleneck_big_code", 562 - "MetricGroup": "Fed;FetchBW;Frontend", 539 + "MetricGroup": "BvFB;Fed;FetchBW;Frontend", 563 540 "MetricName": "tma_info_bottleneck_instruction_fetch_bw", 564 541 "MetricThreshold": "tma_info_bottleneck_instruction_fetch_bw > 20" 565 542 }, 566 543 { 567 544 "BriefDescription": "Total pipeline cost of irregular execution (e.g", 568 545 "MetricExpr": "100 * (tma_microcode_sequencer / (tma_few_uops_instructions + tma_microcode_sequencer) * (tma_assists / tma_microcode_sequencer) * tma_fetch_latency * (tma_ms_switches + tma_branch_resteers * (tma_clears_resteers + tma_mispredicts_resteers * (10 * tma_microcode_sequencer * tma_other_mispredicts / tma_branch_mispredicts)) / (tma_clears_resteers + tma_mispredicts_resteers + tma_unknown_branches)) / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) + 10 * tma_microcode_sequencer * tma_other_mispredicts / tma_branch_mispredicts * tma_branch_mispredicts + tma_machine_clears * tma_other_nukes / tma_other_nukes + tma_core_bound * (tma_serializing_operation + tma_core_bound * RS_EVENTS.EMPTY_CYCLES / tma_info_thread_clks * tma_ports_utilized_0) / (tma_divider + tma_ports_utilization + tma_serializing_operation) + tma_microcode_sequencer / (tma_few_uops_instructions + tma_microcode_sequencer) * (tma_assists / tma_microcode_sequencer) * tma_heavy_operations)", 569 - "MetricGroup": "Bad;Cor;Ret;tma_issueMS", 546 + "MetricGroup": "Bad;BvIO;Cor;Ret;tma_issueMS", 570 547 "MetricName": "tma_info_bottleneck_irregular_overhead", 571 548 "MetricThreshold": "tma_info_bottleneck_irregular_overhead > 10", 572 549 "PublicDescription": "Total pipeline cost of irregular execution (e.g. FP-assists in HPC, Wait time with work imbalance multithreaded workloads, overhead in system services or virtualized environments). 
Related metrics: tma_microcode_sequencer, tma_ms_switches" ··· 574 551 { 575 552 "BriefDescription": "Total pipeline cost of Memory Address Translation related bottlenecks (data-side TLBs)", 576 553 "MetricConstraint": "NO_GROUP_EVENTS", 577 - "MetricExpr": "100 * (tma_memory_bound * (tma_l1_bound / max(tma_memory_bound, tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_dtlb_load / max(tma_l1_bound, tma_4k_aliasing + tma_dtlb_load + tma_fb_full + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)) + tma_memory_bound * (tma_store_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_dtlb_store / (tma_dtlb_store + tma_false_sharing + tma_split_stores + tma_store_latency + tma_streaming_stores)))", 578 - "MetricGroup": "Mem;MemoryTLB;Offcore;tma_issueTLB", 554 + "MetricExpr": "100 * (tma_memory_bound * (tma_l1_bound / max(tma_memory_bound, tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_dtlb_load / max(tma_l1_bound, tma_4k_aliasing + tma_dtlb_load + tma_fb_full + tma_l1_hit_latency + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)) + tma_memory_bound * (tma_store_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound)) * (tma_dtlb_store / (tma_dtlb_store + tma_false_sharing + tma_split_stores + tma_store_latency + tma_streaming_stores)))", 555 + "MetricGroup": "BvMT;Mem;MemoryTLB;Offcore;tma_issueTLB", 579 556 "MetricName": "tma_info_bottleneck_memory_data_tlbs", 580 557 "MetricThreshold": "tma_info_bottleneck_memory_data_tlbs > 20", 581 558 "PublicDescription": "Total pipeline cost of Memory Address Translation related bottlenecks (data-side TLBs). 
Related metrics: tma_dtlb_load, tma_dtlb_store, tma_info_bottleneck_memory_synchronization" ··· 583 560 { 584 561 "BriefDescription": "Total pipeline cost of Memory Synchronization related bottlenecks (data transfers and coherency updates across processors)", 585 562 "MetricExpr": "100 * (tma_memory_bound * (tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_contested_accesses + tma_data_sharing) / (tma_contested_accesses + tma_data_sharing + tma_l3_hit_latency + tma_sq_full) + tma_store_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * tma_false_sharing / (tma_dtlb_store + tma_false_sharing + tma_split_stores + tma_store_latency + tma_streaming_stores - tma_store_latency)) + tma_machine_clears * (1 - tma_other_nukes / tma_other_nukes))", 586 - "MetricGroup": "Mem;Offcore;tma_issueTLB", 563 + "MetricGroup": "BvMS;Mem;Offcore;tma_issueTLB", 587 564 "MetricName": "tma_info_bottleneck_memory_synchronization", 588 565 "MetricThreshold": "tma_info_bottleneck_memory_synchronization > 10", 589 566 "PublicDescription": "Total pipeline cost of Memory Synchronization related bottlenecks (data transfers and coherency updates across processors). 
Related metrics: tma_dtlb_load, tma_dtlb_store, tma_info_bottleneck_memory_data_tlbs" ··· 592 569 "BriefDescription": "Total pipeline cost of Branch Misprediction related bottlenecks", 593 570 "MetricConstraint": "NO_GROUP_EVENTS", 594 571 "MetricExpr": "100 * (1 - 10 * tma_microcode_sequencer * tma_other_mispredicts / tma_branch_mispredicts) * (tma_branch_mispredicts + tma_fetch_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches))", 595 - "MetricGroup": "Bad;BadSpec;BrMispredicts;tma_issueBM", 572 + "MetricGroup": "Bad;BadSpec;BrMispredicts;BvMP;tma_issueBM", 596 573 "MetricName": "tma_info_bottleneck_mispredictions", 597 574 "MetricThreshold": "tma_info_bottleneck_mispredictions > 20", 598 575 "PublicDescription": "Total pipeline cost of Branch Misprediction related bottlenecks. Related metrics: tma_branch_mispredicts, tma_info_bad_spec_branch_misprediction_cost, tma_mispredicts_resteers" 599 576 }, 600 577 { 601 - "BriefDescription": "Total pipeline cost of remaining bottlenecks (apart from those listed in the Info.Bottlenecks metrics class)", 602 - "MetricExpr": "100 - (tma_info_bottleneck_big_code + tma_info_bottleneck_instruction_fetch_bw + tma_info_bottleneck_mispredictions + tma_info_bottleneck_cache_memory_bandwidth + tma_info_bottleneck_cache_memory_latency + tma_info_bottleneck_memory_data_tlbs + tma_info_bottleneck_memory_synchronization + tma_info_bottleneck_compute_bound_est + tma_info_bottleneck_irregular_overhead + tma_info_bottleneck_branching_overhead + tma_info_bottleneck_base_non_br)", 603 - "MetricGroup": "Cor;Offcore", 578 + "BriefDescription": "Total pipeline cost of remaining bottlenecks in the back-end", 579 + "MetricExpr": "100 - (tma_info_bottleneck_big_code + tma_info_bottleneck_instruction_fetch_bw + tma_info_bottleneck_mispredictions + tma_info_bottleneck_cache_memory_bandwidth + tma_info_bottleneck_cache_memory_latency + 
tma_info_bottleneck_memory_data_tlbs + tma_info_bottleneck_memory_synchronization + tma_info_bottleneck_compute_bound_est + tma_info_bottleneck_irregular_overhead + tma_info_bottleneck_branching_overhead + tma_info_bottleneck_useful_work)", 580 + "MetricGroup": "BvOB;Cor;Offcore", 604 581 "MetricName": "tma_info_bottleneck_other_bottlenecks", 605 582 "MetricThreshold": "tma_info_bottleneck_other_bottlenecks > 20", 606 - "PublicDescription": "Total pipeline cost of remaining bottlenecks (apart from those listed in the Info.Bottlenecks metrics class). Examples include data-dependencies (Core Bound when Low ILP) and other unlisted memory-related stalls." 583 + "PublicDescription": "Total pipeline cost of remaining bottlenecks in the back-end. Examples include data-dependencies (Core Bound when Low ILP) and other unlisted memory-related stalls." 584 + }, 585 + { 586 + "BriefDescription": "Total pipeline cost of \"useful operations\" - the portion of Retiring category not covered by Branching_Overhead nor Irregular_Overhead.", 587 + "MetricExpr": "100 * (tma_retiring - (BR_INST_RETIRED.ALL_BRANCHES + 2 * BR_INST_RETIRED.NEAR_CALL + INST_RETIRED.NOP) / tma_info_thread_slots - tma_microcode_sequencer / (tma_few_uops_instructions + tma_microcode_sequencer) * (tma_assists / tma_microcode_sequencer) * tma_heavy_operations)", 588 + "MetricGroup": "BvUW;Ret", 589 + "MetricName": "tma_info_bottleneck_useful_work", 590 + "MetricThreshold": "tma_info_bottleneck_useful_work > 20" 607 591 }, 608 592 { 609 593 "BriefDescription": "Fraction of branches that are CALL or RET", ··· 668 638 }, 669 639 { 670 640 "BriefDescription": "Actual per-core usage of the Floating Point non-X87 execution units (regardless of precision or vector-width)", 671 - "MetricExpr": "(cpu@FP_ARITH_INST_RETIRED.SCALAR_SINGLE\\,umask\\=0x03@ + cpu@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE\\,umask\\=0xfc@) / (2 * tma_info_core_core_clks)", 641 + "MetricExpr": "(FP_ARITH_INST_RETIRED.SCALAR + 
FP_ARITH_INST_RETIRED.VECTOR) / (2 * tma_info_core_core_clks)", 672 642 "MetricGroup": "Cor;Flops;HPC", 673 643 "MetricName": "tma_info_core_fp_arith_utilization", 674 644 "PublicDescription": "Actual per-core usage of the Floating Point non-X87 execution units (regardless of precision or vector-width). Values > 1 are possible due to ([BDW+] Fused-Multiply Add (FMA) counting - common; [ADL+] use all of ADD/MUL/FMA in Scalar or 128/256-bit vectors - less common)." ··· 685 655 "MetricGroup": "DSB;Fed;FetchBW;tma_issueFB", 686 656 "MetricName": "tma_info_frontend_dsb_coverage", 687 657 "MetricThreshold": "tma_info_frontend_dsb_coverage < 0.7 & tma_info_thread_ipc / 5 > 0.35", 688 - "PublicDescription": "Fraction of Uops delivered by the DSB (aka Decoded ICache; or Uop Cache). Related metrics: tma_dsb_switches, tma_fetch_bandwidth, tma_info_botlnk_l2_dsb_misses, tma_info_inst_mix_iptb, tma_lcp" 658 + "PublicDescription": "Fraction of Uops delivered by the DSB (aka Decoded ICache; or Uop Cache). 
Related metrics: tma_dsb_switches, tma_fetch_bandwidth, tma_info_botlnk_l2_dsb_bandwidth, tma_info_botlnk_l2_dsb_misses, tma_info_inst_mix_iptb, tma_lcp" 689 659 }, 690 660 { 691 661 "BriefDescription": "Average number of cycles of a switch from the DSB fetch-unit to MITE fetch unit - see DSB_Switches tree node for details.", ··· 751 721 }, 752 722 { 753 723 "BriefDescription": "Instructions per FP Arithmetic instruction (lower number means higher occurrence rate)", 754 - "MetricExpr": "INST_RETIRED.ANY / (cpu@FP_ARITH_INST_RETIRED.SCALAR_SINGLE\\,umask\\=0x03@ + cpu@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE\\,umask\\=0xfc@)", 724 + "MetricExpr": "INST_RETIRED.ANY / (FP_ARITH_INST_RETIRED.SCALAR + FP_ARITH_INST_RETIRED.VECTOR)", 755 725 "MetricGroup": "Flops;InsType", 756 726 "MetricName": "tma_info_inst_mix_iparith", 757 727 "MetricThreshold": "tma_info_inst_mix_iparith < 10", ··· 846 816 "MetricThreshold": "tma_info_inst_mix_ipswpf < 100" 847 817 }, 848 818 { 849 - "BriefDescription": "Instruction per taken branch", 819 + "BriefDescription": "Instructions per taken branch", 850 820 "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN", 851 821 "MetricGroup": "Branches;Fed;FetchBW;Frontend;PGO;tma_issueFB", 852 822 "MetricName": "tma_info_inst_mix_iptb", 853 823 "MetricThreshold": "tma_info_inst_mix_iptb < 11", 854 - "PublicDescription": "Instruction per taken branch. Related metrics: tma_dsb_switches, tma_fetch_bandwidth, tma_info_botlnk_l2_dsb_misses, tma_info_frontend_dsb_coverage, tma_lcp" 824 + "PublicDescription": "Instructions per taken branch. 
Related metrics: tma_dsb_switches, tma_fetch_bandwidth, tma_info_botlnk_l2_dsb_bandwidth, tma_info_botlnk_l2_dsb_misses, tma_info_frontend_dsb_coverage, tma_lcp" 825 + }, 826 + { 827 + "BriefDescription": "\"Bus lock\" per kilo instruction", 828 + "MetricExpr": "tma_info_memory_mix_bus_lock_pki", 829 + "MetricGroup": "Mem;Metric", 830 + "MetricName": "tma_info_memory_bus_lock_pki" 831 + }, 832 + { 833 + "BriefDescription": "STLB (2nd level TLB) code speculative misses per kilo instruction (misses of any page-size that complete the page walk)", 834 + "MetricExpr": "tma_info_memory_tlb_code_stlb_mpki", 835 + "MetricGroup": "Fed;MemoryTLB;Metric", 836 + "MetricName": "tma_info_memory_code_stlb_mpki" 855 837 }, 856 838 { 857 839 "BriefDescription": "Average per-core data fill bandwidth to the L1 data cache [GB / sec]", ··· 890 848 "MetricName": "tma_info_memory_core_l3_cache_fill_bw_2t" 891 849 }, 892 850 { 851 + "BriefDescription": "Average Parallel L2 cache miss data reads", 852 + "MetricExpr": "tma_info_memory_latency_data_l2_mlp", 853 + "MetricGroup": "Memory_BW;Metric;Offcore", 854 + "MetricName": "tma_info_memory_data_l2_mlp" 855 + }, 856 + { 893 857 "BriefDescription": "Fill Buffer (FB) hits per kilo instructions for retired demand loads (L1D misses that merge into ongoing miss-handling entries)", 894 858 "MetricExpr": "1e3 * MEM_LOAD_RETIRED.FB_HIT / INST_RETIRED.ANY", 895 859 "MetricGroup": "CacheHits;Mem", 896 860 "MetricName": "tma_info_memory_fb_hpki" 897 861 }, 898 862 { 899 - "BriefDescription": "", 863 + "BriefDescription": "Average per-thread data fill bandwidth to the L1 data cache [GB / sec]", 900 864 "MetricExpr": "64 * L1D.REPLACEMENT / 1e9 / duration_time", 901 865 "MetricGroup": "Mem;MemoryBW", 902 866 "MetricName": "tma_info_memory_l1d_cache_fill_bw" 867 + }, 868 + { 869 + "BriefDescription": "Average per-core data fill bandwidth to the L1 data cache [GB / sec]", 870 + "MetricExpr": "tma_info_memory_l1d_cache_fill_bw", 871 + "MetricGroup": 
"Core_Metric;Mem;MemoryBW", 872 + "MetricName": "tma_info_memory_l1d_cache_fill_bw_2t" 903 873 }, 904 874 { 905 875 "BriefDescription": "L1 cache true misses per kilo instruction for retired demand loads", ··· 926 872 "MetricName": "tma_info_memory_l1mpki_load" 927 873 }, 928 874 { 929 - "BriefDescription": "", 875 + "BriefDescription": "Average per-thread data fill bandwidth to the L2 cache [GB / sec]", 930 876 "MetricExpr": "64 * L2_LINES_IN.ALL / 1e9 / duration_time", 931 877 "MetricGroup": "Mem;MemoryBW", 932 878 "MetricName": "tma_info_memory_l2_cache_fill_bw" 879 + }, 880 + { 881 + "BriefDescription": "Average per-core data fill bandwidth to the L2 cache [GB / sec]", 882 + "MetricExpr": "tma_info_memory_l2_cache_fill_bw", 883 + "MetricGroup": "Core_Metric;Mem;MemoryBW", 884 + "MetricName": "tma_info_memory_l2_cache_fill_bw_2t" 933 885 }, 934 886 { 935 887 "BriefDescription": "L2 cache hits per kilo instruction for all request types (including speculative)", ··· 968 908 "MetricName": "tma_info_memory_l2mpki_load" 969 909 }, 970 910 { 971 - "BriefDescription": "", 911 + "BriefDescription": "Offcore requests (L2 cache miss) per kilo instruction for demand RFOs", 912 + "MetricExpr": "1e3 * L2_RQSTS.RFO_MISS / INST_RETIRED.ANY", 913 + "MetricGroup": "CacheMisses;Offcore", 914 + "MetricName": "tma_info_memory_l2mpki_rfo" 915 + }, 916 + { 917 + "BriefDescription": "Average per-thread data access bandwidth to the L3 cache [GB / sec]", 972 918 "MetricExpr": "64 * OFFCORE_REQUESTS.ALL_REQUESTS / 1e9 / duration_time", 973 919 "MetricGroup": "Mem;MemoryBW;Offcore", 974 920 "MetricName": "tma_info_memory_l3_cache_access_bw" 975 921 }, 976 922 { 977 - "BriefDescription": "", 923 + "BriefDescription": "Average per-core data access bandwidth to the L3 cache [GB / sec]", 924 + "MetricExpr": "tma_info_memory_l3_cache_access_bw", 925 + "MetricGroup": "Core_Metric;Mem;MemoryBW;Offcore", 926 + "MetricName": "tma_info_memory_l3_cache_access_bw_2t" 927 + }, 928 + { 929 + 
"BriefDescription": "Average per-thread data fill bandwidth to the L3 cache [GB / sec]", 978 930 "MetricExpr": "64 * LONGEST_LAT_CACHE.MISS / 1e9 / duration_time", 979 931 "MetricGroup": "Mem;MemoryBW", 980 932 "MetricName": "tma_info_memory_l3_cache_fill_bw" 933 + }, 934 + { 935 + "BriefDescription": "Average per-core data fill bandwidth to the L3 cache [GB / sec]", 936 + "MetricExpr": "tma_info_memory_l3_cache_fill_bw", 937 + "MetricGroup": "Core_Metric;Mem;MemoryBW", 938 + "MetricName": "tma_info_memory_l3_cache_fill_bw_2t" 981 939 }, 982 940 { 983 941 "BriefDescription": "L3 cache true misses per kilo instruction for retired demand loads", ··· 1011 933 }, 1012 934 { 1013 935 "BriefDescription": "Average Latency for L2 cache miss demand Loads", 1014 - "MetricExpr": "OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD / OFFCORE_REQUESTS.DEMAND_DATA_RD", 936 + "MetricExpr": "tma_info_memory_load_l2_miss_latency", 1015 937 "MetricGroup": "Memory_Lat;Offcore", 1016 938 "MetricName": "tma_info_memory_latency_load_l2_miss_latency" 1017 939 }, ··· 1028 950 "MetricName": "tma_info_memory_latency_load_l3_miss_latency" 1029 951 }, 1030 952 { 953 + "BriefDescription": "Average Latency for L2 cache miss demand Loads", 954 + "MetricExpr": "OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD / OFFCORE_REQUESTS.DEMAND_DATA_RD", 955 + "MetricGroup": "Clocks_Latency;Memory_Lat;Offcore", 956 + "MetricName": "tma_info_memory_load_l2_miss_latency" 957 + }, 958 + { 959 + "BriefDescription": "Average Parallel L2 cache miss demand Loads", 960 + "MetricExpr": "OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD / cpu@OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD\\,cmask\\=0x1@", 961 + "MetricGroup": "Memory_BW;Metric;Offcore", 962 + "MetricName": "tma_info_memory_load_l2_mlp" 963 + }, 964 + { 965 + "BriefDescription": "Average Latency for L3 cache miss demand Loads", 966 + "MetricExpr": "cpu@OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD\\,umask\\=0x0@ / OFFCORE_REQUESTS.L3_MISS_DEMAND_DATA_RD", 967 + 
"MetricGroup": "Clocks_Latency;Memory_Lat;Offcore", 968 + "MetricName": "tma_info_memory_load_l3_miss_latency" 969 + }, 970 + { 1031 971 "BriefDescription": "Actual Average Latency for L1 data-cache miss demand load operations (in core cycles)", 1032 972 "MetricExpr": "L1D_PEND_MISS.PENDING / (MEM_LOAD_RETIRED.L1_MISS + MEM_LOAD_RETIRED.FB_HIT)", 1033 973 "MetricGroup": "Mem;MemoryBound;MemoryLat", 1034 974 "MetricName": "tma_info_memory_load_miss_real_latency" 975 + }, 976 + { 977 + "BriefDescription": "STLB (2nd level TLB) data load speculative misses per kilo instruction (misses of any page-size that complete the page walk)", 978 + "MetricExpr": "tma_info_memory_tlb_load_stlb_mpki", 979 + "MetricGroup": "Mem;MemoryTLB;Metric", 980 + "MetricName": "tma_info_memory_load_stlb_mpki" 1035 981 }, 1036 982 { 1037 983 "BriefDescription": "\"Bus lock\" per kilo instruction", ··· 1065 963 }, 1066 964 { 1067 965 "BriefDescription": "Un-cacheable retired load per kilo instruction", 1068 - "MetricExpr": "1e3 * MEM_LOAD_MISC_RETIRED.UC / INST_RETIRED.ANY", 966 + "MetricExpr": "tma_info_memory_uc_load_pki", 1069 967 "MetricGroup": "Mem", 1070 968 "MetricName": "tma_info_memory_mix_uc_load_pki" 1071 969 }, ··· 1075 973 "MetricGroup": "Mem;MemoryBW;MemoryBound", 1076 974 "MetricName": "tma_info_memory_mlp", 1077 975 "PublicDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there is at least one such miss. 
Per-Logical Processor)" 976 + }, 977 + { 978 + "BriefDescription": "Utilization of the core's Page Walker(s) serving STLB misses triggered by instruction/Load/Store accesses", 979 + "MetricExpr": "tma_info_memory_tlb_page_walks_utilization", 980 + "MetricGroup": "Core_Metric;Mem;MemoryTLB", 981 + "MetricName": "tma_info_memory_page_walks_utilization", 982 + "MetricThreshold": "tma_info_memory_page_walks_utilization > 0.5" 983 + }, 984 + { 985 + "BriefDescription": "STLB (2nd level TLB) data store speculative misses per kilo instruction (misses of any page-size that complete the page walk)", 986 + "MetricExpr": "tma_info_memory_tlb_store_stlb_mpki", 987 + "MetricGroup": "Mem;MemoryTLB;Metric", 988 + "MetricName": "tma_info_memory_store_stlb_mpki" 1078 989 }, 1079 990 { 1080 991 "BriefDescription": "STLB (2nd level TLB) code speculative misses per kilo instruction (misses of any page-size that complete the page walk)", ··· 1115 1000 "MetricName": "tma_info_memory_tlb_store_stlb_mpki" 1116 1001 }, 1117 1002 { 1118 - "BriefDescription": "", 1003 + "BriefDescription": "Un-cacheable retired load per kilo instruction", 1004 + "MetricExpr": "1e3 * MEM_LOAD_MISC_RETIRED.UC / INST_RETIRED.ANY", 1005 + "MetricGroup": "Mem;Metric", 1006 + "MetricName": "tma_info_memory_uc_load_pki" 1007 + }, 1008 + { 1009 + "BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is execution) per core", 1119 1010 "MetricExpr": "UOPS_EXECUTED.THREAD / (UOPS_EXECUTED.CORE_CYCLES_GE_1 / 2 if #SMT_on else cpu@UOPS_EXECUTED.THREAD\\,cmask\\=1@)", 1120 1011 "MetricGroup": "Cor;Pipeline;PortsUtil;SMT", 1121 1012 "MetricName": "tma_info_pipeline_execute" 1013 + }, 1014 + { 1015 + "BriefDescription": "Average number of uops fetched from DSB per cycle", 1016 + "MetricExpr": "IDQ.DSB_UOPS / IDQ.DSB_CYCLES_ANY", 1017 + "MetricGroup": "Fed;FetchBW", 1018 + "MetricName": "tma_info_pipeline_fetch_dsb" 1019 + }, 1020 + { 1021 + "BriefDescription": "Average number of uops 
fetched from LSD per cycle", 1022 + "MetricExpr": "LSD.UOPS / LSD.CYCLES_ACTIVE", 1023 + "MetricGroup": "Fed;FetchBW", 1024 + "MetricName": "tma_info_pipeline_fetch_lsd" 1025 + }, 1026 + { 1027 + "BriefDescription": "Average number of uops fetched from MITE per cycle", 1028 + "MetricExpr": "IDQ.MITE_UOPS / IDQ.MITE_CYCLES_ANY", 1029 + "MetricGroup": "Fed;FetchBW", 1030 + "MetricName": "tma_info_pipeline_fetch_mite" 1122 1031 }, 1123 1032 { 1124 1033 "BriefDescription": "Instructions per a microcode Assist invocation", ··· 1166 1027 }, 1167 1028 { 1168 1029 "BriefDescription": "Average CPU Utilization (percentage)", 1169 - "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / TSC", 1030 + "MetricExpr": "tma_info_system_cpus_utilized / #num_cpus_online", 1170 1031 "MetricGroup": "HPC;Summary", 1171 1032 "MetricName": "tma_info_system_cpu_utilization" 1172 1033 }, 1173 1034 { 1174 1035 "BriefDescription": "Average number of utilized CPUs", 1175 - "MetricExpr": "#num_cpus_online * tma_info_system_cpu_utilization", 1036 + "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / TSC", 1176 1037 "MetricGroup": "Summary", 1177 1038 "MetricName": "tma_info_system_cpus_utilized" 1178 1039 }, ··· 1310 1171 "MetricThreshold": "tma_info_thread_uoppi > 1.05" 1311 1172 }, 1312 1173 { 1313 - "BriefDescription": "Instruction per taken branch", 1174 + "BriefDescription": "Uops per taken branch", 1314 1175 "MetricExpr": "tma_retiring * tma_info_thread_slots / BR_INST_RETIRED.NEAR_TAKEN", 1315 1176 "MetricGroup": "Branches;Fed;FetchBW", 1316 1177 "MetricName": "tma_info_thread_uptb", ··· 1319 1180 { 1320 1181 "BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to Instruction TLB (ITLB) misses", 1321 1182 "MetricExpr": "ICACHE_TAG.STALLS / tma_info_thread_clks", 1322 - "MetricGroup": "BigFootprint;FetchLat;MemoryTLB;TopdownL3;tma_L3_group;tma_fetch_latency_group", 1183 + "MetricGroup": "BigFootprint;BvBC;FetchLat;MemoryTLB;TopdownL3;tma_L3_group;tma_fetch_latency_group", 1323 
1184 "MetricName": "tma_itlb_misses", 1324 1185 "MetricThreshold": "tma_itlb_misses > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15)", 1325 1186 "PublicDescription": "This metric represents fraction of cycles the CPU was stalled due to Instruction TLB (ITLB) misses. Sample with: FRONTEND_RETIRED.STLB_MISS_PS;FRONTEND_RETIRED.ITLB_MISS_PS", ··· 1335 1196 "ScaleUnit": "100%" 1336 1197 }, 1337 1198 { 1199 + "BriefDescription": "This metric roughly estimates fraction of cycles with demand load accesses that hit the L1 cache", 1200 + "MetricExpr": "min(2 * (MEM_INST_RETIRED.ALL_LOADS - MEM_LOAD_RETIRED.FB_HIT - MEM_LOAD_RETIRED.L1_MISS) * 20 / 100, max(CYCLE_ACTIVITY.CYCLES_MEM_ANY - CYCLE_ACTIVITY.CYCLES_L1D_MISS, 0)) / tma_info_thread_clks", 1201 + "MetricGroup": "BvML;MemoryLat;TopdownL4;tma_L4_group;tma_l1_bound_group", 1202 + "MetricName": "tma_l1_hit_latency", 1203 + "MetricThreshold": "tma_l1_hit_latency > 0.1 & (tma_l1_bound > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", 1204 + "PublicDescription": "This metric roughly estimates fraction of cycles with demand load accesses that hit the L1 cache. The short latency of the L1 data cache may be exposed in pointer-chasing memory access patterns as an example. 
Sample with: MEM_LOAD_RETIRED.L1_HIT", 1205 + "ScaleUnit": "100%" 1206 + }, 1207 + { 1338 1208 "BriefDescription": "This metric estimates how often the CPU was stalled due to L2 cache accesses by loads", 1339 1209 "MetricConstraint": "NO_GROUP_EVENTS", 1340 1210 "MetricExpr": "MEM_LOAD_RETIRED.L2_HIT * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS) / (MEM_LOAD_RETIRED.L2_HIT * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS) + L1D_PEND_MISS.FB_FULL_PERIODS) * ((CYCLE_ACTIVITY.STALLS_L1D_MISS - CYCLE_ACTIVITY.STALLS_L2_MISS) / tma_info_thread_clks)", 1341 - "MetricGroup": "CacheHits;MemoryBound;TmaL3mem;TopdownL3;tma_L3_group;tma_memory_bound_group", 1211 + "MetricGroup": "BvML;CacheHits;MemoryBound;TmaL3mem;TopdownL3;tma_L3_group;tma_memory_bound_group", 1342 1212 "MetricName": "tma_l2_bound", 1343 1213 "MetricThreshold": "tma_l2_bound > 0.05 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2)", 1344 1214 "PublicDescription": "This metric estimates how often the CPU was stalled due to L2 cache accesses by loads. Avoiding cache misses (i.e. L1 misses/L2 hits) can improve the latency and increase performance. 
Sample with: MEM_LOAD_RETIRED.L2_HIT_PS", ··· 1366 1218 { 1367 1219 "BriefDescription": "This metric estimates fraction of cycles with demand load accesses that hit the L3 cache under unloaded scenarios (possibly L3 latency limited)", 1368 1220 "MetricExpr": "9 * tma_info_system_core_frequency * (MEM_LOAD_RETIRED.L3_HIT * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS / 2)) / tma_info_thread_clks", 1369 - "MetricGroup": "MemoryLat;TopdownL4;tma_L4_group;tma_issueLat;tma_l3_bound_group", 1221 + "MetricGroup": "BvML;MemoryLat;TopdownL4;tma_L4_group;tma_issueLat;tma_l3_bound_group", 1370 1222 "MetricName": "tma_l3_hit_latency", 1371 1223 "MetricThreshold": "tma_l3_hit_latency > 0.1 & (tma_l3_bound > 0.05 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", 1372 1224 "PublicDescription": "This metric estimates fraction of cycles with demand load accesses that hit the L3 cache under unloaded scenarios (possibly L3 latency limited). Avoiding private cache misses (i.e. L2 misses/L3 hits) will improve the latency; reduce contention with sibling physical cores and increase performance. Note the value of this node may overlap with its siblings. Sample with: MEM_LOAD_RETIRED.L3_HIT_PS. Related metrics: tma_info_bottleneck_cache_memory_latency, tma_mem_latency", ··· 1378 1230 "MetricGroup": "FetchLat;TopdownL3;tma_L3_group;tma_fetch_latency_group;tma_issueFB", 1379 1231 "MetricName": "tma_lcp", 1380 1232 "MetricThreshold": "tma_lcp > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15)", 1381 - "PublicDescription": "This metric represents fraction of cycles CPU was stalled due to Length Changing Prefixes (LCPs). Using proper compiler flags or Intel Compiler by default will certainly avoid this. #Link: Optimization Guide about LCP BKMs. 
Related metrics: tma_dsb_switches, tma_fetch_bandwidth, tma_info_botlnk_l2_dsb_misses, tma_info_frontend_dsb_coverage, tma_info_inst_mix_iptb", 1233 + "PublicDescription": "This metric represents fraction of cycles CPU was stalled due to Length Changing Prefixes (LCPs). Using proper compiler flags or Intel Compiler by default will certainly avoid this. #Link: Optimization Guide about LCP BKMs. Related metrics: tma_dsb_switches, tma_fetch_bandwidth, tma_info_botlnk_l2_dsb_bandwidth, tma_info_botlnk_l2_dsb_misses, tma_info_frontend_dsb_coverage, tma_info_inst_mix_iptb", 1382 1234 "ScaleUnit": "100%" 1383 1235 }, 1384 1236 { ··· 1423 1275 "MetricGroup": "Offcore;TopdownL4;tma_L4_group;tma_issueRFO;tma_l1_bound_group", 1424 1276 "MetricName": "tma_lock_latency", 1425 1277 "MetricThreshold": "tma_lock_latency > 0.2 & (tma_l1_bound > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", 1426 - "PublicDescription": "This metric represents fraction of cycles the CPU spent handling cache misses due to lock operations. Due to the microarchitecture handling of locks; they are classified as L1_Bound regardless of what memory source satisfied them. Sample with: MEM_INST_RETIRED.LOCK_LOADS_PS. Related metrics: tma_store_latency", 1278 + "PublicDescription": "This metric represents fraction of cycles the CPU spent handling cache misses due to lock operations. Due to the microarchitecture handling of locks; they are classified as L1_Bound regardless of what memory source satisfied them. Sample with: MEM_INST_RETIRED.LOCK_LOADS. 
Related metrics: tma_store_latency", 1427 1279 "ScaleUnit": "100%" 1428 1280 }, 1429 1281 { ··· 1438 1290 { 1439 1291 "BriefDescription": "This metric represents fraction of slots the CPU has wasted due to Machine Clears", 1440 1292 "MetricExpr": "max(0, tma_bad_speculation - tma_branch_mispredicts)", 1441 - "MetricGroup": "BadSpec;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn", 1293 + "MetricGroup": "BadSpec;BvMS;MachineClears;TmaL2;TopdownL2;tma_L2_group;tma_bad_speculation_group;tma_issueMC;tma_issueSyncxn", 1442 1294 "MetricName": "tma_machine_clears", 1443 1295 "MetricThreshold": "tma_machine_clears > 0.1 & tma_bad_speculation > 0.15", 1444 1296 "MetricgroupNoGroup": "TopdownL2", ··· 1448 1300 { 1449 1301 "BriefDescription": "This metric estimates fraction of cycles where the core's performance was likely hurt due to approaching bandwidth limits of external memory - DRAM ([SPR-HBM] and/or HBM)", 1450 1302 "MetricExpr": "min(CPU_CLK_UNHALTED.THREAD, cpu@OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD\\,cmask\\=4@) / tma_info_thread_clks", 1451 - "MetricGroup": "MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_dram_bound_group;tma_issueBW", 1303 + "MetricGroup": "BvMS;MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_dram_bound_group;tma_issueBW", 1452 1304 "MetricName": "tma_mem_bandwidth", 1453 1305 "MetricThreshold": "tma_mem_bandwidth > 0.2 & (tma_dram_bound > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", 1454 1306 "PublicDescription": "This metric estimates fraction of cycles where the core's performance was likely hurt due to approaching bandwidth limits of external memory - DRAM ([SPR-HBM] and/or HBM). The underlying heuristic assumes that a similar off-core traffic is generated by all IA cores. 
This metric does not aggregate non-data-read requests by this logical processor; requests from other IA Logical Processors/Physical Cores/sockets; or other non-IA devices like GPU; hence the maximum external memory bandwidth limits may or may not be approached when this metric is flagged (see Uncore counters for that). Related metrics: tma_fb_full, tma_info_bottleneck_cache_memory_bandwidth, tma_info_system_dram_bw_use, tma_sq_full", ··· 1457 1309 { 1458 1310 "BriefDescription": "This metric estimates fraction of cycles where the performance was likely hurt due to latency from external memory - DRAM ([SPR-HBM] and/or HBM)", 1459 1311 "MetricExpr": "min(CPU_CLK_UNHALTED.THREAD, OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD) / tma_info_thread_clks - tma_mem_bandwidth", 1460 - "MetricGroup": "MemoryLat;Offcore;TopdownL4;tma_L4_group;tma_dram_bound_group;tma_issueLat", 1312 + "MetricGroup": "BvML;MemoryLat;Offcore;TopdownL4;tma_L4_group;tma_dram_bound_group;tma_issueLat", 1461 1313 "MetricName": "tma_mem_latency", 1462 1314 "MetricThreshold": "tma_mem_latency > 0.1 & (tma_dram_bound > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", 1463 1315 "PublicDescription": "This metric estimates fraction of cycles where the performance was likely hurt due to latency from external memory - DRAM ([SPR-HBM] and/or HBM). This metric does not aggregate requests from other Logical Processors/Physical Cores/sockets (see Uncore counters for that). 
Related metrics: tma_info_bottleneck_cache_memory_latency, tma_l3_hit_latency", ··· 1494 1346 { 1495 1347 "BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to Branch Resteers as a result of Branch Misprediction at execution stage", 1496 1348 "MetricExpr": "BR_MISP_RETIRED.ALL_BRANCHES / (BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.COUNT) * INT_MISC.CLEAR_RESTEER_CYCLES / tma_info_thread_clks", 1497 - "MetricGroup": "BadSpec;BrMispredicts;TopdownL4;tma_L4_group;tma_branch_resteers_group;tma_issueBM", 1349 + "MetricGroup": "BadSpec;BrMispredicts;BvMP;TopdownL4;tma_L4_group;tma_branch_resteers_group;tma_issueBM", 1498 1350 "MetricName": "tma_mispredicts_resteers", 1499 1351 "MetricThreshold": "tma_mispredicts_resteers > 0.05 & (tma_branch_resteers > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15))", 1500 1352 "PublicDescription": "This metric represents fraction of cycles the CPU was stalled due to Branch Resteers as a result of Branch Misprediction at execution stage. Sample with: INT_MISC.CLEAR_RESTEER_CYCLES. Related metrics: tma_branch_mispredicts, tma_info_bad_spec_branch_misprediction_cost, tma_info_bottleneck_mispredictions", ··· 1538 1390 { 1539 1391 "BriefDescription": "This metric represents fraction of slots where the CPU was retiring NOP (no op) instructions", 1540 1392 "MetricExpr": "tma_light_operations * INST_RETIRED.NOP / (tma_retiring * tma_info_thread_slots)", 1541 - "MetricGroup": "Pipeline;TopdownL4;tma_L4_group;tma_other_light_ops_group", 1393 + "MetricGroup": "BvBO;Pipeline;TopdownL4;tma_L4_group;tma_other_light_ops_group", 1542 1394 "MetricName": "tma_nop_instructions", 1543 1395 "MetricThreshold": "tma_nop_instructions > 0.1 & (tma_other_light_ops > 0.3 & tma_light_operations > 0.6)", 1544 1396 "PublicDescription": "This metric represents fraction of slots where the CPU was retiring NOP (no op) instructions. Compilers often use NOPs for certain address alignments - e.g. 
start address of a function or loop body. Sample with: INST_RETIRED.NOP", ··· 1557 1409 { 1558 1410 "BriefDescription": "This metric estimates fraction of slots the CPU was stalled due to other cases of misprediction (non-retired x86 branches or other types).", 1559 1411 "MetricExpr": "max(tma_branch_mispredicts * (1 - BR_MISP_RETIRED.ALL_BRANCHES / (INT_MISC.CLEARS_COUNT - MACHINE_CLEARS.COUNT)), 0.0001)", 1560 - "MetricGroup": "BrMispredicts;TopdownL3;tma_L3_group;tma_branch_mispredicts_group", 1412 + "MetricGroup": "BrMispredicts;BvIO;TopdownL3;tma_L3_group;tma_branch_mispredicts_group", 1561 1413 "MetricName": "tma_other_mispredicts", 1562 1414 "MetricThreshold": "tma_other_mispredicts > 0.05 & (tma_branch_mispredicts > 0.1 & tma_bad_speculation > 0.15)", 1563 1415 "ScaleUnit": "100%" ··· 1565 1417 { 1566 1418 "BriefDescription": "This metric represents fraction of slots the CPU has wasted due to Nukes (Machine Clears) not related to memory ordering.", 1567 1419 "MetricExpr": "max(tma_machine_clears * (1 - MACHINE_CLEARS.MEMORY_ORDERING / MACHINE_CLEARS.COUNT), 0.0001)", 1568 - "MetricGroup": "Machine_Clears;TopdownL3;tma_L3_group;tma_machine_clears_group", 1420 + "MetricGroup": "BvIO;Machine_Clears;TopdownL3;tma_L3_group;tma_machine_clears_group", 1569 1421 "MetricName": "tma_other_nukes", 1570 1422 "MetricThreshold": "tma_other_nukes > 0.05 & (tma_machine_clears > 0.1 & tma_bad_speculation > 0.15)", 1571 1423 "ScaleUnit": "100%" ··· 1617 1469 }, 1618 1470 { 1619 1471 "BriefDescription": "This metric represents fraction of cycles CPU executed no uops on any execution port (Logical Processor cycles since ICL, Physical Core cycles otherwise)", 1620 - "MetricExpr": "(cpu@EXE_ACTIVITY.3_PORTS_UTIL\\,umask\\=0x80@ + tma_core_bound * RS_EVENTS.EMPTY_CYCLES) / tma_info_thread_clks * (CYCLE_ACTIVITY.STALLS_TOTAL - CYCLE_ACTIVITY.STALLS_MEM_ANY) / tma_info_thread_clks", 1472 + "MetricExpr": "cpu@EXE_ACTIVITY.3_PORTS_UTIL\\,umask\\=0x80@ / tma_info_thread_clks", 1621 
1473 "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_ports_utilization_group", 1622 1474 "MetricName": "tma_ports_utilized_0", 1623 1475 "MetricThreshold": "tma_ports_utilized_0 > 0.2 & (tma_ports_utilization > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))", ··· 1645 1497 { 1646 1498 "BriefDescription": "This metric represents fraction of cycles CPU executed total of 3 or more uops per cycle on all execution ports (Logical Processor cycles since ICL, Physical Core cycles otherwise)", 1647 1499 "MetricExpr": "UOPS_EXECUTED.CYCLES_GE_3 / tma_info_thread_clks", 1648 - "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_ports_utilization_group", 1500 + "MetricGroup": "BvCB;PortsUtil;TopdownL4;tma_L4_group;tma_ports_utilization_group", 1649 1501 "MetricName": "tma_ports_utilized_3m", 1650 1502 "MetricThreshold": "tma_ports_utilized_3m > 0.4 & (tma_ports_utilization > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))", 1651 1503 "PublicDescription": "This metric represents fraction of cycles CPU executed total of 3 or more uops per cycle on all execution ports (Logical Processor cycles since ICL, Physical Core cycles otherwise). Sample with: UOPS_EXECUTED.CYCLES_GE_3", ··· 1655 1507 "BriefDescription": "This category represents fraction of slots utilized by useful work i.e. 
issued uops that eventually get retired", 1656 1508 "DefaultMetricgroupName": "TopdownL1", 1657 1509 "MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retiring + topdown\\-be\\-bound) + 0 * tma_info_thread_slots", 1658 - "MetricGroup": "Default;TmaL1;TopdownL1;tma_L1_group", 1510 + "MetricGroup": "BvUW;Default;TmaL1;TopdownL1;tma_L1_group", 1659 1511 "MetricName": "tma_retiring", 1660 1512 "MetricThreshold": "tma_retiring > 0.7 | tma_heavy_operations > 0.1", 1661 1513 "MetricgroupNoGroup": "TopdownL1;Default", ··· 1665 1517 { 1666 1518 "BriefDescription": "This metric represents fraction of cycles the CPU issue-pipeline was stalled due to serializing operations", 1667 1519 "MetricExpr": "RESOURCE_STALLS.SCOREBOARD / tma_info_thread_clks", 1668 - "MetricGroup": "PortsUtil;TopdownL3;tma_L3_group;tma_core_bound_group;tma_issueSO", 1520 + "MetricGroup": "BvIO;PortsUtil;TopdownL3;tma_L3_group;tma_core_bound_group;tma_issueSO", 1669 1521 "MetricName": "tma_serializing_operation", 1670 1522 "MetricThreshold": "tma_serializing_operation > 0.1 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2)", 1671 1523 "PublicDescription": "This metric represents fraction of cycles the CPU issue-pipeline was stalled due to serializing operations. Instructions like CPUID; WRMSR or LFENCE serialize the out-of-order execution which may limit performance. Sample with: RESOURCE_STALLS.SCOREBOARD. 
Related metrics: tma_ms_switches", ··· 1702 1554 { 1703 1555 "BriefDescription": "This metric measures fraction of cycles where the Super Queue (SQ) was full taking into account all request-types and both hardware SMT threads (Logical Processors)", 1704 1556 "MetricExpr": "L1D_PEND_MISS.L2_STALL / tma_info_thread_clks", 1705 - "MetricGroup": "MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_issueBW;tma_l3_bound_group", 1557 + "MetricGroup": "BvMS;MemoryBW;Offcore;TopdownL4;tma_L4_group;tma_issueBW;tma_l3_bound_group", 1706 1558 "MetricName": "tma_sq_full", 1707 1559 "MetricThreshold": "tma_sq_full > 0.3 & (tma_l3_bound > 0.05 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", 1708 1560 "PublicDescription": "This metric measures fraction of cycles where the Super Queue (SQ) was full taking into account all request-types and both hardware SMT threads (Logical Processors). Related metrics: tma_fb_full, tma_info_bottleneck_cache_memory_bandwidth, tma_info_system_dram_bw_use, tma_mem_bandwidth", ··· 1730 1582 { 1731 1583 "BriefDescription": "This metric estimates fraction of cycles the CPU spent handling L1D store misses", 1732 1584 "MetricExpr": "(L2_RQSTS.RFO_HIT * 10 * (1 - MEM_INST_RETIRED.LOCK_LOADS / MEM_INST_RETIRED.ALL_STORES) + (1 - MEM_INST_RETIRED.LOCK_LOADS / MEM_INST_RETIRED.ALL_STORES) * min(CPU_CLK_UNHALTED.THREAD, OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO)) / tma_info_thread_clks", 1733 - "MetricGroup": "MemoryLat;Offcore;TopdownL4;tma_L4_group;tma_issueRFO;tma_issueSL;tma_store_bound_group", 1585 + "MetricGroup": "BvML;MemoryLat;Offcore;TopdownL4;tma_L4_group;tma_issueRFO;tma_issueSL;tma_store_bound_group", 1734 1586 "MetricName": "tma_store_latency", 1735 1587 "MetricThreshold": "tma_store_latency > 0.1 & (tma_store_bound > 0.2 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))", 1736 1588 "PublicDescription": "This metric estimates fraction of cycles the CPU spent handling L1D store misses. 
Store accesses usually less impact out-of-order core performance; however; holding resources for longer time can lead into undesired implications (e.g. contention on L1D fill-buffer entries - see FB_Full). Related metrics: tma_fb_full, tma_lock_latency", ··· 1773 1625 { 1774 1626 "BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to new branch address clears", 1775 1627 "MetricExpr": "10 * BACLEARS.ANY / tma_info_thread_clks", 1776 - "MetricGroup": "BigFootprint;FetchLat;TopdownL4;tma_L4_group;tma_branch_resteers_group", 1628 + "MetricGroup": "BigFootprint;BvBC;FetchLat;TopdownL4;tma_L4_group;tma_branch_resteers_group", 1777 1629 "MetricName": "tma_unknown_branches", 1778 1630 "MetricThreshold": "tma_unknown_branches > 0.05 & (tma_branch_resteers > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15))", 1779 1631 "PublicDescription": "This metric represents fraction of cycles the CPU was stalled due to new branch address clears. These are fetched branches the Branch Prediction Unit was unable to recognize (e.g. first time the branch is fetched or hitting BTB capacity limit) hence called Unknown Branches. Sample with: BACLEARS.ANY",
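Note on the metric expressions above: the fill-bandwidth metrics in this file share the form "64 * <event count> / 1e9 / duration_time", and the reworked system metrics define tma_info_system_cpus_utilized as CPU_CLK_UNHALTED.REF_TSC / TSC, with tma_info_system_cpu_utilization dividing that by #num_cpus_online. The Python sketch below only mirrors that arithmetic and is not how perf evaluates MetricExpr strings. The function names and sample counts are hypothetical, and reading the constant 64 as the cache line size in bytes is an assumption.

```python
# Illustrative arithmetic only; names and sample values are hypothetical.

CACHE_LINE_BYTES = 64  # assumed meaning of the constant 64 in the MetricExpr strings


def fill_bw_gb_per_sec(lines_filled: int, duration_sec: float) -> float:
    """Mirror of '64 * <event count> / 1e9 / duration_time'."""
    return CACHE_LINE_BYTES * lines_filled / 1e9 / duration_sec


def cpus_utilized(ref_tsc_cycles: int, tsc_cycles: int) -> float:
    """Mirror of 'CPU_CLK_UNHALTED.REF_TSC / TSC' (tma_info_system_cpus_utilized)."""
    return ref_tsc_cycles / tsc_cycles


def cpu_utilization(ref_tsc_cycles: int, tsc_cycles: int, num_cpus_online: int) -> float:
    """Mirror of 'tma_info_system_cpus_utilized / #num_cpus_online'."""
    return cpus_utilized(ref_tsc_cycles, tsc_cycles) / num_cpus_online


if __name__ == "__main__":
    # Made-up counts: 1e9 line fills observed over a 64 second run.
    print(fill_bw_gb_per_sec(1_000_000_000, 64.0))
    # Made-up counts: summed unhalted reference cycles are 4x the TSC delta on 8 CPUs.
    print(cpu_utilization(4_000, 1_000, 8))
```

With this ordering, the utilized-CPU count is the directly measured quantity and the utilization fraction is derived from it, which matches the direction of the dependency after this change.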
+22 -6
tools/perf/pmu-events/arch/x86/rocketlake/uncore-interconnect.json
···
1 1 [
2 2 {
3 3 "BriefDescription": "Number of entries allocated. Account for Any type: e.g. Snoop, etc.",
4 + "Counter": "1",
4 5 "EventCode": "0x84",
5 6 "EventName": "UNC_ARB_COH_TRK_REQUESTS.ALL",
6 7 "PerPkg": "1",
···
9 8 "Unit": "ARB"
10 9 },
11 10 {
12 - "BriefDescription": "Each cycle counts number of any coherent request at memory controller that were issued by any core. This event is not supported on ICL products but is supported on RKL products.",
11 + "BriefDescription": "Each cycle counts number of any coherent requests at memory controller that were issued by any core.",
12 + "Counter": "0",
13 13 "EventCode": "0x85",
14 14 "EventName": "UNC_ARB_DAT_OCCUPANCY.ALL",
15 + "Experimental": "1",
15 16 "PerPkg": "1",
16 17 "UMask": "0x1",
17 18 "Unit": "ARB"
18 19 },
19 20 {
20 - "BriefDescription": "Each cycle counts number of coherent reads pending on data return from memory controller that were issued by any core. This event is not supported on ICL products but is supported on RKL products.",
21 + "BriefDescription": "Each cycle counts number of coherent reads pending on data return from memory controller that were issued by any core.",
22 + "Counter": "0",
21 23 "EventCode": "0x85",
22 24 "EventName": "UNC_ARB_DAT_OCCUPANCY.RD",
25 + "Experimental": "1",
23 26 "PerPkg": "1",
24 27 "UMask": "0x2",
25 28 "Unit": "ARB"
26 29 },
27 30 {
28 - "BriefDescription": "Each cycle count number of 'valid' coherent Data Read entries . Such entry is defined as valid when it is allocated till deallocation. Doesn't include prefetches. This event is not supported on ICL products but is supported on RKL products.",
31 + "BriefDescription": "Each cycle counts number of valid coherent Data Read entries. Such entry is defined as valid when it is allocated until deallocation. Does not include prefetches.",
32 + "Counter": "0",
29 33 "EventCode": "0x80",
30 34 "EventName": "UNC_ARB_REQ_TRK_OCCUPANCY.DRD",
35 + "Experimental": "1",
31 36 "PerPkg": "1",
32 37 "UMask": "0x2",
33 38 "Unit": "ARB"
34 39 },
35 40 {
36 41 "BriefDescription": "Number of all coherent Data Read entries. Doesn't include prefetches",
42 + "Counter": "1",
37 43 "EventCode": "0x81",
38 44 "EventName": "UNC_ARB_REQ_TRK_REQUEST.DRD",
45 + "Experimental": "1",
39 46 "PerPkg": "1",
40 47 "UMask": "0x2",
41 48 "Unit": "ARB"
42 49 },
43 50 {
44 - "BriefDescription": "Each cycle counts number of all outgoing valid entries in ReqTrk. Such entry is defined as valid from its allocation in ReqTrk till deallocation. Accounts for Coherent and non-coherent traffic. This event is not supported on ICL products but is supported on RKL products.",
51 + "BriefDescription": "Each cycle counts number of all outgoing valid entries in ReqTrk. Such entry is defined as valid from its allocation in ReqTrk until deallocation. Accounts for Coherent and non-coherent traffic.",
52 + "Counter": "0",
45 53 "EventCode": "0x80",
46 54 "EventName": "UNC_ARB_TRK_OCCUPANCY.ALL",
55 + "Experimental": "1",
47 56 "PerPkg": "1",
48 57 "UMask": "0x1",
49 58 "Unit": "ARB"
50 59 },
51 60 {
52 - "BriefDescription": "Each cycle count number of 'valid' coherent Data Read entries . Such entry is defined as valid when it is allocated till deallocation. Doesn't include prefetches. This event is not supported on ICL products but is supported on RKL products.",
61 + "BriefDescription": "Each cycle counts number of valid coherent Data Read entries. Such entry is defined as valid when it is allocated until deallocation. Does not include prefetches.",
62 + "Counter": "0",
53 63 "EventCode": "0x80",
54 64 "EventName": "UNC_ARB_TRK_OCCUPANCY.RD",
65 + "Experimental": "1",
55 66 "PerPkg": "1",
56 67 "UMask": "0x2",
57 68 "Unit": "ARB"
58 69 },
59 70 {
60 71 "BriefDescription": "Total number of all outgoing entries allocated. Accounts for Coherent and non-coherent traffic.",
72 + "Counter": "1",
61 73 "EventCode": "0x81",
62 74 "EventName": "UNC_ARB_TRK_REQUESTS.ALL",
63 75 "PerPkg": "1",
···
78 64 "Unit": "ARB"
79 65 },
80 66 {
81 - "BriefDescription": "Number of all coherent Data Read entries. Doesn't include prefetches. This event is not supported on ICL products but is supported on RKL products.",
67 + "BriefDescription": "Counts number of all coherent Data Read entries. Does not include prefetches.",
68 + "Counter": "0,1",
82 69 "EventCode": "0x81",
83 70 "EventName": "UNC_ARB_TRK_REQUESTS.RD",
71 + "Experimental": "1",
84 72 "PerPkg": "1",
85 73 "UMask": "0x2",
86 74 "Unit": "ARB"
+1
tools/perf/pmu-events/arch/x86/rocketlake/uncore-other.json
···
1 1 [
2 2 {
3 3 "BriefDescription": "UNC_CLOCK.SOCKET",
4 + "Counter": "FIXED",
4 5 "EventCode": "0xff",
5 6 "EventName": "UNC_CLOCK.SOCKET",
6 7 "PerPkg": "1",
+20
tools/perf/pmu-events/arch/x86/rocketlake/virtual-memory.json
···
1 1 [
2 2 {
3 3 "BriefDescription": "Loads that miss the DTLB and hit the STLB.",
4 + "Counter": "0,1,2,3",
4 5 "EventCode": "0x08",
5 6 "EventName": "DTLB_LOAD_MISSES.STLB_HIT",
6 7 "PublicDescription": "Counts loads that miss the DTLB (Data TLB) and hit the STLB (Second level TLB).",
···
10 9 },
11 10 {
12 11 "BriefDescription": "Cycles when at least one PMH is busy with a page walk for a demand load.",
12 + "Counter": "0,1,2,3",
13 13 "CounterMask": "1",
14 14 "EventCode": "0x08",
15 15 "EventName": "DTLB_LOAD_MISSES.WALK_ACTIVE",
···
20 18 },
21 19 {
22 20 "BriefDescription": "Load miss in all TLB levels causes a page walk that completes. (All page sizes)",
21 + "Counter": "0,1,2,3",
23 22 "EventCode": "0x08",
24 23 "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED",
25 24 "PublicDescription": "Counts completed page walks (all page sizes) caused by demand data loads. This implies it missed in the DTLB and further levels of TLB. The page walk can end with or without a fault.",
···
29 26 },
30 27 {
31 28 "BriefDescription": "Page walks completed due to a demand data load to a 2M/4M page.",
29 + "Counter": "0,1,2,3",
32 30 "EventCode": "0x08",
33 31 "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED_2M_4M",
34 32 "PublicDescription": "Counts completed page walks (2M/4M sizes) caused by demand data loads. This implies address translations missed in the DTLB and further levels of TLB. The page walk can end with or without a fault.",
···
38 34 },
39 35 {
40 36 "BriefDescription": "Page walks completed due to a demand data load to a 4K page.",
37 + "Counter": "0,1,2,3",
41 38 "EventCode": "0x08",
42 39 "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED_4K",
43 40 "PublicDescription": "Counts completed page walks (4K sizes) caused by demand data loads. This implies address translations missed in the DTLB and further levels of TLB. The page walk can end with or without a fault.",
···
47 42 },
48 43 {
49 44 "BriefDescription": "Number of page walks outstanding for a demand load in the PMH each cycle.",
45 + "Counter": "0,1,2,3",
50 46 "EventCode": "0x08",
51 47 "EventName": "DTLB_LOAD_MISSES.WALK_PENDING",
52 48 "PublicDescription": "Counts the number of page walks outstanding for a demand load in the PMH (Page Miss Handler) each cycle.",
···
56 50 },
57 51 {
58 52 "BriefDescription": "Stores that miss the DTLB and hit the STLB.",
53 + "Counter": "0,1,2,3",
59 54 "EventCode": "0x49",
60 55 "EventName": "DTLB_STORE_MISSES.STLB_HIT",
61 56 "PublicDescription": "Counts stores that miss the DTLB (Data TLB) and hit the STLB (2nd Level TLB).",
···
65 58 },
66 59 {
67 60 "BriefDescription": "Cycles when at least one PMH is busy with a page walk for a store.",
61 + "Counter": "0,1,2,3",
68 62 "CounterMask": "1",
69 63 "EventCode": "0x49",
70 64 "EventName": "DTLB_STORE_MISSES.WALK_ACTIVE",
···
75 67 },
76 68 {
77 69 "BriefDescription": "Store misses in all TLB levels causes a page walk that completes. (All page sizes)",
70 + "Counter": "0,1,2,3",
78 71 "EventCode": "0x49",
79 72 "EventName": "DTLB_STORE_MISSES.WALK_COMPLETED",
80 73 "PublicDescription": "Counts completed page walks (all page sizes) caused by demand data stores. This implies it missed in the DTLB and further levels of TLB. The page walk can end with or without a fault.",
···
84 75 },
85 76 {
86 77 "BriefDescription": "Page walks completed due to a demand data store to a 2M/4M page.",
78 + "Counter": "0,1,2,3",
87 79 "EventCode": "0x49",
88 80 "EventName": "DTLB_STORE_MISSES.WALK_COMPLETED_2M_4M",
89 81 "PublicDescription": "Counts completed page walks (2M/4M sizes) caused by demand data stores. This implies address translations missed in the DTLB and further levels of TLB. The page walk can end with or without a fault.",
···
93 83 },
94 84 {
95 85 "BriefDescription": "Page walks completed due to a demand data store to a 4K page.",
86 + "Counter": "0,1,2,3",
96 87 "EventCode": "0x49",
97 88 "EventName": "DTLB_STORE_MISSES.WALK_COMPLETED_4K",
98 89 "PublicDescription": "Counts completed page walks (4K sizes) caused by demand data stores. This implies address translations missed in the DTLB and further levels of TLB. The page walk can end with or without a fault.",
···
102 91 },
103 92 {
104 93 "BriefDescription": "Number of page walks outstanding for a store in the PMH each cycle.",
94 + "Counter": "0,1,2,3",
105 95 "EventCode": "0x49",
106 96 "EventName": "DTLB_STORE_MISSES.WALK_PENDING",
107 97 "PublicDescription": "Counts the number of page walks outstanding for a store in the PMH (Page Miss Handler) each cycle.",
···
111 99 },
112 100 {
113 101 "BriefDescription": "Instruction fetch requests that miss the ITLB and hit the STLB.",
102 + "Counter": "0,1,2,3",
114 103 "EventCode": "0x85",
115 104 "EventName": "ITLB_MISSES.STLB_HIT",
116 105 "PublicDescription": "Counts instruction fetch requests that miss the ITLB (Instruction TLB) and hit the STLB (Second-level TLB).",
···
120 107 },
121 108 {
122 109 "BriefDescription": "Cycles when at least one PMH is busy with a page walk for code (instruction fetch) request.",
110 + "Counter": "0,1,2,3",
123 111 "CounterMask": "1",
124 112 "EventCode": "0x85",
125 113 "EventName": "ITLB_MISSES.WALK_ACTIVE",
···
130 116 },
131 117 {
132 118 "BriefDescription": "Code miss in all TLB levels causes a page walk that completes. (All page sizes)",
119 + "Counter": "0,1,2,3",
133 120 "EventCode": "0x85",
134 121 "EventName": "ITLB_MISSES.WALK_COMPLETED",
135 122 "PublicDescription": "Counts completed page walks (all page sizes) caused by a code fetch. This implies it missed in the ITLB (Instruction TLB) and further levels of TLB. The page walk can end with or without a fault.",
···
139 124 },
140 125 {
141 126 "BriefDescription": "Code miss in all TLB levels causes a page walk that completes. (2M/4M)",
127 + "Counter": "0,1,2,3",
142 128 "EventCode": "0x85",
143 129 "EventName": "ITLB_MISSES.WALK_COMPLETED_2M_4M",
144 130 "PublicDescription": "Counts completed page walks (2M/4M page sizes) caused by a code fetch. This implies it missed in the ITLB (Instruction TLB) and further levels of TLB. The page walk can end with or without a fault.",
···
148 132 },
149 133 {
150 134 "BriefDescription": "Code miss in all TLB levels causes a page walk that completes. (4K)",
135 + "Counter": "0,1,2,3",
151 136 "EventCode": "0x85",
152 137 "EventName": "ITLB_MISSES.WALK_COMPLETED_4K",
153 138 "PublicDescription": "Counts completed page walks (4K page sizes) caused by a code fetch. This implies it missed in the ITLB (Instruction TLB) and further levels of TLB. The page walk can end with or without a fault.",
···
157 140 },
158 141 {
159 142 "BriefDescription": "Number of page walks outstanding for an outstanding code request in the PMH each cycle.",
143 + "Counter": "0,1,2,3",
160 144 "EventCode": "0x85",
161 145 "EventName": "ITLB_MISSES.WALK_PENDING",
162 146 "PublicDescription": "Counts the number of page walks outstanding for an outstanding code (instruction fetch) request in the PMH (Page Miss Handler) each cycle.",
···
166 148 },
167 149 {
168 150 "BriefDescription": "DTLB flush attempts of the thread-specific entries",
151 + "Counter": "0,1,2,3",
169 152 "EventCode": "0xBD",
170 153 "EventName": "TLB_FLUSH.DTLB_THREAD",
171 154 "PublicDescription": "Counts the number of DTLB flush attempts of the thread-specific entries.",
···
175 156 },
176 157 {
177 158 "BriefDescription": "STLB flush attempts",
159 + "Counter": "0,1,2,3",
178 160 "EventCode": "0xBD",
179 161 "EventName": "TLB_FLUSH.STLB_ANY",
180 162 "PublicDescription": "Counts the number of any STLB flush attempts (such as entire, VPID, PCID, InvPage, CR3 write, etc.).",
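Reader's note (not part of the commit): the "Counter" field added throughout these hunks takes three shapes in this diff: a comma-separated list such as "0,1,2,3" or "0,1", a single index such as "1", and the literal "FIXED". The Python sketch below shows one way a consumer might decode it. The helper name parse_counter_field is hypothetical, the reading of the numbers as indices of the counters an event may be scheduled on is an assumption, and the three sample entries are cut down to two keys each.

```python
import json


def parse_counter_field(value):
    """Decode a "Counter" string into (is_fixed, counter_indices).

    Assumption: "FIXED" names a fixed-function counter, anything else is a
    comma-separated list of general-purpose counter indices.
    """
    if value.strip().upper() == "FIXED":
        return True, []
    return False, [int(part) for part in value.split(",") if part.strip()]


# Trimmed-down entries using event names and Counter values from the diff above.
events = json.loads("""
[
  {"EventName": "DTLB_LOAD_MISSES.STLB_HIT", "Counter": "0,1,2,3"},
  {"EventName": "UNC_ARB_TRK_REQUESTS.RD", "Counter": "0,1"},
  {"EventName": "UNC_CLOCK.SOCKET", "Counter": "FIXED"}
]
""")

by_name = {e["EventName"]: parse_counter_field(e["Counter"]) for e in events}
for name, (is_fixed, indices) in by_name.items():
    print(name, "fixed" if is_fixed else indices)
```

Returning a (flag, list) pair keeps the fixed-counter case separate from an empty index list, so callers do not have to special-case the string again.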