Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'x86_cache_for_v7.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 resource control updates from Borislav Petkov:

- Extend the resctrl machinery to support telemetry monitoring on
Intel (Tony Luck)

The practical usage of this is being able to tell how much energy or
how much work can be attributed to a group of tasks tracked under a
single identifier. Prepend this work with proper refactoring of
resctrl domains handling code.

* tag 'x86_cache_for_v7.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
x86,fs/resctrl: Update documentation for telemetry events
x86/resctrl: Enable RDT_RESOURCE_PERF_PKG
fs/resctrl: Move RMID initialization to first mount
x86,fs/resctrl: Compute number of RMIDs as minimum across resources
fs/resctrl: Move allocation/free of closid_num_dirty_rmid[]
x86/resctrl: Handle number of RMIDs supported by RDT_RESOURCE_PERF_PKG
x86/resctrl: Add energy/perf choices to rdt boot option
x86,fs/resctrl: Handle domain creation/deletion for RDT_RESOURCE_PERF_PKG
fs/resctrl: Refactor rmdir_mondata_subdir_allrdtgrp()
fs/resctrl: Refactor mkdir_mondata_subdir()
x86/resctrl: Read telemetry events
x86/resctrl: Find and enable usable telemetry events
x86,fs/resctrl: Add architectural event pointer
x86,fs/resctrl: Fill in details of events for performance and energy GUIDs
x86/resctrl: Discover hardware telemetry events
fs/resctrl: Emphasize that L3 monitoring resource is required for summing domains
x86,fs/resctrl: Add and initialize a resource for package scope monitoring
x86,fs/resctrl: Add an architectural hook called for first mount
x86,fs/resctrl: Support binary fixed point event counters
x86,fs/resctrl: Handle events that can be read from any CPU
...

+1306 -410
+6 -1
Documentation/admin-guide/kernel-parameters.txt
···
 	rdt=		[HW,X86,RDT]
 			Turn on/off individual RDT features. List is:
 			cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
-			mba, smba, bmec, abmc, sdciae.
+			mba, smba, bmec, abmc, sdciae, energy[:guid],
+			perf[:guid].
 			E.g. to turn on cmt and turn off mba use:
 				rdt=cmt,!mba
+			To turn off all energy telemetry monitoring and ensure that
+			perf telemetry monitoring associated with guid 0x12345
+			is enabled use:
+				rdt=!energy,perf:0x12345

 	reboot=		[KNL]
 			Format (x86 or x86_64):
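The new "energy[:guid]" and "perf[:guid]" tokens follow the same "!" negation convention as the other rdt= options. A minimal userspace sketch of how one such token could be split into its parts is given here. The struct aet_opt type and the parse_aet_token() helper are hypothetical names made up for this illustration; the kernel code in this merge uses strsep() and kstrtou32() and matches against its table of known event groups.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for one parsed token (not a kernel type). */
struct aet_opt {
	bool force_off;		/* token started with '!' */
	char name[16];		/* "energy" or "perf" */
	uint32_t guid;		/* 0 means "every GUID of this feature" */
};

/*
 * Parse one rdt= token such as "!energy" or "perf:0x12345".
 * Returns false if the token is not a telemetry option or is malformed.
 * strtoul() is a userspace stand-in for the kernel's kstrtou32().
 */
static bool parse_aet_token(const char *tok, struct aet_opt *out)
{
	const char *colon;
	size_t len;

	if (!tok || !out)
		return false;

	out->force_off = (*tok == '!');
	if (out->force_off)
		tok++;

	colon = strchr(tok, ':');
	len = colon ? (size_t)(colon - tok) : strlen(tok);
	if (len == 0 || len >= sizeof(out->name))
		return false;
	memcpy(out->name, tok, len);
	out->name[len] = '\0';

	if (strcmp(out->name, "energy") && strcmp(out->name, "perf"))
		return false;

	out->guid = 0;
	if (colon) {
		char *end;
		unsigned long v;

		errno = 0;
		v = strtoul(colon + 1, &end, 16);	/* base 16 accepts an optional 0x prefix */
		if (errno || end == colon + 1 || *end != '\0' || v > UINT32_MAX)
			return false;
		out->guid = (uint32_t)v;
	}

	return true;
}
```

With this sketch, "rdt=!energy,perf:0x12345" would be handled by splitting on commas and passing each token to parse_aet_token(); tokens it rejects (for example "cmt") would fall through to the existing option table.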
+54 -12
Documentation/filesystems/resctrl.rst
···
 			bandwidth percentages are directly applied to
 			the threads running on the core

-If RDT monitoring is available there will be an "L3_MON" directory
+If L3 monitoring is available there will be an "L3_MON" directory
 with the following files:

 "num_rmids":
-		The number of RMIDs available. This is the
-		upper bound for how many "CTRL_MON" + "MON"
-		groups can be created.
+		The number of RMIDs supported by hardware for
+		L3 monitoring events.

 "mon_features":
 		Lists the monitoring events if
···
 		bytes) at which a previously used LLC_occupancy
 		counter can be considered for reuse.

+If telemetry monitoring is available there will be a "PERF_PKG_MON" directory
+with the following files:
+
+"num_rmids":
+		The number of RMIDs for telemetry monitoring events.
+
+		On Intel resctrl will not enable telemetry events if the number of
+		RMIDs that can be tracked concurrently is lower than the total number
+		of RMIDs supported. Telemetry events can be force-enabled with the
+		"rdt=" kernel parameter, but this may reduce the number of
+		monitoring groups that can be created.
+
+"mon_features":
+		Lists the telemetry monitoring events that are enabled on this system.
+
+The upper bound for how many "CTRL_MON" + "MON" can be created
+is the smaller of the L3_MON and PERF_PKG_MON "num_rmids" values.
+
 Finally, in the top level of the "info" directory there is a file
 named "last_cmd_status". This is reset with every "command" issued
 via the file system (making new directories or writing to any of the
···
 When monitoring is enabled all MON groups will also contain:

 "mon_data":
-	This contains a set of files organized by L3 domain and by
-	RDT event. E.g. on a system with two L3 domains there will
-	be subdirectories "mon_L3_00" and "mon_L3_01". Each of these
-	directories have one file per event (e.g. "llc_occupancy",
-	"mbm_total_bytes", and "mbm_local_bytes"). In a MON group these
-	files provide a read out of the current value of the event for
-	all tasks in the group. In CTRL_MON groups these files provide
-	the sum for all tasks in the CTRL_MON group and all tasks in
+	This contains directories for each monitor domain.
+
+	If L3 monitoring is enabled, there will be a "mon_L3_XX" directory for
+	each instance of an L3 cache. Each directory contains files for the enabled
+	L3 events (e.g. "llc_occupancy", "mbm_total_bytes", and "mbm_local_bytes").
+
+	If telemetry monitoring is enabled, there will be a "mon_PERF_PKG_YY"
+	directory for each physical processor package. Each directory contains
+	files for the enabled telemetry events (e.g. "core_energy". "activity",
+	"uops_retired", etc.)
+
+	The info/`*`/mon_features files provide the full list of enabled
+	event/file names.
+
+	"core energy" reports a floating point number for the energy (in Joules)
+	consumed by cores (registers, arithmetic units, TLB and L1/L2 caches)
+	during execution of instructions summed across all logical CPUs on a
+	package for the current monitoring group.
+
+	"activity" also reports a floating point value (in Farads). This provides
+	an estimate of work done independent of the frequency that the CPUs used
+	for execution.
+
+	Note that "core energy" and "activity" only measure energy/activity in the
+	"core" of the CPU (arithmetic units, TLB, L1 and L2 caches, etc.). They
+	do not include L3 cache, memory, I/O devices etc.
+
+	All other events report decimal integer values.
+
+	In a MON group these files provide a read out of the current value of
+	the event for all tasks in the group. In CTRL_MON groups these files
+	provide the sum for all tasks in the CTRL_MON group and all tasks in
 	MON groups. Please see example section for more details on usage.
+
 	On systems with Sub-NUMA Cluster (SNC) enabled there are extra
 	directories for each node (located within the "mon_L3_XX" directory
 	for the L3 cache they occupy). These are named "mon_sub_L3_YY"
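The documentation above says "core energy" and "activity" are reported as floating point values, while the hardware counters for those two events are binary fixed point (the event tables in intel_aet.c in this merge declare 18 fraction bits for both). A minimal sketch of that conversion follows. The fixed_to_double() helper is a hypothetical name, and the way the resctrl filesystem actually formats the value is not shown in this diff.

```c
#include <stdint.h>

/*
 * Convert a binary fixed-point counter to a floating point value.
 * bin_bits is the number of fraction bits; 0 means the counter is a
 * plain integer. Assumes bin_bits < 64.
 */
static double fixed_to_double(uint64_t raw, unsigned int bin_bits)
{
	return (double)raw / (double)(1ULL << bin_bits);
}
```

For example, with 18 fraction bits a raw counter value of 1 << 18 corresponds to 1.0, and 3 << 17 corresponds to 1.5.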
+13
arch/x86/Kconfig
···

 	  Say N if unsure.

+config X86_CPU_RESCTRL_INTEL_AET
+	bool "Intel Application Energy Telemetry"
+	depends on X86_64 && X86_CPU_RESCTRL && CPU_SUP_INTEL && INTEL_PMT_TELEMETRY=y && INTEL_TPMI=y
+	help
+	  Enable per-RMID telemetry events in resctrl.
+
+	  Intel feature that collects per-RMID execution data
+	  about energy consumption, measure of frequency independent
+	  activity and other performance metrics. Data is aggregated
+	  per package.
+
+	  Say N if unsure.
+
 config X86_FRED
 	bool "Flexible Return and Event Delivery"
 	depends on X86_64
+1
arch/x86/kernel/cpu/resctrl/Makefile
···
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_X86_CPU_RESCTRL)		+= core.o rdtgroup.o monitor.o
 obj-$(CONFIG_X86_CPU_RESCTRL)		+= ctrlmondata.o
+obj-$(CONFIG_X86_CPU_RESCTRL_INTEL_AET)	+= intel_aet.o
 obj-$(CONFIG_RESCTRL_FS_PSEUDO_LOCK)	+= pseudo_lock.o

 # To allow define_trace.h's recursive include:
+161 -83
arch/x86/kernel/cpu/resctrl/core.c
···
 			.schema_fmt		= RESCTRL_SCHEMA_RANGE,
 		},
 	},
+	[RDT_RESOURCE_PERF_PKG] =
+	{
+		.r_resctrl = {
+			.name			= "PERF_PKG",
+			.mon_scope		= RESCTRL_PACKAGE,
+			.mon_domains		= mon_domain_init(RDT_RESOURCE_PERF_PKG),
+		},
+	},
 };

+/**
+ * resctrl_arch_system_num_rmid_idx - Compute number of supported RMIDs
+ *				      (minimum across all mon_capable resource)
+ *
+ * Return: Number of supported RMIDs at time of call. Note that mount time
+ * enumeration of resources may reduce the number.
+ */
 u32 resctrl_arch_system_num_rmid_idx(void)
 {
-	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+	u32 num_rmids = U32_MAX;
+	struct rdt_resource *r;
+
+	for_each_mon_capable_rdt_resource(r)
+		num_rmids = min(num_rmids, r->mon.num_rmid);

 	/* RMID are independent numbers for x86. num_rmid_idx == num_rmid */
-	return r->mon.num_rmid;
+	return num_rmids == U32_MAX ? 0 : num_rmids;
 }

 struct rdt_resource *resctrl_arch_get_resource(enum resctrl_res_level l)
···
 	kfree(hw_dom);
 }

-static void mon_domain_free(struct rdt_hw_mon_domain *hw_dom)
+static void l3_mon_domain_free(struct rdt_hw_l3_mon_domain *hw_dom)
 {
 	int idx;

···
 }

 /**
- * arch_domain_mbm_alloc() - Allocate arch private storage for the MBM counters
+ * l3_mon_domain_mbm_alloc() - Allocate arch private storage for the MBM counters
  * @num_rmid:	The size of the MBM counter array
  * @hw_dom:	The domain that owns the allocated arrays
+ *
+ * Return:	0 for success, or -ENOMEM.
  */
-static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_mon_domain *hw_dom)
+static int l3_mon_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_l3_mon_domain *hw_dom)
 {
 	size_t tsize = sizeof(*hw_dom->arch_mbm_states[0]);
 	enum resctrl_event_id eventid;
···
 		return get_cpu_cacheinfo_id(cpu, scope);
 	case RESCTRL_L3_NODE:
 		return cpu_to_node(cpu);
+	case RESCTRL_PACKAGE:
+		return topology_physical_package_id(cpu);
 	default:
 		break;
 	}
···

 	hdr = resctrl_find_domain(&r->ctrl_domains, id, &add_pos);
 	if (hdr) {
-		if (WARN_ON_ONCE(hdr->type != RESCTRL_CTRL_DOMAIN))
+		if (!domain_header_is_valid(hdr, RESCTRL_CTRL_DOMAIN, r->rid))
 			return;
 		d = container_of(hdr, struct rdt_ctrl_domain, hdr);

···
 	d = &hw_dom->d_resctrl;
 	d->hdr.id = id;
 	d->hdr.type = RESCTRL_CTRL_DOMAIN;
+	d->hdr.rid = r->rid;
 	cpumask_set_cpu(cpu, &d->hdr.cpu_mask);

 	rdt_domain_reconfigure_cdp(r);
···
 	}
 }

+static void l3_mon_domain_setup(int cpu, int id, struct rdt_resource *r, struct list_head *add_pos)
+{
+	struct rdt_hw_l3_mon_domain *hw_dom;
+	struct rdt_l3_mon_domain *d;
+	struct cacheinfo *ci;
+	int err;
+
+	hw_dom = kzalloc_node(sizeof(*hw_dom), GFP_KERNEL, cpu_to_node(cpu));
+	if (!hw_dom)
+		return;
+
+	d = &hw_dom->d_resctrl;
+	d->hdr.id = id;
+	d->hdr.type = RESCTRL_MON_DOMAIN;
+	d->hdr.rid = RDT_RESOURCE_L3;
+	ci = get_cpu_cacheinfo_level(cpu, RESCTRL_L3_CACHE);
+	if (!ci) {
+		pr_warn_once("Can't find L3 cache for CPU:%d resource %s\n", cpu, r->name);
+		l3_mon_domain_free(hw_dom);
+		return;
+	}
+	d->ci_id = ci->id;
+	cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
+
+	arch_mon_domain_online(r, d);
+
+	if (l3_mon_domain_mbm_alloc(r->mon.num_rmid, hw_dom)) {
+		l3_mon_domain_free(hw_dom);
+		return;
+	}
+
+	list_add_tail_rcu(&d->hdr.list, add_pos);
+
+	err = resctrl_online_mon_domain(r, &d->hdr);
+	if (err) {
+		list_del_rcu(&d->hdr.list);
+		synchronize_rcu();
+		l3_mon_domain_free(hw_dom);
+	}
+}
+
 static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
 {
 	int id = get_domain_id_from_scope(cpu, r->mon_scope);
 	struct list_head *add_pos = NULL;
-	struct rdt_hw_mon_domain *hw_dom;
 	struct rdt_domain_hdr *hdr;
-	struct rdt_mon_domain *d;
-	struct cacheinfo *ci;
-	int err;

 	lockdep_assert_held(&domain_list_lock);

···
 	}

 	hdr = resctrl_find_domain(&r->mon_domains, id, &add_pos);
-	if (hdr) {
-		if (WARN_ON_ONCE(hdr->type != RESCTRL_MON_DOMAIN))
-			return;
-		d = container_of(hdr, struct rdt_mon_domain, hdr);
+	if (hdr)
+		cpumask_set_cpu(cpu, &hdr->cpu_mask);

-		cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
+	switch (r->rid) {
+	case RDT_RESOURCE_L3:
 		/* Update the mbm_assign_mode state for the CPU if supported */
 		if (r->mon.mbm_cntr_assignable)
 			resctrl_arch_mbm_cntr_assign_set_one(r);
-		return;
-	}
-
-	hw_dom = kzalloc_node(sizeof(*hw_dom), GFP_KERNEL, cpu_to_node(cpu));
-	if (!hw_dom)
-		return;
-
-	d = &hw_dom->d_resctrl;
-	d->hdr.id = id;
-	d->hdr.type = RESCTRL_MON_DOMAIN;
-	ci = get_cpu_cacheinfo_level(cpu, RESCTRL_L3_CACHE);
-	if (!ci) {
-		pr_warn_once("Can't find L3 cache for CPU:%d resource %s\n", cpu, r->name);
-		mon_domain_free(hw_dom);
-		return;
-	}
-	d->ci_id = ci->id;
-	cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
-
-	/* Update the mbm_assign_mode state for the CPU if supported */
-	if (r->mon.mbm_cntr_assignable)
-		resctrl_arch_mbm_cntr_assign_set_one(r);
-
-	arch_mon_domain_online(r, d);
-
-	if (arch_domain_mbm_alloc(r->mon.num_rmid, hw_dom)) {
-		mon_domain_free(hw_dom);
-		return;
-	}
-
-	list_add_tail_rcu(&d->hdr.list, add_pos);
-
-	err = resctrl_online_mon_domain(r, d);
-	if (err) {
-		list_del_rcu(&d->hdr.list);
-		synchronize_rcu();
-		mon_domain_free(hw_dom);
+		if (!hdr)
+			l3_mon_domain_setup(cpu, id, r, add_pos);
+		break;
+	case RDT_RESOURCE_PERF_PKG:
+		if (!hdr)
+			intel_aet_mon_domain_setup(cpu, id, r, add_pos);
+		break;
+	default:
+		pr_warn_once("Unknown resource rid=%d\n", r->rid);
+		break;
 	}
 }

···
 		return;
 	}

-	if (WARN_ON_ONCE(hdr->type != RESCTRL_CTRL_DOMAIN))
+	cpumask_clear_cpu(cpu, &hdr->cpu_mask);
+	if (!cpumask_empty(&hdr->cpu_mask))
+		return;
+
+	if (!domain_header_is_valid(hdr, RESCTRL_CTRL_DOMAIN, r->rid))
 		return;

 	d = container_of(hdr, struct rdt_ctrl_domain, hdr);
 	hw_dom = resctrl_to_arch_ctrl_dom(d);

-	cpumask_clear_cpu(cpu, &d->hdr.cpu_mask);
-	if (cpumask_empty(&d->hdr.cpu_mask)) {
-		resctrl_offline_ctrl_domain(r, d);
-		list_del_rcu(&d->hdr.list);
-		synchronize_rcu();
+	resctrl_offline_ctrl_domain(r, d);
+	list_del_rcu(&hdr->list);
+	synchronize_rcu();

-		/*
-		 * rdt_ctrl_domain "d" is going to be freed below, so clear
-		 * its pointer from pseudo_lock_region struct.
-		 */
-		if (d->plr)
-			d->plr->d = NULL;
-		ctrl_domain_free(hw_dom);
-
-		return;
-	}
+	/*
+	 * rdt_ctrl_domain "d" is going to be freed below, so clear
+	 * its pointer from pseudo_lock_region struct.
+	 */
+	if (d->plr)
+		d->plr->d = NULL;
+	ctrl_domain_free(hw_dom);
 }

 static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
 {
 	int id = get_domain_id_from_scope(cpu, r->mon_scope);
-	struct rdt_hw_mon_domain *hw_dom;
 	struct rdt_domain_hdr *hdr;
-	struct rdt_mon_domain *d;

 	lockdep_assert_held(&domain_list_lock);

···
 		return;
 	}

-	if (WARN_ON_ONCE(hdr->type != RESCTRL_MON_DOMAIN))
+	cpumask_clear_cpu(cpu, &hdr->cpu_mask);
+	if (!cpumask_empty(&hdr->cpu_mask))
 		return;

-	d = container_of(hdr, struct rdt_mon_domain, hdr);
-	hw_dom = resctrl_to_arch_mon_dom(d);
+	switch (r->rid) {
+	case RDT_RESOURCE_L3: {
+		struct rdt_hw_l3_mon_domain *hw_dom;
+		struct rdt_l3_mon_domain *d;

-	cpumask_clear_cpu(cpu, &d->hdr.cpu_mask);
-	if (cpumask_empty(&d->hdr.cpu_mask)) {
-		resctrl_offline_mon_domain(r, d);
-		list_del_rcu(&d->hdr.list);
+		if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
+			return;
+
+		d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
+		hw_dom = resctrl_to_arch_mon_dom(d);
+		resctrl_offline_mon_domain(r, hdr);
+		list_del_rcu(&hdr->list);
 		synchronize_rcu();
-		mon_domain_free(hw_dom);
+		l3_mon_domain_free(hw_dom);
+		break;
+	}
+	case RDT_RESOURCE_PERF_PKG: {
+		struct rdt_perf_pkg_mon_domain *pkgd;

-		return;
+		if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_PERF_PKG))
+			return;
+
+		pkgd = container_of(hdr, struct rdt_perf_pkg_mon_domain, hdr);
+		resctrl_offline_mon_domain(r, hdr);
+		list_del_rcu(&hdr->list);
+		synchronize_rcu();
+		kfree(pkgd);
+		break;
+	}
+	default:
+		pr_warn_once("Unknown resource rid=%d\n", r->rid);
+		break;
 	}
 }

···
 	clear_closid_rmid(cpu);

 	return 0;
+}
+
+void resctrl_arch_pre_mount(void)
+{
+	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
+	int cpu;
+
+	if (!intel_aet_get_events())
+		return;
+
+	/*
+	 * Late discovery of telemetry events means the domains for the
+	 * resource were not built. Do that now.
+	 */
+	cpus_read_lock();
+	mutex_lock(&domain_list_lock);
+	r->mon_capable = true;
+	rdt_mon_capable = true;
+	for_each_online_cpu(cpu)
+		domain_add_cpu_mon(cpu, r);
+	mutex_unlock(&domain_list_lock);
+	cpus_read_unlock();
 }

 enum {
···
 		force_off = *tok == '!';
 		if (force_off)
 			tok++;
+		if (intel_handle_aet_option(force_off, tok))
+			continue;
 		for (o = rdt_options; o < &rdt_options[NUM_RDT_OPTIONS]; o++) {
 			if (strcmp(tok, o->name) == 0) {
 				if (force_off)
···
 	bool ret = false;

 	if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC)) {
-		resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID);
+		resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, false, 0, NULL);
 		ret = true;
 	}
 	if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
-		resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID);
+		resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID, false, 0, NULL);
 		ret = true;
 	}
 	if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL)) {
-		resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID);
+		resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID, false, 0, NULL);
 		ret = true;
 	}
 	if (rdt_cpu_has(X86_FEATURE_ABMC))
···
 	if (!ret)
 		return false;

-	return !rdt_get_mon_l3_config(r);
+	return !rdt_get_l3_mon_config(r);
 }

 static __init void __check_quirks_intel(void)
···

 static void __exit resctrl_arch_exit(void)
 {
+	intel_aet_exit();
+
 	cpuhp_remove_state(rdt_online);

 	resctrl_exit();
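The reworked resctrl_arch_system_num_rmid_idx() in the core.c diff above takes the minimum RMID count over every monitoring-capable resource and reports zero when none is capable. A small userspace model of that rule follows. The struct mon_resource type and the system_num_rmid() helper are hypothetical stand-ins for struct rdt_resource and the for_each_mon_capable_rdt_resource() iterator.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct rdt_resource. */
struct mon_resource {
	bool mon_capable;
	uint32_t num_rmid;
};

/*
 * Return the smallest num_rmid among monitoring-capable resources,
 * or 0 if no resource is monitoring capable.
 */
static uint32_t system_num_rmid(const struct mon_resource *res, size_t n)
{
	uint32_t num_rmids = UINT32_MAX;

	for (size_t i = 0; i < n; i++) {
		if (!res[i].mon_capable)
			continue;
		if (res[i].num_rmid < num_rmids)
			num_rmids = res[i].num_rmid;
	}

	/* UINT32_MAX is the "nothing found" sentinel, as in the diff. */
	return num_rmids == UINT32_MAX ? 0 : num_rmids;
}
```

This matches the documentation change in this merge: the number of monitoring groups that can be created is bounded by the smaller of the L3_MON and PERF_PKG_MON "num_rmids" values.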
+409
arch/x86/kernel/cpu/resctrl/intel_aet.c
···
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Resource Director Technology(RDT)
+ * - Intel Application Energy Telemetry
+ *
+ * Copyright (C) 2025 Intel Corporation
+ *
+ * Author:
+ *    Tony Luck <tony.luck@intel.com>
+ */
+
+#define pr_fmt(fmt)   "resctrl: " fmt
+
+#include <linux/bits.h>
+#include <linux/compiler_types.h>
+#include <linux/container_of.h>
+#include <linux/cpumask.h>
+#include <linux/err.h>
+#include <linux/errno.h>
+#include <linux/gfp_types.h>
+#include <linux/init.h>
+#include <linux/intel_pmt_features.h>
+#include <linux/intel_vsec.h>
+#include <linux/io.h>
+#include <linux/minmax.h>
+#include <linux/printk.h>
+#include <linux/rculist.h>
+#include <linux/rcupdate.h>
+#include <linux/resctrl.h>
+#include <linux/resctrl_types.h>
+#include <linux/slab.h>
+#include <linux/stddef.h>
+#include <linux/topology.h>
+#include <linux/types.h>
+
+#include "internal.h"
+
+/**
+ * struct pmt_event - Telemetry event.
+ * @id:		Resctrl event id.
+ * @idx:	Counter index within each per-RMID block of counters.
+ * @bin_bits:	Zero for integer valued events, else number bits in fraction
+ *		part of fixed-point.
+ */
+struct pmt_event {
+	enum resctrl_event_id	id;
+	unsigned int		idx;
+	unsigned int		bin_bits;
+};
+
+#define EVT(_id, _idx, _bits) { .id = _id, .idx = _idx, .bin_bits = _bits }
+
+/**
+ * struct event_group - Events with the same feature type ("energy" or "perf") and GUID.
+ * @pfname:		PMT feature name ("energy" or "perf") of this event group.
+ *			Used by boot rdt= option.
+ * @pfg:		Points to the aggregated telemetry space information
+ *			returned by the intel_pmt_get_regions_by_feature()
+ *			call to the INTEL_PMT_TELEMETRY driver that contains
+ *			data for all telemetry regions of type @pfname.
+ *			Valid if the system supports the event group,
+ *			NULL otherwise.
+ * @force_off:		True when "rdt" command line or architecture code disables
+ *			this event group due to insufficient RMIDs.
+ * @force_on:		True when "rdt" command line overrides disable of this
+ *			event group.
+ * @guid:		Unique number per XML description file.
+ * @num_rmid:		Number of RMIDs supported by this group. May be
+ *			adjusted downwards if enumeration from
+ *			intel_pmt_get_regions_by_feature() indicates fewer
+ *			RMIDs can be tracked simultaneously.
+ * @mmio_size:		Number of bytes of MMIO registers for this group.
+ * @num_events:		Number of events in this group.
+ * @evts:		Array of event descriptors.
+ */
+struct event_group {
+	/* Data fields for additional structures to manage this group. */
+	const char			*pfname;
+	struct pmt_feature_group	*pfg;
+	bool				force_off, force_on;
+
+	/* Remaining fields initialized from XML file. */
+	u32				guid;
+	u32				num_rmid;
+	size_t				mmio_size;
+	unsigned int			num_events;
+	struct pmt_event		evts[] __counted_by(num_events);
+};
+
+#define XML_MMIO_SIZE(num_rmids, num_events, num_extra_status) \
+		      (((num_rmids) * (num_events) + (num_extra_status)) * sizeof(u64))
+
+/*
+ * Link: https://github.com/intel/Intel-PMT/blob/main/xml/CWF/OOBMSM/RMID-ENERGY/cwf_aggregator.xml
+ */
+static struct event_group energy_0x26696143 = {
+	.pfname		= "energy",
+	.guid		= 0x26696143,
+	.num_rmid	= 576,
+	.mmio_size	= XML_MMIO_SIZE(576, 2, 3),
+	.num_events	= 2,
+	.evts		= {
+		EVT(PMT_EVENT_ENERGY, 0, 18),
+		EVT(PMT_EVENT_ACTIVITY, 1, 18),
+	}
+};
+
+/*
+ * Link: https://github.com/intel/Intel-PMT/blob/main/xml/CWF/OOBMSM/RMID-PERF/cwf_aggregator.xml
+ */
+static struct event_group perf_0x26557651 = {
+	.pfname		= "perf",
+	.guid		= 0x26557651,
+	.num_rmid	= 576,
+	.mmio_size	= XML_MMIO_SIZE(576, 7, 3),
+	.num_events	= 7,
+	.evts		= {
+		EVT(PMT_EVENT_STALLS_LLC_HIT, 0, 0),
+		EVT(PMT_EVENT_C1_RES, 1, 0),
+		EVT(PMT_EVENT_UNHALTED_CORE_CYCLES, 2, 0),
+		EVT(PMT_EVENT_STALLS_LLC_MISS, 3, 0),
+		EVT(PMT_EVENT_AUTO_C6_RES, 4, 0),
+		EVT(PMT_EVENT_UNHALTED_REF_CYCLES, 5, 0),
+		EVT(PMT_EVENT_UOPS_RETIRED, 6, 0),
+	}
+};
+
+static struct event_group *known_event_groups[] = {
+	&energy_0x26696143,
+	&perf_0x26557651,
+};
+
+#define for_each_event_group(_peg)						\
+	for (_peg = known_event_groups;						\
+	     _peg < &known_event_groups[ARRAY_SIZE(known_event_groups)];	\
+	     _peg++)
+
+bool intel_handle_aet_option(bool force_off, char *tok)
+{
+	struct event_group **peg;
+	bool ret = false;
+	u32 guid = 0;
+	char *name;
+
+	if (!tok)
+		return false;
+
+	name = strsep(&tok, ":");
+	if (tok && kstrtou32(tok, 16, &guid))
+		return false;
+
+	for_each_event_group(peg) {
+		if (strcmp(name, (*peg)->pfname))
+			continue;
+		if (guid && (*peg)->guid != guid)
+			continue;
+		if (force_off)
+			(*peg)->force_off = true;
+		else
+			(*peg)->force_on = true;
+		ret = true;
+	}
+
+	return ret;
+}
+
+static bool skip_telem_region(struct telemetry_region *tr, struct event_group *e)
+{
+	if (tr->guid != e->guid)
+		return true;
+	if (tr->plat_info.package_id >= topology_max_packages()) {
+		pr_warn("Bad package %u in guid 0x%x\n", tr->plat_info.package_id,
+			tr->guid);
+		return true;
+	}
+	if (tr->size != e->mmio_size) {
+		pr_warn("MMIO space wrong size (%zu bytes) for guid 0x%x. Expected %zu bytes.\n",
+			tr->size, e->guid, e->mmio_size);
+		return true;
+	}
+
+	return false;
+}
+
+static bool group_has_usable_regions(struct event_group *e, struct pmt_feature_group *p)
+{
+	bool usable_regions = false;
+
+	for (int i = 0; i < p->count; i++) {
+		if (skip_telem_region(&p->regions[i], e)) {
+			/*
+			 * Clear the address field of regions that did not pass the checks in
+			 * skip_telem_region() so they will not be used by intel_aet_read_event().
+			 * This is safe to do because intel_pmt_get_regions_by_feature() allocates
+			 * a new pmt_feature_group structure to return to each caller and only makes
+			 * use of the pmt_feature_group::kref field when intel_pmt_put_feature_group()
+			 * returns the structure.
+			 */
+			p->regions[i].addr = NULL;
+
+			continue;
+		}
+		usable_regions = true;
+	}
+
+	return usable_regions;
+}
+
+static bool all_regions_have_sufficient_rmid(struct event_group *e, struct pmt_feature_group *p)
+{
+	struct telemetry_region *tr;
+
+	for (int i = 0; i < p->count; i++) {
+		if (!p->regions[i].addr)
+			continue;
+		tr = &p->regions[i];
+		if (tr->num_rmids < e->num_rmid) {
+			e->force_off = true;
+			return false;
+		}
+	}
+
+	return true;
+}
+
+static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
+{
+	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
+	int skipped_events = 0;
+
+	if (e->force_off)
+		return false;
+
+	if (!group_has_usable_regions(e, p))
+		return false;
+
+	/*
+	 * Only enable event group with insufficient RMIDs if the user requested
+	 * it from the kernel command line.
+	 */
+	if (!all_regions_have_sufficient_rmid(e, p) && !e->force_on) {
+		pr_info("%s %s:0x%x monitoring not enabled due to insufficient RMIDs\n",
+			r->name, e->pfname, e->guid);
+		return false;
+	}
+
+	for (int i = 0; i < p->count; i++) {
+		if (!p->regions[i].addr)
+			continue;
+		/*
+		 * e->num_rmid only adjusted lower if user (via rdt= kernel
+		 * parameter) forces an event group with insufficient RMID
+		 * to be enabled.
+		 */
+		e->num_rmid = min(e->num_rmid, p->regions[i].num_rmids);
+	}
+
+	for (int j = 0; j < e->num_events; j++) {
+		if (!resctrl_enable_mon_event(e->evts[j].id, true,
+					      e->evts[j].bin_bits, &e->evts[j]))
+			skipped_events++;
+	}
+	if (e->num_events == skipped_events) {
+		pr_info("No events enabled in %s %s:0x%x\n", r->name, e->pfname, e->guid);
+		return false;
+	}
+
+	if (r->mon.num_rmid)
+		r->mon.num_rmid = min(r->mon.num_rmid, e->num_rmid);
+	else
+		r->mon.num_rmid = e->num_rmid;
+
+	if (skipped_events)
+		pr_info("%s %s:0x%x monitoring detected (skipped %d events)\n", r->name,
+			e->pfname, e->guid, skipped_events);
+	else
+		pr_info("%s %s:0x%x monitoring detected\n", r->name, e->pfname, e->guid);
+
+	return true;
+}
+
+static enum pmt_feature_id lookup_pfid(const char *pfname)
+{
+	if (!strcmp(pfname, "energy"))
+		return FEATURE_PER_RMID_ENERGY_TELEM;
+	else if (!strcmp(pfname, "perf"))
+		return FEATURE_PER_RMID_PERF_TELEM;
+
+	pr_warn("Unknown PMT feature name '%s'\n", pfname);
+
+	return FEATURE_INVALID;
+}
+
+/*
+ * Request a copy of struct pmt_feature_group for each event group. If there is
+ * one, the returned structure has an array of telemetry_region structures,
+ * each element of the array describes one telemetry aggregator. The
+ * telemetry aggregators may have different GUIDs so obtain duplicate struct
+ * pmt_feature_group for event groups with same feature type but different
+ * GUID. Post-processing ensures an event group can only use the telemetry
+ * aggregators that match its GUID. An event group keeps a pointer to its
+ * struct pmt_feature_group to indicate that its events are successfully
+ * enabled.
+ */
+bool intel_aet_get_events(void)
+{
+	struct pmt_feature_group *p;
+	enum pmt_feature_id pfid;
+	struct event_group **peg;
+	bool ret = false;
+
+	for_each_event_group(peg) {
+		pfid = lookup_pfid((*peg)->pfname);
+		p = intel_pmt_get_regions_by_feature(pfid);
+		if (IS_ERR_OR_NULL(p))
+			continue;
+		if (enable_events(*peg, p)) {
+			(*peg)->pfg = p;
+			ret = true;
+		} else {
+			intel_pmt_put_feature_group(p);
+		}
+	}
+
+	return ret;
+}
+
+void __exit intel_aet_exit(void)
+{
+	struct event_group **peg;
+
+	for_each_event_group(peg) {
+		if ((*peg)->pfg) {
+			intel_pmt_put_feature_group((*peg)->pfg);
+			(*peg)->pfg = NULL;
+		}
+	}
+}
+
+#define DATA_VALID	BIT_ULL(63)
+#define DATA_BITS	GENMASK_ULL(62, 0)
+
+/*
+ * Read counter for an event on a domain (summing all aggregators on the
+ * domain). If an aggregator hasn't received any data for a specific RMID,
+ * the MMIO read indicates that data is not valid. Return success if at
+ * least one aggregator has valid data.
+ */
+int intel_aet_read_event(int domid, u32 rmid, void *arch_priv, u64 *val)
+{
+	struct pmt_event *pevt = arch_priv;
+	struct event_group *e;
+	bool valid = false;
+	u64 total = 0;
+	u64 evtcount;
+	void *pevt0;
+	u32 idx;
+
+	pevt0 = pevt - pevt->idx;
+	e = container_of(pevt0, struct event_group, evts);
+	idx = rmid * e->num_events;
+	idx += pevt->idx;
+
+	if (idx * sizeof(u64) + sizeof(u64) > e->mmio_size) {
+		pr_warn_once("MMIO index %u out of range\n", idx);
+		return -EIO;
+	}
+
+	for (int i = 0; i < e->pfg->count; i++) {
+		if (!e->pfg->regions[i].addr)
+			continue;
+		if (e->pfg->regions[i].plat_info.package_id != domid)
+			continue;
+		evtcount = readq(e->pfg->regions[i].addr + idx * sizeof(u64));
+		if (!(evtcount & DATA_VALID))
+			continue;
+		total += evtcount & DATA_BITS;
+		valid = true;
+	}
+
+	if (valid)
+		*val = total;
+
+	return valid ? 0 : -EINVAL;
+}
+
+void intel_aet_mon_domain_setup(int cpu, int id, struct rdt_resource *r,
+				struct list_head *add_pos)
+{
+	struct rdt_perf_pkg_mon_domain *d;
+	int err;
+
+	d = kzalloc_node(sizeof(*d), GFP_KERNEL, cpu_to_node(cpu));
+	if (!d)
+		return;
+
+	d->hdr.id = id;
+	d->hdr.type = RESCTRL_MON_DOMAIN;
+	d->hdr.rid = RDT_RESOURCE_PERF_PKG;
+	cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
+	list_add_tail_rcu(&d->hdr.list, add_pos);
+
+	err = resctrl_online_mon_domain(r, &d->hdr);
+	if (err) {
+		list_del_rcu(&d->hdr.list);
+		synchronize_rcu();
+		kfree(d);
+	}
+}
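The counter layout used by intel_aet_read_event() above can be modelled in userspace: each aggregator exposes num_rmids * num_events 64-bit counters followed by a few status words, the counter for a given RMID and event sits at index rmid * num_events + event index, and bit 63 flags valid data. The sketch below is an illustration under those assumptions only. The xml_mmio_size() and sum_event() helpers are hypothetical names, plain arrays stand in for MMIO regions, the package-id filter is omitted, and a generic -1 replaces the kernel's -EIO/-EINVAL return codes.

```c
#include <stddef.h>
#include <stdint.h>

#define DATA_VALID	(1ULL << 63)		/* bit 63: counter holds valid data */
#define DATA_BITS	(DATA_VALID - 1)	/* bits 62:0: counter value */

/* Bytes of counter space: per-RMID event counters plus trailing status words. */
static size_t xml_mmio_size(size_t num_rmids, size_t num_events, size_t num_extra_status)
{
	return (num_rmids * num_events + num_extra_status) * sizeof(uint64_t);
}

/*
 * Sum one event for one RMID across several aggregators.
 * regions[i] is a plain array standing in for one aggregator's MMIO block;
 * NULL marks a region that failed validation and must be skipped.
 * Returns 0 and stores the sum in *val if at least one aggregator had
 * valid data, -1 otherwise (including an out-of-range index).
 */
static int sum_event(const uint64_t *const *regions, size_t nregions,
		     uint32_t rmid, unsigned int num_events,
		     unsigned int evt_idx, size_t mmio_size, uint64_t *val)
{
	size_t idx = (size_t)rmid * num_events + evt_idx;
	uint64_t total = 0;
	int valid = 0;

	if ((idx + 1) * sizeof(uint64_t) > mmio_size)
		return -1;	/* counter would lie outside the block */

	for (size_t i = 0; i < nregions; i++) {
		uint64_t raw;

		if (!regions[i])
			continue;
		raw = regions[i][idx];
		if (!(raw & DATA_VALID))
			continue;	/* aggregator has no data for this RMID */
		total += raw & DATA_BITS;
		valid = 1;
	}

	if (valid)
		*val = total;

	return valid ? 0 : -1;
}
```

For the energy group in this merge (576 RMIDs, 2 events, 3 status words) the block is (576 * 2 + 3) * 8 = 9240 bytes, and the "activity" counter (event index 1) for RMID 5 sits at index 11.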
+37 -9
arch/x86/kernel/cpu/resctrl/internal.h
···
 };

 /**
- * struct rdt_hw_mon_domain - Arch private attributes of a set of CPUs that share
- * a resource for a monitor function
- * @d_resctrl: Properties exposed to the resctrl file system
+ * struct rdt_hw_l3_mon_domain - Arch private attributes of a set of CPUs sharing
+ * RDT_RESOURCE_L3 monitoring
+ * @d_resctrl: Properties exposed to the resctrl file system
  * @arch_mbm_states: Per-event pointer to the MBM event's saved state.
  * An MBM event's state is an array of struct arch_mbm_state
  * indexed by RMID on x86.
  *
  * Members of this structure are accessed via helpers that provide abstraction.
  */
-struct rdt_hw_mon_domain {
-	struct rdt_mon_domain d_resctrl;
+struct rdt_hw_l3_mon_domain {
+	struct rdt_l3_mon_domain d_resctrl;
 	struct arch_mbm_state *arch_mbm_states[QOS_NUM_L3_MBM_EVENTS];
 };

···
 	return container_of(r, struct rdt_hw_ctrl_domain, d_resctrl);
 }

-static inline struct rdt_hw_mon_domain *resctrl_to_arch_mon_dom(struct rdt_mon_domain *r)
+static inline struct rdt_hw_l3_mon_domain *resctrl_to_arch_mon_dom(struct rdt_l3_mon_domain *r)
 {
-	return container_of(r, struct rdt_hw_mon_domain, d_resctrl);
+	return container_of(r, struct rdt_hw_l3_mon_domain, d_resctrl);
 }
+
+/**
+ * struct rdt_perf_pkg_mon_domain - CPUs sharing an package scoped resctrl monitor resource
+ * @hdr: common header for different domain types
+ */
+struct rdt_perf_pkg_mon_domain {
+	struct rdt_domain_hdr hdr;
+};

 /**
  * struct msr_param - set a range of MSRs from a domain
···

 extern struct rdt_hw_resource rdt_resources_all[];

-void arch_mon_domain_online(struct rdt_resource *r, struct rdt_mon_domain *d);
+void arch_mon_domain_online(struct rdt_resource *r, struct rdt_l3_mon_domain *d);

 /* CPUID.(EAX=10H, ECX=ResID=1).EAX */
 union cpuid_0x10_1_eax {
···

 void rdt_ctrl_update(void *arg);

-int rdt_get_mon_l3_config(struct rdt_resource *r);
+int rdt_get_l3_mon_config(struct rdt_resource *r);

 bool rdt_cpu_has(int flag);

···

 void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
 void resctrl_arch_mbm_cntr_assign_set_one(struct rdt_resource *r);
+
+#ifdef CONFIG_X86_CPU_RESCTRL_INTEL_AET
+bool intel_aet_get_events(void);
+void __exit intel_aet_exit(void);
+int intel_aet_read_event(int domid, u32 rmid, void *arch_priv, u64 *val);
+void intel_aet_mon_domain_setup(int cpu, int id, struct rdt_resource *r,
+				struct list_head *add_pos);
+bool intel_handle_aet_option(bool force_off, char *tok);
+#else
+static inline bool intel_aet_get_events(void) { return false; }
+static inline void __exit intel_aet_exit(void) { }
+static inline int intel_aet_read_event(int domid, u32 rmid, void *arch_priv, u64 *val)
+{
+	return -EINVAL;
+}
+
+static inline void intel_aet_mon_domain_setup(int cpu, int id, struct rdt_resource *r,
+					      struct list_head *add_pos) { }
+static inline bool intel_handle_aet_option(bool force_off, char *tok) { return false; }
+#endif

 #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
+30 -20
arch/x86/kernel/cpu/resctrl/monitor.c
···
  *
  * In RMID sharing mode there are fewer "logical RMID" values available
  * to accumulate data ("physical RMIDs" are divided evenly between SNC
- * nodes that share an L3 cache). Linux creates an rdt_mon_domain for
+ * nodes that share an L3 cache). Linux creates an rdt_l3_mon_domain for
  * each SNC node.
  *
  * The value loaded into IA32_PQR_ASSOC is the "logical RMID".
···
 	return 0;
 }

-static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_mon_domain *hw_dom,
+static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_l3_mon_domain *hw_dom,
 						 u32 rmid,
 						 enum resctrl_event_id eventid)
 {
···
 	return state ? &state[rmid] : NULL;
 }

-void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
+void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
 			     u32 unused, u32 rmid,
 			     enum resctrl_event_id eventid)
 {
-	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
+	struct rdt_hw_l3_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
 	int cpu = cpumask_any(&d->hdr.cpu_mask);
 	struct arch_mbm_state *am;
 	u32 prmid;
···
  * Assumes that hardware counters are also reset and thus that there is
  * no need to record initial non-zero counts.
  */
-void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d)
+void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_l3_mon_domain *d)
 {
-	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
+	struct rdt_hw_l3_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
 	enum resctrl_event_id eventid;
 	int idx;

···
 	return chunks >> shift;
 }

-static u64 get_corrected_val(struct rdt_resource *r, struct rdt_mon_domain *d,
+static u64 get_corrected_val(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
 			     u32 rmid, enum resctrl_event_id eventid, u64 msr_val)
 {
-	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
+	struct rdt_hw_l3_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
 	struct arch_mbm_state *am;
 	u64 chunks;

···
 	return chunks * hw_res->mon_scale;
 }

-int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
+int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
 			   u32 unused, u32 rmid, enum resctrl_event_id eventid,
-			   u64 *val, void *ignored)
+			   void *arch_priv, u64 *val, void *ignored)
 {
-	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
-	int cpu = cpumask_any(&d->hdr.cpu_mask);
+	struct rdt_hw_l3_mon_domain *hw_dom;
+	struct rdt_l3_mon_domain *d;
 	struct arch_mbm_state *am;
 	u64 msr_val;
 	u32 prmid;
+	int cpu;
 	int ret;

 	resctrl_arch_rmid_read_context_check();

+	if (r->rid == RDT_RESOURCE_PERF_PKG)
+		return intel_aet_read_event(hdr->id, rmid, arch_priv, val);
+
+	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
+		return -EINVAL;
+
+	d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
+	hw_dom = resctrl_to_arch_mon_dom(d);
+	cpu = cpumask_any(&hdr->cpu_mask);
 	prmid = logical_rmid_to_physical_rmid(cpu, rmid);
 	ret = __rmid_read_phys(prmid, eventid, &msr_val);

···
 	return 0;
 }

-void resctrl_arch_reset_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
+void resctrl_arch_reset_cntr(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
 			     u32 unused, u32 rmid, int cntr_id,
 			     enum resctrl_event_id eventid)
 {
-	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
+	struct rdt_hw_l3_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
 	struct arch_mbm_state *am;

 	am = get_arch_mbm_state(hw_dom, rmid, eventid);
···
 	}
 }

-int resctrl_arch_cntr_read(struct rdt_resource *r, struct rdt_mon_domain *d,
+int resctrl_arch_cntr_read(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
 			   u32 unused, u32 rmid, int cntr_id,
 			   enum resctrl_event_id eventid, u64 *val)
 {
···
  * must adjust RMID counter numbers based on SNC node. See
  * logical_rmid_to_physical_rmid() for code that does this.
  */
-void arch_mon_domain_online(struct rdt_resource *r, struct rdt_mon_domain *d)
+void arch_mon_domain_online(struct rdt_resource *r, struct rdt_l3_mon_domain *d)
 {
 	if (snc_nodes_per_l3_cache > 1)
 		msr_clear_bit(MSR_RMID_SNC_CONFIG, 0);
···
 	return ret;
 }

-int __init rdt_get_mon_l3_config(struct rdt_resource *r)
+int __init rdt_get_l3_mon_config(struct rdt_resource *r)
 {
 	unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
···
  */
 static void _resctrl_abmc_enable(struct rdt_resource *r, bool enable)
 {
-	struct rdt_mon_domain *d;
+	struct rdt_l3_mon_domain *d;

 	lockdep_assert_cpus_held();

···
 /*
  * Send an IPI to the domain to assign the counter to RMID, event pair.
  */
-void resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
+void resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
 			      enum resctrl_event_id evtid, u32 rmid, u32 closid,
 			      u32 cntr_id, bool assign)
 {
-	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
+	struct rdt_hw_l3_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
 	union l3_qos_abmc_cfg abmc_cfg = { 0 };
 	struct arch_mbm_state *am;

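resctrl_arch_rmid_read() now receives a generic struct rdt_domain_hdr and only converts it to an L3 domain after checking the header's type and resource id. The pattern can be sketched in user space as below. All type and function names here are simplified, hypothetical stand-ins; in particular, the body of the kernel's domain_header_is_valid() is not shown in this diff, so `hdr_is_valid()` is only a guess at its intent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified versions of the kernel enums and structs. */
enum dom_type { CTRL_DOMAIN, MON_DOMAIN };
enum res_id { RES_L3, RES_PERF_PKG };

struct domain_hdr {
	int id;
	enum dom_type type;
	enum res_id rid;
};

struct l3_mon_domain {
	unsigned int ci_id;
	struct domain_hdr hdr;  /* embedded header, deliberately not the first member */
};

/* User-space equivalent of the kernel's container_of(), without type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Assumed semantics: the header must describe the expected domain type and resource. */
static int hdr_is_valid(const struct domain_hdr *hdr, enum dom_type type,
			enum res_id rid)
{
	assert(hdr != NULL);
	return hdr->type == type && hdr->rid == rid;
}

/* Return the enclosing L3 domain, or NULL if the header belongs to another resource. */
static struct l3_mon_domain *hdr_to_l3_mon_domain(struct domain_hdr *hdr)
{
	if (!hdr_is_valid(hdr, MON_DOMAIN, RES_L3))
		return NULL;

	return container_of(hdr, struct l3_mon_domain, hdr);
}
```

The check matters because container_of() is pure pointer arithmetic: applied to the header of a package-scope domain it would silently produce a pointer to memory that is not a struct l3_mon_domain.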
+99 -14
fs/resctrl/ctrlmondata.c
···

 #include <linux/cpu.h>
 #include <linux/kernfs.h>
+#include <linux/math.h>
 #include <linux/seq_file.h>
 #include <linux/slab.h>
 #include <linux/tick.h>
···
 }

 void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
-		    struct rdt_mon_domain *d, struct rdtgroup *rdtgrp,
-		    cpumask_t *cpumask, int evtid, int first)
+		    struct rdt_domain_hdr *hdr, struct rdtgroup *rdtgrp,
+		    cpumask_t *cpumask, struct mon_evt *evt, int first)
 {
 	int cpu;

···
 	 * Setup the parameters to pass to mon_event_count() to read the data.
 	 */
 	rr->rgrp = rdtgrp;
-	rr->evtid = evtid;
+	rr->evt = evt;
 	rr->r = r;
-	rr->d = d;
+	rr->hdr = hdr;
 	rr->first = first;
 	if (resctrl_arch_mbm_cntr_assign_enabled(r) &&
-	    resctrl_is_mbm_event(evtid)) {
+	    resctrl_is_mbm_event(evt->evtid)) {
 		rr->is_mbm_cntr = true;
 	} else {
-		rr->arch_mon_ctx = resctrl_arch_mon_ctx_alloc(r, evtid);
+		rr->arch_mon_ctx = resctrl_arch_mon_ctx_alloc(r, evt->evtid);
 		if (IS_ERR(rr->arch_mon_ctx)) {
 			rr->err = -EINVAL;
 			return;
 		}
+	}
+
+	if (evt->any_cpu) {
+		mon_event_count(rr);
+		goto out_ctx_free;
 	}

 	cpu = cpumask_any_housekeeping(cpumask, RESCTRL_PICK_ANY_CPU);
···
 	else
 		smp_call_on_cpu(cpu, smp_mon_event_count, rr, false);

+out_ctx_free:
 	if (rr->arch_mon_ctx)
-		resctrl_arch_mon_ctx_free(r, evtid, rr->arch_mon_ctx);
+		resctrl_arch_mon_ctx_free(r, evt->evtid, rr->arch_mon_ctx);
+}
+
+/*
+ * Decimal place precision to use for each number of fixed-point
+ * binary bits computed from ceil(binary_bits * log10(2)) except
+ * binary_bits == 0 which will print "value.0"
+ */
+static const unsigned int decplaces[MAX_BINARY_BITS + 1] = {
+	[0] = 1,
+	[1] = 1,
+	[2] = 1,
+	[3] = 1,
+	[4] = 2,
+	[5] = 2,
+	[6] = 2,
+	[7] = 3,
+	[8] = 3,
+	[9] = 3,
+	[10] = 4,
+	[11] = 4,
+	[12] = 4,
+	[13] = 4,
+	[14] = 5,
+	[15] = 5,
+	[16] = 5,
+	[17] = 6,
+	[18] = 6,
+	[19] = 6,
+	[20] = 7,
+	[21] = 7,
+	[22] = 7,
+	[23] = 7,
+	[24] = 8,
+	[25] = 8,
+	[26] = 8,
+	[27] = 9
+};
+
+static void print_event_value(struct seq_file *m, unsigned int binary_bits, u64 val)
+{
+	unsigned long long frac = 0;
+
+	if (binary_bits) {
+		/* Mask off the integer part of the fixed-point value. */
+		frac = val & GENMASK_ULL(binary_bits - 1, 0);
+
+		/*
+		 * Multiply by 10^{desired decimal places}. The integer part of
+		 * the fixed point value is now almost what is needed.
+		 */
+		frac *= int_pow(10ull, decplaces[binary_bits]);
+
+		/*
+		 * Round to nearest by adding a value that would be a "1" in the
+		 * binary_bits + 1 place. Integer part of fixed point value is
+		 * now the needed value.
+		 */
+		frac += 1ull << (binary_bits - 1);
+
+		/*
+		 * Extract the integer part of the value. This is the decimal
+		 * representation of the original fixed-point fractional value.
+		 */
+		frac >>= binary_bits;
+	}
+
+	/*
+	 * "frac" is now in the range [0 .. 10^decplaces). I.e. string
+	 * representation will fit into chosen number of decimal places.
+	 */
+	seq_printf(m, "%llu.%0*llu\n", val >> binary_bits, decplaces[binary_bits], frac);
 }

 int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 {
 	struct kernfs_open_file *of = m->private;
 	enum resctrl_res_level resid;
-	enum resctrl_event_id evtid;
 	struct rdt_domain_hdr *hdr;
 	struct rmid_read rr = {0};
-	struct rdt_mon_domain *d;
 	struct rdtgroup *rdtgrp;
 	int domid, cpu, ret = 0;
 	struct rdt_resource *r;
 	struct cacheinfo *ci;
+	struct mon_evt *evt;
 	struct mon_data *md;

 	rdtgrp = rdtgroup_kn_lock_live(of->kn);
···

 	resid = md->rid;
 	domid = md->domid;
-	evtid = md->evtid;
+	evt = md->evt;
 	r = resctrl_arch_get_resource(resid);

 	if (md->sum) {
+		struct rdt_l3_mon_domain *d;
+
+		if (WARN_ON_ONCE(resid != RDT_RESOURCE_L3)) {
+			ret = -EINVAL;
+			goto out;
+		}
+
 		/*
 		 * This file requires summing across all domains that share
 		 * the L3 cache id that was provided in the "domid" field of the
···
 					continue;
 				rr.ci = ci;
 				mon_event_read(&rr, r, NULL, rdtgrp,
-					       &ci->shared_cpu_map, evtid, false);
+					       &ci->shared_cpu_map, evt, false);
 				goto checkresult;
 			}
 		}
···
 		 * the resource to find the domain with "domid".
 		 */
 		hdr = resctrl_find_domain(&r->mon_domains, domid, NULL);
-		if (!hdr || WARN_ON_ONCE(hdr->type != RESCTRL_MON_DOMAIN)) {
+		if (!hdr) {
 			ret = -ENOENT;
 			goto out;
 		}
-		d = container_of(hdr, struct rdt_mon_domain, hdr);
-		mon_event_read(&rr, r, d, rdtgrp, &d->hdr.cpu_mask, evtid, false);
+		mon_event_read(&rr, r, hdr, rdtgrp, &hdr->cpu_mask, evt, false);
 	}

 checkresult:
···
 		seq_puts(m, "Unavailable\n");
 	else if (rr.err == -ENOENT)
 		seq_puts(m, "Unassigned\n");
+	else if (evt->is_floating_point)
+		print_event_value(m, evt->binary_bits, rr.val);
 	else
 		seq_printf(m, "%llu\n", rr.val);

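The arithmetic in print_event_value() can be checked outside the kernel. The sketch below is an illustration under assumptions, not the kernel code: `decimal_places()`, `fixed_point_fraction()` and `format_fixed_point()` are hypothetical names, the lookup table is replaced by an integer approximation of ceil(bits * log10(2)) that matches the table for 1..27 bits, and snprintf() stands in for seq_printf().

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_BINARY_BITS 27

/*
 * Decimal places needed for 'bits' fractional binary bits:
 * ceil(bits * log10(2)), with log10(2) approximated as 30103/100000.
 * Zero bits still prints one place ("value.0").
 */
static unsigned int decimal_places(unsigned int bits)
{
	assert(bits <= MAX_BINARY_BITS);
	if (bits == 0)
		return 1;
	return (bits * 30103u + 99999u) / 100000u;
}

/* Convert the low 'bits' bits of 'val' into a rounded decimal fraction. */
static uint64_t fixed_point_fraction(unsigned int bits, uint64_t val)
{
	unsigned int places = decimal_places(bits);
	uint64_t frac = 0;

	if (bits) {
		frac = val & ((1ULL << bits) - 1);  /* keep only the fractional bits */
		for (unsigned int i = 0; i < places; i++)
			frac *= 10;                 /* scale by 10^places */
		frac += 1ULL << (bits - 1);         /* add one half to round to nearest */
		frac >>= bits;                      /* drop the binary fraction */
	}

	return frac;
}

/* Write "integer.fraction" into buf; returns what snprintf() returns. */
static int format_fixed_point(char *buf, size_t len, unsigned int bits, uint64_t val)
{
	return snprintf(buf, len, "%" PRIu64 ".%0*" PRIu64, val >> bits,
			(int)decimal_places(bits), fixed_point_fraction(bits, val));
}
```

As a worked case, a value with 2 fractional bits and raw contents 0b1011 is 2.75. One decimal place is used, so the fraction is (3 * 10 + 2) >> 2 = 8 and the result reads "2.8". Because 10^places is never smaller than 2^bits, the rounded fraction cannot reach 10^places and spill into the integer part.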
+45 -23
fs/resctrl/internal.h
···
  * READS_TO_REMOTE_MEM) being tracked by @evtid.
  * Only valid if @evtid is an MBM event.
  * @configurable: true if the event is configurable
+ * @any_cpu: true if the event can be read from any CPU
+ * @is_floating_point: event values are displayed in floating point format
+ * @binary_bits: number of fixed-point binary bits from architecture,
+ * only valid if @is_floating_point is true
  * @enabled: true if the event is enabled
+ * @arch_priv: Architecture private data for this event.
+ * The @arch_priv provided by the architecture via
+ * resctrl_enable_mon_event().
  */
 struct mon_evt {
 	enum resctrl_event_id evtid;
···
 	char *name;
 	u32 evt_cfg;
 	bool configurable;
+	bool any_cpu;
+	bool is_floating_point;
+	unsigned int binary_bits;
 	bool enabled;
+	void *arch_priv;
 };

 extern struct mon_evt mon_event_all[QOS_NUM_EVENTS];
···
 #define for_each_mon_event(mevt) for (mevt = &mon_event_all[QOS_FIRST_EVENT]; \
 				      mevt < &mon_event_all[QOS_NUM_EVENTS]; mevt++)

+/* Limit for mon_evt::binary_bits */
+#define MAX_BINARY_BITS 27
+
 /**
  * struct mon_data - Monitoring details for each event file.
  * @list: Member of the global @mon_data_kn_priv_list list.
  * @rid: Resource id associated with the event file.
- * @evtid: Event id associated with the event file.
- * @sum: Set when event must be summed across multiple
- * domains.
+ * @evt: Event structure associated with the event file.
+ * @sum: Set for RDT_RESOURCE_L3 when event must be summed
+ * across multiple domains.
  * @domid: When @sum is zero this is the domain to which
  * the event file belongs. When @sum is one this
  * is the id of the L3 cache that all domains to be
···
 struct mon_data {
 	struct list_head list;
 	enum resctrl_res_level rid;
-	enum resctrl_event_id evtid;
+	struct mon_evt *evt;
 	int domid;
 	bool sum;
 };
···
  * resource group then its event count is summed with the count from all
  * its child resource groups.
  * @r: Resource describing the properties of the event being read.
- * @d: Domain that the counter should be read from. If NULL then sum all
- * domains in @r sharing L3 @ci.id
- * @evtid: Which monitor event to read.
+ * @hdr: Header of domain that the counter should be read from. If NULL then
+ * sum all domains in @r sharing L3 @ci.id
+ * @evt: Which monitor event to read.
  * @first: Initialize MBM counter when true.
- * @ci: Cacheinfo for L3. Only set when @d is NULL. Used when summing domains.
+ * @ci: Cacheinfo for L3. Only set when @hdr is NULL. Used when summing
+ * domains.
  * @is_mbm_cntr: true if "mbm_event" counter assignment mode is enabled and it
  * is an MBM event.
  * @err: Error encountered when reading counter.
- * @val: Returned value of event counter. If @rgrp is a parent resource group,
- * @val includes the sum of event counts from its child resource groups.
- * If @d is NULL, @val includes the sum of all domains in @r sharing @ci.id,
- * (summed across child resource groups if @rgrp is a parent resource group).
+ * @val: Returned value of event counter. If @rgrp is a parent resource
+ * group, @val includes the sum of event counts from its child
+ * resource groups. If @hdr is NULL, @val includes the sum of all
+ * domains in @r sharing @ci.id, (summed across child resource groups
+ * if @rgrp is a parent resource group).
  * @arch_mon_ctx: Hardware monitor allocated for this read request (MPAM only).
  */
 struct rmid_read {
 	struct rdtgroup *rgrp;
 	struct rdt_resource *r;
-	struct rdt_mon_domain *d;
-	enum resctrl_event_id evtid;
+	struct rdt_domain_hdr *hdr;
+	struct mon_evt *evt;
 	bool first;
 	struct cacheinfo *ci;
 	bool is_mbm_cntr;
···

 #define RFTYPE_ASSIGN_CONFIG BIT(11)

+#define RFTYPE_RES_PERF_PKG BIT(12)
+
 #define RFTYPE_CTRL_INFO (RFTYPE_INFO | RFTYPE_CTRL)

 #define RFTYPE_MON_INFO (RFTYPE_INFO | RFTYPE_MON)
···

 void closid_free(int closid);

+int setup_rmid_lru_list(void);
+
+void free_rmid_lru_list(void);
+
 int alloc_rmid(u32 closid);

 void free_rmid(u32 closid, u32 rmid);

-void resctrl_mon_resource_exit(void);
+int resctrl_l3_mon_resource_init(void);
+
+void resctrl_l3_mon_resource_exit(void);

 void mon_event_count(void *info);

 int rdtgroup_mondata_show(struct seq_file *m, void *arg);

 void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
-		    struct rdt_mon_domain *d, struct rdtgroup *rdtgrp,
-		    cpumask_t *cpumask, int evtid, int first);
+		    struct rdt_domain_hdr *hdr, struct rdtgroup *rdtgrp,
+		    cpumask_t *cpumask, struct mon_evt *evt, int first);

-int resctrl_mon_resource_init(void);
-
-void mbm_setup_overflow_handler(struct rdt_mon_domain *dom,
+void mbm_setup_overflow_handler(struct rdt_l3_mon_domain *dom,
 				unsigned long delay_ms,
 				int exclude_cpu);

···

 bool is_mba_sc(struct rdt_resource *r);

-void cqm_setup_limbo_handler(struct rdt_mon_domain *dom, unsigned long delay_ms,
+void cqm_setup_limbo_handler(struct rdt_l3_mon_domain *dom, unsigned long delay_ms,
 			     int exclude_cpu);

 void cqm_handle_limbo(struct work_struct *work);

-bool has_busy_rmid(struct rdt_mon_domain *d);
+bool has_busy_rmid(struct rdt_l3_mon_domain *d);

-void __check_limbo(struct rdt_mon_domain *d, bool force_free);
+void __check_limbo(struct rdt_l3_mon_domain *d, bool force_free);

 void resctrl_file_fflags_init(const char *config, unsigned long fflags);

+228 -136
fs/resctrl/monitor.c
···
  * decrement the count. If the busy count gets to zero on an RMID, we
  * free the RMID
  */
-void __check_limbo(struct rdt_mon_domain *d, bool force_free)
+void __check_limbo(struct rdt_l3_mon_domain *d, bool force_free)
 {
 	struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
 	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
 	struct rmid_entry *entry;
 	u32 idx, cur_idx = 1;
 	void *arch_mon_ctx;
+	void *arch_priv;
 	bool rmid_dirty;
 	u64 val = 0;

+	arch_priv = mon_event_all[QOS_L3_OCCUP_EVENT_ID].arch_priv;
 	arch_mon_ctx = resctrl_arch_mon_ctx_alloc(r, QOS_L3_OCCUP_EVENT_ID);
 	if (IS_ERR(arch_mon_ctx)) {
 		pr_warn_ratelimited("Failed to allocate monitor context: %ld",
···
 			break;

 		entry = __rmid_entry(idx);
-		if (resctrl_arch_rmid_read(r, d, entry->closid, entry->rmid,
-					   QOS_L3_OCCUP_EVENT_ID, &val,
+		if (resctrl_arch_rmid_read(r, &d->hdr, entry->closid, entry->rmid,
+					   QOS_L3_OCCUP_EVENT_ID, arch_priv, &val,
 					   arch_mon_ctx)) {
 			rmid_dirty = true;
 		} else {
···
 	resctrl_arch_mon_ctx_free(r, QOS_L3_OCCUP_EVENT_ID, arch_mon_ctx);
 }

-bool has_busy_rmid(struct rdt_mon_domain *d)
+bool has_busy_rmid(struct rdt_l3_mon_domain *d)
 {
 	u32 idx_limit = resctrl_arch_system_num_rmid_idx();

···
 static void add_rmid_to_limbo(struct rmid_entry *entry)
 {
 	struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
-	struct rdt_mon_domain *d;
+	struct rdt_l3_mon_domain *d;
 	u32 idx;

 	lockdep_assert_held(&rdtgroup_mutex);
···
 		list_add_tail(&entry->list, &rmid_free_lru);
 }

-static struct mbm_state *get_mbm_state(struct rdt_mon_domain *d, u32 closid,
+static struct mbm_state *get_mbm_state(struct rdt_l3_mon_domain *d, u32 closid,
 				       u32 rmid, enum resctrl_event_id evtid)
 {
 	u32 idx = resctrl_arch_rmid_idx_encode(closid, rmid);
···
  * Return:
  * Valid counter ID on success, or -ENOENT on failure.
  */
-static int mbm_cntr_get(struct rdt_resource *r, struct rdt_mon_domain *d,
+static int mbm_cntr_get(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
 			struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
 {
 	int cntr_id;
···
  * Return:
  * Valid counter ID on success, or -ENOSPC on failure.
  */
-static int mbm_cntr_alloc(struct rdt_resource *r, struct rdt_mon_domain *d,
+static int mbm_cntr_alloc(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
 			  struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
 {
 	int cntr_id;
···
 /*
  * mbm_cntr_free() - Clear the counter ID configuration details in the domain @d.
  */
-static void mbm_cntr_free(struct rdt_mon_domain *d, int cntr_id)
+static void mbm_cntr_free(struct rdt_l3_mon_domain *d, int cntr_id)
 {
 	memset(&d->cntr_cfg[cntr_id], 0, sizeof(*d->cntr_cfg));
 }

-static int __mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
+static int __l3_mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
 {
 	int cpu = smp_processor_id();
 	u32 closid = rdtgrp->closid;
 	u32 rmid = rdtgrp->mon.rmid;
-	struct rdt_mon_domain *d;
+	struct rdt_l3_mon_domain *d;
 	int cntr_id = -ENOENT;
 	struct mbm_state *m;
-	int err, ret;
 	u64 tval = 0;

+	if (!domain_header_is_valid(rr->hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3)) {
+		rr->err = -EIO;
+		return -EINVAL;
+	}
+	d = container_of(rr->hdr, struct rdt_l3_mon_domain, hdr);
+
 	if (rr->is_mbm_cntr) {
-		cntr_id = mbm_cntr_get(rr->r, rr->d, rdtgrp, rr->evtid);
+		cntr_id = mbm_cntr_get(rr->r, d, rdtgrp, rr->evt->evtid);
 		if (cntr_id < 0) {
 			rr->err = -ENOENT;
 			return -EINVAL;
···

 	if (rr->first) {
 		if (rr->is_mbm_cntr)
-			resctrl_arch_reset_cntr(rr->r, rr->d, closid, rmid, cntr_id, rr->evtid);
+			resctrl_arch_reset_cntr(rr->r, d, closid, rmid, cntr_id, rr->evt->evtid);
 		else
-			resctrl_arch_reset_rmid(rr->r, rr->d, closid, rmid, rr->evtid);
-		m = get_mbm_state(rr->d, closid, rmid, rr->evtid);
+			resctrl_arch_reset_rmid(rr->r, d, closid, rmid, rr->evt->evtid);
+		m = get_mbm_state(d, closid, rmid, rr->evt->evtid);
 		if (m)
 			memset(m, 0, sizeof(struct mbm_state));
 		return 0;
 	}

-	if (rr->d) {
-		/* Reading a single domain, must be on a CPU in that domain. */
-		if (!cpumask_test_cpu(cpu, &rr->d->hdr.cpu_mask))
-			return -EINVAL;
-		if (rr->is_mbm_cntr)
-			rr->err = resctrl_arch_cntr_read(rr->r, rr->d, closid, rmid, cntr_id,
-							 rr->evtid, &tval);
-		else
-			rr->err = resctrl_arch_rmid_read(rr->r, rr->d, closid, rmid,
-							 rr->evtid, &tval, rr->arch_mon_ctx);
-		if (rr->err)
-			return rr->err;
+	/* Reading a single domain, must be on a CPU in that domain. */
+	if (!cpumask_test_cpu(cpu, &d->hdr.cpu_mask))
+		return -EINVAL;
+	if (rr->is_mbm_cntr)
+		rr->err = resctrl_arch_cntr_read(rr->r, d, closid, rmid, cntr_id,
+						 rr->evt->evtid, &tval);
+	else
+		rr->err = resctrl_arch_rmid_read(rr->r, rr->hdr, closid, rmid,
+						 rr->evt->evtid, rr->evt->arch_priv,
+						 &tval, rr->arch_mon_ctx);
+	if (rr->err)
+		return rr->err;

-		rr->val += tval;
+	rr->val += tval;

-		return 0;
+	return 0;
+}
+
+static int __l3_mon_event_count_sum(struct rdtgroup *rdtgrp, struct rmid_read *rr)
+{
+	int cpu = smp_processor_id();
+	u32 closid = rdtgrp->closid;
+	u32 rmid = rdtgrp->mon.rmid;
+	struct rdt_l3_mon_domain *d;
+	u64 tval = 0;
+	int err, ret;
+
+	/*
+	 * Summing across domains is only done for systems that implement
+	 * Sub-NUMA Cluster. There is no overlap with systems that support
+	 * assignable counters.
+	 */
+	if (rr->is_mbm_cntr) {
+		pr_warn_once("Summing domains using assignable counters is not supported\n");
+		rr->err = -EINVAL;
+		return -EINVAL;
 	}

 	/* Summing domains that share a cache, must be on a CPU for that cache. */
···
 	list_for_each_entry(d, &rr->r->mon_domains, hdr.list) {
 		if (d->ci_id != rr->ci->id)
 			continue;
-		if (rr->is_mbm_cntr)
-			err = resctrl_arch_cntr_read(rr->r, d, closid, rmid, cntr_id,
-						     rr->evtid, &tval);
-		else
-			err = resctrl_arch_rmid_read(rr->r, d, closid, rmid,
-						     rr->evtid, &tval, rr->arch_mon_ctx);
+		err = resctrl_arch_rmid_read(rr->r, &d->hdr, closid, rmid,
+					     rr->evt->evtid, rr->evt->arch_priv,
+					     &tval, rr->arch_mon_ctx);
 		if (!err) {
 			rr->val += tval;
 			ret = 0;
···
 		rr->err = ret;

 	return ret;
+}
+
+static int __mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
+{
+	switch (rr->r->rid) {
+	case RDT_RESOURCE_L3:
+		WARN_ON_ONCE(rr->evt->any_cpu);
+		if (rr->hdr)
+			return __l3_mon_event_count(rdtgrp, rr);
+		else
+			return __l3_mon_event_count_sum(rdtgrp, rr);
+	case RDT_RESOURCE_PERF_PKG: {
+		u64 tval = 0;
+
+		rr->err = resctrl_arch_rmid_read(rr->r, rr->hdr, rdtgrp->closid,
+						 rdtgrp->mon.rmid, rr->evt->evtid,
+						 rr->evt->arch_priv,
+						 &tval, rr->arch_mon_ctx);
+		if (rr->err)
+			return rr->err;
+
+		rr->val += tval;
+
+		return 0;
+	}
+	default:
+		rr->err = -EINVAL;
+		return -EINVAL;
+	}
 }

 /*
···
 	u64 cur_bw, bytes, cur_bytes;
 	u32 closid = rdtgrp->closid;
 	u32 rmid = rdtgrp->mon.rmid;
+	struct rdt_l3_mon_domain *d;
 	struct mbm_state *m;

-	m = get_mbm_state(rr->d, closid, rmid, rr->evtid);
+	if (!domain_header_is_valid(rr->hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
+		return;
+	d = container_of(rr->hdr, struct rdt_l3_mon_domain, hdr);
+	m = get_mbm_state(d, closid, rmid, rr->evt->evtid);
 	if (WARN_ON_ONCE(!m))
 		return;

···
  * throttle MSRs already have low percentage values. To avoid
  * unnecessarily restricting such rdtgroups, we also increase the bandwidth.
  */
-static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_mon_domain *dom_mbm)
+static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_l3_mon_domain *dom_mbm)
 {
 	u32 closid, rmid, cur_msr_val, new_msr_val;
 	struct mbm_state *pmbm_data, *cmbm_data;
···
 	resctrl_arch_update_one(r_mba, dom_mba, closid, CDP_NONE, new_msr_val);
 }

-static void mbm_update_one_event(struct rdt_resource *r, struct rdt_mon_domain *d,
+static void mbm_update_one_event(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
 				 struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
 {
 	struct rmid_read rr = {0};

 	rr.r = r;
-	rr.d = d;
-	rr.evtid = evtid;
+	rr.hdr = &d->hdr;
+	rr.evt = &mon_event_all[evtid];
 	if (resctrl_arch_mbm_cntr_assign_enabled(r)) {
 		rr.is_mbm_cntr = true;
 	} else {
-		rr.arch_mon_ctx = resctrl_arch_mon_ctx_alloc(rr.r, rr.evtid);
+		rr.arch_mon_ctx = resctrl_arch_mon_ctx_alloc(rr.r, evtid);
 		if (IS_ERR(rr.arch_mon_ctx)) {
 			pr_warn_ratelimited("Failed to allocate monitor context: %ld",
 					    PTR_ERR(rr.arch_mon_ctx));
···
 		mbm_bw_count(rdtgrp, &rr);

 	if (rr.arch_mon_ctx)
-		resctrl_arch_mon_ctx_free(rr.r, rr.evtid, rr.arch_mon_ctx);
+		resctrl_arch_mon_ctx_free(rr.r, evtid, rr.arch_mon_ctx);
 }

-static void mbm_update(struct rdt_resource *r, struct rdt_mon_domain *d,
+static void mbm_update(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
 		       struct rdtgroup *rdtgrp)
 {
 	/*
···
 void cqm_handle_limbo(struct work_struct *work)
 {
 	unsigned long delay = msecs_to_jiffies(CQM_LIMBOCHECK_INTERVAL);
-	struct rdt_mon_domain *d;
+	struct rdt_l3_mon_domain *d;

 	cpus_read_lock();
 	mutex_lock(&rdtgroup_mutex);

-	d = container_of(work, struct rdt_mon_domain, cqm_limbo.work);
+	d = container_of(work, struct rdt_l3_mon_domain, cqm_limbo.work);

 	__check_limbo(d, false);

···
  * @exclude_cpu: Which CPU the handler should not run on,
  * RESCTRL_PICK_ANY_CPU to pick any CPU.
  */
-void cqm_setup_limbo_handler(struct rdt_mon_domain *dom, unsigned long delay_ms,
+void cqm_setup_limbo_handler(struct rdt_l3_mon_domain *dom, unsigned long delay_ms,
 			     int exclude_cpu)
 {
 	unsigned long delay = msecs_to_jiffies(delay_ms);
···
 {
 	unsigned long delay = msecs_to_jiffies(MBM_OVERFLOW_INTERVAL);
 	struct rdtgroup *prgrp, *crgrp;
-	struct rdt_mon_domain *d;
+	struct rdt_l3_mon_domain *d;
 	struct list_head *head;
 	struct rdt_resource *r;

···
 		goto out_unlock;

 	r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
-	d = container_of(work, struct rdt_mon_domain, mbm_over.work);
+	d = container_of(work, struct rdt_l3_mon_domain, mbm_over.work);

 	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
 		mbm_update(r, d, prgrp);
···
  * @exclude_cpu: Which CPU the handler should not run on,
  * RESCTRL_PICK_ANY_CPU to pick any CPU.
  */
-void mbm_setup_overflow_handler(struct rdt_mon_domain *dom, unsigned long delay_ms,
+void mbm_setup_overflow_handler(struct rdt_l3_mon_domain *dom, unsigned long delay_ms,
 				int exclude_cpu)
 {
 	unsigned long delay = msecs_to_jiffies(delay_ms);
···
 		schedule_delayed_work_on(cpu, &dom->mbm_over, delay);
 }

-static int dom_data_init(struct rdt_resource *r)
+int setup_rmid_lru_list(void)
 {
-	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
-	u32 num_closid = resctrl_arch_get_num_closid(r);
 	struct rmid_entry *entry = NULL;
-	int err = 0, i;
+	u32 idx_limit;
 	u32 idx;
+	int i;

-	mutex_lock(&rdtgroup_mutex);
-	if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
-		u32 *tmp;
+	if (!resctrl_arch_mon_capable())
+		return 0;

-		/*
-		 * If the architecture hasn't provided a sanitised value here,
-		 * this may result in larger arrays than necessary. Resctrl will
-		 * use a smaller system wide value based on the resources in
-		 * use.
-		 */
-		tmp = kcalloc(num_closid, sizeof(*tmp), GFP_KERNEL);
-		if (!tmp) {
-			err = -ENOMEM;
-			goto out_unlock;
-		}
+	/*
+	 * Called on every mount, but the number of RMIDs cannot change
+	 * after the first mount, so keep using the same set of rmid_ptrs[]
+	 * until resctrl_exit(). Note that the limbo handler continues to
+	 * access rmid_ptrs[] after resctrl is unmounted.
+	 */
+	if (rmid_ptrs)
+		return 0;

-		closid_num_dirty_rmid = tmp;
-	}
-
+	idx_limit = resctrl_arch_system_num_rmid_idx();
 	rmid_ptrs = kcalloc(idx_limit, sizeof(struct rmid_entry), GFP_KERNEL);
-	if (!rmid_ptrs) {
-		if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
-			kfree(closid_num_dirty_rmid);
-			closid_num_dirty_rmid = NULL;
-		}
-		err = -ENOMEM;
-		goto out_unlock;
-	}
+	if (!rmid_ptrs)
+		return -ENOMEM;

 	for (i = 0; i < idx_limit; i++) {
 		entry = &rmid_ptrs[i];
···
 	/*
 	 * RESCTRL_RESERVED_CLOSID and RESCTRL_RESERVED_RMID are special and
 	 * are always allocated. These are used for the rdtgroup_default
-	 * control group, which will be setup later in resctrl_init().
+	 * control group, which was setup earlier in rdtgroup_setup_default().
 	 */
 	idx = resctrl_arch_rmid_idx_encode(RESCTRL_RESERVED_CLOSID,
 					   RESCTRL_RESERVED_RMID);
 	entry = __rmid_entry(idx);
 	list_del(&entry->list);

-out_unlock:
-	mutex_unlock(&rdtgroup_mutex);
-
-	return err;
+	return 0;
 }

-static void dom_data_exit(struct rdt_resource *r)
+void free_rmid_lru_list(void)
 {
+	if (!resctrl_arch_mon_capable())
+		return;
+
 	mutex_lock(&rdtgroup_mutex);
-
-	if (!r->mon_capable)
-		goto out_unlock;
-
-	if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
-		kfree(closid_num_dirty_rmid);
-		closid_num_dirty_rmid = NULL;
-	}
-
 	kfree(rmid_ptrs);
 	rmid_ptrs = NULL;
-
-out_unlock:
 	mutex_unlock(&rdtgroup_mutex);
+}
+
+#define MON_EVENT(_eventid, _name, _res, _fp) \
+	[_eventid] = { \
+	.name = _name, \
+	.evtid = _eventid, \
+	.rid = _res, \
+	.is_floating_point = _fp, \
 }

 /*
···
  * to set .enabled.
974 933 */ 975 934 struct mon_evt mon_event_all[QOS_NUM_EVENTS] = { 976 - [QOS_L3_OCCUP_EVENT_ID] = { 977 - .name = "llc_occupancy", 978 - .evtid = QOS_L3_OCCUP_EVENT_ID, 979 - .rid = RDT_RESOURCE_L3, 980 - }, 981 - [QOS_L3_MBM_TOTAL_EVENT_ID] = { 982 - .name = "mbm_total_bytes", 983 - .evtid = QOS_L3_MBM_TOTAL_EVENT_ID, 984 - .rid = RDT_RESOURCE_L3, 985 - }, 986 - [QOS_L3_MBM_LOCAL_EVENT_ID] = { 987 - .name = "mbm_local_bytes", 988 - .evtid = QOS_L3_MBM_LOCAL_EVENT_ID, 989 - .rid = RDT_RESOURCE_L3, 990 - }, 935 + MON_EVENT(QOS_L3_OCCUP_EVENT_ID, "llc_occupancy", RDT_RESOURCE_L3, false), 936 + MON_EVENT(QOS_L3_MBM_TOTAL_EVENT_ID, "mbm_total_bytes", RDT_RESOURCE_L3, false), 937 + MON_EVENT(QOS_L3_MBM_LOCAL_EVENT_ID, "mbm_local_bytes", RDT_RESOURCE_L3, false), 938 + MON_EVENT(PMT_EVENT_ENERGY, "core_energy", RDT_RESOURCE_PERF_PKG, true), 939 + MON_EVENT(PMT_EVENT_ACTIVITY, "activity", RDT_RESOURCE_PERF_PKG, true), 940 + MON_EVENT(PMT_EVENT_STALLS_LLC_HIT, "stalls_llc_hit", RDT_RESOURCE_PERF_PKG, false), 941 + MON_EVENT(PMT_EVENT_C1_RES, "c1_res", RDT_RESOURCE_PERF_PKG, false), 942 + MON_EVENT(PMT_EVENT_UNHALTED_CORE_CYCLES, "unhalted_core_cycles", RDT_RESOURCE_PERF_PKG, false), 943 + MON_EVENT(PMT_EVENT_STALLS_LLC_MISS, "stalls_llc_miss", RDT_RESOURCE_PERF_PKG, false), 944 + MON_EVENT(PMT_EVENT_AUTO_C6_RES, "c6_res", RDT_RESOURCE_PERF_PKG, false), 945 + MON_EVENT(PMT_EVENT_UNHALTED_REF_CYCLES, "unhalted_ref_cycles", RDT_RESOURCE_PERF_PKG, false), 946 + MON_EVENT(PMT_EVENT_UOPS_RETIRED, "uops_retired", RDT_RESOURCE_PERF_PKG, false), 991 947 }; 992 948 993 - void resctrl_enable_mon_event(enum resctrl_event_id eventid) 949 + bool resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu, 950 + unsigned int binary_bits, void *arch_priv) 994 951 { 995 - if (WARN_ON_ONCE(eventid < QOS_FIRST_EVENT || eventid >= QOS_NUM_EVENTS)) 996 - return; 952 + if (WARN_ON_ONCE(eventid < QOS_FIRST_EVENT || eventid >= QOS_NUM_EVENTS || 953 + binary_bits > MAX_BINARY_BITS)) 954 
+ return false; 997 955 if (mon_event_all[eventid].enabled) { 998 956 pr_warn("Duplicate enable for event %d\n", eventid); 999 - return; 957 + return false; 958 + } 959 + if (binary_bits && !mon_event_all[eventid].is_floating_point) { 960 + pr_warn("Event %d may not be floating point\n", eventid); 961 + return false; 1000 962 } 1001 963 964 + mon_event_all[eventid].any_cpu = any_cpu; 965 + mon_event_all[eventid].binary_bits = binary_bits; 966 + mon_event_all[eventid].arch_priv = arch_priv; 1002 967 mon_event_all[eventid].enabled = true; 968 + 969 + return true; 1003 970 } 1004 971 1005 972 bool resctrl_is_mon_event_enabled(enum resctrl_event_id eventid) ··· 1131 1082 * mbm_cntr_free_all() - Clear all the counter ID configuration details in the 1132 1083 * domain @d. Called when mbm_assign_mode is changed. 1133 1084 */ 1134 - static void mbm_cntr_free_all(struct rdt_resource *r, struct rdt_mon_domain *d) 1085 + static void mbm_cntr_free_all(struct rdt_resource *r, struct rdt_l3_mon_domain *d) 1135 1086 { 1136 1087 memset(d->cntr_cfg, 0, sizeof(*d->cntr_cfg) * r->mon.num_mbm_cntrs); 1137 1088 } ··· 1140 1091 * resctrl_reset_rmid_all() - Reset all non-architecture states for all the 1141 1092 * supported RMIDs. 1142 1093 */ 1143 - static void resctrl_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d) 1094 + static void resctrl_reset_rmid_all(struct rdt_resource *r, struct rdt_l3_mon_domain *d) 1144 1095 { 1145 1096 u32 idx_limit = resctrl_arch_system_num_rmid_idx(); 1146 1097 enum resctrl_event_id evt; ··· 1161 1112 * Assign the counter if @assign is true else unassign the counter. Reset the 1162 1113 * associated non-architectural state. 
1163 1114 */ 1164 - static void rdtgroup_assign_cntr(struct rdt_resource *r, struct rdt_mon_domain *d, 1115 + static void rdtgroup_assign_cntr(struct rdt_resource *r, struct rdt_l3_mon_domain *d, 1165 1116 enum resctrl_event_id evtid, u32 rmid, u32 closid, 1166 1117 u32 cntr_id, bool assign) 1167 1118 { ··· 1181 1132 * Return: 1182 1133 * 0 on success, < 0 on failure. 1183 1134 */ 1184 - static int rdtgroup_alloc_assign_cntr(struct rdt_resource *r, struct rdt_mon_domain *d, 1135 + static int rdtgroup_alloc_assign_cntr(struct rdt_resource *r, struct rdt_l3_mon_domain *d, 1185 1136 struct rdtgroup *rdtgrp, struct mon_evt *mevt) 1186 1137 { 1187 1138 int cntr_id; ··· 1216 1167 * Return: 1217 1168 * 0 on success, < 0 on failure. 1218 1169 */ 1219 - static int rdtgroup_assign_cntr_event(struct rdt_mon_domain *d, struct rdtgroup *rdtgrp, 1170 + static int rdtgroup_assign_cntr_event(struct rdt_l3_mon_domain *d, struct rdtgroup *rdtgrp, 1220 1171 struct mon_evt *mevt) 1221 1172 { 1222 1173 struct rdt_resource *r = resctrl_arch_get_resource(mevt->rid); ··· 1266 1217 * rdtgroup_free_unassign_cntr() - Unassign and reset the counter ID configuration 1267 1218 * for the event pointed to by @mevt within the domain @d and resctrl group @rdtgrp. 1268 1219 */ 1269 - static void rdtgroup_free_unassign_cntr(struct rdt_resource *r, struct rdt_mon_domain *d, 1220 + static void rdtgroup_free_unassign_cntr(struct rdt_resource *r, struct rdt_l3_mon_domain *d, 1270 1221 struct rdtgroup *rdtgrp, struct mon_evt *mevt) 1271 1222 { 1272 1223 int cntr_id; ··· 1287 1238 * the event structure @mevt from the domain @d and the group @rdtgrp. Unassign 1288 1239 * the counters from all the domains if @d is NULL else unassign from @d. 
1289 1240 */ 1290 - static void rdtgroup_unassign_cntr_event(struct rdt_mon_domain *d, struct rdtgroup *rdtgrp, 1241 + static void rdtgroup_unassign_cntr_event(struct rdt_l3_mon_domain *d, struct rdtgroup *rdtgrp, 1291 1242 struct mon_evt *mevt) 1292 1243 { 1293 1244 struct rdt_resource *r = resctrl_arch_get_resource(mevt->rid); ··· 1362 1313 static void rdtgroup_update_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp, 1363 1314 enum resctrl_event_id evtid) 1364 1315 { 1365 - struct rdt_mon_domain *d; 1316 + struct rdt_l3_mon_domain *d; 1366 1317 int cntr_id; 1367 1318 1368 1319 list_for_each_entry(d, &r->mon_domains, hdr.list) { ··· 1468 1419 size_t nbytes, loff_t off) 1469 1420 { 1470 1421 struct rdt_resource *r = rdt_kn_parent_priv(of->kn); 1471 - struct rdt_mon_domain *d; 1422 + struct rdt_l3_mon_domain *d; 1472 1423 int ret = 0; 1473 1424 bool enable; 1474 1425 ··· 1541 1492 struct seq_file *s, void *v) 1542 1493 { 1543 1494 struct rdt_resource *r = rdt_kn_parent_priv(of->kn); 1544 - struct rdt_mon_domain *dom; 1495 + struct rdt_l3_mon_domain *dom; 1545 1496 bool sep = false; 1546 1497 1547 1498 cpus_read_lock(); ··· 1565 1516 struct seq_file *s, void *v) 1566 1517 { 1567 1518 struct rdt_resource *r = rdt_kn_parent_priv(of->kn); 1568 - struct rdt_mon_domain *dom; 1519 + struct rdt_l3_mon_domain *dom; 1569 1520 bool sep = false; 1570 1521 u32 cntrs, i; 1571 1522 int ret = 0; ··· 1606 1557 int mbm_L3_assignments_show(struct kernfs_open_file *of, struct seq_file *s, void *v) 1607 1558 { 1608 1559 struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3); 1609 - struct rdt_mon_domain *d; 1560 + struct rdt_l3_mon_domain *d; 1610 1561 struct rdtgroup *rdtgrp; 1611 1562 struct mon_evt *mevt; 1612 1563 int ret = 0; ··· 1669 1620 return NULL; 1670 1621 } 1671 1622 1672 - static int rdtgroup_modify_assign_state(char *assign, struct rdt_mon_domain *d, 1623 + static int rdtgroup_modify_assign_state(char *assign, struct rdt_l3_mon_domain *d, 1673 1624 
struct rdtgroup *rdtgrp, struct mon_evt *mevt) 1674 1625 { 1675 1626 int ret = 0; ··· 1695 1646 static int resctrl_parse_mbm_assignment(struct rdt_resource *r, struct rdtgroup *rdtgrp, 1696 1647 char *event, char *tok) 1697 1648 { 1698 - struct rdt_mon_domain *d; 1649 + struct rdt_l3_mon_domain *d; 1699 1650 unsigned long dom_id = 0; 1700 1651 char *dom_str, *id_str; 1701 1652 struct mon_evt *mevt; ··· 1790 1741 return ret ?: nbytes; 1791 1742 } 1792 1743 1744 + static int closid_num_dirty_rmid_alloc(struct rdt_resource *r) 1745 + { 1746 + if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) { 1747 + u32 num_closid = resctrl_arch_get_num_closid(r); 1748 + u32 *tmp; 1749 + 1750 + /* For ARM memory ordering access to closid_num_dirty_rmid */ 1751 + mutex_lock(&rdtgroup_mutex); 1752 + 1753 + /* 1754 + * If the architecture hasn't provided a sanitised value here, 1755 + * this may result in larger arrays than necessary. Resctrl will 1756 + * use a smaller system wide value based on the resources in 1757 + * use. 1758 + */ 1759 + tmp = kcalloc(num_closid, sizeof(*tmp), GFP_KERNEL); 1760 + if (!tmp) { 1761 + mutex_unlock(&rdtgroup_mutex); 1762 + return -ENOMEM; 1763 + } 1764 + 1765 + closid_num_dirty_rmid = tmp; 1766 + 1767 + mutex_unlock(&rdtgroup_mutex); 1768 + } 1769 + 1770 + return 0; 1771 + } 1772 + 1773 + static void closid_num_dirty_rmid_free(void) 1774 + { 1775 + if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) { 1776 + mutex_lock(&rdtgroup_mutex); 1777 + kfree(closid_num_dirty_rmid); 1778 + closid_num_dirty_rmid = NULL; 1779 + mutex_unlock(&rdtgroup_mutex); 1780 + } 1781 + } 1782 + 1793 1783 /** 1794 - * resctrl_mon_resource_init() - Initialise global monitoring structures. 1784 + * resctrl_l3_mon_resource_init() - Initialise global monitoring structures. 1795 1785 * 1796 1786 * Allocate and initialise global monitor resources that do not belong to a 1797 - * specific domain. i.e. the rmid_ptrs[] used for the limbo and free lists. 
1787 + * specific domain. i.e. the closid_num_dirty_rmid[] used to find the CLOSID 1788 + * with the cleanest set of RMIDs. 1798 1789 * Called once during boot after the struct rdt_resource's have been configured 1799 1790 * but before the filesystem is mounted. 1800 1791 * Resctrl's cpuhp callbacks may be called before this point to bring a domain 1801 1792 * online. 1802 1793 * 1803 - * Returns 0 for success, or -ENOMEM. 1794 + * Return: 0 for success, or -ENOMEM. 1804 1795 */ 1805 - int resctrl_mon_resource_init(void) 1796 + int resctrl_l3_mon_resource_init(void) 1806 1797 { 1807 1798 struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3); 1808 1799 int ret; ··· 1850 1761 if (!r->mon_capable) 1851 1762 return 0; 1852 1763 1853 - ret = dom_data_init(r); 1764 + ret = closid_num_dirty_rmid_alloc(r); 1854 1765 if (ret) 1855 1766 return ret; 1856 1767 ··· 1892 1803 return 0; 1893 1804 } 1894 1805 1895 - void resctrl_mon_resource_exit(void) 1806 + void resctrl_l3_mon_resource_exit(void) 1896 1807 { 1897 1808 struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3); 1898 1809 1899 - dom_data_exit(r); 1810 + if (!r->mon_capable) 1811 + return; 1812 + 1813 + closid_num_dirty_rmid_free(); 1900 1814 }
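The MON_EVENT() table and the new checks in resctrl_enable_mon_event() above can be modelled in user space. What follows is a minimal sketch, not the kernel code: the `EV_*` enum, the trimmed `struct mon_evt` (no `.rid`), the value chosen for `MAX_BINARY_BITS`, and the name `enable_mon_event()` are assumptions made for illustration. It shows the designated-initializer table keyed by event id and the rule that a non-zero `binary_bits` is only accepted for events marked as floating point.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, shortened event list; the kernel enum has more entries. */
enum event_id {
	EV_LLC_OCCUPANCY,
	EV_MBM_TOTAL,
	EV_CORE_ENERGY,
	EV_NUM_EVENTS,
};

/* Assumed limit, for illustration only. */
#define MAX_BINARY_BITS 27

struct mon_evt {
	const char *name;
	enum event_id evtid;
	bool is_floating_point;
	bool any_cpu;
	unsigned int binary_bits;
	void *arch_priv;
	bool enabled;
};

/* Designated initializer keyed by event id, in the style of MON_EVENT(). */
#define MON_EVENT(_eventid, _name, _fp)		\
	[_eventid] = {				\
		.name = _name,			\
		.evtid = _eventid,		\
		.is_floating_point = _fp,	\
	}

static struct mon_evt mon_event_all[EV_NUM_EVENTS] = {
	MON_EVENT(EV_LLC_OCCUPANCY, "llc_occupancy", false),
	MON_EVENT(EV_MBM_TOTAL, "mbm_total_bytes", false),
	MON_EVENT(EV_CORE_ENERGY, "core_energy", true),
};

/* Returns true only if the event was newly enabled with valid parameters. */
static bool enable_mon_event(enum event_id id, bool any_cpu,
			     unsigned int binary_bits, void *arch_priv)
{
	/* Reject out-of-range ids and oversized fixed-point fractions. */
	if ((int)id < 0 || id >= EV_NUM_EVENTS || binary_bits > MAX_BINARY_BITS)
		return false;

	if (mon_event_all[id].enabled) {
		fprintf(stderr, "Duplicate enable for event %d\n", (int)id);
		return false;
	}

	/* Fraction bits only make sense for events flagged as floating point. */
	if (binary_bits && !mon_event_all[id].is_floating_point)
		return false;

	mon_event_all[id].any_cpu = any_cpu;
	mon_event_all[id].binary_bits = binary_bits;
	mon_event_all[id].arch_priv = arch_priv;
	mon_event_all[id].enabled = true;

	return true;
}
```

Returning a bool instead of void, as the diff does, lets the caller find out that an enable request was rejected and skip any further setup for that event.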
+173 -94
fs/resctrl/rdtgroup.c
··· 18 18 #include <linux/fs_parser.h> 19 19 #include <linux/sysfs.h> 20 20 #include <linux/kernfs.h> 21 + #include <linux/once.h> 21 22 #include <linux/resctrl.h> 22 23 #include <linux/seq_buf.h> 23 24 #include <linux/seq_file.h> ··· 1158 1157 { 1159 1158 struct rdt_resource *r = rdt_kn_parent_priv(of->kn); 1160 1159 1161 - seq_printf(seq, "%d\n", r->mon.num_rmid); 1160 + seq_printf(seq, "%u\n", r->mon.num_rmid); 1162 1161 1163 1162 return 0; 1164 1163 } ··· 1641 1640 static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid) 1642 1641 { 1643 1642 struct resctrl_mon_config_info mon_info; 1644 - struct rdt_mon_domain *dom; 1643 + struct rdt_l3_mon_domain *dom; 1645 1644 bool sep = false; 1646 1645 1647 1646 cpus_read_lock(); ··· 1689 1688 } 1690 1689 1691 1690 static void mbm_config_write_domain(struct rdt_resource *r, 1692 - struct rdt_mon_domain *d, u32 evtid, u32 val) 1691 + struct rdt_l3_mon_domain *d, u32 evtid, u32 val) 1693 1692 { 1694 1693 struct resctrl_mon_config_info mon_info = {0}; 1695 1694 ··· 1730 1729 static int mon_config_write(struct rdt_resource *r, char *tok, u32 evtid) 1731 1730 { 1732 1731 char *dom_str = NULL, *id_str; 1732 + struct rdt_l3_mon_domain *d; 1733 1733 unsigned long dom_id, val; 1734 - struct rdt_mon_domain *d; 1735 1734 1736 1735 /* Walking r->domains, ensure it can't race with cpuhp */ 1737 1736 lockdep_assert_cpus_held(); ··· 2396 2395 case RDT_RESOURCE_MBA: 2397 2396 case RDT_RESOURCE_SMBA: 2398 2397 return RFTYPE_RES_MB; 2398 + case RDT_RESOURCE_PERF_PKG: 2399 + return RFTYPE_RES_PERF_PKG; 2399 2400 } 2400 2401 2401 2402 return WARN_ON_ONCE(1); ··· 2784 2781 { 2785 2782 struct rdt_fs_context *ctx = rdt_fc2context(fc); 2786 2783 unsigned long flags = RFTYPE_CTRL_BASE; 2787 - struct rdt_mon_domain *dom; 2784 + struct rdt_l3_mon_domain *dom; 2788 2785 struct rdt_resource *r; 2789 2786 int ret; 2787 + 2788 + DO_ONCE_SLEEPABLE(resctrl_arch_pre_mount); 2790 2789 2791 2790 cpus_read_lock(); 2792 2791 
mutex_lock(&rdtgroup_mutex); ··· 2799 2794 ret = -EBUSY; 2800 2795 goto out; 2801 2796 } 2797 + 2798 + ret = setup_rmid_lru_list(); 2799 + if (ret) 2800 + goto out; 2802 2801 2803 2802 ret = rdtgroup_setup_root(ctx); 2804 2803 if (ret) ··· 3100 3091 * @rid: The resource id for the event file being created. 3101 3092 * @domid: The domain id for the event file being created. 3102 3093 * @mevt: The type of event file being created. 3103 - * @do_sum: Whether SNC summing monitors are being created. 3094 + * @do_sum: Whether SNC summing monitors are being created. Only set 3095 + * when @rid == RDT_RESOURCE_L3. 3104 3096 */ 3105 3097 static struct mon_data *mon_get_kn_priv(enum resctrl_res_level rid, int domid, 3106 3098 struct mon_evt *mevt, ··· 3113 3103 3114 3104 list_for_each_entry(priv, &mon_data_kn_priv_list, list) { 3115 3105 if (priv->rid == rid && priv->domid == domid && 3116 - priv->sum == do_sum && priv->evtid == mevt->evtid) 3106 + priv->sum == do_sum && priv->evt == mevt) 3117 3107 return priv; 3118 3108 } 3119 3109 ··· 3124 3114 priv->rid = rid; 3125 3115 priv->domid = domid; 3126 3116 priv->sum = do_sum; 3127 - priv->evtid = mevt->evtid; 3117 + priv->evt = mevt; 3128 3118 list_add_tail(&priv->list, &mon_data_kn_priv_list); 3129 3119 3130 3120 return priv; ··· 3233 3223 } 3234 3224 3235 3225 /* 3236 - * Remove all subdirectories of mon_data of ctrl_mon groups 3237 - * and monitor groups for the given domain. 3238 - * Remove files and directories containing "sum" of domain data 3239 - * when last domain being summed is removed. 3226 + * Remove files and directories for one SNC node. If it is the last node 3227 + * sharing an L3 cache, then remove the upper level directory containing 3228 + * the "sum" files too. 
3240 3229 */ 3241 - static void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r, 3242 - struct rdt_mon_domain *d) 3230 + static void rmdir_mondata_subdir_allrdtgrp_snc(struct rdt_resource *r, 3231 + struct rdt_domain_hdr *hdr) 3243 3232 { 3244 3233 struct rdtgroup *prgrp, *crgrp; 3234 + struct rdt_l3_mon_domain *d; 3245 3235 char subname[32]; 3246 - bool snc_mode; 3247 3236 char name[32]; 3248 3237 3249 - snc_mode = r->mon_scope == RESCTRL_L3_NODE; 3250 - sprintf(name, "mon_%s_%02d", r->name, snc_mode ? d->ci_id : d->hdr.id); 3251 - if (snc_mode) 3252 - sprintf(subname, "mon_sub_%s_%02d", r->name, d->hdr.id); 3238 + if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3)) 3239 + return; 3240 + 3241 + d = container_of(hdr, struct rdt_l3_mon_domain, hdr); 3242 + sprintf(name, "mon_%s_%02d", r->name, d->ci_id); 3243 + sprintf(subname, "mon_sub_%s_%02d", r->name, hdr->id); 3253 3244 3254 3245 list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) { 3255 3246 mon_rmdir_one_subdir(prgrp->mon.mon_data_kn, name, subname); ··· 3260 3249 } 3261 3250 } 3262 3251 3263 - static int mon_add_all_files(struct kernfs_node *kn, struct rdt_mon_domain *d, 3264 - struct rdt_resource *r, struct rdtgroup *prgrp, 3265 - bool do_sum) 3252 + /* 3253 + * Remove all subdirectories of mon_data of ctrl_mon groups 3254 + * and monitor groups for the given domain. 
3255 + */ 3256 + static void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r, 3257 + struct rdt_domain_hdr *hdr) 3258 + { 3259 + struct rdtgroup *prgrp, *crgrp; 3260 + char name[32]; 3261 + 3262 + if (r->rid == RDT_RESOURCE_L3 && r->mon_scope == RESCTRL_L3_NODE) { 3263 + rmdir_mondata_subdir_allrdtgrp_snc(r, hdr); 3264 + return; 3265 + } 3266 + 3267 + sprintf(name, "mon_%s_%02d", r->name, hdr->id); 3268 + list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) { 3269 + kernfs_remove_by_name(prgrp->mon.mon_data_kn, name); 3270 + 3271 + list_for_each_entry(crgrp, &prgrp->mon.crdtgrp_list, mon.crdtgrp_list) 3272 + kernfs_remove_by_name(crgrp->mon.mon_data_kn, name); 3273 + } 3274 + } 3275 + 3276 + /* 3277 + * Create a directory for a domain and populate it with monitor files. Create 3278 + * summing monitors when @hdr is NULL. No need to initialize summing monitors. 3279 + */ 3280 + static struct kernfs_node *_mkdir_mondata_subdir(struct kernfs_node *parent_kn, char *name, 3281 + struct rdt_domain_hdr *hdr, 3282 + struct rdt_resource *r, 3283 + struct rdtgroup *prgrp, int domid) 3266 3284 { 3267 3285 struct rmid_read rr = {0}; 3286 + struct kernfs_node *kn; 3268 3287 struct mon_data *priv; 3269 3288 struct mon_evt *mevt; 3270 - int ret, domid; 3289 + int ret; 3290 + 3291 + kn = kernfs_create_dir(parent_kn, name, parent_kn->mode, prgrp); 3292 + if (IS_ERR(kn)) 3293 + return kn; 3294 + 3295 + ret = rdtgroup_kn_set_ugid(kn); 3296 + if (ret) 3297 + goto out_destroy; 3271 3298 3272 3299 for_each_mon_event(mevt) { 3273 3300 if (mevt->rid != r->rid || !mevt->enabled) 3274 3301 continue; 3275 - domid = do_sum ? 
d->ci_id : d->hdr.id; 3276 - priv = mon_get_kn_priv(r->rid, domid, mevt, do_sum); 3277 - if (WARN_ON_ONCE(!priv)) 3278 - return -EINVAL; 3302 + priv = mon_get_kn_priv(r->rid, domid, mevt, !hdr); 3303 + if (WARN_ON_ONCE(!priv)) { 3304 + ret = -EINVAL; 3305 + goto out_destroy; 3306 + } 3279 3307 3280 3308 ret = mon_addfile(kn, mevt->name, priv); 3281 3309 if (ret) 3282 - return ret; 3310 + goto out_destroy; 3283 3311 3284 - if (!do_sum && resctrl_is_mbm_event(mevt->evtid)) 3285 - mon_event_read(&rr, r, d, prgrp, &d->hdr.cpu_mask, mevt->evtid, true); 3312 + if (hdr && resctrl_is_mbm_event(mevt->evtid)) 3313 + mon_event_read(&rr, r, hdr, prgrp, &hdr->cpu_mask, mevt, true); 3286 3314 } 3287 3315 3288 - return 0; 3316 + return kn; 3317 + out_destroy: 3318 + kernfs_remove(kn); 3319 + return ERR_PTR(ret); 3289 3320 } 3290 3321 3291 - static int mkdir_mondata_subdir(struct kernfs_node *parent_kn, 3292 - struct rdt_mon_domain *d, 3293 - struct rdt_resource *r, struct rdtgroup *prgrp) 3322 + static int mkdir_mondata_subdir_snc(struct kernfs_node *parent_kn, 3323 + struct rdt_domain_hdr *hdr, 3324 + struct rdt_resource *r, struct rdtgroup *prgrp) 3294 3325 { 3295 - struct kernfs_node *kn, *ckn; 3326 + struct kernfs_node *ckn, *kn; 3327 + struct rdt_l3_mon_domain *d; 3296 3328 char name[32]; 3297 - bool snc_mode; 3298 - int ret = 0; 3299 3329 3300 - lockdep_assert_held(&rdtgroup_mutex); 3330 + if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3)) 3331 + return -EINVAL; 3301 3332 3302 - snc_mode = r->mon_scope == RESCTRL_L3_NODE; 3303 - sprintf(name, "mon_%s_%02d", r->name, snc_mode ? 
d->ci_id : d->hdr.id); 3333 + d = container_of(hdr, struct rdt_l3_mon_domain, hdr); 3334 + sprintf(name, "mon_%s_%02d", r->name, d->ci_id); 3304 3335 kn = kernfs_find_and_get(parent_kn, name); 3305 3336 if (kn) { 3306 3337 /* ··· 3351 3298 */ 3352 3299 kernfs_put(kn); 3353 3300 } else { 3354 - kn = kernfs_create_dir(parent_kn, name, parent_kn->mode, prgrp); 3301 + kn = _mkdir_mondata_subdir(parent_kn, name, NULL, r, prgrp, d->ci_id); 3355 3302 if (IS_ERR(kn)) 3356 3303 return PTR_ERR(kn); 3357 - 3358 - ret = rdtgroup_kn_set_ugid(kn); 3359 - if (ret) 3360 - goto out_destroy; 3361 - ret = mon_add_all_files(kn, d, r, prgrp, snc_mode); 3362 - if (ret) 3363 - goto out_destroy; 3364 3304 } 3365 3305 3366 - if (snc_mode) { 3367 - sprintf(name, "mon_sub_%s_%02d", r->name, d->hdr.id); 3368 - ckn = kernfs_create_dir(kn, name, parent_kn->mode, prgrp); 3369 - if (IS_ERR(ckn)) { 3370 - ret = -EINVAL; 3371 - goto out_destroy; 3372 - } 3373 - 3374 - ret = rdtgroup_kn_set_ugid(ckn); 3375 - if (ret) 3376 - goto out_destroy; 3377 - 3378 - ret = mon_add_all_files(ckn, d, r, prgrp, false); 3379 - if (ret) 3380 - goto out_destroy; 3306 + sprintf(name, "mon_sub_%s_%02d", r->name, hdr->id); 3307 + ckn = _mkdir_mondata_subdir(kn, name, hdr, r, prgrp, hdr->id); 3308 + if (IS_ERR(ckn)) { 3309 + kernfs_remove(kn); 3310 + return PTR_ERR(ckn); 3381 3311 } 3382 3312 3383 3313 kernfs_activate(kn); 3384 3314 return 0; 3315 + } 3385 3316 3386 - out_destroy: 3387 - kernfs_remove(kn); 3388 - return ret; 3317 + static int mkdir_mondata_subdir(struct kernfs_node *parent_kn, 3318 + struct rdt_domain_hdr *hdr, 3319 + struct rdt_resource *r, struct rdtgroup *prgrp) 3320 + { 3321 + struct kernfs_node *kn; 3322 + char name[32]; 3323 + 3324 + lockdep_assert_held(&rdtgroup_mutex); 3325 + 3326 + if (r->rid == RDT_RESOURCE_L3 && r->mon_scope == RESCTRL_L3_NODE) 3327 + return mkdir_mondata_subdir_snc(parent_kn, hdr, r, prgrp); 3328 + 3329 + sprintf(name, "mon_%s_%02d", r->name, hdr->id); 3330 + kn = 
_mkdir_mondata_subdir(parent_kn, name, hdr, r, prgrp, hdr->id); 3331 + if (IS_ERR(kn)) 3332 + return PTR_ERR(kn); 3333 + 3334 + kernfs_activate(kn); 3335 + return 0; 3389 3336 } 3390 3337 3391 3338 /* ··· 3393 3340 * and "monitor" groups with given domain id. 3394 3341 */ 3395 3342 static void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r, 3396 - struct rdt_mon_domain *d) 3343 + struct rdt_domain_hdr *hdr) 3397 3344 { 3398 3345 struct kernfs_node *parent_kn; 3399 3346 struct rdtgroup *prgrp, *crgrp; ··· 3401 3348 3402 3349 list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) { 3403 3350 parent_kn = prgrp->mon.mon_data_kn; 3404 - mkdir_mondata_subdir(parent_kn, d, r, prgrp); 3351 + mkdir_mondata_subdir(parent_kn, hdr, r, prgrp); 3405 3352 3406 3353 head = &prgrp->mon.crdtgrp_list; 3407 3354 list_for_each_entry(crgrp, head, mon.crdtgrp_list) { 3408 3355 parent_kn = crgrp->mon.mon_data_kn; 3409 - mkdir_mondata_subdir(parent_kn, d, r, crgrp); 3356 + mkdir_mondata_subdir(parent_kn, hdr, r, crgrp); 3410 3357 } 3411 3358 } 3412 3359 } ··· 3415 3362 struct rdt_resource *r, 3416 3363 struct rdtgroup *prgrp) 3417 3364 { 3418 - struct rdt_mon_domain *dom; 3365 + struct rdt_domain_hdr *hdr; 3419 3366 int ret; 3420 3367 3421 3368 /* Walking r->domains, ensure it can't race with cpuhp */ 3422 3369 lockdep_assert_cpus_held(); 3423 3370 3424 - list_for_each_entry(dom, &r->mon_domains, hdr.list) { 3425 - ret = mkdir_mondata_subdir(parent_kn, dom, r, prgrp); 3371 + list_for_each_entry(hdr, &r->mon_domains, list) { 3372 + ret = mkdir_mondata_subdir(parent_kn, hdr, r, prgrp); 3426 3373 if (ret) 3427 3374 return ret; 3428 3375 } ··· 4284 4231 mutex_unlock(&rdtgroup_mutex); 4285 4232 } 4286 4233 4287 - static void domain_destroy_mon_state(struct rdt_mon_domain *d) 4234 + static void domain_destroy_l3_mon_state(struct rdt_l3_mon_domain *d) 4288 4235 { 4289 4236 int idx; 4290 4237 ··· 4306 4253 mutex_unlock(&rdtgroup_mutex); 4307 4254 } 4308 4255 4309 - void 
resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d) 4256 + void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *hdr) 4310 4257 { 4258 + struct rdt_l3_mon_domain *d; 4259 + 4311 4260 mutex_lock(&rdtgroup_mutex); 4312 4261 4313 4262 /* ··· 4317 4262 * per domain monitor data directories. 4318 4263 */ 4319 4264 if (resctrl_mounted && resctrl_arch_mon_capable()) 4320 - rmdir_mondata_subdir_allrdtgrp(r, d); 4265 + rmdir_mondata_subdir_allrdtgrp(r, hdr); 4321 4266 4267 + if (r->rid != RDT_RESOURCE_L3) 4268 + goto out_unlock; 4269 + 4270 + if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3)) 4271 + goto out_unlock; 4272 + 4273 + d = container_of(hdr, struct rdt_l3_mon_domain, hdr); 4322 4274 if (resctrl_is_mbm_enabled()) 4323 4275 cancel_delayed_work(&d->mbm_over); 4324 4276 if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID) && has_busy_rmid(d)) { ··· 4341 4279 cancel_delayed_work(&d->cqm_limbo); 4342 4280 } 4343 4281 4344 - domain_destroy_mon_state(d); 4345 - 4282 + domain_destroy_l3_mon_state(d); 4283 + out_unlock: 4346 4284 mutex_unlock(&rdtgroup_mutex); 4347 4285 } 4348 4286 4349 4287 /** 4350 - * domain_setup_mon_state() - Initialise domain monitoring structures. 4288 + * domain_setup_l3_mon_state() - Initialise domain monitoring structures. 4351 4289 * @r: The resource for the newly online domain. 4352 4290 * @d: The newly online domain. 4353 4291 * ··· 4355 4293 * Called when the first CPU of a domain comes online, regardless of whether 4356 4294 * the filesystem is mounted. 4357 4295 * During boot this may be called before global allocations have been made by 4358 - * resctrl_mon_resource_init(). 4296 + * resctrl_l3_mon_resource_init(). 4359 4297 * 4360 - * Returns 0 for success, or -ENOMEM. 4298 + * Called during CPU online that may run as soon as CPU online callbacks 4299 + * are set up during resctrl initialization. 
The number of supported RMIDs 4300 + * may be reduced if additional mon_capable resources are enumerated 4301 + * at mount time. This means the rdt_l3_mon_domain::mbm_states[] and 4302 + * rdt_l3_mon_domain::rmid_busy_llc allocations may be larger than needed. 4303 + * 4304 + * Return: 0 for success, or -ENOMEM. 4361 4305 */ 4362 - static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_mon_domain *d) 4306 + static int domain_setup_l3_mon_state(struct rdt_resource *r, struct rdt_l3_mon_domain *d) 4363 4307 { 4364 4308 u32 idx_limit = resctrl_arch_system_num_rmid_idx(); 4365 4309 size_t tsize = sizeof(*d->mbm_states[0]); ··· 4421 4353 return err; 4422 4354 } 4423 4355 4424 - int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d) 4356 + int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *hdr) 4425 4357 { 4426 - int err; 4358 + struct rdt_l3_mon_domain *d; 4359 + int err = -EINVAL; 4427 4360 4428 4361 mutex_lock(&rdtgroup_mutex); 4429 4362 4430 - err = domain_setup_mon_state(r, d); 4363 + if (r->rid != RDT_RESOURCE_L3) 4364 + goto mkdir; 4365 + 4366 + if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3)) 4367 + goto out_unlock; 4368 + 4369 + d = container_of(hdr, struct rdt_l3_mon_domain, hdr); 4370 + err = domain_setup_l3_mon_state(r, d); 4431 4371 if (err) 4432 4372 goto out_unlock; 4433 4373 ··· 4448 4372 if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID)) 4449 4373 INIT_DELAYED_WORK(&d->cqm_limbo, cqm_handle_limbo); 4450 4374 4375 + mkdir: 4376 + err = 0; 4451 4377 /* 4452 4378 * If the filesystem is not mounted then only the default resource group 4453 4379 * exists. Creation of its directories is deferred until mount time ··· 4457 4379 * If resctrl is mounted, add per domain monitor data directories. 
4458 4380 */ 4459 4381 if (resctrl_mounted && resctrl_arch_mon_capable()) 4460 - mkdir_mondata_subdir_allrdtgrp(r, d); 4382 + mkdir_mondata_subdir_allrdtgrp(r, hdr); 4461 4383 4462 4384 out_unlock: 4463 4385 mutex_unlock(&rdtgroup_mutex); ··· 4483 4405 } 4484 4406 } 4485 4407 4486 - static struct rdt_mon_domain *get_mon_domain_from_cpu(int cpu, 4487 - struct rdt_resource *r) 4408 + static struct rdt_l3_mon_domain *get_mon_domain_from_cpu(int cpu, 4409 + struct rdt_resource *r) 4488 4410 { 4489 - struct rdt_mon_domain *d; 4411 + struct rdt_l3_mon_domain *d; 4490 4412 4491 4413 lockdep_assert_cpus_held(); 4492 4414 ··· 4502 4424 void resctrl_offline_cpu(unsigned int cpu) 4503 4425 { 4504 4426 struct rdt_resource *l3 = resctrl_arch_get_resource(RDT_RESOURCE_L3); 4505 - struct rdt_mon_domain *d; 4427 + struct rdt_l3_mon_domain *d; 4506 4428 struct rdtgroup *rdtgrp; 4507 4429 4508 4430 mutex_lock(&rdtgroup_mutex); ··· 4554 4476 4555 4477 io_alloc_init(); 4556 4478 4557 - ret = resctrl_mon_resource_init(); 4479 + ret = resctrl_l3_mon_resource_init(); 4558 4480 if (ret) 4559 4481 return ret; 4560 4482 4561 4483 ret = sysfs_create_mount_point(fs_kobj, "resctrl"); 4562 4484 if (ret) { 4563 - resctrl_mon_resource_exit(); 4485 + resctrl_l3_mon_resource_exit(); 4564 4486 return ret; 4565 4487 } 4566 4488 ··· 4595 4517 4596 4518 cleanup_mountpoint: 4597 4519 sysfs_remove_mount_point(fs_kobj, "resctrl"); 4598 - resctrl_mon_resource_exit(); 4520 + resctrl_l3_mon_resource_exit(); 4599 4521 4600 4522 return ret; 4601 4523 } ··· 4631 4553 * When called by the architecture code, all CPUs and resctrl domains must be 4632 4554 * offline. This ensures the limbo and overflow handlers are not scheduled to 4633 4555 * run, meaning the data structures they access can be freed by 4634 - * resctrl_mon_resource_exit(). 4556 + * resctrl_l3_mon_resource_exit(). 
  *
  * After resctrl_exit() returns, the architecture code should return an
  * error from all resctrl_arch_ functions that can do this.
···
 	 * it can be used to umount resctrl.
 	 */
 
-	resctrl_mon_resource_exit();
+	resctrl_l3_mon_resource_exit();
+	free_rmid_lru_list();
 }
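The reworked resctrl_online_mon_domain() above takes a generic header, skips the L3-only setup for other resources, and only converts the header to the L3 domain type after checking its type and resource id. A minimal userspace sketch of that pattern follows. It is an illustration under assumptions, not the kernel code: the names `domain_hdr`, `l3_mon_domain`, `header_is_valid()`, `online_mon_domain()` and the `state_ready` field are hypothetical stand-ins, `container_of()` is redefined locally, and the WARN_ON_ONCE() and locking from the patch are left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

enum domain_type { CTRL_DOMAIN, MON_DOMAIN };
enum res_level { RES_L3, RES_PERF_PKG };

/* Common header embedded in every domain structure (hypothetical layout). */
struct domain_hdr {
	int id;
	enum domain_type type;
	enum res_level rid;
};

/* L3-specific monitoring domain that embeds the common header. */
struct l3_mon_domain {
	struct domain_hdr hdr;
	unsigned int ci_id;
	bool state_ready;	/* stands in for the L3 monitoring state */
};

/* Same check as domain_header_is_valid(), minus the one-time warning. */
static bool header_is_valid(const struct domain_hdr *hdr,
			    enum domain_type type, enum res_level rid)
{
	return hdr->type == type && hdr->rid == rid;
}

/* Returns 0 on success, -EINVAL if an L3 call carries a mismatched header. */
static int online_mon_domain(enum res_level rid, struct domain_hdr *hdr)
{
	struct l3_mon_domain *d;

	if (rid != RES_L3)
		return 0;	/* no L3-only state to set up */

	if (!header_is_valid(hdr, MON_DOMAIN, RES_L3))
		return -EINVAL;

	/* Safe only because the header was validated just above. */
	d = container_of(hdr, struct l3_mon_domain, hdr);
	d->state_ready = true;
	return 0;
}
```

Passing the header rather than a concrete domain type lets one entry point serve several resources, at the cost of a runtime check before each downcast.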
+39 -18
include/linux/resctrl.h
···
 	RDT_RESOURCE_L2,
 	RDT_RESOURCE_MBA,
 	RDT_RESOURCE_SMBA,
+	RDT_RESOURCE_PERF_PKG,
 
 	/* Must be the last */
 	RDT_NUM_RESOURCES,
···
  * @list:		all instances of this resource
  * @id:			unique id for this instance
  * @type:		type of this instance
+ * @rid:		resource id for this instance
  * @cpu_mask:		which CPUs share this resource
  */
 struct rdt_domain_hdr {
 	struct list_head		list;
 	int				id;
 	enum resctrl_domain_type	type;
+	enum resctrl_res_level		rid;
 	struct cpumask			cpu_mask;
 };
+
+static inline bool domain_header_is_valid(struct rdt_domain_hdr *hdr,
+					  enum resctrl_domain_type type,
+					  enum resctrl_res_level rid)
+{
+	return !WARN_ON_ONCE(hdr->type != type || hdr->rid != rid);
+}
 
 /**
  * struct rdt_ctrl_domain - group of CPUs sharing a resctrl control resource
···
 };
 
 /**
- * struct rdt_mon_domain - group of CPUs sharing a resctrl monitor resource
+ * struct rdt_l3_mon_domain - group of CPUs sharing RDT_RESOURCE_L3 monitoring
  * @hdr:		common header for different domain types
  * @ci_id:		cache info id for this domain
  * @rmid_busy_llc:	bitmap of which limbo RMIDs are above threshold
···
  * @cntr_cfg:		array of assignable counters' configuration (indexed
  *			by counter ID)
  */
-struct rdt_mon_domain {
+struct rdt_l3_mon_domain {
 	struct rdt_domain_hdr		hdr;
 	unsigned int			ci_id;
 	unsigned long			*rmid_busy_llc;
···
 	RESCTRL_L2_CACHE = 2,
 	RESCTRL_L3_CACHE = 3,
 	RESCTRL_L3_NODE,
+	RESCTRL_PACKAGE,
 };
 
 /**
···
  *			events of monitor groups created via mkdir.
  */
 struct resctrl_mon {
-	int			num_rmid;
+	u32			num_rmid;
 	unsigned int		mbm_cfg_mask;
 	int			num_mbm_cntrs;
 	bool			mbm_cntr_assignable;
···
 };
 
 struct resctrl_mon_config_info {
-	struct rdt_resource	*r;
-	struct rdt_mon_domain	*d;
-	u32			evtid;
-	u32			mon_config;
+	struct rdt_resource		*r;
+	struct rdt_l3_mon_domain	*d;
+	u32				evtid;
+	u32				mon_config;
 };
 
 /**
···
 u32 resctrl_arch_system_num_rmid_idx(void);
 int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
 
-void resctrl_enable_mon_event(enum resctrl_event_id eventid);
+bool resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu,
+			      unsigned int binary_bits, void *arch_priv);
 
 bool resctrl_is_mon_event_enabled(enum resctrl_event_id eventid);
 
···
 u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
 			    u32 closid, enum resctrl_conf_type type);
 int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d);
-int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d);
+int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *hdr);
 void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d);
-void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d);
+void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *hdr);
 void resctrl_online_cpu(unsigned int cpu);
 void resctrl_offline_cpu(unsigned int cpu);
+
+/*
+ * Architecture hook called at beginning of first file system mount attempt.
+ * No locks are held.
+ */
+void resctrl_arch_pre_mount(void);
 
 /**
  * resctrl_arch_rmid_read() - Read the eventid counter corresponding to rmid
  *			      for this resource and domain.
  * @r:			resource that the counter should be read from.
- * @d:			domain that the counter should be read from.
+ * @hdr:		Header of domain that the counter should be read from.
  * @closid:		closid that matches the rmid. Depending on the architecture, the
  *			counter may match traffic of both @closid and @rmid, or @rmid
  *			only.
  * @rmid:		rmid of the counter to read.
  * @eventid:		eventid to read, e.g. L3 occupancy.
+ * @arch_priv:		Architecture private data for this event.
+ *			The @arch_priv provided by the architecture via
+ *			resctrl_enable_mon_event().
  * @val:		result of the counter read in bytes.
  * @arch_mon_ctx:	An architecture specific value from
  *			resctrl_arch_mon_ctx_alloc(), for MPAM this identifies
···
  * Return:
  * 0 on success, or -EIO, -EINVAL etc on error.
  */
-int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
+int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
 			   u32 closid, u32 rmid, enum resctrl_event_id eventid,
-			   u64 *val, void *arch_mon_ctx);
+			   void *arch_priv, u64 *val, void *arch_mon_ctx);
 
 /**
  * resctrl_arch_rmid_read_context_check() - warn about invalid contexts
···
  *
  * This can be called from any CPU.
  */
-void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
+void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
 			     u32 closid, u32 rmid,
 			     enum resctrl_event_id eventid);
 
···
  *
  * This can be called from any CPU.
  */
-void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d);
+void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_l3_mon_domain *d);
 
 /**
  * resctrl_arch_reset_all_ctrls() - Reset the control for each CLOSID to its
···
  *
  * This can be called from any CPU.
  */
-void resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
+void resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
 			      enum resctrl_event_id evtid, u32 rmid, u32 closid,
 			      u32 cntr_id, bool assign);
 
···
  * Return:
  * 0 on success, or -EIO, -EINVAL etc on error.
  */
-int resctrl_arch_cntr_read(struct rdt_resource *r, struct rdt_mon_domain *d,
+int resctrl_arch_cntr_read(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
 			   u32 closid, u32 rmid, int cntr_id,
 			   enum resctrl_event_id eventid, u64 *val);
 
···
  *
  * This can be called from any CPU.
  */
-void resctrl_arch_reset_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
+void resctrl_arch_reset_cntr(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
 			     u32 closid, u32 rmid, int cntr_id,
 			     enum resctrl_event_id eventid);
 
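The header changes above add an opaque `arch_priv` pointer that the architecture supplies through resctrl_enable_mon_event() and receives back in resctrl_arch_rmid_read(). The sketch below shows that round trip in plain userspace C. Everything in it is an assumption made for illustration: the `mon_event` table, the `enable_mon_event()`, `arch_read()` and `read_event()` helpers, the two-entry `event_id` enum, and in particular the meaning given to a `false` return (unknown or already enabled event) and to `binary_bits` (number of fractional bits) are not confirmed by the diff shown here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum event_id { EV_ENERGY, EV_ACTIVITY, NUM_EVENTS };

/* Per-event bookkeeping, filled in when the architecture enables an event. */
struct mon_event {
	bool enabled;
	bool any_cpu;			/* event may be read from any CPU */
	unsigned int binary_bits;	/* assumed: fractional bits in the raw value */
	void *arch_priv;		/* opaque, owned by the architecture */
};

static struct mon_event events[NUM_EVENTS];

/* Assumed semantics: false for an unknown or already enabled event. */
static bool enable_mon_event(enum event_id id, bool any_cpu,
			     unsigned int binary_bits, void *arch_priv)
{
	if ((unsigned int)id >= NUM_EVENTS || events[id].enabled)
		return false;

	events[id] = (struct mon_event){ true, any_cpu, binary_bits, arch_priv };
	return true;
}

/* Hypothetical architecture-side read: arch_priv points at a raw counter. */
static int arch_read(enum event_id id, void *arch_priv, uint64_t *val)
{
	(void)id;
	if (!arch_priv)
		return -1;

	*val = *(const uint64_t *)arch_priv;
	return 0;
}

/* Generic-side read: hands the stored arch_priv back to the architecture. */
static int read_event(enum event_id id, uint64_t *val)
{
	if ((unsigned int)id >= NUM_EVENTS || !events[id].enabled)
		return -1;

	return arch_read(id, events[id].arch_priv, val);
}
```

Storing the pointer at enable time means the generic layer never needs to interpret it, which keeps architecture details such as counter locations out of the filesystem code.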
+11
include/linux/resctrl_types.h
···
 	QOS_L3_MBM_TOTAL_EVENT_ID	= 0x02,
 	QOS_L3_MBM_LOCAL_EVENT_ID	= 0x03,
 
+	/* Intel Telemetry Events */
+	PMT_EVENT_ENERGY,
+	PMT_EVENT_ACTIVITY,
+	PMT_EVENT_STALLS_LLC_HIT,
+	PMT_EVENT_C1_RES,
+	PMT_EVENT_UNHALTED_CORE_CYCLES,
+	PMT_EVENT_STALLS_LLC_MISS,
+	PMT_EVENT_AUTO_C6_RES,
+	PMT_EVENT_UNHALTED_REF_CYCLES,
+	PMT_EVENT_UOPS_RETIRED,
+
 	/* Must be the last */
 	QOS_NUM_EVENTS,
 };
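Because the new telemetry enumerators carry no explicit values, C numbers them consecutively after QOS_L3_MBM_LOCAL_EVENT_ID, and the QOS_NUM_EVENTS sentinel grows with them. The sketch below reproduces only the enumerators visible in this hunk to show the resulting values. The `is_telemetry_event()` and `record_read()` helpers and the `event_reads[]` table are hypothetical and not part of the patch.

```c
#include <stdbool.h>

/* Only the enumerators visible in the hunk are reproduced here. */
enum resctrl_event_id {
	QOS_L3_MBM_TOTAL_EVENT_ID	= 0x02,
	QOS_L3_MBM_LOCAL_EVENT_ID	= 0x03,

	/* Intel Telemetry Events: implicit values continue from 0x03 */
	PMT_EVENT_ENERGY,		/* 0x04 */
	PMT_EVENT_ACTIVITY,		/* 0x05 */
	PMT_EVENT_STALLS_LLC_HIT,	/* 0x06 */
	PMT_EVENT_C1_RES,		/* 0x07 */
	PMT_EVENT_UNHALTED_CORE_CYCLES,	/* 0x08 */
	PMT_EVENT_STALLS_LLC_MISS,	/* 0x09 */
	PMT_EVENT_AUTO_C6_RES,		/* 0x0a */
	PMT_EVENT_UNHALTED_REF_CYCLES,	/* 0x0b */
	PMT_EVENT_UOPS_RETIRED,		/* 0x0c */

	/* Must be the last */
	QOS_NUM_EVENTS,			/* 0x0d */
};

/* Hypothetical helper: true for the telemetry block added by the patch. */
static bool is_telemetry_event(enum resctrl_event_id id)
{
	return id >= PMT_EVENT_ENERGY && id <= PMT_EVENT_UOPS_RETIRED;
}

/* A table sized by the sentinel covers the new events without further edits. */
static unsigned long long event_reads[QOS_NUM_EVENTS];

/* Hypothetical helper: count one read of a valid event id. */
static bool record_read(enum resctrl_event_id id)
{
	if ((unsigned int)id >= QOS_NUM_EVENTS)
		return false;

	event_reads[id]++;
	return true;
}
```

Any array indexed by event id and sized with QOS_NUM_EVENTS picks up the nine new slots automatically, which is why the sentinel must stay last.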