Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'x86_cache_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 resource control updates from Borislav Petkov:

- Add support for a new AMD feature called Slow Memory Bandwidth
Allocation (SMBA). Its goal is to control resource allocation in
external slow memory that is connected to the machine through, for
example, CXL devices or accelerators.

* tag 'x86_cache_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/resctrl: Fix a silly -Wunused-but-set-variable warning
Documentation/x86: Update resctrl.rst for new features
x86/resctrl: Add interface to write mbm_local_bytes_config
x86/resctrl: Add interface to write mbm_total_bytes_config
x86/resctrl: Add interface to read mbm_local_bytes_config
x86/resctrl: Add interface to read mbm_total_bytes_config
x86/resctrl: Support monitor configuration
x86/resctrl: Add __init attribute to rdt_get_mon_l3_config()
x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation
x86/resctrl: Include new features in command line options
x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag
x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA
x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag
x86/resctrl: Replace smp_call_function_many() with on_each_cpu_mask()

+559 -41
+1 -1
Documentation/admin-guide/kernel-parameters.txt
···
 	rdt=		[HW,X86,RDT]
 			Turn on/off individual RDT features. List is:
 			cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
-			mba.
+			mba, smba, bmec.
 			E.g. to turn on cmt and turn off mba use:
 				rdt=cmt,!mba
+145 -2
Documentation/x86/resctrl.rst
···
 This feature is enabled by the CONFIG_X86_CPU_RESCTRL and the x86 /proc/cpuinfo
 flag bits:

-=============================================   ================================
+=============================================== ================================
 RDT (Resource Director Technology) Allocation   "rdt_a"
 CAT (Cache Allocation Technology)               "cat_l3", "cat_l2"
 CDP (Code and Data Prioritization)              "cdp_l3", "cdp_l2"
 CQM (Cache QoS Monitoring)                      "cqm_llc", "cqm_occup_llc"
 MBM (Memory Bandwidth Monitoring)               "cqm_mbm_total", "cqm_mbm_local"
 MBA (Memory Bandwidth Allocation)               "mba"
-=============================================   ================================
+SMBA (Slow Memory Bandwidth Allocation)         ""
+BMEC (Bandwidth Monitoring Event Configuration) ""
+=============================================== ================================
+
+Historically, new features were made visible by default in /proc/cpuinfo. This
+resulted in the feature flags becoming hard to parse by humans. Adding a new
+flag to /proc/cpuinfo should be avoided if user space can obtain information
+about the feature from resctrl's info directory.

 To use the feature mount the file system::

···
 "mon_features":
 		Lists the monitoring events if
 		monitoring is enabled for the resource.
+		Example::
+
+			# cat /sys/fs/resctrl/info/L3_MON/mon_features
+			llc_occupancy
+			mbm_total_bytes
+			mbm_local_bytes
+
+		If the system supports Bandwidth Monitoring Event
+		Configuration (BMEC), then the bandwidth events will
+		be configurable. The output will be::
+
+			# cat /sys/fs/resctrl/info/L3_MON/mon_features
+			llc_occupancy
+			mbm_total_bytes
+			mbm_total_bytes_config
+			mbm_local_bytes
+			mbm_local_bytes_config
+
+"mbm_total_bytes_config", "mbm_local_bytes_config":
+	Read/write files containing the configuration for the mbm_total_bytes
+	and mbm_local_bytes events, respectively, when the Bandwidth
+	Monitoring Event Configuration (BMEC) feature is supported.
+	The event configuration settings are domain specific and affect
+	all the CPUs in the domain. When either event configuration is
+	changed, the bandwidth counters for all RMIDs of both events
+	(mbm_total_bytes as well as mbm_local_bytes) are cleared for that
+	domain. The next read for every RMID will report "Unavailable"
+	and subsequent reads will report the valid value.
+
+	Following are the types of events supported:
+
+	==== ========================================================
+	Bits Description
+	==== ========================================================
+	6    Dirty Victims from the QOS domain to all types of memory
+	5    Reads to slow memory in the non-local NUMA domain
+	4    Reads to slow memory in the local NUMA domain
+	3    Non-temporal writes to non-local NUMA domain
+	2    Non-temporal writes to local NUMA domain
+	1    Reads to memory in the non-local NUMA domain
+	0    Reads to memory in the local NUMA domain
+	==== ========================================================
+
+	By default, the mbm_total_bytes configuration is set to 0x7f to count
+	all the event types and the mbm_local_bytes configuration is set to
+	0x15 to count all the local memory events.
+
+	Examples:
+
+	* To view the current configuration::
+	  ::
+
+	    # cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
+	    0=0x7f;1=0x7f;2=0x7f;3=0x7f
+
+	    # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
+	    0=0x15;1=0x15;3=0x15;4=0x15
+
+	* To change the mbm_total_bytes to count only reads on domain 0,
+	  the bits 0, 1, 4 and 5 needs to be set, which is 110011b in binary
+	  (in hexadecimal 0x33):
+	  ::
+
+	    # echo "0=0x33" > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
+
+	    # cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
+	    0=0x33;1=0x7f;2=0x7f;3=0x7f
+
+	* To change the mbm_local_bytes to count all the slow memory reads on
+	  domain 0 and 1, the bits 4 and 5 needs to be set, which is 110000b
+	  in binary (in hexadecimal 0x30):
+	  ::
+
+	    # echo "0=0x30;1=0x30" > /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
+
+	    # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
+	    0=0x30;1=0x30;3=0x15;4=0x15

 "max_threshold_occupancy":
 		Read/write file provides the largest value (in
···

 	MB:<cache_id0>=bw_MBps0;<cache_id1>=bw_MBps1;...

+Slow Memory Bandwidth Allocation (SMBA)
+---------------------------------------
+AMD hardware supports Slow Memory Bandwidth Allocation (SMBA).
+CXL.memory is the only supported "slow" memory device. With the
+support of SMBA, the hardware enables bandwidth allocation on
+the slow memory devices. If there are multiple such devices in
+the system, the throttling logic groups all the slow sources
+together and applies the limit on them as a whole.
+
+The presence of SMBA (with CXL.memory) is independent of slow memory
+devices presence. If there are no such devices on the system, then
+configuring SMBA will have no impact on the performance of the system.
+
+The bandwidth domain for slow memory is L3 cache. Its schemata file
+is formatted as:
+::
+
+	SMBA:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;...
+
 Reading/writing the schemata file
 ---------------------------------
 Reading the schemata file will show the state of all resources
···
     # cat schemata
     L3DATA:0=fffff;1=fffff;2=3c0;3=fffff
     L3CODE:0=fffff;1=fffff;2=fffff;3=fffff
+
+Reading/writing the schemata file (on AMD systems)
+--------------------------------------------------
+Reading the schemata file will show the current bandwidth limit on all
+domains. The allocated resources are in multiples of one eighth GB/s.
+When writing to the file, you need to specify what cache id you wish to
+configure the bandwidth limit.
+
+For example, to allocate 2GB/s limit on the first cache id:
+
+::
+
+  # cat schemata
+    MB:0=2048;1=2048;2=2048;3=2048
+    L3:0=ffff;1=ffff;2=ffff;3=ffff
+
+  # echo "MB:1=16" > schemata
+  # cat schemata
+    MB:0=2048;1=  16;2=2048;3=2048
+    L3:0=ffff;1=ffff;2=ffff;3=ffff
+
+Reading/writing the schemata file (on AMD systems) with SMBA feature
+--------------------------------------------------------------------
+Reading and writing the schemata file is the same as without SMBA in
+above section.
+
+For example, to allocate 8GB/s limit on the first cache id:
+
+::
+
+  # cat schemata
+    SMBA:0=2048;1=2048;2=2048;3=2048
+      MB:0=2048;1=2048;2=2048;3=2048
+      L3:0=ffff;1=ffff;2=ffff;3=ffff
+
+  # echo "SMBA:1=64" > schemata
+  # cat schemata
+    SMBA:0=2048;1=  64;2=2048;3=2048
+      MB:0=2048;1=2048;2=2048;3=2048
+      L3:0=ffff;1=ffff;2=ffff;3=ffff

 Cache Pseudo-Locking
 ====================
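The AMD schemata examples in resctrl.rst above state that bandwidth values are expressed in multiples of one eighth GB/s. A minimal user-space sketch of that conversion follows; the helper names `gbps_to_schemata()` and `format_bw_line()` are hypothetical and not part of the kernel or of this patch.

```c
#include <stdio.h>

/* Assumption taken from the documentation text: one schemata unit = 1/8 GB/s. */
#define SCHEMATA_UNITS_PER_GBPS 8u

/* Hypothetical helper: convert a limit in whole GB/s to a schemata value. */
static unsigned int gbps_to_schemata(unsigned int gbps)
{
	return gbps * SCHEMATA_UNITS_PER_GBPS;
}

/*
 * Hypothetical helper: format one schemata line such as "SMBA:1=64".
 * "res" is the resource name ("MB" or "SMBA"), "cache_id" the domain.
 * Returns the value of snprintf().
 */
static int format_bw_line(char *buf, size_t len, const char *res,
			  unsigned int cache_id, unsigned int gbps)
{
	return snprintf(buf, len, "%s:%u=%u", res, cache_id,
			gbps_to_schemata(gbps));
}
```

With these helpers, a 2 GB/s limit maps to 16 and an 8 GB/s limit maps to 64, which matches the `echo "MB:1=16"` and `echo "SMBA:1=64"` commands in the documentation hunk. The default value 2048 shown in the examples would correspond to 256 GB/s under the same assumption.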
+2
arch/x86/include/asm/cpufeatures.h
···
 #define X86_FEATURE_SGX_EDECCSSA	(11*32+18) /* "" SGX EDECCSSA user leaf function */
 #define X86_FEATURE_CALL_DEPTH		(11*32+19) /* "" Call depth tracking for RSB stuffing */
 #define X86_FEATURE_MSR_TSX_CTRL	(11*32+20) /* "" MSR IA32_TSX_CTRL (Intel) implemented */
+#define X86_FEATURE_SMBA		(11*32+21) /* "" Slow Memory Bandwidth Allocation */
+#define X86_FEATURE_BMEC		(11*32+22) /* "" Bandwidth Monitoring Event Configuration */

 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
+2
arch/x86/include/asm/msr-index.h
···

 /* - AMD: */
 #define MSR_IA32_MBA_BW_BASE		0xc0000200
+#define MSR_IA32_SMBA_BW_BASE		0xc0000280
+#define MSR_IA32_EVT_CFG_BASE		0xc0000400

 /* MSR_IA32_VMX_MISC bits */
 #define MSR_IA32_VMX_MISC_INTEL_PT	(1ULL << 14)
+2
arch/x86/kernel/cpu/cpuid-deps.c
···
 	{ X86_FEATURE_CQM_OCCUP_LLC,		X86_FEATURE_CQM_LLC   },
 	{ X86_FEATURE_CQM_MBM_TOTAL,		X86_FEATURE_CQM_LLC   },
 	{ X86_FEATURE_CQM_MBM_LOCAL,		X86_FEATURE_CQM_LLC   },
+	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_MBM_TOTAL   },
+	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_MBM_LOCAL   },
 	{ X86_FEATURE_AVX512_BF16,		X86_FEATURE_AVX512VL  },
 	{ X86_FEATURE_AVX512_FP16,		X86_FEATURE_AVX512BW  },
 	{ X86_FEATURE_ENQCMD,			X86_FEATURE_XSAVES    },
+51 -3
arch/x86/kernel/cpu/resctrl/core.c
···
 			.fflags			= RFTYPE_RES_MB,
 		},
 	},
+	[RDT_RESOURCE_SMBA] =
+	{
+		.r_resctrl = {
+			.rid			= RDT_RESOURCE_SMBA,
+			.name			= "SMBA",
+			.cache_level		= 3,
+			.domains		= domain_init(RDT_RESOURCE_SMBA),
+			.parse_ctrlval		= parse_bw,
+			.format_str		= "%d=%*u",
+			.fflags			= RFTYPE_RES_MB,
+		},
+	},
 };

 /*
···
 {
 	if (!r)
 		return rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl.membw.mba_sc;
+
+	/*
+	 * The software controller support is only applicable to MBA resource.
+	 * Make sure to check for resource type.
+	 */
+	if (r->rid != RDT_RESOURCE_MBA)
+		return false;

 	return r->membw.mba_sc;
 }
···
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
 	union cpuid_0x10_3_eax eax;
 	union cpuid_0x10_x_edx edx;
-	u32 ebx, ecx;
+	u32 ebx, ecx, subleaf;

-	cpuid_count(0x80000020, 1, &eax.full, &ebx, &ecx, &edx.full);
+	/*
+	 * Query CPUID_Fn80000020_EDX_x01 for MBA and
+	 * CPUID_Fn80000020_EDX_x02 for SMBA
+	 */
+	subleaf = (r->rid == RDT_RESOURCE_SMBA) ? 2 :  1;
+
+	cpuid_count(0x80000020, subleaf, &eax.full, &ebx, &ecx, &edx.full);
 	hw_res->num_closid = edx.split.cos_max + 1;
 	r->default_ctrl = MAX_MBA_BW_AMD;

···
 	RDT_FLAG_L2_CAT,
 	RDT_FLAG_L2_CDP,
 	RDT_FLAG_MBA,
+	RDT_FLAG_SMBA,
+	RDT_FLAG_BMEC,
 };

 #define RDT_OPT(idx, n, f)	\
···
 	RDT_OPT(RDT_FLAG_L2_CAT,    "l2cat",	X86_FEATURE_CAT_L2),
 	RDT_OPT(RDT_FLAG_L2_CDP,    "l2cdp",	X86_FEATURE_CDP_L2),
 	RDT_OPT(RDT_FLAG_MBA,	    "mba",	X86_FEATURE_MBA),
+	RDT_OPT(RDT_FLAG_SMBA,	    "smba",	X86_FEATURE_SMBA),
+	RDT_OPT(RDT_FLAG_BMEC,	    "bmec",	X86_FEATURE_BMEC),
 };
 #define NUM_RDT_OPTIONS ARRAY_SIZE(rdt_options)

···
 }
 __setup("rdt", set_rdt_options);

-static bool __init rdt_cpu_has(int flag)
+bool __init rdt_cpu_has(int flag)
 {
 	bool ret = boot_cpu_has(flag);
 	struct rdt_options *o;
···
 	return false;
 }

+static __init bool get_slow_mem_config(void)
+{
+	struct rdt_hw_resource *hw_res = &rdt_resources_all[RDT_RESOURCE_SMBA];
+
+	if (!rdt_cpu_has(X86_FEATURE_SMBA))
+		return false;
+
+	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
+		return __rdt_get_mem_config_amd(&hw_res->r_resctrl);
+
+	return false;
+}
+
 static __init bool get_rdt_alloc_resources(void)
 {
 	struct rdt_resource *r;
···
 	}

 	if (get_mem_config())
+		ret = true;
+
+	if (get_slow_mem_config())
 		ret = true;

 	return ret;
···
 			r->cache.min_cbm_bits = 0;
 		} else if (r->rid == RDT_RESOURCE_MBA) {
 			hw_res->msr_base = MSR_IA32_MBA_BW_BASE;
+			hw_res->msr_update = mba_wrmsr_amd;
+		} else if (r->rid == RDT_RESOURCE_SMBA) {
+			hw_res->msr_base = MSR_IA32_SMBA_BW_BASE;
 			hw_res->msr_update = mba_wrmsr_amd;
 		}
 	}
+4 -9
arch/x86/kernel/cpu/resctrl/ctrlmondata.c
···
 	unsigned long dom_id;

 	if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP &&
-	    r->rid == RDT_RESOURCE_MBA) {
+	    (r->rid == RDT_RESOURCE_MBA || r->rid == RDT_RESOURCE_SMBA)) {
 		rdt_last_cmd_puts("Cannot pseudo-lock MBA resource\n");
 		return -EINVAL;
 	}
···
 	enum resctrl_conf_type t;
 	cpumask_var_t cpu_mask;
 	struct rdt_domain *d;
-	int cpu;
 	u32 idx;

 	if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
···

 	if (cpumask_empty(cpu_mask))
 		goto done;
-	cpu = get_cpu();
-	/* Update resource control msr on this CPU if it's in cpu_mask. */
-	if (cpumask_test_cpu(cpu, cpu_mask))
-		rdt_ctrl_update(&msr_param);
-	/* Update resource control msr on other CPUs. */
-	smp_call_function_many(cpu_mask, rdt_ctrl_update, &msr_param, 1);
-	put_cpu();
+
+	/* Update resource control msr on all the CPUs. */
+	on_each_cpu_mask(cpu_mask, rdt_ctrl_update, &msr_param, 1);

 done:
 	free_cpumask_var(cpu_mask);
+28
arch/x86/kernel/cpu/resctrl/internal.h
···
  */
 #define MBM_CNTR_WIDTH_OFFSET_MAX (62 - MBM_CNTR_WIDTH_BASE)

+/* Reads to Local DRAM Memory */
+#define READS_TO_LOCAL_MEM		BIT(0)
+
+/* Reads to Remote DRAM Memory */
+#define READS_TO_REMOTE_MEM		BIT(1)
+
+/* Non-Temporal Writes to Local Memory */
+#define NON_TEMP_WRITE_TO_LOCAL_MEM	BIT(2)
+
+/* Non-Temporal Writes to Remote Memory */
+#define NON_TEMP_WRITE_TO_REMOTE_MEM	BIT(3)
+
+/* Reads to Local Memory the system identifies as "Slow Memory" */
+#define READS_TO_LOCAL_S_MEM		BIT(4)
+
+/* Reads to Remote Memory the system identifies as "Slow Memory" */
+#define READS_TO_REMOTE_S_MEM		BIT(5)
+
+/* Dirty Victims to All Types of Memory */
+#define DIRTY_VICTIMS_TO_ALL_MEM	BIT(6)
+
+/* Max event bits supported */
+#define MAX_EVT_CONFIG_BITS		GENMASK(6, 0)

 struct rdt_fs_context {
 	struct kernfs_fs_context	kfc;
···
  * struct mon_evt - Entry in the event list of a resource
  * @evtid:		event id
  * @name:		name of the event
+ * @configurable:	true if the event is configurable
  * @list:		entry in &rdt_resource->evt_list
  */
 struct mon_evt {
 	enum resctrl_event_id	evtid;
 	char			*name;
+	bool			configurable;
 	struct list_head	list;
 };

···
 	RDT_RESOURCE_L3,
 	RDT_RESOURCE_L2,
 	RDT_RESOURCE_MBA,
+	RDT_RESOURCE_SMBA,

 	/* Must be the last */
 	RDT_NUM_RESOURCES,
···
 int alloc_rmid(void);
 void free_rmid(u32 rmid);
 int rdt_get_mon_l3_config(struct rdt_resource *r);
+bool __init rdt_cpu_has(int flag);
 void mon_event_count(void *info);
 int rdtgroup_mondata_show(struct seq_file *m, void *arg);
 void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
···
 void __check_limbo(struct rdt_domain *d, bool force_free);
 void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
 void __init thread_throttle_mode_init(void);
+void __init mbm_config_rftype_init(const char *config);

 #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
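As a user-space illustration of how the event bits added to internal.h combine into the masks quoted in the resctrl.rst hunk (0x7f, 0x15, 0x33, 0x30), the sketch below redefines the bit names locally with `BIT()` expanded by hand. The `BMEC_*` names and `bmec_config_valid()` are hypothetical and only serve this example; they do not exist in the kernel.

```c
#include <stdint.h>

/* Local copies of the bit definitions from the internal.h hunk. */
#define READS_TO_LOCAL_MEM		(1u << 0)
#define READS_TO_REMOTE_MEM		(1u << 1)
#define NON_TEMP_WRITE_TO_LOCAL_MEM	(1u << 2)
#define NON_TEMP_WRITE_TO_REMOTE_MEM	(1u << 3)
#define READS_TO_LOCAL_S_MEM		(1u << 4)
#define READS_TO_REMOTE_S_MEM		(1u << 5)
#define DIRTY_VICTIMS_TO_ALL_MEM	(1u << 6)
#define MAX_EVT_CONFIG_BITS		0x7fu	/* GENMASK(6, 0) */

/* Hypothetical names for the masks used in the documentation examples. */
#define BMEC_TOTAL_DEFAULT	MAX_EVT_CONFIG_BITS
#define BMEC_LOCAL_DEFAULT	(READS_TO_LOCAL_MEM | \
				 NON_TEMP_WRITE_TO_LOCAL_MEM | \
				 READS_TO_LOCAL_S_MEM)
#define BMEC_ALL_READS		(READS_TO_LOCAL_MEM | READS_TO_REMOTE_MEM | \
				 READS_TO_LOCAL_S_MEM | READS_TO_REMOTE_S_MEM)
#define BMEC_SLOW_READS		(READS_TO_LOCAL_S_MEM | READS_TO_REMOTE_S_MEM)

/* Mirrors the range check in mbm_config_write_domain(): val must fit in 7 bits. */
static int bmec_config_valid(uint32_t val)
{
	return val <= MAX_EVT_CONFIG_BITS;
}
```

Under these definitions `BMEC_ALL_READS` evaluates to 0x33 and `BMEC_SLOW_READS` to 0x30, the two values written in the documentation examples, while `BMEC_LOCAL_DEFAULT` gives the documented default of 0x15.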
+29 -1
arch/x86/kernel/cpu/resctrl/monitor.c
···
 	}
 }

+/*
+ * Assumes that hardware counters are also reset and thus that there is
+ * no need to record initial non-zero counts.
+ */
+void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_domain *d)
+{
+	struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+
+	if (is_mbm_total_enabled())
+		memset(hw_dom->arch_mbm_total, 0,
+		       sizeof(*hw_dom->arch_mbm_total) * r->num_rmid);
+
+	if (is_mbm_local_enabled())
+		memset(hw_dom->arch_mbm_local, 0,
+		       sizeof(*hw_dom->arch_mbm_local) * r->num_rmid);
+}
+
 static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
 {
 	u64 shift = 64 - width, chunks;
···
 		list_add_tail(&mbm_local_event.list, &r->evt_list);
 }

-int rdt_get_mon_l3_config(struct rdt_resource *r)
+int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 {
 	unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
···
 	ret = dom_data_init(r);
 	if (ret)
 		return ret;
+
+	if (rdt_cpu_has(X86_FEATURE_BMEC)) {
+		if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
+			mbm_total_event.configurable = true;
+			mbm_config_rftype_init("mbm_total_bytes_config");
+		}
+		if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL)) {
+			mbm_local_event.configurable = true;
+			mbm_config_rftype_init("mbm_local_bytes_config");
+		}
+	}

 	l3_mon_evt_init(r);
+282 -25
arch/x86/kernel/cpu/resctrl/rdtgroup.c
···
 static void
 update_closid_rmid(const struct cpumask *cpu_mask, struct rdtgroup *r)
 {
-	int cpu = get_cpu();
-
-	if (cpumask_test_cpu(cpu, cpu_mask))
-		update_cpu_closid_rmid(r);
-	smp_call_function_many(cpu_mask, update_cpu_closid_rmid, r, 1);
-	put_cpu();
+	on_each_cpu_mask(cpu_mask, update_cpu_closid_rmid, r, 1);
 }

 static int cpus_mon_write(struct rdtgroup *rdtgrp, cpumask_var_t newmask,
···
 	struct rdt_resource *r = of->kn->parent->priv;
 	struct mon_evt *mevt;

-	list_for_each_entry(mevt, &r->evt_list, list)
+	list_for_each_entry(mevt, &r->evt_list, list) {
 		seq_printf(seq, "%s\n", mevt->name);
+		if (mevt->configurable)
+			seq_printf(seq, "%s_config\n", mevt->name);
+	}

 	return 0;
 }
···

 	list_for_each_entry(s, &resctrl_schema_all, list) {
 		r = s->res;
-		if (r->rid == RDT_RESOURCE_MBA)
+		if (r->rid == RDT_RESOURCE_MBA || r->rid == RDT_RESOURCE_SMBA)
 			continue;
 		has_cache = true;
 		list_for_each_entry(d, &r->domains, list) {
···
 				ctrl = resctrl_arch_get_config(r, d,
 							       closid,
 							       type);
-				if (r->rid == RDT_RESOURCE_MBA)
+				if (r->rid == RDT_RESOURCE_MBA ||
+				    r->rid == RDT_RESOURCE_SMBA)
 					size = ctrl;
 				else
 					size = rdtgroup_cbm_to_size(r, d, ctrl);
···
 	rdtgroup_kn_unlock(of->kn);

 	return ret;
+}
+
+struct mon_config_info {
+	u32 evtid;
+	u32 mon_config;
+};
+
+#define INVALID_CONFIG_INDEX   UINT_MAX
+
+/**
+ * mon_event_config_index_get - get the hardware index for the
+ *                              configurable event
+ * @evtid: event id.
+ *
+ * Return: 0 for evtid == QOS_L3_MBM_TOTAL_EVENT_ID
+ *         1 for evtid == QOS_L3_MBM_LOCAL_EVENT_ID
+ *         INVALID_CONFIG_INDEX for invalid evtid
+ */
+static inline unsigned int mon_event_config_index_get(u32 evtid)
+{
+	switch (evtid) {
+	case QOS_L3_MBM_TOTAL_EVENT_ID:
+		return 0;
+	case QOS_L3_MBM_LOCAL_EVENT_ID:
+		return 1;
+	default:
+		/* Should never reach here */
+		return INVALID_CONFIG_INDEX;
+	}
+}
+
+static void mon_event_config_read(void *info)
+{
+	struct mon_config_info *mon_info = info;
+	unsigned int index;
+	u64 msrval;
+
+	index = mon_event_config_index_get(mon_info->evtid);
+	if (index == INVALID_CONFIG_INDEX) {
+		pr_warn_once("Invalid event id %d\n", mon_info->evtid);
+		return;
+	}
+	rdmsrl(MSR_IA32_EVT_CFG_BASE + index, msrval);
+
+	/* Report only the valid event configuration bits */
+	mon_info->mon_config = msrval & MAX_EVT_CONFIG_BITS;
+}
+
+static void mondata_config_read(struct rdt_domain *d, struct mon_config_info *mon_info)
+{
+	smp_call_function_any(&d->cpu_mask, mon_event_config_read, mon_info, 1);
+}
+
+static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid)
+{
+	struct mon_config_info mon_info = {0};
+	struct rdt_domain *dom;
+	bool sep = false;
+
+	mutex_lock(&rdtgroup_mutex);
+
+	list_for_each_entry(dom, &r->domains, list) {
+		if (sep)
+			seq_puts(s, ";");
+
+		memset(&mon_info, 0, sizeof(struct mon_config_info));
+		mon_info.evtid = evtid;
+		mondata_config_read(dom, &mon_info);
+
+		seq_printf(s, "%d=0x%02x", dom->id, mon_info.mon_config);
+		sep = true;
+	}
+	seq_puts(s, "\n");
+
+	mutex_unlock(&rdtgroup_mutex);
+
+	return 0;
+}
+
+static int mbm_total_bytes_config_show(struct kernfs_open_file *of,
+				       struct seq_file *seq, void *v)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+
+	mbm_config_show(seq, r, QOS_L3_MBM_TOTAL_EVENT_ID);
+
+	return 0;
+}
+
+static int mbm_local_bytes_config_show(struct kernfs_open_file *of,
+				       struct seq_file *seq, void *v)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+
+	mbm_config_show(seq, r, QOS_L3_MBM_LOCAL_EVENT_ID);
+
+	return 0;
+}
+
+static void mon_event_config_write(void *info)
+{
+	struct mon_config_info *mon_info = info;
+	unsigned int index;
+
+	index = mon_event_config_index_get(mon_info->evtid);
+	if (index == INVALID_CONFIG_INDEX) {
+		pr_warn_once("Invalid event id %d\n", mon_info->evtid);
+		return;
+	}
+	wrmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, 0);
+}
+
+static int mbm_config_write_domain(struct rdt_resource *r,
+				   struct rdt_domain *d, u32 evtid, u32 val)
+{
+	struct mon_config_info mon_info = {0};
+	int ret = 0;
+
+	/* mon_config cannot be more than the supported set of events */
+	if (val > MAX_EVT_CONFIG_BITS) {
+		rdt_last_cmd_puts("Invalid event configuration\n");
+		return -EINVAL;
+	}
+
+	/*
+	 * Read the current config value first. If both are the same then
+	 * no need to write it again.
+	 */
+	mon_info.evtid = evtid;
+	mondata_config_read(d, &mon_info);
+	if (mon_info.mon_config == val)
+		goto out;
+
+	mon_info.mon_config = val;
+
+	/*
+	 * Update MSR_IA32_EVT_CFG_BASE MSR on one of the CPUs in the
+	 * domain. The MSRs offset from MSR MSR_IA32_EVT_CFG_BASE
+	 * are scoped at the domain level. Writing any of these MSRs
+	 * on one CPU is observed by all the CPUs in the domain.
+	 */
+	smp_call_function_any(&d->cpu_mask, mon_event_config_write,
+			      &mon_info, 1);
+
+	/*
+	 * When an Event Configuration is changed, the bandwidth counters
+	 * for all RMIDs and Events will be cleared by the hardware. The
+	 * hardware also sets MSR_IA32_QM_CTR.Unavailable (bit 62) for
+	 * every RMID on the next read to any event for every RMID.
+	 * Subsequent reads will have MSR_IA32_QM_CTR.Unavailable (bit 62)
+	 * cleared while it is tracked by the hardware. Clear the
+	 * mbm_local and mbm_total counts for all the RMIDs.
+	 */
+	resctrl_arch_reset_rmid_all(r, d);
+
+out:
+	return ret;
+}
+
+static int mon_config_write(struct rdt_resource *r, char *tok, u32 evtid)
+{
+	char *dom_str = NULL, *id_str;
+	unsigned long dom_id, val;
+	struct rdt_domain *d;
+	int ret = 0;
+
+next:
+	if (!tok || tok[0] == '\0')
+		return 0;
+
+	/* Start processing the strings for each domain */
+	dom_str = strim(strsep(&tok, ";"));
+	id_str = strsep(&dom_str, "=");
+
+	if (!id_str || kstrtoul(id_str, 10, &dom_id)) {
+		rdt_last_cmd_puts("Missing '=' or non-numeric domain id\n");
+		return -EINVAL;
+	}
+
+	if (!dom_str || kstrtoul(dom_str, 16, &val)) {
+		rdt_last_cmd_puts("Non-numeric event configuration value\n");
+		return -EINVAL;
+	}
+
+	list_for_each_entry(d, &r->domains, list) {
+		if (d->id == dom_id) {
+			ret = mbm_config_write_domain(r, d, evtid, val);
+			if (ret)
+				return -EINVAL;
+			goto next;
+		}
+	}
+
+	return -EINVAL;
+}
+
+static ssize_t mbm_total_bytes_config_write(struct kernfs_open_file *of,
+					    char *buf, size_t nbytes,
+					    loff_t off)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+	int ret;
+
+	/* Valid input requires a trailing newline */
+	if (nbytes == 0 || buf[nbytes - 1] != '\n')
+		return -EINVAL;
+
+	mutex_lock(&rdtgroup_mutex);
+
+	rdt_last_cmd_clear();
+
+	buf[nbytes - 1] = '\0';
+
+	ret = mon_config_write(r, buf, QOS_L3_MBM_TOTAL_EVENT_ID);
+
+	mutex_unlock(&rdtgroup_mutex);
+
+	return ret ?: nbytes;
+}
+
+static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
+					    char *buf, size_t nbytes,
+					    loff_t off)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+	int ret;
+
+	/* Valid input requires a trailing newline */
+	if (nbytes == 0 || buf[nbytes - 1] != '\n')
+		return -EINVAL;
+
+	mutex_lock(&rdtgroup_mutex);
+
+	rdt_last_cmd_clear();
+
+	buf[nbytes - 1] = '\0';
+
+	ret = mon_config_write(r, buf, QOS_L3_MBM_LOCAL_EVENT_ID);
+
+	mutex_unlock(&rdtgroup_mutex);
+
+	return ret ?: nbytes;
 }

 /* rdtgroup information files for one cache resource. */
···
 		.write		= max_threshold_occ_write,
 		.seq_show	= max_threshold_occ_show,
 		.fflags		= RF_MON_INFO | RFTYPE_RES_CACHE,
+	},
+	{
+		.name		= "mbm_total_bytes_config",
+		.mode		= 0644,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= mbm_total_bytes_config_show,
+		.write		= mbm_total_bytes_config_write,
+	},
+	{
+		.name		= "mbm_local_bytes_config",
+		.mode		= 0644,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= mbm_local_bytes_config_show,
+		.write		= mbm_local_bytes_config_write,
 	},
 	{
 		.name		= "cpus",
···
 		return;

 	rft->fflags = RF_CTRL_INFO | RFTYPE_RES_MB;
+}
+
+void __init mbm_config_rftype_init(const char *config)
+{
+	struct rftype *rft;
+
+	rft = rdtgroup_get_rftype_by_name(config);
+	if (rft)
+		rft->fflags = RF_MON_INFO | RFTYPE_RES_CACHE;
 }

 /**
···
 		/* Pick one CPU from each domain instance to update MSR */
 		cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);
 	}
-	cpu = get_cpu();
-	/* Update QOS_CFG MSR on this cpu if it's in cpu_mask. */
-	if (cpumask_test_cpu(cpu, cpu_mask))
-		update(&enable);
-	/* Update QOS_CFG MSR on all other cpus in cpu_mask. */
-	smp_call_function_many(cpu_mask, update, &enable, 1);
-	put_cpu();
+
+	/* Update QOS_CFG MSR on all the CPUs in cpu_mask */
+	on_each_cpu_mask(cpu_mask, update, &enable, 1);

 	free_cpumask_var(cpu_mask);

···
 	struct msr_param msr_param;
 	cpumask_var_t cpu_mask;
 	struct rdt_domain *d;
-	int i, cpu;
+	int i;

 	if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
 		return -ENOMEM;
···
 		for (i = 0; i < hw_res->num_closid; i++)
 			hw_dom->ctrl_val[i] = r->default_ctrl;
 	}
-	cpu = get_cpu();
-	/* Update CBM on this cpu if it's in cpu_mask. */
-	if (cpumask_test_cpu(cpu, cpu_mask))
-		rdt_ctrl_update(&msr_param);
-	/* Update CBM on all other cpus in cpu_mask. */
-	smp_call_function_many(cpu_mask, rdt_ctrl_update, &msr_param, 1);
-	put_cpu();
+
+	/* Update CBM on all the CPUs in cpu_mask */
+	on_each_cpu_mask(cpu_mask, rdt_ctrl_update, &msr_param, 1);

 	free_cpumask_var(cpu_mask);

···

 	list_for_each_entry(s, &resctrl_schema_all, list) {
 		r = s->res;
-		if (r->rid == RDT_RESOURCE_MBA) {
+		if (r->rid == RDT_RESOURCE_MBA ||
+		    r->rid == RDT_RESOURCE_SMBA) {
 			rdtgroup_init_mba(r, rdtgrp->closid);
 			if (is_mba_sc(r))
 				continue;
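The `<domain id>=<hex value>;...` syntax accepted by mon_config_write() in the rdtgroup.c diff above can be sketched in user space as follows. This is an approximation written with standard C library calls instead of the kernel's strsep()/kstrtoul(); `struct dom_cfg` and `parse_mbm_config()` are hypothetical names, and the sketch only parses and range-checks, it does not touch any MSR or resctrl file.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

#define MAX_EVT_CONFIG_BITS 0x7ful	/* same limit as GENMASK(6, 0) in the patch */

/* Hypothetical result record: one "id=value" pair. */
struct dom_cfg {
	unsigned long dom_id;
	unsigned long val;
};

/*
 * Parse a string such as "0=0x33;1=0x7f" into out[].
 * Returns the number of pairs parsed, or -1 on malformed input,
 * on a value above MAX_EVT_CONFIG_BITS, or if more than max pairs
 * are present.
 */
static int parse_mbm_config(const char *s, struct dom_cfg *out, int max)
{
	int n = 0;

	while (*s != '\0') {
		unsigned long id, val;
		char *end;

		if (n == max)
			return -1;

		/* Domain id: decimal, must start with a digit. */
		if (!isdigit((unsigned char)*s))
			return -1;
		errno = 0;
		id = strtoul(s, &end, 10);
		if (errno || *end != '=')
			return -1;
		s = end + 1;

		/* Event configuration: hexadecimal, "0x" prefix optional. */
		if (!isxdigit((unsigned char)*s))
			return -1;
		errno = 0;
		val = strtoul(s, &end, 16);
		if (errno || end == s || val > MAX_EVT_CONFIG_BITS)
			return -1;

		/* Pairs are separated by ';'; anything else is an error. */
		if (*end == ';')
			end++;
		else if (*end != '\0')
			return -1;

		out[n].dom_id = id;
		out[n].val = val;
		n++;
		s = end;
	}

	return n;
}
```

Unlike the kernel code, this sketch does not check the domain id against a list of existing domains and does not require a trailing newline; both checks depend on kernel state that is not available here.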
+2
arch/x86/kernel/cpu/scattered.c
···
 	{ X86_FEATURE_CPB,		CPUID_EDX,  9, 0x80000007, 0 },
 	{ X86_FEATURE_PROC_FEEDBACK,	CPUID_EDX, 11, 0x80000007, 0 },
 	{ X86_FEATURE_MBA,		CPUID_EBX,  6, 0x80000008, 0 },
+	{ X86_FEATURE_SMBA,		CPUID_EBX,  2, 0x80000020, 0 },
+	{ X86_FEATURE_BMEC,		CPUID_EBX,  3, 0x80000020, 0 },
 	{ X86_FEATURE_PERFMON_V2,	CPUID_EAX,  0, 0x80000022, 0 },
 	{ X86_FEATURE_AMD_LBR_V2,	CPUID_EAX,  1, 0x80000022, 0 },
 	{ 0, 0, 0, 0, 0 }
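The two scattered.c entries above place SMBA at bit 2 and BMEC at bit 3 of EBX for CPUID leaf 0x80000020, subleaf 0. A small sketch of decoding such an EBX value follows; `struct resctrl_amd_feats` and `decode_fn80000020_ebx()` are hypothetical names, and the EBX value is taken as a parameter so that the example does not depend on the CPU it runs on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as listed in the scattered.c hunk (leaf 0x80000020, EBX). */
#define FN80000020_EBX_SMBA_BIT	2
#define FN80000020_EBX_BMEC_BIT	3

/* Hypothetical container for the two decoded feature bits. */
struct resctrl_amd_feats {
	bool smba;	/* Slow Memory Bandwidth Allocation */
	bool bmec;	/* Bandwidth Monitoring Event Configuration */
};

/* Decode an EBX value read from CPUID leaf 0x80000020, subleaf 0. */
static struct resctrl_amd_feats decode_fn80000020_ebx(uint32_t ebx)
{
	struct resctrl_amd_feats f = {
		.smba = (ebx >> FN80000020_EBX_SMBA_BIT) & 1u,
		.bmec = (ebx >> FN80000020_EBX_BMEC_BIT) & 1u,
	};

	return f;
}
```

On x86 builds with GCC or Clang, the EBX input could presumably be obtained with `__get_cpuid_count()` from `<cpuid.h>`; that step is left out so the decoding logic stays portable. Note that the kernel additionally honours the `rdt=` command line option through rdt_cpu_has(), which a raw CPUID read does not reflect.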
+11
include/linux/resctrl.h
···
 void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
 			     u32 rmid, enum resctrl_event_id eventid);

+/**
+ * resctrl_arch_reset_rmid_all() - Reset all private state associated with
+ *				    all rmids and eventids.
+ * @r:		The resctrl resource.
+ * @d:		The domain for which all architectural counter state will
+ *		be cleared.
+ *
+ * This can be called from any CPU.
+ */
+void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_domain *d);
+
 extern unsigned int resctrl_rmid_realloc_threshold;
 extern unsigned int resctrl_rmid_realloc_limit;