Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

perf/arm_cspmu: nvidia: Add Tegra410 PCIE PMU

Adds PCIE PMU support to the Tegra410 SOC. This PMU is instantiated
in each root complex in the SOC and can capture traffic from
PCIE devices to various memory types. The PMU can filter traffic
based on the originating root port or BDF and on the target memory
type (CPU DRAM, GPU Memory, CXL Memory, or remote Memory).

Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>

Authored by Besar Wicaksono, committed by Will Deacon
bf585ba1 bc86281f

+368 -5
+163
Documentation/admin-guide/perf/nvidia-tegra410-pmu.rst
···
 metrics like memory bandwidth, latency, and utilization:
 
 * Unified Coherence Fabric (UCF)
+* PCIE
 
 PMU Driver
 ----------
···
 destination filter = remote memory::
 
   perf stat -a -e nvidia_ucf_pmu_1/event=0x0,src_loc_noncpu=0x1,dst_rem=0x1/
+
+PCIE PMU
+--------
+
+This PMU is located in the SOC fabric connecting the PCIE root complex (RC) and
+the memory subsystem. It monitors all read/write traffic from the root port(s)
+or a particular BDF in a PCIE RC to local or remote memory. There is one PMU per
+PCIE RC in the SoC. Each RC can have up to 16 lanes that can be bifurcated into
+up to 8 root ports. The traffic from each root port can be filtered using the RP
+or BDF filter. For example, specifying "src_rp_mask=0xFF" makes the PMU counter
+capture traffic from all RPs. See below for more details.
+
+The events and configuration options of this PMU device are described in sysfs,
+see /sys/bus/event_source/devices/nvidia_pcie_pmu_<socket-id>_rc_<pcie-rc-id>.
+
+The events in this PMU can be used to measure bandwidth, utilization, and
+latency:
+
+* rd_req: count the number of read requests by PCIE devices.
+* wr_req: count the number of write requests by PCIE devices.
+* rd_bytes: count the number of bytes transferred by rd_req.
+* wr_bytes: count the number of bytes transferred by wr_req.
+* rd_cum_outs: count outstanding rd_req each cycle.
+* cycles: count the clock cycles of the SOC fabric connected to the PCIE interface.
+
+The average bandwidth is calculated as::
+
+  AVG_RD_BANDWIDTH_IN_GBPS = RD_BYTES / ELAPSED_TIME_IN_NS
+  AVG_WR_BANDWIDTH_IN_GBPS = WR_BYTES / ELAPSED_TIME_IN_NS
+
+The average request rate is calculated as::
+
+  AVG_RD_REQUEST_RATE = RD_REQ / CYCLES
+  AVG_WR_REQUEST_RATE = WR_REQ / CYCLES
+
+The average latency is calculated as::
+
+  FREQ_IN_GHZ = CYCLES / ELAPSED_TIME_IN_NS
+  AVG_LATENCY_IN_CYCLES = RD_CUM_OUTS / RD_REQ
+  AVG_LATENCY_IN_NS = AVG_LATENCY_IN_CYCLES / FREQ_IN_GHZ
+
+The PMU events can be filtered based on the traffic source and destination.
+The source filter indicates the PCIE devices that will be monitored. The
+destination filter specifies the destination memory type, e.g. local system
+memory (CMEM), local GPU memory (GMEM), or remote memory. The local/remote
+classification of the destination filter is based on the home socket of the
+address, not where the data actually resides. These filters can be found in
+/sys/bus/event_source/devices/nvidia_pcie_pmu_<socket-id>_rc_<pcie-rc-id>/format/.
+
+The list of event filters:
+
+* Source filter:
+
+  * src_rp_mask: bitmask of root ports that will be monitored. Each bit in this
+    bitmask represents the RP index in the RC. If the bit is set, all devices
+    under the associated RP will be monitored. E.g. "src_rp_mask=0xF" will
+    monitor devices in root ports 0 to 3.
+  * src_bdf: the BDF that will be monitored. This is a 16-bit value that
+    follows the formula: (bus << 8) + (device << 3) + (function). For example,
+    the value of BDF 27:01.1 is 0x2781.
+  * src_bdf_en: enable the BDF filter. If this is set, the BDF filter value in
+    "src_bdf" is used to filter the traffic.
+
+Note that Root-Port and BDF filters are mutually exclusive and the PMU in
+each RC can only have one BDF filter shared by all counters.
+If the BDF filter is enabled, the BDF filter value will be applied to all
+events.
+
+* Destination filter:
+
+  * dst_loc_cmem: if set, count events to local system memory (CMEM) addresses
+  * dst_loc_gmem: if set, count events to local GPU memory (GMEM) addresses
+  * dst_loc_pcie_p2p: if set, count events to local PCIE peer addresses
+  * dst_loc_pcie_cxl: if set, count events to local CXL memory addresses
+  * dst_rem: if set, count events to remote memory addresses
+
+If the source filter is not specified, the PMU will count events from all root
+ports. If the destination filter is not specified, the PMU will count events
+to all destinations.
+
+Example usage:
+
+* Count event id 0x0 from root port 0 of PCIE RC-0 on socket 0 targeting all
+  destinations::
+
+    perf stat -a -e nvidia_pcie_pmu_0_rc_0/event=0x0,src_rp_mask=0x1/
+
+* Count event id 0x1 from root ports 0 and 1 of PCIE RC-1 on socket 0,
+  targeting just local CMEM of socket 0::
+
+    perf stat -a -e nvidia_pcie_pmu_0_rc_1/event=0x1,src_rp_mask=0x3,dst_loc_cmem=0x1/
+
+* Count event id 0x2 from root port 0 of PCIE RC-2 on socket 1 targeting all
+  destinations::
+
+    perf stat -a -e nvidia_pcie_pmu_1_rc_2/event=0x2,src_rp_mask=0x1/
+
+* Count event id 0x3 from root ports 0 and 1 of PCIE RC-3 on socket 1,
+  targeting just local CMEM of socket 1::
+
+    perf stat -a -e nvidia_pcie_pmu_1_rc_3/event=0x3,src_rp_mask=0x3,dst_loc_cmem=0x1/
+
+* Count event id 0x4 from BDF 01:01.0 of PCIE RC-4 on socket 0 targeting all
+  destinations::
+
+    perf stat -a -e nvidia_pcie_pmu_0_rc_4/event=0x4,src_bdf=0x0180,src_bdf_en=0x1/
+
+Mapping the RC# to the lspci segment number can be non-trivial; hence a new
+NVIDIA Designated Vendor Specific Capability (DVSEC) register is added into
+the PCIE config space for each RP.
+This DVSEC has vendor id "10de" and DVSEC id "0x4". The DVSEC register
+contains the following information to map PCIE devices under the RP back to
+their RC#:
+
+- Bus# (byte 0xc): bus number as reported by the lspci output
+- Segment# (byte 0xd): segment number as reported by the lspci output
+- RP# (byte 0xe): port number as reported by the LnkCap attribute from lspci
+  for a device with Root Port capability
+- RC# (byte 0xf): root complex number associated with the RP
+- Socket# (byte 0x10): socket number associated with the RP
+
+Example script for mapping lspci BDF to RC# and socket#::
+
+  #!/bin/bash
+  while read bdf rest; do
+    dvsec4_reg=$(lspci -vv -s $bdf | awk '
+      /Designated Vendor-Specific: Vendor=10de ID=0004/ {
+        match($0, /\[([0-9a-fA-F]+)/, arr);
+        print "0x" arr[1];
+        exit
+      }
+    ')
+    if [ -n "$dvsec4_reg" ]; then
+      bus=$(setpci -s $bdf $(printf '0x%x' $((${dvsec4_reg} + 0xc))).b)
+      segment=$(setpci -s $bdf $(printf '0x%x' $((${dvsec4_reg} + 0xd))).b)
+      rp=$(setpci -s $bdf $(printf '0x%x' $((${dvsec4_reg} + 0xe))).b)
+      rc=$(setpci -s $bdf $(printf '0x%x' $((${dvsec4_reg} + 0xf))).b)
+      socket=$(setpci -s $bdf $(printf '0x%x' $((${dvsec4_reg} + 0x10))).b)
+      echo "$bdf: Bus=$bus, Segment=$segment, RP=$rp, RC=$rc, Socket=$socket"
+    fi
+  done < <(lspci -d 10de:)
+
+Example output::
+
+  0001:00:00.0: Bus=00, Segment=01, RP=00, RC=00, Socket=00
+  0002:80:00.0: Bus=80, Segment=02, RP=01, RC=01, Socket=00
+  0002:a0:00.0: Bus=a0, Segment=02, RP=02, RC=01, Socket=00
+  0002:c0:00.0: Bus=c0, Segment=02, RP=03, RC=01, Socket=00
+  0002:e0:00.0: Bus=e0, Segment=02, RP=04, RC=01, Socket=00
+  0003:00:00.0: Bus=00, Segment=03, RP=00, RC=02, Socket=00
+  0004:00:00.0: Bus=00, Segment=04, RP=00, RC=03, Socket=00
+  0005:00:00.0: Bus=00, Segment=05, RP=00, RC=04, Socket=00
+  0005:40:00.0: Bus=40, Segment=05, RP=01, RC=04, Socket=00
+  0005:c0:00.0: Bus=c0, Segment=05, RP=02, RC=04, Socket=00
+  0006:00:00.0: Bus=00, Segment=06, RP=00, RC=05, Socket=00
+  0009:00:00.0: Bus=00, Segment=09, RP=00, RC=00, Socket=01
+  000a:80:00.0: Bus=80, Segment=0a, RP=01, RC=01, Socket=01
+  000a:a0:00.0: Bus=a0, Segment=0a, RP=02, RC=01, Socket=01
+  000a:e0:00.0: Bus=e0, Segment=0a, RP=03, RC=01, Socket=01
+  000b:00:00.0: Bus=00, Segment=0b, RP=00, RC=02, Socket=01
+  000c:00:00.0: Bus=00, Segment=0c, RP=00, RC=03, Socket=01
+  000d:00:00.0: Bus=00, Segment=0d, RP=00, RC=04, Socket=01
+  000d:40:00.0: Bus=40, Segment=0d, RP=01, RC=04, Socket=01
+  000d:c0:00.0: Bus=c0, Segment=0d, RP=02, RC=04, Socket=01
+  000e:00:00.0: Bus=00, Segment=0e, RP=00, RC=05, Socket=01
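As a quick sanity sketch of the src_bdf packing described in the documentation
above, (bus << 8) + (device << 3) + (function), the value can be computed in
shell. The BDF 02:03.1 here is an arbitrary, hypothetical example, not one
taken from the patch:

```shell
#!/bin/bash
# Hypothetical helper (not part of the patch): pack bus/device/function
# into a 16-bit src_bdf value as (bus << 8) | (device << 3) | function.
bus=0x02; dev=0x03; fn=0x1
src_bdf=$(( (bus << 8) | (dev << 3) | fn ))
printf 'src_bdf=0x%04x\n' "$src_bdf"   # prints src_bdf=0x0219
```

The result could then be passed directly as the src_bdf=... field of a perf
event string, with src_bdf_en=0x1 to enable the filter.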
+205 -5
drivers/perf/arm_cspmu/nvidia_cspmu.c
···
 
 #include <linux/io.h>
 #include <linux/module.h>
+#include <linux/property.h>
 #include <linux/topology.h>
 
 #include "arm_cspmu.h"
···
 #define NV_UCF_FILTER_SRC GENMASK_ULL(2, 0)
 #define NV_UCF_FILTER_DST GENMASK_ULL(11, 8)
 #define NV_UCF_FILTER_DEFAULT (NV_UCF_FILTER_SRC | NV_UCF_FILTER_DST)
+
+#define NV_PCIE_V2_PORT_COUNT 8ULL
+#define NV_PCIE_V2_FILTER_ID_MASK GENMASK_ULL(24, 0)
+#define NV_PCIE_V2_FILTER_PORT GENMASK_ULL(NV_PCIE_V2_PORT_COUNT - 1, 0)
+#define NV_PCIE_V2_FILTER_BDF_VAL GENMASK_ULL(23, NV_PCIE_V2_PORT_COUNT)
+#define NV_PCIE_V2_FILTER_BDF_EN BIT(24)
+#define NV_PCIE_V2_FILTER_BDF_VAL_EN GENMASK_ULL(24, NV_PCIE_V2_PORT_COUNT)
+#define NV_PCIE_V2_FILTER_DEFAULT NV_PCIE_V2_FILTER_PORT
+
+#define NV_PCIE_V2_DST_COUNT 5ULL
+#define NV_PCIE_V2_FILTER2_ID_MASK GENMASK_ULL(4, 0)
+#define NV_PCIE_V2_FILTER2_DST GENMASK_ULL(NV_PCIE_V2_DST_COUNT - 1, 0)
+#define NV_PCIE_V2_FILTER2_DEFAULT NV_PCIE_V2_FILTER2_DST
 
 #define NV_GENERIC_FILTER_ID_MASK GENMASK_ULL(31, 0)
···
 	NULL
 };
 
+static struct attribute *pcie_v2_pmu_event_attrs[] = {
+	ARM_CSPMU_EVENT_ATTR(rd_bytes, 0x0),
+	ARM_CSPMU_EVENT_ATTR(wr_bytes, 0x1),
+	ARM_CSPMU_EVENT_ATTR(rd_req, 0x2),
+	ARM_CSPMU_EVENT_ATTR(wr_req, 0x3),
+	ARM_CSPMU_EVENT_ATTR(rd_cum_outs, 0x4),
+	ARM_CSPMU_EVENT_ATTR(cycles, ARM_CSPMU_EVT_CYCLES_DEFAULT),
+	NULL
+};
+
 static struct attribute *generic_pmu_event_attrs[] = {
 	ARM_CSPMU_EVENT_ATTR(cycles, ARM_CSPMU_EVT_CYCLES_DEFAULT),
 	NULL,
···
 	NULL
 };
 
+static struct attribute *pcie_v2_pmu_format_attrs[] = {
+	ARM_CSPMU_FORMAT_EVENT_ATTR,
+	ARM_CSPMU_FORMAT_ATTR(src_rp_mask, "config1:0-7"),
+	ARM_CSPMU_FORMAT_ATTR(src_bdf, "config1:8-23"),
+	ARM_CSPMU_FORMAT_ATTR(src_bdf_en, "config1:24"),
+	ARM_CSPMU_FORMAT_ATTR(dst_loc_cmem, "config2:0"),
+	ARM_CSPMU_FORMAT_ATTR(dst_loc_gmem, "config2:1"),
+	ARM_CSPMU_FORMAT_ATTR(dst_loc_pcie_p2p, "config2:2"),
+	ARM_CSPMU_FORMAT_ATTR(dst_loc_pcie_cxl, "config2:3"),
+	ARM_CSPMU_FORMAT_ATTR(dst_rem, "config2:4"),
+	NULL
+};
+
 static struct attribute *generic_pmu_format_attrs[] = {
 	ARM_CSPMU_FORMAT_EVENT_ATTR,
 	ARM_CSPMU_FORMAT_FILTER_ATTR,
···
 
 	return ctx->name;
 }
+
+#if defined(CONFIG_ACPI) && defined(CONFIG_ARM64)
+static int nv_cspmu_get_inst_id(const struct arm_cspmu *cspmu, u32 *id)
+{
+	struct fwnode_handle *fwnode;
+	struct acpi_device *adev;
+	int ret;
+
+	adev = arm_cspmu_acpi_dev_get(cspmu);
+	if (!adev)
+		return -ENODEV;
+
+	fwnode = acpi_fwnode_handle(adev);
+	ret = fwnode_property_read_u32(fwnode, "instance_id", id);
+	if (ret)
+		dev_err(cspmu->dev, "Failed to get instance ID\n");
+
+	acpi_dev_put(adev);
+	return ret;
+}
+#else
+static int nv_cspmu_get_inst_id(const struct arm_cspmu *cspmu, u32 *id)
+{
+	return -EINVAL;
+}
+#endif
 
 static u32 nv_cspmu_event_filter(const struct perf_event *event)
 {
···
 	}
 }
 
+static void nv_cspmu_reset_ev_filter(struct arm_cspmu *cspmu,
+				     const struct perf_event *event)
+{
+	const struct nv_cspmu_ctx *ctx =
+		to_nv_cspmu_ctx(to_arm_cspmu(event->pmu));
+	const u32 offset = 4 * event->hw.idx;
+
+	if (ctx->get_filter)
+		writel(0, cspmu->base0 + PMEVFILTR + offset);
+
+	if (ctx->get_filter2)
+		writel(0, cspmu->base0 + PMEVFILT2R + offset);
+}
+
 static void nv_cspmu_set_cc_filter(struct arm_cspmu *cspmu,
 				   const struct perf_event *event)
 {
···
 	return ret;
 }
 
+static u32 pcie_v2_pmu_bdf_val_en(u32 filter)
+{
+	const u32 bdf_en = FIELD_GET(NV_PCIE_V2_FILTER_BDF_EN, filter);
+
+	/* Returns both BDF value and enable bit if BDF filtering is enabled. */
+	if (bdf_en)
+		return FIELD_GET(NV_PCIE_V2_FILTER_BDF_VAL_EN, filter);
+
+	/* Ignore the BDF value if BDF filter is not enabled. */
+	return 0;
+}
+
+static u32 pcie_v2_pmu_event_filter(const struct perf_event *event)
+{
+	u32 filter, lead_filter, lead_bdf;
+	struct perf_event *leader;
+	const struct nv_cspmu_ctx *ctx =
+		to_nv_cspmu_ctx(to_arm_cspmu(event->pmu));
+
+	filter = event->attr.config1 & ctx->filter_mask;
+	if (filter != 0)
+		return filter;
+
+	leader = event->group_leader;
+
+	/* Use leader's filter value if its BDF filtering is enabled. */
+	if (event != leader) {
+		lead_filter = pcie_v2_pmu_event_filter(leader);
+		lead_bdf = pcie_v2_pmu_bdf_val_en(lead_filter);
+		if (lead_bdf != 0)
+			return lead_filter;
+	}
+
+	/* Otherwise, return default filter value. */
+	return ctx->filter_default_val;
+}
+
+static int pcie_v2_pmu_validate_event(struct arm_cspmu *cspmu,
+				      struct perf_event *new_ev)
+{
+	/*
+	 * Make sure the events are using the same BDF filter since the
+	 * PCIE-SRC PMU only supports one common BDF filter setting for all
+	 * of the counters.
+	 */
+	int idx;
+	u32 new_filter, new_rp, new_bdf, new_lead_filter, new_lead_bdf;
+	struct perf_event *new_leader;
+
+	if (cspmu->impl.ops.is_cycle_counter_event(new_ev))
+		return 0;
+
+	new_leader = new_ev->group_leader;
+
+	new_filter = pcie_v2_pmu_event_filter(new_ev);
+	new_lead_filter = pcie_v2_pmu_event_filter(new_leader);
+
+	new_bdf = pcie_v2_pmu_bdf_val_en(new_filter);
+	new_lead_bdf = pcie_v2_pmu_bdf_val_en(new_lead_filter);
+
+	new_rp = FIELD_GET(NV_PCIE_V2_FILTER_PORT, new_filter);
+
+	if (new_rp != 0 && new_bdf != 0) {
+		dev_err(cspmu->dev,
+			"RP and BDF filtering are mutually exclusive\n");
+		return -EINVAL;
+	}
+
+	if (new_bdf != new_lead_bdf) {
+		dev_err(cspmu->dev,
+			"sibling and leader BDF value should be equal\n");
+		return -EINVAL;
+	}
+
+	/* Compare BDF filter on existing events. */
+	idx = find_first_bit(cspmu->hw_events.used_ctrs,
+			     cspmu->cycle_counter_logical_idx);
+
+	if (idx != cspmu->cycle_counter_logical_idx) {
+		struct perf_event *leader = cspmu->hw_events.events[idx]->group_leader;
+
+		const u32 lead_filter = pcie_v2_pmu_event_filter(leader);
+		const u32 lead_bdf = pcie_v2_pmu_bdf_val_en(lead_filter);
+
+		if (new_lead_bdf != lead_bdf) {
+			dev_err(cspmu->dev, "only one BDF value is supported\n");
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
 enum nv_cspmu_name_fmt {
 	NAME_FMT_GENERIC,
-	NAME_FMT_SOCKET
+	NAME_FMT_SOCKET,
+	NAME_FMT_SOCKET_INST,
 };
 
 struct nv_cspmu_match {
···
 	},
 },
 {
+	.prodid = 0x10301000,
+	.prodid_mask = NV_PRODID_MASK,
+	.name_pattern = "nvidia_pcie_pmu_%u_rc_%u",
+	.name_fmt = NAME_FMT_SOCKET_INST,
+	.template_ctx = {
+		.event_attr = pcie_v2_pmu_event_attrs,
+		.format_attr = pcie_v2_pmu_format_attrs,
+		.filter_mask = NV_PCIE_V2_FILTER_ID_MASK,
+		.filter_default_val = NV_PCIE_V2_FILTER_DEFAULT,
+		.filter2_mask = NV_PCIE_V2_FILTER2_ID_MASK,
+		.filter2_default_val = NV_PCIE_V2_FILTER2_DEFAULT,
+		.get_filter = pcie_v2_pmu_event_filter,
+		.get_filter2 = nv_cspmu_event_filter2,
+	},
+	.ops = {
+		.validate_event = pcie_v2_pmu_validate_event,
+		.reset_ev_filter = nv_cspmu_reset_ev_filter,
+	}
+},
+{
 	.prodid = 0,
 	.prodid_mask = 0,
 	.name_pattern = "nvidia_uncore_pmu_%u",
···
 static char *nv_cspmu_format_name(const struct arm_cspmu *cspmu,
 				  const struct nv_cspmu_match *match)
 {
-	char *name;
+	char *name = NULL;
 	struct device *dev = cspmu->dev;
 
 	static atomic_t pmu_generic_idx = {0};
···
 			socket);
 		break;
 	}
+	case NAME_FMT_SOCKET_INST: {
+		const int cpu = cpumask_first(&cspmu->associated_cpus);
+		const int socket = cpu_to_node(cpu);
+		u32 inst_id;
+
+		if (!nv_cspmu_get_inst_id(cspmu, &inst_id))
+			name = devm_kasprintf(dev, GFP_KERNEL,
+					      match->name_pattern, socket, inst_id);
+		break;
+	}
 	case NAME_FMT_GENERIC:
 		name = devm_kasprintf(dev, GFP_KERNEL, match->name_pattern,
 				      atomic_fetch_inc(&pmu_generic_idx));
-		break;
-	default:
-		name = NULL;
 		break;
 	}
 
···
 	cspmu->impl.ctx = ctx;
 
 	/* NVIDIA specific callbacks. */
+	SET_OP(validate_event, impl_ops, match, NULL);
 	SET_OP(set_cc_filter, impl_ops, match, nv_cspmu_set_cc_filter);
 	SET_OP(set_ev_filter, impl_ops, match, nv_cspmu_set_ev_filter);
+	SET_OP(reset_ev_filter, impl_ops, match, NULL);
 	SET_OP(get_event_attrs, impl_ops, match, nv_cspmu_get_event_attrs);
 	SET_OP(get_format_attrs, impl_ops, match, nv_cspmu_get_format_attrs);
 	SET_OP(get_name, impl_ops, match, nv_cspmu_get_name);
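The events this driver exposes feed the documentation's
AVG_RD_BANDWIDTH_IN_GBPS = RD_BYTES / ELAPSED_TIME_IN_NS formula. A minimal
post-processing sketch is below; the counter and interval values are made up
for illustration, not output from real hardware:

```shell
#!/bin/bash
# Hypothetical post-processing of an rd_bytes count over a 1 s window,
# per AVG_RD_BANDWIDTH_IN_GBPS = RD_BYTES / ELAPSED_TIME_IN_NS.
rd_bytes=2500000000       # example rd_bytes counter value
elapsed_ns=1000000000     # 1 second measurement window
awk -v b="$rd_bytes" -v t="$elapsed_ns" \
    'BEGIN { printf "AVG_RD_BANDWIDTH_IN_GBPS=%.2f\n", b / t }'
```

In practice, rd_bytes would come from a timed run such as
"perf stat -a -e nvidia_pcie_pmu_0_rc_0/event=0x0/ sleep 1", with the elapsed
time taken from the same perf output.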