Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'mlx5-updates-2023-02-04' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2023-02-04

This series provides misc updates to mlx5 driver:

1) Trivial LAG code cleanup patches from Roi

2) Rahul improves mlx5's documentation structure
Separates the documentation into multiple pages related to different
components in the device driver. Adds Kconfig parameters, devlink
parameters, and tracepoints that were previously introduced but not added
to the documentation. Introduces a new page on ethtool statistics counters
with information about counters previously implemented in the mlx5_core
driver but not documented in the kernel tree.

3) From Raed, policy/state selector support for IPSec.

4) From Dragos, add support for XDR speed in IPoIB mlx5 netdev

5) Few more misc cleanups and trivial changes
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

+2328 -803
+1 -1
Documentation/networking/device_drivers/ethernet/index.rst
···
39 39   intel/ice
40 40   marvell/octeontx2
41 41   marvell/octeon_ep
42    - mellanox/mlx5
   42 + mellanox/mlx5/index
43 43   microsoft/netvsc
44 44   neterion/s2io
45 45   netronome/nfp
-746
Documentation/networking/device_drivers/ethernet/mellanox/mlx5.rst
··· 1 - .. SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB 2 - 3 - ================================================= 4 - Mellanox ConnectX(R) mlx5 core VPI Network Driver 5 - ================================================= 6 - 7 - Copyright (c) 2019, Mellanox Technologies LTD. 8 - 9 - Contents 10 - ======== 11 - 12 - - `Enabling the driver and kconfig options`_ 13 - - `Devlink info`_ 14 - - `Devlink parameters`_ 15 - - `Bridge offload`_ 16 - - `mlx5 subfunction`_ 17 - - `mlx5 function attributes`_ 18 - - `Devlink health reporters`_ 19 - - `mlx5 tracepoints`_ 20 - 21 - Enabling the driver and kconfig options 22 - ======================================= 23 - 24 - | mlx5 core is modular and most of the major mlx5 core driver features can be selected (compiled in/out) 25 - | at build time via kernel Kconfig flags. 26 - | Basic features, ethernet net device rx/tx offloads and XDP, are available with the most basic flags 27 - | CONFIG_MLX5_CORE=y/m and CONFIG_MLX5_CORE_EN=y. 28 - | For the list of advanced features, please see below. 29 - 30 - **CONFIG_MLX5_CORE=(y/m/n)** (module mlx5_core.ko) 31 - 32 - | The driver can be enabled by choosing CONFIG_MLX5_CORE=y/m in kernel config. 33 - | This will provide mlx5 core driver for mlx5 ulps to interface with (mlx5e, mlx5_ib). 34 - 35 - 36 - **CONFIG_MLX5_CORE_EN=(y/n)** 37 - 38 - | Choosing this option will allow basic ethernet netdevice support with all of the standard rx/tx offloads. 39 - | mlx5e is the mlx5 ulp driver which provides netdevice kernel interface, when chosen, mlx5e will be 40 - | built-in into mlx5_core.ko. 41 - 42 - 43 - **CONFIG_MLX5_EN_ARFS=(y/n)** 44 - 45 - | Enables Hardware-accelerated receive flow steering (arfs) support, and ntuple filtering. 
46 - | https://community.mellanox.com/s/article/howto-configure-arfs-on-connectx-4 47 - 48 - 49 - **CONFIG_MLX5_EN_RXNFC=(y/n)** 50 - 51 - | Enables ethtool receive network flow classification, which allows user defined 52 - | flow rules to direct traffic into arbitrary rx queue via ethtool set/get_rxnfc API. 53 - 54 - 55 - **CONFIG_MLX5_CORE_EN_DCB=(y/n)**: 56 - 57 - | Enables `Data Center Bridging (DCB) Support <https://community.mellanox.com/s/article/howto-auto-config-pfc-and-ets-on-connectx-4-via-lldp-dcbx>`_. 58 - 59 - 60 - **CONFIG_MLX5_MPFS=(y/n)** 61 - 62 - | Ethernet Multi-Physical Function Switch (MPFS) support in ConnectX NIC. 63 - | MPFs is required for when `Multi-Host <http://www.mellanox.com/page/multihost>`_ configuration is enabled to allow passing 64 - | user configured unicast MAC addresses to the requesting PF. 65 - 66 - 67 - **CONFIG_MLX5_ESWITCH=(y/n)** 68 - 69 - | Ethernet SRIOV E-Switch support in ConnectX NIC. E-Switch provides internal SRIOV packet steering 70 - | and switching for the enabled VFs and PF in two available modes: 71 - | 1) `Legacy SRIOV mode (L2 mac vlan steering based) <https://community.mellanox.com/s/article/howto-configure-sr-iov-for-connectx-4-connectx-5-with-kvm--ethernet-x>`_. 72 - | 2) `Switchdev mode (eswitch offloads) <https://www.mellanox.com/related-docs/prod_software/ASAP2_Hardware_Offloading_for_vSwitches_User_Manual_v4.4.pdf>`_. 73 - 74 - 75 - **CONFIG_MLX5_CORE_IPOIB=(y/n)** 76 - 77 - | IPoIB offloads & acceleration support. 78 - | Requires CONFIG_MLX5_CORE_EN to provide an accelerated interface for the rdma 79 - | IPoIB ulp netdevice. 80 - 81 - 82 - **CONFIG_MLX5_FPGA=(y/n)** 83 - 84 - | Build support for the Innova family of network cards by Mellanox Technologies. 85 - | Innova network cards are comprised of a ConnectX chip and an FPGA chip on one board. 86 - | If you select this option, the mlx5_core driver will include the Innova FPGA core and allow 87 - | building sandbox-specific client drivers. 
88 - 89 - 90 - **CONFIG_MLX5_EN_IPSEC=(y/n)** 91 - 92 - | Enables `IPSec XFRM cryptography-offload acceleration <http://www.mellanox.com/related-docs/prod_software/Mellanox_Innova_IPsec_Ethernet_Adapter_Card_User_Manual.pdf>`_. 93 - 94 - **CONFIG_MLX5_EN_TLS=(y/n)** 95 - 96 - | TLS cryptography-offload acceleration. 97 - 98 - 99 - **CONFIG_MLX5_INFINIBAND=(y/n/m)** (module mlx5_ib.ko) 100 - 101 - | Provides low-level InfiniBand/RDMA and `RoCE <https://community.mellanox.com/s/article/recommended-network-configuration-examples-for-roce-deployment>`_ support. 102 - 103 - **CONFIG_MLX5_SF=(y/n)** 104 - 105 - | Build support for subfunction. 106 - | Subfunctons are more light weight than PCI SRIOV VFs. Choosing this option 107 - | will enable support for creating subfunction devices. 108 - 109 - **External options** ( Choose if the corresponding mlx5 feature is required ) 110 - 111 - - CONFIG_PTP_1588_CLOCK: When chosen, mlx5 ptp support will be enabled 112 - - CONFIG_VXLAN: When chosen, mlx5 vxlan support will be enabled. 113 - - CONFIG_MLXFW: When chosen, mlx5 firmware flashing support will be enabled (via devlink and ethtool). 114 - 115 - Devlink info 116 - ============ 117 - 118 - The devlink info reports the running and stored firmware versions on device. 119 - It also prints the device PSID which represents the HCA board type ID. 120 - 121 - User command example:: 122 - 123 - $ devlink dev info pci/0000:00:06.0 124 - pci/0000:00:06.0: 125 - driver mlx5_core 126 - versions: 127 - fixed: 128 - fw.psid MT_0000000009 129 - running: 130 - fw.version 16.26.0100 131 - stored: 132 - fw.version 16.26.0100 133 - 134 - Devlink parameters 135 - ================== 136 - 137 - flow_steering_mode: Device flow steering mode 138 - --------------------------------------------- 139 - The flow steering mode parameter controls the flow steering mode of the driver. 140 - Two modes are supported: 141 - 1. 'dmfs' - Device managed flow steering. 142 - 2. 
'smfs' - Software/Driver managed flow steering. 143 - 144 - In DMFS mode, the HW steering entities are created and managed through the 145 - Firmware. 146 - In SMFS mode, the HW steering entities are created and managed though by 147 - the driver directly into hardware without firmware intervention. 148 - 149 - SMFS mode is faster and provides better rule insertion rate compared to default DMFS mode. 150 - 151 - User command examples: 152 - 153 - - Set SMFS flow steering mode:: 154 - 155 - $ devlink dev param set pci/0000:06:00.0 name flow_steering_mode value "smfs" cmode runtime 156 - 157 - - Read device flow steering mode:: 158 - 159 - $ devlink dev param show pci/0000:06:00.0 name flow_steering_mode 160 - pci/0000:06:00.0: 161 - name flow_steering_mode type driver-specific 162 - values: 163 - cmode runtime value smfs 164 - 165 - enable_roce: RoCE enablement state 166 - ---------------------------------- 167 - RoCE enablement state controls driver support for RoCE traffic. 168 - When RoCE is disabled, there is no gid table, only raw ethernet QPs are supported and traffic on the well-known UDP RoCE port is handled as raw ethernet traffic. 169 - 170 - To change RoCE enablement state, a user must change the driverinit cmode value and run devlink reload. 171 - 172 - User command examples: 173 - 174 - - Disable RoCE:: 175 - 176 - $ devlink dev param set pci/0000:06:00.0 name enable_roce value false cmode driverinit 177 - $ devlink dev reload pci/0000:06:00.0 178 - 179 - - Read RoCE enablement state:: 180 - 181 - $ devlink dev param show pci/0000:06:00.0 name enable_roce 182 - pci/0000:06:00.0: 183 - name enable_roce type generic 184 - values: 185 - cmode driverinit value true 186 - 187 - esw_port_metadata: Eswitch port metadata state 188 - ---------------------------------------------- 189 - When applicable, disabling eswitch metadata can increase packet rate 190 - up to 20% depending on the use case and packet sizes. 
191 - 192 - Eswitch port metadata state controls whether to internally tag packets with 193 - metadata. Metadata tagging must be enabled for multi-port RoCE, failover 194 - between representors and stacked devices. 195 - By default metadata is enabled on the supported devices in E-switch. 196 - Metadata is applicable only for E-switch in switchdev mode and 197 - users may disable it when NONE of the below use cases will be in use: 198 - 1. HCA is in Dual/multi-port RoCE mode. 199 - 2. VF/SF representor bonding (Usually used for Live migration) 200 - 3. Stacked devices 201 - 202 - When metadata is disabled, the above use cases will fail to initialize if 203 - users try to enable them. 204 - 205 - - Show eswitch port metadata:: 206 - 207 - $ devlink dev param show pci/0000:06:00.0 name esw_port_metadata 208 - pci/0000:06:00.0: 209 - name esw_port_metadata type driver-specific 210 - values: 211 - cmode runtime value true 212 - 213 - - Disable eswitch port metadata:: 214 - 215 - $ devlink dev param set pci/0000:06:00.0 name esw_port_metadata value false cmode runtime 216 - 217 - - Change eswitch mode to switchdev mode where after choosing the metadata value:: 218 - 219 - $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev 220 - 221 - Bridge offload 222 - ============== 223 - The mlx5 driver implements support for offloading bridge rules when in switchdev 224 - mode. Linux bridge FDBs are automatically offloaded when mlx5 switchdev 225 - representor is attached to bridge. 
226 - 227 - - Change device to switchdev mode:: 228 - 229 - $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev 230 - 231 - - Attach mlx5 switchdev representor 'enp8s0f0' to bridge netdev 'bridge1':: 232 - 233 - $ ip link set enp8s0f0 master bridge1 234 - 235 - VLANs 236 - ----- 237 - Following bridge VLAN functions are supported by mlx5: 238 - 239 - - VLAN filtering (including multiple VLANs per port):: 240 - 241 - $ ip link set bridge1 type bridge vlan_filtering 1 242 - $ bridge vlan add dev enp8s0f0 vid 2-3 243 - 244 - - VLAN push on bridge ingress:: 245 - 246 - $ bridge vlan add dev enp8s0f0 vid 3 pvid 247 - 248 - - VLAN pop on bridge egress:: 249 - 250 - $ bridge vlan add dev enp8s0f0 vid 3 untagged 251 - 252 - mlx5 subfunction 253 - ================ 254 - mlx5 supports subfunction management using devlink port (see :ref:`Documentation/networking/devlink/devlink-port.rst <devlink_port>`) interface. 255 - 256 - A subfunction has its own function capabilities and its own resources. This 257 - means a subfunction has its own dedicated queues (txq, rxq, cq, eq). These 258 - queues are neither shared nor stolen from the parent PCI function. 259 - 260 - When a subfunction is RDMA capable, it has its own QP1, GID table, and RDMA 261 - resources neither shared nor stolen from the parent PCI function. 262 - 263 - A subfunction has a dedicated window in PCI BAR space that is not shared 264 - with the other subfunctions or the parent PCI function. This ensures that all 265 - devices (netdev, rdma, vdpa, etc.) of the subfunction accesses only assigned 266 - PCI BAR space. 267 - 268 - A subfunction supports eswitch representation through which it supports tc 269 - offloads. The user configures eswitch to send/receive packets from/to 270 - the subfunction port. 271 - 272 - Subfunctions share PCI level resources such as PCI MSI-X IRQs with 273 - other subfunctions and/or with its parent PCI function. 
274 - 275 - Example mlx5 software, system, and device view:: 276 - 277 - _______ 278 - | admin | 279 - | user |---------- 280 - |_______| | 281 - | | 282 - ____|____ __|______ _________________ 283 - | | | | | | 284 - | devlink | | tc tool | | user | 285 - | tool | |_________| | applications | 286 - |_________| | |_________________| 287 - | | | | 288 - | | | | Userspace 289 - +---------|-------------|-------------------|----------|--------------------+ 290 - | | +----------+ +----------+ Kernel 291 - | | | netdev | | rdma dev | 292 - | | +----------+ +----------+ 293 - (devlink port add/del | ^ ^ 294 - port function set) | | | 295 - | | +---------------| 296 - _____|___ | | _______|_______ 297 - | | | | | mlx5 class | 298 - | devlink | +------------+ | | drivers | 299 - | kernel | | rep netdev | | |(mlx5_core,ib) | 300 - |_________| +------------+ | |_______________| 301 - | | | ^ 302 - (devlink ops) | | (probe/remove) 303 - _________|________ | | ____|________ 304 - | subfunction | | +---------------+ | subfunction | 305 - | management driver|----- | subfunction |---| driver | 306 - | (mlx5_core) | | auxiliary dev | | (mlx5_core) | 307 - |__________________| +---------------+ |_____________| 308 - | ^ 309 - (sf add/del, vhca events) | 310 - | (device add/del) 311 - _____|____ ____|________ 312 - | | | subfunction | 313 - | PCI NIC |--- activate/deactivate events--->| host driver | 314 - |__________| | (mlx5_core) | 315 - |_____________| 316 - 317 - Subfunction is created using devlink port interface. 
318 - 319 - - Change device to switchdev mode:: 320 - 321 - $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev 322 - 323 - - Add a devlink port of subfunction flavour:: 324 - 325 - $ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88 326 - pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false 327 - function: 328 - hw_addr 00:00:00:00:00:00 state inactive opstate detached 329 - 330 - - Show a devlink port of the subfunction:: 331 - 332 - $ devlink port show pci/0000:06:00.0/32768 333 - pci/0000:06:00.0/32768: type eth netdev enp6s0pf0sf88 flavour pcisf pfnum 0 sfnum 88 334 - function: 335 - hw_addr 00:00:00:00:00:00 state inactive opstate detached 336 - 337 - - Delete a devlink port of subfunction after use:: 338 - 339 - $ devlink port del pci/0000:06:00.0/32768 340 - 341 - mlx5 function attributes 342 - ======================== 343 - The mlx5 driver provides a mechanism to setup PCI VF/SF function attributes in 344 - a unified way for SmartNIC and non-SmartNIC. 345 - 346 - This is supported only when the eswitch mode is set to switchdev. Port function 347 - configuration of the PCI VF/SF is supported through devlink eswitch port. 348 - 349 - Port function attributes should be set before PCI VF/SF is enumerated by the 350 - driver. 351 - 352 - MAC address setup 353 - ----------------- 354 - mlx5 driver support devlink port function attr mechanism to setup MAC 355 - address. (refer to Documentation/networking/devlink/devlink-port.rst) 356 - 357 - RoCE capability setup 358 - --------------------- 359 - Not all mlx5 PCI devices/SFs require RoCE capability. 360 - 361 - When RoCE capability is disabled, it saves 1 Mbytes worth of system memory per 362 - PCI devices/SF. 363 - 364 - mlx5 driver support devlink port function attr mechanism to setup RoCE 365 - capability. 
(refer to Documentation/networking/devlink/devlink-port.rst) 366 - 367 - migratable capability setup 368 - --------------------------- 369 - User who wants mlx5 PCI VFs to be able to perform live migration need to 370 - explicitly enable the VF migratable capability. 371 - 372 - mlx5 driver support devlink port function attr mechanism to setup migratable 373 - capability. (refer to Documentation/networking/devlink/devlink-port.rst) 374 - 375 - SF state setup 376 - -------------- 377 - To use the SF, the user must activate the SF using the SF function state 378 - attribute. 379 - 380 - - Get the state of the SF identified by its unique devlink port index:: 381 - 382 - $ devlink port show ens2f0npf0sf88 383 - pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false 384 - function: 385 - hw_addr 00:00:00:00:88:88 state inactive opstate detached 386 - 387 - - Activate the function and verify its state is active:: 388 - 389 - $ devlink port function set ens2f0npf0sf88 state active 390 - 391 - $ devlink port show ens2f0npf0sf88 392 - pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false 393 - function: 394 - hw_addr 00:00:00:00:88:88 state active opstate detached 395 - 396 - Upon function activation, the PF driver instance gets the event from the device 397 - that a particular SF was activated. It's the cue to put the device on bus, probe 398 - it and instantiate the devlink instance and class specific auxiliary devices 399 - for it. 
400 - 401 - - Show the auxiliary device and port of the subfunction:: 402 - 403 - $ devlink dev show 404 - devlink dev show auxiliary/mlx5_core.sf.4 405 - 406 - $ devlink port show auxiliary/mlx5_core.sf.4/1 407 - auxiliary/mlx5_core.sf.4/1: type eth netdev p0sf88 flavour virtual port 0 splittable false 408 - 409 - $ rdma link show mlx5_0/1 410 - link mlx5_0/1 state ACTIVE physical_state LINK_UP netdev p0sf88 411 - 412 - $ rdma dev show 413 - 8: rocep6s0f1: node_type ca fw 16.29.0550 node_guid 248a:0703:00b3:d113 sys_image_guid 248a:0703:00b3:d112 414 - 13: mlx5_0: node_type ca fw 16.29.0550 node_guid 0000:00ff:fe00:8888 sys_image_guid 248a:0703:00b3:d112 415 - 416 - - Subfunction auxiliary device and class device hierarchy:: 417 - 418 - mlx5_core.sf.4 419 - (subfunction auxiliary device) 420 - /\ 421 - / \ 422 - / \ 423 - / \ 424 - / \ 425 - mlx5_core.eth.4 mlx5_core.rdma.4 426 - (sf eth aux dev) (sf rdma aux dev) 427 - | | 428 - | | 429 - p0sf88 mlx5_0 430 - (sf netdev) (sf rdma device) 431 - 432 - Additionally, the SF port also gets the event when the driver attaches to the 433 - auxiliary device of the subfunction. This results in changing the operational 434 - state of the function. This provides visibility to the user to decide when is it 435 - safe to delete the SF port for graceful termination of the subfunction. 436 - 437 - - Show the SF port operational state:: 438 - 439 - $ devlink port show ens2f0npf0sf88 440 - pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false 441 - function: 442 - hw_addr 00:00:00:00:88:88 state active opstate attached 443 - 444 - Devlink health reporters 445 - ======================== 446 - 447 - tx reporter 448 - ----------- 449 - The tx reporter is responsible for reporting and recovering of the following two error scenarios: 450 - 451 - - tx timeout 452 - Report on kernel tx timeout detection. 453 - Recover by searching lost interrupts. 
454 - - tx error completion 455 - Report on error tx completion. 456 - Recover by flushing the tx queue and reset it. 457 - 458 - tx reporter also support on demand diagnose callback, on which it provides 459 - real time information of its send queues status. 460 - 461 - User commands examples: 462 - 463 - - Diagnose send queues status:: 464 - 465 - $ devlink health diagnose pci/0000:82:00.0 reporter tx 466 - 467 - NOTE: This command has valid output only when interface is up, otherwise the command has empty output. 468 - 469 - - Show number of tx errors indicated, number of recover flows ended successfully, 470 - is autorecover enabled and graceful period from last recover:: 471 - 472 - $ devlink health show pci/0000:82:00.0 reporter tx 473 - 474 - rx reporter 475 - ----------- 476 - The rx reporter is responsible for reporting and recovering of the following two error scenarios: 477 - 478 - - rx queues' initialization (population) timeout 479 - Population of rx queues' descriptors on ring initialization is done 480 - in napi context via triggering an irq. In case of a failure to get 481 - the minimum amount of descriptors, a timeout would occur, and 482 - descriptors could be recovered by polling the EQ (Event Queue). 483 - - rx completions with errors (reported by HW on interrupt context) 484 - Report on rx completion error. 485 - Recover (if needed) by flushing the related queue and reset it. 486 - 487 - rx reporter also supports on demand diagnose callback, on which it 488 - provides real time information of its receive queues' status. 489 - 490 - - Diagnose rx queues' status and corresponding completion queue:: 491 - 492 - $ devlink health diagnose pci/0000:82:00.0 reporter rx 493 - 494 - NOTE: This command has valid output only when interface is up. Otherwise, the command has empty output. 
495 - 496 - - Show number of rx errors indicated, number of recover flows ended successfully, 497 - is autorecover enabled, and graceful period from last recover:: 498 - 499 - $ devlink health show pci/0000:82:00.0 reporter rx 500 - 501 - fw reporter 502 - ----------- 503 - The fw reporter implements `diagnose` and `dump` callbacks. 504 - It follows symptoms of fw error such as fw syndrome by triggering 505 - fw core dump and storing it into the dump buffer. 506 - The fw reporter diagnose command can be triggered any time by the user to check 507 - current fw status. 508 - 509 - User commands examples: 510 - 511 - - Check fw heath status:: 512 - 513 - $ devlink health diagnose pci/0000:82:00.0 reporter fw 514 - 515 - - Read FW core dump if already stored or trigger new one:: 516 - 517 - $ devlink health dump show pci/0000:82:00.0 reporter fw 518 - 519 - NOTE: This command can run only on the PF which has fw tracer ownership, 520 - running it on other PF or any VF will return "Operation not permitted". 521 - 522 - fw fatal reporter 523 - ----------------- 524 - The fw fatal reporter implements `dump` and `recover` callbacks. 525 - It follows fatal errors indications by CR-space dump and recover flow. 526 - The CR-space dump uses vsc interface which is valid even if the FW command 527 - interface is not functional, which is the case in most FW fatal errors. 528 - The recover function runs recover flow which reloads the driver and triggers fw 529 - reset if needed. 530 - On firmware error, the health buffer is dumped into the dmesg. The log 531 - level is derived from the error's severity (given in health buffer). 532 - 533 - User commands examples: 534 - 535 - - Run fw recover flow manually:: 536 - 537 - $ devlink health recover pci/0000:82:00.0 reporter fw_fatal 538 - 539 - - Read FW CR-space dump if already stored or trigger new one:: 540 - 541 - $ devlink health dump show pci/0000:82:00.1 reporter fw_fatal 542 - 543 - NOTE: This command can run only on PF. 
544 - 545 - mlx5 tracepoints 546 - ================ 547 - 548 - mlx5 driver provides internal tracepoints for tracking and debugging using 549 - kernel tracepoints interfaces (refer to Documentation/trace/ftrace.rst). 550 - 551 - For the list of support mlx5 events, check `/sys/kernel/debug/tracing/events/mlx5/`. 552 - 553 - tc and eswitch offloads tracepoints: 554 - 555 - - mlx5e_configure_flower: trace flower filter actions and cookies offloaded to mlx5:: 556 - 557 - $ echo mlx5:mlx5e_configure_flower >> /sys/kernel/debug/tracing/set_event 558 - $ cat /sys/kernel/debug/tracing/trace 559 - ... 560 - tc-6535 [019] ...1 2672.404466: mlx5e_configure_flower: cookie=0000000067874a55 actions= REDIRECT 561 - 562 - - mlx5e_delete_flower: trace flower filter actions and cookies deleted from mlx5:: 563 - 564 - $ echo mlx5:mlx5e_delete_flower >> /sys/kernel/debug/tracing/set_event 565 - $ cat /sys/kernel/debug/tracing/trace 566 - ... 567 - tc-6569 [010] .N.1 2686.379075: mlx5e_delete_flower: cookie=0000000067874a55 actions= NULL 568 - 569 - - mlx5e_stats_flower: trace flower stats request:: 570 - 571 - $ echo mlx5:mlx5e_stats_flower >> /sys/kernel/debug/tracing/set_event 572 - $ cat /sys/kernel/debug/tracing/trace 573 - ... 574 - tc-6546 [010] ...1 2679.704889: mlx5e_stats_flower: cookie=0000000060eb3d6a bytes=0 packets=0 lastused=4295560217 575 - 576 - - mlx5e_tc_update_neigh_used_value: trace tunnel rule neigh update value offloaded to mlx5:: 577 - 578 - $ echo mlx5:mlx5e_tc_update_neigh_used_value >> /sys/kernel/debug/tracing/set_event 579 - $ cat /sys/kernel/debug/tracing/trace 580 - ... 
581 - kworker/u48:4-8806 [009] ...1 55117.882428: mlx5e_tc_update_neigh_used_value: netdev: ens1f0 IPv4: 1.1.1.10 IPv6: ::ffff:1.1.1.10 neigh_used=1 582 - 583 - - mlx5e_rep_neigh_update: trace neigh update tasks scheduled due to neigh state change events:: 584 - 585 - $ echo mlx5:mlx5e_rep_neigh_update >> /sys/kernel/debug/tracing/set_event 586 - $ cat /sys/kernel/debug/tracing/trace 587 - ... 588 - kworker/u48:7-2221 [009] ...1 1475.387435: mlx5e_rep_neigh_update: netdev: ens1f0 MAC: 24:8a:07:9a:17:9a IPv4: 1.1.1.10 IPv6: ::ffff:1.1.1.10 neigh_connected=1 589 - 590 - Bridge offloads tracepoints: 591 - 592 - - mlx5_esw_bridge_fdb_entry_init: trace bridge FDB entry offloaded to mlx5:: 593 - 594 - $ echo mlx5:mlx5_esw_bridge_fdb_entry_init >> set_event 595 - $ cat /sys/kernel/debug/tracing/trace 596 - ... 597 - kworker/u20:9-2217 [003] ...1 318.582243: mlx5_esw_bridge_fdb_entry_init: net_device=enp8s0f0_0 addr=e4:fd:05:08:00:02 vid=0 flags=0 used=0 598 - 599 - - mlx5_esw_bridge_fdb_entry_cleanup: trace bridge FDB entry deleted from mlx5:: 600 - 601 - $ echo mlx5:mlx5_esw_bridge_fdb_entry_cleanup >> set_event 602 - $ cat /sys/kernel/debug/tracing/trace 603 - ... 604 - ip-2581 [005] ...1 318.629871: mlx5_esw_bridge_fdb_entry_cleanup: net_device=enp8s0f0_1 addr=e4:fd:05:08:00:03 vid=0 flags=0 used=16 605 - 606 - - mlx5_esw_bridge_fdb_entry_refresh: trace bridge FDB entry offload refreshed in 607 - mlx5:: 608 - 609 - $ echo mlx5:mlx5_esw_bridge_fdb_entry_refresh >> set_event 610 - $ cat /sys/kernel/debug/tracing/trace 611 - ... 612 - kworker/u20:8-3849 [003] ...1 466716: mlx5_esw_bridge_fdb_entry_refresh: net_device=enp8s0f0_0 addr=e4:fd:05:08:00:02 vid=3 flags=0 used=0 613 - 614 - - mlx5_esw_bridge_vlan_create: trace bridge VLAN object add on mlx5 615 - representor:: 616 - 617 - $ echo mlx5:mlx5_esw_bridge_vlan_create >> set_event 618 - $ cat /sys/kernel/debug/tracing/trace 619 - ... 
620 - ip-2560 [007] ...1 318.460258: mlx5_esw_bridge_vlan_create: vid=1 flags=6 621 - 622 - - mlx5_esw_bridge_vlan_cleanup: trace bridge VLAN object delete from mlx5 623 - representor:: 624 - 625 - $ echo mlx5:mlx5_esw_bridge_vlan_cleanup >> set_event 626 - $ cat /sys/kernel/debug/tracing/trace 627 - ... 628 - bridge-2582 [007] ...1 318.653496: mlx5_esw_bridge_vlan_cleanup: vid=2 flags=8 629 - 630 - - mlx5_esw_bridge_vport_init: trace mlx5 vport assigned with bridge upper 631 - device:: 632 - 633 - $ echo mlx5:mlx5_esw_bridge_vport_init >> set_event 634 - $ cat /sys/kernel/debug/tracing/trace 635 - ... 636 - ip-2560 [007] ...1 318.458915: mlx5_esw_bridge_vport_init: vport_num=1 637 - 638 - - mlx5_esw_bridge_vport_cleanup: trace mlx5 vport removed from bridge upper 639 - device:: 640 - 641 - $ echo mlx5:mlx5_esw_bridge_vport_cleanup >> set_event 642 - $ cat /sys/kernel/debug/tracing/trace 643 - ... 644 - ip-5387 [000] ...1 573713: mlx5_esw_bridge_vport_cleanup: vport_num=1 645 - 646 - Eswitch QoS tracepoints: 647 - 648 - - mlx5_esw_vport_qos_create: trace creation of transmit scheduler arbiter for vport:: 649 - 650 - $ echo mlx5:mlx5_esw_vport_qos_create >> /sys/kernel/debug/tracing/set_event 651 - $ cat /sys/kernel/debug/tracing/trace 652 - ... 653 - <...>-23496 [018] .... 73136.838831: mlx5_esw_vport_qos_create: (0000:82:00.0) vport=2 tsar_ix=4 bw_share=0, max_rate=0 group=000000007b576bb3 654 - 655 - - mlx5_esw_vport_qos_config: trace configuration of transmit scheduler arbiter for vport:: 656 - 657 - $ echo mlx5:mlx5_esw_vport_qos_config >> /sys/kernel/debug/tracing/set_event 658 - $ cat /sys/kernel/debug/tracing/trace 659 - ... 660 - <...>-26548 [023] .... 
75754.223823: mlx5_esw_vport_qos_config: (0000:82:00.0) vport=1 tsar_ix=3 bw_share=34, max_rate=10000 group=000000007b576bb3 661 - 662 - - mlx5_esw_vport_qos_destroy: trace deletion of transmit scheduler arbiter for vport:: 663 - 664 - $ echo mlx5:mlx5_esw_vport_qos_destroy >> /sys/kernel/debug/tracing/set_event 665 - $ cat /sys/kernel/debug/tracing/trace 666 - ... 667 - <...>-27418 [004] .... 76546.680901: mlx5_esw_vport_qos_destroy: (0000:82:00.0) vport=1 tsar_ix=3 668 - 669 - - mlx5_esw_group_qos_create: trace creation of transmit scheduler arbiter for rate group:: 670 - 671 - $ echo mlx5:mlx5_esw_group_qos_create >> /sys/kernel/debug/tracing/set_event 672 - $ cat /sys/kernel/debug/tracing/trace 673 - ... 674 - <...>-26578 [008] .... 75776.022112: mlx5_esw_group_qos_create: (0000:82:00.0) group=000000008dac63ea tsar_ix=5 675 - 676 - - mlx5_esw_group_qos_config: trace configuration of transmit scheduler arbiter for rate group:: 677 - 678 - $ echo mlx5:mlx5_esw_group_qos_config >> /sys/kernel/debug/tracing/set_event 679 - $ cat /sys/kernel/debug/tracing/trace 680 - ... 681 - <...>-27303 [020] .... 76461.455356: mlx5_esw_group_qos_config: (0000:82:00.0) group=000000008dac63ea tsar_ix=5 bw_share=100 max_rate=20000 682 - 683 - - mlx5_esw_group_qos_destroy: trace deletion of transmit scheduler arbiter for group:: 684 - 685 - $ echo mlx5:mlx5_esw_group_qos_destroy >> /sys/kernel/debug/tracing/set_event 686 - $ cat /sys/kernel/debug/tracing/trace 687 - ... 688 - <...>-27418 [006] .... 76547.187258: mlx5_esw_group_qos_destroy: (0000:82:00.0) group=000000007b576bb3 tsar_ix=1 689 - 690 - SF tracepoints: 691 - 692 - - mlx5_sf_add: trace addition of the SF port:: 693 - 694 - $ echo mlx5:mlx5_sf_add >> /sys/kernel/debug/tracing/set_event 695 - $ cat /sys/kernel/debug/tracing/trace 696 - ... 697 - devlink-9363 [031] ..... 
24610.188722: mlx5_sf_add: (0000:06:00.0) port_index=32768 controller=0 hw_id=0x8000 sfnum=88 698 - 699 - - mlx5_sf_free: trace freeing of the SF port:: 700 - 701 - $ echo mlx5:mlx5_sf_free >> /sys/kernel/debug/tracing/set_event 702 - $ cat /sys/kernel/debug/tracing/trace 703 - ... 704 - devlink-9830 [038] ..... 26300.404749: mlx5_sf_free: (0000:06:00.0) port_index=32768 controller=0 hw_id=0x8000 705 - 706 - - mlx5_sf_hwc_alloc: trace allocating of the hardware SF context:: 707 - 708 - $ echo mlx5:mlx5_sf_hwc_alloc >> /sys/kernel/debug/tracing/set_event 709 - $ cat /sys/kernel/debug/tracing/trace 710 - ... 711 - devlink-9775 [031] ..... 26296.385259: mlx5_sf_hwc_alloc: (0000:06:00.0) controller=0 hw_id=0x8000 sfnum=88 712 - 713 - - mlx5_sf_hwc_free: trace freeing of the hardware SF context:: 714 - 715 - $ echo mlx5:mlx5_sf_hwc_free >> /sys/kernel/debug/tracing/set_event 716 - $ cat /sys/kernel/debug/tracing/trace 717 - ... 718 - kworker/u128:3-9093 [046] ..... 24625.365771: mlx5_sf_hwc_free: (0000:06:00.0) hw_id=0x8000 719 - 720 - - mlx5_sf_hwc_deferred_free : trace deferred freeing of the hardware SF context:: 721 - 722 - $ echo mlx5:mlx5_sf_hwc_deferred_free >> /sys/kernel/debug/tracing/set_event 723 - $ cat /sys/kernel/debug/tracing/trace 724 - ... 725 - devlink-9519 [046] ..... 24624.400271: mlx5_sf_hwc_deferred_free: (0000:06:00.0) hw_id=0x8000 726 - 727 - - mlx5_sf_vhca_event: trace SF vhca event and state:: 728 - 729 - $ echo mlx5:mlx5_sf_vhca_event >> /sys/kernel/debug/tracing/set_event 730 - $ cat /sys/kernel/debug/tracing/trace 731 - ... 732 - kworker/u128:3-9093 [046] ..... 24625.365525: mlx5_sf_vhca_event: (0000:06:00.0) hw_id=0x8000 sfnum=88 vhca_state=1 733 - 734 - - mlx5_sf_dev_add : trace SF device add event:: 735 - 736 - $ echo mlx5:mlx5_sf_dev_add>> /sys/kernel/debug/tracing/set_event 737 - $ cat /sys/kernel/debug/tracing/trace 738 - ... 739 - kworker/u128:3-9093 [000] ..... 
24616.524495: mlx5_sf_dev_add: (0000:06:00.0) sfdev=00000000fc5d96fd aux_id=4 hw_id=0x8000 sfnum=88 740 - 741 - - mlx5_sf_dev_del : trace SF device delete event:: 742 - 743 - $ echo mlx5:mlx5_sf_dev_del >> /sys/kernel/debug/tracing/set_event 744 - $ cat /sys/kernel/debug/tracing/trace 745 - ... 746 - kworker/u128:3-9093 [044] ..... 24624.400749: mlx5_sf_dev_del: (0000:06:00.0) sfdev=00000000fc5d96fd aux_id=4 hw_id=0x8000 sfnum=88
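The tracepoint samples in the removed file all share one record shape: an event name, a PCI address in parentheses, then space-separated `key=value` fields. The following is a minimal sketch, assuming that shape holds for every mlx5 event, of splitting such a line into its parts. `parse_mlx5_trace_line` and `TRACE_RE` are hypothetical names chosen for illustration; they are not part of the kernel tree or of any tracing tool.

```python
import re

# Hypothetical helper, not part of any kernel tooling. It assumes the record
# layout seen in the samples above: "<event>: (<pci-addr>) key=value ...".
TRACE_RE = re.compile(
    r"(?P<event>mlx5_\w+):\s+\((?P<dev>[0-9a-fA-F:.]+)\)\s+(?P<fields>.*)$"
)


def parse_mlx5_trace_line(line):
    """Return the event name, PCI device and key=value fields, or None."""
    match = TRACE_RE.search(line)
    if match is None:
        return None
    fields = {}
    for token in match.group("fields").split():
        key, sep, value = token.partition("=")
        if sep:
            # Some events separate fields with a trailing comma; drop it.
            fields[key] = value.rstrip(",")
    return {
        "event": match.group("event"),
        "dev": match.group("dev"),
        "fields": fields,
    }
```

Values are kept as strings so that hexadecimal identifiers such as `hw_id=0x8000` and hashed pointers such as `sfdev=00000000fc5d96fd` pass through unchanged; a caller can convert the fields it needs.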
+1302
Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
··· 1 + .. SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB 2 + .. include:: <isonum.txt> 3 + 4 + ================ 5 + Ethtool counters 6 + ================ 7 + 8 + :Copyright: |copy| 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. 9 + 10 + Contents 11 + ======== 12 + 13 + - `Overview`_ 14 + - `Groups`_ 15 + - `Types`_ 16 + - `Descriptions`_ 17 + 18 + Overview 19 + ======== 20 + 21 + There are several counter groups based on where the counter is being counted. In 22 + addition, each group of counters may have different counter types. 23 + 24 + These counter groups are based on which component in a networking setup, 25 + illustrated below, that they describe:: 26 + 27 + ---------------------------------------- 28 + | | 29 + ---------------------------------------- ---------------------------------------- | 30 + | Hypervisor | | VM | | 31 + | | | | | 32 + | ------------------- --------------- | | ------------------- --------------- | | 33 + | | Ethernet driver | | RDMA driver | | | | Ethernet driver | | RDMA driver | | | 34 + | ------------------- --------------- | | ------------------- --------------- | | 35 + | | | | | | | | | 36 + | ------------------- | | ------------------- | | 37 + | | | | | |-- 38 + ---------------------------------------- ---------------------------------------- 39 + | | 40 + ------------- ----------------------------- 41 + | | 42 + ------ ------ ------ ------ ------ ------ ------ 43 + -----| PF |----------------------| VF |-| VF |-| VF |----- --| PF |--- --| PF |--- --| PF |--- 44 + | ------ ------ ------ ------ | | ------ | | ------ | | ------ | 45 + | | | | | | | | 46 + | | | | | | | | 47 + | | | | | | | | 48 + | eSwitch | | eSwitch | | eSwitch | | eSwitch | 49 + ---------------------------------------------------------- ----------- ----------- ----------- 50 + ------------------------------------------------------------------------------- 51 + | | 52 + | | 53 + | Uplink (no counters) | 54 + 
------------------------------------------------------------------------------- 55 + --------------------------------------------------------------- 56 + | | 57 + | | 58 + | MPFS (no counters) | 59 + --------------------------------------------------------------- 60 + | 61 + | 62 + | Port 63 + 64 + Groups 65 + ====== 66 + 67 + Ring 68 + Software counters populated by the driver stack. 69 + 70 + Netdev 71 + An aggregation of software ring counters. 72 + 73 + vPort counters 74 + Traffic counters and drops due to steering or no buffers. May indicate issues 75 + with the NIC. These counters include Ethernet traffic counters (including Raw 76 + Ethernet) and RDMA/RoCE traffic counters. 77 + 78 + Physical port counters 79 + Counters that collect statistics about the PFs and VFs. May indicate issues 80 + with the NIC, link, or network. This measuring point holds information on 81 + standardized counters like IEEE 802.3, RFC 2863, RFC 2819, RFC 3635 and 82 + additional counters like flow control, FEC and more. Physical port counters 83 + are not exposed to virtual machines. 84 + 85 + Priority Port Counters 86 + A set of the physical port counters, per priority per port. 87 + 88 + Types 89 + ===== 90 + 91 + Counters are divided into three types. 92 + 93 + Traffic Informative Counters 94 + Counters which count traffic. These counters can be used for load estimation 95 + or for general debugging. 96 + 97 + Traffic Acceleration Counters 98 + Counters which count traffic that was accelerated by the Mellanox driver or by 99 + hardware. The counters are an additional layer to the informative counter set, 100 + and the same traffic is counted in both informative and acceleration counters. 101 + 102 + .. [#accel] Traffic acceleration counter. 103 + 104 + Error Counters 105 + An increment of these counters might indicate a problem. Each of these counters 106 + has an explanation and corrective action. 107 + 108 + Statistics can be fetched via the `ip link` or `ethtool` commands.
`ethtool` 109 + provides more detailed information:: 110 + 111 + ip -s link show <if-name> 112 + ethtool -S <if-name> 113 + 114 + Descriptions 115 + ============ 116 + 117 + XSK, PTP, and QoS counters that are similar to counters defined previously will 118 + not be separately listed. For example, `ptp_tx[i]_packets` will not be 119 + explicitly documented since `tx[i]_packets` describes the behavior of both 120 + counters, except `ptp_tx[i]_packets` is only counted when precision time 121 + protocol is used. 122 + 123 + Ring / Netdev Counter 124 + ---------------------------- 125 + The following counters are available per ring or software port. 126 + 127 + These counters provide information on the amount of traffic that was accelerated 128 + by the NIC. The counters count the accelerated traffic in addition to the 129 + standard counters which count it (i.e. accelerated traffic is counted twice). 130 + 131 + The counter names in the table below refer to both ring and port counters. The 132 + notation for ring counters includes the [i] index without the brackets. The 133 + notation for port counters doesn't include the [i]. A counter name 134 + `rx[i]_packets` will be printed as `rx0_packets` for ring 0 and `rx_packets` for 135 + the software port. 136 + 137 + .. flat-table:: Ring / Software Port Counter Table 138 + :widths: 2 3 1 139 + 140 + * - Counter 141 + - Description 142 + - Type 143 + 144 + * - `rx[i]_packets` 145 + - The number of packets received on ring i. 146 + - Informative 147 + 148 + * - `rx[i]_bytes` 149 + - The number of bytes received on ring i. 150 + - Informative 151 + 152 + * - `tx[i]_packets` 153 + - The number of packets transmitted on ring i. 154 + - Informative 155 + 156 + * - `tx[i]_bytes` 157 + - The number of bytes transmitted on ring i. 158 + - Informative 159 + 160 + * - `tx[i]_recover` 161 + - The number of times the SQ was recovered.
162 + - Error 163 + 164 + * - `tx[i]_cqes` 165 + - Number of CQEs events on SQ issued on ring i. 166 + - Informative 167 + 168 + * - `tx[i]_cqe_err` 169 + - The number of error CQEs encountered on the SQ for ring i. 170 + - Error 171 + 172 + * - `tx[i]_tso_packets` 173 + - The number of TSO packets transmitted on ring i [#accel]_. 174 + - Acceleration 175 + 176 + * - `tx[i]_tso_bytes` 177 + - The number of TSO bytes transmitted on ring i [#accel]_. 178 + - Acceleration 179 + 180 + * - `tx[i]_tso_inner_packets` 181 + - The number of TSO packets which are indicated to be carry internal 182 + encapsulation transmitted on ring i [#accel]_. 183 + - Acceleration 184 + 185 + * - `tx[i]_tso_inner_bytes` 186 + - The number of TSO bytes which are indicated to be carry internal 187 + encapsulation transmitted on ring i [#accel]_. 188 + - Acceleration 189 + 190 + * - `rx[i]_gro_packets` 191 + - Number of received packets processed using hardware-accelerated GRO. The 192 + number of hardware GRO offloaded packets received on ring i. 193 + - Acceleration 194 + 195 + * - `rx[i]_gro_bytes` 196 + - Number of received bytes processed using hardware-accelerated GRO. The 197 + number of hardware GRO offloaded bytes received on ring i. 198 + - Acceleration 199 + 200 + * - `rx[i]_gro_skbs` 201 + - The number of receive SKBs constructed while performing 202 + hardware-accelerated GRO. 203 + - Informative 204 + 205 + * - `rx[i]_gro_match_packets` 206 + - Number of received packets processed using hardware-accelerated GRO that 207 + met the flow table match criteria. 208 + - Informative 209 + 210 + * - `rx[i]_gro_large_hds` 211 + - Number of receive packets using hardware-accelerated GRO that have large 212 + headers that require additional memory to be allocated. 213 + - Informative 214 + 215 + * - `rx[i]_lro_packets` 216 + - The number of LRO packets received on ring i [#accel]_. 
217 + - Acceleration 218 + 219 + * - `rx[i]_lro_bytes` 220 + - The number of LRO bytes received on ring i [#accel]_. 221 + - Acceleration 222 + 223 + * - `rx[i]_ecn_mark` 224 + - The number of received packets where the ECN mark was turned on. 225 + - Informative 226 + 227 + * - `rx_oversize_pkts_buffer` 228 + - The number of dropped received packets due to length which arrived to RQ 229 + and exceed software buffer size allocated by the device for incoming 230 + traffic. It might imply that the device MTU is larger than the software 231 + buffers size. 232 + - Error 233 + 234 + * - `rx_oversize_pkts_sw_drop` 235 + - Number of received packets dropped in software because the CQE data is 236 + larger than the MTU size. 237 + - Error 238 + 239 + * - `rx[i]_csum_unnecessary` 240 + - Packets received with a `CHECKSUM_UNNECESSARY` on ring i [#accel]_. 241 + - Acceleration 242 + 243 + * - `rx[i]_csum_unnecessary_inner` 244 + - Packets received with inner encapsulation with a `CHECKSUM_UNNECESSARY` 245 + on ring i [#accel]_. 246 + - Acceleration 247 + 248 + * - `rx[i]_csum_none` 249 + - Packets received with a `CHECKSUM_NONE` on ring i [#accel]_. 250 + - Acceleration 251 + 252 + * - `rx[i]_csum_complete` 253 + - Packets received with a `CHECKSUM_COMPLETE` on ring i [#accel]_. 254 + - Acceleration 255 + 256 + * - `rx[i]_csum_complete_tail` 257 + - Number of received packets that had checksum calculation computed, 258 + potentially needed padding, and were able to do so with 259 + `CHECKSUM_PARTIAL`. 260 + - Informative 261 + 262 + * - `rx[i]_csum_complete_tail_slow` 263 + - Number of received packets that need padding larger than eight bytes for 264 + the checksum. 265 + - Informative 266 + 267 + * - `tx[i]_csum_partial` 268 + - Packets transmitted with a `CHECKSUM_PARTIAL` on ring i [#accel]_. 269 + - Acceleration 270 + 271 + * - `tx[i]_csum_partial_inner` 272 + - Packets transmitted with inner encapsulation with a `CHECKSUM_PARTIAL` on 273 + ring i [#accel]_. 
274 + - Acceleration 275 + 276 + * - `tx[i]_csum_none` 277 + - Packets transmitted with no hardware checksum acceleration on ring i. 278 + - Informative 279 + 280 + * - `tx[i]_stopped` / `tx_queue_stopped` [#ring_global]_ 281 + - Events where SQ was full on ring i. If this counter increases, check 282 + the number of buffers allocated for transmission. 283 + - Informative 284 + 285 + * - `tx[i]_wake` / `tx_queue_wake` [#ring_global]_ 286 + - Events where SQ was full and has become not full on ring i. 287 + - Informative 288 + 289 + * - `tx[i]_dropped` / `tx_queue_dropped` [#ring_global]_ 290 + - Packets transmitted that were dropped due to DMA mapping failure on 291 + ring i. If this counter increases, check the number of buffers 292 + allocated for transmission. 293 + - Error 294 + 295 + * - `tx[i]_nop` 296 + - The number of nop WQEs (empty WQEs) inserted to the SQ (related to 297 + ring i) because the end of the cyclic buffer was reached. When nearing 298 + the end of the cyclic buffer the driver may add those empty WQEs to 299 + avoid handling a state where a WQE starts at the end of the queue and ends 300 + at the beginning of the queue. This is a normal condition. 301 + - Informative 302 + 303 + * - `tx[i]_added_vlan_packets` 304 + - The number of packets sent where vlan tag insertion was offloaded to the 305 + hardware. 306 + - Acceleration 307 + 308 + * - `rx[i]_removed_vlan_packets` 309 + - The number of packets received where vlan tag stripping was offloaded to 310 + the hardware. 311 + - Acceleration 312 + 313 + * - `rx[i]_wqe_err` 314 + - The number of wrong opcodes received on ring i. 315 + - Error 316 + 317 + * - `rx[i]_mpwqe_frag` 318 + - The number of WQEs that failed to allocate a compound page and hence 319 + fragmented MPWQEs (Multi Packet WQEs) were used on ring i. If this 320 + counter rises, it may suggest that there is not enough memory for large 321 + pages, so the driver allocated fragmented pages. This is not an abnormal 322 + condition.
323 + - Informative 324 + 325 + * - `rx[i]_mpwqe_filler_cqes` 326 + - The number of filler CQEs events that were issued on ring i. 327 + - Informative 328 + 329 + * - `rx[i]_mpwqe_filler_strides` 330 + - The number of strides consumed by filler CQEs on ring i. 331 + - Informative 332 + 333 + * - `tx[i]_mpwqe_blks` 334 + - The number of send blocks processed from Multi-Packet WQEs (mpwqe). 335 + - Informative 336 + 337 + * - `tx[i]_mpwqe_pkts` 338 + - The number of send packets processed from Multi-Packet WQEs (mpwqe). 339 + - Informative 340 + 341 + * - `rx[i]_cqe_compress_blks` 342 + - The number of receive blocks with CQE compression on ring i [#accel]_. 343 + - Acceleration 344 + 345 + * - `rx[i]_cqe_compress_pkts` 346 + - The number of receive packets with CQE compression on ring i [#accel]_. 347 + - Acceleration 348 + 349 + * - `rx[i]_cache_reuse` 350 + - The number of events of successful reuse of a page from a driver's 351 + internal page cache. 352 + - Acceleration 353 + 354 + * - `rx[i]_cache_full` 355 + - The number of events of full internal page cache where driver can't put a 356 + page back to the cache for recycling (page will be freed). 357 + - Acceleration 358 + 359 + * - `rx[i]_cache_empty` 360 + - The number of events where cache was empty - no page to give. Driver 361 + shall allocate new page. 362 + - Acceleration 363 + 364 + * - `rx[i]_cache_busy` 365 + - The number of events where cache head was busy and cannot be recycled. 366 + Driver allocated new page. 367 + - Acceleration 368 + 369 + * - `rx[i]_cache_waive` 370 + - The number of cache evacuation. This can occur due to page move to 371 + another NUMA node or page was pfmemalloc-ed and should be freed as soon 372 + as possible. 373 + - Acceleration 374 + 375 + * - `rx[i]_arfs_err` 376 + - Number of flow rules that failed to be added to the flow table. 377 + - Error 378 + 379 + * - `rx[i]_recover` 380 + - The number of times the RQ was recovered. 
381 + - Error 382 + 383 + * - `tx[i]_xmit_more` 384 + - The number of packets sent with `xmit_more` indication set on the skbuff 385 + (no doorbell). 386 + - Acceleration 387 + 388 + * - `ch[i]_poll` 389 + - The number of invocations of NAPI poll of channel i. 390 + - Informative 391 + 392 + * - `ch[i]_arm` 393 + - The number of times the NAPI poll function completed and armed the 394 + completion queues on channel i. 395 + - Informative 396 + 397 + * - `ch[i]_aff_change` 398 + - The number of times the NAPI poll function explicitly stopped execution 399 + on a CPU due to a change in affinity, on channel i. 400 + - Informative 401 + 402 + * - `ch[i]_events` 403 + - The number of hard interrupt events on the completion queues of channel i. 404 + - Informative 405 + 406 + * - `ch[i]_eq_rearm` 407 + - The number of times the EQ was recovered. 408 + - Error 409 + 410 + * - `ch[i]_force_irq` 411 + - Number of times NAPI is triggered by XSK wakeups by posting a NOP to 412 + ICOSQ. 413 + - Acceleration 414 + 415 + * - `rx[i]_congst_umr` 416 + - The number of times an outstanding UMR request is delayed due to 417 + congestion, on ring i. 418 + - Informative 419 + 420 + * - `rx_pp_alloc_fast` 421 + - Number of successful fast path allocations. 422 + - Informative 423 + 424 + * - `rx_pp_alloc_slow` 425 + - Number of slow path order-0 allocations. 426 + - Informative 427 + 428 + * - `rx_pp_alloc_slow_high_order` 429 + - Number of slow path high order allocations. 430 + - Informative 431 + 432 + * - `rx_pp_alloc_empty` 433 + - Counter is incremented when ptr ring is empty, so a slow path allocation 434 + was forced. 435 + - Informative 436 + 437 + * - `rx_pp_alloc_refill` 438 + - Counter is incremented when an allocation which triggered a refill of the 439 + cache. 440 + - Informative 441 + 442 + * - `rx_pp_alloc_waive` 443 + - Counter is incremented when pages obtained from the ptr ring that cannot 444 + be added to the cache due to a NUMA mismatch. 
445 + - Informative 446 + 447 + * - `rx_pp_recycle_cached` 448 + - Counter is incremented when recycling placed page in the page pool cache. 449 + - Informative 450 + 451 + * - `rx_pp_recycle_cache_full` 452 + - Counter is incremented when page pool cache was full. 453 + - Informative 454 + 455 + * - `rx_pp_recycle_ring` 456 + - Counter is incremented when page placed into the ptr ring. 457 + - Informative 458 + 459 + * - `rx_pp_recycle_ring_full` 460 + - Counter is incremented when page released from page pool because the ptr 461 + ring was full. 462 + - Informative 463 + 464 + * - `rx_pp_recycle_released_ref` 465 + - Counter is incremented when page released (and not recycled) because 466 + refcnt > 1. 467 + - Informative 468 + 469 + * - `rx[i]_xsk_buff_alloc_err` 470 + - The number of times allocating an skb or XSK buffer failed in the XSK RQ 471 + context. 472 + - Error 473 + 474 + * - `rx[i]_xsk_arfs_err` 475 + - aRFS (accelerated Receive Flow Steering) does not occur in the XSK RQ 476 + context, so this counter should never increment. 477 + - Error 478 + 479 + * - `rx[i]_xdp_tx_xmit` 480 + - The number of packets forwarded back to the port due to XDP program 481 + `XDP_TX` action (bouncing). these packets are not counted by other 482 + software counters. These packets are counted by physical port and vPort 483 + counters. 484 + - Informative 485 + 486 + * - `rx[i]_xdp_tx_mpwqe` 487 + - Number of multi-packet WQEs transmitted by the netdev and `XDP_TX`-ed by 488 + the netdev during the RQ context. 489 + - Acceleration 490 + 491 + * - `rx[i]_xdp_tx_inlnw` 492 + - Number of WQE data segments transmitted where the data could be inlined 493 + in the WQE and then `XDP_TX`-ed during the RQ context. 494 + - Acceleration 495 + 496 + * - `rx[i]_xdp_tx_nops` 497 + - Number of NOP WQEBBs (WQE building blocks) received posted to the XDP SQ. 
498 + - Acceleration 499 + 500 + * - `rx[i]_xdp_tx_full` 501 + - The number of packets that should have been forwarded back to the port 502 + due to `XDP_TX` action but were dropped due to a full tx queue. These packets 503 + are not counted by other software counters. These packets are counted by 504 + physical port and vPort counters. You may open more rx queues and spread 505 + rx traffic over all queues and/or increase rx ring size. 506 + - Error 507 + 508 + * - `rx[i]_xdp_tx_err` 509 + - The number of times an `XDP_TX` error such as frame too long or frame 510 + too short occurred on the `XDP_TX` ring of the RX ring. 511 + - Error 512 + 513 + * - `rx[i]_xdp_tx_cqes` / `rx_xdp_tx_cqe` [#ring_global]_ 514 + - The number of completions received on the CQ of the `XDP_TX` ring. 515 + - Informative 516 + 517 + * - `rx[i]_xdp_drop` 518 + - The number of packets dropped due to XDP program `XDP_DROP` action. These 519 + packets are not counted by other software counters. These packets are 520 + counted by physical port and vPort counters. 521 + - Informative 522 + 523 + * - `rx[i]_xdp_redirect` 524 + - The number of times an XDP redirect action was triggered on ring i. 525 + - Acceleration 526 + 527 + * - `tx[i]_xdp_xmit` 528 + - The number of packets redirected to the interface (due to XDP redirect). 529 + These packets are not counted by other software counters. These packets 530 + are counted by physical port and vPort counters. 531 + - Informative 532 + 533 + * - `tx[i]_xdp_full` 534 + - The number of packets redirected to the interface (due to XDP redirect), 535 + but were dropped due to a full tx queue. These packets are not counted by 536 + other software counters. You may enlarge tx queues. 537 + - Informative 538 + 539 + * - `tx[i]_xdp_mpwqe` 540 + - Number of multi-packet WQEs offloaded onto the NIC that were 541 + `XDP_REDIRECT`-ed from other netdevs.
542 + - Acceleration 543 + 544 + * - `tx[i]_xdp_inlnw` 545 + - Number of WQE data segments where the data could be inlined in the WQE 546 + where the data segments were `XDP_REDIRECT`-ed from other netdevs. 547 + - Acceleration 548 + 549 + * - `tx[i]_xdp_nops` 550 + - Number of NOP WQEBBs (WQE building blocks) posted to the SQ that were 551 + `XDP_REDIRECT`-ed from other netdevs. 552 + - Acceleration 553 + 554 + * - `tx[i]_xdp_err` 555 + - The number of packets redirected to the interface(due to XDP redirect) 556 + but were dropped due to error such as frame too long and frame too short. 557 + - Error 558 + 559 + * - `tx[i]_xdp_cqes` 560 + - The number of completions received for packets redirected to the 561 + interface(due to XDP redirect) on the CQ. 562 + - Informative 563 + 564 + * - `tx[i]_xsk_xmit` 565 + - The number of packets transmitted using XSK zerocopy functionality. 566 + - Acceleration 567 + 568 + * - `tx[i]_xsk_mpwqe` 569 + - Number of multi-packet WQEs offloaded onto the NIC that were 570 + `XDP_REDIRECT`-ed from other netdevs. 571 + - Acceleration 572 + 573 + * - `tx[i]_xsk_inlnw` 574 + - Number of WQE data segments where the data could be inlined in the WQE 575 + that are transmitted using XSK zerocopy. 576 + - Acceleration 577 + 578 + * - `tx[i]_xsk_full` 579 + - Number of times doorbell is rung in XSK zerocopy mode when SQ is full. 580 + - Error 581 + 582 + * - `tx[i]_xsk_err` 583 + - Number of errors that occurred in XSK zerocopy mode such as if the data 584 + size is larger than the MTU size. 585 + - Error 586 + 587 + * - `tx[i]_xsk_cqes` 588 + - Number of CQEs processed in XSK zerocopy mode. 589 + - Acceleration 590 + 591 + * - `tx_tls_ctx` 592 + - Number of TLS TX HW offload contexts added to device for encryption. 593 + - Acceleration 594 + 595 + * - `tx_tls_del` 596 + - Number of TLS TX HW offload contexts removed from device (connection 597 + closed). 
598 + - Acceleration 599 + 600 + * - `tx_tls_pool_alloc` 601 + - Number of times a unit of work is successfully allocated in the TLS HW 602 + offload pool. 603 + - Acceleration 604 + 605 + * - `tx_tls_pool_free` 606 + - Number of times a unit of work is freed in the TLS HW offload pool. 607 + - Acceleration 608 + 609 + * - `rx_tls_ctx` 610 + - Number of TLS RX HW offload contexts added to device for decryption. 611 + - Acceleration 612 + 613 + * - `rx_tls_del` 614 + - Number of TLS RX HW offload contexts deleted from device (connection has 615 + finished). 616 + - Acceleration 617 + 618 + * - `rx[i]_tls_decrypted_packets` 619 + - Number of successfully decrypted RX packets which were part of a TLS 620 + stream. 621 + - Acceleration 622 + 623 + * - `rx[i]_tls_decrypted_bytes` 624 + - Number of TLS payload bytes in RX packets which were successfully 625 + decrypted. 626 + - Acceleration 627 + 628 + * - `rx[i]_tls_resync_req_pkt` 629 + - Number of received TLS packets with a resync request. 630 + - Acceleration 631 + 632 + * - `rx[i]_tls_resync_req_start` 633 + - Number of times the TLS async resync request was started. 634 + - Acceleration 635 + 636 + * - `rx[i]_tls_resync_req_end` 637 + - Number of times the TLS async resync request properly ended with 638 + providing the HW tracked tcp-seq. 639 + - Acceleration 640 + 641 + * - `rx[i]_tls_resync_req_skip` 642 + - Number of times the TLS async resync request procedure was started but 643 + not properly ended. 644 + - Error 645 + 646 + * - `rx[i]_tls_resync_res_ok` 647 + - Number of times the TLS resync response call to the driver was 648 + successfully handled. 649 + - Acceleration 650 + 651 + * - `rx[i]_tls_resync_res_retry` 652 + - Number of times the TLS resync response call to the driver was 653 + reattempted when ICOSQ is full. 654 + - Error 655 + 656 + * - `rx[i]_tls_resync_res_skip` 657 + - Number of times the TLS resync response call to the driver was terminated 658 + unsuccessfully. 
659 + - Error 660 + 661 + * - `rx[i]_tls_err` 662 + - Number of times when CQE TLS offload was problematic. 663 + - Error 664 + 665 + * - `tx[i]_tls_encrypted_packets` 666 + - The number of send packets that are TLS encrypted by the kernel. 667 + - Acceleration 668 + 669 + * - `tx[i]_tls_encrypted_bytes` 670 + - The number of send bytes that are TLS encrypted by the kernel. 671 + - Acceleration 672 + 673 + * - `tx[i]_tls_ooo` 674 + - Number of times out of order TLS SQE fragments were handled on ring i. 675 + - Acceleration 676 + 677 + * - `tx[i]_tls_dump_packets` 678 + - Number of TLS decrypted packets copied over from NIC over DMA. 679 + - Acceleration 680 + 681 + * - `tx[i]_tls_dump_bytes` 682 + - Number of TLS decrypted bytes copied over from NIC over DMA. 683 + - Acceleration 684 + 685 + * - `tx[i]_tls_resync_bytes` 686 + - Number of TLS bytes requested to be resynchronized in order to be 687 + decrypted. 688 + - Acceleration 689 + 690 + * - `tx[i]_tls_skip_no_sync_data` 691 + - Number of TLS send data that can safely be skipped / do not need to be 692 + decrypted. 693 + - Acceleration 694 + 695 + * - `tx[i]_tls_drop_no_sync_data` 696 + - Number of TLS send data that were dropped due to retransmission of TLS 697 + data. 698 + - Acceleration 699 + 700 + * - `ptp_cq[i]_abort` 701 + - Number of times a CQE has to be skipped in precision time protocol due to 702 + a skew between the port timestamp and CQE timestamp being greater than 703 + 128 seconds. 704 + - Error 705 + 706 + * - `ptp_cq[i]_abort_abs_diff_ns` 707 + - Accumulation of time differences between the port timestamp and CQE 708 + timestamp when the difference is greater than 128 seconds in precision 709 + time protocol. 710 + - Error 711 + 712 + .. [#ring_global] The corresponding ring and global counters do not share the 713 + same name (i.e. do not follow the common naming scheme). 714 + 715 + vPort Counters 716 + -------------- 717 + Counters on the NIC port that is connected to a eSwitch. 
718 + 719 + .. flat-table:: vPort Counter Table 720 + :widths: 2 3 1 721 + 722 + * - Counter 723 + - Description 724 + - Type 725 + 726 + * - `rx_vport_unicast_packets` 727 + - Unicast packets received, steered to a port including Raw Ethernet 728 + QP/DPDK traffic, excluding RDMA traffic. 729 + - Informative 730 + 731 + * - `rx_vport_unicast_bytes` 732 + - Unicast bytes received, steered to a port including Raw Ethernet QP/DPDK 733 + traffic, excluding RDMA traffic. 734 + - Informative 735 + 736 + * - `tx_vport_unicast_packets` 737 + - Unicast packets transmitted, steered from a port including Raw Ethernet 738 + QP/DPDK traffic, excluding RDMA traffic. 739 + - Informative 740 + 741 + * - `tx_vport_unicast_bytes` 742 + - Unicast bytes transmitted, steered from a port including Raw Ethernet 743 + QP/DPDK traffic, excluding RDMA traffic. 744 + - Informative 745 + 746 + * - `rx_vport_multicast_packets` 747 + - Multicast packets received, steered to a port including Raw Ethernet 748 + QP/DPDK traffic, excluding RDMA traffic. 749 + - Informative 750 + 751 + * - `rx_vport_multicast_bytes` 752 + - Multicast bytes received, steered to a port including Raw Ethernet 753 + QP/DPDK traffic, excluding RDMA traffic. 754 + - Informative 755 + 756 + * - `tx_vport_multicast_packets` 757 + - Multicast packets transmitted, steered from a port including Raw Ethernet 758 + QP/DPDK traffic, excluding RDMA traffic. 759 + - Informative 760 + 761 + * - `tx_vport_multicast_bytes` 762 + - Multicast bytes transmitted, steered from a port including Raw Ethernet 763 + QP/DPDK traffic, excluding RDMA traffic. 764 + - Informative 765 + 766 + * - `rx_vport_broadcast_packets` 767 + - Broadcast packets received, steered to a port including Raw Ethernet 768 + QP/DPDK traffic, excluding RDMA traffic. 769 + - Informative 770 + 771 + * - `rx_vport_broadcast_bytes` 772 + - Broadcast bytes received, steered to a port including Raw Ethernet 773 + QP/DPDK traffic, excluding RDMA traffic. 
774 + - Informative 775 + 776 + * - `tx_vport_broadcast_packets` 777 + - Broadcast packets transmitted, steered from a port including Raw Ethernet 778 + QP/DPDK traffic, excluding RDMA traffic. 779 + - Informative 780 + 781 + * - `tx_vport_broadcast_bytes` 782 + - Broadcast bytes transmitted, steered from a port including Raw Ethernet 783 + QP/DPDK traffic, excluding RDMA traffic. 784 + - Informative 785 + 786 + * - `rx_vport_rdma_unicast_packets` 787 + - RDMA unicast packets received, steered to a port (counters counts 788 + RoCE/UD/RC traffic) [#accel]_. 789 + - Acceleration 790 + 791 + * - `rx_vport_rdma_unicast_bytes` 792 + - RDMA unicast bytes received, steered to a port (counters counts 793 + RoCE/UD/RC traffic) [#accel]_. 794 + - Acceleration 795 + 796 + * - `tx_vport_rdma_unicast_packets` 797 + - RDMA unicast packets transmitted, steered from a port (counters counts 798 + RoCE/UD/RC traffic) [#accel]_. 799 + - Acceleration 800 + 801 + * - `tx_vport_rdma_unicast_bytes` 802 + - RDMA unicast bytes transmitted, steered from a port (counters counts 803 + RoCE/UD/RC traffic) [#accel]_. 804 + - Acceleration 805 + 806 + * - `rx_vport_rdma_multicast_packets` 807 + - RDMA multicast packets received, steered to a port (counters counts 808 + RoCE/UD/RC traffic) [#accel]_. 809 + - Acceleration 810 + 811 + * - `rx_vport_rdma_multicast_bytes` 812 + - RDMA multicast bytes received, steered to a port (counters counts 813 + RoCE/UD/RC traffic) [#accel]_. 814 + - Acceleration 815 + 816 + * - `tx_vport_rdma_multicast_packets` 817 + - RDMA multicast packets transmitted, steered from a port (counters counts 818 + RoCE/UD/RC traffic) [#accel]_. 819 + - Acceleration 820 + 821 + * - `tx_vport_rdma_multicast_bytes` 822 + - RDMA multicast bytes transmitted, steered from a port (counters counts 823 + RoCE/UD/RC traffic) [#accel]_. 
824 + - Acceleration 825 + 826 + * - `rx_steer_missed_packets` 827 + - Number of packets that was received by the NIC, however was discarded 828 + because it did not match any flow in the NIC flow table. 829 + - Error 830 + 831 + * - `rx_packets` 832 + - Representor only: packets received, that were handled by the hypervisor. 833 + - Informative 834 + 835 + * - `rx_bytes` 836 + - Representor only: bytes received, that were handled by the hypervisor. 837 + - Informative 838 + 839 + * - `tx_packets` 840 + - Representor only: packets transmitted, that were handled by the 841 + hypervisor. 842 + - Informative 843 + 844 + * - `tx_bytes` 845 + - Representor only: bytes transmitted, that were handled by the hypervisor. 846 + - Informative 847 + 848 + * - `dev_internal_queue_oob` 849 + - The number of dropped packets due to lack of receive WQEs for an internal 850 + device RQ. 851 + - Error 852 + 853 + Physical Port Counters 854 + ---------------------- 855 + The physical port counters are the counters on the external port connecting the 856 + adapter to the network. This measuring point holds information on standardized 857 + counters like IEEE 802.3, RFC2863, RFC 2819, RFC 3635 and additional counters 858 + like flow control, FEC and more. 859 + 860 + .. flat-table:: Physical Port Counter Table 861 + :widths: 2 3 1 862 + 863 + * - Counter 864 + - Description 865 + - Type 866 + 867 + * - `rx_packets_phy` 868 + - The number of packets received on the physical port. This counter doesn’t 869 + include packets that were discarded due to FCS, frame size and similar 870 + errors. 871 + - Informative 872 + 873 + * - `tx_packets_phy` 874 + - The number of packets transmitted on the physical port. 875 + - Informative 876 + 877 + * - `rx_bytes_phy` 878 + - The number of bytes received on the physical port, including Ethernet 879 + header and FCS. 880 + - Informative 881 + 882 + * - `tx_bytes_phy` 883 + - The number of bytes transmitted on the physical port. 
884 + - Informative 885 + 886 + * - `rx_multicast_phy` 887 + - The number of multicast packets received on the physical port. 888 + - Informative 889 + 890 + * - `tx_multicast_phy` 891 + - The number of multicast packets transmitted on the physical port. 892 + - Informative 893 + 894 + * - `rx_broadcast_phy` 895 + - The number of broadcast packets received on the physical port. 896 + - Informative 897 + 898 + * - `tx_broadcast_phy` 899 + - The number of broadcast packets transmitted on the physical port. 900 + - Informative 901 + 902 + * - `rx_crc_errors_phy` 903 + - The number of dropped received packets due to FCS (Frame Check Sequence) 904 + error on the physical port. If this counter increases at a high rate, 905 + check the link quality using the `rx_symbol_err_phy` and 906 + `rx_corrected_bits_phy` counters below. 907 + - Error 908 + 909 + * - `rx_in_range_len_errors_phy` 910 + - The number of received packets dropped due to length/type errors on a 911 + physical port. 912 + - Error 913 + 914 + * - `rx_out_of_range_len_phy` 915 + - The number of received packets dropped due to length greater than allowed 916 + on a physical port. If this counter is increasing, it implies that the 917 + peer connected to the adapter has a larger MTU configured. Using the same MTU 918 + configuration should resolve this issue. 919 + - Error 920 + 921 + * - `rx_oversize_pkts_phy` 922 + - The number of received packets dropped because their length exceeds the MTU 923 + size on a physical port. If this counter is increasing, it implies that 924 + the peer connected to the adapter has a larger MTU configured. Using the same 925 + MTU configuration should resolve this issue. 926 + - Error 927 + 928 + * - `rx_symbol_err_phy` 929 + - The number of received packets dropped due to physical coding errors 930 + (symbol errors) on a physical port. 931 + - Error 932 + 933 + * - `rx_mac_control_phy` 934 + - The number of MAC control packets received on the physical port.
935 + - Informative 936 + 937 + * - `tx_mac_control_phy` 938 + - The number of MAC control packets transmitted on the physical port. 939 + - Informative 940 + 941 + * - `rx_pause_ctrl_phy` 942 + - The number of link layer pause packets received on a physical port. If 943 + this counter is increasing, it implies that the network is congested and 944 + cannot absorb the traffic coming from the adapter. 945 + - Informative 946 + 947 + * - `tx_pause_ctrl_phy` 948 + - The number of link layer pause packets transmitted on a physical port. If 949 + this counter is increasing, it implies that the NIC is congested and 950 + cannot absorb the traffic coming from the network. 951 + - Informative 952 + 953 + * - `rx_unsupported_op_phy` 954 + - The number of MAC control packets received with unsupported opcode on a 955 + physical port. 956 + - Error 957 + 958 + * - `rx_discards_phy` 959 + - The number of received packets dropped due to lack of buffers on a 960 + physical port. If this counter is increasing, it implies that the adapter 961 + is congested and cannot absorb the traffic coming from the network. 962 + - Error 963 + 964 + * - `tx_discards_phy` 965 + - The number of packets which were discarded on transmission, even though no 966 + errors were detected. The drop might occur due to link in down state, 967 + head of line drop, pause from the network, etc. 968 + - Error 969 + 970 + * - `tx_errors_phy` 971 + - The number of transmitted packets dropped due to a length which exceeds the 972 + MTU size on a physical port. 973 + - Error 974 + 975 + * - `rx_undersize_pkts_phy` 976 + - The number of received packets dropped due to length which is shorter 977 + than 64 bytes on a physical port. If this counter is increasing, it 978 + implies that the peer connected to the adapter has a non-standard MTU 979 + configured or a malformed packet has arrived.
980 + - Error 981 + 982 + * - `rx_fragments_phy` 983 + - The number of received packets dropped due to a length which is shorter 984 + than 64 bytes and has FCS error on a physical port. If this counter is 985 + increasing, it implies that the peer connected to the adapter has a 986 + non-standard MTU configured. 987 + - Error 988 + 989 + * - `rx_jabbers_phy` 990 + - The number of received packets dropped due to a length which is longer than 64 991 + bytes and had FCS error on a physical port. 992 + - Error 993 + 994 + * - `rx_64_bytes_phy` 995 + - The number of packets received on the physical port with size of 64 bytes. 996 + - Informative 997 + 998 + * - `rx_65_to_127_bytes_phy` 999 + - The number of packets received on the physical port with size of 65 to 1000 + 127 bytes. 1001 + - Informative 1002 + 1003 + * - `rx_128_to_255_bytes_phy` 1004 + - The number of packets received on the physical port with size of 128 to 1005 + 255 bytes. 1006 + - Informative 1007 + 1008 + * - `rx_256_to_511_bytes_phy` 1009 + - The number of packets received on the physical port with size of 256 to 1010 + 511 bytes. 1011 + - Informative 1012 + 1013 + * - `rx_512_to_1023_bytes_phy` 1014 + - The number of packets received on the physical port with size of 512 to 1015 + 1023 bytes. 1016 + - Informative 1017 + 1018 + * - `rx_1024_to_1518_bytes_phy` 1019 + - The number of packets received on the physical port with size of 1024 to 1020 + 1518 bytes. 1021 + - Informative 1022 + 1023 + * - `rx_1519_to_2047_bytes_phy` 1024 + - The number of packets received on the physical port with size of 1519 to 1025 + 2047 bytes. 1026 + - Informative 1027 + 1028 + * - `rx_2048_to_4095_bytes_phy` 1029 + - The number of packets received on the physical port with size of 2048 to 1030 + 4095 bytes. 1031 + - Informative 1032 + 1033 + * - `rx_4096_to_8191_bytes_phy` 1034 + - The number of packets received on the physical port with size of 4096 to 1035 + 8191 bytes.
1036 + - Informative 1037 + 1038 + * - `rx_8192_to_10239_bytes_phy` 1039 + - The number of packets received on the physical port with size of 8192 to 1040 + 10239 bytes. 1041 + - Informative 1042 + 1043 + * - `link_down_events_phy` 1044 + - The number of times the link operative state changed to down. If 1045 + this counter is increasing, it may indicate port flapping. You may 1046 + need to replace the cable/transceiver. 1047 + - Error 1048 + 1049 + * - `rx_out_of_buffer` 1050 + - Number of times receive queue had no software buffers allocated for the 1051 + adapter's incoming traffic. 1052 + - Error 1053 + 1054 + * - `module_bus_stuck` 1055 + - The number of times that module's I\ :sup:`2`\C bus (data or clock) 1056 + short-wire was detected. You may need to replace the cable/transceiver. 1057 + - Error 1058 + 1059 + * - `module_high_temp` 1060 + - The number of times that the module temperature was too high. If this 1061 + issue persists, you may need to check the ambient temperature or replace 1062 + the cable/transceiver module. 1063 + - Error 1064 + 1065 + * - `module_bad_shorted` 1066 + - The number of times that the module cables were shorted. You may need to 1067 + replace the cable/transceiver module. 1068 + - Error 1069 + 1070 + * - `module_unplug` 1071 + - The number of times that module was ejected. 1072 + - Informative 1073 + 1074 + * - `rx_buffer_passed_thres_phy` 1075 + - The number of events where the port receive buffer was over 85% full. 1076 + - Informative 1077 + 1078 + * - `tx_pause_storm_warning_events` 1079 + - The number of times the device was sending pauses for a long period of 1080 + time. 1081 + - Informative 1082 + 1083 + * - `tx_pause_storm_error_events` 1084 + - The number of times the device was sending pauses for a long period of 1085 + time, reaching time out and disabling transmission of pause frames. During 1086 + the period where pause frames were disabled, drops could have 1087 + occurred.
1088 + - Error 1089 + 1090 + * - `rx[i]_buff_alloc_err` 1091 + - Failed to allocate a buffer to received packet (or SKB) on ring i. 1092 + - Error 1093 + 1094 + * - `rx_bits_phy` 1095 + - This counter provides information on the total amount of traffic that 1096 + could have been received and can be used as a guideline to measure the 1097 + ratio of errored traffic in `rx_pcs_symbol_err_phy` and 1098 + `rx_corrected_bits_phy`. 1099 + - Informative 1100 + 1101 + * - `rx_pcs_symbol_err_phy` 1102 + - This counter counts the number of symbol errors that weren’t corrected by 1103 + the FEC correction algorithm, or that occurred while no FEC algorithm was active on this 1104 + interface. If this counter is increasing, it implies that the link 1105 + between the NIC and the network is suffering from high BER, and that 1106 + traffic is lost. You may need to replace the cable/transceiver. The error 1107 + rate is the number of `rx_pcs_symbol_err_phy` divided by the number of 1108 + `rx_bits_phy` on a specific time frame. 1109 + - Error 1110 + 1111 + * - `rx_corrected_bits_phy` 1112 + - The number of corrected bits on this port according to active FEC 1113 + (RS/FC). If this counter is increasing, it implies that the link between 1114 + the NIC and the network is suffering from high BER. The corrected bit 1115 + rate is the number of `rx_corrected_bits_phy` divided by the number of 1116 + `rx_bits_phy` on a specific time frame. 1117 + - Error 1118 + 1119 + * - `rx_err_lane_[l]_phy` 1120 + - This counter counts the number of physical raw errors per lane l index. 1121 + The counter counts errors before FEC corrections. If this counter is 1122 + increasing, it implies that the link between the NIC and the network is 1123 + suffering from high BER, and that traffic might be lost. You may need to 1124 + replace the cable/transceiver. Please check in accordance with 1125 + `rx_corrected_bits_phy`.
1126 + - Error 1127 + 1128 + * - `rx_global_pause` 1129 + - The number of pause packets received on the physical port. If this 1130 + counter is increasing, it implies that the network is congested and 1131 + cannot absorb the traffic coming from the adapter. Note: This counter is 1132 + only enabled when global pause mode is enabled. 1133 + - Informative 1134 + 1135 + * - `rx_global_pause_duration` 1136 + - The duration of pause received (in microseconds) on the physical port. The 1137 + counter represents the time the port did not send any traffic. If this 1138 + counter is increasing, it implies that the network is congested and 1139 + cannot absorb the traffic coming from the adapter. Note: This counter is 1140 + only enabled when global pause mode is enabled. 1141 + - Informative 1142 + 1143 + * - `tx_global_pause` 1144 + - The number of pause packets transmitted on a physical port. If this 1145 + counter is increasing, it implies that the adapter is congested and 1146 + cannot absorb the traffic coming from the network. Note: This counter is 1147 + only enabled when global pause mode is enabled. 1148 + - Informative 1149 + 1150 + * - `tx_global_pause_duration` 1151 + - The duration of pause transmitted (in microseconds) on the physical port. 1152 + Note: This counter is only enabled when global pause mode is enabled. 1153 + - Informative 1154 + 1155 + * - `rx_global_pause_transition` 1156 + - The number of times a transition from Xoff to Xon on the physical port 1157 + has occurred. Note: This counter is only enabled when global pause mode 1158 + is enabled. 1159 + - Informative 1160 + 1161 + * - `rx_if_down_packets` 1162 + - The number of received packets that were dropped due to interface down. 1163 + - Informative 1164 + 1165 + Priority Port Counters 1166 + ---------------------- 1167 + The following counters are physical port counters that are counted per L2 1168 + priority (0-7). 1169 + 1170 + **Note:** `p` in the counter name represents the priority.
1171 + 1172 + .. flat-table:: Priority Port Counter Table 1173 + :widths: 2 3 1 1174 + 1175 + * - Counter 1176 + - Description 1177 + - Type 1178 + 1179 + * - `rx_prio[p]_bytes` 1180 + - The number of bytes received with priority p on the physical port. 1181 + - Informative 1182 + 1183 + * - `rx_prio[p]_packets` 1184 + - The number of packets received with priority p on the physical port. 1185 + - Informative 1186 + 1187 + * - `tx_prio[p]_bytes` 1188 + - The number of bytes transmitted on priority p on the physical port. 1189 + - Informative 1190 + 1191 + * - `tx_prio[p]_packets` 1192 + - The number of packets transmitted on priority p on the physical port. 1193 + - Informative 1194 + 1195 + * - `rx_prio[p]_pause` 1196 + - The number of pause packets received with priority p on a physical port. 1197 + If this counter is increasing, it implies that the network is congested 1198 + and cannot absorb the traffic coming from the adapter. Note: This counter 1199 + is available only if PFC was enabled on priority p. 1200 + - Informative 1201 + 1202 + * - `rx_prio[p]_pause_duration` 1203 + - The duration of pause received (in microSec) on priority p on the 1204 + physical port. The counter represents the time the port did not send any 1205 + traffic on this priority. If this counter is increasing, it implies that 1206 + the network is congested and cannot absorb the traffic coming from the 1207 + adapter. Note: This counter is available only if PFC was enabled on 1208 + priority p. 1209 + - Informative 1210 + 1211 + * - `rx_prio[p]_pause_transition` 1212 + - The number of times a transition from Xoff to Xon on priority p on the 1213 + physical port has occurred. Note: This counter is available only if PFC 1214 + was enabled on priority p. 1215 + - Informative 1216 + 1217 + * - `tx_prio[p]_pause` 1218 + - The number of pause packets transmitted on priority p on a physical port. 
1219 + If this counter is increasing, it implies that the adapter is congested 1220 + and cannot absorb the traffic coming from the network. Note: This counter 1221 + is available only if PFC was enabled on priority p. 1222 + - Informative 1223 + 1224 + * - `tx_prio[p]_pause_duration` 1225 + - The duration of pause transmitted (in microseconds) on priority p on the 1226 + physical port. Note: This counter is available only if PFC was enabled on 1227 + priority p. 1228 + - Informative 1229 + 1230 + * - `rx_prio[p]_buf_discard` 1231 + - The number of packets discarded by device due to lack of per host receive 1232 + buffers. 1233 + - Informative 1234 + 1235 + * - `rx_prio[p]_cong_discard` 1236 + - The number of packets discarded by device due to per host congestion. 1237 + - Informative 1238 + 1239 + * - `rx_prio[p]_marked` 1240 + - The number of packets ECN-marked by device due to per host congestion. 1241 + - Informative 1242 + 1243 + * - `rx_prio[p]_discards` 1244 + - The number of packets discarded by device due to lack of receive buffers. 1245 + - Informative 1246 + 1247 + Device Counters 1248 + --------------- 1249 + .. flat-table:: Device Counter Table 1250 + :widths: 2 3 1 1251 + 1252 + * - Counter 1253 + - Description 1254 + - Type 1255 + 1256 + * - `rx_pci_signal_integrity` 1257 + - Counts physical layer PCIe signal integrity errors, the number of 1258 + transitions to recovery due to Framing errors and CRC (dlp and tlp). If 1259 + this counter is rising, try moving the adapter card to a different slot 1260 + to rule out a bad PCI slot. Validate that you are running with the latest 1261 + firmware available and latest server BIOS version. 1262 + - Error 1263 + 1264 + * - `tx_pci_signal_integrity` 1265 + - Counts physical layer PCIe signal integrity errors, the number of 1266 + transitions to recovery initiated by the other side (moving to recovery 1267 + due to getting TS/EIEOS).
If this counter is rising, try moving the 1268 + adapter card to a different slot to rule out a bad PCI slot. Validate 1269 + that you are running with the latest firmware available and latest server 1270 + BIOS version. 1271 + - Error 1272 + 1273 + * - `outbound_pci_buffer_overflow` 1274 + - The number of packets dropped due to PCI buffer overflow. If this counter 1275 + is rising at a high rate, it might indicate that the receive traffic rate 1276 + for a host is larger than what the PCIe bus can carry, so congestion occurs. 1277 + - Informative 1278 + 1279 + * - `outbound_pci_stalled_rd` 1280 + - The percentage (in the range 0...100) of time within the last second that 1281 + the NIC had outbound non-posted read requests but could not perform the 1282 + operation due to insufficient posted credits. 1283 + - Informative 1284 + 1285 + * - `outbound_pci_stalled_wr` 1286 + - The percentage (in the range 0...100) of time within the last second that 1287 + the NIC had outbound posted write requests but could not perform the 1288 + operation due to insufficient posted credits. 1289 + - Informative 1290 + 1291 + * - `outbound_pci_stalled_rd_events` 1292 + - The number of seconds where `outbound_pci_stalled_rd` was above 30%. 1293 + - Informative 1294 + 1295 + * - `outbound_pci_stalled_wr_events` 1296 + - The number of seconds where `outbound_pci_stalled_wr` was above 30%. 1297 + - Informative 1298 + 1299 + * - `dev_out_of_buffer` 1300 + - The number of times the device owned queue did not have enough buffers 1301 + allocated. 1302 + - Error
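The physical port table describes the error rate as `rx_pcs_symbol_err_phy` (or `rx_corrected_bits_phy`) divided by `rx_bits_phy` over a specific time frame. A minimal sketch of that arithmetic follows, assuming two snapshots of `ethtool -S <dev>` output saved to files in the usual `name: value` layout. The function name `phy_error_ratio` and the snapshot file names are hypothetical and not part of the driver or of ethtool.

```shell
#!/bin/sh
# Hypothetical helper (not part of mlx5 or ethtool): compute error ratios
# between two saved `ethtool -S <dev>` snapshots.
#   $1: earlier snapshot file, $2: later snapshot file
phy_error_ratio() {
    awk '
        {
            name = $1
            sub(/:$/, "", name)              # "rx_bits_phy:" -> "rx_bits_phy"
            if (FNR == NR) before[name] = $2 # first file: earlier snapshot
            else           after[name]  = $2 # second file: later snapshot
        }
        END {
            bits = after["rx_bits_phy"] - before["rx_bits_phy"]
            if (bits <= 0) {
                print "rx_bits_phy did not advance between snapshots"
                exit 1
            }
            sym  = after["rx_pcs_symbol_err_phy"] - before["rx_pcs_symbol_err_phy"]
            corr = after["rx_corrected_bits_phy"] - before["rx_corrected_bits_phy"]
            printf "symbol_err_ratio=%.2e corrected_bits_ratio=%.2e\n", sym / bits, corr / bits
        }' "$1" "$2"
}
```

A possible use is to run `ethtool -S eth0 > before.txt`, wait for the measurement interval, run `ethtool -S eth0 > after.txt`, and then call `phy_error_ratio before.txt after.txt`. The interface name `eth0` is only a placeholder.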
+224
Documentation/networking/device_drivers/ethernet/mellanox/mlx5/devlink.rst
··· 1 + .. SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB 2 + .. include:: <isonum.txt> 3 + 4 + ======= 5 + Devlink 6 + ======= 7 + 8 + :Copyright: |copy| 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. 9 + 10 + Contents 11 + ======== 12 + 13 + - `Info`_ 14 + - `Parameters`_ 15 + - `Health reporters`_ 16 + 17 + Info 18 + ==== 19 + 20 + The devlink info reports the running and stored firmware versions on the device. 21 + It also prints the device PSID which represents the HCA board type ID. 22 + 23 + User command example:: 24 + 25 + $ devlink dev info pci/0000:00:06.0 26 + pci/0000:00:06.0: 27 + driver mlx5_core 28 + versions: 29 + fixed: 30 + fw.psid MT_0000000009 31 + running: 32 + fw.version 16.26.0100 33 + stored: 34 + fw.version 16.26.0100 35 + 36 + Parameters 37 + ========== 38 + 39 + flow_steering_mode: Device flow steering mode 40 + --------------------------------------------- 41 + The flow steering mode parameter controls the flow steering mode of the driver. 42 + Two modes are supported: 43 + 1. 'dmfs' - Device managed flow steering. 44 + 2. 'smfs' - Software/Driver managed flow steering. 45 + 46 + In DMFS mode, the HW steering entities are created and managed through the 47 + Firmware. 48 + In SMFS mode, the HW steering entities are created and managed by 49 + the driver directly in hardware without firmware intervention. 50 + 51 + SMFS mode is faster and provides better rule insertion rate compared to the default DMFS mode.
52 + 53 + User command examples: 54 + 55 + - Set SMFS flow steering mode:: 56 + 57 + $ devlink dev param set pci/0000:06:00.0 name flow_steering_mode value "smfs" cmode runtime 58 + 59 + - Read device flow steering mode:: 60 + 61 + $ devlink dev param show pci/0000:06:00.0 name flow_steering_mode 62 + pci/0000:06:00.0: 63 + name flow_steering_mode type driver-specific 64 + values: 65 + cmode runtime value smfs 66 + 67 + enable_roce: RoCE enablement state 68 + ---------------------------------- 69 + If the device supports RoCE disablement, RoCE enablement state controls device 70 + support for RoCE capability. Otherwise, the control occurs in the driver stack. 71 + When RoCE is disabled at the driver level, only raw ethernet QPs are supported. 72 + 73 + To change RoCE enablement state, a user must change the driverinit cmode value 74 + and run devlink reload. 75 + 76 + User command examples: 77 + 78 + - Disable RoCE:: 79 + 80 + $ devlink dev param set pci/0000:06:00.0 name enable_roce value false cmode driverinit 81 + $ devlink dev reload pci/0000:06:00.0 82 + 83 + - Read RoCE enablement state:: 84 + 85 + $ devlink dev param show pci/0000:06:00.0 name enable_roce 86 + pci/0000:06:00.0: 87 + name enable_roce type generic 88 + values: 89 + cmode driverinit value true 90 + 91 + esw_port_metadata: Eswitch port metadata state 92 + ---------------------------------------------- 93 + When applicable, disabling eswitch metadata can increase packet rate 94 + up to 20% depending on the use case and packet sizes. 95 + 96 + Eswitch port metadata state controls whether to internally tag packets with 97 + metadata. Metadata tagging must be enabled for multi-port RoCE, failover 98 + between representors and stacked devices. 99 + By default metadata is enabled on the supported devices in E-switch. 100 + Metadata is applicable only for E-switch in switchdev mode and 101 + users may disable it when NONE of the below use cases will be in use: 102 + 1. 
HCA is in Dual/multi-port RoCE mode. 103 + 2. VF/SF representor bonding (Usually used for Live migration) 104 + 3. Stacked devices 105 + 106 + When metadata is disabled, the above use cases will fail to initialize if 107 + users try to enable them. 108 + 109 + - Show eswitch port metadata:: 110 + 111 + $ devlink dev param show pci/0000:06:00.0 name esw_port_metadata 112 + pci/0000:06:00.0: 113 + name esw_port_metadata type driver-specific 114 + values: 115 + cmode runtime value true 116 + 117 + - Disable eswitch port metadata:: 118 + 119 + $ devlink dev param set pci/0000:06:00.0 name esw_port_metadata value false cmode runtime 120 + 121 + - Change eswitch mode to switchdev mode after choosing the metadata value:: 122 + 123 + $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev 124 + 125 + Health reporters 126 + ================ 127 + 128 + tx reporter 129 + ----------- 130 + The tx reporter is responsible for reporting and recovering of the following two error scenarios: 131 + 132 + - tx timeout 133 + Report on kernel tx timeout detection. 134 + Recover by searching lost interrupts. 135 + - tx error completion 136 + Report on error tx completion. 137 + Recover by flushing the tx queue and resetting it. 138 + 139 + tx reporter also supports on demand diagnose callback, on which it provides 140 + real time information of its send queues status. 141 + 142 + User command examples: 143 + 144 + - Diagnose send queues status:: 145 + 146 + $ devlink health diagnose pci/0000:82:00.0 reporter tx 147 + 148 + NOTE: This command has valid output only when interface is up, otherwise the command has empty output.
149 + 150 + - Show number of tx errors indicated, number of recover flows ended successfully, 151 + is autorecover enabled and graceful period from last recover:: 152 + 153 + $ devlink health show pci/0000:82:00.0 reporter tx 154 + 155 + rx reporter 156 + ----------- 157 + The rx reporter is responsible for reporting and recovering of the following two error scenarios: 158 + 159 + - rx queues' initialization (population) timeout 160 + Population of rx queues' descriptors on ring initialization is done 161 + in napi context via triggering an irq. In case of a failure to get 162 + the minimum amount of descriptors, a timeout would occur, and 163 + descriptors could be recovered by polling the EQ (Event Queue). 164 + - rx completions with errors (reported by HW on interrupt context) 165 + Report on rx completion error. 166 + Recover (if needed) by flushing the related queue and reset it. 167 + 168 + rx reporter also supports on demand diagnose callback, on which it 169 + provides real time information of its receive queues' status. 170 + 171 + - Diagnose rx queues' status and corresponding completion queue:: 172 + 173 + $ devlink health diagnose pci/0000:82:00.0 reporter rx 174 + 175 + NOTE: This command has valid output only when interface is up. Otherwise, the command has empty output. 176 + 177 + - Show number of rx errors indicated, number of recover flows ended successfully, 178 + is autorecover enabled, and graceful period from last recover:: 179 + 180 + $ devlink health show pci/0000:82:00.0 reporter rx 181 + 182 + fw reporter 183 + ----------- 184 + The fw reporter implements `diagnose` and `dump` callbacks. 185 + It follows symptoms of fw error such as fw syndrome by triggering 186 + fw core dump and storing it into the dump buffer. 187 + The fw reporter diagnose command can be triggered any time by the user to check 188 + current fw status. 
189 + 190 + User command examples: 191 + 192 + - Check fw health status:: 193 + 194 + $ devlink health diagnose pci/0000:82:00.0 reporter fw 195 + 196 + - Read FW core dump if already stored or trigger new one:: 197 + 198 + $ devlink health dump show pci/0000:82:00.0 reporter fw 199 + 200 + NOTE: This command can run only on the PF which has fw tracer ownership, 201 + running it on other PF or any VF will return "Operation not permitted". 202 + 203 + fw fatal reporter 204 + ----------------- 205 + The fw fatal reporter implements `dump` and `recover` callbacks. 206 + It follows fatal error indications by CR-space dump and recover flow. 207 + The CR-space dump uses vsc interface which is valid even if the FW command 208 + interface is not functional, which is the case in most FW fatal errors. 209 + The recover function runs recover flow which reloads the driver and triggers fw 210 + reset if needed. 211 + On firmware error, the health buffer is dumped into the dmesg. The log 212 + level is derived from the error's severity (given in health buffer). 213 + 214 + User command examples: 215 + 216 + - Run fw recover flow manually:: 217 + 218 + $ devlink health recover pci/0000:82:00.0 reporter fw_fatal 219 + 220 + - Read FW CR-space dump if already stored or trigger new one:: 221 + 222 + $ devlink health dump show pci/0000:82:00.1 reporter fw_fatal 223 + 224 + NOTE: This command can run only on PF.
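The `devlink dev param show` examples in this file print each value on a `cmode <mode> value <value>` line. When a script needs the value itself, for instance to check `esw_port_metadata` or `flow_steering_mode` before changing the eswitch mode, a small filter over that text can extract it. This is a sketch that assumes the plain-text layout shown in the examples above. The function name `devlink_param_value` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `devlink dev param show ...` text on stdin and
# print the value reported for the requested configuration mode.
#   $1: cmode to look for, e.g. "runtime" or "driverinit"
devlink_param_value() {
    awk -v want="$1" '
        # Matches lines of the form: cmode <mode> value <value>
        $1 == "cmode" && $2 == want && $3 == "value" { print $4 }'
}
```

A possible invocation is `devlink dev param show pci/0000:06:00.0 name flow_steering_mode | devlink_param_value runtime`. The filter prints nothing when the parameter has no value for the requested cmode, so callers should treat empty output as "not available".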
+26
Documentation/networking/device_drivers/ethernet/mellanox/mlx5/index.rst
··· 1 + .. SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB 2 + .. include:: <isonum.txt> 3 + 4 + Mellanox ConnectX(R) mlx5 core VPI Network Driver 5 + ================================================= 6 + 7 + :Copyright: |copy| 2019, Mellanox Technologies LTD. 8 + :Copyright: |copy| 2020-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. 9 + 10 + Contents: 11 + 12 + .. toctree:: 13 + :maxdepth: 2 14 + 15 + kconfig 16 + devlink 17 + switchdev 18 + tracepoints 19 + counters 20 + 21 + .. only:: subproject and html 22 + 23 + Indices 24 + ======= 25 + 26 + * :ref:`genindex`
+168
Documentation/networking/device_drivers/ethernet/mellanox/mlx5/kconfig.rst
··· 1 + .. SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB 2 + .. include:: <isonum.txt> 3 + 4 + ======================================= 5 + Enabling the driver and kconfig options 6 + ======================================= 7 + 8 + :Copyright: |copy| 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. 9 + 10 + | mlx5 core is modular and most of the major mlx5 core driver features can be selected (compiled in/out) 11 + | at build time via kernel Kconfig flags. 12 + | Basic features, ethernet net device rx/tx offloads and XDP, are available with the most basic flags 13 + | CONFIG_MLX5_CORE=y/m and CONFIG_MLX5_CORE_EN=y. 14 + | For the list of advanced features, please see below. 15 + 16 + **CONFIG_MLX5_BRIDGE=(y/n)** 17 + 18 + | Enable :ref:`Ethernet Bridging (BRIDGE) offloading support <mlx5_bridge_offload>`. 19 + | This will provide the ability to add representors of mlx5 uplink and VF 20 + | ports to Bridge and offloading rules for traffic between such ports. 21 + | Supports VLANs (trunk and access modes). 22 + 23 + 24 + **CONFIG_MLX5_CORE=(y/m/n)** (module mlx5_core.ko) 25 + 26 + | The driver can be enabled by choosing CONFIG_MLX5_CORE=y/m in kernel config. 27 + | This will provide mlx5 core driver for mlx5 ulps to interface with (mlx5e, mlx5_ib). 28 + 29 + 30 + **CONFIG_MLX5_CORE_EN=(y/n)** 31 + 32 + | Choosing this option will allow basic ethernet netdevice support with all of the standard rx/tx offloads. 33 + | mlx5e is the mlx5 ulp driver which provides netdevice kernel interface, when chosen, mlx5e will be 34 + | built-in into mlx5_core.ko. 35 + 36 + 37 + **CONFIG_MLX5_CORE_EN_DCB=(y/n)**: 38 + 39 + | Enables `Data Center Bridging (DCB) Support <https://community.mellanox.com/s/article/howto-auto-config-pfc-and-ets-on-connectx-4-via-lldp-dcbx>`_. 40 + 41 + 42 + **CONFIG_MLX5_CORE_IPOIB=(y/n)** 43 + 44 + | IPoIB offloads & acceleration support. 
45 + | Requires CONFIG_MLX5_CORE_EN to provide an accelerated interface for the rdma 46 + | IPoIB ulp netdevice. 47 + 48 + 49 + **CONFIG_MLX5_CLS_ACT=(y/n)** 50 + 51 + | Enables offload support for TC classifier action (NET_CLS_ACT). 52 + | Works in both native NIC mode and Switchdev SRIOV mode. 53 + | Flow-based classifiers, such as those registered through 54 + | `tc-flower(8)`, are processed by the device, rather than the 55 + | host. Actions that would then overwrite matching classification 56 + | results would then be instant due to the offload. 57 + 58 + 59 + **CONFIG_MLX5_EN_ARFS=(y/n)** 60 + 61 + | Enables Hardware-accelerated receive flow steering (arfs) support, and ntuple filtering. 62 + | https://community.mellanox.com/s/article/howto-configure-arfs-on-connectx-4 63 + 64 + 65 + **CONFIG_MLX5_EN_IPSEC=(y/n)** 66 + 67 + | Enables `IPSec XFRM cryptography-offload acceleration <https://support.mellanox.com/s/article/ConnectX-6DX-Bluefield-2-IPsec-HW-Full-Offload-Configuration-Guide>`_. 68 + 69 + 70 + **CONFIG_MLX5_EN_MACSEC=(y/n)** 71 + 72 + | Build support for MACsec cryptography-offload acceleration in the NIC. 73 + 74 + 75 + **CONFIG_MLX5_EN_RXNFC=(y/n)** 76 + 77 + | Enables ethtool receive network flow classification, which allows user defined 78 + | flow rules to direct traffic into arbitrary rx queue via ethtool set/get_rxnfc API. 79 + 80 + 81 + **CONFIG_MLX5_EN_TLS=(y/n)** 82 + 83 + | TLS cryptography-offload acceleration. 84 + 85 + 86 + **CONFIG_MLX5_ESWITCH=(y/n)** 87 + 88 + | Ethernet SRIOV E-Switch support in ConnectX NIC. E-Switch provides internal SRIOV packet steering 89 + | and switching for the enabled VFs and PF in two available modes: 90 + | 1) `Legacy SRIOV mode (L2 mac vlan steering based) <https://community.mellanox.com/s/article/howto-configure-sr-iov-for-connectx-4-connectx-5-with-kvm--ethernet-x>`_. 
91 + | 2) `Switchdev mode (eswitch offloads) <https://www.mellanox.com/related-docs/prod_software/ASAP2_Hardware_Offloading_for_vSwitches_User_Manual_v4.4.pdf>`_. 92 + 93 + 94 + **CONFIG_MLX5_FPGA=(y/n)** 95 + 96 + | Build support for the Innova family of network cards by Mellanox Technologies. 97 + | Innova network cards are comprised of a ConnectX chip and an FPGA chip on one board. 98 + | If you select this option, the mlx5_core driver will include the Innova FPGA core and allow 99 + | building sandbox-specific client drivers. 100 + 101 + 102 + **CONFIG_MLX5_INFINIBAND=(y/n/m)** (module mlx5_ib.ko) 103 + 104 + | Provides low-level InfiniBand/RDMA and `RoCE <https://community.mellanox.com/s/article/recommended-network-configuration-examples-for-roce-deployment>`_ support. 105 + 106 + 107 + **CONFIG_MLX5_MPFS=(y/n)** 108 + 109 + | Ethernet Multi-Physical Function Switch (MPFS) support in ConnectX NIC. 110 + | MPFS is required when `Multi-Host <http://www.mellanox.com/page/multihost>`_ configuration is enabled to allow passing 111 + | user configured unicast MAC addresses to the requesting PF. 112 + 113 + 114 + **CONFIG_MLX5_SF=(y/n)** 115 + 116 + | Build support for subfunction. 117 + | Subfunctions are more lightweight than PCI SRIOV VFs. Choosing this option 118 + | will enable support for creating subfunction devices. 119 + 120 + 121 + **CONFIG_MLX5_SF_MANAGER=(y/n)** 122 + 123 + | Build support for subfunction port in the NIC. A Mellanox subfunction 124 + | port is managed through devlink. A subfunction supports RDMA, netdevice 125 + | and vdpa device. It is similar to a SRIOV VF but it doesn't require 126 + | SRIOV support. 127 + 128 + 129 + **CONFIG_MLX5_SW_STEERING=(y/n)** 130 + 131 + | Build support for software-managed steering in the NIC. 132 + 133 + 134 + **CONFIG_MLX5_TC_CT=(y/n)** 135 + 136 + | Support offloading connection tracking rules via tc ct action.
137 + 138 + 139 + **CONFIG_MLX5_TC_SAMPLE=(y/n)** 140 + 141 + | Support offloading sample rules via tc sample action. 142 + 143 + 144 + **CONFIG_MLX5_VDPA=(y/n)** 145 + 146 + | Support library for Mellanox VDPA drivers. Provides code that is 147 + | common for all types of VDPA drivers. The following drivers are planned: 148 + | net, block. 149 + 150 + 151 + **CONFIG_MLX5_VDPA_NET=(y/n)** 152 + 153 + | VDPA network driver for ConnectX6 and newer. Provides offloading 154 + | of virtio net datapath such that descriptors put on the ring will 155 + | be executed by the hardware. It also supports a variety of stateless 156 + | offloads depending on the actual device used and firmware version. 157 + 158 + 159 + **CONFIG_MLX5_VFIO_PCI=(y/n)** 160 + 161 + | This provides migration support for MLX5 devices using the VFIO framework. 162 + 163 + 164 + **External options** ( Choose if the corresponding mlx5 feature is required ) 165 + 166 + - CONFIG_MLXFW: When chosen, mlx5 firmware flashing support will be enabled (via devlink and ethtool). 167 + - CONFIG_PTP_1588_CLOCK: When chosen, mlx5 ptp support will be enabled 168 + - CONFIG_VXLAN: When chosen, mlx5 vxlan support will be enabled.
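The kconfig page states that the basic Ethernet features need CONFIG_MLX5_CORE=y/m and CONFIG_MLX5_CORE_EN=y. A rough way to check a given kernel configuration file for those two flags is sketched below. The function name `mlx5_basic_config_ok` is hypothetical, and the location of the configuration file varies between distributions, so the path is passed in by the caller rather than assumed.

```shell
#!/bin/sh
# Hypothetical helper: verify that a kernel .config enables the two basic
# mlx5 flags named in the kconfig documentation.
#   $1: path to a kernel configuration file
mlx5_basic_config_ok() {
    cfg=$1
    # CONFIG_MLX5_CORE may be built in (y) or built as a module (m).
    grep -Eq '^CONFIG_MLX5_CORE=[ym]$' "$cfg" || {
        echo "CONFIG_MLX5_CORE is not set to y or m"
        return 1
    }
    # CONFIG_MLX5_CORE_EN is a y/n option.
    grep -q '^CONFIG_MLX5_CORE_EN=y$' "$cfg" || {
        echo "CONFIG_MLX5_CORE_EN is not set to y"
        return 1
    }
    echo "basic mlx5 Ethernet support is enabled"
}
```

On systems that install the running kernel's configuration under `/boot` (an assumption, not guaranteed), a call could look like `mlx5_basic_config_ok "/boot/config-$(uname -r)"`.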
+239
Documentation/networking/device_drivers/ethernet/mellanox/mlx5/switchdev.rst
··· 1 + .. SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB 2 + .. include:: <isonum.txt> 3 + 4 + ========= 5 + Switchdev 6 + ========= 7 + 8 + :Copyright: |copy| 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. 9 + 10 + .. _mlx5_bridge_offload: 11 + 12 + Bridge offload 13 + ============== 14 + 15 + The mlx5 driver implements support for offloading bridge rules when in switchdev 16 + mode. Linux bridge FDBs are automatically offloaded when mlx5 switchdev 17 + representor is attached to bridge. 18 + 19 + - Change device to switchdev mode:: 20 + 21 + $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev 22 + 23 + - Attach mlx5 switchdev representor 'enp8s0f0' to bridge netdev 'bridge1':: 24 + 25 + $ ip link set enp8s0f0 master bridge1 26 + 27 + VLANs 28 + ----- 29 + 30 + Following bridge VLAN functions are supported by mlx5: 31 + 32 + - VLAN filtering (including multiple VLANs per port):: 33 + 34 + $ ip link set bridge1 type bridge vlan_filtering 1 35 + $ bridge vlan add dev enp8s0f0 vid 2-3 36 + 37 + - VLAN push on bridge ingress:: 38 + 39 + $ bridge vlan add dev enp8s0f0 vid 3 pvid 40 + 41 + - VLAN pop on bridge egress:: 42 + 43 + $ bridge vlan add dev enp8s0f0 vid 3 untagged 44 + 45 + Subfunction 46 + =========== 47 + 48 + mlx5 supports subfunction management using devlink port (see :ref:`Documentation/networking/devlink/devlink-port.rst <devlink_port>`) interface. 49 + 50 + A subfunction has its own function capabilities and its own resources. This 51 + means a subfunction has its own dedicated queues (txq, rxq, cq, eq). These 52 + queues are neither shared nor stolen from the parent PCI function. 53 + 54 + When a subfunction is RDMA capable, it has its own QP1, GID table, and RDMA 55 + resources neither shared nor stolen from the parent PCI function. 56 + 57 + A subfunction has a dedicated window in PCI BAR space that is not shared 58 + with the other subfunctions or the parent PCI function. 
This ensures that all 59 + devices (netdev, rdma, vdpa, etc.) of the subfunction accesses only assigned 60 + PCI BAR space. 61 + 62 + A subfunction supports eswitch representation through which it supports tc 63 + offloads. The user configures eswitch to send/receive packets from/to 64 + the subfunction port. 65 + 66 + Subfunctions share PCI level resources such as PCI MSI-X IRQs with 67 + other subfunctions and/or with its parent PCI function. 68 + 69 + Example mlx5 software, system, and device view:: 70 + 71 + _______ 72 + | admin | 73 + | user |---------- 74 + |_______| | 75 + | | 76 + ____|____ __|______ _________________ 77 + | | | | | | 78 + | devlink | | tc tool | | user | 79 + | tool | |_________| | applications | 80 + |_________| | |_________________| 81 + | | | | 82 + | | | | Userspace 83 + +---------|-------------|-------------------|----------|--------------------+ 84 + | | +----------+ +----------+ Kernel 85 + | | | netdev | | rdma dev | 86 + | | +----------+ +----------+ 87 + (devlink port add/del | ^ ^ 88 + port function set) | | | 89 + | | +---------------| 90 + _____|___ | | _______|_______ 91 + | | | | | mlx5 class | 92 + | devlink | +------------+ | | drivers | 93 + | kernel | | rep netdev | | |(mlx5_core,ib) | 94 + |_________| +------------+ | |_______________| 95 + | | | ^ 96 + (devlink ops) | | (probe/remove) 97 + _________|________ | | ____|________ 98 + | subfunction | | +---------------+ | subfunction | 99 + | management driver|----- | subfunction |---| driver | 100 + | (mlx5_core) | | auxiliary dev | | (mlx5_core) | 101 + |__________________| +---------------+ |_____________| 102 + | ^ 103 + (sf add/del, vhca events) | 104 + | (device add/del) 105 + _____|____ ____|________ 106 + | | | subfunction | 107 + | PCI NIC |--- activate/deactivate events--->| host driver | 108 + |__________| | (mlx5_core) | 109 + |_____________| 110 + 111 + Subfunction is created using devlink port interface. 
112 + 113 + - Change device to switchdev mode:: 114 + 115 + $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev 116 + 117 + - Add a devlink port of subfunction flavour:: 118 + 119 + $ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88 120 + pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false 121 + function: 122 + hw_addr 00:00:00:00:00:00 state inactive opstate detached 123 + 124 + - Show a devlink port of the subfunction:: 125 + 126 + $ devlink port show pci/0000:06:00.0/32768 127 + pci/0000:06:00.0/32768: type eth netdev enp6s0pf0sf88 flavour pcisf pfnum 0 sfnum 88 128 + function: 129 + hw_addr 00:00:00:00:00:00 state inactive opstate detached 130 + 131 + - Delete a devlink port of subfunction after use:: 132 + 133 + $ devlink port del pci/0000:06:00.0/32768 134 + 135 + Function attributes 136 + =================== 137 + 138 + The mlx5 driver provides a mechanism to set up PCI VF/SF function attributes in 139 + a unified way for SmartNIC and non-SmartNIC. 140 + 141 + This is supported only when the eswitch mode is set to switchdev. Port function 142 + configuration of the PCI VF/SF is supported through devlink eswitch port. 143 + 144 + Port function attributes should be set before the PCI VF/SF is enumerated by the 145 + driver. 146 + 147 + MAC address setup 148 + ----------------- 149 + 150 + The mlx5 driver supports the devlink port function attr mechanism to set up the MAC 151 + address. (refer to Documentation/networking/devlink/devlink-port.rst) 152 + 153 + RoCE capability setup 154 + ~~~~~~~~~~~~~~~~~~~~~ 155 + Not all mlx5 PCI devices/SFs require RoCE capability. 156 + 157 + When RoCE capability is disabled, it saves 1 Mbyte worth of system memory per 158 + PCI device/SF. 159 + 160 + The mlx5 driver supports the devlink port function attr mechanism to set up the RoCE 161 + capability. 
(refer to Documentation/networking/devlink/devlink-port.rst) 162 + 163 + Migratable capability setup 164 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~ 165 + Users who want mlx5 PCI VFs to be able to perform live migration need to 166 + explicitly enable the VF migratable capability. 167 + 168 + The mlx5 driver supports the devlink port function attr mechanism to set up the migratable 169 + capability. (refer to Documentation/networking/devlink/devlink-port.rst) 170 + 171 + SF state setup 172 + -------------- 173 + 174 + To use the SF, the user must activate the SF using the SF function state 175 + attribute. 176 + 177 + - Get the state of the SF identified by its unique devlink port index:: 178 + 179 + $ devlink port show ens2f0npf0sf88 180 + pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false 181 + function: 182 + hw_addr 00:00:00:00:88:88 state inactive opstate detached 183 + 184 + - Activate the function and verify its state is active:: 185 + 186 + $ devlink port function set ens2f0npf0sf88 state active 187 + 188 + $ devlink port show ens2f0npf0sf88 189 + pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false 190 + function: 191 + hw_addr 00:00:00:00:88:88 state active opstate detached 192 + 193 + Upon function activation, the PF driver instance gets the event from the device 194 + that a particular SF was activated. It's the cue to put the device on the bus, probe 195 + it and instantiate the devlink instance and class specific auxiliary devices 196 + for it. 
197 + 198 + - Show the auxiliary device and port of the subfunction:: 199 + 200 + $ devlink dev show 201 + devlink dev show auxiliary/mlx5_core.sf.4 202 + 203 + $ devlink port show auxiliary/mlx5_core.sf.4/1 204 + auxiliary/mlx5_core.sf.4/1: type eth netdev p0sf88 flavour virtual port 0 splittable false 205 + 206 + $ rdma link show mlx5_0/1 207 + link mlx5_0/1 state ACTIVE physical_state LINK_UP netdev p0sf88 208 + 209 + $ rdma dev show 210 + 8: rocep6s0f1: node_type ca fw 16.29.0550 node_guid 248a:0703:00b3:d113 sys_image_guid 248a:0703:00b3:d112 211 + 13: mlx5_0: node_type ca fw 16.29.0550 node_guid 0000:00ff:fe00:8888 sys_image_guid 248a:0703:00b3:d112 212 + 213 + - Subfunction auxiliary device and class device hierarchy:: 214 + 215 + mlx5_core.sf.4 216 + (subfunction auxiliary device) 217 + /\ 218 + / \ 219 + / \ 220 + / \ 221 + / \ 222 + mlx5_core.eth.4 mlx5_core.rdma.4 223 + (sf eth aux dev) (sf rdma aux dev) 224 + | | 225 + | | 226 + p0sf88 mlx5_0 227 + (sf netdev) (sf rdma device) 228 + 229 + Additionally, the SF port also gets the event when the driver attaches to the 230 + auxiliary device of the subfunction. This results in changing the operational 231 + state of the function. This gives the user visibility to decide when it is 232 + safe to delete the SF port for graceful termination of the subfunction. 233 + 234 + - Show the SF port operational state:: 235 + 236 + $ devlink port show ens2f0npf0sf88 237 + pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false 238 + function: 239 + hw_addr 00:00:00:00:88:88 state active opstate attached
+229
Documentation/networking/device_drivers/ethernet/mellanox/mlx5/tracepoints.rst
··· 1 + .. SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB 2 + .. include:: <isonum.txt> 3 + 4 + =========== 5 + Tracepoints 6 + =========== 7 + 8 + :Copyright: |copy| 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. 9 + 10 + The mlx5 driver provides internal tracepoints for tracking and debugging using 11 + kernel tracepoints interfaces (refer to Documentation/trace/ftrace.rst). 12 + 13 + For the list of supported mlx5 events, check `/sys/kernel/debug/tracing/events/mlx5/`. 14 + 15 + tc and eswitch offloads tracepoints: 16 + 17 + - mlx5e_configure_flower: trace flower filter actions and cookies offloaded to mlx5:: 18 + 19 + $ echo mlx5:mlx5e_configure_flower >> /sys/kernel/debug/tracing/set_event 20 + $ cat /sys/kernel/debug/tracing/trace 21 + ... 22 + tc-6535 [019] ...1 2672.404466: mlx5e_configure_flower: cookie=0000000067874a55 actions= REDIRECT 23 + 24 + - mlx5e_delete_flower: trace flower filter actions and cookies deleted from mlx5:: 25 + 26 + $ echo mlx5:mlx5e_delete_flower >> /sys/kernel/debug/tracing/set_event 27 + $ cat /sys/kernel/debug/tracing/trace 28 + ... 29 + tc-6569 [010] .N.1 2686.379075: mlx5e_delete_flower: cookie=0000000067874a55 actions= NULL 30 + 31 + - mlx5e_stats_flower: trace flower stats request:: 32 + 33 + $ echo mlx5:mlx5e_stats_flower >> /sys/kernel/debug/tracing/set_event 34 + $ cat /sys/kernel/debug/tracing/trace 35 + ... 36 + tc-6546 [010] ...1 2679.704889: mlx5e_stats_flower: cookie=0000000060eb3d6a bytes=0 packets=0 lastused=4295560217 37 + 38 + - mlx5e_tc_update_neigh_used_value: trace tunnel rule neigh update value offloaded to mlx5:: 39 + 40 + $ echo mlx5:mlx5e_tc_update_neigh_used_value >> /sys/kernel/debug/tracing/set_event 41 + $ cat /sys/kernel/debug/tracing/trace 42 + ... 
43 + kworker/u48:4-8806 [009] ...1 55117.882428: mlx5e_tc_update_neigh_used_value: netdev: ens1f0 IPv4: 1.1.1.10 IPv6: ::ffff:1.1.1.10 neigh_used=1 44 + 45 + - mlx5e_rep_neigh_update: trace neigh update tasks scheduled due to neigh state change events:: 46 + 47 + $ echo mlx5:mlx5e_rep_neigh_update >> /sys/kernel/debug/tracing/set_event 48 + $ cat /sys/kernel/debug/tracing/trace 49 + ... 50 + kworker/u48:7-2221 [009] ...1 1475.387435: mlx5e_rep_neigh_update: netdev: ens1f0 MAC: 24:8a:07:9a:17:9a IPv4: 1.1.1.10 IPv6: ::ffff:1.1.1.10 neigh_connected=1 51 + 52 + Bridge offloads tracepoints: 53 + 54 + - mlx5_esw_bridge_fdb_entry_init: trace bridge FDB entry offloaded to mlx5:: 55 + 56 + $ echo mlx5:mlx5_esw_bridge_fdb_entry_init >> set_event 57 + $ cat /sys/kernel/debug/tracing/trace 58 + ... 59 + kworker/u20:9-2217 [003] ...1 318.582243: mlx5_esw_bridge_fdb_entry_init: net_device=enp8s0f0_0 addr=e4:fd:05:08:00:02 vid=0 flags=0 used=0 60 + 61 + - mlx5_esw_bridge_fdb_entry_cleanup: trace bridge FDB entry deleted from mlx5:: 62 + 63 + $ echo mlx5:mlx5_esw_bridge_fdb_entry_cleanup >> set_event 64 + $ cat /sys/kernel/debug/tracing/trace 65 + ... 66 + ip-2581 [005] ...1 318.629871: mlx5_esw_bridge_fdb_entry_cleanup: net_device=enp8s0f0_1 addr=e4:fd:05:08:00:03 vid=0 flags=0 used=16 67 + 68 + - mlx5_esw_bridge_fdb_entry_refresh: trace bridge FDB entry offload refreshed in 69 + mlx5:: 70 + 71 + $ echo mlx5:mlx5_esw_bridge_fdb_entry_refresh >> set_event 72 + $ cat /sys/kernel/debug/tracing/trace 73 + ... 74 + kworker/u20:8-3849 [003] ...1 466716: mlx5_esw_bridge_fdb_entry_refresh: net_device=enp8s0f0_0 addr=e4:fd:05:08:00:02 vid=3 flags=0 used=0 75 + 76 + - mlx5_esw_bridge_vlan_create: trace bridge VLAN object add on mlx5 77 + representor:: 78 + 79 + $ echo mlx5:mlx5_esw_bridge_vlan_create >> set_event 80 + $ cat /sys/kernel/debug/tracing/trace 81 + ... 
82 + ip-2560 [007] ...1 318.460258: mlx5_esw_bridge_vlan_create: vid=1 flags=6 83 + 84 + - mlx5_esw_bridge_vlan_cleanup: trace bridge VLAN object delete from mlx5 85 + representor:: 86 + 87 + $ echo mlx5:mlx5_esw_bridge_vlan_cleanup >> set_event 88 + $ cat /sys/kernel/debug/tracing/trace 89 + ... 90 + bridge-2582 [007] ...1 318.653496: mlx5_esw_bridge_vlan_cleanup: vid=2 flags=8 91 + 92 + - mlx5_esw_bridge_vport_init: trace mlx5 vport assigned with bridge upper 93 + device:: 94 + 95 + $ echo mlx5:mlx5_esw_bridge_vport_init >> set_event 96 + $ cat /sys/kernel/debug/tracing/trace 97 + ... 98 + ip-2560 [007] ...1 318.458915: mlx5_esw_bridge_vport_init: vport_num=1 99 + 100 + - mlx5_esw_bridge_vport_cleanup: trace mlx5 vport removed from bridge upper 101 + device:: 102 + 103 + $ echo mlx5:mlx5_esw_bridge_vport_cleanup >> set_event 104 + $ cat /sys/kernel/debug/tracing/trace 105 + ... 106 + ip-5387 [000] ...1 573713: mlx5_esw_bridge_vport_cleanup: vport_num=1 107 + 108 + Eswitch QoS tracepoints: 109 + 110 + - mlx5_esw_vport_qos_create: trace creation of transmit scheduler arbiter for vport:: 111 + 112 + $ echo mlx5:mlx5_esw_vport_qos_create >> /sys/kernel/debug/tracing/set_event 113 + $ cat /sys/kernel/debug/tracing/trace 114 + ... 115 + <...>-23496 [018] .... 73136.838831: mlx5_esw_vport_qos_create: (0000:82:00.0) vport=2 tsar_ix=4 bw_share=0, max_rate=0 group=000000007b576bb3 116 + 117 + - mlx5_esw_vport_qos_config: trace configuration of transmit scheduler arbiter for vport:: 118 + 119 + $ echo mlx5:mlx5_esw_vport_qos_config >> /sys/kernel/debug/tracing/set_event 120 + $ cat /sys/kernel/debug/tracing/trace 121 + ... 122 + <...>-26548 [023] .... 
75754.223823: mlx5_esw_vport_qos_config: (0000:82:00.0) vport=1 tsar_ix=3 bw_share=34, max_rate=10000 group=000000007b576bb3 123 + 124 + - mlx5_esw_vport_qos_destroy: trace deletion of transmit scheduler arbiter for vport:: 125 + 126 + $ echo mlx5:mlx5_esw_vport_qos_destroy >> /sys/kernel/debug/tracing/set_event 127 + $ cat /sys/kernel/debug/tracing/trace 128 + ... 129 + <...>-27418 [004] .... 76546.680901: mlx5_esw_vport_qos_destroy: (0000:82:00.0) vport=1 tsar_ix=3 130 + 131 + - mlx5_esw_group_qos_create: trace creation of transmit scheduler arbiter for rate group:: 132 + 133 + $ echo mlx5:mlx5_esw_group_qos_create >> /sys/kernel/debug/tracing/set_event 134 + $ cat /sys/kernel/debug/tracing/trace 135 + ... 136 + <...>-26578 [008] .... 75776.022112: mlx5_esw_group_qos_create: (0000:82:00.0) group=000000008dac63ea tsar_ix=5 137 + 138 + - mlx5_esw_group_qos_config: trace configuration of transmit scheduler arbiter for rate group:: 139 + 140 + $ echo mlx5:mlx5_esw_group_qos_config >> /sys/kernel/debug/tracing/set_event 141 + $ cat /sys/kernel/debug/tracing/trace 142 + ... 143 + <...>-27303 [020] .... 76461.455356: mlx5_esw_group_qos_config: (0000:82:00.0) group=000000008dac63ea tsar_ix=5 bw_share=100 max_rate=20000 144 + 145 + - mlx5_esw_group_qos_destroy: trace deletion of transmit scheduler arbiter for group:: 146 + 147 + $ echo mlx5:mlx5_esw_group_qos_destroy >> /sys/kernel/debug/tracing/set_event 148 + $ cat /sys/kernel/debug/tracing/trace 149 + ... 150 + <...>-27418 [006] .... 76547.187258: mlx5_esw_group_qos_destroy: (0000:82:00.0) group=000000007b576bb3 tsar_ix=1 151 + 152 + SF tracepoints: 153 + 154 + - mlx5_sf_add: trace addition of the SF port:: 155 + 156 + $ echo mlx5:mlx5_sf_add >> /sys/kernel/debug/tracing/set_event 157 + $ cat /sys/kernel/debug/tracing/trace 158 + ... 159 + devlink-9363 [031] ..... 
24610.188722: mlx5_sf_add: (0000:06:00.0) port_index=32768 controller=0 hw_id=0x8000 sfnum=88 160 + 161 + - mlx5_sf_free: trace freeing of the SF port:: 162 + 163 + $ echo mlx5:mlx5_sf_free >> /sys/kernel/debug/tracing/set_event 164 + $ cat /sys/kernel/debug/tracing/trace 165 + ... 166 + devlink-9830 [038] ..... 26300.404749: mlx5_sf_free: (0000:06:00.0) port_index=32768 controller=0 hw_id=0x8000 167 + 168 + - mlx5_sf_activate: trace activation of the SF port:: 169 + 170 + $ echo mlx5:mlx5_sf_activate >> /sys/kernel/debug/tracing/set_event 171 + $ cat /sys/kernel/debug/tracing/trace 172 + ... 173 + devlink-29841 [008] ..... 3669.635095: mlx5_sf_activate: (0000:08:00.0) port_index=32768 controller=0 hw_id=0x8000 174 + 175 + - mlx5_sf_deactivate: trace deactivation of the SF port:: 176 + 177 + $ echo mlx5:mlx5_sf_deactivate >> /sys/kernel/debug/tracing/set_event 178 + $ cat /sys/kernel/debug/tracing/trace 179 + ... 180 + devlink-29994 [008] ..... 4015.969467: mlx5_sf_deactivate: (0000:08:00.0) port_index=32768 controller=0 hw_id=0x8000 181 + 182 + - mlx5_sf_hwc_alloc: trace allocating of the hardware SF context:: 183 + 184 + $ echo mlx5:mlx5_sf_hwc_alloc >> /sys/kernel/debug/tracing/set_event 185 + $ cat /sys/kernel/debug/tracing/trace 186 + ... 187 + devlink-9775 [031] ..... 26296.385259: mlx5_sf_hwc_alloc: (0000:06:00.0) controller=0 hw_id=0x8000 sfnum=88 188 + 189 + - mlx5_sf_hwc_free: trace freeing of the hardware SF context:: 190 + 191 + $ echo mlx5:mlx5_sf_hwc_free >> /sys/kernel/debug/tracing/set_event 192 + $ cat /sys/kernel/debug/tracing/trace 193 + ... 194 + kworker/u128:3-9093 [046] ..... 24625.365771: mlx5_sf_hwc_free: (0000:06:00.0) hw_id=0x8000 195 + 196 + - mlx5_sf_hwc_deferred_free: trace deferred freeing of the hardware SF context:: 197 + 198 + $ echo mlx5:mlx5_sf_hwc_deferred_free >> /sys/kernel/debug/tracing/set_event 199 + $ cat /sys/kernel/debug/tracing/trace 200 + ... 201 + devlink-9519 [046] ..... 
24624.400271: mlx5_sf_hwc_deferred_free: (0000:06:00.0) hw_id=0x8000 202 + 203 + - mlx5_sf_update_state: trace state updates for SF contexts:: 204 + 205 + $ echo mlx5:mlx5_sf_update_state >> /sys/kernel/debug/tracing/set_event 206 + $ cat /sys/kernel/debug/tracing/trace 207 + ... 208 + kworker/u20:3-29490 [009] ..... 4141.453530: mlx5_sf_update_state: (0000:08:00.0) port_index=32768 controller=0 hw_id=0x8000 state=2 209 + 210 + - mlx5_sf_vhca_event: trace SF vhca event and state:: 211 + 212 + $ echo mlx5:mlx5_sf_vhca_event >> /sys/kernel/debug/tracing/set_event 213 + $ cat /sys/kernel/debug/tracing/trace 214 + ... 215 + kworker/u128:3-9093 [046] ..... 24625.365525: mlx5_sf_vhca_event: (0000:06:00.0) hw_id=0x8000 sfnum=88 vhca_state=1 216 + 217 + - mlx5_sf_dev_add: trace SF device add event:: 218 + 219 + $ echo mlx5:mlx5_sf_dev_add>> /sys/kernel/debug/tracing/set_event 220 + $ cat /sys/kernel/debug/tracing/trace 221 + ... 222 + kworker/u128:3-9093 [000] ..... 24616.524495: mlx5_sf_dev_add: (0000:06:00.0) sfdev=00000000fc5d96fd aux_id=4 hw_id=0x8000 sfnum=88 223 + 224 + - mlx5_sf_dev_del: trace SF device delete event:: 225 + 226 + $ echo mlx5:mlx5_sf_dev_del >> /sys/kernel/debug/tracing/set_event 227 + $ cat /sys/kernel/debug/tracing/trace 228 + ... 229 + kworker/u128:3-9093 [044] ..... 24624.400749: mlx5_sf_dev_del: (0000:06:00.0) sfdev=00000000fc5d96fd aux_id=4 hw_id=0x8000 sfnum=88
+1
drivers/net/ethernet/mellanox/mlx5/core/en.h
··· 454 454 struct mlx5_clock *clock; 455 455 struct net_device *netdev; 456 456 struct mlx5_core_dev *mdev; 457 + struct mlx5e_channel *channel; 457 458 struct mlx5e_priv *priv; 458 459 459 460 /* control path */
+1 -1
drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
··· 771 771 if (test_bit(MLX5E_PTP_STATE_RX, c->state)) { 772 772 mlx5e_ptp_rx_set_fs(c->priv); 773 773 mlx5e_activate_rq(&c->rq); 774 - mlx5e_trigger_napi_sched(&c->napi); 775 774 } 775 + mlx5e_trigger_napi_sched(&c->napi); 776 776 } 777 777 778 778 void mlx5e_ptp_deactivate_channel(struct mlx5e_ptp *c)
+4
drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
··· 81 81 sq->stats->recover++; 82 82 clear_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state); 83 83 mlx5e_activate_txqsq(sq); 84 + if (sq->channel) 85 + mlx5e_trigger_napi_icosq(sq->channel); 86 + else 87 + mlx5e_trigger_napi_sched(sq->cq.napi); 84 88 85 89 return 0; 86 90 out:
+3 -3
drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/mirred.c
··· 232 232 parse_state->ifindexes[if_count] = out_dev->ifindex; 233 233 parse_state->if_count++; 234 234 is_uplink_rep = mlx5e_eswitch_uplink_rep(out_dev); 235 - err = mlx5_lag_do_mirred(priv->mdev, out_dev); 236 - if (err) 237 - return err; 235 + 236 + if (mlx5_lag_mpesw_do_mirred(priv->mdev, out_dev, extack)) 237 + return -EOPNOTSUPP; 238 238 239 239 out_dev = get_fdb_out_dev(uplink_dev, out_dev); 240 240 if (!out_dev)
+23
drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
··· 158 158 attrs->family = x->props.family; 159 159 attrs->type = x->xso.type; 160 160 attrs->reqid = x->props.reqid; 161 + attrs->upspec.dport = ntohs(x->sel.dport); 162 + attrs->upspec.dport_mask = ntohs(x->sel.dport_mask); 163 + attrs->upspec.sport = ntohs(x->sel.sport); 164 + attrs->upspec.sport_mask = ntohs(x->sel.sport_mask); 165 + attrs->upspec.proto = x->sel.proto; 161 166 162 167 mlx5e_ipsec_init_limits(sa_entry, attrs); 163 168 } ··· 226 221 NL_SET_ERR_MSG_MOD(extack, "Cannot offload xfrm states with geniv other than seqiv"); 227 222 return -EINVAL; 228 223 } 224 + 225 + if (x->sel.proto != IPPROTO_IP && 226 + (x->sel.proto != IPPROTO_UDP || x->xso.dir != XFRM_DEV_OFFLOAD_OUT)) { 227 + NL_SET_ERR_MSG_MOD(extack, "Device does not support upper protocol other than UDP, and only Tx direction"); 228 + return -EINVAL; 229 + } 230 + 229 231 switch (x->xso.type) { 230 232 case XFRM_DEV_OFFLOAD_CRYPTO: 231 233 if (!(mlx5_ipsec_device_caps(mdev) & MLX5_IPSEC_CAP_CRYPTO)) { ··· 529 517 return -EINVAL; 530 518 } 531 519 520 + if (x->selector.proto != IPPROTO_IP && 521 + (x->selector.proto != IPPROTO_UDP || x->xdo.dir != XFRM_DEV_OFFLOAD_OUT)) { 522 + NL_SET_ERR_MSG_MOD(extack, "Device does not support upper protocol other than UDP, and only Tx direction"); 523 + return -EINVAL; 524 + } 525 + 532 526 return 0; 533 527 } 534 528 ··· 555 537 attrs->action = x->action; 556 538 attrs->type = XFRM_DEV_OFFLOAD_PACKET; 557 539 attrs->reqid = x->xfrm_vec[0].reqid; 540 + attrs->upspec.dport = ntohs(sel->dport); 541 + attrs->upspec.dport_mask = ntohs(sel->dport_mask); 542 + attrs->upspec.sport = ntohs(sel->sport); 543 + attrs->upspec.sport_mask = ntohs(sel->sport_mask); 544 + attrs->upspec.proto = sel->proto; 558 545 } 559 546 560 547 static int mlx5e_xfrm_add_policy(struct xfrm_policy *x,
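Review note: the selector check added in both the state and the policy validation paths reduces to one predicate. The following is a minimal userspace sketch of that predicate, not the driver code. The `sketch_*` names and the numeric direction values are hypothetical stand-ins for `IPPROTO_IP`, `IPPROTO_UDP` and `XFRM_DEV_OFFLOAD_OUT`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel constants used in the hunk. */
enum sketch_dir { SKETCH_DIR_IN = 1, SKETCH_DIR_OUT = 2 };
#define SKETCH_PROTO_ANY 0   /* plays the role of IPPROTO_IP (no upper protocol) */
#define SKETCH_PROTO_UDP 17  /* plays the role of IPPROTO_UDP */

/*
 * Mirrors the condition in the hunk, inverted to "is this offloadable?":
 * a selector with no upper protocol is always accepted; a UDP selector is
 * accepted only in the Tx (out) direction; everything else is rejected.
 */
static bool sketch_upper_proto_ok(uint8_t proto, enum sketch_dir dir)
{
	if (proto == SKETCH_PROTO_ANY)
		return true;
	return proto == SKETCH_PROTO_UDP && dir == SKETCH_DIR_OUT;
}
```

Under these assumptions a UDP selector on the Rx side is rejected, which corresponds to the `-EINVAL` path with the extack message in the hunk.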
+10
drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h
··· 52 52 u32 aes_key[256 / 32]; 53 53 }; 54 54 55 + struct upspec { 56 + u16 dport; 57 + u16 dport_mask; 58 + u16 sport; 59 + u16 sport_mask; 60 + u8 proto; 61 + }; 62 + 55 63 struct mlx5_accel_esp_xfrm_attrs { 56 64 u32 esn; 57 65 u32 spi; ··· 76 68 __be32 a6[4]; 77 69 } daddr; 78 70 71 + struct upspec upspec; 79 72 u8 dir : 2; 80 73 u8 esn_overlap : 1; 81 74 u8 esn_trigger : 1; ··· 190 181 __be32 a6[4]; 191 182 } daddr; 192 183 184 + struct upspec upspec; 193 185 u8 family; 194 186 u8 action; 195 187 u8 type : 2;
+23
drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c
··· 467 467 misc_parameters_2.metadata_reg_c_0, reqid); 468 468 } 469 469 470 + static void setup_fte_upper_proto_match(struct mlx5_flow_spec *spec, struct upspec *upspec) 471 + { 472 + if (upspec->proto != IPPROTO_UDP) 473 + return; 474 + 475 + spec->match_criteria_enable |= MLX5_MATCH_OUTER_HEADERS; 476 + MLX5_SET_TO_ONES(fte_match_set_lyr_2_4, spec->match_criteria, ip_protocol); 477 + MLX5_SET(fte_match_set_lyr_2_4, spec->match_value, ip_protocol, upspec->proto); 478 + if (upspec->dport) { 479 + MLX5_SET(fte_match_set_lyr_2_4, spec->match_criteria, udp_dport, 480 + upspec->dport_mask); 481 + MLX5_SET(fte_match_set_lyr_2_4, spec->match_value, udp_dport, upspec->dport); 482 + } 483 + 484 + if (upspec->sport) { 485 + MLX5_SET(fte_match_set_lyr_2_4, spec->match_criteria, udp_sport, 486 + upspec->sport_mask); 487 + MLX5_SET(fte_match_set_lyr_2_4, spec->match_value, udp_sport, upspec->sport); 488 + } 489 + } 490 + 470 491 static int setup_modify_header(struct mlx5_core_dev *mdev, u32 val, u8 dir, 471 492 struct mlx5_flow_act *flow_act) 472 493 { ··· 675 654 setup_fte_addr6(spec, attrs->saddr.a6, attrs->daddr.a6); 676 655 677 656 setup_fte_no_frags(spec); 657 + setup_fte_upper_proto_match(spec, &attrs->upspec); 678 658 679 659 switch (attrs->type) { 680 660 case XFRM_DEV_OFFLOAD_CRYPTO: ··· 750 728 setup_fte_addr6(spec, attrs->saddr.a6, attrs->daddr.a6); 751 729 752 730 setup_fte_no_frags(spec); 731 + setup_fte_upper_proto_match(spec, &attrs->upspec); 753 732 754 733 err = setup_modify_header(mdev, attrs->reqid, XFRM_DEV_OFFLOAD_OUT, 755 734 &flow_act);
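Review note: the intent of `setup_fte_upper_proto_match()` can be modelled outside the kernel as a pair of masked comparisons. The sketch below assumes the usual criteria/value semantics, where a packet field matches when `(field & mask) == (value & mask)`, and assumes source and destination ports are matched independently. All `sketch_*` types and functions are hypothetical and do not exist in the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-order copy of the selector, shaped like struct upspec. */
struct sketch_upspec {
	uint16_t dport, dport_mask, sport, sport_mask;
	uint8_t proto;
};

/* Hypothetical flattened view of the UDP port match criteria and values. */
struct sketch_l4_match {
	uint16_t dport_mask, dport_value;
	uint16_t sport_mask, sport_value;
};

/* A port of 0 is treated as "do not match on this port", as in the hunk. */
static struct sketch_l4_match sketch_build_match(const struct sketch_upspec *u)
{
	struct sketch_l4_match m = {0};

	if (u->dport) {
		m.dport_mask = u->dport_mask;
		m.dport_value = u->dport;
	}
	if (u->sport) {
		m.sport_mask = u->sport_mask;
		m.sport_value = u->sport;
	}
	return m;
}

/* Masked comparison on both ports; a zero mask makes the field a wildcard. */
static bool sketch_packet_hits(const struct sketch_l4_match *m,
			       uint16_t sport, uint16_t dport)
{
	return (dport & m->dport_mask) == (m->dport_value & m->dport_mask) &&
	       (sport & m->sport_mask) == (m->sport_value & m->sport_mask);
}
```

With a selector of destination port 4500 and a full mask, a packet to port 4500 hits regardless of its source port, while a packet to port 4501 does not.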
+9 -6
drivers/net/ethernet/mellanox/mlx5/core/en_main.c
··· 1470 1470 sq->mkey_be = c->mkey_be; 1471 1471 sq->netdev = c->netdev; 1472 1472 sq->mdev = c->mdev; 1473 + sq->channel = c; 1473 1474 sq->priv = c->priv; 1474 1475 sq->ch_ix = c->ix; 1475 1476 sq->txq_ix = txq_ix; ··· 2483 2482 mlx5e_activate_xsk(c); 2484 2483 else 2485 2484 mlx5e_activate_rq(&c->rq); 2486 - 2487 - mlx5e_trigger_napi_icosq(c); 2488 2485 } 2489 2486 2490 2487 static void mlx5e_deactivate_channel(struct mlx5e_channel *c) ··· 2574 2575 return err; 2575 2576 } 2576 2577 2577 - static void mlx5e_activate_channels(struct mlx5e_channels *chs) 2578 + static void mlx5e_activate_channels(struct mlx5e_priv *priv, struct mlx5e_channels *chs) 2578 2579 { 2579 2580 int i; 2580 2581 2581 2582 for (i = 0; i < chs->num; i++) 2582 2583 mlx5e_activate_channel(chs->c[i]); 2584 + 2585 + if (priv->htb) 2586 + mlx5e_qos_activate_queues(priv); 2587 + 2588 + for (i = 0; i < chs->num; i++) 2589 + mlx5e_trigger_napi_icosq(chs->c[i]); 2583 2590 2584 2591 if (chs->ptp) 2585 2592 mlx5e_ptp_activate_channel(chs->ptp); ··· 2893 2888 void mlx5e_activate_priv_channels(struct mlx5e_priv *priv) 2894 2889 { 2895 2890 mlx5e_build_txq_maps(priv); 2896 - mlx5e_activate_channels(&priv->channels); 2897 - if (priv->htb) 2898 - mlx5e_qos_activate_queues(priv); 2891 + mlx5e_activate_channels(priv, &priv->channels); 2899 2892 mlx5e_xdp_tx_enable(priv); 2900 2893 2901 2894 /* dev_watchdog() wants all TX queues to be started when the carrier is
+2
drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c
··· 172 172 MLX5_PTYS_RATE_EDR = 1 << 5, 173 173 MLX5_PTYS_RATE_HDR = 1 << 6, 174 174 MLX5_PTYS_RATE_NDR = 1 << 7, 175 + MLX5_PTYS_RATE_XDR = 1 << 8, 175 176 }; 176 177 177 178 static inline int mlx5_ptys_rate_enum_to_int(enum mlx5_ptys_rate rate) ··· 186 185 case MLX5_PTYS_RATE_EDR: return 25000; 187 186 case MLX5_PTYS_RATE_HDR: return 50000; 188 187 case MLX5_PTYS_RATE_NDR: return 100000; 188 + case MLX5_PTYS_RATE_XDR: return 200000; 189 189 default: return -1; 190 190 } 191 191 }
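Review note: the XDR addition follows the pattern of the entries visible in this hunk, where each newer rate bit maps to double the previous value. A minimal standalone sketch of the visible part of the mapping is shown here. The `SKETCH_*` names are hypothetical, and the rates elided from the hunk are deliberately left out rather than guessed.

```c
/* Only the rate bits visible in the hunk; earlier bits are elided there. */
enum sketch_ptys_rate {
	SKETCH_RATE_EDR = 1 << 5,
	SKETCH_RATE_HDR = 1 << 6,
	SKETCH_RATE_NDR = 1 << 7,
	SKETCH_RATE_XDR = 1 << 8,
};

/* Returns the value the hunk associates with each rate bit, or -1. */
static int sketch_rate_to_int(unsigned int rate)
{
	switch (rate) {
	case SKETCH_RATE_EDR: return 25000;
	case SKETCH_RATE_HDR: return 50000;
	case SKETCH_RATE_NDR: return 100000;
	case SKETCH_RATE_XDR: return 200000;
	default: return -1;	/* unknown or unsupported rate bit */
	}
}
```

Without the new `case`, an XDR-capable port would fall through to the `default` branch and report `-1`.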
+6 -6
drivers/net/ethernet/mellanox/mlx5/core/lag/debugfs.c
··· 22 22 struct mlx5_lag *ldev; 23 23 char *mode = NULL; 24 24 25 - ldev = dev->priv.lag; 25 + ldev = mlx5_lag_dev(dev); 26 26 mutex_lock(&ldev->lock); 27 27 if (__mlx5_lag_is_active(ldev)) 28 28 mode = get_str_mode_type(ldev); ··· 41 41 int ret = 0; 42 42 char *mode; 43 43 44 - ldev = dev->priv.lag; 44 + ldev = mlx5_lag_dev(dev); 45 45 mutex_lock(&ldev->lock); 46 46 if (__mlx5_lag_is_active(ldev)) 47 47 mode = mlx5_get_str_port_sel_mode(ldev->mode, ldev->mode_flags); ··· 61 61 struct mlx5_lag *ldev; 62 62 bool active; 63 63 64 - ldev = dev->priv.lag; 64 + ldev = mlx5_lag_dev(dev); 65 65 mutex_lock(&ldev->lock); 66 66 active = __mlx5_lag_is_active(ldev); 67 67 mutex_unlock(&ldev->lock); ··· 77 77 bool shared_fdb; 78 78 bool lag_active; 79 79 80 - ldev = dev->priv.lag; 80 + ldev = mlx5_lag_dev(dev); 81 81 mutex_lock(&ldev->lock); 82 82 lag_active = __mlx5_lag_is_active(ldev); 83 83 if (!lag_active) ··· 108 108 int num_ports; 109 109 int i; 110 110 111 - ldev = dev->priv.lag; 111 + ldev = mlx5_lag_dev(dev); 112 112 mutex_lock(&ldev->lock); 113 113 lag_active = __mlx5_lag_is_active(ldev); 114 114 if (lag_active) { ··· 142 142 struct mlx5_lag *ldev; 143 143 int i; 144 144 145 - ldev = dev->priv.lag; 145 + ldev = mlx5_lag_dev(dev); 146 146 mutex_lock(&ldev->lock); 147 147 for (i = 0; i < ldev->ports; i++) { 148 148 if (!ldev->pf[i].dev)
+2 -3
drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
··· 1187 1187 1188 1188 tmp_dev = mlx5_get_next_phys_dev_lag(dev); 1189 1189 if (tmp_dev) 1190 - ldev = tmp_dev->priv.lag; 1190 + ldev = mlx5_lag_dev(tmp_dev); 1191 1191 1192 1192 if (!ldev) { 1193 1193 ldev = mlx5_lag_dev_alloc(dev); ··· 1386 1386 1387 1387 spin_lock_irqsave(&lag_lock, flags); 1388 1388 ldev = mlx5_lag_dev(dev); 1389 - res = ldev && __mlx5_lag_is_sriov(ldev) && 1390 - test_bit(MLX5_LAG_MODE_FLAG_SHARED_FDB, &ldev->mode_flags); 1389 + res = ldev && test_bit(MLX5_LAG_MODE_FLAG_SHARED_FDB, &ldev->mode_flags); 1391 1390 spin_unlock_irqrestore(&lag_lock, flags); 1392 1391 1393 1392 return res;
-15
drivers/net/ethernet/mellanox/mlx5/core/lag/lag.h
··· 50 50 enum netdev_lag_hash hash_type; 51 51 }; 52 52 53 - enum mpesw_op { 54 - MLX5_MPESW_OP_ENABLE, 55 - MLX5_MPESW_OP_DISABLE, 56 - }; 57 - 58 - struct mlx5_mpesw_work_st { 59 - struct work_struct work; 60 - struct mlx5_lag *lag; 61 - enum mpesw_op op; 62 - struct completion comp; 63 - int result; 64 - }; 65 - 66 53 /* LAG data of a ConnectX card. 67 54 * It serves both its phys functions. 68 55 */ ··· 111 124 int mlx5_lag_dev_get_netdev_idx(struct mlx5_lag *ldev, 112 125 struct net_device *ndev); 113 126 bool mlx5_shared_fdb_supported(struct mlx5_lag *ldev); 114 - void mlx5_lag_del_mpesw_rule(struct mlx5_core_dev *dev); 115 - int mlx5_lag_add_mpesw_rule(struct mlx5_core_dev *dev); 116 127 117 128 char *mlx5_get_str_port_sel_mode(enum mlx5_lag_mode mode, unsigned long flags); 118 129 void mlx5_infer_tx_enabled(struct lag_tracker *tracker, u8 num_ports,
+2 -6
drivers/net/ethernet/mellanox/mlx5/core/lag/mp.c
··· 28 28 29 29 bool mlx5_lag_is_multipath(struct mlx5_core_dev *dev) 30 30 { 31 - struct mlx5_lag *ldev; 32 - bool res; 31 + struct mlx5_lag *ldev = mlx5_lag_dev(dev); 33 32 34 - ldev = mlx5_lag_dev(dev); 35 - res = ldev && __mlx5_lag_is_multipath(ldev); 36 - 37 - return res; 33 + return ldev && __mlx5_lag_is_multipath(ldev); 38 34 } 39 35 40 36 /**
+11 -9
drivers/net/ethernet/mellanox/mlx5/core/lag/mpesw.c
··· 58 58 static int mlx5_lag_mpesw_queue_work(struct mlx5_core_dev *dev, 59 59 enum mpesw_op op) 60 60 { 61 - struct mlx5_lag *ldev = dev->priv.lag; 61 + struct mlx5_lag *ldev = mlx5_lag_dev(dev); 62 62 struct mlx5_mpesw_work_st *work; 63 63 int err = 0; 64 64 ··· 96 96 return mlx5_lag_mpesw_queue_work(dev, MLX5_MPESW_OP_ENABLE); 97 97 } 98 98 99 - int mlx5_lag_do_mirred(struct mlx5_core_dev *mdev, struct net_device *out_dev) 99 + int mlx5_lag_mpesw_do_mirred(struct mlx5_core_dev *mdev, 100 + struct net_device *out_dev, 101 + struct netlink_ext_ack *extack) 100 102 { 101 - struct mlx5_lag *ldev = mdev->priv.lag; 103 + struct mlx5_lag *ldev = mlx5_lag_dev(mdev); 102 104 103 105 if (!netif_is_bond_master(out_dev) || !ldev) 104 106 return 0; 105 107 106 - if (ldev->mode == MLX5_LAG_MODE_MPESW) 107 - return -EOPNOTSUPP; 108 + if (ldev->mode != MLX5_LAG_MODE_MPESW) 109 + return 0; 108 110 109 - return 0; 111 + NL_SET_ERR_MSG_MOD(extack, "can't forward to bond in mpesw mode"); 112 + return -EOPNOTSUPP; 110 113 } 111 114 112 115 bool mlx5_lag_mpesw_is_activated(struct mlx5_core_dev *dev) 113 116 { 114 - bool ret; 117 + struct mlx5_lag *ldev = mlx5_lag_dev(dev); 115 118 116 - ret = dev->priv.lag && dev->priv.lag->mode == MLX5_LAG_MODE_MPESW; 117 - return ret; 119 + return ldev && ldev->mode == MLX5_LAG_MODE_MPESW; 118 120 } 119 121 120 122 void mlx5_lag_mpesw_init(struct mlx5_lag *ldev)
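Review note: the reworked `mlx5_lag_mpesw_do_mirred()` uses early returns for the permitted cases and reports the reason only on the single rejecting path. A simplified userspace model of that control flow follows. The `sketch_*` types, the boolean standing in for `netif_is_bond_master()`, and the string pointer standing in for the extack are all assumptions made for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduced LAG state; the real struct mlx5_lag has more fields. */
enum sketch_lag_mode { SKETCH_LAG_NONE, SKETCH_LAG_OTHER, SKETCH_LAG_MPESW };
struct sketch_lag { enum sketch_lag_mode mode; };

/*
 * Returns 0 when forwarding is allowed. Returns -EOPNOTSUPP and sets
 * *errmsg only when the output device is a bond and the LAG is in
 * multiport eswitch mode.
 */
static int sketch_mpesw_do_mirred(const struct sketch_lag *ldev,
				  bool out_is_bond, const char **errmsg)
{
	if (!out_is_bond || !ldev)
		return 0;

	if (ldev->mode != SKETCH_LAG_MPESW)
		return 0;

	*errmsg = "can't forward to bond in mpesw mode";
	return -EOPNOTSUPP;
}
```

Keeping the message next to the only failing return means a caller can map any non-zero result to `-EOPNOTSUPP` without losing the explanation, which matches how the mirred.c hunk consumes it.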
+18 -1
drivers/net/ethernet/mellanox/mlx5/core/lag/mpesw.h
··· 12 12 atomic_t mpesw_rule_count; 13 13 }; 14 14 15 - int mlx5_lag_do_mirred(struct mlx5_core_dev *mdev, struct net_device *out_dev); 15 + enum mpesw_op { 16 + MLX5_MPESW_OP_ENABLE, 17 + MLX5_MPESW_OP_DISABLE, 18 + }; 19 + 20 + struct mlx5_mpesw_work_st { 21 + struct work_struct work; 22 + struct mlx5_lag *lag; 23 + enum mpesw_op op; 24 + struct completion comp; 25 + int result; 26 + }; 27 + 28 + int mlx5_lag_mpesw_do_mirred(struct mlx5_core_dev *mdev, 29 + struct net_device *out_dev, 30 + struct netlink_ext_ack *extack); 16 31 bool mlx5_lag_mpesw_is_activated(struct mlx5_core_dev *dev); 32 + void mlx5_lag_del_mpesw_rule(struct mlx5_core_dev *dev); 33 + int mlx5_lag_add_mpesw_rule(struct mlx5_core_dev *dev); 17 34 #if IS_ENABLED(CONFIG_MLX5_ESWITCH) 18 35 void mlx5_lag_mpesw_init(struct mlx5_lag *ldev); 19 36 void mlx5_lag_mpesw_cleanup(struct mlx5_lag *ldev);
+12 -3
drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c
··· 362 362 return mlx5_ptp_adjtime(ptp, delta); 363 363 } 364 364 365 - static int mlx5_ptp_adjfreq_real_time(struct mlx5_core_dev *mdev, s32 freq) 365 + static int mlx5_ptp_freq_adj_real_time(struct mlx5_core_dev *mdev, long scaled_ppm) 366 366 { 367 367 u32 in[MLX5_ST_SZ_DW(mtutc_reg)] = {}; 368 368 ··· 370 370 return 0; 371 371 372 372 MLX5_SET(mtutc_reg, in, operation, MLX5_MTUTC_OPERATION_ADJUST_FREQ_UTC); 373 - MLX5_SET(mtutc_reg, in, freq_adjustment, freq); 373 + 374 + if (MLX5_CAP_MCAM_FEATURE(mdev, mtutc_freq_adj_units)) { 375 + MLX5_SET(mtutc_reg, in, freq_adj_units, 376 + MLX5_MTUTC_FREQ_ADJ_UNITS_SCALED_PPM); 377 + MLX5_SET(mtutc_reg, in, freq_adjustment, scaled_ppm); 378 + } else { 379 + MLX5_SET(mtutc_reg, in, freq_adj_units, MLX5_MTUTC_FREQ_ADJ_UNITS_PPB); 380 + MLX5_SET(mtutc_reg, in, freq_adjustment, scaled_ppm_to_ppb(scaled_ppm)); 381 + } 374 382 375 383 return mlx5_set_mtutc(mdev, in, sizeof(in)); 376 384 } ··· 393 385 int err; 394 386 395 387 mdev = container_of(clock, struct mlx5_core_dev, clock); 396 - err = mlx5_ptp_adjfreq_real_time(mdev, scaled_ppm_to_ppb(scaled_ppm)); 388 + 389 + err = mlx5_ptp_freq_adj_real_time(mdev, scaled_ppm); 397 390 if (err) 398 391 return err; 399 392
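Review note: the clock.c change passes `scaled_ppm` through unchanged when firmware advertises `mtutc_freq_adj_units`, and converts to ppb otherwise. The sketch below models that choice in plain C. It assumes the PTP convention that `scaled_ppm` is parts per million with a 16-bit binary fraction, so 65536 units equal 1 ppm, which equals 1000 ppb. The `sketch_*` names are hypothetical, and the kernel's own `scaled_ppm_to_ppb()` helper may round differently from the truncating division used here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values mirror the enum added to mlx5_ifc.h in this series. */
enum sketch_units {
	SKETCH_UNITS_PPB = 0x0,
	SKETCH_UNITS_SCALED_PPM = 0x1,
};

struct sketch_freq_adj {
	enum sketch_units units;
	int64_t value;
};

/* ppb = scaled_ppm * 1000 / 65536, reduced to 125 / 8192 (truncating). */
static int64_t sketch_scaled_ppm_to_ppb(int64_t scaled_ppm)
{
	return (scaled_ppm * 125) / 8192;
}

/* Pick the finer-grained unit when the device supports it. */
static struct sketch_freq_adj sketch_pick_units(bool fw_has_scaled_ppm,
						int64_t scaled_ppm)
{
	struct sketch_freq_adj adj;

	if (fw_has_scaled_ppm) {
		adj.units = SKETCH_UNITS_SCALED_PPM;
		adj.value = scaled_ppm;
	} else {
		adj.units = SKETCH_UNITS_PPB;
		adj.value = sketch_scaled_ppm_to_ppb(scaled_ppm);
	}
	return adj;
}
```

The conversion to ppb discards up to 16 bits of fractional resolution, so an adjustment of a single scaled_ppm unit becomes 0 ppb in this model. That loss is what the new pass-through path avoids.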
+2 -1
drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
··· 211 211 212 212 n = find_first_bit(&fp->bitmask, 8 * sizeof(fp->bitmask)); 213 213 if (n >= MLX5_NUM_4K_IN_PAGE) { 214 - mlx5_core_warn(dev, "alloc 4k bug\n"); 214 + mlx5_core_warn(dev, "alloc 4k bug: fw page = 0x%llx, n = %u, bitmask: %lu, max num of 4K pages: %d\n", 215 + fp->addr, n, fp->bitmask, MLX5_NUM_4K_IN_PAGE); 215 216 return -ENOENT; 216 217 } 217 218 clear_bit(n, &fp->bitmask);
+10 -2
include/linux/mlx5/mlx5_ifc.h
··· 9926 9926 }; 9927 9927 9928 9928 enum { 9929 + MLX5_MTUTC_FREQ_ADJ_UNITS_PPB = 0x0, 9930 + MLX5_MTUTC_FREQ_ADJ_UNITS_SCALED_PPM = 0x1, 9931 + }; 9932 + 9933 + enum { 9929 9934 MLX5_MTUTC_OPERATION_SET_TIME_IMMEDIATE = 0x1, 9930 9935 MLX5_MTUTC_OPERATION_ADJUST_TIME = 0x2, 9931 9936 MLX5_MTUTC_OPERATION_ADJUST_FREQ_UTC = 0x3, 9932 9937 }; 9933 9938 9934 9939 struct mlx5_ifc_mtutc_reg_bits { 9935 - u8 reserved_at_0[0x1c]; 9940 + u8 reserved_at_0[0x5]; 9941 + u8 freq_adj_units[0x3]; 9942 + u8 reserved_at_8[0x14]; 9936 9943 u8 operation[0x4]; 9937 9944 9938 9945 u8 freq_adjustment[0x20]; ··· 10012 10005 }; 10013 10006 10014 10007 struct mlx5_ifc_mcam_enhanced_features_bits { 10015 - u8 reserved_at_0[0x51]; 10008 + u8 reserved_at_0[0x50]; 10009 + u8 mtutc_freq_adj_units[0x1]; 10016 10010 u8 mtutc_time_adjustment_extended_range[0x1]; 10017 10011 u8 reserved_at_52[0xb]; 10018 10012 u8 mcia_32dwords[0x1];
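Review note: the new `freq_adj_units` field is carved out of what used to be a single 0x1c-bit reserved area, and the widths still add up to one 32-bit dword (0x5 + 0x3 + 0x14 + 0x4 = 0x20). The sketch below packs the first dword under the assumption that fields are laid out most-significant-bit first within each dword, which is how `mlx5_ifc.h` layouts are conventionally read. The function name is hypothetical and the result is the host-order value before any byte swapping.

```c
#include <stdint.h>

/*
 * Layout of the first mtutc_reg dword after this change, MSB first:
 *   bits 31..27  reserved        (0x5)
 *   bits 26..24  freq_adj_units  (0x3)
 *   bits 23..4   reserved        (0x14)
 *   bits  3..0   operation       (0x4)
 */
static uint32_t sketch_mtutc_dword0(uint32_t freq_adj_units, uint32_t operation)
{
	return ((freq_adj_units & 0x7u) << 24) | (operation & 0xfu);
}
```

For example, scaled-ppm units (0x1) combined with the adjust-frequency operation (0x3) gives `0x01000003` in this model, and a device that ignores the new field sees the same low nibble as before.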