Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'net-mlx5e-add-pcie-congestion-event-extras'

Tariq Toukan says:

====================
net/mlx5e: Add pcie congestion event extras

This small series by Dragos covers gaps requested in the initial pcie
congestion series [1]:
- Make pcie congestion thresholds configurable via devlink.
- Add a counter for stale pcie congestion events.

[1] https://lore.kernel.org/1752130292-22249-1-git-send-email-tariqt@nvidia.com
====================

Link: https://patch.msgid.link/1757237976-531416-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
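As a usage sketch of the new parameters (the PCI address below is hypothetical): the thresholds are driverinit devlink parameters, so they only take effect after a devlink reload.

```shell
# Hypothetical device address (0000:08:00.0); values are in units of 0.01%,
# so 9500 corresponds to 95%. The low threshold must stay below the high one.
devlink dev param set pci/0000:08:00.0 \
        name pcie_cong_inbound_high value 9500 cmode driverinit
devlink dev param set pci/0000:08:00.0 \
        name pcie_cong_inbound_low value 8000 cmode driverinit

# driverinit parameters are applied when the driver re-initializes.
devlink dev reload pci/0000:08:00.0
```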

+238 -10
+6 -1
Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
···
         is in a congested state.
         If pci_bw_inbound_high == pci_bw_inbound_low then the device is not congested.
         If pci_bw_inbound_high > pci_bw_inbound_low then the device is congested.
-      - Tnformative
+      - Informative
 
    * - `pci_bw_inbound_low`
      - The number of times the device crossed the low inbound PCIe bandwidth
···
         is in a congested state.
         If pci_bw_outbound_high == pci_bw_outbound_low then the device is not congested.
         If pci_bw_outbound_high > pci_bw_outbound_low then the device is congested.
+      - Informative
+
+   * - `pci_bw_stale_event`
+     - The number of times the device fired a PCIe congestion event but on query
+       there was no change in state.
       - Informative
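These counters, including the new `pci_bw_stale_event`, are exposed through the driver's ethtool statistics; a usage sketch (the interface name `eth0` is hypothetical):

```shell
# Hypothetical interface name; filters the PCIe bandwidth congestion
# counters (pci_bw_inbound_high/low, pci_bw_outbound_high/low, and the
# new pci_bw_stale_event) out of the full ethtool statistics dump.
ethtool -S eth0 | grep pci_bw
```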
+52
Documentation/networking/devlink/mlx5.rst
···
      - u32
      - driverinit
      - Control the size (in packets) of the hairpin queues.
+   * - ``pcie_cong_inbound_high``
+     - u16
+     - driverinit
+     - High threshold configuration for PCIe congestion events. The firmware
+       will send an event once device side inbound PCIe traffic went
+       above the configured high threshold for a long enough period (at least
+       200ms).
+
+       See pci_bw_inbound_high ethtool stat.
+
+       Units are 0.01 %. Accepted values are in range [0, 10000].
+       pcie_cong_inbound_low < pcie_cong_inbound_high.
+       Default value: 9000 (Corresponds to 90%).
+   * - ``pcie_cong_inbound_low``
+     - u16
+     - driverinit
+     - Low threshold configuration for PCIe congestion events. The firmware
+       will send an event once device side inbound PCIe traffic went
+       below the configured low threshold, only after having been previously in
+       a congested state.
+
+       See pci_bw_inbound_low ethtool stat.
+
+       Units are 0.01 %. Accepted values are in range [0, 10000].
+       pcie_cong_inbound_low < pcie_cong_inbound_high.
+       Default value: 7500.
+   * - ``pcie_cong_outbound_high``
+     - u16
+     - driverinit
+     - High threshold configuration for PCIe congestion events. The firmware
+       will send an event once device side outbound PCIe traffic went
+       above the configured high threshold for a long enough period (at least
+       200ms).
+
+       See pci_bw_outbound_high ethtool stat.
+
+       Units are 0.01 %. Accepted values are in range [0, 10000].
+       pcie_cong_outbound_low < pcie_cong_outbound_high.
+       Default value: 9000 (Corresponds to 90%).
+   * - ``pcie_cong_outbound_low``
+     - u16
+     - driverinit
+     - Low threshold configuration for PCIe congestion events. The firmware
+       will send an event once device side outbound PCIe traffic went
+       below the configured low threshold, only after having been previously in
+       a congested state.
+
+       See pci_bw_outbound_low ethtool stat.
+
+       Units are 0.01 %. Accepted values are in range [0, 10000].
+       pcie_cong_outbound_low < pcie_cong_outbound_high.
+       Default value: 7500.
 
    * - ``cqe_compress_type``
      - string
+106
drivers/net/ethernet/mellanox/mlx5/core/devlink.c
···
 			       ARRAY_SIZE(mlx5_devlink_eth_params));
 }
 
+#define MLX5_PCIE_CONG_THRESH_MAX	10000
+#define MLX5_PCIE_CONG_THRESH_DEF_LOW	7500
+#define MLX5_PCIE_CONG_THRESH_DEF_HIGH	9000
+
+static int
+mlx5_devlink_pcie_cong_thresh_validate(struct devlink *devl, u32 id,
+				       union devlink_param_value val,
+				       struct netlink_ext_ack *extack)
+{
+	if (val.vu16 > MLX5_PCIE_CONG_THRESH_MAX) {
+		NL_SET_ERR_MSG_FMT_MOD(extack, "Value %u > max supported (%u)",
+				       val.vu16, MLX5_PCIE_CONG_THRESH_MAX);
+
+		return -EINVAL;
+	}
+
+	switch (id) {
+	case MLX5_DEVLINK_PARAM_ID_PCIE_CONG_IN_LOW:
+	case MLX5_DEVLINK_PARAM_ID_PCIE_CONG_IN_HIGH:
+	case MLX5_DEVLINK_PARAM_ID_PCIE_CONG_OUT_LOW:
+	case MLX5_DEVLINK_PARAM_ID_PCIE_CONG_OUT_HIGH:
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	return 0;
+}
+
+static void mlx5_devlink_pcie_cong_init_values(struct devlink *devlink)
+{
+	union devlink_param_value value;
+	u32 id;
+
+	value.vu16 = MLX5_PCIE_CONG_THRESH_DEF_LOW;
+	id = MLX5_DEVLINK_PARAM_ID_PCIE_CONG_IN_LOW;
+	devl_param_driverinit_value_set(devlink, id, value);
+
+	value.vu16 = MLX5_PCIE_CONG_THRESH_DEF_HIGH;
+	id = MLX5_DEVLINK_PARAM_ID_PCIE_CONG_IN_HIGH;
+	devl_param_driverinit_value_set(devlink, id, value);
+
+	value.vu16 = MLX5_PCIE_CONG_THRESH_DEF_LOW;
+	id = MLX5_DEVLINK_PARAM_ID_PCIE_CONG_OUT_LOW;
+	devl_param_driverinit_value_set(devlink, id, value);
+
+	value.vu16 = MLX5_PCIE_CONG_THRESH_DEF_HIGH;
+	id = MLX5_DEVLINK_PARAM_ID_PCIE_CONG_OUT_HIGH;
+	devl_param_driverinit_value_set(devlink, id, value);
+}
+
+static const struct devlink_param mlx5_devlink_pcie_cong_params[] = {
+	DEVLINK_PARAM_DRIVER(MLX5_DEVLINK_PARAM_ID_PCIE_CONG_IN_LOW,
+			     "pcie_cong_inbound_low", DEVLINK_PARAM_TYPE_U16,
+			     BIT(DEVLINK_PARAM_CMODE_DRIVERINIT), NULL, NULL,
+			     mlx5_devlink_pcie_cong_thresh_validate),
+	DEVLINK_PARAM_DRIVER(MLX5_DEVLINK_PARAM_ID_PCIE_CONG_IN_HIGH,
+			     "pcie_cong_inbound_high", DEVLINK_PARAM_TYPE_U16,
+			     BIT(DEVLINK_PARAM_CMODE_DRIVERINIT), NULL, NULL,
+			     mlx5_devlink_pcie_cong_thresh_validate),
+	DEVLINK_PARAM_DRIVER(MLX5_DEVLINK_PARAM_ID_PCIE_CONG_OUT_LOW,
+			     "pcie_cong_outbound_low", DEVLINK_PARAM_TYPE_U16,
+			     BIT(DEVLINK_PARAM_CMODE_DRIVERINIT), NULL, NULL,
+			     mlx5_devlink_pcie_cong_thresh_validate),
+	DEVLINK_PARAM_DRIVER(MLX5_DEVLINK_PARAM_ID_PCIE_CONG_OUT_HIGH,
+			     "pcie_cong_outbound_high", DEVLINK_PARAM_TYPE_U16,
+			     BIT(DEVLINK_PARAM_CMODE_DRIVERINIT), NULL, NULL,
+			     mlx5_devlink_pcie_cong_thresh_validate),
+};
+
+static int mlx5_devlink_pcie_cong_params_register(struct devlink *devlink)
+{
+	struct mlx5_core_dev *dev = devlink_priv(devlink);
+	int err;
+
+	if (!mlx5_pcie_cong_event_supported(dev))
+		return 0;
+
+	err = devl_params_register(devlink, mlx5_devlink_pcie_cong_params,
+				   ARRAY_SIZE(mlx5_devlink_pcie_cong_params));
+	if (err)
+		return err;
+
+	mlx5_devlink_pcie_cong_init_values(devlink);
+
+	return 0;
+}
+
+static void mlx5_devlink_pcie_cong_params_unregister(struct devlink *devlink)
+{
+	struct mlx5_core_dev *dev = devlink_priv(devlink);
+
+	if (!mlx5_pcie_cong_event_supported(dev))
+		return;
+
+	devl_params_unregister(devlink, mlx5_devlink_pcie_cong_params,
+			       ARRAY_SIZE(mlx5_devlink_pcie_cong_params));
+}
+
 static int mlx5_devlink_enable_rdma_validate(struct devlink *devlink, u32 id,
 					     union devlink_param_value val,
 					     struct netlink_ext_ack *extack)
···
 	if (err)
 		goto max_uc_list_err;
 
+	err = mlx5_devlink_pcie_cong_params_register(devlink);
+	if (err)
+		goto pcie_cong_err;
+
 	err = mlx5_nv_param_register_dl_params(devlink);
 	if (err)
 		goto nv_param_err;
···
 	return 0;
 
 nv_param_err:
+	mlx5_devlink_pcie_cong_params_unregister(devlink);
+pcie_cong_err:
 	mlx5_devlink_max_uc_list_params_unregister(devlink);
 max_uc_list_err:
 	mlx5_devlink_auxdev_params_unregister(devlink);
···
 void mlx5_devlink_params_unregister(struct devlink *devlink)
 {
 	mlx5_nv_param_unregister_dl_params(devlink);
+	mlx5_devlink_pcie_cong_params_unregister(devlink);
 	mlx5_devlink_max_uc_list_params_unregister(devlink);
 	mlx5_devlink_auxdev_params_unregister(devlink);
 	devl_params_unregister(devlink, mlx5_devlink_params,
+4
drivers/net/ethernet/mellanox/mlx5/core/devlink.h
···
 	MLX5_DEVLINK_PARAM_ID_ESW_MULTIPORT,
 	MLX5_DEVLINK_PARAM_ID_HAIRPIN_NUM_QUEUES,
 	MLX5_DEVLINK_PARAM_ID_HAIRPIN_QUEUE_SIZE,
+	MLX5_DEVLINK_PARAM_ID_PCIE_CONG_IN_LOW,
+	MLX5_DEVLINK_PARAM_ID_PCIE_CONG_IN_HIGH,
+	MLX5_DEVLINK_PARAM_ID_PCIE_CONG_OUT_LOW,
+	MLX5_DEVLINK_PARAM_ID_PCIE_CONG_OUT_HIGH,
 	MLX5_DEVLINK_PARAM_ID_CQE_COMPRESSION_TYPE
 };
+70 -9
drivers/net/ethernet/mellanox/mlx5/core/en/pcie_cong_event.c
···
 // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
 // Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES.
 
+#include "../devlink.h"
 #include "en.h"
 #include "pcie_cong_event.h"
···
 	u32 pci_bw_inbound_low;
 	u32 pci_bw_outbound_high;
 	u32 pci_bw_outbound_low;
+	u32 pci_bw_stale_event;
 };
 
 struct mlx5e_pcie_cong_event {
···
 	struct mlx5e_pcie_cong_stats stats;
 };
 
-/* In units of 0.01 % */
-static const struct mlx5e_pcie_cong_thresh default_thresh_config = {
-	.inbound_high = 9000,
-	.inbound_low = 7500,
-	.outbound_high = 9000,
-	.outbound_low = 7500,
-};
 
 static const struct counter_desc mlx5e_pcie_cong_stats_desc[] = {
 	{ MLX5E_DECLARE_STAT(struct mlx5e_pcie_cong_stats,
···
 			     pci_bw_outbound_high) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_pcie_cong_stats,
 			     pci_bw_outbound_low) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_pcie_cong_stats,
+			     pci_bw_stale_event) },
 };
 
 #define NUM_PCIE_CONG_COUNTERS ARRAY_SIZE(mlx5e_pcie_cong_stats_desc)
···
 	}
 
 	changes = cong_event->state ^ new_cong_state;
-	if (!changes)
+	if (!changes) {
+		cong_event->stats.pci_bw_stale_event++;
 		return;
+	}
 
 	cong_event->state = new_cong_state;
···
 	return NOTIFY_OK;
 }
 
+static int
+mlx5e_pcie_cong_get_thresh_config(struct mlx5_core_dev *dev,
+				  struct mlx5e_pcie_cong_thresh *config)
+{
+	u32 ids[4] = {
+		MLX5_DEVLINK_PARAM_ID_PCIE_CONG_IN_LOW,
+		MLX5_DEVLINK_PARAM_ID_PCIE_CONG_IN_HIGH,
+		MLX5_DEVLINK_PARAM_ID_PCIE_CONG_OUT_LOW,
+		MLX5_DEVLINK_PARAM_ID_PCIE_CONG_OUT_HIGH,
+	};
+	struct devlink *devlink = priv_to_devlink(dev);
+	union devlink_param_value val[4];
+
+	for (int i = 0; i < 4; i++) {
+		u32 id = ids[i];
+		int err;
+
+		err = devl_param_driverinit_value_get(devlink, id, &val[i]);
+		if (err)
+			return err;
+	}
+
+	config->inbound_low = val[0].vu16;
+	config->inbound_high = val[1].vu16;
+	config->outbound_low = val[2].vu16;
+	config->outbound_high = val[3].vu16;
+
+	return 0;
+}
+
+static int
+mlx5e_thresh_config_validate(struct mlx5_core_dev *mdev,
+			     const struct mlx5e_pcie_cong_thresh *config)
+{
+	int err = 0;
+
+	if (config->inbound_low >= config->inbound_high) {
+		err = -EINVAL;
+		mlx5_core_err(mdev, "PCIe inbound congestion threshold configuration invalid: low (%u) >= high (%u).\n",
+			      config->inbound_low, config->inbound_high);
+	}
+
+	if (config->outbound_low >= config->outbound_high) {
+		err = -EINVAL;
+		mlx5_core_err(mdev, "PCIe outbound congestion threshold configuration invalid: low (%u) >= high (%u).\n",
+			      config->outbound_low, config->outbound_high);
+	}
+
+	return err;
+}
+
 int mlx5e_pcie_cong_event_init(struct mlx5e_priv *priv)
 {
+	struct mlx5e_pcie_cong_thresh thresh_config = {};
 	struct mlx5e_pcie_cong_event *cong_event;
 	struct mlx5_core_dev *mdev = priv->mdev;
 	int err;
 
 	if (!mlx5_pcie_cong_event_supported(mdev))
 		return 0;
+
+	err = mlx5e_pcie_cong_get_thresh_config(mdev, &thresh_config);
+	if (WARN_ON(err))
+		return err;
+
+	err = mlx5e_thresh_config_validate(mdev, &thresh_config);
+	if (err) {
+		mlx5_core_err(mdev, "PCIe congestion event feature disabled\n");
+		return err;
+	}
 
 	cong_event = kvzalloc_node(sizeof(*cong_event), GFP_KERNEL,
 				   mdev->priv.numa_node);
···
 	cong_event->priv = priv;
 
-	err = mlx5_cmd_pcie_cong_event_set(mdev, &default_thresh_config,
+	err = mlx5_cmd_pcie_cong_event_set(mdev, &thresh_config,
 					   &cong_event->obj_id);
 	if (err) {
 		mlx5_core_warn(mdev, "Error creating a PCIe congestion event object\n");