Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'support-rate-management-on-traffic-classes-in-devlink-and-mlx5'

Mark Bloch says:

====================
Support rate management on traffic classes in devlink and mlx5

This patch series extends the devlink-rate API to support traffic class
(TC) bandwidth management, enabling more granular control over traffic
shaping and rate limiting across multiple TCs. The API now allows users
to specify bandwidth proportions for different traffic classes in a
single command. This is particularly useful for managing Enhanced
Transmission Selection (ETS) for groups of Virtual Functions (VFs),
allowing precise bandwidth allocation across traffic classes.
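
As a rough illustration of specifying per-TC proportions in one request, the sketch below collects (index, share) pairs into an eight-entry table and expresses one class as a percentage of the total. The names tc_bw_entry, tc_bw_collect and tc_bw_percent are hypothetical and not part of devlink; only the limit of eight traffic classes (indices 0 to 7) is taken from the series. Rejecting repeated indices is an assumption of this model.

```c
#include <stddef.h>
#include <stdint.h>

#define TC_COUNT 8 /* indices 0..7, matching rate-tc-index-max = 7 */

/* Hypothetical userspace model of one (traffic class, share) pair. */
struct tc_bw_entry {
	uint8_t index;
	uint32_t bw;
};

/*
 * Copy the entries into tc_bw[], rejecting out-of-range or repeated
 * indices. Classes that are not listed keep a share of 0.
 * Returns 0 on success, -1 on invalid input.
 */
static int tc_bw_collect(const struct tc_bw_entry *entries, size_t n,
			 uint32_t tc_bw[TC_COUNT])
{
	unsigned int seen = 0; /* bitmap of indices already supplied */
	size_t i;

	for (i = 0; i < TC_COUNT; i++)
		tc_bw[i] = 0;

	for (i = 0; i < n; i++) {
		unsigned int tc = entries[i].index;

		if (tc >= TC_COUNT || (seen & (1u << tc)))
			return -1;
		seen |= 1u << tc;
		tc_bw[tc] = entries[i].bw;
	}
	return 0;
}

/* Share of one class as a whole percentage of the sum, rounded down. */
static uint32_t tc_bw_percent(const uint32_t tc_bw[TC_COUNT], uint8_t tc)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < TC_COUNT; i++)
		total += tc_bw[i];
	if (!total || tc >= TC_COUNT)
		return 0;
	return (uint32_t)((uint64_t)tc_bw[tc] * 100 / total);
}
```

With the pairs (0, 20) and (5, 80), this model assigns 20% to TC 0, 80% to TC 5, and nothing to the remaining classes.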

Additionally, the series refines the QoS handling in net/mlx5 to support
TC arbitration and bandwidth management on vports and rate nodes.

Discussions on traffic class shaping in net-shapers began in V5 [1],
where we discussed with maintainers whether net-shapers should support
traffic classes and how this could be implemented.

Later, after further conversations with Paolo Abeni and Simon Horman,
Cosmin provided an update [2], confirming that net-shapers' tree-based
hierarchy aligns well with traffic classes when treated as distinct
subsets of netdev queues. Since mlx5 enforces a 1:1 mapping between TX
queues and traffic classes, this approach seems feasible, though some
open questions remain regarding queue reconfiguration and certain mlx5
scheduling behaviors.

Building on that discussion, Cosmin has now shared a concrete
implementation plan on the netdev mailing list [3]. The plan, developed
in collaboration with Paolo and Simon, outlines how net-shapers can be
extended to support the same use cases currently covered by
devlink-rate, with the eventual goal of aligning both and simplifying
the shaping infrastructure in the kernel.

This work was presented at Netdev 0x19 in Zagreb [4], where we showed
how TC scheduling is enforced in mlx5 hardware. The presentation led to
discussions on the mailing list.

A summary of how things work:

Classification means labeling a packet with a traffic class based on
the packet's DSCP or VLAN PCP field, then treating packets with
different traffic classes differently during transmit processing.

In a virtualized setup, VFs are untrusted and do not control
classification or shaping. Classification is done by the hardware using
a prio-to-TC mapping set by the hypervisor. VFs only select which send
queue to use and are expected to respect the classification logic by
sending each traffic class on its dedicated queue. As stated in the
net-shapers plan [3], each transmit queue should carry only a single
traffic class. Mixing classes in a single queue can lead to HOL
blocking.

In the mlx5 implementation, if the queue used does not match the
classified traffic class, the hardware moves the queue to the correct
TC scheduler. This movement is not a reclassification; it’s a necessary
enforcement step to ensure traffic class isolation is maintained.
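
The classification and enforcement steps described above can be modeled in a few lines of C. This is only a sketch: prio_to_tc_map, classify_pcp and effective_tc are hypothetical names, and the real prio-to-TC table lives in the device under hypervisor control. The bit positions of PCP (top 3 bits of the 802.1Q TCI) and DSCP (upper 6 bits of the IP TOS / traffic class byte) follow the respective standards.

```c
#include <stdint.h>

#define NUM_PRIO 8

/* PCP occupies the top 3 bits of the 16-bit 802.1Q TCI field. */
static uint8_t vlan_pcp(uint16_t tci)
{
	return (uint8_t)(tci >> 13);
}

/* DSCP occupies the upper 6 bits of the IP TOS / traffic class byte. */
static uint8_t ip_dscp(uint8_t tos)
{
	return (uint8_t)(tos >> 2);
}

/* Hypothetical model of the hypervisor-owned prio-to-TC mapping. */
struct prio_to_tc_map {
	uint8_t tc[NUM_PRIO];
};

/* Label a VLAN-tagged packet with a traffic class based on its PCP. */
static uint8_t classify_pcp(const struct prio_to_tc_map *map, uint16_t tci)
{
	return map->tc[vlan_pcp(tci)];
}

/*
 * Model of the enforcement step: the scheduler that is used always
 * belongs to the classified TC. If the VF picked a queue for another
 * TC, *moved is set to 1 to indicate the queue was re-homed.
 */
static uint8_t effective_tc(uint8_t queue_tc, uint8_t classified_tc,
			    int *moved)
{
	*moved = queue_tc != classified_tc;
	return classified_tc;
}
```

Note that effective_tc never changes the classified TC; it only reports whether the queue had to move, which mirrors the point that the movement is not a reclassification.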

Extend devlink-rate API to support rate management on TCs:
- devlink: Extend the devlink rate API to support traffic class
bandwidth management

Introduce a no-op implementation:
- net/mlx5: Add no-op implementation for setting tc-bw on rate objects

Add support for enabling and disabling TC QoS on vports and nodes:
- net/mlx5: Add support for setting tc-bw on nodes
- net/mlx5: Add traffic class scheduling support for vport QoS

Support for setting tc-bw on rate objects:
- net/mlx5: Manage TC arbiter nodes and implement full support for
tc-bw

[1] https://lore.kernel.org/netdev/20241204220931.254964-1-tariqt@nvidia.com/
[2] https://lore.kernel.org/netdev/67df1a562614b553dcab043f347a0d7c5393ff83.camel@nvidia.com/
[3] https://lore.kernel.org/netdev/d9831d0c940a7b77419abe7c7330e822bbfd1cfb.camel@nvidia.com/T/
[4] https://netdevconf.info/0x19/sessions/talk/optimizing-bandwidth-allocation-with-ets-and-traffic-classes.html
====================

Link: https://patch.msgid.link/20250629142138.361537-1-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
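
For readers following the esw/qos.c changes in this merge, the conversion from user-supplied tc-bw shares to firmware bw_share values (esw_qos_calculate_tc_bw_divider, esw_qos_calc_bw_share and esw_qos_set_tc_arbiter_bw_shares) can be approximated by the userspace sketch below. It is a model, not the driver code: TCS_MAX stands in for DEVLINK_RATE_TCS_MAX, MIN_BW_SHARE is assumed to be 1 like MLX5_MIN_BW_SHARE, and fw_max is a caller-supplied stand-in for the max_tsar_bw_share device capability.

```c
#include <stdint.h>

#define TCS_MAX 8      /* stands in for DEVLINK_RATE_TCS_MAX (assumed 8) */
#define MIN_BW_SHARE 1 /* assumed value of MLX5_MIN_BW_SHARE */

static uint32_t div_round_up(uint32_t n, uint32_t d)
{
	return (n + d - 1) / d;
}

/* Clamp DIV_ROUND_UP(value, divider) into [MIN_BW_SHARE, fw_max]. */
static uint32_t calc_bw_share(uint32_t value, uint32_t divider,
			      uint32_t fw_max)
{
	uint32_t share;

	if (!divider)
		return 0;
	share = div_round_up(value, divider);
	if (share < MIN_BW_SHARE)
		share = MIN_BW_SHARE;
	return share < fw_max ? share : fw_max;
}

/* Sum of all shares; falls back to 1 so the division stays defined. */
static uint32_t tc_bw_divider(const uint32_t tc_bw[TCS_MAX])
{
	uint32_t total = 0;
	int i;

	for (i = 0; i < TCS_MAX; i++)
		total += tc_bw[i];
	return total ? total : 1;
}

/* Scale every share to the firmware range before normalizing. */
static void tc_bw_to_fw_shares(const uint32_t tc_bw[TCS_MAX],
			       uint32_t fw_max, uint32_t out[TCS_MAX])
{
	uint32_t divider = tc_bw_divider(tc_bw);
	int tc;

	for (tc = 0; tc < TCS_MAX; tc++)
		out[tc] = calc_bw_share(tc_bw[tc] * fw_max, divider, fw_max);
}
```

One consequence visible in this model is that a class with a share of 0 still ends up with MIN_BW_SHARE rather than 0, because of the lower clamp.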

+1825 -73
+31 -1
Documentation/netlink/specs/devlink.yaml
···
         value: 10
       -
         name: binary
+  -
+    name: rate-tc-index-max
+    type: const
+    value: 7
 
 attribute-sets:
   -
···
       -
         name: region-direct
         type: flag
-
+      -
+        name: rate-tc-bws
+        type: nest
+        multi-attr: true
+        nested-attributes: dl-rate-tc-bws
+      -
+        name: rate-tc-index
+        type: u8
+        checks:
+          max: rate-tc-index-max
+      -
+        name: rate-tc-bw
+        type: u32
+        doc: |
+          Specifies the bandwidth share assigned to the Traffic Class.
+          The bandwidth for the traffic class is determined
+          in proportion to the sum of the shares of all configured classes.
   -
     name: dl-dev-stats
     subset-of: devlink
···
       -
         name: flash
         type: flag
+  -
+    name: dl-rate-tc-bws
+    subset-of: devlink
+    attributes:
+      -
+        name: rate-tc-index
+      -
+        name: rate-tc-bw
 
 operations:
   enum-model: directional
···
             - rate-tx-priority
             - rate-tx-weight
             - rate-parent-node-name
+            - rate-tc-bws
 
     -
       name: rate-new
···
             - rate-tx-priority
             - rate-tx-weight
             - rate-parent-node-name
+            - rate-tc-bws
 
     -
       name: rate-del
+2
drivers/net/ethernet/mellanox/mlx5/core/devlink.c
···
 	.eswitch_encap_mode_get = mlx5_devlink_eswitch_encap_mode_get,
 	.rate_leaf_tx_share_set = mlx5_esw_devlink_rate_leaf_tx_share_set,
 	.rate_leaf_tx_max_set = mlx5_esw_devlink_rate_leaf_tx_max_set,
+	.rate_leaf_tc_bw_set = mlx5_esw_devlink_rate_leaf_tc_bw_set,
+	.rate_node_tc_bw_set = mlx5_esw_devlink_rate_node_tc_bw_set,
 	.rate_node_tx_share_set = mlx5_esw_devlink_rate_node_tx_share_set,
 	.rate_node_tx_max_set = mlx5_esw_devlink_rate_node_tx_max_set,
 	.rate_node_new = mlx5_esw_devlink_rate_node_new,
+1005 -32
drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
··· 64 64 enum sched_node_type { 65 65 SCHED_NODE_TYPE_VPORTS_TSAR, 66 66 SCHED_NODE_TYPE_VPORT, 67 + SCHED_NODE_TYPE_TC_ARBITER_TSAR, 68 + SCHED_NODE_TYPE_RATE_LIMITER, 69 + SCHED_NODE_TYPE_VPORT_TC, 70 + SCHED_NODE_TYPE_VPORTS_TC_TSAR, 67 71 }; 68 72 69 73 static const char * const sched_node_type_str[] = { 70 74 [SCHED_NODE_TYPE_VPORTS_TSAR] = "vports TSAR", 71 75 [SCHED_NODE_TYPE_VPORT] = "vport", 76 + [SCHED_NODE_TYPE_TC_ARBITER_TSAR] = "TC Arbiter TSAR", 77 + [SCHED_NODE_TYPE_RATE_LIMITER] = "Rate Limiter", 78 + [SCHED_NODE_TYPE_VPORT_TC] = "vport TC", 79 + [SCHED_NODE_TYPE_VPORTS_TC_TSAR] = "vports TC TSAR", 72 80 }; 73 81 74 82 struct mlx5_esw_sched_node { ··· 100 92 struct mlx5_vport *vport; 101 93 /* Level in the hierarchy. The root node level is 1. */ 102 94 u8 level; 95 + /* Valid only when this node represents a traffic class. */ 96 + u8 tc; 103 97 }; 104 98 105 99 static void esw_qos_node_attach_to_parent(struct mlx5_esw_sched_node *node) ··· 116 106 } 117 107 } 118 108 109 + static int esw_qos_num_tcs(struct mlx5_core_dev *dev) 110 + { 111 + int num_tcs = mlx5_max_tc(dev) + 1; 112 + 113 + return num_tcs < DEVLINK_RATE_TCS_MAX ? 
num_tcs : DEVLINK_RATE_TCS_MAX; 114 + } 115 + 119 116 static void 120 117 esw_qos_node_set_parent(struct mlx5_esw_sched_node *node, struct mlx5_esw_sched_node *parent) 121 118 { ··· 133 116 esw_qos_node_attach_to_parent(node); 134 117 } 135 118 119 + static void esw_qos_nodes_set_parent(struct list_head *nodes, 120 + struct mlx5_esw_sched_node *parent) 121 + { 122 + struct mlx5_esw_sched_node *node, *tmp; 123 + 124 + list_for_each_entry_safe(node, tmp, nodes, entry) { 125 + esw_qos_node_set_parent(node, parent); 126 + if (!list_empty(&node->children) && 127 + parent->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) { 128 + struct mlx5_esw_sched_node *child; 129 + 130 + list_for_each_entry(child, &node->children, entry) { 131 + struct mlx5_vport *vport = child->vport; 132 + 133 + if (vport) 134 + vport->qos.sched_node->parent = parent; 135 + } 136 + } 137 + } 138 + } 139 + 136 140 void mlx5_esw_qos_vport_qos_free(struct mlx5_vport *vport) 137 141 { 142 + if (vport->qos.sched_nodes) { 143 + int num_tcs = esw_qos_num_tcs(vport->qos.sched_node->esw->dev); 144 + int i; 145 + 146 + for (i = 0; i < num_tcs; i++) 147 + kfree(vport->qos.sched_nodes[i]); 148 + kfree(vport->qos.sched_nodes); 149 + } 150 + 138 151 kfree(vport->qos.sched_node); 139 152 memset(&vport->qos, 0, sizeof(vport->qos)); 140 153 } ··· 188 141 189 142 static void esw_qos_sched_elem_warn(struct mlx5_esw_sched_node *node, int err, const char *op) 190 143 { 191 - if (node->vport) { 144 + switch (node->type) { 145 + case SCHED_NODE_TYPE_VPORTS_TC_TSAR: 146 + esw_warn(node->esw->dev, 147 + "E-Switch %s %s scheduling element failed (tc=%d,err=%d)\n", 148 + op, sched_node_type_str[node->type], node->tc, err); 149 + break; 150 + case SCHED_NODE_TYPE_VPORT_TC: 151 + esw_warn(node->esw->dev, 152 + "E-Switch %s %s scheduling element failed (vport=%d,tc=%d,err=%d)\n", 153 + op, 154 + sched_node_type_str[node->type], 155 + node->vport->vport, node->tc, err); 156 + break; 157 + case SCHED_NODE_TYPE_VPORT: 192 158 
esw_warn(node->esw->dev, 193 159 "E-Switch %s %s scheduling element failed (vport=%d,err=%d)\n", 194 160 op, sched_node_type_str[node->type], node->vport->vport, err); 195 - return; 161 + break; 162 + case SCHED_NODE_TYPE_RATE_LIMITER: 163 + case SCHED_NODE_TYPE_TC_ARBITER_TSAR: 164 + case SCHED_NODE_TYPE_VPORTS_TSAR: 165 + esw_warn(node->esw->dev, 166 + "E-Switch %s %s scheduling element failed (err=%d)\n", 167 + op, sched_node_type_str[node->type], err); 168 + break; 169 + default: 170 + esw_warn(node->esw->dev, 171 + "E-Switch %s scheduling element failed (err=%d)\n", 172 + op, err); 173 + break; 196 174 } 197 - 198 - esw_warn(node->esw->dev, 199 - "E-Switch %s %s scheduling element failed (err=%d)\n", 200 - op, sched_node_type_str[node->type], err); 201 175 } 202 176 203 177 static int esw_qos_node_create_sched_element(struct mlx5_esw_sched_node *node, void *ctx, ··· 301 233 return 0; 302 234 } 303 235 236 + static int esw_qos_create_rate_limit_element(struct mlx5_esw_sched_node *node, 237 + struct netlink_ext_ack *extack) 238 + { 239 + u32 sched_ctx[MLX5_ST_SZ_DW(scheduling_context)] = {}; 240 + 241 + if (!mlx5_qos_element_type_supported( 242 + node->esw->dev, 243 + SCHEDULING_CONTEXT_ELEMENT_TYPE_RATE_LIMIT, 244 + SCHEDULING_HIERARCHY_E_SWITCH)) 245 + return -EOPNOTSUPP; 246 + 247 + MLX5_SET(scheduling_context, sched_ctx, max_average_bw, node->max_rate); 248 + MLX5_SET(scheduling_context, sched_ctx, element_type, 249 + SCHEDULING_CONTEXT_ELEMENT_TYPE_RATE_LIMIT); 250 + 251 + return esw_qos_node_create_sched_element(node, sched_ctx, extack); 252 + } 253 + 304 254 static u32 esw_qos_calculate_min_rate_divider(struct mlx5_eswitch *esw, 305 255 struct mlx5_esw_sched_node *parent) 306 256 { ··· 352 266 return 0; 353 267 } 354 268 355 - static u32 esw_qos_calc_bw_share(u32 min_rate, u32 divider, u32 fw_max) 269 + static u32 esw_qos_calc_bw_share(u32 value, u32 divider, u32 fw_max) 356 270 { 357 271 if (!divider) 358 272 return 0; 359 - return min_t(u32, max_t(u32, 
DIV_ROUND_UP(min_rate, divider), MLX5_MIN_BW_SHARE), fw_max); 273 + return min_t(u32, fw_max, 274 + max_t(u32, 275 + DIV_ROUND_UP(value, divider), MLX5_MIN_BW_SHARE)); 360 276 } 361 277 362 278 static void esw_qos_update_sched_node_bw_share(struct mlx5_esw_sched_node *node, ··· 385 297 if (node->esw != esw || node->ix == esw->qos.root_tsar_ix) 386 298 continue; 387 299 388 - esw_qos_update_sched_node_bw_share(node, divider, extack); 300 + /* Vports TC TSARs don't have a minimum rate configured, 301 + * so there's no need to update the bw_share on them. 302 + */ 303 + if (node->type != SCHED_NODE_TYPE_VPORTS_TC_TSAR) { 304 + esw_qos_update_sched_node_bw_share(node, divider, 305 + extack); 306 + } 389 307 390 308 if (list_empty(&node->children)) 391 309 continue; 392 310 393 311 esw_qos_normalize_min_rate(node->esw, node, extack); 394 312 } 313 + } 314 + 315 + static u32 esw_qos_calculate_tc_bw_divider(u32 *tc_bw) 316 + { 317 + u32 total = 0; 318 + int i; 319 + 320 + for (i = 0; i < DEVLINK_RATE_TCS_MAX; i++) 321 + total += tc_bw[i]; 322 + 323 + /* If total is zero, tc-bw config is disabled and we shouldn't reach 324 + * here. 325 + */ 326 + return WARN_ON(!total) ? 
1 : total; 395 327 } 396 328 397 329 static int esw_qos_set_node_min_rate(struct mlx5_esw_sched_node *node, ··· 458 350 tsar_ix); 459 351 } 460 352 461 - static int esw_qos_vport_create_sched_element(struct mlx5_esw_sched_node *vport_node, 462 - struct netlink_ext_ack *extack) 353 + static int 354 + esw_qos_vport_create_sched_element(struct mlx5_esw_sched_node *vport_node, 355 + struct netlink_ext_ack *extack) 463 356 { 464 357 u32 sched_ctx[MLX5_ST_SZ_DW(scheduling_context)] = {}; 465 358 struct mlx5_core_dev *dev = vport_node->esw->dev; 466 359 void *attr; 467 360 468 - if (!mlx5_qos_element_type_supported(dev, 469 - SCHEDULING_CONTEXT_ELEMENT_TYPE_VPORT, 470 - SCHEDULING_HIERARCHY_E_SWITCH)) 361 + if (!mlx5_qos_element_type_supported( 362 + dev, 363 + SCHEDULING_CONTEXT_ELEMENT_TYPE_VPORT, 364 + SCHEDULING_HIERARCHY_E_SWITCH)) 471 365 return -EOPNOTSUPP; 472 366 473 367 MLX5_SET(scheduling_context, sched_ctx, element_type, 474 368 SCHEDULING_CONTEXT_ELEMENT_TYPE_VPORT); 475 369 attr = MLX5_ADDR_OF(scheduling_context, sched_ctx, element_attributes); 476 370 MLX5_SET(vport_element, attr, vport_number, vport_node->vport->vport); 477 - MLX5_SET(scheduling_context, sched_ctx, parent_element_id, vport_node->parent->ix); 478 - MLX5_SET(scheduling_context, sched_ctx, max_average_bw, vport_node->max_rate); 371 + MLX5_SET(scheduling_context, sched_ctx, parent_element_id, 372 + vport_node->parent->ix); 373 + MLX5_SET(scheduling_context, sched_ctx, max_average_bw, 374 + vport_node->max_rate); 479 375 480 376 return esw_qos_node_create_sched_element(vport_node, sched_ctx, extack); 377 + } 378 + 379 + static int 380 + esw_qos_vport_tc_create_sched_element(struct mlx5_esw_sched_node *vport_tc_node, 381 + u32 rate_limit_elem_ix, 382 + struct netlink_ext_ack *extack) 383 + { 384 + u32 sched_ctx[MLX5_ST_SZ_DW(scheduling_context)] = {}; 385 + struct mlx5_core_dev *dev = vport_tc_node->esw->dev; 386 + void *attr; 387 + 388 + if (!mlx5_qos_element_type_supported( 389 + dev, 390 + 
SCHEDULING_CONTEXT_ELEMENT_TYPE_VPORT_TC, 391 + SCHEDULING_HIERARCHY_E_SWITCH)) 392 + return -EOPNOTSUPP; 393 + 394 + MLX5_SET(scheduling_context, sched_ctx, element_type, 395 + SCHEDULING_CONTEXT_ELEMENT_TYPE_VPORT_TC); 396 + attr = MLX5_ADDR_OF(scheduling_context, sched_ctx, element_attributes); 397 + MLX5_SET(vport_tc_element, attr, vport_number, 398 + vport_tc_node->vport->vport); 399 + MLX5_SET(vport_tc_element, attr, traffic_class, vport_tc_node->tc); 400 + MLX5_SET(scheduling_context, sched_ctx, max_bw_obj_id, 401 + rate_limit_elem_ix); 402 + MLX5_SET(scheduling_context, sched_ctx, parent_element_id, 403 + vport_tc_node->parent->ix); 404 + MLX5_SET(scheduling_context, sched_ctx, bw_share, 405 + vport_tc_node->bw_share); 406 + 407 + return esw_qos_node_create_sched_element(vport_tc_node, sched_ctx, 408 + extack); 481 409 } 482 410 483 411 static struct mlx5_esw_sched_node * ··· 532 388 node->parent = parent; 533 389 INIT_LIST_HEAD(&node->children); 534 390 esw_qos_node_attach_to_parent(node); 391 + if (!parent) { 392 + /* The caller is responsible for inserting the node into the 393 + * parent list if necessary. This function can also be used with 394 + * a NULL parent, which doesn't necessarily indicate that it 395 + * refers to the root scheduling element. 
396 + */ 397 + list_del_init(&node->entry); 398 + } 535 399 536 400 return node; 537 401 } ··· 554 402 { 555 403 esw_qos_node_destroy_sched_element(node, extack); 556 404 __esw_qos_free_node(node); 405 + } 406 + 407 + static int esw_qos_create_vports_tc_node(struct mlx5_esw_sched_node *parent, 408 + u8 tc, struct netlink_ext_ack *extack) 409 + { 410 + u32 tsar_ctx[MLX5_ST_SZ_DW(scheduling_context)] = {}; 411 + struct mlx5_core_dev *dev = parent->esw->dev; 412 + struct mlx5_esw_sched_node *vports_tc_node; 413 + void *attr; 414 + int err; 415 + 416 + if (!mlx5_qos_element_type_supported( 417 + dev, 418 + SCHEDULING_CONTEXT_ELEMENT_TYPE_TSAR, 419 + SCHEDULING_HIERARCHY_E_SWITCH) || 420 + !mlx5_qos_tsar_type_supported(dev, 421 + TSAR_ELEMENT_TSAR_TYPE_DWRR, 422 + SCHEDULING_HIERARCHY_E_SWITCH)) 423 + return -EOPNOTSUPP; 424 + 425 + vports_tc_node = __esw_qos_alloc_node(parent->esw, 0, 426 + SCHED_NODE_TYPE_VPORTS_TC_TSAR, 427 + parent); 428 + if (!vports_tc_node) { 429 + NL_SET_ERR_MSG_MOD(extack, "E-Switch alloc node failed"); 430 + esw_warn(dev, "Failed to alloc vports TC node (tc=%d)\n", tc); 431 + return -ENOMEM; 432 + } 433 + 434 + attr = MLX5_ADDR_OF(scheduling_context, tsar_ctx, element_attributes); 435 + MLX5_SET(tsar_element, attr, tsar_type, TSAR_ELEMENT_TSAR_TYPE_DWRR); 436 + MLX5_SET(tsar_element, attr, traffic_class, tc); 437 + MLX5_SET(scheduling_context, tsar_ctx, parent_element_id, parent->ix); 438 + MLX5_SET(scheduling_context, tsar_ctx, element_type, 439 + SCHEDULING_CONTEXT_ELEMENT_TYPE_TSAR); 440 + 441 + err = esw_qos_node_create_sched_element(vports_tc_node, tsar_ctx, 442 + extack); 443 + if (err) 444 + goto err_create_sched_element; 445 + 446 + vports_tc_node->tc = tc; 447 + 448 + return 0; 449 + 450 + err_create_sched_element: 451 + __esw_qos_free_node(vports_tc_node); 452 + return err; 453 + } 454 + 455 + static void 456 + esw_qos_tc_arbiter_get_bw_shares(struct mlx5_esw_sched_node *tc_arbiter_node, 457 + u32 *tc_bw) 458 + { 459 + struct 
mlx5_esw_sched_node *vports_tc_node; 460 + 461 + list_for_each_entry(vports_tc_node, &tc_arbiter_node->children, entry) 462 + tc_bw[vports_tc_node->tc] = vports_tc_node->bw_share; 463 + } 464 + 465 + static void 466 + esw_qos_set_tc_arbiter_bw_shares(struct mlx5_esw_sched_node *tc_arbiter_node, 467 + u32 *tc_bw, struct netlink_ext_ack *extack) 468 + { 469 + struct mlx5_eswitch *esw = tc_arbiter_node->esw; 470 + struct mlx5_esw_sched_node *vports_tc_node; 471 + u32 divider, fw_max_bw_share; 472 + 473 + fw_max_bw_share = MLX5_CAP_QOS(esw->dev, max_tsar_bw_share); 474 + divider = esw_qos_calculate_tc_bw_divider(tc_bw); 475 + list_for_each_entry(vports_tc_node, &tc_arbiter_node->children, entry) { 476 + u8 tc = vports_tc_node->tc; 477 + u32 bw_share; 478 + 479 + bw_share = tc_bw[tc] * fw_max_bw_share; 480 + bw_share = esw_qos_calc_bw_share(bw_share, divider, 481 + fw_max_bw_share); 482 + esw_qos_sched_elem_config(vports_tc_node, 0, bw_share, extack); 483 + } 484 + } 485 + 486 + static void 487 + esw_qos_destroy_vports_tc_nodes(struct mlx5_esw_sched_node *tc_arbiter_node, 488 + struct netlink_ext_ack *extack) 489 + { 490 + struct mlx5_esw_sched_node *vports_tc_node, *tmp; 491 + 492 + list_for_each_entry_safe(vports_tc_node, tmp, 493 + &tc_arbiter_node->children, entry) 494 + esw_qos_destroy_node(vports_tc_node, extack); 495 + } 496 + 497 + static int 498 + esw_qos_create_vports_tc_nodes(struct mlx5_esw_sched_node *tc_arbiter_node, 499 + struct netlink_ext_ack *extack) 500 + { 501 + struct mlx5_eswitch *esw = tc_arbiter_node->esw; 502 + int err, i, num_tcs = esw_qos_num_tcs(esw->dev); 503 + 504 + for (i = 0; i < num_tcs; i++) { 505 + err = esw_qos_create_vports_tc_node(tc_arbiter_node, i, extack); 506 + if (err) 507 + goto err_tc_node_create; 508 + } 509 + 510 + return 0; 511 + 512 + err_tc_node_create: 513 + esw_qos_destroy_vports_tc_nodes(tc_arbiter_node, NULL); 514 + return err; 515 + } 516 + 517 + static int esw_qos_create_tc_arbiter_sched_elem( 518 + struct 
mlx5_esw_sched_node *tc_arbiter_node, 519 + struct netlink_ext_ack *extack) 520 + { 521 + u32 tsar_ctx[MLX5_ST_SZ_DW(scheduling_context)] = {}; 522 + u32 tsar_parent_ix; 523 + void *attr; 524 + 525 + if (!mlx5_qos_tsar_type_supported(tc_arbiter_node->esw->dev, 526 + TSAR_ELEMENT_TSAR_TYPE_TC_ARB, 527 + SCHEDULING_HIERARCHY_E_SWITCH)) { 528 + NL_SET_ERR_MSG_MOD(extack, 529 + "E-Switch TC Arbiter scheduling element is not supported"); 530 + return -EOPNOTSUPP; 531 + } 532 + 533 + attr = MLX5_ADDR_OF(scheduling_context, tsar_ctx, element_attributes); 534 + MLX5_SET(tsar_element, attr, tsar_type, TSAR_ELEMENT_TSAR_TYPE_TC_ARB); 535 + tsar_parent_ix = tc_arbiter_node->parent ? tc_arbiter_node->parent->ix : 536 + tc_arbiter_node->esw->qos.root_tsar_ix; 537 + MLX5_SET(scheduling_context, tsar_ctx, parent_element_id, 538 + tsar_parent_ix); 539 + MLX5_SET(scheduling_context, tsar_ctx, element_type, 540 + SCHEDULING_CONTEXT_ELEMENT_TYPE_TSAR); 541 + MLX5_SET(scheduling_context, tsar_ctx, max_average_bw, 542 + tc_arbiter_node->max_rate); 543 + MLX5_SET(scheduling_context, tsar_ctx, bw_share, 544 + tc_arbiter_node->bw_share); 545 + 546 + return esw_qos_node_create_sched_element(tc_arbiter_node, tsar_ctx, 547 + extack); 557 548 } 558 549 559 550 static struct mlx5_esw_sched_node * ··· 721 426 goto err_alloc_node; 722 427 } 723 428 429 + list_add_tail(&node->entry, &esw->qos.domain->nodes); 724 430 esw_qos_normalize_min_rate(esw, NULL, extack); 725 431 trace_mlx5_esw_node_qos_create(esw->dev, node, node->ix); 726 432 ··· 763 467 { 764 468 struct mlx5_eswitch *esw = node->esw; 765 469 470 + if (node->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) 471 + esw_qos_destroy_vports_tc_nodes(node, extack); 472 + 766 473 trace_mlx5_esw_node_qos_destroy(esw->dev, node, node->ix); 767 474 esw_qos_destroy_node(node, extack); 768 475 esw_qos_normalize_min_rate(esw, NULL, extack); ··· 797 498 SCHED_NODE_TYPE_VPORTS_TSAR, 798 499 NULL)) 799 500 esw->qos.node0 = ERR_PTR(-ENOMEM); 501 + else 502 + 
list_add_tail(&esw->qos.node0->entry, 503 + &esw->qos.domain->nodes); 800 504 } 801 505 if (IS_ERR(esw->qos.node0)) { 802 506 err = PTR_ERR(esw->qos.node0); ··· 857 555 esw_qos_destroy(esw); 858 556 } 859 557 558 + static void 559 + esw_qos_tc_arbiter_scheduling_teardown(struct mlx5_esw_sched_node *node, 560 + struct netlink_ext_ack *extack) 561 + { 562 + /* Clean up all Vports TC nodes within the TC arbiter node. */ 563 + esw_qos_destroy_vports_tc_nodes(node, extack); 564 + /* Destroy the scheduling element for the TC arbiter node itself. */ 565 + esw_qos_node_destroy_sched_element(node, extack); 566 + } 567 + 568 + static int esw_qos_tc_arbiter_scheduling_setup(struct mlx5_esw_sched_node *node, 569 + struct netlink_ext_ack *extack) 570 + { 571 + u32 curr_ix = node->ix; 572 + int err; 573 + 574 + err = esw_qos_create_tc_arbiter_sched_elem(node, extack); 575 + if (err) 576 + return err; 577 + /* Initialize the vports TC nodes within created TC arbiter TSAR. */ 578 + err = esw_qos_create_vports_tc_nodes(node, extack); 579 + if (err) 580 + goto err_vports_tc_nodes; 581 + 582 + node->type = SCHED_NODE_TYPE_TC_ARBITER_TSAR; 583 + 584 + return 0; 585 + 586 + err_vports_tc_nodes: 587 + /* If initialization fails, clean up the scheduling element 588 + * for the TC arbiter node. 
589 + */ 590 + esw_qos_node_destroy_sched_element(node, NULL); 591 + node->ix = curr_ix; 592 + return err; 593 + } 594 + 595 + static int 596 + esw_qos_create_vport_tc_sched_node(struct mlx5_vport *vport, 597 + u32 rate_limit_elem_ix, 598 + struct mlx5_esw_sched_node *vports_tc_node, 599 + struct netlink_ext_ack *extack) 600 + { 601 + struct mlx5_esw_sched_node *vport_node = vport->qos.sched_node; 602 + struct mlx5_esw_sched_node *vport_tc_node; 603 + u8 tc = vports_tc_node->tc; 604 + int err; 605 + 606 + vport_tc_node = __esw_qos_alloc_node(vport_node->esw, 0, 607 + SCHED_NODE_TYPE_VPORT_TC, 608 + vports_tc_node); 609 + if (!vport_tc_node) 610 + return -ENOMEM; 611 + 612 + vport_tc_node->min_rate = vport_node->min_rate; 613 + vport_tc_node->tc = tc; 614 + vport_tc_node->vport = vport; 615 + err = esw_qos_vport_tc_create_sched_element(vport_tc_node, 616 + rate_limit_elem_ix, 617 + extack); 618 + if (err) 619 + goto err_out; 620 + 621 + vport->qos.sched_nodes[tc] = vport_tc_node; 622 + 623 + return 0; 624 + err_out: 625 + __esw_qos_free_node(vport_tc_node); 626 + return err; 627 + } 628 + 629 + static void 630 + esw_qos_destroy_vport_tc_sched_elements(struct mlx5_vport *vport, 631 + struct netlink_ext_ack *extack) 632 + { 633 + int i, num_tcs = esw_qos_num_tcs(vport->qos.sched_node->esw->dev); 634 + 635 + for (i = 0; i < num_tcs; i++) { 636 + if (vport->qos.sched_nodes[i]) { 637 + __esw_qos_destroy_node(vport->qos.sched_nodes[i], 638 + extack); 639 + } 640 + } 641 + 642 + kfree(vport->qos.sched_nodes); 643 + vport->qos.sched_nodes = NULL; 644 + } 645 + 646 + static int 647 + esw_qos_create_vport_tc_sched_elements(struct mlx5_vport *vport, 648 + enum sched_node_type type, 649 + struct netlink_ext_ack *extack) 650 + { 651 + struct mlx5_esw_sched_node *vport_node = vport->qos.sched_node; 652 + struct mlx5_esw_sched_node *tc_arbiter_node, *vports_tc_node; 653 + int err, num_tcs = esw_qos_num_tcs(vport_node->esw->dev); 654 + u32 rate_limit_elem_ix; 655 + 656 + 
vport->qos.sched_nodes = kcalloc(num_tcs, 657 + sizeof(struct mlx5_esw_sched_node *), 658 + GFP_KERNEL); 659 + if (!vport->qos.sched_nodes) { 660 + NL_SET_ERR_MSG_MOD(extack, 661 + "Allocating the vport TC scheduling elements failed."); 662 + return -ENOMEM; 663 + } 664 + 665 + rate_limit_elem_ix = type == SCHED_NODE_TYPE_RATE_LIMITER ? 666 + vport_node->ix : 0; 667 + tc_arbiter_node = type == SCHED_NODE_TYPE_RATE_LIMITER ? 668 + vport_node->parent : vport_node; 669 + list_for_each_entry(vports_tc_node, &tc_arbiter_node->children, entry) { 670 + err = esw_qos_create_vport_tc_sched_node(vport, 671 + rate_limit_elem_ix, 672 + vports_tc_node, 673 + extack); 674 + if (err) 675 + goto err_create_vport_tc; 676 + } 677 + 678 + return 0; 679 + 680 + err_create_vport_tc: 681 + esw_qos_destroy_vport_tc_sched_elements(vport, NULL); 682 + 683 + return err; 684 + } 685 + 686 + static int 687 + esw_qos_vport_tc_enable(struct mlx5_vport *vport, enum sched_node_type type, 688 + struct netlink_ext_ack *extack) 689 + { 690 + struct mlx5_esw_sched_node *vport_node = vport->qos.sched_node; 691 + int err, new_level, max_level; 692 + 693 + if (type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) { 694 + /* Increase the parent's level by 2 to account for both the 695 + * TC arbiter and the vports TC scheduling element. 
696 + */ 697 + new_level = vport_node->parent->level + 2; 698 + max_level = 1 << MLX5_CAP_QOS(vport_node->esw->dev, 699 + log_esw_max_sched_depth); 700 + if (new_level > max_level) { 701 + NL_SET_ERR_MSG_MOD(extack, 702 + "TC arbitration on leafs is not supported beyond max scheduling depth"); 703 + return -EOPNOTSUPP; 704 + } 705 + } 706 + 707 + esw_assert_qos_lock_held(vport->dev->priv.eswitch); 708 + 709 + if (type == SCHED_NODE_TYPE_RATE_LIMITER) 710 + err = esw_qos_create_rate_limit_element(vport_node, extack); 711 + else 712 + err = esw_qos_tc_arbiter_scheduling_setup(vport_node, extack); 713 + if (err) 714 + return err; 715 + 716 + /* Rate limiters impact multiple nodes not directly connected to them 717 + * and are not direct members of the QoS hierarchy. 718 + * Unlink it from the parent to reflect that. 719 + */ 720 + if (type == SCHED_NODE_TYPE_RATE_LIMITER) { 721 + list_del_init(&vport_node->entry); 722 + vport_node->level = 0; 723 + } 724 + 725 + err = esw_qos_create_vport_tc_sched_elements(vport, type, extack); 726 + if (err) 727 + goto err_sched_nodes; 728 + 729 + return 0; 730 + 731 + err_sched_nodes: 732 + if (type == SCHED_NODE_TYPE_RATE_LIMITER) { 733 + esw_qos_node_destroy_sched_element(vport_node, NULL); 734 + list_add_tail(&vport_node->entry, 735 + &vport_node->parent->children); 736 + vport_node->level = vport_node->parent->level + 1; 737 + } else { 738 + esw_qos_tc_arbiter_scheduling_teardown(vport_node, NULL); 739 + } 740 + return err; 741 + } 742 + 743 + static void esw_qos_vport_tc_disable(struct mlx5_vport *vport, 744 + struct netlink_ext_ack *extack) 745 + { 746 + struct mlx5_esw_sched_node *vport_node = vport->qos.sched_node; 747 + enum sched_node_type curr_type = vport_node->type; 748 + 749 + esw_qos_destroy_vport_tc_sched_elements(vport, extack); 750 + 751 + if (curr_type == SCHED_NODE_TYPE_RATE_LIMITER) 752 + esw_qos_node_destroy_sched_element(vport_node, extack); 753 + else 754 + esw_qos_tc_arbiter_scheduling_teardown(vport_node, 
extack); 755 + } 756 + 757 + static int esw_qos_set_vport_tcs_min_rate(struct mlx5_vport *vport, 758 + u32 min_rate, 759 + struct netlink_ext_ack *extack) 760 + { 761 + struct mlx5_esw_sched_node *vport_node = vport->qos.sched_node; 762 + int err, i, num_tcs = esw_qos_num_tcs(vport_node->esw->dev); 763 + 764 + for (i = 0; i < num_tcs; i++) { 765 + err = esw_qos_set_node_min_rate(vport->qos.sched_nodes[i], 766 + min_rate, extack); 767 + if (err) 768 + goto err_out; 769 + } 770 + vport_node->min_rate = min_rate; 771 + 772 + return 0; 773 + err_out: 774 + for (--i; i >= 0; i--) { 775 + esw_qos_set_node_min_rate(vport->qos.sched_nodes[i], 776 + vport_node->min_rate, extack); 777 + } 778 + return err; 779 + } 780 + 860 781 static void esw_qos_vport_disable(struct mlx5_vport *vport, struct netlink_ext_ack *extack) 861 782 { 862 783 struct mlx5_esw_sched_node *vport_node = vport->qos.sched_node; 863 784 struct mlx5_esw_sched_node *parent = vport_node->parent; 785 + enum sched_node_type curr_type = vport_node->type; 864 786 865 - esw_qos_node_destroy_sched_element(vport_node, extack); 787 + if (curr_type == SCHED_NODE_TYPE_VPORT) 788 + esw_qos_node_destroy_sched_element(vport_node, extack); 789 + else 790 + esw_qos_vport_tc_disable(vport, extack); 866 791 867 792 vport_node->bw_share = 0; 868 793 list_del_init(&vport_node->entry); ··· 1098 569 trace_mlx5_esw_vport_qos_destroy(vport_node->esw->dev, vport); 1099 570 } 1100 571 1101 - static int esw_qos_vport_enable(struct mlx5_vport *vport, struct mlx5_esw_sched_node *parent, 572 + static int esw_qos_vport_enable(struct mlx5_vport *vport, 573 + enum sched_node_type type, 574 + struct mlx5_esw_sched_node *parent, 1102 575 struct netlink_ext_ack *extack) 1103 576 { 1104 577 int err; ··· 1108 577 esw_assert_qos_lock_held(vport->dev->priv.eswitch); 1109 578 1110 579 esw_qos_node_set_parent(vport->qos.sched_node, parent); 1111 - err = esw_qos_vport_create_sched_element(vport->qos.sched_node, extack); 580 + if (type == 
SCHED_NODE_TYPE_VPORT) { 581 + err = esw_qos_vport_create_sched_element(vport->qos.sched_node, 582 + extack); 583 + } else { 584 + err = esw_qos_vport_tc_enable(vport, type, extack); 585 + } 1112 586 if (err) 1113 587 return err; 1114 588 589 + vport->qos.sched_node->type = type; 1115 590 esw_qos_normalize_min_rate(parent->esw, parent, extack); 1116 591 trace_mlx5_esw_vport_qos_create(vport->dev, vport, 1117 592 vport->qos.sched_node->max_rate, ··· 1148 611 sched_node->min_rate = min_rate; 1149 612 sched_node->vport = vport; 1150 613 vport->qos.sched_node = sched_node; 1151 - err = esw_qos_vport_enable(vport, parent, extack); 614 + err = esw_qos_vport_enable(vport, type, parent, extack); 1152 615 if (err) { 1153 - __esw_qos_free_node(sched_node); 1154 616 esw_qos_put(esw); 1155 617 vport->qos.sched_node = NULL; 1156 618 } ··· 1202 666 if (!vport_node) 1203 667 return mlx5_esw_qos_vport_enable(vport, SCHED_NODE_TYPE_VPORT, NULL, 0, min_rate, 1204 668 extack); 669 + else if (vport_node->type == SCHED_NODE_TYPE_RATE_LIMITER) 670 + return esw_qos_set_vport_tcs_min_rate(vport, min_rate, extack); 1205 671 else 1206 672 return esw_qos_set_node_min_rate(vport_node, min_rate, extack); 1207 673 } ··· 1236 698 return enabled; 1237 699 } 1238 700 701 + static int esw_qos_vport_tc_check_type(enum sched_node_type curr_type, 702 + enum sched_node_type new_type, 703 + struct netlink_ext_ack *extack) 704 + { 705 + if (curr_type == SCHED_NODE_TYPE_TC_ARBITER_TSAR && 706 + new_type == SCHED_NODE_TYPE_RATE_LIMITER) { 707 + NL_SET_ERR_MSG_MOD(extack, 708 + "Cannot switch from vport-level TC arbitration to node-level TC arbitration"); 709 + return -EOPNOTSUPP; 710 + } 711 + 712 + if (curr_type == SCHED_NODE_TYPE_RATE_LIMITER && 713 + new_type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) { 714 + NL_SET_ERR_MSG_MOD(extack, 715 + "Cannot switch from node-level TC arbitration to vport-level TC arbitration"); 716 + return -EOPNOTSUPP; 717 + } 718 + 719 + return 0; 720 + } 721 + 722 + static int 
esw_qos_vport_update(struct mlx5_vport *vport, 723 + enum sched_node_type type, 724 + struct mlx5_esw_sched_node *parent, 725 + struct netlink_ext_ack *extack) 726 + { 727 + struct mlx5_esw_sched_node *curr_parent = vport->qos.sched_node->parent; 728 + enum sched_node_type curr_type = vport->qos.sched_node->type; 729 + u32 curr_tc_bw[DEVLINK_RATE_TCS_MAX] = {0}; 730 + int err; 731 + 732 + esw_assert_qos_lock_held(vport->dev->priv.eswitch); 733 + parent = parent ?: curr_parent; 734 + if (curr_type == type && curr_parent == parent) 735 + return 0; 736 + 737 + err = esw_qos_vport_tc_check_type(curr_type, type, extack); 738 + if (err) 739 + return err; 740 + 741 + if (curr_type == SCHED_NODE_TYPE_TC_ARBITER_TSAR && curr_type == type) { 742 + esw_qos_tc_arbiter_get_bw_shares(vport->qos.sched_node, 743 + curr_tc_bw); 744 + } 745 + 746 + esw_qos_vport_disable(vport, extack); 747 + 748 + err = esw_qos_vport_enable(vport, type, parent, extack); 749 + if (err) { 750 + esw_qos_vport_enable(vport, curr_type, curr_parent, NULL); 751 + extack = NULL; 752 + } 753 + 754 + if (curr_type == SCHED_NODE_TYPE_TC_ARBITER_TSAR && curr_type == type) { 755 + esw_qos_set_tc_arbiter_bw_shares(vport->qos.sched_node, 756 + curr_tc_bw, extack); 757 + } 758 + 759 + return err; 760 + } 761 + 1239 762 static int esw_qos_vport_update_parent(struct mlx5_vport *vport, struct mlx5_esw_sched_node *parent, 1240 763 struct netlink_ext_ack *extack) 1241 764 { 1242 765 struct mlx5_eswitch *esw = vport->dev->priv.eswitch; 1243 766 struct mlx5_esw_sched_node *curr_parent; 1244 - int err; 767 + enum sched_node_type type; 1245 768 1246 769 esw_assert_qos_lock_held(esw); 1247 770 curr_parent = vport->qos.sched_node->parent; ··· 1310 711 if (curr_parent == parent) 1311 712 return 0; 1312 713 1313 - esw_qos_vport_disable(vport, extack); 714 + /* Set vport QoS type based on parent node type if different from 715 + * default QoS; otherwise, use the vport's current QoS type. 
716 + */ 717 + if (parent->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) 718 + type = SCHED_NODE_TYPE_RATE_LIMITER; 719 + else if (curr_parent->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) 720 + type = SCHED_NODE_TYPE_VPORT; 721 + else 722 + type = vport->qos.sched_node->type; 1314 723 1315 - err = esw_qos_vport_enable(vport, parent, extack); 724 + return esw_qos_vport_update(vport, type, parent, extack); 725 + } 726 + 727 + static void 728 + esw_qos_switch_vport_tcs_to_vport(struct mlx5_esw_sched_node *tc_arbiter_node, 729 + struct mlx5_esw_sched_node *node, 730 + struct netlink_ext_ack *extack) 731 + { 732 + struct mlx5_esw_sched_node *vports_tc_node, *vport_tc_node, *tmp; 733 + 734 + vports_tc_node = list_first_entry(&tc_arbiter_node->children, 735 + struct mlx5_esw_sched_node, 736 + entry); 737 + 738 + list_for_each_entry_safe(vport_tc_node, tmp, &vports_tc_node->children, 739 + entry) 740 + esw_qos_vport_update_parent(vport_tc_node->vport, node, extack); 741 + } 742 + 743 + static int esw_qos_switch_tc_arbiter_node_to_vports( 744 + struct mlx5_esw_sched_node *tc_arbiter_node, 745 + struct mlx5_esw_sched_node *node, 746 + struct netlink_ext_ack *extack) 747 + { 748 + u32 parent_tsar_ix = node->parent ? 749 + node->parent->ix : node->esw->qos.root_tsar_ix; 750 + int err; 751 + 752 + err = esw_qos_create_node_sched_elem(node->esw->dev, parent_tsar_ix, 753 + node->max_rate, node->bw_share, 754 + &node->ix); 1316 755 if (err) { 1317 - if (esw_qos_vport_enable(vport, curr_parent, NULL)) 1318 - esw_warn(parent->esw->dev, "vport restore QoS failed (vport=%d)\n", 1319 - vport->vport); 756 + NL_SET_ERR_MSG_MOD(extack, 757 + "Failed to create scheduling element for vports node when disabliing vports TC QoS"); 758 + return err; 1320 759 } 1321 760 761 + node->type = SCHED_NODE_TYPE_VPORTS_TSAR; 762 + 763 + /* Disable TC QoS for vports in the arbiter node. 
*/ 764 + esw_qos_switch_vport_tcs_to_vport(tc_arbiter_node, node, extack); 765 + 766 + return 0; 767 + } 768 + 769 + static int esw_qos_switch_vports_node_to_tc_arbiter( 770 + struct mlx5_esw_sched_node *node, 771 + struct mlx5_esw_sched_node *tc_arbiter_node, 772 + struct netlink_ext_ack *extack) 773 + { 774 + struct mlx5_esw_sched_node *vport_node, *tmp; 775 + struct mlx5_vport *vport; 776 + int err; 777 + 778 + /* Enable TC QoS for each vport in the node. */ 779 + list_for_each_entry_safe(vport_node, tmp, &node->children, entry) { 780 + vport = vport_node->vport; 781 + err = esw_qos_vport_update_parent(vport, tc_arbiter_node, 782 + extack); 783 + if (err) 784 + goto err_out; 785 + } 786 + 787 + /* Destroy the current vports node TSAR. */ 788 + err = mlx5_destroy_scheduling_element_cmd(node->esw->dev, 789 + SCHEDULING_HIERARCHY_E_SWITCH, 790 + node->ix); 791 + if (err) 792 + goto err_out; 793 + 794 + return 0; 795 + err_out: 796 + /* Restore vports back into the node if an error occurs. */ 797 + esw_qos_switch_vport_tcs_to_vport(tc_arbiter_node, node, NULL); 798 + 799 + return err; 800 + } 801 + 802 + static struct mlx5_esw_sched_node * 803 + esw_qos_move_node(struct mlx5_esw_sched_node *curr_node) 804 + { 805 + struct mlx5_esw_sched_node *new_node; 806 + 807 + new_node = __esw_qos_alloc_node(curr_node->esw, curr_node->ix, 808 + curr_node->type, NULL); 809 + if (!IS_ERR(new_node)) 810 + esw_qos_nodes_set_parent(&curr_node->children, new_node); 811 + 812 + return new_node; 813 + } 814 + 815 + static int esw_qos_node_disable_tc_arbitration(struct mlx5_esw_sched_node *node, 816 + struct netlink_ext_ack *extack) 817 + { 818 + struct mlx5_esw_sched_node *curr_node; 819 + int err; 820 + 821 + if (node->type != SCHED_NODE_TYPE_TC_ARBITER_TSAR) 822 + return 0; 823 + 824 + /* Allocate a new rate node to hold the current state, which will allow 825 + * for restoring the vports back to this node after disabling TC 826 + * arbitration. 
827 + */ 828 + curr_node = esw_qos_move_node(node); 829 + if (IS_ERR(curr_node)) { 830 + NL_SET_ERR_MSG_MOD(extack, "Failed setting up vports node"); 831 + return PTR_ERR(curr_node); 832 + } 833 + 834 + /* Disable TC QoS for all vports, and assign them back to the node. */ 835 + err = esw_qos_switch_tc_arbiter_node_to_vports(curr_node, node, extack); 836 + if (err) 837 + goto err_out; 838 + 839 + /* Clean up the TC arbiter node after disabling TC QoS for vports. */ 840 + esw_qos_tc_arbiter_scheduling_teardown(curr_node, extack); 841 + goto out; 842 + err_out: 843 + esw_qos_nodes_set_parent(&curr_node->children, node); 844 + out: 845 + __esw_qos_free_node(curr_node); 846 + return err; 847 + } 848 + 849 + static int esw_qos_node_enable_tc_arbitration(struct mlx5_esw_sched_node *node, 850 + struct netlink_ext_ack *extack) 851 + { 852 + struct mlx5_esw_sched_node *curr_node, *child; 853 + int err, new_level, max_level; 854 + 855 + if (node->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) 856 + return 0; 857 + 858 + /* Increase the hierarchy level by one to account for the additional 859 + * vports TC scheduling node, and verify that the new level does not 860 + * exceed the maximum allowed depth. 861 + */ 862 + new_level = node->level + 1; 863 + max_level = 1 << MLX5_CAP_QOS(node->esw->dev, log_esw_max_sched_depth); 864 + if (new_level > max_level) { 865 + NL_SET_ERR_MSG_MOD(extack, 866 + "TC arbitration on nodes is not supported beyond max scheduling depth"); 867 + return -EOPNOTSUPP; 868 + } 869 + 870 + /* Ensure the node does not contain non-leaf children before assigning 871 + * TC bandwidth. 
872 + */ 873 + if (!list_empty(&node->children)) { 874 + list_for_each_entry(child, &node->children, entry) { 875 + if (!child->vport) { 876 + NL_SET_ERR_MSG_MOD(extack, 877 + "Cannot configure TC bandwidth on a node with non-leaf children"); 878 + return -EOPNOTSUPP; 879 + } 880 + } 881 + } 882 + 883 + /* Allocate a new node that will store the information of the current 884 + * node. This will be used later to restore the node if necessary. 885 + */ 886 + curr_node = esw_qos_move_node(node); 887 + if (IS_ERR(curr_node)) { 888 + NL_SET_ERR_MSG_MOD(extack, "Failed setting up node TC QoS"); 889 + return PTR_ERR(curr_node); 890 + } 891 + 892 + /* Initialize the TC arbiter node for QoS management. 893 + * This step prepares the node for handling Traffic Class arbitration. 894 + */ 895 + err = esw_qos_tc_arbiter_scheduling_setup(node, extack); 896 + if (err) 897 + goto err_setup; 898 + 899 + /* Enable TC QoS for each vport within the current node. */ 900 + err = esw_qos_switch_vports_node_to_tc_arbiter(curr_node, node, extack); 901 + if (err) 902 + goto err_switch_vports; 903 + goto out; 904 + 905 + err_switch_vports: 906 + esw_qos_tc_arbiter_scheduling_teardown(node, NULL); 907 + node->ix = curr_node->ix; 908 + node->type = curr_node->type; 909 + err_setup: 910 + esw_qos_nodes_set_parent(&curr_node->children, node); 911 + out: 912 + __esw_qos_free_node(curr_node); 1322 913 return err; 1323 914 } 1324 915 ··· 1637 848 return 0; 1638 849 } 1639 850 851 + static bool esw_qos_validate_unsupported_tc_bw(struct mlx5_eswitch *esw, 852 + u32 *tc_bw) 853 + { 854 + int i, num_tcs = esw_qos_num_tcs(esw->dev); 855 + 856 + for (i = num_tcs; i < DEVLINK_RATE_TCS_MAX; i++) { 857 + if (tc_bw[i]) 858 + return false; 859 + } 860 + 861 + return true; 862 + } 863 + 864 + static bool esw_qos_vport_validate_unsupported_tc_bw(struct mlx5_vport *vport, 865 + u32 *tc_bw) 866 + { 867 + struct mlx5_eswitch *esw = vport->qos.sched_node ? 
868 + vport->qos.sched_node->parent->esw : 869 + vport->dev->priv.eswitch; 870 + 871 + return esw_qos_validate_unsupported_tc_bw(esw, tc_bw); 872 + } 873 + 874 + static bool esw_qos_tc_bw_disabled(u32 *tc_bw) 875 + { 876 + int i; 877 + 878 + for (i = 0; i < DEVLINK_RATE_TCS_MAX; i++) { 879 + if (tc_bw[i]) 880 + return false; 881 + } 882 + 883 + return true; 884 + } 885 + 1640 886 int mlx5_esw_qos_init(struct mlx5_eswitch *esw) 1641 887 { 1642 888 if (esw->qos.domain) ··· 1726 902 1727 903 esw_qos_lock(esw); 1728 904 err = mlx5_esw_qos_set_vport_max_rate(vport, tx_max, extack); 905 + esw_qos_unlock(esw); 906 + return err; 907 + } 908 + 909 + int mlx5_esw_devlink_rate_leaf_tc_bw_set(struct devlink_rate *rate_leaf, 910 + void *priv, 911 + u32 *tc_bw, 912 + struct netlink_ext_ack *extack) 913 + { 914 + struct mlx5_esw_sched_node *vport_node; 915 + struct mlx5_vport *vport = priv; 916 + struct mlx5_eswitch *esw; 917 + bool disable; 918 + int err = 0; 919 + 920 + esw = vport->dev->priv.eswitch; 921 + if (!mlx5_esw_allowed(esw)) 922 + return -EPERM; 923 + 924 + disable = esw_qos_tc_bw_disabled(tc_bw); 925 + esw_qos_lock(esw); 926 + 927 + if (!esw_qos_vport_validate_unsupported_tc_bw(vport, tc_bw)) { 928 + NL_SET_ERR_MSG_MOD(extack, 929 + "E-Switch traffic classes number is not supported"); 930 + err = -EOPNOTSUPP; 931 + goto unlock; 932 + } 933 + 934 + vport_node = vport->qos.sched_node; 935 + if (disable && !vport_node) 936 + goto unlock; 937 + 938 + if (disable) { 939 + if (vport_node->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) 940 + err = esw_qos_vport_update(vport, SCHED_NODE_TYPE_VPORT, 941 + NULL, extack); 942 + goto unlock; 943 + } 944 + 945 + if (!vport_node) { 946 + err = mlx5_esw_qos_vport_enable(vport, 947 + SCHED_NODE_TYPE_TC_ARBITER_TSAR, 948 + NULL, 0, 0, extack); 949 + vport_node = vport->qos.sched_node; 950 + } else { 951 + err = esw_qos_vport_update(vport, 952 + SCHED_NODE_TYPE_TC_ARBITER_TSAR, 953 + NULL, extack); 954 + } 955 + if (!err) 956 + 
esw_qos_set_tc_arbiter_bw_shares(vport_node, tc_bw, extack); 957 + unlock: 958 + esw_qos_unlock(esw); 959 + return err; 960 + } 961 + 962 + int mlx5_esw_devlink_rate_node_tc_bw_set(struct devlink_rate *rate_node, 963 + void *priv, 964 + u32 *tc_bw, 965 + struct netlink_ext_ack *extack) 966 + { 967 + struct mlx5_esw_sched_node *node = priv; 968 + struct mlx5_eswitch *esw = node->esw; 969 + bool disable; 970 + int err; 971 + 972 + if (!esw_qos_validate_unsupported_tc_bw(esw, tc_bw)) { 973 + NL_SET_ERR_MSG_MOD(extack, 974 + "E-Switch traffic classes number is not supported"); 975 + return -EOPNOTSUPP; 976 + } 977 + 978 + disable = esw_qos_tc_bw_disabled(tc_bw); 979 + esw_qos_lock(esw); 980 + if (disable) { 981 + err = esw_qos_node_disable_tc_arbitration(node, extack); 982 + goto unlock; 983 + } 984 + 985 + err = esw_qos_node_enable_tc_arbitration(node, extack); 986 + if (!err) 987 + esw_qos_set_tc_arbiter_bw_shares(node, tc_bw, extack); 988 + unlock: 1729 989 esw_qos_unlock(esw); 1730 990 return err; 1731 991 } ··· 1904 996 } 1905 997 1906 998 esw_qos_lock(esw); 1907 - if (!vport->qos.sched_node && parent) 1908 - err = mlx5_esw_qos_vport_enable(vport, SCHED_NODE_TYPE_VPORT, parent, 0, 0, extack); 1909 - else if (vport->qos.sched_node) 999 + if (!vport->qos.sched_node && parent) { 1000 + enum sched_node_type type; 1001 + 1002 + type = parent->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR ? 
1003 + SCHED_NODE_TYPE_RATE_LIMITER : SCHED_NODE_TYPE_VPORT; 1004 + err = mlx5_esw_qos_vport_enable(vport, type, parent, 0, 0, 1005 + extack); 1006 + } else if (vport->qos.sched_node) { 1910 1007 err = esw_qos_vport_update_parent(vport, parent, extack); 1008 + } 1911 1009 esw_qos_unlock(esw); 1912 1010 return err; 1913 1011 } ··· 1933 1019 return mlx5_esw_qos_vport_update_parent(vport, node, extack); 1934 1020 } 1935 1021 1022 + static bool esw_qos_is_node_empty(struct mlx5_esw_sched_node *node) 1023 + { 1024 + if (list_empty(&node->children)) 1025 + return true; 1026 + 1027 + if (node->type != SCHED_NODE_TYPE_TC_ARBITER_TSAR) 1028 + return false; 1029 + 1030 + node = list_first_entry(&node->children, struct mlx5_esw_sched_node, 1031 + entry); 1032 + 1033 + return esw_qos_is_node_empty(node); 1034 + } 1035 + 1936 1036 static int 1937 1037 mlx5_esw_qos_node_validate_set_parent(struct mlx5_esw_sched_node *node, 1938 1038 struct mlx5_esw_sched_node *parent, ··· 1960 1032 return -EOPNOTSUPP; 1961 1033 } 1962 1034 1963 - if (!list_empty(&node->children)) { 1035 + if (!esw_qos_is_node_empty(node)) { 1964 1036 NL_SET_ERR_MSG_MOD(extack, 1965 1037 "Cannot reassign a node that contains rate objects"); 1966 1038 return -EOPNOTSUPP; 1967 1039 } 1968 1040 1041 + if (parent && parent->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) { 1042 + NL_SET_ERR_MSG_MOD(extack, 1043 + "Cannot attach a node to a parent with TC bandwidth configured"); 1044 + return -EOPNOTSUPP; 1045 + } 1046 + 1969 1047 new_level = parent ? parent->level + 1 : 2; 1048 + if (node->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) { 1049 + /* Increase by one to account for the vports TC scheduling 1050 + * element. 
1051 + */ 1052 + new_level += 1; 1053 + } 1054 + 1970 1055 max_level = 1 << MLX5_CAP_QOS(node->esw->dev, log_esw_max_sched_depth); 1971 1056 if (new_level > max_level) { 1972 1057 NL_SET_ERR_MSG_MOD(extack, ··· 1988 1047 } 1989 1048 1990 1049 return 0; 1050 + } 1051 + 1052 + static int 1053 + esw_qos_tc_arbiter_node_update_parent(struct mlx5_esw_sched_node *node, 1054 + struct mlx5_esw_sched_node *parent, 1055 + struct netlink_ext_ack *extack) 1056 + { 1057 + struct mlx5_esw_sched_node *curr_parent = node->parent; 1058 + u32 curr_tc_bw[DEVLINK_RATE_TCS_MAX] = {0}; 1059 + struct mlx5_eswitch *esw = node->esw; 1060 + int err; 1061 + 1062 + esw_qos_tc_arbiter_get_bw_shares(node, curr_tc_bw); 1063 + esw_qos_tc_arbiter_scheduling_teardown(node, extack); 1064 + esw_qos_node_set_parent(node, parent); 1065 + err = esw_qos_tc_arbiter_scheduling_setup(node, extack); 1066 + if (err) { 1067 + esw_qos_node_set_parent(node, curr_parent); 1068 + if (esw_qos_tc_arbiter_scheduling_setup(node, extack)) { 1069 + esw_warn(esw->dev, "Node restore QoS failed\n"); 1070 + return err; 1071 + } 1072 + } 1073 + esw_qos_set_tc_arbiter_bw_shares(node, curr_tc_bw, extack); 1074 + 1075 + return err; 1991 1076 } 1992 1077 1993 1078 static int esw_qos_vports_node_update_parent(struct mlx5_esw_sched_node *node, ··· 2061 1094 2062 1095 esw_qos_lock(esw); 2063 1096 curr_parent = node->parent; 2064 - err = esw_qos_vports_node_update_parent(node, parent, extack); 1097 + if (node->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) { 1098 + err = esw_qos_tc_arbiter_node_update_parent(node, parent, 1099 + extack); 1100 + } else { 1101 + err = esw_qos_vports_node_update_parent(node, parent, extack); 1102 + } 1103 + 2065 1104 if (err) 2066 1105 goto out; 2067 1106
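Note on the rollback in esw_qos_set_vport_tcs_min_rate() earlier in this file's diff: the function applies the new min_rate to every per-TC node and, on the first failure, walks backwards over the nodes already updated and restores the previous value. Below is a minimal user-space sketch of that pattern. The struct tc_node type, its limit field and the set_node_min_rate() helper are hypothetical stand-ins for the driver's scheduling nodes and firmware calls, not the mlx5 API.

```c
#include <errno.h>
#include <stdint.h>

#define NUM_TCS 8

/* Hypothetical per-TC node; 'limit' only exists to make the setter fail. */
struct tc_node {
	uint32_t min_rate;
	uint32_t limit;
};

/* Stand-in for a per-node update that may fail. */
static int set_node_min_rate(struct tc_node *node, uint32_t rate)
{
	if (rate > node->limit)
		return -EINVAL;
	node->min_rate = rate;
	return 0;
}

/*
 * Apply new_rate to all nodes. On failure, restore the previous rate
 * (*cur_rate) on the nodes that were already changed and return the error.
 */
static int set_all_min_rate(struct tc_node nodes[NUM_TCS], uint32_t *cur_rate,
			    uint32_t new_rate)
{
	int err, i;

	for (i = 0; i < NUM_TCS; i++) {
		err = set_node_min_rate(&nodes[i], new_rate);
		if (err)
			goto err_out;
	}
	*cur_rate = new_rate;

	return 0;

err_out:
	/* Node i failed and was not modified, so start undoing at i - 1. */
	for (--i; i >= 0; i--)
		set_node_min_rate(&nodes[i], *cur_rate);

	return err;
}
```

As in the driver code, the restore calls are best effort: their return values are ignored, on the assumption that a value which was accepted before is accepted again.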
+8
drivers/net/ethernet/mellanox/mlx5/core/esw/qos.h
···
 					    u64 tx_share, struct netlink_ext_ack *extack);
 int mlx5_esw_devlink_rate_leaf_tx_max_set(struct devlink_rate *rate_leaf, void *priv,
 					  u64 tx_max, struct netlink_ext_ack *extack);
+int mlx5_esw_devlink_rate_leaf_tc_bw_set(struct devlink_rate *rate_node,
+					 void *priv,
+					 u32 *tc_bw,
+					 struct netlink_ext_ack *extack);
+int mlx5_esw_devlink_rate_node_tc_bw_set(struct devlink_rate *rate_node,
+					 void *priv,
+					 u32 *tc_bw,
+					 struct netlink_ext_ack *extack);
 int mlx5_esw_devlink_rate_node_tx_share_set(struct devlink_rate *rate_node, void *priv,
 					    u64 tx_share, struct netlink_ext_ack *extack);
 int mlx5_esw_devlink_rate_node_tx_max_set(struct devlink_rate *rate_node, void *priv,
+12 -2
drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
···

 	struct mlx5_vport_info info;

-	/* Protected with the E-Switch qos domain lock. */
+	/* Protected with the E-Switch qos domain lock. The Vport QoS can
+	 * either be disabled (sched_node is NULL) or in one of three states:
+	 * 1. Regular QoS (sched_node is a vport node).
+	 * 2. TC QoS enabled on the vport (sched_node is a TC arbiter).
+	 * 3. TC QoS enabled on the vport's parent node
+	 *    (sched_node is a rate limit node).
+	 * When TC is enabled in either mode, the vport owns vport TC scheduling
+	 * nodes.
+	 */
 	struct {
-		/* Vport scheduling element node. */
+		/* Vport scheduling node. */
 		struct mlx5_esw_sched_node *sched_node;
+		/* Array of vport traffic class scheduling nodes. */
+		struct mlx5_esw_sched_node **sched_nodes;
 	} qos;

 	u16 vport;
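Note on the vport QoS states documented in the comment above: the state is derived from whether sched_node is set and from its type, and esw_qos_vport_update_parent() in qos.c picks the new type from the old and new parent. The sketch below restates both rules in user-space C. The SCHED_NODE_TYPE_* names are taken from the diff; struct node, enum vport_qos_state and both helper functions are hypothetical simplifications, and the sketch assumes both parents are non-NULL.

```c
#include <stddef.h>

/* Node types that appear in the diff (subset). */
enum sched_node_type {
	SCHED_NODE_TYPE_VPORTS_TSAR,
	SCHED_NODE_TYPE_VPORT,
	SCHED_NODE_TYPE_TC_ARBITER_TSAR,
	SCHED_NODE_TYPE_RATE_LIMITER,
};

/* Hypothetical, reduced scheduling node: only type and parent. */
struct node {
	enum sched_node_type type;
	struct node *parent;
};

/* Hypothetical names for the states listed in the eswitch.h comment. */
enum vport_qos_state {
	VPORT_QOS_DISABLED,	/* sched_node is NULL */
	VPORT_QOS_REGULAR,	/* sched_node is a vport node */
	VPORT_QOS_TC_ON_VPORT,	/* sched_node is a TC arbiter */
	VPORT_QOS_TC_ON_PARENT,	/* sched_node is a rate limiter */
	VPORT_QOS_INVALID,
};

static enum vport_qos_state vport_qos_state(const struct node *sched_node)
{
	if (!sched_node)
		return VPORT_QOS_DISABLED;

	switch (sched_node->type) {
	case SCHED_NODE_TYPE_VPORT:
		return VPORT_QOS_REGULAR;
	case SCHED_NODE_TYPE_TC_ARBITER_TSAR:
		return VPORT_QOS_TC_ON_VPORT;
	case SCHED_NODE_TYPE_RATE_LIMITER:
		return VPORT_QOS_TC_ON_PARENT;
	default:
		return VPORT_QOS_INVALID;
	}
}

/*
 * Type a vport node takes when it moves under new_parent: entering a TC
 * arbiter makes it a rate limiter, leaving one makes it a plain vport
 * node, and any other move keeps the current type.
 */
static enum sched_node_type type_after_reparent(const struct node *vport_node,
						const struct node *new_parent)
{
	if (new_parent->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR)
		return SCHED_NODE_TYPE_RATE_LIMITER;
	if (vport_node->parent->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR)
		return SCHED_NODE_TYPE_VPORT;
	return vport_node->type;
}
```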
+43
drivers/net/netdevsim/dev.c
···
 	.owner = THIS_MODULE,
 };

+static void nsim_dev_tc_bw_debugfs_init(struct dentry *ddir, u32 *tc_bw)
+{
+	int i;
+
+	for (i = 0; i < DEVLINK_RATE_TCS_MAX; i++) {
+		char name[16];
+
+		snprintf(name, sizeof(name), "tc%d_bw", i);
+		debugfs_create_u32(name, 0400, ddir, &tc_bw[i]);
+	}
+}
 static int nsim_dev_port_debugfs_init(struct nsim_dev *nsim_dev,
 				      struct nsim_dev_port *nsim_dev_port)
 {
···
 				    nsim_dev_port->ddir,
 				    &nsim_dev_port->parent_name,
 				    &nsim_dev_rate_parent_fops);
+		nsim_dev_tc_bw_debugfs_init(nsim_dev_port->ddir,
+					    nsim_dev_port->tc_bw);
 	}
 	debugfs_create_symlink("dev", nsim_dev_port->ddir, dev_link_name);

···
 	return 0;
 }

+static int nsim_leaf_tc_bw_set(struct devlink_rate *devlink_rate,
+			       void *priv, u32 *tc_bw,
+			       struct netlink_ext_ack *extack)
+{
+	struct nsim_dev_port *nsim_dev_port = priv;
+	int i;
+
+	for (i = 0; i < DEVLINK_RATE_TCS_MAX; i++)
+		nsim_dev_port->tc_bw[i] = tc_bw[i];
+
+	return 0;
+}
+
 static int nsim_leaf_tx_share_set(struct devlink_rate *devlink_rate, void *priv,
 				  u64 tx_share, struct netlink_ext_ack *extack)
 {
···
 	char *parent_name;
 	u16 tx_share;
 	u16 tx_max;
+	u32 tc_bw[DEVLINK_RATE_TCS_MAX];
 };
+
+static int nsim_node_tc_bw_set(struct devlink_rate *devlink_rate, void *priv,
+			       u32 *tc_bw, struct netlink_ext_ack *extack)
+{
+	struct nsim_rate_node *nsim_node = priv;
+	int i;
+
+	for (i = 0; i < DEVLINK_RATE_TCS_MAX; i++)
+		nsim_node->tc_bw[i] = tc_bw[i];
+
+	return 0;
+}

 static int nsim_node_tx_share_set(struct devlink_rate *devlink_rate, void *priv,
 				  u64 tx_share, struct netlink_ext_ack *extack)
···
 						     nsim_node->ddir,
 						     &nsim_node->parent_name,
 						     &nsim_dev_rate_parent_fops);
+
+	nsim_dev_tc_bw_debugfs_init(nsim_node->ddir, nsim_node->tc_bw);

 	*priv = nsim_node;
 	return 0;
···
 	.trap_policer_counter_get = nsim_dev_devlink_trap_policer_counter_get,
 	.rate_leaf_tx_share_set = nsim_leaf_tx_share_set,
 	.rate_leaf_tx_max_set = nsim_leaf_tx_max_set,
+	.rate_leaf_tc_bw_set = nsim_leaf_tc_bw_set,
 	.rate_node_tx_share_set = nsim_node_tx_share_set,
 	.rate_node_tx_max_set = nsim_node_tx_max_set,
+	.rate_node_tc_bw_set = nsim_node_tc_bw_set,
 	.rate_node_new = nsim_rate_node_new,
 	.rate_node_del = nsim_rate_node_del,
 	.rate_leaf_parent_set = nsim_rate_leaf_parent_set,
+1
drivers/net/netdevsim/netdevsim.h
···
 	struct dentry *ddir;
 	struct dentry *rate_parent;
 	char *parent_name;
+	u32 tc_bw[DEVLINK_RATE_TCS_MAX];
 	struct netdevsim *ns;
 };

+4 -9
drivers/net/vxlan/vxlan_vnifilter.c
···
 	if (!(vxlan->cfg.flags & VXLAN_F_VNIFILTER))
 		return -EOPNOTSUPP;

-	nlmsg_for_each_attr(attr, nlh, sizeof(*tmsg), rem) {
-		switch (nla_type(attr)) {
-		case VXLAN_VNIFILTER_ENTRY:
-			err = vxlan_process_vni_filter(vxlan, attr,
-						       nlh->nlmsg_type, extack);
-			break;
-		default:
-			continue;
-		}
+	nlmsg_for_each_attr_type(attr, VXLAN_VNIFILTER_ENTRY, nlh,
+				 sizeof(*tmsg), rem) {
+		err = vxlan_process_vni_filter(vxlan, attr, nlh->nlmsg_type,
+					       extack);
 		vnis++;
 		if (err)
 			break;
+14 -22
fs/nfsd/nfsctl.c
···
 		return -EINVAL;

 	/* count number of SERVER_THREADS values */
-	nlmsg_for_each_attr(attr, info->nlhdr, GENL_HDRLEN, rem) {
-		if (nla_type(attr) == NFSD_A_SERVER_THREADS)
-			nrpools++;
-	}
+	nlmsg_for_each_attr_type(attr, NFSD_A_SERVER_THREADS, info->nlhdr,
+				 GENL_HDRLEN, rem)
+		nrpools++;

 	mutex_lock(&nfsd_mutex);

···
 	}

 	i = 0;
-	nlmsg_for_each_attr(attr, info->nlhdr, GENL_HDRLEN, rem) {
-		if (nla_type(attr) == NFSD_A_SERVER_THREADS) {
-			nthreads[i++] = nla_get_u32(attr);
-			if (i >= nrpools)
-				break;
-		}
+	nlmsg_for_each_attr_type(attr, NFSD_A_SERVER_THREADS, info->nlhdr,
+				 GENL_HDRLEN, rem) {
+		nthreads[i++] = nla_get_u32(attr);
+		if (i >= nrpools)
+			break;
 	}

 	if (info->attrs[NFSD_A_SERVER_GRACETIME] ||
···
 	for (i = 0; i <= NFSD_SUPPORTED_MINOR_VERSION; i++)
 		nfsd_minorversion(nn, i, NFSD_CLEAR);

-	nlmsg_for_each_attr(attr, info->nlhdr, GENL_HDRLEN, rem) {
+	nlmsg_for_each_attr_type(attr, NFSD_A_SERVER_PROTO_VERSION, info->nlhdr,
+				 GENL_HDRLEN, rem) {
 		struct nlattr *tb[NFSD_A_VERSION_MAX + 1];
 		u32 major, minor = 0;
 		bool enabled;
-
-		if (nla_type(attr) != NFSD_A_SERVER_PROTO_VERSION)
-			continue;

 		if (nla_parse_nested(tb, NFSD_A_VERSION_MAX, attr,
 				     nfsd_version_nl_policy, info->extack) < 0)
···
 	 * Walk the list of server_socks from userland and move any that match
 	 * back to sv_permsocks
 	 */
-	nlmsg_for_each_attr(attr, info->nlhdr, GENL_HDRLEN, rem) {
+	nlmsg_for_each_attr_type(attr, NFSD_A_SERVER_SOCK_ADDR, info->nlhdr,
+				 GENL_HDRLEN, rem) {
 		struct nlattr *tb[NFSD_A_SOCK_MAX + 1];
 		const char *xcl_name;
 		struct sockaddr *sa;
-
-		if (nla_type(attr) != NFSD_A_SERVER_SOCK_ADDR)
-			continue;

 		if (nla_parse_nested(tb, NFSD_A_SOCK_MAX, attr,
 				     nfsd_sock_nl_policy, info->extack) < 0)
···
 	svc_xprt_destroy_all(serv, net);

 	/* walk list of addrs again, open any that still don't exist */
-	nlmsg_for_each_attr(attr, info->nlhdr, GENL_HDRLEN, rem) {
+	nlmsg_for_each_attr_type(attr, NFSD_A_SERVER_SOCK_ADDR, info->nlhdr,
+				 GENL_HDRLEN, rem) {
 		struct nlattr *tb[NFSD_A_SOCK_MAX + 1];
 		const char *xcl_name;
 		struct sockaddr *sa;
 		int ret;
-
-		if (nla_type(attr) != NFSD_A_SERVER_SOCK_ADDR)
-			continue;

 		if (nla_parse_nested(tb, NFSD_A_SOCK_MAX, attr,
 				     nfsd_sock_nl_policy, info->extack) < 0)
+8
include/net/devlink.h
··· 118 118 119 119 u32 tx_priority; 120 120 u32 tx_weight; 121 + 122 + u32 tc_bw[DEVLINK_RATE_TCS_MAX]; 121 123 }; 122 124 123 125 struct devlink_port { ··· 1488 1486 u32 tx_priority, struct netlink_ext_ack *extack); 1489 1487 int (*rate_leaf_tx_weight_set)(struct devlink_rate *devlink_rate, void *priv, 1490 1488 u32 tx_weight, struct netlink_ext_ack *extack); 1489 + int (*rate_leaf_tc_bw_set)(struct devlink_rate *devlink_rate, 1490 + void *priv, u32 *tc_bw, 1491 + struct netlink_ext_ack *extack); 1491 1492 int (*rate_node_tx_share_set)(struct devlink_rate *devlink_rate, void *priv, 1492 1493 u64 tx_share, struct netlink_ext_ack *extack); 1493 1494 int (*rate_node_tx_max_set)(struct devlink_rate *devlink_rate, void *priv, ··· 1499 1494 u32 tx_priority, struct netlink_ext_ack *extack); 1500 1495 int (*rate_node_tx_weight_set)(struct devlink_rate *devlink_rate, void *priv, 1501 1496 u32 tx_weight, struct netlink_ext_ack *extack); 1497 + int (*rate_node_tc_bw_set)(struct devlink_rate *devlink_rate, 1498 + void *priv, u32 *tc_bw, 1499 + struct netlink_ext_ack *extack); 1502 1500 int (*rate_node_new)(struct devlink_rate *rate_node, void **priv, 1503 1501 struct netlink_ext_ack *extack); 1504 1502 int (*rate_node_del)(struct devlink_rate *rate_node, void *priv,
+14
include/net/netlink.h
···
  *   nlmsg_for_each_msg()		loop over all messages
  *   nlmsg_validate()			validate netlink message incl. attrs
  *   nlmsg_for_each_attr()		loop over all attributes
+ *   nlmsg_for_each_attr_type()	loop over all attributes with the
+ *					given type
  *
  * Misc:
  *   nlmsg_report()			report back to application?
···
 #define nlmsg_for_each_attr(pos, nlh, hdrlen, rem) \
 	nla_for_each_attr(pos, nlmsg_attrdata(nlh, hdrlen), \
 			  nlmsg_attrlen(nlh, hdrlen), rem)
+
+/**
+ * nlmsg_for_each_attr_type - iterate over a stream of attributes
+ * @pos: loop counter, set to the current attribute
+ * @type: required attribute type for @pos
+ * @nlh: netlink message header
+ * @hdrlen: length of the family specific header
+ * @rem: initialized to len, holds bytes currently remaining in stream
+ */
+#define nlmsg_for_each_attr_type(pos, type, nlh, hdrlen, rem) \
+	nlmsg_for_each_attr(pos, nlh, hdrlen, rem) \
+		if (nla_type(pos) == type)

 /**
  * nlmsg_put - Add a new netlink message to an skb
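Note on the macro shape: nlmsg_for_each_attr_type() is a loop header followed by a filtering if, so the statement or block the caller writes becomes the body of that if. The sketch below reproduces the idiom in user-space C over a plain array. The struct attr type and the for_each_attr*() macros are hypothetical stand-ins for struct nlattr and the netlink iterators, not kernel API.

```c
#include <stddef.h>

/* Hypothetical stand-in for a netlink attribute: a type and a value. */
struct attr {
	int type;
	unsigned int value;
};

/* Plain iteration over an array of attributes. */
#define for_each_attr(pos, attrs, n) \
	for ((pos) = (attrs); (pos) < (attrs) + (n); (pos)++)

/* Same shape as the new kernel macro: loop header plus filtering 'if'. */
#define for_each_attr_type(pos, wanted, attrs, n) \
	for_each_attr(pos, attrs, n) \
		if ((pos)->type == (wanted))

/* Single-statement body, like the nrpools++ counter in nfsctl.c. */
static unsigned int sum_of_type(const struct attr *attrs, size_t n, int type)
{
	const struct attr *pos;
	unsigned int sum = 0;

	for_each_attr_type(pos, type, attrs, n)
		sum += pos->value;

	return sum;
}

/* Block body with an early exit, like the nthreads[] fill in nfsctl.c. */
static size_t take_of_type(const struct attr *attrs, size_t n, int type,
			   unsigned int *out, size_t max)
{
	const struct attr *pos;
	size_t i = 0;

	if (!max)
		return 0;

	for_each_attr_type(pos, type, attrs, n) {
		out[i++] = pos->value;
		if (i >= max)
			break;	/* leaves the hidden 'for', not just the 'if' */
	}

	return i;
}
```

Because the macro ends in an if, an else written directly after the loop body would bind to that hidden if. Callers of this kind of macro should treat it strictly as a loop header.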
+9
include/uapi/linux/devlink.h
···
 	 */
 };

+/* IEEE 802.1Qaz standard supported values. */
+
+#define DEVLINK_RATE_TCS_MAX 8
+#define DEVLINK_RATE_TC_INDEX_MAX (DEVLINK_RATE_TCS_MAX - 1)
+
 enum devlink_rate_type {
 	DEVLINK_RATE_TYPE_LEAF,
 	DEVLINK_RATE_TYPE_NODE,
···
 	DEVLINK_ATTR_RATE_TX_WEIGHT,		/* u32 */

 	DEVLINK_ATTR_REGION_DIRECT,		/* flag */
+
+	DEVLINK_ATTR_RATE_TC_BWS,		/* nested */
+	DEVLINK_ATTR_RATE_TC_INDEX,		/* u8 */
+	DEVLINK_ATTR_RATE_TC_BW,		/* u32 */

 	/* Add new attributes above here, update the spec in
 	 * Documentation/netlink/specs/devlink.yaml and re-generate
+127
net/devlink/rate.c
···
 	return ERR_PTR(-EINVAL);
 }

+static int devlink_rate_put_tc_bws(struct sk_buff *msg, u32 *tc_bw)
+{
+	struct nlattr *nla_tc_bw;
+	int i;
+
+	for (i = 0; i < DEVLINK_RATE_TCS_MAX; i++) {
+		nla_tc_bw = nla_nest_start(msg, DEVLINK_ATTR_RATE_TC_BWS);
+		if (!nla_tc_bw)
+			return -EMSGSIZE;
+
+		if (nla_put_u8(msg, DEVLINK_ATTR_RATE_TC_INDEX, i) ||
+		    nla_put_u32(msg, DEVLINK_ATTR_RATE_TC_BW, tc_bw[i]))
+			goto nla_put_failure;
+
+		nla_nest_end(msg, nla_tc_bw);
+	}
+	return 0;
+
+nla_put_failure:
+	nla_nest_cancel(msg, nla_tc_bw);
+	return -EMSGSIZE;
+}
+
 static int devlink_nl_rate_fill(struct sk_buff *msg,
 				struct devlink_rate *devlink_rate,
 				enum devlink_command cmd, u32 portid, u32 seq,
···
 		if (nla_put_string(msg, DEVLINK_ATTR_RATE_PARENT_NODE_NAME,
 				   devlink_rate->parent->name))
 			goto nla_put_failure;
+
+	if (devlink_rate_put_tc_bws(msg, devlink_rate->tc_bw))
+		goto nla_put_failure;

 	genlmsg_end(msg, hdr);
 	return 0;
···
 	return 0;
 }

+static int devlink_nl_rate_tc_bw_parse(struct nlattr *parent_nest, u32 *tc_bw,
+				       unsigned long *bitmap,
+				       struct netlink_ext_ack *extack)
+{
+	struct nlattr *tb[DEVLINK_ATTR_MAX + 1];
+	u8 tc_index;
+	int err;
+
+	err = nla_parse_nested(tb, DEVLINK_ATTR_MAX, parent_nest,
+			       devlink_dl_rate_tc_bws_nl_policy, extack);
+	if (err)
+		return err;
+
+	if (!tb[DEVLINK_ATTR_RATE_TC_INDEX]) {
+		NL_SET_ERR_ATTR_MISS(extack, parent_nest,
+				     DEVLINK_ATTR_RATE_TC_INDEX);
+		return -EINVAL;
+	}
+
+	tc_index = nla_get_u8(tb[DEVLINK_ATTR_RATE_TC_INDEX]);
+
+	if (!tb[DEVLINK_ATTR_RATE_TC_BW]) {
+		NL_SET_ERR_ATTR_MISS(extack, parent_nest,
+				     DEVLINK_ATTR_RATE_TC_BW);
+		return -EINVAL;
+	}
+
+	if (test_and_set_bit(tc_index, bitmap)) {
+		NL_SET_ERR_MSG_FMT(extack,
+				   "Duplicate traffic class index specified (%u)",
+				   tc_index);
+		return -EINVAL;
+	}
+
+	tc_bw[tc_index] = nla_get_u32(tb[DEVLINK_ATTR_RATE_TC_BW]);
+
+	return 0;
+}
+
+static int devlink_nl_rate_tc_bw_set(struct devlink_rate *devlink_rate,
+				     struct genl_info *info)
+{
+	DECLARE_BITMAP(bitmap, DEVLINK_RATE_TCS_MAX) = {};
+	struct devlink *devlink = devlink_rate->devlink;
+	const struct devlink_ops *ops = devlink->ops;
+	u32 tc_bw[DEVLINK_RATE_TCS_MAX] = {};
+	int rem, err = -EOPNOTSUPP, i;
+	struct nlattr *attr;
+
+	nlmsg_for_each_attr_type(attr, DEVLINK_ATTR_RATE_TC_BWS, info->nlhdr,
+				 GENL_HDRLEN, rem) {
+		err = devlink_nl_rate_tc_bw_parse(attr, tc_bw, bitmap,
+						  info->extack);
+		if (err)
+			return err;
+	}
+
+	for (i = 0; i < DEVLINK_RATE_TCS_MAX; i++) {
+		if (!test_bit(i, bitmap)) {
+			NL_SET_ERR_MSG_FMT(info->extack,
+					   "Bandwidth values must be specified for all %u traffic classes",
+					   DEVLINK_RATE_TCS_MAX);
+			return -EINVAL;
+		}
+	}
+
+	if (devlink_rate_is_leaf(devlink_rate))
+		err = ops->rate_leaf_tc_bw_set(devlink_rate, devlink_rate->priv,
+					       tc_bw, info->extack);
+	else if (devlink_rate_is_node(devlink_rate))
+		err = ops->rate_node_tc_bw_set(devlink_rate, devlink_rate->priv,
+					       tc_bw, info->extack);
+
+	if (err)
+		return err;
+
+	memcpy(devlink_rate->tc_bw, tc_bw, sizeof(tc_bw));
+
+	return 0;
+}
+
 static int devlink_nl_rate_set(struct devlink_rate *devlink_rate,
 			       const struct devlink_ops *ops,
 			       struct genl_info *info)
···
 			return err;
 	}

+	if (attrs[DEVLINK_ATTR_RATE_TC_BWS]) {
+		err = devlink_nl_rate_tc_bw_set(devlink_rate, info);
+		if (err)
+			return err;
+	}
+
 	return 0;
 }

···
 					    "TX weight set isn't supported for the leafs");
 			return false;
 		}
+		if (attrs[DEVLINK_ATTR_RATE_TC_BWS] &&
+		    !ops->rate_leaf_tc_bw_set) {
+			NL_SET_ERR_MSG_ATTR(info->extack,
+					    attrs[DEVLINK_ATTR_RATE_TC_BWS],
+					    "TC bandwidth set isn't supported for the leafs");
+			return false;
+		}
 	} else if (type == DEVLINK_RATE_TYPE_NODE) {
 		if (attrs[DEVLINK_ATTR_RATE_TX_SHARE] && !ops->rate_node_tx_share_set) {
 			NL_SET_ERR_MSG(info->extack, "TX share set isn't supported for the nodes");
···
 			NL_SET_ERR_MSG_ATTR(info->extack,
 					    attrs[DEVLINK_ATTR_RATE_TX_WEIGHT],
 					    "TX weight set isn't supported for the nodes");
+			return false;
+		}
+		if (attrs[DEVLINK_ATTR_RATE_TC_BWS] &&
+		    !ops->rate_node_tc_bw_set) {
+			NL_SET_ERR_MSG_ATTR(info->extack,
+					    attrs[DEVLINK_ATTR_RATE_TC_BWS],
+					    "TC bandwidth set isn't supported for the nodes");
 			return false;
 		}
 	} else {
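Note on the tc-bw validation in devlink_nl_rate_tc_bw_set() above: each nested entry carries one index and one bandwidth value, a bitmap rejects a repeated index, and the request is refused unless all DEVLINK_RATE_TCS_MAX indices were supplied. The sketch below shows the same checks in user-space C with a plain bitmask. struct tc_bw_entry and parse_tc_bws() are hypothetical, and the explicit range check is an assumption of this sketch; the kernel code relies on its netlink policy, which is not part of this diff, for attribute validation.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TCS_MAX 8	/* mirrors DEVLINK_RATE_TCS_MAX */

/* Hypothetical, already decoded form of one nested tc-bw entry. */
struct tc_bw_entry {
	uint8_t index;
	uint32_t bw;
};

/*
 * Fill tc_bw[] from the entries. Returns 0 on success and -EINVAL when an
 * index is out of range, repeated, or missing from the request.
 */
static int parse_tc_bws(const struct tc_bw_entry *entries, size_t n,
			uint32_t tc_bw[TCS_MAX])
{
	unsigned int seen = 0;	/* bit i is set once TC i was supplied */
	size_t k;

	for (k = 0; k < n; k++) {
		uint8_t idx = entries[k].index;

		/* Assumption: range check done here instead of by a policy. */
		if (idx >= TCS_MAX)
			return -EINVAL;

		/* Equivalent of test_and_set_bit(): reject duplicates. */
		if (seen & (1u << idx))
			return -EINVAL;
		seen |= 1u << idx;

		tc_bw[idx] = entries[k].bw;
	}

	/* Every traffic class must have been given a value. */
	if (seen != (1u << TCS_MAX) - 1)
		return -EINVAL;

	return 0;
}
```

As in the kernel function, values are collected into a scratch array first. The caller should copy them into persistent state only after the whole request has been accepted.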
+1 -1
tools/testing/selftests/drivers/net/hw/lib/py/__init__.py
···

 # Import one by one to avoid pylint false positives
 from net.lib.py import EthtoolFamily, NetdevFamily, NetshaperFamily, \
-    NlError, RtnlFamily
+    NlError, RtnlFamily, DevlinkFamily
 from net.lib.py import CmdExitFailure
 from net.lib.py import bkg, cmd, defer, ethtool, fd_read_timeout, ip, \
     rand_port, tool, wait_port_listen
+1 -1
tools/testing/selftests/drivers/net/lib/py/__init__.py
···

 # Import one by one to avoid pylint false positives
 from net.lib.py import EthtoolFamily, NetdevFamily, NetshaperFamily, \
-    NlError, RtnlFamily
+    NlError, RtnlFamily, DevlinkFamily
 from net.lib.py import CmdExitFailure
 from net.lib.py import bkg, cmd, defer, ethtool, fd_read_timeout, ip, \
     rand_port, tool, wait_port_listen
+53
tools/testing/selftests/drivers/net/netdevsim/devlink.sh
···
 	check_err $? "Unexpected parent attr value $api_value != $parent"
 }

+rate_attr_tc_bw_check()
+{
+	local handle=$1
+	local tc_bw=$2
+	local debug_file=$3
+
+	local tc_bw_str=""
+	for bw in $tc_bw; do
+		local tc=${bw%%:*}
+		local value=${bw##*:}
+		tc_bw_str="$tc_bw_str $tc:$value"
+	done
+	tc_bw_str=${tc_bw_str# }
+
+	rate_attr_set "$handle" tc-bw "$tc_bw_str"
+	check_err $? "Failed to set tc-bw values"
+
+	for bw in $tc_bw; do
+		local tc=${bw%%:*}
+		local value=${bw##*:}
+		local debug_value
+		debug_value=$(cat "$debug_file"/tc"${tc}"_bw)
+		check_err $? "Failed to read tc-bw value from debugfs for tc$tc"
+		[ "$debug_value" == "$value" ]
+		check_err $? "Unexpected tc-bw debug value for tc$tc: $debug_value != $value"
+	done
+
+	for bw in $tc_bw; do
+		local tc=${bw%%:*}
+		local expected_value=${bw##*:}
+		local api_value
+		api_value=$(rate_attr_get "$handle" tc_"$tc")
+		if [ "$api_value" = "null" ]; then
+			api_value=0
+		fi
+		[ "$api_value" == "$expected_value" ]
+		check_err $? "Unexpected tc-bw value for tc$tc: $api_value != $expected_value"
+	done
+}
+
 rate_node_add()
 {
 	local handle=$1
···
 		rate=$(($rate+100))
 	done

+	local tc_bw="0:0 1:40 2:0 3:0 4:0 5:0 6:60 7:0"
+	for r_obj in $leafs
+	do
+		rate_attr_tc_bw_check "$r_obj" "$tc_bw" \
+			"$DEBUGFS_DIR"/ports/"${r_obj##*/}"
+	done
+
 	local node1_name='group1'
 	local node1="$DL_HANDLE/$node1_name"
 	rate_node_add "$node1"
···
 	local node_tx_max=100
 	rate_attr_tx_rate_check $node1 tx_max $node_tx_max \
 		$DEBUGFS_DIR/rate_nodes/${node1##*/}/tx_max
+
+
+	local tc_bw="0:20 1:0 2:0 3:0 4:0 5:20 6:60 7:0"
+	rate_attr_tc_bw_check $node1 "$tc_bw" \
+		"$DEBUGFS_DIR"/rate_nodes/"${node1##*/}"
+

 	rate_node_del "$node1"
 	check_err $? "Failed to delete node $node1"
+1 -1
tools/testing/selftests/net/lib/py/__init__.py
···
 from .nsim import *
 from .utils import *
 from .ynl import NlError, YnlFamily, EthtoolFamily, NetdevFamily, RtnlFamily, RtnlAddrFamily
-from .ynl import NetshaperFamily
+from .ynl import NetshaperFamily, DevlinkFamily
+5
tools/testing/selftests/net/lib/py/ynl.py
···
     def __init__(self, recv_size=0):
         super().__init__((SPEC_PATH / Path('net_shaper.yaml')).as_posix(),
                          schema='', recv_size=recv_size)
+
+class DevlinkFamily(YnlFamily):
+    def __init__(self, recv_size=0):
+        super().__init__((SPEC_PATH / Path('devlink.yaml')).as_posix(),
+                         schema='', recv_size=recv_size)
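The new DevlinkFamily wrapper follows the same pattern as the other family classes in ynl.py: each subclass only pins the YAML spec file and forwards recv_size. A rough standalone sketch of that pattern follows. It cannot import the real YnlFamily outside the selftest tree, so StubFamily is a hypothetical stand-in that merely records its constructor arguments, and the SPEC_PATH value is a placeholder rather than the path the selftests use.

```python
from pathlib import Path

# Assumption: placeholder location, not the real spec directory.
SPEC_PATH = Path("/hypothetical/netlink/specs")


class StubFamily:
    """Hypothetical stand-in for YnlFamily; it only records its arguments."""

    def __init__(self, def_path, schema=None, recv_size=0):
        self.def_path = def_path
        self.schema = schema
        self.recv_size = recv_size


class DevlinkFamily(StubFamily):
    """Same shape as the wrapper added in this merge, on top of the stub."""

    def __init__(self, recv_size=0):
        # Join the spec directory and file name, then pass a POSIX-style string.
        super().__init__((SPEC_PATH / Path("devlink.yaml")).as_posix(),
                         schema="", recv_size=recv_size)
```

With the stub in place, DevlinkFamily(recv_size=4096).def_path evaluates to "/hypothetical/netlink/specs/devlink.yaml", which shows that a caller never has to name the spec file itself.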