Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'mlxsw-resilient-nh-groups'

Ido Schimmel says:

====================
mlxsw: Add support for resilient nexthop groups

This patchset adds support for resilient nexthop groups in mlxsw. As far
as the hardware is concerned, resilient groups are the same as regular
groups. The differences lie in how mlxsw manages the individual
adjacency entries (nexthop buckets) that make up the group.

The first difference is that unlike regular groups the driver needs to
periodically update the kernel about activity of nexthop buckets so that
the kernel will not treat the buckets as idle, given traffic is
offloaded from the CPU to the ASIC. This is similar to what mlxsw is
already doing with respect to neighbour entries. The update interval is
set to 1 second to allow for short idle timers.
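
The periodic activity report described above can be modelled in a few lines of plain C. This is only a sketch under stated assumptions: `fake_asic`, `fake_kernel`, `asic_read_clear_activity()` and `activity_update()` are hypothetical names invented for illustration, and a 32-bit word stands in for the device's activity vector. It is not the driver's code.

```c
#include <stdint.h>

#define NUM_BUCKETS 32

/* Hypothetical model of the ASIC: bit i is set once bucket i forwards traffic. */
struct fake_asic {
	uint32_t activity;
};

/* Hypothetical model of the kernel's view: update intervals each bucket was idle. */
struct fake_kernel {
	unsigned int idle_intervals[NUM_BUCKETS];
};

/* Read the activity bits and clear them, so the next read reports only new traffic. */
static uint32_t asic_read_clear_activity(struct fake_asic *asic)
{
	uint32_t snapshot = asic->activity;

	asic->activity = 0;
	return snapshot;
}

/* One run of the periodic work: active buckets restart their idle count. */
static void activity_update(struct fake_asic *asic, struct fake_kernel *kernel)
{
	uint32_t active = asic_read_clear_activity(asic);
	int i;

	for (i = 0; i < NUM_BUCKETS; i++) {
		if (active & (UINT32_C(1) << i))
			kernel->idle_intervals[i] = 0;
		else
			kernel->idle_intervals[i]++;
	}
}
```

Calling `activity_update()` once per interval keeps a bucket's idle count at zero for as long as the device keeps forwarding traffic through it, which is the property the kernel's idle timer relies on.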

The second difference is that nexthop buckets that correspond to an
unresolved neighbour must be programmed to the device, as the size of
the group must remain fixed. This is achieved by programming such
entries with trap action, in order to trigger neighbour resolution by
the kernel.
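
A minimal sketch of that decision, assuming a simplified `bucket` structure and a hypothetical helper `bucket_set_action()` (neither is taken from the driver): a bucket whose neighbour is unresolved is still programmed in a resilient group, but with a trap action instead of a forward action.

```c
#include <stdbool.h>

enum nh_action {
	NH_ACTION_FORWARD,	/* forward in hardware */
	NH_ACTION_TRAP,		/* send to the CPU so the kernel can resolve the neighbour */
};

/* Hypothetical, simplified per-bucket state. */
struct bucket {
	enum nh_action action;
	bool should_offload;	/* whether the entry is written to the device */
};

static void bucket_set_action(struct bucket *b, bool neigh_resolved,
			      bool is_resilient)
{
	if (neigh_resolved) {
		b->action = NH_ACTION_FORWARD;
		b->should_offload = true;
	} else if (is_resilient) {
		/* The group size is fixed, so the entry must exist; trap instead. */
		b->action = NH_ACTION_TRAP;
		b->should_offload = true;
	} else {
		/* A regular group simply leaves the entry out. */
		b->should_offload = false;
	}
}
```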

The third difference is atomic replacement of individual nexthop
buckets. While the driver periodically updates the kernel about activity
of nexthop buckets, it is possible for a bucket to become active just
before the kernel decides to replace it with a different nexthop. To
avoid such situations and connections being reset, the driver instructs
the device to only replace an adjacency entry if it is inactive.
Failures are propagated back to the nexthop code.
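
The replace-then-verify flow can be sketched as follows. The names `adj_entry`, `adj_write()` and `bucket_replace()` are hypothetical, and the model assumes the device silently keeps the old entry when a non-forced write hits an active bucket, so the caller has to read the entry back to learn whether the write took effect.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one adjacency entry (nexthop bucket). */
struct adj_entry {
	uint32_t nexthop_id;
	bool active;	/* set by the device when traffic hits the entry */
};

/* Model of the device write: a non-forced write is ignored for active entries. */
static void adj_write(struct adj_entry *e, uint32_t new_nexthop_id, bool force)
{
	if (!force && e->active)
		return;
	e->nexthop_id = new_nexthop_id;
}

/* Write, then read back and compare; report failure if the entry is unchanged. */
static int bucket_replace(struct adj_entry *e, uint32_t new_nexthop_id,
			  bool force)
{
	adj_write(e, new_nexthop_id, force);
	if (force)
		return 0;
	return e->nexthop_id == new_nexthop_id ? 0 : -EINVAL;
}
```

Returning an error instead of retrying leaves the decision with the caller, which matches the cover letter's statement that failures are propagated back to the nexthop code.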

Patchset overview:

Patches #1-#7 gradually add support for resilient nexthop groups

Patch #8 finally enables such groups to be programmed to the device

Patches #9-#10 add mlxsw-specific selftests
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

+594 -23
+55
drivers/net/ethernet/mellanox/mlxsw/reg.h
···
 	mlxsw_reg_rtdp_ipip_expected_gre_key_set(payload, expected_gre_key);
 }
 
+/* RATRAD - Router Adjacency Table Activity Dump Register
+ * ------------------------------------------------------
+ * The RATRAD register is used to dump and optionally clear activity bits of
+ * router adjacency table entries.
+ */
+#define MLXSW_REG_RATRAD_ID 0x8022
+#define MLXSW_REG_RATRAD_LEN 0x210
+
+MLXSW_REG_DEFINE(ratrad, MLXSW_REG_RATRAD_ID, MLXSW_REG_RATRAD_LEN);
+
+enum {
+	/* Read activity */
+	MLXSW_REG_RATRAD_OP_READ_ACTIVITY,
+	/* Read and clear activity */
+	MLXSW_REG_RATRAD_OP_READ_CLEAR_ACTIVITY,
+};
+
+/* reg_ratrad_op
+ * Access: Operation
+ */
+MLXSW_ITEM32(reg, ratrad, op, 0x00, 30, 2);
+
+/* reg_ratrad_ecmp_size
+ * ecmp_size is the amount of sequential entries from adjacency_index. Valid
+ * ranges:
+ * Spectrum-1: 32-64, 512, 1024, 2048, 4096
+ * Spectrum-2/3: 32-128, 256, 512, 1024, 2048, 4096
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, ratrad, ecmp_size, 0x00, 0, 13);
+
+/* reg_ratrad_adjacency_index
+ * Index into the adjacency table.
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, ratrad, adjacency_index, 0x04, 0, 24);
+
+/* reg_ratrad_activity_vector
+ * Activity bit per adjacency index.
+ * Bits higher than ecmp_size are reserved.
+ * Access: RO
+ */
+MLXSW_ITEM_BIT_ARRAY(reg, ratrad, activity_vector, 0x10, 0x200, 1);
+
+static inline void mlxsw_reg_ratrad_pack(char *payload, u32 adjacency_index,
+					 u16 ecmp_size)
+{
+	MLXSW_REG_ZERO(ratrad, payload);
+	mlxsw_reg_ratrad_op_set(payload,
+				MLXSW_REG_RATRAD_OP_READ_CLEAR_ACTIVITY);
+	mlxsw_reg_ratrad_ecmp_size_set(payload, ecmp_size);
+	mlxsw_reg_ratrad_adjacency_index_set(payload, adjacency_index);
+}
+
 /* RIGR-V2 - Router Interface Group Register Version 2
  * ---------------------------------------------------
  * The RIGR_V2 register is used to add, remove and query egress interface list
···
 	MLXSW_REG(rtar),
 	MLXSW_REG(ratr),
 	MLXSW_REG(rtdp),
+	MLXSW_REG(ratrad),
 	MLXSW_REG(rdpm),
 	MLXSW_REG(ricnt),
 	MLXSW_REG(rrcr),
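
To make the RATRAD field layout concrete, here is a standalone sketch that packs the same three fields into a byte buffer. It assumes the payload is a sequence of big-endian 32-bit words and that the last two arguments of `MLXSW_ITEM32()` are the bit shift and the field width; `field_set()`, `put_be32()`, `get_be32()` and `ratrad_pack()` are hypothetical helpers, not the driver's macros. Note that the activity vector occupies 0x200 bytes starting at offset 0x10, which gives the register length 0x210 and room for 4096 activity bits, the largest group size listed in the comment.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RATRAD_LEN 0x210	/* 0x10 header bytes + 0x200 activity bytes */

static uint32_t get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static void put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Set a field of 'bits' width (< 32) at 'shift' inside the word at 'offset'. */
static void field_set(uint8_t *payload, size_t offset, unsigned int shift,
		      unsigned int bits, uint32_t val)
{
	uint32_t mask = ((UINT32_C(1) << bits) - 1) << shift;
	uint32_t word = get_be32(payload + offset);

	word = (word & ~mask) | ((val << shift) & mask);
	put_be32(payload + offset, word);
}

/* Mirrors the field positions in the diff: op, ecmp_size, adjacency_index. */
static void ratrad_pack(uint8_t *payload, uint32_t adjacency_index,
			uint16_t ecmp_size)
{
	memset(payload, 0, RATRAD_LEN);
	field_set(payload, 0x00, 30, 2, 1);	/* 1 = read and clear activity */
	field_set(payload, 0x00, 0, 13, ecmp_size);
	field_set(payload, 0x04, 0, 24, adjacency_index);
}
```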
+3 -1
drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c
···
 
 static int mlxsw_sp_dpipe_table_adj_counters_update(void *priv, bool enable)
 {
+	char ratr_pl[MLXSW_REG_RATR_LEN];
 	struct mlxsw_sp *mlxsw_sp = priv;
 	struct mlxsw_sp_nexthop *nh;
 	u32 adj_hash_index = 0;
···
 		else
 			mlxsw_sp_nexthop_counter_free(mlxsw_sp, nh);
 		mlxsw_sp_nexthop_eth_update(mlxsw_sp,
-					    adj_index + adj_hash_index, nh);
+					    adj_index + adj_hash_index, nh,
+					    true, ratr_pl);
 	}
 	return 0;
 }
+6 -4
drivers/net/ethernet/mellanox/mlxsw/spectrum_ipip.c
···
 
 static int
 mlxsw_sp_ipip_nexthop_update_gre4(struct mlxsw_sp *mlxsw_sp, u32 adj_index,
-				  struct mlxsw_sp_ipip_entry *ipip_entry)
+				  struct mlxsw_sp_ipip_entry *ipip_entry,
+				  bool force, char *ratr_pl)
 {
 	u16 rif_index = mlxsw_sp_ipip_lb_rif_index(ipip_entry->ol_lb);
 	__be32 daddr4 = mlxsw_sp_ipip_netdev_daddr4(ipip_entry->ol_dev);
-	char ratr_pl[MLXSW_REG_RATR_LEN];
+	enum mlxsw_reg_ratr_op op;
 
-	mlxsw_reg_ratr_pack(ratr_pl, MLXSW_REG_RATR_OP_WRITE_WRITE_ENTRY,
-			    true, MLXSW_REG_RATR_TYPE_IPIP,
+	op = force ? MLXSW_REG_RATR_OP_WRITE_WRITE_ENTRY :
+		     MLXSW_REG_RATR_OP_WRITE_WRITE_ENTRY_ON_ACTIVITY;
+	mlxsw_reg_ratr_pack(ratr_pl, op, true, MLXSW_REG_RATR_TYPE_IPIP,
 			    adj_index, rif_index);
 	mlxsw_reg_ratr_ipip4_entry_pack(ratr_pl, be32_to_cpu(daddr4));
 
+2 -1
drivers/net/ethernet/mellanox/mlxsw/spectrum_ipip.h
···
 	enum mlxsw_sp_l3proto ul_proto; /* Underlay. */
 
 	int (*nexthop_update)(struct mlxsw_sp *mlxsw_sp, u32 adj_index,
-			      struct mlxsw_sp_ipip_entry *ipip_entry);
+			      struct mlxsw_sp_ipip_entry *ipip_entry,
+			      bool force, char *ratr_pl);
 
 	bool (*can_offload)(const struct mlxsw_sp *mlxsw_sp,
 			    const struct net_device *ol_dev);
+406 -16
drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
···
 	u16 count;
 	int sum_norm_weight;
 	u8 adj_index_valid:1,
-	   gateway:1; /* routes using the group use a gateway */
+	   gateway:1, /* routes using the group use a gateway */
+	   is_resilient:1;
+	struct list_head list; /* member in nh_res_grp_list */
 	struct mlxsw_sp_nexthop nexthops[0];
 #define nh_rif	nexthops[0].rif
 };
···
 
 static int __mlxsw_sp_nexthop_eth_update(struct mlxsw_sp *mlxsw_sp,
 					 u32 adj_index,
-					 struct mlxsw_sp_nexthop *nh)
+					 struct mlxsw_sp_nexthop *nh,
+					 bool force, char *ratr_pl)
 {
 	struct mlxsw_sp_neigh_entry *neigh_entry = nh->neigh_entry;
-	char ratr_pl[MLXSW_REG_RATR_LEN];
+	enum mlxsw_reg_ratr_op op;
 	u16 rif_index;
 
 	rif_index = nh->rif ? nh->rif->rif_index :
 			      mlxsw_sp->router->lb_rif_index;
-	mlxsw_reg_ratr_pack(ratr_pl, MLXSW_REG_RATR_OP_WRITE_WRITE_ENTRY,
-			    true, MLXSW_REG_RATR_TYPE_ETHERNET,
+	op = force ? MLXSW_REG_RATR_OP_WRITE_WRITE_ENTRY :
+		     MLXSW_REG_RATR_OP_WRITE_WRITE_ENTRY_ON_ACTIVITY;
+	mlxsw_reg_ratr_pack(ratr_pl, op, true, MLXSW_REG_RATR_TYPE_ETHERNET,
 			    adj_index, rif_index);
 	switch (nh->action) {
 	case MLXSW_SP_NEXTHOP_ACTION_FORWARD:
···
 }
 
 int mlxsw_sp_nexthop_eth_update(struct mlxsw_sp *mlxsw_sp, u32 adj_index,
-				struct mlxsw_sp_nexthop *nh)
+				struct mlxsw_sp_nexthop *nh, bool force,
+				char *ratr_pl)
 {
 	int i;
 
···
 		int err;
 
 		err = __mlxsw_sp_nexthop_eth_update(mlxsw_sp, adj_index + i,
-						    nh);
+						    nh, force, ratr_pl);
 		if (err)
 			return err;
 	}
···
 
 static int __mlxsw_sp_nexthop_ipip_update(struct mlxsw_sp *mlxsw_sp,
 					  u32 adj_index,
-					  struct mlxsw_sp_nexthop *nh)
+					  struct mlxsw_sp_nexthop *nh,
+					  bool force, char *ratr_pl)
 {
 	const struct mlxsw_sp_ipip_ops *ipip_ops;
 
 	ipip_ops = mlxsw_sp->router->ipip_ops_arr[nh->ipip_entry->ipipt];
-	return ipip_ops->nexthop_update(mlxsw_sp, adj_index, nh->ipip_entry);
+	return ipip_ops->nexthop_update(mlxsw_sp, adj_index, nh->ipip_entry,
+					force, ratr_pl);
 }
 
 static int mlxsw_sp_nexthop_ipip_update(struct mlxsw_sp *mlxsw_sp,
 					u32 adj_index,
-					struct mlxsw_sp_nexthop *nh)
+					struct mlxsw_sp_nexthop *nh, bool force,
+					char *ratr_pl)
 {
 	int i;
 
···
 		int err;
 
 		err = __mlxsw_sp_nexthop_ipip_update(mlxsw_sp, adj_index + i,
-						     nh);
+						     nh, force, ratr_pl);
 		if (err)
 			return err;
 	}
···
 }
 
 static int mlxsw_sp_nexthop_update(struct mlxsw_sp *mlxsw_sp, u32 adj_index,
-				   struct mlxsw_sp_nexthop *nh)
+				   struct mlxsw_sp_nexthop *nh, bool force,
+				   char *ratr_pl)
 {
 	/* When action is discard or trap, the nexthop must be
 	 * programmed as an Ethernet nexthop.
···
 	if (nh->type == MLXSW_SP_NEXTHOP_TYPE_ETH ||
 	    nh->action == MLXSW_SP_NEXTHOP_ACTION_DISCARD ||
 	    nh->action == MLXSW_SP_NEXTHOP_ACTION_TRAP)
-		return mlxsw_sp_nexthop_eth_update(mlxsw_sp, adj_index, nh);
+		return mlxsw_sp_nexthop_eth_update(mlxsw_sp, adj_index, nh,
+						   force, ratr_pl);
 	else
-		return mlxsw_sp_nexthop_ipip_update(mlxsw_sp, adj_index, nh);
+		return mlxsw_sp_nexthop_ipip_update(mlxsw_sp, adj_index, nh,
+						    force, ratr_pl);
 }
 
 static int
···
 				  struct mlxsw_sp_nexthop_group_info *nhgi,
 				  bool reallocate)
 {
+	char ratr_pl[MLXSW_REG_RATR_LEN];
 	u32 adj_index = nhgi->adj_index; /* base */
 	struct mlxsw_sp_nexthop *nh;
 	int i;
···
 		if (nh->update || reallocate) {
 			int err = 0;
 
-			err = mlxsw_sp_nexthop_update(mlxsw_sp, adj_index, nh);
+			err = mlxsw_sp_nexthop_update(mlxsw_sp, adj_index, nh,
+						      true, ratr_pl);
 			if (err)
 				return err;
 			nh->update = 0;
···
 }
 
 static void
+mlxsw_sp_nexthop_bucket_offload_refresh(struct mlxsw_sp *mlxsw_sp,
+					const struct mlxsw_sp_nexthop *nh,
+					u16 bucket_index)
+{
+	struct mlxsw_sp_nexthop_group *nh_grp = nh->nhgi->nh_grp;
+	bool offload = false, trap = false;
+
+	if (nh->offloaded) {
+		if (nh->action == MLXSW_SP_NEXTHOP_ACTION_TRAP)
+			trap = true;
+		else
+			offload = true;
+	}
+	nexthop_bucket_set_hw_flags(mlxsw_sp_net(mlxsw_sp), nh_grp->obj.id,
+				    bucket_index, offload, trap);
+}
+
+static void
 mlxsw_sp_nexthop_obj_group_offload_refresh(struct mlxsw_sp *mlxsw_sp,
 					   struct mlxsw_sp_nexthop_group *nh_grp)
 {
+	int i;
+
 	/* Do not update the flags if the nexthop group is being destroyed
 	 * since:
 	 * 1. The nexthop objects is being deleted, in which case the flags are
···
 
 	nexthop_set_hw_flags(mlxsw_sp_net(mlxsw_sp), nh_grp->obj.id,
 			     nh_grp->nhgi->adj_index_valid, false);
+
+	/* Update flags of individual nexthop buckets in case of a resilient
+	 * nexthop group.
+	 */
+	if (!nh_grp->nhgi->is_resilient)
+		return;
+
+	for (i = 0; i < nh_grp->nhgi->count; i++) {
+		struct mlxsw_sp_nexthop *nh = &nh_grp->nhgi->nexthops[i];
+
+		mlxsw_sp_nexthop_bucket_offload_refresh(mlxsw_sp, nh, i);
+	}
 }
 
 static void
···
 			dev_warn(mlxsw_sp->bus_info->dev, "Failed to update neigh MAC in adjacency table.\n");
 			goto set_trap;
 		}
+		/* Flags of individual nexthop buckets might need to be
+		 * updated.
+		 */
+		mlxsw_sp_nexthop_group_offload_refresh(mlxsw_sp, nh_grp);
 		return 0;
 	}
 	mlxsw_sp_nexthop_group_normalize(nhgi);
···
 {
 	if (!removing) {
 		nh->action = MLXSW_SP_NEXTHOP_ACTION_FORWARD;
+		nh->should_offload = 1;
+	} else if (nh->nhgi->is_resilient) {
+		nh->action = MLXSW_SP_NEXTHOP_ACTION_TRAP;
 		nh->should_offload = 1;
 	} else {
 		nh->should_offload = 0;
···
 	}
 }
 
+static void
+mlxsw_sp_nh_grp_activity_get(struct mlxsw_sp *mlxsw_sp,
+			     const struct mlxsw_sp_nexthop_group *nh_grp,
+			     unsigned long *activity)
+{
+	char *ratrad_pl;
+	int i, err;
+
+	ratrad_pl = kmalloc(MLXSW_REG_RATRAD_LEN, GFP_KERNEL);
+	if (!ratrad_pl)
+		return;
+
+	mlxsw_reg_ratrad_pack(ratrad_pl, nh_grp->nhgi->adj_index,
+			      nh_grp->nhgi->count);
+	err = mlxsw_reg_query(mlxsw_sp->core, MLXSW_REG(ratrad), ratrad_pl);
+	if (err)
+		goto out;
+
+	for (i = 0; i < nh_grp->nhgi->count; i++) {
+		if (!mlxsw_reg_ratrad_activity_vector_get(ratrad_pl, i))
+			continue;
+		bitmap_set(activity, i, 1);
+	}
+
+out:
+	kfree(ratrad_pl);
+}
+
+#define MLXSW_SP_NH_GRP_ACTIVITY_UPDATE_INTERVAL 1000 /* ms */
+
+static void
+mlxsw_sp_nh_grp_activity_update(struct mlxsw_sp *mlxsw_sp,
+				const struct mlxsw_sp_nexthop_group *nh_grp)
+{
+	unsigned long *activity;
+
+	activity = bitmap_zalloc(nh_grp->nhgi->count, GFP_KERNEL);
+	if (!activity)
+		return;
+
+	mlxsw_sp_nh_grp_activity_get(mlxsw_sp, nh_grp, activity);
+	nexthop_res_grp_activity_update(mlxsw_sp_net(mlxsw_sp), nh_grp->obj.id,
+					nh_grp->nhgi->count, activity);
+
+	bitmap_free(activity);
+}
+
+static void
+mlxsw_sp_nh_grp_activity_work_schedule(struct mlxsw_sp *mlxsw_sp)
+{
+	unsigned int interval = MLXSW_SP_NH_GRP_ACTIVITY_UPDATE_INTERVAL;
+
+	mlxsw_core_schedule_dw(&mlxsw_sp->router->nh_grp_activity_dw,
+			       msecs_to_jiffies(interval));
+}
+
+static void mlxsw_sp_nh_grp_activity_work(struct work_struct *work)
+{
+	struct mlxsw_sp_nexthop_group_info *nhgi;
+	struct mlxsw_sp_router *router;
+	bool reschedule = false;
+
+	router = container_of(work, struct mlxsw_sp_router,
+			      nh_grp_activity_dw.work);
+
+	mutex_lock(&router->lock);
+
+	list_for_each_entry(nhgi, &router->nh_res_grp_list, list) {
+		mlxsw_sp_nh_grp_activity_update(router->mlxsw_sp, nhgi->nh_grp);
+		reschedule = true;
+	}
+
+	mutex_unlock(&router->lock);
+
+	if (!reschedule)
+		return;
+	mlxsw_sp_nh_grp_activity_work_schedule(router->mlxsw_sp);
+}
+
 static int
 mlxsw_sp_nexthop_obj_single_validate(struct mlxsw_sp *mlxsw_sp,
 				     const struct nh_notifier_single_info *nh,
···
 	return 0;
 }
 
+static int
+mlxsw_sp_nexthop_obj_res_group_size_validate(struct mlxsw_sp *mlxsw_sp,
+					     const struct nh_notifier_res_table_info *nh_res_table,
+					     struct netlink_ext_ack *extack)
+{
+	unsigned int alloc_size;
+	bool valid_size = false;
+	int err, i;
+
+	if (nh_res_table->num_nh_buckets < 32) {
+		NL_SET_ERR_MSG_MOD(extack, "Minimum number of buckets is 32");
+		return -EINVAL;
+	}
+
+	for (i = 0; i < mlxsw_sp->router->adj_grp_size_ranges_count; i++) {
+		const struct mlxsw_sp_adj_grp_size_range *size_range;
+
+		size_range = &mlxsw_sp->router->adj_grp_size_ranges[i];
+
+		if (nh_res_table->num_nh_buckets >= size_range->start &&
+		    nh_res_table->num_nh_buckets <= size_range->end) {
+			valid_size = true;
+			break;
+		}
+	}
+
+	if (!valid_size) {
+		NL_SET_ERR_MSG_MOD(extack, "Invalid number of buckets");
+		return -EINVAL;
+	}
+
+	err = mlxsw_sp_kvdl_alloc_count_query(mlxsw_sp,
+					      MLXSW_SP_KVDL_ENTRY_TYPE_ADJ,
+					      nh_res_table->num_nh_buckets,
+					      &alloc_size);
+	if (err || nh_res_table->num_nh_buckets != alloc_size) {
+		NL_SET_ERR_MSG_MOD(extack, "Number of buckets does not fit allocation size of any KVDL partition");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int
+mlxsw_sp_nexthop_obj_res_group_validate(struct mlxsw_sp *mlxsw_sp,
+					const struct nh_notifier_res_table_info *nh_res_table,
+					struct netlink_ext_ack *extack)
+{
+	int err;
+	u16 i;
+
+	err = mlxsw_sp_nexthop_obj_res_group_size_validate(mlxsw_sp,
+							   nh_res_table,
+							   extack);
+	if (err)
+		return err;
+
+	for (i = 0; i < nh_res_table->num_nh_buckets; i++) {
+		const struct nh_notifier_single_info *nh;
+		int err;
+
+		nh = &nh_res_table->nhs[i];
+		err = mlxsw_sp_nexthop_obj_group_entry_validate(mlxsw_sp, nh,
+								extack);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
 static int mlxsw_sp_nexthop_obj_validate(struct mlxsw_sp *mlxsw_sp,
 					 unsigned long event,
 					 struct nh_notifier_info *info)
 {
-	if (event != NEXTHOP_EVENT_REPLACE)
+	struct nh_notifier_single_info *nh;
+
+	if (event != NEXTHOP_EVENT_REPLACE &&
+	    event != NEXTHOP_EVENT_RES_TABLE_PRE_REPLACE &&
+	    event != NEXTHOP_EVENT_BUCKET_REPLACE)
 		return 0;
 
 	switch (info->type) {
···
 		return mlxsw_sp_nexthop_obj_group_validate(mlxsw_sp,
 							   info->nh_grp,
 							   info->extack);
+	case NH_NOTIFIER_INFO_TYPE_RES_TABLE:
+		return mlxsw_sp_nexthop_obj_res_group_validate(mlxsw_sp,
+							       info->nh_res_table,
+							       info->extack);
+	case NH_NOTIFIER_INFO_TYPE_RES_BUCKET:
+		nh = &info->nh_res_bucket->new_nh;
+		return mlxsw_sp_nexthop_obj_group_entry_validate(mlxsw_sp, nh,
+								 info->extack);
 	default:
 		NL_SET_ERR_MSG_MOD(info->extack, "Unsupported nexthop type");
 		return -EOPNOTSUPP;
···
 		return info->nh->gw_family || info->nh->is_reject ||
 		       mlxsw_sp_netdev_ipip_type(mlxsw_sp, dev, NULL);
 	case NH_NOTIFIER_INFO_TYPE_GRP:
+	case NH_NOTIFIER_INFO_TYPE_RES_TABLE:
 		/* Already validated earlier. */
 		return true;
 	default:
···
 	if (nh_obj->is_reject)
 		mlxsw_sp_nexthop_obj_blackhole_init(mlxsw_sp, nh);
 
+	/* In a resilient nexthop group, all the nexthops must be written to
+	 * the adjacency table. Even if they do not have a valid neighbour or
+	 * RIF.
+	 */
+	if (nh_grp->nhgi->is_resilient && !nh->should_offload) {
+		nh->action = MLXSW_SP_NEXTHOP_ACTION_TRAP;
+		nh->should_offload = 1;
+	}
+
 	return 0;
 
 err_type_init:
···
 	mlxsw_sp_nexthop_type_fini(mlxsw_sp, nh);
 	list_del(&nh->router_list_node);
 	mlxsw_sp_nexthop_counter_free(mlxsw_sp, nh);
+	nh->should_offload = 0;
 }
 
 static int
···
 {
 	struct mlxsw_sp_nexthop_group_info *nhgi;
 	struct mlxsw_sp_nexthop *nh;
+	bool is_resilient = false;
 	unsigned int nhs;
 	int err, i;
 
···
 		break;
 	case NH_NOTIFIER_INFO_TYPE_GRP:
 		nhs = info->nh_grp->num_nh;
+		break;
+	case NH_NOTIFIER_INFO_TYPE_RES_TABLE:
+		nhs = info->nh_res_table->num_nh_buckets;
+		is_resilient = true;
 		break;
 	default:
 		return -EINVAL;
···
 	nh_grp->nhgi = nhgi;
 	nhgi->nh_grp = nh_grp;
 	nhgi->gateway = mlxsw_sp_nexthop_obj_is_gateway(mlxsw_sp, info);
+	nhgi->is_resilient = is_resilient;
 	nhgi->count = nhs;
 	for (i = 0; i < nhgi->count; i++) {
 		struct nh_notifier_single_info *nh_obj;
···
 			nh_obj = &info->nh_grp->nh_entries[i].nh;
 			weight = info->nh_grp->nh_entries[i].weight;
 			break;
+		case NH_NOTIFIER_INFO_TYPE_RES_TABLE:
+			nh_obj = &info->nh_res_table->nhs[i];
+			weight = 1;
+			break;
 		default:
 			err = -EINVAL;
 			goto err_nexthop_obj_init;
···
 	if (err) {
 		NL_SET_ERR_MSG_MOD(info->extack, "Failed to write adjacency entries to the device");
 		goto err_group_refresh;
+	}
+
+	/* Add resilient nexthop groups to a list so that the activity of their
+	 * nexthop buckets will be periodically queried and cleared.
+	 */
+	if (nhgi->is_resilient) {
+		if (list_empty(&mlxsw_sp->router->nh_res_grp_list))
+			mlxsw_sp_nh_grp_activity_work_schedule(mlxsw_sp);
+		list_add(&nhgi->list, &mlxsw_sp->router->nh_res_grp_list);
 	}
 
 	return 0;
···
 				   struct mlxsw_sp_nexthop_group *nh_grp)
 {
 	struct mlxsw_sp_nexthop_group_info *nhgi = nh_grp->nhgi;
+	struct mlxsw_sp_router *router = mlxsw_sp->router;
 	int i;
+
+	if (nhgi->is_resilient) {
+		list_del(&nhgi->list);
+		if (list_empty(&mlxsw_sp->router->nh_res_grp_list))
+			cancel_delayed_work(&router->nh_grp_activity_dw);
+	}
 
 	for (i = nhgi->count - 1; i >= 0; i--) {
 		struct mlxsw_sp_nexthop *nh = &nhgi->nexthops[i];
···
 	mlxsw_sp_nexthop_obj_group_destroy(mlxsw_sp, nh_grp);
 }
 
+static int mlxsw_sp_nexthop_obj_bucket_query(struct mlxsw_sp *mlxsw_sp,
+					     u32 adj_index, char *ratr_pl)
+{
+	MLXSW_REG_ZERO(ratr, ratr_pl);
+	mlxsw_reg_ratr_op_set(ratr_pl, MLXSW_REG_RATR_OP_QUERY_READ);
+	mlxsw_reg_ratr_adjacency_index_low_set(ratr_pl, adj_index);
+	mlxsw_reg_ratr_adjacency_index_high_set(ratr_pl, adj_index >> 16);
+
+	return mlxsw_reg_query(mlxsw_sp->core, MLXSW_REG(ratr), ratr_pl);
+}
+
+static int mlxsw_sp_nexthop_obj_bucket_compare(char *ratr_pl, char *ratr_pl_new)
+{
+	/* Clear the opcode and activity on both the old and new payload as
+	 * they are irrelevant for the comparison.
+	 */
+	mlxsw_reg_ratr_op_set(ratr_pl, MLXSW_REG_RATR_OP_QUERY_READ);
+	mlxsw_reg_ratr_a_set(ratr_pl, 0);
+	mlxsw_reg_ratr_op_set(ratr_pl_new, MLXSW_REG_RATR_OP_QUERY_READ);
+	mlxsw_reg_ratr_a_set(ratr_pl_new, 0);
+
+	/* If the contents of the adjacency entry are consistent with the
+	 * replacement request, then replacement was successful.
+	 */
+	if (!memcmp(ratr_pl, ratr_pl_new, MLXSW_REG_RATR_LEN))
+		return 0;
+
+	return -EINVAL;
+}
+
+static int
+mlxsw_sp_nexthop_obj_bucket_adj_update(struct mlxsw_sp *mlxsw_sp,
+				       struct mlxsw_sp_nexthop *nh,
+				       struct nh_notifier_info *info)
+{
+	u16 bucket_index = info->nh_res_bucket->bucket_index;
+	struct netlink_ext_ack *extack = info->extack;
+	bool force = info->nh_res_bucket->force;
+	char ratr_pl_new[MLXSW_REG_RATR_LEN];
+	char ratr_pl[MLXSW_REG_RATR_LEN];
+	u32 adj_index;
+	int err;
+
+	/* No point in trying an atomic replacement if the idle timer interval
+	 * is smaller than the interval in which we query and clear activity.
+	 */
+	force = info->nh_res_bucket->idle_timer_ms <
+		MLXSW_SP_NH_GRP_ACTIVITY_UPDATE_INTERVAL;
+
+	adj_index = nh->nhgi->adj_index + bucket_index;
+	err = mlxsw_sp_nexthop_update(mlxsw_sp, adj_index, nh, force, ratr_pl);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack, "Failed to overwrite nexthop bucket");
+		return err;
+	}
+
+	if (!force) {
+		err = mlxsw_sp_nexthop_obj_bucket_query(mlxsw_sp, adj_index,
+							ratr_pl_new);
+		if (err) {
+			NL_SET_ERR_MSG_MOD(extack, "Failed to query nexthop bucket state after replacement. State might be inconsistent");
+			return err;
+		}
+
+		err = mlxsw_sp_nexthop_obj_bucket_compare(ratr_pl, ratr_pl_new);
+		if (err) {
+			NL_SET_ERR_MSG_MOD(extack, "Nexthop bucket was not replaced because it was active during replacement");
+			return err;
+		}
+	}
+
+	nh->update = 0;
+	nh->offloaded = 1;
+	mlxsw_sp_nexthop_bucket_offload_refresh(mlxsw_sp, nh, bucket_index);
+
+	return 0;
+}
+
+static int mlxsw_sp_nexthop_obj_bucket_replace(struct mlxsw_sp *mlxsw_sp,
+					       struct nh_notifier_info *info)
+{
+	u16 bucket_index = info->nh_res_bucket->bucket_index;
+	struct netlink_ext_ack *extack = info->extack;
+	struct mlxsw_sp_nexthop_group_info *nhgi;
+	struct nh_notifier_single_info *nh_obj;
+	struct mlxsw_sp_nexthop_group *nh_grp;
+	struct mlxsw_sp_nexthop *nh;
+	int err;
+
+	nh_grp = mlxsw_sp_nexthop_obj_group_lookup(mlxsw_sp, info->id);
+	if (!nh_grp) {
+		NL_SET_ERR_MSG_MOD(extack, "Nexthop group was not found");
+		return -EINVAL;
+	}
+
+	nhgi = nh_grp->nhgi;
+
+	if (bucket_index >= nhgi->count) {
+		NL_SET_ERR_MSG_MOD(extack, "Nexthop bucket index out of range");
+		return -EINVAL;
+	}
+
+	nh = &nhgi->nexthops[bucket_index];
+	mlxsw_sp_nexthop_obj_fini(mlxsw_sp, nh);
+
+	nh_obj = &info->nh_res_bucket->new_nh;
+	err = mlxsw_sp_nexthop_obj_init(mlxsw_sp, nh_grp, nh, nh_obj, 1);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack, "Failed to initialize nexthop object for nexthop bucket replacement");
+		goto err_nexthop_obj_init;
+	}
+
+	err = mlxsw_sp_nexthop_obj_bucket_adj_update(mlxsw_sp, nh, info);
+	if (err)
+		goto err_nexthop_obj_bucket_adj_update;
+
+	return 0;
+
+err_nexthop_obj_bucket_adj_update:
+	mlxsw_sp_nexthop_obj_fini(mlxsw_sp, nh);
+err_nexthop_obj_init:
+	nh_obj = &info->nh_res_bucket->old_nh;
+	mlxsw_sp_nexthop_obj_init(mlxsw_sp, nh_grp, nh, nh_obj, 1);
+	/* The old adjacency entry was not overwritten */
+	nh->update = 0;
+	nh->offloaded = 1;
+	return err;
+}
+
 static int mlxsw_sp_nexthop_obj_event(struct notifier_block *nb,
 				      unsigned long event, void *ptr)
 {
···
 		break;
 	case NEXTHOP_EVENT_DEL:
 		mlxsw_sp_nexthop_obj_del(router->mlxsw_sp, info);
+		break;
+	case NEXTHOP_EVENT_BUCKET_REPLACE:
+		err = mlxsw_sp_nexthop_obj_bucket_replace(router->mlxsw_sp,
+							  info);
 		break;
 	default:
 		break;
···
 	if (err)
 		goto err_ll_op_ctx_init;
 
+	INIT_LIST_HEAD(&mlxsw_sp->router->nh_res_grp_list);
+	INIT_DELAYED_WORK(&mlxsw_sp->router->nh_grp_activity_dw,
+			  mlxsw_sp_nh_grp_activity_work);
+
 	INIT_LIST_HEAD(&mlxsw_sp->router->nexthop_neighs_list);
 	err = __mlxsw_sp_router_init(mlxsw_sp);
 	if (err)
···
 err_rifs_init:
 	__mlxsw_sp_router_fini(mlxsw_sp);
 err_router_init:
+	cancel_delayed_work_sync(&mlxsw_sp->router->nh_grp_activity_dw);
 	mlxsw_sp_router_ll_op_ctx_fini(router);
 err_ll_op_ctx_init:
 	mlxsw_sp_router_xm_fini(mlxsw_sp);
···
 	mlxsw_sp_ipips_fini(mlxsw_sp);
 	mlxsw_sp_rifs_fini(mlxsw_sp);
 	__mlxsw_sp_router_fini(mlxsw_sp);
+	cancel_delayed_work_sync(&mlxsw_sp->router->nh_grp_activity_dw);
 	mlxsw_sp_router_ll_op_ctx_fini(mlxsw_sp->router);
 	mlxsw_sp_router_xm_fini(mlxsw_sp);
 	mutex_destroy(&mlxsw_sp->router->lock);
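
The bucket-count check in this file can be tried outside the kernel with a small standalone sketch. The range table below is an assumption: it is copied from the Spectrum-2/3 line of the RATRAD `ecmp_size` comment, since the driver's real `adj_grp_size_ranges` table is not part of this diff. The names `size_range`, `example_ranges` and `bucket_count_validate()` are hypothetical, and the KVDL allocation-size check is left out.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a group size range (inclusive bounds). */
struct size_range {
	unsigned int start;
	unsigned int end;
};

/* Assumed values, taken from the Spectrum-2/3 line of the RATRAD comment. */
static const struct size_range example_ranges[] = {
	{ 32, 128 }, { 256, 256 }, { 512, 512 },
	{ 1024, 1024 }, { 2048, 2048 }, { 4096, 4096 },
};

/* Accept a bucket count only if it is at least 32 and inside some range. */
static int bucket_count_validate(unsigned int num_buckets)
{
	size_t i;

	if (num_buckets < 32)
		return -EINVAL;

	for (i = 0; i < sizeof(example_ranges) / sizeof(example_ranges[0]); i++) {
		if (num_buckets >= example_ranges[i].start &&
		    num_buckets <= example_ranges[i].end)
			return 0;
	}

	return -EINVAL;
}
```

Under these assumed ranges, 7 and 129 buckets are rejected and 32 is accepted, which lines up with the bucket counts used by the selftest in this merge.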
+4 -1
drivers/net/ethernet/mellanox/mlxsw/spectrum_router.h
···
 	struct mlxsw_sp_router_xm *xm;
 	const struct mlxsw_sp_adj_grp_size_range *adj_grp_size_ranges;
 	size_t adj_grp_size_ranges_count;
+	struct delayed_work nh_grp_activity_dw;
+	struct list_head nh_res_grp_list;
 };
 
 struct mlxsw_sp_fib_entry_priv {
···
 int mlxsw_sp_nexthop_counter_get(struct mlxsw_sp *mlxsw_sp,
 				 struct mlxsw_sp_nexthop *nh, u64 *p_counter);
 int mlxsw_sp_nexthop_eth_update(struct mlxsw_sp *mlxsw_sp, u32 adj_index,
-				struct mlxsw_sp_nexthop *nh);
+				struct mlxsw_sp_nexthop *nh, bool force,
+				char *ratr_pl);
 void mlxsw_sp_nexthop_counter_alloc(struct mlxsw_sp *mlxsw_sp,
 				    struct mlxsw_sp_nexthop *nh);
 void mlxsw_sp_nexthop_counter_free(struct mlxsw_sp *mlxsw_sp,
+82
tools/testing/selftests/drivers/net/mlxsw/rtnetlink.sh
···
 	nexthop_obj_invalid_test
 	nexthop_obj_offload_test
 	nexthop_obj_group_offload_test
+	nexthop_obj_bucket_offload_test
 	nexthop_obj_blackhole_offload_test
 	nexthop_obj_route_offload_test
 	devlink_reload_test
···
 
 	ip nexthop add id 1 dev $swp1
 	ip nexthop add id 2 dev $swp1
+	ip nexthop add id 3 via 192.0.2.3 dev $swp1
 	ip nexthop add id 10 group 1/2
 	check_fail $? "managed to configure a nexthop group with device-only nexthops when should not"
 
+	ip nexthop add id 10 group 3 type resilient buckets 7
+	check_fail $? "managed to configure a too small resilient nexthop group when should not"
+
+	ip nexthop add id 10 group 3 type resilient buckets 129
+	check_fail $? "managed to configure a resilient nexthop group with invalid number of buckets when should not"
+
+	ip nexthop add id 10 group 1/2 type resilient buckets 32
+	check_fail $? "managed to configure a resilient nexthop group with device-only nexthops when should not"
+
+	ip nexthop add id 10 group 3 type resilient buckets 32
+	check_err $? "failed to configure a valid resilient nexthop group"
+	ip nexthop replace id 3 dev $swp1
+	check_fail $? "managed to populate a nexthop bucket with a device-only nexthop when should not"
+
 	log_test "nexthop objects - invalid configurations"
 
+	ip nexthop del id 10
+	ip nexthop del id 3
 	ip nexthop del id 2
 	ip nexthop del id 1
 
···
 	check_err $? "nexthop group not marked as offloaded after revalidating nexthop"
 
 	log_test "nexthop group objects offload indication"
+
+	ip neigh del 2001:db8:1::2 dev $swp1
+	ip neigh del 192.0.2.3 dev $swp1
+	ip neigh del 192.0.2.2 dev $swp1
+	ip nexthop del id 10
+	ip nexthop del id 2
+	ip nexthop del id 1
+
+	simple_if_fini $swp2
+	simple_if_fini $swp1 192.0.2.1/24 2001:db8:1::1/64
+}
+
+nexthop_obj_bucket_offload_test()
+{
+	# Test offload indication of nexthop buckets
+	RET=0
+
+	simple_if_init $swp1 192.0.2.1/24 2001:db8:1::1/64
+	simple_if_init $swp2
+	setup_wait
+
+	ip nexthop add id 1 via 192.0.2.2 dev $swp1
+	ip nexthop add id 2 via 2001:db8:1::2 dev $swp1
+	ip nexthop add id 10 group 1/2 type resilient buckets 32 idle_timer 0
+	ip neigh replace 192.0.2.2 lladdr 00:11:22:33:44:55 nud reachable \
+		dev $swp1
+	ip neigh replace 192.0.2.3 lladdr 00:11:22:33:44:55 nud reachable \
+		dev $swp1
+	ip neigh replace 2001:db8:1::2 lladdr 00:11:22:33:44:55 nud reachable \
+		dev $swp1
+
+	busywait "$TIMEOUT" wait_for_offload \
+		ip nexthop bucket show nhid 1
+	check_err $? "IPv4 nexthop buckets not marked as offloaded when should"
+	busywait "$TIMEOUT" wait_for_offload \
+		ip nexthop bucket show nhid 2
+	check_err $? "IPv6 nexthop buckets not marked as offloaded when should"
+
+	# Invalidate nexthop id 1
+	ip neigh replace 192.0.2.2 nud failed dev $swp1
+	busywait "$TIMEOUT" wait_for_trap \
+		ip nexthop bucket show nhid 1
+	check_err $? "IPv4 nexthop buckets not marked with trap when should"
+
+	# Invalidate nexthop id 2
+	ip neigh replace 2001:db8:1::2 nud failed dev $swp1
+	busywait "$TIMEOUT" wait_for_trap \
+		ip nexthop bucket show nhid 2
+	check_err $? "IPv6 nexthop buckets not marked with trap when should"
+
+	# Revalidate nexthop id 1 by changing its configuration
+	ip nexthop replace id 1 via 192.0.2.3 dev $swp1
+	busywait "$TIMEOUT" wait_for_offload \
+		ip nexthop bucket show nhid 1
+	check_err $? "nexthop bucket not marked as offloaded after revalidating nexthop"
+
+	# Revalidate nexthop id 2 by changing its neighbour
+	ip neigh replace 2001:db8:1::2 lladdr 00:11:22:33:44:55 nud reachable \
+		dev $swp1
+	busywait "$TIMEOUT" wait_for_offload \
+		ip nexthop bucket show nhid 2
+	check_err $? "nexthop bucket not marked as offloaded after revalidating neighbour"
+
+	log_test "nexthop bucket offload indication"
 
 	ip neigh del 2001:db8:1::2 dev $swp1
 	ip neigh del 192.0.2.3 dev $swp1
+5
tools/testing/selftests/net/forwarding/lib.sh
···
 	"$@" | grep -q offload
 }
 
+wait_for_trap()
+{
+	"$@" | grep -q trap
+}
+
 until_counter_is()
 {
 	local expr=$1; shift