Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Merge branch 'net-sched-skip_sw'

Asbjørn Sloth Tønnesen says:

====================
make skip_sw actually skip software

During development of flower-route[1], which I
recently presented at FOSDEM[2], I noticed that
CPU usage would increase the more rules I installed
into the hardware for IP forwarding offloading.

We use TC flower offload for the hottest
prefixes, and leave the long tail to the normal (non-TC)
Linux network stack for slow-path IP forwarding.
We therefore need both the hardware and software
datapaths to perform well.

I found that skip_sw rules are quite expensive
in the kernel datapath, since they must be evaluated
and matched upon before the kernel checks the
skip_sw flag.

This patchset optimizes the case where all rules
are skip_sw by implementing a TC bypass for these
cases, where TC is only used as a control plane
for the hardware path.
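
As a rough illustration of the cost described above, the following user-space sketch models a rule list in which the skip_sw flag is only looked at after a rule has matched, plus a block-level bypass flag that short-circuits the walk when every rule is skip_sw. All names here (struct rule, struct block, block_update_bypass, classify) are hypothetical and are not kernel APIs; the real change is in the diff further down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified rule: matches a single destination value. */
struct rule {
	unsigned int dst;
	bool skip_sw;		/* rule is only meant for the hardware path */
};

/* Hypothetical block holding a flat rule array. */
struct block {
	const struct rule *rules;
	size_t nrules;
	bool bypass_wanted;	/* true when the block is non-empty and all rules are skip_sw */
};

/* Recompute the bypass flag from the current rule set. */
static void block_update_bypass(struct block *b)
{
	size_t i, skipsw = 0;

	for (i = 0; i < b->nrules; i++)
		if (b->rules[i].skip_sw)
			skipsw++;
	b->bypass_wanted = b->nrules > 0 && skipsw == b->nrules;
}

/*
 * Walk the rules for one packet. Returns the index of the first rule that
 * matches and applies in software, or -1. *evaluated counts match attempts.
 */
static int classify(const struct block *b, unsigned int dst, size_t *evaluated)
{
	size_t i;

	assert(b && evaluated);
	*evaluated = 0;
	if (b->bypass_wanted)
		return -1;	/* no rule in this block applies in software */

	for (i = 0; i < b->nrules; i++) {
		(*evaluated)++;
		if (b->rules[i].dst != dst)
			continue;
		if (b->rules[i].skip_sw)
			continue;	/* matched, but flagged hardware-only */
		return (int)i;
	}
	return -1;
}
```

In this model, a block with only skip_sw rules costs one match attempt per rule for every packet until block_update_bypass() has set the flag, after which classify() returns without touching the rules.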

v4:
- Rebased onto net-next, now that net-next is open again

v3: https://lore.kernel.org/netdev/20240306165813.656931-1-ast@fiberby.net/
- Patch 3:
  - Fix source_inline
  - Fix build failure when CONFIG_NET_CLS is enabled without
    CONFIG_NET_CLS_ACT.

v2: https://lore.kernel.org/netdev/20240305144404.569632-1-ast@fiberby.net/
- Patch 1:
  - Add Reviewed-By from Jiri Pirko
- Patch 2:
  - Move code, to avoid forward declaration (Jiri).
- Patch 3:
  - Refactor to use a static key.
  - Add performance data for trapping, or sending
    a packet to a non-existent chain (as suggested by Marcelo).

v1: https://lore.kernel.org/netdev/20240215160458.1727237-1-ast@fiberby.net/

[1] flower-route
https://github.com/fiberby-dk/flower-route

[2] FOSDEM talk
https://fosdem.org/2024/schedule/event/fosdem-2024-3337-flying-higher-hardware-offloading-with-bird/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

+64
+9
include/net/pkt_cls.h
···
 	return block && block->index;
 }
 
+#ifdef CONFIG_NET_CLS_ACT
+DECLARE_STATIC_KEY_FALSE(tcf_bypass_check_needed_key);
+
+static inline bool tcf_block_bypass_sw(struct tcf_block *block)
+{
+	return block && block->bypass_wanted;
+}
+#endif
+
 static inline struct Qdisc *tcf_block_q(struct tcf_block *block)
 {
 	WARN_ON(tcf_block_shared(block));
+4
include/net/sch_generic.h
···
 	 */
 	spinlock_t lock;
 	bool deleting;
+	bool counted;
 	refcount_t refcnt;
 	struct rcu_head rcu;
 	struct hlist_node destroy_ht_node;
···
 	struct flow_block flow_block;
 	struct list_head owner_list;
 	bool keep_dst;
+	bool bypass_wanted;
+	atomic_t filtercnt; /* Number of filters */
+	atomic_t skipswcnt; /* Number of skip_sw filters */
 	atomic_t offloadcnt; /* Number of oddloaded filters */
 	unsigned int nooffloaddevcnt; /* Number of devs unable to do offload */
 	unsigned int lockeddevcnt; /* Number of devs that require rtnl lock. */
+10
net/core/dev.c
···
 EXPORT_SYMBOL_GPL(net_dec_egress_queue);
 #endif
 
+#ifdef CONFIG_NET_CLS_ACT
+DEFINE_STATIC_KEY_FALSE(tcf_bypass_check_needed_key);
+EXPORT_SYMBOL(tcf_bypass_check_needed_key);
+#endif
+
 DEFINE_STATIC_KEY_FALSE(netstamp_needed_key);
 EXPORT_SYMBOL(netstamp_needed_key);
 #ifdef CONFIG_JUMP_LABEL
···

 	if (!miniq)
 		return ret;
+
+	if (static_branch_unlikely(&tcf_bypass_check_needed_key)) {
+		if (tcf_block_bypass_sw(miniq->block))
+			return ret;
+	}

 	tc_skb_cb(skb)->mru = 0;
 	tc_skb_cb(skb)->post_ct = false;
+41
net/sched/cls_api.c
···
 	refcount_inc(&tp->refcnt);
 }
 
+static void tcf_maintain_bypass(struct tcf_block *block)
+{
+	int filtercnt = atomic_read(&block->filtercnt);
+	int skipswcnt = atomic_read(&block->skipswcnt);
+	bool bypass_wanted = filtercnt > 0 && filtercnt == skipswcnt;
+
+	if (bypass_wanted != block->bypass_wanted) {
+#ifdef CONFIG_NET_CLS_ACT
+		if (bypass_wanted)
+			static_branch_inc(&tcf_bypass_check_needed_key);
+		else
+			static_branch_dec(&tcf_bypass_check_needed_key);
+#endif
+		block->bypass_wanted = bypass_wanted;
+	}
+}
+
+static void tcf_block_filter_cnt_update(struct tcf_block *block, bool *counted, bool add)
+{
+	lockdep_assert_not_held(&block->cb_lock);
+
+	down_write(&block->cb_lock);
+	if (*counted != add) {
+		if (add) {
+			atomic_inc(&block->filtercnt);
+			*counted = true;
+		} else {
+			atomic_dec(&block->filtercnt);
+			*counted = false;
+		}
+	}
+	tcf_maintain_bypass(block);
+	up_write(&block->cb_lock);
+}
+
 static void tcf_chain_put(struct tcf_chain *chain);
 
 static void tcf_proto_destroy(struct tcf_proto *tp, bool rtnl_held,
 			      bool sig_destroy, struct netlink_ext_ack *extack)
 {
 	tp->ops->destroy(tp, rtnl_held, extack);
+	tcf_block_filter_cnt_update(tp->chain->block, &tp->counted, false);
 	if (sig_destroy)
 		tcf_proto_signal_destroyed(tp->chain, tp);
 	tcf_chain_put(tp->chain);
···
 	err = tp->ops->change(net, skb, tp, cl, t->tcm_handle, tca, &fh,
 			      flags, extack);
 	if (err == 0) {
+		tcf_block_filter_cnt_update(block, &tp->counted, true);
 		tfilter_notify(net, skb, n, tp, block, q, parent, fh,
 			       RTM_NEWTFILTER, false, rtnl_held, extack);
 		tfilter_put(tp, fh);
···
 	if (*flags & TCA_CLS_FLAGS_IN_HW)
 		return;
 	*flags |= TCA_CLS_FLAGS_IN_HW;
+	if (tc_skip_sw(*flags))
+		atomic_inc(&block->skipswcnt);
 	atomic_inc(&block->offloadcnt);
 }

···
 	if (!(*flags & TCA_CLS_FLAGS_IN_HW))
 		return;
 	*flags &= ~TCA_CLS_FLAGS_IN_HW;
+	if (tc_skip_sw(*flags))
+		atomic_dec(&block->skipswcnt);
 	atomic_dec(&block->offloadcnt);
 }
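
The bookkeeping in the net/sched/cls_api.c hunks can be exercised outside the kernel with a small model. The sketch below is a user-space approximation, not the kernel code: plain ints stand in for atomic_t, a global counter (bypass_key_users) stands in for the static key, there is no cb_lock, and every model_* name is hypothetical. It only mirrors the logic visible in the diff: the count changes at most once per `counted` flag, and bypass is wanted when the block is non-empty and every counted filter is skip_sw.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the static key: number of blocks wanting bypass. */
static int bypass_key_users;

/* Hypothetical, lock-free model of the counters added to the block. */
struct model_block {
	int filtercnt;		/* number of counted filters */
	int skipswcnt;		/* number of skip_sw filters in hardware */
	bool bypass_wanted;
};

/* Mirrors tcf_maintain_bypass(): flip the flag and the key on transitions. */
static void model_maintain_bypass(struct model_block *b)
{
	bool wanted = b->filtercnt > 0 && b->filtercnt == b->skipswcnt;

	if (wanted != b->bypass_wanted) {
		bypass_key_users += wanted ? 1 : -1;
		b->bypass_wanted = wanted;
	}
}

/*
 * Mirrors tcf_block_filter_cnt_update(): *counted makes repeated add (or
 * repeated remove) calls for the same object change filtercnt only once.
 */
static void model_filter_cnt_update(struct model_block *b, bool *counted,
				    bool add)
{
	if (*counted != add) {
		b->filtercnt += add ? 1 : -1;
		*counted = add;
	}
	model_maintain_bypass(b);
}

/* Mirrors the datapath check: global key first, then the per-block flag. */
static bool model_datapath_bypasses(const struct model_block *b)
{
	return bypass_key_users > 0 && b && b->bypass_wanted;
}
```

In this model the caller adjusts skipswcnt itself before calling model_filter_cnt_update(), which corresponds to the diff incrementing skipswcnt in the offload accounting path before the filter count is updated after a successful change.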