Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

xsk: use a smaller new lock for shared pool case

- Split cq_lock into two smaller locks: cq_prod_lock and
cq_cached_prod_lock
- Avoid disabling/enabling interrupts in the hot xmit path

In both xsk_cq_cancel_locked() and xsk_cq_reserve_locked(), the only
race is between multiple xsks sharing the same pool. They all run in
process context rather than interrupt context, so the small lock named
cq_cached_prod_lock can be taken without disabling interrupts.

While cq_cached_prod_lock ensures exclusive modification of
@cached_prod, cq_prod_lock in xsk_cq_submit_addr_locked() only cares
about @producer and the corresponding @desc. Neither of them needs to
be consistent with @cached_prod, which is protected by
cq_cached_prod_lock. That is why the previous big lock can be split
into two smaller ones. Please note that the SPSC rule is all about the
global state of producer and consumer, which can affect both layers,
rather than the local or cached ones.
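The split described above can be sketched in userspace. This is a minimal model, not the kernel code: `struct cq_model`, `CQ_MODEL_ENTRIES`, and the `cq_*` helper names are hypothetical, pthread mutexes stand in for the two spinlocks, and the consumer index is assumed to stay fixed. It cannot show the interrupt-context side of cq_prod_lock; it only shows that the reserve/cancel path and the submit path touch disjoint fields.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define CQ_MODEL_ENTRIES 8u /* hypothetical ring size, must be a power of two */

/* Hypothetical userspace stand-in for the completion ring state. */
struct cq_model {
	uint32_t cached_prod;             /* local reservation counter */
	uint32_t producer;                /* globally visible producer index */
	uint32_t consumer;                /* consumer index, never advanced in this model */
	uint64_t addrs[CQ_MODEL_ENTRIES]; /* completed descriptor addresses */
	pthread_mutex_t cached_prod_lock; /* models cq_cached_prod_lock */
	pthread_mutex_t prod_lock;        /* models cq_prod_lock */
};

void cq_model_init(struct cq_model *q)
{
	/* Index masking in cq_submit_addr() relies on a power-of-two size. */
	assert((CQ_MODEL_ENTRIES & (CQ_MODEL_ENTRIES - 1)) == 0);
	memset(q, 0, sizeof(*q));
	pthread_mutex_init(&q->cached_prod_lock, NULL);
	pthread_mutex_init(&q->prod_lock, NULL);
}

/* Reserve one slot: only cached_prod is modified, under its own lock. */
int cq_reserve(struct cq_model *q)
{
	int ret = -1; /* ring full */

	pthread_mutex_lock(&q->cached_prod_lock);
	if (q->cached_prod - q->consumer < CQ_MODEL_ENTRIES) {
		q->cached_prod++;
		ret = 0;
	}
	pthread_mutex_unlock(&q->cached_prod_lock);
	return ret;
}

/* Give back n reserved slots: again only cached_prod is modified. */
void cq_cancel(struct cq_model *q, uint32_t n)
{
	pthread_mutex_lock(&q->cached_prod_lock);
	q->cached_prod -= n;
	pthread_mutex_unlock(&q->cached_prod_lock);
}

/* Publish one address: only producer and the slot it indexes are modified. */
void cq_submit_addr(struct cq_model *q, uint64_t addr)
{
	pthread_mutex_lock(&q->prod_lock);
	q->addrs[q->producer & (CQ_MODEL_ENTRIES - 1)] = addr;
	q->producer++;
	pthread_mutex_unlock(&q->prod_lock);
}
```

In this model, a caller would run cq_model_init() once, call cq_reserve() before queuing a packet, cq_cancel() on an error path, and cq_submit_addr() on completion. No function takes both locks, which is the property that lets the original single lock be split.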

Frequently disabling and enabling interrupts is very time consuming
in some cases, especially at per-descriptor granularity. This
optimization avoids that cost, even when the pool is shared by
multiple xsks.

With this patch, the performance number[1] goes from 1,872,565 pps
to 1,961,009 pps, a minor rise of around 5%.

[1]: taskset -c 1 ./xdpsock -i enp2s0f1 -q 0 -t -S -s 64
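
The "around 5%" figure can be checked from the two pps numbers quoted above. The helper below is only an illustrative sketch; the name `relative_gain_pct` is hypothetical and is not part of the patch or of xdpsock.

```c
#include <assert.h>

/* Relative gain of `after` over `before`, in percent. */
double relative_gain_pct(double before, double after)
{
	assert(before > 0.0); /* a zero baseline has no defined relative gain */
	return (after - before) / before * 100.0;
}
```

Calling relative_gain_pct(1872565.0, 1961009.0) gives roughly 4.7, which matches the rounded figure in the commit message.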

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://patch.msgid.link/20251030000646.18859-3-kerneljasonxing@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Authored by Jason Xing and committed by Paolo Abeni
30ed05ad 46228004

+17 -14
+9 -4
include/net/xsk_buff_pool.h
···
 	bool unaligned;
 	bool tx_sw_csum;
 	void *addrs;
-	/* Mutual exclusion of the completion ring in the SKB mode. Two cases to protect:
-	 * NAPI TX thread and sendmsg error paths in the SKB destructor callback and when
-	 * sockets share a single cq when the same netdev and queue id is shared.
+	/* Mutual exclusion of the completion ring in the SKB mode.
+	 * Protect: NAPI TX thread and sendmsg error paths in the SKB
+	 * destructor callback.
 	 */
-	spinlock_t cq_lock;
+	spinlock_t cq_prod_lock;
+	/* Mutual exclusion of the completion ring in the SKB mode.
+	 * Protect: when sockets share a single cq when the same netdev
+	 * and queue id is shared.
+	 */
+	spinlock_t cq_cached_prod_lock;
 	struct xdp_buff_xsk *free_heads[];
 };
+6 -9
net/xdp/xsk.c
···
 static int xsk_cq_reserve_locked(struct xsk_buff_pool *pool)
 {
-	unsigned long flags;
 	int ret;

-	spin_lock_irqsave(&pool->cq_lock, flags);
+	spin_lock(&pool->cq_cached_prod_lock);
 	ret = xskq_prod_reserve(pool->cq);
-	spin_unlock_irqrestore(&pool->cq_lock, flags);
+	spin_unlock(&pool->cq_cached_prod_lock);

 	return ret;
 }
···
 	unsigned long flags;
 	u32 idx;

-	spin_lock_irqsave(&pool->cq_lock, flags);
+	spin_lock_irqsave(&pool->cq_prod_lock, flags);
 	idx = xskq_get_prod(pool->cq);

 	xskq_prod_write_addr(pool->cq, idx,
···
 		}
 	}
 	xskq_prod_submit_n(pool->cq, descs_processed);
-	spin_unlock_irqrestore(&pool->cq_lock, flags);
+	spin_unlock_irqrestore(&pool->cq_prod_lock, flags);
 }

 static void xsk_cq_cancel_locked(struct xsk_buff_pool *pool, u32 n)
 {
-	unsigned long flags;
-
-	spin_lock_irqsave(&pool->cq_lock, flags);
+	spin_lock(&pool->cq_cached_prod_lock);
 	xskq_prod_cancel_n(pool->cq, n);
-	spin_unlock_irqrestore(&pool->cq_lock, flags);
+	spin_unlock(&pool->cq_cached_prod_lock);
 }

 static void xsk_inc_num_desc(struct sk_buff *skb)
+2 -1
net/xdp/xsk_buff_pool.c
···
 	INIT_LIST_HEAD(&pool->xskb_list);
 	INIT_LIST_HEAD(&pool->xsk_tx_list);
 	spin_lock_init(&pool->xsk_tx_list_lock);
-	spin_lock_init(&pool->cq_lock);
+	spin_lock_init(&pool->cq_prod_lock);
+	spin_lock_init(&pool->cq_cached_prod_lock);
 	refcount_set(&pool->users, 1);

 	pool->fq = xs->fq_tmp;