net: rps: softnet_data reorg to make enqueue_to_backlog() fast

enqueue_to_backlog() is showing up in kernel profiles on hosts
with many cores, when RFS/RPS is used.

The following softnet_data fields need to be updated:

- input_queue_tail
- input_pkt_queue (next, prev, qlen, lock)
- backlog.state (if input_pkt_queue was empty)

Unfortunately, they currently span two cache lines:

/* --- cacheline 3 boundary (192 bytes) --- */
call_single_data_t csd __attribute__((__aligned__(64))); /* 0xc0 0x20 */
struct softnet_data * rps_ipi_next; /* 0xe0 0x8 */
unsigned int cpu; /* 0xe8 0x4 */
unsigned int input_queue_tail; /* 0xec 0x4 */
struct sk_buff_head input_pkt_queue; /* 0xf0 0x18 */

/* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */

struct napi_struct backlog __attribute__((__aligned__(8))); /* 0x108 0x1f0 */

Add one ____cacheline_aligned_in_smp to make sure they now share
a single cache line.

Also, because napi_struct has other fields that are written, make @state
its first field, so that only backlog.state lands in this cache line.

We want to make sure that CPUs adding packets to sd->input_pkt_queue
are not slowing down CPUs processing their backlog because of
false sharing.

After this patch, the new layout is:

/* --- cacheline 5 boundary (320 bytes) --- */
long int pad[3] __attribute__((__aligned__(64))); /* 0x140 0x18 */
unsigned int input_queue_tail; /* 0x158 0x4 */

/* XXX 4 bytes hole, try to pack */

struct sk_buff_head input_pkt_queue; /* 0x160 0x18 */
struct napi_struct backlog __attribute__((__aligned__(8))); /* 0x178 0x1f0 */

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251024091240.3292546-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Authored by Eric Dumazet and committed by Jakub Kicinski 7 months ago (c72568c2, a086e986).

1 changed file, 10 insertions(+), 1 deletion(-)

include/linux/netdevice.h

···
  * Structure for NAPI scheduling similar to tasklet but with weighting
  */
 struct napi_struct {
+	/* This field should be first or softnet_data.backlog needs tweaks. */
+	unsigned long state;
 	/* The poll_list must only be managed by the entity which
 	 * changes the state of the NAPI_STATE_SCHED bit. This means
 	 * whoever atomically sets that bit can add this napi_struct
···
 	 */
 	struct list_head poll_list;
 
-	unsigned long state;
 	int weight;
 	u32 defer_hard_irqs_count;
 	int (*poll)(struct napi_struct *, int);
···
 	call_single_data_t csd ____cacheline_aligned_in_smp;
 	struct softnet_data *rps_ipi_next;
 	unsigned int cpu;
+
+	/* We force a cacheline alignment from here, to hold together
+	 * input_queue_tail, input_pkt_queue and backlog.state.
+	 * We add holes so that backlog.state is the last field
+	 * of this cache line.
+	 */
+	long pad[3] ____cacheline_aligned_in_smp;
 	unsigned int input_queue_tail;
 #endif
 	struct sk_buff_head input_pkt_queue;
+
 	struct napi_struct backlog;
 
 	struct numa_drop_counters drop_counters;
