Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

net: Fix rcu_tasks stall in threaded busypoll

While debugging a NIC driver, I noticed that with threaded busypoll
enabled, bpftrace hangs on startup. dmesg showed:

rcu_tasks_wait_gp: rcu_tasks grace period number 85 (since boot) is 10658 jiffies old.
rcu_tasks_wait_gp: rcu_tasks grace period number 85 (since boot) is 40793 jiffies old.
rcu_tasks_wait_gp: rcu_tasks grace period number 85 (since boot) is 131273 jiffies old.
rcu_tasks_wait_gp: rcu_tasks grace period number 85 (since boot) is 402058 jiffies old.
INFO: rcu_tasks detected stalls on tasks:
00000000769f52cd: .N nvcsw: 2/2 holdout: 1 idle_cpu: -1/64
task:napi/eth2-8265 state:R running task stack:0 pid:48300 tgid:48300 ppid:2 task_flags:0x208040 flags:0x00004000
Call Trace:
<TASK>
? napi_threaded_poll_loop+0x27c/0x2c0
? __pfx_napi_threaded_poll+0x10/0x10
? napi_threaded_poll+0x26/0x80
? kthread+0xfa/0x240
? __pfx_kthread+0x10/0x10
? ret_from_fork+0x31/0x50
? __pfx_kthread+0x10/0x10
? ret_from_fork_asm+0x1a/0x30
</TASK>

The cause is that in threaded busypoll, the main loop runs in
napi_threaded_poll rather than in napi_threaded_poll_loop, whose own
loop rarely iterates more than once. For rcu_softirq_qs_periodic
inside napi_threaded_poll_loop to report a quiescent state (qs),
last_qs must be more than 100ms old. That never happens in threaded
busypoll, because last_qs is reset to the current jiffies on every
call to napi_threaded_poll_loop, and each call returns after about
one iteration.
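
The stall can be illustrated with a small userspace model. This is only a
sketch: model_jiffies, MODEL_HZ, and every model_* function are hypothetical
stand-ins and not kernel code, and the model assumes each poll pass takes one
tick and that the periodic check reports only once the timestamp is more than
HZ/10 ticks old.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the kernel's jiffies and HZ. */
#define MODEL_HZ 1000UL
static unsigned long model_jiffies;

/* Models the periodic check: report only when the timestamp is more
 * than HZ/10 ticks (100ms) old, then refresh the timestamp.
 */
static bool model_qs_periodic(unsigned long *last_qs)
{
	if ((long)(model_jiffies - (*last_qs + MODEL_HZ / 10)) > 0) {
		*last_qs = model_jiffies;
		return true;
	}
	return false;
}

/* Pre-patch shape: last_qs is a local that restarts at "now" on each call. */
static unsigned long model_poll_loop_old(void)
{
	unsigned long last_qs = model_jiffies;
	unsigned long reports = 0;

	model_jiffies++;	/* assumption: one poll pass takes one tick */
	if (model_qs_periodic(&last_qs))
		reports++;

	return reports;		/* busy poll: the pass rarely repeats */
}

/* Outer busy-poll loop: calls the inner loop `calls` times and returns
 * how many quiescent states were reported in total.
 */
static unsigned long model_busy_poll_old(unsigned long calls)
{
	unsigned long reports = 0;

	while (calls--)
		reports += model_poll_loop_old();

	return reports;
}
```

In this model, model_busy_poll_old() returns 0 no matter how many calls are
made, because the timestamp is never more than one tick old when it is
checked.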

With this patch, threaded busypoll keeps last_qs in the outer
napi_threaded_poll and passes it to napi_threaded_poll_loop by
pointer as busy_poll_last_qs. Whether busy_poll_last_qs is NULL also
indicates whether napi_threaded_poll_loop was called for busypoll.
This way last_qs is no longer reset to the current jiffies on each
invocation of napi_threaded_poll_loop.
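
The effect of carrying the timestamp across calls can be sketched in
userspace. Again this is only a model under stated assumptions:
sketch_jiffies, SKETCH_HZ, and the sketch_* functions are hypothetical
names, each poll pass is assumed to take one tick, and the periodic check
is assumed to report once the timestamp is more than HZ/10 ticks old.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel's jiffies and HZ. */
#define SKETCH_HZ 1000UL
static unsigned long sketch_jiffies;

/* Models the periodic check: report only when the timestamp is more
 * than HZ/10 ticks (100ms) old, then refresh the timestamp.
 */
static bool sketch_qs_periodic(unsigned long *last_qs)
{
	if ((long)(sketch_jiffies - (*last_qs + SKETCH_HZ / 10)) > 0) {
		*last_qs = sketch_jiffies;
		return true;
	}
	return false;
}

/* Post-patch shape: a non-NULL pointer means "busy poll" and carries
 * the timestamp in and out of the call.
 */
static unsigned long sketch_poll_loop(unsigned long *busy_poll_last_qs)
{
	unsigned long last_qs = busy_poll_last_qs ? *busy_poll_last_qs
						  : sketch_jiffies;
	unsigned long reports = 0;

	sketch_jiffies++;	/* assumption: one poll pass takes one tick */
	if (sketch_qs_periodic(&last_qs))
		reports++;

	/* Write the (possibly refreshed) timestamp back to the caller. */
	if (busy_poll_last_qs)
		*busy_poll_last_qs = last_qs;

	return reports;
}

/* Outer busy-poll loop: owns last_qs for the lifetime of the loop. */
static unsigned long sketch_busy_poll(unsigned long calls)
{
	unsigned long last_qs = sketch_jiffies;
	unsigned long reports = 0;

	while (calls--)
		reports += sketch_poll_loop(&last_qs);

	return reports;
}
```

In this model the timestamp ages across calls, so a quiescent state is
reported about once every 100 ticks, while a call with a NULL pointer keeps
the old per-call behaviour.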

Fixes: c18d4b190a46 ("net: Extend NAPI threaded polling to allow kthread based busy polling")
Cc: stable@vger.kernel.org
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Link: https://patch.msgid.link/20260227221937.1060857-1-zhuyifei@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Authored by YiFei Zhu and committed by Paolo Abeni
1a86a1f7 6a877ece

+11 -6
net/core/dev.c
···
 	return -1;
 }
 
-static void napi_threaded_poll_loop(struct napi_struct *napi, bool busy_poll)
+static void napi_threaded_poll_loop(struct napi_struct *napi,
+				    unsigned long *busy_poll_last_qs)
 {
+	unsigned long last_qs = busy_poll_last_qs ? *busy_poll_last_qs : jiffies;
 	struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx;
 	struct softnet_data *sd;
-	unsigned long last_qs = jiffies;
 
 	for (;;) {
 		bool repoll = false;
···
 		/* When busy poll is enabled, the old packets are not flushed in
 		 * napi_complete_done. So flush them here.
 		 */
-		if (busy_poll)
+		if (busy_poll_last_qs)
 			gro_flush_normal(&napi->gro, HZ >= 1000);
 		local_bh_enable();
 
 		/* Call cond_resched here to avoid watchdog warnings. */
-		if (repoll || busy_poll) {
+		if (repoll || busy_poll_last_qs) {
 			rcu_softirq_qs_periodic(last_qs);
 			cond_resched();
 		}
···
 		if (!repoll)
 			break;
 	}
+
+	if (busy_poll_last_qs)
+		*busy_poll_last_qs = last_qs;
 }
 
 static int napi_threaded_poll(void *data)
 {
 	struct napi_struct *napi = data;
+	unsigned long last_qs = jiffies;
 	bool want_busy_poll;
 	bool in_busy_poll;
 	unsigned long val;
···
 			assign_bit(NAPI_STATE_IN_BUSY_POLL, &napi->state,
 				   want_busy_poll);
 
-		napi_threaded_poll_loop(napi, want_busy_poll);
+		napi_threaded_poll_loop(napi, want_busy_poll ? &last_qs : NULL);
 	}
 
 	return 0;
···
 {
 	struct softnet_data *sd = per_cpu_ptr(&softnet_data, cpu);
 
-	napi_threaded_poll_loop(&sd->backlog, false);
+	napi_threaded_poll_loop(&sd->backlog, NULL);
 }
 
 static void backlog_napi_setup(unsigned int cpu)