Linux kernel mirror (for testing): git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

io_uring/sqpoll: be smarter on when to update the stime usage

The current approach is a bit naive, and hence queries the time way too
often. Only start the "doing work" timer when there's actual work to do,
and then use that information to terminate (and account) the work time
once done. This greatly reduces the frequency of these calls, which
previously ran even when the accounted time could not have changed.
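
As an illustration of the pattern (not the kernel code itself), a
minimal userspace sketch follows. struct io_sq_time and the start/update
helpers mirror the patch below, while cpu_usec() and
CLOCK_THREAD_CPUTIME_ID are stand-ins for the kernel-internal cputime
query:

        #include <stdbool.h>
        #include <stdint.h>
        #include <time.h>

        struct io_sq_time {
                bool started;
                uint64_t usec;
        };

        /* Stand-in for the kernel's per-task cputime query */
        static uint64_t cpu_usec(void)
        {
                struct timespec ts;

                clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
                return ts.tv_sec * 1000000ULL + ts.tv_nsec / 1000;
        }

        /* Arm the timer only once, on the first piece of actual work */
        static void start_worktime(struct io_sq_time *ist)
        {
                if (ist->started)
                        return;
                ist->started = true;
                ist->usec = cpu_usec();
        }

        /* Account one full loop iteration; a no-op if nothing ran */
        static void update_worktime(uint64_t *work_time, struct io_sq_time *ist)
        {
                if (!ist->started)
                        return;
                ist->started = false;
                *work_time += cpu_usec() - ist->usec;
        }

The key property is that idle iterations never touch the clock at all:
the query cost is only paid when work was actually started.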

With a basic random reader that is set up to use SQPOLL, a profile
before this change shows these as the top cycle consumers:

+ 32.60% iou-sqp-1074 [kernel.kallsyms] [k] thread_group_cputime_adjusted
+ 19.97% iou-sqp-1074 [kernel.kallsyms] [k] thread_group_cputime
+ 12.20% io_uring io_uring [.] submitter_uring_fn
+ 4.13% iou-sqp-1074 [kernel.kallsyms] [k] getrusage
+ 2.45% iou-sqp-1074 [kernel.kallsyms] [k] io_submit_sqes
+ 2.18% iou-sqp-1074 [kernel.kallsyms] [k] __pi_memset_generic
+ 2.09% iou-sqp-1074 [kernel.kallsyms] [k] cputime_adjust

and after this change, the top of the profile looks as follows:

+ 36.23% io_uring io_uring [.] submitter_uring_fn
+ 23.26% iou-sqp-819 [kernel.kallsyms] [k] io_sq_thread
+ 10.14% iou-sqp-819 [kernel.kallsyms] [k] io_sq_tw
+ 6.52% iou-sqp-819 [kernel.kallsyms] [k] tctx_task_work_run
+ 4.82% iou-sqp-819 [kernel.kallsyms] [k] nvme_submit_cmds.part.0
+ 2.91% iou-sqp-819 [kernel.kallsyms] [k] io_submit_sqes
[...]
0.02% iou-sqp-819 [kernel.kallsyms] [k] cputime_adjust

where it's spending the cycles on things that actually matter.
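
For reference, the kind of SQPOLL ring such a reader uses can be set up
roughly like this with liburing; setup_sqpoll_ring() and the idle value
are illustrative, and the random-read submission loop is omitted:

        #include <string.h>
        #include <liburing.h>

        /* Sketch of an SQPOLL ring setup; error handling trimmed */
        static int setup_sqpoll_ring(struct io_uring *ring, unsigned entries)
        {
                struct io_uring_params p;

                memset(&p, 0, sizeof(p));
                p.flags = IORING_SETUP_SQPOLL;  /* kernel iou-sqp-* submit thread */
                p.sq_thread_idle = 1000;        /* msec before the thread idles */

                return io_uring_queue_init_params(entries, ring, &p);
        }

With IORING_SETUP_SQPOLL, submission is driven by the kernel-side
iou-sqp-* thread that shows up in the profiles above.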

Reported-by: Fengnan Chang <changfengnan@bytedance.com>
Cc: stable@vger.kernel.org
Fixes: 3fcb9d17206e ("io_uring/sqpoll: statistics of the true utilization of sq threads")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

+32 -11
io_uring/sqpoll.c
···
         return READ_ONCE(sqd->state);
 }
 
+struct io_sq_time {
+        bool started;
+        u64 usec;
+};
+
 u64 io_sq_cpu_usec(struct task_struct *tsk)
 {
         u64 utime, stime;
···
         return stime;
 }
 
-static void io_sq_update_worktime(struct io_sq_data *sqd, u64 usec)
+static void io_sq_update_worktime(struct io_sq_data *sqd, struct io_sq_time *ist)
 {
-        sqd->work_time += io_sq_cpu_usec(current) - usec;
+        if (!ist->started)
+                return;
+        ist->started = false;
+        sqd->work_time += io_sq_cpu_usec(current) - ist->usec;
 }
 
-static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
+static void io_sq_start_worktime(struct io_sq_time *ist)
+{
+        if (ist->started)
+                return;
+        ist->started = true;
+        ist->usec = io_sq_cpu_usec(current);
+}
+
+static int __io_sq_thread(struct io_ring_ctx *ctx, struct io_sq_data *sqd,
+                          bool cap_entries, struct io_sq_time *ist)
 {
         unsigned int to_submit;
         int ret = 0;
···
 
         if (to_submit || !wq_list_empty(&ctx->iopoll_list)) {
                 const struct cred *creds = NULL;
+
+                io_sq_start_worktime(ist);
 
                 if (ctx->sq_creds != current_cred())
                         creds = override_creds(ctx->sq_creds);
···
         unsigned long timeout = 0;
         char buf[TASK_COMM_LEN] = {};
         DEFINE_WAIT(wait);
-        u64 start;
 
         /* offload context creation failed, just exit */
         if (!current->io_uring) {
···
         mutex_lock(&sqd->lock);
         while (1) {
                 bool cap_entries, sqt_spin = false;
+                struct io_sq_time ist = { };
 
                 if (io_sqd_events_pending(sqd) || signal_pending(current)) {
                         if (io_sqd_handle_event(sqd))
···
                 }
 
                 cap_entries = !list_is_singular(&sqd->ctx_list);
-                start = io_sq_cpu_usec(current);
                 list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
-                        int ret = __io_sq_thread(ctx, cap_entries);
+                        int ret = __io_sq_thread(ctx, sqd, cap_entries, &ist);
 
                         if (!sqt_spin && (ret > 0 || !wq_list_empty(&ctx->iopoll_list)))
                                 sqt_spin = true;
···
                 if (io_sq_tw(&retry_list, IORING_TW_CAP_ENTRIES_VALUE))
                         sqt_spin = true;
 
-                list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
-                        if (io_napi(ctx))
+                list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
+                        if (io_napi(ctx)) {
+                                io_sq_start_worktime(&ist);
                                 io_napi_sqpoll_busy_poll(ctx);
+                        }
+                }
+
+                io_sq_update_worktime(sqd, &ist);
 
                 if (sqt_spin || !time_after(jiffies, timeout)) {
-                        if (sqt_spin) {
-                                io_sq_update_worktime(sqd, start);
+                        if (sqt_spin)
                                 timeout = jiffies + sqd->sq_thread_idle;
-                        }
                         if (unlikely(need_resched())) {
                                 mutex_unlock(&sqd->lock);
                                 cond_resched();