Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

workqueue: Process rescuer work items one-by-one using a cursor

Previously, the rescuer scanned for all matching work items at once and
processed them within a single rescuer thread, which could cause one
blocking work item to stall all others.

Make the rescuer process work items one-by-one instead of slurping all
matches in a single pass.

Break the rescuer loop after finding and processing the first matching
work item, then restart the search to pick up the next. Work items that
are still on the pool's worklist can then be picked up by normal worker
threads instead of waiting on the rescuer's queue, and a blocking work
item no longer stalls the rest once memory pressure is relieved.

Introduce a dummy cursor work item to avoid potentially O(N^2)
rescans of the work list. The cursor records the resume position for
the next scan, eliminating redundant traversals.
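
As a rough illustration of why a resume marker helps, here is a minimal
userspace sketch. It is not the kernel code: struct node, take_next() and
drain_visits() are hypothetical names, and the owner field stands in for
get_work_pwq(). It counts how many nodes are examined when every call
restarts from the head versus when a cursor node remembers where the
previous call stopped.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head plus a payload. */
struct node {
	struct node *prev, *next;
	int owner;	/* hypothetical stand-in for get_work_pwq(work) */
};

static void node_init(struct node *n) { n->prev = n->next = n; }
static int node_linked(const struct node *n) { return n->next != n; }

static void node_unlink(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	node_init(n);
}

/* Link @n immediately before @pos. */
static void node_insert_before(struct node *n, struct node *pos)
{
	n->prev = pos->prev;
	n->next = pos;
	pos->prev->next = n;
	pos->prev = n;
}

/*
 * Unlink and return the next node owned by @owner, or NULL if none is left.
 * With a cursor, the scan resumes where the previous call stopped; with
 * @cursor == NULL it restarts from the head every time. *visits counts
 * how many nodes were examined.
 */
static struct node *take_next(struct node *head, struct node *cursor,
			      int owner, int *visits)
{
	struct node *n, *next;

	n = (cursor && node_linked(cursor)) ? cursor->next : head->next;
	for (; n != head; n = next) {
		next = n->next;
		(*visits)++;
		if (n->owner != owner)
			continue;
		node_unlink(n);
		if (cursor) {
			/* park the cursor just before the next candidate */
			if (node_linked(cursor))
				node_unlink(cursor);
			node_insert_before(cursor, next);
		}
		return n;
	}
	if (cursor && node_linked(cursor))
		node_unlink(cursor);	/* nothing left: drop the marker */
	return NULL;
}

/* Drain all nodes owned by 1 from a 6-node list; return nodes examined. */
static int drain_visits(int use_cursor, int *taken)
{
	struct node head, cursor, items[6];
	int visits = 0;
	int i;

	node_init(&head);
	node_init(&cursor);
	cursor.owner = -1;
	for (i = 0; i < 6; i++) {
		items[i].owner = i % 2;	/* owners alternate 0,1,0,1,0,1 */
		node_insert_before(&items[i], &head);
	}
	*taken = 0;
	while (take_next(&head, use_cursor ? &cursor : NULL, 1, &visits))
		(*taken)++;
	return visits;
}
```

In this toy list, draining with the cursor examines each node once, while
restarting from the head re-examines the non-matching nodes on every call.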

Also introduce RESCUER_BATCH to control the maximum number of work items
the rescuer processes in each turn, and move on to other PWQs when the
limit is reached.
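
The batching behaviour can be sketched in userspace as follows. This is a
simplified model under stated assumptions, not the kernel implementation:
BATCH, NQ and rescue_all() are hypothetical names, the ring buffer stands
in for wq->maydays, and the nr_active / need_to_create_worker() checks are
reduced to "this queue still has pending items". The yield test mirrors
the `++count > RESCUER_BATCH` condition in the diff, so in this model a
turn ends after BATCH + 1 items.

```c
#include <assert.h>

#define BATCH	2	/* hypothetical small stand-in for RESCUER_BATCH */
#define NQ	2	/* number of queues in this model */

/*
 * Serve NQ queues from a FIFO of queue ids. The id of each processed item
 * is recorded in @order (at most @max entries); returns the number of
 * items processed.
 */
static int rescue_all(int pending[NQ], int order[], int max)
{
	int ring[NQ], first = 0, queued = 0, done = 0;
	int q;

	/* every queue with pending items starts out "in mayday" */
	for (q = 0; q < NQ; q++)
		if (pending[q] > 0)
			ring[(first + queued++) % NQ] = q;

	while (queued > 0) {
		int count = 0;

		q = ring[first];
		first = (first + 1) % NQ;
		queued--;

		while (pending[q] > 0 && done < max) {
			pending[q]--;
			order[done++] = q;
			/* yield only if the limit is hit and others wait */
			if (++count > BATCH && queued > 0 && pending[q] > 0) {
				ring[(first + queued++) % NQ] = q;
				break;
			}
		}
	}
	return done;
}
```

With pending counts of 5 and 2, the first queue yields after three items,
the second queue is drained, and the first queue then finishes.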

Cc: ying chen <yc1082463@gmail.com>
Reported-by: ying chen <yc1082463@gmail.com>
Fixes: e22bee782b3b ("workqueue: implement concurrency managed dynamic worker pool")
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

Authored by Lai Jiangshan and committed by Tejun Heo
e5a30c30 fc5ff53d

+59 -16
kernel/workqueue.c
···
 	MAYDAY_INTERVAL		= HZ / 10,	/* and then every 100ms */
 	CREATE_COOLDOWN		= HZ,		/* time to breath after fail */
 
+	RESCUER_BATCH		= 16,		/* process items per turn */
+
 	/*
 	 * Rescue workers are used only on emergencies and shared by
 	 * all cpus. Give MIN_NICE.
···
 	struct list_head	pending_node;	/* LN: node on wq_node_nr_active->pending_pwqs */
 	struct list_head	pwqs_node;	/* WR: node on wq->pwqs */
 	struct list_head	mayday_node;	/* MD: node on wq->maydays */
+	struct work_struct	mayday_cursor;	/* L: cursor on pool->worklist */
 
 	u64			stats[PWQ_NR_STATS];
 
···
 	return NULL;
 }
 
+static void mayday_cursor_func(struct work_struct *work)
+{
+	/* should not be processed, only for marking position */
+	BUG();
+}
+
 /**
  * move_linked_works - move linked works to a list
  * @work: start of series of works to be scheduled
···
 	struct worker *collision;
 
 	lockdep_assert_held(&pool->lock);
+
+	/* The cursor work should not be processed */
+	if (unlikely(work->func == mayday_cursor_func)) {
+		/* only worker_thread() can possibly take this branch */
+		WARN_ON_ONCE(worker->rescue_wq);
+		if (nextp)
+			*nextp = list_next_entry(work, entry);
+		list_del_init(&work->entry);
+		return false;
+	}
 
 	/*
 	 * A single work shouldn't be executed concurrently by multiple workers.
···
 static bool assign_rescuer_work(struct pool_workqueue *pwq, struct worker *rescuer)
 {
 	struct worker_pool *pool = pwq->pool;
+	struct work_struct *cursor = &pwq->mayday_cursor;
 	struct work_struct *work, *n;
 
 	/* need rescue? */
 	if (!pwq->nr_active || !need_to_create_worker(pool))
 		return false;
 
-	/*
-	 * Slurp in all works issued via this workqueue and
-	 * process'em.
-	 */
-	list_for_each_entry_safe(work, n, &pool->worklist, entry) {
-		if (get_work_pwq(work) == pwq && assign_work(work, rescuer, &n))
+	/* search from the start or cursor if available */
+	if (list_empty(&cursor->entry))
+		work = list_first_entry(&pool->worklist, struct work_struct, entry);
+	else
+		work = list_next_entry(cursor, entry);
+
+	/* find the next work item to rescue */
+	list_for_each_entry_safe_from(work, n, &pool->worklist, entry) {
+		if (get_work_pwq(work) == pwq && assign_work(work, rescuer, &n)) {
 			pwq->stats[PWQ_STAT_RESCUED]++;
+			/* put the cursor for next search */
+			list_move_tail(&cursor->entry, &n->entry);
+			return true;
+		}
 	}
 
-	return !list_empty(&rescuer->scheduled);
+	return false;
 }
 
 /**
···
 		struct pool_workqueue *pwq = list_first_entry(&wq->maydays,
 					struct pool_workqueue, mayday_node);
 		struct worker_pool *pool = pwq->pool;
+		unsigned int count = 0;
 
 		__set_current_state(TASK_RUNNING);
 		list_del_init(&pwq->mayday_node);
···
 
 		WARN_ON_ONCE(!list_empty(&rescuer->scheduled));
 
-		if (assign_rescuer_work(pwq, rescuer)) {
+		while (assign_rescuer_work(pwq, rescuer)) {
 			process_scheduled_works(rescuer);
 
 			/*
-			 * The above execution of rescued work items could
-			 * have created more to rescue through
-			 * pwq_activate_first_inactive() or chained
-			 * queueing. Let's put @pwq back on mayday list so
-			 * that such back-to-back work items, which may be
-			 * being used to relieve memory pressure, don't
-			 * incur MAYDAY_INTERVAL delay inbetween.
+			 * If the per-turn work item limit is reached and other
+			 * PWQs are in mayday, requeue mayday for this PWQ and
+			 * let the rescuer handle the other PWQs first.
 			 */
-			if (pwq->nr_active && need_to_create_worker(pool)) {
+			if (++count > RESCUER_BATCH && !list_empty(&pwq->wq->maydays) &&
+			    pwq->nr_active && need_to_create_worker(pool)) {
 				raw_spin_lock(&wq_mayday_lock);
 				send_mayday(pwq);
 				raw_spin_unlock(&wq_mayday_lock);
+				break;
 			}
 		}
+
+		/* The cursor can not be left behind without the rescuer watching it. */
+		if (!list_empty(&pwq->mayday_cursor.entry) && list_empty(&pwq->mayday_node))
+			list_del_init(&pwq->mayday_cursor.entry);
 
 		/*
 		 * Leave this pool. Notify regular workers; otherwise, we end up
···
 	INIT_LIST_HEAD(&pwq->pwqs_node);
 	INIT_LIST_HEAD(&pwq->mayday_node);
 	kthread_init_work(&pwq->release_work, pwq_release_workfn);
+
+	/*
+	 * Set the dummy cursor work with valid function and get_work_pwq().
+	 *
+	 * The cursor work should only be in the pwq->pool->worklist, and
+	 * should not be treated as a processable work item.
+	 *
+	 * WORK_STRUCT_PENDING and WORK_STRUCT_INACTIVE just make it less
+	 * surprise for kernel debugging tools and reviewers.
+	 */
+	INIT_WORK(&pwq->mayday_cursor, mayday_cursor_func);
+	atomic_long_set(&pwq->mayday_cursor.data, (unsigned long)pwq |
+			WORK_STRUCT_PENDING | WORK_STRUCT_PWQ | WORK_STRUCT_INACTIVE);
 }
 
 /* sync @pwq with the current state of its associated wq and link it */