Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

delayacct: add delay max to record delay peak

Introduce delay max and its use case: it helps to quickly detect
potential abnormal delays in the system by recording the type and
peak value of delay spikes.

Problem
========
Delay accounting tracks the average delay of processes to reflect the
system workload. However, when a process experiences a significant
delay (a delay spike) that adversely affects performance, getdelays
can only display the average delay over a period of time. An average
is of little help in diagnosing a delay peak: it is not even possible
to determine which type of delay has spiked, because the spike may be
masked by the average.
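
To make the masking effect concrete, here is a minimal sketch. The sample
values and the helper names (avg_ns, max_ns, fill_samples) are made up for
illustration and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: mean of n delay samples, in nanoseconds. */
uint64_t avg_ns(const uint64_t *samples, size_t n)
{
	uint64_t total = 0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		total += samples[i];
	return total / n;
}

/* Hypothetical helper: largest of n delay samples, in nanoseconds. */
uint64_t max_ns(const uint64_t *samples, size_t n)
{
	uint64_t max = 0;

	for (size_t i = 0; i < n; i++)
		if (samples[i] > max)
			max = samples[i];
	return max;
}

/* Made-up data: 99 delays of 10 us each plus a single 50 ms spike. */
void fill_samples(uint64_t samples[100])
{
	for (size_t i = 0; i < 99; i++)
		samples[i] = 10000;
	samples[99] = 50000000;
}
```

With these samples, avg_ns() yields 509900 ns (about 0.51 ms) while
max_ns() yields 50000000 ns (50 ms), so the average alone understates the
spike by roughly two orders of magnitude.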

Solution
=========
The 'delay max' value records the delay peak since system startup
for each type of delay. It captures potential abnormal delays over
time, including both the type of delay and its maximum value. This
helps to quickly identify crashes caused by delays.

Use case
=========
bash# ./getdelays -d -p 244
print delayacct stats ON
PID	244

CPU             count     real total  virtual total    delay total  delay average      delay max
                   68      192000000      213676651         705643          0.010ms     0.306381ms
IO              count    delay total  delay average      delay max
                    0              0          0.000ms     0.000000ms
SWAP            count    delay total  delay average      delay max
                    0              0          0.000ms     0.000000ms
RECLAIM         count    delay total  delay average      delay max
                    0              0          0.000ms     0.000000ms
THRASHING       count    delay total  delay average      delay max
                    0              0          0.000ms     0.000000ms
COMPACT         count    delay total  delay average      delay max
                    0              0          0.000ms     0.000000ms
WPCOPY          count    delay total  delay average      delay max
                  235       15648284          0.067ms     0.263842ms
IRQ             count    delay total  delay average      delay max
                    0              0          0.000ms     0.000000ms
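
The averages in this output can be re-derived from the count and delay
total columns, which makes it easy to compare them with the delay max
column. The sketch below follows the arithmetic of the average_ms macro in
getdelays; the function names avg_delay_ms and peak_to_avg are hypothetical
and exist only for this illustration.

```c
#include <stdint.h>

/* Average delay in ms: total ns converted to ms, divided by count (0 -> 1). */
double avg_delay_ms(uint64_t delay_total_ns, uint64_t count)
{
	return (double)delay_total_ns / 1000000.0 / (double)(count ? count : 1);
}

/* How many times larger the peak is than the average; 0 if there is no delay. */
double peak_to_avg(double max_ms, double avg_ms)
{
	return avg_ms > 0.0 ? max_ms / avg_ms : 0.0;
}
```

For the CPU row, avg_delay_ms(705643, 68) is about 0.0104 ms (printed as
0.010ms), so the 0.306381 ms peak is roughly 30 times the average. For
WPCOPY, the 0.263842 ms peak is about 4 times the 0.067 ms average.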

[wang.yaxin@zte.com.cn: update docs and fix some spelling errors]
Link: https://lkml.kernel.org/r/20241213192700771XKZ8H30OtHSeziGqRVMs0@zte.com.cn
Link: https://lkml.kernel.org/r/20241203164848805CS62CQPQWG9GLdQj2_BxS@zte.com.cn
Co-developed-by: Wang Yong <wang.yong12@zte.com.cn>
Signed-off-by: Wang Yong <wang.yong12@zte.com.cn>
Co-developed-by: xu xin <xu.xin16@zte.com.cn>
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Co-developed-by: Wang Yaxin <wang.yaxin@zte.com.cn>
Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn>
Signed-off-by: Kun Jiang <jiang.kun2@zte.com.cn>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Fan Yu <fan.yu9@zte.com.cn>
Cc: Peilin He <he.peilin@zte.com.cn>
Cc: tuqiang <tu.qiang35@zte.com.cn>
Cc: Yang Yang <yang.yang29@zte.com.cn>
Cc: ye xingchen <ye.xingchen@zte.com.cn>
Cc: Yunkai Zhang <zhang.yunkai@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Wang Yaxin and committed by Andrew Morton
658eb5ab 1e185723

+103 -55
+19 -19
Documentation/accounting/delay-accounting.rst
@@ -100,29 +100,29 @@
 	# ./getdelays -d -p 10
 	(output similar to next case)
 
-Get sum of delays, since system boot, for all pids with tgid 5::
+Get sum and peak of delays, since system boot, for all pids with tgid 242::
 
-	# ./getdelays -d -t 5
+	bash-4.4# ./getdelays -d -t 242
 	print delayacct stats ON
-	TGID	5
+	TGID	242
 
 
-	CPU             count     real total  virtual total    delay total  delay average
-	                    8        7000000        6872122        3382277          0.423ms
-	IO              count    delay total  delay average
-	                    0              0          0.000ms
-	SWAP            count    delay total  delay average
-	                    0              0          0.000ms
-	RECLAIM         count    delay total  delay average
-	                    0              0          0.000ms
-	THRASHING       count    delay total  delay average
-	                    0              0          0.000ms
-	COMPACT         count    delay total  delay average
-	                    0              0          0.000ms
-	WPCOPY          count    delay total  delay average
-	                    0              0          0.000ms
-	IRQ             count    delay total  delay average
-	                    0              0          0.000ms
+	CPU             count     real total  virtual total    delay total  delay average      delay max
+	                  239      296000000      307724885        1127792          0.005ms     0.238382ms
+	IO              count    delay total  delay average      delay max
+	                    0              0          0.000ms     0.000000ms
+	SWAP            count    delay total  delay average      delay max
+	                    0              0          0.000ms     0.000000ms
+	RECLAIM         count    delay total  delay average      delay max
+	                    0              0          0.000ms     0.000000ms
+	THRASHING       count    delay total  delay average      delay max
+	                    0              0          0.000ms     0.000000ms
+	COMPACT         count    delay total  delay average      delay max
+	                    0              0          0.000ms     0.000000ms
+	WPCOPY          count    delay total  delay average      delay max
+	                  230       19100476          0.083ms     0.383822ms
+	IRQ             count    delay total  delay average      delay max
+	                    0              0          0.000ms     0.000000ms
 
 Get IO accounting for pid 1, it works only with -p::
 
+7
include/linux/delayacct.h
@@ -29,25 +29,32 @@
 	 * XXX_delay contains the accumulated delay time in nanoseconds.
 	 */
 	u64 blkio_start;
+	u64 blkio_delay_max;
 	u64 blkio_delay;	/* wait for sync block io completion */
 	u64 swapin_start;
+	u64 swapin_delay_max;
 	u64 swapin_delay;	/* wait for swapin */
 	u32 blkio_count;	/* total count of the number of sync block */
 				/* io operations performed */
 	u32 swapin_count;	/* total count of swapin */
 
 	u64 freepages_start;
+	u64 freepages_delay_max;
 	u64 freepages_delay;	/* wait for memory reclaim */
 
 	u64 thrashing_start;
+	u64 thrashing_delay_max;
 	u64 thrashing_delay;	/* wait for thrashing page */
 
 	u64 compact_start;
+	u64 compact_delay_max;
 	u64 compact_delay;	/* wait for memory compact */
 
 	u64 wpcopy_start;
+	u64 wpcopy_delay_max;
 	u64 wpcopy_delay;	/* wait for write-protect copy */
 
+	u64 irq_delay_max;
 	u64 irq_delay;	/* wait for IRQ/SOFTIRQ */
 
 	u32 freepages_count;	/* total count of memory reclaim */
+3
include/linux/sched.h
@@ -398,6 +398,9 @@
 	/* Time spent waiting on a runqueue: */
 	unsigned long long		run_delay;
 
+	/* Max time spent waiting on a runqueue: */
+	unsigned long long		max_run_delay;
+
 	/* Timestamps: */
 
 	/* When did we last run on a CPU? */
+9
include/uapi/linux/taskstats.h
@@ -72,6 +72,7 @@
 	 */
 	__u64	cpu_count __attribute__((aligned(8)));
 	__u64	cpu_delay_total;
+	__u64	cpu_delay_max;
 
 	/* Following four fields atomically updated using task->delays->lock */
 
@@ -80,10 +81,12 @@
 	 */
 	__u64	blkio_count;
 	__u64	blkio_delay_total;
+	__u64	blkio_delay_max;
 
 	/* Delay waiting for page fault I/O (swap in only) */
 	__u64	swapin_count;
 	__u64	swapin_delay_total;
+	__u64	swapin_delay_max;
 
 	/* cpu "wall-clock" running time
 	 * On some architectures, value will adjust for cpu time stolen
@@ -166,10 +169,12 @@
 	/* Delay waiting for memory reclaim */
 	__u64	freepages_count;
 	__u64	freepages_delay_total;
+	__u64	freepages_delay_max;
 
 	/* Delay waiting for thrashing page */
 	__u64	thrashing_count;
 	__u64	thrashing_delay_total;
+	__u64	thrashing_delay_max;
 
 	/* v10: 64-bit btime to avoid overflow */
 	__u64	ac_btime64;		/* 64-bit begin time */
@@ -177,6 +182,7 @@
 	/* v11: Delay waiting for memory compact */
 	__u64	compact_count;
 	__u64	compact_delay_total;
+	__u64	compact_delay_max;
 
 	/* v12 begin */
 	__u32	ac_tgid;	/* thread group ID */
@@ -198,9 +204,12 @@
 	/* v13: Delay waiting for write-protect copy */
 	__u64	wpcopy_count;
 	__u64	wpcopy_delay_total;
+	__u64	wpcopy_delay_max;
 
 	/* v14: Delay waiting for IRQ/SOFTIRQ */
 	__u64	irq_count;
 	__u64	irq_delay_total;
+	__u64	irq_delay_max;
+	/* v15: add Delay max */
 };
 
 
+27 -10
kernel/delayacct.c
@@ -93,9 +93,9 @@
 
 /*
  * Finish delay accounting for a statistic using its timestamps (@start),
- * accumalator (@total) and @count
+ * accumulator (@total) and @count
  */
-static void delayacct_end(raw_spinlock_t *lock, u64 *start, u64 *total, u32 *count)
+static void delayacct_end(raw_spinlock_t *lock, u64 *start, u64 *total, u32 *count, u64 *max)
 {
 	s64 ns = local_clock() - *start;
 	unsigned long flags;
@@ -104,6 +104,8 @@
 		raw_spin_lock_irqsave(lock, flags);
 		*total += ns;
 		(*count)++;
+		if (ns > *max)
+			*max = ns;
 		raw_spin_unlock_irqrestore(lock, flags);
 	}
 }
@@ -122,7 +124,8 @@
 	delayacct_end(&p->delays->lock,
 		      &p->delays->blkio_start,
 		      &p->delays->blkio_delay,
-		      &p->delays->blkio_count);
+		      &p->delays->blkio_count,
+		      &p->delays->blkio_delay_max);
 }
 
 int delayacct_add_tsk(struct taskstats *d, struct task_struct *tsk)
@@ -153,10 +156,11 @@
 
 	d->cpu_count += t1;
 
+	d->cpu_delay_max = tsk->sched_info.max_run_delay;
 	tmp = (s64)d->cpu_delay_total + t2;
 	d->cpu_delay_total = (tmp < (s64)d->cpu_delay_total) ? 0 : tmp;
-
 	tmp = (s64)d->cpu_run_virtual_total + t3;
+
 	d->cpu_run_virtual_total =
 		(tmp < (s64)d->cpu_run_virtual_total) ? 0 : tmp;
 
@@ -164,20 +168,26 @@
 		return 0;
 
 	/* zero XXX_total, non-zero XXX_count implies XXX stat overflowed */
-
 	raw_spin_lock_irqsave(&tsk->delays->lock, flags);
+	d->blkio_delay_max = tsk->delays->blkio_delay_max;
 	tmp = d->blkio_delay_total + tsk->delays->blkio_delay;
 	d->blkio_delay_total = (tmp < d->blkio_delay_total) ? 0 : tmp;
+	d->swapin_delay_max = tsk->delays->swapin_delay_max;
 	tmp = d->swapin_delay_total + tsk->delays->swapin_delay;
 	d->swapin_delay_total = (tmp < d->swapin_delay_total) ? 0 : tmp;
+	d->freepages_delay_max = tsk->delays->freepages_delay_max;
 	tmp = d->freepages_delay_total + tsk->delays->freepages_delay;
 	d->freepages_delay_total = (tmp < d->freepages_delay_total) ? 0 : tmp;
+	d->thrashing_delay_max = tsk->delays->thrashing_delay_max;
 	tmp = d->thrashing_delay_total + tsk->delays->thrashing_delay;
 	d->thrashing_delay_total = (tmp < d->thrashing_delay_total) ? 0 : tmp;
+	d->compact_delay_max = tsk->delays->compact_delay_max;
 	tmp = d->compact_delay_total + tsk->delays->compact_delay;
 	d->compact_delay_total = (tmp < d->compact_delay_total) ? 0 : tmp;
+	d->wpcopy_delay_max = tsk->delays->wpcopy_delay_max;
 	tmp = d->wpcopy_delay_total + tsk->delays->wpcopy_delay;
 	d->wpcopy_delay_total = (tmp < d->wpcopy_delay_total) ? 0 : tmp;
+	d->irq_delay_max = tsk->delays->irq_delay_max;
 	tmp = d->irq_delay_total + tsk->delays->irq_delay;
 	d->irq_delay_total = (tmp < d->irq_delay_total) ? 0 : tmp;
 	d->blkio_count += tsk->delays->blkio_count;
@@ -213,7 +223,8 @@
 	delayacct_end(&current->delays->lock,
 		      &current->delays->freepages_start,
 		      &current->delays->freepages_delay,
-		      &current->delays->freepages_count);
+		      &current->delays->freepages_count,
+		      &current->delays->freepages_delay_max);
 }
 
 void __delayacct_thrashing_start(bool *in_thrashing)
@@ -235,7 +246,8 @@
 	delayacct_end(&current->delays->lock,
 		      &current->delays->thrashing_start,
 		      &current->delays->thrashing_delay,
-		      &current->delays->thrashing_count);
+		      &current->delays->thrashing_count,
+		      &current->delays->thrashing_delay_max);
 }
 
 void __delayacct_swapin_start(void)
@@ -248,7 +260,8 @@
 	delayacct_end(&current->delays->lock,
 		      &current->delays->swapin_start,
 		      &current->delays->swapin_delay,
-		      &current->delays->swapin_count);
+		      &current->delays->swapin_count,
+		      &current->delays->swapin_delay_max);
 }
 
 void __delayacct_compact_start(void)
@@ -261,7 +274,8 @@
 	delayacct_end(&current->delays->lock,
 		      &current->delays->compact_start,
 		      &current->delays->compact_delay,
-		      &current->delays->compact_count);
+		      &current->delays->compact_count,
+		      &current->delays->compact_delay_max);
 }
 
 void __delayacct_wpcopy_start(void)
@@ -274,7 +288,8 @@
 	delayacct_end(&current->delays->lock,
 		      &current->delays->wpcopy_start,
 		      &current->delays->wpcopy_delay,
-		      &current->delays->wpcopy_count);
+		      &current->delays->wpcopy_count,
+		      &current->delays->wpcopy_delay_max);
 }
 
 void __delayacct_irq(struct task_struct *task, u32 delta)
@@ -284,6 +299,8 @@
 	raw_spin_lock_irqsave(&task->delays->lock, flags);
 	task->delays->irq_delay += delta;
 	task->delays->irq_count++;
+	if (delta > task->delays->irq_delay_max)
+		task->delays->irq_delay_max = delta;
 	raw_spin_unlock_irqrestore(&task->delays->lock, flags);
 }
 
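
The core of the kernel/delayacct.c change is a running maximum updated in
the same critical section as the total and the count. The following is a
simplified userspace model, not the kernel code: struct delay_stat and
delay_stat_end() are hypothetical names, the timestamp is passed in instead
of calling local_clock(), the spinlock is omitted because the sketch is
single-threaded, and the ns > 0 guard is this sketch's own choice.

```c
#include <stdint.h>

/* Hypothetical userspace stand-in for one delay class of a task. */
struct delay_stat {
	uint64_t start;      /* timestamp taken when the wait began, in ns */
	uint64_t total;      /* accumulated delay, in ns */
	uint64_t delay_max;  /* largest single delay seen so far, in ns */
	uint32_t count;      /* number of completed waits */
};

/*
 * Simplified model of finishing one wait: @now replaces the kernel clock.
 * The kernel performs the three updates under the per-task delays lock;
 * that lock is left out here (assumption: single-threaded caller).
 */
void delay_stat_end(struct delay_stat *s, uint64_t now)
{
	int64_t ns = (int64_t)(now - s->start);

	if (ns > 0) {
		s->total += (uint64_t)ns;
		s->count++;
		if ((uint64_t)ns > s->delay_max)
			s->delay_max = (uint64_t)ns;
	}
}
```

Feeding this model three waits of 100 ns, 900 ns and 300 ns leaves total at
1300, count at 3 and delay_max at 900. Keeping the maximum next to the
total means it costs one extra compare per completed wait.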
+4 -1
kernel/sched/stats.h
@@ -244,7 +244,8 @@
 	delta = rq_clock(rq) - t->sched_info.last_queued;
 	t->sched_info.last_queued = 0;
 	t->sched_info.run_delay += delta;
-
+	if (delta > t->sched_info.max_run_delay)
+		t->sched_info.max_run_delay = delta;
 	rq_sched_info_dequeue(rq, delta);
 }
 
@@ -266,6 +267,8 @@
 	t->sched_info.run_delay += delta;
 	t->sched_info.last_arrival = now;
 	t->sched_info.pcount++;
+	if (delta > t->sched_info.max_run_delay)
+		t->sched_info.max_run_delay = delta;
 
 	rq_sched_info_arrive(rq, delta);
 }
+34 -25
tools/accounting/getdelays.c
@@ -192,60 +192,69 @@
 }
 
 #define average_ms(t, c) (t / 1000000ULL / (c ? c : 1))
+#define delay_max_ms(t) (t / 1000000ULL)
 
 static void print_delayacct(struct taskstats *t)
 {
-	printf("\n\nCPU   %15s%15s%15s%15s%15s\n"
-	       "      %15llu%15llu%15llu%15llu%15.3fms\n"
-	       "IO    %15s%15s%15s\n"
-	       "      %15llu%15llu%15.3fms\n"
-	       "SWAP  %15s%15s%15s\n"
-	       "      %15llu%15llu%15.3fms\n"
-	       "RECLAIM  %12s%15s%15s\n"
-	       "      %15llu%15llu%15.3fms\n"
-	       "THRASHING%12s%15s%15s\n"
-	       "      %15llu%15llu%15.3fms\n"
-	       "COMPACT  %12s%15s%15s\n"
-	       "      %15llu%15llu%15.3fms\n"
-	       "WPCOPY   %12s%15s%15s\n"
-	       "      %15llu%15llu%15.3fms\n"
-	       "IRQ   %15s%15s%15s\n"
-	       "      %15llu%15llu%15.3fms\n",
+	printf("\n\nCPU   %15s%15s%15s%15s%15s%15s\n"
+	       "      %15llu%15llu%15llu%15llu%15.3fms%13.6fms\n"
+	       "IO    %15s%15s%15s%15s\n"
+	       "      %15llu%15llu%15.3fms%13.6fms\n"
+	       "SWAP  %15s%15s%15s%15s\n"
+	       "      %15llu%15llu%15.3fms%13.6fms\n"
+	       "RECLAIM  %12s%15s%15s%15s\n"
+	       "      %15llu%15llu%15.3fms%13.6fms\n"
+	       "THRASHING%12s%15s%15s%15s\n"
+	       "      %15llu%15llu%15.3fms%13.6fms\n"
+	       "COMPACT  %12s%15s%15s%15s\n"
+	       "      %15llu%15llu%15.3fms%13.6fms\n"
+	       "WPCOPY   %12s%15s%15s%15s\n"
+	       "      %15llu%15llu%15.3fms%13.6fms\n"
+	       "IRQ   %15s%15s%15s%15s\n"
+	       "      %15llu%15llu%15.3fms%13.6fms\n",
 	       "count", "real total", "virtual total",
-	       "delay total", "delay average",
+	       "delay total", "delay average", "delay max",
 	       (unsigned long long)t->cpu_count,
 	       (unsigned long long)t->cpu_run_real_total,
 	       (unsigned long long)t->cpu_run_virtual_total,
 	       (unsigned long long)t->cpu_delay_total,
 	       average_ms((double)t->cpu_delay_total, t->cpu_count),
-	       "count", "delay total", "delay average",
+	       delay_max_ms((double)t->cpu_delay_max),
+	       "count", "delay total", "delay average", "delay max",
 	       (unsigned long long)t->blkio_count,
 	       (unsigned long long)t->blkio_delay_total,
 	       average_ms((double)t->blkio_delay_total, t->blkio_count),
-	       "count", "delay total", "delay average",
+	       delay_max_ms((double)t->blkio_delay_max),
+	       "count", "delay total", "delay average", "delay max",
 	       (unsigned long long)t->swapin_count,
 	       (unsigned long long)t->swapin_delay_total,
 	       average_ms((double)t->swapin_delay_total, t->swapin_count),
-	       "count", "delay total", "delay average",
+	       delay_max_ms((double)t->swapin_delay_max),
+	       "count", "delay total", "delay average", "delay max",
 	       (unsigned long long)t->freepages_count,
 	       (unsigned long long)t->freepages_delay_total,
 	       average_ms((double)t->freepages_delay_total, t->freepages_count),
-	       "count", "delay total", "delay average",
+	       delay_max_ms((double)t->freepages_delay_max),
+	       "count", "delay total", "delay average", "delay max",
 	       (unsigned long long)t->thrashing_count,
 	       (unsigned long long)t->thrashing_delay_total,
 	       average_ms((double)t->thrashing_delay_total, t->thrashing_count),
-	       "count", "delay total", "delay average",
+	       delay_max_ms((double)t->thrashing_delay_max),
+	       "count", "delay total", "delay average", "delay max",
 	       (unsigned long long)t->compact_count,
 	       (unsigned long long)t->compact_delay_total,
 	       average_ms((double)t->compact_delay_total, t->compact_count),
-	       "count", "delay total", "delay average",
+	       delay_max_ms((double)t->compact_delay_max),
+	       "count", "delay total", "delay average", "delay max",
 	       (unsigned long long)t->wpcopy_count,
 	       (unsigned long long)t->wpcopy_delay_total,
 	       average_ms((double)t->wpcopy_delay_total, t->wpcopy_count),
-	       "count", "delay total", "delay average",
+	       delay_max_ms((double)t->wpcopy_delay_max),
+	       "count", "delay total", "delay average", "delay max",
 	       (unsigned long long)t->irq_count,
 	       (unsigned long long)t->irq_delay_total,
-	       average_ms((double)t->irq_delay_total, t->irq_count));
+	       average_ms((double)t->irq_delay_total, t->irq_count),
+	       delay_max_ms((double)t->irq_delay_max));
 }
 
 static void task_context_switch_counts(struct taskstats *t)
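
One detail of the getdelays.c change is easy to miss: delay_max_ms() only
yields sub-millisecond precision because every caller casts its argument to
double first. The sketch below copies the macro from the patch; the two
wrapper functions, peak_ms and peak_ms_truncated, are hypothetical and
exist only to contrast the two cases.

```c
/* Macro copied from the getdelays.c change in this patch. */
#define delay_max_ms(t) (t / 1000000ULL)

/* Hypothetical helper: peak delay in ms, cast to double first as getdelays does. */
double peak_ms(unsigned long long max_ns)
{
	return delay_max_ms((double)max_ns);
}

/* Hypothetical counter-example: without the cast the division is integral. */
unsigned long long peak_ms_truncated(unsigned long long max_ns)
{
	return delay_max_ms(max_ns);
}
```

For the CPU peak of 306381 ns shown in the use case, peak_ms() returns
about 0.306381, while peak_ms_truncated() returns 0 because integer
division discards everything below one millisecond.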