Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

tools/delaytop: add flexible sorting by delay field

Patch series "tools/delaytop: implement real-time keyboard interaction
support", v2.

Current Limitations
===================

The current delaytop implementation has two main limitations:

1) Static sorting only by CPU delay, forcing users to restart with
   different parameters to analyze other resource bottlenecks.

2) Memory delay information is always expanded, causing information
   overload when only high-level memory pressure monitoring is needed.

Improvements
============

1) Implemented dynamic sorting capability
   - Interactive key 'o' triggers sort mode.
   - Supports sorting by CPU/IO/Memory/IRQ delays.
   - Memory subcategories available in verbose mode.
     * c - CPU delay (default)
     * i - IO delay
     * m - Total memory delay
     * q - IRQ delay
     * s/r/t/p/w - Memory subcategories (in verbose mode)

2) Added memory display modes
   - Compact view (default): shows aggregated memory delays.
   - Verbose view ('M' key): breaks down into memory sub-delays.
     * SWAP - swapin delays
     * RCL - freepages reclaim delays
     * THR - thrashing delays
     * CMP - compaction delays
     * WP - write-protect copy delays
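
The key bindings above amount to a small lookup from a key press to a delay type, gated by the display mode. A minimal sketch of such a table follows. It assumes two things the cover letter does not state: that s/r/t/p/w pair with SWAP/RCL/THR/CMP/WP in the order both lists are given, and that c/i/m/q stay valid in verbose view. `struct sort_key`, `sort_keys[]` and `lookup_sort_key()` are hypothetical names, not identifiers from the series.

```c
#include <stddef.h>

/* Hypothetical descriptor: one interactive sort key and what it selects. */
struct sort_key {
	char key;		/* key pressed after 'o' */
	const char *delay;	/* delay type as named in the cover letter */
	int verbose_only;	/* 1 if only offered in verbose ('M') view */
};

static const struct sort_key sort_keys[] = {
	{ 'c', "CPU delay",                 0 },
	{ 'i', "IO delay",                  0 },
	{ 'm', "Total memory delay",        0 },
	{ 'q', "IRQ delay",                 0 },
	/* Assumed pairing, taken from the order of the two lists. */
	{ 's', "swapin delays",             1 },
	{ 'r', "freepages reclaim delays",  1 },
	{ 't', "thrashing delays",          1 },
	{ 'p', "compaction delays",         1 },
	{ 'w', "write-protect copy delays", 1 },
};

/*
 * Return the entry for a key, or NULL if the key is unknown or is not
 * offered in the current view.
 */
static const struct sort_key *lookup_sort_key(char key, int verbose)
{
	size_t i;

	for (i = 0; i < sizeof(sort_keys) / sizeof(sort_keys[0]); i++) {
		if (sort_keys[i].key != key)
			continue;
		if (sort_keys[i].verbose_only && !verbose)
			return NULL;
		return &sort_keys[i];
	}
	return NULL;
}
```

With this layout, `lookup_sort_key('s', 0)` is rejected in compact view while `lookup_sort_key('s', 1)` is accepted, which mirrors the "(in verbose mode)" restriction on the memory subcategory keys.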

Practical benefits
==================

1) Dynamic sorting for real-time bottleneck detection: system
   administrators can now change the sort field at runtime to identify
   different types of resource bottlenecks without restarting.

2) Enhanced usability with on-screen keybindings: interactive use is
   more intuitive with the on-screen keybinding help, and screen
   clutter is reduced when only a memory overview is needed.

Use Case
========
# ./delaytop
System Pressure Information: (avg10/avg60/avg300/total)
CPU some:       0.0%/   0.0%/   0.0%/  106817(ms)
CPU full:       0.0%/   0.0%/   0.0%/       0(ms)
Memory full:    0.0%/   0.0%/   0.0%/       0(ms)
Memory some:    0.0%/   0.0%/   0.0%/       0(ms)
IO full:        0.0%/   0.0%/   0.0%/    2245(ms)
IO some:        0.0%/   0.0%/   0.0%/    2791(ms)
IRQ full:       0.0%/   0.0%/   0.0%/       0(ms)
[o]sort [M]memverbose [q]quit
Top 20 processes (sorted by cpu delay):
   PID   TGID  COMMAND           CPU(ms)   IO(ms)  IRQ(ms)  MEM(ms)
-------------------------------------------------------------------
   110    110  kworker/15:0H-s     27.91     0.00     0.00     0.00
    57     57  cpuhp/7              3.18     0.00     0.00     0.00
    99     99  cpuhp/14             2.97     0.00     0.00     0.00
    51     51  cpuhp/6              0.90     0.00     0.00     0.00
    44     44  kworker/4:0H-sy      0.80     0.00     0.00     0.00
    76     76  idle_inject/10       0.31     0.00     0.00     0.00
   100    100  idle_inject/14       0.30     0.00     0.00     0.00
  1309   1309  systemsettings       0.29     0.00     0.00     0.00
    60     60  ksoftirqd/7          0.28     0.00     0.00     0.00
    45     45  cpuhp/5              0.22     0.00     0.00     0.00
    63     63  cpuhp/8              0.20     0.00     0.00     0.00
    87     87  cpuhp/12             0.18     0.00     0.00     0.00
    93     93  cpuhp/13             0.17     0.00     0.00     0.00
  1265   1265  acpid                0.17     0.00     0.00     0.00
  1552   1552  sshd                 0.17     0.00     0.00     0.00
  2584   2584  sddm-helper          0.16     0.00     0.00     0.00
  1284   1284  rtkit-daemon         0.15     0.00     0.00     0.00
  1326   1326  nde-netfilter        0.14     0.00     0.00     0.00
    27     27  cpuhp/2              0.13     0.00     0.00     0.00
   631    631  kworker/11:2-rc      0.11     0.00     0.00     0.00

# ./delaytop -M
System Pressure Information: (avg10/avg60/avg300/total)
CPU some:       0.0%/   0.0%/   0.0%/  106827(ms)
CPU full:       0.0%/   0.0%/   0.0%/       0(ms)
Memory full:    0.0%/   0.0%/   0.0%/       0(ms)
Memory some:    0.0%/   0.0%/   0.0%/       0(ms)
IO full:        0.0%/   0.0%/   0.0%/    2245(ms)
IO some:        0.0%/   0.0%/   0.0%/    2791(ms)
IRQ full:       0.0%/   0.0%/   0.0%/       0(ms)
[o]sort [M]memverbose [q]quit
Top 20 processes (sorted by mem delay):
   PID   TGID  COMMAND           MEM(ms) SWAP(ms)  RCL(ms)  THR(ms)  CMP(ms)   WP(ms)
-------------------------------------------------------------------------------------
121732 121732  delaytop             0.01     0.00     0.00     0.00     0.00     0.01
 95876  95876  top                  0.00     0.00     0.00     0.00     0.00     0.00
121641 121641  systemd-userwor      0.00     0.00     0.00     0.00     0.00     0.00
121693 121693  systemd-userwor      0.00     0.00     0.00     0.00     0.00     0.00
121661 121661  systemd-userwor      0.00     0.00     0.00     0.00     0.00     0.00
     1      1  systemd              0.00     0.00     0.00     0.00     0.00     0.00
     2      2  kthreadd             0.00     0.00     0.00     0.00     0.00     0.00
     3      3  pool_workqueue_      0.00     0.00     0.00     0.00     0.00     0.00
     4      4  kworker/R-rcu_g      0.00     0.00     0.00     0.00     0.00     0.00
     5      5  kworker/R-rcu_p      0.00     0.00     0.00     0.00     0.00     0.00
     6      6  kworker/R-slub_      0.00     0.00     0.00     0.00     0.00     0.00
     7      7  kworker/R-netns      0.00     0.00     0.00     0.00     0.00     0.00
     9      9  kworker/0:0H-sy      0.00     0.00     0.00     0.00     0.00     0.00
    11     11  kworker/u32:0-n      0.00     0.00     0.00     0.00     0.00     0.00
    12     12  kworker/R-mm_pe      0.00     0.00     0.00     0.00     0.00     0.00
    13     13  rcu_tasks_kthre      0.00     0.00     0.00     0.00     0.00     0.00
    14     14  rcu_tasks_rude_      0.00     0.00     0.00     0.00     0.00     0.00
    15     15  rcu_tasks_trace      0.00     0.00     0.00     0.00     0.00     0.00
    16     16  ksoftirqd/0          0.00     0.00     0.00     0.00     0.00     0.00
    17     17  rcu_preempt          0.00     0.00     0.00     0.00     0.00     0.00

When PSI is not enabled:
# ./delaytop
System Pressure Information: (avg10/avg60/avg300/total)
PSI not found: check if psi=1 enabled in cmdline
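
The pressure lines in the sample output are read from the PSI files under /proc/pressure (the patch defines paths such as /proc/pressure/cpu), whose lines have the form `some avg10=... avg60=... avg300=... total=...` with `total` in microseconds. The sketch below shows one way to parse such a line and to detect the "PSI not found" case. `struct psi_line`, `parse_psi_line()`, `read_psi_some()` and `format_psi_line()` are hypothetical helpers, and dividing `total` by 1000 to fill the (ms) column is an assumption about the tool, not something the commit message states.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for one parsed PSI line. */
struct psi_line {
	char kind[8];			/* "some" or "full" */
	double avg10, avg60, avg300;	/* percentages */
	unsigned long long total_us;	/* cumulative stall time, microseconds */
};

/* Parse one PSI line; return 0 on success, -1 if it is malformed. */
static int parse_psi_line(const char *line, struct psi_line *out)
{
	int n = sscanf(line, "%7s avg10=%lf avg60=%lf avg300=%lf total=%llu",
		       out->kind, &out->avg10, &out->avg60, &out->avg300,
		       &out->total_us);

	return n == 5 ? 0 : -1;
}

/*
 * Read the "some" line of a PSI file. A missing file is reported as -1,
 * which is the situation the "PSI not found" message describes.
 */
static int read_psi_some(const char *path, struct psi_line *out)
{
	char line[256];
	FILE *fp = fopen(path, "r");
	int ret = -1;

	if (!fp)
		return -1;
	while (fgets(line, sizeof(line), fp)) {
		if (parse_psi_line(line, out) == 0 &&
		    strcmp(out->kind, "some") == 0) {
			ret = 0;
			break;
		}
	}
	fclose(fp);
	return ret;
}

/* Format like PSI_LINE_FORMAT; the us -> ms conversion is assumed. */
static int format_psi_line(char *buf, size_t len, const char *label,
			   const struct psi_line *p)
{
	return snprintf(buf, len, "%-12s %6.1f%%/%6.1f%%/%6.1f%%/%8llu(ms)",
			label, p->avg10, p->avg60, p->avg300,
			p->total_us / 1000);
}
```

A caller would try `read_psi_some("/proc/pressure/cpu", &p)` and, on failure, print a hint to check that `psi=1` is set on the kernel command line instead of the pressure table.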


This patch (of 5):

The delaytop tool only supported sorting by CPU delay, which limited its
usefulness when users needed to identify bottlenecks in other subsystems.
Users had no way to sort processes by IO, IRQ, or other delay types to
quickly pinpoint specific performance issues.

Add -s/--sort option to allow sorting by different delay types. Users can
now quickly identify bottlenecks in specific subsystems by sorting
processes by the relevant delay metric.

Link: https://lkml.kernel.org/r/20250907001101305vrTGnXaRNvtmsGkp-Ljk_@zte.com.cn
Link: https://lkml.kernel.org/r/20250907001205573L3XpsQMIQnLgDqiiKYd3H@zte.com.cn
Signed-off-by: Fan Yu <fan.yu9@zte.com.cn>
Reviewed-by: xu xin <xu.xin16@zte.com.cn>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Wang Yaxin <wang.yaxin@zte.com.cn>
Cc: Yang Yang <yang.yang29@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Fan Yu and committed by Andrew Morton
0471440c 7b1e502e

+121 -32
tools/accounting/delaytop.c
···
 #include <linux/genetlink.h>
 #include <linux/taskstats.h>
 #include <linux/cgroupstats.h>
+#include <stddef.h>
 
 #define PSI_CPU_SOME "/proc/pressure/cpu"
 #define PSI_CPU_FULL "/proc/pressure/cpu"
···
 #define TASK_COMM_LEN 16
 #define MAX_MSG_SIZE 1024
 #define MAX_TASKS 1000
+#define MAX_BUF_LEN 256
 #define SET_TASK_STAT(task_count, field) tasks[task_count].field = stats.field
 #define BOOL_FPRINT(stream, fmt, ...) \
 ({ \
···
 	ret >= 0; \
 })
 #define PSI_LINE_FORMAT "%-12s %6.1f%%/%6.1f%%/%6.1f%%/%8llu(ms)\n"
-
-/* Program settings structure */
-struct config {
-	int delay; /* Update interval in seconds */
-	int iterations; /* Number of iterations, 0 == infinite */
-	int max_processes; /* Maximum number of processes to show */
-	char sort_field; /* Field to sort by */
-	int output_one_time; /* Output once and exit */
-	int monitor_pid; /* Monitor specific PID */
-	char *container_path; /* Path to container cgroup */
-};
+#define SORT_FIELD(name) \
+	{#name, \
+	offsetof(struct task_info, name##_delay_total), \
+	offsetof(struct task_info, name##_count)}
+#define END_FIELD {NULL, 0, 0}
 
 /* PSI statistics structure */
 struct psi_stats {
···
 	int nr_io_wait; /* Number of processes in IO wait */
 };
 
+/* Delay field structure */
+struct field_desc {
+	const char *name; /* Field name for cmdline argument */
+	unsigned long total_offset; /* Offset of total delay in task_info */
+	unsigned long count_offset; /* Offset of count in task_info */
+};
+
+/* Program settings structure */
+struct config {
+	int delay; /* Update interval in seconds */
+	int iterations; /* Number of iterations, 0 == infinite */
+	int max_processes; /* Maximum number of processes to show */
+	int output_one_time; /* Output once and exit */
+	int monitor_pid; /* Monitor specific PID */
+	char *container_path; /* Path to container cgroup */
+	const struct field_desc *sort_field; /* Current sort field */
+};
+
 /* Global variables */
 static struct config cfg;
 static struct psi_stats psi;
···
 static int task_count;
 static int running = 1;
 static struct container_stats container_stats;
+static const struct field_desc sort_fields[] = {
+	SORT_FIELD(cpu),
+	SORT_FIELD(blkio),
+	SORT_FIELD(irq),
+	SORT_FIELD(swapin),
+	SORT_FIELD(freepages),
+	SORT_FIELD(thrashing),
+	SORT_FIELD(compact),
+	SORT_FIELD(wpcopy),
+	END_FIELD
+};
 
 /* Netlink socket variables */
 static int nl_sd = -1;
···
 	tcsetattr(STDIN_FILENO, TCSAFLUSH, &orig_termios);
 }
 
+/* Find field descriptor by name with string comparison */
+static const struct field_desc *get_field_by_name(const char *name)
+{
+	const struct field_desc *field;
+	size_t field_len;
+
+	for (field = sort_fields; field->name != NULL; field++) {
+		field_len = strlen(field->name);
+		if (field_len != strlen(name))
+			continue;
+		if (strncmp(field->name, name, field_len) == 0)
+			return field;
+	}
+
+	return NULL;
+}
+
+/* Find display name for a field descriptor */
+static const char *get_name_by_field(const struct field_desc *field)
+{
+	return field ? field->name : "UNKNOWN";
+}
+
+/* Generate string of available field names */
+static void display_available_fields(void)
+{
+	const struct field_desc *field;
+	char buf[MAX_BUF_LEN];
+
+	buf[0] = '\0';
+
+	for (field = sort_fields; field->name != NULL; field++) {
+		strncat(buf, "|", MAX_BUF_LEN - strlen(buf) - 1);
+		strncat(buf, field->name, MAX_BUF_LEN - strlen(buf) - 1);
+		buf[MAX_BUF_LEN - 1] = '\0';
+	}
+
+	fprintf(stderr, "Available fields: %s\n", buf);
+}
+
 /* Display usage information and command line options */
 static void usage(void)
 {
 	printf("Usage: delaytop [Options]\n"
 	"Options:\n"
-	" -h, --help Show this help message and exit\n"
-	" -d, --delay=SECONDS Set refresh interval (default: 2 seconds, min: 1)\n"
-	" -n, --iterations=COUNT Set number of updates (default: 0 = infinite)\n"
-	" -P, --processes=NUMBER Set maximum number of processes to show (default: 20, max: 1000)\n"
-	" -o, --once Display once and exit\n"
-	" -p, --pid=PID Monitor only the specified PID\n"
-	" -C, --container=PATH Monitor the container at specified cgroup path\n");
+	" -h, --help Show this help message and exit\n"
+	" -d, --delay=SECONDS Set refresh interval (default: 2 seconds, min: 1)\n"
+	" -n, --iterations=COUNT Set number of updates (default: 0 = infinite)\n"
+	" -P, --processes=NUMBER Set maximum number of processes to show (default: 20, max: 1000)\n"
+	" -o, --once Display once and exit\n"
+	" -p, --pid=PID Monitor only the specified PID\n"
+	" -C, --container=PATH Monitor the container at specified cgroup path\n"
+	" -s, --sort=FIELD Sort by delay field (default: cpu)\n");
 	exit(0);
 }
 
···
 static void parse_args(int argc, char **argv)
 {
 	int c;
+	const struct field_desc *field;
 	struct option long_options[] = {
 		{"help", no_argument, 0, 'h'},
 		{"delay", required_argument, 0, 'd'},
···
 		{"pid", required_argument, 0, 'p'},
 		{"once", no_argument, 0, 'o'},
 		{"processes", required_argument, 0, 'P'},
+		{"sort", required_argument, 0, 's'},
 		{"container", required_argument, 0, 'C'},
 		{0, 0, 0, 0}
 	};
···
 	cfg.delay = 2;
 	cfg.iterations = 0;
 	cfg.max_processes = 20;
-	cfg.sort_field = 'c'; /* Default sort by CPU delay */
+	cfg.sort_field = &sort_fields[0]; /* Default sorted by CPU delay */
 	cfg.output_one_time = 0;
 	cfg.monitor_pid = 0; /* 0 means monitor all PIDs */
 	cfg.container_path = NULL;
···
 	while (1) {
 		int option_index = 0;
 
-		c = getopt_long(argc, argv, "hd:n:p:oP:C:", long_options, &option_index);
+		c = getopt_long(argc, argv, "hd:n:p:oP:C:s:", long_options, &option_index);
 		if (c == -1)
 			break;
 
···
 			break;
 		case 'C':
 			cfg.container_path = strdup(optarg);
+			break;
+		case 's':
+			if (strlen(optarg) == 0) {
+				fprintf(stderr, "Error: empty sort field\n");
+				exit(1);
+			}
+
+			field = get_field_by_name(optarg);
+			/* Show available fields if invalid option provided */
+			if (!field) {
+				fprintf(stderr, "Error: invalid sort field '%s'\n", optarg);
+				display_available_fields();
+				exit(1);
+			}
+
+			cfg.sort_field = field;
 			break;
 		default:
 			fprintf(stderr, "Try 'delaytop --help' for more information.\n");
···
 {
 	const struct task_info *t1 = (const struct task_info *)a;
 	const struct task_info *t2 = (const struct task_info *)b;
+	unsigned long long total1;
+	unsigned long long total2;
+	unsigned long count1;
+	unsigned long count2;
 	double avg1, avg2;
 
-	switch (cfg.sort_field) {
-	case 'c': /* CPU */
-		avg1 = average_ms(t1->cpu_delay_total, t1->cpu_count);
-		avg2 = average_ms(t2->cpu_delay_total, t2->cpu_count);
-		if (avg1 != avg2)
-			return avg2 > avg1 ? 1 : -1;
-		return t2->cpu_delay_total > t1->cpu_delay_total ? 1 : -1;
+	total1 = *(unsigned long long *)((char *)t1 + cfg.sort_field->total_offset);
+	total2 = *(unsigned long long *)((char *)t2 + cfg.sort_field->total_offset);
+	count1 = *(unsigned long *)((char *)t1 + cfg.sort_field->count_offset);
+	count2 = *(unsigned long *)((char *)t2 + cfg.sort_field->count_offset);
 
-	default:
-		return t2->cpu_delay_total > t1->cpu_delay_total ? 1 : -1;
-	}
+	avg1 = average_ms(total1, count1);
+	avg2 = average_ms(total2, count2);
+	if (avg1 != avg2)
+		return avg2 > avg1 ? 1 : -1;
+
+	return 0;
 }
 
 /* Sort tasks by selected field */
···
 		container_stats.nr_stopped, container_stats.nr_uninterruptible,
 		container_stats.nr_io_wait);
 	}
-	suc &= BOOL_FPRINT(out, "Top %d processes (sorted by CPU delay):\n",
-		cfg.max_processes);
+	/* Task delay output */
+	suc &= BOOL_FPRINT(out, "Top %d processes (sorted by %s delay):\n",
+		cfg.max_processes, get_name_by_field(cfg.sort_field));
 	suc &= BOOL_FPRINT(out, "%5s %5s %-17s", "PID", "TGID", "COMMAND");
 	suc &= BOOL_FPRINT(out, "%7s %7s %7s %7s %7s %7s %7s %7s\n",
 		"CPU(ms)", "IO(ms)", "SWAP(ms)", "RCL(ms)",
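
The comparator change in this patch replaces a per-field `switch` with a table of `offsetof()` pairs, so one comparison routine serves every delay type. The standalone sketch below illustrates that technique on a cut-down `struct task_info` with only two delay types. It is not the patch's code: `current_field`, `field_average()` and `sort_tasks_by()` are hypothetical, the name lookup uses plain `strcmp()` where the patch compares lengths and then calls `strncmp()`, the fields are read with `memcpy()` instead of a cast, and `average_ms()` assumes totals are in nanoseconds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Cut-down stand-in for delaytop's task_info; the layout is illustrative. */
struct task_info {
	int pid;
	unsigned long long cpu_delay_total;
	unsigned long cpu_count;
	unsigned long long blkio_delay_total;
	unsigned long blkio_count;
};

struct field_desc {
	const char *name;	/* name accepted on the command line */
	size_t total_offset;	/* offset of <name>_delay_total */
	size_t count_offset;	/* offset of <name>_count */
};

#define SORT_FIELD(name) \
	{ #name, offsetof(struct task_info, name##_delay_total), \
	  offsetof(struct task_info, name##_count) }

static const struct field_desc sort_fields[] = {
	SORT_FIELD(cpu),
	SORT_FIELD(blkio),
	{ NULL, 0, 0 }		/* sentinel */
};

/* Hypothetical global: the field the comparator currently sorts by. */
static const struct field_desc *current_field = &sort_fields[0];

static const struct field_desc *get_field_by_name(const char *name)
{
	const struct field_desc *f;

	for (f = sort_fields; f->name != NULL; f++)
		if (strcmp(f->name, name) == 0)
			return f;
	return NULL;
}

/* Assumed unit: totals in nanoseconds, result in milliseconds per event. */
static double average_ms(unsigned long long total_ns, unsigned long count)
{
	return count ? (double)total_ns / 1000000.0 / count : 0.0;
}

/* Fetch total and count through the stored offsets, then average them. */
static double field_average(const struct task_info *t,
			    const struct field_desc *f)
{
	unsigned long long total;
	unsigned long count;

	memcpy(&total, (const char *)t + f->total_offset, sizeof(total));
	memcpy(&count, (const char *)t + f->count_offset, sizeof(count));
	return average_ms(total, count);
}

/* qsort() comparator: larger average delay sorts first. */
static int compare_tasks(const void *a, const void *b)
{
	double avg1 = field_average(a, current_field);
	double avg2 = field_average(b, current_field);

	if (avg1 != avg2)
		return avg2 > avg1 ? 1 : -1;
	return 0;
}

static void sort_tasks_by(struct task_info *tasks, size_t n, const char *name)
{
	const struct field_desc *f = get_field_by_name(name);

	assert(f != NULL);	/* caller is expected to validate the name */
	current_field = f;
	qsort(tasks, n, sizeof(*tasks), compare_tasks);
}
```

Adding a new sortable delay in this scheme means adding one `SORT_FIELD()` entry; the comparator itself does not change. The offsets only work if every `<name>_delay_total` and `<name>_count` member has the type the accessor reads, which is the main constraint of the approach.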