Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

perf mem: Document new output fields (op, cache, mem, dtlb, snoop)

Update the documentation of the new fields with examples and caveats.

Also update the related documentation for AMD IBS.

Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250610005742.2173050-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Authored by Namhyung Kim and committed by Arnaldo Carvalho de Melo
bbeb1088 11cfaf37

+91 -16
tools/perf/Documentation/perf-amd-ibs.txt (+41 -16)

···
   # perf mem report
 
 A normal perf mem report output will provide detailed memory access profile.
-However, it can also be aggregated based on output fields. For example:
+New output fields will show related access info together. For example:
 
-  # perf mem report -F mem,sample,snoop
-  Samples: 3M of event 'ibs_op//', Event count (approx.): 23524876
-  Memory access                           Samples  Snoop
-  N/A                                     1903343  N/A
-  L1 hit                                  1056754  N/A
-  L2 hit                                    75231  N/A
-  L3 hit                                     9496  HitM
-  L3 hit                                     2270  N/A
-  RAM hit                                    8710  N/A
-  Remote node, same socket RAM hit           3241  N/A
-  Remote core, same node Any cache hit       1572  HitM
-  Remote core, same node Any cache hit        514  N/A
-  Remote node, same socket Any cache hit     1216  HitM
-  Remote node, same socket Any cache hit      350  N/A
-  Uncached hit                                 18  N/A
+  # perf mem report -F overhead,cache,snoop,comm
+  ...
+  # Samples: 92K of event 'ibs_op//'
+  # Total weight : 531104
+  #
+  #           ---------- Cache -----------  --- Snoop ----
+  # Overhead       L1     L2 L1-buf  Other     HitM  Other  Command
+  # ........  ............................  ..............  ..........
+  #
+      76.07%     5.8%  35.7%   0.0%  34.6%    23.3%  52.8%  cc1
+       5.79%     0.2%   0.0%   0.0%   5.6%     0.1%   5.7%  make
+       5.78%     0.1%   4.4%   0.0%   1.2%     0.5%   5.3%  gcc
+       5.33%     0.3%   3.9%   0.0%   1.1%     0.2%   5.2%  as
+       5.00%     0.1%   3.8%   0.0%   1.0%     0.3%   4.7%  sh
+       1.56%     0.1%   0.1%   0.0%   1.4%     0.6%   0.9%  ld
+       0.28%     0.1%   0.0%   0.0%   0.2%     0.1%   0.2%  pkg-config
+       0.09%     0.0%   0.0%   0.0%   0.1%     0.0%   0.1%  git
+       0.03%     0.0%   0.0%   0.0%   0.0%     0.0%   0.0%  rm
+  ...
+
+Also, it can be aggregated based on various memory access info using the
+sort keys. For example:
+
+  # perf mem report -s mem,snoop
+  ...
+  # Samples: 92K of event 'ibs_op//'
+  # Total weight : 531104
+  # Sort order : mem,snoop
+  #
+  # Overhead       Samples  Memory access                            Snoop
+  # ........  ............  .......................................  ............
+  #
+      47.99%          1509  L2 hit                                   N/A
+      25.08%           338  core, same node Any cache hit            HitM
+      10.24%         54374  N/A                                      N/A
+       6.77%         35938  L1 hit                                   N/A
+       6.39%           101  core, same node Any cache hit            N/A
+       3.50%            69  RAM hit                                  N/A
+       0.03%           158  LFB/MAB hit                              N/A
+       0.00%             2  Uncached hit                             N/A
 
 Please refer to their man page for more detail.
 
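In the new per-command example, each breakdown column appears to be a share of the total sample weight, so a row's columns add back up to its overhead. For cc1, the cache columns (5.8% + 35.7% + 0.0% + 34.6%) and the snoop columns (23.3% + 52.8%) both come to roughly its 76.07% overhead. The Python sketch below models that arithmetic for the cache field only. It is not perf's implementation. The `(comm, weight, cache_level)` sample tuples, the `cache_breakdown` helper, and the rule that any level other than L1, L2 or L1-buf counts as "Other" are all hypothetical assumptions.

```python
from collections import defaultdict

# Hypothetical column set mirroring the "Cache" group in the example output.
CACHE_COLUMNS = ("L1", "L2", "L1-buf", "Other")


def cache_breakdown(samples):
    """Model of an '-F overhead,cache,comm' style breakdown (an assumption).

    samples: iterable of (comm, weight, cache_level) tuples, a hypothetical
    stand-in for perf samples. Returns {comm: (overhead_pct, {column: pct})},
    where every percentage is relative to the total weight of ALL samples.
    """
    samples = list(samples)
    total = sum(weight for _, weight, _ in samples)
    per_comm = defaultdict(lambda: defaultdict(int))
    for comm, weight, level in samples:
        # Assumed rule: a level without its own column is counted as "Other".
        column = level if level in CACHE_COLUMNS else "Other"
        per_comm[comm][column] += weight

    result = {}
    for comm, buckets in per_comm.items():
        overhead = 100.0 * sum(buckets.values()) / total
        columns = {c: 100.0 * buckets.get(c, 0) / total for c in CACHE_COLUMNS}
        result[comm] = (overhead, columns)
    return result


if __name__ == "__main__":
    # Made-up samples; the weights sum to 100 so percentages are easy to read.
    demo = [("cc1", 30, "L2"), ("cc1", 10, "L1"),
            ("make", 40, "RAM"), ("make", 20, "L2")]
    rows = cache_breakdown(demo)
    # Print rows sorted by overhead, largest first, like perf's default order.
    for comm, (overhead, cols) in sorted(rows.items(), key=lambda kv: -kv[1][0]):
        cells = "  ".join(f"{cols[c]:5.1f}%" for c in CACHE_COLUMNS)
        print(f"{overhead:7.2f}%  {cells}  {comm}")
```

With these made-up weights, make gets a 60% overhead split into 20% L2 and 40% Other (its RAM sample). Each row's columns sum to that row's overhead, which is the same property the cc1 row shows.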
tools/perf/Documentation/perf-mem.txt (+50)

···
 	And the default sort keys are changed to local_weight, mem, sym, dso,
 	symbol_daddr, dso_daddr, snoop, tlb, locked, blocked, local_ins_lat.
 
+-F::
+--fields=::
+	Specify output field - multiple keys can be specified in CSV format.
+	Please see linkperf:perf-report[1] for details.
+
+	In addition to the default fields, 'perf mem report' will provide the
+	following fields to break down sample periods.
+
+	- op: operation in the sample instruction (load, store, prefetch, ...)
+	- cache: location in CPU cache (L1, L2, ...) where the sample hit
+	- mem: location in memory or other places the sample hit
+	- dtlb: location in Data TLB (L1, L2) where the sample hit
+	- snoop: snoop result for the sampled data access
+
+	Please take a look at the OUTPUT FIELD SELECTION section for caveats.
+
 -T::
 --type-profile::
 	Show data-type profile result instead of code symbols. This requires
···
 	$ perf mem report -F overhead,symbol
 	90% [k] memcpy
 	10% [.] strcmp
+
+OUTPUT FIELD SELECTION
+----------------------
+"perf mem report" adds a number of new output fields specific to data source
+information in the sample. Some of them have the same name with the existing
+sort keys ("mem" and "snoop"). So unlike other fields and sort keys, they'll
+behave differently when it's used by -F/--fields or -s/--sort.
+
+Using those two as output fields will aggregate samples altogether and show
+breakdown.
+
+  $ perf mem report -F mem,snoop
+  ...
+  # ------ Memory -------  --- Snoop ----
+  #     RAM Uncach  Other     HitM  Other
+  # .....................  ..............
+  #
+       3.5%   0.0%  96.5%    25.1%  74.9%
+
+But using the same name for sort keys will aggregate samples for each type
+separately.
+
+  $ perf mem report -s mem,snoop
+  # Overhead       Samples  Memory access                            Snoop
+  # ........  ............  .......................................  ............
+  #
+      47.99%          1509  L2 hit                                   N/A
+      25.08%           338  core, same node Any cache hit            HitM
+      10.24%         54374  N/A                                      N/A
+       6.77%         35938  L1 hit                                   N/A
+       6.39%           101  core, same node Any cache hit            N/A
+       3.50%            69  RAM hit                                  N/A
+       0.03%           158  LFB/MAB hit                              N/A
+       0.00%             2  Uncached hit                             N/A
 
 SEE ALSO
 --------
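The OUTPUT FIELD SELECTION caveat comes down to two ways of bucketing the same samples. As output fields, "mem" and "snoop" each yield one row whose columns partition the total weight: 3.5% + 0.0% + 96.5% and 25.1% + 74.9% both make 100%. As sort keys, each distinct (mem, snoop) pair becomes its own row.

The Python sketch below contrasts the two views under stated assumptions. The following are hypothetical and were inferred only from the example numbers, not taken from perf's source:

* the `(weight, mem, snoop)` tuples;
* the helper names;
* the string-based mapping onto the RAM/Uncach/Other and HitM/Other columns.

For example, RAM at 3.5% in the field view matches the 3.50% "RAM hit" row in the sort-key view.

```python
from collections import Counter

# Hypothetical column sets mirroring the '-F mem,snoop' example output.
MEM_COLUMNS = ("RAM", "Uncach", "Other")
SNOOP_COLUMNS = ("HitM", "Other")


def mem_column(mem):
    """Assumed mapping from a printed 'Memory access' string to a -F column."""
    if mem == "RAM hit":
        return "RAM"
    if mem == "Uncached hit":
        return "Uncach"
    return "Other"


def snoop_column(snoop):
    """Assumed mapping from a printed 'Snoop' string to a -F column."""
    return "HitM" if snoop == "HitM" else "Other"


def as_output_fields(samples):
    """'-F mem,snoop' style: a single row; each field's columns sum to 100%."""
    samples = list(samples)
    total = sum(weight for weight, _, _ in samples)
    mem, snoop = Counter(), Counter()
    for weight, m, s in samples:
        mem[mem_column(m)] += weight
        snoop[snoop_column(s)] += weight
    return ({c: 100.0 * mem[c] / total for c in MEM_COLUMNS},
            {c: 100.0 * snoop[c] / total for c in SNOOP_COLUMNS})


def as_sort_keys(samples):
    """'-s mem,snoop' style: one row per distinct (mem, snoop) pair.

    Rows are (overhead_pct, sample_count, mem, snoop), largest overhead first.
    """
    samples = list(samples)
    total = sum(weight for weight, _, _ in samples)
    weight_by_key, count_by_key = Counter(), Counter()
    for weight, m, s in samples:
        weight_by_key[(m, s)] += weight
        count_by_key[(m, s)] += 1
    rows = [(100.0 * w / total, count_by_key[key], key[0], key[1])
            for key, w in weight_by_key.items()]
    return sorted(rows, reverse=True)


if __name__ == "__main__":
    # Made-up samples; the weights sum to 100 so percentages are easy to read.
    demo = [(50, "L2 hit", "N/A"),
            (25, "core, same node Any cache hit", "HitM"),
            (20, "L1 hit", "N/A"),
            (5, "RAM hit", "N/A")]
    print(as_output_fields(demo))
    for row in as_sort_keys(demo):
        print("%6.2f%%  %5d  %-30s %s" % row)
```

With these made-up weights, the field view reports the following:

* memory: RAM 5%, Uncach 0% and Other 95%;
* snoop: HitM 25% versus Other 75%.

The sort-key view lists four rows, one per (mem, snoop) pair, each with a sample count of 1.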