crypto: xor - fix template benchmarking

Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git


Commit c055e3eae0f1 ("crypto: xor - use ktime for template benchmarking")
switched from using jiffies to ktime-based performance benchmarking.

This works well on machines with a fine-grained ktime clocksource,
e.g. x86 machines with TSC.
Other machines, such as my 4-way HP PARISC server, have no such
fine-grained clocksource, so the 800 xor loops appear to take zero
time, which then shows up in the logs as:

xor: measuring software checksum speed
8regs : -1018167296 MB/sec
8regs_prefetch : -1018167296 MB/sec
32regs : -1018167296 MB/sec
32regs_prefetch : -1018167296 MB/sec
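
The negative figure is consistent with a 32-bit overflow in the old formula. The sketch below is a userspace illustration, not kernel code. It assumes REPS is 800 (the "800 xor loops" above) and BENCH_SIZE is 4096 bytes; the latter value is an assumption and is not stated in this commit. With the elapsed time clamped to 1 ns, the quotient is 1000 * 800 * 4096 = 3276800000, which does not fit in a signed 32-bit int and wraps to the value seen in the log. The helper name old_speed_mb_per_sec is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants: REPS follows the "800 xor loops" in the message;
 * BENCH_SIZE = 4096 is an assumption, not stated in this commit. */
#define REPS       800U
#define BENCH_SIZE 4096U

/* Hypothetical helper mirroring the old formula: a zero elapsed time is
 * clamped to 1 ns and the unsigned quotient is stored in a signed
 * 32-bit integer. */
int32_t old_speed_mb_per_sec(uint64_t elapsed_ns)
{
	uint32_t q;

	if (elapsed_ns == 0)
		elapsed_ns = 1;
	assert(elapsed_ns <= UINT32_MAX);

	q = (1000U * REPS * BENCH_SIZE) / (uint32_t)elapsed_ns;

	/* Emulate two's-complement wraparound without relying on an
	 * implementation-defined conversion. */
	if (q > (uint32_t)INT32_MAX)
		return (int32_t)((int64_t)q - 4294967296LL);
	return (int32_t)q;
}
```

Under these assumptions, old_speed_mb_per_sec(0) yields -1018167296, matching the log lines above, while an elapsed time of 1 ms (1000000 ns) yields 3276.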

Fix this with small modifications to the existing code so that the
algorithm always produces correct results, without introducing major
delays on architectures with a fine-grained ktime clocksource:
a) Delay the start of the timing until ktime_get() has just advanced.
On machines with a fast clocksource this should cost just one
additional ktime_get() call.
b) Count the number of loops. Run at least 800 loops and finish no
earlier than when the ktime_get() value has progressed.
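
Steps a) and b) can be sketched in userspace as follows. This is a minimal illustration under stated assumptions, not the kernel implementation: the clock and the workload are passed in as function pointers, and fake_now() and fake_work() simulate a coarse clock that only advances every 4 ms. All names here (bench, bench_result, fake_now, fake_work, MIN_REPS, TICK_CYCLES, TICK_NS) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MIN_REPS    800UL      /* minimum loop count from the message */
#define TICK_CYCLES 100000ULL  /* assumed: fake cycles per clock tick */
#define TICK_NS     4000000ULL /* assumed: coarse 4 ms clock tick */

typedef uint64_t (*clock_fn)(void);
typedef void (*work_fn)(void);

struct bench_result {
	unsigned long reps;   /* number of loop iterations executed */
	uint64_t elapsed_ns;  /* measured time, never zero */
};

/* Simulated coarse clock: the value changes once per TICK_CYCLES. */
static uint64_t fake_cycles;

uint64_t fake_now(void)
{
	fake_cycles += 1;
	return (fake_cycles / TICK_CYCLES) * TICK_NS;
}

void fake_work(void)
{
	fake_cycles += 10;
}

struct bench_result bench(clock_fn now, work_fn work)
{
	struct bench_result r = { 0, 0 };
	uint64_t t0, start;

	assert(now && work);

	/* a) delay the start until the clock has just advanced */
	t0 = now();
	while ((start = now()) == t0)
		;

	/* b) run at least MIN_REPS loops, then keep going until the
	 * clock value differs from the start value */
	do {
		work();
	} while (r.reps++ < MIN_REPS || (t0 = now()) == start);

	r.elapsed_ns = t0 - start;
	return r;
}
```

Calling bench(fake_now, fake_work) with this simulated clock returns an elapsed time of exactly one tick and a loop count above MIN_REPS, so dividing by the elapsed time is safe. On a fine-grained clock the second condition is typically false as soon as the minimum loop count is reached, so the extra cost is small.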

With that the throughput can now be calculated more accurately under all
conditions.

Fixes: c055e3eae0f1 ("crypto: xor - use ktime for template benchmarking")
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: John David Anglin <dave.anglin@bell.net>

v2:
- clean up coding style (noticed & suggested by Herbert Xu)
- rephrased & fixed typo in commit message

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

authored by Helge Deller and committed by Herbert Xu 2 years ago ab9a244c 8400291e

+14 -17

1 changed file


crypto/xor.c

@@ -83,33 +83,30 @@
 do_xor_speed(struct xor_block_template *tmpl, void *b1, void *b2)
 {
 	int speed;
-	int i, j;
-	ktime_t min, start, diff;
+	unsigned long reps;
+	ktime_t min, start, t0;
 
 	tmpl->next = template_list;
 	template_list = tmpl;
 
 	preempt_disable();
 
-	min = (ktime_t)S64_MAX;
-	for (i = 0; i < 3; i++) {
-		start = ktime_get();
-		for (j = 0; j < REPS; j++) {
-			mb(); /* prevent loop optimization */
-			tmpl->do_2(BENCH_SIZE, b1, b2);
-			mb();
-		}
-		diff = ktime_sub(ktime_get(), start);
-		if (diff < min)
-			min = diff;
-	}
+	reps = 0;
+	t0 = ktime_get();
+	/* delay start until time has advanced */
+	while ((start = ktime_get()) == t0)
+		cpu_relax();
+	do {
+		mb(); /* prevent loop optimization */
+		tmpl->do_2(BENCH_SIZE, b1, b2);
+		mb();
+	} while (reps++ < REPS || (t0 = ktime_get()) == start);
+	min = ktime_sub(t0, start);
 
 	preempt_enable();
 
 	// bytes/ns == GB/s, multiply by 1000 to get MB/s [not MiB/s]
-	if (!min)
-		min = 1;
-	speed = (1000 * REPS * BENCH_SIZE) / (unsigned int)ktime_to_ns(min);
+	speed = (1000 * reps * BENCH_SIZE) / (unsigned int)ktime_to_ns(min);
 	tmpl->speed = speed;
 
 	pr_info(" %-16s: %5d MB/sec\n", tmpl->name, speed);
