Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
arm64/mm: Modify range-based tlbi to decrement scale

In preparation for adding support for LPA2 to the tlb invalidation
routines, modify the algorithm used by range-based tlbi to start at the
highest 'scale' and decrement instead of starting at the lowest 'scale'
and incrementing. This new approach makes it possible to maintain 64K
alignment as we work through the range, until the last op (at scale=0).
This is required when LPA2 is enabled (LPA2 support itself will be added
in a subsequent commit).
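
The alignment argument above can be sketched with a small user-space C model. It is an illustration, not kernel code. `tlbi_range_pages()` follows the (num+1)*2^(5*scale+1) formula from the code comment in this patch; `tlbi_range_num()`, `PAGE_SHIFT_4K`, `ALIGN_64K_MASK` and `range_ops_stay_aligned()` are hypothetical names, and the model assumes 4K pages, a stride of one page, and fewer than 2^21 pages per call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT_4K	12		/* assumption: 4K pages */
#define ALIGN_64K_MASK	0xFFFFULL	/* low bits that must be zero for 64K alignment */

/* Pages covered by one range op: (num+1)*2^(5*scale+1), as in the code comment. */
static uint64_t tlbi_range_pages(int num, int scale)
{
	return (uint64_t)(num + 1) << (5 * scale + 1);
}

/* Assumed encoding: the 5-bit field of 'pages' for this scale, minus one. */
static int tlbi_range_num(uint64_t pages, int scale)
{
	return (int)((pages >> (5 * scale + 1)) & 0x1f) - 1;
}

/*
 * Walk the range and return true if every range op starts on a 64K
 * boundary. 'descending' selects the new order (scale 3 down to 0);
 * otherwise the old order (scale 0 up to 3) is modelled.
 */
static bool range_ops_stay_aligned(uint64_t start, uint64_t pages, bool descending)
{
	int scale = descending ? 3 : 0;

	assert(pages < tlbi_range_pages(31, 3));	/* model limit: < 2^21 pages */

	while (pages > 0) {
		/* New order flushes a lone last page; old order flushes an odd first page. */
		bool single = descending ? (pages == 1) : (pages % 2 == 1);
		int num;

		if (single) {
			start += 1ULL << PAGE_SHIFT_4K;
			pages -= 1;
			continue;
		}

		num = tlbi_range_num(pages, scale);
		if (num >= 0) {
			if (start & ALIGN_64K_MASK)
				return false;	/* a range op would start unaligned */
			start += tlbi_range_pages(num, scale) << PAGE_SHIFT_4K;
			pages -= tlbi_range_pages(num, scale);
		}
		scale += descending ? -1 : 1;
	}
	return true;
}
```

In this model, calling `range_ops_stay_aligned(0, 66, true)` keeps every range op 64K-aligned, because the 64-page op at scale 1 runs before the 2-page op at scale 0. With `descending` set to false, the 2-page op runs first and moves the start address by 8K, so the following op begins unaligned.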

This change is separated into its own patch because it will also impact
non-LPA2 systems, and I want to make it easy to bisect in case it leads
to a performance regression (see below for benchmarks that suggest this
should not be a problem).

The original commit (d1d3aa98 "arm64: tlb: Use the TLBI RANGE feature in
arm64") stated this as the reason for _incrementing_ scale:

    However, in most scenarios, the pages = 1 when flush_tlb_range() is
    called. Start from scale = 3 or other proper value (such as scale
    =ilog2(pages)), will incur extra overhead. So increase 'scale' from 0
    to maximum.

But pages=1 is already special cased by the non-range invalidation path,
which will take care of it the first time through the loop (both in the
original commit and in my change), so I don't think switching to
decrement scale should have any extra performance impact after all.
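
The pages=1 reasoning can be checked with a second user-space sketch that counts loop passes and TLBI operations for both scale orders. As before, this is a model under stated assumptions and not the kernel macro: `pages_per_op()` follows the formula in the code comment, while `num_for_scale()`, `struct flush_cost` and `count_flush()` are hypothetical names, and the stride is assumed to be one page.

```c
#include <assert.h>

/* Result of one simulated flush: loop passes taken and TLBI ops issued. */
struct flush_cost {
	int iterations;
	int ops;
};

/* Pages covered by one range op: (num+1)*2^(5*scale+1), as in the code comment. */
static unsigned long pages_per_op(int num, int scale)
{
	return (unsigned long)(num + 1) << (5 * scale + 1);
}

/* Assumed encoding: the 5-bit field of 'pages' for this scale, minus one. */
static int num_for_scale(unsigned long pages, int scale)
{
	return (int)((pages >> (5 * scale + 1)) & 0x1f) - 1;
}

/* Simulate the loop; 'descending' != 0 models the order introduced by this patch. */
static struct flush_cost count_flush(unsigned long pages, int descending)
{
	struct flush_cost cost = { 0, 0 };
	int scale = descending ? 3 : 0;

	assert(pages < pages_per_op(31, 3));	/* model limit: < 2^21 pages */

	while (pages > 0) {
		int single = descending ? (pages == 1) : (pages % 2 == 1);
		int num;

		cost.iterations++;
		if (single) {
			/* Non-range path: one page, one op, no scale change. */
			pages -= 1;
			cost.ops++;
			continue;
		}

		num = num_for_scale(pages, scale);
		if (num >= 0) {
			pages -= pages_per_op(num, scale);
			cost.ops++;
		}
		scale += descending ? -1 : 1;
	}
	return cost;
}
```

For `pages == 1` the model reports one loop pass and one operation in either order, which matches the argument above. For larger counts such as 67 pages, both orders issue the same number of operations in this model; only the order in which they are issued differs.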

Indeed, benchmarking kernel compilation, a TLBI-heavy workload, suggests
that this new approach actually _improves_ performance slightly (using a
virtual machine on Apple M2):

The table shows the time to execute a kernel compilation workload with 8
jobs, relative to a baseline without this patch (a more negative number is
a bigger speedup). Repeated 9 times across 3 system reboots:

| counter | mean | stdev |
|:----------|-----------:|----------:|
| real-time | -0.6% | 0.0% |
| kern-time | -1.6% | 0.5% |
| user-time | -0.4% | 0.1% |

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231127111737.1897081-2-ryan.roberts@arm.com

Authored by Ryan Roberts and committed by Marc Zyngier
e2768b79 2cc14f52

+10 -10
arch/arm64/include/asm/tlbflush.h
@@ -350,14 +350,14 @@
  * entries one by one at the granularity of 'stride'. If the TLB
  * range ops are supported, then:
  *
- *    1. If 'pages' is odd, flush the first page through non-range
- *       operations;
+ *    1. The minimum range granularity is decided by 'scale', so multiple range
+ *       TLBI operations may be required. Start from scale = 3, flush the largest
+ *       possible number of pages ((num+1)*2^(5*scale+1)) that fit into the
+ *       requested range, then decrement scale and continue until one or zero pages
+ *       are left.
  *
- *    2. For remaining pages: the minimum range granularity is decided
- *       by 'scale', so multiple range TLBI operations may be required.
- *       Start from scale = 0, flush the corresponding number of pages
- *       ((num+1)*2^(5*scale+1) starting from 'addr'), then increase it
- *       until no pages left.
+ *    2. If there is 1 page remaining, flush it through non-range operations. Range
+ *       operations can only span an even number of pages.
  *
  * Note that certain ranges can be represented by either num = 31 and
  * scale or num = 0 and scale + 1. The loop below favours the latter
@@ -367,12 +367,12 @@
 				asid, tlb_level, tlbi_user)	\
 do {	\
 	int num = 0;	\
-	int scale = 0;	\
+	int scale = 3;	\
 	unsigned long addr;	\
 	\
 	while (pages > 0) {	\
 		if (!system_supports_tlb_range() ||	\
-		    pages % 2 == 1) {	\
+		    pages == 1) {	\
 			addr = __TLBI_VADDR(start, asid);	\
 			__tlbi_level(op, addr, tlb_level);	\
 			if (tlbi_user)	\
@@ -392,6 +392,6 @@
 			start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT;	\
 			pages -= __TLBI_RANGE_PAGES(num, scale);	\
 		}	\
-		scale++;	\
+		scale--;	\
 	}	\
 } while (0)