Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

swap: fix shmem swapping when more than 8 areas

Minchan Kim reports that when a system has many swap areas, and tmpfs
swaps out to the ninth or more, shmem_getpage_gfp()'s attempts to read
back the page cannot locate it, and the read fails with -ENOMEM.

Whoops. Yes, I blindly followed read_swap_header()'s pte_to_swp_entry(
swp_entry_to_pte()) technique for determining maximum usable swap
offset, without stopping to realize that that actually depends upon the
pte swap encoding shifting swap offset to the higher bits and truncating
it there. Whereas our radix_tree swap encoding leaves offset in the
lower bits: it's swap "type" (that is, index of swap area) that was
truncated.

Fix it by reducing the SWP_TYPE_SHIFT() in swapops.h, and removing the
broken radix_to_swp_entry(swp_to_radix_entry()) from read_swap_header().

This does not reduce the usable size of a swap area any further; it
leaves it as claimed when making the original commit: no change from 3.0
on x86_64, nor on i386 without PAE; but 3.0's 512GB is reduced to 128GB
per swapfile on i386 with PAE. It's not a change I would have risked
five years ago, but with x86_64 supported for ten years, I believe it's
appropriate now.

Hmm, and what if some architecture implements its swap pte with offset
encoded below type? That would equally break the maximum usable swap
offset check. Happily, they all follow the same tradition of encoding
offset above type, but I'll prepare a check on that for next.

Reported-and-Reviewed-and-Tested-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org [3.1, 3.2, 3.3, 3.4]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Authored by Hugh Dickins and committed by Linus Torvalds
9b15b817 a2c2df86

+9 -11
+5 -3
include/linux/swapops.h
@@ -9,13 +9,15 @@
  * get good packing density in that tree, so the index should be dense in
  * the low-order bits.
  *
- * We arrange the `type' and `offset' fields so that `type' is at the five
+ * We arrange the `type' and `offset' fields so that `type' is at the seven
  * high-order bits of the swp_entry_t and `offset' is right-aligned in the
- * remaining bits.
+ * remaining bits.  Although `type' itself needs only five bits, we allow for
+ * shmem/tmpfs to shift it all up a further two bits: see swp_to_radix_entry().
  *
  * swp_entry_t's are *never* stored anywhere in their arch-dependent format.
  */
-#define SWP_TYPE_SHIFT(e)      (sizeof(e.val) * 8 - MAX_SWAPFILES_SHIFT)
+#define SWP_TYPE_SHIFT(e)      ((sizeof(e.val) * 8) - \
+                       (MAX_SWAPFILES_SHIFT + RADIX_TREE_EXCEPTIONAL_SHIFT))
 #define SWP_OFFSET_MASK(e)     ((1UL << SWP_TYPE_SHIFT(e)) - 1)
 
 /*
+4 -8
mm/swapfile.c
@@ -1916,24 +1916,20 @@
 
        /*
         * Find out how many pages are allowed for a single swap
-        * device. There are three limiting factors: 1) the number
+        * device. There are two limiting factors: 1) the number
         * of bits for the swap offset in the swp_entry_t type, and
         * 2) the number of bits in the swap pte as defined by the
-        * the different architectures, and 3) the number of free bits
-        * in an exceptional radix_tree entry. In order to find the
+        * different architectures. In order to find the
         * largest possible bit mask, a swap entry with swap type 0
         * and swap offset ~0UL is created, encoded to a swap pte,
         * decoded to a swp_entry_t again, and finally the swap
         * offset is extracted. This will mask all the bits from
         * the initial ~0UL mask that can't be encoded in either
         * the swp_entry_t or the architecture definition of a
-        * swap pte. Then the same is done for a radix_tree entry.
+        * swap pte.
         */
        maxpages = swp_offset(pte_to_swp_entry(
-                       swp_entry_to_pte(swp_entry(0, ~0UL))));
-       maxpages = swp_offset(radix_to_swp_entry(
-                       swp_to_radix_entry(swp_entry(0, maxpages)))) + 1;
-
+                       swp_entry_to_pte(swp_entry(0, ~0UL)))) + 1;
        if (maxpages > swap_header->info.last_page) {
                maxpages = swap_header->info.last_page + 1;
                /* p->max is an unsigned int: don't overflow it */