Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm/memory_hotplug: introduce MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers

Patch series "implement "memmap on memory" feature on s390".

This series provides "memmap on memory" support on the s390 platform. "memmap
on memory" allows the struct pages array to be allocated from the hotplugged
memory range instead of allocating it from main system memory.

s390 currently preallocates the struct pages array for all potentially
possible memory, which ensures that memory onlining always succeeds, but at
the cost of significant memory consumption from the available system
memory at boot time. In certain extreme configurations, this could lead
to IPL failure.

"memmap on memory" ensures that the struct pages array is populated from the
self-contained hotplugged memory range instead of depleting the available
system memory, which could eliminate IPL failure on the s390 platform.

On other platforms, the system might go OOM when the physically hotplugged
memory depletes the available memory before it is onlined. Hence, the "memmap
on memory" feature was introduced, as described in commit a08a2ae34613
("mm,memory_hotplug: allocate memmap from the added memory range").

Unlike other architectures, s390 memory blocks are not physically
accessible until they are online. To make them physically accessible, two new
memory notifiers, MEM_PREPARE_ONLINE / MEM_FINISH_OFFLINE, are added. These
notifiers let the hypervisor be informed that the memory should be made
physically accessible. This allows for "memmap on memory" initialization
during the memory hotplug onlining phase, which is performed before calling
the MEM_GOING_ONLINE notifier.

Patch 1 introduces the MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE memory notifiers
to prepare the transition of memory to and from a physically accessible
state. A new mhp_flag, MHP_OFFLINE_INACCESSIBLE, is introduced to ensure
that the altmap is not written when adding memory, before it is set online.
This enhancement is crucial for implementing the "memmap on memory"
feature for s390 in a subsequent patch.

Patch 2 allocates vmemmap pages from the self-contained memory range for
s390. It allocates the memory map (struct pages array) from the hotplugged
memory range, rather than using system memory, by passing the altmap to the
vmemmap functions.

Patch 3 removes unhandled memory notifier types on s390.

Patch 4 implements the MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE memory notifiers
on s390. The MEM_PREPARE_ONLINE notifier makes the memory block physically
accessible via the sclp assign command. The notifier ensures that
self-contained memory maps are accessible, hence enabling "memmap on memory"
on s390. The MEM_FINISH_OFFLINE notifier shifts the memory block to an
inaccessible state via the sclp unassign command.

Patch 5 finally enables MHP_MEMMAP_ON_MEMORY on s390.


This patch (of 5):

Introduce MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE memory notifiers to
prepare the transition of memory to and from a physically accessible
state. This enhancement is crucial for implementing the "memmap on
memory" feature for s390 in a subsequent patch.

Platforms such as x86 can support physical memory hotplug via ACPI. On
physical memory hotplug, an ACPI event leads to memory addition with the
following call chain:

acpi_memory_device_add()
-> acpi_memory_enable_device()
-> __add_memory()

After this, the hotplugged memory is physically accessible and altmap
support is prepared, before the "memmap on memory" initialization in
memory_block_online() is called.

On s390, memory hotplug works in a different way. The available hotplug
memory has to be defined upfront in the hypervisor, but it is made
physically accessible only when the user sets it online via sysfs,
currently in the MEM_GOING_ONLINE notifier. This is too late, because
"memmap on memory" initialization is performed before the MEM_GOING_ONLINE
notifier is called.

During the memory hotplug addition phase, altmap support is prepared.
During the memory onlining phase, s390 requires the memory to be made
physically accessible first; only then can the "memmap on memory"
initialization process be started.

The memory provider will handle new MEM_PREPARE_ONLINE /
MEM_FINISH_OFFLINE notifications and make the memory accessible.
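
The provider-side handling described here can be sketched as a small userspace mock. This is not kernel code: the reduced struct memory_notify, the simplified callback signature, and the names provider_memory_notifier, block_accessible, accessible_start_pfn and accessible_nr_pages are assumptions made for illustration. Only the two action values are taken from this patch. That the provider covers the altmap range together with the usable range is likewise an assumption.

```c
#include <stdbool.h>

/* Action values as defined by this patch in include/linux/memory.h. */
#define MEM_PREPARE_ONLINE	(1 << 6)
#define MEM_FINISH_OFFLINE	(1 << 7)

/* Userspace stand-ins for the kernel's NOTIFY_DONE / NOTIFY_OK. */
#define NOTIFY_DONE	0x0000
#define NOTIFY_OK	0x0001

/* Reduced copy of struct memory_notify: range fields only. */
struct memory_notify {
	unsigned long altmap_start_pfn;
	unsigned long altmap_nr_pages;
	unsigned long start_pfn;
	unsigned long nr_pages;
};

/* Hypothetical provider state: the range currently made accessible. */
static bool block_accessible;
static unsigned long accessible_start_pfn;
static unsigned long accessible_nr_pages;

/* Hypothetical provider callback (simplified signature, no notifier_block). */
static int provider_memory_notifier(unsigned long action,
				    const struct memory_notify *arg)
{
	switch (action) {
	case MEM_PREPARE_ONLINE:
		/* Cover the altmap pages plus the usable pages behind them. */
		accessible_start_pfn = arg->altmap_start_pfn;
		accessible_nr_pages = arg->altmap_nr_pages + arg->nr_pages;
		block_accessible = true;
		return NOTIFY_OK;
	case MEM_FINISH_OFFLINE:
		/* Return the block to the inaccessible state. */
		accessible_nr_pages = 0;
		block_accessible = false;
		return NOTIFY_OK;
	default:
		/* Actions this provider does not know are ignored. */
		return NOTIFY_DONE;
	}
}
```

In this mock, an action the provider does not handle (for example MEM_GOING_ONLINE) returns NOTIFY_DONE and leaves the state untouched, which mirrors the statement that unknown notifiers are ignored.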

The mhp_flag MHP_OFFLINE_INACCESSIBLE is introduced and is relevant when
used along with MHP_MEMMAP_ON_MEMORY, because the altmap cannot be written
(e.g., poisoned) when adding memory -- before it is set online. This
allows for adding memory with an altmap that is not currently made
available by a hypervisor. When onlining that memory, the hypervisor can
be instructed to make that memory accessible via the new notifiers and the
onlining phase will not require any memory allocations, which is helpful
in low-memory situations.
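
The deferred poisoning implied by MHP_OFFLINE_INACCESSIBLE can be illustrated with a userspace mock. All names below (mock_altmap, mock_poison, mock_add_section, mock_init_memmap, POISON_BYTE) are hypothetical stand-ins, and a byte buffer takes the place of the struct pages array; the sketch only mirrors the two conditions changed by this patch, it is not the kernel implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in poison pattern; the value is an assumption for this mock. */
#define POISON_BYTE 0xff

/* Reduced stand-in for struct vmem_altmap: only the new flag. */
struct mock_altmap {
	bool inaccessible;
};

/* Counts how often the mock memmap was poisoned. */
static int poison_count;

static void mock_poison(unsigned char *memmap, size_t len)
{
	memset(memmap, POISON_BYTE, len);
	poison_count++;
}

/* Add phase: skip poisoning when the altmap memory cannot be written yet. */
static void mock_add_section(unsigned char *memmap, size_t len,
			     const struct mock_altmap *altmap)
{
	if (!altmap || !altmap->inaccessible)
		mock_poison(memmap, len);
}

/* Online phase: the memory is accessible now, so catch up on poisoning. */
static void mock_init_memmap(unsigned char *memmap, size_t len,
			     bool off_inaccessible)
{
	if (off_inaccessible)
		mock_poison(memmap, len);
}
```

Either way the memmap is poisoned exactly once: during the add phase when the memory is accessible, or during the online phase when it was not.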

All architectures ignore unknown memory notifiers. Therefore, the
introduction of these new notifiers does not result in any functional
modifications across architectures.

Link: https://lkml.kernel.org/r/20240108132747.3238763-1-sumanthk@linux.ibm.com
Link: https://lkml.kernel.org/r/20240108132747.3238763-2-sumanthk@linux.ibm.com
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Suggested-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Sumanth Korikkar and committed by Andrew Morton
c5f1e2d1 e755c43e

+65 -6
+22 -1
drivers/base/memory.c
···
 	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
 	unsigned long nr_vmemmap_pages = 0;
+	struct memory_notify arg;
 	struct zone *zone;
 	int ret;
 
···
 	if (mem->altmap)
 		nr_vmemmap_pages = mem->altmap->free;
 
+	arg.altmap_start_pfn = start_pfn;
+	arg.altmap_nr_pages = nr_vmemmap_pages;
+	arg.start_pfn = start_pfn + nr_vmemmap_pages;
+	arg.nr_pages = nr_pages - nr_vmemmap_pages;
 	mem_hotplug_begin();
+	ret = memory_notify(MEM_PREPARE_ONLINE, &arg);
+	ret = notifier_to_errno(ret);
+	if (ret)
+		goto out_notifier;
+
 	if (nr_vmemmap_pages) {
-		ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone);
+		ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages,
+						zone, mem->altmap->inaccessible);
 		if (ret)
 			goto out;
 	}
···
 					  nr_vmemmap_pages);
 
 	mem->zone = zone;
+	mem_hotplug_done();
+	return ret;
 out:
+	memory_notify(MEM_FINISH_OFFLINE, &arg);
+out_notifier:
 	mem_hotplug_done();
 	return ret;
 }
···
 	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
 	unsigned long nr_vmemmap_pages = 0;
+	struct memory_notify arg;
 	int ret;
 
 	if (!mem->zone)
···
 		mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages);
 
 	mem->zone = NULL;
+	arg.altmap_start_pfn = start_pfn;
+	arg.altmap_nr_pages = nr_vmemmap_pages;
+	arg.start_pfn = start_pfn + nr_vmemmap_pages;
+	arg.nr_pages = nr_pages - nr_vmemmap_pages;
+	memory_notify(MEM_FINISH_OFFLINE, &arg);
 out:
 	mem_hotplug_done();
 	return ret;
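
The error unwinding introduced in memory_block_online() can be traced with a small userspace mock. The step names, the online_trace type and mock_block_online() are hypothetical, and the part of the function not shown in the hunks above is reduced to a single STEP_ONLINE_PAGES step, which is an assumption. The sketch only shows which paths end in MEM_FINISH_OFFLINE.

```c
/* Steps of the mocked online path, in the order the hunks above suggest. */
enum step {
	STEP_PREPARE_ONLINE,	/* memory_notify(MEM_PREPARE_ONLINE, ...) */
	STEP_INIT_MEMMAP,	/* mhp_init_memmap_on_memory(...) */
	STEP_ONLINE_PAGES,	/* remaining onlining work (assumed) */
	STEP_FINISH_OFFLINE,	/* memory_notify(MEM_FINISH_OFFLINE, ...) */
	STEP_NONE		/* used as "do not fail anywhere" */
};

/* Records which steps ran, so the unwinding can be inspected. */
struct online_trace {
	enum step steps[8];
	int n;
};

static void trace_step(struct online_trace *t, enum step s)
{
	if (t->n < 8)
		t->steps[t->n++] = s;
}

/* Returns 0 on success, -1 if the step named by fail_at fails. */
static int mock_block_online(struct online_trace *t, enum step fail_at)
{
	trace_step(t, STEP_PREPARE_ONLINE);
	if (fail_at == STEP_PREPARE_ONLINE)
		return -1;	/* like "goto out_notifier": nothing to undo */

	trace_step(t, STEP_INIT_MEMMAP);
	if (fail_at == STEP_INIT_MEMMAP)
		goto undo;

	trace_step(t, STEP_ONLINE_PAGES);
	if (fail_at == STEP_ONLINE_PAGES)
		goto undo;

	return 0;		/* success returns before the unwind label */
undo:
	/* like "out:": make the block inaccessible again */
	trace_step(t, STEP_FINISH_OFFLINE);
	return -1;
}
```

A failed MEM_PREPARE_ONLINE skips the MEM_FINISH_OFFLINE call, while any later failure triggers it. The success path never reaches the unwind label, which corresponds to the early mem_hotplug_done()/return added before "out:".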
+9
include/linux/memory.h
···
 #define MEM_GOING_ONLINE	(1<<3)
 #define MEM_CANCEL_ONLINE	(1<<4)
 #define MEM_CANCEL_OFFLINE	(1<<5)
+#define MEM_PREPARE_ONLINE	(1<<6)
+#define MEM_FINISH_OFFLINE	(1<<7)
 
 struct memory_notify {
+	/*
+	 * The altmap_start_pfn and altmap_nr_pages fields are designated for
+	 * specifying the altmap range and are exclusively intended for use in
+	 * MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers.
+	 */
+	unsigned long altmap_start_pfn;
+	unsigned long altmap_nr_pages;
 	unsigned long start_pfn;
 	unsigned long nr_pages;
 	int status_change_nid_normal;
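
How a memory block is split into the altmap range and the usable range can be worked through in userspace. The reduced struct and the helper split_block_range() are hypothetical; the four assignments follow the ones this patch adds to drivers/base/memory.c, and the numbers in the usage note are arbitrary.

```c
/* Reduced copy of struct memory_notify: range fields only. */
struct memory_notify {
	unsigned long altmap_start_pfn;
	unsigned long altmap_nr_pages;
	unsigned long start_pfn;
	unsigned long nr_pages;
};

/*
 * Hypothetical helper: the altmap occupies the first nr_vmemmap_pages of
 * the block, and the usable pages follow directly behind it.
 */
static struct memory_notify split_block_range(unsigned long block_start_pfn,
					      unsigned long block_nr_pages,
					      unsigned long nr_vmemmap_pages)
{
	struct memory_notify arg;

	arg.altmap_start_pfn = block_start_pfn;
	arg.altmap_nr_pages = nr_vmemmap_pages;
	arg.start_pfn = block_start_pfn + nr_vmemmap_pages;
	arg.nr_pages = block_nr_pages - nr_vmemmap_pages;
	return arg;
}
```

For a block starting at pfn 0x10000 with 32768 pages, of which 512 hold the memmap, the altmap range is pfn 0x10000 with 512 pages and the usable range is pfn 0x10200 with 32256 pages. Without an altmap (nr_vmemmap_pages of 0) both start pfns coincide and altmap_nr_pages is 0.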
+17 -1
include/linux/memory_hotplug.h
···
  * implies the node id (nid).
  */
 #define MHP_NID_IS_MGID		((__force mhp_t)BIT(2))
+/*
+ * The hotplugged memory is completely inaccessible while the memory is
+ * offline. The memory provider will handle MEM_PREPARE_ONLINE /
+ * MEM_FINISH_OFFLINE notifications and make the memory accessible.
+ *
+ * This flag is only relevant when used along with MHP_MEMMAP_ON_MEMORY,
+ * because the altmap cannot be written (e.g., poisoned) when adding
+ * memory -- before it is set online.
+ *
+ * This allows for adding memory with an altmap that is not currently
+ * made available by a hypervisor. When onlining that memory, the
+ * hypervisor can be instructed to make that memory available, and
+ * the onlining phase will not require any memory allocations, which is
+ * helpful in low-memory situations.
+ */
+#define MHP_OFFLINE_INACCESSIBLE	((__force mhp_t)BIT(3))
 
 /*
  * Extended parameters for memory hotplug:
···
 				       long nr_pages);
 /* VM interface that may be used by firmware interface */
 extern int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages,
-				     struct zone *zone);
+				     struct zone *zone, bool mhp_off_inaccessible);
 extern void mhp_deinit_memmap_on_memory(unsigned long pfn, unsigned long nr_pages);
 extern int online_pages(unsigned long pfn, unsigned long nr_pages,
 			struct zone *zone, struct memory_group *group);
+1
include/linux/memremap.h
···
 	unsigned long free;
 	unsigned long align;
 	unsigned long alloc;
+	bool inaccessible;
 };
 
 /*
+14 -3
mm/memory_hotplug.c
···
 }
 
 int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages,
-			      struct zone *zone)
+			      struct zone *zone, bool mhp_off_inaccessible)
 {
 	unsigned long end_pfn = pfn + nr_pages;
 	int ret, i;
···
 	ret = kasan_add_zero_shadow(__va(PFN_PHYS(pfn)), PFN_PHYS(nr_pages));
 	if (ret)
 		return ret;
+
+	/*
+	 * Memory block is accessible at this stage and hence poison the struct
+	 * pages now. If the memory block is accessible during memory hotplug
+	 * addition phase, then page poisoning is already performed in
+	 * sparse_add_section().
+	 */
+	if (mhp_off_inaccessible)
+		page_init_poison(pfn_to_page(pfn), sizeof(struct page) * nr_pages);
 
 	move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_UNMOVABLE);
 
···
 }
 
 static int create_altmaps_and_memory_blocks(int nid, struct memory_group *group,
-					    u64 start, u64 size)
+					    u64 start, u64 size, mhp_t mhp_flags)
 {
 	unsigned long memblock_size = memory_block_size_bytes();
 	u64 cur_start;
···
 		};
 
 		mhp_altmap.free = memory_block_memmap_on_memory_pages();
+		if (mhp_flags & MHP_OFFLINE_INACCESSIBLE)
+			mhp_altmap.inaccessible = true;
 		params.altmap = kmemdup(&mhp_altmap, sizeof(struct vmem_altmap),
 					GFP_KERNEL);
 		if (!params.altmap) {
···
 	 */
 	if ((mhp_flags & MHP_MEMMAP_ON_MEMORY) &&
 	    mhp_supports_memmap_on_memory(memory_block_size_bytes())) {
-		ret = create_altmaps_and_memory_blocks(nid, group, start, size);
+		ret = create_altmaps_and_memory_blocks(nid, group, start, size, mhp_flags);
 		if (ret)
 			goto error;
 	} else {
+2 -1
mm/sparse.c
···
 	 * Poison uninitialized struct pages in order to catch invalid flags
 	 * combinations.
 	 */
-	page_init_poison(memmap, sizeof(struct page) * nr_pages);
+	if (!altmap || !altmap->inaccessible)
+		page_init_poison(memmap, sizeof(struct page) * nr_pages);
 
 	ms = __nr_to_section(section_nr);
 	set_section_nid(section_nr, nid);