Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm fixes from Dan Williams:

- three fixes tagged for -stable: a crash fix, a simple
performance tweak, and a fix for an invalid I/O error.

- build regression fix for the nvdimm unit tests

- nvdimm documentation update

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
dax: fix __dax_pmd_fault crash
libnvdimm: documentation clarifications
libnvdimm, pmem: fix size trim in pmem_direct_access()
libnvdimm, e820: fix numa node for e820-type-12 pmem ranges
tools/testing/nvdimm, acpica: fix flag rename build breakage

+51 -35
+28 -21
Documentation/nvdimm/nvdimm.txt
···
 mmap persistent memory, from a PMEM block device, directly into a
 process address space.
 
+DSM: Device Specific Method: ACPI method to to control specific
+device - in this case the firmware.
+
+DCR: NVDIMM Control Region Structure defined in ACPI 6 Section 5.2.25.5.
+It defines a vendor-id, device-id, and interface format for a given DIMM.
+
 BTT: Block Translation Table: Persistent memory is byte addressable.
 Existing software may have an expectation that the power-fail-atomicity
 of writes is at least one sector, 512 bytes. The BTT is an indirection
···
 registered, can be immediately attached to nd_pmem.
 
 2. BLK (nd_blk.ko): This driver performs I/O using a set of platform
-defined apertures. A set of apertures will all access just one DIMM.
-Multiple windows allow multiple concurrent accesses, much like
+defined apertures. A set of apertures will access just one DIMM.
+Multiple windows (apertures) allow multiple concurrent accesses, much like
 tagged-command-queuing, and would likely be used by different threads or
 different CPUs.
 
 The NFIT specification defines a standard format for a BLK-aperture, but
 the spec also allows for vendor specific layouts, and non-NFIT BLK
-implementations may other designs for BLK I/O. For this reason "nd_blk"
-calls back into platform-specific code to perform the I/O. One such
-implementation is defined in the "Driver Writer's Guide" and "DSM
+implementations may have other designs for BLK I/O. For this reason
+"nd_blk" calls back into platform-specific code to perform the I/O.
+One such implementation is defined in the "Driver Writer's Guide" and "DSM
 Interface Example".
 
 
···
 While PMEM provides direct byte-addressable CPU-load/store access to
 NVDIMM storage, it does not provide the best system RAS (recovery,
 availability, and serviceability) model. An access to a corrupted
-system-physical-address address causes a cpu exception while an access
+system-physical-address address causes a CPU exception while an access
 to a corrupted address through an BLK-aperture causes that block window
 to raise an error status in a register. The latter is more aligned with
 the standard error model that host-bus-adapter attached disks present.
···
 several DIMMs.
 
 PMEM vs BLK
-BLK-apertures solve this RAS problem, but their presence is also the
+BLK-apertures solve these RAS problems, but their presence is also the
 major contributing factor to the complexity of the ND subsystem. They
 complicate the implementation because PMEM and BLK alias in DPA space.
 Any given DIMM's DPA-range may contribute to one or more
···
 by a region device with a dynamically assigned id (REGION0 - REGION5).
 
 1. The first portion of DIMM0 and DIMM1 are interleaved as REGION0. A
-single PMEM namespace is created in the REGION0-SPA-range that spans
-DIMM0 and DIMM1 with a user-specified name of "pm0.0". Some of that
+single PMEM namespace is created in the REGION0-SPA-range that spans most
+of DIMM0 and DIMM1 with a user-specified name of "pm0.0". Some of that
 interleaved system-physical-address range is reclaimed as BLK-aperture
 accessed space starting at DPA-offset (a) into each DIMM. In that
 reclaimed space we create two BLK-aperture "namespaces" from REGION2 and
···
 
 2. In the last portion of DIMM0 and DIMM1 we have an interleaved
 system-physical-address range, REGION1, that spans those two DIMMs as
-well as DIMM2 and DIMM3. Some of REGION1 allocated to a PMEM namespace
-named "pm1.0" the rest is reclaimed in 4 BLK-aperture namespaces (for
+well as DIMM2 and DIMM3. Some of REGION1 is allocated to a PMEM namespace
+named "pm1.0", the rest is reclaimed in 4 BLK-aperture namespaces (for
 each DIMM in the interleave set), "blk2.1", "blk3.1", "blk4.0", and
 "blk5.0".
 
 3. The portion of DIMM2 and DIMM3 that do not participate in the REGION1
-interleaved system-physical-address range (i.e. the DPA address below
+interleaved system-physical-address range (i.e. the DPA address past
 offset (b) are also included in the "blk4.0" and "blk5.0" namespaces.
 Note, that this example shows that BLK-aperture namespaces don't need to
 be contiguous in DPA-space.
···
 
 What follows is a description of the LIBNVDIMM sysfs layout and a
 corresponding object hierarchy diagram as viewed through the LIBNDCTL
-api. The example sysfs paths and diagrams are relative to the Example
+API. The example sysfs paths and diagrams are relative to the Example
 NVDIMM Platform which is also the LIBNVDIMM bus used in the LIBNDCTL unit
 test.
 
 LIBNDCTL: Context
-Every api call in the LIBNDCTL library requires a context that holds the
+Every API call in the LIBNDCTL library requires a context that holds the
 logging parameters and other library instance state. The library is
 based on the libabc template:
-https://git.kernel.org/cgit/linux/kernel/git/kay/libabc.git/
+https://git.kernel.org/cgit/linux/kernel/git/kay/libabc.git
 
 LIBNDCTL: instantiate a new library context example
 
···
 LIBNVDIMM/LIBNDCTL: Region
 ----------------------
 
-A generic REGION device is registered for each PMEM range orBLK-aperture
+A generic REGION device is registered for each PMEM range or BLK-aperture
 set. Per the example there are 6 regions: 2 PMEM and 4 BLK-aperture
 sets on the "nfit_test.0" bus. The primary role of regions are to be a
 container of "mappings". A mapping is a tuple of <DIMM,
···
 types that we should simply name REGION devices with something derived
 from those type names. However, the ND subsystem explicitly keeps the
 REGION name generic and expects userspace to always consider the
-region-attributes for 4 reasons:
+region-attributes for four reasons:
 
 1. There are already more than two REGION and "namespace" types. For
 PMEM there are two subtypes. As mentioned previously we have PMEM where
···
 
 Why the Term "namespace"?
 
-1. Why not "volume" for instance? "volume" ran the risk of confusing ND
-as a volume manager like device-mapper.
+1. Why not "volume" for instance? "volume" ran the risk of confusing
+ND (libnvdimm subsystem) to a volume manager like device-mapper.
 
 2. The term originated to describe the sub-devices that can be created
 within a NVME controller (see the nvme specification:
···
 needs to be written in raw mode. By default, the kernel will autodetect
 the presence of a BTT and disable raw mode. This autodetect behavior
 can be suppressed by enabling raw mode for the namespace via the
-ndctl_namespace_set_raw_mode() api.
+ndctl_namespace_set_raw_mode() API.
 
 
 Summary LIBNDCTL Diagram
 ------------------------
 
-For the given example above, here is the view of the objects as seen by the LIBNDCTL api:
+For the given example above, here is the view of the objects as seen by the
+LIBNDCTL API:
             +---+
             |CTX|    +---------+   +--------------+  +---------------+
             +-+-+  +-> REGION0 +---> NAMESPACE0.0 +--> PMEM8 "pm0.0" |
Per the example there are 6 regions: 2 PMEM and 4 BLK-aperture 420 414 sets on the "nfit_test.0" bus. The primary role of regions are to be a 421 415 container of "mappings". A mapping is a tuple of <DIMM, ··· 515 509 types that we should simply name REGION devices with something derived 516 510 from those type names. However, the ND subsystem explicitly keeps the 517 511 REGION name generic and expects userspace to always consider the 518 - region-attributes for 4 reasons: 512 + region-attributes for four reasons: 519 513 520 514 1. There are already more than two REGION and "namespace" types. For 521 515 PMEM there are two subtypes. As mentioned previously we have PMEM where ··· 704 698 705 699 Why the Term "namespace"? 706 700 707 - 1. Why not "volume" for instance? "volume" ran the risk of confusing ND 708 - as a volume manager like device-mapper. 701 + 1. Why not "volume" for instance? "volume" ran the risk of confusing 702 + ND (libnvdimm subsystem) to a volume manager like device-mapper. 709 703 710 704 2. The term originated to describe the sub-devices that can be created 711 705 within a NVME controller (see the nvme specification: ··· 780 774 needs to be written in raw mode. By default, the kernel will autodetect 781 775 the presence of a BTT and disable raw mode. This autodetect behavior 782 776 can be suppressed by enabling raw mode for the namespace via the 783 - ndctl_namespace_set_raw_mode() api. 777 + ndctl_namespace_set_raw_mode() API. 784 778 785 779 786 780 Summary LIBNDCTL Diagram 787 781 ------------------------ 788 782 789 - For the given example above, here is the view of the objects as seen by the LIBNDCTL api: 783 + For the given example above, here is the view of the objects as seen by the 784 + LIBNDCTL API: 790 785 +---+ 791 786 |CTX| +---------+ +--------------+ +---------------+ 792 787 +-+-+ +-> REGION0 +---> NAMESPACE0.0 +--> PMEM8 "pm0.0" |
+14 -1
drivers/nvdimm/e820.c
···
  * Copyright (c) 2015, Intel Corporation.
  */
 #include <linux/platform_device.h>
+#include <linux/memory_hotplug.h>
 #include <linux/libnvdimm.h>
 #include <linux/module.h>
 
···
 	nvdimm_bus_unregister(nvdimm_bus);
 	return 0;
 }
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+static int e820_range_to_nid(resource_size_t addr)
+{
+	return memory_add_physaddr_to_nid(addr);
+}
+#else
+static int e820_range_to_nid(resource_size_t addr)
+{
+	return NUMA_NO_NODE;
+}
+#endif
 
 static int e820_pmem_probe(struct platform_device *pdev)
 {
···
 		memset(&ndr_desc, 0, sizeof(ndr_desc));
 		ndr_desc.res = p;
 		ndr_desc.attr_groups = e820_pmem_region_attribute_groups;
-		ndr_desc.numa_node = NUMA_NO_NODE;
+		ndr_desc.numa_node = e820_range_to_nid(p->start);
 		set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
 		if (!nvdimm_pmem_region_create(nvdimm_bus, &ndr_desc))
 			goto err;
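The e820.c hunk above selects one of two definitions of e820_range_to_nid() at compile time, so the caller never needs its own #ifdef. A minimal userspace sketch of that pattern follows. Everything prefixed with model_ or MODEL_ is hypothetical and not part of the kernel: the address table stands in for whatever memory_add_physaddr_to_nid() consults, and MODEL_HAVE_NODE_TABLE plays the role of CONFIG_MEMORY_HOTPLUG.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NO_NODE (-1)      /* same value as the kernel's NUMA_NO_NODE */
#define MODEL_HAVE_NODE_TABLE 1 /* hypothetical switch, stands in for CONFIG_MEMORY_HOTPLUG */

#if MODEL_HAVE_NODE_TABLE
/* Hypothetical address-to-node table, invented for this sketch. */
struct model_node_range {
	uint64_t start;
	uint64_t end;   /* inclusive */
	int nid;
};

static const struct model_node_range model_nodes[] = {
	{ 0x000000000ull, 0x0ffffffffull, 0 },
	{ 0x100000000ull, 0x1ffffffffull, 1 },
};

/* Look up the node that owns 'addr'; report "no node" when nothing matches. */
static int model_range_to_nid(uint64_t addr)
{
	size_t i;

	for (i = 0; i < sizeof(model_nodes) / sizeof(model_nodes[0]); i++) {
		const struct model_node_range *r = &model_nodes[i];

		assert(r->start <= r->end);
		if (addr >= r->start && addr <= r->end)
			return r->nid;
	}
	return MODEL_NO_NODE;
}
#else
/* Fallback with the same signature: no lookup facility, so no node is known. */
static int model_range_to_nid(uint64_t addr)
{
	(void)addr;
	return MODEL_NO_NODE;
}
#endif
```

Because both branches share one signature, a caller such as the probe routine can assign the result unconditionally, which is what the changed ndr_desc.numa_node line does.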
+1 -12
drivers/nvdimm/pmem.c
···
 {
 	struct pmem_device *pmem = bdev->bd_disk->private_data;
 	resource_size_t offset = sector * 512 + pmem->data_offset;
-	resource_size_t size;
 
-	if (pmem->data_offset) {
-		/*
-		 * Limit the direct_access() size to what is covered by
-		 * the memmap
-		 */
-		size = (pmem->size - offset) & ~ND_PFN_MASK;
-	} else
-		size = pmem->size - offset;
-
-	/* FIXME convert DAX to comprehend that this mapping has a lifetime */
 	*kaddr = pmem->virt_addr + offset;
 	*pfn = (pmem->phys_addr + offset) >> PAGE_SHIFT;
 
-	return size;
+	return pmem->size - offset;
 }
 
 static const struct block_device_operations pmem_fops = {
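After this hunk, the function reports every byte from the requested sector to the end of the range, with no masking. The arithmetic can be checked in userspace with a small model. This is only a sketch: struct pmem_model and model_direct_access() are hypothetical stand-ins for the fields and function shown in the hunk, and a 4 KiB page size is assumed.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_SECTOR_SIZE 512u /* matches the "sector * 512" in the hunk */
#define MODEL_PAGE_SHIFT  12   /* assumption: 4 KiB pages */

/* Hypothetical stand-in for the struct pmem_device fields used in the hunk. */
struct pmem_model {
	uint64_t phys_addr;   /* physical base address of the range */
	uint64_t data_offset; /* bytes reserved at the start of the range */
	uint64_t size;        /* total size of the range in bytes */
};

/*
 * Store the page frame number for 'sector' in *pfn and return the number
 * of bytes that remain between that sector and the end of the range.
 */
static uint64_t model_direct_access(const struct pmem_model *pmem,
				    uint64_t sector, uint64_t *pfn)
{
	uint64_t offset = sector * MODEL_SECTOR_SIZE + pmem->data_offset;

	assert(offset <= pmem->size); /* the sketch rejects out-of-range sectors */
	*pfn = (pmem->phys_addr + offset) >> MODEL_PAGE_SHIFT;
	return pmem->size - offset;
}
```

For a 1 GiB range with a 2 MiB data offset, sector 8 sits 4096 bytes past the data offset, so the returned size is 1 GiB minus 2 MiB minus 4096 bytes.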
+7
fs/dax.c
···
 	if ((length < PMD_SIZE) || (pfn & PG_PMD_COLOUR))
 		goto fallback;
 
+	/*
+	 * TODO: teach vmf_insert_pfn_pmd() to support
+	 * 'pte_special' for pmds
+	 */
+	if (pfn_valid(pfn))
+		goto fallback;
+
 	if (buffer_unwritten(&bh) || buffer_new(&bh)) {
 		int i;
 		for (i = 0; i < PTRS_PER_PMD; i++)
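The three fallback conditions visible in this hunk can be restated as one predicate. The sketch below is a userspace model, not kernel code: the MODEL_ constants assume 4 KiB pages and a 2 MiB PMD, the colour mask is taken to be the number of pages per PMD minus one, and the result of pfn_valid() is passed in as a plain flag because the real check depends on the kernel's memory map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12           /* assumption: 4 KiB pages */
#define MODEL_PMD_SIZE   (1ull << 21) /* assumption: 2 MiB PMD mappings */
#define MODEL_PMD_COLOUR ((MODEL_PMD_SIZE >> MODEL_PAGE_SHIFT) - 1)

/* The colour mask only works when the PMD size is a power of two. */
static_assert((MODEL_PMD_SIZE & (MODEL_PMD_SIZE - 1)) == 0,
	      "PMD size must be a power of two");

/*
 * Return true when a PMD-sized mapping may be attempted, false when the
 * fault handler in the hunk would jump to its 'fallback' label.
 */
static bool model_can_map_pmd(uint64_t length, uint64_t pfn, bool pfn_is_valid)
{
	/* Extent too short, or first page not aligned to a PMD boundary. */
	if (length < MODEL_PMD_SIZE || (pfn & MODEL_PMD_COLOUR))
		return false;

	/* New in this patch: a pfn that pfn_valid() accepts also falls back. */
	if (pfn_is_valid)
		return false;

	return true;
}
```

Only a sufficiently long, PMD-aligned extent whose pfn is rejected by pfn_valid() passes; every other combination takes the fallback path.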
+1 -1
tools/testing/nvdimm/test/nfit.c
···
 	memdev->interleave_ways = 1;
 	memdev->flags = ACPI_NFIT_MEM_SAVE_FAILED | ACPI_NFIT_MEM_RESTORE_FAILED
 		| ACPI_NFIT_MEM_FLUSH_FAILED | ACPI_NFIT_MEM_HEALTH_OBSERVED
-		| ACPI_NFIT_MEM_ARMED;
+		| ACPI_NFIT_MEM_NOT_ARMED;
 
 	offset += sizeof(*memdev);
 	/* dcr-descriptor0 */