Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

nvme: set discard_granularity from NPDG/NPDA

Currently, nvme_config_discard() always sets the discard_granularity
queue limit to the logical block size. However, NVMe namespaces can
advertise a larger preferred discard granularity in the NPDG or NPDA
field of the Identify Namespace structure or the NPDGL or NPDAL fields
of the I/O Command Set Specific Identify Namespace structure.

Use these fields to compute the discard_granularity limit. The logic is
somewhat involved. First, the fields are optional: NPDG is only reported
if the low bit of OPTPERF is set in NSFEAT, NPDA is reported if any bit
of OPTPERF is set, and NPDGL and NPDAL are reported if the high bit of
OPTPERF is set. NPDGL and NPDAL can also each be set to 0 to opt out of
reporting a limit, and the I/O Command Set Specific Identify Namespace
structure may not be supported at all by older NVMe controllers. Another
complication is that multiple values may be reported among NPDG, NPDGL,
NPDA, and NPDAL. The spec says to prefer the values reported in the L
variants. It also says NPDG should be a multiple of NPDA and NPDGL
should be a multiple of NPDAL, but it specifies no relationship between
NPDG and NPDAL or between NPDGL and NPDA. So use the maximum of the
reported NPDG(L) and NPDA(L) values as the discard_granularity.
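
To make the decision rule concrete, here is a small standalone C sketch of
the same selection, using made-up example values: the OPTPERF bits,
NPDGL = 8, NPDAL = 32, and the 4096-byte logical block size are
hypothetical, and from_0based() is a local helper for this sketch only. It
mirrors the logic described above rather than the kernel code itself; the
authoritative change is the diff below.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* 0's based fields store (value - 1); recover the actual value. */
static uint32_t from_0based(uint32_t v)
{
        return v + 1;
}

int main(void)
{
        /* Hypothetical namespace: high bit of OPTPERF set, NPDGL/NPDAL valid. */
        uint8_t optperf = 0x2;
        uint32_t npdgl = 8, npdal = 32;  /* NVM Command Set Specific structure */
        uint16_t npdg = 0, npda = 0;     /* 0's based, Identify Namespace */
        uint32_t logical_block_size = 4096;

        uint32_t granularity_blocks = 1, alignment_blocks = 1;

        /* Prefer the L variants when advertised and nonzero. */
        if ((optperf & 0x2) && npdgl)
                granularity_blocks = npdgl;
        else if (optperf & 0x1)
                granularity_blocks = from_0based(npdg);
        if ((optperf & 0x2) && npdal)
                alignment_blocks = npdal;
        else if (optperf)
                alignment_blocks = from_0based(npda);

        /*
         * The spec ties NPDG(L) to NPDA(L) but not across the pairs, so take
         * the maximum of the two. Fall back to the logical block size if the
         * conversion from logical blocks to bytes would overflow.
         */
        uint32_t blocks = granularity_blocks > alignment_blocks ?
                          granularity_blocks : alignment_blocks;
        uint32_t discard_granularity;
        if (__builtin_mul_overflow(blocks, logical_block_size,
                                   &discard_granularity))
                discard_granularity = logical_block_size;

        printf("discard_granularity = %" PRIu32 " bytes\n", discard_granularity);
        return 0;
}

With these made-up values the result is 131072 bytes (32 logical blocks).
If none of the four fields were advertised, both block counts stay at 1 and
the granularity degenerates to the logical block size, matching the
fallback in the patch.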

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>

Authored by Caleb Sander Mateos, committed by Keith Busch
1029298d b465046c

+32 -3
drivers/nvme/host/core.c
···
 }
 
 static bool nvme_update_disk_info(struct nvme_ns *ns, struct nvme_id_ns *id,
-                struct queue_limits *lim)
+                struct nvme_id_ns_nvm *nvm, struct queue_limits *lim)
 {
         struct nvme_ns_head *head = ns->head;
         struct nvme_ctrl *ctrl = ns->ctrl;
         u32 bs = 1U << head->lba_shift;
         u32 atomic_bs, phys_bs, io_opt = 0;
+        u32 npdg = 1, npda = 1;
         bool valid = true;
         u8 optperf;
···
         else
                 lim->max_hw_discard_sectors = 0;
 
-        lim->discard_granularity = lim->logical_block_size;
+        /*
+         * NVMe namespaces advertise both a preferred deallocate granularity
+         * (for a discard length) and alignment (for a discard starting offset).
+         * However, Linux block devices advertise a single discard_granularity.
+         * From NVM Command Set specification 1.1 section 5.2.2, the NPDGL/NPDAL
+         * fields in the NVM Command Set Specific Identify Namespace structure
+         * are preferred to NPDG/NPDA in the Identify Namespace structure since
+         * they can represent larger values. However, NPDGL or NPDAL may be 0 if
+         * unsupported. NPDG and NPDA are 0's based.
+         * From Figure 115 of NVM Command Set specification 1.1, NPDGL and NPDAL
+         * are supported if the high bit of OPTPERF is set. NPDG is supported if
+         * the low bit of OPTPERF is set. NPDA is supported if either is set.
+         * NPDG should be a multiple of NPDA, and likewise NPDGL should be a
+         * multiple of NPDAL, but the spec doesn't say anything about NPDG vs.
+         * NPDAL or NPDGL vs. NPDA. So compute the maximum instead of assuming
+         * NPDG(L) is the larger. If neither NPDG, NPDGL, NPDA, nor NPDAL are
+         * supported, default the discard_granularity to the logical block size.
+         */
+        if (optperf & 0x2 && nvm && nvm->npdgl)
+                npdg = le32_to_cpu(nvm->npdgl);
+        else if (optperf & 0x1)
+                npdg = from0based(id->npdg);
+        if (optperf & 0x2 && nvm && nvm->npdal)
+                npda = le32_to_cpu(nvm->npdal);
+        else if (optperf)
+                npda = from0based(id->npda);
+        if (check_mul_overflow(max(npdg, npda), lim->logical_block_size,
+                               &lim->discard_granularity))
+                lim->discard_granularity = lim->logical_block_size;
 
         if (ctrl->dmrl)
                 lim->max_discard_segments = ctrl->dmrl;
···
         nvme_set_ctrl_limits(ns->ctrl, &lim, false);
         nvme_configure_metadata(ns->ctrl, ns->head, id, nvm, info);
         nvme_set_chunk_sectors(ns, id, &lim);
-        if (!nvme_update_disk_info(ns, id, &lim))
+        if (!nvme_update_disk_info(ns, id, nvm, &lim))
                 capacity = 0;
 
         if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) &&