Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'nvme-6.14-2025-01-12' of git://git.infradead.org/nvme into for-6.14/block

Pull NVMe updates from Keith:

"nvme updates for Linux 6.14

- Target support for PCI-Endpoint transport (Damien)
- TCP IO queue spreading fixes (Sagi, Chaitanya)
- Target handling for "limited retry" flags (Guixen)
- Poll type fix (Yongsoo)
- Xarray storage error handling (Keisuke)
- Host memory buffer free size fix on error (Francis)"

* tag 'nvme-6.14-2025-01-12' of git://git.infradead.org/nvme: (25 commits)
nvme-pci: use correct size to free the hmb buffer
nvme: Add error path for xa_store in nvme_init_effects
nvme-pci: fix comment typo
Documentation: Document the NVMe PCI endpoint target driver
nvmet: New NVMe PCI endpoint function target driver
nvmet: Implement arbitration feature support
nvmet: Implement interrupt config feature support
nvmet: Implement interrupt coalescing feature support
nvmet: Implement host identifier set feature support
nvmet: Introduce get/set_feature controller operations
nvmet: Do not require SGL for PCI target controller commands
nvmet: Add support for I/O queue management admin commands
nvmet: Introduce nvmet_sq_create() and nvmet_cq_create()
nvmet: Introduce nvmet_req_transfer_len()
nvmet: Improve nvmet_alloc_ctrl() interface and implementation
nvme: Add PCI transport type
nvmet: Add drvdata field to struct nvmet_ctrl
nvmet: Introduce nvmet_get_cmd_effects_admin()
nvmet: Export nvmet_update_cc() and nvmet_cc_xxx() helpers
nvmet: Add vendor_id and subsys_vendor_id subsystem attributes
...

+3962 -188
+1
Documentation/PCI/endpoint/index.rst
···
    pci-ntb-howto
    pci-vntb-function
    pci-vntb-howto
+   pci-nvme-function

    function/binding/pci-test
    function/binding/pci-ntb
+13
Documentation/PCI/endpoint/pci-nvme-function.rst
···
+.. SPDX-License-Identifier: GPL-2.0
+
+=================
+PCI NVMe Function
+=================
+
+:Author: Damien Le Moal <dlemoal@kernel.org>
+
+The PCI NVMe endpoint function implements a PCI NVMe controller using the NVMe
+subsystem target core code. The driver for this function resides with the NVMe
+subsystem as drivers/nvme/target/pci-epf.c.
+
+See Documentation/nvme/nvme-pci-endpoint-target.rst for more details.
+12
Documentation/nvme/index.rst
···
+.. SPDX-License-Identifier: GPL-2.0
+
+==============
+NVMe Subsystem
+==============
+
+.. toctree::
+   :maxdepth: 2
+   :numbered:
+
+   feature-and-quirk-policy
+   nvme-pci-endpoint-target
+368
Documentation/nvme/nvme-pci-endpoint-target.rst
···
+.. SPDX-License-Identifier: GPL-2.0
+
+=================================
+NVMe PCI Endpoint Function Target
+=================================
+
+:Author: Damien Le Moal <dlemoal@kernel.org>
+
+The NVMe PCI endpoint function target driver implements an NVMe PCIe controller
+using an NVMe fabrics target controller configured with the PCI transport type.
+
+Overview
+========
+
+The NVMe PCI endpoint function target driver allows exposing an NVMe target
+controller over a PCIe link, thus implementing an NVMe PCIe device similar to a
+regular M.2 SSD. The target controller is created in the same manner as when
+using NVMe over fabrics: the controller represents the interface to an NVMe
+subsystem using a port. The port transport type must be configured to be
+"pci". The subsystem can be configured to have namespaces backed by regular
+files or block devices, or can use NVMe passthrough to expose to the PCI host an
+existing physical NVMe device or an NVMe fabrics host controller (e.g. an NVMe
+TCP host controller).
+
+The NVMe PCI endpoint function target driver relies as much as possible on the
+NVMe target core code to parse and execute NVMe commands submitted by the PCIe
+host. However, using the PCI endpoint framework API and DMA API, the driver is
+also responsible for managing all data transfers over the PCIe link. This
+implies that the NVMe PCI endpoint function target driver implements the
+management of several NVMe data structures and some NVMe command parsing.
+
+1) The driver manages retrieval of NVMe commands in submission queues using DMA
+   if supported, or MMIO otherwise. Each command retrieved is then executed
+   using a work item to maximize performance with the parallel execution of
+   multiple commands on different CPUs. The driver uses a work item to
+   constantly poll the doorbell of all submission queues to detect command
+   submissions from the PCIe host.
+
+2) The driver transfers completion queue entries of completed commands to the
+   PCIe host using MMIO copy of the entries in the host completion queue.
+   After posting completion entries in a completion queue, the driver uses the
+   PCI endpoint framework API to raise an interrupt to the host to signal the
+   command completion.
+
+3) For any command that has a data buffer, the NVMe PCI endpoint target driver
+   parses the command PRP or SGL lists to create a list of PCI address
+   segments representing the mapping of the command data buffer on the host.
+   The command data buffer is transferred over the PCIe link using this list of
+   PCI address segments using DMA, if supported. If DMA is not supported, MMIO
+   is used, which results in poor performance. For write commands, the command
+   data buffer is transferred from the host into a local memory buffer before
+   executing the command using the target core code. For read commands, a local
+   memory buffer is allocated to execute the command and the content of that
+   buffer is transferred to the host once the command completes.
+
+Controller Capabilities
+-----------------------
+
+The NVMe capabilities exposed to the PCIe host through the BAR 0 registers
+are almost identical to the capabilities of the NVMe target controller
+implemented by the target core code. There are some exceptions.
+
+1) The NVMe PCI endpoint target driver always sets the controller capability
+   CQR bit to request "Contiguous Queues Required". This is to facilitate the
+   mapping of a queue PCI address range to the local CPU address space.
+
+2) The doorbell stride (DSTRD) is always set to be 4 bytes.
+
+3) Since the PCI endpoint framework does not provide a way to handle PCI level
+   resets, the controller capability NSSR bit (NVM Subsystem Reset Supported)
+   is always cleared.
+
+4) The boot partition support (BPS), Persistent Memory Region Supported (PMRS)
+   and Controller Memory Buffer Supported (CMBS) capabilities are never
+   reported.
+
+Supported Features
+------------------
+
+The NVMe PCI endpoint target driver implements support for both PRPs and SGLs.
+The driver also implements IRQ vector coalescing and submission queue
+arbitration burst.
+
+The maximum number of queues and the maximum data transfer size (MDTS) are
+configurable through configfs before starting the controller. To avoid issues
+with excessive local memory usage for executing commands, MDTS defaults to 512
+KB and is limited to a maximum of 2 MB (arbitrary limit).
+
+Minimum Number of PCI Address Mapping Windows Required
+------------------------------------------------------
+
+Most PCI endpoint controllers provide a limited number of mapping windows for
+mapping a PCI address range to local CPU memory addresses. The NVMe PCI
+endpoint target controller uses mapping windows for the following.
+
+1) One memory window for raising MSI or MSI-X interrupts
+2) One memory window for MMIO transfers
+3) One memory window for each completion queue
+
+Given the highly asynchronous nature of the NVMe PCI endpoint target driver
+operation, the memory windows as described above will generally not be used
+simultaneously, but that may happen. So a safe maximum number of completion
+queues that can be supported is equal to the total number of memory mapping
+windows of the PCI endpoint controller minus two. E.g. for an endpoint PCI
+controller with 32 outbound memory windows available, up to 30 completion
+queues can be safely operated without any risk of getting PCI address mapping
+errors due to the lack of memory windows.
+
+Maximum Number of Queue Pairs
+-----------------------------
+
+Upon binding of the NVMe PCI endpoint target driver to the PCI endpoint
+controller, BAR 0 is allocated with enough space to accommodate the admin queue
+and multiple I/O queues. The maximum number of I/O queue pairs that can be
+supported is limited by several factors.
+
+1) The NVMe target core code limits the maximum number of I/O queues to the
+   number of online CPUs.
+2) The total number of queue pairs, including the admin queue, cannot exceed
+   the number of MSI-X or MSI vectors available.
+3) The total number of completion queues must not exceed the total number of
+   PCI mapping windows minus 2 (see above).
+
+The NVMe endpoint function driver allows configuring the maximum number of
+queue pairs through configfs.
+
+Limitations and NVMe Specification Non-Compliance
+-------------------------------------------------
+
+Similar to the NVMe target core code, the NVMe PCI endpoint target driver does
+not support multiple submission queues using the same completion queue. All
+submission queues must specify a unique completion queue.
+
+
+User Guide
+==========
+
+This section describes the hardware requirements and how to set up an NVMe PCI
+endpoint target device.
+
+Kernel Requirements
+-------------------
+
+The kernel must be compiled with the configuration options CONFIG_PCI_ENDPOINT,
+CONFIG_PCI_ENDPOINT_CONFIGFS, and CONFIG_NVME_TARGET_PCI_EPF enabled.
+CONFIG_PCI, CONFIG_BLK_DEV_NVME and CONFIG_NVME_TARGET must also be enabled
+(obviously).
+
+In addition to this, at least one PCI endpoint controller driver should be
+available for the endpoint hardware used.
+
+To facilitate testing, enabling the null-blk driver (CONFIG_BLK_DEV_NULL_BLK)
+is also recommended. With this, a simple setup using a null_blk block device
+as a subsystem namespace can be used.
+
+Hardware Requirements
+---------------------
+
+To use the NVMe PCI endpoint target driver, at least one endpoint controller
+device is required.
+
+To find the list of endpoint controller devices in the system::
+
+	# ls /sys/class/pci_epc/
+	a40000000.pcie-ep
+
+If CONFIG_PCI_ENDPOINT_CONFIGFS is enabled::
+
+	# ls /sys/kernel/config/pci_ep/controllers
+	a40000000.pcie-ep
+
+The endpoint board must of course also be connected to a host with a PCI cable
+with RX-TX signals swapped. If the host PCI slot used does not have
+plug-and-play capabilities, the host should be powered off when the NVMe PCI
+endpoint device is configured.
+
+NVMe Endpoint Device
+--------------------
+
+Creating an NVMe endpoint device is a two step process. First, an NVMe target
+subsystem and port must be defined. Second, the NVMe PCI endpoint device must
+be set up and bound to the subsystem and port created.
+
+Creating an NVMe Subsystem and Port
+-----------------------------------
+
+Details about how to configure an NVMe target subsystem and port are outside
+the scope of this document. The following only provides a simple example of a
+port and subsystem with a single namespace backed by a null_blk device.
+
+First, make sure that configfs is enabled::
+
+	# mount -t configfs none /sys/kernel/config
+
+Next, create a null_blk device (default settings give a 250 GB device without
+memory backing). The block device created will be /dev/nullb0 by default::
+
+	# modprobe null_blk
+	# ls /dev/nullb0
+	/dev/nullb0
+
+The NVMe PCI endpoint function target driver must be loaded::
+
+	# modprobe nvmet_pci_epf
+	# lsmod | grep nvmet
+	nvmet_pci_epf          32768  0
+	nvmet                 118784  1 nvmet_pci_epf
+	nvme_core             131072  2 nvmet_pci_epf,nvmet
+
+Now, create a subsystem and a port that we will use to create a PCI target
+controller when setting up the NVMe PCI endpoint target device. In this
+example, the port is created with a maximum of 4 I/O queue pairs::
+
+	# cd /sys/kernel/config/nvmet/subsystems
+	# mkdir nvmepf.0.nqn
+	# echo -n "Linux-pci-epf" > nvmepf.0.nqn/attr_model
+	# echo "0x1b96" > nvmepf.0.nqn/attr_vendor_id
+	# echo "0x1b96" > nvmepf.0.nqn/attr_subsys_vendor_id
+	# echo 1 > nvmepf.0.nqn/attr_allow_any_host
+	# echo 4 > nvmepf.0.nqn/attr_qid_max
+
+Next, create and enable the subsystem namespace using the null_blk block
+device::
+
+	# mkdir nvmepf.0.nqn/namespaces/1
+	# echo -n "/dev/nullb0" > nvmepf.0.nqn/namespaces/1/device_path
+	# echo 1 > "nvmepf.0.nqn/namespaces/1/enable"
+
+Finally, create the target port and link it to the subsystem::
+
+	# cd /sys/kernel/config/nvmet/ports
+	# mkdir 1
+	# echo -n "pci" > 1/addr_trtype
+	# ln -s /sys/kernel/config/nvmet/subsystems/nvmepf.0.nqn \
+		/sys/kernel/config/nvmet/ports/1/subsystems/nvmepf.0.nqn
+
+Creating an NVMe PCI Endpoint Device
+------------------------------------
+
+With the NVMe target subsystem and port ready for use, the NVMe PCI endpoint
+device can now be created and enabled. The NVMe PCI endpoint target driver
+should already be loaded (that is done automatically when the port is created)::
+
+	# ls /sys/kernel/config/pci_ep/functions
+	nvmet_pci_epf
+
+Next, create function 0::
+
+	# cd /sys/kernel/config/pci_ep/functions/nvmet_pci_epf
+	# mkdir nvmepf.0
+	# ls nvmepf.0/
+	baseclass_code    msix_interrupts   secondary
+	cache_line_size   nvme              subclass_code
+	deviceid          primary           subsys_id
+	interrupt_pin     progif_code       subsys_vendor_id
+	msi_interrupts    revid             vendorid
+
+Configure the function using any device ID (the vendor ID for the device will
+be automatically set to the same value as the NVMe target subsystem vendor
+ID)::
+
+	# cd /sys/kernel/config/pci_ep/functions/nvmet_pci_epf
+	# echo 0xBEEF > nvmepf.0/deviceid
+	# echo 32 > nvmepf.0/msix_interrupts
+
+If the PCI endpoint controller used does not support MSI-X, MSI can be
+configured instead::
+
+	# echo 32 > nvmepf.0/msi_interrupts
+
+Next, let's bind our endpoint device with the target subsystem and port that we
+created::
+
+	# echo 1 > nvmepf.0/nvme/portid
+	# echo "nvmepf.0.nqn" > nvmepf.0/nvme/subsysnqn
+
+The endpoint function can then be bound to the endpoint controller and the
+controller started::
+
+	# cd /sys/kernel/config/pci_ep
+	# ln -s functions/nvmet_pci_epf/nvmepf.0 controllers/a40000000.pcie-ep/
+	# echo 1 > controllers/a40000000.pcie-ep/start
+
+On the endpoint machine, kernel messages will show information as the NVMe
+target device and endpoint device are created and connected.
+
+.. code-block:: text
+
+	null_blk: disk nullb0 created
+	null_blk: module loaded
+	nvmet: adding nsid 1 to subsystem nvmepf.0.nqn
+	nvmet_pci_epf nvmet_pci_epf.0: PCI endpoint controller supports MSI-X, 32 vectors
+	nvmet: Created nvm controller 1 for subsystem nvmepf.0.nqn for NQN nqn.2014-08.org.nvmexpress:uuid:2ab90791-2246-4fbb-961d-4c3d5a5a0176.
+	nvmet_pci_epf nvmet_pci_epf.0: New PCI ctrl "nvmepf.0.nqn", 4 I/O queues, mdts 524288 B
+
+PCI Root-Complex Host
+---------------------
+
+Booting the PCI host will result in the initialization of the PCIe link (this
+may be signaled by the PCI endpoint driver with a kernel message). A kernel
+message on the endpoint will also signal when the host NVMe driver enables the
+device controller::
+
+	nvmet_pci_epf nvmet_pci_epf.0: Enabling controller
+
+On the host side, the NVMe PCI endpoint function target device is
+discoverable as a PCI device, with the vendor ID and device ID as configured::
+
+	# lspci -n
+	0000:01:00.0 0108: 1b96:beef
+
+And this device will be recognized as an NVMe device with a single namespace::
+
+	# lsblk
+	NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
+	nvme0n1     259:0    0   250G  0 disk
+
+The NVMe endpoint block device can then be used as any other regular NVMe
+namespace block device. The *nvme* command line utility can be used to get more
+detailed information about the endpoint device::
+
+	# nvme id-ctrl /dev/nvme0
+	NVME Identify Controller:
+	vid       : 0x1b96
+	ssvid     : 0x1b96
+	sn        : 94993c85650ef7bcd625
+	mn        : Linux-pci-epf
+	fr        : 6.13.0-r
+	rab       : 6
+	ieee      : 000000
+	cmic      : 0xb
+	mdts      : 7
+	cntlid    : 0x1
+	ver       : 0x20100
+	...
+
+
+Endpoint Bindings
+=================
+
+The NVMe PCI endpoint target driver uses the PCI endpoint configfs device
+attributes as follows.
+
+================ ===========================================================
+vendorid         Ignored (the vendor id of the NVMe target subsystem is used)
+deviceid         Anything is OK (e.g. PCI_ANY_ID)
+revid            Do not care
+progif_code      Must be 0x02 (NVM Express)
+baseclass_code   Must be 0x01 (PCI_BASE_CLASS_STORAGE)
+subclass_code    Must be 0x08 (Non-Volatile Memory controller)
+cache_line_size  Do not care
+subsys_vendor_id Ignored (the subsystem vendor id of the NVMe target subsystem
+                 is used)
+subsys_id        Anything is OK (e.g. PCI_ANY_ID)
+msi_interrupts   At least equal to the number of queue pairs desired
+msix_interrupts  At least equal to the number of queue pairs desired
+interrupt_pin    Interrupt PIN to use if MSI and MSI-X are not supported
+================ ===========================================================
+
+The NVMe PCI endpoint target function also has some specific configurable
+fields defined in the *nvme* subdirectory of the function directory. These
+fields are as follows.
+
+================ ===========================================================
+mdts_kb          Maximum data transfer size in KiB (default: 512)
+portid           The ID of the target port to use
+subsysnqn        The NQN of the target subsystem to use
+================ ===========================================================
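The sizing rules in the documentation above reduce to simple arithmetic. The following userspace C sketch is illustrative only and is not part of this merge: `max_io_queue_pairs()` and `mdts_bytes()` are hypothetical helper names. It assumes that the admin queue pair consumes one interrupt vector and that the admin completion queue consumes one of the mapping windows left after the two reserved for interrupts and MMIO. Decoding `mdts : 7` as 512 KiB additionally assumes a 4 KiB minimum memory page size.

```c
#include <assert.h>
#include <stdint.h>

/* Smaller of two unsigned values. */
static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical helper: upper bound on the number of I/O queue pairs given
 * the three limits listed in the documentation.
 *  - I/O queues are limited to the number of online CPUs.
 *  - Queue pairs, admin queue included, are limited to the IRQ vectors.
 *  - Completion queues are limited to the mapping windows minus two;
 *    assumption: the admin completion queue counts against that total.
 */
static unsigned int max_io_queue_pairs(unsigned int online_cpus,
				       unsigned int irq_vectors,
				       unsigned int mapping_windows)
{
	unsigned int by_vectors = irq_vectors > 1 ? irq_vectors - 1 : 0;
	unsigned int by_windows = mapping_windows > 3 ? mapping_windows - 3 : 0;

	return min_u(online_cpus, min_u(by_vectors, by_windows));
}

/*
 * Decode an Identify Controller MDTS value into bytes: the limit is
 * 2^mdts units of the minimum memory page size. A value of 0 means
 * "no limit" and is not handled here. page_size is supplied by the
 * caller (4096 is an assumption in the example below).
 */
static uint64_t mdts_bytes(unsigned int mdts, uint64_t page_size)
{
	assert(mdts > 0 && mdts < 32);
	return page_size << mdts;
}
```

With 8 online CPUs, 32 MSI-X vectors and 32 mapping windows, `max_io_queue_pairs(8, 32, 32)` is bounded by the CPU count. `mdts_bytes(7, 4096)` gives 524288, which matches the `mdts 524288 B` kernel message shown in the documentation.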
+1
Documentation/subsystem-apis.rst
···
    cdrom/index
    scsi/index
    target/index
+   nvme/index

 Other subsystems
 ----------------
+28 -6
drivers/nvme/host/core.c
···
 static int nvme_get_effects_log(struct nvme_ctrl *ctrl, u8 csi,
 				struct nvme_effects_log **log)
 {
-	struct nvme_effects_log *cel = xa_load(&ctrl->cels, csi);
+	struct nvme_effects_log *old, *cel = xa_load(&ctrl->cels, csi);
 	int ret;

 	if (cel)
···
 		return ret;
 	}

-	xa_store(&ctrl->cels, csi, cel, GFP_KERNEL);
+	old = xa_store(&ctrl->cels, csi, cel, GFP_KERNEL);
+	if (xa_is_err(old)) {
+		kfree(cel);
+		return xa_err(old);
+	}
 out:
 	*log = cel;
 	return 0;
···
 	return ret;
 }

+static int nvme_init_effects_log(struct nvme_ctrl *ctrl,
+		u8 csi, struct nvme_effects_log **log)
+{
+	struct nvme_effects_log *effects, *old;
+
+	effects = kzalloc(sizeof(*effects), GFP_KERNEL);
+	if (!effects)
+		return -ENOMEM;
+
+	old = xa_store(&ctrl->cels, csi, effects, GFP_KERNEL);
+	if (xa_is_err(old)) {
+		kfree(effects);
+		return xa_err(old);
+	}
+
+	*log = effects;
+	return 0;
+}
+
 static void nvme_init_known_nvm_effects(struct nvme_ctrl *ctrl)
 {
 	struct nvme_effects_log *log = ctrl->effects;
···
 	}

 	if (!ctrl->effects) {
-		ctrl->effects = kzalloc(sizeof(*ctrl->effects), GFP_KERNEL);
-		if (!ctrl->effects)
-			return -ENOMEM;
-		xa_store(&ctrl->cels, NVME_CSI_NVM, ctrl->effects, GFP_KERNEL);
+		ret = nvme_init_effects_log(ctrl, NVME_CSI_NVM, &ctrl->effects);
+		if (ret < 0)
+			return ret;
 	}

 	nvme_init_known_nvm_effects(ctrl);
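The error path added in this file follows an "allocate, publish, undo on failure" pattern. The sketch below is a minimal userspace analogue, not kernel code: `struct table`, `table_store()` and `init_log()` are hypothetical names, and the toy store returns a plain negative errno instead of the encoded pointer that `xa_store()` returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define NSLOTS 4

/* Toy fixed-size table standing in for the controller's log container. */
struct table {
	void *slot[NSLOTS];
};

/* Hypothetical store: returns 0, or -EINVAL if the index is out of range. */
static int table_store(struct table *t, unsigned int idx, void *entry)
{
	if (idx >= NSLOTS)
		return -EINVAL;
	t->slot[idx] = entry;
	return 0;
}

/*
 * Allocate a zeroed log, publish it in the table, and hand it back
 * through *log. On any failure nothing is leaked and *log is untouched.
 */
static int init_log(struct table *t, unsigned int idx, size_t size, void **log)
{
	void *entry = calloc(1, size);
	int ret;

	if (!entry)		/* allocation failed */
		return -ENOMEM;

	ret = table_store(t, idx, entry);
	if (ret < 0) {		/* store failed: undo the allocation */
		free(entry);
		return ret;
	}

	*log = entry;
	return 0;
}
```

The two checks are deliberately asymmetric: a NULL allocation result is the failure case, while the store reports failure through its return value, and only the second case has anything to free.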
-39
drivers/nvme/host/nvme.h
···
 	return (ctrl->ctrl_config & NVME_CC_CSS_MASK) == NVME_CC_CSS_CSI;
 }

-#ifdef CONFIG_NVME_VERBOSE_ERRORS
-const char *nvme_get_error_status_str(u16 status);
-const char *nvme_get_opcode_str(u8 opcode);
-const char *nvme_get_admin_opcode_str(u8 opcode);
-const char *nvme_get_fabrics_opcode_str(u8 opcode);
-#else /* CONFIG_NVME_VERBOSE_ERRORS */
-static inline const char *nvme_get_error_status_str(u16 status)
-{
-	return "I/O Error";
-}
-static inline const char *nvme_get_opcode_str(u8 opcode)
-{
-	return "I/O Cmd";
-}
-static inline const char *nvme_get_admin_opcode_str(u8 opcode)
-{
-	return "Admin Cmd";
-}
-
-static inline const char *nvme_get_fabrics_opcode_str(u8 opcode)
-{
-	return "Fabrics Cmd";
-}
-#endif /* CONFIG_NVME_VERBOSE_ERRORS */
-
-static inline const char *nvme_opcode_str(int qid, u8 opcode)
-{
-	return qid ? nvme_get_opcode_str(opcode) :
-		nvme_get_admin_opcode_str(opcode);
-}
-
-static inline const char *nvme_fabrics_opcode_str(
-		int qid, const struct nvme_command *cmd)
-{
-	if (nvme_is_fabrics(cmd))
-		return nvme_get_fabrics_opcode_str(cmd->fabrics.fctype);
-
-	return nvme_opcode_str(qid, cmd->common.opcode);
-}
 #endif /* _NVME_H */
+7 -7
drivers/nvme/host/pci.c
···
 	/*
 	 * Ensure that the doorbell is updated before reading the event
 	 * index from memory. The controller needs to provide similar
-	 * ordering to ensure the envent index is updated before reading
+	 * ordering to ensure the event index is updated before reading
 	 * the doorbell.
 	 */
 	mb();
···
 	}
 }

-static inline int nvme_poll_cq(struct nvme_queue *nvmeq,
-			       struct io_comp_batch *iob)
+static inline bool nvme_poll_cq(struct nvme_queue *nvmeq,
+				struct io_comp_batch *iob)
 {
-	int found = 0;
+	bool found = false;

 	while (nvme_cqe_pending(nvmeq)) {
-		found++;
+		found = true;
 		/*
 		 * load-load control dependency between phase and the rest of
 		 * the cqe requires a full read memory barrier
···
 			sizeof(*dev->host_mem_descs), &dev->host_mem_descs_dma,
 			GFP_KERNEL);
 	if (!dev->host_mem_descs) {
-		dma_free_noncontiguous(dev->dev, dev->host_mem_size,
-				dev->hmb_sgt, DMA_BIDIRECTIONAL);
+		dma_free_noncontiguous(dev->dev, size, dev->hmb_sgt,
+				DMA_BIDIRECTIONAL);
 		dev->hmb_sgt = NULL;
 		return -ENOMEM;
 	}
+57 -13
drivers/nvme/host/tcp.c
···
 		 "nvme TLS handshake timeout in seconds (default 10)");
 #endif

+static atomic_t nvme_tcp_cpu_queues[NR_CPUS];
+
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 /* lockdep can detect a circular dependency of the form
  *   sk_lock -> mmap_lock (page fault) -> fs locks -> sk_lock
···
 	NVME_TCP_Q_ALLOCATED	= 0,
 	NVME_TCP_Q_LIVE		= 1,
 	NVME_TCP_Q_POLLING	= 2,
+	NVME_TCP_Q_IO_CPU_SET	= 3,
 };

 enum nvme_tcp_recv_state {
···
 			ctrl->io_queues[HCTX_TYPE_POLL];
 }

+/**
+ * Track the number of queues assigned to each cpu using a global per-cpu
+ * counter and select the least used cpu from the mq_map. Our goal is to spread
+ * different controllers I/O threads across different cpu cores.
+ *
+ * Note that the accounting is not 100% perfect, but we don't need to be, we're
+ * simply putting our best effort to select the best candidate cpu core that we
+ * find at any given point.
+ */
 static void nvme_tcp_set_queue_io_cpu(struct nvme_tcp_queue *queue)
 {
 	struct nvme_tcp_ctrl *ctrl = queue->ctrl;
-	int qid = nvme_tcp_queue_id(queue);
-	int n = 0;
+	struct blk_mq_tag_set *set = &ctrl->tag_set;
+	int qid = nvme_tcp_queue_id(queue) - 1;
+	unsigned int *mq_map = NULL;
+	int cpu, min_queues = INT_MAX, io_cpu;
+
+	if (wq_unbound)
+		goto out;

 	if (nvme_tcp_default_queue(queue))
-		n = qid - 1;
+		mq_map = set->map[HCTX_TYPE_DEFAULT].mq_map;
 	else if (nvme_tcp_read_queue(queue))
-		n = qid - ctrl->io_queues[HCTX_TYPE_DEFAULT] - 1;
+		mq_map = set->map[HCTX_TYPE_READ].mq_map;
 	else if (nvme_tcp_poll_queue(queue))
-		n = qid - ctrl->io_queues[HCTX_TYPE_DEFAULT] -
-				ctrl->io_queues[HCTX_TYPE_READ] - 1;
-	if (wq_unbound)
-		queue->io_cpu = WORK_CPU_UNBOUND;
-	else
-		queue->io_cpu = cpumask_next_wrap(n - 1, cpu_online_mask, -1, false);
+		mq_map = set->map[HCTX_TYPE_POLL].mq_map;
+
+	if (WARN_ON(!mq_map))
+		goto out;
+
+	/* Search for the least used cpu from the mq_map */
+	io_cpu = WORK_CPU_UNBOUND;
+	for_each_online_cpu(cpu) {
+		int num_queues = atomic_read(&nvme_tcp_cpu_queues[cpu]);
+
+		if (mq_map[cpu] != qid)
+			continue;
+		if (num_queues < min_queues) {
+			io_cpu = cpu;
+			min_queues = num_queues;
+		}
+	}
+	if (io_cpu != WORK_CPU_UNBOUND) {
+		queue->io_cpu = io_cpu;
+		atomic_inc(&nvme_tcp_cpu_queues[io_cpu]);
+		set_bit(NVME_TCP_Q_IO_CPU_SET, &queue->flags);
+	}
+out:
+	dev_dbg(ctrl->ctrl.device, "queue %d: using cpu %d\n",
+		qid, queue->io_cpu);
 }

 static void nvme_tcp_tls_done(void *data, int status, key_serial_t pskid)
···

 	queue->sock->sk->sk_allocation = GFP_ATOMIC;
 	queue->sock->sk->sk_use_task_frag = false;
-	nvme_tcp_set_queue_io_cpu(queue);
+	queue->io_cpu = WORK_CPU_UNBOUND;
 	queue->request = NULL;
 	queue->data_remaining = 0;
 	queue->ddgst_remaining = 0;
···
 	if (!test_bit(NVME_TCP_Q_ALLOCATED, &queue->flags))
 		return;

+	if (test_and_clear_bit(NVME_TCP_Q_IO_CPU_SET, &queue->flags))
+		atomic_dec(&nvme_tcp_cpu_queues[queue->io_cpu]);
+
 	mutex_lock(&queue->queue_lock);
 	if (test_and_clear_bit(NVME_TCP_Q_LIVE, &queue->flags))
 		__nvme_tcp_stop_queue(queue);
···
 	nvme_tcp_init_recv_ctx(queue);
 	nvme_tcp_setup_sock_ops(queue);

-	if (idx)
+	if (idx) {
+		nvme_tcp_set_queue_io_cpu(queue);
 		ret = nvmf_connect_io_queue(nctrl, idx);
-	else
+	} else
 		ret = nvmf_connect_admin_queue(nctrl);

 	if (!ret) {
···
 static int __init nvme_tcp_init_module(void)
 {
 	unsigned int wq_flags = WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_SYSFS;
+	int cpu;

 	BUILD_BUG_ON(sizeof(struct nvme_tcp_hdr) != 8);
 	BUILD_BUG_ON(sizeof(struct nvme_tcp_cmd_pdu) != 72);
···
 	nvme_tcp_wq = alloc_workqueue("nvme_tcp_wq", wq_flags, 0);
 	if (!nvme_tcp_wq)
 		return -ENOMEM;
+
+	for_each_possible_cpu(cpu)
+		atomic_set(&nvme_tcp_cpu_queues[cpu], 0);

 	nvmf_register_transport(&nvme_tcp_transport);
 	return 0;
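The queue spreading logic in this file is a "least loaded candidate" selection over a global per-CPU counter. The following single-threaded userspace sketch shows the idea under simplifying assumptions: `NCPUS`, `CPU_UNBOUND`, `cpu_queues[]`, `pick_io_cpu()` and `release_io_cpu()` are hypothetical names, plain `int` counters stand in for the kernel's `atomic_t`, and every CPU is treated as online.

```c
#include <assert.h>
#include <limits.h>

#define NCPUS		4
#define CPU_UNBOUND	(-1)

/* Number of queues currently assigned to each CPU (global accounting). */
static int cpu_queues[NCPUS];

/*
 * Among the CPUs that mq_map[] maps to queue index qid, pick the one with
 * the fewest queues assigned so far and account for the new assignment.
 * Returns CPU_UNBOUND if no CPU maps to qid.
 */
static int pick_io_cpu(const unsigned int *mq_map, unsigned int qid)
{
	int cpu, io_cpu = CPU_UNBOUND, min_queues = INT_MAX;

	for (cpu = 0; cpu < NCPUS; cpu++) {
		if (mq_map[cpu] != qid)
			continue;
		if (cpu_queues[cpu] < min_queues) {
			io_cpu = cpu;
			min_queues = cpu_queues[cpu];
		}
	}
	if (io_cpu != CPU_UNBOUND)
		cpu_queues[io_cpu]++;
	return io_cpu;
}

/* Drop the accounting for a queue that was bound to io_cpu. */
static void release_io_cpu(int io_cpu)
{
	if (io_cpu != CPU_UNBOUND)
		cpu_queues[io_cpu]--;
}
```

With a map of `{ 0, 0, 1, 1 }`, a first controller's queues 0 and 1 land on CPUs 0 and 2, and a second controller using the same map lands on CPUs 1 and 3, because the counters are shared across controllers. Releasing a queue frees its CPU for the next selection, which mirrors the decrement done when a queue is stopped.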
+11
drivers/nvme/target/Kconfig
···
 	  target side.

 	  If unsure, say N.
+
+config NVME_TARGET_PCI_EPF
+	tristate "NVMe PCI Endpoint Function target support"
+	depends on NVME_TARGET && PCI_ENDPOINT
+	depends on NVME_CORE=y || NVME_CORE=NVME_TARGET
+	help
+	  This enables the NVMe PCI Endpoint Function target driver support,
+	  which allows creating an NVMe PCI controller using an endpoint mode
+	  capable PCI controller.
+
+	  If unsure, say N.
+2
drivers/nvme/target/Makefile
···
 obj-$(CONFIG_NVME_TARGET_FC)		+= nvmet-fc.o
 obj-$(CONFIG_NVME_TARGET_FCLOOP)	+= nvme-fcloop.o
 obj-$(CONFIG_NVME_TARGET_TCP)		+= nvmet-tcp.o
+obj-$(CONFIG_NVME_TARGET_PCI_EPF)	+= nvmet-pci-epf.o

 nvmet-y		+= core.o configfs.o admin-cmd.o fabrics-cmd.o \
 			discovery.o io-cmd-file.o io-cmd-bdev.o pr.o
···
 nvmet-fc-y	+= fc.o
 nvme-fcloop-y	+= fcloop.o
 nvmet-tcp-y	+= tcp.o
+nvmet-pci-epf-y	+= pci-epf.o
 nvmet-$(CONFIG_TRACING)	+= trace.o
+376 -14
drivers/nvme/target/admin-cmd.c
···
 #include <linux/unaligned.h>
 #include "nvmet.h"

+static void nvmet_execute_delete_sq(struct nvmet_req *req)
+{
+	struct nvmet_ctrl *ctrl = req->sq->ctrl;
+	u16 sqid = le16_to_cpu(req->cmd->delete_queue.qid);
+	u16 status;
+
+	if (!nvmet_is_pci_ctrl(ctrl)) {
+		status = nvmet_report_invalid_opcode(req);
+		goto complete;
+	}
+
+	if (!sqid) {
+		status = NVME_SC_QID_INVALID | NVME_STATUS_DNR;
+		goto complete;
+	}
+
+	status = nvmet_check_sqid(ctrl, sqid, false);
+	if (status != NVME_SC_SUCCESS)
+		goto complete;
+
+	status = ctrl->ops->delete_sq(ctrl, sqid);
+
+complete:
+	nvmet_req_complete(req, status);
+}
+
+static void nvmet_execute_create_sq(struct nvmet_req *req)
+{
+	struct nvmet_ctrl *ctrl = req->sq->ctrl;
+	struct nvme_command *cmd = req->cmd;
+	u16 sqid = le16_to_cpu(cmd->create_sq.sqid);
+	u16 cqid = le16_to_cpu(cmd->create_sq.cqid);
+	u16 sq_flags = le16_to_cpu(cmd->create_sq.sq_flags);
+	u16 qsize = le16_to_cpu(cmd->create_sq.qsize);
+	u64 prp1 = le64_to_cpu(cmd->create_sq.prp1);
+	u16 status;
+
+	if (!nvmet_is_pci_ctrl(ctrl)) {
+		status = nvmet_report_invalid_opcode(req);
+		goto complete;
+	}
+
+	if (!sqid) {
+		status = NVME_SC_QID_INVALID | NVME_STATUS_DNR;
+		goto complete;
+	}
+
+	status = nvmet_check_sqid(ctrl, sqid, true);
+	if (status != NVME_SC_SUCCESS)
+		goto complete;
+
+	/*
+	 * Note: The NVMe specification allows multiple SQs to use the same CQ.
+	 * However, the target code does not really support that. So for now,
+	 * prevent this and fail the command if sqid and cqid are different.
+	 */
+	if (!cqid || cqid != sqid) {
+		pr_err("SQ %u: Unsupported CQID %u\n", sqid, cqid);
+		status = NVME_SC_CQ_INVALID | NVME_STATUS_DNR;
+		goto complete;
+	}
+
+	if (!qsize || qsize > NVME_CAP_MQES(ctrl->cap)) {
+		status = NVME_SC_QUEUE_SIZE | NVME_STATUS_DNR;
+		goto complete;
+	}
+
+	status = ctrl->ops->create_sq(ctrl, sqid, sq_flags, qsize, prp1);
+
+complete:
+	nvmet_req_complete(req, status);
+}
+
+static void nvmet_execute_delete_cq(struct nvmet_req *req)
+{
+	struct nvmet_ctrl *ctrl = req->sq->ctrl;
+	u16 cqid = le16_to_cpu(req->cmd->delete_queue.qid);
+	u16 status;
+
+	if (!nvmet_is_pci_ctrl(ctrl)) {
+		status = nvmet_report_invalid_opcode(req);
+		goto complete;
+	}
+
+	if (!cqid) {
+		status = NVME_SC_QID_INVALID | NVME_STATUS_DNR;
+		goto complete;
+	}
+
+	status = nvmet_check_cqid(ctrl, cqid);
+	if (status != NVME_SC_SUCCESS)
+		goto complete;
+
+	status = ctrl->ops->delete_cq(ctrl, cqid);
+
+complete:
+	nvmet_req_complete(req, status);
+}
+
+static void nvmet_execute_create_cq(struct nvmet_req *req)
+{
+	struct nvmet_ctrl *ctrl = req->sq->ctrl;
+	struct nvme_command *cmd = req->cmd;
+	u16 cqid = le16_to_cpu(cmd->create_cq.cqid);
+	u16 cq_flags = le16_to_cpu(cmd->create_cq.cq_flags);
+	u16 qsize = le16_to_cpu(cmd->create_cq.qsize);
+	u16 irq_vector = le16_to_cpu(cmd->create_cq.irq_vector);
+	u64 prp1 = le64_to_cpu(cmd->create_cq.prp1);
+	u16 status;
+
+	if (!nvmet_is_pci_ctrl(ctrl)) {
+		status = nvmet_report_invalid_opcode(req);
+		goto complete;
+	}
+
+	if (!cqid) {
+		status = NVME_SC_QID_INVALID | NVME_STATUS_DNR;
+		goto complete;
+	}
+
+	status = nvmet_check_cqid(ctrl, cqid);
+	if (status != NVME_SC_SUCCESS)
+		goto complete;
+
+	if (!qsize || qsize > NVME_CAP_MQES(ctrl->cap)) {
+		status = NVME_SC_QUEUE_SIZE | NVME_STATUS_DNR;
+		goto complete;
+	}
+
+	status = ctrl->ops->create_cq(ctrl, cqid, cq_flags, qsize,
+				      prp1, irq_vector);
+
+complete:
+	nvmet_req_complete(req, status);
+}
+
 u32 nvmet_get_log_page_len(struct nvme_command *cmd)
 {
 	u32 len = le16_to_cpu(cmd->get_log_page.numdu);
···
 	nvmet_req_complete(req, status);
 }

-static void nvmet_get_cmd_effects_nvm(struct nvme_effects_log *log)
+static void nvmet_get_cmd_effects_admin(struct nvmet_ctrl *ctrl,
+					struct nvme_effects_log *log)
 {
+	/*
+	 * For a PCI target controller, advertise support for the queue
+	 * management admin commands.
+	 */
+	if (nvmet_is_pci_ctrl(ctrl)) {
+		log->acs[nvme_admin_delete_sq] =
+		log->acs[nvme_admin_create_sq] =
+		log->acs[nvme_admin_delete_cq] =
+		log->acs[nvme_admin_create_cq] =
+			cpu_to_le32(NVME_CMD_EFFECTS_CSUPP);
+	}
+
 	log->acs[nvme_admin_get_log_page] =
 	log->acs[nvme_admin_identify] =
 	log->acs[nvme_admin_abort_cmd] =
···
 	log->acs[nvme_admin_async_event] =
 	log->acs[nvme_admin_keep_alive] =
 		cpu_to_le32(NVME_CMD_EFFECTS_CSUPP);
+}

+static void nvmet_get_cmd_effects_nvm(struct nvme_effects_log *log)
+{
 	log->iocs[nvme_cmd_read] =
 	log->iocs[nvme_cmd_flush] =
 	log->iocs[nvme_cmd_dsm] =
···

 static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req)
 {
+	struct nvmet_ctrl *ctrl = req->sq->ctrl;
 	struct nvme_effects_log *log;
 	u16 status = NVME_SC_SUCCESS;

···

 	switch (req->cmd->get_log_page.csi) {
 	case NVME_CSI_NVM:
+		nvmet_get_cmd_effects_admin(ctrl, log);
 		nvmet_get_cmd_effects_nvm(log);
 		break;
 	case NVME_CSI_ZNS:
···
 			status = NVME_SC_INVALID_IO_CMD_SET;
 			goto free;
 		}
+		nvmet_get_cmd_effects_admin(ctrl, log);
 		nvmet_get_cmd_effects_nvm(log);
 		nvmet_get_cmd_effects_zns(log);
 		break;
···
 	struct nvmet_ctrl *ctrl = req->sq->ctrl;
 	struct nvmet_subsys *subsys = ctrl->subsys;
 	struct nvme_id_ctrl *id;
-	u32 cmd_capsule_size;
+	u32 cmd_capsule_size, ctratt;
 	u16 status = 0;

 	if (!subsys->subsys_discovered) {
···
 		goto out;
 	}

-	/* XXX: figure out how to assign real vendors IDs. */
-	id->vid = 0;
-	id->ssvid = 0;
+	id->vid = cpu_to_le16(subsys->vendor_id);
+	id->ssvid = cpu_to_le16(subsys->subsys_vendor_id);

 	memcpy(id->sn, ctrl->subsys->serial, NVMET_SN_MAX_SIZE);
 	memcpy_and_pad(id->mn, sizeof(id->mn), subsys->model_number,
···

 	/* XXX: figure out what to do about RTD3R/RTD3 */
 	id->oaes = cpu_to_le32(NVMET_AEN_CFG_OPTIONAL);
-	id->ctratt = cpu_to_le32(NVME_CTRL_ATTR_HID_128_BIT |
-		NVME_CTRL_ATTR_TBKAS);
+	ctratt = NVME_CTRL_ATTR_HID_128_BIT | NVME_CTRL_ATTR_TBKAS;
+	if (nvmet_is_pci_ctrl(ctrl))
+		ctratt |= NVME_CTRL_ATTR_RHII;
+	id->ctratt = cpu_to_le32(ctratt);

 	id->oacs = 0;

···
 	return 0;
 }

+static u16 nvmet_set_feat_host_id(struct nvmet_req *req)
+{
+	struct nvmet_ctrl *ctrl = req->sq->ctrl;
+
+	if (!nvmet_is_pci_ctrl(ctrl))
+		return NVME_SC_CMD_SEQ_ERROR | NVME_STATUS_DNR;
+
+	/*
+	 * The NVMe base specification v2.1 recommends supporting 128-bit host
+	 * IDs (section 5.1.25.1.28.1). However, that same section also says
+	 * that "The controller may support a 64-bit Host Identifier and/or an
+	 * extended 128-bit Host Identifier". So simplify this support and do
+	 * not support 64-bit host IDs to avoid needing to check that all
+	 * controllers associated with the same subsystem all use the same host
+	 * ID size.
1122 + */ 1123 + if (!(req->cmd->common.cdw11 & cpu_to_le32(1 << 0))) { 1124 + req->error_loc = offsetof(struct nvme_common_command, cdw11); 1125 + return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; 1126 + } 1127 + 1128 + return nvmet_copy_from_sgl(req, 0, &req->sq->ctrl->hostid, 1129 + sizeof(req->sq->ctrl->hostid)); 1130 + } 1131 + 1132 + static u16 nvmet_set_feat_irq_coalesce(struct nvmet_req *req) 1133 + { 1134 + struct nvmet_ctrl *ctrl = req->sq->ctrl; 1135 + u32 cdw11 = le32_to_cpu(req->cmd->common.cdw11); 1136 + struct nvmet_feat_irq_coalesce irqc = { 1137 + .time = (cdw11 >> 8) & 0xff, 1138 + .thr = cdw11 & 0xff, 1139 + }; 1140 + 1141 + /* 1142 + * This feature is not supported for fabrics controllers and mandatory 1143 + * for PCI controllers. 1144 + */ 1145 + if (!nvmet_is_pci_ctrl(ctrl)) { 1146 + req->error_loc = offsetof(struct nvme_common_command, cdw10); 1147 + return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; 1148 + } 1149 + 1150 + return ctrl->ops->set_feature(ctrl, NVME_FEAT_IRQ_COALESCE, &irqc); 1151 + } 1152 + 1153 + static u16 nvmet_set_feat_irq_config(struct nvmet_req *req) 1154 + { 1155 + struct nvmet_ctrl *ctrl = req->sq->ctrl; 1156 + u32 cdw11 = le32_to_cpu(req->cmd->common.cdw11); 1157 + struct nvmet_feat_irq_config irqcfg = { 1158 + .iv = cdw11 & 0xffff, 1159 + .cd = (cdw11 >> 16) & 0x1, 1160 + }; 1161 + 1162 + /* 1163 + * This feature is not supported for fabrics controllers and mandatory 1164 + * for PCI controllers. 
1165 + */ 1166 + if (!nvmet_is_pci_ctrl(ctrl)) { 1167 + req->error_loc = offsetof(struct nvme_common_command, cdw10); 1168 + return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; 1169 + } 1170 + 1171 + return ctrl->ops->set_feature(ctrl, NVME_FEAT_IRQ_CONFIG, &irqcfg); 1172 + } 1173 + 1174 + static u16 nvmet_set_feat_arbitration(struct nvmet_req *req) 1175 + { 1176 + struct nvmet_ctrl *ctrl = req->sq->ctrl; 1177 + u32 cdw11 = le32_to_cpu(req->cmd->common.cdw11); 1178 + struct nvmet_feat_arbitration arb = { 1179 + .hpw = (cdw11 >> 24) & 0xff, 1180 + .mpw = (cdw11 >> 16) & 0xff, 1181 + .lpw = (cdw11 >> 8) & 0xff, 1182 + .ab = cdw11 & 0x3, 1183 + }; 1184 + 1185 + if (!ctrl->ops->set_feature) { 1186 + req->error_loc = offsetof(struct nvme_common_command, cdw10); 1187 + return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; 1188 + } 1189 + 1190 + return ctrl->ops->set_feature(ctrl, NVME_FEAT_ARBITRATION, &arb); 1191 + } 1192 + 1260 1193 void nvmet_execute_set_features(struct nvmet_req *req) 1261 1194 { 1262 1195 struct nvmet_subsys *subsys = nvmet_req_subsys(req); ··· 1356 1117 return; 1357 1118 1358 1119 switch (cdw10 & 0xff) { 1120 + case NVME_FEAT_ARBITRATION: 1121 + status = nvmet_set_feat_arbitration(req); 1122 + break; 1359 1123 case NVME_FEAT_NUM_QUEUES: 1360 1124 ncqr = (cdw11 >> 16) & 0xffff; 1361 1125 nsqr = cdw11 & 0xffff; ··· 1369 1127 nvmet_set_result(req, 1370 1128 (subsys->max_qid - 1) | ((subsys->max_qid - 1) << 16)); 1371 1129 break; 1130 + case NVME_FEAT_IRQ_COALESCE: 1131 + status = nvmet_set_feat_irq_coalesce(req); 1132 + break; 1133 + case NVME_FEAT_IRQ_CONFIG: 1134 + status = nvmet_set_feat_irq_config(req); 1135 + break; 1372 1136 case NVME_FEAT_KATO: 1373 1137 status = nvmet_set_feat_kato(req); 1374 1138 break; ··· 1382 1134 status = nvmet_set_feat_async_event(req, NVMET_AEN_CFG_ALL); 1383 1135 break; 1384 1136 case NVME_FEAT_HOST_ID: 1385 - status = NVME_SC_CMD_SEQ_ERROR | NVME_STATUS_DNR; 1137 + status = nvmet_set_feat_host_id(req); 1386 1138 break; 1387 
1139 case NVME_FEAT_WRITE_PROTECT: 1388 1140 status = nvmet_set_feat_write_protect(req); ··· 1419 1171 return 0; 1420 1172 } 1421 1173 1174 + static u16 nvmet_get_feat_irq_coalesce(struct nvmet_req *req) 1175 + { 1176 + struct nvmet_ctrl *ctrl = req->sq->ctrl; 1177 + struct nvmet_feat_irq_coalesce irqc = { }; 1178 + u16 status; 1179 + 1180 + /* 1181 + * This feature is not supported for fabrics controllers and mandatory 1182 + * for PCI controllers. 1183 + */ 1184 + if (!nvmet_is_pci_ctrl(ctrl)) { 1185 + req->error_loc = offsetof(struct nvme_common_command, cdw10); 1186 + return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; 1187 + } 1188 + 1189 + status = ctrl->ops->get_feature(ctrl, NVME_FEAT_IRQ_COALESCE, &irqc); 1190 + if (status != NVME_SC_SUCCESS) 1191 + return status; 1192 + 1193 + nvmet_set_result(req, ((u32)irqc.time << 8) | (u32)irqc.thr); 1194 + 1195 + return NVME_SC_SUCCESS; 1196 + } 1197 + 1198 + static u16 nvmet_get_feat_irq_config(struct nvmet_req *req) 1199 + { 1200 + struct nvmet_ctrl *ctrl = req->sq->ctrl; 1201 + u32 iv = le32_to_cpu(req->cmd->common.cdw11) & 0xffff; 1202 + struct nvmet_feat_irq_config irqcfg = { .iv = iv }; 1203 + u16 status; 1204 + 1205 + /* 1206 + * This feature is not supported for fabrics controllers and mandatory 1207 + * for PCI controllers. 
1208 + */ 1209 + if (!nvmet_is_pci_ctrl(ctrl)) { 1210 + req->error_loc = offsetof(struct nvme_common_command, cdw10); 1211 + return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; 1212 + } 1213 + 1214 + status = ctrl->ops->get_feature(ctrl, NVME_FEAT_IRQ_CONFIG, &irqcfg); 1215 + if (status != NVME_SC_SUCCESS) 1216 + return status; 1217 + 1218 + nvmet_set_result(req, ((u32)irqcfg.cd << 16) | iv); 1219 + 1220 + return NVME_SC_SUCCESS; 1221 + } 1222 + 1223 + static u16 nvmet_get_feat_arbitration(struct nvmet_req *req) 1224 + { 1225 + struct nvmet_ctrl *ctrl = req->sq->ctrl; 1226 + struct nvmet_feat_arbitration arb = { }; 1227 + u16 status; 1228 + 1229 + if (!ctrl->ops->get_feature) { 1230 + req->error_loc = offsetof(struct nvme_common_command, cdw10); 1231 + return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; 1232 + } 1233 + 1234 + status = ctrl->ops->get_feature(ctrl, NVME_FEAT_ARBITRATION, &arb); 1235 + if (status != NVME_SC_SUCCESS) 1236 + return status; 1237 + 1238 + nvmet_set_result(req, 1239 + ((u32)arb.hpw << 24) | 1240 + ((u32)arb.mpw << 16) | 1241 + ((u32)arb.lpw << 8) | 1242 + (arb.ab & 0x3)); 1243 + 1244 + return NVME_SC_SUCCESS; 1245 + } 1246 + 1422 1247 void nvmet_get_feat_kato(struct nvmet_req *req) 1423 1248 { 1424 1249 nvmet_set_result(req, req->sq->ctrl->kato * 1000); ··· 1518 1197 * need to come up with some fake values for these. 
1519 1198 */ 1520 1199 #if 0 1521 - case NVME_FEAT_ARBITRATION: 1522 - break; 1523 1200 case NVME_FEAT_POWER_MGMT: 1524 1201 break; 1525 1202 case NVME_FEAT_TEMP_THRESH: 1526 1203 break; 1527 1204 case NVME_FEAT_ERR_RECOVERY: 1528 1205 break; 1529 - case NVME_FEAT_IRQ_COALESCE: 1530 - break; 1531 - case NVME_FEAT_IRQ_CONFIG: 1532 - break; 1533 1206 case NVME_FEAT_WRITE_ATOMIC: 1534 1207 break; 1535 1208 #endif 1209 + case NVME_FEAT_ARBITRATION: 1210 + status = nvmet_get_feat_arbitration(req); 1211 + break; 1212 + case NVME_FEAT_IRQ_COALESCE: 1213 + status = nvmet_get_feat_irq_coalesce(req); 1214 + break; 1215 + case NVME_FEAT_IRQ_CONFIG: 1216 + status = nvmet_get_feat_irq_config(req); 1217 + break; 1536 1218 case NVME_FEAT_ASYNC_EVENT: 1537 1219 nvmet_get_feat_async_event(req); 1538 1220 break; ··· 1616 1292 nvmet_req_complete(req, status); 1617 1293 } 1618 1294 1295 + u32 nvmet_admin_cmd_data_len(struct nvmet_req *req) 1296 + { 1297 + struct nvme_command *cmd = req->cmd; 1298 + 1299 + if (nvme_is_fabrics(cmd)) 1300 + return nvmet_fabrics_admin_cmd_data_len(req); 1301 + if (nvmet_is_disc_subsys(nvmet_req_subsys(req))) 1302 + return nvmet_discovery_cmd_data_len(req); 1303 + 1304 + switch (cmd->common.opcode) { 1305 + case nvme_admin_get_log_page: 1306 + return nvmet_get_log_page_len(cmd); 1307 + case nvme_admin_identify: 1308 + return NVME_IDENTIFY_DATA_SIZE; 1309 + case nvme_admin_get_features: 1310 + return nvmet_feat_data_len(req, le32_to_cpu(cmd->common.cdw10)); 1311 + default: 1312 + return 0; 1313 + } 1314 + } 1315 + 1619 1316 u16 nvmet_parse_admin_cmd(struct nvmet_req *req) 1620 1317 { 1621 1318 struct nvme_command *cmd = req->cmd; ··· 1651 1306 if (unlikely(ret)) 1652 1307 return ret; 1653 1308 1309 + /* For PCI controllers, admin commands shall not use SGL. 
*/ 1310 + if (nvmet_is_pci_ctrl(req->sq->ctrl) && !req->sq->qid && 1311 + cmd->common.flags & NVME_CMD_SGL_ALL) 1312 + return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; 1313 + 1654 1314 if (nvmet_is_passthru_req(req)) 1655 1315 return nvmet_parse_passthru_admin_cmd(req); 1656 1316 1657 1317 switch (cmd->common.opcode) { 1318 + case nvme_admin_delete_sq: 1319 + req->execute = nvmet_execute_delete_sq; 1320 + return 0; 1321 + case nvme_admin_create_sq: 1322 + req->execute = nvmet_execute_create_sq; 1323 + return 0; 1658 1324 case nvme_admin_get_log_page: 1659 1325 req->execute = nvmet_execute_get_log_page; 1326 + return 0; 1327 + case nvme_admin_delete_cq: 1328 + req->execute = nvmet_execute_delete_cq; 1329 + return 0; 1330 + case nvme_admin_create_cq: 1331 + req->execute = nvmet_execute_create_cq; 1660 1332 return 0; 1661 1333 case nvme_admin_identify: 1662 1334 req->execute = nvmet_execute_identify;
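Note on the Arbitration feature helpers above: nvmet_set_feat_arbitration() unpacks cdw11 into four fields and nvmet_get_feat_arbitration() packs them back into the command result. The following is a minimal userspace sketch of that round trip, not kernel code. The struct and function names (`feat_arbitration`, `arb_unpack`, `arb_pack`) are hypothetical stand-ins, and the 0x3 mask on `ab` simply mirrors the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for struct nvmet_feat_arbitration. */
struct feat_arbitration {
	uint8_t hpw;	/* high priority weight, cdw11 bits 31:24 */
	uint8_t mpw;	/* medium priority weight, cdw11 bits 23:16 */
	uint8_t lpw;	/* low priority weight, cdw11 bits 15:8 */
	uint8_t ab;	/* arbitration burst, masked with 0x3 as in the diff */
};

/* Mirrors the field extraction done on the Set Features path. */
static struct feat_arbitration arb_unpack(uint32_t cdw11)
{
	struct feat_arbitration arb = {
		.hpw = (cdw11 >> 24) & 0xff,
		.mpw = (cdw11 >> 16) & 0xff,
		.lpw = (cdw11 >> 8) & 0xff,
		.ab = cdw11 & 0x3,
	};

	return arb;
}

/* Mirrors the result word built on the Get Features path. */
static uint32_t arb_pack(struct feat_arbitration arb)
{
	return ((uint32_t)arb.hpw << 24) |
	       ((uint32_t)arb.mpw << 16) |
	       ((uint32_t)arb.lpw << 8) |
	       (uint32_t)(arb.ab & 0x3);
}
```

With this layout, a value whose low byte only uses bits 1:0 survives a set followed by a get unchanged, while any other low-byte bits are dropped by the mask.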
+49
drivers/nvme/target/configfs.c
··· 37 37 { NVMF_TRTYPE_RDMA, "rdma" }, 38 38 { NVMF_TRTYPE_FC, "fc" }, 39 39 { NVMF_TRTYPE_TCP, "tcp" }, 40 + { NVMF_TRTYPE_PCI, "pci" }, 40 41 { NVMF_TRTYPE_LOOP, "loop" }, 41 42 }; 42 43 ··· 47 46 { NVMF_ADDR_FAMILY_IP6, "ipv6" }, 48 47 { NVMF_ADDR_FAMILY_IB, "ib" }, 49 48 { NVMF_ADDR_FAMILY_FC, "fc" }, 49 + { NVMF_ADDR_FAMILY_PCI, "pci" }, 50 50 { NVMF_ADDR_FAMILY_LOOP, "loop" }, 51 51 }; 52 52 ··· 1414 1412 } 1415 1413 CONFIGFS_ATTR(nvmet_subsys_, attr_cntlid_max); 1416 1414 1415 + static ssize_t nvmet_subsys_attr_vendor_id_show(struct config_item *item, 1416 + char *page) 1417 + { 1418 + return snprintf(page, PAGE_SIZE, "0x%x\n", to_subsys(item)->vendor_id); 1419 + } 1420 + 1421 + static ssize_t nvmet_subsys_attr_vendor_id_store(struct config_item *item, 1422 + const char *page, size_t count) 1423 + { 1424 + u16 vid; 1425 + 1426 + if (kstrtou16(page, 0, &vid)) 1427 + return -EINVAL; 1428 + 1429 + down_write(&nvmet_config_sem); 1430 + to_subsys(item)->vendor_id = vid; 1431 + up_write(&nvmet_config_sem); 1432 + return count; 1433 + } 1434 + CONFIGFS_ATTR(nvmet_subsys_, attr_vendor_id); 1435 + 1436 + static ssize_t nvmet_subsys_attr_subsys_vendor_id_show(struct config_item *item, 1437 + char *page) 1438 + { 1439 + return snprintf(page, PAGE_SIZE, "0x%x\n", 1440 + to_subsys(item)->subsys_vendor_id); 1441 + } 1442 + 1443 + static ssize_t nvmet_subsys_attr_subsys_vendor_id_store(struct config_item *item, 1444 + const char *page, size_t count) 1445 + { 1446 + u16 ssvid; 1447 + 1448 + if (kstrtou16(page, 0, &ssvid)) 1449 + return -EINVAL; 1450 + 1451 + down_write(&nvmet_config_sem); 1452 + to_subsys(item)->subsys_vendor_id = ssvid; 1453 + up_write(&nvmet_config_sem); 1454 + return count; 1455 + } 1456 + CONFIGFS_ATTR(nvmet_subsys_, attr_subsys_vendor_id); 1457 + 1417 1458 static ssize_t nvmet_subsys_attr_model_show(struct config_item *item, 1418 1459 char *page) 1419 1460 { ··· 1685 1640 &nvmet_subsys_attr_attr_serial, 1686 1641 &nvmet_subsys_attr_attr_cntlid_min, 
1687 1642 &nvmet_subsys_attr_attr_cntlid_max, 1643 + &nvmet_subsys_attr_attr_vendor_id, 1644 + &nvmet_subsys_attr_attr_subsys_vendor_id, 1688 1645 &nvmet_subsys_attr_attr_model, 1689 1646 &nvmet_subsys_attr_attr_qid_max, 1690 1647 &nvmet_subsys_attr_attr_ieee_oui, ··· 1841 1794 return ERR_PTR(-ENOMEM); 1842 1795 1843 1796 INIT_LIST_HEAD(&port->entry); 1797 + port->disc_addr.trtype = NVMF_TRTYPE_MAX; 1844 1798 config_group_init_type_name(&port->group, name, &nvmet_referral_type); 1845 1799 1846 1800 return &port->group; ··· 2067 2019 port->inline_data_size = -1; /* < 0 == let the transport choose */ 2068 2020 port->max_queue_size = -1; /* < 0 == let the transport choose */ 2069 2021 2022 + port->disc_addr.trtype = NVMF_TRTYPE_MAX; 2070 2023 port->disc_addr.portid = cpu_to_le16(portid); 2071 2024 port->disc_addr.adrfam = NVMF_ADDR_FAMILY_MAX; 2072 2025 port->disc_addr.treq = NVMF_TREQ_DISABLE_SQFLOW;
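Note on the vendor_id and subsys_vendor_id attributes above: the store handlers parse the written string with kstrtou16() and base 0, so decimal, octal and 0x-prefixed hexadecimal input is accepted, and the show handlers print the value back as "0x%x". The sketch below approximates that parsing in userspace with strtoul(). `parse_vendor_id` is a hypothetical helper, and it is slightly stricter than the kernel function because it requires the first character to be a digit.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical userspace approximation of kstrtou16(s, 0, &val).
 * Returns the parsed value (0..65535) or -1 on invalid input.
 */
static long parse_vendor_id(const char *s)
{
	char *end;
	unsigned long val;

	/* Assumption: reject signs and leading whitespace outright. */
	if (!isdigit((unsigned char)s[0]))
		return -1;

	errno = 0;
	val = strtoul(s, &end, 0);	/* base 0: auto-detect 0x and 0 prefixes */
	if (errno == ERANGE || val > UINT16_MAX)
		return -1;

	/* A single trailing newline is tolerated, as written by echo. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -1;

	return (long)val;
}
```

For example, writing "0x1b96" followed by a newline yields 0x1b96, while "65536" is rejected because it does not fit in 16 bits.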
+195 -71
drivers/nvme/target/core.c
··· 818 818 complete(&sq->confirm_done); 819 819 } 820 820 821 + u16 nvmet_check_cqid(struct nvmet_ctrl *ctrl, u16 cqid) 822 + { 823 + if (!ctrl->sqs) 824 + return NVME_SC_INTERNAL | NVME_STATUS_DNR; 825 + 826 + if (cqid > ctrl->subsys->max_qid) 827 + return NVME_SC_QID_INVALID | NVME_STATUS_DNR; 828 + 829 + /* 830 + * Note: For PCI controllers, the NVMe specifications allows multiple 831 + * SQs to share a single CQ. However, we do not support this yet, so 832 + * check that there is no SQ defined for a CQ. If one exist, then the 833 + * CQ ID is invalid for creation as well as when the CQ is being 834 + * deleted (as that would mean that the SQ was not deleted before the 835 + * CQ). 836 + */ 837 + if (ctrl->sqs[cqid]) 838 + return NVME_SC_QID_INVALID | NVME_STATUS_DNR; 839 + 840 + return NVME_SC_SUCCESS; 841 + } 842 + 843 + u16 nvmet_cq_create(struct nvmet_ctrl *ctrl, struct nvmet_cq *cq, 844 + u16 qid, u16 size) 845 + { 846 + u16 status; 847 + 848 + status = nvmet_check_cqid(ctrl, qid); 849 + if (status != NVME_SC_SUCCESS) 850 + return status; 851 + 852 + nvmet_cq_setup(ctrl, cq, qid, size); 853 + 854 + return NVME_SC_SUCCESS; 855 + } 856 + EXPORT_SYMBOL_GPL(nvmet_cq_create); 857 + 858 + u16 nvmet_check_sqid(struct nvmet_ctrl *ctrl, u16 sqid, 859 + bool create) 860 + { 861 + if (!ctrl->sqs) 862 + return NVME_SC_INTERNAL | NVME_STATUS_DNR; 863 + 864 + if (sqid > ctrl->subsys->max_qid) 865 + return NVME_SC_QID_INVALID | NVME_STATUS_DNR; 866 + 867 + if ((create && ctrl->sqs[sqid]) || 868 + (!create && !ctrl->sqs[sqid])) 869 + return NVME_SC_QID_INVALID | NVME_STATUS_DNR; 870 + 871 + return NVME_SC_SUCCESS; 872 + } 873 + 874 + u16 nvmet_sq_create(struct nvmet_ctrl *ctrl, struct nvmet_sq *sq, 875 + u16 sqid, u16 size) 876 + { 877 + u16 status; 878 + int ret; 879 + 880 + if (!kref_get_unless_zero(&ctrl->ref)) 881 + return NVME_SC_INTERNAL | NVME_STATUS_DNR; 882 + 883 + status = nvmet_check_sqid(ctrl, sqid, true); 884 + if (status != NVME_SC_SUCCESS) 885 + return 
status; 886 + 887 + ret = nvmet_sq_init(sq); 888 + if (ret) { 889 + status = NVME_SC_INTERNAL | NVME_STATUS_DNR; 890 + goto ctrl_put; 891 + } 892 + 893 + nvmet_sq_setup(ctrl, sq, sqid, size); 894 + sq->ctrl = ctrl; 895 + 896 + return NVME_SC_SUCCESS; 897 + 898 + ctrl_put: 899 + nvmet_ctrl_put(ctrl); 900 + return status; 901 + } 902 + EXPORT_SYMBOL_GPL(nvmet_sq_create); 903 + 821 904 void nvmet_sq_destroy(struct nvmet_sq *sq) 822 905 { 823 906 struct nvmet_ctrl *ctrl = sq->ctrl; ··· 992 909 } 993 910 994 911 return 0; 912 + } 913 + 914 + static u32 nvmet_io_cmd_transfer_len(struct nvmet_req *req) 915 + { 916 + struct nvme_command *cmd = req->cmd; 917 + u32 metadata_len = 0; 918 + 919 + if (nvme_is_fabrics(cmd)) 920 + return nvmet_fabrics_io_cmd_data_len(req); 921 + 922 + if (!req->ns) 923 + return 0; 924 + 925 + switch (req->cmd->common.opcode) { 926 + case nvme_cmd_read: 927 + case nvme_cmd_write: 928 + case nvme_cmd_zone_append: 929 + if (req->sq->ctrl->pi_support && nvmet_ns_has_pi(req->ns)) 930 + metadata_len = nvmet_rw_metadata_len(req); 931 + return nvmet_rw_data_len(req) + metadata_len; 932 + case nvme_cmd_dsm: 933 + return nvmet_dsm_len(req); 934 + case nvme_cmd_zone_mgmt_recv: 935 + return (le32_to_cpu(req->cmd->zmr.numd) + 1) << 2; 936 + default: 937 + return 0; 938 + } 995 939 } 996 940 997 941 static u16 nvmet_parse_io_cmd(struct nvmet_req *req) ··· 1122 1012 /* 1123 1013 * For fabrics, PSDT field shall describe metadata pointer (MPTR) that 1124 1014 * contains an address of a single contiguous physical buffer that is 1125 - * byte aligned. 1015 + * byte aligned. For PCI controllers, this is optional so not enforced. 
1126 1016 */ 1127 1017 if (unlikely((flags & NVME_CMD_SGL_ALL) != NVME_CMD_SGL_METABUF)) { 1128 - req->error_loc = offsetof(struct nvme_common_command, flags); 1129 - status = NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; 1130 - goto fail; 1018 + if (!req->sq->ctrl || !nvmet_is_pci_ctrl(req->sq->ctrl)) { 1019 + req->error_loc = 1020 + offsetof(struct nvme_common_command, flags); 1021 + status = NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; 1022 + goto fail; 1023 + } 1131 1024 } 1132 1025 1133 1026 if (unlikely(!req->sq->ctrl)) ··· 1172 1059 } 1173 1060 EXPORT_SYMBOL_GPL(nvmet_req_uninit); 1174 1061 1062 + size_t nvmet_req_transfer_len(struct nvmet_req *req) 1063 + { 1064 + if (likely(req->sq->qid != 0)) 1065 + return nvmet_io_cmd_transfer_len(req); 1066 + if (unlikely(!req->sq->ctrl)) 1067 + return nvmet_connect_cmd_data_len(req); 1068 + return nvmet_admin_cmd_data_len(req); 1069 + } 1070 + EXPORT_SYMBOL_GPL(nvmet_req_transfer_len); 1071 + 1175 1072 bool nvmet_check_transfer_len(struct nvmet_req *req, size_t len) 1176 1073 { 1177 1074 if (unlikely(len != req->transfer_len)) { 1075 + u16 status; 1076 + 1178 1077 req->error_loc = offsetof(struct nvme_common_command, dptr); 1179 - nvmet_req_complete(req, NVME_SC_SGL_INVALID_DATA | NVME_STATUS_DNR); 1078 + if (req->cmd->common.flags & NVME_CMD_SGL_ALL) 1079 + status = NVME_SC_SGL_INVALID_DATA; 1080 + else 1081 + status = NVME_SC_INVALID_FIELD; 1082 + nvmet_req_complete(req, status | NVME_STATUS_DNR); 1180 1083 return false; 1181 1084 } 1182 1085 ··· 1203 1074 bool nvmet_check_data_len_lte(struct nvmet_req *req, size_t data_len) 1204 1075 { 1205 1076 if (unlikely(data_len > req->transfer_len)) { 1077 + u16 status; 1078 + 1206 1079 req->error_loc = offsetof(struct nvme_common_command, dptr); 1207 - nvmet_req_complete(req, NVME_SC_SGL_INVALID_DATA | NVME_STATUS_DNR); 1080 + if (req->cmd->common.flags & NVME_CMD_SGL_ALL) 1081 + status = NVME_SC_SGL_INVALID_DATA; 1082 + else 1083 + status = NVME_SC_INVALID_FIELD; 1084 + 
nvmet_req_complete(req, status | NVME_STATUS_DNR); 1208 1085 return false; 1209 1086 } 1210 1087 ··· 1301 1166 } 1302 1167 EXPORT_SYMBOL_GPL(nvmet_req_free_sgls); 1303 1168 1304 - static inline bool nvmet_cc_en(u32 cc) 1305 - { 1306 - return (cc >> NVME_CC_EN_SHIFT) & 0x1; 1307 - } 1308 - 1309 - static inline u8 nvmet_cc_css(u32 cc) 1310 - { 1311 - return (cc >> NVME_CC_CSS_SHIFT) & 0x7; 1312 - } 1313 - 1314 - static inline u8 nvmet_cc_mps(u32 cc) 1315 - { 1316 - return (cc >> NVME_CC_MPS_SHIFT) & 0xf; 1317 - } 1318 - 1319 - static inline u8 nvmet_cc_ams(u32 cc) 1320 - { 1321 - return (cc >> NVME_CC_AMS_SHIFT) & 0x7; 1322 - } 1323 - 1324 - static inline u8 nvmet_cc_shn(u32 cc) 1325 - { 1326 - return (cc >> NVME_CC_SHN_SHIFT) & 0x3; 1327 - } 1328 - 1329 - static inline u8 nvmet_cc_iosqes(u32 cc) 1330 - { 1331 - return (cc >> NVME_CC_IOSQES_SHIFT) & 0xf; 1332 - } 1333 - 1334 - static inline u8 nvmet_cc_iocqes(u32 cc) 1335 - { 1336 - return (cc >> NVME_CC_IOCQES_SHIFT) & 0xf; 1337 - } 1338 - 1339 1169 static inline bool nvmet_css_supported(u8 cc_css) 1340 1170 { 1341 1171 switch (cc_css << NVME_CC_CSS_SHIFT) { ··· 1377 1277 ctrl->csts &= ~NVME_CSTS_SHST_CMPLT; 1378 1278 mutex_unlock(&ctrl->lock); 1379 1279 } 1280 + EXPORT_SYMBOL_GPL(nvmet_update_cc); 1380 1281 1381 1282 static void nvmet_init_cap(struct nvmet_ctrl *ctrl) 1382 1283 { ··· 1485 1384 * Note: ctrl->subsys->lock should be held when calling this function 1486 1385 */ 1487 1386 static void nvmet_setup_p2p_ns_map(struct nvmet_ctrl *ctrl, 1488 - struct nvmet_req *req) 1387 + struct device *p2p_client) 1489 1388 { 1490 1389 struct nvmet_ns *ns; 1491 1390 unsigned long idx; 1492 1391 1493 - if (!req->p2p_client) 1392 + if (!p2p_client) 1494 1393 return; 1495 1394 1496 - ctrl->p2p_client = get_device(req->p2p_client); 1395 + ctrl->p2p_client = get_device(p2p_client); 1497 1396 1498 1397 xa_for_each(&ctrl->subsys->namespaces, idx, ns) 1499 1398 nvmet_p2pmem_ns_add_p2p(ctrl, ns); ··· 1522 1421 
ctrl->ops->delete_ctrl(ctrl); 1523 1422 } 1524 1423 1525 - u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn, 1526 - struct nvmet_req *req, u32 kato, struct nvmet_ctrl **ctrlp, 1527 - uuid_t *hostid) 1424 + struct nvmet_ctrl *nvmet_alloc_ctrl(struct nvmet_alloc_ctrl_args *args) 1528 1425 { 1529 1426 struct nvmet_subsys *subsys; 1530 1427 struct nvmet_ctrl *ctrl; 1428 + u32 kato = args->kato; 1429 + u8 dhchap_status; 1531 1430 int ret; 1532 - u16 status; 1533 1431 1534 - status = NVME_SC_CONNECT_INVALID_PARAM | NVME_STATUS_DNR; 1535 - subsys = nvmet_find_get_subsys(req->port, subsysnqn); 1432 + args->status = NVME_SC_CONNECT_INVALID_PARAM | NVME_STATUS_DNR; 1433 + subsys = nvmet_find_get_subsys(args->port, args->subsysnqn); 1536 1434 if (!subsys) { 1537 1435 pr_warn("connect request for invalid subsystem %s!\n", 1538 - subsysnqn); 1539 - req->cqe->result.u32 = IPO_IATTR_CONNECT_DATA(subsysnqn); 1540 - req->error_loc = offsetof(struct nvme_common_command, dptr); 1541 - goto out; 1436 + args->subsysnqn); 1437 + args->result = IPO_IATTR_CONNECT_DATA(subsysnqn); 1438 + args->error_loc = offsetof(struct nvme_common_command, dptr); 1439 + return NULL; 1542 1440 } 1543 1441 1544 1442 down_read(&nvmet_config_sem); 1545 - if (!nvmet_host_allowed(subsys, hostnqn)) { 1443 + if (!nvmet_host_allowed(subsys, args->hostnqn)) { 1546 1444 pr_info("connect by host %s for subsystem %s not allowed\n", 1547 - hostnqn, subsysnqn); 1548 - req->cqe->result.u32 = IPO_IATTR_CONNECT_DATA(hostnqn); 1445 + args->hostnqn, args->subsysnqn); 1446 + args->result = IPO_IATTR_CONNECT_DATA(hostnqn); 1549 1447 up_read(&nvmet_config_sem); 1550 - status = NVME_SC_CONNECT_INVALID_HOST | NVME_STATUS_DNR; 1551 - req->error_loc = offsetof(struct nvme_common_command, dptr); 1448 + args->status = NVME_SC_CONNECT_INVALID_HOST | NVME_STATUS_DNR; 1449 + args->error_loc = offsetof(struct nvme_common_command, dptr); 1552 1450 goto out_put_subsystem; 1553 1451 } 1554 1452 up_read(&nvmet_config_sem); 
1555 1453 1556 - status = NVME_SC_INTERNAL; 1454 + args->status = NVME_SC_INTERNAL; 1557 1455 ctrl = kzalloc(sizeof(*ctrl), GFP_KERNEL); 1558 1456 if (!ctrl) 1559 1457 goto out_put_subsystem; 1560 1458 mutex_init(&ctrl->lock); 1561 1459 1562 - ctrl->port = req->port; 1563 - ctrl->ops = req->ops; 1460 + ctrl->port = args->port; 1461 + ctrl->ops = args->ops; 1564 1462 1565 1463 #ifdef CONFIG_NVME_TARGET_PASSTHRU 1566 1464 /* By default, set loop targets to clear IDS by default */ ··· 1573 1473 INIT_WORK(&ctrl->fatal_err_work, nvmet_fatal_error_handler); 1574 1474 INIT_DELAYED_WORK(&ctrl->ka_work, nvmet_keep_alive_timer); 1575 1475 1576 - memcpy(ctrl->subsysnqn, subsysnqn, NVMF_NQN_SIZE); 1577 - memcpy(ctrl->hostnqn, hostnqn, NVMF_NQN_SIZE); 1476 + memcpy(ctrl->subsysnqn, args->subsysnqn, NVMF_NQN_SIZE); 1477 + memcpy(ctrl->hostnqn, args->hostnqn, NVMF_NQN_SIZE); 1578 1478 1579 1479 kref_init(&ctrl->ref); 1580 1480 ctrl->subsys = subsys; ··· 1597 1497 subsys->cntlid_min, subsys->cntlid_max, 1598 1498 GFP_KERNEL); 1599 1499 if (ret < 0) { 1600 - status = NVME_SC_CONNECT_CTRL_BUSY | NVME_STATUS_DNR; 1500 + args->status = NVME_SC_CONNECT_CTRL_BUSY | NVME_STATUS_DNR; 1601 1501 goto out_free_sqs; 1602 1502 } 1603 1503 ctrl->cntlid = ret; 1604 1504 1605 - uuid_copy(&ctrl->hostid, hostid); 1505 + uuid_copy(&ctrl->hostid, args->hostid); 1606 1506 1607 1507 /* 1608 1508 * Discovery controllers may use some arbitrary high value ··· 1624 1524 if (ret) 1625 1525 goto init_pr_fail; 1626 1526 list_add_tail(&ctrl->subsys_entry, &subsys->ctrls); 1627 - nvmet_setup_p2p_ns_map(ctrl, req); 1527 + nvmet_setup_p2p_ns_map(ctrl, args->p2p_client); 1628 1528 nvmet_debugfs_ctrl_setup(ctrl); 1629 1529 mutex_unlock(&subsys->lock); 1630 1530 1631 - *ctrlp = ctrl; 1632 - return 0; 1531 + if (args->hostid) 1532 + uuid_copy(&ctrl->hostid, args->hostid); 1533 + 1534 + dhchap_status = nvmet_setup_auth(ctrl); 1535 + if (dhchap_status) { 1536 + pr_err("Failed to setup authentication, dhchap status 
%u\n", 1537 + dhchap_status); 1538 + nvmet_ctrl_put(ctrl); 1539 + if (dhchap_status == NVME_AUTH_DHCHAP_FAILURE_FAILED) 1540 + args->status = 1541 + NVME_SC_CONNECT_INVALID_HOST | NVME_STATUS_DNR; 1542 + else 1543 + args->status = NVME_SC_INTERNAL; 1544 + return NULL; 1545 + } 1546 + 1547 + args->status = NVME_SC_SUCCESS; 1548 + 1549 + pr_info("Created %s controller %d for subsystem %s for NQN %s%s%s.\n", 1550 + nvmet_is_disc_subsys(ctrl->subsys) ? "discovery" : "nvm", 1551 + ctrl->cntlid, ctrl->subsys->subsysnqn, ctrl->hostnqn, 1552 + ctrl->pi_support ? " T10-PI is enabled" : "", 1553 + nvmet_has_auth(ctrl) ? " with DH-HMAC-CHAP" : ""); 1554 + 1555 + return ctrl; 1633 1556 1634 1557 init_pr_fail: 1635 1558 mutex_unlock(&subsys->lock); ··· 1666 1543 kfree(ctrl); 1667 1544 out_put_subsystem: 1668 1545 nvmet_subsys_put(subsys); 1669 - out: 1670 - return status; 1546 + return NULL; 1671 1547 } 1548 + EXPORT_SYMBOL_GPL(nvmet_alloc_ctrl); 1672 1549 1673 1550 static void nvmet_ctrl_free(struct kref *ref) 1674 1551 { ··· 1704 1581 { 1705 1582 kref_put(&ctrl->ref, nvmet_ctrl_free); 1706 1583 } 1584 + EXPORT_SYMBOL_GPL(nvmet_ctrl_put); 1707 1585 1708 1586 void nvmet_ctrl_fatal_error(struct nvmet_ctrl *ctrl) 1709 1587 {
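Note on nvmet_check_sqid() and nvmet_check_cqid() above: both reject a queue ID above max_qid. The SQ check then requires the slot to be free on creation and occupied on deletion, while the CQ check rejects any ID that still has an SQ attached, because the target does not support several SQs sharing one CQ. The sketch below models that decision table in userspace. All names (`ctrl_model`, `check_sqid`, `check_cqid`, `example_ctrl`, the `QID_*` values) are hypothetical, and the kernel's NVMe status codes are reduced to a small enum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplification of the NVMe status codes used in the diff. */
enum qid_status { QID_OK = 0, QID_INTERNAL, QID_INVALID };

/* Hypothetical model of the controller state the checks look at. */
struct ctrl_model {
	const bool *sq_present;	/* sq_present[i] is true if SQ i exists */
	uint16_t max_qid;
};

static enum qid_status check_sqid(const struct ctrl_model *c, uint16_t sqid,
				  bool create)
{
	if (!c->sq_present)
		return QID_INTERNAL;
	if (sqid > c->max_qid)
		return QID_INVALID;
	/* Creation needs a free slot, deletion needs an existing SQ. */
	if ((create && c->sq_present[sqid]) ||
	    (!create && !c->sq_present[sqid]))
		return QID_INVALID;
	return QID_OK;
}

static enum qid_status check_cqid(const struct ctrl_model *c, uint16_t cqid)
{
	if (!c->sq_present)
		return QID_INTERNAL;
	if (cqid > c->max_qid)
		return QID_INVALID;
	/* A CQ ID is unusable while an SQ with the same ID still exists. */
	if (c->sq_present[cqid])
		return QID_INVALID;
	return QID_OK;
}

/* Example state: max_qid is 3 and only SQ 1 currently exists. */
static const bool example_sqs[4] = { false, true, false, false };
static const struct ctrl_model example_ctrl = { example_sqs, 3 };
```

In the example state, creating SQ 1 again or deleting the missing SQ 2 both fail, and CQ 1 cannot be created or deleted until SQ 1 is gone.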
+17
drivers/nvme/target/discovery.c
··· 224 224 } 225 225 226 226 list_for_each_entry(r, &req->port->referrals, entry) { 227 + if (r->disc_addr.trtype == NVMF_TRTYPE_PCI) 228 + continue; 229 + 227 230 nvmet_format_discovery_entry(hdr, r, 228 231 NVME_DISC_SUBSYS_NAME, 229 232 r->disc_addr.traddr, ··· 353 350 } 354 351 355 352 nvmet_req_complete(req, stat); 353 + } 354 + 355 + u32 nvmet_discovery_cmd_data_len(struct nvmet_req *req) 356 + { 357 + struct nvme_command *cmd = req->cmd; 358 + 359 + switch (cmd->common.opcode) { 360 + case nvme_admin_get_log_page: 361 + return nvmet_get_log_page_len(req->cmd); 362 + case nvme_admin_identify: 363 + return NVME_IDENTIFY_DATA_SIZE; 364 + default: 365 + return 0; 366 + } 356 367 } 357 368 358 369 u16 nvmet_parse_discovery_cmd(struct nvmet_req *req)
+12 -2
drivers/nvme/target/fabrics-cmd-auth.c
··· 179 179 return data->rescode_exp; 180 180 } 181 181 182 + u32 nvmet_auth_send_data_len(struct nvmet_req *req) 183 + { 184 + return le32_to_cpu(req->cmd->auth_send.tl); 185 + } 186 + 182 187 void nvmet_execute_auth_send(struct nvmet_req *req) 183 188 { 184 189 struct nvmet_ctrl *ctrl = req->sq->ctrl; ··· 211 206 offsetof(struct nvmf_auth_send_command, spsp1); 212 207 goto done; 213 208 } 214 - tl = le32_to_cpu(req->cmd->auth_send.tl); 209 + tl = nvmet_auth_send_data_len(req); 215 210 if (!tl) { 216 211 status = NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; 217 212 req->error_loc = ··· 434 429 data->rescode_exp = req->sq->dhchap_status; 435 430 } 436 431 432 + u32 nvmet_auth_receive_data_len(struct nvmet_req *req) 433 + { 434 + return le32_to_cpu(req->cmd->auth_receive.al); 435 + } 436 + 437 437 void nvmet_execute_auth_receive(struct nvmet_req *req) 438 438 { 439 439 struct nvmet_ctrl *ctrl = req->sq->ctrl; ··· 464 454 offsetof(struct nvmf_auth_receive_command, spsp1); 465 455 goto done; 466 456 } 467 - al = le32_to_cpu(req->cmd->auth_receive.al); 457 + al = nvmet_auth_receive_data_len(req); 468 458 if (!al) { 469 459 status = NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; 470 460 req->error_loc =
+70 -33
drivers/nvme/target/fabrics-cmd.c
···
 	nvmet_req_complete(req, status);
 }

+u32 nvmet_fabrics_admin_cmd_data_len(struct nvmet_req *req)
+{
+	struct nvme_command *cmd = req->cmd;
+
+	switch (cmd->fabrics.fctype) {
+#ifdef CONFIG_NVME_TARGET_AUTH
+	case nvme_fabrics_type_auth_send:
+		return nvmet_auth_send_data_len(req);
+	case nvme_fabrics_type_auth_receive:
+		return nvmet_auth_receive_data_len(req);
+#endif
+	default:
+		return 0;
+	}
+}
+
 u16 nvmet_parse_fabrics_admin_cmd(struct nvmet_req *req)
 {
 	struct nvme_command *cmd = req->cmd;
···
 	}

 	return 0;
+}
+
+u32 nvmet_fabrics_io_cmd_data_len(struct nvmet_req *req)
+{
+	struct nvme_command *cmd = req->cmd;
+
+	switch (cmd->fabrics.fctype) {
+#ifdef CONFIG_NVME_TARGET_AUTH
+	case nvme_fabrics_type_auth_send:
+		return nvmet_auth_send_data_len(req);
+	case nvme_fabrics_type_auth_receive:
+		return nvmet_auth_receive_data_len(req);
+#endif
+	default:
+		return 0;
+	}
 }

 u16 nvmet_parse_fabrics_io_cmd(struct nvmet_req *req)
···
 	struct nvmf_connect_command *c = &req->cmd->connect;
 	struct nvmf_connect_data *d;
 	struct nvmet_ctrl *ctrl = NULL;
-	u16 status;
-	u8 dhchap_status;
+	struct nvmet_alloc_ctrl_args args = {
+		.port = req->port,
+		.ops = req->ops,
+		.p2p_client = req->p2p_client,
+		.kato = le32_to_cpu(c->kato),
+	};

 	if (!nvmet_check_transfer_len(req, sizeof(struct nvmf_connect_data)))
 		return;

 	d = kmalloc(sizeof(*d), GFP_KERNEL);
 	if (!d) {
-		status = NVME_SC_INTERNAL;
+		args.status = NVME_SC_INTERNAL;
 		goto complete;
 	}

-	status = nvmet_copy_from_sgl(req, 0, d, sizeof(*d));
-	if (status)
+	args.status = nvmet_copy_from_sgl(req, 0, d, sizeof(*d));
+	if (args.status)
 		goto out;

 	if (c->recfmt != 0) {
 		pr_warn("invalid connect version (%d).\n",
 			le16_to_cpu(c->recfmt));
-		req->error_loc = offsetof(struct nvmf_connect_command, recfmt);
-		status = NVME_SC_CONNECT_FORMAT | NVME_STATUS_DNR;
+		args.error_loc = offsetof(struct nvmf_connect_command, recfmt);
+		args.status = NVME_SC_CONNECT_FORMAT | NVME_STATUS_DNR;
 		goto out;
 	}

 	if (unlikely(d->cntlid != cpu_to_le16(0xffff))) {
 		pr_warn("connect attempt for invalid controller ID %#x\n",
 			d->cntlid);
-		status = NVME_SC_CONNECT_INVALID_PARAM | NVME_STATUS_DNR;
-		req->cqe->result.u32 = IPO_IATTR_CONNECT_DATA(cntlid);
+		args.status = NVME_SC_CONNECT_INVALID_PARAM | NVME_STATUS_DNR;
+		args.result = IPO_IATTR_CONNECT_DATA(cntlid);
 		goto out;
 	}

 	d->subsysnqn[NVMF_NQN_FIELD_LEN - 1] = '\0';
 	d->hostnqn[NVMF_NQN_FIELD_LEN - 1] = '\0';
-	status = nvmet_alloc_ctrl(d->subsysnqn, d->hostnqn, req,
-				  le32_to_cpu(c->kato), &ctrl, &d->hostid);
-	if (status)
+
+	args.subsysnqn = d->subsysnqn;
+	args.hostnqn = d->hostnqn;
+	args.hostid = &d->hostid;
+	args.kato = c->kato;
+
+	ctrl = nvmet_alloc_ctrl(&args);
+	if (!ctrl)
 		goto out;

-	dhchap_status = nvmet_setup_auth(ctrl);
-	if (dhchap_status) {
-		pr_err("Failed to setup authentication, dhchap status %u\n",
-		       dhchap_status);
-		nvmet_ctrl_put(ctrl);
-		if (dhchap_status == NVME_AUTH_DHCHAP_FAILURE_FAILED)
-			status = (NVME_SC_CONNECT_INVALID_HOST | NVME_STATUS_DNR);
-		else
-			status = NVME_SC_INTERNAL;
-		goto out;
-	}
-
-	status = nvmet_install_queue(ctrl, req);
-	if (status) {
+	args.status = nvmet_install_queue(ctrl, req);
+	if (args.status) {
 		nvmet_ctrl_put(ctrl);
 		goto out;
 	}

-	pr_info("creating %s controller %d for subsystem %s for NQN %s%s%s.\n",
-		nvmet_is_disc_subsys(ctrl->subsys) ? "discovery" : "nvm",
-		ctrl->cntlid, ctrl->subsys->subsysnqn, ctrl->hostnqn,
-		ctrl->pi_support ? " T10-PI is enabled" : "",
-		nvmet_has_auth(ctrl) ? " with DH-HMAC-CHAP" : "");
-	req->cqe->result.u32 = cpu_to_le32(nvmet_connect_result(ctrl));
+	args.result = cpu_to_le32(nvmet_connect_result(ctrl));
 out:
 	kfree(d);
 complete:
-	nvmet_req_complete(req, status);
+	req->error_loc = args.error_loc;
+	req->cqe->result.u32 = args.result;
+	nvmet_req_complete(req, args.status);
 }

 static void nvmet_execute_io_connect(struct nvmet_req *req)
···
 out_ctrl_put:
 	nvmet_ctrl_put(ctrl);
 	goto out;
+}
+
+u32 nvmet_connect_cmd_data_len(struct nvmet_req *req)
+{
+	struct nvme_command *cmd = req->cmd;
+
+	if (!nvme_is_fabrics(cmd) ||
+	    cmd->fabrics.fctype != nvme_fabrics_type_connect)
+		return 0;
+
+	return sizeof(struct nvmf_connect_data);
 }

 u16 nvmet_parse_connect_cmd(struct nvmet_req *req)
+3
drivers/nvme/target/io-cmd-bdev.c
···
 		iter_flags = SG_MITER_FROM_SG;
 	}

+	if (req->cmd->rw.control & NVME_RW_LR)
+		opf |= REQ_FAILFAST_DEV;
+
 	if (is_pci_p2pdma_page(sg_page(req->sg)))
 		opf |= REQ_NOMERGE;

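Note on the hunk above: it maps the NVMe "limited retry" bit of a read/write command to a fail-fast block request flag. The sketch below shows the same mapping in plain userspace C. It assumes Limited Retry is bit 15 of the 16-bit control field, and it decodes that field from its little-endian on-wire bytes before testing the bit. `OPF_FAILFAST_DEV` and both function names are hypothetical stand-ins, not the kernel's symbols or flag values.

```c
#include <stdint.h>

/* Limited Retry: bit 15 of the 16-bit read/write control field. */
#define RW_LR (1u << 15)

/* Hypothetical stand-in for the block layer's fail-fast request flag. */
#define OPF_FAILFAST_DEV (1u << 0)

/* Decode a little-endian 16-bit field from its two on-wire bytes. */
static uint16_t le16_decode(const uint8_t b[2])
{
	return (uint16_t)(b[0] | (b[1] << 8));
}

/* Return the request flags, with fail-fast added when LR is set. */
static unsigned int apply_limited_retry(unsigned int opf,
					const uint8_t control_le[2])
{
	if (le16_decode(control_le) & RW_LR)
		opf |= OPF_FAILFAST_DEV;
	return opf;
}
```

Decoding by bytes keeps the sketch independent of host byte order, which matters because the command field is stored little-endian.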
+107 -3
drivers/nvme/target/nvmet.h
···
 	struct nvmet_subsys *subsys;
 	struct nvmet_sq **sqs;

+	void *drvdata;
+
 	bool reset_tbkas;

 	struct mutex lock;
···
 	struct config_group namespaces_group;
 	struct config_group allowed_hosts_group;

+	u16 vendor_id;
+	u16 subsys_vendor_id;
 	char *model_number;
 	u32 ieee_oui;
 	char *firmware_rev;
···
 	void (*discovery_chg)(struct nvmet_port *port);
 	u8 (*get_mdts)(const struct nvmet_ctrl *ctrl);
 	u16 (*get_max_queue_size)(const struct nvmet_ctrl *ctrl);
+
+	/* Operations mandatory for PCI target controllers */
+	u16 (*create_sq)(struct nvmet_ctrl *ctrl, u16 sqid, u16 flags,
+			 u16 qsize, u64 prp1);
+	u16 (*delete_sq)(struct nvmet_ctrl *ctrl, u16 sqid);
+	u16 (*create_cq)(struct nvmet_ctrl *ctrl, u16 cqid, u16 flags,
+			 u16 qsize, u64 prp1, u16 irq_vector);
+	u16 (*delete_cq)(struct nvmet_ctrl *ctrl, u16 cqid);
+	u16 (*set_feature)(const struct nvmet_ctrl *ctrl, u8 feat,
+			   void *feat_data);
+	u16 (*get_feature)(const struct nvmet_ctrl *ctrl, u8 feat,
+			   void *feat_data);
 };

 #define NVMET_MAX_INLINE_BIOVEC 8
···
 void nvmet_stop_keep_alive_timer(struct nvmet_ctrl *ctrl);

 u16 nvmet_parse_connect_cmd(struct nvmet_req *req);
+u32 nvmet_connect_cmd_data_len(struct nvmet_req *req);
 void nvmet_bdev_set_limits(struct block_device *bdev, struct nvme_id_ns *id);
 u16 nvmet_bdev_parse_io_cmd(struct nvmet_req *req);
 u16 nvmet_file_parse_io_cmd(struct nvmet_req *req);
 u16 nvmet_bdev_zns_parse_io_cmd(struct nvmet_req *req);
+u32 nvmet_admin_cmd_data_len(struct nvmet_req *req);
 u16 nvmet_parse_admin_cmd(struct nvmet_req *req);
+u32 nvmet_discovery_cmd_data_len(struct nvmet_req *req);
 u16 nvmet_parse_discovery_cmd(struct nvmet_req *req);
 u16 nvmet_parse_fabrics_admin_cmd(struct nvmet_req *req);
+u32 nvmet_fabrics_admin_cmd_data_len(struct nvmet_req *req);
 u16 nvmet_parse_fabrics_io_cmd(struct nvmet_req *req);
+u32 nvmet_fabrics_io_cmd_data_len(struct nvmet_req *req);

 bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
 		struct nvmet_sq *sq, const struct nvmet_fabrics_ops *ops);
 void nvmet_req_uninit(struct nvmet_req *req);
+size_t nvmet_req_transfer_len(struct nvmet_req *req);
 bool nvmet_check_transfer_len(struct nvmet_req *req, size_t len);
 bool nvmet_check_data_len_lte(struct nvmet_req *req, size_t data_len);
 void nvmet_req_complete(struct nvmet_req *req, u16 status);
···
 void nvmet_execute_get_features(struct nvmet_req *req);
 void nvmet_execute_keep_alive(struct nvmet_req *req);

+u16 nvmet_check_cqid(struct nvmet_ctrl *ctrl, u16 cqid);
 void nvmet_cq_setup(struct nvmet_ctrl *ctrl, struct nvmet_cq *cq, u16 qid,
 		u16 size);
+u16 nvmet_cq_create(struct nvmet_ctrl *ctrl, struct nvmet_cq *cq, u16 qid,
+		u16 size);
+u16 nvmet_check_sqid(struct nvmet_ctrl *ctrl, u16 sqid, bool create);
 void nvmet_sq_setup(struct nvmet_ctrl *ctrl, struct nvmet_sq *sq, u16 qid,
+		u16 size);
+u16 nvmet_sq_create(struct nvmet_ctrl *ctrl, struct nvmet_sq *sq, u16 qid,
 		u16 size);
 void nvmet_sq_destroy(struct nvmet_sq *sq);
 int nvmet_sq_init(struct nvmet_sq *sq);
···
 void nvmet_ctrl_fatal_error(struct nvmet_ctrl *ctrl);

 void nvmet_update_cc(struct nvmet_ctrl *ctrl, u32 new);
-u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn,
-		struct nvmet_req *req, u32 kato, struct nvmet_ctrl **ctrlp,
-		uuid_t *hostid);
+
+struct nvmet_alloc_ctrl_args {
+	struct nvmet_port *port;
+	char *subsysnqn;
+	char *hostnqn;
+	uuid_t *hostid;
+	const struct nvmet_fabrics_ops *ops;
+	struct device *p2p_client;
+	u32 kato;
+	u32 result;
+	u16 error_loc;
+	u16 status;
+};
+
+struct nvmet_ctrl *nvmet_alloc_ctrl(struct nvmet_alloc_ctrl_args *args);
 struct nvmet_ctrl *nvmet_ctrl_find_get(const char *subsysnqn,
 				       const char *hostnqn, u16 cntlid,
 				       struct nvmet_req *req);
···
 	return subsys->type != NVME_NQN_NVME;
 }

+static inline bool nvmet_is_pci_ctrl(struct nvmet_ctrl *ctrl)
+{
+	return ctrl->port->disc_addr.trtype == NVMF_TRTYPE_PCI;
+}
+
 #ifdef CONFIG_NVME_TARGET_PASSTHRU
 void nvmet_passthru_subsys_free(struct nvmet_subsys *subsys);
 int nvmet_passthru_ctrl_enable(struct nvmet_subsys *subsys);
···
 u16 errno_to_nvme_status(struct nvmet_req *req, int errno);
 u16 nvmet_report_invalid_opcode(struct nvmet_req *req);

+static inline bool nvmet_cc_en(u32 cc)
+{
+	return (cc >> NVME_CC_EN_SHIFT) & 0x1;
+}
+
+static inline u8 nvmet_cc_css(u32 cc)
+{
+	return (cc >> NVME_CC_CSS_SHIFT) & 0x7;
+}
+
+static inline u8 nvmet_cc_mps(u32 cc)
+{
+	return (cc >> NVME_CC_MPS_SHIFT) & 0xf;
+}
+
+static inline u8 nvmet_cc_ams(u32 cc)
+{
+	return (cc >> NVME_CC_AMS_SHIFT) & 0x7;
+}
+
+static inline u8 nvmet_cc_shn(u32 cc)
+{
+	return (cc >> NVME_CC_SHN_SHIFT) & 0x3;
+}
+
+static inline u8 nvmet_cc_iosqes(u32 cc)
+{
+	return (cc >> NVME_CC_IOSQES_SHIFT) & 0xf;
+}
+
+static inline u8 nvmet_cc_iocqes(u32 cc)
+{
+	return (cc >> NVME_CC_IOCQES_SHIFT) & 0xf;
+}
+
 /* Convert a 32-bit number to a 16-bit 0's based number */
 static inline __le16 to0based(u32 a)
 {
···
 }

 #ifdef CONFIG_NVME_TARGET_AUTH
+u32 nvmet_auth_send_data_len(struct nvmet_req *req);
 void nvmet_execute_auth_send(struct nvmet_req *req);
+u32 nvmet_auth_receive_data_len(struct nvmet_req *req);
 void nvmet_execute_auth_receive(struct nvmet_req *req);
 int nvmet_auth_set_key(struct nvmet_host *host, const char *secret,
 		       bool set_ctrl);
···
 {
 	percpu_ref_put(&pc_ref->ref);
 }
+
+/*
+ * Data for the get_feature() and set_feature() operations of PCI target
+ * controllers.
+ */
+struct nvmet_feat_irq_coalesce {
+	u8 thr;
+	u8 time;
+};
+
+struct nvmet_feat_irq_config {
+	u16 iv;
+	bool cd;
+};
+
+struct nvmet_feat_arbitration {
+	u8 hpw;
+	u8 mpw;
+	u8 lpw;
+	u8 ab;
+};
+
 #endif /* _NVMET_H */
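Note on the nvmet_cc_xxx() helpers above: each one extracts a single field of the controller configuration (CC) register with a shift and a mask. The sketch below repeats that decoding in userspace for a sample register value. The shift constants are assumptions taken from the CC layout in the NVMe base specification and should be checked against linux/nvme.h. The names `cc_page_size()` and `cc_sq_entry_size()` are hypothetical helpers that do not appear in the diff.

```c
#include <stdint.h>

/* Assumed field positions in the NVMe CC register. */
#define CC_EN_SHIFT	0
#define CC_MPS_SHIFT	7
#define CC_IOSQES_SHIFT	16
#define CC_IOCQES_SHIFT	20

/* Enable bit. */
static inline unsigned int cc_en(uint32_t cc)
{
	return (cc >> CC_EN_SHIFT) & 0x1;
}

/* Memory page size field (4 bits). */
static inline unsigned int cc_mps(uint32_t cc)
{
	return (cc >> CC_MPS_SHIFT) & 0xf;
}

/* I/O submission queue entry size field (4 bits, log2 of the size). */
static inline unsigned int cc_iosqes(uint32_t cc)
{
	return (cc >> CC_IOSQES_SHIFT) & 0xf;
}

/* I/O completion queue entry size field (4 bits, log2 of the size). */
static inline unsigned int cc_iocqes(uint32_t cc)
{
	return (cc >> CC_IOCQES_SHIFT) & 0xf;
}

/* MPS encodes the memory page size as 2^(12 + MPS) bytes. */
static inline uint64_t cc_page_size(uint32_t cc)
{
	return 1ULL << (12 + cc_mps(cc));
}

/* Submission queue entry size in bytes. */
static inline uint32_t cc_sq_entry_size(uint32_t cc)
{
	return 1u << cc_iosqes(cc);
}
```

For a register value of 0x00460001 this decodes to an enabled controller with 4096-byte pages, 64-byte submission queue entries (IOSQES = 6) and 16-byte completion queue entries (IOCQES = 4).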
+2591
drivers/nvme/target/pci-epf.c
···
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * NVMe PCI Endpoint Function target driver.
+ *
+ * Copyright (c) 2024, Western Digital Corporation or its affiliates.
+ * Copyright (c) 2024, Rick Wertenbroek <rick.wertenbroek@gmail.com>
+ * REDS Institute, HEIG-VD, HES-SO, Switzerland
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/delay.h>
+#include <linux/dmaengine.h>
+#include <linux/io.h>
+#include <linux/mempool.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/nvme.h>
+#include <linux/pci_ids.h>
+#include <linux/pci-epc.h>
+#include <linux/pci-epf.h>
+#include <linux/pci_regs.h>
+#include <linux/slab.h>
+
+#include "nvmet.h"
+
+static LIST_HEAD(nvmet_pci_epf_ports);
+static DEFINE_MUTEX(nvmet_pci_epf_ports_mutex);
+
+/*
+ * Default and maximum allowed data transfer size. For the default,
+ * allow up to 128 page-sized segments. For the maximum allowed,
+ * use 4 times the default (which is completely arbitrary).
+ */
+#define NVMET_PCI_EPF_MAX_SEGS 128
+#define NVMET_PCI_EPF_MDTS_KB \
+	(NVMET_PCI_EPF_MAX_SEGS << (PAGE_SHIFT - 10))
+#define NVMET_PCI_EPF_MAX_MDTS_KB (NVMET_PCI_EPF_MDTS_KB * 4)
+
+/*
+ * IRQ vector coalescing threshold: by default, post 8 CQEs before raising an
+ * interrupt vector to the host. This default 8 is completely arbitrary and can
+ * be changed by the host with a nvme_set_features command.
+ */
+#define NVMET_PCI_EPF_IV_THRESHOLD 8
+
+/*
+ * BAR CC register and SQ polling intervals.
+ */
+#define NVMET_PCI_EPF_CC_POLL_INTERVAL msecs_to_jiffies(5)
+#define NVMET_PCI_EPF_SQ_POLL_INTERVAL msecs_to_jiffies(5)
+#define NVMET_PCI_EPF_SQ_POLL_IDLE msecs_to_jiffies(5000)
+
+/*
+ * SQ arbitration burst default: fetch at most 8 commands at a time from an SQ.
+ */
+#define NVMET_PCI_EPF_SQ_AB 8
+
+/*
+ * Handling of CQs is normally immediate, unless we fail to map a CQ or the CQ
+ * is full, in which case we retry the CQ processing after this interval.
+ */
+#define NVMET_PCI_EPF_CQ_RETRY_INTERVAL msecs_to_jiffies(1)
+
+enum nvmet_pci_epf_queue_flags {
+	NVMET_PCI_EPF_Q_IS_SQ = 0, /* The queue is a submission queue */
+	NVMET_PCI_EPF_Q_LIVE, /* The queue is live */
+	NVMET_PCI_EPF_Q_IRQ_ENABLED, /* IRQ is enabled for this queue */
+};
+
+/*
+ * IRQ vector descriptor.
+ */
+struct nvmet_pci_epf_irq_vector {
+	unsigned int vector;
+	unsigned int ref;
+	bool cd;
+	int nr_irqs;
+};
+
+struct nvmet_pci_epf_queue {
+	union {
+		struct nvmet_sq nvme_sq;
+		struct nvmet_cq nvme_cq;
+	};
+	struct nvmet_pci_epf_ctrl *ctrl;
+	unsigned long flags;
+
+	u64 pci_addr;
+	size_t pci_size;
+	struct pci_epc_map pci_map;
+
+	u16 qid;
+	u16 depth;
+	u16 vector;
+	u16 head;
+	u16 tail;
+	u16 phase;
+	u32 db;
+
+	size_t qes;
+
+	struct nvmet_pci_epf_irq_vector *iv;
+	struct workqueue_struct *iod_wq;
+	struct delayed_work work;
+	spinlock_t lock;
+	struct list_head list;
+};
+
+/*
+ * PCI Root Complex (RC) address data segment for mapping an admin or
+ * I/O command buffer @buf of @length bytes to the PCI address @pci_addr.
+ */
+struct nvmet_pci_epf_segment {
+	void *buf;
+	u64 pci_addr;
+	u32 length;
+};
+
+/*
+ * Command descriptors.
+ */
+struct nvmet_pci_epf_iod {
+	struct list_head link;
+
+	struct nvmet_req req;
+	struct nvme_command cmd;
+	struct nvme_completion cqe;
+	unsigned int status;
+
+	struct nvmet_pci_epf_ctrl *ctrl;
+
+	struct nvmet_pci_epf_queue *sq;
+	struct nvmet_pci_epf_queue *cq;
+
+	/* Data transfer size and direction for the command. */
+	size_t data_len;
+	enum dma_data_direction dma_dir;
+
+	/*
+	 * PCI Root Complex (RC) address data segments: if nr_data_segs is 1, we
+	 * use only @data_seg. Otherwise, the array of segments @data_segs is
+	 * allocated to manage multiple PCI address data segments. @data_sgl and
+	 * @data_sgt are used to setup the command request for execution by the
+	 * target core.
+	 */
+	unsigned int nr_data_segs;
+	struct nvmet_pci_epf_segment data_seg;
+	struct nvmet_pci_epf_segment *data_segs;
+	struct scatterlist data_sgl;
+	struct sg_table data_sgt;
+
+	struct work_struct work;
+	struct completion done;
+};
+
+/*
+ * PCI target controller private data.
+ */
+struct nvmet_pci_epf_ctrl {
+	struct nvmet_pci_epf *nvme_epf;
+	struct nvmet_port *port;
+	struct nvmet_ctrl *tctrl;
+	struct device *dev;
+
+	unsigned int nr_queues;
+	struct nvmet_pci_epf_queue *sq;
+	struct nvmet_pci_epf_queue *cq;
+	unsigned int sq_ab;
+
+	mempool_t iod_pool;
+	void *bar;
+	u64 cap;
+	u32 cc;
+	u32 csts;
+
+	size_t io_sqes;
+	size_t io_cqes;
+
+	size_t mps_shift;
+	size_t mps;
+	size_t mps_mask;
+
+	unsigned int mdts;
+
+	struct delayed_work poll_cc;
+	struct delayed_work poll_sqs;
+
+	struct mutex irq_lock;
+	struct nvmet_pci_epf_irq_vector *irq_vectors;
+	unsigned int irq_vector_threshold;
+
+	bool link_up;
+	bool enabled;
+};
+
+/*
+ * PCI EPF driver private data.
+ */
+struct nvmet_pci_epf {
+	struct pci_epf *epf;
+
+	const struct pci_epc_features *epc_features;
+
+	void *reg_bar;
+	size_t msix_table_offset;
+
+	unsigned int irq_type;
+	unsigned int nr_vectors;
+
+	struct nvmet_pci_epf_ctrl ctrl;
+
+	bool dma_enabled;
+	struct dma_chan *dma_tx_chan;
+	struct mutex dma_tx_lock;
+	struct dma_chan *dma_rx_chan;
+	struct mutex dma_rx_lock;
+
+	struct mutex mmio_lock;
+
+	/* PCI endpoint function configfs attributes. */
+	struct config_group group;
+	__le16 portid;
+	char subsysnqn[NVMF_NQN_SIZE];
+	unsigned int mdts_kb;
+};
+
+static inline u32 nvmet_pci_epf_bar_read32(struct nvmet_pci_epf_ctrl *ctrl,
+					   u32 off)
+{
+	__le32 *bar_reg = ctrl->bar + off;
+
+	return le32_to_cpu(READ_ONCE(*bar_reg));
+}
+
+static inline void nvmet_pci_epf_bar_write32(struct nvmet_pci_epf_ctrl *ctrl,
+					     u32 off, u32 val)
+{
+	__le32 *bar_reg = ctrl->bar + off;
+
+	WRITE_ONCE(*bar_reg, cpu_to_le32(val));
+}
+
+static inline u64 nvmet_pci_epf_bar_read64(struct nvmet_pci_epf_ctrl *ctrl,
+					   u32 off)
+{
+	return (u64)nvmet_pci_epf_bar_read32(ctrl, off) |
+		((u64)nvmet_pci_epf_bar_read32(ctrl, off + 4) << 32);
+}
+
+static inline void nvmet_pci_epf_bar_write64(struct nvmet_pci_epf_ctrl *ctrl,
+					     u32 off, u64 val)
+{
+	nvmet_pci_epf_bar_write32(ctrl, off, val & 0xFFFFFFFF);
+	nvmet_pci_epf_bar_write32(ctrl, off + 4, (val >> 32) & 0xFFFFFFFF);
+}
+
+static inline int nvmet_pci_epf_mem_map(struct nvmet_pci_epf *nvme_epf,
+		u64 pci_addr, size_t size, struct pci_epc_map *map)
+{
+	struct pci_epf *epf = nvme_epf->epf;
+
+	return pci_epc_mem_map(epf->epc, epf->func_no, epf->vfunc_no,
+			       pci_addr, size, map);
+}
+
+static inline void nvmet_pci_epf_mem_unmap(struct nvmet_pci_epf *nvme_epf,
+					   struct pci_epc_map *map)
+{
+	struct pci_epf *epf = nvme_epf->epf;
+
+	pci_epc_mem_unmap(epf->epc, epf->func_no, epf->vfunc_no, map);
+}
+
+struct nvmet_pci_epf_dma_filter {
+	struct device *dev;
+	u32 dma_mask;
+};
+
+static bool nvmet_pci_epf_dma_filter(struct dma_chan *chan, void *arg)
+{
+	struct nvmet_pci_epf_dma_filter *filter = arg;
+	struct dma_slave_caps caps;
+
+	memset(&caps, 0, sizeof(caps));
+	dma_get_slave_caps(chan, &caps);
+
+	return chan->device->dev == filter->dev &&
+		(filter->dma_mask & caps.directions);
+}
+
+static void nvmet_pci_epf_init_dma(struct nvmet_pci_epf *nvme_epf)
+{
+	struct pci_epf *epf = nvme_epf->epf;
+	struct device *dev = &epf->dev;
+	struct nvmet_pci_epf_dma_filter filter;
+	struct dma_chan *chan;
+	dma_cap_mask_t mask;
+
+	mutex_init(&nvme_epf->dma_rx_lock);
+	mutex_init(&nvme_epf->dma_tx_lock);
+
+	dma_cap_zero(mask);
+	dma_cap_set(DMA_SLAVE, mask);
+
+	filter.dev = epf->epc->dev.parent;
+	filter.dma_mask = BIT(DMA_DEV_TO_MEM);
+
+	chan = dma_request_channel(mask, nvmet_pci_epf_dma_filter, &filter);
+	if (!chan)
+		goto out_dma_no_rx;
+
+	nvme_epf->dma_rx_chan = chan;
+
+	filter.dma_mask = BIT(DMA_MEM_TO_DEV);
+	chan = dma_request_channel(mask, nvmet_pci_epf_dma_filter, &filter);
+	if (!chan)
+		goto out_dma_no_tx;
+
+	nvme_epf->dma_tx_chan = chan;
+
+	nvme_epf->dma_enabled = true;
+
+	dev_dbg(dev, "Using DMA RX channel %s, maximum segment size %u B\n",
+		dma_chan_name(chan),
+		dma_get_max_seg_size(dmaengine_get_dma_device(chan)));
+
+	dev_dbg(dev, "Using DMA TX channel %s, maximum segment size %u B\n",
+		dma_chan_name(chan),
+		dma_get_max_seg_size(dmaengine_get_dma_device(chan)));
+
+	return;
+
+out_dma_no_tx:
+	dma_release_channel(nvme_epf->dma_rx_chan);
+	nvme_epf->dma_rx_chan = NULL;
+
+out_dma_no_rx:
+	mutex_destroy(&nvme_epf->dma_rx_lock);
+	mutex_destroy(&nvme_epf->dma_tx_lock);
+	nvme_epf->dma_enabled = false;
+
+	dev_info(&epf->dev, "DMA not supported, falling back to MMIO\n");
+}
+
+static void nvmet_pci_epf_deinit_dma(struct nvmet_pci_epf *nvme_epf)
+{
+	if (!nvme_epf->dma_enabled)
+		return;
+
+	dma_release_channel(nvme_epf->dma_tx_chan);
+	nvme_epf->dma_tx_chan = NULL;
+	dma_release_channel(nvme_epf->dma_rx_chan);
+	nvme_epf->dma_rx_chan = NULL;
+	mutex_destroy(&nvme_epf->dma_rx_lock);
+	mutex_destroy(&nvme_epf->dma_tx_lock);
+	nvme_epf->dma_enabled = false;
+}
+
+static int nvmet_pci_epf_dma_transfer(struct nvmet_pci_epf *nvme_epf,
+		struct nvmet_pci_epf_segment *seg, enum dma_data_direction dir)
+{
+	struct pci_epf *epf = nvme_epf->epf;
+	struct dma_async_tx_descriptor *desc;
+	struct dma_slave_config sconf = {};
+	struct device *dev = &epf->dev;
+	struct device *dma_dev;
+	struct dma_chan *chan;
+	dma_cookie_t cookie;
+	dma_addr_t dma_addr;
+	struct mutex *lock;
+	int ret;
+
+	switch (dir) {
+	case DMA_FROM_DEVICE:
+		lock = &nvme_epf->dma_rx_lock;
+		chan = nvme_epf->dma_rx_chan;
+		sconf.direction = DMA_DEV_TO_MEM;
+		sconf.src_addr = seg->pci_addr;
+		break;
+	case DMA_TO_DEVICE:
+		lock = &nvme_epf->dma_tx_lock;
+		chan = nvme_epf->dma_tx_chan;
+		sconf.direction = DMA_MEM_TO_DEV;
+		sconf.dst_addr = seg->pci_addr;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	mutex_lock(lock);
+
+	dma_dev = dmaengine_get_dma_device(chan);
+	dma_addr = dma_map_single(dma_dev, seg->buf, seg->length, dir);
+	ret = dma_mapping_error(dma_dev, dma_addr);
+	if (ret)
+		goto unlock;
+
+	ret = dmaengine_slave_config(chan, &sconf);
+	if (ret) {
+		dev_err(dev, "Failed to configure DMA channel\n");
+		goto unmap;
+	}
+
+	desc = dmaengine_prep_slave_single(chan, dma_addr, seg->length,
+					   sconf.direction, DMA_CTRL_ACK);
+	if (!desc) {
+		dev_err(dev, "Failed to prepare DMA\n");
+		ret = -EIO;
+		goto unmap;
+	}
+
+	cookie = dmaengine_submit(desc);
+	ret = dma_submit_error(cookie);
+	if (ret) {
+		dev_err(dev, "Failed to do DMA submit (err=%d)\n", ret);
+		goto unmap;
+	}
+
+	if (dma_sync_wait(chan, cookie) != DMA_COMPLETE) {
+		dev_err(dev, "DMA transfer failed\n");
+		ret = -EIO;
+	}
+
+	dmaengine_terminate_sync(chan);
+
+unmap:
+	dma_unmap_single(dma_dev, dma_addr, seg->length, dir);
+
+unlock:
+	mutex_unlock(lock);
+
+	return ret;
+}
+
+static int nvmet_pci_epf_mmio_transfer(struct nvmet_pci_epf *nvme_epf,
+		struct nvmet_pci_epf_segment *seg, enum dma_data_direction dir)
+{
+	u64 pci_addr = seg->pci_addr;
+	u32 length = seg->length;
+	void *buf = seg->buf;
+	struct pci_epc_map map;
+	int ret = -EINVAL;
+
+	/*
+	 * Note: MMIO transfers do not need serialization but this is a
+	 * simple way to avoid using too many mapping windows.
+	 */
+	mutex_lock(&nvme_epf->mmio_lock);
+
+	while (length) {
+		ret = nvmet_pci_epf_mem_map(nvme_epf, pci_addr, length, &map);
+		if (ret)
+			break;
+
+		switch (dir) {
+		case DMA_FROM_DEVICE:
+			memcpy_fromio(buf, map.virt_addr, map.pci_size);
+			break;
+		case DMA_TO_DEVICE:
+			memcpy_toio(map.virt_addr, buf, map.pci_size);
+			break;
+		default:
+			ret = -EINVAL;
+			goto unlock;
+		}
+
+		pci_addr += map.pci_size;
+		buf += map.pci_size;
+		length -= map.pci_size;
+
+		nvmet_pci_epf_mem_unmap(nvme_epf, &map);
+	}
+
+unlock:
+	mutex_unlock(&nvme_epf->mmio_lock);
+
+	return ret;
+}
+
+static inline int nvmet_pci_epf_transfer_seg(struct nvmet_pci_epf *nvme_epf,
+		struct nvmet_pci_epf_segment *seg, enum dma_data_direction dir)
+{
+	if (nvme_epf->dma_enabled)
+		return nvmet_pci_epf_dma_transfer(nvme_epf, seg, dir);
+
+	return nvmet_pci_epf_mmio_transfer(nvme_epf, seg, dir);
+}
+
+static inline int nvmet_pci_epf_transfer(struct nvmet_pci_epf_ctrl *ctrl,
+					 void *buf, u64 pci_addr, u32 length,
+					 enum dma_data_direction dir)
+{
+	struct nvmet_pci_epf_segment seg = {
+		.buf = buf,
+		.pci_addr = pci_addr,
+		.length = length,
+	};
+
+	return nvmet_pci_epf_transfer_seg(ctrl->nvme_epf, &seg, dir);
+}
+
+static int nvmet_pci_epf_alloc_irq_vectors(struct nvmet_pci_epf_ctrl *ctrl)
+{
+	ctrl->irq_vectors = kcalloc(ctrl->nr_queues,
+				    sizeof(struct nvmet_pci_epf_irq_vector),
+				    GFP_KERNEL);
+	if (!ctrl->irq_vectors)
+		return -ENOMEM;
+
+	mutex_init(&ctrl->irq_lock);
+
+	return 0;
+}
+
+static void nvmet_pci_epf_free_irq_vectors(struct nvmet_pci_epf_ctrl *ctrl)
+{
+	if (ctrl->irq_vectors) {
+		mutex_destroy(&ctrl->irq_lock);
+		kfree(ctrl->irq_vectors);
+		ctrl->irq_vectors = NULL;
+	}
+}
+
+static struct nvmet_pci_epf_irq_vector *
+nvmet_pci_epf_find_irq_vector(struct nvmet_pci_epf_ctrl *ctrl, u16 vector)
+{
+	struct nvmet_pci_epf_irq_vector *iv;
+	int i;
+
+	lockdep_assert_held(&ctrl->irq_lock);
+
+	for (i = 0; i < ctrl->nr_queues; i++) {
+		iv = &ctrl->irq_vectors[i];
+		if (iv->ref && iv->vector == vector)
+			return iv;
+	}
+
+	return NULL;
+}
+
+static struct nvmet_pci_epf_irq_vector *
+nvmet_pci_epf_add_irq_vector(struct nvmet_pci_epf_ctrl *ctrl, u16 vector)
+{
+	struct nvmet_pci_epf_irq_vector *iv;
+	int i;
+
+	mutex_lock(&ctrl->irq_lock);
+
+	iv = nvmet_pci_epf_find_irq_vector(ctrl, vector);
+	if (iv) {
+		iv->ref++;
+		goto unlock;
+	}
+
+	for (i = 0; i < ctrl->nr_queues; i++) {
+		iv = &ctrl->irq_vectors[i];
+		if (!iv->ref)
+			break;
+	}
+
+	if (WARN_ON_ONCE(!iv))
+		goto unlock;
+
+	iv->ref = 1;
+	iv->vector = vector;
+	iv->nr_irqs = 0;
+
+unlock:
+	mutex_unlock(&ctrl->irq_lock);
+
+	return iv;
+}
+
+static void nvmet_pci_epf_remove_irq_vector(struct nvmet_pci_epf_ctrl *ctrl,
+					    u16 vector)
+{
+	struct nvmet_pci_epf_irq_vector *iv;
+
+	mutex_lock(&ctrl->irq_lock);
+
+	iv = nvmet_pci_epf_find_irq_vector(ctrl, vector);
+	if (iv) {
+		iv->ref--;
+		if (!iv->ref) {
+			iv->vector = 0;
+			iv->nr_irqs = 0;
+		}
+	}
+
+	mutex_unlock(&ctrl->irq_lock);
+}
+
+static bool nvmet_pci_epf_should_raise_irq(struct nvmet_pci_epf_ctrl *ctrl,
+		struct nvmet_pci_epf_queue *cq, bool force)
+{
+	struct nvmet_pci_epf_irq_vector *iv = cq->iv;
+	bool ret;
+
+	if (!test_bit(NVMET_PCI_EPF_Q_IRQ_ENABLED, &cq->flags))
+		return false;
+
+	/* IRQ coalescing for the admin queue is not allowed. */
+	if (!cq->qid)
+		return true;
+
+	if (iv->cd)
+		return true;
+
+	if (force) {
+		ret = iv->nr_irqs > 0;
+	} else {
+		iv->nr_irqs++;
+		ret = iv->nr_irqs >= ctrl->irq_vector_threshold;
+	}
+	if (ret)
+		iv->nr_irqs = 0;
+
+	return ret;
+}
+
+static void nvmet_pci_epf_raise_irq(struct nvmet_pci_epf_ctrl *ctrl,
+		struct nvmet_pci_epf_queue *cq, bool force)
+{
+	struct nvmet_pci_epf *nvme_epf = ctrl->nvme_epf;
+	struct pci_epf *epf = nvme_epf->epf;
+	int ret = 0;
+
+	if (!test_bit(NVMET_PCI_EPF_Q_LIVE, &cq->flags))
+		return;
+
+	mutex_lock(&ctrl->irq_lock);
+
+	if (!nvmet_pci_epf_should_raise_irq(ctrl, cq, force))
+		goto unlock;
+
+	switch (nvme_epf->irq_type) {
+	case PCI_IRQ_MSIX:
+	case PCI_IRQ_MSI:
+		ret = pci_epc_raise_irq(epf->epc, epf->func_no, epf->vfunc_no,
+					nvme_epf->irq_type, cq->vector + 1);
+		if (!ret)
+			break;
+		/*
+		 * If we got an error, it is likely because the host is using
+		 * legacy IRQs (e.g. BIOS, grub).
+		 */
+		fallthrough;
+	case PCI_IRQ_INTX:
+		ret = pci_epc_raise_irq(epf->epc, epf->func_no, epf->vfunc_no,
+					PCI_IRQ_INTX, 0);
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		ret = -EINVAL;
+		break;
+	}
+
+	if (ret)
+		dev_err(ctrl->dev, "Failed to raise IRQ (err=%d)\n", ret);
+
+unlock:
+	mutex_unlock(&ctrl->irq_lock);
+}
+
+static inline const char *nvmet_pci_epf_iod_name(struct nvmet_pci_epf_iod *iod)
+{
+	return nvme_opcode_str(iod->sq->qid, iod->cmd.common.opcode);
+}
+
+static void nvmet_pci_epf_exec_iod_work(struct work_struct *work);
+
+static struct nvmet_pci_epf_iod *
+nvmet_pci_epf_alloc_iod(struct nvmet_pci_epf_queue *sq)
+{
+	struct nvmet_pci_epf_ctrl *ctrl = sq->ctrl;
+	struct nvmet_pci_epf_iod *iod;
+
+	iod = mempool_alloc(&ctrl->iod_pool, GFP_KERNEL);
+	if (unlikely(!iod))
+		return NULL;
+
+	memset(iod, 0, sizeof(*iod));
+	iod->req.cmd = &iod->cmd;
+	iod->req.cqe = &iod->cqe;
+	iod->req.port = ctrl->port;
+	iod->ctrl = ctrl;
+	iod->sq = sq;
+	iod->cq = &ctrl->cq[sq->qid];
+	INIT_LIST_HEAD(&iod->link);
+	iod->dma_dir = DMA_NONE;
+	INIT_WORK(&iod->work, nvmet_pci_epf_exec_iod_work);
+	init_completion(&iod->done);
+
+	return iod;
+}
+
+/*
+ * Allocate or grow a command table of PCI segments.
+ */
+static int nvmet_pci_epf_alloc_iod_data_segs(struct nvmet_pci_epf_iod *iod,
+					     int nsegs)
+{
+	struct nvmet_pci_epf_segment *segs;
+	int nr_segs = iod->nr_data_segs + nsegs;
+
+	segs = krealloc(iod->data_segs,
+			nr_segs * sizeof(struct nvmet_pci_epf_segment),
+			GFP_KERNEL | __GFP_ZERO);
+	if (!segs)
+		return -ENOMEM;
+
+	iod->nr_data_segs = nr_segs;
+	iod->data_segs = segs;
+
+	return 0;
+}
+
+static void nvmet_pci_epf_free_iod(struct nvmet_pci_epf_iod *iod)
+{
+	int i;
+
+	if (iod->data_segs) {
+		for (i = 0; i < iod->nr_data_segs; i++)
+			kfree(iod->data_segs[i].buf);
+		if (iod->data_segs != &iod->data_seg)
+			kfree(iod->data_segs);
+	}
+	if (iod->data_sgt.nents > 1)
+		sg_free_table(&iod->data_sgt);
+	mempool_free(iod, &iod->ctrl->iod_pool);
+}
+
+static int nvmet_pci_epf_transfer_iod_data(struct nvmet_pci_epf_iod *iod)
+{
+	struct nvmet_pci_epf *nvme_epf = iod->ctrl->nvme_epf;
+	struct nvmet_pci_epf_segment *seg = &iod->data_segs[0];
+	int i, ret;
+
+	/* Split the data transfer according to the PCI segments. */
+	for (i = 0; i < iod->nr_data_segs; i++, seg++) {
+		ret = nvmet_pci_epf_transfer_seg(nvme_epf, seg, iod->dma_dir);
+		if (ret) {
+			iod->status = NVME_SC_DATA_XFER_ERROR | NVME_STATUS_DNR;
+			return ret;
+		}
+	}
+
+	return 0;
+}
+
+static inline u32 nvmet_pci_epf_prp_ofst(struct nvmet_pci_epf_ctrl *ctrl,
+					 u64 prp)
+{
+	return prp & ctrl->mps_mask;
+}
+
+static inline size_t nvmet_pci_epf_prp_size(struct nvmet_pci_epf_ctrl *ctrl,
+					    u64 prp)
+{
+	return ctrl->mps - nvmet_pci_epf_prp_ofst(ctrl, prp);
+}
+
+/*
+ * Transfer a PRP list from the host and return the number of prps.
+ */
+static int nvmet_pci_epf_get_prp_list(struct nvmet_pci_epf_ctrl *ctrl, u64 prp,
+				      size_t xfer_len, __le64 *prps)
+{
+	size_t nr_prps = (xfer_len + ctrl->mps_mask) >> ctrl->mps_shift;
+	u32 length;
+	int ret;
+
+	/*
+	 * Compute the number of PRPs required for the number of bytes to
+	 * transfer (xfer_len). If this number overflows the memory page size
+	 * with the PRP list pointer specified, only return the space available
+	 * in the memory page, the last PRP in there will be a PRP list pointer
+	 * to the remaining PRPs.
+	 */
+	length = min(nvmet_pci_epf_prp_size(ctrl, prp), nr_prps << 3);
+	ret = nvmet_pci_epf_transfer(ctrl, prps, prp, length, DMA_FROM_DEVICE);
+	if (ret)
+		return ret;
+
+	return length >> 3;
+}
+
+static int nvmet_pci_epf_iod_parse_prp_list(struct nvmet_pci_epf_ctrl *ctrl,
+					    struct nvmet_pci_epf_iod *iod)
+{
+	struct nvme_command *cmd = &iod->cmd;
+	struct nvmet_pci_epf_segment *seg;
+	size_t size = 0, ofst, prp_size, xfer_len;
+	size_t transfer_len = iod->data_len;
+	int nr_segs, nr_prps = 0;
+	u64 pci_addr, prp;
+	int i = 0, ret;
+	__le64 *prps;
+
+	prps = kzalloc(ctrl->mps, GFP_KERNEL);
+	if (!prps)
+		goto err_internal;
+
+	/*
+	 * Allocate PCI segments for the command: this considers the worst case
+	 * scenario where all prps are discontiguous, so get as many segments
+	 * as we can have prps. In practice, most of the time, we will have
+	 * far less PCI segments than prps.
+	 */
+	prp = le64_to_cpu(cmd->common.dptr.prp1);
+	if (!prp)
+		goto err_invalid_field;
+
+	ofst = nvmet_pci_epf_prp_ofst(ctrl, prp);
+	nr_segs = (transfer_len + ofst + ctrl->mps - 1) >> ctrl->mps_shift;
+
+	ret = nvmet_pci_epf_alloc_iod_data_segs(iod, nr_segs);
+	if (ret)
+		goto err_internal;
+
+	/* Set the first segment using prp1. */
+	seg = &iod->data_segs[0];
+	seg->pci_addr = prp;
+	seg->length = nvmet_pci_epf_prp_size(ctrl, prp);
+
+	size = seg->length;
+	pci_addr = prp + size;
+	nr_segs = 1;
+
+	/*
+	 * Now build the PCI address segments using the PRP lists, starting
+	 * from prp2.
+	 */
+	prp = le64_to_cpu(cmd->common.dptr.prp2);
+	if (!prp)
+		goto err_invalid_field;
+
+	while (size < transfer_len) {
+		xfer_len = transfer_len - size;
+
+		if (!nr_prps) {
+			nr_prps = nvmet_pci_epf_get_prp_list(ctrl, prp,
+							     xfer_len, prps);
+			if (nr_prps < 0)
+				goto err_internal;
+
+			i = 0;
+			ofst = 0;
+		}
+
+		/* Current entry */
+		prp = le64_to_cpu(prps[i]);
+		if (!prp)
+			goto err_invalid_field;
+
+		/* Did we reach the last PRP entry of the list? */
+		if (xfer_len > ctrl->mps && i == nr_prps - 1) {
+			/* We need more PRPs: PRP is a list pointer. */
+			nr_prps = 0;
+			continue;
+		}
+
+		/* Only the first PRP is allowed to have an offset. */
+		if (nvmet_pci_epf_prp_ofst(ctrl, prp))
+			goto err_invalid_offset;
+
+		if (prp != pci_addr) {
+			/* Discontiguous prp: new segment. */
+			nr_segs++;
+			if (WARN_ON_ONCE(nr_segs > iod->nr_data_segs))
+				goto err_internal;
+
+			seg++;
+			seg->pci_addr = prp;
+			seg->length = 0;
+			pci_addr = prp;
+		}
+
+		prp_size = min_t(size_t, ctrl->mps, xfer_len);
+		seg->length += prp_size;
+		pci_addr += prp_size;
+		size += prp_size;
+
+		i++;
+	}
+
+	iod->nr_data_segs = nr_segs;
+	ret = 0;
+
+	if (size != transfer_len) {
+		dev_err(ctrl->dev,
+			"PRPs transfer length mismatch: got %zu B, need %zu B\n",
+			size, transfer_len);
+		goto err_internal;
+	}
+
+	kfree(prps);
+
+	return 0;
+
+err_invalid_offset:
+	dev_err(ctrl->dev, "PRPs list invalid offset\n");
+	iod->status = NVME_SC_PRP_INVALID_OFFSET | NVME_STATUS_DNR;
+	goto err;
+
+err_invalid_field:
+	dev_err(ctrl->dev, "PRPs list invalid field\n");
+	iod->status = NVME_SC_INVALID_FIELD | NVME_STATUS_DNR;
+	goto err;
+
+err_internal:
+	dev_err(ctrl->dev, "PRPs list internal error\n");
+	iod->status = NVME_SC_INTERNAL | NVME_STATUS_DNR;
+
+err:
+	kfree(prps);
+	return -EINVAL;
+}
+
+static int nvmet_pci_epf_iod_parse_prp_simple(struct nvmet_pci_epf_ctrl *ctrl,
+					      struct nvmet_pci_epf_iod *iod)
+{
+	struct nvme_command *cmd = &iod->cmd;
+	size_t transfer_len = iod->data_len;
+	int ret, nr_segs = 1;
+	u64 prp1, prp2 = 0;
+	size_t prp1_size;
+
+	prp1 = le64_to_cpu(cmd->common.dptr.prp1);
+	prp1_size = nvmet_pci_epf_prp_size(ctrl, prp1);
+
+	/* For commands crossing a page boundary, we should have prp2.
*/ 933 + if (transfer_len > prp1_size) { 934 + prp2 = le64_to_cpu(cmd->common.dptr.prp2); 935 + if (!prp2) { 936 + iod->status = NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; 937 + return -EINVAL; 938 + } 939 + if (nvmet_pci_epf_prp_ofst(ctrl, prp2)) { 940 + iod->status = 941 + NVME_SC_PRP_INVALID_OFFSET | NVME_STATUS_DNR; 942 + return -EINVAL; 943 + } 944 + if (prp2 != prp1 + prp1_size) 945 + nr_segs = 2; 946 + } 947 + 948 + if (nr_segs == 1) { 949 + iod->nr_data_segs = 1; 950 + iod->data_segs = &iod->data_seg; 951 + iod->data_segs[0].pci_addr = prp1; 952 + iod->data_segs[0].length = transfer_len; 953 + return 0; 954 + } 955 + 956 + ret = nvmet_pci_epf_alloc_iod_data_segs(iod, nr_segs); 957 + if (ret) { 958 + iod->status = NVME_SC_INTERNAL | NVME_STATUS_DNR; 959 + return ret; 960 + } 961 + 962 + iod->data_segs[0].pci_addr = prp1; 963 + iod->data_segs[0].length = prp1_size; 964 + iod->data_segs[1].pci_addr = prp2; 965 + iod->data_segs[1].length = transfer_len - prp1_size; 966 + 967 + return 0; 968 + } 969 + 970 + static int nvmet_pci_epf_iod_parse_prps(struct nvmet_pci_epf_iod *iod) 971 + { 972 + struct nvmet_pci_epf_ctrl *ctrl = iod->ctrl; 973 + u64 prp1 = le64_to_cpu(iod->cmd.common.dptr.prp1); 974 + size_t ofst; 975 + 976 + /* Get the PCI address segments for the command using its PRPs. */ 977 + ofst = nvmet_pci_epf_prp_ofst(ctrl, prp1); 978 + if (ofst & 0x3) { 979 + iod->status = NVME_SC_PRP_INVALID_OFFSET | NVME_STATUS_DNR; 980 + return -EINVAL; 981 + } 982 + 983 + if (iod->data_len + ofst <= ctrl->mps * 2) 984 + return nvmet_pci_epf_iod_parse_prp_simple(ctrl, iod); 985 + 986 + return nvmet_pci_epf_iod_parse_prp_list(ctrl, iod); 987 + } 988 + 989 + /* 990 + * Transfer an SGL segment from the host and return the number of data 991 + * descriptors and the next segment descriptor, if any. 
992 + */ 993 + static struct nvme_sgl_desc * 994 + nvmet_pci_epf_get_sgl_segment(struct nvmet_pci_epf_ctrl *ctrl, 995 + struct nvme_sgl_desc *desc, unsigned int *nr_sgls) 996 + { 997 + struct nvme_sgl_desc *sgls; 998 + u32 length = le32_to_cpu(desc->length); 999 + int nr_descs, ret; 1000 + void *buf; 1001 + 1002 + buf = kmalloc(length, GFP_KERNEL); 1003 + if (!buf) 1004 + return NULL; 1005 + 1006 + ret = nvmet_pci_epf_transfer(ctrl, buf, le64_to_cpu(desc->addr), length, 1007 + DMA_FROM_DEVICE); 1008 + if (ret) { 1009 + kfree(buf); 1010 + return NULL; 1011 + } 1012 + 1013 + sgls = buf; 1014 + nr_descs = length / sizeof(struct nvme_sgl_desc); 1015 + if (sgls[nr_descs - 1].type == (NVME_SGL_FMT_SEG_DESC << 4) || 1016 + sgls[nr_descs - 1].type == (NVME_SGL_FMT_LAST_SEG_DESC << 4)) { 1017 + /* 1018 + * We have another SGL segment following this one: do not count 1019 + * it as a regular data SGL descriptor and return it to the 1020 + * caller. 1021 + */ 1022 + *desc = sgls[nr_descs - 1]; 1023 + nr_descs--; 1024 + } else { 1025 + /* We do not have another SGL segment after this one. */ 1026 + desc->length = 0; 1027 + } 1028 + 1029 + *nr_sgls = nr_descs; 1030 + 1031 + return sgls; 1032 + } 1033 + 1034 + static int nvmet_pci_epf_iod_parse_sgl_segments(struct nvmet_pci_epf_ctrl *ctrl, 1035 + struct nvmet_pci_epf_iod *iod) 1036 + { 1037 + struct nvme_command *cmd = &iod->cmd; 1038 + struct nvme_sgl_desc seg = cmd->common.dptr.sgl; 1039 + struct nvme_sgl_desc *sgls = NULL; 1040 + int n = 0, i, nr_sgls; 1041 + int ret; 1042 + 1043 + /* 1044 + * We do not support inline data nor keyed SGLs, so we should be seeing 1045 + * only segment descriptors. 
1046 + */ 1047 + if (seg.type != (NVME_SGL_FMT_SEG_DESC << 4) && 1048 + seg.type != (NVME_SGL_FMT_LAST_SEG_DESC << 4)) { 1049 + iod->status = NVME_SC_SGL_INVALID_TYPE | NVME_STATUS_DNR; 1050 + return -EIO; 1051 + } 1052 + 1053 + while (seg.length) { 1054 + sgls = nvmet_pci_epf_get_sgl_segment(ctrl, &seg, &nr_sgls); 1055 + if (!sgls) { 1056 + iod->status = NVME_SC_INTERNAL | NVME_STATUS_DNR; 1057 + return -EIO; 1058 + } 1059 + 1060 + /* Grow the PCI segment table as needed. */ 1061 + ret = nvmet_pci_epf_alloc_iod_data_segs(iod, nr_sgls); 1062 + if (ret) { 1063 + iod->status = NVME_SC_INTERNAL | NVME_STATUS_DNR; 1064 + goto out; 1065 + } 1066 + 1067 + /* 1068 + * Parse the SGL descriptors to build the PCI segment table, 1069 + * checking the descriptor type as we go. 1070 + */ 1071 + for (i = 0; i < nr_sgls; i++) { 1072 + if (sgls[i].type != (NVME_SGL_FMT_DATA_DESC << 4)) { 1073 + iod->status = NVME_SC_SGL_INVALID_TYPE | 1074 + NVME_STATUS_DNR; 1075 + goto out; 1076 + } 1077 + iod->data_segs[n].pci_addr = le64_to_cpu(sgls[i].addr); 1078 + iod->data_segs[n].length = le32_to_cpu(sgls[i].length); 1079 + n++; 1080 + } 1081 + 1082 + kfree(sgls); 1083 + } 1084 + 1085 + out: 1086 + if (iod->status != NVME_SC_SUCCESS) { 1087 + kfree(sgls); 1088 + return -EIO; 1089 + } 1090 + 1091 + return 0; 1092 + } 1093 + 1094 + static int nvmet_pci_epf_iod_parse_sgls(struct nvmet_pci_epf_iod *iod) 1095 + { 1096 + struct nvmet_pci_epf_ctrl *ctrl = iod->ctrl; 1097 + struct nvme_sgl_desc *sgl = &iod->cmd.common.dptr.sgl; 1098 + 1099 + if (sgl->type == (NVME_SGL_FMT_DATA_DESC << 4)) { 1100 + /* Single data descriptor case. 
*/ 1101 + iod->nr_data_segs = 1; 1102 + iod->data_segs = &iod->data_seg; 1103 + iod->data_seg.pci_addr = le64_to_cpu(sgl->addr); 1104 + iod->data_seg.length = le32_to_cpu(sgl->length); 1105 + return 0; 1106 + } 1107 + 1108 + return nvmet_pci_epf_iod_parse_sgl_segments(ctrl, iod); 1109 + } 1110 + 1111 + static int nvmet_pci_epf_alloc_iod_data_buf(struct nvmet_pci_epf_iod *iod) 1112 + { 1113 + struct nvmet_pci_epf_ctrl *ctrl = iod->ctrl; 1114 + struct nvmet_req *req = &iod->req; 1115 + struct nvmet_pci_epf_segment *seg; 1116 + struct scatterlist *sg; 1117 + int ret, i; 1118 + 1119 + if (iod->data_len > ctrl->mdts) { 1120 + iod->status = NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; 1121 + return -EINVAL; 1122 + } 1123 + 1124 + /* 1125 + * Get the PCI address segments for the command data buffer using either 1126 + * its SGLs or PRPs. 1127 + */ 1128 + if (iod->cmd.common.flags & NVME_CMD_SGL_ALL) 1129 + ret = nvmet_pci_epf_iod_parse_sgls(iod); 1130 + else 1131 + ret = nvmet_pci_epf_iod_parse_prps(iod); 1132 + if (ret) 1133 + return ret; 1134 + 1135 + /* Get a command buffer using SGLs matching the PCI segments. 
*/ 1136 + if (iod->nr_data_segs == 1) { 1137 + sg_init_table(&iod->data_sgl, 1); 1138 + iod->data_sgt.sgl = &iod->data_sgl; 1139 + iod->data_sgt.nents = 1; 1140 + iod->data_sgt.orig_nents = 1; 1141 + } else { 1142 + ret = sg_alloc_table(&iod->data_sgt, iod->nr_data_segs, 1143 + GFP_KERNEL); 1144 + if (ret) 1145 + goto err_nomem; 1146 + } 1147 + 1148 + for_each_sgtable_sg(&iod->data_sgt, sg, i) { 1149 + seg = &iod->data_segs[i]; 1150 + seg->buf = kmalloc(seg->length, GFP_KERNEL); 1151 + if (!seg->buf) 1152 + goto err_nomem; 1153 + sg_set_buf(sg, seg->buf, seg->length); 1154 + } 1155 + 1156 + req->transfer_len = iod->data_len; 1157 + req->sg = iod->data_sgt.sgl; 1158 + req->sg_cnt = iod->data_sgt.nents; 1159 + 1160 + return 0; 1161 + 1162 + err_nomem: 1163 + iod->status = NVME_SC_INTERNAL | NVME_STATUS_DNR; 1164 + return -ENOMEM; 1165 + } 1166 + 1167 + static void nvmet_pci_epf_complete_iod(struct nvmet_pci_epf_iod *iod) 1168 + { 1169 + struct nvmet_pci_epf_queue *cq = iod->cq; 1170 + unsigned long flags; 1171 + 1172 + /* Print an error message for failed commands, except AENs. */ 1173 + iod->status = le16_to_cpu(iod->cqe.status) >> 1; 1174 + if (iod->status && iod->cmd.common.opcode != nvme_admin_async_event) 1175 + dev_err(iod->ctrl->dev, 1176 + "CQ[%d]: Command %s (0x%x) status 0x%0x\n", 1177 + iod->sq->qid, nvmet_pci_epf_iod_name(iod), 1178 + iod->cmd.common.opcode, iod->status); 1179 + 1180 + /* 1181 + * Add the command to the list of completed commands and schedule the 1182 + * CQ work. 
1183 + */ 1184 + spin_lock_irqsave(&cq->lock, flags); 1185 + list_add_tail(&iod->link, &cq->list); 1186 + queue_delayed_work(system_highpri_wq, &cq->work, 0); 1187 + spin_unlock_irqrestore(&cq->lock, flags); 1188 + } 1189 + 1190 + static void nvmet_pci_epf_drain_queue(struct nvmet_pci_epf_queue *queue) 1191 + { 1192 + struct nvmet_pci_epf_iod *iod; 1193 + unsigned long flags; 1194 + 1195 + spin_lock_irqsave(&queue->lock, flags); 1196 + while (!list_empty(&queue->list)) { 1197 + iod = list_first_entry(&queue->list, struct nvmet_pci_epf_iod, 1198 + link); 1199 + list_del_init(&iod->link); 1200 + nvmet_pci_epf_free_iod(iod); 1201 + } 1202 + spin_unlock_irqrestore(&queue->lock, flags); 1203 + } 1204 + 1205 + static int nvmet_pci_epf_add_port(struct nvmet_port *port) 1206 + { 1207 + mutex_lock(&nvmet_pci_epf_ports_mutex); 1208 + list_add_tail(&port->entry, &nvmet_pci_epf_ports); 1209 + mutex_unlock(&nvmet_pci_epf_ports_mutex); 1210 + return 0; 1211 + } 1212 + 1213 + static void nvmet_pci_epf_remove_port(struct nvmet_port *port) 1214 + { 1215 + mutex_lock(&nvmet_pci_epf_ports_mutex); 1216 + list_del_init(&port->entry); 1217 + mutex_unlock(&nvmet_pci_epf_ports_mutex); 1218 + } 1219 + 1220 + static struct nvmet_port * 1221 + nvmet_pci_epf_find_port(struct nvmet_pci_epf_ctrl *ctrl, __le16 portid) 1222 + { 1223 + struct nvmet_port *p, *port = NULL; 1224 + 1225 + mutex_lock(&nvmet_pci_epf_ports_mutex); 1226 + list_for_each_entry(p, &nvmet_pci_epf_ports, entry) { 1227 + if (p->disc_addr.portid == portid) { 1228 + port = p; 1229 + break; 1230 + } 1231 + } 1232 + mutex_unlock(&nvmet_pci_epf_ports_mutex); 1233 + 1234 + return port; 1235 + } 1236 + 1237 + static void nvmet_pci_epf_queue_response(struct nvmet_req *req) 1238 + { 1239 + struct nvmet_pci_epf_iod *iod = 1240 + container_of(req, struct nvmet_pci_epf_iod, req); 1241 + 1242 + iod->status = le16_to_cpu(req->cqe->status) >> 1; 1243 + 1244 + /* If we have no data to transfer, directly complete the command. 
*/ 1245 + if (!iod->data_len || iod->dma_dir != DMA_TO_DEVICE) { 1246 + nvmet_pci_epf_complete_iod(iod); 1247 + return; 1248 + } 1249 + 1250 + complete(&iod->done); 1251 + } 1252 + 1253 + static u8 nvmet_pci_epf_get_mdts(const struct nvmet_ctrl *tctrl) 1254 + { 1255 + struct nvmet_pci_epf_ctrl *ctrl = tctrl->drvdata; 1256 + int page_shift = NVME_CAP_MPSMIN(tctrl->cap) + 12; 1257 + 1258 + return ilog2(ctrl->mdts) - page_shift; 1259 + } 1260 + 1261 + static u16 nvmet_pci_epf_create_cq(struct nvmet_ctrl *tctrl, 1262 + u16 cqid, u16 flags, u16 qsize, u64 pci_addr, u16 vector) 1263 + { 1264 + struct nvmet_pci_epf_ctrl *ctrl = tctrl->drvdata; 1265 + struct nvmet_pci_epf_queue *cq = &ctrl->cq[cqid]; 1266 + u16 status; 1267 + 1268 + if (test_and_set_bit(NVMET_PCI_EPF_Q_LIVE, &cq->flags)) 1269 + return NVME_SC_QID_INVALID | NVME_STATUS_DNR; 1270 + 1271 + if (!(flags & NVME_QUEUE_PHYS_CONTIG)) 1272 + return NVME_SC_INVALID_QUEUE | NVME_STATUS_DNR; 1273 + 1274 + if (flags & NVME_CQ_IRQ_ENABLED) 1275 + set_bit(NVMET_PCI_EPF_Q_IRQ_ENABLED, &cq->flags); 1276 + 1277 + cq->pci_addr = pci_addr; 1278 + cq->qid = cqid; 1279 + cq->depth = qsize + 1; 1280 + cq->vector = vector; 1281 + cq->head = 0; 1282 + cq->tail = 0; 1283 + cq->phase = 1; 1284 + cq->db = NVME_REG_DBS + (((cqid * 2) + 1) * sizeof(u32)); 1285 + nvmet_pci_epf_bar_write32(ctrl, cq->db, 0); 1286 + 1287 + if (!cqid) 1288 + cq->qes = sizeof(struct nvme_completion); 1289 + else 1290 + cq->qes = ctrl->io_cqes; 1291 + cq->pci_size = cq->qes * cq->depth; 1292 + 1293 + cq->iv = nvmet_pci_epf_add_irq_vector(ctrl, vector); 1294 + if (!cq->iv) { 1295 + status = NVME_SC_INTERNAL | NVME_STATUS_DNR; 1296 + goto err; 1297 + } 1298 + 1299 + status = nvmet_cq_create(tctrl, &cq->nvme_cq, cqid, cq->depth); 1300 + if (status != NVME_SC_SUCCESS) 1301 + goto err; 1302 + 1303 + dev_dbg(ctrl->dev, "CQ[%u]: %u entries of %zu B, IRQ vector %u\n", 1304 + cqid, qsize, cq->qes, cq->vector); 1305 + 1306 + return NVME_SC_SUCCESS; 1307 + 1308 + err: 
1309 + clear_bit(NVMET_PCI_EPF_Q_IRQ_ENABLED, &cq->flags); 1310 + clear_bit(NVMET_PCI_EPF_Q_LIVE, &cq->flags); 1311 + return status; 1312 + } 1313 + 1314 + static u16 nvmet_pci_epf_delete_cq(struct nvmet_ctrl *tctrl, u16 cqid) 1315 + { 1316 + struct nvmet_pci_epf_ctrl *ctrl = tctrl->drvdata; 1317 + struct nvmet_pci_epf_queue *cq = &ctrl->cq[cqid]; 1318 + 1319 + if (!test_and_clear_bit(NVMET_PCI_EPF_Q_LIVE, &cq->flags)) 1320 + return NVME_SC_QID_INVALID | NVME_STATUS_DNR; 1321 + 1322 + cancel_delayed_work_sync(&cq->work); 1323 + nvmet_pci_epf_drain_queue(cq); 1324 + nvmet_pci_epf_remove_irq_vector(ctrl, cq->vector); 1325 + 1326 + return NVME_SC_SUCCESS; 1327 + } 1328 + 1329 + static u16 nvmet_pci_epf_create_sq(struct nvmet_ctrl *tctrl, 1330 + u16 sqid, u16 flags, u16 qsize, u64 pci_addr) 1331 + { 1332 + struct nvmet_pci_epf_ctrl *ctrl = tctrl->drvdata; 1333 + struct nvmet_pci_epf_queue *sq = &ctrl->sq[sqid]; 1334 + u16 status; 1335 + 1336 + if (test_and_set_bit(NVMET_PCI_EPF_Q_LIVE, &sq->flags)) 1337 + return NVME_SC_QID_INVALID | NVME_STATUS_DNR; 1338 + 1339 + if (!(flags & NVME_QUEUE_PHYS_CONTIG)) 1340 + return NVME_SC_INVALID_QUEUE | NVME_STATUS_DNR; 1341 + 1342 + sq->pci_addr = pci_addr; 1343 + sq->qid = sqid; 1344 + sq->depth = qsize + 1; 1345 + sq->head = 0; 1346 + sq->tail = 0; 1347 + sq->phase = 0; 1348 + sq->db = NVME_REG_DBS + (sqid * 2 * sizeof(u32)); 1349 + nvmet_pci_epf_bar_write32(ctrl, sq->db, 0); 1350 + if (!sqid) 1351 + sq->qes = 1UL << NVME_ADM_SQES; 1352 + else 1353 + sq->qes = ctrl->io_sqes; 1354 + sq->pci_size = sq->qes * sq->depth; 1355 + 1356 + status = nvmet_sq_create(tctrl, &sq->nvme_sq, sqid, sq->depth); 1357 + if (status != NVME_SC_SUCCESS) 1358 + goto out_clear_bit; 1359 + 1360 + sq->iod_wq = alloc_workqueue("sq%d_wq", WQ_UNBOUND, 1361 + min_t(int, sq->depth, WQ_MAX_ACTIVE), sqid); 1362 + if (!sq->iod_wq) { 1363 + dev_err(ctrl->dev, "Failed to create SQ %d work queue\n", sqid); 1364 + status = NVME_SC_INTERNAL | NVME_STATUS_DNR; 1365 + 
goto out_destroy_sq; 1366 + } 1367 + 1368 + dev_dbg(ctrl->dev, "SQ[%u]: %u entries of %zu B\n", 1369 + sqid, qsize, sq->qes); 1370 + 1371 + return NVME_SC_SUCCESS; 1372 + 1373 + out_destroy_sq: 1374 + nvmet_sq_destroy(&sq->nvme_sq); 1375 + out_clear_bit: 1376 + clear_bit(NVMET_PCI_EPF_Q_LIVE, &sq->flags); 1377 + return status; 1378 + } 1379 + 1380 + static u16 nvmet_pci_epf_delete_sq(struct nvmet_ctrl *tctrl, u16 sqid) 1381 + { 1382 + struct nvmet_pci_epf_ctrl *ctrl = tctrl->drvdata; 1383 + struct nvmet_pci_epf_queue *sq = &ctrl->sq[sqid]; 1384 + 1385 + if (!test_and_clear_bit(NVMET_PCI_EPF_Q_LIVE, &sq->flags)) 1386 + return NVME_SC_QID_INVALID | NVME_STATUS_DNR; 1387 + 1388 + flush_workqueue(sq->iod_wq); 1389 + destroy_workqueue(sq->iod_wq); 1390 + sq->iod_wq = NULL; 1391 + 1392 + nvmet_pci_epf_drain_queue(sq); 1393 + 1394 + if (sq->nvme_sq.ctrl) 1395 + nvmet_sq_destroy(&sq->nvme_sq); 1396 + 1397 + return NVME_SC_SUCCESS; 1398 + } 1399 + 1400 + static u16 nvmet_pci_epf_get_feat(const struct nvmet_ctrl *tctrl, 1401 + u8 feat, void *data) 1402 + { 1403 + struct nvmet_pci_epf_ctrl *ctrl = tctrl->drvdata; 1404 + struct nvmet_feat_arbitration *arb; 1405 + struct nvmet_feat_irq_coalesce *irqc; 1406 + struct nvmet_feat_irq_config *irqcfg; 1407 + struct nvmet_pci_epf_irq_vector *iv; 1408 + u16 status; 1409 + 1410 + switch (feat) { 1411 + case NVME_FEAT_ARBITRATION: 1412 + arb = data; 1413 + if (!ctrl->sq_ab) 1414 + arb->ab = 0x7; 1415 + else 1416 + arb->ab = ilog2(ctrl->sq_ab); 1417 + return NVME_SC_SUCCESS; 1418 + 1419 + case NVME_FEAT_IRQ_COALESCE: 1420 + irqc = data; 1421 + irqc->thr = ctrl->irq_vector_threshold; 1422 + irqc->time = 0; 1423 + return NVME_SC_SUCCESS; 1424 + 1425 + case NVME_FEAT_IRQ_CONFIG: 1426 + irqcfg = data; 1427 + mutex_lock(&ctrl->irq_lock); 1428 + iv = nvmet_pci_epf_find_irq_vector(ctrl, irqcfg->iv); 1429 + if (iv) { 1430 + irqcfg->cd = iv->cd; 1431 + status = NVME_SC_SUCCESS; 1432 + } else { 1433 + status = NVME_SC_INVALID_FIELD | 
NVME_STATUS_DNR; 1434 + } 1435 + mutex_unlock(&ctrl->irq_lock); 1436 + return status; 1437 + 1438 + default: 1439 + return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; 1440 + } 1441 + } 1442 + 1443 + static u16 nvmet_pci_epf_set_feat(const struct nvmet_ctrl *tctrl, 1444 + u8 feat, void *data) 1445 + { 1446 + struct nvmet_pci_epf_ctrl *ctrl = tctrl->drvdata; 1447 + struct nvmet_feat_arbitration *arb; 1448 + struct nvmet_feat_irq_coalesce *irqc; 1449 + struct nvmet_feat_irq_config *irqcfg; 1450 + struct nvmet_pci_epf_irq_vector *iv; 1451 + u16 status; 1452 + 1453 + switch (feat) { 1454 + case NVME_FEAT_ARBITRATION: 1455 + arb = data; 1456 + if (arb->ab == 0x7) 1457 + ctrl->sq_ab = 0; 1458 + else 1459 + ctrl->sq_ab = 1 << arb->ab; 1460 + return NVME_SC_SUCCESS; 1461 + 1462 + case NVME_FEAT_IRQ_COALESCE: 1463 + /* 1464 + * Since we do not implement precise IRQ coalescing timing, 1465 + * ignore the time field. 1466 + */ 1467 + irqc = data; 1468 + ctrl->irq_vector_threshold = irqc->thr + 1; 1469 + return NVME_SC_SUCCESS; 1470 + 1471 + case NVME_FEAT_IRQ_CONFIG: 1472 + irqcfg = data; 1473 + mutex_lock(&ctrl->irq_lock); 1474 + iv = nvmet_pci_epf_find_irq_vector(ctrl, irqcfg->iv); 1475 + if (iv) { 1476 + iv->cd = irqcfg->cd; 1477 + status = NVME_SC_SUCCESS; 1478 + } else { 1479 + status = NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; 1480 + } 1481 + mutex_unlock(&ctrl->irq_lock); 1482 + return status; 1483 + 1484 + default: 1485 + return NVME_SC_INVALID_FIELD | NVME_STATUS_DNR; 1486 + } 1487 + } 1488 + 1489 + static const struct nvmet_fabrics_ops nvmet_pci_epf_fabrics_ops = { 1490 + .owner = THIS_MODULE, 1491 + .type = NVMF_TRTYPE_PCI, 1492 + .add_port = nvmet_pci_epf_add_port, 1493 + .remove_port = nvmet_pci_epf_remove_port, 1494 + .queue_response = nvmet_pci_epf_queue_response, 1495 + .get_mdts = nvmet_pci_epf_get_mdts, 1496 + .create_cq = nvmet_pci_epf_create_cq, 1497 + .delete_cq = nvmet_pci_epf_delete_cq, 1498 + .create_sq = nvmet_pci_epf_create_sq, 1499 + .delete_sq = 
nvmet_pci_epf_delete_sq, 1500 + .get_feature = nvmet_pci_epf_get_feat, 1501 + .set_feature = nvmet_pci_epf_set_feat, 1502 + }; 1503 + 1504 + static void nvmet_pci_epf_cq_work(struct work_struct *work); 1505 + 1506 + static void nvmet_pci_epf_init_queue(struct nvmet_pci_epf_ctrl *ctrl, 1507 + unsigned int qid, bool sq) 1508 + { 1509 + struct nvmet_pci_epf_queue *queue; 1510 + 1511 + if (sq) { 1512 + queue = &ctrl->sq[qid]; 1513 + set_bit(NVMET_PCI_EPF_Q_IS_SQ, &queue->flags); 1514 + } else { 1515 + queue = &ctrl->cq[qid]; 1516 + INIT_DELAYED_WORK(&queue->work, nvmet_pci_epf_cq_work); 1517 + } 1518 + queue->ctrl = ctrl; 1519 + queue->qid = qid; 1520 + spin_lock_init(&queue->lock); 1521 + INIT_LIST_HEAD(&queue->list); 1522 + } 1523 + 1524 + static int nvmet_pci_epf_alloc_queues(struct nvmet_pci_epf_ctrl *ctrl) 1525 + { 1526 + unsigned int qid; 1527 + 1528 + ctrl->sq = kcalloc(ctrl->nr_queues, 1529 + sizeof(struct nvmet_pci_epf_queue), GFP_KERNEL); 1530 + if (!ctrl->sq) 1531 + return -ENOMEM; 1532 + 1533 + ctrl->cq = kcalloc(ctrl->nr_queues, 1534 + sizeof(struct nvmet_pci_epf_queue), GFP_KERNEL); 1535 + if (!ctrl->cq) { 1536 + kfree(ctrl->sq); 1537 + ctrl->sq = NULL; 1538 + return -ENOMEM; 1539 + } 1540 + 1541 + for (qid = 0; qid < ctrl->nr_queues; qid++) { 1542 + nvmet_pci_epf_init_queue(ctrl, qid, true); 1543 + nvmet_pci_epf_init_queue(ctrl, qid, false); 1544 + } 1545 + 1546 + return 0; 1547 + } 1548 + 1549 + static void nvmet_pci_epf_free_queues(struct nvmet_pci_epf_ctrl *ctrl) 1550 + { 1551 + kfree(ctrl->sq); 1552 + ctrl->sq = NULL; 1553 + kfree(ctrl->cq); 1554 + ctrl->cq = NULL; 1555 + } 1556 + 1557 + static int nvmet_pci_epf_map_queue(struct nvmet_pci_epf_ctrl *ctrl, 1558 + struct nvmet_pci_epf_queue *queue) 1559 + { 1560 + struct nvmet_pci_epf *nvme_epf = ctrl->nvme_epf; 1561 + int ret; 1562 + 1563 + ret = nvmet_pci_epf_mem_map(nvme_epf, queue->pci_addr, 1564 + queue->pci_size, &queue->pci_map); 1565 + if (ret) { 1566 + dev_err(ctrl->dev, "Failed to map queue %u 
(err=%d)\n", 1567 + queue->qid, ret); 1568 + return ret; 1569 + } 1570 + 1571 + if (queue->pci_map.pci_size < queue->pci_size) { 1572 + dev_err(ctrl->dev, "Invalid partial mapping of queue %u\n", 1573 + queue->qid); 1574 + nvmet_pci_epf_mem_unmap(nvme_epf, &queue->pci_map); 1575 + return -ENOMEM; 1576 + } 1577 + 1578 + return 0; 1579 + } 1580 + 1581 + static inline void nvmet_pci_epf_unmap_queue(struct nvmet_pci_epf_ctrl *ctrl, 1582 + struct nvmet_pci_epf_queue *queue) 1583 + { 1584 + nvmet_pci_epf_mem_unmap(ctrl->nvme_epf, &queue->pci_map); 1585 + } 1586 + 1587 + static void nvmet_pci_epf_exec_iod_work(struct work_struct *work) 1588 + { 1589 + struct nvmet_pci_epf_iod *iod = 1590 + container_of(work, struct nvmet_pci_epf_iod, work); 1591 + struct nvmet_req *req = &iod->req; 1592 + int ret; 1593 + 1594 + if (!iod->ctrl->link_up) { 1595 + nvmet_pci_epf_free_iod(iod); 1596 + return; 1597 + } 1598 + 1599 + if (!test_bit(NVMET_PCI_EPF_Q_LIVE, &iod->sq->flags)) { 1600 + iod->status = NVME_SC_QID_INVALID | NVME_STATUS_DNR; 1601 + goto complete; 1602 + } 1603 + 1604 + if (!nvmet_req_init(req, &iod->cq->nvme_cq, &iod->sq->nvme_sq, 1605 + &nvmet_pci_epf_fabrics_ops)) 1606 + goto complete; 1607 + 1608 + iod->data_len = nvmet_req_transfer_len(req); 1609 + if (iod->data_len) { 1610 + /* 1611 + * Get the data DMA transfer direction. Here "device" means the 1612 + * PCI root-complex host. 1613 + */ 1614 + if (nvme_is_write(&iod->cmd)) 1615 + iod->dma_dir = DMA_FROM_DEVICE; 1616 + else 1617 + iod->dma_dir = DMA_TO_DEVICE; 1618 + 1619 + /* 1620 + * Setup the command data buffer and get the command data from 1621 + * the host if needed. 
1622 + */ 1623 + ret = nvmet_pci_epf_alloc_iod_data_buf(iod); 1624 + if (!ret && iod->dma_dir == DMA_FROM_DEVICE) 1625 + ret = nvmet_pci_epf_transfer_iod_data(iod); 1626 + if (ret) { 1627 + nvmet_req_uninit(req); 1628 + goto complete; 1629 + } 1630 + } 1631 + 1632 + req->execute(req); 1633 + 1634 + /* 1635 + * If we do not have data to transfer after the command execution 1636 + * finishes, nvmet_pci_epf_queue_response() will complete the command 1637 + * directly. No need to wait for the completion in this case. 1638 + */ 1639 + if (!iod->data_len || iod->dma_dir != DMA_TO_DEVICE) 1640 + return; 1641 + 1642 + wait_for_completion(&iod->done); 1643 + 1644 + if (iod->status == NVME_SC_SUCCESS) { 1645 + WARN_ON_ONCE(!iod->data_len || iod->dma_dir != DMA_TO_DEVICE); 1646 + nvmet_pci_epf_transfer_iod_data(iod); 1647 + } 1648 + 1649 + complete: 1650 + nvmet_pci_epf_complete_iod(iod); 1651 + } 1652 + 1653 + static int nvmet_pci_epf_process_sq(struct nvmet_pci_epf_ctrl *ctrl, 1654 + struct nvmet_pci_epf_queue *sq) 1655 + { 1656 + struct nvmet_pci_epf_iod *iod; 1657 + int ret, n = 0; 1658 + 1659 + sq->tail = nvmet_pci_epf_bar_read32(ctrl, sq->db); 1660 + while (sq->head != sq->tail && (!ctrl->sq_ab || n < ctrl->sq_ab)) { 1661 + iod = nvmet_pci_epf_alloc_iod(sq); 1662 + if (!iod) 1663 + break; 1664 + 1665 + /* Get the NVMe command submitted by the host. */ 1666 + ret = nvmet_pci_epf_transfer(ctrl, &iod->cmd, 1667 + sq->pci_addr + sq->head * sq->qes, 1668 + sq->qes, DMA_FROM_DEVICE); 1669 + if (ret) { 1670 + /* Not much we can do... 
*/ 1671 + nvmet_pci_epf_free_iod(iod); 1672 + break; 1673 + } 1674 + 1675 + dev_dbg(ctrl->dev, "SQ[%u]: head %u, tail %u, command %s\n", 1676 + sq->qid, sq->head, sq->tail, 1677 + nvmet_pci_epf_iod_name(iod)); 1678 + 1679 + sq->head++; 1680 + if (sq->head == sq->depth) 1681 + sq->head = 0; 1682 + n++; 1683 + 1684 + queue_work_on(WORK_CPU_UNBOUND, sq->iod_wq, &iod->work); 1685 + 1686 + sq->tail = nvmet_pci_epf_bar_read32(ctrl, sq->db); 1687 + } 1688 + 1689 + return n; 1690 + } 1691 + 1692 + static void nvmet_pci_epf_poll_sqs_work(struct work_struct *work) 1693 + { 1694 + struct nvmet_pci_epf_ctrl *ctrl = 1695 + container_of(work, struct nvmet_pci_epf_ctrl, poll_sqs.work); 1696 + struct nvmet_pci_epf_queue *sq; 1697 + unsigned long last = 0; 1698 + int i, nr_sqs; 1699 + 1700 + while (ctrl->link_up && ctrl->enabled) { 1701 + nr_sqs = 0; 1702 + /* Do round-robin arbitration. */ 1703 + for (i = 0; i < ctrl->nr_queues; i++) { 1704 + sq = &ctrl->sq[i]; 1705 + if (!test_bit(NVMET_PCI_EPF_Q_LIVE, &sq->flags)) 1706 + continue; 1707 + if (nvmet_pci_epf_process_sq(ctrl, sq)) 1708 + nr_sqs++; 1709 + } 1710 + 1711 + if (nr_sqs) { 1712 + last = jiffies; 1713 + continue; 1714 + } 1715 + 1716 + /* 1717 + * If we have not received any command on any queue for more 1718 + * than NVMET_PCI_EPF_SQ_POLL_IDLE, assume we are idle and 1719 + * reschedule. This avoids "burning" a CPU when the controller 1720 + * is idle for a long time. 
1721 + */ 1722 + if (time_is_before_jiffies(last + NVMET_PCI_EPF_SQ_POLL_IDLE)) 1723 + break; 1724 + 1725 + cpu_relax(); 1726 + } 1727 + 1728 + schedule_delayed_work(&ctrl->poll_sqs, NVMET_PCI_EPF_SQ_POLL_INTERVAL); 1729 + } 1730 + 1731 + static void nvmet_pci_epf_cq_work(struct work_struct *work) 1732 + { 1733 + struct nvmet_pci_epf_queue *cq = 1734 + container_of(work, struct nvmet_pci_epf_queue, work.work); 1735 + struct nvmet_pci_epf_ctrl *ctrl = cq->ctrl; 1736 + struct nvme_completion *cqe; 1737 + struct nvmet_pci_epf_iod *iod; 1738 + unsigned long flags; 1739 + int ret, n = 0; 1740 + 1741 + ret = nvmet_pci_epf_map_queue(ctrl, cq); 1742 + if (ret) 1743 + goto again; 1744 + 1745 + while (test_bit(NVMET_PCI_EPF_Q_LIVE, &cq->flags) && ctrl->link_up) { 1746 + 1747 + /* Check that the CQ is not full. */ 1748 + cq->head = nvmet_pci_epf_bar_read32(ctrl, cq->db); 1749 + if (cq->head == cq->tail + 1) { 1750 + ret = -EAGAIN; 1751 + break; 1752 + } 1753 + 1754 + spin_lock_irqsave(&cq->lock, flags); 1755 + iod = list_first_entry_or_null(&cq->list, 1756 + struct nvmet_pci_epf_iod, link); 1757 + if (iod) 1758 + list_del_init(&iod->link); 1759 + spin_unlock_irqrestore(&cq->lock, flags); 1760 + 1761 + if (!iod) 1762 + break; 1763 + 1764 + /* Post the IOD completion entry. */ 1765 + cqe = &iod->cqe; 1766 + cqe->status = cpu_to_le16((iod->status << 1) | cq->phase); 1767 + 1768 + dev_dbg(ctrl->dev, 1769 + "CQ[%u]: %s status 0x%x, result 0x%llx, head %u, tail %u, phase %u\n", 1770 + cq->qid, nvmet_pci_epf_iod_name(iod), iod->status, 1771 + le64_to_cpu(cqe->result.u64), cq->head, cq->tail, 1772 + cq->phase); 1773 + 1774 + memcpy_toio(cq->pci_map.virt_addr + cq->tail * cq->qes, 1775 + cqe, cq->qes); 1776 + 1777 + cq->tail++; 1778 + if (cq->tail >= cq->depth) { 1779 + cq->tail = 0; 1780 + cq->phase ^= 1; 1781 + } 1782 + 1783 + nvmet_pci_epf_free_iod(iod); 1784 + 1785 + /* Signal the host. 
*/ 1786 + nvmet_pci_epf_raise_irq(ctrl, cq, false); 1787 + n++; 1788 + } 1789 + 1790 + nvmet_pci_epf_unmap_queue(ctrl, cq); 1791 + 1792 + /* 1793 + * We do not support precise IRQ coalescing time (100ns units as per 1794 + * NVMe specifications). So if we have posted completion entries without 1795 + * reaching the interrupt coalescing threshold, raise an interrupt. 1796 + */ 1797 + if (n) 1798 + nvmet_pci_epf_raise_irq(ctrl, cq, true); 1799 + 1800 + again: 1801 + if (ret < 0) 1802 + queue_delayed_work(system_highpri_wq, &cq->work, 1803 + NVMET_PCI_EPF_CQ_RETRY_INTERVAL); 1804 + } 1805 + 1806 + static int nvmet_pci_epf_enable_ctrl(struct nvmet_pci_epf_ctrl *ctrl) 1807 + { 1808 + u64 pci_addr, asq, acq; 1809 + u32 aqa; 1810 + u16 status, qsize; 1811 + 1812 + if (ctrl->enabled) 1813 + return 0; 1814 + 1815 + dev_info(ctrl->dev, "Enabling controller\n"); 1816 + 1817 + ctrl->mps_shift = nvmet_cc_mps(ctrl->cc) + 12; 1818 + ctrl->mps = 1UL << ctrl->mps_shift; 1819 + ctrl->mps_mask = ctrl->mps - 1; 1820 + 1821 + ctrl->io_sqes = 1UL << nvmet_cc_iosqes(ctrl->cc); 1822 + if (ctrl->io_sqes < sizeof(struct nvme_command)) { 1823 + dev_err(ctrl->dev, "Unsupported I/O SQES %zu (need %zu)\n", 1824 + ctrl->io_sqes, sizeof(struct nvme_command)); 1825 + return -EINVAL; 1826 + } 1827 + 1828 + ctrl->io_cqes = 1UL << nvmet_cc_iocqes(ctrl->cc); 1829 + if (ctrl->io_cqes < sizeof(struct nvme_completion)) { 1830 + dev_err(ctrl->dev, "Unsupported I/O CQES %zu (need %zu)\n", 1831 + ctrl->io_cqes, sizeof(struct nvme_completion)); 1832 + return -EINVAL; 1833 + } 1834 + 1835 + /* Create the admin queue. 
*/ 1836 + aqa = nvmet_pci_epf_bar_read32(ctrl, NVME_REG_AQA); 1837 + asq = nvmet_pci_epf_bar_read64(ctrl, NVME_REG_ASQ); 1838 + acq = nvmet_pci_epf_bar_read64(ctrl, NVME_REG_ACQ); 1839 + 1840 + qsize = (aqa & 0x0fff0000) >> 16; 1841 + pci_addr = acq & GENMASK_ULL(63, 12); 1842 + status = nvmet_pci_epf_create_cq(ctrl->tctrl, 0, 1843 + NVME_CQ_IRQ_ENABLED | NVME_QUEUE_PHYS_CONTIG, 1844 + qsize, pci_addr, 0); 1845 + if (status != NVME_SC_SUCCESS) { 1846 + dev_err(ctrl->dev, "Failed to create admin completion queue\n"); 1847 + return -EINVAL; 1848 + } 1849 + 1850 + qsize = aqa & 0x00000fff; 1851 + pci_addr = asq & GENMASK_ULL(63, 12); 1852 + status = nvmet_pci_epf_create_sq(ctrl->tctrl, 0, NVME_QUEUE_PHYS_CONTIG, 1853 + qsize, pci_addr); 1854 + if (status != NVME_SC_SUCCESS) { 1855 + dev_err(ctrl->dev, "Failed to create admin submission queue\n"); 1856 + nvmet_pci_epf_delete_cq(ctrl->tctrl, 0); 1857 + return -EINVAL; 1858 + } 1859 + 1860 + ctrl->sq_ab = NVMET_PCI_EPF_SQ_AB; 1861 + ctrl->irq_vector_threshold = NVMET_PCI_EPF_IV_THRESHOLD; 1862 + ctrl->enabled = true; 1863 + 1864 + /* Start polling the controller SQs. */ 1865 + schedule_delayed_work(&ctrl->poll_sqs, 0); 1866 + 1867 + return 0; 1868 + } 1869 + 1870 + static void nvmet_pci_epf_disable_ctrl(struct nvmet_pci_epf_ctrl *ctrl) 1871 + { 1872 + int qid; 1873 + 1874 + if (!ctrl->enabled) 1875 + return; 1876 + 1877 + dev_info(ctrl->dev, "Disabling controller\n"); 1878 + 1879 + ctrl->enabled = false; 1880 + cancel_delayed_work_sync(&ctrl->poll_sqs); 1881 + 1882 + /* Delete all I/O queues first. */ 1883 + for (qid = 1; qid < ctrl->nr_queues; qid++) 1884 + nvmet_pci_epf_delete_sq(ctrl->tctrl, qid); 1885 + 1886 + for (qid = 1; qid < ctrl->nr_queues; qid++) 1887 + nvmet_pci_epf_delete_cq(ctrl->tctrl, qid); 1888 + 1889 + /* Delete the admin queue last. 
*/ 1890 + nvmet_pci_epf_delete_sq(ctrl->tctrl, 0); 1891 + nvmet_pci_epf_delete_cq(ctrl->tctrl, 0); 1892 + } 1893 + 1894 + static void nvmet_pci_epf_poll_cc_work(struct work_struct *work) 1895 + { 1896 + struct nvmet_pci_epf_ctrl *ctrl = 1897 + container_of(work, struct nvmet_pci_epf_ctrl, poll_cc.work); 1898 + u32 old_cc, new_cc; 1899 + int ret; 1900 + 1901 + if (!ctrl->tctrl) 1902 + return; 1903 + 1904 + old_cc = ctrl->cc; 1905 + new_cc = nvmet_pci_epf_bar_read32(ctrl, NVME_REG_CC); 1906 + ctrl->cc = new_cc; 1907 + 1908 + if (nvmet_cc_en(new_cc) && !nvmet_cc_en(old_cc)) { 1909 + ret = nvmet_pci_epf_enable_ctrl(ctrl); 1910 + if (ret) 1911 + return; 1912 + ctrl->csts |= NVME_CSTS_RDY; 1913 + } 1914 + 1915 + if (!nvmet_cc_en(new_cc) && nvmet_cc_en(old_cc)) { 1916 + nvmet_pci_epf_disable_ctrl(ctrl); 1917 + ctrl->csts &= ~NVME_CSTS_RDY; 1918 + } 1919 + 1920 + if (nvmet_cc_shn(new_cc) && !nvmet_cc_shn(old_cc)) { 1921 + nvmet_pci_epf_disable_ctrl(ctrl); 1922 + ctrl->csts |= NVME_CSTS_SHST_CMPLT; 1923 + } 1924 + 1925 + if (!nvmet_cc_shn(new_cc) && nvmet_cc_shn(old_cc)) 1926 + ctrl->csts &= ~NVME_CSTS_SHST_CMPLT; 1927 + 1928 + nvmet_update_cc(ctrl->tctrl, ctrl->cc); 1929 + nvmet_pci_epf_bar_write32(ctrl, NVME_REG_CSTS, ctrl->csts); 1930 + 1931 + schedule_delayed_work(&ctrl->poll_cc, NVMET_PCI_EPF_CC_POLL_INTERVAL); 1932 + } 1933 + 1934 + static void nvmet_pci_epf_init_bar(struct nvmet_pci_epf_ctrl *ctrl) 1935 + { 1936 + struct nvmet_ctrl *tctrl = ctrl->tctrl; 1937 + 1938 + ctrl->bar = ctrl->nvme_epf->reg_bar; 1939 + 1940 + /* Copy the target controller capabilities as a base. */ 1941 + ctrl->cap = tctrl->cap; 1942 + 1943 + /* Contiguous Queues Required (CQR). */ 1944 + ctrl->cap |= 0x1ULL << 16; 1945 + 1946 + /* Set Doorbell stride to 4B (DSTRB). */ 1947 + ctrl->cap &= ~GENMASK_ULL(35, 32); 1948 + 1949 + /* Clear NVM Subsystem Reset Supported (NSSRS). */ 1950 + ctrl->cap &= ~(0x1ULL << 36); 1951 + 1952 + /* Clear Boot Partition Support (BPS). 
*/ 1953 + ctrl->cap &= ~(0x1ULL << 45); 1954 + 1955 + /* Clear Persistent Memory Region Supported (PMRS). */ 1956 + ctrl->cap &= ~(0x1ULL << 56); 1957 + 1958 + /* Clear Controller Memory Buffer Supported (CMBS). */ 1959 + ctrl->cap &= ~(0x1ULL << 57); 1960 + 1961 + /* Controller configuration. */ 1962 + ctrl->cc = tctrl->cc & (~NVME_CC_ENABLE); 1963 + 1964 + /* Controller status. */ 1965 + ctrl->csts = ctrl->tctrl->csts; 1966 + 1967 + nvmet_pci_epf_bar_write64(ctrl, NVME_REG_CAP, ctrl->cap); 1968 + nvmet_pci_epf_bar_write32(ctrl, NVME_REG_VS, tctrl->subsys->ver); 1969 + nvmet_pci_epf_bar_write32(ctrl, NVME_REG_CSTS, ctrl->csts); 1970 + nvmet_pci_epf_bar_write32(ctrl, NVME_REG_CC, ctrl->cc); 1971 + } 1972 + 1973 + static int nvmet_pci_epf_create_ctrl(struct nvmet_pci_epf *nvme_epf, 1974 + unsigned int max_nr_queues) 1975 + { 1976 + struct nvmet_pci_epf_ctrl *ctrl = &nvme_epf->ctrl; 1977 + struct nvmet_alloc_ctrl_args args = {}; 1978 + char hostnqn[NVMF_NQN_SIZE]; 1979 + uuid_t id; 1980 + int ret; 1981 + 1982 + memset(ctrl, 0, sizeof(*ctrl)); 1983 + ctrl->dev = &nvme_epf->epf->dev; 1984 + mutex_init(&ctrl->irq_lock); 1985 + ctrl->nvme_epf = nvme_epf; 1986 + ctrl->mdts = nvme_epf->mdts_kb * SZ_1K; 1987 + INIT_DELAYED_WORK(&ctrl->poll_cc, nvmet_pci_epf_poll_cc_work); 1988 + INIT_DELAYED_WORK(&ctrl->poll_sqs, nvmet_pci_epf_poll_sqs_work); 1989 + 1990 + ret = mempool_init_kmalloc_pool(&ctrl->iod_pool, 1991 + max_nr_queues * NVMET_MAX_QUEUE_SIZE, 1992 + sizeof(struct nvmet_pci_epf_iod)); 1993 + if (ret) { 1994 + dev_err(ctrl->dev, "Failed to initialize IOD mempool\n"); 1995 + return ret; 1996 + } 1997 + 1998 + ctrl->port = nvmet_pci_epf_find_port(ctrl, nvme_epf->portid); 1999 + if (!ctrl->port) { 2000 + dev_err(ctrl->dev, "Port not found\n"); 2001 + ret = -EINVAL; 2002 + goto out_mempool_exit; 2003 + } 2004 + 2005 + /* Create the target controller. 
*/ 2006 + uuid_gen(&id); 2007 + snprintf(hostnqn, NVMF_NQN_SIZE, 2008 + "nqn.2014-08.org.nvmexpress:uuid:%pUb", &id); 2009 + args.port = ctrl->port; 2010 + args.subsysnqn = nvme_epf->subsysnqn; 2011 + memset(&id, 0, sizeof(uuid_t)); 2012 + args.hostid = &id; 2013 + args.hostnqn = hostnqn; 2014 + args.ops = &nvmet_pci_epf_fabrics_ops; 2015 + 2016 + ctrl->tctrl = nvmet_alloc_ctrl(&args); 2017 + if (!ctrl->tctrl) { 2018 + dev_err(ctrl->dev, "Failed to create target controller\n"); 2019 + ret = -ENOMEM; 2020 + goto out_mempool_exit; 2021 + } 2022 + ctrl->tctrl->drvdata = ctrl; 2023 + 2024 + /* We do not support protection information for now. */ 2025 + if (ctrl->tctrl->pi_support) { 2026 + dev_err(ctrl->dev, 2027 + "Protection information (PI) is not supported\n"); 2028 + ret = -ENOTSUPP; 2029 + goto out_put_ctrl; 2030 + } 2031 + 2032 + /* Allocate our queues, up to the maximum number. */ 2033 + ctrl->nr_queues = min(ctrl->tctrl->subsys->max_qid + 1, max_nr_queues); 2034 + ret = nvmet_pci_epf_alloc_queues(ctrl); 2035 + if (ret) 2036 + goto out_put_ctrl; 2037 + 2038 + /* 2039 + * Allocate the IRQ vectors descriptors. We cannot have more than the 2040 + * maximum number of queues. 2041 + */ 2042 + ret = nvmet_pci_epf_alloc_irq_vectors(ctrl); 2043 + if (ret) 2044 + goto out_free_queues; 2045 + 2046 + dev_info(ctrl->dev, 2047 + "New PCI ctrl \"%s\", %u I/O queues, mdts %u B\n", 2048 + ctrl->tctrl->subsys->subsysnqn, ctrl->nr_queues - 1, 2049 + ctrl->mdts); 2050 + 2051 + /* Initialize BAR 0 using the target controller CAP. 
*/ 2052 + nvmet_pci_epf_init_bar(ctrl); 2053 + 2054 + return 0; 2055 + 2056 + out_free_queues: 2057 + nvmet_pci_epf_free_queues(ctrl); 2058 + out_put_ctrl: 2059 + nvmet_ctrl_put(ctrl->tctrl); 2060 + ctrl->tctrl = NULL; 2061 + out_mempool_exit: 2062 + mempool_exit(&ctrl->iod_pool); 2063 + return ret; 2064 + } 2065 + 2066 + static void nvmet_pci_epf_start_ctrl(struct nvmet_pci_epf_ctrl *ctrl) 2067 + { 2068 + schedule_delayed_work(&ctrl->poll_cc, NVMET_PCI_EPF_CC_POLL_INTERVAL); 2069 + } 2070 + 2071 + static void nvmet_pci_epf_stop_ctrl(struct nvmet_pci_epf_ctrl *ctrl) 2072 + { 2073 + cancel_delayed_work_sync(&ctrl->poll_cc); 2074 + 2075 + nvmet_pci_epf_disable_ctrl(ctrl); 2076 + } 2077 + 2078 + static void nvmet_pci_epf_destroy_ctrl(struct nvmet_pci_epf_ctrl *ctrl) 2079 + { 2080 + if (!ctrl->tctrl) 2081 + return; 2082 + 2083 + dev_info(ctrl->dev, "Destroying PCI ctrl \"%s\"\n", 2084 + ctrl->tctrl->subsys->subsysnqn); 2085 + 2086 + nvmet_pci_epf_stop_ctrl(ctrl); 2087 + 2088 + nvmet_pci_epf_free_queues(ctrl); 2089 + nvmet_pci_epf_free_irq_vectors(ctrl); 2090 + 2091 + nvmet_ctrl_put(ctrl->tctrl); 2092 + ctrl->tctrl = NULL; 2093 + 2094 + mempool_exit(&ctrl->iod_pool); 2095 + } 2096 + 2097 + static int nvmet_pci_epf_configure_bar(struct nvmet_pci_epf *nvme_epf) 2098 + { 2099 + struct pci_epf *epf = nvme_epf->epf; 2100 + const struct pci_epc_features *epc_features = nvme_epf->epc_features; 2101 + size_t reg_size, reg_bar_size; 2102 + size_t msix_table_size = 0; 2103 + 2104 + /* 2105 + * The first free BAR will be our register BAR and per NVMe 2106 + * specifications, it must be BAR 0. 
2107 + */ 2108 + if (pci_epc_get_first_free_bar(epc_features) != BAR_0) { 2109 + dev_err(&epf->dev, "BAR 0 is not free\n"); 2110 + return -ENODEV; 2111 + } 2112 + 2113 + if (epc_features->bar[BAR_0].only_64bit) 2114 + epf->bar[BAR_0].flags |= PCI_BASE_ADDRESS_MEM_TYPE_64; 2115 + 2116 + /* 2117 + * Calculate the size of the register bar: NVMe registers first with 2118 + * enough space for the doorbells, followed by the MSI-X table 2119 + * if supported. 2120 + */ 2121 + reg_size = NVME_REG_DBS + (NVMET_NR_QUEUES * 2 * sizeof(u32)); 2122 + reg_size = ALIGN(reg_size, 8); 2123 + 2124 + if (epc_features->msix_capable) { 2125 + size_t pba_size; 2126 + 2127 + msix_table_size = PCI_MSIX_ENTRY_SIZE * epf->msix_interrupts; 2128 + nvme_epf->msix_table_offset = reg_size; 2129 + pba_size = ALIGN(DIV_ROUND_UP(epf->msix_interrupts, 8), 8); 2130 + 2131 + reg_size += msix_table_size + pba_size; 2132 + } 2133 + 2134 + if (epc_features->bar[BAR_0].type == BAR_FIXED) { 2135 + if (reg_size > epc_features->bar[BAR_0].fixed_size) { 2136 + dev_err(&epf->dev, 2137 + "BAR 0 size %llu B too small, need %zu B\n", 2138 + epc_features->bar[BAR_0].fixed_size, 2139 + reg_size); 2140 + return -ENOMEM; 2141 + } 2142 + reg_bar_size = epc_features->bar[BAR_0].fixed_size; 2143 + } else { 2144 + reg_bar_size = ALIGN(reg_size, max(epc_features->align, 4096)); 2145 + } 2146 + 2147 + nvme_epf->reg_bar = pci_epf_alloc_space(epf, reg_bar_size, BAR_0, 2148 + epc_features, PRIMARY_INTERFACE); 2149 + if (!nvme_epf->reg_bar) { 2150 + dev_err(&epf->dev, "Failed to allocate BAR 0\n"); 2151 + return -ENOMEM; 2152 + } 2153 + memset(nvme_epf->reg_bar, 0, reg_bar_size); 2154 + 2155 + return 0; 2156 + } 2157 + 2158 + static void nvmet_pci_epf_free_bar(struct nvmet_pci_epf *nvme_epf) 2159 + { 2160 + struct pci_epf *epf = nvme_epf->epf; 2161 + 2162 + if (!nvme_epf->reg_bar) 2163 + return; 2164 + 2165 + pci_epf_free_space(epf, nvme_epf->reg_bar, BAR_0, PRIMARY_INTERFACE); 2166 + nvme_epf->reg_bar = NULL; 2167 + } 2168 + 
+static void nvmet_pci_epf_clear_bar(struct nvmet_pci_epf *nvme_epf)
+{
+	struct pci_epf *epf = nvme_epf->epf;
+
+	pci_epc_clear_bar(epf->epc, epf->func_no, epf->vfunc_no,
+			  &epf->bar[BAR_0]);
+}
+
+static int nvmet_pci_epf_init_irq(struct nvmet_pci_epf *nvme_epf)
+{
+	const struct pci_epc_features *epc_features = nvme_epf->epc_features;
+	struct pci_epf *epf = nvme_epf->epf;
+	int ret;
+
+	/* Enable MSI-X if supported, otherwise, use MSI. */
+	if (epc_features->msix_capable && epf->msix_interrupts) {
+		ret = pci_epc_set_msix(epf->epc, epf->func_no, epf->vfunc_no,
+				       epf->msix_interrupts, BAR_0,
+				       nvme_epf->msix_table_offset);
+		if (ret) {
+			dev_err(&epf->dev, "Failed to configure MSI-X\n");
+			return ret;
+		}
+
+		nvme_epf->nr_vectors = epf->msix_interrupts;
+		nvme_epf->irq_type = PCI_IRQ_MSIX;
+
+		return 0;
+	}
+
+	if (epc_features->msi_capable && epf->msi_interrupts) {
+		ret = pci_epc_set_msi(epf->epc, epf->func_no, epf->vfunc_no,
+				      epf->msi_interrupts);
+		if (ret) {
+			dev_err(&epf->dev, "Failed to configure MSI\n");
+			return ret;
+		}
+
+		nvme_epf->nr_vectors = epf->msi_interrupts;
+		nvme_epf->irq_type = PCI_IRQ_MSI;
+
+		return 0;
+	}
+
+	/* MSI and MSI-X are not supported: fall back to INTx. */
+	nvme_epf->nr_vectors = 1;
+	nvme_epf->irq_type = PCI_IRQ_INTX;
+
+	return 0;
+}
+
+static int nvmet_pci_epf_epc_init(struct pci_epf *epf)
+{
+	struct nvmet_pci_epf *nvme_epf = epf_get_drvdata(epf);
+	const struct pci_epc_features *epc_features = nvme_epf->epc_features;
+	struct nvmet_pci_epf_ctrl *ctrl = &nvme_epf->ctrl;
+	unsigned int max_nr_queues = NVMET_NR_QUEUES;
+	int ret;
+
+	/* For now, do not support virtual functions. */
+	if (epf->vfunc_no > 0) {
+		dev_err(&epf->dev, "Virtual functions are not supported\n");
+		return -EINVAL;
+	}
+
+	/*
+	 * Cap the maximum number of queues we can support on the controller
+	 * with the number of IRQs we can use.
+	 */
+	if (epc_features->msix_capable && epf->msix_interrupts) {
+		dev_info(&epf->dev,
+			 "PCI endpoint controller supports MSI-X, %u vectors\n",
+			 epf->msix_interrupts);
+		max_nr_queues = min(max_nr_queues, epf->msix_interrupts);
+	} else if (epc_features->msi_capable && epf->msi_interrupts) {
+		dev_info(&epf->dev,
+			 "PCI endpoint controller supports MSI, %u vectors\n",
+			 epf->msi_interrupts);
+		max_nr_queues = min(max_nr_queues, epf->msi_interrupts);
+	}
+
+	if (max_nr_queues < 2) {
+		dev_err(&epf->dev, "Invalid maximum number of queues %u\n",
+			max_nr_queues);
+		return -EINVAL;
+	}
+
+	/* Create the target controller. */
+	ret = nvmet_pci_epf_create_ctrl(nvme_epf, max_nr_queues);
+	if (ret) {
+		dev_err(&epf->dev,
+			"Failed to create NVMe PCI target controller (err=%d)\n",
+			ret);
+		return ret;
+	}
+
+	/* Set device ID, class, etc. */
+	epf->header->vendorid = ctrl->tctrl->subsys->vendor_id;
+	epf->header->subsys_vendor_id = ctrl->tctrl->subsys->subsys_vendor_id;
+	ret = pci_epc_write_header(epf->epc, epf->func_no, epf->vfunc_no,
+				   epf->header);
+	if (ret) {
+		dev_err(&epf->dev,
+			"Failed to write configuration header (err=%d)\n", ret);
+		goto out_destroy_ctrl;
+	}
+
+	ret = pci_epc_set_bar(epf->epc, epf->func_no, epf->vfunc_no,
+			      &epf->bar[BAR_0]);
+	if (ret) {
+		dev_err(&epf->dev, "Failed to set BAR 0 (err=%d)\n", ret);
+		goto out_destroy_ctrl;
+	}
+
+	/*
+	 * Enable interrupts and start polling the controller BAR if we do not
+	 * have a link up notifier.
+	 */
+	ret = nvmet_pci_epf_init_irq(nvme_epf);
+	if (ret)
+		goto out_clear_bar;
+
+	if (!epc_features->linkup_notifier) {
+		ctrl->link_up = true;
+		nvmet_pci_epf_start_ctrl(&nvme_epf->ctrl);
+	}
+
+	return 0;
+
+out_clear_bar:
+	nvmet_pci_epf_clear_bar(nvme_epf);
+out_destroy_ctrl:
+	nvmet_pci_epf_destroy_ctrl(&nvme_epf->ctrl);
+	return ret;
+}
+
+static void nvmet_pci_epf_epc_deinit(struct pci_epf *epf)
+{
+	struct nvmet_pci_epf *nvme_epf = epf_get_drvdata(epf);
+	struct nvmet_pci_epf_ctrl *ctrl = &nvme_epf->ctrl;
+
+	ctrl->link_up = false;
+	nvmet_pci_epf_destroy_ctrl(ctrl);
+
+	nvmet_pci_epf_deinit_dma(nvme_epf);
+	nvmet_pci_epf_clear_bar(nvme_epf);
+}
+
+static int nvmet_pci_epf_link_up(struct pci_epf *epf)
+{
+	struct nvmet_pci_epf *nvme_epf = epf_get_drvdata(epf);
+	struct nvmet_pci_epf_ctrl *ctrl = &nvme_epf->ctrl;
+
+	ctrl->link_up = true;
+	nvmet_pci_epf_start_ctrl(ctrl);
+
+	return 0;
+}
+
+static int nvmet_pci_epf_link_down(struct pci_epf *epf)
+{
+	struct nvmet_pci_epf *nvme_epf = epf_get_drvdata(epf);
+	struct nvmet_pci_epf_ctrl *ctrl = &nvme_epf->ctrl;
+
+	ctrl->link_up = false;
+	nvmet_pci_epf_stop_ctrl(ctrl);
+
+	return 0;
+}
+
+static const struct pci_epc_event_ops nvmet_pci_epf_event_ops = {
+	.epc_init = nvmet_pci_epf_epc_init,
+	.epc_deinit = nvmet_pci_epf_epc_deinit,
+	.link_up = nvmet_pci_epf_link_up,
+	.link_down = nvmet_pci_epf_link_down,
+};
+
+static int nvmet_pci_epf_bind(struct pci_epf *epf)
+{
+	struct nvmet_pci_epf *nvme_epf = epf_get_drvdata(epf);
+	const struct pci_epc_features *epc_features;
+	struct pci_epc *epc = epf->epc;
+	int ret;
+
+	if (WARN_ON_ONCE(!epc))
+		return -EINVAL;
+
+	epc_features = pci_epc_get_features(epc, epf->func_no, epf->vfunc_no);
+	if (!epc_features) {
+		dev_err(&epf->dev, "epc_features not implemented\n");
+		return -EOPNOTSUPP;
+	}
+	nvme_epf->epc_features = epc_features;
+
+	ret = nvmet_pci_epf_configure_bar(nvme_epf);
+	if (ret)
+		return ret;
+
+	nvmet_pci_epf_init_dma(nvme_epf);
+
+	return 0;
+}
+
+static void nvmet_pci_epf_unbind(struct pci_epf *epf)
+{
+	struct nvmet_pci_epf *nvme_epf = epf_get_drvdata(epf);
+	struct pci_epc *epc = epf->epc;
+
+	nvmet_pci_epf_destroy_ctrl(&nvme_epf->ctrl);
+
+	if (epc->init_complete) {
+		nvmet_pci_epf_deinit_dma(nvme_epf);
+		nvmet_pci_epf_clear_bar(nvme_epf);
+	}
+
+	nvmet_pci_epf_free_bar(nvme_epf);
+}
+
+static struct pci_epf_header nvme_epf_pci_header = {
+	.vendorid = PCI_ANY_ID,
+	.deviceid = PCI_ANY_ID,
+	.progif_code = 0x02, /* NVM Express */
+	.baseclass_code = PCI_BASE_CLASS_STORAGE,
+	.subclass_code = 0x08, /* Non-Volatile Memory controller */
+	.interrupt_pin = PCI_INTERRUPT_INTA,
+};
+
+static int nvmet_pci_epf_probe(struct pci_epf *epf,
+			       const struct pci_epf_device_id *id)
+{
+	struct nvmet_pci_epf *nvme_epf;
+	int ret;
+
+	nvme_epf = devm_kzalloc(&epf->dev, sizeof(*nvme_epf), GFP_KERNEL);
+	if (!nvme_epf)
+		return -ENOMEM;
+
+	ret = devm_mutex_init(&epf->dev, &nvme_epf->mmio_lock);
+	if (ret)
+		return ret;
+
+	nvme_epf->epf = epf;
+	nvme_epf->mdts_kb = NVMET_PCI_EPF_MDTS_KB;
+
+	epf->event_ops = &nvmet_pci_epf_event_ops;
+	epf->header = &nvme_epf_pci_header;
+	epf_set_drvdata(epf, nvme_epf);
+
+	return 0;
+}
+
+#define to_nvme_epf(epf_group) \
+	container_of(epf_group, struct nvmet_pci_epf, group)
+
+static ssize_t nvmet_pci_epf_portid_show(struct config_item *item, char *page)
+{
+	struct config_group *group = to_config_group(item);
+	struct nvmet_pci_epf *nvme_epf = to_nvme_epf(group);
+
+	return sysfs_emit(page, "%u\n", le16_to_cpu(nvme_epf->portid));
+}
+
+static ssize_t nvmet_pci_epf_portid_store(struct config_item *item,
+					  const char *page, size_t len)
+{
+	struct config_group *group = to_config_group(item);
+	struct nvmet_pci_epf *nvme_epf = to_nvme_epf(group);
+	u16 portid;
+
+	/* Do not allow setting this when the function is already started. */
+	if (nvme_epf->ctrl.tctrl)
+		return -EBUSY;
+
+	if (!len)
+		return -EINVAL;
+
+	if (kstrtou16(page, 0, &portid))
+		return -EINVAL;
+
+	nvme_epf->portid = cpu_to_le16(portid);
+
+	return len;
+}
+
+CONFIGFS_ATTR(nvmet_pci_epf_, portid);
+
+static ssize_t nvmet_pci_epf_subsysnqn_show(struct config_item *item,
+					    char *page)
+{
+	struct config_group *group = to_config_group(item);
+	struct nvmet_pci_epf *nvme_epf = to_nvme_epf(group);
+
+	return sysfs_emit(page, "%s\n", nvme_epf->subsysnqn);
+}
+
+static ssize_t nvmet_pci_epf_subsysnqn_store(struct config_item *item,
+					     const char *page, size_t len)
+{
+	struct config_group *group = to_config_group(item);
+	struct nvmet_pci_epf *nvme_epf = to_nvme_epf(group);
+
+	/* Do not allow setting this when the function is already started. */
+	if (nvme_epf->ctrl.tctrl)
+		return -EBUSY;
+
+	if (!len)
+		return -EINVAL;
+
+	strscpy(nvme_epf->subsysnqn, page, len);
+
+	return len;
+}
+
+CONFIGFS_ATTR(nvmet_pci_epf_, subsysnqn);
+
+static ssize_t nvmet_pci_epf_mdts_kb_show(struct config_item *item, char *page)
+{
+	struct config_group *group = to_config_group(item);
+	struct nvmet_pci_epf *nvme_epf = to_nvme_epf(group);
+
+	return sysfs_emit(page, "%u\n", nvme_epf->mdts_kb);
+}
+
+static ssize_t nvmet_pci_epf_mdts_kb_store(struct config_item *item,
+					   const char *page, size_t len)
+{
+	struct config_group *group = to_config_group(item);
+	struct nvmet_pci_epf *nvme_epf = to_nvme_epf(group);
+	unsigned long mdts_kb;
+	int ret;
+
+	if (nvme_epf->ctrl.tctrl)
+		return -EBUSY;
+
+	ret = kstrtoul(page, 0, &mdts_kb);
+	if (ret)
+		return ret;
+	if (!mdts_kb)
+		mdts_kb = NVMET_PCI_EPF_MDTS_KB;
+	else if (mdts_kb > NVMET_PCI_EPF_MAX_MDTS_KB)
+		mdts_kb = NVMET_PCI_EPF_MAX_MDTS_KB;
+
+	if (!is_power_of_2(mdts_kb))
+		return -EINVAL;
+
+	nvme_epf->mdts_kb = mdts_kb;
+
+	return len;
+}
+
+CONFIGFS_ATTR(nvmet_pci_epf_, mdts_kb);
+
+static struct configfs_attribute *nvmet_pci_epf_attrs[] = {
+	&nvmet_pci_epf_attr_portid,
+	&nvmet_pci_epf_attr_subsysnqn,
+	&nvmet_pci_epf_attr_mdts_kb,
+	NULL,
+};
+
+static const struct config_item_type nvmet_pci_epf_group_type = {
+	.ct_attrs = nvmet_pci_epf_attrs,
+	.ct_owner = THIS_MODULE,
+};
+
+static struct config_group *nvmet_pci_epf_add_cfs(struct pci_epf *epf,
+						  struct config_group *group)
+{
+	struct nvmet_pci_epf *nvme_epf = epf_get_drvdata(epf);
+
+	config_group_init_type_name(&nvme_epf->group, "nvme",
+				    &nvmet_pci_epf_group_type);
+
+	return &nvme_epf->group;
+}
+
+static const struct pci_epf_device_id nvmet_pci_epf_ids[] = {
+	{ .name = "nvmet_pci_epf" },
+	{},
+};
+
+static struct pci_epf_ops nvmet_pci_epf_ops = {
+	.bind = nvmet_pci_epf_bind,
+	.unbind = nvmet_pci_epf_unbind,
+	.add_cfs = nvmet_pci_epf_add_cfs,
+};
+
+static struct pci_epf_driver nvmet_pci_epf_driver = {
+	.driver.name = "nvmet_pci_epf",
+	.probe = nvmet_pci_epf_probe,
+	.id_table = nvmet_pci_epf_ids,
+	.ops = &nvmet_pci_epf_ops,
+	.owner = THIS_MODULE,
+};
+
+static int __init nvmet_pci_epf_init_module(void)
+{
+	int ret;
+
+	ret = pci_epf_register_driver(&nvmet_pci_epf_driver);
+	if (ret)
+		return ret;
+
+	ret = nvmet_register_transport(&nvmet_pci_epf_fabrics_ops);
+	if (ret) {
+		pci_epf_unregister_driver(&nvmet_pci_epf_driver);
+		return ret;
+	}
+
+	return 0;
+}
+
+static void __exit nvmet_pci_epf_cleanup_module(void)
+{
+	nvmet_unregister_transport(&nvmet_pci_epf_fabrics_ops);
+	pci_epf_unregister_driver(&nvmet_pci_epf_driver);
+}
+
+module_init(nvmet_pci_epf_init_module);
+module_exit(nvmet_pci_epf_cleanup_module);
+
+MODULE_DESCRIPTION("NVMe PCI Endpoint Function target driver");
+MODULE_AUTHOR("Damien Le Moal <dlemoal@kernel.org>");
+MODULE_LICENSE("GPL");
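The BAR 0 sizing done by nvmet_pci_epf_configure_bar() in the driver above can be checked by hand with a small userspace sketch. This is an illustration only, not the driver code: every sketch_ name is hypothetical, the 0x1000 doorbell offset and the 16-byte MSI-X entry size are assumed from the NVMe and PCI specifications rather than copied from kernel headers, and the sketch reserves MSI-X space only when a non-zero vector count is passed, whereas the driver tests epc_features->msix_capable. The fixed-size BAR branch is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed constants (hypothetical names, values taken from the specs). */
#define SKETCH_REG_DBS		0x1000u	/* offset of the first doorbell register */
#define SKETCH_MSIX_ENTRY_SIZE	16u	/* bytes per MSI-X table entry */
#define SKETCH_MIN_BAR_ALIGN	4096u	/* minimum BAR size alignment used here */

struct sketch_bar_layout {
	size_t msix_table_offset;	/* 0 when no MSI-X space is reserved */
	size_t reg_size;		/* bytes needed: registers + MSI-X table + PBA */
	size_t bar_size;		/* reg_size rounded up to the BAR alignment */
};

/* Round x up to a multiple of a; a must be a power of two. */
static size_t sketch_align(size_t x, size_t a)
{
	assert(a && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

static struct sketch_bar_layout sketch_bar0_layout(unsigned int nr_queues,
						   unsigned int msix_vectors,
						   size_t epc_align)
{
	struct sketch_bar_layout l = { 0, 0, 0 };
	size_t align = epc_align > SKETCH_MIN_BAR_ALIGN ?
		       epc_align : SKETCH_MIN_BAR_ALIGN;

	/* One SQ tail and one CQ head doorbell (4 bytes each) per queue pair. */
	l.reg_size = SKETCH_REG_DBS + (size_t)nr_queues * 2 * sizeof(uint32_t);
	l.reg_size = sketch_align(l.reg_size, 8);

	if (msix_vectors) {
		size_t table_size = (size_t)SKETCH_MSIX_ENTRY_SIZE * msix_vectors;
		/* Pending bit array: one bit per vector, rounded up to 8 bytes. */
		size_t pba_size = sketch_align((msix_vectors + 7u) / 8u, 8);

		l.msix_table_offset = l.reg_size;
		l.reg_size += table_size + pba_size;
	}

	l.bar_size = sketch_align(l.reg_size, align);
	return l;
}
```

Calling sketch_bar0_layout(128, 32, 0), for example, places the MSI-X table at offset 5120, needs 5640 bytes in total, and rounds the BAR size up to 8192 bytes.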
+42
include/linux/nvme.h
···
 
 /* Transport Type codes for Discovery Log Page entry TRTYPE field */
 enum {
+	NVMF_TRTYPE_PCI		= 0,	/* PCI */
 	NVMF_TRTYPE_RDMA	= 1,	/* RDMA */
 	NVMF_TRTYPE_FC		= 2,	/* Fibre Channel */
 	NVMF_TRTYPE_TCP		= 3,	/* TCP/IP */
···
 	NVME_CTRL_ATTR_HID_128_BIT	= (1 << 0),
 	NVME_CTRL_ATTR_TBKAS		= (1 << 6),
 	NVME_CTRL_ATTR_ELBAS		= (1 << 15),
+	NVME_CTRL_ATTR_RHII		= (1 << 18),
 };
 
 struct nvme_id_ctrl {
···
 static inline bool nvme_is_fabrics(const struct nvme_command *cmd)
 {
 	return cmd->common.opcode == nvme_fabrics_command;
+}
+
+#ifdef CONFIG_NVME_VERBOSE_ERRORS
+const char *nvme_get_error_status_str(u16 status);
+const char *nvme_get_opcode_str(u8 opcode);
+const char *nvme_get_admin_opcode_str(u8 opcode);
+const char *nvme_get_fabrics_opcode_str(u8 opcode);
+#else /* CONFIG_NVME_VERBOSE_ERRORS */
+static inline const char *nvme_get_error_status_str(u16 status)
+{
+	return "I/O Error";
+}
+static inline const char *nvme_get_opcode_str(u8 opcode)
+{
+	return "I/O Cmd";
+}
+static inline const char *nvme_get_admin_opcode_str(u8 opcode)
+{
+	return "Admin Cmd";
+}
+
+static inline const char *nvme_get_fabrics_opcode_str(u8 opcode)
+{
+	return "Fabrics Cmd";
+}
+#endif /* CONFIG_NVME_VERBOSE_ERRORS */
+
+static inline const char *nvme_opcode_str(int qid, u8 opcode)
+{
+	return qid ? nvme_get_opcode_str(opcode) :
+		nvme_get_admin_opcode_str(opcode);
+}
+
+static inline const char *nvme_fabrics_opcode_str(
+		int qid, const struct nvme_command *cmd)
+{
+	if (nvme_is_fabrics(cmd))
+		return nvme_get_fabrics_opcode_str(cmd->fabrics.fctype);
+
+	return nvme_opcode_str(qid, cmd->common.opcode);
 }
 
 struct nvme_error_slot {
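To make the dispatch in the opcode string helpers above easier to follow, here is a minimal userspace sketch of the same selection logic with the non-verbose fallback strings. It is an assumption-laden stand-in, not the kernel header: struct sketch_cmd is a hypothetical two-field reduction of struct nvme_command, all sketch_ names are invented for illustration, and 0x7f is assumed to be the value of nvme_fabrics_command.

```c
#include <stdint.h>

#define SKETCH_FABRICS_OPCODE	0x7f	/* assumed value of nvme_fabrics_command */

/* Hypothetical reduced command: only the fields the helpers look at. */
struct sketch_cmd {
	uint8_t opcode;
	uint8_t fctype;	/* only meaningful for fabrics commands */
};

/* Stand-ins for the stubs used when verbose error strings are disabled. */
static const char *sketch_io_str(uint8_t opcode)
{
	(void)opcode;
	return "I/O Cmd";
}

static const char *sketch_admin_str(uint8_t opcode)
{
	(void)opcode;
	return "Admin Cmd";
}

static const char *sketch_fabrics_str(uint8_t fctype)
{
	(void)fctype;
	return "Fabrics Cmd";
}

/* Queue 0 is the admin queue; any other queue ID carries I/O commands. */
static const char *sketch_opcode_str(int qid, uint8_t opcode)
{
	return qid ? sketch_io_str(opcode) : sketch_admin_str(opcode);
}

/* Fabrics commands are identified by opcode first, whatever the queue. */
static const char *sketch_fabrics_opcode_str(int qid,
					     const struct sketch_cmd *cmd)
{
	if (cmd->opcode == SKETCH_FABRICS_OPCODE)
		return sketch_fabrics_str(cmd->fctype);

	return sketch_opcode_str(qid, cmd->opcode);
}
```

The ordering matters: a fabrics command on queue 0 is reported as a fabrics command, not as an admin command, because the opcode check runs before the queue ID is considered.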