Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

PCI: Wait for device readiness with Configuration RRS

After a device reset, delays are required before the device can
successfully complete config accesses. PCIe r6.0, sec 6.6, specifies some
delays required before software can perform config accesses. Devices that
require more time after those delays may respond to config accesses with
Configuration Request Retry Status (RRS) completions.

Callers of pci_dev_wait() are responsible for delays until the device can
respond to config accesses. pci_dev_wait() waits any additional time until
the device can successfully complete config accesses.

Reading config space of devices that are not present or not ready typically
returns ~0 (PCI_ERROR_RESPONSE). Previously we polled the Command register
until we got a value other than ~0. This is sometimes a problem because
Root Complex handling of RRS completions may include several retries and
implementation-specific behavior that is invisible to software (see sec
2.3.2), so the exponential backoff in pci_dev_wait() may not work as
intended.
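To make the backoff concern concrete, here is a minimal userspace sketch of an exponential backoff schedule. It assumes the delay starts at 1 ms, doubles after every sleep, and that the loop gives up once the delay exceeds the timeout; the function name and the counting of polls are hypothetical and only illustrate the idea, they are not kernel code. The model also assumes every config read returns immediately, which is exactly what hidden Root Complex retries can violate.

```c
#include <assert.h>

/*
 * Hypothetical model of an exponential backoff schedule.
 * Returns the total time spent sleeping (in ms) before giving up;
 * *polls receives the number of config reads that were attempted.
 */
unsigned int backoff_total_sleep_ms(unsigned int timeout_ms,
				    unsigned int *polls)
{
	unsigned int delay = 1, total = 0;

	assert(polls);
	assert(timeout_ms < (1u << 30));	/* keep delay *= 2 from overflowing */
	*polls = 0;

	for (;;) {
		(*polls)++;		/* one config read per iteration */
		if (delay > timeout_ms)
			return total;	/* corresponds to "giving up" */
		total += delay;		/* stands in for sleeping `delay` ms */
		delay *= 2;
	}
}
```

With a 1000 ms timeout this model sleeps 1 + 2 + ... + 512 ms in total. If each read silently costs extra time inside the Root Complex, the real elapsed time is larger than the schedule suggests.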

Linux enables Configuration RRS Software Visibility on all Root Ports that
support it. If it is enabled, read the Vendor ID instead of the Command
register. RRS completions cause immediate return of the 0x0001 reserved
Vendor ID value, so the pci_dev_wait() backoff works correctly.

When a read of Vendor ID eventually completes successfully by returning a
non-0x0001 value (the Vendor ID or 0xffff for VFs), the device should be
initialized and ready to respond to config requests.
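The readiness test on a Vendor ID read can be sketched as a standalone check. This is an illustrative userspace version with hypothetical names (RRS_VENDOR_ID, vendor_read_is_rrs, vendor_read_means_ready); the helper this patch actually uses is pci_bus_crs_vendor_id(). It assumes the 32-bit read carries the Vendor ID in its low 16 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RRS_VENDOR_ID 0x0001u	/* reserved Vendor ID value; hypothetical name */

/* True if the low 16 bits hold the reserved value returned for RRS. */
bool vendor_read_is_rrs(uint32_t dword)
{
	return (dword & 0xffffu) == RRS_VENDOR_ID;
}

/*
 * Any other value ends the wait: either a real Vendor ID or 0xffff,
 * which is what a VF returns for its Vendor ID.
 */
bool vendor_read_means_ready(uint32_t dword)
{
	return !vendor_read_is_rrs(dword);
}
```

Note that 0xffff counts as "ready" here, which is why this check is only appropriate when RRS Software Visibility is known to be enabled.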

For conventional PCI devices or devices below Root Ports that don't support
Configuration RRS Software Visibility, poll the Command register as before.
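The choice between the two registers can be modeled with a small userspace simulation. Everything here is hypothetical scaffolding (struct sim_dev, sim_read, sim_dev_wait, the made-up register values); it is a sketch of the selection logic under the assumption that a device answers its first few reads with RRS, not the kernel implementation. Sleeping is omitted so the model runs instantly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SIM_ERROR_RESPONSE 0xffffffffu	/* ~0, like PCI_ERROR_RESPONSE */
#define SIM_RRS_VENDOR_ID  0x0001u	/* reserved Vendor ID seen on RRS */

enum sim_reg { SIM_VENDOR_ID, SIM_COMMAND };

/* Hypothetical device that answers its first N reads with RRS. */
struct sim_dev {
	bool rrs_sv;			/* RRS Software Visibility enabled */
	unsigned int rrs_reads_left;	/* reads that still complete with RRS */
	unsigned int vendor_reads;	/* counters so callers can inspect */
	unsigned int command_reads;	/* which register was polled */
};

uint32_t sim_read(struct sim_dev *d, enum sim_reg reg)
{
	if (reg == SIM_VENDOR_ID)
		d->vendor_reads++;
	else
		d->command_reads++;

	if (d->rrs_reads_left) {
		d->rrs_reads_left--;
		/* Only a Vendor ID read with RRS SV sees the 0x0001 value. */
		if (d->rrs_sv && reg == SIM_VENDOR_ID)
			return 0xffff0000u | SIM_RRS_VENDOR_ID;
		return SIM_ERROR_RESPONSE;
	}
	/* Made-up register contents for a ready device. */
	return reg == SIM_VENDOR_ID ? 0x12348086u : 0x00100000u;
}

/* Returns 0 once the device answers, -1 if the backoff runs out. */
int sim_dev_wait(struct sim_dev *d, unsigned int timeout_ms)
{
	unsigned int delay = 1;

	assert(timeout_ms < (1u << 30));	/* keep delay *= 2 from overflowing */

	for (;;) {
		uint32_t id;

		if (d->rrs_sv) {
			id = sim_read(d, SIM_VENDOR_ID);
			if ((id & 0xffffu) != SIM_RRS_VENDOR_ID)
				return 0;
		} else {
			id = sim_read(d, SIM_COMMAND);
			if (id != SIM_ERROR_RESPONSE)
				return 0;
		}

		if (delay > timeout_ms)
			return -1;
		delay *= 2;	/* real code would sleep `delay` ms before this */
	}
}
```

A caller would fill in a struct sim_dev, call sim_dev_wait(), and then check the read counters to confirm that only the Vendor ID was polled with RRS SV enabled and only the Command register otherwise.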

This was developed independently, but is very similar to Stanislav
Spassov's previous work at
https://lore.kernel.org/linux-pci/20200223122057.6504-1-stanspas@amazon.com

Link: https://lore.kernel.org/r/20240827234848.4429-2-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Duc Dang <ducdang@google.com>

+37 -19
+28 -13
drivers/pci/pci.c
···
 {
 	int delay = 1;
 	bool retrain = false;
-	struct pci_dev *bridge;
+	struct pci_dev *root, *bridge;
+
+	root = pcie_find_root_port(dev);
 
 	if (pci_is_pcie(dev)) {
 		bridge = pci_upstream_bridge(dev);
···
 	}
 
 	/*
-	 * After reset, the device should not silently discard config
-	 * requests, but it may still indicate that it needs more time by
-	 * responding to them with CRS completions. The Root Port will
-	 * generally synthesize ~0 (PCI_ERROR_RESPONSE) data to complete
-	 * the read (except when CRS SV is enabled and the read was for the
-	 * Vendor ID; in that case it synthesizes 0x0001 data).
+	 * The caller has already waited long enough after a reset that the
+	 * device should respond to config requests, but it may respond
+	 * with Request Retry Status (RRS) if it needs more time to
+	 * initialize.
 	 *
-	 * Wait for the device to return a non-CRS completion. Read the
-	 * Command register instead of Vendor ID so we don't have to
-	 * contend with the CRS SV value.
+	 * If the device is below a Root Port with Configuration RRS
+	 * Software Visibility enabled, reading the Vendor ID returns a
+	 * special data value if the device responded with RRS. Read the
+	 * Vendor ID until we get non-RRS status.
+	 *
+	 * If there's no Root Port or Configuration RRS Software Visibility
+	 * is not enabled, the device may still respond with RRS, but
+	 * hardware may retry the config request. If no retries receive
+	 * Successful Completion, hardware generally synthesizes ~0
+	 * (PCI_ERROR_RESPONSE) data to complete the read. Reading Vendor
+	 * ID for VFs and non-existent devices also returns ~0, so read the
+	 * Command register until it returns something other than ~0.
 	 */
 	for (;;) {
 		u32 id;
···
 			return -ENOTTY;
 		}
 
-		pci_read_config_dword(dev, PCI_COMMAND, &id);
-		if (!PCI_POSSIBLE_ERROR(id))
-			break;
+		if (root && root->config_crs_sv) {
+			pci_read_config_dword(dev, PCI_VENDOR_ID, &id);
+			if (!pci_bus_crs_vendor_id(id))
+				break;
+		} else {
+			pci_read_config_dword(dev, PCI_COMMAND, &id);
+			if (!PCI_POSSIBLE_ERROR(id))
+				break;
+		}
 
 		if (delay > timeout) {
 			pci_warn(dev, "not ready %dms after %s; giving up\n",
+5
drivers/pci/pci.h
···
 void pci_bridge_d3_update(struct pci_dev *dev);
 int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type);
 
+static inline bool pci_bus_crs_vendor_id(u32 l)
+{
+	return (l & 0xffff) == PCI_VENDOR_ID_PCI_SIG;
+}
+
 static inline void pci_wakeup_event(struct pci_dev *dev)
 {
 	/* Wait 100 ms before the system can be put into a sleep state. */
+3 -6
drivers/pci/probe.c
···
 
 	/* Enable CRS Software Visibility if supported */
 	pcie_capability_read_word(pdev, PCI_EXP_RTCAP, &root_cap);
-	if (root_cap & PCI_EXP_RTCAP_CRSVIS)
+	if (root_cap & PCI_EXP_RTCAP_CRSVIS) {
 		pcie_capability_set_word(pdev, PCI_EXP_RTCTL,
 					 PCI_EXP_RTCTL_CRSSVE);
+		pdev->config_crs_sv = 1;
+	}
 }
 
 static unsigned int pci_scan_child_bus_extend(struct pci_bus *bus,
···
 	return dev;
 }
 EXPORT_SYMBOL(pci_alloc_dev);
-
-static bool pci_bus_crs_vendor_id(u32 l)
-{
-	return (l & 0xffff) == PCI_VENDOR_ID_PCI_SIG;
-}
 
 static bool pci_bus_wait_crs(struct pci_bus *bus, int devfn, u32 *l,
 			     int timeout)
+1
include/linux/pci.h
···
 					   can be generated */
 	unsigned int	pme_poll:1;	/* Poll device's PME status bit */
 	unsigned int	pinned:1;	/* Whether this dev is pinned */
+	unsigned int	config_crs_sv:1; /* Config CRS software visibility */
 	unsigned int	imm_ready:1;	/* Supports Immediate Readiness */
 	unsigned int	d1_support:1;	/* Low power state D1 is supported */
 	unsigned int	d2_support:1;	/* Low power state D2 is supported */