Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

drm/xe: Track pre-production workaround support

When we're initially enabling driver support for a new platform/IP, we
usually implement all workarounds documented in the WA database in the
driver. Many of those workarounds are restricted to early steppings
that only showed up in pre-production hardware (i.e., internal test
chips that are not available to the general public). Since the
workarounds for early, pre-production steppings tend to be some of the
ugliest and most complicated workarounds, we generally want to eliminate
them and simplify the code once the platform has launched and our
internal usage of those pre-production parts has been phased out.

Let's add a flag to the device info that tracks which platforms still
have support for pre-production workarounds so that we can print a
warning and taint if someone tries to load the driver on a
pre-production part for a platform without pre-production workarounds.
This will help our internal users understand the likely problems they'll
encounter if they try to load the driver on an old pre-production
device.
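
The intended behavior can be summarized in a small standalone sketch.
This is not driver code: the enum and the classify_preprod() helper are
hypothetical names used only for illustration, and the real driver logs
and taints directly rather than returning a value.

```c
#include <stdbool.h>

/* Hypothetical outcome of the pre-production check (illustrative names). */
enum preprod_result {
	PREPROD_NONE,        /* production part: nothing to report */
	PREPROD_SUPPORTED,   /* pre-production part, workarounds still in the driver */
	PREPROD_UNSUPPORTED, /* pre-production part, workarounds removed: error + taint */
};

/*
 * is_production:   true when the hardware reports itself as a production part.
 * has_pre_prod_wa: the per-platform device info flag described above.
 */
static enum preprod_result classify_preprod(bool is_production, bool has_pre_prod_wa)
{
	if (is_production)
		return PREPROD_NONE;

	return has_pre_prod_wa ? PREPROD_SUPPORTED : PREPROD_UNSUPPORTED;
}
```

Only the PREPROD_UNSUPPORTED case corresponds to the error message and
taint; a pre-production part on a platform that still carries the
workarounds only gets an informational message.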

The Xe behavior here is similar to what we've done for many years on
i915 (see intel_detect_preproduction_hw()), except that instead of
manually coding up ranges of device steppings that we believe to be
pre-production hardware, Xe will use the hardware's own production vs
pre-production fusing status, which we can read from the FUSE2 register.
This fuse didn't exist on older Intel hardware, but should be present on
all platforms supported by the Xe driver.
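
As a rough userspace illustration of the fuse decode (not the driver
code itself), the check reduces to testing one bit of the FUSE2 value.
The register offset and bit position are taken from this patch; the
FUSE2_OFFSET and PRODUCTION_HW_BIT macro names and the
fuse2_is_production() helper are assumptions made for this sketch, and
the raw register value is assumed to have been read elsewhere.

```c
#include <stdbool.h>
#include <stdint.h>

/* From this patch: FUSE2 is XE_REG(0x9120), PRODUCTION_HW is REG_BIT(2). */
#define FUSE2_OFFSET      0x9120u
#define PRODUCTION_HW_BIT (UINT32_C(1) << 2)

/* Returns true when the raw FUSE2 value marks the part as production hardware. */
static bool fuse2_is_production(uint32_t fuse2)
{
	return (fuse2 & PRODUCTION_HW_BIT) != 0;
}
```

Because the fuse itself carries the status, no per-platform table of
stepping ranges is needed, unlike the i915 approach.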

Going forward, let's set the expectation that we'll start looking into
removing pre-production workarounds for a platform around the time that
platforms of the next major IP generation are having their force_probe
requirement lifted. This timing is just a rough guideline; there may be
cases where some instances of pre-production parts are still being
actively used in CI farms, internal device pools, etc. and we'll need to
wait a bit longer for those to be swapped out.

v2:
- Fix inverted forcewake check

v3:
- Invert flag and add it to the platforms on which we still have
  pre-prod workarounds. (Jani, Lucas)

v4:
- Avoid checking pre-production on VF since they don't have access to
  the FUSE2 register.

Bspec: 78271, 52544
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patch.msgid.link/20251212181411.294854-3-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>

+69
+3
drivers/gpu/drm/xe/regs/xe_gt_regs.h
···

 #define MIRROR_FUSE1				XE_REG(0x911c)

+#define FUSE2					XE_REG(0x9120)
+#define PRODUCTION_HW				REG_BIT(2)
+
 #define MIRROR_L3BANK_ENABLE			XE_REG(0x9130)
 #define XE3_L3BANK_ENABLE			REG_GENMASK(31, 0)

+57
drivers/gpu/drm/xe/xe_device.c
···
 	return 0;
 }

+/*
+ * Detect if the driver is being run on pre-production hardware. We don't
+ * keep workarounds for pre-production hardware long term, so print an
+ * error and add taint if we're being loaded on a pre-production platform
+ * for which the pre-prod workarounds have already been removed.
+ *
+ * The general policy is that we'll remove any workarounds that only apply to
+ * pre-production hardware around the time force_probe restrictions are lifted
+ * for a platform of the next major IP generation (for example, Xe2 pre-prod
+ * workarounds should be removed around the time the first Xe3 platforms have
+ * force_probe lifted).
+ */
+static void detect_preproduction_hw(struct xe_device *xe)
+{
+	struct xe_gt *gt;
+	int id;
+
+	/*
+	 * SR-IOV VFs don't have access to the FUSE2 register, so we can't
+	 * check pre-production status there. But the host OS will notice
+	 * and report the pre-production status, which should be enough to
+	 * help us catch mistaken use of pre-production hardware.
+	 */
+	if (IS_SRIOV_VF(xe))
+		return;
+
+	/*
+	 * The "SW_CAP" fuse contains a bit indicating whether the device is a
+	 * production or pre-production device. This fuse is reflected through
+	 * the GT "FUSE2" register, even though the contents of the fuse are
+	 * not GT-specific. Every GT's reflection of this fuse should show the
+	 * same value, so we'll just use the first available GT for lookup.
+	 */
+	for_each_gt(gt, xe, id)
+		break;
+
+	if (!gt)
+		return;
+
+	CLASS(xe_force_wake, fw_ref)(gt_to_fw(gt), XE_FW_GT);
+	if (!xe_force_wake_ref_has_domain(fw_ref.domains, XE_FW_GT)) {
+		xe_gt_err(gt, "Forcewake failure; cannot determine production/pre-production hw status.\n");
+		return;
+	}
+
+	if (xe_mmio_read32(&gt->mmio, FUSE2) & PRODUCTION_HW)
+		return;
+
+	xe_info(xe, "Pre-production hardware detected.\n");
+	if (!xe->info.has_pre_prod_wa) {
+		xe_err(xe, "Pre-production workarounds for this platform have already been removed.\n");
+		add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
+	}
+}
+
 int xe_device_probe(struct xe_device *xe)
 {
 	struct xe_tile *tile;
···
 	err = xe_sriov_init_late(xe);
 	if (err)
 		goto err_unregister_display;
+
+	detect_preproduction_hw(xe);

 	return devm_add_action_or_reset(xe->drm.dev, xe_device_sanitize, xe);

+2
drivers/gpu/drm/xe/xe_device_types.h
···
 		u8 has_mert:1;
 		/** @info.has_page_reclaim_hw_assist: Device supports page reclamation feature */
 		u8 has_page_reclaim_hw_assist:1;
+		/** @info.has_pre_prod_wa: Pre-production workarounds still present in driver */
+		u8 has_pre_prod_wa:1;
 		/** @info.has_pxp: Device has PXP support */
 		u8 has_pxp:1;
 		/** @info.has_range_tlb_inval: Has range based TLB invalidations */
+6
drivers/gpu/drm/xe/xe_pci.c
···
 	.dma_mask_size = 46,
 	.has_display = true,
 	.has_flat_ccs = 1,
+	.has_pre_prod_wa = 1,
 	.has_pxp = true,
 	.has_mem_copy_instr = true,
 	.max_gt_per_tile = 2,
···
 	.has_heci_cscfi = 1,
 	.has_i2c = true,
 	.has_late_bind = true,
+	.has_pre_prod_wa = 1,
 	.has_sriov = true,
 	.has_mem_copy_instr = true,
 	.max_gt_per_tile = 2,
···
 	.has_flat_ccs = 1,
 	.has_sriov = true,
 	.has_mem_copy_instr = true,
+	.has_pre_prod_wa = 1,
 	.max_gt_per_tile = 2,
 	.needs_scratch = true,
 	.needs_shared_vf_gt_wq = true,
···
 	.has_display = true,
 	.has_flat_ccs = 1,
 	.has_mem_copy_instr = true,
+	.has_pre_prod_wa = 1,
 	.max_gt_per_tile = 2,
 	.require_force_probe = true,
 	.va_bits = 48,
···
 	.has_i2c = true,
 	.has_mbx_power_limits = true,
 	.has_mert = true,
+	.has_pre_prod_wa = 1,
 	.has_sriov = true,
 	.max_gt_per_tile = 2,
 	.require_force_probe = true,
···
 	xe->info.has_llc = desc->has_llc;
 	xe->info.has_mert = desc->has_mert;
 	xe->info.has_page_reclaim_hw_assist = desc->has_page_reclaim_hw_assist;
+	xe->info.has_pre_prod_wa = desc->has_pre_prod_wa;
 	xe->info.has_pxp = desc->has_pxp;
 	xe->info.has_sriov = xe_configfs_primary_gt_allowed(to_pci_dev(xe->drm.dev)) &&
 		desc->has_sriov;
+1
drivers/gpu/drm/xe/xe_pci_types.h
···
 	u8 has_mbx_power_limits:1;
 	u8 has_mem_copy_instr:1;
 	u8 has_mert:1;
+	u8 has_pre_prod_wa:1;
 	u8 has_page_reclaim_hw_assist:1;
 	u8 has_pxp:1;
 	u8 has_sriov:1;