Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'iommu-updates-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux

Pull iommu updates from Joerg Roedel:
"Core Updates:
- Convert call-sites using iommu_domain_alloc() to more specific
  versions and remove function
- Introduce iommu_paging_domain_alloc_flags()
- Extend support for allocating PASID-capable domains to more drivers
- Remove iommu_present()
- Some smaller improvements

New IOMMU driver for RISC-V

Intel VT-d Updates:
- Add domain_alloc_paging support
- Enable user space IOPFs in non-PASID and non-svm cases
- Small code refactoring and cleanups
- Add domain replacement support for pasid

AMD-Vi Updates:
- Adapt to iommu_paging_domain_alloc_flags() interface and alloc V2
  page-tables by default
- Replace custom domain ID allocator with IDA allocator
- Add ops->release_domain() support
- Other improvements to device attach and domain allocation code
  paths

ARM-SMMU Updates:
- SMMUv2:
  - Return -EPROBE_DEFER for client devices probing before their
    SMMU
  - Devicetree binding updates for Qualcomm MMU-500 implementations
- SMMUv3:
  - Minor fixes and cleanup for NVIDIA's virtual command queue
    driver
- IO-PGTable:
  - Fix indexing of concatenated PGDs and extend selftest coverage
  - Remove unused block-splitting support

S390 IOMMU:
- Implement support for blocking domain

Mediatek IOMMU:
- Enable 35-bit physical address support for mt8186

OMAP IOMMU driver:
- Adapt to recent IOMMU core changes and unbreak driver"

* tag 'iommu-updates-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (92 commits)
iommu/tegra241-cmdqv: Fix alignment failure at max_n_shift
iommu: Make set_dev_pasid op support domain replacement
iommu/arm-smmu-v3: Make set_dev_pasid() op support replace
iommu/vt-d: Add set_dev_pasid callback for nested domain
iommu/vt-d: Make identity_domain_set_dev_pasid() to handle domain replacement
iommu/vt-d: Make intel_svm_set_dev_pasid() support domain replacement
iommu/vt-d: Limit intel_iommu_set_dev_pasid() for paging domain
iommu/vt-d: Make intel_iommu_set_dev_pasid() to handle domain replacement
iommu/vt-d: Add iommu_domain_did() to get did
iommu/vt-d: Consolidate the struct dev_pasid_info add/remove
iommu/vt-d: Add pasid replace helpers
iommu/vt-d: Refactor the pasid setup helpers
iommu/vt-d: Add a helper to flush cache for updating present pasid entry
iommu: Pass old domain to set_dev_pasid op
iommu/iova: Fix typo 'adderss'
iommu: Add a kdoc to iommu_unmap()
iommu/io-pgtable-arm-v7s: Remove split on unmap behavior
iommu/io-pgtable-arm: Remove split on unmap behavior
iommu/vt-d: Drain PRQs when domain removed from RID
iommu/vt-d: Drop pasid requirement for prq initialization
...

+4636 -1650
+5
Documentation/devicetree/bindings/iommu/arm,smmu.yaml
···
         items:
           - enum:
               - qcom,qcm2290-smmu-500
+              - qcom,qcs615-smmu-500
               - qcom,qcs8300-smmu-500
               - qcom,qdu1000-smmu-500
               - qcom,sa8255p-smmu-500
               - qcom,sa8775p-smmu-500
+              - qcom,sar2130p-smmu-500
               - qcom,sc7180-smmu-500
               - qcom,sc7280-smmu-500
               - qcom,sc8180x-smmu-500
···
               - qcom,qcm2290-smmu-500
               - qcom,sa8255p-smmu-500
               - qcom,sa8775p-smmu-500
+              - qcom,sar2130p-smmu-500
               - qcom,sc7280-smmu-500
               - qcom,sc8180x-smmu-500
               - qcom,sc8280xp-smmu-500
···
         compatible:
           items:
             - enum:
+                - qcom,sar2130p-smmu-500
                 - qcom,sm8550-smmu-500
                 - qcom,sm8650-smmu-500
                 - qcom,x1e80100-smmu-500
···
               - cavium,smmu-v2
               - marvell,ap806-smmu-500
               - nvidia,smmu-500
+              - qcom,qcs615-smmu-500
               - qcom,qcs8300-smmu-500
               - qcom,qdu1000-smmu-500
               - qcom,sa8255p-smmu-500
+147
Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
···
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/iommu/riscv,iommu.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: RISC-V IOMMU Architecture Implementation
+
+maintainers:
+  - Tomasz Jeznach <tjeznach@rivosinc.com>
+
+description: |
+  The RISC-V IOMMU provides memory address translation and isolation for
+  input and output devices, supporting per-device translation context,
+  shared process address spaces including the ATS and PRI components of
+  the PCIe specification, two stage address translation and MSI remapping.
+  It supports identical translation table format to the RISC-V address
+  translation tables with page level access and protection attributes.
+  Hardware uses in-memory command and fault reporting queues with wired
+  interrupt or MSI notifications.
+
+  Visit https://github.com/riscv-non-isa/riscv-iommu for more details.
+
+  For information on assigning RISC-V IOMMU to its peripheral devices,
+  see generic IOMMU bindings.
+
+properties:
+  # For PCIe IOMMU hardware compatible property should contain the vendor
+  # and device ID according to the PCI Bus Binding specification.
+  # Since PCI provides built-in identification methods, compatible is not
+  # actually required. For non-PCIe hardware implementations 'riscv,iommu'
+  # should be specified along with 'reg' property providing MMIO location.
+  compatible:
+    oneOf:
+      - items:
+          - enum:
+              - qemu,riscv-iommu
+          - const: riscv,iommu
+      - items:
+          - enum:
+              - pci1efd,edf1
+          - const: riscv,pci-iommu
+
+  reg:
+    maxItems: 1
+    description:
+      For non-PCI devices this represents base address and size of for the
+      IOMMU memory mapped registers interface.
+      For PCI IOMMU hardware implementation this should represent an address
+      of the IOMMU, as defined in the PCI Bus Binding reference.
+
+  '#iommu-cells':
+    const: 1
+    description:
+      The single cell describes the requester id emitted by a master to the
+      IOMMU.
+
+  interrupts:
+    minItems: 1
+    maxItems: 4
+    description:
+      Wired interrupt vectors available for RISC-V IOMMU to notify the
+      RISC-V HARTS. The cause to interrupt vector is software defined
+      using IVEC IOMMU register.
+
+  msi-parent: true
+
+  power-domains:
+    maxItems: 1
+
+required:
+  - compatible
+  - reg
+  - '#iommu-cells'
+
+additionalProperties: false
+
+examples:
+  - |+
+    /* Example 1 (IOMMU device with wired interrupts) */
+    #include <dt-bindings/interrupt-controller/irq.h>
+
+    iommu1: iommu@1bccd000 {
+        compatible = "qemu,riscv-iommu", "riscv,iommu";
+        reg = <0x1bccd000 0x1000>;
+        interrupt-parent = <&aplic_smode>;
+        interrupts = <32 IRQ_TYPE_LEVEL_HIGH>,
+                     <33 IRQ_TYPE_LEVEL_HIGH>,
+                     <34 IRQ_TYPE_LEVEL_HIGH>,
+                     <35 IRQ_TYPE_LEVEL_HIGH>;
+        #iommu-cells = <1>;
+    };
+
+    /* Device with two IOMMU device IDs, 0 and 7 */
+    master1 {
+        iommus = <&iommu1 0>, <&iommu1 7>;
+    };
+
+  - |+
+    /* Example 2 (IOMMU device with shared wired interrupt) */
+    #include <dt-bindings/interrupt-controller/irq.h>
+
+    iommu2: iommu@1bccd000 {
+        compatible = "qemu,riscv-iommu", "riscv,iommu";
+        reg = <0x1bccd000 0x1000>;
+        interrupt-parent = <&aplic_smode>;
+        interrupts = <32 IRQ_TYPE_LEVEL_HIGH>;
+        #iommu-cells = <1>;
+    };
+
+  - |+
+    /* Example 3 (IOMMU device with MSIs) */
+    iommu3: iommu@1bcdd000 {
+        compatible = "qemu,riscv-iommu", "riscv,iommu";
+        reg = <0x1bccd000 0x1000>;
+        msi-parent = <&imsics_smode>;
+        #iommu-cells = <1>;
+    };
+
+  - |+
+    /* Example 4 (IOMMU PCIe device with MSIs) */
+    bus {
+        #address-cells = <2>;
+        #size-cells = <2>;
+
+        pcie@30000000 {
+            device_type = "pci";
+            #address-cells = <3>;
+            #size-cells = <2>;
+            reg = <0x0 0x30000000 0x0 0x1000000>;
+            ranges = <0x02000000 0x0 0x41000000 0x0 0x41000000 0x0 0x0f000000>;
+
+            /*
+             * The IOMMU manages all functions in this PCI domain except
+             * itself. Omit BDF 00:01.0.
+             */
+            iommu-map = <0x0 &iommu0 0x0 0x8>,
+                        <0x9 &iommu0 0x9 0xfff7>;
+
+            /* The IOMMU programming interface uses slot 00:01.0 */
+            iommu0: iommu@1,0 {
+                compatible = "pci1efd,edf1", "riscv,pci-iommu";
+                reg = <0x800 0 0 0 0>;
+                #iommu-cells = <1>;
+            };
+        };
+    };
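Example 4 leaves exactly one requester ID out of the iommu-map. The C sketch below works through that arithmetic, assuming the usual PCI requester-ID encoding (bus in bits 15:8, device in 7:3, function in 2:0), the phys.hi layout of the PCI bus binding (device number in bits 15:11), and the (rid-base, iommu, iommu-base, length) tuple format of the generic PCI IOMMU binding. The struct and function names are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of one iommu-map tuple: <rid-base iommu iommu-base length> */
struct iommu_map_entry {
	uint32_t rid_base;   /* first requester ID covered */
	uint32_t iommu_base; /* IOMMU specifier emitted for rid_base */
	uint32_t length;     /* number of consecutive RIDs covered */
};

/* PCI requester ID: bus in bits 15:8, device in 7:3, function in 2:0 */
static uint32_t pci_rid(uint32_t bus, uint32_t dev, uint32_t fn)
{
	assert(bus < 256 && dev < 32 && fn < 8);
	return (bus << 8) | (dev << 3) | fn;
}

/* Device number from the phys.hi cell of a PCI "reg" entry (bits 15:11) */
static uint32_t phys_hi_device(uint32_t phys_hi)
{
	return (phys_hi >> 11) & 0x1f;
}

/* A RID in [rid_base, rid_base + length) maps to iommu_base + (rid - rid_base) */
static bool map_rid(const struct iommu_map_entry *map, size_t n,
		    uint32_t rid, uint32_t *out)
{
	for (size_t i = 0; i < n; i++) {
		if (rid >= map[i].rid_base && rid - map[i].rid_base < map[i].length) {
			*out = map[i].iommu_base + (rid - map[i].rid_base);
			return true;
		}
	}
	return false; /* RID is not translated by this IOMMU */
}
```

With the two tuples from Example 4, `pci_rid(0, phys_hi_device(0x800), 0)` evaluates to 0x8, which neither tuple covers. RIDs 0x0 to 0x7 and 0x9 to 0xffff map to themselves, which matches the "Omit BDF 00:01.0" comment and the `iommu@1,0` unit address.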
+9
MAINTAINERS
···
 N:	riscv
 K:	riscv

+RISC-V IOMMU
+M:	Tomasz Jeznach <tjeznach@rivosinc.com>
+L:	iommu@lists.linux.dev
+L:	linux-riscv@lists.infradead.org
+S:	Maintained
+T:	git git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux.git
+F:	Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
+F:	drivers/iommu/riscv/
+
 RISC-V MICROCHIP FPGA SUPPORT
 M:	Conor Dooley <conor.dooley@microchip.com>
 M:	Daire McNamara <daire.mcnamara@microchip.com>
+2 -2
arch/s390/include/asm/pci.h
···
 	u8 size; /* order 2 exponent */
 };

-struct s390_domain;
 struct kvm_zdev;

 #define ZPCI_FUNCTIONS_PER_BUS 256
···
 	struct dentry *debugfs_dev;

 	/* IOMMU and passthrough */
-	struct s390_domain *s390_domain; /* s390 IOMMU domain data */
+	struct iommu_domain *s390_domain; /* attached IOMMU domain */
 	struct kvm_zdev *kzdev;
 	struct mutex kzdev_lock;
+	spinlock_t dom_lock; /* protect s390_domain change */
 };

 static inline bool zdev_enabled(struct zpci_dev *zdev)
+3
arch/s390/pci/pci.c
···
 	u64 req = ZPCI_CREATE_REQ(zdev->fh, 0, ZPCI_MOD_FC_SET_MEASURE);
 	struct zpci_iommu_ctrs *ctrs;
 	struct zpci_fib fib = {0};
+	unsigned long flags;
 	u8 cc, status;

 	if (zdev->fmb || sizeof(*zdev->fmb) < zdev->fmb_length)
···
 	WARN_ON((u64) zdev->fmb & 0xf);

 	/* reset software counters */
+	spin_lock_irqsave(&zdev->dom_lock, flags);
 	ctrs = zpci_get_iommu_ctrs(zdev);
 	if (ctrs) {
 		atomic64_set(&ctrs->mapped_pages, 0);
···
 		atomic64_set(&ctrs->sync_map_rpcits, 0);
 		atomic64_set(&ctrs->sync_rpcits, 0);
 	}
+	spin_unlock_irqrestore(&zdev->dom_lock, flags);


 	fib.fmb_addr = virt_to_phys(zdev->fmb);
+8 -2
arch/s390/pci/pci_debug.c
···

 static void pci_sw_counter_show(struct seq_file *m)
 {
-	struct zpci_iommu_ctrs *ctrs = zpci_get_iommu_ctrs(m->private);
+	struct zpci_dev *zdev = m->private;
+	struct zpci_iommu_ctrs *ctrs;
 	atomic64_t *counter;
+	unsigned long flags;
 	int i;

+	spin_lock_irqsave(&zdev->dom_lock, flags);
+	ctrs = zpci_get_iommu_ctrs(m->private);
 	if (!ctrs)
-		return;
+		goto unlock;

 	counter = &ctrs->mapped_pages;
 	for (i = 0; i < ARRAY_SIZE(pci_sw_names); i++, counter++)
 		seq_printf(m, "%26s:\t%llu\n", pci_sw_names[i],
 			   atomic64_read(counter));
+unlock:
+	spin_unlock_irqrestore(&zdev->dom_lock, flags);
 }

 static int pci_perf_show(struct seq_file *m, void *v)
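Both pci.c and pci_debug.c now look up the IOMMU counters only while holding zdev->dom_lock, because s390_domain (now a struct iommu_domain pointer) can change underneath them. The sketch below models that pattern in user space. It assumes a C11 atomic_flag spinlock as a stand-in for spin_lock_irqsave() (user space has no interrupts to mask), and struct zdev_model, struct iommu_ctrs and every helper name are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct zpci_iommu_ctrs (two of its counters) */
struct iommu_ctrs {
	atomic_uint_fast64_t mapped_pages;
	atomic_uint_fast64_t sync_rpcits;
};

/* Hypothetical stand-in for the relevant parts of struct zpci_dev */
struct zdev_model {
	atomic_flag dom_lock;    /* user-space stand-in for spinlock_t */
	struct iommu_ctrs *ctrs; /* counters of the attached domain, or NULL */
};

static void dom_lock(struct zdev_model *z)
{
	while (atomic_flag_test_and_set_explicit(&z->dom_lock, memory_order_acquire))
		; /* spin until the holder clears the flag */
}

static void dom_unlock(struct zdev_model *z)
{
	atomic_flag_clear_explicit(&z->dom_lock, memory_order_release);
}

static void zdev_model_init(struct zdev_model *z)
{
	atomic_flag_clear(&z->dom_lock);
	z->ctrs = NULL;
}

/* Writer: a domain change swaps the pointer under the lock */
static void zdev_set_ctrs(struct zdev_model *z, struct iommu_ctrs *c)
{
	dom_lock(z);
	z->ctrs = c;
	dom_unlock(z);
}

/* Like the reset in pci.c: clear counters only if a domain is attached */
static void zdev_reset_ctrs(struct zdev_model *z)
{
	dom_lock(z);
	if (z->ctrs) {
		atomic_store(&z->ctrs->mapped_pages, 0);
		atomic_store(&z->ctrs->sync_rpcits, 0);
	}
	dom_unlock(z);
}

/* Reader: same single-exit "goto unlock" shape as pci_sw_counter_show() */
static bool zdev_read_mapped(struct zdev_model *z, uint64_t *val)
{
	bool ok = false;

	assert(val != NULL);
	dom_lock(z);
	if (!z->ctrs)
		goto unlock;
	*val = atomic_load(&z->ctrs->mapped_pages);
	ok = true;
unlock:
	dom_unlock(z);
	return ok;
}
```

The single unlock label is the point of the pci_debug.c change. The early `return` that was safe without a lock would now leave dom_lock held, so the NULL case has to fall through to the unlock instead.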
+1
drivers/iommu/Kconfig
···
 source "drivers/iommu/amd/Kconfig"
 source "drivers/iommu/intel/Kconfig"
 source "drivers/iommu/iommufd/Kconfig"
+source "drivers/iommu/riscv/Kconfig"

 config IRQ_REMAP
 	bool "Support for Interrupt Remapping"
+1 -1
drivers/iommu/Makefile
···
 # SPDX-License-Identifier: GPL-2.0
-obj-y += amd/ intel/ arm/ iommufd/
+obj-y += amd/ intel/ arm/ iommufd/ riscv/
 obj-$(CONFIG_IOMMU_API) += iommu.o
 obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
+9 -2
drivers/iommu/amd/amd_iommu.h
···
 extern unsigned long amd_iommu_pgsize_bitmap;

 /* Protection domain ops */
+void amd_iommu_init_identity_domain(void);
 struct protection_domain *protection_domain_alloc(unsigned int type, int nid);
 void protection_domain_free(struct protection_domain *domain);
 struct iommu_domain *amd_iommu_domain_alloc_sva(struct device *dev,
 						struct mm_struct *mm);
 void amd_iommu_domain_free(struct iommu_domain *dom);
 int iommu_sva_set_dev_pasid(struct iommu_domain *domain,
-			    struct device *dev, ioasid_t pasid);
+			    struct device *dev, ioasid_t pasid,
+			    struct iommu_domain *old);
 void amd_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid,
 				struct iommu_domain *domain);

···
 	return (amd_iommu_efr2 & mask);
 }

+static inline bool amd_iommu_v2_pgtbl_supported(void)
+{
+	return (check_feature(FEATURE_GIOSUP) && check_feature(FEATURE_GT));
+}
+
 static inline bool amd_iommu_gt_ppr_supported(void)
 {
-	return (check_feature(FEATURE_GT) &&
+	return (amd_iommu_v2_pgtbl_supported() &&
 		check_feature(FEATURE_PPR) &&
 		check_feature(FEATURE_EPHSUP));
 }
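amd_iommu_v2_pgtbl_supported() now holds the single "GIOSUP and GT" test, amd_iommu_gt_ppr_supported() builds on it, and (per the init.c hunks) the v2-to-v1 fallback for the DMA-API page table runs once in the early init path rather than inside the per-IOMMU init code. The following is a minimal model of that decision logic. It assumes a plain 64-bit feature mask; the FEAT_* bit positions and all helper names are placeholders, not the real EFR layout or driver functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions: NOT the real AMD-Vi EFR encoding */
#define FEAT_GT     (UINT64_C(1) << 0)
#define FEAT_PPR    (UINT64_C(1) << 1)
#define FEAT_GIOSUP (UINT64_C(1) << 2)
#define FEAT_EPHSUP (UINT64_C(1) << 3)

enum pgtable { PGTABLE_V1 = 1, PGTABLE_V2 = 2 };

/* True if every bit in mask is set in efr */
static bool has_all(uint64_t efr, uint64_t mask)
{
	return (efr & mask) == mask;
}

/* Same shape as amd_iommu_v2_pgtbl_supported(): GIOSUP and GT */
static bool v2_pgtbl_supported(uint64_t efr)
{
	return has_all(efr, FEAT_GIOSUP | FEAT_GT);
}

/* Same shape as the new amd_iommu_gt_ppr_supported() */
static bool gt_ppr_supported(uint64_t efr)
{
	return v2_pgtbl_supported(efr) && has_all(efr, FEAT_PPR | FEAT_EPHSUP);
}

/* A requested v2 DMA-API table falls back to v1 without GIOSUP and GT */
static enum pgtable pick_pgtable(enum pgtable requested, uint64_t efr)
{
	assert(requested == PGTABLE_V1 || requested == PGTABLE_V2);
	if (requested == PGTABLE_V2 && !v2_pgtbl_supported(efr))
		return PGTABLE_V1;
	return requested;
}
```

For a mask with GT, PPR and EPHSUP but without GIOSUP, the old `GT && PPR && EPHSUP` test would have reported PPR support. With the shared helper it does not, and a requested v2 table falls back to v1, so the two decisions can no longer disagree.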
+11 -12
drivers/iommu/amd/amd_iommu_types.h
···
 	struct list_head list;
 };

+/* Keeps track of the IOMMUs attached to protection domain */
+struct pdom_iommu_info {
+	struct amd_iommu *iommu; /* IOMMUs attach to protection domain */
+	u32 refcnt; /* Count of attached dev/pasid per domain/IOMMU */
+};
+
 /*
  * This structure contains generic data for IOMMU protection domains
  * independent of their use.
···
 	u16 id; /* the domain id written to the device table */
 	enum protection_domain_mode pd_mode; /* Track page table type */
 	bool dirty_tracking; /* dirty tracking is enabled in the domain */
-	unsigned dev_cnt; /* devices assigned to this domain */
-	unsigned dev_iommu[MAX_IOMMUS]; /* per-IOMMU reference count */
+	struct xarray iommu_array; /* per-IOMMU reference count */

 	struct mmu_notifier mn; /* mmu notifier for the SVA domain */
 	struct list_head dev_data_list; /* List of pdom_dev_data */
···
  */
 struct iommu_dev_data {
 	/*Protect against attach/detach races */
-	spinlock_t lock;
+	struct mutex mutex;

 	struct list_head list; /* For domain->dev_list */
 	struct llist_node dev_data_list; /* For global dev_data_list */
···
 extern struct list_head amd_iommu_list;

 /*
- * Array with pointers to each IOMMU struct
- * The indices are referenced in the protection domains
- */
-extern struct amd_iommu *amd_iommus[MAX_IOMMUS];
-
-/*
  * Structure defining one entry in the device table
  */
 struct dev_table_entry {
···
 /* size of the dma_ops aperture as power of 2 */
 extern unsigned amd_iommu_aperture_order;

-/* allocation bitmap for domain ids */
-extern unsigned long *amd_iommu_pd_alloc_bitmap;
-
 extern bool amd_iommu_force_isolation;

 /* Max levels of glxval supported */
 extern int amd_iommu_max_glx_val;
+
+/* IDA to track protection domain IDs */
+extern struct ida pdom_ids;

 /* Global EFR and EFR2 registers */
 extern u64 amd_iommu_efr;
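struct pdom_iommu_info and the iommu_array xarray replace the fixed dev_iommu[MAX_IOMMUS] counters and the global amd_iommus[] array, so the flush paths in iommu.c visit only the IOMMUs that currently have a device or PASID in the domain. The sketch below models the attach/detach bookkeeping of pdom_attach_iommu() and pdom_detach_iommu() in user space. It assumes a small fixed table of pointers in place of the xarray, omits the pdom->lock locking, and uses hypothetical names throughout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_MAX_IDX 8 /* stand-in capacity; the xarray has no such bound */

struct iommu_info {
	int iommu_index;
	uint32_t refcnt; /* attached devices/PASIDs behind this IOMMU */
};

struct domain_model {
	struct iommu_info *slots[MODEL_MAX_IDX]; /* stand-in for struct xarray */
};

/* First attach through an IOMMU allocates its entry; later ones bump refcnt */
static int domain_attach_iommu(struct domain_model *d, int idx)
{
	struct iommu_info *info;

	assert(idx >= 0 && idx < MODEL_MAX_IDX);
	info = d->slots[idx];
	if (info) {
		info->refcnt++;
		return 0;
	}
	info = calloc(1, sizeof(*info));
	if (!info)
		return -1;
	info->iommu_index = idx;
	info->refcnt = 1;
	d->slots[idx] = info;
	return 0;
}

/* Last detach through an IOMMU erases and frees its entry */
static void domain_detach_iommu(struct domain_model *d, int idx)
{
	struct iommu_info *info;

	assert(idx >= 0 && idx < MODEL_MAX_IDX);
	info = d->slots[idx];
	if (!info)
		return;
	if (--info->refcnt == 0) {
		d->slots[idx] = NULL;
		free(info);
	}
}

/* Flush loops walk only populated entries, as xa_for_each() does */
static int count_iommus(const struct domain_model *d)
{
	int n = 0;

	for (int i = 0; i < MODEL_MAX_IDX; i++)
		if (d->slots[i])
			n++;
	return n;
}
```

Two attaches through the same IOMMU leave one entry with refcnt 2. The entry disappears only when the second detach drops the count to zero, which is why pdom_detach_iommu() runs after the IOTLB flush in the new detach_device().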
+19 -44
drivers/iommu/amd/init.c
···
 LIST_HEAD(amd_iommu_list); /* list of all AMD IOMMUs in the
 			      system */

-/* Array to assign indices to IOMMUs*/
-struct amd_iommu *amd_iommus[MAX_IOMMUS];
-
 /* Number of IOMMUs present in the system */
 static int amd_iommus_present;

···
 bool amd_iommu_force_isolation __read_mostly;

 unsigned long amd_iommu_pgsize_bitmap __ro_after_init = AMD_IOMMU_PGSIZES;
-
-/*
- * AMD IOMMU allows up to 2^16 different protection domains. This is a bitmap
- * to know which ones are already in use.
- */
-unsigned long *amd_iommu_pd_alloc_bitmap;

 enum iommu_init_state {
 	IOMMU_START_STATE,
···
 		if (dte_v && dom_id) {
 			pci_seg->old_dev_tbl_cpy[devid].data[0] = old_devtb[devid].data[0];
 			pci_seg->old_dev_tbl_cpy[devid].data[1] = old_devtb[devid].data[1];
-			__set_bit(dom_id, amd_iommu_pd_alloc_bitmap);
+			/* Reserve the Domain IDs used by previous kernel */
+			if (ida_alloc_range(&pdom_ids, dom_id, dom_id, GFP_ATOMIC) != dom_id) {
+				pr_err("Failed to reserve domain ID 0x%x\n", dom_id);
+				memunmap(old_devtb);
+				return false;
+			}
 			/* If gcr3 table existed, mask it out */
 			if (old_devtb[devid].data[0] & DTE_FLAG_GV) {
 				tmp = DTE_GCR3_VAL_B(~0ULL) << DTE_GCR3_SHIFT_B;
···
 		return -ENOSYS;
 	}

-	/* Index is fine - add IOMMU to the array */
-	amd_iommus[iommu->index] = iommu;
-
 	/*
 	 * Copy data from ACPI table entry to the iommu struct
 	 */
···

 	init_iommu_perf_ctr(iommu);

-	if (amd_iommu_pgtable == AMD_IOMMU_V2) {
-		if (!check_feature(FEATURE_GIOSUP) ||
-		    !check_feature(FEATURE_GT)) {
-			pr_warn("Cannot enable v2 page table for DMA-API. Fallback to v1.\n");
-			amd_iommu_pgtable = AMD_IOMMU_V1;
-		}
-	}
-
 	if (is_rd890_iommu(iommu->dev)) {
 		int i, j;

···
 	struct amd_iommu *iommu;
 	struct amd_iommu_pci_seg *pci_seg;
 	int ret;
+
+	/* Init global identity domain before registering IOMMU */
+	amd_iommu_init_identity_domain();

 	for_each_iommu(iommu) {
 		ret = iommu_init_pci(iommu);
···
 #endif
 }

-static void enable_iommus(void)
-{
-	early_enable_iommus();
-}
-
 static void disable_iommus(void)
 {
 	struct amd_iommu *iommu;
···
 		iommu_apply_resume_quirks(iommu);

 	/* re-load the hardware */
-	enable_iommus();
+	for_each_iommu(iommu)
+		early_enable_iommu(iommu);

 	amd_iommu_enable_interrupts();
 }
···

 static void __init free_dma_resources(void)
 {
-	iommu_free_pages(amd_iommu_pd_alloc_bitmap,
-			 get_order(MAX_DOMAIN_ID / 8));
-	amd_iommu_pd_alloc_bitmap = NULL;
+	ida_destroy(&pdom_ids);

 	free_unity_maps();
 }
···
 	amd_iommu_target_ivhd_type = get_highest_supported_ivhd_type(ivrs_base);
 	DUMP_printk("Using IVHD type %#x\n", amd_iommu_target_ivhd_type);

-	/* Device table - directly used by all IOMMUs */
-	ret = -ENOMEM;
-
-	amd_iommu_pd_alloc_bitmap = iommu_alloc_pages(GFP_KERNEL,
-						     get_order(MAX_DOMAIN_ID / 8));
-	if (amd_iommu_pd_alloc_bitmap == NULL)
-		goto out;
-
-	/*
-	 * never allocate domain 0 because its used as the non-allocated and
-	 * error value placeholder
-	 */
-	__set_bit(0, amd_iommu_pd_alloc_bitmap);
-
 	/*
 	 * now the data structures are allocated and basically initialized
 	 * start the real acpi table scan
···
 	if (cpu_feature_enabled(X86_FEATURE_LA57) &&
 	    FIELD_GET(FEATURE_GATS, amd_iommu_efr) == GUEST_PGTABLE_5_LEVEL)
 		amd_iommu_gpt_level = PAGE_MODE_5_LEVEL;
+
+	if (amd_iommu_pgtable == AMD_IOMMU_V2) {
+		if (!amd_iommu_v2_pgtbl_supported()) {
+			pr_warn("Cannot enable v2 page table for DMA-API. Fallback to v1.\n");
+			amd_iommu_pgtable = AMD_IOMMU_V1;
+		}
+	}

 	/* Disable any previously enabled IOMMUs */
 	if (!is_kdump_kernel() || amd_iommu_disabled)
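The domain-ID bitmap and pd_bitmap_lock are replaced by the kernel IDA. In iommu.c, pdom_id_alloc() calls ida_alloc_range(&pdom_ids, 1, MAX_DOMAIN_ID - 1, GFP_ATOMIC), so ID 0, which the removed comment describes as the "non-allocated and error value placeholder", is never handed out. On the kdump path above, each domain ID inherited from the previous kernel is reserved with the single-element range (dom_id, dom_id). The sketch below models those semantics in user space. It assumes a plain bool-array stand-in and a tiny hypothetical MODEL_MAX_ID. The real IDA is a tree-backed allocator with its own locking and returns a negative errno on failure.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_ID 16 /* hypothetical; far smaller than MAX_DOMAIN_ID */

/* Hypothetical bitmap-backed stand-in for an IDA */
struct id_pool {
	bool used[MODEL_MAX_ID];
};

/* Lowest free ID in [min, max], or -1 if none (the real IDA returns -errno) */
static int id_alloc_range(struct id_pool *p, int min, int max)
{
	assert(min >= 0 && min <= max && max < MODEL_MAX_ID);
	for (int id = min; id <= max; id++) {
		if (!p->used[id]) {
			p->used[id] = true;
			return id;
		}
	}
	return -1;
}

static void id_free(struct id_pool *p, int id)
{
	assert(id >= 0 && id < MODEL_MAX_ID && p->used[id]);
	p->used[id] = false;
}

/* Like pdom_id_alloc(): ID 0 is never handed out */
static int dom_id_alloc(struct id_pool *p)
{
	return id_alloc_range(p, 1, MODEL_MAX_ID - 1);
}

/* Like the kdump path: claim exactly dom_id or report failure */
static bool dom_id_reserve(struct id_pool *p, int dom_id)
{
	return id_alloc_range(p, dom_id, dom_id) == dom_id;
}
```

If the previous kernel used ID 3, reserving it first makes later allocations return 1, 2 and then 4. This mirrors the `domid <= 0` check in the new GCR3 setup path, which treats both "no ID" and a negative result as failure.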
+8 -3
drivers/iommu/amd/io_pgtable.c
···
  */
 static bool increase_address_space(struct amd_io_pgtable *pgtable,
 				   unsigned long address,
+				   unsigned int page_size_level,
 				   gfp_t gfp)
 {
 	struct io_pgtable_cfg *cfg = &pgtable->pgtbl.cfg;
···

 	spin_lock_irqsave(&domain->lock, flags);

-	if (address <= PM_LEVEL_SIZE(pgtable->mode))
+	if (address <= PM_LEVEL_SIZE(pgtable->mode) &&
+	    pgtable->mode - 1 >= page_size_level)
 		goto out;

 	ret = false;
···
 		      gfp_t gfp,
 		      bool *updated)
 {
+	unsigned long last_addr = address + (page_size - 1);
 	struct io_pgtable_cfg *cfg = &pgtable->pgtbl.cfg;
 	int level, end_lvl;
 	u64 *pte, *page;

 	BUG_ON(!is_power_of_2(page_size));

-	while (address > PM_LEVEL_SIZE(pgtable->mode)) {
+	while (last_addr > PM_LEVEL_SIZE(pgtable->mode) ||
+	       pgtable->mode - 1 < PAGE_SIZE_LEVEL(page_size)) {
 		/*
 		 * Return an error if there is no memory to update the
 		 * page-table.
 		 */
-		if (!increase_address_space(pgtable, address, gfp))
+		if (!increase_address_space(pgtable, last_addr,
+					    PAGE_SIZE_LEVEL(page_size), gfp))
 			return NULL;
 	}

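The io_pgtable.c hunks change when alloc_pte() grows the v1 page table. The old loop compared only the start address with PM_LEVEL_SIZE(pgtable->mode). Going by the diff, this means a large page mapped at a low IOVA (for example at 0) could land in a table whose top level sits below the level that page size needs. The new loop also checks last_addr = address + (page_size - 1) and requires pgtable->mode - 1 >= PAGE_SIZE_LEVEL(page_size). The user-space model below compares the two conditions. It assumes 4 KiB base pages and 9 address bits per level; pm_level_size() and page_size_level() are hypothetical re-implementations modelled on the PM_LEVEL_SIZE() and PAGE_SIZE_LEVEL() macros, not copies of them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Highest address reachable with `mode` levels (assumed 4 KiB pages, 9 bits/level) */
static uint64_t pm_level_size(int mode)
{
	return mode < 6 ? (UINT64_C(1) << (12 + 9 * mode)) - 1 : UINT64_MAX;
}

/* Level whose entries map a page of this size: 4 KiB -> 0, 2 MiB -> 1, 1 GiB -> 2 */
static int page_size_level(uint64_t page_size)
{
	int shift = 0;

	assert(page_size >= 4096 && (page_size & (page_size - 1)) == 0);
	while (page_size >>= 1)
		shift++;
	return (shift - 12) / 9;
}

/* Old loop condition: only the start address was checked */
static bool needs_grow_old(int mode, uint64_t address)
{
	return address > pm_level_size(mode);
}

/* New loop condition: the last byte must be reachable, and the top
 * level (mode - 1) must be at least the level this page size uses */
static bool needs_grow_new(int mode, uint64_t address, uint64_t page_size)
{
	uint64_t last_addr = address + (page_size - 1);

	return last_addr > pm_level_size(mode) ||
	       mode - 1 < page_size_level(page_size);
}

/* Number of levels after growing the table the way the new loop would */
static int mode_after_grow(int mode, uint64_t address, uint64_t page_size)
{
	while (needs_grow_new(mode, address, page_size))
		mode++;
	return mode;
}
```

With a one-level table (mode 1, reach 2 MiB), a 2 MiB page at IOVA 0 passes the old check unchanged, while the new check grows the table to two levels. A 1 GiB page at the same address grows it to three.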
+3
drivers/iommu/amd/io_pgtable_v2.c
···
 out:
 	if (updated) {
 		struct protection_domain *pdom = io_pgtable_ops_to_domain(ops);
+		unsigned long flags;

+		spin_lock_irqsave(&pdom->lock, flags);
 		amd_iommu_domain_flush_pages(pdom, o_iova, size);
+		spin_unlock_irqrestore(&pdom->lock, flags);
 	}

 	if (mapped)
+275 -234
drivers/iommu/amd/iommu.c
···
 #include <linux/scatterlist.h>
 #include <linux/dma-map-ops.h>
 #include <linux/dma-direct.h>
+#include <linux/idr.h>
 #include <linux/iommu-helper.h>
 #include <linux/delay.h>
 #include <linux/amd-iommu.h>
···
 #define HT_RANGE_START (0xfd00000000ULL)
 #define HT_RANGE_END (0xffffffffffULL)

-static DEFINE_SPINLOCK(pd_bitmap_lock);
-
 LIST_HEAD(ioapic_map);
 LIST_HEAD(hpet_map);
 LIST_HEAD(acpihid_map);
···
 	u32 data[4];
 };

+/*
+ * AMD IOMMU allows up to 2^16 different protection domains. This is a bitmap
+ * to know which ones are already in use.
+ */
+DEFINE_IDA(pdom_ids);
+
 struct kmem_cache *amd_iommu_irq_cache;

-static void detach_device(struct device *dev);
+static int amd_iommu_attach_device(struct iommu_domain *dom,
+				   struct device *dev);

 static void set_dte_entry(struct amd_iommu *iommu,
 			  struct iommu_dev_data *dev_data);
···
 	if (!dev_data)
 		return NULL;

-	spin_lock_init(&dev_data->lock);
+	mutex_init(&dev_data->mutex);
 	dev_data->devid = devid;
 	ratelimit_default_init(&dev_data->rs);

···
 	setup_aliases(iommu, dev);
 }

-static void amd_iommu_uninit_device(struct device *dev)
-{
-	struct iommu_dev_data *dev_data;
-
-	dev_data = dev_iommu_priv_get(dev);
-	if (!dev_data)
-		return;
-
-	if (dev_data->domain)
-		detach_device(dev);
-
-	/*
-	 * We keep dev_data around for unplugged devices and reuse it when the
-	 * device is re-plugged - not doing so would introduce a ton of races.
-	 */
-}

 /****************************************************************************
  *
···
 	if (!iommu->need_sync)
 		return 0;

-	data = atomic64_add_return(1, &iommu->cmd_sem_val);
+	data = atomic64_inc_return(&iommu->cmd_sem_val);
 	build_completion_wait(&cmd, iommu, data);

 	raw_spin_lock_irqsave(&iommu->lock, flags);
···

 static void domain_flush_complete(struct protection_domain *domain)
 {
-	int i;
+	struct pdom_iommu_info *pdom_iommu_info;
+	unsigned long i;

-	for (i = 0; i < amd_iommu_get_num_iommus(); ++i) {
-		if (domain && !domain->dev_iommu[i])
-			continue;
+	lockdep_assert_held(&domain->lock);

-		/*
-		 * Devices of this domain are behind this IOMMU
-		 * We need to wait for completion of all commands.
-		 */
-		iommu_completion_wait(amd_iommus[i]);
-	}
+	/*
+	 * Devices of this domain are behind this IOMMU
+	 * We need to wait for completion of all commands.
+	 */
+	xa_for_each(&domain->iommu_array, i, pdom_iommu_info)
+		iommu_completion_wait(pdom_iommu_info->iommu);
 }

 static int iommu_flush_dte(struct amd_iommu *iommu, u16 devid)
···
 static int domain_flush_pages_v1(struct protection_domain *pdom,
 				 u64 address, size_t size)
 {
+	struct pdom_iommu_info *pdom_iommu_info;
 	struct iommu_cmd cmd;
-	int ret = 0, i;
+	int ret = 0;
+	unsigned long i;
+
+	lockdep_assert_held(&pdom->lock);

 	build_inv_iommu_pages(&cmd, address, size,
 			      pdom->id, IOMMU_NO_PASID, false);

-	for (i = 0; i < amd_iommu_get_num_iommus(); ++i) {
-		if (!pdom->dev_iommu[i])
-			continue;
-
+	xa_for_each(&pdom->iommu_array, i, pdom_iommu_info) {
 		/*
 		 * Devices of this domain are behind this IOMMU
 		 * We need a TLB flush
 		 */
-		ret |= iommu_queue_command(amd_iommus[i], &cmd);
+		ret |= iommu_queue_command(pdom_iommu_info->iommu, &cmd);
 	}

 	return ret;
···
 void amd_iommu_domain_flush_pages(struct protection_domain *domain,
 				  u64 address, size_t size)
 {
+	lockdep_assert_held(&domain->lock);
+
 	if (likely(!amd_iommu_np_cache)) {
 		__domain_flush_pages(domain, address, size);

···
  *
  ****************************************************************************/

-static u16 domain_id_alloc(void)
+static int pdom_id_alloc(void)
 {
-	unsigned long flags;
-	int id;
-
-	spin_lock_irqsave(&pd_bitmap_lock, flags);
-	id = find_first_zero_bit(amd_iommu_pd_alloc_bitmap, MAX_DOMAIN_ID);
-	BUG_ON(id == 0);
-	if (id > 0 && id < MAX_DOMAIN_ID)
-		__set_bit(id, amd_iommu_pd_alloc_bitmap);
-	else
-		id = 0;
-	spin_unlock_irqrestore(&pd_bitmap_lock, flags);
-
-	return id;
+	return ida_alloc_range(&pdom_ids, 1, MAX_DOMAIN_ID - 1, GFP_ATOMIC);
 }

-static void domain_id_free(int id)
+static void pdom_id_free(int id)
 {
-	unsigned long flags;
-
-	spin_lock_irqsave(&pd_bitmap_lock, flags);
-	if (id > 0 && id < MAX_DOMAIN_ID)
-		__clear_bit(id, amd_iommu_pd_alloc_bitmap);
-	spin_unlock_irqrestore(&pd_bitmap_lock, flags);
+	ida_free(&pdom_ids, id);
 }

 static void free_gcr3_tbl_level1(u64 *tbl)
···
 	gcr3_info->glx = 0;

 	/* Free per device domain ID */
-	domain_id_free(gcr3_info->domid);
+	pdom_id_free(gcr3_info->domid);

 	iommu_free_page(gcr3_info->gcr3_tbl);
 	gcr3_info->gcr3_tbl = NULL;
···
 {
 	int levels = get_gcr3_levels(pasids);
 	int nid = iommu ? dev_to_node(&iommu->dev->dev) : NUMA_NO_NODE;
+	int domid;

 	if (levels > amd_iommu_max_glx_val)
 		return -EINVAL;
···
 		return -EBUSY;

 	/* Allocate per device domain ID */
-	gcr3_info->domid = domain_id_alloc();
+	domid = pdom_id_alloc();
+	if (domid <= 0)
+		return -ENOSPC;
+	gcr3_info->domid = domid;

 	gcr3_info->gcr3_tbl = iommu_alloc_page_node(nid, GFP_ATOMIC);
 	if (gcr3_info->gcr3_tbl == NULL) {
-		domain_id_free(gcr3_info->domid);
+		pdom_id_free(domid);
 		return -ENOMEM;
 	}

···
 	free_gcr3_table(gcr3_info);
 }

-static int do_attach(struct iommu_dev_data *dev_data,
-		     struct protection_domain *domain)
+static int pdom_attach_iommu(struct amd_iommu *iommu,
+			     struct protection_domain *pdom)
 {
-	struct amd_iommu *iommu = get_amd_iommu_from_dev_data(dev_data);
-	struct io_pgtable_cfg *cfg = &domain->iop.pgtbl.cfg;
+	struct pdom_iommu_info *pdom_iommu_info, *curr;
+	struct io_pgtable_cfg *cfg = &pdom->iop.pgtbl.cfg;
+	unsigned long flags;
 	int ret = 0;

-	/* Update data structures */
-	dev_data->domain = domain;
-	list_add(&dev_data->list, &domain->dev_list);
+	spin_lock_irqsave(&pdom->lock, flags);
+
+	pdom_iommu_info = xa_load(&pdom->iommu_array, iommu->index);
+	if (pdom_iommu_info) {
+		pdom_iommu_info->refcnt++;
+		goto out_unlock;
+	}
+
+	pdom_iommu_info = kzalloc(sizeof(*pdom_iommu_info), GFP_ATOMIC);
+	if (!pdom_iommu_info) {
+		ret = -ENOMEM;
+		goto out_unlock;
+	}
+
+	pdom_iommu_info->iommu = iommu;
+	pdom_iommu_info->refcnt = 1;
+
+	curr = xa_cmpxchg(&pdom->iommu_array, iommu->index,
+			  NULL, pdom_iommu_info, GFP_ATOMIC);
+	if (curr) {
+		kfree(pdom_iommu_info);
+		ret = -ENOSPC;
+		goto out_unlock;
+	}

 	/* Update NUMA Node ID */
 	if (cfg->amd.nid == NUMA_NO_NODE)
-		cfg->amd.nid = dev_to_node(dev_data->dev);
+		cfg->amd.nid = dev_to_node(&iommu->dev->dev);

-	/* Do reference counting */
-	domain->dev_iommu[iommu->index] += 1;
-	domain->dev_cnt += 1;
-
-	/* Setup GCR3 table */
-	if (pdom_is_sva_capable(domain)) {
-		ret = init_gcr3_table(dev_data, domain);
-		if (ret)
-			return ret;
-	}
-
+out_unlock:
+	spin_unlock_irqrestore(&pdom->lock, flags);
 	return ret;
 }

-static void do_detach(struct iommu_dev_data *dev_data)
+static void pdom_detach_iommu(struct amd_iommu *iommu,
+			      struct protection_domain *pdom)
 {
-	struct protection_domain *domain = dev_data->domain;
-	struct amd_iommu *iommu = get_amd_iommu_from_dev_data(dev_data);
+	struct pdom_iommu_info *pdom_iommu_info;
+	unsigned long flags;

-	/* Clear DTE and flush the entry */
-	dev_update_dte(dev_data, false);
+	spin_lock_irqsave(&pdom->lock, flags);

-	/* Flush IOTLB and wait for the flushes to finish */
-	amd_iommu_domain_flush_all(domain);
+	pdom_iommu_info = xa_load(&pdom->iommu_array, iommu->index);
+	if (!pdom_iommu_info) {
+		spin_unlock_irqrestore(&pdom->lock, flags);
+		return;
+	}

-	/* Clear GCR3 table */
-	if (pdom_is_sva_capable(domain))
-		destroy_gcr3_table(dev_data, domain);
+	pdom_iommu_info->refcnt--;
+	if (pdom_iommu_info->refcnt == 0) {
+		xa_erase(&pdom->iommu_array, iommu->index);
+		kfree(pdom_iommu_info);
+	}

-	/* Update data structures */
-	dev_data->domain = NULL;
-	list_del(&dev_data->list);
-
-	/* decrease reference counters - needs to happen after the flushes */
-	domain->dev_iommu[iommu->index] -= 1;
-	domain->dev_cnt -= 1;
+	spin_unlock_irqrestore(&pdom->lock, flags);
 }

 /*
···
 static int attach_device(struct device *dev,
 			 struct protection_domain *domain)
 {
-	struct iommu_dev_data *dev_data;
-	unsigned long flags;
+	struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
+	struct amd_iommu *iommu = get_amd_iommu_from_dev_data(dev_data);
+	struct pci_dev *pdev;
 	int ret = 0;

-	spin_lock_irqsave(&domain->lock, flags);
-
-	dev_data = dev_iommu_priv_get(dev);
-
-	spin_lock(&dev_data->lock);
+	mutex_lock(&dev_data->mutex);

 	if (dev_data->domain != NULL) {
 		ret = -EBUSY;
 		goto out;
 	}

-	ret = do_attach(dev_data, domain);
+	/* Do reference counting */
+	ret = pdom_attach_iommu(iommu, domain);
+	if (ret)
+		goto out;
+
+	/* Setup GCR3 table */
+	if (pdom_is_sva_capable(domain)) {
+		ret = init_gcr3_table(dev_data, domain);
+		if (ret) {
+			pdom_detach_iommu(iommu, domain);
+			goto out;
+		}
+	}
+
+	pdev = dev_is_pci(dev_data->dev) ? to_pci_dev(dev_data->dev) : NULL;
+	if (pdev && pdom_is_sva_capable(domain)) {
+		pdev_enable_caps(pdev);
+
+		/*
+		 * Device can continue to function even if IOPF
+		 * enablement failed. Hence in error path just
+		 * disable device PRI support.
+		 */
+		if (amd_iommu_iopf_add_device(iommu, dev_data))
+			pdev_disable_cap_pri(pdev);
+	} else if (pdev) {
+		pdev_enable_cap_ats(pdev);
+	}
+
+	/* Update data structures */
+	dev_data->domain = domain;
+	list_add(&dev_data->list, &domain->dev_list);
+
+	/* Update device table */
+	dev_update_dte(dev_data, true);

 out:
-	spin_unlock(&dev_data->lock);
-
-	spin_unlock_irqrestore(&domain->lock, flags);
+	mutex_unlock(&dev_data->mutex);

 	return ret;
 }
···
 static void detach_device(struct device *dev)
 {
 	struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
-	struct protection_domain *domain = dev_data->domain;
 	struct amd_iommu *iommu = get_amd_iommu_from_dev_data(dev_data);
+	struct protection_domain *domain = dev_data->domain;
 	unsigned long flags;
-	bool ppr = dev_data->ppr;

-	spin_lock_irqsave(&domain->lock, flags);
-
-	spin_lock(&dev_data->lock);
+	mutex_lock(&dev_data->mutex);

 	/*
 	 * First check if the device is still attached. It might already
···
 	if (WARN_ON(!dev_data->domain))
 		goto out;

-	if (ppr) {
-		iopf_queue_flush_dev(dev);
-
-		/* Updated here so that it gets reflected in DTE */
-		dev_data->ppr = false;
-	}
-
-	do_detach(dev_data);
-
-out:
-	spin_unlock(&dev_data->lock);
-
-	spin_unlock_irqrestore(&domain->lock, flags);
-
 	/* Remove IOPF handler */
-	if (ppr)
+	if (dev_data->ppr) {
+		iopf_queue_flush_dev(dev);
 		amd_iommu_iopf_remove_device(iommu, dev_data);
+	}

 	if (dev_is_pci(dev))
 		pdev_disable_caps(to_pci_dev(dev));

+	/* Clear DTE and flush the entry */
+	dev_update_dte(dev_data, false);
+
+	/* Flush IOTLB and wait for the flushes to finish */
+	spin_lock_irqsave(&domain->lock, flags);
+	amd_iommu_domain_flush_all(domain);
+	spin_unlock_irqrestore(&domain->lock, flags);
+
+	/* Clear GCR3 table */
+	if (pdom_is_sva_capable(domain))
+		destroy_gcr3_table(dev_data, domain);
+
+	/* Update data structures */
+	dev_data->domain = NULL;
+	list_del(&dev_data->list);
+
+	/* decrease reference counters - needs to happen after the flushes */
+	pdom_detach_iommu(iommu, domain);
+
+out:
+	mutex_unlock(&dev_data->mutex);
 }

 static struct iommu_device *amd_iommu_probe_device(struct device *dev)
···

 static void amd_iommu_release_device(struct device *dev)
 {
-	struct amd_iommu *iommu;
+	struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);

-	if (!check_device(dev))
-		return;
+	WARN_ON(dev_data->domain);

-	iommu = rlookup_amd_iommu(dev);
-	if (!iommu)
-		return;
-
-	amd_iommu_uninit_device(dev);
-	iommu_completion_wait(iommu);
+	/*
+	 * We keep dev_data around for unplugged
devices and reuse it when the 2214 + * device is re-plugged - not doing so would introduce a ton of races. 2215 + */ 2245 2216 } 2246 2217 2247 2218 static struct iommu_group *amd_iommu_device_group(struct device *dev) ··· 2259 2236 * 2260 2237 *****************************************************************************/ 2261 2238 2262 - static void cleanup_domain(struct protection_domain *domain) 2263 - { 2264 - struct iommu_dev_data *entry; 2265 - 2266 - lockdep_assert_held(&domain->lock); 2267 - 2268 - if (!domain->dev_cnt) 2269 - return; 2270 - 2271 - while (!list_empty(&domain->dev_list)) { 2272 - entry = list_first_entry(&domain->dev_list, 2273 - struct iommu_dev_data, list); 2274 - BUG_ON(!entry->domain); 2275 - do_detach(entry); 2276 - } 2277 - WARN_ON(domain->dev_cnt != 0); 2278 - } 2279 - 2280 2239 void protection_domain_free(struct protection_domain *domain) 2281 2240 { 2282 2241 WARN_ON(!list_empty(&domain->dev_list)); 2283 2242 if (domain->domain.type & __IOMMU_DOMAIN_PAGING) 2284 2243 free_io_pgtable_ops(&domain->iop.pgtbl.ops); 2285 - domain_id_free(domain->id); 2244 + pdom_id_free(domain->id); 2286 2245 kfree(domain); 2246 + } 2247 + 2248 + static void protection_domain_init(struct protection_domain *domain, int nid) 2249 + { 2250 + spin_lock_init(&domain->lock); 2251 + INIT_LIST_HEAD(&domain->dev_list); 2252 + INIT_LIST_HEAD(&domain->dev_data_list); 2253 + xa_init(&domain->iommu_array); 2254 + domain->iop.pgtbl.cfg.amd.nid = nid; 2287 2255 } 2288 2256 2289 2257 struct protection_domain *protection_domain_alloc(unsigned int type, int nid) 2290 2258 { 2291 - struct io_pgtable_ops *pgtbl_ops; 2292 2259 struct protection_domain *domain; 2293 - int pgtable; 2260 + int domid; 2294 2261 2295 2262 domain = kzalloc(sizeof(*domain), GFP_KERNEL); 2296 2263 if (!domain) 2297 2264 return NULL; 2298 2265 2299 - domain->id = domain_id_alloc(); 2300 - if (!domain->id) 2301 - goto err_free; 2302 - 2303 - spin_lock_init(&domain->lock); 2304 - 
INIT_LIST_HEAD(&domain->dev_list); 2305 - INIT_LIST_HEAD(&domain->dev_data_list); 2306 - domain->iop.pgtbl.cfg.amd.nid = nid; 2307 - 2308 - switch (type) { 2309 - /* No need to allocate io pgtable ops in passthrough mode */ 2310 - case IOMMU_DOMAIN_IDENTITY: 2311 - case IOMMU_DOMAIN_SVA: 2312 - return domain; 2313 - case IOMMU_DOMAIN_DMA: 2314 - pgtable = amd_iommu_pgtable; 2315 - break; 2316 - /* 2317 - * Force IOMMU v1 page table when allocating 2318 - * domain for pass-through devices. 2319 - */ 2320 - case IOMMU_DOMAIN_UNMANAGED: 2321 - pgtable = AMD_IOMMU_V1; 2322 - break; 2323 - default: 2324 - goto err_id; 2266 + domid = pdom_id_alloc(); 2267 + if (domid <= 0) { 2268 + kfree(domain); 2269 + return NULL; 2325 2270 } 2271 + domain->id = domid; 2272 + 2273 + protection_domain_init(domain, nid); 2274 + 2275 + return domain; 2276 + } 2277 + 2278 + static int pdom_setup_pgtable(struct protection_domain *domain, 2279 + unsigned int type, int pgtable) 2280 + { 2281 + struct io_pgtable_ops *pgtbl_ops; 2282 + 2283 + /* No need to allocate io pgtable ops in passthrough mode */ 2284 + if (!(type & __IOMMU_DOMAIN_PAGING)) 2285 + return 0; 2326 2286 2327 2287 switch (pgtable) { 2328 2288 case AMD_IOMMU_V1: ··· 2315 2309 domain->pd_mode = PD_MODE_V2; 2316 2310 break; 2317 2311 default: 2318 - goto err_id; 2312 + return -EINVAL; 2319 2313 } 2320 2314 2321 2315 pgtbl_ops = 2322 2316 alloc_io_pgtable_ops(pgtable, &domain->iop.pgtbl.cfg, domain); 2323 2317 if (!pgtbl_ops) 2324 - goto err_id; 2318 + return -ENOMEM; 2325 2319 2326 - return domain; 2327 - err_id: 2328 - domain_id_free(domain->id); 2329 - err_free: 2330 - kfree(domain); 2331 - return NULL; 2320 + return 0; 2332 2321 } 2333 2322 2334 - static inline u64 dma_max_address(void) 2323 + static inline u64 dma_max_address(int pgtable) 2335 2324 { 2336 - if (amd_iommu_pgtable == AMD_IOMMU_V1) 2325 + if (pgtable == AMD_IOMMU_V1) 2337 2326 return ~0ULL; 2338 2327 2339 2328 /* V2 with 4/5 level page table */ ··· 2341 2340 } 
2342 2341 2343 2342 static struct iommu_domain *do_iommu_domain_alloc(unsigned int type, 2344 - struct device *dev, u32 flags) 2343 + struct device *dev, 2344 + u32 flags, int pgtable) 2345 2345 { 2346 2346 bool dirty_tracking = flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING; 2347 2347 struct protection_domain *domain; 2348 2348 struct amd_iommu *iommu = NULL; 2349 + int ret; 2349 2350 2350 2351 if (dev) 2351 2352 iommu = get_amd_iommu_from_dev(dev); ··· 2359 2356 if (amd_iommu_snp_en && (type == IOMMU_DOMAIN_IDENTITY)) 2360 2357 return ERR_PTR(-EINVAL); 2361 2358 2362 - if (dirty_tracking && !amd_iommu_hd_support(iommu)) 2363 - return ERR_PTR(-EOPNOTSUPP); 2364 - 2365 2359 domain = protection_domain_alloc(type, 2366 2360 dev ? dev_to_node(dev) : NUMA_NO_NODE); 2367 2361 if (!domain) 2368 2362 return ERR_PTR(-ENOMEM); 2369 2363 2364 + ret = pdom_setup_pgtable(domain, type, pgtable); 2365 + if (ret) { 2366 + pdom_id_free(domain->id); 2367 + kfree(domain); 2368 + return ERR_PTR(ret); 2369 + } 2370 + 2370 2371 domain->domain.geometry.aperture_start = 0; 2371 - domain->domain.geometry.aperture_end = dma_max_address(); 2372 + domain->domain.geometry.aperture_end = dma_max_address(pgtable); 2372 2373 domain->domain.geometry.force_aperture = true; 2373 2374 domain->domain.pgsize_bitmap = domain->iop.pgtbl.cfg.pgsize_bitmap; 2374 2375 ··· 2390 2383 static struct iommu_domain *amd_iommu_domain_alloc(unsigned int type) 2391 2384 { 2392 2385 struct iommu_domain *domain; 2386 + int pgtable = amd_iommu_pgtable; 2393 2387 2394 - domain = do_iommu_domain_alloc(type, NULL, 0); 2388 + /* 2389 + * Force IOMMU v1 page table when allocating 2390 + * domain for pass-through devices. 
2391 + */ 2392 + if (type == IOMMU_DOMAIN_UNMANAGED) 2393 + pgtable = AMD_IOMMU_V1; 2394 + 2395 + domain = do_iommu_domain_alloc(type, NULL, 0, pgtable); 2395 2396 if (IS_ERR(domain)) 2396 2397 return NULL; 2397 2398 ··· 2413 2398 2414 2399 { 2415 2400 unsigned int type = IOMMU_DOMAIN_UNMANAGED; 2401 + struct amd_iommu *iommu = NULL; 2402 + const u32 supported_flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING | 2403 + IOMMU_HWPT_ALLOC_PASID; 2416 2404 2417 - if ((flags & ~IOMMU_HWPT_ALLOC_DIRTY_TRACKING) || parent || user_data) 2405 + if (dev) 2406 + iommu = get_amd_iommu_from_dev(dev); 2407 + 2408 + if ((flags & ~supported_flags) || parent || user_data) 2418 2409 return ERR_PTR(-EOPNOTSUPP); 2419 2410 2420 - return do_iommu_domain_alloc(type, dev, flags); 2411 + /* Allocate domain with v2 page table if IOMMU supports PASID. */ 2412 + if (flags & IOMMU_HWPT_ALLOC_PASID) { 2413 + if (!amd_iommu_pasid_supported()) 2414 + return ERR_PTR(-EOPNOTSUPP); 2415 + 2416 + return do_iommu_domain_alloc(type, dev, flags, AMD_IOMMU_V2); 2417 + } 2418 + 2419 + /* Allocate domain with v1 page table for dirty tracking */ 2420 + if (flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING) { 2421 + if (iommu && amd_iommu_hd_support(iommu)) { 2422 + return do_iommu_domain_alloc(type, dev, 2423 + flags, AMD_IOMMU_V1); 2424 + } 2425 + 2426 + return ERR_PTR(-EOPNOTSUPP); 2427 + } 2428 + 2429 + /* If nothing specific is required use the kernel commandline default */ 2430 + return do_iommu_domain_alloc(type, dev, 0, amd_iommu_pgtable); 2421 2431 } 2422 2432 2423 2433 void amd_iommu_domain_free(struct iommu_domain *dom) 2424 2434 { 2425 - struct protection_domain *domain; 2426 - unsigned long flags; 2427 - 2428 - domain = to_pdomain(dom); 2429 - 2430 - spin_lock_irqsave(&domain->lock, flags); 2431 - 2432 - cleanup_domain(domain); 2433 - 2434 - spin_unlock_irqrestore(&domain->lock, flags); 2435 + struct protection_domain *domain = to_pdomain(dom); 2435 2436 2436 2437 protection_domain_free(domain); 2437 2438 } ··· 
2461 2430 detach_device(dev); 2462 2431 2463 2432 /* Clear DTE and flush the entry */ 2464 - spin_lock(&dev_data->lock); 2433 + mutex_lock(&dev_data->mutex); 2465 2434 dev_update_dte(dev_data, false); 2466 - spin_unlock(&dev_data->lock); 2435 + mutex_unlock(&dev_data->mutex); 2467 2436 2468 2437 return 0; 2469 2438 } 2470 2439 2471 2440 static struct iommu_domain blocked_domain = { 2441 + .type = IOMMU_DOMAIN_BLOCKED, 2442 + .ops = &(const struct iommu_domain_ops) { 2443 + .attach_dev = blocked_domain_attach_device, 2444 + } 2445 + }; 2446 + 2447 + static struct protection_domain identity_domain; 2448 + 2449 + static const struct iommu_domain_ops identity_domain_ops = { 2450 + .attach_dev = amd_iommu_attach_device, 2451 + }; 2452 + 2453 + void amd_iommu_init_identity_domain(void) 2454 + { 2455 + struct iommu_domain *domain = &identity_domain.domain; 2456 + 2457 + domain->type = IOMMU_DOMAIN_IDENTITY; 2458 + domain->ops = &identity_domain_ops; 2459 + domain->owner = &amd_iommu_ops; 2460 + 2461 + identity_domain.id = pdom_id_alloc(); 2462 + 2463 + protection_domain_init(&identity_domain, NUMA_NO_NODE); 2464 + } 2465 + 2466 + /* Same as blocked domain except it supports only ops->attach_dev() */ 2467 + static struct iommu_domain release_domain = { 2472 2468 .type = IOMMU_DOMAIN_BLOCKED, 2473 2469 .ops = &(const struct iommu_domain_ops) { 2474 2470 .attach_dev = blocked_domain_attach_device, ··· 2508 2450 struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev); 2509 2451 struct protection_domain *domain = to_pdomain(dom); 2510 2452 struct amd_iommu *iommu = get_amd_iommu_from_dev(dev); 2511 - struct pci_dev *pdev; 2512 2453 int ret; 2513 2454 2514 2455 /* ··· 2539 2482 dev_data->use_vapic = 0; 2540 2483 } 2541 2484 #endif 2542 - 2543 - pdev = dev_is_pci(dev_data->dev) ? 
to_pci_dev(dev_data->dev) : NULL; 2544 - if (pdev && pdom_is_sva_capable(domain)) { 2545 - pdev_enable_caps(pdev); 2546 - 2547 - /* 2548 - * Device can continue to function even if IOPF 2549 - * enablement failed. Hence in error path just 2550 - * disable device PRI support. 2551 - */ 2552 - if (amd_iommu_iopf_add_device(iommu, dev_data)) 2553 - pdev_disable_cap_pri(pdev); 2554 - } else if (pdev) { 2555 - pdev_enable_cap_ats(pdev); 2556 - } 2557 - 2558 - /* Update device table */ 2559 - dev_update_dte(dev_data, true); 2560 2485 2561 2486 return ret; 2562 2487 } ··· 2881 2842 const struct iommu_ops amd_iommu_ops = { 2882 2843 .capable = amd_iommu_capable, 2883 2844 .blocked_domain = &blocked_domain, 2845 + .release_domain = &release_domain, 2846 + .identity_domain = &identity_domain.domain, 2884 2847 .domain_alloc = amd_iommu_domain_alloc, 2885 2848 .domain_alloc_user = amd_iommu_domain_alloc_user, 2886 2849 .domain_alloc_sva = amd_iommu_domain_alloc_sva, ··· 2931 2890 return; 2932 2891 2933 2892 build_inv_irt(&cmd, devid); 2934 - data = atomic64_add_return(1, &iommu->cmd_sem_val); 2893 + data = atomic64_inc_return(&iommu->cmd_sem_val); 2935 2894 build_completion_wait(&cmd2, iommu, data); 2936 2895 2937 2896 raw_spin_lock_irqsave(&iommu->lock, flags);
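The AMD hunks above replace the per-domain `dev_iommu[]` counters with one refcounted `pdom_iommu_info` entry per IOMMU, stored in `pdom->iommu_array` and updated under `pdom->lock`. The user-space sketch below illustrates the same get-or-create / put-and-free pattern under stated assumptions: a fixed array indexed by a hypothetical `iommu_index` stands in for the xarray, there is no locking (the kernel code holds `pdom->lock` across lookup, insert and erase), and every name is invented for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical bound on IOMMU instances; the kernel uses an xarray keyed by iommu->index. */
#define MAX_IOMMUS 32

/* Hypothetical analogue of struct pdom_iommu_info. */
struct pdom_iommu_ref {
	int iommu_index;     /* IOMMU this entry tracks */
	unsigned int refcnt; /* attached devices behind that IOMMU */
};

/* Hypothetical analogue of the domain's iommu_array (unlocked in this sketch). */
struct pdom_sketch {
	struct pdom_iommu_ref *slots[MAX_IOMMUS];
};

/* Get-or-create: bump the count if the IOMMU is tracked, else add an entry with refcnt 1. */
static int pdom_ref_iommu(struct pdom_sketch *pdom, int iommu_index)
{
	struct pdom_iommu_ref *ref;

	if (iommu_index < 0 || iommu_index >= MAX_IOMMUS)
		return -EINVAL;

	ref = pdom->slots[iommu_index];
	if (ref) {
		ref->refcnt++;
		return 0;
	}

	ref = calloc(1, sizeof(*ref));
	if (!ref)
		return -ENOMEM;
	ref->iommu_index = iommu_index;
	ref->refcnt = 1;
	pdom->slots[iommu_index] = ref;
	return 0;
}

/* Put: drop one reference and free the entry when the last device leaves. */
static void pdom_unref_iommu(struct pdom_sketch *pdom, int iommu_index)
{
	struct pdom_iommu_ref *ref;

	if (iommu_index < 0 || iommu_index >= MAX_IOMMUS)
		return;

	ref = pdom->slots[iommu_index];
	if (!ref)
		return; /* nothing tracked for this IOMMU, nothing to drop */

	assert(ref->refcnt > 0);
	if (--ref->refcnt == 0) {
		pdom->slots[iommu_index] = NULL;
		free(ref);
	}
}

/* Current count for an IOMMU, 0 when it is not tracked. */
static unsigned int pdom_iommu_refcnt(const struct pdom_sketch *pdom, int iommu_index)
{
	const struct pdom_iommu_ref *ref = pdom->slots[iommu_index];

	return ref ? ref->refcnt : 0;
}
```

Because the entry is erased when its count reaches zero, finding an entry in the table always implies at least one attached device behind that IOMMU.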
+5 -1
drivers/iommu/amd/pasid.c
···
 };
 
 int iommu_sva_set_dev_pasid(struct iommu_domain *domain,
-			    struct device *dev, ioasid_t pasid)
+			    struct device *dev, ioasid_t pasid,
+			    struct iommu_domain *old)
 {
 	struct pdom_dev_data *pdom_dev_data;
 	struct protection_domain *sva_pdom = to_pdomain(domain);
 	struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
 	unsigned long flags;
 	int ret = -EINVAL;
+
+	if (old)
+		return -EOPNOTSUPP;
 
 	/* PASID zero is used for requests from the I/O device without PASID */
 	if (!is_pasid_valid(dev_data, pasid))
+3 -2
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
···
 }
 
 static int arm_smmu_sva_set_dev_pasid(struct iommu_domain *domain,
-				      struct device *dev, ioasid_t id)
+				      struct device *dev, ioasid_t id,
+				      struct iommu_domain *old)
 {
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
···
 	 * get reassigned
 	 */
 	arm_smmu_make_sva_cd(&target, master, domain->mm, smmu_domain->cd.asid);
-	ret = arm_smmu_set_pasid(master, smmu_domain, id, &target);
+	ret = arm_smmu_set_pasid(master, smmu_domain, id, &target, old);
 
 	mmput(domain->mm);
 	return ret;
+9 -7
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
···
 }
 
 static int arm_smmu_s1_set_dev_pasid(struct iommu_domain *domain,
-				     struct device *dev, ioasid_t id)
+				     struct device *dev, ioasid_t id,
+				     struct iommu_domain *old)
 {
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
···
 	 */
 	arm_smmu_make_s1_cd(&target_cd, master, smmu_domain);
 	return arm_smmu_set_pasid(master, to_smmu_domain(domain), id,
-				  &target_cd);
+				  &target_cd, old);
 }
 
 static void arm_smmu_update_ste(struct arm_smmu_master *master,
···
 
 int arm_smmu_set_pasid(struct arm_smmu_master *master,
 		       struct arm_smmu_domain *smmu_domain, ioasid_t pasid,
-		       struct arm_smmu_cd *cd)
+		       struct arm_smmu_cd *cd, struct iommu_domain *old)
 {
 	struct iommu_domain *sid_domain = iommu_get_domain_for_dev(master->dev);
 	struct arm_smmu_attach_state state = {
 		.master = master,
-		/*
-		 * For now the core code prevents calling this when a domain is
-		 * already attached, no need to set old_domain.
-		 */
 		.ssid = pasid,
+		.old_domain = old,
 	};
 	struct arm_smmu_cd *cdptr;
 	int ret;
···
 {
 	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
 	const u32 PAGING_FLAGS = IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
+				 IOMMU_HWPT_ALLOC_PASID |
 				 IOMMU_HWPT_ALLOC_NEST_PARENT;
 	struct arm_smmu_domain *smmu_domain;
 	int ret;
···
 		return ERR_PTR(-EOPNOTSUPP);
 	if (parent || user_data)
 		return ERR_PTR(-EOPNOTSUPP);
+
+	if (flags & IOMMU_HWPT_ALLOC_PASID)
+		return arm_smmu_domain_alloc_paging(dev);
 
 	smmu_domain = arm_smmu_domain_alloc();
 	if (IS_ERR(smmu_domain))
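Both the AMD and the ARM SMMUv3 `domain_alloc_user` paths in this diff now accept `IOMMU_HWPT_ALLOC_PASID` and reject any bit outside a supported mask. The sketch below mirrors the decision order of the AMD hunk: unsupported bits fail first, a PASID request selects the v2 page table (only if PASID is supported), a dirty-tracking request selects v1 (only if the hardware supports dirty tracking), and otherwise the command-line default is used. The `SKETCH_*` flag values, the enum and `choose_pgtable()` are hypothetical; the real flag constants are defined in the iommufd uAPI header.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; not the real IOMMU_HWPT_ALLOC_* values. */
#define SKETCH_ALLOC_DIRTY_TRACKING (1u << 0)
#define SKETCH_ALLOC_PASID          (1u << 1)

enum sketch_pgtable { SKETCH_PGTABLE_V1 = 1, SKETCH_PGTABLE_V2 = 2 };

/* Returns 0 and stores the chosen format in *pgtable, or a negative errno. */
static int choose_pgtable(uint32_t flags, bool pasid_supported, bool hd_supported,
			  enum sketch_pgtable cmdline_default,
			  enum sketch_pgtable *pgtable)
{
	const uint32_t supported = SKETCH_ALLOC_DIRTY_TRACKING | SKETCH_ALLOC_PASID;

	/* Reject any flag bit the driver does not understand. */
	if (flags & ~supported)
		return -EOPNOTSUPP;

	/* PASID is checked first, so it decides a request carrying both bits. */
	if (flags & SKETCH_ALLOC_PASID) {
		if (!pasid_supported)
			return -EOPNOTSUPP;
		*pgtable = SKETCH_PGTABLE_V2;
		return 0;
	}

	/* Dirty tracking is only offered with the v1 format. */
	if (flags & SKETCH_ALLOC_DIRTY_TRACKING) {
		if (!hd_supported)
			return -EOPNOTSUPP;
		*pgtable = SKETCH_PGTABLE_V1;
		return 0;
	}

	/* Nothing specific requested: fall back to the command-line default. */
	*pgtable = cmdline_default;
	return 0;
}
```

Validating against a mask before any other check means a flag added to the uAPI later is refused cleanly by a driver that does not know it yet.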
+1 -1
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
···
 
 int arm_smmu_set_pasid(struct arm_smmu_master *master,
 		       struct arm_smmu_domain *smmu_domain, ioasid_t pasid,
-		       struct arm_smmu_cd *cd);
+		       struct arm_smmu_cd *cd, struct iommu_domain *old);
 
 void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid);
 void arm_smmu_tlb_inv_range_asid(unsigned long iova, size_t size, int asid,
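With the new `old` argument, a `set_dev_pasid()` callback can be asked to replace a domain already attached to a PASID, not only to install one on an unused PASID. The SMMUv3 code passes `old` through as `state.old_domain`, while the AMD SVA path still returns `-EOPNOTSUPP` whenever `old` is non-NULL. The sketch below models that contract on a hypothetical per-device table: `old == NULL` means a fresh attach that must find the slot empty, and a non-NULL `old` must match the current owner before the slot is switched in a single step. All types and helpers are invented for illustration and do not model any hardware table format.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-device PASID table: each slot records the owning domain. */
#define SKETCH_NR_PASIDS 8

struct sketch_domain { int id; };

struct sketch_pasid_table {
	const struct sketch_domain *owner[SKETCH_NR_PASIDS];
};

/* Fresh attach: the PASID must currently be unused. */
static int sketch_pasid_setup(struct sketch_pasid_table *t, unsigned int pasid,
			      const struct sketch_domain *dom)
{
	if (t->owner[pasid])
		return -EBUSY;
	t->owner[pasid] = dom;
	return 0;
}

/* Replacement: swap @old for @dom without an intermediate "unset" state. */
static int sketch_pasid_replace(struct sketch_pasid_table *t, unsigned int pasid,
				const struct sketch_domain *dom,
				const struct sketch_domain *old)
{
	if (t->owner[pasid] != old)
		return -EINVAL;
	t->owner[pasid] = dom;
	return 0;
}

/* set_dev_pasid-style entry point: @old == NULL selects the setup path. */
static int sketch_set_dev_pasid(struct sketch_pasid_table *t, unsigned int pasid,
				const struct sketch_domain *dom,
				const struct sketch_domain *old)
{
	if (pasid >= SKETCH_NR_PASIDS || !dom)
		return -EINVAL;
	if (!old)
		return sketch_pasid_setup(t, pasid, dom);
	return sketch_pasid_replace(t, pasid, dom, old);
}
```

The same "setup if `old` is NULL, otherwise replace" split appears in the Intel helpers (`__domain_setup_first_level()`, `domain_setup_second_level()`, `domain_setup_passthrough()`) elsewhere in this diff.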
+5 -2
drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
···
 
 	snprintf(name, 16, "vcmdq%u", vcmdq->idx);
 
-	q->llq.max_n_shift = VCMDQ_LOG2SIZE_MAX;
+	/* Queue size, capped to ensure natural alignment */
+	q->llq.max_n_shift = min_t(u32, CMDQ_MAX_SZ_SHIFT, VCMDQ_LOG2SIZE_MAX);
 
 	/* Use the common helper to init the VCMDQ, and then... */
 	ret = arm_smmu_init_one_queue(smmu, q, vcmdq->page0,
···
 	return 0;
 }
 
-struct dentry *cmdqv_debugfs_dir;
+#ifdef CONFIG_IOMMU_DEBUGFS
+static struct dentry *cmdqv_debugfs_dir;
+#endif
 
 static struct arm_smmu_device *
 __tegra241_cmdqv_probe(struct arm_smmu_device *smmu, struct resource *res,
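The tegra241 change caps `max_n_shift` (log2 of the number of queue entries) at the smaller of `CMDQ_MAX_SZ_SHIFT` and `VCMDQ_LOG2SIZE_MAX`; its comment says the cap ensures natural alignment. The arithmetic below illustrates why a smaller shift helps: a queue of `1 << n_shift` entries of `1 << ent_shift` bytes occupies `1 << (n_shift + ent_shift)` bytes, and a naturally aligned base must be a multiple of that size, so each step removed from the shift halves the alignment the allocation has to provide. The constant values and all `sketch_*` names are assumptions chosen for illustration; the real `CMDQ_MAX_SZ_SHIFT` depends on the kernel configuration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed example values, not taken from the kernel sources. */
#define SKETCH_CMDQ_ENT_SZ_SHIFT  4  /* 16-byte command entries */
#define SKETCH_CMDQ_MAX_SZ_SHIFT  18 /* assumed software cap on log2(entries) */
#define SKETCH_VCMDQ_LOG2SIZE_MAX 19 /* assumed hardware maximum */

#define SKETCH_MIN(a, b) ((a) < (b) ? (a) : (b))

/* log2(number of entries), chosen the same way as the patched line. */
static uint32_t sketch_vcmdq_max_n_shift(void)
{
	return SKETCH_MIN((uint32_t)SKETCH_CMDQ_MAX_SZ_SHIFT,
			  (uint32_t)SKETCH_VCMDQ_LOG2SIZE_MAX);
}

/* Queue size in bytes for a given log2 entry count. */
static uint64_t sketch_queue_bytes(uint32_t n_shift)
{
	return (uint64_t)1 << (n_shift + SKETCH_CMDQ_ENT_SZ_SHIFT);
}

/* Natural alignment: the base address is a multiple of the (power-of-two) size. */
static int sketch_is_naturally_aligned(uint64_t base, uint64_t size)
{
	return (base & (size - 1)) == 0;
}
```

With these assumed numbers the capped queue is 4 MiB, so a buffer at a 4 MiB boundary qualifies, whereas the uncapped 8 MiB queue would need an 8 MiB boundary.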
+11
drivers/iommu/arm/arm-smmu/arm-smmu.c
···
 		goto out_free;
 	} else {
 		smmu = arm_smmu_get_by_fwnode(fwspec->iommu_fwnode);
+
+		/*
+		 * Defer probe if the relevant SMMU instance hasn't finished
+		 * probing yet. This is a fragile hack and we'd ideally
+		 * avoid this race in the core code. Until that's ironed
+		 * out, however, this is the most pragmatic option on the
+		 * table.
+		 */
+		if (!smmu)
+			return ERR_PTR(dev_err_probe(dev, -EPROBE_DEFER,
+						     "smmu dev has not bound yet\n"));
 	}
 
 	ret = -EINVAL;
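The new SMMUv2 branch returns `-EPROBE_DEFER` when `arm_smmu_get_by_fwnode()` finds no bound SMMU, so the driver core can retry the client device later instead of failing it outright; `dev_err_probe()` records the message as the deferral reason rather than logging it as an error. The user-space sketch below reproduces only the lookup-or-defer decision against a hypothetical registry of bound SMMUs. The value 517 matches `EPROBE_DEFER` in the kernel's internal `include/linux/errno.h` (it is not a user-space errno), and all `sketch_*` names are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Kernel-internal "probe again later" code; 517 as in include/linux/errno.h. */
#define SKETCH_EPROBE_DEFER 517

/* Hypothetical registry of SMMU instances that have finished probing. */
#define SKETCH_MAX_SMMUS 4
static const char *sketch_bound_smmus[SKETCH_MAX_SMMUS];

/* Record that the SMMU identified by @fwnode has bound (first free slot). */
static void sketch_smmu_bind(const char *fwnode)
{
	for (int i = 0; i < SKETCH_MAX_SMMUS; i++) {
		if (!sketch_bound_smmus[i]) {
			sketch_bound_smmus[i] = fwnode;
			return;
		}
	}
}

/* Analogue of arm_smmu_get_by_fwnode(): NULL when not bound yet. */
static const char *sketch_get_by_fwnode(const char *fwnode)
{
	for (int i = 0; i < SKETCH_MAX_SMMUS; i++)
		if (sketch_bound_smmus[i] && strcmp(sketch_bound_smmus[i], fwnode) == 0)
			return sketch_bound_smmus[i];
	return NULL;
}

/* Client probe: defer instead of failing while the SMMU is missing. */
static int sketch_client_probe(const char *smmu_fwnode)
{
	if (!sketch_get_by_fwnode(smmu_fwnode))
		return -SKETCH_EPROBE_DEFER;
	return 0;
}
```

In this model, calling `sketch_client_probe()` again after `sketch_smmu_bind()` plays the role of the driver core's deferred-probe retry.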
+1 -1
drivers/iommu/intel/Kconfig
···
 	depends on PCI_MSI && ACPI && X86
 	select IOMMU_API
 	select IOMMU_IOVA
+	select IOMMU_IOPF
 	select IOMMUFD_DRIVER if IOMMUFD
 	select NEED_DMA_MAP_STATE
 	select DMAR_TABLE
···
 	depends on X86_64
 	select MMU_NOTIFIER
 	select IOMMU_SVA
-	select IOMMU_IOPF
 	help
 	  Shared Virtual Memory (SVM) provides a facility for devices
 	  to access DMA resources through process address space by
+1 -1
drivers/iommu/intel/Makefile
···
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_DMAR_TABLE) += dmar.o
-obj-$(CONFIG_INTEL_IOMMU) += iommu.o pasid.o nested.o cache.o
+obj-$(CONFIG_INTEL_IOMMU) += iommu.o pasid.o nested.o cache.o prq.o
 obj-$(CONFIG_DMAR_TABLE) += trace.o cap_audit.o
 obj-$(CONFIG_DMAR_PERF) += perf.o
 obj-$(CONFIG_INTEL_IOMMU_DEBUGFS) += debugfs.o
+1 -14
drivers/iommu/intel/dmar.c
···
 		err = iommu->seq_id;
 		goto error;
 	}
-	sprintf(iommu->name, "dmar%d", iommu->seq_id);
+	snprintf(iommu->name, sizeof(iommu->name), "dmar%d", iommu->seq_id);
 
 	err = map_iommu(iommu, drhd);
 	if (err) {
···
 	writel(msg->data, iommu->reg + reg + 4);
 	writel(msg->address_lo, iommu->reg + reg + 8);
 	writel(msg->address_hi, iommu->reg + reg + 12);
-	raw_spin_unlock_irqrestore(&iommu->register_lock, flag);
-}
-
-void dmar_msi_read(int irq, struct msi_msg *msg)
-{
-	struct intel_iommu *iommu = irq_get_handler_data(irq);
-	int reg = dmar_msi_reg(iommu, irq);
-	unsigned long flag;
-
-	raw_spin_lock_irqsave(&iommu->register_lock, flag);
-	msg->data = readl(iommu->reg + reg + 4);
-	msg->address_lo = readl(iommu->reg + reg + 8);
-	msg->address_hi = readl(iommu->reg + reg + 12);
 	raw_spin_unlock_irqrestore(&iommu->register_lock, flag);
 }
 
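The dmar.c hunk replaces `sprintf()` with `snprintf(iommu->name, sizeof(iommu->name), ...)`, so formatting the name can never write past the fixed-size `name` array even if `seq_id` were unexpectedly large. The sketch shows the bounded form and how truncation can be detected from `snprintf()`'s return value, which is the length the complete string would have had. The 8-byte buffer in the hypothetical `struct sketch_iommu` is deliberately small to force truncation; the real field size is not shown in this diff.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the fixed-size name field (size chosen for the demo). */
struct sketch_iommu {
	char name[8];
	int seq_id;
};

/*
 * Bounded formatting: never writes past name[] and always NUL-terminates.
 * Returns 1 if the output was truncated (or snprintf failed), 0 otherwise.
 */
static int sketch_set_name(struct sketch_iommu *iommu)
{
	int n = snprintf(iommu->name, sizeof(iommu->name), "dmar%d", iommu->seq_id);

	return n < 0 || (size_t)n >= sizeof(iommu->name);
}
```

Note that `sizeof(iommu->name)` yields the array size here only because `name` is an array member; the same expression on a `char *` would give the pointer size.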
+224 -352
drivers/iommu/intel/iommu.c
···
 		ecap_smpwc(iommu->ecap) : ecap_coherent(iommu->ecap);
 }
 
-static void domain_update_iommu_coherency(struct dmar_domain *domain)
-{
-	struct iommu_domain_info *info;
-	struct dmar_drhd_unit *drhd;
-	struct intel_iommu *iommu;
-	bool found = false;
-	unsigned long i;
-
-	domain->iommu_coherency = true;
-	xa_for_each(&domain->iommu_array, i, info) {
-		found = true;
-		if (!iommu_paging_structure_coherency(info->iommu)) {
-			domain->iommu_coherency = false;
-			break;
-		}
-	}
-	if (found)
-		return;
-
-	/* No hardware attached; use lowest common denominator */
-	rcu_read_lock();
-	for_each_active_iommu(iommu, drhd) {
-		if (!iommu_paging_structure_coherency(iommu)) {
-			domain->iommu_coherency = false;
-			break;
-		}
-	}
-	rcu_read_unlock();
-}
-
-static int domain_update_iommu_superpage(struct dmar_domain *domain,
-					 struct intel_iommu *skip)
-{
-	struct dmar_drhd_unit *drhd;
-	struct intel_iommu *iommu;
-	int mask = 0x3;
-
-	if (!intel_iommu_superpage)
-		return 0;
-
-	/* set iommu_superpage to the smallest common denominator */
-	rcu_read_lock();
-	for_each_active_iommu(iommu, drhd) {
-		if (iommu != skip) {
-			if (domain && domain->use_first_level) {
-				if (!cap_fl1gp_support(iommu->cap))
-					mask = 0x1;
-			} else {
-				mask &= cap_super_page_val(iommu->cap);
-			}
-
-			if (!mask)
-				break;
-		}
-	}
-	rcu_read_unlock();
-
-	return fls(mask);
-}
-
-static int domain_update_device_node(struct dmar_domain *domain)
-{
-	struct device_domain_info *info;
-	int nid = NUMA_NO_NODE;
-	unsigned long flags;
-
-	spin_lock_irqsave(&domain->lock, flags);
-	list_for_each_entry(info, &domain->devices, link) {
-		/*
-		 * There could possibly be multiple device numa nodes as devices
-		 * within the same domain may sit behind different IOMMUs. There
-		 * isn't perfect answer in such situation, so we select first
-		 * come first served policy.
-		 */
-		nid = dev_to_node(info->dev);
-		if (nid != NUMA_NO_NODE)
-			break;
-	}
-	spin_unlock_irqrestore(&domain->lock, flags);
-
-	return nid;
-}
-
 /* Return the super pagesize bitmap if supported. */
 static unsigned long domain_super_pgsize_bitmap(struct dmar_domain *domain)
 {
···
 		bitmap |= SZ_2M | SZ_1G;
 
 	return bitmap;
-}
-
-/* Some capabilities may be different across iommus */
-void domain_update_iommu_cap(struct dmar_domain *domain)
-{
-	domain_update_iommu_coherency(domain);
-	domain->iommu_superpage = domain_update_iommu_superpage(domain, NULL);
-
-	/*
-	 * If RHSA is missing, we should default to the device numa domain
-	 * as fall back.
-	 */
-	if (domain->nid == NUMA_NO_NODE)
-		domain->nid = domain_update_device_node(domain);
-
-	/*
-	 * First-level translation restricts the input-address to a
-	 * canonical address (i.e., address bits 63:N have the same
-	 * value as address bit [N-1], where N is 48-bits with 4-level
-	 * paging and 57-bits with 5-level paging). Hence, skip bit
-	 * [N-1].
-	 */
-	if (domain->use_first_level)
-		domain->domain.geometry.aperture_end = __DOMAIN_MAX_ADDR(domain->gaw - 1);
-	else
-		domain->domain.geometry.aperture_end = __DOMAIN_MAX_ADDR(domain->gaw);
-
-	domain->domain.pgsize_bitmap |= domain_super_pgsize_bitmap(domain);
 }
 
 struct context_entry *iommu_context_addr(struct intel_iommu *iommu, u8 bus,
···
 	while (1) {
 		offset = pfn_level_offset(pfn, level);
 		pte = &parent[offset];
-		if (!pte || (dma_pte_superpage(pte) || !dma_pte_present(pte))) {
-			pr_info("PTE not present at level %d\n", level);
-			break;
-		}
 
 		pr_info("pte level: %d, pte value: 0x%016llx\n", level, pte->val);
 
-		if (level == 1)
+		if (!dma_pte_present(pte)) {
+			pr_info("page table not present at level %d\n", level - 1);
+			break;
+		}
+
+		if (level == 1 || dma_pte_superpage(pte))
 			break;
 
 		parent = phys_to_virt(dma_pte_addr(pte));
···
 	pr_info("Dump %s table entries for IOVA 0x%llx\n", iommu->name, addr);
 
 	/* root entry dump */
-	rt_entry = &iommu->root_entry[bus];
-	if (!rt_entry) {
-		pr_info("root table entry is not present\n");
+	if (!iommu->root_entry) {
+		pr_info("root table is not present\n");
 		return;
 	}
+	rt_entry = &iommu->root_entry[bus];
 
 	if (sm_supported(iommu))
 		pr_info("scalable mode root entry: hi 0x%016llx, low 0x%016llx\n",
···
 	/* context entry dump */
 	ctx_entry = iommu_context_addr(iommu, bus, devfn, 0);
 	if (!ctx_entry) {
-		pr_info("context table entry is not present\n");
+		pr_info("context table is not present\n");
 		return;
 	}
 
···
 
 	/* legacy mode does not require PASID entries */
 	if (!sm_supported(iommu)) {
+		if (!context_present(ctx_entry)) {
+			pr_info("legacy mode page table is not present\n");
+			return;
+		}
 		level = agaw_to_level(ctx_entry->hi & 7);
 		pgtable = phys_to_virt(ctx_entry->lo & VTD_PAGE_MASK);
 		goto pgtable_walk;
 	}
 
-	/* get the pointer to pasid directory entry */
-	dir = phys_to_virt(ctx_entry->lo & VTD_PAGE_MASK);
-	if (!dir) {
-		pr_info("pasid directory entry is not present\n");
+	if (!context_present(ctx_entry)) {
+		pr_info("pasid directory table is not present\n");
 		return;
 	}
+
+	/* get the pointer to pasid directory entry */
+	dir = phys_to_virt(ctx_entry->lo & VTD_PAGE_MASK);
+
 	/* For request-without-pasid, get the pasid from context entry */
 	if (intel_iommu_sm && pasid == IOMMU_PASID_INVALID)
 		pasid = IOMMU_NO_PASID;
···
 	/* get the pointer to the pasid table entry */
 	entries = get_pasid_table_from_pde(pde);
 	if (!entries) {
-		pr_info("pasid table entry is not present\n");
+		pr_info("pasid table is not present\n");
 		return;
 	}
 	index = pasid & PASID_PTE_MASK;
 	pte = &entries[index];
 	for (i = 0; i < ARRAY_SIZE(pte->val); i++)
 		pr_info("pasid table entry[%d]: 0x%016llx\n", i, pte->val[i]);
+
+	if (!pasid_pte_is_present(pte)) {
+		pr_info("scalable mode page table is not present\n");
+		return;
+	}
 
 	if (pasid_pte_get_pgtt(pte) == PASID_ENTRY_PGTT_FL_ONLY) {
 		level = pte->val[2] & BIT_ULL(2) ? 5 : 4;
···
 	/* free context mapping */
 	free_context_table(iommu);
 
-#ifdef CONFIG_INTEL_IOMMU_SVM
-	if (pasid_supported(iommu)) {
-		if (ecap_prs(iommu->ecap))
-			intel_svm_finish_prq(iommu);
-	}
-#endif
+	if (ecap_prs(iommu->ecap))
+		intel_iommu_finish_prq(iommu);
 }
 
 /*
  * Check and return whether first level is used by default for
  * DMA translation.
  */
-static bool first_level_by_default(unsigned int type)
+static bool first_level_by_default(struct intel_iommu *iommu)
 {
 	/* Only SL is available in legacy mode */
-	if (!scalable_mode_support())
+	if (!sm_supported(iommu))
 		return false;
 
 	/* Only level (either FL or SL) is available, just use it */
-	if (intel_cap_flts_sanity() ^ intel_cap_slts_sanity())
-		return intel_cap_flts_sanity();
+	if (ecap_flts(iommu->ecap) ^ ecap_slts(iommu->ecap))
+		return ecap_flts(iommu->ecap);
 
-	/* Both levels are available, decide it based on domain type */
-	return type != IOMMU_DOMAIN_UNMANAGED;
-}
-
-static struct dmar_domain *alloc_domain(unsigned int type)
-{
-	struct dmar_domain *domain;
-
-	domain = kzalloc(sizeof(*domain), GFP_KERNEL);
-	if (!domain)
-		return NULL;
-
-	domain->nid = NUMA_NO_NODE;
-	if (first_level_by_default(type))
-		domain->use_first_level = true;
-	INIT_LIST_HEAD(&domain->devices);
-	INIT_LIST_HEAD(&domain->dev_pasids);
-	INIT_LIST_HEAD(&domain->cache_tags);
-	spin_lock_init(&domain->lock);
-	spin_lock_init(&domain->cache_lock);
-	xa_init(&domain->iommu_array);
-
-	return domain;
+	return true;
 }
 
 int domain_attach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu)
···
 		ret = xa_err(curr) ? : -EBUSY;
 		goto err_clear;
 	}
-	domain_update_iommu_cap(domain);
 
 	spin_unlock(&iommu->lock);
 	return 0;
···
 		clear_bit(info->did, iommu->domain_ids);
 		xa_erase(&domain->iommu_array, iommu->seq_id);
 		domain->nid = NUMA_NO_NODE;
-		domain_update_iommu_cap(domain);
 		kfree(info);
 	}
 	spin_unlock(&iommu->lock);
-}
-
-static int guestwidth_to_adjustwidth(int gaw)
-{
-	int agaw;
-	int r = (gaw - 12) % 9;
-
-	if (r == 0)
-		agaw = gaw;
-	else
-		agaw = gaw + 9 - r;
-	if (agaw > 64)
-		agaw = 64;
-	return agaw;
 }
 
 static void domain_exit(struct dmar_domain *domain)
···
 
 	if (did_old < cap_ndoms(iommu->cap)) {
 		iommu->flush.flush_context(iommu, did_old,
-					   (((u16)bus) << 8) | devfn,
+					   PCI_DEVID(bus, devfn),
 					   DMA_CCMD_MASK_NOBIT,
 					   DMA_CCMD_DEVICE_INVL);
 		iommu->flush.flush_iotlb(iommu, did_old, 0, 0,
···
 {
 	if (cap_caching_mode(iommu->cap)) {
 		iommu->flush.flush_context(iommu, 0,
-					   (((u16)bus) << 8) | devfn,
+					   PCI_DEVID(bus, devfn),
 					   DMA_CCMD_MASK_NOBIT,
 					   DMA_CCMD_DEVICE_INVL);
 		iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
···
 	int translation = CONTEXT_TT_MULTI_LEVEL;
 	struct dma_pte *pgd = domain->pgd;
 	struct context_entry *context;
-	int agaw, ret;
+	int ret;
 
 	pr_debug("Set context mapping for %02x:%02x.%d\n",
 		 bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
···
 
 	copied_context_tear_down(iommu, context, bus, devfn);
 	context_clear_entry(context);
-
 	context_set_domain_id(context, did);
-
-	/*
-	 * Skip top levels of page tables for iommu which has
-	 * less agaw than default. Unnecessary for PT mode.
-	 */
-	for (agaw = domain->agaw; agaw > iommu->agaw; agaw--) {
-		ret = -ENOMEM;
-		pgd = phys_to_virt(dma_pte_addr(pgd));
-		if (!dma_pte_present(pgd))
-			goto out_unlock;
-	}
 
 	if (info && info->ats_supported)
 		translation = CONTEXT_TT_DEV_IOTLB;
···
 		translation = CONTEXT_TT_MULTI_LEVEL;
 
 	context_set_address_root(context, virt_to_phys(pgd));
-	context_set_address_width(context, agaw);
+	context_set_address_width(context, domain->agaw);
 	context_set_translation_type(context, translation);
 	context_set_fault_enable(context);
 	context_set_present(context);
···
 	intel_context_flush_present(info, context, did, true);
 }
 
+int __domain_setup_first_level(struct intel_iommu *iommu,
+			       struct device *dev, ioasid_t pasid,
+			       u16 did, pgd_t *pgd, int flags,
+			       struct iommu_domain *old)
+{
+	if (!old)
+		return intel_pasid_setup_first_level(iommu, dev, pgd,
+						     pasid, did, flags);
+	return intel_pasid_replace_first_level(iommu, dev, pgd, pasid, did,
+					       iommu_domain_did(old, iommu),
+					       flags);
+}
+
+static int domain_setup_second_level(struct intel_iommu *iommu,
+				     struct dmar_domain *domain,
+				     struct device *dev, ioasid_t pasid,
+				     struct iommu_domain *old)
+{
+	if (!old)
+		return intel_pasid_setup_second_level(iommu, domain,
+						      dev, pasid);
+	return intel_pasid_replace_second_level(iommu, domain, dev,
+						iommu_domain_did(old, iommu),
+						pasid);
+}
+
+static int domain_setup_passthrough(struct intel_iommu *iommu,
+				    struct device *dev, ioasid_t pasid,
+				    struct iommu_domain *old)
+{
+	if (!old)
+		return intel_pasid_setup_pass_through(iommu, dev, pasid);
+	return intel_pasid_replace_pass_through(iommu, dev,
+						iommu_domain_did(old, iommu),
+						pasid);
+}
+ 1755 1945 static int domain_setup_first_level(struct intel_iommu *iommu, 1756 1946 struct dmar_domain *domain, 1757 1947 struct device *dev, 1758 - u32 pasid) 1948 + u32 pasid, struct iommu_domain *old) 1759 1949 { 1760 1950 struct dma_pte *pgd = domain->pgd; 1761 - int agaw, level; 1762 - int flags = 0; 1951 + int level, flags = 0; 1763 1952 1764 - /* 1765 - * Skip top levels of page tables for iommu which has 1766 - * less agaw than default. Unnecessary for PT mode. 1767 - */ 1768 - for (agaw = domain->agaw; agaw > iommu->agaw; agaw--) { 1769 - pgd = phys_to_virt(dma_pte_addr(pgd)); 1770 - if (!dma_pte_present(pgd)) 1771 - return -ENOMEM; 1772 - } 1773 - 1774 - level = agaw_to_level(agaw); 1953 + level = agaw_to_level(domain->agaw); 1775 1954 if (level != 4 && level != 5) 1776 1955 return -EINVAL; 1777 1956 ··· 1807 1934 if (domain->force_snooping) 1808 1935 flags |= PASID_FLAG_PAGE_SNOOP; 1809 1936 1810 - return intel_pasid_setup_first_level(iommu, dev, (pgd_t *)pgd, pasid, 1811 - domain_id_iommu(domain, iommu), 1812 - flags); 1813 - } 1814 - 1815 - static bool dev_is_real_dma_subdevice(struct device *dev) 1816 - { 1817 - return dev && dev_is_pci(dev) && 1818 - pci_real_dma_dev(to_pci_dev(dev)) != to_pci_dev(dev); 1937 + return __domain_setup_first_level(iommu, dev, pasid, 1938 + domain_id_iommu(domain, iommu), 1939 + (pgd_t *)pgd, flags, old); 1819 1940 } 1820 1941 1821 1942 static int dmar_domain_attach_device(struct dmar_domain *domain, ··· 1835 1968 if (!sm_supported(iommu)) 1836 1969 ret = domain_context_mapping(domain, dev); 1837 1970 else if (domain->use_first_level) 1838 - ret = domain_setup_first_level(iommu, domain, dev, IOMMU_NO_PASID); 1971 + ret = domain_setup_first_level(iommu, domain, dev, 1972 + IOMMU_NO_PASID, NULL); 1839 1973 else 1840 - ret = intel_pasid_setup_second_level(iommu, domain, dev, IOMMU_NO_PASID); 1974 + ret = domain_setup_second_level(iommu, domain, dev, 1975 + IOMMU_NO_PASID, NULL); 1841 1976 1842 1977 if (ret) 1843 1978 goto 
out_block_translation; ··· 2223 2354 2224 2355 iommu_flush_write_buffer(iommu); 2225 2356 2226 - #ifdef CONFIG_INTEL_IOMMU_SVM 2227 - if (pasid_supported(iommu) && ecap_prs(iommu->ecap)) { 2357 + if (ecap_prs(iommu->ecap)) { 2228 2358 /* 2229 2359 * Call dmar_alloc_hwirq() with dmar_global_lock held, 2230 2360 * could cause possible lock race condition. 2231 2361 */ 2232 2362 up_write(&dmar_global_lock); 2233 - ret = intel_svm_enable_prq(iommu); 2363 + ret = intel_iommu_enable_prq(iommu); 2234 2364 down_write(&dmar_global_lock); 2235 2365 if (ret) 2236 2366 goto free_iommu; 2237 2367 } 2238 - #endif 2368 + 2239 2369 ret = dmar_set_interrupt(iommu); 2240 2370 if (ret) 2241 2371 goto free_iommu; ··· 2614 2746 2615 2747 static int intel_iommu_add(struct dmar_drhd_unit *dmaru) 2616 2748 { 2617 - int sp, ret; 2618 2749 struct intel_iommu *iommu = dmaru->iommu; 2750 + int ret; 2619 2751 2620 2752 ret = intel_cap_audit(CAP_AUDIT_HOTPLUG_DMAR, iommu); 2621 2753 if (ret) 2622 2754 goto out; 2623 - 2624 - sp = domain_update_iommu_superpage(NULL, iommu) - 1; 2625 - if (sp >= 0 && !(cap_super_page_val(iommu->cap) & (1 << sp))) { 2626 - pr_warn("%s: Doesn't support large page.\n", 2627 - iommu->name); 2628 - return -ENXIO; 2629 - } 2630 2755 2631 2756 /* 2632 2757 * Disable translation if already enabled prior to OS handover. ··· 2647 2786 intel_iommu_init_qi(iommu); 2648 2787 iommu_flush_write_buffer(iommu); 2649 2788 2650 - #ifdef CONFIG_INTEL_IOMMU_SVM 2651 - if (pasid_supported(iommu) && ecap_prs(iommu->ecap)) { 2652 - ret = intel_svm_enable_prq(iommu); 2789 + if (ecap_prs(iommu->ecap)) { 2790 + ret = intel_iommu_enable_prq(iommu); 2653 2791 if (ret) 2654 2792 goto disable_iommu; 2655 2793 } 2656 - #endif 2794 + 2657 2795 ret = dmar_set_interrupt(iommu); 2658 2796 if (ret) 2659 2797 goto disable_iommu; ··· 3148 3288 * the virtual and physical IOMMU page-tables. 
3149 3289 */ 3150 3290 if (cap_caching_mode(iommu->cap) && 3151 - !first_level_by_default(IOMMU_DOMAIN_DMA)) { 3291 + !first_level_by_default(iommu)) { 3152 3292 pr_info_once("IOMMU batching disallowed due to virtualization\n"); 3153 3293 iommu_set_dma_strict(); 3154 3294 } ··· 3241 3381 info->domain = NULL; 3242 3382 } 3243 3383 3244 - static int md_domain_init(struct dmar_domain *domain, int guest_width) 3245 - { 3246 - int adjust_width; 3247 - 3248 - /* calculate AGAW */ 3249 - domain->gaw = guest_width; 3250 - adjust_width = guestwidth_to_adjustwidth(guest_width); 3251 - domain->agaw = width_to_agaw(adjust_width); 3252 - 3253 - domain->iommu_coherency = false; 3254 - domain->iommu_superpage = 0; 3255 - domain->max_addr = 0; 3256 - 3257 - /* always allocate the top pgd */ 3258 - domain->pgd = iommu_alloc_page_node(domain->nid, GFP_ATOMIC); 3259 - if (!domain->pgd) 3260 - return -ENOMEM; 3261 - domain_flush_cache(domain, domain->pgd, PAGE_SIZE); 3262 - return 0; 3263 - } 3264 - 3265 3384 static int blocking_domain_attach_dev(struct iommu_domain *domain, 3266 3385 struct device *dev) 3267 3386 { ··· 3327 3488 return domain; 3328 3489 } 3329 3490 3330 - static struct iommu_domain *intel_iommu_domain_alloc(unsigned type) 3331 - { 3332 - struct dmar_domain *dmar_domain; 3333 - struct iommu_domain *domain; 3334 - 3335 - switch (type) { 3336 - case IOMMU_DOMAIN_DMA: 3337 - case IOMMU_DOMAIN_UNMANAGED: 3338 - dmar_domain = alloc_domain(type); 3339 - if (!dmar_domain) { 3340 - pr_err("Can't allocate dmar_domain\n"); 3341 - return NULL; 3342 - } 3343 - if (md_domain_init(dmar_domain, DEFAULT_DOMAIN_ADDRESS_WIDTH)) { 3344 - pr_err("Domain initialization failed\n"); 3345 - domain_exit(dmar_domain); 3346 - return NULL; 3347 - } 3348 - 3349 - domain = &dmar_domain->domain; 3350 - domain->geometry.aperture_start = 0; 3351 - domain->geometry.aperture_end = 3352 - __DOMAIN_MAX_ADDR(dmar_domain->gaw); 3353 - domain->geometry.force_aperture = true; 3354 - 3355 - return domain; 
3356 - default: 3357 - return NULL; 3358 - } 3359 - 3360 - return NULL; 3361 - } 3362 - 3363 3491 static struct iommu_domain * 3364 3492 intel_iommu_domain_alloc_user(struct device *dev, u32 flags, 3365 3493 struct iommu_domain *parent, ··· 3338 3532 struct intel_iommu *iommu = info->iommu; 3339 3533 struct dmar_domain *dmar_domain; 3340 3534 struct iommu_domain *domain; 3535 + bool first_stage; 3341 3536 3342 3537 /* Must be NESTING domain */ 3343 3538 if (parent) { ··· 3348 3541 } 3349 3542 3350 3543 if (flags & 3351 - (~(IOMMU_HWPT_ALLOC_NEST_PARENT | IOMMU_HWPT_ALLOC_DIRTY_TRACKING))) 3544 + (~(IOMMU_HWPT_ALLOC_NEST_PARENT | IOMMU_HWPT_ALLOC_DIRTY_TRACKING 3545 + | IOMMU_HWPT_FAULT_ID_VALID))) 3352 3546 return ERR_PTR(-EOPNOTSUPP); 3353 3547 if (nested_parent && !nested_supported(iommu)) 3354 3548 return ERR_PTR(-EOPNOTSUPP); 3355 3549 if (user_data || (dirty_tracking && !ssads_supported(iommu))) 3356 3550 return ERR_PTR(-EOPNOTSUPP); 3357 3551 3358 - /* Do not use first stage for user domain translation. */ 3359 - dmar_domain = paging_domain_alloc(dev, false); 3552 + /* 3553 + * Always allocate the guest compatible page table unless 3554 + * IOMMU_HWPT_ALLOC_NEST_PARENT or IOMMU_HWPT_ALLOC_DIRTY_TRACKING 3555 + * is specified. 
3556 + */ 3557 + if (nested_parent || dirty_tracking) { 3558 + if (!sm_supported(iommu) || !ecap_slts(iommu->ecap)) 3559 + return ERR_PTR(-EOPNOTSUPP); 3560 + first_stage = false; 3561 + } else { 3562 + first_stage = first_level_by_default(iommu); 3563 + } 3564 + 3565 + dmar_domain = paging_domain_alloc(dev, first_stage); 3360 3566 if (IS_ERR(dmar_domain)) 3361 3567 return ERR_CAST(dmar_domain); 3362 3568 domain = &dmar_domain->domain; ··· 3403 3583 domain_exit(dmar_domain); 3404 3584 } 3405 3585 3406 - int prepare_domain_attach_device(struct iommu_domain *domain, 3407 - struct device *dev) 3586 + int paging_domain_compatible(struct iommu_domain *domain, struct device *dev) 3408 3587 { 3409 3588 struct device_domain_info *info = dev_iommu_priv_get(dev); 3410 3589 struct dmar_domain *dmar_domain = to_dmar_domain(domain); 3411 3590 struct intel_iommu *iommu = info->iommu; 3412 3591 int addr_width; 3592 + 3593 + if (WARN_ON_ONCE(!(domain->type & __IOMMU_DOMAIN_PAGING))) 3594 + return -EPERM; 3413 3595 3414 3596 if (dmar_domain->force_snooping && !ecap_sc_support(iommu->ecap)) 3415 3597 return -EINVAL; ··· 3419 3597 if (domain->dirty_ops && !ssads_supported(iommu)) 3420 3598 return -EINVAL; 3421 3599 3600 + if (dmar_domain->iommu_coherency != 3601 + iommu_paging_structure_coherency(iommu)) 3602 + return -EINVAL; 3603 + 3604 + if (dmar_domain->iommu_superpage != 3605 + iommu_superpage_capability(iommu, dmar_domain->use_first_level)) 3606 + return -EINVAL; 3607 + 3608 + if (dmar_domain->use_first_level && 3609 + (!sm_supported(iommu) || !ecap_flts(iommu->ecap))) 3610 + return -EINVAL; 3611 + 3422 3612 /* check if this iommu agaw is sufficient for max mapped address */ 3423 3613 addr_width = agaw_to_width(iommu->agaw); 3424 3614 if (addr_width > cap_mgaw(iommu->cap)) 3425 3615 addr_width = cap_mgaw(iommu->cap); 3426 3616 3427 - if (dmar_domain->max_addr > (1LL << addr_width)) 3617 + if (dmar_domain->gaw > addr_width || dmar_domain->agaw > iommu->agaw) 3428 3618 return 
-EINVAL; 3429 - dmar_domain->gaw = addr_width; 3430 - 3431 - /* 3432 - * Knock out extra levels of page tables if necessary 3433 - */ 3434 - while (iommu->agaw < dmar_domain->agaw) { 3435 - struct dma_pte *pte; 3436 - 3437 - pte = dmar_domain->pgd; 3438 - if (dma_pte_present(pte)) { 3439 - dmar_domain->pgd = phys_to_virt(dma_pte_addr(pte)); 3440 - iommu_free_page(pte); 3441 - } 3442 - dmar_domain->agaw--; 3443 - } 3444 3619 3445 3620 if (sm_supported(iommu) && !dev_is_real_dma_subdevice(dev) && 3446 3621 context_copied(iommu, info->bus, info->devfn)) ··· 3453 3634 3454 3635 device_block_translation(dev); 3455 3636 3456 - ret = prepare_domain_attach_device(domain, dev); 3637 + ret = paging_domain_compatible(domain, dev); 3457 3638 if (ret) 3458 3639 return ret; 3459 3640 ··· 4071 4252 return 0; 4072 4253 } 4073 4254 4074 - static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid, 4075 - struct iommu_domain *domain) 4255 + void domain_remove_dev_pasid(struct iommu_domain *domain, 4256 + struct device *dev, ioasid_t pasid) 4076 4257 { 4077 4258 struct device_domain_info *info = dev_iommu_priv_get(dev); 4078 4259 struct dev_pasid_info *curr, *dev_pasid = NULL; ··· 4080 4261 struct dmar_domain *dmar_domain; 4081 4262 unsigned long flags; 4082 4263 4083 - if (domain->type == IOMMU_DOMAIN_IDENTITY) { 4084 - intel_pasid_tear_down_entry(iommu, dev, pasid, false); 4264 + if (!domain) 4085 4265 return; 4086 - } 4266 + 4267 + /* Identity domain has no meta data for pasid. 
*/ 4268 + if (domain->type == IOMMU_DOMAIN_IDENTITY) 4269 + return; 4087 4270 4088 4271 dmar_domain = to_dmar_domain(domain); 4089 4272 spin_lock_irqsave(&dmar_domain->lock, flags); ··· 4103 4282 domain_detach_iommu(dmar_domain, iommu); 4104 4283 intel_iommu_debugfs_remove_dev_pasid(dev_pasid); 4105 4284 kfree(dev_pasid); 4106 - intel_pasid_tear_down_entry(iommu, dev, pasid, false); 4107 - intel_drain_pasid_prq(dev, pasid); 4108 4285 } 4109 4286 4110 - static int intel_iommu_set_dev_pasid(struct iommu_domain *domain, 4111 - struct device *dev, ioasid_t pasid) 4287 + static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid, 4288 + struct iommu_domain *domain) 4289 + { 4290 + struct device_domain_info *info = dev_iommu_priv_get(dev); 4291 + 4292 + intel_pasid_tear_down_entry(info->iommu, dev, pasid, false); 4293 + domain_remove_dev_pasid(domain, dev, pasid); 4294 + } 4295 + 4296 + struct dev_pasid_info * 4297 + domain_add_dev_pasid(struct iommu_domain *domain, 4298 + struct device *dev, ioasid_t pasid) 4112 4299 { 4113 4300 struct device_domain_info *info = dev_iommu_priv_get(dev); 4114 4301 struct dmar_domain *dmar_domain = to_dmar_domain(domain); ··· 4124 4295 struct dev_pasid_info *dev_pasid; 4125 4296 unsigned long flags; 4126 4297 int ret; 4298 + 4299 + dev_pasid = kzalloc(sizeof(*dev_pasid), GFP_KERNEL); 4300 + if (!dev_pasid) 4301 + return ERR_PTR(-ENOMEM); 4302 + 4303 + ret = domain_attach_iommu(dmar_domain, iommu); 4304 + if (ret) 4305 + goto out_free; 4306 + 4307 + ret = cache_tag_assign_domain(dmar_domain, dev, pasid); 4308 + if (ret) 4309 + goto out_detach_iommu; 4310 + 4311 + dev_pasid->dev = dev; 4312 + dev_pasid->pasid = pasid; 4313 + spin_lock_irqsave(&dmar_domain->lock, flags); 4314 + list_add(&dev_pasid->link_domain, &dmar_domain->dev_pasids); 4315 + spin_unlock_irqrestore(&dmar_domain->lock, flags); 4316 + 4317 + return dev_pasid; 4318 + out_detach_iommu: 4319 + domain_detach_iommu(dmar_domain, iommu); 4320 + out_free: 4321 + 
kfree(dev_pasid); 4322 + return ERR_PTR(ret); 4323 + } 4324 + 4325 + static int intel_iommu_set_dev_pasid(struct iommu_domain *domain, 4326 + struct device *dev, ioasid_t pasid, 4327 + struct iommu_domain *old) 4328 + { 4329 + struct device_domain_info *info = dev_iommu_priv_get(dev); 4330 + struct dmar_domain *dmar_domain = to_dmar_domain(domain); 4331 + struct intel_iommu *iommu = info->iommu; 4332 + struct dev_pasid_info *dev_pasid; 4333 + int ret; 4334 + 4335 + if (WARN_ON_ONCE(!(domain->type & __IOMMU_DOMAIN_PAGING))) 4336 + return -EINVAL; 4127 4337 4128 4338 if (!pasid_supported(iommu) || dev_is_real_dma_subdevice(dev)) 4129 4339 return -EOPNOTSUPP; ··· 4173 4305 if (context_copied(iommu, info->bus, info->devfn)) 4174 4306 return -EBUSY; 4175 4307 4176 - ret = prepare_domain_attach_device(domain, dev); 4308 + ret = paging_domain_compatible(domain, dev); 4177 4309 if (ret) 4178 4310 return ret; 4179 4311 4180 - dev_pasid = kzalloc(sizeof(*dev_pasid), GFP_KERNEL); 4181 - if (!dev_pasid) 4182 - return -ENOMEM; 4183 - 4184 - ret = domain_attach_iommu(dmar_domain, iommu); 4185 - if (ret) 4186 - goto out_free; 4187 - 4188 - ret = cache_tag_assign_domain(dmar_domain, dev, pasid); 4189 - if (ret) 4190 - goto out_detach_iommu; 4312 + dev_pasid = domain_add_dev_pasid(domain, dev, pasid); 4313 + if (IS_ERR(dev_pasid)) 4314 + return PTR_ERR(dev_pasid); 4191 4315 4192 4316 if (dmar_domain->use_first_level) 4193 4317 ret = domain_setup_first_level(iommu, dmar_domain, 4194 - dev, pasid); 4318 + dev, pasid, old); 4195 4319 else 4196 - ret = intel_pasid_setup_second_level(iommu, dmar_domain, 4197 - dev, pasid); 4320 + ret = domain_setup_second_level(iommu, dmar_domain, 4321 + dev, pasid, old); 4198 4322 if (ret) 4199 - goto out_unassign_tag; 4323 + goto out_remove_dev_pasid; 4200 4324 4201 - dev_pasid->dev = dev; 4202 - dev_pasid->pasid = pasid; 4203 - spin_lock_irqsave(&dmar_domain->lock, flags); 4204 - list_add(&dev_pasid->link_domain, &dmar_domain->dev_pasids); 4205 - 
spin_unlock_irqrestore(&dmar_domain->lock, flags); 4325 + domain_remove_dev_pasid(old, dev, pasid); 4206 4326 4207 - if (domain->type & __IOMMU_DOMAIN_PAGING) 4208 - intel_iommu_debugfs_create_dev_pasid(dev_pasid); 4327 + intel_iommu_debugfs_create_dev_pasid(dev_pasid); 4209 4328 4210 4329 return 0; 4211 - out_unassign_tag: 4212 - cache_tag_unassign_domain(dmar_domain, dev, pasid); 4213 - out_detach_iommu: 4214 - domain_detach_iommu(dmar_domain, iommu); 4215 - out_free: 4216 - kfree(dev_pasid); 4330 + 4331 + out_remove_dev_pasid: 4332 + domain_remove_dev_pasid(domain, dev, pasid); 4217 4333 return ret; 4218 4334 } 4219 4335 ··· 4425 4573 } 4426 4574 4427 4575 static int identity_domain_set_dev_pasid(struct iommu_domain *domain, 4428 - struct device *dev, ioasid_t pasid) 4576 + struct device *dev, ioasid_t pasid, 4577 + struct iommu_domain *old) 4429 4578 { 4430 4579 struct device_domain_info *info = dev_iommu_priv_get(dev); 4431 4580 struct intel_iommu *iommu = info->iommu; 4581 + int ret; 4432 4582 4433 4583 if (!pasid_supported(iommu) || dev_is_real_dma_subdevice(dev)) 4434 4584 return -EOPNOTSUPP; 4435 4585 4436 - return intel_pasid_setup_pass_through(iommu, dev, pasid); 4586 + ret = domain_setup_passthrough(iommu, dev, pasid, old); 4587 + if (ret) 4588 + return ret; 4589 + 4590 + domain_remove_dev_pasid(old, dev, pasid); 4591 + return 0; 4437 4592 } 4438 4593 4439 4594 static struct iommu_domain identity_domain = { ··· 4451 4592 }, 4452 4593 }; 4453 4594 4595 + static struct iommu_domain *intel_iommu_domain_alloc_paging(struct device *dev) 4596 + { 4597 + struct device_domain_info *info = dev_iommu_priv_get(dev); 4598 + struct intel_iommu *iommu = info->iommu; 4599 + struct dmar_domain *dmar_domain; 4600 + bool first_stage; 4601 + 4602 + first_stage = first_level_by_default(iommu); 4603 + dmar_domain = paging_domain_alloc(dev, first_stage); 4604 + if (IS_ERR(dmar_domain)) 4605 + return ERR_CAST(dmar_domain); 4606 + 4607 + return &dmar_domain->domain; 4608 + } 
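The reworked intel_iommu_set_dev_pasid() follows a fixed order: create the per-PASID bookkeeping for the new domain, program or replace the PASID table entry, and only after that succeeds drop the old domain's bookkeeping. On failure only the new domain's bookkeeping is unwound, so the device keeps translating through the old domain. The user-space model below is a sketch of that ordering only; every type and helper in it (struct toy_domain, struct toy_pasid_slot, add_meta(), remove_meta(), program_entry()) is hypothetical and merely stands in for domain_add_dev_pasid(), the domain_setup_*() helpers and domain_remove_dev_pasid().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a domain and its per-PASID bookkeeping. */
struct toy_domain {
	const char *name;
	int attached;      /* number of PASIDs this domain currently tracks */
	bool fail_program; /* force the "hardware" programming step to fail */
};

/* Hypothetical PASID table slot: which domain the hardware is using. */
struct toy_pasid_slot {
	struct toy_domain *active;
};

static void add_meta(struct toy_domain *d)
{
	d->attached++;
}

/* Like domain_remove_dev_pasid() in the diff, a NULL domain is a no-op. */
static void remove_meta(struct toy_domain *d)
{
	if (d)
		d->attached--;
}

/* Setup when old == NULL, replace otherwise; both just switch the slot. */
static int program_entry(struct toy_pasid_slot *slot,
			 struct toy_domain *new_dom, struct toy_domain *old)
{
	if (new_dom->fail_program)
		return -1;
	if (old && slot->active != old)
		return -2; /* a replace expects the old domain to be installed */
	slot->active = new_dom;
	return 0;
}

/* Replace-then-release: the old domain stays usable until the swap works. */
static int toy_set_dev_pasid(struct toy_pasid_slot *slot,
			     struct toy_domain *new_dom, struct toy_domain *old)
{
	int ret;

	add_meta(new_dom);
	ret = program_entry(slot, new_dom, old);
	if (ret) {
		remove_meta(new_dom); /* roll back only the new domain */
		return ret;
	}
	remove_meta(old);             /* release the old domain last */
	return 0;
}
```

A failed replace leaves the slot pointing at the old domain with its bookkeeping intact, which is why the diff calls domain_remove_dev_pasid(old, ...) only on the success path and domain_remove_dev_pasid(domain, ...) on the error path.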
```diff
+
 const struct iommu_ops intel_iommu_ops = {
 	.blocked_domain		= &blocking_domain,
 	.release_domain		= &blocking_domain,
 	.identity_domain	= &identity_domain,
 	.capable		= intel_iommu_capable,
 	.hw_info		= intel_iommu_hw_info,
-	.domain_alloc		= intel_iommu_domain_alloc,
 	.domain_alloc_user	= intel_iommu_domain_alloc_user,
 	.domain_alloc_sva	= intel_svm_domain_alloc,
+	.domain_alloc_paging	= intel_iommu_domain_alloc_paging,
 	.probe_device		= intel_iommu_probe_device,
 	.release_device		= intel_iommu_release_device,
 	.get_resv_regions	= intel_iommu_get_resv_regions,
···
 	.def_domain_type	= device_def_domain_type,
 	.remove_dev_pasid	= intel_iommu_remove_dev_pasid,
 	.pgsize_bitmap		= SZ_4K,
-#ifdef CONFIG_INTEL_IOMMU_SVM
-	.page_response		= intel_svm_page_response,
-#endif
+	.page_response		= intel_iommu_page_response,
 	.default_domain_ops = &(const struct iommu_domain_ops) {
 		.attach_dev		= intel_iommu_attach_device,
 		.set_dev_pasid		= intel_iommu_set_dev_pasid,
```
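With intel_iommu_domain_alloc_paging() and the default path of intel_iommu_domain_alloc_user() both calling first_level_by_default(iommu), the choice between first-level and second-level page tables now depends only on the IOMMU's capabilities, not on the domain type. A minimal restatement of that decision, assuming a hypothetical struct toy_iommu whose booleans stand in for sm_supported(), ecap_flts() and ecap_slts():

```c
#include <stdbool.h>

/* Hypothetical capability summary; the driver reads these from the IOMMU. */
struct toy_iommu {
	bool scalable_mode; /* sm_supported(iommu) */
	bool flts;          /* ecap_flts(iommu->ecap): first level supported */
	bool slts;          /* ecap_slts(iommu->ecap): second level supported */
};

/* Same decision order as the new first_level_by_default(iommu). */
static bool toy_first_level_by_default(const struct toy_iommu *iommu)
{
	/* Legacy (non-scalable) mode only has second-level tables. */
	if (!iommu->scalable_mode)
		return false;

	/* Exactly one level implemented: use that one. */
	if (iommu->flts ^ iommu->slts)
		return iommu->flts;

	/* Both levels available (or neither advertised): prefer first level. */
	return true;
}
```

As the intel_iommu_domain_alloc_user() hunk shows, nest-parent and dirty-tracking domains skip this helper and are always forced to second level.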
+42 -14
drivers/iommu/intel/iommu.h
```diff
···
 #include <linux/bitfield.h>
 #include <linux/xarray.h>
 #include <linux/perf_event.h>
+#include <linux/pci.h>
 
 #include <asm/cacheflush.h>
 #include <asm/iommu.h>
···
 		struct {
 			/* parent page table which the user domain is nested on */
 			struct dmar_domain *s2_domain;
-			/* user page table pointer (in GPA) */
-			unsigned long s1_pgtbl;
 			/* page table attributes */
 			struct iommu_hwpt_vtd_s1 s1_cfg;
 			/* link to parent domain siblings */
···
 	int		msagaw; /* max sagaw of this iommu */
 	unsigned int	irq, pr_irq, perf_irq;
 	u16		segment;     /* PCI segment# */
-	unsigned char 	name[13];    /* Device Name */
+	unsigned char	name[16];    /* Device Name */
 
 #ifdef CONFIG_INTEL_IOMMU
 	unsigned long 	*domain_ids; /* bitmap of domains */
···
 
 	struct iommu_flush flush;
 #endif
-#ifdef CONFIG_INTEL_IOMMU_SVM
 	struct page_req_dsc *prq;
 	unsigned char prq_name[16];    /* Name for PRQ interrupt */
 	unsigned long prq_seq_number;
 	struct completion prq_complete;
-#endif
 	struct iopf_queue *iopf_queue;
 	unsigned char iopfq_name[16];
 	/* Synchronization between fault report and iommu device release. */
···
 	return container_of(dom, struct dmar_domain, domain);
 }
 
+/*
+ * Domain ID reserved for pasid entries programmed for first-level
+ * only and pass-through transfer modes.
+ */
+#define FLPT_DEFAULT_DID		1
+#define NUM_RESERVED_DID		2
+
 /* Retrieve the domain ID which has allocated to the domain */
 static inline u16
 domain_id_iommu(struct dmar_domain *domain, struct intel_iommu *iommu)
···
 		xa_load(&domain->iommu_array, iommu->seq_id);
 
 	return info->did;
+}
+
+static inline u16
+iommu_domain_did(struct iommu_domain *domain, struct intel_iommu *iommu)
+{
+	if (domain->type == IOMMU_DOMAIN_SVA ||
+	    domain->type == IOMMU_DOMAIN_IDENTITY)
+		return FLPT_DEFAULT_DID;
+	return domain_id_iommu(to_dmar_domain(domain), iommu);
+}
+
+static inline bool dev_is_real_dma_subdevice(struct device *dev)
+{
+	return dev && dev_is_pci(dev) &&
+	       pci_real_dma_dev(to_pci_dev(dev)) != to_pci_dev(dev);
 }
 
 /*
···
 int domain_attach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu);
 void domain_detach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu);
 void device_block_translation(struct device *dev);
-int prepare_domain_attach_device(struct iommu_domain *domain,
-				 struct device *dev);
-void domain_update_iommu_cap(struct dmar_domain *domain);
+int paging_domain_compatible(struct iommu_domain *domain, struct device *dev);
+
+struct dev_pasid_info *
+domain_add_dev_pasid(struct iommu_domain *domain,
+		     struct device *dev, ioasid_t pasid);
+void domain_remove_dev_pasid(struct iommu_domain *domain,
+			     struct device *dev, ioasid_t pasid);
+
+int __domain_setup_first_level(struct intel_iommu *iommu,
+			       struct device *dev, ioasid_t pasid,
+			       u16 did, pgd_t *pgd, int flags,
+			       struct iommu_domain *old);
 
 int dmar_ir_support(void);
 
···
 				 struct context_entry *context,
 				 u16 did, bool affect_domains);
 
+int intel_iommu_enable_prq(struct intel_iommu *iommu);
+int intel_iommu_finish_prq(struct intel_iommu *iommu);
+void intel_iommu_page_response(struct device *dev, struct iopf_fault *evt,
+			       struct iommu_page_response *msg);
+void intel_iommu_drain_pasid_prq(struct device *dev, u32 pasid);
+
 #ifdef CONFIG_INTEL_IOMMU_SVM
 void intel_svm_check(struct intel_iommu *iommu);
-int intel_svm_enable_prq(struct intel_iommu *iommu);
-int intel_svm_finish_prq(struct intel_iommu *iommu);
-void intel_svm_page_response(struct device *dev, struct iopf_fault *evt,
-			     struct iommu_page_response *msg);
 struct iommu_domain *intel_svm_domain_alloc(struct device *dev,
 					    struct mm_struct *mm);
-void intel_drain_pasid_prq(struct device *dev, u32 pasid);
 #else
 static inline void intel_svm_check(struct intel_iommu *iommu) {}
-static inline void intel_drain_pasid_prq(struct device *dev, u32 pasid) {}
 static inline struct iommu_domain *intel_svm_domain_alloc(struct device *dev,
 							   struct mm_struct *mm)
 {
```
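iommu_domain_did() gives the replace paths the domain ID that the old PASID entry was programmed with: SVA and identity domains share the reserved FLPT_DEFAULT_DID, while every other domain uses its per-IOMMU ID from domain_id_iommu(). A small model of that lookup, with a hypothetical enum toy_domain_type and a per_iommu_did field standing in for domain_id_iommu():

```c
#include <stdint.h>

#define FLPT_DEFAULT_DID 1 /* value taken from the iommu.h hunk */

/* Hypothetical subset of the domain types the lookup distinguishes. */
enum toy_domain_type { TOY_PAGING, TOY_SVA, TOY_IDENTITY };

struct toy_domain {
	enum toy_domain_type type;
	uint16_t per_iommu_did; /* what domain_id_iommu() would return */
};

static uint16_t toy_iommu_domain_did(const struct toy_domain *d)
{
	/* SVA and identity PASID entries use the reserved domain ID. */
	if (d->type == TOY_SVA || d->type == TOY_IDENTITY)
		return FLPT_DEFAULT_DID;
	return d->per_iommu_did;
}
```

The intel_pasid_replace_*() helpers receive this value as old_did, WARN if it differs from the DID stored in the present entry, and use it for the post-update cache flush.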
+2 -2
drivers/iommu/intel/irq_remapping.c
```diff
···
 
 	for (i = 0; i < MAX_IO_APICS; i++) {
 		if (ir_ioapic[i].iommu && ir_ioapic[i].id == apic) {
-			sid = (ir_ioapic[i].bus << 8) | ir_ioapic[i].devfn;
+			sid = PCI_DEVID(ir_ioapic[i].bus, ir_ioapic[i].devfn);
 			break;
 		}
 	}
···
 
 	for (i = 0; i < MAX_HPET_TBS; i++) {
 		if (ir_hpet[i].iommu && ir_hpet[i].id == id) {
-			sid = (ir_hpet[i].bus << 8) | ir_hpet[i].devfn;
+			sid = PCI_DEVID(ir_hpet[i].bus, ir_hpet[i].devfn);
 			break;
 		}
 	}
```
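PCI_DEVID() packs the bus number into bits 15:8 and the devfn (5-bit slot, 3-bit function) into bits 7:0, which is exactly what the open-coded `(bus << 8) | devfn` expressions computed. The sketch below re-declares the kernel's PCI_DEVFN/PCI_SLOT/PCI_FUNC/PCI_DEVID/PCI_BUS_NUM macros for user space (using uint16_t where the kernel uses u16); toy_source_id() is a hypothetical helper added only for illustration.

```c
#include <stdint.h>

/* User-space copies of the kernel PCI macros (u16 replaced by uint16_t). */
#define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)       (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)       ((devfn) & 0x07)
#define PCI_DEVID(bus, devfn) ((((uint16_t)(bus)) << 8) | (devfn))
#define PCI_BUS_NUM(x)        (((x) >> 8) & 0xff)

/* Hypothetical helper: build a source ID the way the new code does. */
static uint16_t toy_source_id(uint8_t bus, uint8_t slot, uint8_t func)
{
	return PCI_DEVID(bus, PCI_DEVFN(slot, func));
}
```

For bus 0x3a, slot 0x1f, function 2 the devfn is 0xfa and the source ID is 0x3afa; PCI_BUS_NUM(), PCI_SLOT() and PCI_FUNC() recover the three fields.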
+51 -2
drivers/iommu/intel/nested.c
```diff
···
 	 * The s2_domain will be used in nested translation, hence needs
 	 * to ensure the s2_domain is compatible with this IOMMU.
 	 */
-	ret = prepare_domain_attach_device(&dmar_domain->s2_domain->domain, dev);
+	ret = paging_domain_compatible(&dmar_domain->s2_domain->domain, dev);
 	if (ret) {
 		dev_err_ratelimited(dev, "s2 domain is not compatible\n");
 		return ret;
···
 	return ret;
 }
 
+static int domain_setup_nested(struct intel_iommu *iommu,
+			       struct dmar_domain *domain,
+			       struct device *dev, ioasid_t pasid,
+			       struct iommu_domain *old)
+{
+	if (!old)
+		return intel_pasid_setup_nested(iommu, dev, pasid, domain);
+	return intel_pasid_replace_nested(iommu, dev, pasid,
+					  iommu_domain_did(old, iommu),
+					  domain);
+}
+
+static int intel_nested_set_dev_pasid(struct iommu_domain *domain,
+				      struct device *dev, ioasid_t pasid,
+				      struct iommu_domain *old)
+{
+	struct device_domain_info *info = dev_iommu_priv_get(dev);
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	struct intel_iommu *iommu = info->iommu;
+	struct dev_pasid_info *dev_pasid;
+	int ret;
+
+	if (!pasid_supported(iommu) || dev_is_real_dma_subdevice(dev))
+		return -EOPNOTSUPP;
+
+	if (context_copied(iommu, info->bus, info->devfn))
+		return -EBUSY;
+
+	ret = paging_domain_compatible(&dmar_domain->s2_domain->domain, dev);
+	if (ret)
+		return ret;
+
+	dev_pasid = domain_add_dev_pasid(domain, dev, pasid);
+	if (IS_ERR(dev_pasid))
+		return PTR_ERR(dev_pasid);
+
+	ret = domain_setup_nested(iommu, dmar_domain, dev, pasid, old);
+	if (ret)
+		goto out_remove_dev_pasid;
+
+	domain_remove_dev_pasid(old, dev, pasid);
+
+	return 0;
+
+out_remove_dev_pasid:
+	domain_remove_dev_pasid(domain, dev, pasid);
+	return ret;
+}
+
 static const struct iommu_domain_ops intel_nested_domain_ops = {
 	.attach_dev		= intel_nested_attach_dev,
+	.set_dev_pasid		= intel_nested_set_dev_pasid,
 	.free			= intel_nested_domain_free,
 	.cache_invalidate_user	= intel_nested_cache_invalidate_user,
 };
···
 
 	domain->use_first_level = true;
 	domain->s2_domain = s2_domain;
-	domain->s1_pgtbl = vtd.pgtbl_addr;
 	domain->s1_cfg = vtd;
 	domain->domain.ops = &intel_nested_domain_ops;
 	domain->domain.type = IOMMU_DOMAIN_NESTED;
```
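nested.c validates the parent s2 domain with paging_domain_compatible(). Part of that function now rejects a domain whose guest address width or AGAW exceeds what the IOMMU supports, rather than trimming top-level page tables at attach time. The sketch restates that width check in user space; the AGAW conversions reproduce the VT-d driver's convention (AGAW n means 30 + 9n address bits and n + 2 page-table levels), and the struct names and the mgaw field are hypothetical stand-ins.

```c
#include <stdbool.h>

#define LEVEL_STRIDE   9
#define MAX_AGAW_WIDTH 64

/* AGAW n covers 30 + 9n address bits, capped at 64. */
static int agaw_to_width(int agaw)
{
	int w = 30 + agaw * LEVEL_STRIDE;

	return w < MAX_AGAW_WIDTH ? w : MAX_AGAW_WIDTH;
}

/* AGAW n walks n + 2 page-table levels (2 -> 4-level, 3 -> 5-level). */
static int agaw_to_level(int agaw)
{
	return agaw + 2;
}

/* Hypothetical snapshots of the fields the check reads. */
struct toy_domain { int gaw; int agaw; };
struct toy_iommu { int agaw; int mgaw; /* stands in for cap_mgaw() */ };

/* Width part of paging_domain_compatible(): reject, never trim levels. */
static bool toy_width_compatible(const struct toy_domain *d,
				 const struct toy_iommu *iommu)
{
	int addr_width = agaw_to_width(iommu->agaw);

	if (addr_width > iommu->mgaw)
		addr_width = iommu->mgaw;

	return !(d->gaw > addr_width || d->agaw > iommu->agaw);
}
```

This check takes over from the removed "Knock out extra levels" loop, which freed top-level tables so that an oversized domain would fit a smaller IOMMU.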
+323 -102
drivers/iommu/intel/pasid.c
··· 220 220 if (pci_dev_is_disconnected(to_pci_dev(dev))) 221 221 return; 222 222 223 - sid = info->bus << 8 | info->devfn; 223 + sid = PCI_DEVID(info->bus, info->devfn); 224 224 qdep = info->ats_qdep; 225 225 pfsid = info->pfsid; 226 226 ··· 265 265 iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH); 266 266 267 267 devtlb_invalidation_with_pasid(iommu, dev, pasid); 268 + intel_iommu_drain_pasid_prq(dev, pasid); 268 269 } 269 270 270 271 /* ··· 288 287 } 289 288 290 289 /* 290 + * This function is supposed to be used after caller updates the fields 291 + * except for the SSADE and P bit of a pasid table entry. It does the 292 + * below: 293 + * - Flush cacheline if needed 294 + * - Flush the caches per Table 28 ”Guidance to Software for Invalidations“ 295 + * of VT-d spec 5.0. 296 + */ 297 + static void intel_pasid_flush_present(struct intel_iommu *iommu, 298 + struct device *dev, 299 + u32 pasid, u16 did, 300 + struct pasid_entry *pte) 301 + { 302 + if (!ecap_coherent(iommu->ecap)) 303 + clflush_cache_range(pte, sizeof(*pte)); 304 + 305 + /* 306 + * VT-d spec 5.0 table28 states guides for cache invalidation: 307 + * 308 + * - PASID-selective-within-Domain PASID-cache invalidation 309 + * - PASID-selective PASID-based IOTLB invalidation 310 + * - If (pasid is RID_PASID) 311 + * - Global Device-TLB invalidation to affected functions 312 + * Else 313 + * - PASID-based Device-TLB invalidation (with S=1 and 314 + * Addr[63:12]=0x7FFFFFFF_FFFFF) to affected functions 315 + */ 316 + pasid_cache_invalidation_with_pasid(iommu, did, pasid); 317 + qi_flush_piotlb(iommu, did, pasid, 0, -1, 0); 318 + 319 + devtlb_invalidation_with_pasid(iommu, dev, pasid); 320 + } 321 + 322 + /* 291 323 * Set up the scalable mode pasid table entry for first only 292 324 * translation type. 
293 325 */ 326 + static void pasid_pte_config_first_level(struct intel_iommu *iommu, 327 + struct pasid_entry *pte, 328 + pgd_t *pgd, u16 did, int flags) 329 + { 330 + lockdep_assert_held(&iommu->lock); 331 + 332 + pasid_clear_entry(pte); 333 + 334 + /* Setup the first level page table pointer: */ 335 + pasid_set_flptr(pte, (u64)__pa(pgd)); 336 + 337 + if (flags & PASID_FLAG_FL5LP) 338 + pasid_set_flpm(pte, 1); 339 + 340 + if (flags & PASID_FLAG_PAGE_SNOOP) 341 + pasid_set_pgsnp(pte); 342 + 343 + pasid_set_domain_id(pte, did); 344 + pasid_set_address_width(pte, iommu->agaw); 345 + pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap)); 346 + 347 + /* Setup Present and PASID Granular Transfer Type: */ 348 + pasid_set_translation_type(pte, PASID_ENTRY_PGTT_FL_ONLY); 349 + pasid_set_present(pte); 350 + } 351 + 294 352 int intel_pasid_setup_first_level(struct intel_iommu *iommu, 295 353 struct device *dev, pgd_t *pgd, 296 354 u32 pasid, u16 did, int flags) ··· 380 320 return -EBUSY; 381 321 } 382 322 383 - pasid_clear_entry(pte); 323 + pasid_pte_config_first_level(iommu, pte, pgd, did, flags); 384 324 385 - /* Setup the first level page table pointer: */ 386 - pasid_set_flptr(pte, (u64)__pa(pgd)); 387 - 388 - if (flags & PASID_FLAG_FL5LP) 389 - pasid_set_flpm(pte, 1); 390 - 391 - if (flags & PASID_FLAG_PAGE_SNOOP) 392 - pasid_set_pgsnp(pte); 393 - 394 - pasid_set_domain_id(pte, did); 395 - pasid_set_address_width(pte, iommu->agaw); 396 - pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap)); 397 - 398 - /* Setup Present and PASID Granular Transfer Type: */ 399 - pasid_set_translation_type(pte, PASID_ENTRY_PGTT_FL_ONLY); 400 - pasid_set_present(pte); 401 325 spin_unlock(&iommu->lock); 402 326 403 327 pasid_flush_caches(iommu, pte, pasid, did); ··· 389 345 return 0; 390 346 } 391 347 392 - /* 393 - * Skip top levels of page tables for iommu which has less agaw 394 - * than default. Unnecessary for PT mode. 
395 - */ 396 - static int iommu_skip_agaw(struct dmar_domain *domain, 397 - struct intel_iommu *iommu, 398 - struct dma_pte **pgd) 348 + int intel_pasid_replace_first_level(struct intel_iommu *iommu, 349 + struct device *dev, pgd_t *pgd, 350 + u32 pasid, u16 did, u16 old_did, 351 + int flags) 399 352 { 400 - int agaw; 353 + struct pasid_entry *pte, new_pte; 401 354 402 - for (agaw = domain->agaw; agaw > iommu->agaw; agaw--) { 403 - *pgd = phys_to_virt(dma_pte_addr(*pgd)); 404 - if (!dma_pte_present(*pgd)) 405 - return -EINVAL; 355 + if (!ecap_flts(iommu->ecap)) { 356 + pr_err("No first level translation support on %s\n", 357 + iommu->name); 358 + return -EINVAL; 406 359 } 407 360 408 - return agaw; 361 + if ((flags & PASID_FLAG_FL5LP) && !cap_fl5lp_support(iommu->cap)) { 362 + pr_err("No 5-level paging support for first-level on %s\n", 363 + iommu->name); 364 + return -EINVAL; 365 + } 366 + 367 + pasid_pte_config_first_level(iommu, &new_pte, pgd, did, flags); 368 + 369 + spin_lock(&iommu->lock); 370 + pte = intel_pasid_get_entry(dev, pasid); 371 + if (!pte) { 372 + spin_unlock(&iommu->lock); 373 + return -ENODEV; 374 + } 375 + 376 + if (!pasid_pte_is_present(pte)) { 377 + spin_unlock(&iommu->lock); 378 + return -EINVAL; 379 + } 380 + 381 + WARN_ON(old_did != pasid_get_domain_id(pte)); 382 + 383 + *pte = new_pte; 384 + spin_unlock(&iommu->lock); 385 + 386 + intel_pasid_flush_present(iommu, dev, pasid, old_did, pte); 387 + intel_iommu_drain_pasid_prq(dev, pasid); 388 + 389 + return 0; 409 390 } 410 391 411 392 /* 412 393 * Set up the scalable mode pasid entry for second only translation type. 
  */
+static void pasid_pte_config_second_level(struct intel_iommu *iommu,
+					  struct pasid_entry *pte,
+					  u64 pgd_val, int agaw, u16 did,
+					  bool dirty_tracking)
+{
+	lockdep_assert_held(&iommu->lock);
+
+	pasid_clear_entry(pte);
+	pasid_set_domain_id(pte, did);
+	pasid_set_slptr(pte, pgd_val);
+	pasid_set_address_width(pte, agaw);
+	pasid_set_translation_type(pte, PASID_ENTRY_PGTT_SL_ONLY);
+	pasid_set_fault_enable(pte);
+	pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
+	if (dirty_tracking)
+		pasid_set_ssade(pte);
+
+	pasid_set_present(pte);
+}
+
 int intel_pasid_setup_second_level(struct intel_iommu *iommu,
 				   struct dmar_domain *domain,
 				   struct device *dev, u32 pasid)
···
 	struct pasid_entry *pte;
 	struct dma_pte *pgd;
 	u64 pgd_val;
-	int agaw;
 	u16 did;
 
 	/*
···
 	}
 
 	pgd = domain->pgd;
-	agaw = iommu_skip_agaw(domain, iommu, &pgd);
-	if (agaw < 0) {
-		dev_err(dev, "Invalid domain page table\n");
-		return -EINVAL;
-	}
-
 	pgd_val = virt_to_phys(pgd);
 	did = domain_id_iommu(domain, iommu);
 
···
 		return -EBUSY;
 	}
 
-	pasid_clear_entry(pte);
-	pasid_set_domain_id(pte, did);
-	pasid_set_slptr(pte, pgd_val);
-	pasid_set_address_width(pte, agaw);
-	pasid_set_translation_type(pte, PASID_ENTRY_PGTT_SL_ONLY);
-	pasid_set_fault_enable(pte);
-	pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
-	if (domain->dirty_tracking)
-		pasid_set_ssade(pte);
-
-	pasid_set_present(pte);
+	pasid_pte_config_second_level(iommu, pte, pgd_val, domain->agaw,
+				      did, domain->dirty_tracking);
 	spin_unlock(&iommu->lock);
 
 	pasid_flush_caches(iommu, pte, pasid, did);
+
+	return 0;
+}
+
+int intel_pasid_replace_second_level(struct intel_iommu *iommu,
+				     struct dmar_domain *domain,
+				     struct device *dev, u16 old_did,
+				     u32 pasid)
+{
+	struct pasid_entry *pte, new_pte;
+	struct dma_pte *pgd;
+	u64 pgd_val;
+	u16 did;
+
+	/*
+	 * If hardware advertises no support for second level
+	 * translation, return directly.
+	 */
+	if (!ecap_slts(iommu->ecap)) {
+		pr_err("No second level translation support on %s\n",
+		       iommu->name);
+		return -EINVAL;
+	}
+
+	pgd = domain->pgd;
+	pgd_val = virt_to_phys(pgd);
+	did = domain_id_iommu(domain, iommu);
+
+	pasid_pte_config_second_level(iommu, &new_pte, pgd_val,
+				      domain->agaw, did,
+				      domain->dirty_tracking);
+
+	spin_lock(&iommu->lock);
+	pte = intel_pasid_get_entry(dev, pasid);
+	if (!pte) {
+		spin_unlock(&iommu->lock);
+		return -ENODEV;
+	}
+
+	if (!pasid_pte_is_present(pte)) {
+		spin_unlock(&iommu->lock);
+		return -EINVAL;
+	}
+
+	WARN_ON(old_did != pasid_get_domain_id(pte));
+
+	*pte = new_pte;
+	spin_unlock(&iommu->lock);
+
+	intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
+	intel_iommu_drain_pasid_prq(dev, pasid);
 
 	return 0;
 }
···
 /*
  * Set up the scalable mode pasid entry for passthrough translation type.
  */
+static void pasid_pte_config_pass_through(struct intel_iommu *iommu,
+					  struct pasid_entry *pte, u16 did)
+{
+	lockdep_assert_held(&iommu->lock);
+
+	pasid_clear_entry(pte);
+	pasid_set_domain_id(pte, did);
+	pasid_set_address_width(pte, iommu->agaw);
+	pasid_set_translation_type(pte, PASID_ENTRY_PGTT_PT);
+	pasid_set_fault_enable(pte);
+	pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
+	pasid_set_present(pte);
+}
+
 int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
 				   struct device *dev, u32 pasid)
 {
···
 		return -EBUSY;
 	}
 
-	pasid_clear_entry(pte);
-	pasid_set_domain_id(pte, did);
-	pasid_set_address_width(pte, iommu->agaw);
-	pasid_set_translation_type(pte, PASID_ENTRY_PGTT_PT);
-	pasid_set_fault_enable(pte);
-	pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
-	pasid_set_present(pte);
+	pasid_pte_config_pass_through(iommu, pte, did);
 	spin_unlock(&iommu->lock);
 
 	pasid_flush_caches(iommu, pte, pasid, did);
+
+	return 0;
+}
+
+int intel_pasid_replace_pass_through(struct intel_iommu *iommu,
+				     struct device *dev, u16 old_did,
+				     u32 pasid)
+{
+	struct pasid_entry *pte, new_pte;
+	u16 did = FLPT_DEFAULT_DID;
+
+	pasid_pte_config_pass_through(iommu, &new_pte, did);
+
+	spin_lock(&iommu->lock);
+	pte = intel_pasid_get_entry(dev, pasid);
+	if (!pte) {
+		spin_unlock(&iommu->lock);
+		return -ENODEV;
+	}
+
+	if (!pasid_pte_is_present(pte)) {
+		spin_unlock(&iommu->lock);
+		return -EINVAL;
+	}
+
+	WARN_ON(old_did != pasid_get_domain_id(pte));
+
+	*pte = new_pte;
+	spin_unlock(&iommu->lock);
+
+	intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
+	intel_iommu_drain_pasid_prq(dev, pasid);
 
 	return 0;
 }
···
 	did = pasid_get_domain_id(pte);
 	spin_unlock(&iommu->lock);
 
-	if (!ecap_coherent(iommu->ecap))
-		clflush_cache_range(pte, sizeof(*pte));
+	intel_pasid_flush_present(iommu, dev, pasid, did, pte);
+}
 
-	/*
-	 * VT-d spec 3.4 table23 states guides for cache invalidation:
-	 *
-	 * - PASID-selective-within-Domain PASID-cache invalidation
-	 * - PASID-selective PASID-based IOTLB invalidation
-	 * - If (pasid is RID_PASID)
-	 *    - Global Device-TLB invalidation to affected functions
-	 *   Else
-	 *    - PASID-based Device-TLB invalidation (with S=1 and
-	 *      Addr[63:12]=0x7FFFFFFF_FFFFF) to affected functions
-	 */
-	pasid_cache_invalidation_with_pasid(iommu, did, pasid);
-	qi_flush_piotlb(iommu, did, pasid, 0, -1, 0);
+static void pasid_pte_config_nestd(struct intel_iommu *iommu,
+				   struct pasid_entry *pte,
+				   struct iommu_hwpt_vtd_s1 *s1_cfg,
+				   struct dmar_domain *s2_domain,
+				   u16 did)
+{
+	struct dma_pte *pgd = s2_domain->pgd;
 
-	devtlb_invalidation_with_pasid(iommu, dev, pasid);
+	lockdep_assert_held(&iommu->lock);
+
+	pasid_clear_entry(pte);
+
+	if (s1_cfg->addr_width == ADDR_WIDTH_5LEVEL)
+		pasid_set_flpm(pte, 1);
+
+	pasid_set_flptr(pte, s1_cfg->pgtbl_addr);
+
+	if (s1_cfg->flags & IOMMU_VTD_S1_SRE) {
+		pasid_set_sre(pte);
+		if (s1_cfg->flags & IOMMU_VTD_S1_WPE)
+			pasid_set_wpe(pte);
+	}
+
+	if (s1_cfg->flags & IOMMU_VTD_S1_EAFE)
+		pasid_set_eafe(pte);
+
+	if (s2_domain->force_snooping)
+		pasid_set_pgsnp(pte);
+
+	pasid_set_slptr(pte, virt_to_phys(pgd));
+	pasid_set_fault_enable(pte);
+	pasid_set_domain_id(pte, did);
+	pasid_set_address_width(pte, s2_domain->agaw);
+	pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
+	if (s2_domain->dirty_tracking)
+		pasid_set_ssade(pte);
+	pasid_set_translation_type(pte, PASID_ENTRY_PGTT_NESTED);
+	pasid_set_present(pte);
 }
 
 /**
···
 			     u32 pasid, struct dmar_domain *domain)
 {
 	struct iommu_hwpt_vtd_s1 *s1_cfg = &domain->s1_cfg;
-	pgd_t *s1_gpgd = (pgd_t *)(uintptr_t)domain->s1_pgtbl;
 	struct dmar_domain *s2_domain = domain->s2_domain;
 	u16 did = domain_id_iommu(domain, iommu);
-	struct dma_pte *pgd = s2_domain->pgd;
 	struct pasid_entry *pte;
 
 	/* Address width should match the address width supported by hardware */
···
 		return -EBUSY;
 	}
 
-	pasid_clear_entry(pte);
-
-	if (s1_cfg->addr_width == ADDR_WIDTH_5LEVEL)
-		pasid_set_flpm(pte, 1);
-
-	pasid_set_flptr(pte, (uintptr_t)s1_gpgd);
-
-	if (s1_cfg->flags & IOMMU_VTD_S1_SRE) {
-		pasid_set_sre(pte);
-		if (s1_cfg->flags & IOMMU_VTD_S1_WPE)
-			pasid_set_wpe(pte);
-	}
-
-	if (s1_cfg->flags & IOMMU_VTD_S1_EAFE)
-		pasid_set_eafe(pte);
-
-	if (s2_domain->force_snooping)
-		pasid_set_pgsnp(pte);
-
-	pasid_set_slptr(pte, virt_to_phys(pgd));
-	pasid_set_fault_enable(pte);
-	pasid_set_domain_id(pte, did);
-	pasid_set_address_width(pte, s2_domain->agaw);
-	pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
-	if (s2_domain->dirty_tracking)
-		pasid_set_ssade(pte);
-	pasid_set_translation_type(pte, PASID_ENTRY_PGTT_NESTED);
-	pasid_set_present(pte);
+	pasid_pte_config_nestd(iommu, pte, s1_cfg, s2_domain, did);
 	spin_unlock(&iommu->lock);
 
 	pasid_flush_caches(iommu, pte, pasid, did);
+
+	return 0;
+}
+
+int intel_pasid_replace_nested(struct intel_iommu *iommu,
+			       struct device *dev, u32 pasid,
+			       u16 old_did, struct dmar_domain *domain)
+{
+	struct iommu_hwpt_vtd_s1 *s1_cfg = &domain->s1_cfg;
+	struct dmar_domain *s2_domain = domain->s2_domain;
+	u16 did = domain_id_iommu(domain, iommu);
+	struct pasid_entry *pte, new_pte;
+
+	/* Address width should match the address width supported by hardware */
+	switch (s1_cfg->addr_width) {
+	case ADDR_WIDTH_4LEVEL:
+		break;
+	case ADDR_WIDTH_5LEVEL:
+		if (!cap_fl5lp_support(iommu->cap)) {
+			dev_err_ratelimited(dev,
+					    "5-level paging not supported\n");
+			return -EINVAL;
+		}
+		break;
+	default:
+		dev_err_ratelimited(dev, "Invalid stage-1 address width %d\n",
+				    s1_cfg->addr_width);
+		return -EINVAL;
+	}
+
+	if ((s1_cfg->flags & IOMMU_VTD_S1_SRE) && !ecap_srs(iommu->ecap)) {
+		pr_err_ratelimited("No supervisor request support on %s\n",
+				   iommu->name);
+		return -EINVAL;
+	}
+
+	if ((s1_cfg->flags & IOMMU_VTD_S1_EAFE) && !ecap_eafs(iommu->ecap)) {
+		pr_err_ratelimited("No extended access flag support on %s\n",
+				   iommu->name);
+		return -EINVAL;
+	}
+
+	pasid_pte_config_nestd(iommu, &new_pte, s1_cfg, s2_domain, did);
+
+	spin_lock(&iommu->lock);
+	pte = intel_pasid_get_entry(dev, pasid);
+	if (!pte) {
+		spin_unlock(&iommu->lock);
+		return -ENODEV;
+	}
+
+	if (!pasid_pte_is_present(pte)) {
+		spin_unlock(&iommu->lock);
+		return -EINVAL;
+	}
+
+	WARN_ON(old_did != pasid_get_domain_id(pte));
+
+	*pte = new_pte;
+	spin_unlock(&iommu->lock);
+
+	intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
+	intel_iommu_drain_pasid_prq(dev, pasid);
 
 	return 0;
 }
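Each intel_pasid_replace_*() helper above follows the same order: build the complete new entry in a stack copy (new_pte), then under iommu->lock require that an entry is already present, WARN if its domain ID differs from old_did, overwrite it with a single struct assignment, and only after unlocking flush with old_did and drain pending page requests. The setup_*() counterparts instead bail out with -EBUSY when an entry is already present. The userspace sketch below mirrors only that ordering, under stated assumptions: all sketch_* types and functions are hypothetical, the entry is a simplified struct rather than the VT-d PASID entry format, and a C11 atomic_flag spinlock stands in for iommu->lock.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_NR_ENTRIES 4

/* Hypothetical, simplified entry: NOT the VT-d PASID entry layout. */
struct sketch_entry {
	bool present;
	uint16_t did;		/* domain ID that tags cached translations */
	uint64_t pgd;		/* page-table pointer */
};

struct sketch_iommu {
	atomic_flag lock;	/* stands in for iommu->lock */
	struct sketch_entry table[SKETCH_NR_ENTRIES];
};

static void sketch_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void sketch_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Fill in a complete entry; callers may pass a stack copy. */
static void sketch_config(struct sketch_entry *e, uint64_t pgd, uint16_t did)
{
	e->present = true;
	e->did = did;
	e->pgd = pgd;
}

/* Replace a live entry; fails if no entry is installed yet. */
static int sketch_replace(struct sketch_iommu *iommu, unsigned int pasid,
			  uint64_t pgd, uint16_t did, uint16_t old_did)
{
	struct sketch_entry new_e, *e;

	if (pasid >= SKETCH_NR_ENTRIES)
		return -ENODEV;		/* no table slot, like !pte */

	sketch_config(&new_e, pgd, did);	/* built before taking the lock */

	sketch_lock(&iommu->lock);
	e = &iommu->table[pasid];
	if (!e->present) {
		sketch_unlock(&iommu->lock);
		return -EINVAL;		/* nothing to replace */
	}
	if (e->did != old_did)		/* the kernel code uses WARN_ON() */
		fprintf(stderr, "unexpected old DID %u\n", (unsigned int)e->did);
	*e = new_e;			/* overwrite in one struct assignment */
	sketch_unlock(&iommu->lock);

	/* Real code would now flush caches tagged with old_did and drain
	 * outstanding page requests for this PASID. */
	return 0;
}
```

Because the live entry is overwritten in place rather than cleared first, software never leaves the PASID without a programmed entry during the switch; the flush keyed on old_did then targets translations cached under the previous domain.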
+15 -7
drivers/iommu/intel/pasid.h
···
 #define is_pasid_enabled(entry)		(((entry)->lo >> 3) & 0x1)
 #define get_pasid_dir_size(entry)	(1 << ((((entry)->lo >> 9) & 0x7) + 7))
 
-/*
- * Domain ID reserved for pasid entries programmed for first-level
- * only and pass-through transfer modes.
- */
-#define FLPT_DEFAULT_DID		1
-#define NUM_RESERVED_DID		2
-
 #define PASID_FLAG_NESTED		BIT(1)
 #define PASID_FLAG_PAGE_SNOOP		BIT(2)
 
···
 				   struct device *dev, u32 pasid);
 int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
 			     u32 pasid, struct dmar_domain *domain);
+int intel_pasid_replace_first_level(struct intel_iommu *iommu,
+				    struct device *dev, pgd_t *pgd,
+				    u32 pasid, u16 did, u16 old_did,
+				    int flags);
+int intel_pasid_replace_second_level(struct intel_iommu *iommu,
+				     struct dmar_domain *domain,
+				     struct device *dev, u16 old_did,
+				     u32 pasid);
+int intel_pasid_replace_pass_through(struct intel_iommu *iommu,
+				     struct device *dev, u16 old_did,
+				     u32 pasid);
+int intel_pasid_replace_nested(struct intel_iommu *iommu,
+			       struct device *dev, u32 pasid,
+			       u16 old_did, struct dmar_domain *domain);
+
 void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
 				 struct device *dev, u32 pasid,
 				 bool fault_ignore);
+396
drivers/iommu/intel/prq.c
···
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2015 Intel Corporation
+ *
+ * Originally split from drivers/iommu/intel/svm.c
+ */
+
+#include <linux/pci.h>
+#include <linux/pci-ats.h>
+
+#include "iommu.h"
+#include "pasid.h"
+#include "../iommu-pages.h"
+#include "trace.h"
+
+/* Page request queue descriptor */
+struct page_req_dsc {
+	union {
+		struct {
+			u64 type:8;
+			u64 pasid_present:1;
+			u64 rsvd:7;
+			u64 rid:16;
+			u64 pasid:20;
+			u64 exe_req:1;
+			u64 pm_req:1;
+			u64 rsvd2:10;
+		};
+		u64 qw_0;
+	};
+	union {
+		struct {
+			u64 rd_req:1;
+			u64 wr_req:1;
+			u64 lpig:1;
+			u64 prg_index:9;
+			u64 addr:52;
+		};
+		u64 qw_1;
+	};
+	u64 qw_2;
+	u64 qw_3;
+};
+
+/**
+ * intel_iommu_drain_pasid_prq - Drain page requests and responses for a pasid
+ * @dev: target device
+ * @pasid: pasid for draining
+ *
+ * Drain all pending page requests and responses related to @pasid in both
+ * software and hardware. This is supposed to be called after the device
+ * driver has stopped DMA, the pasid entry has been cleared, and both IOTLB
+ * and DevTLB have been invalidated.
+ *
+ * It waits until all pending page requests for @pasid in the page fault
+ * queue are completed by the prq handling thread. Then follow the steps
+ * described in VT-d spec CH7.10 to drain all page requests and page
+ * responses pending in the hardware.
+ */
+void intel_iommu_drain_pasid_prq(struct device *dev, u32 pasid)
+{
+	struct device_domain_info *info;
+	struct dmar_domain *domain;
+	struct intel_iommu *iommu;
+	struct qi_desc desc[3];
+	int head, tail;
+	u16 sid, did;
+
+	info = dev_iommu_priv_get(dev);
+	if (!info->pri_enabled)
+		return;
+
+	iommu = info->iommu;
+	domain = info->domain;
+	sid = PCI_DEVID(info->bus, info->devfn);
+	did = domain ? domain_id_iommu(domain, iommu) : FLPT_DEFAULT_DID;
+
+	/*
+	 * Check and wait until all pending page requests in the queue are
+	 * handled by the prq handling thread.
+	 */
+prq_retry:
+	reinit_completion(&iommu->prq_complete);
+	tail = dmar_readq(iommu->reg + DMAR_PQT_REG) & PRQ_RING_MASK;
+	head = dmar_readq(iommu->reg + DMAR_PQH_REG) & PRQ_RING_MASK;
+	while (head != tail) {
+		struct page_req_dsc *req;
+
+		req = &iommu->prq[head / sizeof(*req)];
+		if (!req->pasid_present || req->pasid != pasid) {
+			head = (head + sizeof(*req)) & PRQ_RING_MASK;
+			continue;
+		}
+
+		wait_for_completion(&iommu->prq_complete);
+		goto prq_retry;
+	}
+
+	iopf_queue_flush_dev(dev);
+
+	/*
+	 * Perform steps described in VT-d spec CH7.10 to drain page
+	 * requests and responses in hardware.
+	 */
+	memset(desc, 0, sizeof(desc));
+	desc[0].qw0 = QI_IWD_STATUS_DATA(QI_DONE) |
+			QI_IWD_FENCE |
+			QI_IWD_TYPE;
+	if (pasid == IOMMU_NO_PASID) {
+		qi_desc_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH, &desc[1]);
+		qi_desc_dev_iotlb(sid, info->pfsid, info->ats_qdep, 0,
+				  MAX_AGAW_PFN_WIDTH, &desc[2]);
+	} else {
+		qi_desc_piotlb(did, pasid, 0, -1, 0, &desc[1]);
+		qi_desc_dev_iotlb_pasid(sid, info->pfsid, pasid, info->ats_qdep,
+					0, MAX_AGAW_PFN_WIDTH, &desc[2]);
+	}
+qi_retry:
+	reinit_completion(&iommu->prq_complete);
+	qi_submit_sync(iommu, desc, 3, QI_OPT_WAIT_DRAIN);
+	if (readl(iommu->reg + DMAR_PRS_REG) & DMA_PRS_PRO) {
+		wait_for_completion(&iommu->prq_complete);
+		goto qi_retry;
+	}
+}
+
+static bool is_canonical_address(u64 addr)
+{
+	int shift = 64 - (__VIRTUAL_MASK_SHIFT + 1);
+	long saddr = (long)addr;
+
+	return (((saddr << shift) >> shift) == saddr);
+}
+
+static void handle_bad_prq_event(struct intel_iommu *iommu,
+				 struct page_req_dsc *req, int result)
+{
+	struct qi_desc desc = { };
+
+	pr_err("%s: Invalid page request: %08llx %08llx\n",
+	       iommu->name, ((unsigned long long *)req)[0],
+	       ((unsigned long long *)req)[1]);
+
+	if (!req->lpig)
+		return;
+
+	desc.qw0 = QI_PGRP_PASID(req->pasid) |
+			QI_PGRP_DID(req->rid) |
+			QI_PGRP_PASID_P(req->pasid_present) |
+			QI_PGRP_RESP_CODE(result) |
+			QI_PGRP_RESP_TYPE;
+	desc.qw1 = QI_PGRP_IDX(req->prg_index) |
+			QI_PGRP_LPIG(req->lpig);
+
+	qi_submit_sync(iommu, &desc, 1, 0);
+}
+
+static int prq_to_iommu_prot(struct page_req_dsc *req)
+{
+	int prot = 0;
+
+	if (req->rd_req)
+		prot |= IOMMU_FAULT_PERM_READ;
+	if (req->wr_req)
+		prot |= IOMMU_FAULT_PERM_WRITE;
+	if (req->exe_req)
+		prot |= IOMMU_FAULT_PERM_EXEC;
+	if (req->pm_req)
+		prot |= IOMMU_FAULT_PERM_PRIV;
+
+	return prot;
+}
+
+static void intel_prq_report(struct intel_iommu *iommu, struct device *dev,
+			     struct page_req_dsc *desc)
+{
+	struct iopf_fault event = { };
+
+	/* Fill in event data for device specific processing */
+	event.fault.type = IOMMU_FAULT_PAGE_REQ;
+	event.fault.prm.addr = (u64)desc->addr << VTD_PAGE_SHIFT;
+	event.fault.prm.pasid = desc->pasid;
+	event.fault.prm.grpid = desc->prg_index;
+	event.fault.prm.perm = prq_to_iommu_prot(desc);
+
+	if (desc->lpig)
+		event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
+	if (desc->pasid_present) {
+		event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
+		event.fault.prm.flags |= IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID;
+	}
+
+	iommu_report_device_fault(dev, &event);
+}
+
+static irqreturn_t prq_event_thread(int irq, void *d)
+{
+	struct intel_iommu *iommu = d;
+	struct page_req_dsc *req;
+	int head, tail, handled;
+	struct device *dev;
+	u64 address;
+
+	/*
+	 * Clear PPR bit before reading head/tail registers, to ensure that
+	 * we get a new interrupt if needed.
+	 */
+	writel(DMA_PRS_PPR, iommu->reg + DMAR_PRS_REG);
+
+	tail = dmar_readq(iommu->reg + DMAR_PQT_REG) & PRQ_RING_MASK;
+	head = dmar_readq(iommu->reg + DMAR_PQH_REG) & PRQ_RING_MASK;
+	handled = (head != tail);
+	while (head != tail) {
+		req = &iommu->prq[head / sizeof(*req)];
+		address = (u64)req->addr << VTD_PAGE_SHIFT;
+
+		if (unlikely(!is_canonical_address(address))) {
+			pr_err("IOMMU: %s: Address is not canonical\n",
+			       iommu->name);
+bad_req:
+			handle_bad_prq_event(iommu, req, QI_RESP_INVALID);
+			goto prq_advance;
+		}
+
+		if (unlikely(req->pm_req && (req->rd_req | req->wr_req))) {
+			pr_err("IOMMU: %s: Page request in Privilege Mode\n",
+			       iommu->name);
+			goto bad_req;
+		}
+
+		if (unlikely(req->exe_req && req->rd_req)) {
+			pr_err("IOMMU: %s: Execution request not supported\n",
+			       iommu->name);
+			goto bad_req;
+		}
+
+		/* Drop Stop Marker message. No need for a response. */
+		if (unlikely(req->lpig && !req->rd_req && !req->wr_req))
+			goto prq_advance;
+
+		/*
+		 * If prq is to be handled outside iommu driver via receiver of
+		 * the fault notifiers, we skip the page response here.
+		 */
+		mutex_lock(&iommu->iopf_lock);
+		dev = device_rbtree_find(iommu, req->rid);
+		if (!dev) {
+			mutex_unlock(&iommu->iopf_lock);
+			goto bad_req;
+		}
+
+		intel_prq_report(iommu, dev, req);
+		trace_prq_report(iommu, dev, req->qw_0, req->qw_1,
+				 req->qw_2, req->qw_3,
+				 iommu->prq_seq_number++);
+		mutex_unlock(&iommu->iopf_lock);
+prq_advance:
+		head = (head + sizeof(*req)) & PRQ_RING_MASK;
+	}
+
+	dmar_writeq(iommu->reg + DMAR_PQH_REG, tail);
+
+	/*
+	 * Clear the page request overflow bit and wake up all threads that
+	 * are waiting for the completion of this handling.
+	 */
+	if (readl(iommu->reg + DMAR_PRS_REG) & DMA_PRS_PRO) {
+		pr_info_ratelimited("IOMMU: %s: PRQ overflow detected\n",
+				    iommu->name);
+		head = dmar_readq(iommu->reg + DMAR_PQH_REG) & PRQ_RING_MASK;
+		tail = dmar_readq(iommu->reg + DMAR_PQT_REG) & PRQ_RING_MASK;
+		if (head == tail) {
+			iopf_queue_discard_partial(iommu->iopf_queue);
+			writel(DMA_PRS_PRO, iommu->reg + DMAR_PRS_REG);
+			pr_info_ratelimited("IOMMU: %s: PRQ overflow cleared",
+					    iommu->name);
+		}
+	}
+
+	if (!completion_done(&iommu->prq_complete))
+		complete(&iommu->prq_complete);
+
+	return IRQ_RETVAL(handled);
+}
+
+int intel_iommu_enable_prq(struct intel_iommu *iommu)
+{
+	struct iopf_queue *iopfq;
+	int irq, ret;
+
+	iommu->prq = iommu_alloc_pages_node(iommu->node, GFP_KERNEL, PRQ_ORDER);
+	if (!iommu->prq) {
+		pr_warn("IOMMU: %s: Failed to allocate page request queue\n",
+			iommu->name);
+		return -ENOMEM;
+	}
+
+	irq = dmar_alloc_hwirq(IOMMU_IRQ_ID_OFFSET_PRQ + iommu->seq_id, iommu->node, iommu);
+	if (irq <= 0) {
+		pr_err("IOMMU: %s: Failed to create IRQ vector for page request queue\n",
+		       iommu->name);
+		ret = -EINVAL;
+		goto free_prq;
+	}
+	iommu->pr_irq = irq;
+
+	snprintf(iommu->iopfq_name, sizeof(iommu->iopfq_name),
+		 "dmar%d-iopfq", iommu->seq_id);
+	iopfq = iopf_queue_alloc(iommu->iopfq_name);
+	if (!iopfq) {
+		pr_err("IOMMU: %s: Failed to allocate iopf queue\n", iommu->name);
+		ret = -ENOMEM;
+		goto free_hwirq;
+	}
+	iommu->iopf_queue = iopfq;
+
+	snprintf(iommu->prq_name, sizeof(iommu->prq_name), "dmar%d-prq", iommu->seq_id);
+
+	ret = request_threaded_irq(irq, NULL, prq_event_thread, IRQF_ONESHOT,
+				   iommu->prq_name, iommu);
+	if (ret) {
+		pr_err("IOMMU: %s: Failed to request IRQ for page request queue\n",
+		       iommu->name);
+		goto free_iopfq;
+	}
+	dmar_writeq(iommu->reg + DMAR_PQH_REG, 0ULL);
+	dmar_writeq(iommu->reg + DMAR_PQT_REG, 0ULL);
+	dmar_writeq(iommu->reg + DMAR_PQA_REG, virt_to_phys(iommu->prq) | PRQ_ORDER);
+
+	init_completion(&iommu->prq_complete);
+
+	return 0;
+
+free_iopfq:
+	iopf_queue_free(iommu->iopf_queue);
+	iommu->iopf_queue = NULL;
+free_hwirq:
+	dmar_free_hwirq(irq);
+	iommu->pr_irq = 0;
+free_prq:
+	iommu_free_pages(iommu->prq, PRQ_ORDER);
+	iommu->prq = NULL;
+
+	return ret;
+}
+
+int intel_iommu_finish_prq(struct intel_iommu *iommu)
+{
+	dmar_writeq(iommu->reg + DMAR_PQH_REG, 0ULL);
+	dmar_writeq(iommu->reg + DMAR_PQT_REG, 0ULL);
+	dmar_writeq(iommu->reg + DMAR_PQA_REG, 0ULL);
+
+	if (iommu->pr_irq) {
+		free_irq(iommu->pr_irq, iommu);
+		dmar_free_hwirq(iommu->pr_irq);
+		iommu->pr_irq = 0;
+	}
+
+	if (iommu->iopf_queue) {
+		iopf_queue_free(iommu->iopf_queue);
+		iommu->iopf_queue = NULL;
+	}
+
+	iommu_free_pages(iommu->prq, PRQ_ORDER);
+	iommu->prq = NULL;
+
+	return 0;
+}
+
+void intel_iommu_page_response(struct device *dev, struct iopf_fault *evt,
+			       struct iommu_page_response *msg)
+{
+	struct device_domain_info *info = dev_iommu_priv_get(dev);
+	struct intel_iommu *iommu = info->iommu;
+	u8 bus = info->bus, devfn = info->devfn;
+	struct iommu_fault_page_request *prm;
+	struct qi_desc desc;
+	bool pasid_present;
+	bool last_page;
+	u16 sid;
+
+	prm = &evt->fault.prm;
+	sid = PCI_DEVID(bus, devfn);
+	pasid_present = prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
+	last_page = prm->flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
+
+	desc.qw0 = QI_PGRP_PASID(prm->pasid) | QI_PGRP_DID(sid) |
+			QI_PGRP_PASID_P(pasid_present) |
+			QI_PGRP_RESP_CODE(msg->code) |
+			QI_PGRP_RESP_TYPE;
+	desc.qw1 = QI_PGRP_IDX(prm->grpid) | QI_PGRP_LPIG(last_page);
+	desc.qw2 = 0;
+	desc.qw3 = 0;
+
+	qi_submit_sync(iommu, &desc, 1, 0);
+}
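Both intel_iommu_drain_pasid_prq() and prq_event_thread() treat the PRQ head and tail registers as byte offsets: the current descriptor is iommu->prq[head / sizeof(*req)], and head advances by one 32-byte struct page_req_dsc and wraps through PRQ_RING_MASK. The userspace sketch below reproduces that walk for the drain check ("is any request for this PASID still pending?"), plus an equivalent form of is_canonical_address() that tests whether bits 63..47 are all equal instead of left-shifting a possibly negative signed value, which ISO C leaves undefined. Assumptions: a 4 KiB ring, 48-bit virtual addresses (the kernel derives the width from __VIRTUAL_MASK_SHIFT), and hypothetical sketch_* names; the real ring size and mask come from PRQ_ORDER and PRQ_RING_MASK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed geometry: 32-byte descriptors in a 4 KiB ring (illustrative only). */
#define SKETCH_DESC_BYTES 32u
#define SKETCH_RING_BYTES 4096u
#define SKETCH_RING_MASK  (SKETCH_RING_BYTES - SKETCH_DESC_BYTES)

/* Hypothetical decoded descriptor; the real one is struct page_req_dsc. */
struct sketch_req {
	bool pasid_present;
	uint32_t pasid;
	uint64_t addr;		/* faulting address */
};

/*
 * Return true if any request between head and tail targets @pasid.
 * head/tail are byte offsets, as read from the queue head/tail registers.
 */
static bool sketch_ring_has_pasid(const struct sketch_req *ring,
				  uint32_t head, uint32_t tail, uint32_t pasid)
{
	head &= SKETCH_RING_MASK;
	tail &= SKETCH_RING_MASK;
	while (head != tail) {
		const struct sketch_req *req = &ring[head / SKETCH_DESC_BYTES];

		if (req->pasid_present && req->pasid == pasid)
			return true;	/* the kernel waits for the handler here */
		/* advance one descriptor, wrapping at the end of the ring */
		head = (head + SKETCH_DESC_BYTES) & SKETCH_RING_MASK;
	}
	return false;
}

/* Canonical iff bits 63..47 are all equal (assumes 48-bit virtual addresses). */
static bool sketch_is_canonical(uint64_t addr)
{
	uint64_t top = addr >> 47;	/* the 17 high bits */

	return top == 0 || top == 0x1ffff;
}
```

In the kernel, a match does not simply return: intel_iommu_drain_pasid_prq() waits on prq_complete and rescans from the freshly read head, so it only proceeds to the hardware drain once no request for the PASID remains queued.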
+12 -421
drivers/iommu/intel/svm.c
···
 #include "../iommu-pages.h"
 #include "trace.h"
 
-static irqreturn_t prq_event_thread(int irq, void *d);
-
-int intel_svm_enable_prq(struct intel_iommu *iommu)
-{
-	struct iopf_queue *iopfq;
-	int irq, ret;
-
-	iommu->prq = iommu_alloc_pages_node(iommu->node, GFP_KERNEL, PRQ_ORDER);
-	if (!iommu->prq) {
-		pr_warn("IOMMU: %s: Failed to allocate page request queue\n",
-			iommu->name);
-		return -ENOMEM;
-	}
-
-	irq = dmar_alloc_hwirq(IOMMU_IRQ_ID_OFFSET_PRQ + iommu->seq_id, iommu->node, iommu);
-	if (irq <= 0) {
-		pr_err("IOMMU: %s: Failed to create IRQ vector for page request queue\n",
-		       iommu->name);
-		ret = -EINVAL;
-		goto free_prq;
-	}
-	iommu->pr_irq = irq;
-
-	snprintf(iommu->iopfq_name, sizeof(iommu->iopfq_name),
-		 "dmar%d-iopfq", iommu->seq_id);
-	iopfq = iopf_queue_alloc(iommu->iopfq_name);
-	if (!iopfq) {
-		pr_err("IOMMU: %s: Failed to allocate iopf queue\n", iommu->name);
-		ret = -ENOMEM;
-		goto free_hwirq;
-	}
-	iommu->iopf_queue = iopfq;
-
-	snprintf(iommu->prq_name, sizeof(iommu->prq_name), "dmar%d-prq", iommu->seq_id);
-
-	ret = request_threaded_irq(irq, NULL, prq_event_thread, IRQF_ONESHOT,
-				   iommu->prq_name, iommu);
-	if (ret) {
-		pr_err("IOMMU: %s: Failed to request IRQ for page request queue\n",
-		       iommu->name);
-		goto free_iopfq;
-	}
-	dmar_writeq(iommu->reg + DMAR_PQH_REG, 0ULL);
-	dmar_writeq(iommu->reg + DMAR_PQT_REG, 0ULL);
-	dmar_writeq(iommu->reg + DMAR_PQA_REG, virt_to_phys(iommu->prq) | PRQ_ORDER);
-
-	init_completion(&iommu->prq_complete);
-
-	return 0;
-
-free_iopfq:
-	iopf_queue_free(iommu->iopf_queue);
-	iommu->iopf_queue = NULL;
-free_hwirq:
-	dmar_free_hwirq(irq);
-	iommu->pr_irq = 0;
-free_prq:
-	iommu_free_pages(iommu->prq, PRQ_ORDER);
-	iommu->prq = NULL;
-
-	return ret;
-}
-
-int intel_svm_finish_prq(struct intel_iommu *iommu)
-{
-	dmar_writeq(iommu->reg + DMAR_PQH_REG, 0ULL);
-	dmar_writeq(iommu->reg + DMAR_PQT_REG, 0ULL);
-	dmar_writeq(iommu->reg + DMAR_PQA_REG, 0ULL);
-
-	if (iommu->pr_irq) {
-		free_irq(iommu->pr_irq, iommu);
-		dmar_free_hwirq(iommu->pr_irq);
-		iommu->pr_irq = 0;
-	}
-
-	if (iommu->iopf_queue) {
-		iopf_queue_free(iommu->iopf_queue);
-		iommu->iopf_queue = NULL;
-	}
-
-	iommu_free_pages(iommu->prq, PRQ_ORDER);
-	iommu->prq = NULL;
-
-	return 0;
-}
-
 void intel_svm_check(struct intel_iommu *iommu)
 {
 	if (!pasid_supported(iommu))
···
 };
 
 static int intel_svm_set_dev_pasid(struct iommu_domain *domain,
-				   struct device *dev, ioasid_t pasid)
+				   struct device *dev, ioasid_t pasid,
+				   struct iommu_domain *old)
 {
 	struct device_domain_info *info = dev_iommu_priv_get(dev);
-	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
 	struct intel_iommu *iommu = info->iommu;
 	struct mm_struct *mm = domain->mm;
 	struct dev_pasid_info *dev_pasid;
 	unsigned long sflags;
-	unsigned long flags;
 	int ret = 0;
 
-	dev_pasid = kzalloc(sizeof(*dev_pasid), GFP_KERNEL);
-	if (!dev_pasid)
-		return -ENOMEM;
-
-	dev_pasid->dev = dev;
-	dev_pasid->pasid = pasid;
-
-	ret = cache_tag_assign_domain(to_dmar_domain(domain), dev, pasid);
-	if (ret)
-		goto free_dev_pasid;
+	dev_pasid = domain_add_dev_pasid(domain, dev, pasid);
+	if (IS_ERR(dev_pasid))
+		return PTR_ERR(dev_pasid);
 
 	/* Setup the pasid table: */
 	sflags = cpu_feature_enabled(X86_FEATURE_LA57) ? PASID_FLAG_FL5LP : 0;
-	ret = intel_pasid_setup_first_level(iommu, dev, mm->pgd, pasid,
-					    FLPT_DEFAULT_DID, sflags);
+	ret = __domain_setup_first_level(iommu, dev, pasid,
+					 FLPT_DEFAULT_DID, mm->pgd,
+					 sflags, old);
 	if (ret)
-		goto unassign_tag;
+		goto out_remove_dev_pasid;
 
-	spin_lock_irqsave(&dmar_domain->lock, flags);
-	list_add(&dev_pasid->link_domain, &dmar_domain->dev_pasids);
-	spin_unlock_irqrestore(&dmar_domain->lock, flags);
+	domain_remove_dev_pasid(old, dev, pasid);
 
 	return 0;
 
-unassign_tag:
-	cache_tag_unassign_domain(to_dmar_domain(domain), dev, pasid);
-free_dev_pasid:
-	kfree(dev_pasid);
-
+out_remove_dev_pasid:
+	domain_remove_dev_pasid(domain, dev, pasid);
 	return ret;
-}
-
-/* Page request queue descriptor */
-struct page_req_dsc {
-	union {
-		struct {
-			u64 type:8;
-			u64 pasid_present:1;
-			u64 rsvd:7;
-			u64 rid:16;
-			u64 pasid:20;
-			u64 exe_req:1;
-			u64 pm_req:1;
-			u64 rsvd2:10;
-		};
-		u64 qw_0;
-	};
-	union {
-		struct {
-			u64 rd_req:1;
-			u64 wr_req:1;
-			u64 lpig:1;
-			u64 prg_index:9;
-			u64 addr:52;
-		};
-		u64 qw_1;
-	};
-	u64 qw_2;
-	u64 qw_3;
-};
-
-static bool is_canonical_address(u64 addr)
-{
-	int shift = 64 - (__VIRTUAL_MASK_SHIFT + 1);
-	long saddr = (long) addr;
-
-	return (((saddr << shift) >> shift) == saddr);
-}
-
-/**
- * intel_drain_pasid_prq - Drain page requests and responses for a pasid
- * @dev: target device
- * @pasid: pasid for draining
- *
- * Drain all pending page requests and responses related to @pasid in both
- * software and hardware. This is supposed to be called after the device
- * driver has stopped DMA, the pasid entry has been cleared, and both IOTLB
- * and DevTLB have been invalidated.
- *
- * It waits until all pending page requests for @pasid in the page fault
- * queue are completed by the prq handling thread. Then follow the steps
- * described in VT-d spec CH7.10 to drain all page requests and page
- * responses pending in the hardware.
- */
-void intel_drain_pasid_prq(struct device *dev, u32 pasid)
-{
-	struct device_domain_info *info;
-	struct dmar_domain *domain;
-	struct intel_iommu *iommu;
-	struct qi_desc desc[3];
-	struct pci_dev *pdev;
-	int head, tail;
-	u16 sid, did;
-	int qdep;
-
-	info = dev_iommu_priv_get(dev);
-	if (WARN_ON(!info || !dev_is_pci(dev)))
-		return;
-
-	if (!info->pri_enabled)
-		return;
-
-	iommu = info->iommu;
-	domain = info->domain;
-	pdev = to_pci_dev(dev);
-	sid = PCI_DEVID(info->bus, info->devfn);
-	did = domain ? domain_id_iommu(domain, iommu) : FLPT_DEFAULT_DID;
-	qdep = pci_ats_queue_depth(pdev);
-
-	/*
-	 * Check and wait until all pending page requests in the queue are
-	 * handled by the prq handling thread.
-	 */
-prq_retry:
-	reinit_completion(&iommu->prq_complete);
-	tail = dmar_readq(iommu->reg + DMAR_PQT_REG) & PRQ_RING_MASK;
-	head = dmar_readq(iommu->reg + DMAR_PQH_REG) & PRQ_RING_MASK;
-	while (head != tail) {
-		struct page_req_dsc *req;
-
-		req = &iommu->prq[head / sizeof(*req)];
-		if (!req->pasid_present || req->pasid != pasid) {
-			head = (head + sizeof(*req)) & PRQ_RING_MASK;
-			continue;
-		}
-
-		wait_for_completion(&iommu->prq_complete);
-		goto prq_retry;
-	}
-
-	iopf_queue_flush_dev(dev);
-
-	/*
-	 * Perform steps described in VT-d spec CH7.10 to drain page
-	 * requests and responses in hardware.
-	 */
-	memset(desc, 0, sizeof(desc));
-	desc[0].qw0 = QI_IWD_STATUS_DATA(QI_DONE) |
-			QI_IWD_FENCE |
-			QI_IWD_TYPE;
-	desc[1].qw0 = QI_EIOTLB_PASID(pasid) |
-			QI_EIOTLB_DID(did) |
-			QI_EIOTLB_GRAN(QI_GRAN_NONG_PASID) |
-			QI_EIOTLB_TYPE;
-	desc[2].qw0 = QI_DEV_EIOTLB_PASID(pasid) |
-			QI_DEV_EIOTLB_SID(sid) |
-			QI_DEV_EIOTLB_QDEP(qdep) |
-			QI_DEIOTLB_TYPE |
-			QI_DEV_IOTLB_PFSID(info->pfsid);
-qi_retry:
-	reinit_completion(&iommu->prq_complete);
-	qi_submit_sync(iommu, desc, 3, QI_OPT_WAIT_DRAIN);
-	if (readl(iommu->reg + DMAR_PRS_REG) & DMA_PRS_PRO) {
-		wait_for_completion(&iommu->prq_complete);
-		goto qi_retry;
-	}
-}
-
-static int prq_to_iommu_prot(struct page_req_dsc *req)
-{
-	int prot = 0;
-
-	if (req->rd_req)
-		prot |= IOMMU_FAULT_PERM_READ;
-	if (req->wr_req)
-		prot |= IOMMU_FAULT_PERM_WRITE;
-	if (req->exe_req)
-		prot |= IOMMU_FAULT_PERM_EXEC;
-	if (req->pm_req)
-		prot |= IOMMU_FAULT_PERM_PRIV;
-
-	return prot;
-}
-
-static void intel_svm_prq_report(struct intel_iommu *iommu, struct device *dev,
-				 struct page_req_dsc *desc)
-{
-	struct iopf_fault event = { };
-
-	/* Fill in event data for device specific processing */
-	event.fault.type = IOMMU_FAULT_PAGE_REQ;
-	event.fault.prm.addr = (u64)desc->addr << VTD_PAGE_SHIFT;
-	event.fault.prm.pasid = desc->pasid;
-	event.fault.prm.grpid = desc->prg_index;
-	event.fault.prm.perm = prq_to_iommu_prot(desc);
-
-	if (desc->lpig)
-		event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
-	if (desc->pasid_present) {
-		event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
-		event.fault.prm.flags |= IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID;
-	}
-
-	iommu_report_device_fault(dev, &event);
-}
-
-static void handle_bad_prq_event(struct intel_iommu *iommu,
-				 struct page_req_dsc *req, int result)
-{
-	struct qi_desc desc = { };
-
-	pr_err("%s: Invalid page request: %08llx %08llx\n",
-	       iommu->name, ((unsigned long long *)req)[0],
-	       ((unsigned long long *)req)[1]);
-
-	if (!req->lpig)
-		return;
-
-	desc.qw0 = QI_PGRP_PASID(req->pasid) |
-			QI_PGRP_DID(req->rid) |
-			QI_PGRP_PASID_P(req->pasid_present) |
-			QI_PGRP_RESP_CODE(result) |
-			QI_PGRP_RESP_TYPE;
-	desc.qw1 = QI_PGRP_IDX(req->prg_index) |
-			QI_PGRP_LPIG(req->lpig);
-
-	qi_submit_sync(iommu, &desc, 1, 0);
-}
-
-static irqreturn_t prq_event_thread(int irq, void *d)
-{
-	struct intel_iommu *iommu = d;
-	struct page_req_dsc *req;
-	int head, tail, handled;
-	struct device *dev;
-	u64 address;
-
-	/*
-	 * Clear PPR bit before reading head/tail registers, to ensure that
-	 * we get a new interrupt if needed.
-	 */
-	writel(DMA_PRS_PPR, iommu->reg + DMAR_PRS_REG);
-
-	tail = dmar_readq(iommu->reg + DMAR_PQT_REG) & PRQ_RING_MASK;
-	head = dmar_readq(iommu->reg + DMAR_PQH_REG) & PRQ_RING_MASK;
-	handled = (head != tail);
-	while (head != tail) {
-		req = &iommu->prq[head / sizeof(*req)];
-		address = (u64)req->addr << VTD_PAGE_SHIFT;
-
-		if (unlikely(!req->pasid_present)) {
-			pr_err("IOMMU: %s: Page request without PASID\n",
-			       iommu->name);
-bad_req:
-			handle_bad_prq_event(iommu, req, QI_RESP_INVALID);
-			goto prq_advance;
-		}
-
-		if (unlikely(!is_canonical_address(address))) {
-			pr_err("IOMMU: %s: Address is not canonical\n",
-			       iommu->name);
-			goto bad_req;
-		}
-
-		if (unlikely(req->pm_req && (req->rd_req | req->wr_req))) {
-			pr_err("IOMMU: %s: Page request in Privilege Mode\n",
-			       iommu->name);
-			goto bad_req;
-		}
-
-		if (unlikely(req->exe_req && req->rd_req)) {
-			pr_err("IOMMU: %s: Execution request not supported\n",
-
iommu->name); 388 - goto bad_req; 389 - } 390 - 391 - /* Drop Stop Marker message. No need for a response. */ 392 - if (unlikely(req->lpig && !req->rd_req && !req->wr_req)) 393 - goto prq_advance; 394 - 395 - /* 396 - * If prq is to be handled outside iommu driver via receiver of 397 - * the fault notifiers, we skip the page response here. 398 - */ 399 - mutex_lock(&iommu->iopf_lock); 400 - dev = device_rbtree_find(iommu, req->rid); 401 - if (!dev) { 402 - mutex_unlock(&iommu->iopf_lock); 403 - goto bad_req; 404 - } 405 - 406 - intel_svm_prq_report(iommu, dev, req); 407 - trace_prq_report(iommu, dev, req->qw_0, req->qw_1, 408 - req->qw_2, req->qw_3, 409 - iommu->prq_seq_number++); 410 - mutex_unlock(&iommu->iopf_lock); 411 - prq_advance: 412 - head = (head + sizeof(*req)) & PRQ_RING_MASK; 413 - } 414 - 415 - dmar_writeq(iommu->reg + DMAR_PQH_REG, tail); 416 - 417 - /* 418 - * Clear the page request overflow bit and wake up all threads that 419 - * are waiting for the completion of this handling. 
420 - */ 421 - if (readl(iommu->reg + DMAR_PRS_REG) & DMA_PRS_PRO) { 422 - pr_info_ratelimited("IOMMU: %s: PRQ overflow detected\n", 423 - iommu->name); 424 - head = dmar_readq(iommu->reg + DMAR_PQH_REG) & PRQ_RING_MASK; 425 - tail = dmar_readq(iommu->reg + DMAR_PQT_REG) & PRQ_RING_MASK; 426 - if (head == tail) { 427 - iopf_queue_discard_partial(iommu->iopf_queue); 428 - writel(DMA_PRS_PRO, iommu->reg + DMAR_PRS_REG); 429 - pr_info_ratelimited("IOMMU: %s: PRQ overflow cleared", 430 - iommu->name); 431 - } 432 - } 433 - 434 - if (!completion_done(&iommu->prq_complete)) 435 - complete(&iommu->prq_complete); 436 - 437 - return IRQ_RETVAL(handled); 438 - } 439 - 440 - void intel_svm_page_response(struct device *dev, struct iopf_fault *evt, 441 - struct iommu_page_response *msg) 442 - { 443 - struct device_domain_info *info = dev_iommu_priv_get(dev); 444 - struct intel_iommu *iommu = info->iommu; 445 - u8 bus = info->bus, devfn = info->devfn; 446 - struct iommu_fault_page_request *prm; 447 - struct qi_desc desc; 448 - bool pasid_present; 449 - bool last_page; 450 - u16 sid; 451 - 452 - prm = &evt->fault.prm; 453 - sid = PCI_DEVID(bus, devfn); 454 - pasid_present = prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID; 455 - last_page = prm->flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE; 456 - 457 - desc.qw0 = QI_PGRP_PASID(prm->pasid) | QI_PGRP_DID(sid) | 458 - QI_PGRP_PASID_P(pasid_present) | 459 - QI_PGRP_RESP_CODE(msg->code) | 460 - QI_PGRP_RESP_TYPE; 461 - desc.qw1 = QI_PGRP_IDX(prm->grpid) | QI_PGRP_LPIG(last_page); 462 - desc.qw2 = 0; 463 - desc.qw3 = 0; 464 - 465 - qi_submit_sync(iommu, &desc, 1, 0); 466 229 } 467 230 468 231 static void intel_svm_domain_free(struct iommu_domain *domain)
+6 -143
drivers/iommu/io-pgtable-arm-v7s.c
···
 
 	arm_v7s_iopte		*pgd;
 	struct kmem_cache	*l2_tables;
-	spinlock_t		split_lock;
 };
 
 static bool arm_v7s_pte_is_cont(arm_v7s_iopte pte, int lvl);
···
 	return pte;
 }
 
-static int arm_v7s_pte_to_prot(arm_v7s_iopte pte, int lvl)
-{
-	int prot = IOMMU_READ;
-	arm_v7s_iopte attr = pte >> ARM_V7S_ATTR_SHIFT(lvl);
-
-	if (!(attr & ARM_V7S_PTE_AP_RDONLY))
-		prot |= IOMMU_WRITE;
-	if (!(attr & ARM_V7S_PTE_AP_UNPRIV))
-		prot |= IOMMU_PRIV;
-	if ((attr & (ARM_V7S_TEX_MASK << ARM_V7S_TEX_SHIFT)) == 0)
-		prot |= IOMMU_MMIO;
-	else if (pte & ARM_V7S_ATTR_C)
-		prot |= IOMMU_CACHE;
-	if (pte & ARM_V7S_ATTR_XN(lvl))
-		prot |= IOMMU_NOEXEC;
-
-	return prot;
-}
-
 static arm_v7s_iopte arm_v7s_pte_to_cont(arm_v7s_iopte pte, int lvl)
 {
 	if (lvl == 1) {
···
 		pte |= (xn << ARM_V7S_CONT_PAGE_XN_SHIFT) |
 			(tex << ARM_V7S_CONT_PAGE_TEX_SHIFT) |
 			ARM_V7S_PTE_TYPE_CONT_PAGE;
-	}
-	return pte;
-}
-
-static arm_v7s_iopte arm_v7s_cont_to_pte(arm_v7s_iopte pte, int lvl)
-{
-	if (lvl == 1) {
-		pte &= ~ARM_V7S_CONT_SECTION;
-	} else if (lvl == 2) {
-		arm_v7s_iopte xn = pte & BIT(ARM_V7S_CONT_PAGE_XN_SHIFT);
-		arm_v7s_iopte tex = pte & (ARM_V7S_CONT_PAGE_TEX_MASK <<
-					   ARM_V7S_CONT_PAGE_TEX_SHIFT);
-
-		pte ^= xn | tex | ARM_V7S_PTE_TYPE_CONT_PAGE;
-		pte |= (xn >> ARM_V7S_CONT_PAGE_XN_SHIFT) |
-			(tex >> ARM_V7S_CONT_PAGE_TEX_SHIFT) |
-			ARM_V7S_PTE_TYPE_PAGE;
 	}
 	return pte;
 }
···
 	kfree(data);
 }
 
-static arm_v7s_iopte arm_v7s_split_cont(struct arm_v7s_io_pgtable *data,
-					unsigned long iova, int idx, int lvl,
-					arm_v7s_iopte *ptep)
-{
-	struct io_pgtable *iop = &data->iop;
-	arm_v7s_iopte pte;
-	size_t size = ARM_V7S_BLOCK_SIZE(lvl);
-	int i;
-
-	/* Check that we didn't lose a race to get the lock */
-	pte = *ptep;
-	if (!arm_v7s_pte_is_cont(pte, lvl))
-		return pte;
-
-	ptep -= idx & (ARM_V7S_CONT_PAGES - 1);
-	pte = arm_v7s_cont_to_pte(pte, lvl);
-	for (i = 0; i < ARM_V7S_CONT_PAGES; i++)
-		ptep[i] = pte + i * size;
-
-	__arm_v7s_pte_sync(ptep, ARM_V7S_CONT_PAGES, &iop->cfg);
-
-	size *= ARM_V7S_CONT_PAGES;
-	io_pgtable_tlb_flush_walk(iop, iova, size, size);
-	return pte;
-}
-
-static size_t arm_v7s_split_blk_unmap(struct arm_v7s_io_pgtable *data,
-				      struct iommu_iotlb_gather *gather,
-				      unsigned long iova, size_t size,
-				      arm_v7s_iopte blk_pte,
-				      arm_v7s_iopte *ptep)
-{
-	struct io_pgtable_cfg *cfg = &data->iop.cfg;
-	arm_v7s_iopte pte, *tablep;
-	int i, unmap_idx, num_entries, num_ptes;
-
-	tablep = __arm_v7s_alloc_table(2, GFP_ATOMIC, data);
-	if (!tablep)
-		return 0; /* Bytes unmapped */
-
-	num_ptes = ARM_V7S_PTES_PER_LVL(2, cfg);
-	num_entries = size >> ARM_V7S_LVL_SHIFT(2);
-	unmap_idx = ARM_V7S_LVL_IDX(iova, 2, cfg);
-
-	pte = arm_v7s_prot_to_pte(arm_v7s_pte_to_prot(blk_pte, 1), 2, cfg);
-	if (num_entries > 1)
-		pte = arm_v7s_pte_to_cont(pte, 2);
-
-	for (i = 0; i < num_ptes; i += num_entries, pte += size) {
-		/* Unmap! */
-		if (i == unmap_idx)
-			continue;
-
-		__arm_v7s_set_pte(&tablep[i], pte, num_entries, cfg);
-	}
-
-	pte = arm_v7s_install_table(tablep, ptep, blk_pte, cfg);
-	if (pte != blk_pte) {
-		__arm_v7s_free_table(tablep, 2, data);
-
-		if (!ARM_V7S_PTE_IS_TABLE(pte, 1))
-			return 0;
-
-		tablep = iopte_deref(pte, 1, data);
-		return __arm_v7s_unmap(data, gather, iova, size, 2, tablep);
-	}
-
-	io_pgtable_tlb_add_page(&data->iop, gather, iova, size);
-	return size;
-}
-
 static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *data,
 			      struct iommu_iotlb_gather *gather,
 			      unsigned long iova, size_t size, int lvl,
···
 	 * case in a lock for the sake of correctness and be done with it.
 	 */
 	if (num_entries <= 1 && arm_v7s_pte_is_cont(pte[0], lvl)) {
-		unsigned long flags;
-
-		spin_lock_irqsave(&data->split_lock, flags);
-		pte[0] = arm_v7s_split_cont(data, iova, idx, lvl, ptep);
-		spin_unlock_irqrestore(&data->split_lock, flags);
+		WARN_ONCE(true, "Unmap of a partial large IOPTE is not allowed");
+		return 0;
 	}
 
 	/* If the size matches this level, we're in the right place */
···
 		}
 		return size;
 	} else if (lvl == 1 && !ARM_V7S_PTE_IS_TABLE(pte[0], lvl)) {
-		/*
-		 * Insert a table at the next level to map the old region,
-		 * minus the part we want to unmap
-		 */
-		return arm_v7s_split_blk_unmap(data, gather, iova, size, pte[0],
-					       ptep);
+		WARN_ONCE(true, "Unmap of a partial large IOPTE is not allowed");
+		return 0;
 	}
 
 	/* Keep on walkin' */
···
 	data = kmalloc(sizeof(*data), GFP_KERNEL);
 	if (!data)
 		return NULL;
-
-	spin_lock_init(&data->split_lock);
 
 	/*
 	 * ARM_MTK_TTBR_EXT extend the translation table base support larger
···
 		.quirks = IO_PGTABLE_QUIRK_ARM_NS,
 		.pgsize_bitmap = SZ_4K | SZ_64K | SZ_1M | SZ_16M,
 	};
-	unsigned int iova, size, iova_start;
-	unsigned int i, loopnr = 0;
+	unsigned int iova, size;
+	unsigned int i;
 	size_t mapped;
 
 	selftest_running = true;
···
 			return __FAIL(ops);
 
 		iova += SZ_16M;
-		loopnr++;
-	}
-
-	/* Partial unmap */
-	i = 1;
-	size = 1UL << __ffs(cfg.pgsize_bitmap);
-	while (i < loopnr) {
-		iova_start = i * SZ_16M;
-		if (ops->unmap_pages(ops, iova_start + size, size, 1, NULL) != size)
-			return __FAIL(ops);
-
-		/* Remap of partial unmap */
-		if (ops->map_pages(ops, iova_start + size, size, size, 1,
-				   IOMMU_READ, GFP_KERNEL, &mapped))
-			return __FAIL(ops);
-
-		if (ops->iova_to_phys(ops, iova_start + size + 42)
-		    != (size + 42))
-			return __FAIL(ops);
-		i++;
 	}
 
 	/* Full unmap */
+33 -81
drivers/iommu/io-pgtable-arm.c
···
 	return (paddr | (paddr << (48 - 12))) & (ARM_LPAE_PTE_ADDR_MASK << 4);
 }
 
+/*
+ * Convert an index returned by ARM_LPAE_PGD_IDX(), which can point into
+ * a concatenated PGD, into the maximum number of entries that can be
+ * mapped in the same table page.
+ */
+static inline int arm_lpae_max_entries(int i, struct arm_lpae_io_pgtable *data)
+{
+	int ptes_per_table = ARM_LPAE_PTES_PER_TABLE(data);
+
+	return ptes_per_table - (i & (ptes_per_table - 1));
+}
+
 static bool selftest_running = false;
 
 static dma_addr_t __arm_lpae_dma_addr(void *pages)
···
 
 	/* If we can install a leaf entry at this level, then do so */
 	if (size == block_size) {
-		max_entries = ARM_LPAE_PTES_PER_TABLE(data) - map_idx_start;
+		max_entries = arm_lpae_max_entries(map_idx_start, data);
 		num_entries = min_t(int, pgcount, max_entries);
 		ret = arm_lpae_init_pte(data, iova, paddr, prot, lvl, num_entries, ptep);
 		if (!ret)
···
 	kfree(data);
 }
 
-static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
-				       struct iommu_iotlb_gather *gather,
-				       unsigned long iova, size_t size,
-				       arm_lpae_iopte blk_pte, int lvl,
-				       arm_lpae_iopte *ptep, size_t pgcount)
-{
-	struct io_pgtable_cfg *cfg = &data->iop.cfg;
-	arm_lpae_iopte pte, *tablep;
-	phys_addr_t blk_paddr;
-	size_t tablesz = ARM_LPAE_GRANULE(data);
-	size_t split_sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
-	int ptes_per_table = ARM_LPAE_PTES_PER_TABLE(data);
-	int i, unmap_idx_start = -1, num_entries = 0, max_entries;
-
-	if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
-		return 0;
-
-	tablep = __arm_lpae_alloc_pages(tablesz, GFP_ATOMIC, cfg, data->iop.cookie);
-	if (!tablep)
-		return 0; /* Bytes unmapped */
-
-	if (size == split_sz) {
-		unmap_idx_start = ARM_LPAE_LVL_IDX(iova, lvl, data);
-		max_entries = ptes_per_table - unmap_idx_start;
-		num_entries = min_t(int, pgcount, max_entries);
-	}
-
-	blk_paddr = iopte_to_paddr(blk_pte, data);
-	pte = iopte_prot(blk_pte);
-
-	for (i = 0; i < ptes_per_table; i++, blk_paddr += split_sz) {
-		/* Unmap! */
-		if (i >= unmap_idx_start && i < (unmap_idx_start + num_entries))
-			continue;
-
-		__arm_lpae_init_pte(data, blk_paddr, pte, lvl, 1, &tablep[i]);
-	}
-
-	pte = arm_lpae_install_table(tablep, ptep, blk_pte, data);
-	if (pte != blk_pte) {
-		__arm_lpae_free_pages(tablep, tablesz, cfg, data->iop.cookie);
-		/*
-		 * We may race against someone unmapping another part of this
-		 * block, but anything else is invalid. We can't misinterpret
-		 * a page entry here since we're never at the last level.
-		 */
-		if (iopte_type(pte) != ARM_LPAE_PTE_TYPE_TABLE)
-			return 0;
-
-		tablep = iopte_deref(pte, data);
-	} else if (unmap_idx_start >= 0) {
-		for (i = 0; i < num_entries; i++)
-			io_pgtable_tlb_add_page(&data->iop, gather, iova + i * size, size);
-
-		return num_entries * size;
-	}
-
-	return __arm_lpae_unmap(data, gather, iova, size, pgcount, lvl, tablep);
-}
-
 static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 			       struct iommu_iotlb_gather *gather,
 			       unsigned long iova, size_t size, size_t pgcount,
···
 
 	/* If the size matches this level, we're in the right place */
 	if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) {
-		max_entries = ARM_LPAE_PTES_PER_TABLE(data) - unmap_idx_start;
+		max_entries = arm_lpae_max_entries(unmap_idx_start, data);
 		num_entries = min_t(int, pgcount, max_entries);
 
 		/* Find and handle non-leaf entries */
···
 
 		return i * size;
 	} else if (iopte_leaf(pte, lvl, iop->fmt)) {
-		/*
-		 * Insert a table at the next level to map the old region,
-		 * minus the part we want to unmap
-		 */
-		return arm_lpae_split_blk_unmap(data, gather, iova, size, pte,
-						lvl + 1, ptep, pgcount);
+		WARN_ONCE(true, "Unmap of a partial large IOPTE is not allowed");
+		return 0;
 	}
 
 	/* Keep on walkin' */
···
 			iova += SZ_1G;
 		}
 
-		/* Partial unmap */
-		size = 1UL << __ffs(cfg->pgsize_bitmap);
-		if (ops->unmap_pages(ops, SZ_1G + size, size, 1, NULL) != size)
-			return __FAIL(ops, i);
-
-		/* Remap of partial unmap */
-		if (ops->map_pages(ops, SZ_1G + size, size, size, 1,
-				   IOMMU_READ, GFP_KERNEL, &mapped))
-			return __FAIL(ops, i);
-
-		if (ops->iova_to_phys(ops, SZ_1G + size + 42) != (size + 42))
-			return __FAIL(ops, i);
-
 		/* Full unmap */
 		iova = 0;
 		for_each_set_bit(j, &cfg->pgsize_bitmap, BITS_PER_LONG) {
···
 
 			iova += SZ_1G;
 		}
+
+		/*
+		 * Map/unmap the last largest supported page of the IAS, this can
+		 * trigger corner cases in the concatednated page tables.
+		 */
+		mapped = 0;
+		size = 1UL << __fls(cfg->pgsize_bitmap);
+		iova = (1UL << cfg->ias) - size;
+		if (ops->map_pages(ops, iova, iova, size, 1,
+				   IOMMU_READ | IOMMU_WRITE |
+				   IOMMU_NOEXEC | IOMMU_CACHE,
+				   GFP_KERNEL, &mapped))
+			return __FAIL(ops, i);
+		if (mapped != size)
+			return __FAIL(ops, i);
+		if (ops->unmap_pages(ops, iova, size, 1, NULL) != size)
+			return __FAIL(ops, i);
 
 		free_io_pgtable_ops(ops);
 	}
+1 -1
drivers/iommu/iommu-sysfs.c
···
 	kfree(dev);
 }
 
-static struct class iommu_class = {
+static const struct class iommu_class = {
 	.name = "iommu",
 	.dev_release = release_device,
 	.dev_groups = dev_groups,
+136 -137
drivers/iommu/iommu.c
···
 #include <trace/events/iommu.h>
 #include <linux/sched/mm.h>
 #include <linux/msi.h>
+#include <uapi/linux/iommufd.h>
 
 #include "dma-iommu.h"
 #include "iommu-priv.h"
···
 #define IOMMU_CMD_LINE_DMA_API		BIT(0)
 #define IOMMU_CMD_LINE_STRICT		BIT(1)
 
+static int bus_iommu_probe(const struct bus_type *bus);
 static int iommu_bus_notifier(struct notifier_block *nb,
 			      unsigned long action, void *data);
 static void iommu_release_device(struct device *dev);
-static struct iommu_domain *
-__iommu_group_domain_alloc(struct iommu_group *group, unsigned int type);
 static int __iommu_attach_device(struct iommu_domain *domain,
 				 struct device *dev);
 static int __iommu_attach_group(struct iommu_domain *domain,
 				struct iommu_group *group);
+static struct iommu_domain *__iommu_paging_domain_alloc_flags(struct device *dev,
+						       unsigned int type,
+						       unsigned int flags);
 
 enum {
 	IOMMU_SET_DOMAIN_MUST_SUCCEED = 1 << 0,
···
 				 struct device *dev);
 static void __iommu_group_free_device(struct iommu_group *group,
 				      struct group_device *grp_dev);
+static void iommu_domain_init(struct iommu_domain *domain, unsigned int type,
+			      const struct iommu_ops *ops);
 
 #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)		\
 struct iommu_group_attribute iommu_group_attr_##_name =		\
···
 		}
 
 	}
-
-	if (!list_empty(&mappings) && iommu_is_dma_domain(domain))
-		iommu_flush_iotlb_all(domain);
-
 out:
 	iommu_put_resv_regions(dev, &mappings);
 
···
 }
 EXPORT_SYMBOL_GPL(fsl_mc_device_group);
 
+static struct iommu_domain *__iommu_alloc_identity_domain(struct device *dev)
+{
+	const struct iommu_ops *ops = dev_iommu_ops(dev);
+	struct iommu_domain *domain;
+
+	if (ops->identity_domain)
+		return ops->identity_domain;
+
+	/* Older drivers create the identity domain via ops->domain_alloc() */
+	if (!ops->domain_alloc)
+		return ERR_PTR(-EOPNOTSUPP);
+
+	domain = ops->domain_alloc(IOMMU_DOMAIN_IDENTITY);
+	if (IS_ERR(domain))
+		return domain;
+	if (!domain)
+		return ERR_PTR(-ENOMEM);
+
+	iommu_domain_init(domain, IOMMU_DOMAIN_IDENTITY, ops);
+	return domain;
+}
+
 static struct iommu_domain *
 __iommu_group_alloc_default_domain(struct iommu_group *group, int req_type)
 {
+	struct device *dev = iommu_group_first_dev(group);
+	struct iommu_domain *dom;
+
 	if (group->default_domain && group->default_domain->type == req_type)
 		return group->default_domain;
-	return __iommu_group_domain_alloc(group, req_type);
+
+	/*
+	 * When allocating the DMA API domain assume that the driver is going to
+	 * use PASID and make sure the RID's domain is PASID compatible.
+	 */
+	if (req_type & __IOMMU_DOMAIN_PAGING) {
+		dom = __iommu_paging_domain_alloc_flags(dev, req_type,
+			   dev->iommu->max_pasids ? IOMMU_HWPT_ALLOC_PASID : 0);
+
+		/*
+		 * If driver does not support PASID feature then
+		 * try to allocate non-PASID domain
+		 */
+		if (PTR_ERR(dom) == -EOPNOTSUPP)
+			dom = __iommu_paging_domain_alloc_flags(dev, req_type, 0);
+
+		return dom;
+	}
+
+	if (req_type == IOMMU_DOMAIN_IDENTITY)
+		return __iommu_alloc_identity_domain(dev);
+
+	return ERR_PTR(-EINVAL);
 }
 
 /*
···
 		ops->probe_finalize(dev);
 }
 
-int bus_iommu_probe(const struct bus_type *bus)
+static int bus_iommu_probe(const struct bus_type *bus)
 {
 	struct iommu_group *group, *next;
 	LIST_HEAD(group_list);
···
 
 	return 0;
 }
-
-/**
- * iommu_present() - make platform-specific assumptions about an IOMMU
- * @bus: bus to check
- *
- * Do not use this function. You want device_iommu_mapped() instead.
- *
- * Return: true if some IOMMU is present and aware of devices on the given bus;
- * in general it may not be the only IOMMU, and it may not have anything to do
- * with whatever device you are ultimately interested in.
- */
-bool iommu_present(const struct bus_type *bus)
-{
-	bool ret = false;
-
-	for (int i = 0; i < ARRAY_SIZE(iommu_buses); i++) {
-		if (iommu_buses[i] == bus) {
-			spin_lock(&iommu_device_lock);
-			ret = !list_empty(&iommu_device_list);
-			spin_unlock(&iommu_device_lock);
-		}
-	}
-	return ret;
-}
-EXPORT_SYMBOL_GPL(iommu_present);
 
 /**
  * device_iommu_capable() - check for a general IOMMU capability
···
 }
 EXPORT_SYMBOL_GPL(iommu_set_fault_handler);
 
-static struct iommu_domain *__iommu_domain_alloc(const struct iommu_ops *ops,
-						 struct device *dev,
-						 unsigned int type)
+static void iommu_domain_init(struct iommu_domain *domain, unsigned int type,
+			      const struct iommu_ops *ops)
 {
-	struct iommu_domain *domain;
-	unsigned int alloc_type = type & IOMMU_DOMAIN_ALLOC_FLAGS;
-
-	if (alloc_type == IOMMU_DOMAIN_IDENTITY && ops->identity_domain)
-		return ops->identity_domain;
-	else if (alloc_type == IOMMU_DOMAIN_BLOCKED && ops->blocked_domain)
-		return ops->blocked_domain;
-	else if (type & __IOMMU_DOMAIN_PAGING && ops->domain_alloc_paging)
-		domain = ops->domain_alloc_paging(dev);
-	else if (ops->domain_alloc)
-		domain = ops->domain_alloc(alloc_type);
-	else
-		return ERR_PTR(-EOPNOTSUPP);
-
-	/*
-	 * Many domain_alloc ops now return ERR_PTR, make things easier for the
-	 * driver by accepting ERR_PTR from all domain_alloc ops instead of
-	 * having two rules.
-	 */
-	if (IS_ERR(domain))
-		return domain;
-	if (!domain)
-		return ERR_PTR(-ENOMEM);
-
 	domain->type = type;
 	domain->owner = ops;
+	if (!domain->ops)
+		domain->ops = ops->default_domain_ops;
+
 	/*
 	 * If not already set, assume all sizes by default; the driver
 	 * may override this later
 	 */
 	if (!domain->pgsize_bitmap)
 		domain->pgsize_bitmap = ops->pgsize_bitmap;
-
-	if (!domain->ops)
-		domain->ops = ops->default_domain_ops;
-
-	if (iommu_is_dma_domain(domain)) {
-		int rc;
-
-		rc = iommu_get_dma_cookie(domain);
-		if (rc) {
-			iommu_domain_free(domain);
-			return ERR_PTR(rc);
-		}
-	}
-	return domain;
 }
 
 static struct iommu_domain *
-__iommu_group_domain_alloc(struct iommu_group *group, unsigned int type)
+__iommu_paging_domain_alloc_flags(struct device *dev, unsigned int type,
+				  unsigned int flags)
 {
-	struct device *dev = iommu_group_first_dev(group);
-
-	return __iommu_domain_alloc(dev_iommu_ops(dev), dev, type);
-}
-
-static int __iommu_domain_alloc_dev(struct device *dev, void *data)
-{
-	const struct iommu_ops **ops = data;
-
-	if (!dev_has_iommu(dev))
-		return 0;
-
-	if (WARN_ONCE(*ops && *ops != dev_iommu_ops(dev),
-		      "Multiple IOMMU drivers present for bus %s, which the public IOMMU API can't fully support yet. You will still need to disable one or more for this to work, sorry!\n",
-		      dev_bus_name(dev)))
-		return -EBUSY;
-
-	*ops = dev_iommu_ops(dev);
-	return 0;
-}
-
-/*
- * The iommu ops in bus has been retired. Do not use this interface in
- * new drivers.
- */
-struct iommu_domain *iommu_domain_alloc(const struct bus_type *bus)
-{
-	const struct iommu_ops *ops = NULL;
-	int err = bus_for_each_dev(bus, NULL, &ops, __iommu_domain_alloc_dev);
+	const struct iommu_ops *ops;
 	struct iommu_domain *domain;
 
-	if (err || !ops)
-		return NULL;
-
-	domain = __iommu_domain_alloc(ops, NULL, IOMMU_DOMAIN_UNMANAGED);
-	if (IS_ERR(domain))
-		return NULL;
-	return domain;
-}
-EXPORT_SYMBOL_GPL(iommu_domain_alloc);
-
-/**
- * iommu_paging_domain_alloc() - Allocate a paging domain
- * @dev: device for which the domain is allocated
- *
- * Allocate a paging domain which will be managed by a kernel driver. Return
- * allocated domain if successful, or a ERR pointer for failure.
- */
-struct iommu_domain *iommu_paging_domain_alloc(struct device *dev)
-{
 	if (!dev_has_iommu(dev))
 		return ERR_PTR(-ENODEV);
 
-	return __iommu_domain_alloc(dev_iommu_ops(dev), dev, IOMMU_DOMAIN_UNMANAGED);
+	ops = dev_iommu_ops(dev);
+
+	if (ops->domain_alloc_paging && !flags)
+		domain = ops->domain_alloc_paging(dev);
+	else if (ops->domain_alloc_user)
+		domain = ops->domain_alloc_user(dev, flags, NULL, NULL);
+	else if (ops->domain_alloc && !flags)
+		domain = ops->domain_alloc(IOMMU_DOMAIN_UNMANAGED);
+	else
+		return ERR_PTR(-EOPNOTSUPP);
+
+	if (IS_ERR(domain))
+		return domain;
+	if (!domain)
+		return ERR_PTR(-ENOMEM);
+
+	iommu_domain_init(domain, type, ops);
+	return domain;
 }
-EXPORT_SYMBOL_GPL(iommu_paging_domain_alloc);
+
+/**
+ * iommu_paging_domain_alloc_flags() - Allocate a paging domain
+ * @dev: device for which the domain is allocated
+ * @flags: Bitmap of iommufd_hwpt_alloc_flags
+ *
+ * Allocate a paging domain which will be managed by a kernel driver. Return
+ * allocated domain if successful, or an ERR pointer for failure.
+ */
+struct iommu_domain *iommu_paging_domain_alloc_flags(struct device *dev,
+						     unsigned int flags)
+{
+	return __iommu_paging_domain_alloc_flags(dev,
+					IOMMU_DOMAIN_UNMANAGED, flags);
+}
+EXPORT_SYMBOL_GPL(iommu_paging_domain_alloc_flags);
 
 void iommu_domain_free(struct iommu_domain *domain)
 {
···
 
 /**
  * iommu_group_replace_domain - replace the domain that a group is attached to
- * @new_domain: new IOMMU domain to replace with
  * @group: IOMMU group that will be attached to the new domain
+ * @new_domain: new IOMMU domain to replace with
  *
  * This API allows the group to switch domains without being forced to go to
  * the blocking domain in-between.
···
 	return unmapped;
 }
 
+/**
+ * iommu_unmap() - Remove mappings from a range of IOVA
+ * @domain: Domain to manipulate
+ * @iova: IO virtual address to start
+ * @size: Length of the range starting from @iova
+ *
+ * iommu_unmap() will remove a translation created by iommu_map(). It cannot
+ * subdivide a mapping created by iommu_map(), so it should be called with IOVA
+ * ranges that match what was passed to iommu_map(). The range can aggregate
+ * contiguous iommu_map() calls so long as no individual range is split.
+ *
+ * Returns: Number of bytes of IOVA unmapped. iova + res will be the point
+ * unmapping stopped.
+ */
 size_t iommu_unmap(struct iommu_domain *domain,
 		   unsigned long iova, size_t size)
 {
···
 	if (group->default_domain == dom)
 		return 0;
 
+	if (iommu_is_dma_domain(dom)) {
+		ret = iommu_get_dma_cookie(dom);
+		if (ret) {
+			iommu_domain_free(dom);
+			return ret;
+		}
+	}
+
 	/*
 	 * IOMMU_RESV_DIRECT and IOMMU_RESV_DIRECT_RELAXABLE regions must be
 	 * mapped before their device is attached, in order to guarantee
···
 
 static int __iommu_group_alloc_blocking_domain(struct iommu_group *group)
 {
+	struct device *dev = iommu_group_first_dev(group);
+	const struct iommu_ops *ops = dev_iommu_ops(dev);
 	struct iommu_domain *domain;
 
 	if (group->blocking_domain)
 		return 0;
 
-	domain = __iommu_group_domain_alloc(group, IOMMU_DOMAIN_BLOCKED);
-	if (IS_ERR(domain)) {
-		/*
-		 * For drivers that do not yet understand IOMMU_DOMAIN_BLOCKED
-		 * create an empty domain instead.
-		 */
-		domain = __iommu_group_domain_alloc(group,
-						    IOMMU_DOMAIN_UNMANAGED);
-		if (IS_ERR(domain))
-			return PTR_ERR(domain);
+	if (ops->blocked_domain) {
+		group->blocking_domain = ops->blocked_domain;
+		return 0;
 	}
+
+	/*
+	 * For drivers that do not yet understand IOMMU_DOMAIN_BLOCKED create an
+	 * empty PAGING domain instead.
+	 */
+	domain = iommu_paging_domain_alloc(dev);
+	if (IS_ERR(domain))
+		return PTR_ERR(domain);
 	group->blocking_domain = domain;
 	return 0;
 }
···
 	int ret;
 
 	for_each_group_device(group, device) {
-		ret = domain->ops->set_dev_pasid(domain, device->dev, pasid);
+		ret = domain->ops->set_dev_pasid(domain, device->dev,
+						 pasid, NULL);
 		if (ret)
 			goto err_revert;
 	}
+2 -1
drivers/iommu/iommufd/hw_pagetable.c
···
 			    const struct iommu_user_data *user_data)
 {
 	const u32 valid_flags = IOMMU_HWPT_ALLOC_NEST_PARENT |
-				IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
+				IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
+				IOMMU_HWPT_FAULT_ID_VALID;
 	const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
 	struct iommufd_hwpt_paging *hwpt_paging;
 	struct iommufd_hw_pagetable *hwpt;
+1 -1
drivers/iommu/iova.c
···
  * reserve_iova - reserves an iova in the given range
  * @iovad: - iova domain pointer
  * @pfn_lo: - lower page frame address
- * @pfn_hi:- higher pfn adderss
+ * @pfn_hi:- higher pfn address
  * This function allocates reserves the address range from pfn_lo to pfn_hi so
  * that this address is not dished out as part of alloc_iova.
  */
+1 -1
drivers/iommu/mtk_iommu.c
···
 static const struct mtk_iommu_plat_data mt8186_data_mm = {
 	.m4u_plat = M4U_MT8186,
 	.flags = HAS_BCLK | HAS_SUB_COMM_2BITS | OUT_ORDER_WR_EN |
-		  WR_THROT_EN | IOVA_34_EN | MTK_IOMMU_TYPE_MM,
+		  WR_THROT_EN | IOVA_34_EN | MTK_IOMMU_TYPE_MM | PGTABLE_PA_35_EN,
 	.larbid_remap = {{0}, {1, MTK_INVALID_LARBID, 8}, {4}, {7}, {2}, {9, 11, 19, 20},
 			 {MTK_INVALID_LARBID, 14, 16},
 			 {MTK_INVALID_LARBID, 13, MTK_INVALID_LARBID, 17}},
+16 -10
drivers/iommu/omap-iommu.c
···
 		if (err)
 			return err;
 
-		err = iommu_device_register(&obj->iommu, &omap_iommu_ops, &pdev->dev);
-		if (err)
-			goto out_sysfs;
 		obj->has_iommu_driver = true;
 	}
+
+	err = iommu_device_register(&obj->iommu, &omap_iommu_ops, &pdev->dev);
+	if (err)
+		goto out_sysfs;
 
 	pm_runtime_enable(obj->dev);
 
···
 
 	dev_info(&pdev->dev, "%s registered\n", obj->name);
 
-	/* Re-probe bus to probe device attached to this IOMMU */
-	bus_iommu_probe(&platform_bus_type);
-
 	return 0;
 
 out_sysfs:
-	iommu_device_sysfs_remove(&obj->iommu);
+	if (obj->has_iommu_driver)
+		iommu_device_sysfs_remove(&obj->iommu);
 	return err;
 }
 
···
 {
 	struct omap_iommu *obj = platform_get_drvdata(pdev);
 
-	if (obj->has_iommu_driver) {
+	if (obj->has_iommu_driver)
 		iommu_device_sysfs_remove(&obj->iommu);
-		iommu_device_unregister(&obj->iommu);
-	}
+
+	iommu_device_unregister(&obj->iommu);
 
 	omap_iommu_debugfs_remove(obj);
 
···
 
 }
 
+static int omap_iommu_of_xlate(struct device *dev, const struct of_phandle_args *args)
+{
+	/* TODO: collect args->np to save re-parsing in probe above */
+	return 0;
+}
+
 static const struct iommu_ops omap_iommu_ops = {
 	.identity_domain = &omap_iommu_identity_domain,
 	.domain_alloc_paging = omap_iommu_domain_alloc_paging,
 	.probe_device = omap_iommu_probe_device,
 	.release_device = omap_iommu_release_device,
 	.device_group = generic_single_device_group,
+	.of_xlate = omap_iommu_of_xlate,
 	.pgsize_bitmap = OMAP_IOMMU_PGSIZES,
 	.default_domain_ops = &(const struct iommu_domain_ops) {
 		.attach_dev = omap_iommu_attach_dev,
drivers/iommu/riscv/Kconfig (new file, +20 lines)
```diff
+# SPDX-License-Identifier: GPL-2.0-only
+# RISC-V IOMMU support
+
+config RISCV_IOMMU
+	bool "RISC-V IOMMU Support"
+	depends on RISCV && 64BIT
+	default y
+	select IOMMU_API
+	help
+	  Support for implementations of the RISC-V IOMMU architecture that
+	  complements the RISC-V MMU capabilities, providing similar address
+	  translation and protection functions for accesses from I/O devices.
+
+	  Say Y here if your SoC includes an IOMMU device implementing
+	  the RISC-V IOMMU architecture.
+
+config RISCV_IOMMU_PCI
+	def_bool y if RISCV_IOMMU && PCI_MSI
+	help
+	  Support for the PCIe implementation of RISC-V IOMMU architecture.
```
drivers/iommu/riscv/Makefile (new file, +3 lines)
```diff
+# SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_RISCV_IOMMU) += iommu.o iommu-platform.o
+obj-$(CONFIG_RISCV_IOMMU_PCI) += iommu-pci.o
```
drivers/iommu/riscv/iommu-bits.h (new file, +784 lines)
··· 1 + /* SPDX-License-Identifier: GPL-2.0-only */ 2 + /* 3 + * Copyright © 2022-2024 Rivos Inc. 4 + * Copyright © 2023 FORTH-ICS/CARV 5 + * Copyright © 2023 RISC-V IOMMU Task Group 6 + * 7 + * RISC-V IOMMU - Register Layout and Data Structures. 8 + * 9 + * Based on the 'RISC-V IOMMU Architecture Specification', Version 1.0 10 + * Published at https://github.com/riscv-non-isa/riscv-iommu 11 + * 12 + */ 13 + 14 + #ifndef _RISCV_IOMMU_BITS_H_ 15 + #define _RISCV_IOMMU_BITS_H_ 16 + 17 + #include <linux/types.h> 18 + #include <linux/bitfield.h> 19 + #include <linux/bits.h> 20 + 21 + /* 22 + * Chapter 5: Memory Mapped register interface 23 + */ 24 + 25 + /* Common field positions */ 26 + #define RISCV_IOMMU_PPN_FIELD GENMASK_ULL(53, 10) 27 + #define RISCV_IOMMU_QUEUE_LOG2SZ_FIELD GENMASK_ULL(4, 0) 28 + #define RISCV_IOMMU_QUEUE_INDEX_FIELD GENMASK_ULL(31, 0) 29 + #define RISCV_IOMMU_QUEUE_ENABLE BIT(0) 30 + #define RISCV_IOMMU_QUEUE_INTR_ENABLE BIT(1) 31 + #define RISCV_IOMMU_QUEUE_MEM_FAULT BIT(8) 32 + #define RISCV_IOMMU_QUEUE_OVERFLOW BIT(9) 33 + #define RISCV_IOMMU_QUEUE_ACTIVE BIT(16) 34 + #define RISCV_IOMMU_QUEUE_BUSY BIT(17) 35 + 36 + #define RISCV_IOMMU_ATP_PPN_FIELD GENMASK_ULL(43, 0) 37 + #define RISCV_IOMMU_ATP_MODE_FIELD GENMASK_ULL(63, 60) 38 + 39 + /* 5.3 IOMMU Capabilities (64bits) */ 40 + #define RISCV_IOMMU_REG_CAPABILITIES 0x0000 41 + #define RISCV_IOMMU_CAPABILITIES_VERSION GENMASK_ULL(7, 0) 42 + #define RISCV_IOMMU_CAPABILITIES_SV32 BIT_ULL(8) 43 + #define RISCV_IOMMU_CAPABILITIES_SV39 BIT_ULL(9) 44 + #define RISCV_IOMMU_CAPABILITIES_SV48 BIT_ULL(10) 45 + #define RISCV_IOMMU_CAPABILITIES_SV57 BIT_ULL(11) 46 + #define RISCV_IOMMU_CAPABILITIES_SVPBMT BIT_ULL(15) 47 + #define RISCV_IOMMU_CAPABILITIES_SV32X4 BIT_ULL(16) 48 + #define RISCV_IOMMU_CAPABILITIES_SV39X4 BIT_ULL(17) 49 + #define RISCV_IOMMU_CAPABILITIES_SV48X4 BIT_ULL(18) 50 + #define RISCV_IOMMU_CAPABILITIES_SV57X4 BIT_ULL(19) 51 + #define RISCV_IOMMU_CAPABILITIES_AMO_MRIF BIT_ULL(21) 52 + 
#define RISCV_IOMMU_CAPABILITIES_MSI_FLAT BIT_ULL(22) 53 + #define RISCV_IOMMU_CAPABILITIES_MSI_MRIF BIT_ULL(23) 54 + #define RISCV_IOMMU_CAPABILITIES_AMO_HWAD BIT_ULL(24) 55 + #define RISCV_IOMMU_CAPABILITIES_ATS BIT_ULL(25) 56 + #define RISCV_IOMMU_CAPABILITIES_T2GPA BIT_ULL(26) 57 + #define RISCV_IOMMU_CAPABILITIES_END BIT_ULL(27) 58 + #define RISCV_IOMMU_CAPABILITIES_IGS GENMASK_ULL(29, 28) 59 + #define RISCV_IOMMU_CAPABILITIES_HPM BIT_ULL(30) 60 + #define RISCV_IOMMU_CAPABILITIES_DBG BIT_ULL(31) 61 + #define RISCV_IOMMU_CAPABILITIES_PAS GENMASK_ULL(37, 32) 62 + #define RISCV_IOMMU_CAPABILITIES_PD8 BIT_ULL(38) 63 + #define RISCV_IOMMU_CAPABILITIES_PD17 BIT_ULL(39) 64 + #define RISCV_IOMMU_CAPABILITIES_PD20 BIT_ULL(40) 65 + 66 + /** 67 + * enum riscv_iommu_igs_settings - Interrupt Generation Support Settings 68 + * @RISCV_IOMMU_CAPABILITIES_IGS_MSI: IOMMU supports only MSI generation 69 + * @RISCV_IOMMU_CAPABILITIES_IGS_WSI: IOMMU supports only Wired-Signaled interrupt 70 + * @RISCV_IOMMU_CAPABILITIES_IGS_BOTH: IOMMU supports both MSI and WSI generation 71 + * @RISCV_IOMMU_CAPABILITIES_IGS_RSRV: Reserved for standard use 72 + */ 73 + enum riscv_iommu_igs_settings { 74 + RISCV_IOMMU_CAPABILITIES_IGS_MSI = 0, 75 + RISCV_IOMMU_CAPABILITIES_IGS_WSI = 1, 76 + RISCV_IOMMU_CAPABILITIES_IGS_BOTH = 2, 77 + RISCV_IOMMU_CAPABILITIES_IGS_RSRV = 3 78 + }; 79 + 80 + /* 5.4 Features control register (32bits) */ 81 + #define RISCV_IOMMU_REG_FCTL 0x0008 82 + #define RISCV_IOMMU_FCTL_BE BIT(0) 83 + #define RISCV_IOMMU_FCTL_WSI BIT(1) 84 + #define RISCV_IOMMU_FCTL_GXL BIT(2) 85 + 86 + /* 5.5 Device-directory-table pointer (64bits) */ 87 + #define RISCV_IOMMU_REG_DDTP 0x0010 88 + #define RISCV_IOMMU_DDTP_IOMMU_MODE GENMASK_ULL(3, 0) 89 + #define RISCV_IOMMU_DDTP_BUSY BIT_ULL(4) 90 + #define RISCV_IOMMU_DDTP_PPN RISCV_IOMMU_PPN_FIELD 91 + 92 + /** 93 + * enum riscv_iommu_ddtp_modes - IOMMU translation modes 94 + * @RISCV_IOMMU_DDTP_IOMMU_MODE_OFF: No inbound transactions allowed 95 
+ * @RISCV_IOMMU_DDTP_IOMMU_MODE_BARE: Pass-through mode 96 + * @RISCV_IOMMU_DDTP_IOMMU_MODE_1LVL: One-level DDT 97 + * @RISCV_IOMMU_DDTP_IOMMU_MODE_2LVL: Two-level DDT 98 + * @RISCV_IOMMU_DDTP_IOMMU_MODE_3LVL: Three-level DDT 99 + * @RISCV_IOMMU_DDTP_IOMMU_MODE_MAX: Max value allowed by specification 100 + */ 101 + enum riscv_iommu_ddtp_modes { 102 + RISCV_IOMMU_DDTP_IOMMU_MODE_OFF = 0, 103 + RISCV_IOMMU_DDTP_IOMMU_MODE_BARE = 1, 104 + RISCV_IOMMU_DDTP_IOMMU_MODE_1LVL = 2, 105 + RISCV_IOMMU_DDTP_IOMMU_MODE_2LVL = 3, 106 + RISCV_IOMMU_DDTP_IOMMU_MODE_3LVL = 4, 107 + RISCV_IOMMU_DDTP_IOMMU_MODE_MAX = 4 108 + }; 109 + 110 + /* 5.6 Command Queue Base (64bits) */ 111 + #define RISCV_IOMMU_REG_CQB 0x0018 112 + #define RISCV_IOMMU_CQB_ENTRIES RISCV_IOMMU_QUEUE_LOG2SZ_FIELD 113 + #define RISCV_IOMMU_CQB_PPN RISCV_IOMMU_PPN_FIELD 114 + 115 + /* 5.7 Command Queue head (32bits) */ 116 + #define RISCV_IOMMU_REG_CQH 0x0020 117 + #define RISCV_IOMMU_CQH_INDEX RISCV_IOMMU_QUEUE_INDEX_FIELD 118 + 119 + /* 5.8 Command Queue tail (32bits) */ 120 + #define RISCV_IOMMU_REG_CQT 0x0024 121 + #define RISCV_IOMMU_CQT_INDEX RISCV_IOMMU_QUEUE_INDEX_FIELD 122 + 123 + /* 5.9 Fault Queue Base (64bits) */ 124 + #define RISCV_IOMMU_REG_FQB 0x0028 125 + #define RISCV_IOMMU_FQB_ENTRIES RISCV_IOMMU_QUEUE_LOG2SZ_FIELD 126 + #define RISCV_IOMMU_FQB_PPN RISCV_IOMMU_PPN_FIELD 127 + 128 + /* 5.10 Fault Queue Head (32bits) */ 129 + #define RISCV_IOMMU_REG_FQH 0x0030 130 + #define RISCV_IOMMU_FQH_INDEX RISCV_IOMMU_QUEUE_INDEX_FIELD 131 + 132 + /* 5.11 Fault Queue tail (32bits) */ 133 + #define RISCV_IOMMU_REG_FQT 0x0034 134 + #define RISCV_IOMMU_FQT_INDEX RISCV_IOMMU_QUEUE_INDEX_FIELD 135 + 136 + /* 5.12 Page Request Queue base (64bits) */ 137 + #define RISCV_IOMMU_REG_PQB 0x0038 138 + #define RISCV_IOMMU_PQB_ENTRIES RISCV_IOMMU_QUEUE_LOG2SZ_FIELD 139 + #define RISCV_IOMMU_PQB_PPN RISCV_IOMMU_PPN_FIELD 140 + 141 + /* 5.13 Page Request Queue head (32bits) */ 142 + #define RISCV_IOMMU_REG_PQH 0x0040 143 + 
#define RISCV_IOMMU_PQH_INDEX RISCV_IOMMU_QUEUE_INDEX_FIELD 144 + 145 + /* 5.14 Page Request Queue tail (32bits) */ 146 + #define RISCV_IOMMU_REG_PQT 0x0044 147 + #define RISCV_IOMMU_PQT_INDEX_MASK RISCV_IOMMU_QUEUE_INDEX_FIELD 148 + 149 + /* 5.15 Command Queue CSR (32bits) */ 150 + #define RISCV_IOMMU_REG_CQCSR 0x0048 151 + #define RISCV_IOMMU_CQCSR_CQEN RISCV_IOMMU_QUEUE_ENABLE 152 + #define RISCV_IOMMU_CQCSR_CIE RISCV_IOMMU_QUEUE_INTR_ENABLE 153 + #define RISCV_IOMMU_CQCSR_CQMF RISCV_IOMMU_QUEUE_MEM_FAULT 154 + #define RISCV_IOMMU_CQCSR_CMD_TO BIT(9) 155 + #define RISCV_IOMMU_CQCSR_CMD_ILL BIT(10) 156 + #define RISCV_IOMMU_CQCSR_FENCE_W_IP BIT(11) 157 + #define RISCV_IOMMU_CQCSR_CQON RISCV_IOMMU_QUEUE_ACTIVE 158 + #define RISCV_IOMMU_CQCSR_BUSY RISCV_IOMMU_QUEUE_BUSY 159 + 160 + /* 5.16 Fault Queue CSR (32bits) */ 161 + #define RISCV_IOMMU_REG_FQCSR 0x004C 162 + #define RISCV_IOMMU_FQCSR_FQEN RISCV_IOMMU_QUEUE_ENABLE 163 + #define RISCV_IOMMU_FQCSR_FIE RISCV_IOMMU_QUEUE_INTR_ENABLE 164 + #define RISCV_IOMMU_FQCSR_FQMF RISCV_IOMMU_QUEUE_MEM_FAULT 165 + #define RISCV_IOMMU_FQCSR_FQOF RISCV_IOMMU_QUEUE_OVERFLOW 166 + #define RISCV_IOMMU_FQCSR_FQON RISCV_IOMMU_QUEUE_ACTIVE 167 + #define RISCV_IOMMU_FQCSR_BUSY RISCV_IOMMU_QUEUE_BUSY 168 + 169 + /* 5.17 Page Request Queue CSR (32bits) */ 170 + #define RISCV_IOMMU_REG_PQCSR 0x0050 171 + #define RISCV_IOMMU_PQCSR_PQEN RISCV_IOMMU_QUEUE_ENABLE 172 + #define RISCV_IOMMU_PQCSR_PIE RISCV_IOMMU_QUEUE_INTR_ENABLE 173 + #define RISCV_IOMMU_PQCSR_PQMF RISCV_IOMMU_QUEUE_MEM_FAULT 174 + #define RISCV_IOMMU_PQCSR_PQOF RISCV_IOMMU_QUEUE_OVERFLOW 175 + #define RISCV_IOMMU_PQCSR_PQON RISCV_IOMMU_QUEUE_ACTIVE 176 + #define RISCV_IOMMU_PQCSR_BUSY RISCV_IOMMU_QUEUE_BUSY 177 + 178 + /* 5.18 Interrupt Pending Status (32bits) */ 179 + #define RISCV_IOMMU_REG_IPSR 0x0054 180 + 181 + #define RISCV_IOMMU_INTR_CQ 0 182 + #define RISCV_IOMMU_INTR_FQ 1 183 + #define RISCV_IOMMU_INTR_PM 2 184 + #define RISCV_IOMMU_INTR_PQ 3 185 + #define 
RISCV_IOMMU_INTR_COUNT 4 186 + 187 + #define RISCV_IOMMU_IPSR_CIP BIT(RISCV_IOMMU_INTR_CQ) 188 + #define RISCV_IOMMU_IPSR_FIP BIT(RISCV_IOMMU_INTR_FQ) 189 + #define RISCV_IOMMU_IPSR_PMIP BIT(RISCV_IOMMU_INTR_PM) 190 + #define RISCV_IOMMU_IPSR_PIP BIT(RISCV_IOMMU_INTR_PQ) 191 + 192 + /* 5.19 Performance monitoring counter overflow status (32bits) */ 193 + #define RISCV_IOMMU_REG_IOCOUNTOVF 0x0058 194 + #define RISCV_IOMMU_IOCOUNTOVF_CY BIT(0) 195 + #define RISCV_IOMMU_IOCOUNTOVF_HPM GENMASK_ULL(31, 1) 196 + 197 + /* 5.20 Performance monitoring counter inhibits (32bits) */ 198 + #define RISCV_IOMMU_REG_IOCOUNTINH 0x005C 199 + #define RISCV_IOMMU_IOCOUNTINH_CY BIT(0) 200 + #define RISCV_IOMMU_IOCOUNTINH_HPM GENMASK(31, 1) 201 + 202 + /* 5.21 Performance monitoring cycles counter (64bits) */ 203 + #define RISCV_IOMMU_REG_IOHPMCYCLES 0x0060 204 + #define RISCV_IOMMU_IOHPMCYCLES_COUNTER GENMASK_ULL(62, 0) 205 + #define RISCV_IOMMU_IOHPMCYCLES_OF BIT_ULL(63) 206 + 207 + /* 5.22 Performance monitoring event counters (31 * 64bits) */ 208 + #define RISCV_IOMMU_REG_IOHPMCTR_BASE 0x0068 209 + #define RISCV_IOMMU_REG_IOHPMCTR(_n) (RISCV_IOMMU_REG_IOHPMCTR_BASE + ((_n) * 0x8)) 210 + 211 + /* 5.23 Performance monitoring event selectors (31 * 64bits) */ 212 + #define RISCV_IOMMU_REG_IOHPMEVT_BASE 0x0160 213 + #define RISCV_IOMMU_REG_IOHPMEVT(_n) (RISCV_IOMMU_REG_IOHPMEVT_BASE + ((_n) * 0x8)) 214 + #define RISCV_IOMMU_IOHPMEVT_EVENTID GENMASK_ULL(14, 0) 215 + #define RISCV_IOMMU_IOHPMEVT_DMASK BIT_ULL(15) 216 + #define RISCV_IOMMU_IOHPMEVT_PID_PSCID GENMASK_ULL(35, 16) 217 + #define RISCV_IOMMU_IOHPMEVT_DID_GSCID GENMASK_ULL(59, 36) 218 + #define RISCV_IOMMU_IOHPMEVT_PV_PSCV BIT_ULL(60) 219 + #define RISCV_IOMMU_IOHPMEVT_DV_GSCV BIT_ULL(61) 220 + #define RISCV_IOMMU_IOHPMEVT_IDT BIT_ULL(62) 221 + #define RISCV_IOMMU_IOHPMEVT_OF BIT_ULL(63) 222 + 223 + /* Number of defined performance-monitoring event selectors */ 224 + #define RISCV_IOMMU_IOHPMEVT_CNT 31 225 + 226 + /** 227 + * 
enum riscv_iommu_hpmevent_id - Performance-monitoring event identifier 228 + * 229 + * @RISCV_IOMMU_HPMEVENT_INVALID: Invalid event, do not count 230 + * @RISCV_IOMMU_HPMEVENT_URQ: Untranslated requests 231 + * @RISCV_IOMMU_HPMEVENT_TRQ: Translated requests 232 + * @RISCV_IOMMU_HPMEVENT_ATS_RQ: ATS translation requests 233 + * @RISCV_IOMMU_HPMEVENT_TLB_MISS: TLB misses 234 + * @RISCV_IOMMU_HPMEVENT_DD_WALK: Device directory walks 235 + * @RISCV_IOMMU_HPMEVENT_PD_WALK: Process directory walks 236 + * @RISCV_IOMMU_HPMEVENT_S_VS_WALKS: First-stage page table walks 237 + * @RISCV_IOMMU_HPMEVENT_G_WALKS: Second-stage page table walks 238 + * @RISCV_IOMMU_HPMEVENT_MAX: Value to denote maximum Event IDs 239 + */ 240 + enum riscv_iommu_hpmevent_id { 241 + RISCV_IOMMU_HPMEVENT_INVALID = 0, 242 + RISCV_IOMMU_HPMEVENT_URQ = 1, 243 + RISCV_IOMMU_HPMEVENT_TRQ = 2, 244 + RISCV_IOMMU_HPMEVENT_ATS_RQ = 3, 245 + RISCV_IOMMU_HPMEVENT_TLB_MISS = 4, 246 + RISCV_IOMMU_HPMEVENT_DD_WALK = 5, 247 + RISCV_IOMMU_HPMEVENT_PD_WALK = 6, 248 + RISCV_IOMMU_HPMEVENT_S_VS_WALKS = 7, 249 + RISCV_IOMMU_HPMEVENT_G_WALKS = 8, 250 + RISCV_IOMMU_HPMEVENT_MAX = 9 251 + }; 252 + 253 + /* 5.24 Translation request IOVA (64bits) */ 254 + #define RISCV_IOMMU_REG_TR_REQ_IOVA 0x0258 255 + #define RISCV_IOMMU_TR_REQ_IOVA_VPN GENMASK_ULL(63, 12) 256 + 257 + /* 5.25 Translation request control (64bits) */ 258 + #define RISCV_IOMMU_REG_TR_REQ_CTL 0x0260 259 + #define RISCV_IOMMU_TR_REQ_CTL_GO_BUSY BIT_ULL(0) 260 + #define RISCV_IOMMU_TR_REQ_CTL_PRIV BIT_ULL(1) 261 + #define RISCV_IOMMU_TR_REQ_CTL_EXE BIT_ULL(2) 262 + #define RISCV_IOMMU_TR_REQ_CTL_NW BIT_ULL(3) 263 + #define RISCV_IOMMU_TR_REQ_CTL_PID GENMASK_ULL(31, 12) 264 + #define RISCV_IOMMU_TR_REQ_CTL_PV BIT_ULL(32) 265 + #define RISCV_IOMMU_TR_REQ_CTL_DID GENMASK_ULL(63, 40) 266 + 267 + /* 5.26 Translation request response (64bits) */ 268 + #define RISCV_IOMMU_REG_TR_RESPONSE 0x0268 269 + #define RISCV_IOMMU_TR_RESPONSE_FAULT BIT_ULL(0) 270 + #define 
RISCV_IOMMU_TR_RESPONSE_PBMT GENMASK_ULL(8, 7) 271 + #define RISCV_IOMMU_TR_RESPONSE_SZ BIT_ULL(9) 272 + #define RISCV_IOMMU_TR_RESPONSE_PPN RISCV_IOMMU_PPN_FIELD 273 + 274 + /* 5.27 Interrupt cause to vector (64bits) */ 275 + #define RISCV_IOMMU_REG_ICVEC 0x02F8 276 + #define RISCV_IOMMU_ICVEC_CIV GENMASK_ULL(3, 0) 277 + #define RISCV_IOMMU_ICVEC_FIV GENMASK_ULL(7, 4) 278 + #define RISCV_IOMMU_ICVEC_PMIV GENMASK_ULL(11, 8) 279 + #define RISCV_IOMMU_ICVEC_PIV GENMASK_ULL(15, 12) 280 + 281 + /* 5.28 MSI Configuration table (32 * 64bits) */ 282 + #define RISCV_IOMMU_REG_MSI_CFG_TBL 0x0300 283 + #define RISCV_IOMMU_REG_MSI_CFG_TBL_ADDR(_n) \ 284 + (RISCV_IOMMU_REG_MSI_CFG_TBL + ((_n) * 0x10)) 285 + #define RISCV_IOMMU_MSI_CFG_TBL_ADDR GENMASK_ULL(55, 2) 286 + #define RISCV_IOMMU_REG_MSI_CFG_TBL_DATA(_n) \ 287 + (RISCV_IOMMU_REG_MSI_CFG_TBL + ((_n) * 0x10) + 0x08) 288 + #define RISCV_IOMMU_MSI_CFG_TBL_DATA GENMASK_ULL(31, 0) 289 + #define RISCV_IOMMU_REG_MSI_CFG_TBL_CTRL(_n) \ 290 + (RISCV_IOMMU_REG_MSI_CFG_TBL + ((_n) * 0x10) + 0x0C) 291 + #define RISCV_IOMMU_MSI_CFG_TBL_CTRL_M BIT_ULL(0) 292 + 293 + #define RISCV_IOMMU_REG_SIZE 0x1000 294 + 295 + /* 296 + * Chapter 2: Data structures 297 + */ 298 + 299 + /* 300 + * Device Directory Table macros for non-leaf nodes 301 + */ 302 + #define RISCV_IOMMU_DDTE_V BIT_ULL(0) 303 + #define RISCV_IOMMU_DDTE_PPN RISCV_IOMMU_PPN_FIELD 304 + 305 + /** 306 + * struct riscv_iommu_dc - Device Context 307 + * @tc: Translation Control 308 + * @iohgatp: I/O Hypervisor guest address translation and protection 309 + * (Second stage context) 310 + * @ta: Translation Attributes 311 + * @fsc: First stage context 312 + * @msiptp: MSI page table pointer 313 + * @msi_addr_mask: MSI address mask 314 + * @msi_addr_pattern: MSI address pattern 315 + * @_reserved: Reserved for future use, padding 316 + * 317 + * This structure is used for leaf nodes on the Device Directory Table, 318 + * in case RISCV_IOMMU_CAPABILITIES_MSI_FLAT is not set, the 
bottom 4 fields 319 + * are not present and are skipped with pointer arithmetic to avoid 320 + * casting, check out riscv_iommu_get_dc(). 321 + * See section 2.1 for more details 322 + */ 323 + struct riscv_iommu_dc { 324 + u64 tc; 325 + u64 iohgatp; 326 + u64 ta; 327 + u64 fsc; 328 + u64 msiptp; 329 + u64 msi_addr_mask; 330 + u64 msi_addr_pattern; 331 + u64 _reserved; 332 + }; 333 + 334 + /* Translation control fields */ 335 + #define RISCV_IOMMU_DC_TC_V BIT_ULL(0) 336 + #define RISCV_IOMMU_DC_TC_EN_ATS BIT_ULL(1) 337 + #define RISCV_IOMMU_DC_TC_EN_PRI BIT_ULL(2) 338 + #define RISCV_IOMMU_DC_TC_T2GPA BIT_ULL(3) 339 + #define RISCV_IOMMU_DC_TC_DTF BIT_ULL(4) 340 + #define RISCV_IOMMU_DC_TC_PDTV BIT_ULL(5) 341 + #define RISCV_IOMMU_DC_TC_PRPR BIT_ULL(6) 342 + #define RISCV_IOMMU_DC_TC_GADE BIT_ULL(7) 343 + #define RISCV_IOMMU_DC_TC_SADE BIT_ULL(8) 344 + #define RISCV_IOMMU_DC_TC_DPE BIT_ULL(9) 345 + #define RISCV_IOMMU_DC_TC_SBE BIT_ULL(10) 346 + #define RISCV_IOMMU_DC_TC_SXL BIT_ULL(11) 347 + 348 + /* Second-stage (aka G-stage) context fields */ 349 + #define RISCV_IOMMU_DC_IOHGATP_PPN RISCV_IOMMU_ATP_PPN_FIELD 350 + #define RISCV_IOMMU_DC_IOHGATP_GSCID GENMASK_ULL(59, 44) 351 + #define RISCV_IOMMU_DC_IOHGATP_MODE RISCV_IOMMU_ATP_MODE_FIELD 352 + 353 + /** 354 + * enum riscv_iommu_dc_iohgatp_modes - Guest address translation/protection modes 355 + * @RISCV_IOMMU_DC_IOHGATP_MODE_BARE: No translation/protection 356 + * @RISCV_IOMMU_DC_IOHGATP_MODE_SV32X4: Sv32x4 (2-bit extension of Sv32), when fctl.GXL == 1 357 + * @RISCV_IOMMU_DC_IOHGATP_MODE_SV39X4: Sv39x4 (2-bit extension of Sv39), when fctl.GXL == 0 358 + * @RISCV_IOMMU_DC_IOHGATP_MODE_SV48X4: Sv48x4 (2-bit extension of Sv48), when fctl.GXL == 0 359 + * @RISCV_IOMMU_DC_IOHGATP_MODE_SV57X4: Sv57x4 (2-bit extension of Sv57), when fctl.GXL == 0 360 + */ 361 + enum riscv_iommu_dc_iohgatp_modes { 362 + RISCV_IOMMU_DC_IOHGATP_MODE_BARE = 0, 363 + RISCV_IOMMU_DC_IOHGATP_MODE_SV32X4 = 8, 364 + 
RISCV_IOMMU_DC_IOHGATP_MODE_SV39X4 = 8, 365 + RISCV_IOMMU_DC_IOHGATP_MODE_SV48X4 = 9, 366 + RISCV_IOMMU_DC_IOHGATP_MODE_SV57X4 = 10 367 + }; 368 + 369 + /* Translation attributes fields */ 370 + #define RISCV_IOMMU_DC_TA_PSCID GENMASK_ULL(31, 12) 371 + 372 + /* First-stage context fields */ 373 + #define RISCV_IOMMU_DC_FSC_PPN RISCV_IOMMU_ATP_PPN_FIELD 374 + #define RISCV_IOMMU_DC_FSC_MODE RISCV_IOMMU_ATP_MODE_FIELD 375 + 376 + /** 377 + * enum riscv_iommu_dc_fsc_atp_modes - First stage address translation/protection modes 378 + * @RISCV_IOMMU_DC_FSC_MODE_BARE: No translation/protection 379 + * @RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV32: Sv32, when dc.tc.SXL == 1 380 + * @RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39: Sv39, when dc.tc.SXL == 0 381 + * @RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV48: Sv48, when dc.tc.SXL == 0 382 + * @RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV57: Sv57, when dc.tc.SXL == 0 383 + * @RISCV_IOMMU_DC_FSC_PDTP_MODE_PD8: 1lvl PDT, 8bit process ids 384 + * @RISCV_IOMMU_DC_FSC_PDTP_MODE_PD17: 2lvl PDT, 17bit process ids 385 + * @RISCV_IOMMU_DC_FSC_PDTP_MODE_PD20: 3lvl PDT, 20bit process ids 386 + * 387 + * FSC holds IOSATP when RISCV_IOMMU_DC_TC_PDTV is 0 and PDTP otherwise. 388 + * IOSATP controls the first stage address translation (same as the satp register on 389 + * the RISC-V MMU), and PDTP holds the process directory table, used to select a 390 + * first stage page table based on a process id (for devices that support multiple 391 + * process ids). 
392 + */ 393 + enum riscv_iommu_dc_fsc_atp_modes { 394 + RISCV_IOMMU_DC_FSC_MODE_BARE = 0, 395 + RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV32 = 8, 396 + RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 = 8, 397 + RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV48 = 9, 398 + RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV57 = 10, 399 + RISCV_IOMMU_DC_FSC_PDTP_MODE_PD8 = 1, 400 + RISCV_IOMMU_DC_FSC_PDTP_MODE_PD17 = 2, 401 + RISCV_IOMMU_DC_FSC_PDTP_MODE_PD20 = 3 402 + }; 403 + 404 + /* MSI page table pointer */ 405 + #define RISCV_IOMMU_DC_MSIPTP_PPN RISCV_IOMMU_ATP_PPN_FIELD 406 + #define RISCV_IOMMU_DC_MSIPTP_MODE RISCV_IOMMU_ATP_MODE_FIELD 407 + #define RISCV_IOMMU_DC_MSIPTP_MODE_OFF 0 408 + #define RISCV_IOMMU_DC_MSIPTP_MODE_FLAT 1 409 + 410 + /* MSI address mask */ 411 + #define RISCV_IOMMU_DC_MSI_ADDR_MASK GENMASK_ULL(51, 0) 412 + 413 + /* MSI address pattern */ 414 + #define RISCV_IOMMU_DC_MSI_PATTERN GENMASK_ULL(51, 0) 415 + 416 + /** 417 + * struct riscv_iommu_pc - Process Context 418 + * @ta: Translation Attributes 419 + * @fsc: First stage context 420 + * 421 + * This structure is used for leaf nodes on the Process Directory Table 422 + * See section 2.3 for more details 423 + */ 424 + struct riscv_iommu_pc { 425 + u64 ta; 426 + u64 fsc; 427 + }; 428 + 429 + /* Translation attributes fields */ 430 + #define RISCV_IOMMU_PC_TA_V BIT_ULL(0) 431 + #define RISCV_IOMMU_PC_TA_ENS BIT_ULL(1) 432 + #define RISCV_IOMMU_PC_TA_SUM BIT_ULL(2) 433 + #define RISCV_IOMMU_PC_TA_PSCID GENMASK_ULL(31, 12) 434 + 435 + /* First stage context fields */ 436 + #define RISCV_IOMMU_PC_FSC_PPN RISCV_IOMMU_ATP_PPN_FIELD 437 + #define RISCV_IOMMU_PC_FSC_MODE RISCV_IOMMU_ATP_MODE_FIELD 438 + 439 + /* 440 + * Chapter 3: In-memory queue interface 441 + */ 442 + 443 + /** 444 + * struct riscv_iommu_command - Generic IOMMU command structure 445 + * @dword0: Includes the opcode and the function identifier 446 + * @dword1: Opcode specific data 447 + * 448 + * The commands are interpreted as two 64bit fields, where the first 449 + * 7bits of 
the first field are the opcode which also defines the 450 + * command's format, followed by a 3bit field that specifies the 451 + * function invoked by that command, and the rest is opcode-specific. 452 + * This is a generic struct which will be populated differently 453 + * according to each command. For more infos on the commands and 454 + * the command queue check section 3.1. 455 + */ 456 + struct riscv_iommu_command { 457 + u64 dword0; 458 + u64 dword1; 459 + }; 460 + 461 + /* Fields on dword0, common for all commands */ 462 + #define RISCV_IOMMU_CMD_OPCODE GENMASK_ULL(6, 0) 463 + #define RISCV_IOMMU_CMD_FUNC GENMASK_ULL(9, 7) 464 + 465 + /* 3.1.1 IOMMU Page-table cache invalidation */ 466 + /* Fields on dword0 */ 467 + #define RISCV_IOMMU_CMD_IOTINVAL_OPCODE 1 468 + #define RISCV_IOMMU_CMD_IOTINVAL_FUNC_VMA 0 469 + #define RISCV_IOMMU_CMD_IOTINVAL_FUNC_GVMA 1 470 + #define RISCV_IOMMU_CMD_IOTINVAL_AV BIT_ULL(10) 471 + #define RISCV_IOMMU_CMD_IOTINVAL_PSCID GENMASK_ULL(31, 12) 472 + #define RISCV_IOMMU_CMD_IOTINVAL_PSCV BIT_ULL(32) 473 + #define RISCV_IOMMU_CMD_IOTINVAL_GV BIT_ULL(33) 474 + #define RISCV_IOMMU_CMD_IOTINVAL_GSCID GENMASK_ULL(59, 44) 475 + /* dword1[61:10] is the 4K-aligned page address */ 476 + #define RISCV_IOMMU_CMD_IOTINVAL_ADDR GENMASK_ULL(61, 10) 477 + 478 + /* 3.1.2 IOMMU Command Queue Fences */ 479 + /* Fields on dword0 */ 480 + #define RISCV_IOMMU_CMD_IOFENCE_OPCODE 2 481 + #define RISCV_IOMMU_CMD_IOFENCE_FUNC_C 0 482 + #define RISCV_IOMMU_CMD_IOFENCE_AV BIT_ULL(10) 483 + #define RISCV_IOMMU_CMD_IOFENCE_WSI BIT_ULL(11) 484 + #define RISCV_IOMMU_CMD_IOFENCE_PR BIT_ULL(12) 485 + #define RISCV_IOMMU_CMD_IOFENCE_PW BIT_ULL(13) 486 + #define RISCV_IOMMU_CMD_IOFENCE_DATA GENMASK_ULL(63, 32) 487 + /* dword1 is the address, word-size aligned and shifted to the right by two bits. 
*/ 488 + 489 + /* 3.1.3 IOMMU Directory cache invalidation */ 490 + /* Fields on dword0 */ 491 + #define RISCV_IOMMU_CMD_IODIR_OPCODE 3 492 + #define RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_DDT 0 493 + #define RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_PDT 1 494 + #define RISCV_IOMMU_CMD_IODIR_PID GENMASK_ULL(31, 12) 495 + #define RISCV_IOMMU_CMD_IODIR_DV BIT_ULL(33) 496 + #define RISCV_IOMMU_CMD_IODIR_DID GENMASK_ULL(63, 40) 497 + /* dword1 is reserved for standard use */ 498 + 499 + /* 3.1.4 IOMMU PCIe ATS */ 500 + /* Fields on dword0 */ 501 + #define RISCV_IOMMU_CMD_ATS_OPCODE 4 502 + #define RISCV_IOMMU_CMD_ATS_FUNC_INVAL 0 503 + #define RISCV_IOMMU_CMD_ATS_FUNC_PRGR 1 504 + #define RISCV_IOMMU_CMD_ATS_PID GENMASK_ULL(31, 12) 505 + #define RISCV_IOMMU_CMD_ATS_PV BIT_ULL(32) 506 + #define RISCV_IOMMU_CMD_ATS_DSV BIT_ULL(33) 507 + #define RISCV_IOMMU_CMD_ATS_RID GENMASK_ULL(55, 40) 508 + #define RISCV_IOMMU_CMD_ATS_DSEG GENMASK_ULL(63, 56) 509 + /* dword1 is the ATS payload, two different payload types for INVAL and PRGR */ 510 + 511 + /* ATS.INVAL payload*/ 512 + #define RISCV_IOMMU_CMD_ATS_INVAL_G BIT_ULL(0) 513 + /* Bits 1 - 10 are zeroed */ 514 + #define RISCV_IOMMU_CMD_ATS_INVAL_S BIT_ULL(11) 515 + #define RISCV_IOMMU_CMD_ATS_INVAL_UADDR GENMASK_ULL(63, 12) 516 + 517 + /* ATS.PRGR payload */ 518 + /* Bits 0 - 31 are zeroed */ 519 + #define RISCV_IOMMU_CMD_ATS_PRGR_PRG_INDEX GENMASK_ULL(40, 32) 520 + /* Bits 41 - 43 are zeroed */ 521 + #define RISCV_IOMMU_CMD_ATS_PRGR_RESP_CODE GENMASK_ULL(47, 44) 522 + #define RISCV_IOMMU_CMD_ATS_PRGR_DST_ID GENMASK_ULL(63, 48) 523 + 524 + /** 525 + * struct riscv_iommu_fq_record - Fault/Event Queue Record 526 + * @hdr: Header, includes fault/event cause, PID/DID, transaction type etc 527 + * @_reserved: Low 32bits for custom use, high 32bits for standard use 528 + * @iotval: Transaction-type/cause specific format 529 + * @iotval2: Cause specific format 530 + * 531 + * The fault/event queue reports events and failures raised when 532 + * 
processing transactions. Each record is a 32byte structure where 533 + * the first dword has a fixed format for providing generic infos 534 + * regarding the fault/event, and two more dwords are there for 535 + * fault/event-specific information. For more details see section 536 + * 3.2. 537 + */ 538 + struct riscv_iommu_fq_record { 539 + u64 hdr; 540 + u64 _reserved; 541 + u64 iotval; 542 + u64 iotval2; 543 + }; 544 + 545 + /* Fields on header */ 546 + #define RISCV_IOMMU_FQ_HDR_CAUSE GENMASK_ULL(11, 0) 547 + #define RISCV_IOMMU_FQ_HDR_PID GENMASK_ULL(31, 12) 548 + #define RISCV_IOMMU_FQ_HDR_PV BIT_ULL(32) 549 + #define RISCV_IOMMU_FQ_HDR_PRIV BIT_ULL(33) 550 + #define RISCV_IOMMU_FQ_HDR_TTYP GENMASK_ULL(39, 34) 551 + #define RISCV_IOMMU_FQ_HDR_DID GENMASK_ULL(63, 40) 552 + 553 + /** 554 + * enum riscv_iommu_fq_causes - Fault/event cause values 555 + * @RISCV_IOMMU_FQ_CAUSE_INST_FAULT: Instruction access fault 556 + * @RISCV_IOMMU_FQ_CAUSE_RD_ADDR_MISALIGNED: Read address misaligned 557 + * @RISCV_IOMMU_FQ_CAUSE_RD_FAULT: Read load fault 558 + * @RISCV_IOMMU_FQ_CAUSE_WR_ADDR_MISALIGNED: Write/AMO address misaligned 559 + * @RISCV_IOMMU_FQ_CAUSE_WR_FAULT: Write/AMO access fault 560 + * @RISCV_IOMMU_FQ_CAUSE_INST_FAULT_S: Instruction page fault 561 + * @RISCV_IOMMU_FQ_CAUSE_RD_FAULT_S: Read page fault 562 + * @RISCV_IOMMU_FQ_CAUSE_WR_FAULT_S: Write/AMO page fault 563 + * @RISCV_IOMMU_FQ_CAUSE_INST_FAULT_VS: Instruction guest page fault 564 + * @RISCV_IOMMU_FQ_CAUSE_RD_FAULT_VS: Read guest page fault 565 + * @RISCV_IOMMU_FQ_CAUSE_WR_FAULT_VS: Write/AMO guest page fault 566 + * @RISCV_IOMMU_FQ_CAUSE_DMA_DISABLED: All inbound transactions disallowed 567 + * @RISCV_IOMMU_FQ_CAUSE_DDT_LOAD_FAULT: DDT entry load access fault 568 + * @RISCV_IOMMU_FQ_CAUSE_DDT_INVALID: DDT entry invalid 569 + * @RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED: DDT entry misconfigured 570 + * @RISCV_IOMMU_FQ_CAUSE_TTYP_BLOCKED: Transaction type disallowed 571 + * @RISCV_IOMMU_FQ_CAUSE_MSI_LOAD_FAULT: 
MSI PTE load access fault 572 + * @RISCV_IOMMU_FQ_CAUSE_MSI_INVALID: MSI PTE invalid 573 + * @RISCV_IOMMU_FQ_CAUSE_MSI_MISCONFIGURED: MSI PTE misconfigured 574 + * @RISCV_IOMMU_FQ_CAUSE_MRIF_FAULT: MRIF access fault 575 + * @RISCV_IOMMU_FQ_CAUSE_PDT_LOAD_FAULT: PDT entry load access fault 576 + * @RISCV_IOMMU_FQ_CAUSE_PDT_INVALID: PDT entry invalid 577 + * @RISCV_IOMMU_FQ_CAUSE_PDT_MISCONFIGURED: PDT entry misconfigured 578 + * @RISCV_IOMMU_FQ_CAUSE_DDT_CORRUPTED: DDT data corruption 579 + * @RISCV_IOMMU_FQ_CAUSE_PDT_CORRUPTED: PDT data corruption 580 + * @RISCV_IOMMU_FQ_CAUSE_MSI_PT_CORRUPTED: MSI page table data corruption 581 + * @RISCV_IOMMU_FQ_CAUSE_MRIF_CORRUIPTED: MRIF data corruption 582 + * @RISCV_IOMMU_FQ_CAUSE_INTERNAL_DP_ERROR: Internal data path error 583 + * @RISCV_IOMMU_FQ_CAUSE_MSI_WR_FAULT: IOMMU MSI write access fault 584 + * @RISCV_IOMMU_FQ_CAUSE_PT_CORRUPTED: First/second stage page table data corruption 585 + * 586 + * Values are on table 11 of the spec, encodings 275 - 2047 are reserved for standard 587 + * use, and 2048 - 4095 for custom use. 
588 + */ 589 + enum riscv_iommu_fq_causes { 590 + RISCV_IOMMU_FQ_CAUSE_INST_FAULT = 1, 591 + RISCV_IOMMU_FQ_CAUSE_RD_ADDR_MISALIGNED = 4, 592 + RISCV_IOMMU_FQ_CAUSE_RD_FAULT = 5, 593 + RISCV_IOMMU_FQ_CAUSE_WR_ADDR_MISALIGNED = 6, 594 + RISCV_IOMMU_FQ_CAUSE_WR_FAULT = 7, 595 + RISCV_IOMMU_FQ_CAUSE_INST_FAULT_S = 12, 596 + RISCV_IOMMU_FQ_CAUSE_RD_FAULT_S = 13, 597 + RISCV_IOMMU_FQ_CAUSE_WR_FAULT_S = 15, 598 + RISCV_IOMMU_FQ_CAUSE_INST_FAULT_VS = 20, 599 + RISCV_IOMMU_FQ_CAUSE_RD_FAULT_VS = 21, 600 + RISCV_IOMMU_FQ_CAUSE_WR_FAULT_VS = 23, 601 + RISCV_IOMMU_FQ_CAUSE_DMA_DISABLED = 256, 602 + RISCV_IOMMU_FQ_CAUSE_DDT_LOAD_FAULT = 257, 603 + RISCV_IOMMU_FQ_CAUSE_DDT_INVALID = 258, 604 + RISCV_IOMMU_FQ_CAUSE_DDT_MISCONFIGURED = 259, 605 + RISCV_IOMMU_FQ_CAUSE_TTYP_BLOCKED = 260, 606 + RISCV_IOMMU_FQ_CAUSE_MSI_LOAD_FAULT = 261, 607 + RISCV_IOMMU_FQ_CAUSE_MSI_INVALID = 262, 608 + RISCV_IOMMU_FQ_CAUSE_MSI_MISCONFIGURED = 263, 609 + RISCV_IOMMU_FQ_CAUSE_MRIF_FAULT = 264, 610 + RISCV_IOMMU_FQ_CAUSE_PDT_LOAD_FAULT = 265, 611 + RISCV_IOMMU_FQ_CAUSE_PDT_INVALID = 266, 612 + RISCV_IOMMU_FQ_CAUSE_PDT_MISCONFIGURED = 267, 613 + RISCV_IOMMU_FQ_CAUSE_DDT_CORRUPTED = 268, 614 + RISCV_IOMMU_FQ_CAUSE_PDT_CORRUPTED = 269, 615 + RISCV_IOMMU_FQ_CAUSE_MSI_PT_CORRUPTED = 270, 616 + RISCV_IOMMU_FQ_CAUSE_MRIF_CORRUIPTED = 271, 617 + RISCV_IOMMU_FQ_CAUSE_INTERNAL_DP_ERROR = 272, 618 + RISCV_IOMMU_FQ_CAUSE_MSI_WR_FAULT = 273, 619 + RISCV_IOMMU_FQ_CAUSE_PT_CORRUPTED = 274 620 + }; 621 + 622 + /** 623 + * enum riscv_iommu_fq_ttypes: Fault/event transaction types 624 + * @RISCV_IOMMU_FQ_TTYP_NONE: None. Fault not caused by an inbound transaction. 
625 + * @RISCV_IOMMU_FQ_TTYP_UADDR_INST_FETCH: Instruction fetch from untranslated address 626 + * @RISCV_IOMMU_FQ_TTYP_UADDR_RD: Read from untranslated address 627 + * @RISCV_IOMMU_FQ_TTYP_UADDR_WR: Write/AMO to untranslated address 628 + * @RISCV_IOMMU_FQ_TTYP_TADDR_INST_FETCH: Instruction fetch from translated address 629 + * @RISCV_IOMMU_FQ_TTYP_TADDR_RD: Read from translated address 630 + * @RISCV_IOMMU_FQ_TTYP_TADDR_WR: Write/AMO to translated address 631 + * @RISCV_IOMMU_FQ_TTYP_PCIE_ATS_REQ: PCIe ATS translation request 632 + * @RISCV_IOMMU_FQ_TTYP_PCIE_MSG_REQ: PCIe message request 633 + * 634 + * Values are on table 12 of the spec, type 4 and 10 - 31 are reserved for standard use 635 + * and 31 - 63 for custom use. 636 + */ 637 + enum riscv_iommu_fq_ttypes { 638 + RISCV_IOMMU_FQ_TTYP_NONE = 0, 639 + RISCV_IOMMU_FQ_TTYP_UADDR_INST_FETCH = 1, 640 + RISCV_IOMMU_FQ_TTYP_UADDR_RD = 2, 641 + RISCV_IOMMU_FQ_TTYP_UADDR_WR = 3, 642 + RISCV_IOMMU_FQ_TTYP_TADDR_INST_FETCH = 5, 643 + RISCV_IOMMU_FQ_TTYP_TADDR_RD = 6, 644 + RISCV_IOMMU_FQ_TTYP_TADDR_WR = 7, 645 + RISCV_IOMMU_FQ_TTYP_PCIE_ATS_REQ = 8, 646 + RISCV_IOMMU_FQ_TTYP_PCIE_MSG_REQ = 9, 647 + }; 648 + 649 + /** 650 + * struct riscv_iommu_pq_record - PCIe Page Request record 651 + * @hdr: Header, includes PID, DID etc 652 + * @payload: Holds the page address, request group and permission bits 653 + * 654 + * For more infos on the PCIe Page Request queue see chapter 3.3. 
655 + */ 656 + struct riscv_iommu_pq_record { 657 + u64 hdr; 658 + u64 payload; 659 + }; 660 + 661 + /* Header fields */ 662 + #define RISCV_IOMMU_PQ_HDR_PID GENMASK_ULL(31, 12) 663 + #define RISCV_IOMMU_PQ_HDR_PV BIT_ULL(32) 664 + #define RISCV_IOMMU_PQ_HDR_PRIV BIT_ULL(33) 665 + #define RISCV_IOMMU_PQ_HDR_EXEC BIT_ULL(34) 666 + #define RISCV_IOMMU_PQ_HDR_DID GENMASK_ULL(63, 40) 667 + 668 + /* Payload fields */ 669 + #define RISCV_IOMMU_PQ_PAYLOAD_R BIT_ULL(0) 670 + #define RISCV_IOMMU_PQ_PAYLOAD_W BIT_ULL(1) 671 + #define RISCV_IOMMU_PQ_PAYLOAD_L BIT_ULL(2) 672 + #define RISCV_IOMMU_PQ_PAYLOAD_RWL_MASK GENMASK_ULL(2, 0) 673 + #define RISCV_IOMMU_PQ_PAYLOAD_PRGI GENMASK_ULL(11, 3) /* Page Request Group Index */ 674 + #define RISCV_IOMMU_PQ_PAYLOAD_ADDR GENMASK_ULL(63, 12) 675 + 676 + /** 677 + * struct riscv_iommu_msipte - MSI Page Table Entry 678 + * @pte: MSI PTE 679 + * @mrif_info: Memory-resident interrupt file info 680 + * 681 + * The MSI Page Table is used for virtualizing MSIs, so that when 682 + * a device sends an MSI to a guest, the IOMMU can reroute it 683 + * by translating the MSI address, either to a guest interrupt file 684 + * or a memory resident interrupt file (MRIF). Note that this page table 685 + * is an array of MSI PTEs, not a multi-level pt, each entry 686 + * is a leaf entry. For more infos check out the AIA spec, chapter 9.5. 687 + * 688 + * Also in basic mode the mrif_info field is ignored by the IOMMU and can 689 + * be used by software, any other reserved fields on pte must be zeroed-out 690 + * by software. 
+ */
+struct riscv_iommu_msipte {
+	u64 pte;
+	u64 mrif_info;
+};
+
+/* Fields on pte */
+#define RISCV_IOMMU_MSIPTE_V		BIT_ULL(0)
+#define RISCV_IOMMU_MSIPTE_M		GENMASK_ULL(2, 1)
+#define RISCV_IOMMU_MSIPTE_MRIF_ADDR	GENMASK_ULL(53, 7) /* When M == 1 (MRIF mode) */
+#define RISCV_IOMMU_MSIPTE_PPN		RISCV_IOMMU_PPN_FIELD /* When M == 3 (basic mode) */
+#define RISCV_IOMMU_MSIPTE_C		BIT_ULL(63)
+
+/* Fields on mrif_info */
+#define RISCV_IOMMU_MSIPTE_MRIF_NID	GENMASK_ULL(9, 0)
+#define RISCV_IOMMU_MSIPTE_MRIF_NPPN	RISCV_IOMMU_PPN_FIELD
+#define RISCV_IOMMU_MSIPTE_MRIF_NID_MSB	BIT_ULL(60)
+
+/* Helper functions: command structure builders. */
+
+static inline void riscv_iommu_cmd_inval_vma(struct riscv_iommu_command *cmd)
+{
+	cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IOTINVAL_OPCODE) |
+		      FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IOTINVAL_FUNC_VMA);
+	cmd->dword1 = 0;
+}
+
+static inline void riscv_iommu_cmd_inval_set_addr(struct riscv_iommu_command *cmd,
+						  u64 addr)
+{
+	cmd->dword1 = FIELD_PREP(RISCV_IOMMU_CMD_IOTINVAL_ADDR, phys_to_pfn(addr));
+	cmd->dword0 |= RISCV_IOMMU_CMD_IOTINVAL_AV;
+}
+
+static inline void riscv_iommu_cmd_inval_set_pscid(struct riscv_iommu_command *cmd,
+						   int pscid)
+{
+	cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_IOTINVAL_PSCID, pscid) |
+		       RISCV_IOMMU_CMD_IOTINVAL_PSCV;
+}
+
+static inline void riscv_iommu_cmd_inval_set_gscid(struct riscv_iommu_command *cmd,
+						   int gscid)
+{
+	cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_IOTINVAL_GSCID, gscid) |
+		       RISCV_IOMMU_CMD_IOTINVAL_GV;
+}
+
+static inline void riscv_iommu_cmd_iofence(struct riscv_iommu_command *cmd)
+{
+	cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IOFENCE_OPCODE) |
+		      FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IOFENCE_FUNC_C) |
+		      RISCV_IOMMU_CMD_IOFENCE_PR | RISCV_IOMMU_CMD_IOFENCE_PW;
+	cmd->dword1 = 0;
+}
+
+static inline void riscv_iommu_cmd_iofence_set_av(struct riscv_iommu_command *cmd,
+						  u64 addr, u32 data)
+{
+	cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IOFENCE_OPCODE) |
+		      FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IOFENCE_FUNC_C) |
+		      FIELD_PREP(RISCV_IOMMU_CMD_IOFENCE_DATA, data) |
+		      RISCV_IOMMU_CMD_IOFENCE_AV;
+	cmd->dword1 = addr >> 2;
+}
+
+static inline void riscv_iommu_cmd_iodir_inval_ddt(struct riscv_iommu_command *cmd)
+{
+	cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IODIR_OPCODE) |
+		      FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_DDT);
+	cmd->dword1 = 0;
+}
+
+static inline void riscv_iommu_cmd_iodir_inval_pdt(struct riscv_iommu_command *cmd)
+{
+	cmd->dword0 = FIELD_PREP(RISCV_IOMMU_CMD_OPCODE, RISCV_IOMMU_CMD_IODIR_OPCODE) |
+		      FIELD_PREP(RISCV_IOMMU_CMD_FUNC, RISCV_IOMMU_CMD_IODIR_FUNC_INVAL_PDT);
+	cmd->dword1 = 0;
+}
+
+static inline void riscv_iommu_cmd_iodir_set_did(struct riscv_iommu_command *cmd,
+						 unsigned int devid)
+{
+	cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_IODIR_DID, devid) |
+		       RISCV_IOMMU_CMD_IODIR_DV;
+}
+
+static inline void riscv_iommu_cmd_iodir_set_pid(struct riscv_iommu_command *cmd,
+						 unsigned int pasid)
+{
+	cmd->dword0 |= FIELD_PREP(RISCV_IOMMU_CMD_IODIR_PID, pasid);
+}
+
+#endif /* _RISCV_IOMMU_BITS_H_ */
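Decoding one page-request record makes its layout easier to see. The following is a userspace sketch only, not driver code. BIT_ULL(), GENMASK_ULL() and FIELD_GET() are simplified re-implementations of the kernel macros; they assume GCC or Clang for __builtin_ctzll(). struct pq_request and pq_decode() are hypothetical names introduced for illustration. The bit positions are copied from the RISCV_IOMMU_PQ_* definitions in the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified userspace stand-ins for the kernel's BIT_ULL(), GENMASK_ULL()
 * and FIELD_GET(); no type or range checking. __builtin_ctzll() assumes
 * GCC or Clang and is only valid for a non-zero mask.
 */
#define BIT_ULL(n)           (1ULL << (n))
#define GENMASK_ULL(h, l)    ((~0ULL >> (63 - (h))) & (~0ULL << (l)))
#define FIELD_GET(mask, reg) (((reg) & (mask)) >> __builtin_ctzll(mask))

/* Field layout copied from the RISCV_IOMMU_PQ_* definitions above. */
#define RISCV_IOMMU_PQ_HDR_PID      GENMASK_ULL(31, 12)
#define RISCV_IOMMU_PQ_HDR_PV       BIT_ULL(32)
#define RISCV_IOMMU_PQ_HDR_PRIV     BIT_ULL(33)
#define RISCV_IOMMU_PQ_HDR_EXEC     BIT_ULL(34)
#define RISCV_IOMMU_PQ_HDR_DID      GENMASK_ULL(63, 40)
#define RISCV_IOMMU_PQ_PAYLOAD_R    BIT_ULL(0)
#define RISCV_IOMMU_PQ_PAYLOAD_W    BIT_ULL(1)
#define RISCV_IOMMU_PQ_PAYLOAD_L    BIT_ULL(2)
#define RISCV_IOMMU_PQ_PAYLOAD_PRGI GENMASK_ULL(11, 3)
#define RISCV_IOMMU_PQ_PAYLOAD_ADDR GENMASK_ULL(63, 12)

/* Sanity check of the GENMASK_ULL() stand-in: bits 11..3 give 0xff8. */
static_assert(RISCV_IOMMU_PQ_PAYLOAD_PRGI == 0xff8ULL, "GENMASK_ULL stand-in");

struct riscv_iommu_pq_record {
	uint64_t hdr;
	uint64_t payload;
};

/* Hypothetical decoded view of one page request, for illustration only. */
struct pq_request {
	uint32_t devid;      /* DID: requesting device ID */
	bool pasid_valid;    /* PV: the PID field is meaningful */
	uint32_t pasid;      /* PID */
	bool priv, exec;     /* PRIV, EXEC */
	bool read, write;    /* R, W */
	bool last;           /* L: last request in the group (PCIe PRI semantics) */
	uint32_t prgi;       /* PRGI: page request group index */
	uint64_t page;       /* page-aligned request address */
};

static struct pq_request pq_decode(const struct riscv_iommu_pq_record *rec)
{
	struct pq_request r = {
		.devid = (uint32_t)FIELD_GET(RISCV_IOMMU_PQ_HDR_DID, rec->hdr),
		.pasid_valid = rec->hdr & RISCV_IOMMU_PQ_HDR_PV,
		.pasid = (uint32_t)FIELD_GET(RISCV_IOMMU_PQ_HDR_PID, rec->hdr),
		.priv = rec->hdr & RISCV_IOMMU_PQ_HDR_PRIV,
		.exec = rec->hdr & RISCV_IOMMU_PQ_HDR_EXEC,
		.read = rec->payload & RISCV_IOMMU_PQ_PAYLOAD_R,
		.write = rec->payload & RISCV_IOMMU_PQ_PAYLOAD_W,
		.last = rec->payload & RISCV_IOMMU_PQ_PAYLOAD_L,
		.prgi = (uint32_t)FIELD_GET(RISCV_IOMMU_PQ_PAYLOAD_PRGI, rec->payload),
		/* ADDR occupies bits 63:12, so masking keeps the page address. */
		.page = rec->payload & RISCV_IOMMU_PQ_PAYLOAD_ADDR,
	};
	return r;
}
```

Take a record whose header carries DID 0x56, PV set and PID 0x1234, with payload 0xabcde02e. It decodes to a write request with L set, PRGI 5 and page address 0xabcde000.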
+120
drivers/iommu/riscv/iommu-pci.c
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Copyright © 2022-2024 Rivos Inc.
+ * Copyright © 2023 FORTH-ICS/CARV
+ *
+ * RISCV IOMMU as a PCIe device
+ *
+ * Authors
+ *	Tomasz Jeznach <tjeznach@rivosinc.com>
+ *	Nick Kossifidis <mick@ics.forth.gr>
+ */
+
+#include <linux/compiler.h>
+#include <linux/init.h>
+#include <linux/iommu.h>
+#include <linux/kernel.h>
+#include <linux/pci.h>
+
+#include "iommu-bits.h"
+#include "iommu.h"
+
+/* QEMU RISC-V IOMMU implementation */
+#define PCI_DEVICE_ID_REDHAT_RISCV_IOMMU	0x0014
+
+/* Rivos Inc. assigned PCI Vendor and Device IDs */
+#ifndef PCI_VENDOR_ID_RIVOS
+#define PCI_VENDOR_ID_RIVOS	0x1efd
+#endif
+
+#define PCI_DEVICE_ID_RIVOS_RISCV_IOMMU_GA	0x0008
+
+static int riscv_iommu_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
+{
+	struct device *dev = &pdev->dev;
+	struct riscv_iommu_device *iommu;
+	int rc, vec;
+
+	rc = pcim_enable_device(pdev);
+	if (rc)
+		return rc;
+
+	if (!(pci_resource_flags(pdev, 0) & IORESOURCE_MEM))
+		return -ENODEV;
+
+	if (pci_resource_len(pdev, 0) < RISCV_IOMMU_REG_SIZE)
+		return -ENODEV;
+
+	rc = pcim_iomap_regions(pdev, BIT(0), pci_name(pdev));
+	if (rc)
+		return dev_err_probe(dev, rc, "pcim_iomap_regions failed\n");
+
+	iommu = devm_kzalloc(dev, sizeof(*iommu), GFP_KERNEL);
+	if (!iommu)
+		return -ENOMEM;
+
+	iommu->dev = dev;
+	iommu->reg = pcim_iomap_table(pdev)[0];
+
+	pci_set_master(pdev);
+	dev_set_drvdata(dev, iommu);
+
+	/* Check device reported capabilities / features. */
+	iommu->caps = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_CAPABILITIES);
+	iommu->fctl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_FCTL);
+
+	/* The PCI driver only uses MSIs, make sure the IOMMU supports this */
+	switch (FIELD_GET(RISCV_IOMMU_CAPABILITIES_IGS, iommu->caps)) {
+	case RISCV_IOMMU_CAPABILITIES_IGS_MSI:
+	case RISCV_IOMMU_CAPABILITIES_IGS_BOTH:
+		break;
+	default:
+		return dev_err_probe(dev, -ENODEV,
+				     "unable to use message-signaled interrupts\n");
+	}
+
+	/* Allocate and assign IRQ vectors for the various events */
+	rc = pci_alloc_irq_vectors(pdev, 1, RISCV_IOMMU_INTR_COUNT,
+				   PCI_IRQ_MSIX | PCI_IRQ_MSI);
+	if (rc <= 0)
+		return dev_err_probe(dev, -ENODEV,
+				     "unable to allocate irq vectors\n");
+
+	iommu->irqs_count = rc;
+	for (vec = 0; vec < iommu->irqs_count; vec++)
+		iommu->irqs[vec] = msi_get_virq(dev, vec);
+
+	/* Enable message-signaled interrupts, fctl.WSI */
+	if (iommu->fctl & RISCV_IOMMU_FCTL_WSI) {
+		iommu->fctl ^= RISCV_IOMMU_FCTL_WSI;
+		riscv_iommu_writel(iommu, RISCV_IOMMU_REG_FCTL, iommu->fctl);
+	}
+
+	return riscv_iommu_init(iommu);
+}
+
+static void riscv_iommu_pci_remove(struct pci_dev *pdev)
+{
+	struct riscv_iommu_device *iommu = dev_get_drvdata(&pdev->dev);
+
+	riscv_iommu_remove(iommu);
+}
+
+static const struct pci_device_id riscv_iommu_pci_tbl[] = {
+	{PCI_VDEVICE(REDHAT, PCI_DEVICE_ID_REDHAT_RISCV_IOMMU), 0},
+	{PCI_VDEVICE(RIVOS, PCI_DEVICE_ID_RIVOS_RISCV_IOMMU_GA), 0},
+	{0,}
+};
+
+static struct pci_driver riscv_iommu_pci_driver = {
+	.name = KBUILD_MODNAME,
+	.id_table = riscv_iommu_pci_tbl,
+	.probe = riscv_iommu_pci_probe,
+	.remove = riscv_iommu_pci_remove,
+	.driver = {
+		.suppress_bind_attrs = true,
+	},
+};
+
+builtin_pci_driver(riscv_iommu_pci_driver);
+92
drivers/iommu/riscv/iommu-platform.c
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * RISC-V IOMMU as a platform device
+ *
+ * Copyright © 2023 FORTH-ICS/CARV
+ * Copyright © 2023-2024 Rivos Inc.
+ *
+ * Authors
+ *	Nick Kossifidis <mick@ics.forth.gr>
+ *	Tomasz Jeznach <tjeznach@rivosinc.com>
+ */
+
+#include <linux/kernel.h>
+#include <linux/of_platform.h>
+#include <linux/platform_device.h>
+
+#include "iommu-bits.h"
+#include "iommu.h"
+
+static int riscv_iommu_platform_probe(struct platform_device *pdev)
+{
+	struct device *dev = &pdev->dev;
+	struct riscv_iommu_device *iommu = NULL;
+	struct resource *res = NULL;
+	int vec;
+
+	iommu = devm_kzalloc(dev, sizeof(*iommu), GFP_KERNEL);
+	if (!iommu)
+		return -ENOMEM;
+
+	iommu->dev = dev;
+	iommu->reg = devm_platform_get_and_ioremap_resource(pdev, 0, &res);
+	if (IS_ERR(iommu->reg))
+		return dev_err_probe(dev, PTR_ERR(iommu->reg),
+				     "could not map register region\n");
+
+	dev_set_drvdata(dev, iommu);
+
+	/* Check device reported capabilities / features. */
+	iommu->caps = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_CAPABILITIES);
+	iommu->fctl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_FCTL);
+
+	/* For now we only support WSI */
+	switch (FIELD_GET(RISCV_IOMMU_CAPABILITIES_IGS, iommu->caps)) {
+	case RISCV_IOMMU_CAPABILITIES_IGS_WSI:
+	case RISCV_IOMMU_CAPABILITIES_IGS_BOTH:
+		break;
+	default:
+		return dev_err_probe(dev, -ENODEV,
+				     "unable to use wire-signaled interrupts\n");
+	}
+
+	iommu->irqs_count = platform_irq_count(pdev);
+	if (iommu->irqs_count <= 0)
+		return dev_err_probe(dev, -ENODEV,
+				     "no IRQ resources provided\n");
+	if (iommu->irqs_count > RISCV_IOMMU_INTR_COUNT)
+		iommu->irqs_count = RISCV_IOMMU_INTR_COUNT;
+
+	for (vec = 0; vec < iommu->irqs_count; vec++)
+		iommu->irqs[vec] = platform_get_irq(pdev, vec);
+
+	/* Enable wire-signaled interrupts, fctl.WSI */
+	if (!(iommu->fctl & RISCV_IOMMU_FCTL_WSI)) {
+		iommu->fctl |= RISCV_IOMMU_FCTL_WSI;
+		riscv_iommu_writel(iommu, RISCV_IOMMU_REG_FCTL, iommu->fctl);
+	}
+
+	return riscv_iommu_init(iommu);
+};
+
+static void riscv_iommu_platform_remove(struct platform_device *pdev)
+{
+	riscv_iommu_remove(dev_get_drvdata(&pdev->dev));
+};
+
+static const struct of_device_id riscv_iommu_of_match[] = {
+	{.compatible = "riscv,iommu",},
+	{},
+};
+
+static struct platform_driver riscv_iommu_platform_driver = {
+	.probe = riscv_iommu_platform_probe,
+	.remove_new = riscv_iommu_platform_remove,
+	.driver = {
+		.name = "riscv,iommu",
+		.of_match_table = riscv_iommu_of_match,
+		.suppress_bind_attrs = true,
+	},
+};
+
+builtin_platform_driver(riscv_iommu_platform_driver);
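Both front-ends read capabilities.IGS before touching fctl.WSI, but they move in opposite directions. iommu-pci.c accepts MSI or both and clears WSI. iommu-platform.c accepts WSI or both and sets it.

The standalone C sketch below captures that decision, under some assumptions. The IGS field position and encoding, and the WSI bit number, are taken from the RISC-V IOMMU specification rather than from this excerpt. select_fctl() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * ASSUMED register layout (defined in the part of iommu-bits.h outside this
 * excerpt; values taken from the RISC-V IOMMU specification):
 * capabilities.IGS is bits 29:28 (0 = MSI, 1 = WSI, 2 = both) and
 * fctl.WSI is bit 1.
 */
#define IGS_SHIFT 28
#define IGS_MASK  (3ULL << IGS_SHIFT)
#define FCTL_WSI  (1U << 1)

enum igs { IGS_MSI = 0, IGS_WSI = 1, IGS_BOTH = 2 };

static_assert(IGS_MASK == 0x30000000ULL, "IGS occupies bits 29:28");

/*
 * Hypothetical helper mirroring both probe paths: the PCI driver needs
 * MSI (IGS_MSI or IGS_BOTH) and clears fctl.WSI, the platform driver needs
 * WSI (IGS_WSI or IGS_BOTH) and sets it. Returns the fctl value to program,
 * or -1 where the drivers would fail probe with -ENODEV.
 */
static int64_t select_fctl(uint64_t caps, uint32_t fctl, bool want_wsi)
{
	int igs = (int)((caps & IGS_MASK) >> IGS_SHIFT);

	if (igs != IGS_BOTH && igs != (want_wsi ? IGS_WSI : IGS_MSI))
		return -1;

	if (want_wsi)
		fctl |= FCTL_WSI;	/* wire-signaled interrupts */
	else
		fctl &= ~FCTL_WSI;	/* message-signaled interrupts */

	return fctl;
}
```

Unlike this helper, both drivers write FCTL back only when the WSI bit actually needs to change. The PCI path clears the bit with an XOR guarded by a test, which has the same effect as the mask used here.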
+1661
drivers/iommu/riscv/iommu.c
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * IOMMU API for RISC-V IOMMU implementations.
+ *
+ * Copyright © 2022-2024 Rivos Inc.
+ * Copyright © 2023 FORTH-ICS/CARV
+ *
+ * Authors
+ *	Tomasz Jeznach <tjeznach@rivosinc.com>
+ *	Nick Kossifidis <mick@ics.forth.gr>
+ */
+
+#define pr_fmt(fmt) "riscv-iommu: " fmt
+
+#include <linux/compiler.h>
+#include <linux/crash_dump.h>
+#include <linux/init.h>
+#include <linux/iommu.h>
+#include <linux/iopoll.h>
+#include <linux/kernel.h>
+#include <linux/pci.h>
+
+#include "../iommu-pages.h"
+#include "iommu-bits.h"
+#include "iommu.h"
+
+/* Timeouts in [us] */
+#define RISCV_IOMMU_QCSR_TIMEOUT	150000
+#define RISCV_IOMMU_QUEUE_TIMEOUT	150000
+#define RISCV_IOMMU_DDTP_TIMEOUT	10000000
+#define RISCV_IOMMU_IOTINVAL_TIMEOUT	90000000
+
+/* Number of entries per CMD/FLT queue, should be <= INT_MAX */
+#define RISCV_IOMMU_DEF_CQ_COUNT	8192
+#define RISCV_IOMMU_DEF_FQ_COUNT	4096
+
+/* RISC-V IOMMU PPN <> PHYS address conversions, PHYS <=> PPN[53:10] */
+#define phys_to_ppn(pa)  (((pa) >> 2) & (((1ULL << 44) - 1) << 10))
+#define ppn_to_phys(pn)	 (((pn) << 2) & (((1ULL << 44) - 1) << 12))
+
+#define dev_to_iommu(dev) \
+	iommu_get_iommu_dev(dev, struct riscv_iommu_device, iommu)
+
+/* IOMMU PSCID allocation namespace. */
+static DEFINE_IDA(riscv_iommu_pscids);
+#define RISCV_IOMMU_MAX_PSCID		(BIT(20) - 1)
+
+/* Device resource-managed allocations */
+struct riscv_iommu_devres {
+	void *addr;
+	int order;
+};
+
+static void riscv_iommu_devres_pages_release(struct device *dev, void *res)
+{
+	struct riscv_iommu_devres *devres = res;
+
+	iommu_free_pages(devres->addr, devres->order);
+}
+
+static int riscv_iommu_devres_pages_match(struct device *dev, void *res, void *p)
+{
+	struct riscv_iommu_devres *devres = res;
+	struct riscv_iommu_devres *target = p;
+
+	return devres->addr == target->addr;
+}
+
+static void *riscv_iommu_get_pages(struct riscv_iommu_device *iommu, int order)
+{
+	struct riscv_iommu_devres *devres;
+	void *addr;
+
+	addr = iommu_alloc_pages_node(dev_to_node(iommu->dev),
+				      GFP_KERNEL_ACCOUNT, order);
+	if (unlikely(!addr))
+		return NULL;
+
+	devres = devres_alloc(riscv_iommu_devres_pages_release,
+			      sizeof(struct riscv_iommu_devres), GFP_KERNEL);
+
+	if (unlikely(!devres)) {
+		iommu_free_pages(addr, order);
+		return NULL;
+	}
+
+	devres->addr = addr;
+	devres->order = order;
+
+	devres_add(iommu->dev, devres);
+
+	return addr;
+}
+
+static void riscv_iommu_free_pages(struct riscv_iommu_device *iommu, void *addr)
+{
+	struct riscv_iommu_devres devres = { .addr = addr };
+
+	devres_release(iommu->dev, riscv_iommu_devres_pages_release,
+		       riscv_iommu_devres_pages_match, &devres);
+}
+
+/*
+ * Hardware queue allocation and management.
+ */
+
+/* Setup queue base, control registers and default queue length */
+#define RISCV_IOMMU_QUEUE_INIT(q, name) do { \
+	struct riscv_iommu_queue *_q = q; \
+	_q->qid = RISCV_IOMMU_INTR_ ## name; \
+	_q->qbr = RISCV_IOMMU_REG_ ## name ## B; \
+	_q->qcr = RISCV_IOMMU_REG_ ## name ## CSR; \
+	_q->mask = _q->mask ?: (RISCV_IOMMU_DEF_ ## name ## _COUNT) - 1;\
+} while (0)
+
+/* Note: offsets are the same for all queues */
+#define Q_HEAD(q) ((q)->qbr + (RISCV_IOMMU_REG_CQH - RISCV_IOMMU_REG_CQB))
+#define Q_TAIL(q) ((q)->qbr + (RISCV_IOMMU_REG_CQT - RISCV_IOMMU_REG_CQB))
+#define Q_ITEM(q, index) ((q)->mask & (index))
+#define Q_IPSR(q) BIT((q)->qid)
+
+/*
+ * Discover queue ring buffer hardware configuration, allocate in-memory
+ * ring buffer or use fixed I/O memory location, configure queue base register.
+ * Must be called before hardware queue is enabled.
+ *
+ * @queue - data structure, configured with RISCV_IOMMU_QUEUE_INIT()
+ * @entry_size - queue single element size in bytes.
+ */
+static int riscv_iommu_queue_alloc(struct riscv_iommu_device *iommu,
+				   struct riscv_iommu_queue *queue,
+				   size_t entry_size)
+{
+	unsigned int logsz;
+	u64 qb, rb;
+
+	/*
+	 * Use WARL base register property to discover maximum allowed
+	 * number of entries and optional fixed IO address for queue location.
+	 */
+	riscv_iommu_writeq(iommu, queue->qbr, RISCV_IOMMU_QUEUE_LOG2SZ_FIELD);
+	qb = riscv_iommu_readq(iommu, queue->qbr);
+
+	/*
+	 * Calculate and verify hardware supported queue length, as reported
+	 * by the field LOG2SZ, where max queue length is equal to 2^(LOG2SZ + 1).
+	 * Update queue size based on hardware supported value.
+	 */
+	logsz = ilog2(queue->mask);
+	if (logsz > FIELD_GET(RISCV_IOMMU_QUEUE_LOG2SZ_FIELD, qb))
+		logsz = FIELD_GET(RISCV_IOMMU_QUEUE_LOG2SZ_FIELD, qb);
+
+	/*
+	 * Use WARL base register property to discover an optional fixed IO
+	 * address for queue ring buffer location. Otherwise allocate contiguous
+	 * system memory.
+	 */
+	if (FIELD_GET(RISCV_IOMMU_PPN_FIELD, qb)) {
+		const size_t queue_size = entry_size << (logsz + 1);
+
+		queue->phys = pfn_to_phys(FIELD_GET(RISCV_IOMMU_PPN_FIELD, qb));
+		queue->base = devm_ioremap(iommu->dev, queue->phys, queue_size);
+	} else {
+		do {
+			const size_t queue_size = entry_size << (logsz + 1);
+			const int order = get_order(queue_size);
+
+			queue->base = riscv_iommu_get_pages(iommu, order);
+			queue->phys = __pa(queue->base);
+		} while (!queue->base && logsz-- > 0);
+	}
+
+	if (!queue->base)
+		return -ENOMEM;
+
+	qb = phys_to_ppn(queue->phys) |
+	     FIELD_PREP(RISCV_IOMMU_QUEUE_LOG2SZ_FIELD, logsz);
+
+	/* Update base register and read back to verify hw accepted our write */
+	riscv_iommu_writeq(iommu, queue->qbr, qb);
+	rb = riscv_iommu_readq(iommu, queue->qbr);
+	if (rb != qb) {
+		dev_err(iommu->dev, "queue #%u allocation failed\n", queue->qid);
+		return -ENODEV;
+	}
+
+	/* Update actual queue mask */
+	queue->mask = (2U << logsz) - 1;
+
+	dev_dbg(iommu->dev, "queue #%u allocated 2^%u entries",
+		queue->qid, logsz + 1);
+
+	return 0;
+}
+
+/* Check interrupt queue status, IPSR */
+static irqreturn_t riscv_iommu_queue_ipsr(int irq, void *data)
+{
+	struct riscv_iommu_queue *queue = (struct riscv_iommu_queue *)data;
+
+	if (riscv_iommu_readl(queue->iommu, RISCV_IOMMU_REG_IPSR) & Q_IPSR(queue))
+		return IRQ_WAKE_THREAD;
+
+	return IRQ_NONE;
+}
+
+static int riscv_iommu_queue_vec(struct riscv_iommu_device *iommu, int n)
+{
+	/* Reuse ICVEC.CIV mask for all interrupt vectors mapping. */
+	return (iommu->icvec >> (n * 4)) & RISCV_IOMMU_ICVEC_CIV;
+}
+
+/*
+ * Enable queue processing in the hardware, register interrupt handler.
+ *
+ * @queue - data structure, already allocated with riscv_iommu_queue_alloc()
+ * @irq_handler - threaded interrupt handler.
+ */
+static int riscv_iommu_queue_enable(struct riscv_iommu_device *iommu,
+				    struct riscv_iommu_queue *queue,
+				    irq_handler_t irq_handler)
+{
+	const unsigned int irq = iommu->irqs[riscv_iommu_queue_vec(iommu, queue->qid)];
+	u32 csr;
+	int rc;
+
+	if (queue->iommu)
+		return -EBUSY;
+
+	/* Polling not implemented */
+	if (!irq)
+		return -ENODEV;
+
+	queue->iommu = iommu;
+	rc = request_threaded_irq(irq, riscv_iommu_queue_ipsr, irq_handler,
+				  IRQF_ONESHOT | IRQF_SHARED,
+				  dev_name(iommu->dev), queue);
+	if (rc) {
+		queue->iommu = NULL;
+		return rc;
+	}
+
+	/*
+	 * Enable queue with interrupts, clear any memory fault if any.
+	 * Wait for the hardware to acknowledge request and activate queue
+	 * processing.
+	 * Note: All CSR bitfields are in the same offsets for all queues.
+	 */
+	riscv_iommu_writel(iommu, queue->qcr,
+			   RISCV_IOMMU_QUEUE_ENABLE |
+			   RISCV_IOMMU_QUEUE_INTR_ENABLE |
+			   RISCV_IOMMU_QUEUE_MEM_FAULT);
+
+	riscv_iommu_readl_timeout(iommu, queue->qcr,
+				  csr, !(csr & RISCV_IOMMU_QUEUE_BUSY),
+				  10, RISCV_IOMMU_QCSR_TIMEOUT);
+
+	if (RISCV_IOMMU_QUEUE_ACTIVE != (csr & (RISCV_IOMMU_QUEUE_ACTIVE |
+						RISCV_IOMMU_QUEUE_BUSY |
+						RISCV_IOMMU_QUEUE_MEM_FAULT))) {
+		/* Best effort to stop and disable failing hardware queue. */
+		riscv_iommu_writel(iommu, queue->qcr, 0);
+		free_irq(irq, queue);
+		queue->iommu = NULL;
+		dev_err(iommu->dev, "queue #%u failed to start\n", queue->qid);
+		return -EBUSY;
+	}
+
+	/* Clear any pending interrupt flag. */
+	riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR, Q_IPSR(queue));
+
+	return 0;
+}
+
+/*
+ * Disable queue. Wait for the hardware to acknowledge request and
+ * stop processing enqueued requests. Report errors but continue.
+ */
+static void riscv_iommu_queue_disable(struct riscv_iommu_queue *queue)
+{
+	struct riscv_iommu_device *iommu = queue->iommu;
+	u32 csr;
+
+	if (!iommu)
+		return;
+
+	free_irq(iommu->irqs[riscv_iommu_queue_vec(iommu, queue->qid)], queue);
+	riscv_iommu_writel(iommu, queue->qcr, 0);
+	riscv_iommu_readl_timeout(iommu, queue->qcr,
+				  csr, !(csr & RISCV_IOMMU_QUEUE_BUSY),
+				  10, RISCV_IOMMU_QCSR_TIMEOUT);
+
+	if (csr & (RISCV_IOMMU_QUEUE_ACTIVE | RISCV_IOMMU_QUEUE_BUSY))
+		dev_err(iommu->dev, "fail to disable hardware queue #%u, csr 0x%x\n",
+			queue->qid, csr);
+
+	queue->iommu = NULL;
+}
+
+/*
+ * Returns number of available valid queue entries and the first item index.
+ * Update shadow producer index if necessary.
+ */
+static int riscv_iommu_queue_consume(struct riscv_iommu_queue *queue,
+				     unsigned int *index)
+{
+	unsigned int head = atomic_read(&queue->head);
+	unsigned int tail = atomic_read(&queue->tail);
+	unsigned int last = Q_ITEM(queue, tail);
+	int available = (int)(tail - head);
+
+	*index = head;
+
+	if (available > 0)
+		return available;
+
+	/* read hardware producer index, check reserved register bits are not set. */
+	if (riscv_iommu_readl_timeout(queue->iommu, Q_TAIL(queue),
+				      tail, (tail & ~queue->mask) == 0,
+				      0, RISCV_IOMMU_QUEUE_TIMEOUT)) {
+		dev_err_once(queue->iommu->dev,
+			     "Hardware error: queue access timeout\n");
+		return 0;
+	}
+
+	if (tail == last)
+		return 0;
+
+	/* update shadow producer index */
+	return (int)(atomic_add_return((tail - last) & queue->mask, &queue->tail) - head);
+}
+
+/*
+ * Release processed queue entries, should match riscv_iommu_queue_consume() calls.
+ */
+static void riscv_iommu_queue_release(struct riscv_iommu_queue *queue, int count)
+{
+	const unsigned int head = atomic_add_return(count, &queue->head);
+
+	riscv_iommu_writel(queue->iommu, Q_HEAD(queue), Q_ITEM(queue, head));
+}
+
+/* Return actual consumer index based on hardware reported queue head index. */
+static unsigned int riscv_iommu_queue_cons(struct riscv_iommu_queue *queue)
+{
+	const unsigned int cons = atomic_read(&queue->head);
+	const unsigned int last = Q_ITEM(queue, cons);
+	unsigned int head;
+
+	if (riscv_iommu_readl_timeout(queue->iommu, Q_HEAD(queue), head,
+				      !(head & ~queue->mask),
+				      0, RISCV_IOMMU_QUEUE_TIMEOUT))
+		return cons;
+
+	return cons + ((head - last) & queue->mask);
+}
+
+/* Wait for submitted item to be processed. */
+static int riscv_iommu_queue_wait(struct riscv_iommu_queue *queue,
+				  unsigned int index,
+				  unsigned int timeout_us)
+{
+	unsigned int cons = atomic_read(&queue->head);
+
+	/* Already processed by the consumer */
+	if ((int)(cons - index) > 0)
+		return 0;
+
+	/* Monitor consumer index */
+	return readx_poll_timeout(riscv_iommu_queue_cons, queue, cons,
+				 (int)(cons - index) > 0, 0, timeout_us);
+}
+
+/* Enqueue an entry and wait to be processed if timeout_us > 0
+ *
+ * Error handling for IOMMU hardware not responding in reasonable time
+ * will be added as separate patch series along with other RAS features.
+ * For now, only report hardware failure and continue.
+ */
+static unsigned int riscv_iommu_queue_send(struct riscv_iommu_queue *queue,
+					   void *entry, size_t entry_size)
+{
+	unsigned int prod;
+	unsigned int head;
+	unsigned int tail;
+	unsigned long flags;
+
+	/* Do not preempt submission flow. */
+	local_irq_save(flags);
+
+	/* 1. Allocate some space in the queue */
+	prod = atomic_inc_return(&queue->prod) - 1;
+	head = atomic_read(&queue->head);
+
+	/* 2. Wait for space availability. */
+	if ((prod - head) > queue->mask) {
+		if (readx_poll_timeout(atomic_read, &queue->head,
+				       head, (prod - head) < queue->mask,
+				       0, RISCV_IOMMU_QUEUE_TIMEOUT))
+			goto err_busy;
+	} else if ((prod - head) == queue->mask) {
+		const unsigned int last = Q_ITEM(queue, head);
+
+		if (riscv_iommu_readl_timeout(queue->iommu, Q_HEAD(queue), head,
+					      !(head & ~queue->mask) && head != last,
+					      0, RISCV_IOMMU_QUEUE_TIMEOUT))
+			goto err_busy;
+		atomic_add((head - last) & queue->mask, &queue->head);
+	}
+
+	/* 3. Store entry in the ring buffer */
+	memcpy(queue->base + Q_ITEM(queue, prod) * entry_size, entry, entry_size);
+
+	/* 4. Wait for all previous entries to be ready */
+	if (readx_poll_timeout(atomic_read, &queue->tail, tail, prod == tail,
+			       0, RISCV_IOMMU_QUEUE_TIMEOUT))
+		goto err_busy;
+
+	/*
+	 * 5. Make sure the ring buffer update (whether in normal or I/O memory) is
+	 *    completed and visible before signaling the tail doorbell to fetch
+	 *    the next command. 'fence ow, ow'
+	 */
+	dma_wmb();
+	riscv_iommu_writel(queue->iommu, Q_TAIL(queue), Q_ITEM(queue, prod + 1));
+
+	/*
+	 * 6. Make sure the doorbell write to the device has finished before updating
+	 *    the shadow tail index in normal memory. 'fence o, w'
+	 */
+	mmiowb();
+	atomic_inc(&queue->tail);
+
+	/* 7. Complete submission and restore local interrupts */
+	local_irq_restore(flags);
+
+	return prod;
+
+err_busy:
+	local_irq_restore(flags);
+	dev_err_once(queue->iommu->dev, "Hardware error: command enqueue failed\n");
+
+	return prod;
+}
+
+/*
+ * IOMMU Command queue chapter 3.1
+ */
+
+/* Command queue interrupt handler thread function */
+static irqreturn_t riscv_iommu_cmdq_process(int irq, void *data)
+{
+	const struct riscv_iommu_queue *queue = (struct riscv_iommu_queue *)data;
+	unsigned int ctrl;
+
+	/* Clear MF/CQ errors, complete error recovery to be implemented. */
+	ctrl = riscv_iommu_readl(queue->iommu, queue->qcr);
+	if (ctrl & (RISCV_IOMMU_CQCSR_CQMF | RISCV_IOMMU_CQCSR_CMD_TO |
+		    RISCV_IOMMU_CQCSR_CMD_ILL | RISCV_IOMMU_CQCSR_FENCE_W_IP)) {
+		riscv_iommu_writel(queue->iommu, queue->qcr, ctrl);
+		dev_warn(queue->iommu->dev,
+			 "Queue #%u error; fault:%d timeout:%d illegal:%d fence_w_ip:%d\n",
+			 queue->qid,
+			 !!(ctrl & RISCV_IOMMU_CQCSR_CQMF),
+			 !!(ctrl & RISCV_IOMMU_CQCSR_CMD_TO),
+			 !!(ctrl & RISCV_IOMMU_CQCSR_CMD_ILL),
+			 !!(ctrl & RISCV_IOMMU_CQCSR_FENCE_W_IP));
+	}
+
+	/* Placeholder for command queue interrupt notifiers */
+
+	/* Clear command interrupt pending. */
+	riscv_iommu_writel(queue->iommu, RISCV_IOMMU_REG_IPSR, Q_IPSR(queue));
+
+	return IRQ_HANDLED;
+}
+
+/* Send command to the IOMMU command queue */
+static void riscv_iommu_cmd_send(struct riscv_iommu_device *iommu,
+				 struct riscv_iommu_command *cmd)
+{
+	riscv_iommu_queue_send(&iommu->cmdq, cmd, sizeof(*cmd));
+}
+
+/* Send IOFENCE.C command and wait for all scheduled commands to complete. */
+static void riscv_iommu_cmd_sync(struct riscv_iommu_device *iommu,
+				 unsigned int timeout_us)
+{
+	struct riscv_iommu_command cmd;
+	unsigned int prod;
+
+	riscv_iommu_cmd_iofence(&cmd);
+	prod = riscv_iommu_queue_send(&iommu->cmdq, &cmd, sizeof(cmd));
+
+	if (!timeout_us)
+		return;
+
+	if (riscv_iommu_queue_wait(&iommu->cmdq, prod, timeout_us))
+		dev_err_once(iommu->dev,
+			     "Hardware error: command execution timeout\n");
+}
+
+/*
+ * IOMMU Fault/Event queue chapter 3.2
+ */
+
+static void riscv_iommu_fault(struct riscv_iommu_device *iommu,
+			      struct riscv_iommu_fq_record *event)
+{
+	unsigned int err = FIELD_GET(RISCV_IOMMU_FQ_HDR_CAUSE, event->hdr);
+	unsigned int devid = FIELD_GET(RISCV_IOMMU_FQ_HDR_DID, event->hdr);
+
+	/* Placeholder for future fault handling implementation, report only. */
+	if (err)
+		dev_warn_ratelimited(iommu->dev,
+				     "Fault %d devid: 0x%x iotval: %llx iotval2: %llx\n",
+				     err, devid, event->iotval, event->iotval2);
+}
+
+/* Fault queue interrupt handler thread function */
+static irqreturn_t riscv_iommu_fltq_process(int irq, void *data)
+{
+	struct riscv_iommu_queue *queue = (struct riscv_iommu_queue *)data;
+	struct riscv_iommu_device *iommu = queue->iommu;
+	struct riscv_iommu_fq_record *events;
+	unsigned int ctrl, idx;
+	int cnt, len;
+
+	events = (struct riscv_iommu_fq_record *)queue->base;
+
+	/* Clear fault interrupt pending and process all received fault events. */
+	riscv_iommu_writel(iommu, RISCV_IOMMU_REG_IPSR, Q_IPSR(queue));
+
+	do {
+		cnt = riscv_iommu_queue_consume(queue, &idx);
+		for (len = 0; len < cnt; idx++, len++)
+			riscv_iommu_fault(iommu, &events[Q_ITEM(queue, idx)]);
+		riscv_iommu_queue_release(queue, cnt);
+	} while (cnt > 0);
+
+	/* Clear MF/OF errors, complete error recovery to be implemented. */
+	ctrl = riscv_iommu_readl(iommu, queue->qcr);
+	if (ctrl & (RISCV_IOMMU_FQCSR_FQMF | RISCV_IOMMU_FQCSR_FQOF)) {
+		riscv_iommu_writel(iommu, queue->qcr, ctrl);
+		dev_warn(iommu->dev,
+			 "Queue #%u error; memory fault:%d overflow:%d\n",
+			 queue->qid,
+			 !!(ctrl & RISCV_IOMMU_FQCSR_FQMF),
+			 !!(ctrl & RISCV_IOMMU_FQCSR_FQOF));
+	}
+
+	return IRQ_HANDLED;
+}
+
+/* Lookup and initialize device context info structure. */
+static struct riscv_iommu_dc *riscv_iommu_get_dc(struct riscv_iommu_device *iommu,
+						 unsigned int devid)
+{
+	const bool base_format = !(iommu->caps & RISCV_IOMMU_CAPABILITIES_MSI_FLAT);
+	unsigned int depth;
+	unsigned long ddt, old, new;
+	void *ptr;
+	u8 ddi_bits[3] = { 0 };
+	u64 *ddtp = NULL;
+
+	/* Make sure the mode is valid */
+	if (iommu->ddt_mode < RISCV_IOMMU_DDTP_IOMMU_MODE_1LVL ||
+	    iommu->ddt_mode > RISCV_IOMMU_DDTP_IOMMU_MODE_3LVL)
+		return NULL;
+
+	/*
+	 * Device id partitioning for base format:
+	 * DDI[0]: bits 0 - 6   (1st level) (7 bits)
+	 * DDI[1]: bits 7 - 15  (2nd level) (9 bits)
+	 * DDI[2]: bits 16 - 23 (3rd level) (8 bits)
+	 *
+	 * For extended format:
+	 * DDI[0]: bits 0 - 5   (1st level) (6 bits)
+	 * DDI[1]: bits 6 - 14  (2nd level) (9 bits)
+	 * DDI[2]: bits 15 - 23 (3rd level) (9 bits)
+	 */
+	if (base_format) {
+		ddi_bits[0] = 7;
+		ddi_bits[1] = 7 + 9;
+		ddi_bits[2] = 7 + 9 + 8;
+	} else {
+		ddi_bits[0] = 6;
+		ddi_bits[1] = 6 + 9;
+		ddi_bits[2] = 6 + 9 + 9;
+	}
+
+	/* Make sure device id is within range */
+	depth = iommu->ddt_mode - RISCV_IOMMU_DDTP_IOMMU_MODE_1LVL;
+	if (devid >= (1 << ddi_bits[depth]))
+		return NULL;
+
+	/* Get to the level of the non-leaf node that holds the device context */
+	for (ddtp = iommu->ddt_root; depth-- > 0;) {
+		const int split = ddi_bits[depth];
+		/*
+		 * Each non-leaf node is 64bits wide and on each level
+		 * nodes are indexed by DDI[depth].
+		 */
+		ddtp += (devid >> split) & 0x1FF;
+
+		/*
+		 * Check if this node has been populated and if not
+		 * allocate a new level and populate it.
+		 */
+		do {
+			ddt = READ_ONCE(*(unsigned long *)ddtp);
+			if (ddt & RISCV_IOMMU_DDTE_V) {
+				ddtp = __va(ppn_to_phys(ddt));
+				break;
+			}
+
+			ptr = riscv_iommu_get_pages(iommu, 0);
+			if (!ptr)
+				return NULL;
+
+			new = phys_to_ppn(__pa(ptr)) | RISCV_IOMMU_DDTE_V;
+			old = cmpxchg_relaxed((unsigned long *)ddtp, ddt, new);
+
+			if (old == ddt) {
+				ddtp = (u64 *)ptr;
+				break;
+			}
+
+			/* Race setting DDT detected, re-read and retry. */
+			riscv_iommu_free_pages(iommu, ptr);
+		} while (1);
+	}
+
+	/*
+	 * Grab the node that matches DDI[depth], note that when using base
+	 * format the device context is 4 * 64bits, and the extended format
+	 * is 8 * 64bits, hence the (3 - base_format) below.
+	 */
+	ddtp += (devid & ((64 << base_format) - 1)) << (3 - base_format);
+
+	return (struct riscv_iommu_dc *)ddtp;
+}
+
+/*
+ * This is best effort IOMMU translation shutdown flow.
+ * Disable IOMMU without waiting for hardware response.
+ */
+static void riscv_iommu_disable(struct riscv_iommu_device *iommu)
+{
+	riscv_iommu_writeq(iommu, RISCV_IOMMU_REG_DDTP, 0);
+	riscv_iommu_writel(iommu, RISCV_IOMMU_REG_CQCSR, 0);
+	riscv_iommu_writel(iommu, RISCV_IOMMU_REG_FQCSR, 0);
+	riscv_iommu_writel(iommu, RISCV_IOMMU_REG_PQCSR, 0);
+}
+
+#define riscv_iommu_read_ddtp(iommu) ({ \
+	u64 ddtp; \
+	riscv_iommu_readq_timeout((iommu), RISCV_IOMMU_REG_DDTP, ddtp, \
+				  !(ddtp & RISCV_IOMMU_DDTP_BUSY), 10, \
+				  RISCV_IOMMU_DDTP_TIMEOUT); \
+	ddtp; })
+
+static int riscv_iommu_iodir_alloc(struct riscv_iommu_device *iommu)
+{
+	u64 ddtp;
+	unsigned int mode;
+
+	ddtp = riscv_iommu_read_ddtp(iommu);
+	if (ddtp & RISCV_IOMMU_DDTP_BUSY)
+		return -EBUSY;
+
+	/*
+	 * It is optional for the hardware to report a fixed address for device
+	 * directory root page when DDT.MODE is OFF or BARE.
+	 */
+	mode = FIELD_GET(RISCV_IOMMU_DDTP_IOMMU_MODE, ddtp);
+	if (mode == RISCV_IOMMU_DDTP_IOMMU_MODE_BARE ||
+	    mode == RISCV_IOMMU_DDTP_IOMMU_MODE_OFF) {
+		/* Use WARL to discover hardware fixed DDT PPN */
+		riscv_iommu_writeq(iommu, RISCV_IOMMU_REG_DDTP,
+				   FIELD_PREP(RISCV_IOMMU_DDTP_IOMMU_MODE, mode));
+		ddtp = riscv_iommu_read_ddtp(iommu);
+		if (ddtp & RISCV_IOMMU_DDTP_BUSY)
+			return -EBUSY;
+
+		iommu->ddt_phys = ppn_to_phys(ddtp);
+		if (iommu->ddt_phys)
+			iommu->ddt_root = devm_ioremap(iommu->dev,
+						       iommu->ddt_phys, PAGE_SIZE);
+		if (iommu->ddt_root)
+			memset(iommu->ddt_root, 0, PAGE_SIZE);
+	}
+
+	if (!iommu->ddt_root) {
+		iommu->ddt_root = riscv_iommu_get_pages(iommu, 0);
+		iommu->ddt_phys = __pa(iommu->ddt_root);
+	}
+
+	if (!iommu->ddt_root)
+		return -ENOMEM;
+
+	return 0;
+}
+
+/*
+ * Discover supported DDT modes starting from requested value,
+ * configure DDTP register with accepted mode and root DDT address.
+ * Accepted iommu->ddt_mode is updated on success.
+ */
+static int riscv_iommu_iodir_set_mode(struct riscv_iommu_device *iommu,
+				      unsigned int ddtp_mode)
+{
+	struct device *dev = iommu->dev;
+	u64 ddtp, rq_ddtp;
+	unsigned int mode, rq_mode = ddtp_mode;
+	struct riscv_iommu_command cmd;
+
+	ddtp = riscv_iommu_read_ddtp(iommu);
+	if (ddtp & RISCV_IOMMU_DDTP_BUSY)
+		return -EBUSY;
+
+	/* Disallow state transition from xLVL to xLVL. */
+	mode = FIELD_GET(RISCV_IOMMU_DDTP_IOMMU_MODE, ddtp);
+	if (mode != RISCV_IOMMU_DDTP_IOMMU_MODE_BARE &&
+	    mode != RISCV_IOMMU_DDTP_IOMMU_MODE_OFF &&
+	    rq_mode != RISCV_IOMMU_DDTP_IOMMU_MODE_BARE &&
+	    rq_mode != RISCV_IOMMU_DDTP_IOMMU_MODE_OFF)
+		return -EINVAL;
+
+	do {
+		rq_ddtp = FIELD_PREP(RISCV_IOMMU_DDTP_IOMMU_MODE, rq_mode);
+		if (rq_mode > RISCV_IOMMU_DDTP_IOMMU_MODE_BARE)
+			rq_ddtp |= phys_to_ppn(iommu->ddt_phys);
+
+		riscv_iommu_writeq(iommu, RISCV_IOMMU_REG_DDTP, rq_ddtp);
+		ddtp = riscv_iommu_read_ddtp(iommu);
+		if (ddtp & RISCV_IOMMU_DDTP_BUSY) {
+			dev_err(dev, "timeout when setting ddtp (ddt mode: %u, read: %llx)\n",
+				rq_mode, ddtp);
+			return -EBUSY;
+		}
+
+		/* Verify IOMMU hardware accepts new DDTP config. */
+		mode = FIELD_GET(RISCV_IOMMU_DDTP_IOMMU_MODE, ddtp);
+
+		if (rq_mode == mode)
+			break;
+
+		/* Hardware mandatory DDTP mode has not been accepted. */
+		if (rq_mode < RISCV_IOMMU_DDTP_IOMMU_MODE_1LVL && rq_ddtp != ddtp) {
+			dev_err(dev, "DDTP update failed hw: %llx vs %llx\n",
+				ddtp, rq_ddtp);
+			return -EINVAL;
+		}
+
+		/*
+		 * Mode field is WARL, an IOMMU may support a subset of
+		 * directory table levels in which case if we tried to set
+		 * an unsupported number of levels we'll readback either
+		 * a valid xLVL or off/bare. If we got off/bare, try again
+		 * with a smaller xLVL.
+		 */
+		if (mode < RISCV_IOMMU_DDTP_IOMMU_MODE_1LVL &&
+		    rq_mode > RISCV_IOMMU_DDTP_IOMMU_MODE_1LVL) {
+			dev_dbg(dev, "DDTP hw mode %u vs %u\n", mode, rq_mode);
+			rq_mode--;
+			continue;
+		}
+
+		/*
+		 * We tried all supported modes and IOMMU hardware failed to
+		 * accept new settings, something went very wrong since off/bare
+		 * and at least one xLVL must be supported.
+		 */
+		dev_err(dev, "DDTP hw mode %u, failed to set %u\n",
+			mode, ddtp_mode);
+		return -EINVAL;
+	} while (1);
+
+	iommu->ddt_mode = mode;
+	if (mode != ddtp_mode)
+		dev_dbg(dev, "DDTP hw mode %u, requested %u\n", mode, ddtp_mode);
+
+	/* Invalidate device context cache */
+	riscv_iommu_cmd_iodir_inval_ddt(&cmd);
+	riscv_iommu_cmd_send(iommu, &cmd);
+
+	/* Invalidate address translation cache */
+	riscv_iommu_cmd_inval_vma(&cmd);
+	riscv_iommu_cmd_send(iommu, &cmd);
+
+	/* IOFENCE.C */
+	riscv_iommu_cmd_sync(iommu, RISCV_IOMMU_IOTINVAL_TIMEOUT);
+
+	return 0;
+}
+
+/* This struct contains protection domain specific IOMMU driver data. */
+struct riscv_iommu_domain {
+	struct iommu_domain domain;
+	struct list_head bonds;
+	spinlock_t lock;		/* protect bonds list updates. */
+	int pscid;
+	bool amo_enabled;
+	int numa_node;
+	unsigned int pgd_mode;
+	unsigned long *pgd_root;
+};
+
+#define iommu_domain_to_riscv(iommu_domain) \
+	container_of(iommu_domain, struct riscv_iommu_domain, domain)
+
+/* Private IOMMU data for managed devices, dev_iommu_priv_* */
+struct riscv_iommu_info {
+	struct riscv_iommu_domain *domain;
+};
+
+/*
+ * Linkage between an iommu_domain and attached devices.
820 + * 821 + * Protection domain requiring IOATC and DevATC translation cache invalidations, 822 + * should be linked to attached devices using a riscv_iommu_bond structure. 823 + * Devices should be linked to the domain before first use and unlinked after 824 + * the translations from the referenced protection domain can no longer be used. 825 + * Blocking and identity domains are not tracked here, as the IOMMU hardware 826 + * does not cache negative and/or identity (BARE mode) translations, and DevATC 827 + * is disabled for those protection domains. 828 + * 829 + * The device pointer and IOMMU data remain stable in the bond struct after 830 + * _probe_device() where it's attached to the managed IOMMU, up to the 831 + * completion of the _release_device() call. The release of the bond structure 832 + * is synchronized with the device release. 833 + */ 834 + struct riscv_iommu_bond { 835 + struct list_head list; 836 + struct rcu_head rcu; 837 + struct device *dev; 838 + }; 839 + 840 + static int riscv_iommu_bond_link(struct riscv_iommu_domain *domain, 841 + struct device *dev) 842 + { 843 + struct riscv_iommu_device *iommu = dev_to_iommu(dev); 844 + struct riscv_iommu_bond *bond; 845 + struct list_head *bonds; 846 + 847 + bond = kzalloc(sizeof(*bond), GFP_KERNEL); 848 + if (!bond) 849 + return -ENOMEM; 850 + bond->dev = dev; 851 + 852 + /* 853 + * List of devices attached to the domain is arranged based on 854 + * managed IOMMU device. 855 + */ 856 + 857 + spin_lock(&domain->lock); 858 + list_for_each(bonds, &domain->bonds) 859 + if (dev_to_iommu(list_entry(bonds, struct riscv_iommu_bond, list)->dev) == iommu) 860 + break; 861 + list_add_rcu(&bond->list, bonds); 862 + spin_unlock(&domain->lock); 863 + 864 + /* Synchronize with riscv_iommu_iotlb_inval() sequence. See comment below. 
*/ 865 + smp_mb(); 866 + 867 + return 0; 868 + } 869 + 870 + static void riscv_iommu_bond_unlink(struct riscv_iommu_domain *domain, 871 + struct device *dev) 872 + { 873 + struct riscv_iommu_device *iommu = dev_to_iommu(dev); 874 + struct riscv_iommu_bond *bond, *found = NULL; 875 + struct riscv_iommu_command cmd; 876 + int count = 0; 877 + 878 + if (!domain) 879 + return; 880 + 881 + spin_lock(&domain->lock); 882 + list_for_each_entry(bond, &domain->bonds, list) { 883 + if (found && count) 884 + break; 885 + else if (bond->dev == dev) 886 + found = bond; 887 + else if (dev_to_iommu(bond->dev) == iommu) 888 + count++; 889 + } 890 + if (found) 891 + list_del_rcu(&found->list); 892 + spin_unlock(&domain->lock); 893 + kfree_rcu(found, rcu); 894 + 895 + /* 896 + * If this was the last bond between this domain and the IOMMU 897 + * invalidate all cached entries for domain's PSCID. 898 + */ 899 + if (!count) { 900 + riscv_iommu_cmd_inval_vma(&cmd); 901 + riscv_iommu_cmd_inval_set_pscid(&cmd, domain->pscid); 902 + riscv_iommu_cmd_send(iommu, &cmd); 903 + 904 + riscv_iommu_cmd_sync(iommu, RISCV_IOMMU_IOTINVAL_TIMEOUT); 905 + } 906 + } 907 + 908 + /* 909 + * Send IOTLB.INVAL for whole address space for ranges larger than 2MB. 910 + * This limit will be replaced with range invalidations, if supported by 911 + * the hardware, when RISC-V IOMMU architecture specification update for 912 + * range invalidations update will be available. 913 + */ 914 + #define RISCV_IOMMU_IOTLB_INVAL_LIMIT (2 << 20) 915 + 916 + static void riscv_iommu_iotlb_inval(struct riscv_iommu_domain *domain, 917 + unsigned long start, unsigned long end) 918 + { 919 + struct riscv_iommu_bond *bond; 920 + struct riscv_iommu_device *iommu, *prev; 921 + struct riscv_iommu_command cmd; 922 + unsigned long len = end - start + 1; 923 + unsigned long iova; 924 + 925 + /* 926 + * For each IOMMU linked with this protection domain (via bonds->dev), 927 + * an IOTLB invaliation command will be submitted and executed. 
928 + * 929 + * Possbile race with domain attach flow is handled by sequencing 930 + * bond creation - riscv_iommu_bond_link(), and device directory 931 + * update - riscv_iommu_iodir_update(). 932 + * 933 + * PTE Update / IOTLB Inval Device attach & directory update 934 + * -------------------------- -------------------------- 935 + * update page table entries add dev to the bond list 936 + * FENCE RW,RW FENCE RW,RW 937 + * For all IOMMUs: (can be empty) Update FSC/PSCID 938 + * FENCE IOW,IOW FENCE IOW,IOW 939 + * IOTLB.INVAL IODIR.INVAL 940 + * IOFENCE.C 941 + * 942 + * If bond list is not updated with new device, directory context will 943 + * be configured with already valid page table content. If an IOMMU is 944 + * linked to the protection domain it will receive invalidation 945 + * requests for updated page table entries. 946 + */ 947 + smp_mb(); 948 + 949 + rcu_read_lock(); 950 + 951 + prev = NULL; 952 + list_for_each_entry_rcu(bond, &domain->bonds, list) { 953 + iommu = dev_to_iommu(bond->dev); 954 + 955 + /* 956 + * IOTLB invalidation request can be safely omitted if already sent 957 + * to the IOMMU for the same PSCID, and with domain->bonds list 958 + * arranged based on the device's IOMMU, it's sufficient to check 959 + * last device the invalidation was sent to. 
960 + */ 961 + if (iommu == prev) 962 + continue; 963 + 964 + riscv_iommu_cmd_inval_vma(&cmd); 965 + riscv_iommu_cmd_inval_set_pscid(&cmd, domain->pscid); 966 + if (len && len < RISCV_IOMMU_IOTLB_INVAL_LIMIT) { 967 + for (iova = start; iova < end; iova += PAGE_SIZE) { 968 + riscv_iommu_cmd_inval_set_addr(&cmd, iova); 969 + riscv_iommu_cmd_send(iommu, &cmd); 970 + } 971 + } else { 972 + riscv_iommu_cmd_send(iommu, &cmd); 973 + } 974 + prev = iommu; 975 + } 976 + 977 + prev = NULL; 978 + list_for_each_entry_rcu(bond, &domain->bonds, list) { 979 + iommu = dev_to_iommu(bond->dev); 980 + if (iommu == prev) 981 + continue; 982 + 983 + riscv_iommu_cmd_sync(iommu, RISCV_IOMMU_IOTINVAL_TIMEOUT); 984 + prev = iommu; 985 + } 986 + rcu_read_unlock(); 987 + } 988 + 989 + #define RISCV_IOMMU_FSC_BARE 0 990 + 991 + /* 992 + * Update IODIR for the device. 993 + * 994 + * During the execution of riscv_iommu_probe_device(), IODIR entries are 995 + * allocated for the device's identifiers. Device context invalidation 996 + * becomes necessary only if one of the updated entries was previously 997 + * marked as valid, given that invalid device context entries are not 998 + * cached by the IOMMU hardware. 999 + * In this implementation, updating a valid device context while the 1000 + * device is not quiesced might be disruptive, potentially causing 1001 + * interim translation faults. 
1002 + */ 1003 + static void riscv_iommu_iodir_update(struct riscv_iommu_device *iommu, 1004 + struct device *dev, u64 fsc, u64 ta) 1005 + { 1006 + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); 1007 + struct riscv_iommu_dc *dc; 1008 + struct riscv_iommu_command cmd; 1009 + bool sync_required = false; 1010 + u64 tc; 1011 + int i; 1012 + 1013 + for (i = 0; i < fwspec->num_ids; i++) { 1014 + dc = riscv_iommu_get_dc(iommu, fwspec->ids[i]); 1015 + tc = READ_ONCE(dc->tc); 1016 + if (!(tc & RISCV_IOMMU_DC_TC_V)) 1017 + continue; 1018 + 1019 + WRITE_ONCE(dc->tc, tc & ~RISCV_IOMMU_DC_TC_V); 1020 + 1021 + /* Invalidate device context cached values */ 1022 + riscv_iommu_cmd_iodir_inval_ddt(&cmd); 1023 + riscv_iommu_cmd_iodir_set_did(&cmd, fwspec->ids[i]); 1024 + riscv_iommu_cmd_send(iommu, &cmd); 1025 + sync_required = true; 1026 + } 1027 + 1028 + if (sync_required) 1029 + riscv_iommu_cmd_sync(iommu, RISCV_IOMMU_IOTINVAL_TIMEOUT); 1030 + 1031 + /* 1032 + * For device context with DC_TC_PDTV = 0, translation attributes valid bit 1033 + * is stored as DC_TC_V bit (both sharing the same location at BIT(0)). 1034 + */ 1035 + for (i = 0; i < fwspec->num_ids; i++) { 1036 + dc = riscv_iommu_get_dc(iommu, fwspec->ids[i]); 1037 + tc = READ_ONCE(dc->tc); 1038 + tc |= ta & RISCV_IOMMU_DC_TC_V; 1039 + 1040 + WRITE_ONCE(dc->fsc, fsc); 1041 + WRITE_ONCE(dc->ta, ta & RISCV_IOMMU_PC_TA_PSCID); 1042 + /* Update device context, write TC.V as the last step. */ 1043 + dma_wmb(); 1044 + WRITE_ONCE(dc->tc, tc); 1045 + 1046 + /* Invalidate device context after update */ 1047 + riscv_iommu_cmd_iodir_inval_ddt(&cmd); 1048 + riscv_iommu_cmd_iodir_set_did(&cmd, fwspec->ids[i]); 1049 + riscv_iommu_cmd_send(iommu, &cmd); 1050 + } 1051 + 1052 + riscv_iommu_cmd_sync(iommu, RISCV_IOMMU_IOTINVAL_TIMEOUT); 1053 + } 1054 + 1055 + /* 1056 + * IOVA page translation tree management. 
1057 + */ 1058 + 1059 + static void riscv_iommu_iotlb_flush_all(struct iommu_domain *iommu_domain) 1060 + { 1061 + struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain); 1062 + 1063 + riscv_iommu_iotlb_inval(domain, 0, ULONG_MAX); 1064 + } 1065 + 1066 + static void riscv_iommu_iotlb_sync(struct iommu_domain *iommu_domain, 1067 + struct iommu_iotlb_gather *gather) 1068 + { 1069 + struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain); 1070 + 1071 + riscv_iommu_iotlb_inval(domain, gather->start, gather->end); 1072 + } 1073 + 1074 + #define PT_SHIFT (PAGE_SHIFT - ilog2(sizeof(pte_t))) 1075 + 1076 + #define _io_pte_present(pte) ((pte) & (_PAGE_PRESENT | _PAGE_PROT_NONE)) 1077 + #define _io_pte_leaf(pte) ((pte) & _PAGE_LEAF) 1078 + #define _io_pte_none(pte) ((pte) == 0) 1079 + #define _io_pte_entry(pn, prot) ((_PAGE_PFN_MASK & ((pn) << _PAGE_PFN_SHIFT)) | (prot)) 1080 + 1081 + static void riscv_iommu_pte_free(struct riscv_iommu_domain *domain, 1082 + unsigned long pte, struct list_head *freelist) 1083 + { 1084 + unsigned long *ptr; 1085 + int i; 1086 + 1087 + if (!_io_pte_present(pte) || _io_pte_leaf(pte)) 1088 + return; 1089 + 1090 + ptr = (unsigned long *)pfn_to_virt(__page_val_to_pfn(pte)); 1091 + 1092 + /* Recursively free all sub page table pages */ 1093 + for (i = 0; i < PTRS_PER_PTE; i++) { 1094 + pte = READ_ONCE(ptr[i]); 1095 + if (!_io_pte_none(pte) && cmpxchg_relaxed(ptr + i, pte, 0) == pte) 1096 + riscv_iommu_pte_free(domain, pte, freelist); 1097 + } 1098 + 1099 + if (freelist) 1100 + list_add_tail(&virt_to_page(ptr)->lru, freelist); 1101 + else 1102 + iommu_free_page(ptr); 1103 + } 1104 + 1105 + static unsigned long *riscv_iommu_pte_alloc(struct riscv_iommu_domain *domain, 1106 + unsigned long iova, size_t pgsize, 1107 + gfp_t gfp) 1108 + { 1109 + unsigned long *ptr = domain->pgd_root; 1110 + unsigned long pte, old; 1111 + int level = domain->pgd_mode - RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 + 2; 1112 + void *addr; 1113 + 1114 
+ do { 1115 + const int shift = PAGE_SHIFT + PT_SHIFT * level; 1116 + 1117 + ptr += ((iova >> shift) & (PTRS_PER_PTE - 1)); 1118 + /* 1119 + * Note: returned entry might be a non-leaf if there was 1120 + * existing mapping with smaller granularity. Up to the caller 1121 + * to replace and invalidate. 1122 + */ 1123 + if (((size_t)1 << shift) == pgsize) 1124 + return ptr; 1125 + pte_retry: 1126 + pte = READ_ONCE(*ptr); 1127 + /* 1128 + * This is very likely incorrect as we should not be adding 1129 + * new mapping with smaller granularity on top 1130 + * of existing 2M/1G mapping. Fail. 1131 + */ 1132 + if (_io_pte_present(pte) && _io_pte_leaf(pte)) 1133 + return NULL; 1134 + /* 1135 + * Non-leaf entry is missing, allocate and try to add to the 1136 + * page table. This might race with other mappings, retry. 1137 + */ 1138 + if (_io_pte_none(pte)) { 1139 + addr = iommu_alloc_page_node(domain->numa_node, gfp); 1140 + if (!addr) 1141 + return NULL; 1142 + old = pte; 1143 + pte = _io_pte_entry(virt_to_pfn(addr), _PAGE_TABLE); 1144 + if (cmpxchg_relaxed(ptr, old, pte) != old) { 1145 + iommu_free_page(addr); 1146 + goto pte_retry; 1147 + } 1148 + } 1149 + ptr = (unsigned long *)pfn_to_virt(__page_val_to_pfn(pte)); 1150 + } while (level-- > 0); 1151 + 1152 + return NULL; 1153 + } 1154 + 1155 + static unsigned long *riscv_iommu_pte_fetch(struct riscv_iommu_domain *domain, 1156 + unsigned long iova, size_t *pte_pgsize) 1157 + { 1158 + unsigned long *ptr = domain->pgd_root; 1159 + unsigned long pte; 1160 + int level = domain->pgd_mode - RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39 + 2; 1161 + 1162 + do { 1163 + const int shift = PAGE_SHIFT + PT_SHIFT * level; 1164 + 1165 + ptr += ((iova >> shift) & (PTRS_PER_PTE - 1)); 1166 + pte = READ_ONCE(*ptr); 1167 + if (_io_pte_present(pte) && _io_pte_leaf(pte)) { 1168 + *pte_pgsize = (size_t)1 << shift; 1169 + return ptr; 1170 + } 1171 + if (_io_pte_none(pte)) 1172 + return NULL; 1173 + ptr = (unsigned long 
*)pfn_to_virt(__page_val_to_pfn(pte)); 1174 + } while (level-- > 0); 1175 + 1176 + return NULL; 1177 + } 1178 + 1179 + static int riscv_iommu_map_pages(struct iommu_domain *iommu_domain, 1180 + unsigned long iova, phys_addr_t phys, 1181 + size_t pgsize, size_t pgcount, int prot, 1182 + gfp_t gfp, size_t *mapped) 1183 + { 1184 + struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain); 1185 + size_t size = 0; 1186 + unsigned long *ptr; 1187 + unsigned long pte, old, pte_prot; 1188 + int rc = 0; 1189 + LIST_HEAD(freelist); 1190 + 1191 + if (!(prot & IOMMU_WRITE)) 1192 + pte_prot = _PAGE_BASE | _PAGE_READ; 1193 + else if (domain->amo_enabled) 1194 + pte_prot = _PAGE_BASE | _PAGE_READ | _PAGE_WRITE; 1195 + else 1196 + pte_prot = _PAGE_BASE | _PAGE_READ | _PAGE_WRITE | _PAGE_DIRTY; 1197 + 1198 + while (pgcount) { 1199 + ptr = riscv_iommu_pte_alloc(domain, iova, pgsize, gfp); 1200 + if (!ptr) { 1201 + rc = -ENOMEM; 1202 + break; 1203 + } 1204 + 1205 + old = READ_ONCE(*ptr); 1206 + pte = _io_pte_entry(phys_to_pfn(phys), pte_prot); 1207 + if (cmpxchg_relaxed(ptr, old, pte) != old) 1208 + continue; 1209 + 1210 + riscv_iommu_pte_free(domain, old, &freelist); 1211 + 1212 + size += pgsize; 1213 + iova += pgsize; 1214 + phys += pgsize; 1215 + --pgcount; 1216 + } 1217 + 1218 + *mapped = size; 1219 + 1220 + if (!list_empty(&freelist)) { 1221 + /* 1222 + * In 1.0 spec version, the smallest scope we can use to 1223 + * invalidate all levels of page table (i.e. leaf and non-leaf) 1224 + * is an invalidate-all-PSCID IOTINVAL.VMA with AV=0. 1225 + * This will be updated with hardware support for 1226 + * capability.NL (non-leaf) IOTINVAL command. 
1227 + */ 1228 + riscv_iommu_iotlb_inval(domain, 0, ULONG_MAX); 1229 + iommu_put_pages_list(&freelist); 1230 + } 1231 + 1232 + return rc; 1233 + } 1234 + 1235 + static size_t riscv_iommu_unmap_pages(struct iommu_domain *iommu_domain, 1236 + unsigned long iova, size_t pgsize, 1237 + size_t pgcount, 1238 + struct iommu_iotlb_gather *gather) 1239 + { 1240 + struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain); 1241 + size_t size = pgcount << __ffs(pgsize); 1242 + unsigned long *ptr, old; 1243 + size_t unmapped = 0; 1244 + size_t pte_size; 1245 + 1246 + while (unmapped < size) { 1247 + ptr = riscv_iommu_pte_fetch(domain, iova, &pte_size); 1248 + if (!ptr) 1249 + return unmapped; 1250 + 1251 + /* partial unmap is not allowed, fail. */ 1252 + if (iova & (pte_size - 1)) 1253 + return unmapped; 1254 + 1255 + old = READ_ONCE(*ptr); 1256 + if (cmpxchg_relaxed(ptr, old, 0) != old) 1257 + continue; 1258 + 1259 + iommu_iotlb_gather_add_page(&domain->domain, gather, iova, 1260 + pte_size); 1261 + 1262 + iova += pte_size; 1263 + unmapped += pte_size; 1264 + } 1265 + 1266 + return unmapped; 1267 + } 1268 + 1269 + static phys_addr_t riscv_iommu_iova_to_phys(struct iommu_domain *iommu_domain, 1270 + dma_addr_t iova) 1271 + { 1272 + struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain); 1273 + unsigned long pte_size; 1274 + unsigned long *ptr; 1275 + 1276 + ptr = riscv_iommu_pte_fetch(domain, iova, &pte_size); 1277 + if (_io_pte_none(*ptr) || !_io_pte_present(*ptr)) 1278 + return 0; 1279 + 1280 + return pfn_to_phys(__page_val_to_pfn(*ptr)) | (iova & (pte_size - 1)); 1281 + } 1282 + 1283 + static void riscv_iommu_free_paging_domain(struct iommu_domain *iommu_domain) 1284 + { 1285 + struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain); 1286 + const unsigned long pfn = virt_to_pfn(domain->pgd_root); 1287 + 1288 + WARN_ON(!list_empty(&domain->bonds)); 1289 + 1290 + if ((int)domain->pscid > 0) 1291 + ida_free(&riscv_iommu_pscids, 
domain->pscid); 1292 + 1293 + riscv_iommu_pte_free(domain, _io_pte_entry(pfn, _PAGE_TABLE), NULL); 1294 + kfree(domain); 1295 + } 1296 + 1297 + static bool riscv_iommu_pt_supported(struct riscv_iommu_device *iommu, int pgd_mode) 1298 + { 1299 + switch (pgd_mode) { 1300 + case RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39: 1301 + return iommu->caps & RISCV_IOMMU_CAPABILITIES_SV39; 1302 + 1303 + case RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV48: 1304 + return iommu->caps & RISCV_IOMMU_CAPABILITIES_SV48; 1305 + 1306 + case RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV57: 1307 + return iommu->caps & RISCV_IOMMU_CAPABILITIES_SV57; 1308 + } 1309 + return false; 1310 + } 1311 + 1312 + static int riscv_iommu_attach_paging_domain(struct iommu_domain *iommu_domain, 1313 + struct device *dev) 1314 + { 1315 + struct riscv_iommu_domain *domain = iommu_domain_to_riscv(iommu_domain); 1316 + struct riscv_iommu_device *iommu = dev_to_iommu(dev); 1317 + struct riscv_iommu_info *info = dev_iommu_priv_get(dev); 1318 + u64 fsc, ta; 1319 + 1320 + if (!riscv_iommu_pt_supported(iommu, domain->pgd_mode)) 1321 + return -ENODEV; 1322 + 1323 + fsc = FIELD_PREP(RISCV_IOMMU_PC_FSC_MODE, domain->pgd_mode) | 1324 + FIELD_PREP(RISCV_IOMMU_PC_FSC_PPN, virt_to_pfn(domain->pgd_root)); 1325 + ta = FIELD_PREP(RISCV_IOMMU_PC_TA_PSCID, domain->pscid) | 1326 + RISCV_IOMMU_PC_TA_V; 1327 + 1328 + if (riscv_iommu_bond_link(domain, dev)) 1329 + return -ENOMEM; 1330 + 1331 + riscv_iommu_iodir_update(iommu, dev, fsc, ta); 1332 + riscv_iommu_bond_unlink(info->domain, dev); 1333 + info->domain = domain; 1334 + 1335 + return 0; 1336 + } 1337 + 1338 + static const struct iommu_domain_ops riscv_iommu_paging_domain_ops = { 1339 + .attach_dev = riscv_iommu_attach_paging_domain, 1340 + .free = riscv_iommu_free_paging_domain, 1341 + .map_pages = riscv_iommu_map_pages, 1342 + .unmap_pages = riscv_iommu_unmap_pages, 1343 + .iova_to_phys = riscv_iommu_iova_to_phys, 1344 + .iotlb_sync = riscv_iommu_iotlb_sync, 1345 + .flush_iotlb_all = 
riscv_iommu_iotlb_flush_all, 1346 + }; 1347 + 1348 + static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev) 1349 + { 1350 + struct riscv_iommu_domain *domain; 1351 + struct riscv_iommu_device *iommu; 1352 + unsigned int pgd_mode; 1353 + dma_addr_t va_mask; 1354 + int va_bits; 1355 + 1356 + iommu = dev_to_iommu(dev); 1357 + if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV57) { 1358 + pgd_mode = RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV57; 1359 + va_bits = 57; 1360 + } else if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV48) { 1361 + pgd_mode = RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV48; 1362 + va_bits = 48; 1363 + } else if (iommu->caps & RISCV_IOMMU_CAPABILITIES_SV39) { 1364 + pgd_mode = RISCV_IOMMU_DC_FSC_IOSATP_MODE_SV39; 1365 + va_bits = 39; 1366 + } else { 1367 + dev_err(dev, "cannot find supported page table mode\n"); 1368 + return ERR_PTR(-ENODEV); 1369 + } 1370 + 1371 + domain = kzalloc(sizeof(*domain), GFP_KERNEL); 1372 + if (!domain) 1373 + return ERR_PTR(-ENOMEM); 1374 + 1375 + INIT_LIST_HEAD_RCU(&domain->bonds); 1376 + spin_lock_init(&domain->lock); 1377 + domain->numa_node = dev_to_node(iommu->dev); 1378 + domain->amo_enabled = !!(iommu->caps & RISCV_IOMMU_CAPABILITIES_AMO_HWAD); 1379 + domain->pgd_mode = pgd_mode; 1380 + domain->pgd_root = iommu_alloc_page_node(domain->numa_node, 1381 + GFP_KERNEL_ACCOUNT); 1382 + if (!domain->pgd_root) { 1383 + kfree(domain); 1384 + return ERR_PTR(-ENOMEM); 1385 + } 1386 + 1387 + domain->pscid = ida_alloc_range(&riscv_iommu_pscids, 1, 1388 + RISCV_IOMMU_MAX_PSCID, GFP_KERNEL); 1389 + if (domain->pscid < 0) { 1390 + iommu_free_page(domain->pgd_root); 1391 + kfree(domain); 1392 + return ERR_PTR(-ENOMEM); 1393 + } 1394 + 1395 + /* 1396 + * Note: RISC-V Privilege spec mandates that virtual addresses 1397 + * need to be sign-extended, so if (VA_BITS - 1) is set, all 1398 + * bits >= VA_BITS need to also be set or else we'll get a 1399 + * page fault. However the code that creates the mappings 1400 + * above us (e.g. 
iommu_dma_alloc_iova()) won't do that for us 1401 + * for now, so we'll end up with invalid virtual addresses 1402 + * to map. As a workaround until we get this sorted out 1403 + * limit the available virtual addresses to VA_BITS - 1. 1404 + */ 1405 + va_mask = DMA_BIT_MASK(va_bits - 1); 1406 + 1407 + domain->domain.geometry.aperture_start = 0; 1408 + domain->domain.geometry.aperture_end = va_mask; 1409 + domain->domain.geometry.force_aperture = true; 1410 + domain->domain.pgsize_bitmap = va_mask & (SZ_4K | SZ_2M | SZ_1G | SZ_512G); 1411 + 1412 + domain->domain.ops = &riscv_iommu_paging_domain_ops; 1413 + 1414 + return &domain->domain; 1415 + } 1416 + 1417 + static int riscv_iommu_attach_blocking_domain(struct iommu_domain *iommu_domain, 1418 + struct device *dev) 1419 + { 1420 + struct riscv_iommu_device *iommu = dev_to_iommu(dev); 1421 + struct riscv_iommu_info *info = dev_iommu_priv_get(dev); 1422 + 1423 + /* Make device context invalid, translation requests will fault w/ #258 */ 1424 + riscv_iommu_iodir_update(iommu, dev, RISCV_IOMMU_FSC_BARE, 0); 1425 + riscv_iommu_bond_unlink(info->domain, dev); 1426 + info->domain = NULL; 1427 + 1428 + return 0; 1429 + } 1430 + 1431 + static struct iommu_domain riscv_iommu_blocking_domain = { 1432 + .type = IOMMU_DOMAIN_BLOCKED, 1433 + .ops = &(const struct iommu_domain_ops) { 1434 + .attach_dev = riscv_iommu_attach_blocking_domain, 1435 + } 1436 + }; 1437 + 1438 + static int riscv_iommu_attach_identity_domain(struct iommu_domain *iommu_domain, 1439 + struct device *dev) 1440 + { 1441 + struct riscv_iommu_device *iommu = dev_to_iommu(dev); 1442 + struct riscv_iommu_info *info = dev_iommu_priv_get(dev); 1443 + 1444 + riscv_iommu_iodir_update(iommu, dev, RISCV_IOMMU_FSC_BARE, RISCV_IOMMU_PC_TA_V); 1445 + riscv_iommu_bond_unlink(info->domain, dev); 1446 + info->domain = NULL; 1447 + 1448 + return 0; 1449 + } 1450 + 1451 + static struct iommu_domain riscv_iommu_identity_domain = { 1452 + .type = IOMMU_DOMAIN_IDENTITY, 1453 + 
.ops = &(const struct iommu_domain_ops) { 1454 + .attach_dev = riscv_iommu_attach_identity_domain, 1455 + } 1456 + }; 1457 + 1458 + static struct iommu_group *riscv_iommu_device_group(struct device *dev) 1459 + { 1460 + if (dev_is_pci(dev)) 1461 + return pci_device_group(dev); 1462 + return generic_device_group(dev); 1463 + } 1464 + 1465 + static int riscv_iommu_of_xlate(struct device *dev, const struct of_phandle_args *args) 1466 + { 1467 + return iommu_fwspec_add_ids(dev, args->args, 1); 1468 + } 1469 + 1470 + static struct iommu_device *riscv_iommu_probe_device(struct device *dev) 1471 + { 1472 + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); 1473 + struct riscv_iommu_device *iommu; 1474 + struct riscv_iommu_info *info; 1475 + struct riscv_iommu_dc *dc; 1476 + u64 tc; 1477 + int i; 1478 + 1479 + if (!fwspec || !fwspec->iommu_fwnode->dev || !fwspec->num_ids) 1480 + return ERR_PTR(-ENODEV); 1481 + 1482 + iommu = dev_get_drvdata(fwspec->iommu_fwnode->dev); 1483 + if (!iommu) 1484 + return ERR_PTR(-ENODEV); 1485 + 1486 + /* 1487 + * IOMMU hardware operating in fail-over BARE mode will provide 1488 + * identity translation for all connected devices anyway... 1489 + */ 1490 + if (iommu->ddt_mode <= RISCV_IOMMU_DDTP_IOMMU_MODE_BARE) 1491 + return ERR_PTR(-ENODEV); 1492 + 1493 + info = kzalloc(sizeof(*info), GFP_KERNEL); 1494 + if (!info) 1495 + return ERR_PTR(-ENOMEM); 1496 + /* 1497 + * Allocate and pre-configure device context entries in 1498 + * the device directory. Do not mark the context valid yet. 
1499 + */ 1500 + tc = 0; 1501 + if (iommu->caps & RISCV_IOMMU_CAPABILITIES_AMO_HWAD) 1502 + tc |= RISCV_IOMMU_DC_TC_SADE; 1503 + for (i = 0; i < fwspec->num_ids; i++) { 1504 + dc = riscv_iommu_get_dc(iommu, fwspec->ids[i]); 1505 + if (!dc) { 1506 + kfree(info); 1507 + return ERR_PTR(-ENODEV); 1508 + } 1509 + if (READ_ONCE(dc->tc) & RISCV_IOMMU_DC_TC_V) 1510 + dev_warn(dev, "already attached to IOMMU device directory\n"); 1511 + WRITE_ONCE(dc->tc, tc); 1512 + } 1513 + 1514 + dev_iommu_priv_set(dev, info); 1515 + 1516 + return &iommu->iommu; 1517 + } 1518 + 1519 + static void riscv_iommu_release_device(struct device *dev) 1520 + { 1521 + struct riscv_iommu_info *info = dev_iommu_priv_get(dev); 1522 + 1523 + kfree_rcu_mightsleep(info); 1524 + } 1525 + 1526 + static const struct iommu_ops riscv_iommu_ops = { 1527 + .pgsize_bitmap = SZ_4K, 1528 + .of_xlate = riscv_iommu_of_xlate, 1529 + .identity_domain = &riscv_iommu_identity_domain, 1530 + .blocked_domain = &riscv_iommu_blocking_domain, 1531 + .release_domain = &riscv_iommu_blocking_domain, 1532 + .domain_alloc_paging = riscv_iommu_alloc_paging_domain, 1533 + .device_group = riscv_iommu_device_group, 1534 + .probe_device = riscv_iommu_probe_device, 1535 + .release_device = riscv_iommu_release_device, 1536 + }; 1537 + 1538 + static int riscv_iommu_init_check(struct riscv_iommu_device *iommu) 1539 + { 1540 + u64 ddtp; 1541 + 1542 + /* 1543 + * Make sure the IOMMU is switched off or in pass-through mode during 1544 + * regular boot flow and disable translation when we boot into a kexec 1545 + * kernel and the previous kernel left them enabled. 
```diff
+	 */
+	ddtp = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_DDTP);
+	if (ddtp & RISCV_IOMMU_DDTP_BUSY)
+		return -EBUSY;
+
+	if (FIELD_GET(RISCV_IOMMU_DDTP_IOMMU_MODE, ddtp) >
+	     RISCV_IOMMU_DDTP_IOMMU_MODE_BARE) {
+		if (!is_kdump_kernel())
+			return -EBUSY;
+		riscv_iommu_disable(iommu);
+	}
+
+	/* Configure accesses to in-memory data structures for CPU-native byte order. */
+	if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN) !=
+	    !!(iommu->fctl & RISCV_IOMMU_FCTL_BE)) {
+		if (!(iommu->caps & RISCV_IOMMU_CAPABILITIES_END))
+			return -EINVAL;
+		riscv_iommu_writel(iommu, RISCV_IOMMU_REG_FCTL,
+				   iommu->fctl ^ RISCV_IOMMU_FCTL_BE);
+		iommu->fctl = riscv_iommu_readl(iommu, RISCV_IOMMU_REG_FCTL);
+		if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN) !=
+		    !!(iommu->fctl & RISCV_IOMMU_FCTL_BE))
+			return -EINVAL;
+	}
+
+	/*
+	 * Distribute interrupt vectors, always use first vector for CIV.
+	 * At least one interrupt is required. Read back and verify.
+	 */
+	if (!iommu->irqs_count)
+		return -EINVAL;
+
+	iommu->icvec = FIELD_PREP(RISCV_IOMMU_ICVEC_FIV, 1 % iommu->irqs_count) |
+		       FIELD_PREP(RISCV_IOMMU_ICVEC_PIV, 2 % iommu->irqs_count) |
+		       FIELD_PREP(RISCV_IOMMU_ICVEC_PMIV, 3 % iommu->irqs_count);
+	riscv_iommu_writeq(iommu, RISCV_IOMMU_REG_ICVEC, iommu->icvec);
+	iommu->icvec = riscv_iommu_readq(iommu, RISCV_IOMMU_REG_ICVEC);
+	if (max(max(FIELD_GET(RISCV_IOMMU_ICVEC_CIV, iommu->icvec),
+		    FIELD_GET(RISCV_IOMMU_ICVEC_FIV, iommu->icvec)),
+		max(FIELD_GET(RISCV_IOMMU_ICVEC_PIV, iommu->icvec),
+		    FIELD_GET(RISCV_IOMMU_ICVEC_PMIV, iommu->icvec))) >= iommu->irqs_count)
+		return -EINVAL;
+
+	return 0;
+}
+
+void riscv_iommu_remove(struct riscv_iommu_device *iommu)
+{
+	iommu_device_unregister(&iommu->iommu);
+	iommu_device_sysfs_remove(&iommu->iommu);
+	riscv_iommu_iodir_set_mode(iommu, RISCV_IOMMU_DDTP_IOMMU_MODE_OFF);
+	riscv_iommu_queue_disable(&iommu->cmdq);
+	riscv_iommu_queue_disable(&iommu->fltq);
+}
+
+int riscv_iommu_init(struct riscv_iommu_device *iommu)
+{
+	int rc;
+
+	RISCV_IOMMU_QUEUE_INIT(&iommu->cmdq, CQ);
+	RISCV_IOMMU_QUEUE_INIT(&iommu->fltq, FQ);
+
+	rc = riscv_iommu_init_check(iommu);
+	if (rc)
+		return dev_err_probe(iommu->dev, rc, "unexpected device state\n");
+
+	rc = riscv_iommu_iodir_alloc(iommu);
+	if (rc)
+		return rc;
+
+	rc = riscv_iommu_queue_alloc(iommu, &iommu->cmdq,
+				     sizeof(struct riscv_iommu_command));
+	if (rc)
+		return rc;
+
+	rc = riscv_iommu_queue_alloc(iommu, &iommu->fltq,
+				     sizeof(struct riscv_iommu_fq_record));
+	if (rc)
+		return rc;
+
+	rc = riscv_iommu_queue_enable(iommu, &iommu->cmdq, riscv_iommu_cmdq_process);
+	if (rc)
+		return rc;
+
+	rc = riscv_iommu_queue_enable(iommu, &iommu->fltq, riscv_iommu_fltq_process);
+	if (rc)
+		goto err_queue_disable;
+
+	rc = riscv_iommu_iodir_set_mode(iommu, RISCV_IOMMU_DDTP_IOMMU_MODE_MAX);
+	if (rc)
+		goto err_queue_disable;
+
+	rc = iommu_device_sysfs_add(&iommu->iommu, NULL, NULL, "riscv-iommu@%s",
+				    dev_name(iommu->dev));
+	if (rc) {
+		dev_err_probe(iommu->dev, rc, "cannot register sysfs interface\n");
+		goto err_iodir_off;
+	}
+
+	rc = iommu_device_register(&iommu->iommu, &riscv_iommu_ops, iommu->dev);
+	if (rc) {
+		dev_err_probe(iommu->dev, rc, "cannot register iommu interface\n");
+		goto err_remove_sysfs;
+	}
+
+	return 0;
+
+err_remove_sysfs:
+	iommu_device_sysfs_remove(&iommu->iommu);
+err_iodir_off:
+	riscv_iommu_iodir_set_mode(iommu, RISCV_IOMMU_DDTP_IOMMU_MODE_OFF);
+err_queue_disable:
+	riscv_iommu_queue_disable(&iommu->fltq);
+	riscv_iommu_queue_disable(&iommu->cmdq);
+	return rc;
+}
```
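`riscv_iommu_init_check()` distributes interrupt vectors as follows. The command queue (CIV) always uses vector 0. The fault (FIV), page-request (PIV) and performance-monitoring (PMIV) causes get vectors 1, 2 and 3 modulo `irqs_count`. Any read-back value that points past the last available vector is rejected.

The user-space sketch below models only that arithmetic. Its shift values are an assumption based on the RISC-V IOMMU `icvec` layout (4-bit CIV, FIV, PMIV and PIV fields). The helpers `icvec_field_get`, `icvec_distribute` and `icvec_valid` are hypothetical. The driver itself uses `FIELD_PREP`/`FIELD_GET` with the masks from `iommu-bits.h`.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed icvec layout (RISC-V IOMMU spec), 4-bit fields:
 * CIV[3:0], FIV[7:4], PMIV[11:8], PIV[15:12]. Check iommu-bits.h.
 */
#define ICVEC_CIV_SHIFT   0
#define ICVEC_FIV_SHIFT   4
#define ICVEC_PMIV_SHIFT  8
#define ICVEC_PIV_SHIFT   12
#define ICVEC_FIELD_MASK  0xfULL

/* Extract one 4-bit vector field from an icvec value. */
uint64_t icvec_field_get(uint64_t icvec, unsigned int shift)
{
	return (icvec >> shift) & ICVEC_FIELD_MASK;
}

/* Value to write: CIV stays 0, FIV/PIV/PMIV are 1, 2, 3 modulo the count. */
uint64_t icvec_distribute(unsigned int irqs_count)
{
	assert(irqs_count > 0);	/* the driver returns -EINVAL before this point */
	return ((uint64_t)(1 % irqs_count) << ICVEC_FIV_SHIFT) |
	       ((uint64_t)(2 % irqs_count) << ICVEC_PIV_SHIFT) |
	       ((uint64_t)(3 % irqs_count) << ICVEC_PMIV_SHIFT);
}

/* Read-back check: the largest vector must index an available interrupt. */
int icvec_valid(uint64_t icvec, unsigned int irqs_count)
{
	static const unsigned int shifts[] = {
		ICVEC_CIV_SHIFT, ICVEC_FIV_SHIFT, ICVEC_PIV_SHIFT, ICVEC_PMIV_SHIFT,
	};
	uint64_t max = 0;

	for (unsigned int i = 0; i < sizeof(shifts) / sizeof(shifts[0]); i++) {
		uint64_t v = icvec_field_get(icvec, shifts[i]);
		if (v > max)
			max = v;
	}
	return max < irqs_count;
}
```

With two vectors, for example, CIV and PIV share vector 0 while FIV and PMIV share vector 1.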
+88
drivers/iommu/riscv/iommu.h
```diff
@@ -0,0 +1,88 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright © 2022-2024 Rivos Inc.
+ * Copyright © 2023 FORTH-ICS/CARV
+ *
+ * Authors
+ *	Tomasz Jeznach <tjeznach@rivosinc.com>
+ *	Nick Kossifidis <mick@ics.forth.gr>
+ */
+
+#ifndef _RISCV_IOMMU_H_
+#define _RISCV_IOMMU_H_
+
+#include <linux/iommu.h>
+#include <linux/types.h>
+#include <linux/iopoll.h>
+
+#include "iommu-bits.h"
+
+struct riscv_iommu_device;
+
+struct riscv_iommu_queue {
+	atomic_t prod;				/* unbounded producer allocation index */
+	atomic_t head;				/* unbounded shadow ring buffer consumer index */
+	atomic_t tail;				/* unbounded shadow ring buffer producer index */
+	unsigned int mask;			/* index mask, queue length - 1 */
+	unsigned int irq;			/* allocated interrupt number */
+	struct riscv_iommu_device *iommu;	/* iommu device handling the queue when active */
+	void *base;				/* ring buffer kernel pointer */
+	dma_addr_t phys;			/* ring buffer physical address */
+	u16 qbr;				/* base register offset, head and tail reference */
+	u16 qcr;				/* control and status register offset */
+	u8 qid;					/* queue identifier, same as RISCV_IOMMU_INTR_XX */
+};
+
+struct riscv_iommu_device {
+	/* iommu core interface */
+	struct iommu_device iommu;
+
+	/* iommu hardware */
+	struct device *dev;
+
+	/* hardware control register space */
+	void __iomem *reg;
+
+	/* supported and enabled hardware capabilities */
+	u64 caps;
+	u32 fctl;
+
+	/* available interrupt numbers, MSI or WSI */
+	unsigned int irqs[RISCV_IOMMU_INTR_COUNT];
+	unsigned int irqs_count;
+	unsigned int icvec;
+
+	/* hardware queues */
+	struct riscv_iommu_queue cmdq;
+	struct riscv_iommu_queue fltq;
+
+	/* device directory */
+	unsigned int ddt_mode;
+	dma_addr_t ddt_phys;
+	u64 *ddt_root;
+};
+
+int riscv_iommu_init(struct riscv_iommu_device *iommu);
+void riscv_iommu_remove(struct riscv_iommu_device *iommu);
+
+#define riscv_iommu_readl(iommu, addr) \
+	readl_relaxed((iommu)->reg + (addr))
+
+#define riscv_iommu_readq(iommu, addr) \
+	readq_relaxed((iommu)->reg + (addr))
+
+#define riscv_iommu_writel(iommu, addr, val) \
+	writel_relaxed((val), (iommu)->reg + (addr))
+
+#define riscv_iommu_writeq(iommu, addr, val) \
+	writeq_relaxed((val), (iommu)->reg + (addr))
+
+#define riscv_iommu_readq_timeout(iommu, addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout(readq_relaxed, (iommu)->reg + (addr), val, cond, \
+			   delay_us, timeout_us)
+
+#define riscv_iommu_readl_timeout(iommu, addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout(readl_relaxed, (iommu)->reg + (addr), val, cond, \
+			   delay_us, timeout_us)
+
+#endif
```
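`struct riscv_iommu_queue` tracks its ring with unbounded `prod`, `head` and `tail` counters plus a `mask` equal to the queue length minus one. This suggests that a slot is located with `index & mask` and occupancy is computed as `tail - head`.

The sketch below shows that indexing idea in single-threaded user space. It is a simplification built on several assumptions:

- It uses plain `uint32_t` counters instead of `atomic_t`.
- It uses a fixed 8-entry array.
- The helpers `ring_init`, `ring_used`, `ring_push` and `ring_pop` are hypothetical.
- It does not model the hardware head/tail registers or the separate `prod` allocation index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_LEN 8	/* must be a power of two for mask indexing */

/* Hypothetical single-threaded ring with free-running indices. */
struct ring {
	uint32_t head;		/* unbounded consumer index */
	uint32_t tail;		/* unbounded producer index */
	uint32_t mask;		/* RING_LEN - 1 */
	int slots[RING_LEN];
};

void ring_init(struct ring *r)
{
	r->head = r->tail = 0;
	r->mask = RING_LEN - 1;
	assert(((r->mask + 1) & r->mask) == 0);	/* length is a power of two */
}

/* Unsigned subtraction stays correct after the counters wrap. */
uint32_t ring_used(const struct ring *r)
{
	return r->tail - r->head;
}

bool ring_push(struct ring *r, int v)
{
	if (ring_used(r) > r->mask)	/* full: used == length */
		return false;
	r->slots[r->tail & r->mask] = v;
	r->tail++;
	return true;
}

bool ring_pop(struct ring *r, int *v)
{
	if (r->tail == r->head)		/* empty */
		return false;
	*v = r->slots[r->head & r->mask];
	r->head++;
	return true;
}
```

Because the counters are never reduced modulo the length, "full" and "empty" are distinguishable without a spare slot. The power-of-two length keeps `index & mask` consistent when the 32-bit counters overflow.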
+47 -28
drivers/iommu/s390-iommu.c
```diff
@@ -33,6 +33,8 @@
 	struct rcu_head rcu;
 };
 
+static struct iommu_domain blocking_domain;
+
 static inline unsigned int calc_rtx(dma_addr_t ptr)
 {
 	return ((unsigned long)ptr >> ZPCI_RT_SHIFT) & ZPCI_INDEX_MASK;
@@ -371,20 +369,36 @@
 	call_rcu(&s390_domain->rcu, s390_iommu_rcu_free_domain);
 }
 
-static void s390_iommu_detach_device(struct iommu_domain *domain,
-				     struct device *dev)
+static void zdev_s390_domain_update(struct zpci_dev *zdev,
+				    struct iommu_domain *domain)
 {
-	struct s390_domain *s390_domain = to_s390_domain(domain);
-	struct zpci_dev *zdev = to_zpci_dev(dev);
 	unsigned long flags;
 
+	spin_lock_irqsave(&zdev->dom_lock, flags);
+	zdev->s390_domain = domain;
+	spin_unlock_irqrestore(&zdev->dom_lock, flags);
+}
+
+static int blocking_domain_attach_device(struct iommu_domain *domain,
+					 struct device *dev)
+{
+	struct zpci_dev *zdev = to_zpci_dev(dev);
+	struct s390_domain *s390_domain;
+	unsigned long flags;
+
+	if (zdev->s390_domain->type == IOMMU_DOMAIN_BLOCKED)
+		return 0;
+
+	s390_domain = to_s390_domain(zdev->s390_domain);
 	spin_lock_irqsave(&s390_domain->list_lock, flags);
 	list_del_rcu(&zdev->iommu_list);
 	spin_unlock_irqrestore(&s390_domain->list_lock, flags);
 
 	zpci_unregister_ioat(zdev, 0);
-	zdev->s390_domain = NULL;
 	zdev->dma_table = NULL;
+	zdev_s390_domain_update(zdev, domain);
+
+	return 0;
 }
 
 static int s390_iommu_attach_device(struct iommu_domain *domain,
@@ -419,20 +401,15 @@
 	    domain->geometry.aperture_end < zdev->start_dma))
 		return -EINVAL;
 
-	if (zdev->s390_domain)
-		s390_iommu_detach_device(&zdev->s390_domain->domain, dev);
+	blocking_domain_attach_device(&blocking_domain, dev);
 
+	/* If we fail now DMA remains blocked via blocking domain */
 	cc = zpci_register_ioat(zdev, 0, zdev->start_dma, zdev->end_dma,
 				virt_to_phys(s390_domain->dma_table), &status);
-	/*
-	 * If the device is undergoing error recovery the reset code
-	 * will re-establish the new domain.
-	 */
 	if (cc && status != ZPCI_PCI_ST_FUNC_NOT_AVAIL)
 		return -EIO;
-
 	zdev->dma_table = s390_domain->dma_table;
-	zdev->s390_domain = s390_domain;
+	zdev_s390_domain_update(zdev, domain);
 
 	spin_lock_irqsave(&s390_domain->list_lock, flags);
 	list_add_rcu(&zdev->iommu_list, &s390_domain->devices);
@@ -479,19 +466,11 @@
 	if (zdev->tlb_refresh)
 		dev->iommu->shadow_on_flush = 1;
 
+	/* Start with DMA blocked */
+	spin_lock_init(&zdev->dom_lock);
+	zdev_s390_domain_update(zdev, &blocking_domain);
+
 	return &zdev->iommu_dev;
-}
-
-static void s390_iommu_release_device(struct device *dev)
-{
-	struct zpci_dev *zdev = to_zpci_dev(dev);
-
-	/*
-	 * release_device is expected to detach any domain currently attached
-	 * to the device, but keep it attached to other devices in the group.
-	 */
-	if (zdev)
-		s390_iommu_detach_device(&zdev->s390_domain->domain, dev);
 }
 
 static int zpci_refresh_all(struct zpci_dev *zdev)
@@ -702,9 +697,15 @@
 
 struct zpci_iommu_ctrs *zpci_get_iommu_ctrs(struct zpci_dev *zdev)
 {
-	if (!zdev || !zdev->s390_domain)
+	struct s390_domain *s390_domain;
+
+	lockdep_assert_held(&zdev->dom_lock);
+
+	if (zdev->s390_domain->type == IOMMU_DOMAIN_BLOCKED)
 		return NULL;
-	return &zdev->s390_domain->ctrs;
+
+	s390_domain = to_s390_domain(zdev->s390_domain);
+	return &s390_domain->ctrs;
 }
 
 int zpci_init_iommu(struct zpci_dev *zdev)
@@ -787,11 +776,19 @@
 }
 subsys_initcall(s390_iommu_init);
 
+static struct iommu_domain blocking_domain = {
+	.type = IOMMU_DOMAIN_BLOCKED,
+	.ops = &(const struct iommu_domain_ops) {
+		.attach_dev	= blocking_domain_attach_device,
+	}
+};
+
 static const struct iommu_ops s390_iommu_ops = {
+	.blocked_domain		= &blocking_domain,
+	.release_domain		= &blocking_domain,
 	.capable = s390_iommu_capable,
 	.domain_alloc_paging = s390_domain_alloc_paging,
 	.probe_device = s390_iommu_probe_device,
-	.release_device = s390_iommu_release_device,
 	.device_group = generic_device_group,
 	.pgsize_bitmap = SZ_4K,
 	.get_resv_regions = s390_iommu_get_resv_regions,
```
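The s390 change replaces the NULL "no domain" state with a static blocking domain, in three steps:

- `s390_iommu_probe_device()` starts every device on `blocking_domain`.
- `s390_iommu_attach_device()` falls back to the blocking domain before programming the new translation.
- `release_domain` takes over from the removed `release_device` callback.

A minimal user-space model of that state machine follows. All types and names are hypothetical: `struct domain`, `struct dev`, `attach_paging()`, and the `hw_ok` switch standing in for `zpci_register_ioat()`. The sketch leaves out `dom_lock`, the RCU device list and the error-recovery status check.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model types; not the kernel's iommu_domain or zpci_dev. */
enum dom_type { DOM_BLOCKED, DOM_PAGING };

struct domain {
	enum dom_type type;
	int ndevices;		/* devices on this domain's list (paging only) */
};

struct dev {
	struct domain *dom;	/* never NULL once probed */
};

/* One static blocking domain shared by all devices. */
static struct domain blocking = { DOM_BLOCKED, 0 };

/* Probe: start with DMA blocked. */
void dev_probe(struct dev *d)
{
	d->dom = &blocking;
}

/* Fall back to the blocking domain; a no-op if already blocked. */
int attach_blocking(struct dev *d)
{
	if (d->dom->type == DOM_BLOCKED)
		return 0;
	d->dom->ndevices--;	/* stands in for list_del_rcu() + unregister */
	d->dom = &blocking;
	return 0;
}

/*
 * Attach a paging domain. The device is blocked first, so if programming
 * the hardware fails (hw_ok == 0) it stays blocked, not half-attached.
 */
int attach_paging(struct dev *d, struct domain *dom, int hw_ok)
{
	assert(dom->type == DOM_PAGING);
	attach_blocking(d);
	if (!hw_ok)
		return -EIO;
	d->dom = dom;
	dom->ndevices++;
	return 0;
}
```

Keeping the pointer non-NULL is what lets `zpci_get_iommu_ctrs()` test `type == IOMMU_DOMAIN_BLOCKED` rather than checking for NULL.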
+3 -3
drivers/remoteproc/remoteproc_core.c
```diff
@@ -109,10 +109,10 @@
 		return 0;
 	}
 
-	domain = iommu_domain_alloc(dev->bus);
-	if (!domain) {
+	domain = iommu_paging_domain_alloc(dev);
+	if (IS_ERR(domain)) {
 		dev_err(dev, "can't alloc iommu domain\n");
-		return -ENOMEM;
+		return PTR_ERR(domain);
 	}
 
 	iommu_set_fault_handler(domain, rproc_iommu_fault, rproc);
```
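The remoteproc hunk follows the new allocator contract. `iommu_paging_domain_alloc()` reports failure through an error pointer rather than NULL, so the caller now tests `IS_ERR()` and returns `PTR_ERR()` instead of hard-coding `-ENOMEM`.

For readers unfamiliar with the idiom, below is a user-space re-implementation of the `<linux/err.h>` encoding, in which error codes occupy the top `MAX_ERRNO` addresses. It comes with a hypothetical allocator `paging_domain_alloc()` and caller `setup()`. It illustrates the pattern and is not the kernel header.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* User-space re-implementation of the <linux/err.h> idea, for illustration. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	assert(error < 0 && error >= -MAX_ERRNO);	/* sketch-only sanity check */
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Error pointers live in the last MAX_ERRNO values of the address space. */
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct domain { int id; };

/* Hypothetical allocator: never returns NULL, only a domain or an ERR_PTR. */
struct domain *paging_domain_alloc(int has_iommu)
{
	struct domain *d;

	if (!has_iommu)
		return ERR_PTR(-ENODEV);
	d = malloc(sizeof(*d));
	if (!d)
		return ERR_PTR(-ENOMEM);
	d->id = 1;
	return d;
}

/* Caller pattern from the hunk: propagate the real error code. */
int setup(int has_iommu)
{
	struct domain *d = paging_domain_alloc(has_iommu);

	if (IS_ERR(d))
		return (int)PTR_ERR(d);
	free(d);
	return 0;
}
```

The benefit shows in the `!CONFIG_IOMMU_API` stub further down: it returns `ERR_PTR(-ENODEV)`, and a caller written this way reports `-ENODEV` instead of a misleading `-ENOMEM`.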
-1
include/linux/dmar.h
```diff
@@ -292,7 +292,6 @@
 struct irq_data;
 extern void dmar_msi_unmask(struct irq_data *data);
 extern void dmar_msi_mask(struct irq_data *data);
-extern void dmar_msi_read(int irq, struct msi_msg *msg);
 extern void dmar_msi_write(int irq, struct msi_msg *msg);
 extern int dmar_set_interrupt(struct intel_iommu *iommu);
 extern irqreturn_t dmar_fault(int irq, void *dev_id);
```
+11 -14
include/linux/iommu.h
```diff
@@ -559,8 +559,6 @@
  *                     the caller iommu_domain_alloc() returns.
  * @domain_alloc_user: Allocate an iommu domain corresponding to the input
  *                     parameters as defined in include/uapi/linux/iommufd.h.
- *                     Unlike @domain_alloc, it is called only by IOMMUFD and
- *                     must fully initialize the new domain before return.
  *                     Upon success, if the @user_data is valid and the @parent
  *                     points to a kernel-managed domain, the new domain must be
  *                     IOMMU_DOMAIN_NESTED type; otherwise, the @parent must be
@@ -674,7 +676,8 @@
  * * EBUSY	- device is attached to a domain and cannot be changed
  * * ENODEV	- device specific errors, not able to be attached
  * * <others>	- treated as ENODEV by the caller. Use is discouraged
- * @set_dev_pasid: set an iommu domain to a pasid of device
+ * @set_dev_pasid: set or replace an iommu domain to a pasid of device. The pasid of
+ *                 the device should be left in the old config in error case.
  * @map_pages: map a physically contiguous set of pages of the same size to
  *             an iommu domain.
  * @unmap_pages: unmap a number of pages of the same size from an iommu domain
@@ -700,7 +701,7 @@
 struct iommu_domain_ops {
 	int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
 	int (*set_dev_pasid)(struct iommu_domain *domain, struct device *dev,
-			     ioasid_t pasid);
+			     ioasid_t pasid, struct iommu_domain *old);
 
 	int (*map_pages)(struct iommu_domain *domain, unsigned long iova,
 			 phys_addr_t paddr, size_t pgsize, size_t pgcount,
@@ -841,12 +842,14 @@
 	};
 }
 
-extern int bus_iommu_probe(const struct bus_type *bus);
 extern bool iommu_present(const struct bus_type *bus);
 extern bool device_iommu_capable(struct device *dev, enum iommu_cap cap);
 extern bool iommu_group_has_isolated_msi(struct iommu_group *group);
-extern struct iommu_domain *iommu_domain_alloc(const struct bus_type *bus);
-struct iommu_domain *iommu_paging_domain_alloc(struct device *dev);
+struct iommu_domain *iommu_paging_domain_alloc_flags(struct device *dev, unsigned int flags);
+static inline struct iommu_domain *iommu_paging_domain_alloc(struct device *dev)
+{
+	return iommu_paging_domain_alloc_flags(dev, 0);
+}
 extern void iommu_domain_free(struct iommu_domain *domain);
 extern int iommu_attach_device(struct iommu_domain *domain,
 			       struct device *dev);
@@ -1141,19 +1140,15 @@
 struct iommu_dirty_bitmap {};
 struct iommu_dirty_ops {};
 
-static inline bool iommu_present(const struct bus_type *bus)
-{
-	return false;
-}
-
 static inline bool device_iommu_capable(struct device *dev, enum iommu_cap cap)
 {
 	return false;
 }
 
-static inline struct iommu_domain *iommu_domain_alloc(const struct bus_type *bus)
+static inline struct iommu_domain *iommu_paging_domain_alloc_flags(struct device *dev,
+						unsigned int flags)
 {
-	return NULL;
+	return ERR_PTR(-ENODEV);
 }
 
 static inline struct iommu_domain *iommu_paging_domain_alloc(struct device *dev)
```
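The updated kernel-doc for `@set_dev_pasid` makes two points. The callback may now replace an existing domain, which the core passes in as `old`. On error, it must leave the PASID in its old configuration.

One straightforward way to meet that contract is to run every check that can fail before touching the entry, then switch from old to new in a single update. The sketch below models this with a hypothetical 16-entry per-device PASID table in user space. `struct pasid_table`, `set_dev_pasid()` and the `supports_pasid` check are illustrative assumptions, not driver code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NR_PASIDS 16	/* hypothetical table size */

struct domain {
	int supports_pasid;	/* stand-in for whatever a driver must validate */
};

struct pasid_table {
	struct domain *entry[NR_PASIDS];	/* NULL = nothing attached */
};

/*
 * Set or replace the domain on @pasid. @old is what the caller believes is
 * attached now (NULL for a first attach). All checks run before the entry
 * is modified, so an error leaves the old configuration in place.
 */
int set_dev_pasid(struct pasid_table *t, unsigned int pasid,
		  struct domain *dom, struct domain *old)
{
	if (pasid >= NR_PASIDS)
		return -EINVAL;
	assert(t->entry[pasid] == old);	/* caller and table must agree */
	if (!dom->supports_pasid)
		return -EOPNOTSUPP;	/* fail before touching the entry */
	t->entry[pasid] = dom;		/* single switch: old -> new */
	return 0;
}
```

The same header hunk turns `iommu_paging_domain_alloc()` into an inline wrapper that calls `iommu_paging_domain_alloc_flags(dev, 0)`, so existing callers keep their signature while new callers can pass flags.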
+8
include/uapi/linux/iommufd.h
```diff
@@ -387,11 +387,19 @@
  *                                enforced on device attachment
  * @IOMMU_HWPT_FAULT_ID_VALID: The fault_id field of hwpt allocation data is
  *                             valid.
+ * @IOMMU_HWPT_ALLOC_PASID: Requests a domain that can be used with PASID. The
+ *                          domain can be attached to any PASID on the device.
+ *                          Any domain attached to the non-PASID part of the
+ *                          device must also be flaged, otherwise attaching a
+ *                          PASID will blocked.
+ *                          If IOMMU does not support PASID it will return
+ *                          error (-EOPNOTSUPP).
  */
 enum iommufd_hwpt_alloc_flags {
 	IOMMU_HWPT_ALLOC_NEST_PARENT = 1 << 0,
 	IOMMU_HWPT_ALLOC_DIRTY_TRACKING = 1 << 1,
 	IOMMU_HWPT_FAULT_ID_VALID = 1 << 2,
+	IOMMU_HWPT_ALLOC_PASID = 1 << 3,
 };
 
 /**
```
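The documentation for the new `IOMMU_HWPT_ALLOC_PASID` flag (bit 3) states two rules:

- Allocation fails with `-EOPNOTSUPP` when the IOMMU lacks PASID support.
- A PASID attach is blocked unless the domain on the non-PASID part of the device was also allocated with the flag.

The user-space sketch below encodes those two rules. The enum values are copied from the hunk above. `check_hwpt_alloc()` and `pasid_attach_allowed()` are hypothetical helpers. The second helper assumes that the non-PASID part always has some domain attached, and that the domain being attached to the PASID must carry the flag too.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from enum iommufd_hwpt_alloc_flags in the hunk above. */
enum iommufd_hwpt_alloc_flags {
	IOMMU_HWPT_ALLOC_NEST_PARENT = 1 << 0,
	IOMMU_HWPT_ALLOC_DIRTY_TRACKING = 1 << 1,
	IOMMU_HWPT_FAULT_ID_VALID = 1 << 2,
	IOMMU_HWPT_ALLOC_PASID = 1 << 3,
};

static_assert(IOMMU_HWPT_ALLOC_PASID == 8, "PASID request is bit 3");

/* Hypothetical allocation-time check: PASID requested but not supported. */
int check_hwpt_alloc(uint32_t flags, bool iommu_supports_pasid)
{
	if ((flags & IOMMU_HWPT_ALLOC_PASID) && !iommu_supports_pasid)
		return -EOPNOTSUPP;
	return 0;
}

/*
 * Hypothetical attach-time rule: a PASID attach may proceed only when both
 * the domain being attached and the domain on the non-PASID part were
 * allocated with IOMMU_HWPT_ALLOC_PASID.
 */
bool pasid_attach_allowed(uint32_t rid_domain_flags, uint32_t pasid_domain_flags)
{
	return (rid_domain_flags & IOMMU_HWPT_ALLOC_PASID) &&
	       (pasid_domain_flags & IOMMU_HWPT_ALLOC_PASID);
}
```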