Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'iommu-updates-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates from Joerg Roedel:
"Core:
- IOMMU memory usage observability - This will make the memory used
for IO page tables explicitly visible.
- Simplify arch_setup_dma_ops()

Intel VT-d:
- Consolidate domain cache invalidation
- Remove private data from page fault message
- Allocate DMAR fault interrupts locally
- Cleanup and refactoring

ARM-SMMUv2:
- Support for fault debugging hardware on Qualcomm implementations
- Re-land support for the ->domain_alloc_paging() callback

ARM-SMMUv3:
- Improve handling of MSI allocation failure
- Drop support for the "disable_bypass" cmdline option
- Major rework of the CD creation code, following on directly from
the STE rework merged last time around.
- Add unit tests for the new STE/CD manipulation logic

AMD-Vi:
- Final part of SVA changes with generic IO page fault handling

Renesas IPMMU:
- Add support for R8A779H0 hardware

... and a couple smaller fixes and updates across the sub-tree"

* tag 'iommu-updates-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (80 commits)
iommu/arm-smmu-v3: Make the kunit into a module
arm64: Properly clean up iommu-dma remnants
iommu/amd: Enable Guest Translation after reading IOMMU feature register
iommu/vt-d: Decouple igfx_off from graphic identity mapping
iommu/amd: Fix compilation error
iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry
iommu/arm-smmu-v3: Build the whole CD in arm_smmu_make_s1_cd()
iommu/arm-smmu-v3: Move the CD generation for SVA into a function
iommu/arm-smmu-v3: Allocate the CD table entry in advance
iommu/arm-smmu-v3: Make arm_smmu_alloc_cd_ptr()
iommu/arm-smmu-v3: Consolidate clearing a CD table entry
iommu/arm-smmu-v3: Move the CD generation for S1 domains into a function
iommu/arm-smmu-v3: Make CD programming use arm_smmu_write_entry()
iommu/arm-smmu-v3: Add an ops indirection to the STE code
iommu/arm-smmu-qcom: Don't build debug features as a kernel module
iommu/amd: Add SVA domain support
iommu: Add ops->domain_alloc_sva()
iommu/amd: Initial SVA support for AMD IOMMU
iommu/amd: Add support for enable/disable IOPF
iommu/amd: Add IO page fault notifier handler
...

+3539 -1601
+1 -1
Documentation/admin-guide/cgroup-v2.rst
···
  sec_pagetables
	Amount of memory allocated for secondary page tables,
	this currently includes KVM mmu allocations on x86
-	and arm64.
+	and arm64 and IOMMU page tables.

  percpu (npn)
	Amount of memory used for storing per-cpu kernel
+69
Documentation/devicetree/bindings/iommu/qcom,tbu.yaml
···
+ # SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+ %YAML 1.2
+ ---
+ $id: http://devicetree.org/schemas/iommu/qcom,tbu.yaml#
+ $schema: http://devicetree.org/meta-schemas/core.yaml#
+
+ title: Qualcomm TBU (Translation Buffer Unit)
+
+ maintainers:
+   - Georgi Djakov <quic_c_gdjako@quicinc.com>
+
+ description:
+   The Qualcomm SMMU500 implementation consists of TCU and TBU. The TBU contains
+   a Translation Lookaside Buffer (TLB) that caches page tables. TBUs provides
+   debug features to trace and trigger debug transactions. There are multiple TBU
+   instances with each client core.
+
+ properties:
+   compatible:
+     enum:
+       - qcom,sc7280-tbu
+       - qcom,sdm845-tbu
+
+   reg:
+     maxItems: 1
+
+   clocks:
+     maxItems: 1
+
+   interconnects:
+     maxItems: 1
+
+   power-domains:
+     maxItems: 1
+
+   qcom,stream-id-range:
+     description: |
+       Phandle of a SMMU device and Stream ID range (address and size) that
+       is assigned by the TBU
+     $ref: /schemas/types.yaml#/definitions/phandle-array
+     items:
+       - items:
+           - description: phandle of a smmu node
+           - description: stream id base address
+           - description: stream id size
+
+ required:
+   - compatible
+   - reg
+   - qcom,stream-id-range
+
+ additionalProperties: false
+
+ examples:
+   - |
+     #include <dt-bindings/clock/qcom,gcc-sdm845.h>
+     #include <dt-bindings/interconnect/qcom,icc.h>
+     #include <dt-bindings/interconnect/qcom,sdm845.h>
+
+     tbu@150e1000 {
+         compatible = "qcom,sdm845-tbu";
+         reg = <0x150e1000 0x1000>;
+         clocks = <&gcc GCC_AGGRE_NOC_PCIE_TBU_CLK>;
+         interconnects = <&system_noc MASTER_GNOC_SNOC QCOM_ICC_TAG_ACTIVE_ONLY
+                          &config_noc SLAVE_IMEM_CFG QCOM_ICC_TAG_ACTIVE_ONLY>;
+         power-domains = <&gcc HLOS1_VOTE_AGGRE_NOC_MMU_PCIE_TBU_GDSC>;
+         qcom,stream-id-range = <&apps_smmu 0x1c00 0x400>;
+     };
+ ...
+1
Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml
···
            - renesas,ipmmu-r8a779a0 # R-Car V3U
            - renesas,ipmmu-r8a779f0 # R-Car S4-8
            - renesas,ipmmu-r8a779g0 # R-Car V4H
+           - renesas,ipmmu-r8a779h0 # R-Car V4M
        - const: renesas,rcar-gen4-ipmmu-vmsa # R-Car Gen4

  reg:
+2 -2
Documentation/filesystems/proc.rst
···
  PageTables
	Memory consumed by userspace page tables
  SecPageTables
-	Memory consumed by secondary page tables, this currently
-	currently includes KVM mmu allocations on x86 and arm64.
+	Memory consumed by secondary page tables, this currently includes
+	KVM mmu and IOMMU allocations on x86 and arm64.
  NFS_Unstable
	Always zero. Previous counted pages which had been written to
	the server, but has not been committed to stable storage.
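The counter documented here is exported through /proc/meminfo; below is a minimal user-space sketch (plain C, not part of this series) that prints the SecPageTables line on a kernel that exposes it.

#include <stdio.h>
#include <string.h>

/* Print the SecPageTables line from /proc/meminfo, if present. */
int main(void)
{
	char line[256];
	FILE *f = fopen("/proc/meminfo", "r");

	if (!f)
		return 1;
	while (fgets(line, sizeof(line), f)) {
		if (strncmp(line, "SecPageTables:", 14) == 0)
			fputs(line, stdout);
	}
	fclose(f);
	return 0;
}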
+1 -2
arch/arc/mm/dma.c
···
  /*
   * Plug in direct dma map ops.
   */
- void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
-			bool coherent)
+ void arch_setup_dma_ops(struct device *dev, bool coherent)
  {
	/*
	 * IOC hardware snoops all DMA traffic keeping the caches consistent
+1 -2
arch/arm/mm/dma-mapping-nommu.c
···
	}
  }

- void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
-			bool coherent)
+ void arch_setup_dma_ops(struct device *dev, bool coherent)
  {
	if (IS_ENABLED(CONFIG_CPU_V7M)) {
		/*
+9 -7
arch/arm/mm/dma-mapping.c
···
  }
  EXPORT_SYMBOL_GPL(arm_iommu_detach_device);

- static void arm_setup_iommu_dma_ops(struct device *dev, u64 dma_base, u64 size,
-				    bool coherent)
+ static void arm_setup_iommu_dma_ops(struct device *dev)
  {
	struct dma_iommu_mapping *mapping;
+	u64 dma_base = 0, size = 1ULL << 32;

+	if (dev->dma_range_map) {
+		dma_base = dma_range_map_min(dev->dma_range_map);
+		size = dma_range_map_max(dev->dma_range_map) - dma_base;
+	}
	mapping = arm_iommu_create_mapping(dev->bus, dma_base, size);
	if (IS_ERR(mapping)) {
		pr_warn("Failed to create %llu-byte IOMMU mapping for device %s\n",
···

  #else

- static void arm_setup_iommu_dma_ops(struct device *dev, u64 dma_base, u64 size,
-				    bool coherent)
+ static void arm_setup_iommu_dma_ops(struct device *dev)
  {
  }
···

  #endif	/* CONFIG_ARM_DMA_USE_IOMMU */

- void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
-			bool coherent)
+ void arch_setup_dma_ops(struct device *dev, bool coherent)
  {
	/*
	 * Due to legacy code that sets the ->dma_coherent flag from a bus
···
		return;

	if (device_iommu_mapped(dev))
-		arm_setup_iommu_dma_ops(dev, dma_base, size, coherent);
+		arm_setup_iommu_dma_ops(dev);

	xen_setup_dma_ops(dev);
	dev->archdata.dma_ops_setup = true;
-1
arch/arm64/Kconfig
···
	select ARCH_HAS_SYNC_DMA_FOR_DEVICE
	select ARCH_HAS_SYNC_DMA_FOR_CPU
	select ARCH_HAS_SYSCALL_WRAPPER
-	select ARCH_HAS_TEARDOWN_DMA_OPS if IOMMU_SUPPORT
	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
	select ARCH_HAS_ZONE_DMA_SET if EXPERT
	select ARCH_HAVE_ELF_PROT
+1 -12
arch/arm64/mm/dma-mapping.c
···
  #include <linux/gfp.h>
  #include <linux/cache.h>
  #include <linux/dma-map-ops.h>
- #include <linux/iommu.h>
  #include <xen/xen.h>

  #include <asm/cacheflush.h>
···
	dcache_clean_poc(start, start + size);
  }

- #ifdef CONFIG_IOMMU_DMA
- void arch_teardown_dma_ops(struct device *dev)
- {
-	dev->dma_ops = NULL;
- }
- #endif
-
- void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
-			bool coherent)
+ void arch_setup_dma_ops(struct device *dev, bool coherent)
  {
	int cls = cache_line_size_of_cpu();

···
		   ARCH_DMA_MINALIGN, cls);

	dev->dma_coherent = coherent;
-	if (device_iommu_mapped(dev))
-		iommu_setup_dma_ops(dev, dma_base, dma_base + size - 1);

	xen_setup_dma_ops(dev);
  }
+2 -7
arch/loongarch/kernel/dma.c
···
  void acpi_arch_dma_setup(struct device *dev)
  {
	int ret;
-	u64 mask, end = 0;
+	u64 mask, end;
	const struct bus_dma_region *map = NULL;

	ret = acpi_dma_get_range(dev, &map);
	if (!ret && map) {
-		const struct bus_dma_region *r = map;
-
-		for (end = 0; r->size; r++) {
-			if (r->dma_start + r->size - 1 > end)
-				end = r->dma_start + r->size - 1;
-		}
+		end = dma_range_map_max(map);

		mask = DMA_BIT_MASK(ilog2(end) + 1);
		dev->bus_dma_limit = end;
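The open-coded scan removed in this hunk is exactly what the new dma_range_map_max() helper is expected to compute; the sketch below restates it as a standalone function for illustration only (the function name is invented here, and the real helper is provided by the DMA mapping core, not by this file).

/*
 * Illustrative restatement of the loop removed above: walk the
 * zero-terminated bus_dma_region array and return the highest DMA
 * address covered by any entry.  Real code should use the
 * dma_range_map_max() helper from the DMA mapping core instead.
 */
static u64 example_dma_range_map_max(const struct bus_dma_region *map)
{
	const struct bus_dma_region *r;
	u64 end = 0;

	for (r = map; r->size; r++) {
		if (r->dma_start + r->size - 1 > end)
			end = r->dma_start + r->size - 1;
	}

	return end;
}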
+1 -2
arch/mips/mm/dma-noncoherent.c
···
  #endif

  #ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS
- void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
-			bool coherent)
+ void arch_setup_dma_ops(struct device *dev, bool coherent)
  {
	dev->dma_coherent = coherent;
  }
+1 -2
arch/riscv/mm/dma-noncoherent.c
···
	ALT_CMO_OP(FLUSH, flush_addr, size, riscv_cbom_block_size);
  }

- void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
-			bool coherent)
+ void arch_setup_dma_ops(struct device *dev, bool coherent)
  {
	WARN_TAINT(!coherent && riscv_cbom_block_size > ARCH_DMA_MINALIGN,
		   TAINT_CPU_OUT_OF_SPEC,
+4 -13
drivers/acpi/arm64/dma.c
···
  {
	int ret;
	u64 end, mask;
-	u64 size = 0;
	const struct bus_dma_region *map = NULL;

	/*
···
	}

	if (dev->coherent_dma_mask)
-		size = max(dev->coherent_dma_mask, dev->coherent_dma_mask + 1);
+		end = dev->coherent_dma_mask;
	else
-		size = 1ULL << 32;
+		end = (1ULL << 32) - 1;

	ret = acpi_dma_get_range(dev, &map);
	if (!ret && map) {
-		const struct bus_dma_region *r = map;
-
-		for (end = 0; r->size; r++) {
-			if (r->dma_start + r->size - 1 > end)
-				end = r->dma_start + r->size - 1;
-		}
-
-		size = end + 1;
+		end = dma_range_map_max(map);
		dev->dma_range_map = map;
	}

	if (ret == -ENODEV)
-		ret = iort_dma_get_ranges(dev, &size);
+		ret = iort_dma_get_ranges(dev, &end);
	if (!ret) {
		/*
		 * Limit coherent and dma mask based on size retrieved from
		 * firmware.
		 */
-		end = size - 1;
		mask = DMA_BIT_MASK(ilog2(end) + 1);
		dev->bus_dma_limit = end;
		dev->coherent_dma_mask = min(dev->coherent_dma_mask, mask);
+10 -10
drivers/acpi/arm64/iort.c
···
  { return -ENODEV; }
  #endif

- static int nc_dma_get_range(struct device *dev, u64 *size)
+ static int nc_dma_get_range(struct device *dev, u64 *limit)
  {
	struct acpi_iort_node *node;
	struct acpi_iort_named_component *ncomp;
···
		return -EINVAL;
	}

-	*size = ncomp->memory_address_limit >= 64 ? U64_MAX :
-			1ULL<<ncomp->memory_address_limit;
+	*limit = ncomp->memory_address_limit >= 64 ? U64_MAX :
+			(1ULL << ncomp->memory_address_limit) - 1;

	return 0;
  }

- static int rc_dma_get_range(struct device *dev, u64 *size)
+ static int rc_dma_get_range(struct device *dev, u64 *limit)
  {
	struct acpi_iort_node *node;
	struct acpi_iort_root_complex *rc;
···
		return -EINVAL;
	}

-	*size = rc->memory_address_limit >= 64 ? U64_MAX :
-			1ULL<<rc->memory_address_limit;
+	*limit = rc->memory_address_limit >= 64 ? U64_MAX :
+			(1ULL << rc->memory_address_limit) - 1;

	return 0;
  }
···
  /**
   * iort_dma_get_ranges() - Look up DMA addressing limit for the device
   * @dev: device to lookup
-  * @size: DMA range size result pointer
+  * @limit: DMA limit result pointer
   *
   * Return: 0 on success, an error otherwise.
   */
- int iort_dma_get_ranges(struct device *dev, u64 *size)
+ int iort_dma_get_ranges(struct device *dev, u64 *limit)
  {
	if (dev_is_pci(dev))
-		return rc_dma_get_range(dev, size);
+		return rc_dma_get_range(dev, limit);
	else
-		return nc_dma_get_range(dev, size);
+		return nc_dma_get_range(dev, limit);
  }

  static void __init acpi_iort_register_irq(int hwirq, const char *name,
+1 -6
drivers/acpi/scan.c
···
	if (ret == -EPROBE_DEFER)
		return -EPROBE_DEFER;

-	/*
-	 * Historically this routine doesn't fail driver probing due to errors
-	 * in acpi_iommu_configure_id().
-	 */
-
-	arch_setup_dma_ops(dev, 0, U64_MAX, attr == DEV_DMA_COHERENT);
+	arch_setup_dma_ops(dev, attr == DEV_DMA_COHERENT);

	return 0;
  }
+1 -5
drivers/hv/hv_common.c
···

  void hv_setup_dma_ops(struct device *dev, bool coherent)
  {
-	/*
-	 * Hyper-V does not offer a vIOMMU in the guest
-	 * VM, so pass 0/NULL for the IOMMU settings
-	 */
-	arch_setup_dma_ops(dev, 0, 0, coherent);
+	arch_setup_dma_ops(dev, coherent);
  }
  EXPORT_SYMBOL_GPL(hv_setup_dma_ops);

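Taken together with the acpi/scan.c hunk above, this shows the point of the "Simplify arch_setup_dma_ops()" item from the cover letter: callers pass only the device and its coherency attribute, and any bus DMA ranges are picked up from dev->dma_range_map inside the arch code. A minimal sketch of a caller under the new interface follows; the bus name and where the coherent flag comes from are invented for illustration.

/*
 * Hypothetical bus glue, for illustration only.  The only part taken
 * from this series is the two-argument arch_setup_dma_ops() call;
 * dev->dma_range_map is assumed to have been populated by firmware
 * parsing (OF/ACPI) before this runs.
 */
#include <linux/device.h>
#include <linux/dma-map-ops.h>

static void examplebus_setup_dma(struct device *dev, bool coherent)
{
	arch_setup_dma_ops(dev, coherent);
}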
+20 -5
drivers/iommu/Kconfig
···

  config ARM_SMMU_QCOM_DEBUG
	bool "ARM SMMU QCOM implementation defined debug support"
-	depends on ARM_SMMU_QCOM
+	depends on ARM_SMMU_QCOM=y
	help
	  Support for implementation specific debug features in ARM SMMU
-	  hardware found in QTI platforms.
+	  hardware found in QTI platforms. This include support for
+	  the Translation Buffer Units (TBU) that can be used to obtain
+	  additional information when debugging memory management issues
+	  like context faults.

-	  Say Y here to enable debug for issues such as TLB sync timeouts
-	  which requires implementation defined register dumps.
+	  Say Y here to enable debug for issues such as context faults
+	  or TLB sync timeouts which requires implementation defined
+	  register dumps.

  config ARM_SMMU_V3
	tristate "ARM Ltd. System MMU Version 3 (SMMUv3) Support"
···
	  Say Y here if your system includes an IOMMU device implementing
	  the ARM SMMUv3 architecture.

+ if ARM_SMMU_V3
  config ARM_SMMU_V3_SVA
	bool "Shared Virtual Addressing support for the ARM SMMUv3"
-	depends on ARM_SMMU_V3
	select IOMMU_SVA
	select IOMMU_IOPF
	select MMU_NOTIFIER
···

	  Say Y here if your system supports SVA extensions such as PCIe PASID
	  and PRI.
+
+ config ARM_SMMU_V3_KUNIT_TEST
+	tristate "KUnit tests for arm-smmu-v3 driver" if !KUNIT_ALL_TESTS
+	depends on KUNIT
+	depends on ARM_SMMU_V3_SVA
+	default KUNIT_ALL_TESTS
+	help
+	  Enable this option to unit-test arm-smmu-v3 driver functions.
+
+	  If unsure, say N.
+ endif

  config S390_IOMMU
	def_bool y if S390 && PCI
+3
drivers/iommu/amd/Kconfig
···
	select PCI_ATS
	select PCI_PRI
	select PCI_PASID
+	select MMU_NOTIFIER
	select IOMMU_API
	select IOMMU_IOVA
	select IOMMU_IO_PGTABLE
+	select IOMMU_SVA
+	select IOMMU_IOPF
	select IOMMUFD_DRIVER if IOMMUFD
	depends on X86_64 && PCI && ACPI && HAVE_CMPXCHG_DOUBLE
	help
+1 -1
drivers/iommu/amd/Makefile
···
  # SPDX-License-Identifier: GPL-2.0-only
- obj-$(CONFIG_AMD_IOMMU) += iommu.o init.o quirks.o io_pgtable.o io_pgtable_v2.o
+ obj-$(CONFIG_AMD_IOMMU) += iommu.o init.o quirks.o io_pgtable.o io_pgtable_v2.o ppr.o pasid.o
  obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += debugfs.o
+42 -16
drivers/iommu/amd/amd_iommu.h
··· 17 17 irqreturn_t amd_iommu_int_thread_galog(int irq, void *data); 18 18 irqreturn_t amd_iommu_int_handler(int irq, void *data); 19 19 void amd_iommu_apply_erratum_63(struct amd_iommu *iommu, u16 devid); 20 + void amd_iommu_restart_log(struct amd_iommu *iommu, const char *evt_type, 21 + u8 cntrl_intr, u8 cntrl_log, 22 + u32 status_run_mask, u32 status_overflow_mask); 20 23 void amd_iommu_restart_event_logging(struct amd_iommu *iommu); 21 24 void amd_iommu_restart_ga_log(struct amd_iommu *iommu); 22 25 void amd_iommu_restart_ppr_log(struct amd_iommu *iommu); 23 26 void amd_iommu_set_rlookup_table(struct amd_iommu *iommu, u16 devid); 27 + void iommu_feature_enable(struct amd_iommu *iommu, u8 bit); 28 + void *__init iommu_alloc_4k_pages(struct amd_iommu *iommu, 29 + gfp_t gfp, size_t size); 24 30 25 31 #ifdef CONFIG_AMD_IOMMU_DEBUGFS 26 32 void amd_iommu_debugfs_setup(struct amd_iommu *iommu); ··· 39 33 int amd_iommu_enable(void); 40 34 void amd_iommu_disable(void); 41 35 int amd_iommu_reenable(int mode); 42 - int amd_iommu_enable_faulting(void); 36 + int amd_iommu_enable_faulting(unsigned int cpu); 43 37 extern int amd_iommu_guest_ir; 44 38 extern enum io_pgtable_fmt amd_iommu_pgtable; 45 39 extern int amd_iommu_gpt_level; 46 40 47 - bool amd_iommu_v2_supported(void); 41 + /* Protection domain ops */ 42 + struct protection_domain *protection_domain_alloc(unsigned int type); 43 + void protection_domain_free(struct protection_domain *domain); 44 + struct iommu_domain *amd_iommu_domain_alloc_sva(struct device *dev, 45 + struct mm_struct *mm); 46 + void amd_iommu_domain_free(struct iommu_domain *dom); 47 + int iommu_sva_set_dev_pasid(struct iommu_domain *domain, 48 + struct device *dev, ioasid_t pasid); 49 + void amd_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid, 50 + struct iommu_domain *domain); 48 51 49 - /* Device capabilities */ 50 - int amd_iommu_pdev_enable_cap_pri(struct pci_dev *pdev); 51 - void amd_iommu_pdev_disable_cap_pri(struct pci_dev *pdev); 52 + /* SVA/PASID */ 53 + bool amd_iommu_pasid_supported(void); 54 + 55 + /* IOPF */ 56 + int amd_iommu_iopf_init(struct amd_iommu *iommu); 57 + void amd_iommu_iopf_uninit(struct amd_iommu *iommu); 58 + void amd_iommu_page_response(struct device *dev, struct iopf_fault *evt, 59 + struct iommu_page_response *resp); 60 + int amd_iommu_iopf_add_device(struct amd_iommu *iommu, 61 + struct iommu_dev_data *dev_data); 62 + void amd_iommu_iopf_remove_device(struct amd_iommu *iommu, 63 + struct iommu_dev_data *dev_data); 52 64 53 65 /* GCR3 setup */ 54 66 int amd_iommu_set_gcr3(struct iommu_dev_data *dev_data, 55 67 ioasid_t pasid, unsigned long gcr3); 56 68 int amd_iommu_clear_gcr3(struct iommu_dev_data *dev_data, ioasid_t pasid); 69 + 70 + /* PPR */ 71 + int __init amd_iommu_alloc_ppr_log(struct amd_iommu *iommu); 72 + void __init amd_iommu_free_ppr_log(struct amd_iommu *iommu); 73 + void amd_iommu_enable_ppr_log(struct amd_iommu *iommu); 74 + void amd_iommu_poll_ppr_log(struct amd_iommu *iommu); 75 + int amd_iommu_complete_ppr(struct device *dev, u32 pasid, int status, int tag); 57 76 58 77 /* 59 78 * This function flushes all internal caches of ··· 87 56 void amd_iommu_flush_all_caches(struct amd_iommu *iommu); 88 57 void amd_iommu_update_and_flush_device_table(struct protection_domain *domain); 89 58 void amd_iommu_domain_update(struct protection_domain *domain); 59 + void amd_iommu_dev_update_dte(struct iommu_dev_data *dev_data, bool set); 90 60 void amd_iommu_domain_flush_complete(struct protection_domain *domain); 91 61 void 
amd_iommu_domain_flush_pages(struct protection_domain *domain, 92 62 u64 address, size_t size); ··· 104 72 return 0; 105 73 } 106 74 #endif 107 - 108 - int amd_iommu_complete_ppr(struct pci_dev *pdev, u32 pasid, 109 - int status, int tag); 110 75 111 76 static inline bool is_rd890_iommu(struct pci_dev *pdev) 112 77 { ··· 163 134 return PCI_SEG_DEVID_TO_SBDF(seg, devid); 164 135 } 165 136 166 - static inline void *alloc_pgtable_page(int nid, gfp_t gfp) 167 - { 168 - struct page *page; 169 - 170 - page = alloc_pages_node(nid, gfp | __GFP_ZERO, 0); 171 - return page ? page_address(page) : NULL; 172 - } 173 - 174 137 /* 175 138 * This must be called after device probe completes. During probe 176 139 * use rlookup_amd_iommu() get the iommu. ··· 176 155 static inline struct amd_iommu *get_amd_iommu_from_dev_data(struct iommu_dev_data *dev_data) 177 156 { 178 157 return iommu_get_iommu_dev(dev_data->dev, struct amd_iommu, iommu); 158 + } 159 + 160 + static inline struct protection_domain *to_pdomain(struct iommu_domain *dom) 161 + { 162 + return container_of(dom, struct protection_domain, domain); 179 163 } 180 164 181 165 bool translation_pre_enabled(struct amd_iommu *iommu);
+33
drivers/iommu/amd/amd_iommu_types.h
···
  #ifndef _ASM_X86_AMD_IOMMU_TYPES_H
  #define _ASM_X86_AMD_IOMMU_TYPES_H

+ #include <linux/iommu.h>
  #include <linux/types.h>
+ #include <linux/mmu_notifier.h>
  #include <linux/mutex.h>
  #include <linux/msi.h>
  #include <linux/list.h>
···
  #define PPR_LOG_SIZE_512	(0x9ULL << PPR_LOG_SIZE_SHIFT)
  #define PPR_ENTRY_SIZE		16
  #define PPR_LOG_SIZE		(PPR_ENTRY_SIZE * PPR_LOG_ENTRIES)
+
+ /* PAGE_SERVICE_REQUEST PPR Log Buffer Entry flags */
+ #define PPR_FLAG_EXEC		0x002	/* Execute permission requested */
+ #define PPR_FLAG_READ		0x004	/* Read permission requested */
+ #define PPR_FLAG_WRITE		0x020	/* Write permission requested */
+ #define PPR_FLAG_US		0x040	/* 1: User, 0: Supervisor */
+ #define PPR_FLAG_RVSD		0x080	/* Reserved bit not zero */
+ #define PPR_FLAG_GN		0x100	/* GVA and PASID is valid */

  #define PPR_REQ_TYPE(x)		(((x) >> 60) & 0xfULL)
  #define PPR_FLAGS(x)		(((x) >> 48) & 0xfffULL)
···
	list_for_each_entry((iommu), &amd_iommu_list, list)
  #define for_each_iommu_safe(iommu, next) \
	list_for_each_entry_safe((iommu), (next), &amd_iommu_list, list)
+ /* Making iterating over protection_domain->dev_data_list easier */
+ #define for_each_pdom_dev_data(pdom_dev_data, pdom) \
+	list_for_each_entry(pdom_dev_data, &pdom->dev_data_list, list)
+ #define for_each_pdom_dev_data_safe(pdom_dev_data, next, pdom) \
+	list_for_each_entry_safe((pdom_dev_data), (next), &pdom->dev_data_list, list)

  struct amd_iommu;
  struct iommu_domain;
···
	PD_MODE_V2,
  };

+ /* Track dev_data/PASID list for the protection domain */
+ struct pdom_dev_data {
+	/* Points to attached device data */
+	struct iommu_dev_data *dev_data;
+	/* PASID attached to the protection domain */
+	ioasid_t pasid;
+	/* For protection_domain->dev_data_list */
+	struct list_head list;
+ };
+
  /*
   * This structure contains generic data for IOMMU protection domains
   * independent of their use.
···
	bool dirty_tracking;	/* dirty tracking is enabled in the domain */
	unsigned dev_cnt;	/* devices assigned to this domain */
	unsigned dev_iommu[MAX_IOMMUS]; /* per-IOMMU reference count */
+
+	struct mmu_notifier mn;	/* mmu notifier for the SVA domain */
+	struct list_head dev_data_list; /* List of pdom_dev_data */
  };

  /*
···
	/* DebugFS Info */
	struct dentry *debugfs;
  #endif
+
+	/* IOPF support */
+	struct iopf_queue *iopf_queue;
+	unsigned char iopfq_name[32];
  };

  static inline struct amd_iommu *dev_to_amd_iommu(struct device *dev)
···
	struct device *dev;
	u16 devid;		/* PCI Device ID */

+	u32 max_pasids;		/* Max supported PASIDs */
	u32 flags;		/* Holds AMD_IOMMU_DEVICE_FLAG_<*> */
	int ats_qdep;
	u8 ats_enabled :1;	/* ATS state */
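The pdom_dev_data bookkeeping and the for_each_pdom_dev_data() iterators added above are how an SVA-capable protection domain remembers which device/PASID pairs are attached to it; the real users appear in pasid.c further down. A hypothetical lookup helper, sketched only to show the intended usage, is below; the function itself is not part of the series.

/*
 * Hypothetical helper, for illustration only: find the bookkeeping
 * entry for a given PASID in a protection domain.  The caller is
 * assumed to hold pdom->lock, as the real iterator users do.
 */
static struct pdom_dev_data *find_pdom_dev_pasid(struct protection_domain *pdom,
						 ioasid_t pasid)
{
	struct pdom_dev_data *pdom_dev_data;

	lockdep_assert_held(&pdom->lock);

	for_each_pdom_dev_data(pdom_dev_data, pdom) {
		if (pdom_dev_data->pasid == pasid)
			return pdom_dev_data;
	}

	return NULL;
}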
+69 -108
drivers/iommu/amd/init.c
··· 36 36 37 37 #include "amd_iommu.h" 38 38 #include "../irq_remapping.h" 39 + #include "../iommu-pages.h" 39 40 40 41 /* 41 42 * definitions for the ACPI scanning code ··· 420 419 } 421 420 422 421 /* Generic functions to enable/disable certain features of the IOMMU. */ 423 - static void iommu_feature_enable(struct amd_iommu *iommu, u8 bit) 422 + void iommu_feature_enable(struct amd_iommu *iommu, u8 bit) 424 423 { 425 424 u64 ctrl; 426 425 ··· 650 649 /* Allocate per PCI segment device table */ 651 650 static inline int __init alloc_dev_table(struct amd_iommu_pci_seg *pci_seg) 652 651 { 653 - pci_seg->dev_table = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO | GFP_DMA32, 654 - get_order(pci_seg->dev_table_size)); 652 + pci_seg->dev_table = iommu_alloc_pages(GFP_KERNEL | GFP_DMA32, 653 + get_order(pci_seg->dev_table_size)); 655 654 if (!pci_seg->dev_table) 656 655 return -ENOMEM; 657 656 ··· 660 659 661 660 static inline void free_dev_table(struct amd_iommu_pci_seg *pci_seg) 662 661 { 663 - free_pages((unsigned long)pci_seg->dev_table, 664 - get_order(pci_seg->dev_table_size)); 662 + iommu_free_pages(pci_seg->dev_table, 663 + get_order(pci_seg->dev_table_size)); 665 664 pci_seg->dev_table = NULL; 666 665 } 667 666 668 667 /* Allocate per PCI segment IOMMU rlookup table. */ 669 668 static inline int __init alloc_rlookup_table(struct amd_iommu_pci_seg *pci_seg) 670 669 { 671 - pci_seg->rlookup_table = (void *)__get_free_pages( 672 - GFP_KERNEL | __GFP_ZERO, 673 - get_order(pci_seg->rlookup_table_size)); 670 + pci_seg->rlookup_table = iommu_alloc_pages(GFP_KERNEL, 671 + get_order(pci_seg->rlookup_table_size)); 674 672 if (pci_seg->rlookup_table == NULL) 675 673 return -ENOMEM; 676 674 ··· 678 678 679 679 static inline void free_rlookup_table(struct amd_iommu_pci_seg *pci_seg) 680 680 { 681 - free_pages((unsigned long)pci_seg->rlookup_table, 682 - get_order(pci_seg->rlookup_table_size)); 681 + iommu_free_pages(pci_seg->rlookup_table, 682 + get_order(pci_seg->rlookup_table_size)); 683 683 pci_seg->rlookup_table = NULL; 684 684 } 685 685 686 686 static inline int __init alloc_irq_lookup_table(struct amd_iommu_pci_seg *pci_seg) 687 687 { 688 - pci_seg->irq_lookup_table = (void *)__get_free_pages( 689 - GFP_KERNEL | __GFP_ZERO, 690 - get_order(pci_seg->rlookup_table_size)); 688 + pci_seg->irq_lookup_table = iommu_alloc_pages(GFP_KERNEL, 689 + get_order(pci_seg->rlookup_table_size)); 691 690 kmemleak_alloc(pci_seg->irq_lookup_table, 692 691 pci_seg->rlookup_table_size, 1, GFP_KERNEL); 693 692 if (pci_seg->irq_lookup_table == NULL) ··· 698 699 static inline void free_irq_lookup_table(struct amd_iommu_pci_seg *pci_seg) 699 700 { 700 701 kmemleak_free(pci_seg->irq_lookup_table); 701 - free_pages((unsigned long)pci_seg->irq_lookup_table, 702 - get_order(pci_seg->rlookup_table_size)); 702 + iommu_free_pages(pci_seg->irq_lookup_table, 703 + get_order(pci_seg->rlookup_table_size)); 703 704 pci_seg->irq_lookup_table = NULL; 704 705 } 705 706 ··· 707 708 { 708 709 int i; 709 710 710 - pci_seg->alias_table = (void *)__get_free_pages(GFP_KERNEL, 711 - get_order(pci_seg->alias_table_size)); 711 + pci_seg->alias_table = iommu_alloc_pages(GFP_KERNEL, 712 + get_order(pci_seg->alias_table_size)); 712 713 if (!pci_seg->alias_table) 713 714 return -ENOMEM; 714 715 ··· 723 724 724 725 static void __init free_alias_table(struct amd_iommu_pci_seg *pci_seg) 725 726 { 726 - free_pages((unsigned long)pci_seg->alias_table, 727 - get_order(pci_seg->alias_table_size)); 727 + iommu_free_pages(pci_seg->alias_table, 728 + 
get_order(pci_seg->alias_table_size)); 728 729 pci_seg->alias_table = NULL; 729 730 } 730 731 ··· 735 736 */ 736 737 static int __init alloc_command_buffer(struct amd_iommu *iommu) 737 738 { 738 - iommu->cmd_buf = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 739 - get_order(CMD_BUFFER_SIZE)); 739 + iommu->cmd_buf = iommu_alloc_pages(GFP_KERNEL, 740 + get_order(CMD_BUFFER_SIZE)); 740 741 741 742 return iommu->cmd_buf ? 0 : -ENOMEM; 742 743 } ··· 745 746 * Interrupt handler has processed all pending events and adjusted head 746 747 * and tail pointer. Reset overflow mask and restart logging again. 747 748 */ 748 - static void amd_iommu_restart_log(struct amd_iommu *iommu, const char *evt_type, 749 - u8 cntrl_intr, u8 cntrl_log, 750 - u32 status_run_mask, u32 status_overflow_mask) 749 + void amd_iommu_restart_log(struct amd_iommu *iommu, const char *evt_type, 750 + u8 cntrl_intr, u8 cntrl_log, 751 + u32 status_run_mask, u32 status_overflow_mask) 751 752 { 752 753 u32 status; 753 754 ··· 786 787 amd_iommu_restart_log(iommu, "GA", CONTROL_GAINT_EN, 787 788 CONTROL_GALOG_EN, MMIO_STATUS_GALOG_RUN_MASK, 788 789 MMIO_STATUS_GALOG_OVERFLOW_MASK); 789 - } 790 - 791 - /* 792 - * This function restarts ppr logging in case the IOMMU experienced 793 - * PPR log overflow. 794 - */ 795 - void amd_iommu_restart_ppr_log(struct amd_iommu *iommu) 796 - { 797 - amd_iommu_restart_log(iommu, "PPR", CONTROL_PPRINT_EN, 798 - CONTROL_PPRLOG_EN, MMIO_STATUS_PPR_RUN_MASK, 799 - MMIO_STATUS_PPR_OVERFLOW_MASK); 800 790 } 801 791 802 792 /* ··· 833 845 834 846 static void __init free_command_buffer(struct amd_iommu *iommu) 835 847 { 836 - free_pages((unsigned long)iommu->cmd_buf, get_order(CMD_BUFFER_SIZE)); 848 + iommu_free_pages(iommu->cmd_buf, get_order(CMD_BUFFER_SIZE)); 837 849 } 838 850 839 - static void *__init iommu_alloc_4k_pages(struct amd_iommu *iommu, 840 - gfp_t gfp, size_t size) 851 + void *__init iommu_alloc_4k_pages(struct amd_iommu *iommu, gfp_t gfp, 852 + size_t size) 841 853 { 842 854 int order = get_order(size); 843 - void *buf = (void *)__get_free_pages(gfp, order); 855 + void *buf = iommu_alloc_pages(gfp, order); 844 856 845 857 if (buf && 846 858 check_feature(FEATURE_SNP) && 847 859 set_memory_4k((unsigned long)buf, (1 << order))) { 848 - free_pages((unsigned long)buf, order); 860 + iommu_free_pages(buf, order); 849 861 buf = NULL; 850 862 } 851 863 ··· 855 867 /* allocates the memory where the IOMMU will log its events to */ 856 868 static int __init alloc_event_buffer(struct amd_iommu *iommu) 857 869 { 858 - iommu->evt_buf = iommu_alloc_4k_pages(iommu, GFP_KERNEL | __GFP_ZERO, 870 + iommu->evt_buf = iommu_alloc_4k_pages(iommu, GFP_KERNEL, 859 871 EVT_BUFFER_SIZE); 860 872 861 873 return iommu->evt_buf ? 0 : -ENOMEM; ··· 889 901 890 902 static void __init free_event_buffer(struct amd_iommu *iommu) 891 903 { 892 - free_pages((unsigned long)iommu->evt_buf, get_order(EVT_BUFFER_SIZE)); 893 - } 894 - 895 - /* allocates the memory where the IOMMU will log its events to */ 896 - static int __init alloc_ppr_log(struct amd_iommu *iommu) 897 - { 898 - iommu->ppr_log = iommu_alloc_4k_pages(iommu, GFP_KERNEL | __GFP_ZERO, 899 - PPR_LOG_SIZE); 900 - 901 - return iommu->ppr_log ? 
0 : -ENOMEM; 902 - } 903 - 904 - static void iommu_enable_ppr_log(struct amd_iommu *iommu) 905 - { 906 - u64 entry; 907 - 908 - if (iommu->ppr_log == NULL) 909 - return; 910 - 911 - iommu_feature_enable(iommu, CONTROL_PPR_EN); 912 - 913 - entry = iommu_virt_to_phys(iommu->ppr_log) | PPR_LOG_SIZE_512; 914 - 915 - memcpy_toio(iommu->mmio_base + MMIO_PPR_LOG_OFFSET, 916 - &entry, sizeof(entry)); 917 - 918 - /* set head and tail to zero manually */ 919 - writel(0x00, iommu->mmio_base + MMIO_PPR_HEAD_OFFSET); 920 - writel(0x00, iommu->mmio_base + MMIO_PPR_TAIL_OFFSET); 921 - 922 - iommu_feature_enable(iommu, CONTROL_PPRLOG_EN); 923 - iommu_feature_enable(iommu, CONTROL_PPRINT_EN); 924 - } 925 - 926 - static void __init free_ppr_log(struct amd_iommu *iommu) 927 - { 928 - free_pages((unsigned long)iommu->ppr_log, get_order(PPR_LOG_SIZE)); 904 + iommu_free_pages(iommu->evt_buf, get_order(EVT_BUFFER_SIZE)); 929 905 } 930 906 931 907 static void free_ga_log(struct amd_iommu *iommu) 932 908 { 933 909 #ifdef CONFIG_IRQ_REMAP 934 - free_pages((unsigned long)iommu->ga_log, get_order(GA_LOG_SIZE)); 935 - free_pages((unsigned long)iommu->ga_log_tail, get_order(8)); 910 + iommu_free_pages(iommu->ga_log, get_order(GA_LOG_SIZE)); 911 + iommu_free_pages(iommu->ga_log_tail, get_order(8)); 936 912 #endif 937 913 } 938 914 ··· 941 989 if (!AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir)) 942 990 return 0; 943 991 944 - iommu->ga_log = (u8 *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 945 - get_order(GA_LOG_SIZE)); 992 + iommu->ga_log = iommu_alloc_pages(GFP_KERNEL, get_order(GA_LOG_SIZE)); 946 993 if (!iommu->ga_log) 947 994 goto err_out; 948 995 949 - iommu->ga_log_tail = (u8 *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 950 - get_order(8)); 996 + iommu->ga_log_tail = iommu_alloc_pages(GFP_KERNEL, get_order(8)); 951 997 if (!iommu->ga_log_tail) 952 998 goto err_out; 953 999 ··· 958 1008 959 1009 static int __init alloc_cwwb_sem(struct amd_iommu *iommu) 960 1010 { 961 - iommu->cmd_sem = iommu_alloc_4k_pages(iommu, GFP_KERNEL | __GFP_ZERO, 1); 1011 + iommu->cmd_sem = iommu_alloc_4k_pages(iommu, GFP_KERNEL, 1); 962 1012 963 1013 return iommu->cmd_sem ? 
0 : -ENOMEM; 964 1014 } ··· 966 1016 static void __init free_cwwb_sem(struct amd_iommu *iommu) 967 1017 { 968 1018 if (iommu->cmd_sem) 969 - free_page((unsigned long)iommu->cmd_sem); 1019 + iommu_free_page((void *)iommu->cmd_sem); 970 1020 } 971 1021 972 1022 static void iommu_enable_xt(struct amd_iommu *iommu) ··· 1031 1081 u32 lo, hi, devid, old_devtb_size; 1032 1082 phys_addr_t old_devtb_phys; 1033 1083 u16 dom_id, dte_v, irq_v; 1034 - gfp_t gfp_flag; 1035 1084 u64 tmp; 1036 1085 1037 1086 /* Each IOMMU use separate device table with the same size */ ··· 1064 1115 if (!old_devtb) 1065 1116 return false; 1066 1117 1067 - gfp_flag = GFP_KERNEL | __GFP_ZERO | GFP_DMA32; 1068 - pci_seg->old_dev_tbl_cpy = (void *)__get_free_pages(gfp_flag, 1069 - get_order(pci_seg->dev_table_size)); 1118 + pci_seg->old_dev_tbl_cpy = iommu_alloc_pages(GFP_KERNEL | GFP_DMA32, 1119 + get_order(pci_seg->dev_table_size)); 1070 1120 if (pci_seg->old_dev_tbl_cpy == NULL) { 1071 1121 pr_err("Failed to allocate memory for copying old device table!\n"); 1072 1122 memunmap(old_devtb); ··· 1631 1683 free_cwwb_sem(iommu); 1632 1684 free_command_buffer(iommu); 1633 1685 free_event_buffer(iommu); 1634 - free_ppr_log(iommu); 1686 + amd_iommu_free_ppr_log(iommu); 1635 1687 free_ga_log(iommu); 1636 1688 iommu_unmap_mmio_space(iommu); 1689 + amd_iommu_iopf_uninit(iommu); 1637 1690 } 1638 1691 1639 1692 static void __init free_iommu_all(void) ··· 2046 2097 amd_iommu_max_glx_val = glxval; 2047 2098 else 2048 2099 amd_iommu_max_glx_val = min(amd_iommu_max_glx_val, glxval); 2100 + 2101 + iommu_enable_gt(iommu); 2049 2102 } 2050 2103 2051 - if (check_feature(FEATURE_PPR) && alloc_ppr_log(iommu)) 2104 + if (check_feature(FEATURE_PPR) && amd_iommu_alloc_ppr_log(iommu)) 2052 2105 return -ENOMEM; 2053 2106 2054 2107 if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE)) { ··· 2105 2154 amd_iommu_groups, "ivhd%d", iommu->index); 2106 2155 if (ret) 2107 2156 return ret; 2157 + 2158 + /* 2159 + * Allocate per IOMMU IOPF queue here so that in attach device path, 2160 + * PRI capable device can be added to IOPF queue 2161 + */ 2162 + if (amd_iommu_gt_ppr_supported()) { 2163 + ret = amd_iommu_iopf_init(iommu); 2164 + if (ret) 2165 + return ret; 2166 + } 2108 2167 2109 2168 iommu_device_register(&iommu->iommu, &amd_iommu_ops, NULL); 2110 2169 ··· 2734 2773 iommu_enable_command_buffer(iommu); 2735 2774 iommu_enable_event_buffer(iommu); 2736 2775 iommu_set_exclusion_range(iommu); 2737 - iommu_enable_gt(iommu); 2738 2776 iommu_enable_ga(iommu); 2739 2777 iommu_enable_xt(iommu); 2740 2778 iommu_enable_irtcachedis(iommu); ··· 2765 2805 2766 2806 for_each_pci_segment(pci_seg) { 2767 2807 if (pci_seg->old_dev_tbl_cpy != NULL) { 2768 - free_pages((unsigned long)pci_seg->old_dev_tbl_cpy, 2769 - get_order(pci_seg->dev_table_size)); 2808 + iommu_free_pages(pci_seg->old_dev_tbl_cpy, 2809 + get_order(pci_seg->dev_table_size)); 2770 2810 pci_seg->old_dev_tbl_cpy = NULL; 2771 2811 } 2772 2812 } ··· 2779 2819 pr_info("Copied DEV table from previous kernel.\n"); 2780 2820 2781 2821 for_each_pci_segment(pci_seg) { 2782 - free_pages((unsigned long)pci_seg->dev_table, 2783 - get_order(pci_seg->dev_table_size)); 2822 + iommu_free_pages(pci_seg->dev_table, 2823 + get_order(pci_seg->dev_table_size)); 2784 2824 pci_seg->dev_table = pci_seg->old_dev_tbl_cpy; 2785 2825 } 2786 2826 ··· 2790 2830 iommu_disable_irtcachedis(iommu); 2791 2831 iommu_enable_command_buffer(iommu); 2792 2832 iommu_enable_event_buffer(iommu); 2793 - iommu_enable_gt(iommu); 2794 2833 
iommu_enable_ga(iommu); 2795 2834 iommu_enable_xt(iommu); 2796 2835 iommu_enable_irtcachedis(iommu); ··· 2799 2840 } 2800 2841 } 2801 2842 2802 - static void enable_iommus_v2(void) 2843 + static void enable_iommus_ppr(void) 2803 2844 { 2804 2845 struct amd_iommu *iommu; 2805 2846 2847 + if (!amd_iommu_gt_ppr_supported()) 2848 + return; 2849 + 2806 2850 for_each_iommu(iommu) 2807 - iommu_enable_ppr_log(iommu); 2851 + amd_iommu_enable_ppr_log(iommu); 2808 2852 } 2809 2853 2810 2854 static void enable_iommus_vapic(void) ··· 2984 3022 2985 3023 static void __init free_dma_resources(void) 2986 3024 { 2987 - free_pages((unsigned long)amd_iommu_pd_alloc_bitmap, 2988 - get_order(MAX_DOMAIN_ID/8)); 3025 + iommu_free_pages(amd_iommu_pd_alloc_bitmap, 3026 + get_order(MAX_DOMAIN_ID / 8)); 2989 3027 amd_iommu_pd_alloc_bitmap = NULL; 2990 3028 2991 3029 free_unity_maps(); ··· 3057 3095 /* Device table - directly used by all IOMMUs */ 3058 3096 ret = -ENOMEM; 3059 3097 3060 - amd_iommu_pd_alloc_bitmap = (void *)__get_free_pages( 3061 - GFP_KERNEL | __GFP_ZERO, 3062 - get_order(MAX_DOMAIN_ID/8)); 3098 + amd_iommu_pd_alloc_bitmap = iommu_alloc_pages(GFP_KERNEL, 3099 + get_order(MAX_DOMAIN_ID / 8)); 3063 3100 if (amd_iommu_pd_alloc_bitmap == NULL) 3064 3101 goto out; 3065 3102 ··· 3142 3181 * PPR and GA log interrupt for all IOMMUs. 3143 3182 */ 3144 3183 enable_iommus_vapic(); 3145 - enable_iommus_v2(); 3184 + enable_iommus_ppr(); 3146 3185 3147 3186 out: 3148 3187 return ret; ··· 3353 3392 return 0; 3354 3393 } 3355 3394 3356 - int __init amd_iommu_enable_faulting(void) 3395 + int __init amd_iommu_enable_faulting(unsigned int cpu) 3357 3396 { 3358 3397 /* We enable MSI later when PCI is initialized */ 3359 3398 return 0; ··· 3651 3690 __setup("ivrs_hpet", parse_ivrs_hpet); 3652 3691 __setup("ivrs_acpihid", parse_ivrs_acpihid); 3653 3692 3654 - bool amd_iommu_v2_supported(void) 3693 + bool amd_iommu_pasid_supported(void) 3655 3694 { 3656 3695 /* CPU page table size should match IOMMU guest page table size */ 3657 3696 if (cpu_feature_enabled(X86_FEATURE_LA57) &&
+7 -6
drivers/iommu/amd/io_pgtable.c
···

  #include "amd_iommu_types.h"
  #include "amd_iommu.h"
+ #include "../iommu-pages.h"

  static void v1_tlb_flush_all(void *cookie)
  {
···
	bool ret = true;
	u64 *pte;

-	pte = alloc_pgtable_page(domain->nid, gfp);
+	pte = iommu_alloc_page_node(domain->nid, gfp);
	if (!pte)
		return false;
···

  out:
	spin_unlock_irqrestore(&domain->lock, flags);
-	free_page((unsigned long)pte);
+	iommu_free_page(pte);

	return ret;
  }
···

		if (!IOMMU_PTE_PRESENT(__pte) ||
		    pte_level == PAGE_MODE_NONE) {
-			page = alloc_pgtable_page(domain->nid, gfp);
+			page = iommu_alloc_page_node(domain->nid, gfp);

			if (!page)
				return NULL;
···

			/* pte could have been changed somewhere. */
			if (!try_cmpxchg64(pte, &__pte, __npte))
-				free_page((unsigned long)page);
+				iommu_free_page(page);
			else if (IOMMU_PTE_PRESENT(__pte))
				*updated = true;
···
	}

	/* Everything flushed out, free pages now */
-	put_pages_list(&freelist);
+	iommu_put_pages_list(&freelist);

	return ret;
···
	/* Make changes visible to IOMMUs */
	amd_iommu_domain_update(dom);

-	put_pages_list(&freelist);
+	iommu_put_pages_list(&freelist);
  }

  static struct io_pgtable *v1_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
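These conversions move the AMD driver's page-table allocations onto the shared iommu-pages helpers (iommu_alloc_page_node(), iommu_free_page(), iommu_put_pages_list() and friends), which is also what feeds the sec_pagetables/SecPageTables accounting documented at the top of this pull. A small sketch of the alloc/free pairing as used in these hunks follows; the function names are invented, and the zeroing behaviour is inferred from the converted call sites that dropped their explicit __GFP_ZERO.

/*
 * Illustration only (names invented): allocate one page-table page on
 * the domain's NUMA node with the iommu-pages helpers and release it
 * again.  Judging by the call sites above, the helpers hand back
 * zeroed, accounted memory.
 */
static u64 *example_alloc_pt_page(int nid, gfp_t gfp)
{
	return iommu_alloc_page_node(nid, gfp);
}

static void example_free_pt_page(u64 *pte)
{
	if (pte)
		iommu_free_page(pte);
}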
+7 -11
drivers/iommu/amd/io_pgtable_v2.c
···

  #include "amd_iommu_types.h"
  #include "amd_iommu.h"
+ #include "../iommu-pages.h"

  #define IOMMU_PAGE_PRESENT	BIT_ULL(0)	/* Is present */
  #define IOMMU_PAGE_RW		BIT_ULL(1)	/* Writeable */
···
	return PAGE_MODE_1_LEVEL;
  }

- static inline void free_pgtable_page(u64 *pt)
- {
-	free_page((unsigned long)pt);
- }
-
  static void free_pgtable(u64 *pt, int level)
  {
	u64 *p;
···
		if (level > 2)
			free_pgtable(p, level - 1);
		else
-			free_pgtable_page(p);
+			iommu_free_page(p);
	}

-	free_pgtable_page(pt);
+	iommu_free_page(pt);
  }

  /* Allocate page table */
···
	}

	if (!IOMMU_PTE_PRESENT(__pte)) {
-		page = alloc_pgtable_page(nid, gfp);
+		page = iommu_alloc_page_node(nid, gfp);
		if (!page)
			return NULL;

		__npte = set_pgtable_attr(page);
		/* pte could have been changed somewhere. */
		if (cmpxchg64(pte, __pte, __npte) != __pte)
-			free_pgtable_page(page);
+			iommu_free_page(page);
		else if (IOMMU_PTE_PRESENT(__pte))
			*updated = true;
···
	if (pg_size == IOMMU_PAGE_SIZE_1G)
		free_pgtable(__pte, end_level - 1);
	else if (pg_size == IOMMU_PAGE_SIZE_2M)
-		free_pgtable_page(__pte);
+		iommu_free_page(__pte);
	}

	return pte;
···
	struct protection_domain *pdom = (struct protection_domain *)cookie;
	int ias = IOMMU_IN_ADDR_BIT_SIZE;

-	pgtable->pgd = alloc_pgtable_page(pdom->nid, GFP_ATOMIC);
+	pgtable->pgd = iommu_alloc_page_node(pdom->nid, GFP_ATOMIC);
	if (!pgtable->pgd)
		return NULL;
+189 -113
drivers/iommu/amd/iommu.c
··· 42 42 #include "amd_iommu.h" 43 43 #include "../dma-iommu.h" 44 44 #include "../irq_remapping.h" 45 + #include "../iommu-pages.h" 45 46 46 47 #define CMD_SET_TYPE(cmd, t) ((cmd)->data[1] |= ((t) << 28)) 47 48 ··· 88 87 static inline bool pdom_is_v2_pgtbl_mode(struct protection_domain *pdom) 89 88 { 90 89 return (pdom && (pdom->pd_mode == PD_MODE_V2)); 90 + } 91 + 92 + static inline bool pdom_is_in_pt_mode(struct protection_domain *pdom) 93 + { 94 + return (pdom->domain.type == IOMMU_DOMAIN_IDENTITY); 95 + } 96 + 97 + /* 98 + * We cannot support PASID w/ existing v1 page table in the same domain 99 + * since it will be nested. However, existing domain w/ v2 page table 100 + * or passthrough mode can be used for PASID. 101 + */ 102 + static inline bool pdom_is_sva_capable(struct protection_domain *pdom) 103 + { 104 + return pdom_is_v2_pgtbl_mode(pdom) || pdom_is_in_pt_mode(pdom); 91 105 } 92 106 93 107 static inline int get_acpihid_device_id(struct device *dev, ··· 193 177 if (devid < 0) 194 178 return NULL; 195 179 return __rlookup_amd_iommu(seg, PCI_SBDF_TO_DEVID(devid)); 196 - } 197 - 198 - static struct protection_domain *to_pdomain(struct iommu_domain *dom) 199 - { 200 - return container_of(dom, struct protection_domain, domain); 201 180 } 202 181 203 182 static struct iommu_dev_data *alloc_dev_data(struct amd_iommu *iommu, u16 devid) ··· 395 384 } 396 385 } 397 386 398 - int amd_iommu_pdev_enable_cap_pri(struct pci_dev *pdev) 387 + static inline int pdev_enable_cap_pri(struct pci_dev *pdev) 399 388 { 400 389 struct iommu_dev_data *dev_data = dev_iommu_priv_get(&pdev->dev); 401 390 int ret = -EINVAL; 402 391 403 392 if (dev_data->pri_enabled) 393 + return 0; 394 + 395 + if (!dev_data->ats_enabled) 404 396 return 0; 405 397 406 398 if (dev_data->flags & AMD_IOMMU_DEVICE_FLAG_PRI_SUP) { ··· 422 408 return ret; 423 409 } 424 410 425 - void amd_iommu_pdev_disable_cap_pri(struct pci_dev *pdev) 411 + static inline void pdev_disable_cap_pri(struct pci_dev *pdev) 426 412 { 427 413 struct iommu_dev_data *dev_data = dev_iommu_priv_get(&pdev->dev); 428 414 ··· 464 450 { 465 451 pdev_enable_cap_ats(pdev); 466 452 pdev_enable_cap_pasid(pdev); 467 - amd_iommu_pdev_enable_cap_pri(pdev); 468 - 453 + pdev_enable_cap_pri(pdev); 469 454 } 470 455 471 456 static void pdev_disable_caps(struct pci_dev *pdev) 472 457 { 473 458 pdev_disable_cap_ats(pdev); 474 459 pdev_disable_cap_pasid(pdev); 475 - amd_iommu_pdev_disable_cap_pri(pdev); 460 + pdev_disable_cap_pri(pdev); 476 461 } 477 462 478 463 /* ··· 831 818 writel(head, iommu->mmio_base + MMIO_EVT_HEAD_OFFSET); 832 819 } 833 820 834 - static void iommu_poll_ppr_log(struct amd_iommu *iommu) 835 - { 836 - u32 head, tail; 837 - 838 - if (iommu->ppr_log == NULL) 839 - return; 840 - 841 - head = readl(iommu->mmio_base + MMIO_PPR_HEAD_OFFSET); 842 - tail = readl(iommu->mmio_base + MMIO_PPR_TAIL_OFFSET); 843 - 844 - while (head != tail) { 845 - volatile u64 *raw; 846 - u64 entry[2]; 847 - int i; 848 - 849 - raw = (u64 *)(iommu->ppr_log + head); 850 - 851 - /* 852 - * Hardware bug: Interrupt may arrive before the entry is 853 - * written to memory. If this happens we need to wait for the 854 - * entry to arrive. 855 - */ 856 - for (i = 0; i < LOOP_TIMEOUT; ++i) { 857 - if (PPR_REQ_TYPE(raw[0]) != 0) 858 - break; 859 - udelay(1); 860 - } 861 - 862 - /* Avoid memcpy function-call overhead */ 863 - entry[0] = raw[0]; 864 - entry[1] = raw[1]; 865 - 866 - /* 867 - * To detect the hardware errata 733 we need to clear the 868 - * entry back to zero. 
This issue does not exist on SNP 869 - * enabled system. Also this buffer is not writeable on 870 - * SNP enabled system. 871 - */ 872 - if (!amd_iommu_snp_en) 873 - raw[0] = raw[1] = 0UL; 874 - 875 - /* Update head pointer of hardware ring-buffer */ 876 - head = (head + PPR_ENTRY_SIZE) % PPR_LOG_SIZE; 877 - writel(head, iommu->mmio_base + MMIO_PPR_HEAD_OFFSET); 878 - 879 - /* TODO: PPR Handler will be added when we add IOPF support */ 880 - 881 - /* Refresh ring-buffer information */ 882 - head = readl(iommu->mmio_base + MMIO_PPR_HEAD_OFFSET); 883 - tail = readl(iommu->mmio_base + MMIO_PPR_TAIL_OFFSET); 884 - } 885 - } 886 - 887 821 #ifdef CONFIG_IRQ_REMAP 888 822 static int (*iommu_ga_log_notifier)(u32); 889 823 ··· 951 991 { 952 992 amd_iommu_handle_irq(data, "PPR", MMIO_STATUS_PPR_INT_MASK, 953 993 MMIO_STATUS_PPR_OVERFLOW_MASK, 954 - iommu_poll_ppr_log, amd_iommu_restart_ppr_log); 994 + amd_iommu_poll_ppr_log, amd_iommu_restart_ppr_log); 955 995 956 996 return IRQ_HANDLED; 957 997 } ··· 1624 1664 amd_iommu_domain_flush_all(domain); 1625 1665 } 1626 1666 1627 - int amd_iommu_complete_ppr(struct pci_dev *pdev, u32 pasid, 1628 - int status, int tag) 1667 + int amd_iommu_complete_ppr(struct device *dev, u32 pasid, int status, int tag) 1629 1668 { 1630 1669 struct iommu_dev_data *dev_data; 1631 1670 struct amd_iommu *iommu; 1632 1671 struct iommu_cmd cmd; 1633 1672 1634 - dev_data = dev_iommu_priv_get(&pdev->dev); 1635 - iommu = get_amd_iommu_from_dev(&pdev->dev); 1673 + dev_data = dev_iommu_priv_get(dev); 1674 + iommu = get_amd_iommu_from_dev(dev); 1636 1675 1637 1676 build_complete_ppr(&cmd, dev_data->devid, pasid, status, 1638 1677 tag, dev_data->pri_tlp); ··· 1687 1728 1688 1729 ptr = iommu_phys_to_virt(tbl[i] & PAGE_MASK); 1689 1730 1690 - free_page((unsigned long)ptr); 1731 + iommu_free_page(ptr); 1691 1732 } 1692 1733 } 1693 1734 ··· 1720 1761 /* Free per device domain ID */ 1721 1762 domain_id_free(gcr3_info->domid); 1722 1763 1723 - free_page((unsigned long)gcr3_info->gcr3_tbl); 1764 + iommu_free_page(gcr3_info->gcr3_tbl); 1724 1765 gcr3_info->gcr3_tbl = NULL; 1725 1766 } 1726 1767 ··· 1755 1796 /* Allocate per device domain ID */ 1756 1797 gcr3_info->domid = domain_id_alloc(); 1757 1798 1758 - gcr3_info->gcr3_tbl = alloc_pgtable_page(nid, GFP_ATOMIC); 1799 + gcr3_info->gcr3_tbl = iommu_alloc_page_node(nid, GFP_ATOMIC); 1759 1800 if (gcr3_info->gcr3_tbl == NULL) { 1760 1801 domain_id_free(gcr3_info->domid); 1761 1802 return -ENOMEM; ··· 1961 2002 amd_iommu_apply_erratum_63(iommu, devid); 1962 2003 } 1963 2004 2005 + /* Update and flush DTE for the given device */ 2006 + void amd_iommu_dev_update_dte(struct iommu_dev_data *dev_data, bool set) 2007 + { 2008 + struct amd_iommu *iommu = get_amd_iommu_from_dev(dev_data->dev); 2009 + 2010 + if (set) 2011 + set_dte_entry(iommu, dev_data); 2012 + else 2013 + clear_dte_entry(iommu, dev_data->devid); 2014 + 2015 + clone_aliases(iommu, dev_data->dev); 2016 + device_flush_dte(dev_data); 2017 + iommu_completion_wait(iommu); 2018 + } 2019 + 2020 + /* 2021 + * If domain is SVA capable then initialize GCR3 table. Also if domain is 2022 + * in v2 page table mode then update GCR3[0]. 
2023 + */ 2024 + static int init_gcr3_table(struct iommu_dev_data *dev_data, 2025 + struct protection_domain *pdom) 2026 + { 2027 + struct amd_iommu *iommu = get_amd_iommu_from_dev_data(dev_data); 2028 + int max_pasids = dev_data->max_pasids; 2029 + int ret = 0; 2030 + 2031 + /* 2032 + * If domain is in pt mode then setup GCR3 table only if device 2033 + * is PASID capable 2034 + */ 2035 + if (pdom_is_in_pt_mode(pdom) && !pdev_pasid_supported(dev_data)) 2036 + return ret; 2037 + 2038 + /* 2039 + * By default, setup GCR3 table to support MAX PASIDs 2040 + * supported by the device/IOMMU. 2041 + */ 2042 + ret = setup_gcr3_table(&dev_data->gcr3_info, iommu, 2043 + max_pasids > 0 ? max_pasids : 1); 2044 + if (ret) 2045 + return ret; 2046 + 2047 + /* Setup GCR3[0] only if domain is setup with v2 page table mode */ 2048 + if (!pdom_is_v2_pgtbl_mode(pdom)) 2049 + return ret; 2050 + 2051 + ret = update_gcr3(dev_data, 0, iommu_virt_to_phys(pdom->iop.pgd), true); 2052 + if (ret) 2053 + free_gcr3_table(&dev_data->gcr3_info); 2054 + 2055 + return ret; 2056 + } 2057 + 2058 + static void destroy_gcr3_table(struct iommu_dev_data *dev_data, 2059 + struct protection_domain *pdom) 2060 + { 2061 + struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info; 2062 + 2063 + if (pdom_is_v2_pgtbl_mode(pdom)) 2064 + update_gcr3(dev_data, 0, 0, false); 2065 + 2066 + if (gcr3_info->gcr3_tbl == NULL) 2067 + return; 2068 + 2069 + free_gcr3_table(gcr3_info); 2070 + } 2071 + 1964 2072 static int do_attach(struct iommu_dev_data *dev_data, 1965 2073 struct protection_domain *domain) 1966 2074 { 1967 2075 struct amd_iommu *iommu = get_amd_iommu_from_dev_data(dev_data); 2076 + struct pci_dev *pdev; 1968 2077 int ret = 0; 1969 2078 1970 2079 /* Update data structures */ ··· 2047 2020 domain->dev_iommu[iommu->index] += 1; 2048 2021 domain->dev_cnt += 1; 2049 2022 2050 - /* Init GCR3 table and update device table */ 2051 - if (domain->pd_mode == PD_MODE_V2) { 2052 - /* By default, setup GCR3 table to support single PASID */ 2053 - ret = setup_gcr3_table(&dev_data->gcr3_info, iommu, 1); 2023 + pdev = dev_is_pci(dev_data->dev) ? to_pci_dev(dev_data->dev) : NULL; 2024 + if (pdom_is_sva_capable(domain)) { 2025 + ret = init_gcr3_table(dev_data, domain); 2054 2026 if (ret) 2055 2027 return ret; 2056 2028 2057 - ret = update_gcr3(dev_data, 0, 2058 - iommu_virt_to_phys(domain->iop.pgd), true); 2059 - if (ret) { 2060 - free_gcr3_table(&dev_data->gcr3_info); 2061 - return ret; 2029 + if (pdev) { 2030 + pdev_enable_caps(pdev); 2031 + 2032 + /* 2033 + * Device can continue to function even if IOPF 2034 + * enablement failed. Hence in error path just 2035 + * disable device PRI support. 
2036 + */ 2037 + if (amd_iommu_iopf_add_device(iommu, dev_data)) 2038 + pdev_disable_cap_pri(pdev); 2062 2039 } 2040 + } else if (pdev) { 2041 + pdev_enable_cap_ats(pdev); 2063 2042 } 2064 2043 2065 2044 /* Update device table */ 2066 - set_dte_entry(iommu, dev_data); 2067 - clone_aliases(iommu, dev_data->dev); 2068 - 2069 - device_flush_dte(dev_data); 2045 + amd_iommu_dev_update_dte(dev_data, true); 2070 2046 2071 2047 return ret; 2072 2048 } ··· 2080 2050 struct amd_iommu *iommu = get_amd_iommu_from_dev_data(dev_data); 2081 2051 2082 2052 /* Clear GCR3 table */ 2083 - if (domain->pd_mode == PD_MODE_V2) { 2084 - update_gcr3(dev_data, 0, 0, false); 2085 - free_gcr3_table(&dev_data->gcr3_info); 2086 - } 2053 + if (pdom_is_sva_capable(domain)) 2054 + destroy_gcr3_table(dev_data, domain); 2087 2055 2088 2056 /* Update data structures */ 2089 2057 dev_data->domain = NULL; 2090 2058 list_del(&dev_data->list); 2091 - clear_dte_entry(iommu, dev_data->devid); 2092 - clone_aliases(iommu, dev_data->dev); 2093 2059 2094 - /* Flush the DTE entry */ 2095 - device_flush_dte(dev_data); 2060 + /* Clear DTE and flush the entry */ 2061 + amd_iommu_dev_update_dte(dev_data, false); 2096 2062 2097 2063 /* Flush IOTLB and wait for the flushes to finish */ 2098 2064 amd_iommu_domain_flush_all(domain); ··· 2120 2094 goto out; 2121 2095 } 2122 2096 2123 - if (dev_is_pci(dev)) 2124 - pdev_enable_caps(to_pci_dev(dev)); 2125 - 2126 2097 ret = do_attach(dev_data, domain); 2127 2098 2128 2099 out: ··· 2135 2112 */ 2136 2113 static void detach_device(struct device *dev) 2137 2114 { 2138 - struct protection_domain *domain; 2139 - struct iommu_dev_data *dev_data; 2115 + struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev); 2116 + struct protection_domain *domain = dev_data->domain; 2117 + struct amd_iommu *iommu = get_amd_iommu_from_dev_data(dev_data); 2140 2118 unsigned long flags; 2141 - 2142 - dev_data = dev_iommu_priv_get(dev); 2143 - domain = dev_data->domain; 2119 + bool ppr = dev_data->ppr; 2144 2120 2145 2121 spin_lock_irqsave(&domain->lock, flags); 2146 2122 ··· 2154 2132 if (WARN_ON(!dev_data->domain)) 2155 2133 goto out; 2156 2134 2135 + if (ppr) { 2136 + iopf_queue_flush_dev(dev); 2137 + 2138 + /* Updated here so that it gets reflected in DTE */ 2139 + dev_data->ppr = false; 2140 + } 2141 + 2157 2142 do_detach(dev_data); 2143 + 2144 + /* Remove IOPF handler */ 2145 + if (ppr) 2146 + amd_iommu_iopf_remove_device(iommu, dev_data); 2158 2147 2159 2148 if (dev_is_pci(dev)) 2160 2149 pdev_disable_caps(to_pci_dev(dev)); ··· 2180 2147 { 2181 2148 struct iommu_device *iommu_dev; 2182 2149 struct amd_iommu *iommu; 2150 + struct iommu_dev_data *dev_data; 2183 2151 int ret; 2184 2152 2185 2153 if (!check_device(dev)) ··· 2207 2173 iommu_dev = &iommu->iommu; 2208 2174 } 2209 2175 2176 + /* 2177 + * If IOMMU and device supports PASID then it will contain max 2178 + * supported PASIDs, else it will be zero. 
2179 + */ 2180 + dev_data = dev_iommu_priv_get(dev); 2181 + if (amd_iommu_pasid_supported() && dev_is_pci(dev) && 2182 + pdev_pasid_supported(dev_data)) { 2183 + dev_data->max_pasids = min_t(u32, iommu->iommu.max_pasids, 2184 + pci_max_pasids(to_pci_dev(dev))); 2185 + } 2186 + 2210 2187 iommu_completion_wait(iommu); 2211 2188 2212 2189 return iommu_dev; 2213 - } 2214 - 2215 - static void amd_iommu_probe_finalize(struct device *dev) 2216 - { 2217 - /* Domains are initialized for this device - have a look what we ended up with */ 2218 - set_dma_ops(dev, NULL); 2219 - iommu_setup_dma_ops(dev, 0, U64_MAX); 2220 2190 } 2221 2191 2222 2192 static void amd_iommu_release_device(struct device *dev) ··· 2274 2236 WARN_ON(domain->dev_cnt != 0); 2275 2237 } 2276 2238 2277 - static void protection_domain_free(struct protection_domain *domain) 2239 + void protection_domain_free(struct protection_domain *domain) 2278 2240 { 2279 2241 if (!domain) 2280 2242 return; ··· 2283 2245 free_io_pgtable_ops(&domain->iop.iop.ops); 2284 2246 2285 2247 if (domain->iop.root) 2286 - free_page((unsigned long)domain->iop.root); 2248 + iommu_free_page(domain->iop.root); 2287 2249 2288 2250 if (domain->id) 2289 2251 domain_id_free(domain->id); ··· 2298 2260 BUG_ON(mode < PAGE_MODE_NONE || mode > PAGE_MODE_6_LEVEL); 2299 2261 2300 2262 if (mode != PAGE_MODE_NONE) { 2301 - pt_root = (void *)get_zeroed_page(GFP_KERNEL); 2263 + pt_root = iommu_alloc_page(GFP_KERNEL); 2302 2264 if (!pt_root) 2303 2265 return -ENOMEM; 2304 2266 } ··· 2317 2279 return 0; 2318 2280 } 2319 2281 2320 - static struct protection_domain *protection_domain_alloc(unsigned int type) 2282 + struct protection_domain *protection_domain_alloc(unsigned int type) 2321 2283 { 2322 2284 struct io_pgtable_ops *pgtbl_ops; 2323 2285 struct protection_domain *domain; ··· 2334 2296 2335 2297 spin_lock_init(&domain->lock); 2336 2298 INIT_LIST_HEAD(&domain->dev_list); 2299 + INIT_LIST_HEAD(&domain->dev_data_list); 2337 2300 domain->nid = NUMA_NO_NODE; 2338 2301 2339 2302 switch (type) { 2340 2303 /* No need to allocate io pgtable ops in passthrough mode */ 2341 2304 case IOMMU_DOMAIN_IDENTITY: 2305 + case IOMMU_DOMAIN_SVA: 2342 2306 return domain; 2343 2307 case IOMMU_DOMAIN_DMA: 2344 2308 pgtable = amd_iommu_pgtable; ··· 2460 2420 return do_iommu_domain_alloc(type, dev, flags); 2461 2421 } 2462 2422 2463 - static void amd_iommu_domain_free(struct iommu_domain *dom) 2423 + void amd_iommu_domain_free(struct iommu_domain *dom) 2464 2424 { 2465 2425 struct protection_domain *domain; 2466 2426 unsigned long flags; ··· 2825 2785 .read_and_clear_dirty = amd_iommu_read_and_clear_dirty, 2826 2786 }; 2827 2787 2788 + static int amd_iommu_dev_enable_feature(struct device *dev, 2789 + enum iommu_dev_features feat) 2790 + { 2791 + int ret = 0; 2792 + 2793 + switch (feat) { 2794 + case IOMMU_DEV_FEAT_IOPF: 2795 + case IOMMU_DEV_FEAT_SVA: 2796 + break; 2797 + default: 2798 + ret = -EINVAL; 2799 + break; 2800 + } 2801 + return ret; 2802 + } 2803 + 2804 + static int amd_iommu_dev_disable_feature(struct device *dev, 2805 + enum iommu_dev_features feat) 2806 + { 2807 + int ret = 0; 2808 + 2809 + switch (feat) { 2810 + case IOMMU_DEV_FEAT_IOPF: 2811 + case IOMMU_DEV_FEAT_SVA: 2812 + break; 2813 + default: 2814 + ret = -EINVAL; 2815 + break; 2816 + } 2817 + return ret; 2818 + } 2819 + 2828 2820 const struct iommu_ops amd_iommu_ops = { 2829 2821 .capable = amd_iommu_capable, 2830 2822 .domain_alloc = amd_iommu_domain_alloc, 2831 2823 .domain_alloc_user = amd_iommu_domain_alloc_user, 2824 + 
.domain_alloc_sva = amd_iommu_domain_alloc_sva, 2832 2825 .probe_device = amd_iommu_probe_device, 2833 2826 .release_device = amd_iommu_release_device, 2834 - .probe_finalize = amd_iommu_probe_finalize, 2835 2827 .device_group = amd_iommu_device_group, 2836 2828 .get_resv_regions = amd_iommu_get_resv_regions, 2837 2829 .is_attach_deferred = amd_iommu_is_attach_deferred, 2838 2830 .pgsize_bitmap = AMD_IOMMU_PGSIZES, 2839 2831 .def_domain_type = amd_iommu_def_domain_type, 2832 + .dev_enable_feat = amd_iommu_dev_enable_feature, 2833 + .dev_disable_feat = amd_iommu_dev_disable_feature, 2834 + .remove_dev_pasid = amd_iommu_remove_dev_pasid, 2835 + .page_response = amd_iommu_page_response, 2840 2836 .default_domain_ops = &(const struct iommu_domain_ops) { 2841 2837 .attach_dev = amd_iommu_attach_device, 2842 2838 .map_pages = amd_iommu_map_pages,
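Taken together, these hooks let a PASID-capable PCI device driver reach AMD SVA entirely through the generic IOMMU API: dev_enable_feat gates the capability, domain_alloc_sva backs iommu_sva_bind_device(), and remove_dev_pasid/page_response service unbind and fault completion. A minimal driver-side sketch of that flow, assuming @dev is a PASID-capable PCI function; the helper name is made up and most error handling is trimmed:

#include <linux/device.h>
#include <linux/err.h>
#include <linux/iommu.h>
#include <linux/sched.h>

/*
 * Illustrative only: a driver-side view of the generic SVA flow that the
 * new AMD hooks plug into. example_bind_current_mm() is not part of this
 * series.
 */
static int example_bind_current_mm(struct device *dev, struct iommu_sva **out)
{
	struct iommu_sva *handle;
	int ret;

	ret = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA);
	if (ret)
		return ret;

	/* First bind allocates an SVA domain via ops->domain_alloc_sva() */
	handle = iommu_sva_bind_device(dev, current->mm);
	if (IS_ERR(handle)) {
		iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_SVA);
		return PTR_ERR(handle);
	}

	/* PASID that set_dev_pasid() programmed into the device's GCR3 table */
	dev_info(dev, "bound PASID %u\n", iommu_sva_get_pasid(handle));

	*out = handle;
	return 0;
}

A later iommu_sva_unbind_device() on the returned handle drops the PASID again, which lands in amd_iommu_remove_dev_pasid() in the new pasid.c below.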
+198
drivers/iommu/amd/pasid.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * Copyright (C) 2024 Advanced Micro Devices, Inc. 4 + */ 5 + 6 + #define pr_fmt(fmt) "AMD-Vi: " fmt 7 + #define dev_fmt(fmt) pr_fmt(fmt) 8 + 9 + #include <linux/iommu.h> 10 + #include <linux/mm_types.h> 11 + 12 + #include "amd_iommu.h" 13 + 14 + static inline bool is_pasid_enabled(struct iommu_dev_data *dev_data) 15 + { 16 + if (dev_data->pasid_enabled && dev_data->max_pasids && 17 + dev_data->gcr3_info.gcr3_tbl != NULL) 18 + return true; 19 + 20 + return false; 21 + } 22 + 23 + static inline bool is_pasid_valid(struct iommu_dev_data *dev_data, 24 + ioasid_t pasid) 25 + { 26 + if (pasid > 0 && pasid < dev_data->max_pasids) 27 + return true; 28 + 29 + return false; 30 + } 31 + 32 + static void remove_dev_pasid(struct pdom_dev_data *pdom_dev_data) 33 + { 34 + /* Update GCR3 table and flush IOTLB */ 35 + amd_iommu_clear_gcr3(pdom_dev_data->dev_data, pdom_dev_data->pasid); 36 + 37 + list_del(&pdom_dev_data->list); 38 + kfree(pdom_dev_data); 39 + } 40 + 41 + /* Clear PASID from device GCR3 table and remove pdom_dev_data from list */ 42 + static void remove_pdom_dev_pasid(struct protection_domain *pdom, 43 + struct device *dev, ioasid_t pasid) 44 + { 45 + struct pdom_dev_data *pdom_dev_data; 46 + struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev); 47 + 48 + lockdep_assert_held(&pdom->lock); 49 + 50 + for_each_pdom_dev_data(pdom_dev_data, pdom) { 51 + if (pdom_dev_data->dev_data == dev_data && 52 + pdom_dev_data->pasid == pasid) { 53 + remove_dev_pasid(pdom_dev_data); 54 + break; 55 + } 56 + } 57 + } 58 + 59 + static void sva_arch_invalidate_secondary_tlbs(struct mmu_notifier *mn, 60 + struct mm_struct *mm, 61 + unsigned long start, unsigned long end) 62 + { 63 + struct pdom_dev_data *pdom_dev_data; 64 + struct protection_domain *sva_pdom; 65 + unsigned long flags; 66 + 67 + sva_pdom = container_of(mn, struct protection_domain, mn); 68 + 69 + spin_lock_irqsave(&sva_pdom->lock, flags); 70 + 71 + for_each_pdom_dev_data(pdom_dev_data, sva_pdom) { 72 + amd_iommu_dev_flush_pasid_pages(pdom_dev_data->dev_data, 73 + pdom_dev_data->pasid, 74 + start, end - start); 75 + } 76 + 77 + spin_unlock_irqrestore(&sva_pdom->lock, flags); 78 + } 79 + 80 + static void sva_mn_release(struct mmu_notifier *mn, struct mm_struct *mm) 81 + { 82 + struct pdom_dev_data *pdom_dev_data, *next; 83 + struct protection_domain *sva_pdom; 84 + unsigned long flags; 85 + 86 + sva_pdom = container_of(mn, struct protection_domain, mn); 87 + 88 + spin_lock_irqsave(&sva_pdom->lock, flags); 89 + 90 + /* Assume dev_data_list contains same PASID with different devices */ 91 + for_each_pdom_dev_data_safe(pdom_dev_data, next, sva_pdom) 92 + remove_dev_pasid(pdom_dev_data); 93 + 94 + spin_unlock_irqrestore(&sva_pdom->lock, flags); 95 + } 96 + 97 + static const struct mmu_notifier_ops sva_mn = { 98 + .arch_invalidate_secondary_tlbs = sva_arch_invalidate_secondary_tlbs, 99 + .release = sva_mn_release, 100 + }; 101 + 102 + int iommu_sva_set_dev_pasid(struct iommu_domain *domain, 103 + struct device *dev, ioasid_t pasid) 104 + { 105 + struct pdom_dev_data *pdom_dev_data; 106 + struct protection_domain *sva_pdom = to_pdomain(domain); 107 + struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev); 108 + unsigned long flags; 109 + int ret = -EINVAL; 110 + 111 + /* PASID zero is used for requests from the I/O device without PASID */ 112 + if (!is_pasid_valid(dev_data, pasid)) 113 + return ret; 114 + 115 + /* Make sure PASID is enabled */ 116 + if (!is_pasid_enabled(dev_data)) 117 + 
return ret; 118 + 119 + /* Add PASID to protection domain pasid list */ 120 + pdom_dev_data = kzalloc(sizeof(*pdom_dev_data), GFP_KERNEL); 121 + if (pdom_dev_data == NULL) 122 + return ret; 123 + 124 + pdom_dev_data->pasid = pasid; 125 + pdom_dev_data->dev_data = dev_data; 126 + 127 + spin_lock_irqsave(&sva_pdom->lock, flags); 128 + 129 + /* Setup GCR3 table */ 130 + ret = amd_iommu_set_gcr3(dev_data, pasid, 131 + iommu_virt_to_phys(domain->mm->pgd)); 132 + if (ret) { 133 + kfree(pdom_dev_data); 134 + goto out_unlock; 135 + } 136 + 137 + list_add(&pdom_dev_data->list, &sva_pdom->dev_data_list); 138 + 139 + out_unlock: 140 + spin_unlock_irqrestore(&sva_pdom->lock, flags); 141 + return ret; 142 + } 143 + 144 + void amd_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid, 145 + struct iommu_domain *domain) 146 + { 147 + struct protection_domain *sva_pdom; 148 + unsigned long flags; 149 + 150 + if (!is_pasid_valid(dev_iommu_priv_get(dev), pasid)) 151 + return; 152 + 153 + sva_pdom = to_pdomain(domain); 154 + 155 + spin_lock_irqsave(&sva_pdom->lock, flags); 156 + 157 + /* Remove PASID from dev_data_list */ 158 + remove_pdom_dev_pasid(sva_pdom, dev, pasid); 159 + 160 + spin_unlock_irqrestore(&sva_pdom->lock, flags); 161 + } 162 + 163 + static void iommu_sva_domain_free(struct iommu_domain *domain) 164 + { 165 + struct protection_domain *sva_pdom = to_pdomain(domain); 166 + 167 + if (sva_pdom->mn.ops) 168 + mmu_notifier_unregister(&sva_pdom->mn, domain->mm); 169 + 170 + amd_iommu_domain_free(domain); 171 + } 172 + 173 + static const struct iommu_domain_ops amd_sva_domain_ops = { 174 + .set_dev_pasid = iommu_sva_set_dev_pasid, 175 + .free = iommu_sva_domain_free 176 + }; 177 + 178 + struct iommu_domain *amd_iommu_domain_alloc_sva(struct device *dev, 179 + struct mm_struct *mm) 180 + { 181 + struct protection_domain *pdom; 182 + int ret; 183 + 184 + pdom = protection_domain_alloc(IOMMU_DOMAIN_SVA); 185 + if (!pdom) 186 + return ERR_PTR(-ENOMEM); 187 + 188 + pdom->domain.ops = &amd_sva_domain_ops; 189 + pdom->mn.ops = &sva_mn; 190 + 191 + ret = mmu_notifier_register(&pdom->mn, mm); 192 + if (ret) { 193 + protection_domain_free(pdom); 194 + return ERR_PTR(ret); 195 + } 196 + 197 + return &pdom->domain; 198 + }
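Two conventions carry the invalidation path above: the SVA protection domain embeds its mmu_notifier directly, so the callbacks recover the domain with container_of(), and each domain keeps a dev_data_list of (device, PASID) pairs that is walked under the domain lock. The iteration helpers are not part of this hunk; they are presumably thin list_for_each_entry() wrappers along these lines (a sketch of the assumed shape, not the actual amd_iommu_types.h definitions):

#include <linux/list.h>

/* Assumed definitions, shown only to make the loops above readable */
#define for_each_pdom_dev_data(pdom_dev_data, pdom) \
	list_for_each_entry((pdom_dev_data), &(pdom)->dev_data_list, list)

#define for_each_pdom_dev_data_safe(pdom_dev_data, next, pdom) \
	list_for_each_entry_safe((pdom_dev_data), (next), \
				 &(pdom)->dev_data_list, list)

Because the notifier is embedded rather than allocated separately, iommu_sva_domain_free() only has to check mn.ops to know whether mmu_notifier_register() ever succeeded before unregistering it.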
+288
drivers/iommu/amd/ppr.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * Copyright (C) 2023 Advanced Micro Devices, Inc. 4 + */ 5 + 6 + #define pr_fmt(fmt) "AMD-Vi: " fmt 7 + #define dev_fmt(fmt) pr_fmt(fmt) 8 + 9 + #include <linux/amd-iommu.h> 10 + #include <linux/delay.h> 11 + #include <linux/mmu_notifier.h> 12 + 13 + #include <asm/iommu.h> 14 + 15 + #include "amd_iommu.h" 16 + #include "amd_iommu_types.h" 17 + 18 + #include "../iommu-pages.h" 19 + 20 + int __init amd_iommu_alloc_ppr_log(struct amd_iommu *iommu) 21 + { 22 + iommu->ppr_log = iommu_alloc_4k_pages(iommu, GFP_KERNEL | __GFP_ZERO, 23 + PPR_LOG_SIZE); 24 + return iommu->ppr_log ? 0 : -ENOMEM; 25 + } 26 + 27 + void amd_iommu_enable_ppr_log(struct amd_iommu *iommu) 28 + { 29 + u64 entry; 30 + 31 + if (iommu->ppr_log == NULL) 32 + return; 33 + 34 + iommu_feature_enable(iommu, CONTROL_PPR_EN); 35 + 36 + entry = iommu_virt_to_phys(iommu->ppr_log) | PPR_LOG_SIZE_512; 37 + 38 + memcpy_toio(iommu->mmio_base + MMIO_PPR_LOG_OFFSET, 39 + &entry, sizeof(entry)); 40 + 41 + /* set head and tail to zero manually */ 42 + writel(0x00, iommu->mmio_base + MMIO_PPR_HEAD_OFFSET); 43 + writel(0x00, iommu->mmio_base + MMIO_PPR_TAIL_OFFSET); 44 + 45 + iommu_feature_enable(iommu, CONTROL_PPRINT_EN); 46 + iommu_feature_enable(iommu, CONTROL_PPRLOG_EN); 47 + } 48 + 49 + void __init amd_iommu_free_ppr_log(struct amd_iommu *iommu) 50 + { 51 + iommu_free_pages(iommu->ppr_log, get_order(PPR_LOG_SIZE)); 52 + } 53 + 54 + /* 55 + * This function restarts ppr logging in case the IOMMU experienced 56 + * PPR log overflow. 57 + */ 58 + void amd_iommu_restart_ppr_log(struct amd_iommu *iommu) 59 + { 60 + amd_iommu_restart_log(iommu, "PPR", CONTROL_PPRINT_EN, 61 + CONTROL_PPRLOG_EN, MMIO_STATUS_PPR_RUN_MASK, 62 + MMIO_STATUS_PPR_OVERFLOW_MASK); 63 + } 64 + 65 + static inline u32 ppr_flag_to_fault_perm(u16 flag) 66 + { 67 + int perm = 0; 68 + 69 + if (flag & PPR_FLAG_READ) 70 + perm |= IOMMU_FAULT_PERM_READ; 71 + if (flag & PPR_FLAG_WRITE) 72 + perm |= IOMMU_FAULT_PERM_WRITE; 73 + if (flag & PPR_FLAG_EXEC) 74 + perm |= IOMMU_FAULT_PERM_EXEC; 75 + if (!(flag & PPR_FLAG_US)) 76 + perm |= IOMMU_FAULT_PERM_PRIV; 77 + 78 + return perm; 79 + } 80 + 81 + static bool ppr_is_valid(struct amd_iommu *iommu, u64 *raw) 82 + { 83 + struct device *dev = iommu->iommu.dev; 84 + u16 devid = PPR_DEVID(raw[0]); 85 + 86 + if (!(PPR_FLAGS(raw[0]) & PPR_FLAG_GN)) { 87 + dev_dbg(dev, "PPR logged [Request ignored due to GN=0 (device=%04x:%02x:%02x.%x " 88 + "pasid=0x%05llx address=0x%llx flags=0x%04llx tag=0x%03llx]\n", 89 + iommu->pci_seg->id, PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid), 90 + PPR_PASID(raw[0]), raw[1], PPR_FLAGS(raw[0]), PPR_TAG(raw[0])); 91 + return false; 92 + } 93 + 94 + if (PPR_FLAGS(raw[0]) & PPR_FLAG_RVSD) { 95 + dev_dbg(dev, "PPR logged [Invalid request format (device=%04x:%02x:%02x.%x " 96 + "pasid=0x%05llx address=0x%llx flags=0x%04llx tag=0x%03llx]\n", 97 + iommu->pci_seg->id, PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid), 98 + PPR_PASID(raw[0]), raw[1], PPR_FLAGS(raw[0]), PPR_TAG(raw[0])); 99 + return false; 100 + } 101 + 102 + return true; 103 + } 104 + 105 + static void iommu_call_iopf_notifier(struct amd_iommu *iommu, u64 *raw) 106 + { 107 + struct iommu_dev_data *dev_data; 108 + struct iopf_fault event; 109 + struct pci_dev *pdev; 110 + u16 devid = PPR_DEVID(raw[0]); 111 + 112 + if (PPR_REQ_TYPE(raw[0]) != PPR_REQ_FAULT) { 113 + pr_info_ratelimited("Unknown PPR request received\n"); 114 + return; 115 + } 116 + 117 + pdev = 
pci_get_domain_bus_and_slot(iommu->pci_seg->id, 118 + PCI_BUS_NUM(devid), devid & 0xff); 119 + if (!pdev) 120 + return; 121 + 122 + if (!ppr_is_valid(iommu, raw)) 123 + goto out; 124 + 125 + memset(&event, 0, sizeof(struct iopf_fault)); 126 + 127 + event.fault.type = IOMMU_FAULT_PAGE_REQ; 128 + event.fault.prm.perm = ppr_flag_to_fault_perm(PPR_FLAGS(raw[0])); 129 + event.fault.prm.addr = (u64)(raw[1] & PAGE_MASK); 130 + event.fault.prm.pasid = PPR_PASID(raw[0]); 131 + event.fault.prm.grpid = PPR_TAG(raw[0]) & 0x1FF; 132 + 133 + /* 134 + * PASID zero is used for requests from the I/O device without 135 + * a PASID 136 + */ 137 + dev_data = dev_iommu_priv_get(&pdev->dev); 138 + if (event.fault.prm.pasid == 0 || 139 + event.fault.prm.pasid >= dev_data->max_pasids) { 140 + pr_info_ratelimited("Invalid PASID : 0x%x, device : 0x%x\n", 141 + event.fault.prm.pasid, pdev->dev.id); 142 + goto out; 143 + } 144 + 145 + event.fault.prm.flags |= IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID; 146 + event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID; 147 + if (PPR_TAG(raw[0]) & 0x200) 148 + event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE; 149 + 150 + /* Submit event */ 151 + iommu_report_device_fault(&pdev->dev, &event); 152 + 153 + return; 154 + 155 + out: 156 + /* Nobody cared, abort */ 157 + amd_iommu_complete_ppr(&pdev->dev, PPR_PASID(raw[0]), 158 + IOMMU_PAGE_RESP_FAILURE, 159 + PPR_TAG(raw[0]) & 0x1FF); 160 + } 161 + 162 + void amd_iommu_poll_ppr_log(struct amd_iommu *iommu) 163 + { 164 + u32 head, tail; 165 + 166 + if (iommu->ppr_log == NULL) 167 + return; 168 + 169 + head = readl(iommu->mmio_base + MMIO_PPR_HEAD_OFFSET); 170 + tail = readl(iommu->mmio_base + MMIO_PPR_TAIL_OFFSET); 171 + 172 + while (head != tail) { 173 + volatile u64 *raw; 174 + u64 entry[2]; 175 + int i; 176 + 177 + raw = (u64 *)(iommu->ppr_log + head); 178 + 179 + /* 180 + * Hardware bug: Interrupt may arrive before the entry is 181 + * written to memory. If this happens we need to wait for the 182 + * entry to arrive. 183 + */ 184 + for (i = 0; i < LOOP_TIMEOUT; ++i) { 185 + if (PPR_REQ_TYPE(raw[0]) != 0) 186 + break; 187 + udelay(1); 188 + } 189 + 190 + /* Avoid memcpy function-call overhead */ 191 + entry[0] = raw[0]; 192 + entry[1] = raw[1]; 193 + 194 + /* 195 + * To detect the hardware errata 733 we need to clear the 196 + * entry back to zero. This issue does not exist on SNP 197 + * enabled system. Also this buffer is not writeable on 198 + * SNP enabled system. 199 + */ 200 + if (!amd_iommu_snp_en) 201 + raw[0] = raw[1] = 0UL; 202 + 203 + /* Update head pointer of hardware ring-buffer */ 204 + head = (head + PPR_ENTRY_SIZE) % PPR_LOG_SIZE; 205 + writel(head, iommu->mmio_base + MMIO_PPR_HEAD_OFFSET); 206 + 207 + /* Handle PPR entry */ 208 + iommu_call_iopf_notifier(iommu, entry); 209 + } 210 + } 211 + 212 + /************************************************************** 213 + * 214 + * IOPF handling stuff 215 + */ 216 + 217 + /* Setup per-IOMMU IOPF queue if not exist. */ 218 + int amd_iommu_iopf_init(struct amd_iommu *iommu) 219 + { 220 + int ret = 0; 221 + 222 + if (iommu->iopf_queue) 223 + return ret; 224 + 225 + snprintf(iommu->iopfq_name, sizeof(iommu->iopfq_name), 226 + "amdiommu-%#x-iopfq", 227 + PCI_SEG_DEVID_TO_SBDF(iommu->pci_seg->id, iommu->devid)); 228 + 229 + iommu->iopf_queue = iopf_queue_alloc(iommu->iopfq_name); 230 + if (!iommu->iopf_queue) 231 + ret = -ENOMEM; 232 + 233 + return ret; 234 + } 235 + 236 + /* Destroy per-IOMMU IOPF queue if no longer needed. 
*/ 237 + void amd_iommu_iopf_uninit(struct amd_iommu *iommu) 238 + { 239 + iopf_queue_free(iommu->iopf_queue); 240 + iommu->iopf_queue = NULL; 241 + } 242 + 243 + void amd_iommu_page_response(struct device *dev, struct iopf_fault *evt, 244 + struct iommu_page_response *resp) 245 + { 246 + amd_iommu_complete_ppr(dev, resp->pasid, resp->code, resp->grpid); 247 + } 248 + 249 + int amd_iommu_iopf_add_device(struct amd_iommu *iommu, 250 + struct iommu_dev_data *dev_data) 251 + { 252 + unsigned long flags; 253 + int ret = 0; 254 + 255 + if (!dev_data->pri_enabled) 256 + return ret; 257 + 258 + raw_spin_lock_irqsave(&iommu->lock, flags); 259 + 260 + if (!iommu->iopf_queue) { 261 + ret = -EINVAL; 262 + goto out_unlock; 263 + } 264 + 265 + ret = iopf_queue_add_device(iommu->iopf_queue, dev_data->dev); 266 + if (ret) 267 + goto out_unlock; 268 + 269 + dev_data->ppr = true; 270 + 271 + out_unlock: 272 + raw_spin_unlock_irqrestore(&iommu->lock, flags); 273 + return ret; 274 + } 275 + 276 + /* Its assumed that caller has verified that device was added to iopf queue */ 277 + void amd_iommu_iopf_remove_device(struct amd_iommu *iommu, 278 + struct iommu_dev_data *dev_data) 279 + { 280 + unsigned long flags; 281 + 282 + raw_spin_lock_irqsave(&iommu->lock, flags); 283 + 284 + iopf_queue_remove_device(iommu->iopf_queue, dev_data->dev); 285 + dev_data->ppr = false; 286 + 287 + raw_spin_unlock_irqrestore(&iommu->lock, flags); 288 + }
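amd_iommu_poll_ppr_log() consumes a hardware-producer ring buffer: read the head and tail registers, copy each 16-byte entry out, advance the head modulo the log size, and write the head back so hardware can reuse the slot before the entry is actually handled. A standalone sketch of that head/tail arithmetic; the names and the consume() callback are illustrative, not the driver's:

#include <linux/string.h>
#include <linux/types.h>

#define EXAMPLE_ENTRY_SIZE	16	/* one PPR entry: two 64-bit words */
#define EXAMPLE_LOG_SIZE	(512 * EXAMPLE_ENTRY_SIZE)

static void example_drain_log(u8 *log, u32 *head, u32 tail,
			      void (*consume)(const u64 entry[2]))
{
	while (*head != tail) {
		u64 entry[2];

		/* Copy the entry out before the slot is recycled */
		memcpy(entry, log + *head, sizeof(entry));

		/*
		 * Advance the consumer index modulo the log size; the driver
		 * then publishes it by writing MMIO_PPR_HEAD_OFFSET.
		 */
		*head = (*head + EXAMPLE_ENTRY_SIZE) % EXAMPLE_LOG_SIZE;

		consume(entry);
	}
}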
+2
drivers/iommu/arm/arm-smmu-v3/Makefile
··· 3 3 arm_smmu_v3-objs-y += arm-smmu-v3.o 4 4 arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o 5 5 arm_smmu_v3-objs := $(arm_smmu_v3-objs-y) 6 + 7 + obj-$(CONFIG_ARM_SMMU_V3_KUNIT_TEST) += arm-smmu-v3-test.o
+119 -49
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
··· 8 8 #include <linux/mmu_notifier.h> 9 9 #include <linux/sched/mm.h> 10 10 #include <linux/slab.h> 11 + #include <kunit/visibility.h> 11 12 12 13 #include "arm-smmu-v3.h" 13 14 #include "../../io-pgtable-arm.h" ··· 35 34 36 35 static DEFINE_MUTEX(sva_lock); 37 36 38 - /* 39 - * Write the CD to the CD tables for all masters that this domain is attached 40 - * to. Note that this is only used to update existing CD entries in the target 41 - * CD table, for which it's assumed that arm_smmu_write_ctx_desc can't fail. 42 - */ 43 - static void arm_smmu_update_ctx_desc_devices(struct arm_smmu_domain *smmu_domain, 44 - int ssid, 45 - struct arm_smmu_ctx_desc *cd) 37 + static void 38 + arm_smmu_update_s1_domain_cd_entry(struct arm_smmu_domain *smmu_domain) 46 39 { 47 40 struct arm_smmu_master *master; 41 + struct arm_smmu_cd target_cd; 48 42 unsigned long flags; 49 43 50 44 spin_lock_irqsave(&smmu_domain->devices_lock, flags); 51 45 list_for_each_entry(master, &smmu_domain->devices, domain_head) { 52 - arm_smmu_write_ctx_desc(master, ssid, cd); 46 + struct arm_smmu_cd *cdptr; 47 + 48 + /* S1 domains only support RID attachment right now */ 49 + cdptr = arm_smmu_get_cd_ptr(master, IOMMU_NO_PASID); 50 + if (WARN_ON(!cdptr)) 51 + continue; 52 + 53 + arm_smmu_make_s1_cd(&target_cd, master, smmu_domain); 54 + arm_smmu_write_cd_entry(master, IOMMU_NO_PASID, cdptr, 55 + &target_cd); 53 56 } 54 57 spin_unlock_irqrestore(&smmu_domain->devices_lock, flags); 55 58 } ··· 101 96 * be some overlap between use of both ASIDs, until we invalidate the 102 97 * TLB. 103 98 */ 104 - arm_smmu_update_ctx_desc_devices(smmu_domain, IOMMU_NO_PASID, cd); 99 + arm_smmu_update_s1_domain_cd_entry(smmu_domain); 105 100 106 101 /* Invalidate TLB entries previously associated with that context */ 107 102 arm_smmu_tlb_inv_asid(smmu, asid); ··· 110 105 return NULL; 111 106 } 112 107 108 + static u64 page_size_to_cd(void) 109 + { 110 + static_assert(PAGE_SIZE == SZ_4K || PAGE_SIZE == SZ_16K || 111 + PAGE_SIZE == SZ_64K); 112 + if (PAGE_SIZE == SZ_64K) 113 + return ARM_LPAE_TCR_TG0_64K; 114 + if (PAGE_SIZE == SZ_16K) 115 + return ARM_LPAE_TCR_TG0_16K; 116 + return ARM_LPAE_TCR_TG0_4K; 117 + } 118 + 119 + VISIBLE_IF_KUNIT 120 + void arm_smmu_make_sva_cd(struct arm_smmu_cd *target, 121 + struct arm_smmu_master *master, struct mm_struct *mm, 122 + u16 asid) 123 + { 124 + u64 par; 125 + 126 + memset(target, 0, sizeof(*target)); 127 + 128 + par = cpuid_feature_extract_unsigned_field( 129 + read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1), 130 + ID_AA64MMFR0_EL1_PARANGE_SHIFT); 131 + 132 + target->data[0] = cpu_to_le64( 133 + CTXDESC_CD_0_TCR_EPD1 | 134 + #ifdef __BIG_ENDIAN 135 + CTXDESC_CD_0_ENDI | 136 + #endif 137 + CTXDESC_CD_0_V | 138 + FIELD_PREP(CTXDESC_CD_0_TCR_IPS, par) | 139 + CTXDESC_CD_0_AA64 | 140 + (master->stall_enabled ? CTXDESC_CD_0_S : 0) | 141 + CTXDESC_CD_0_R | 142 + CTXDESC_CD_0_A | 143 + CTXDESC_CD_0_ASET | 144 + FIELD_PREP(CTXDESC_CD_0_ASID, asid)); 145 + 146 + /* 147 + * If no MM is passed then this creates a SVA entry that faults 148 + * everything. arm_smmu_write_cd_entry() can hitlessly go between these 149 + * two entries types since TTB0 is ignored by HW when EPD0 is set. 
150 + */ 151 + if (mm) { 152 + target->data[0] |= cpu_to_le64( 153 + FIELD_PREP(CTXDESC_CD_0_TCR_T0SZ, 154 + 64ULL - vabits_actual) | 155 + FIELD_PREP(CTXDESC_CD_0_TCR_TG0, page_size_to_cd()) | 156 + FIELD_PREP(CTXDESC_CD_0_TCR_IRGN0, 157 + ARM_LPAE_TCR_RGN_WBWA) | 158 + FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0, 159 + ARM_LPAE_TCR_RGN_WBWA) | 160 + FIELD_PREP(CTXDESC_CD_0_TCR_SH0, ARM_LPAE_TCR_SH_IS)); 161 + 162 + target->data[1] = cpu_to_le64(virt_to_phys(mm->pgd) & 163 + CTXDESC_CD_1_TTB0_MASK); 164 + } else { 165 + target->data[0] |= cpu_to_le64(CTXDESC_CD_0_TCR_EPD0); 166 + 167 + /* 168 + * Disable stall and immediately generate an abort if stall 169 + * disable is permitted. This speeds up cleanup for an unclean 170 + * exit if the device is still doing a lot of DMA. 171 + */ 172 + if (!(master->smmu->features & ARM_SMMU_FEAT_STALL_FORCE)) 173 + target->data[0] &= 174 + cpu_to_le64(~(CTXDESC_CD_0_S | CTXDESC_CD_0_R)); 175 + } 176 + 177 + /* 178 + * MAIR value is pretty much constant and global, so we can just get it 179 + * from the current CPU register 180 + */ 181 + target->data[3] = cpu_to_le64(read_sysreg(mair_el1)); 182 + } 183 + EXPORT_SYMBOL_IF_KUNIT(arm_smmu_make_sva_cd); 184 + 113 185 static struct arm_smmu_ctx_desc *arm_smmu_alloc_shared_cd(struct mm_struct *mm) 114 186 { 115 187 u16 asid; 116 188 int err = 0; 117 - u64 tcr, par, reg; 118 189 struct arm_smmu_ctx_desc *cd; 119 190 struct arm_smmu_ctx_desc *ret = NULL; 120 191 ··· 224 143 if (err) 225 144 goto out_free_asid; 226 145 227 - tcr = FIELD_PREP(CTXDESC_CD_0_TCR_T0SZ, 64ULL - vabits_actual) | 228 - FIELD_PREP(CTXDESC_CD_0_TCR_IRGN0, ARM_LPAE_TCR_RGN_WBWA) | 229 - FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0, ARM_LPAE_TCR_RGN_WBWA) | 230 - FIELD_PREP(CTXDESC_CD_0_TCR_SH0, ARM_LPAE_TCR_SH_IS) | 231 - CTXDESC_CD_0_TCR_EPD1 | CTXDESC_CD_0_AA64; 232 - 233 - switch (PAGE_SIZE) { 234 - case SZ_4K: 235 - tcr |= FIELD_PREP(CTXDESC_CD_0_TCR_TG0, ARM_LPAE_TCR_TG0_4K); 236 - break; 237 - case SZ_16K: 238 - tcr |= FIELD_PREP(CTXDESC_CD_0_TCR_TG0, ARM_LPAE_TCR_TG0_16K); 239 - break; 240 - case SZ_64K: 241 - tcr |= FIELD_PREP(CTXDESC_CD_0_TCR_TG0, ARM_LPAE_TCR_TG0_64K); 242 - break; 243 - default: 244 - WARN_ON(1); 245 - err = -EINVAL; 246 - goto out_free_asid; 247 - } 248 - 249 - reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1); 250 - par = cpuid_feature_extract_unsigned_field(reg, ID_AA64MMFR0_EL1_PARANGE_SHIFT); 251 - tcr |= FIELD_PREP(CTXDESC_CD_0_TCR_IPS, par); 252 - 253 - cd->ttbr = virt_to_phys(mm->pgd); 254 - cd->tcr = tcr; 255 - /* 256 - * MAIR value is pretty much constant and global, so we can just get it 257 - * from the current CPU register 258 - */ 259 - cd->mair = read_sysreg(mair_el1); 260 146 cd->asid = asid; 261 147 cd->mm = mm; 262 148 ··· 301 253 { 302 254 struct arm_smmu_mmu_notifier *smmu_mn = mn_to_smmu(mn); 303 255 struct arm_smmu_domain *smmu_domain = smmu_mn->domain; 256 + struct arm_smmu_master *master; 257 + unsigned long flags; 304 258 305 259 mutex_lock(&sva_lock); 306 260 if (smmu_mn->cleared) { ··· 314 264 * DMA may still be running. Keep the cd valid to avoid C_BAD_CD events, 315 265 * but disable translation. 
316 266 */ 317 - arm_smmu_update_ctx_desc_devices(smmu_domain, mm_get_enqcmd_pasid(mm), 318 - &quiet_cd); 267 + spin_lock_irqsave(&smmu_domain->devices_lock, flags); 268 + list_for_each_entry(master, &smmu_domain->devices, domain_head) { 269 + struct arm_smmu_cd target; 270 + struct arm_smmu_cd *cdptr; 271 + 272 + cdptr = arm_smmu_get_cd_ptr(master, mm_get_enqcmd_pasid(mm)); 273 + if (WARN_ON(!cdptr)) 274 + continue; 275 + arm_smmu_make_sva_cd(&target, master, NULL, smmu_mn->cd->asid); 276 + arm_smmu_write_cd_entry(master, mm_get_enqcmd_pasid(mm), cdptr, 277 + &target); 278 + } 279 + spin_unlock_irqrestore(&smmu_domain->devices_lock, flags); 319 280 320 281 arm_smmu_tlb_inv_asid(smmu_domain->smmu, smmu_mn->cd->asid); 321 282 arm_smmu_atc_inv_domain(smmu_domain, mm_get_enqcmd_pasid(mm), 0, 0); ··· 421 360 struct mm_struct *mm) 422 361 { 423 362 int ret; 363 + struct arm_smmu_cd target; 364 + struct arm_smmu_cd *cdptr; 424 365 struct arm_smmu_bond *bond; 425 366 struct arm_smmu_master *master = dev_iommu_priv_get(dev); 426 367 struct iommu_domain *domain = iommu_get_domain_for_dev(dev); ··· 449 386 goto err_free_bond; 450 387 } 451 388 452 - ret = arm_smmu_write_ctx_desc(master, pasid, bond->smmu_mn->cd); 453 - if (ret) 389 + cdptr = arm_smmu_alloc_cd_ptr(master, mm_get_enqcmd_pasid(mm)); 390 + if (!cdptr) { 391 + ret = -ENOMEM; 454 392 goto err_put_notifier; 393 + } 394 + arm_smmu_make_sva_cd(&target, master, mm, bond->smmu_mn->cd->asid); 395 + arm_smmu_write_cd_entry(master, pasid, cdptr, &target); 455 396 456 397 list_add(&bond->list, &master->bonds); 457 398 return 0; ··· 613 546 614 547 mutex_lock(&sva_lock); 615 548 616 - arm_smmu_write_ctx_desc(master, id, NULL); 549 + arm_smmu_clear_cd(master, id); 617 550 618 551 list_for_each_entry(t, &master->bonds, list) { 619 552 if (t->mm == mm) { ··· 635 568 { 636 569 int ret = 0; 637 570 struct mm_struct *mm = domain->mm; 571 + 572 + if (mm_get_enqcmd_pasid(mm) != id) 573 + return -EINVAL; 638 574 639 575 mutex_lock(&sva_lock); 640 576 ret = __arm_smmu_sva_bind(dev, id, mm);
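The SVA path no longer edits live context descriptors field by field through arm_smmu_write_ctx_desc(); it builds a complete struct arm_smmu_cd in a local (arm_smmu_make_sva_cd()) and installs it with arm_smmu_write_cd_entry(), which decides whether the transition can be made hitlessly. Note also that page_size_to_cd() replaces the old runtime switch on PAGE_SIZE with a static_assert. The descriptor qwords are assembled with the usual bitfield helpers; a standalone sketch of that encoding style, using placeholder masks rather than the real CTXDESC_CD_0_* layout:

#include <asm/byteorder.h>
#include <linux/bitfield.h>
#include <linux/bits.h>
#include <linux/types.h>

/* Placeholder field masks for illustration only */
#define EXAMPLE_CD_0_ASID	GENMASK_ULL(63, 48)
#define EXAMPLE_CD_0_T0SZ	GENMASK_ULL(5, 0)
#define EXAMPLE_CD_0_VALID	BIT_ULL(31)

static __le64 example_make_qword0(u16 asid, u8 t0sz)
{
	u64 val = FIELD_PREP(EXAMPLE_CD_0_ASID, asid) |
		  FIELD_PREP(EXAMPLE_CD_0_T0SZ, t0sz) |
		  EXAMPLE_CD_0_VALID;

	/* Descriptors are little-endian in memory regardless of CPU order */
	return cpu_to_le64(val);
}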
+468
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* 3 + * Copyright 2024 Google LLC. 4 + */ 5 + #include <kunit/test.h> 6 + #include <linux/io-pgtable.h> 7 + 8 + #include "arm-smmu-v3.h" 9 + 10 + struct arm_smmu_test_writer { 11 + struct arm_smmu_entry_writer writer; 12 + struct kunit *test; 13 + const __le64 *init_entry; 14 + const __le64 *target_entry; 15 + __le64 *entry; 16 + 17 + bool invalid_entry_written; 18 + unsigned int num_syncs; 19 + }; 20 + 21 + #define NUM_ENTRY_QWORDS 8 22 + #define NUM_EXPECTED_SYNCS(x) x 23 + 24 + static struct arm_smmu_ste bypass_ste; 25 + static struct arm_smmu_ste abort_ste; 26 + static struct arm_smmu_device smmu = { 27 + .features = ARM_SMMU_FEAT_STALLS | ARM_SMMU_FEAT_ATTR_TYPES_OVR 28 + }; 29 + static struct mm_struct sva_mm = { 30 + .pgd = (void *)0xdaedbeefdeadbeefULL, 31 + }; 32 + 33 + static bool arm_smmu_entry_differs_in_used_bits(const __le64 *entry, 34 + const __le64 *used_bits, 35 + const __le64 *target, 36 + unsigned int length) 37 + { 38 + bool differs = false; 39 + unsigned int i; 40 + 41 + for (i = 0; i < length; i++) { 42 + if ((entry[i] & used_bits[i]) != target[i]) 43 + differs = true; 44 + } 45 + return differs; 46 + } 47 + 48 + static void 49 + arm_smmu_test_writer_record_syncs(struct arm_smmu_entry_writer *writer) 50 + { 51 + struct arm_smmu_test_writer *test_writer = 52 + container_of(writer, struct arm_smmu_test_writer, writer); 53 + __le64 *entry_used_bits; 54 + 55 + entry_used_bits = kunit_kzalloc( 56 + test_writer->test, sizeof(*entry_used_bits) * NUM_ENTRY_QWORDS, 57 + GFP_KERNEL); 58 + KUNIT_ASSERT_NOT_NULL(test_writer->test, entry_used_bits); 59 + 60 + pr_debug("STE value is now set to: "); 61 + print_hex_dump_debug(" ", DUMP_PREFIX_NONE, 16, 8, 62 + test_writer->entry, 63 + NUM_ENTRY_QWORDS * sizeof(*test_writer->entry), 64 + false); 65 + 66 + test_writer->num_syncs += 1; 67 + if (!test_writer->entry[0]) { 68 + test_writer->invalid_entry_written = true; 69 + } else { 70 + /* 71 + * At any stage in a hitless transition, the entry must be 72 + * equivalent to either the initial entry or the target entry 73 + * when only considering the bits used by the current 74 + * configuration. 
75 + */ 76 + writer->ops->get_used(test_writer->entry, entry_used_bits); 77 + KUNIT_EXPECT_FALSE( 78 + test_writer->test, 79 + arm_smmu_entry_differs_in_used_bits( 80 + test_writer->entry, entry_used_bits, 81 + test_writer->init_entry, NUM_ENTRY_QWORDS) && 82 + arm_smmu_entry_differs_in_used_bits( 83 + test_writer->entry, entry_used_bits, 84 + test_writer->target_entry, 85 + NUM_ENTRY_QWORDS)); 86 + } 87 + } 88 + 89 + static void 90 + arm_smmu_v3_test_debug_print_used_bits(struct arm_smmu_entry_writer *writer, 91 + const __le64 *ste) 92 + { 93 + __le64 used_bits[NUM_ENTRY_QWORDS] = {}; 94 + 95 + arm_smmu_get_ste_used(ste, used_bits); 96 + pr_debug("STE used bits: "); 97 + print_hex_dump_debug(" ", DUMP_PREFIX_NONE, 16, 8, used_bits, 98 + sizeof(used_bits), false); 99 + } 100 + 101 + static const struct arm_smmu_entry_writer_ops test_ste_ops = { 102 + .sync = arm_smmu_test_writer_record_syncs, 103 + .get_used = arm_smmu_get_ste_used, 104 + }; 105 + 106 + static const struct arm_smmu_entry_writer_ops test_cd_ops = { 107 + .sync = arm_smmu_test_writer_record_syncs, 108 + .get_used = arm_smmu_get_cd_used, 109 + }; 110 + 111 + static void arm_smmu_v3_test_ste_expect_transition( 112 + struct kunit *test, const struct arm_smmu_ste *cur, 113 + const struct arm_smmu_ste *target, unsigned int num_syncs_expected, 114 + bool hitless) 115 + { 116 + struct arm_smmu_ste cur_copy = *cur; 117 + struct arm_smmu_test_writer test_writer = { 118 + .writer = { 119 + .ops = &test_ste_ops, 120 + }, 121 + .test = test, 122 + .init_entry = cur->data, 123 + .target_entry = target->data, 124 + .entry = cur_copy.data, 125 + .num_syncs = 0, 126 + .invalid_entry_written = false, 127 + 128 + }; 129 + 130 + pr_debug("STE initial value: "); 131 + print_hex_dump_debug(" ", DUMP_PREFIX_NONE, 16, 8, cur_copy.data, 132 + sizeof(cur_copy), false); 133 + arm_smmu_v3_test_debug_print_used_bits(&test_writer.writer, cur->data); 134 + pr_debug("STE target value: "); 135 + print_hex_dump_debug(" ", DUMP_PREFIX_NONE, 16, 8, target->data, 136 + sizeof(cur_copy), false); 137 + arm_smmu_v3_test_debug_print_used_bits(&test_writer.writer, 138 + target->data); 139 + 140 + arm_smmu_write_entry(&test_writer.writer, cur_copy.data, target->data); 141 + 142 + KUNIT_EXPECT_EQ(test, test_writer.invalid_entry_written, !hitless); 143 + KUNIT_EXPECT_EQ(test, test_writer.num_syncs, num_syncs_expected); 144 + KUNIT_EXPECT_MEMEQ(test, target->data, cur_copy.data, sizeof(cur_copy)); 145 + } 146 + 147 + static void arm_smmu_v3_test_ste_expect_hitless_transition( 148 + struct kunit *test, const struct arm_smmu_ste *cur, 149 + const struct arm_smmu_ste *target, unsigned int num_syncs_expected) 150 + { 151 + arm_smmu_v3_test_ste_expect_transition(test, cur, target, 152 + num_syncs_expected, true); 153 + } 154 + 155 + static const dma_addr_t fake_cdtab_dma_addr = 0xF0F0F0F0F0F0; 156 + 157 + static void arm_smmu_test_make_cdtable_ste(struct arm_smmu_ste *ste, 158 + const dma_addr_t dma_addr) 159 + { 160 + struct arm_smmu_master master = { 161 + .cd_table.cdtab_dma = dma_addr, 162 + .cd_table.s1cdmax = 0xFF, 163 + .cd_table.s1fmt = STRTAB_STE_0_S1FMT_64K_L2, 164 + .smmu = &smmu, 165 + }; 166 + 167 + arm_smmu_make_cdtable_ste(ste, &master); 168 + } 169 + 170 + static void arm_smmu_v3_write_ste_test_bypass_to_abort(struct kunit *test) 171 + { 172 + /* 173 + * Bypass STEs has used bits in the first two Qwords, while abort STEs 174 + * only have used bits in the first QWord. 
Transitioning from bypass to 175 + * abort requires two syncs: the first to set the first qword and make 176 + * the STE into an abort, the second to clean up the second qword. 177 + */ 178 + arm_smmu_v3_test_ste_expect_hitless_transition( 179 + test, &bypass_ste, &abort_ste, NUM_EXPECTED_SYNCS(2)); 180 + } 181 + 182 + static void arm_smmu_v3_write_ste_test_abort_to_bypass(struct kunit *test) 183 + { 184 + /* 185 + * Transitioning from abort to bypass also requires two syncs: the first 186 + * to set the second qword data required by the bypass STE, and the 187 + * second to set the first qword and switch to bypass. 188 + */ 189 + arm_smmu_v3_test_ste_expect_hitless_transition( 190 + test, &abort_ste, &bypass_ste, NUM_EXPECTED_SYNCS(2)); 191 + } 192 + 193 + static void arm_smmu_v3_write_ste_test_cdtable_to_abort(struct kunit *test) 194 + { 195 + struct arm_smmu_ste ste; 196 + 197 + arm_smmu_test_make_cdtable_ste(&ste, fake_cdtab_dma_addr); 198 + arm_smmu_v3_test_ste_expect_hitless_transition(test, &ste, &abort_ste, 199 + NUM_EXPECTED_SYNCS(2)); 200 + } 201 + 202 + static void arm_smmu_v3_write_ste_test_abort_to_cdtable(struct kunit *test) 203 + { 204 + struct arm_smmu_ste ste; 205 + 206 + arm_smmu_test_make_cdtable_ste(&ste, fake_cdtab_dma_addr); 207 + arm_smmu_v3_test_ste_expect_hitless_transition(test, &abort_ste, &ste, 208 + NUM_EXPECTED_SYNCS(2)); 209 + } 210 + 211 + static void arm_smmu_v3_write_ste_test_cdtable_to_bypass(struct kunit *test) 212 + { 213 + struct arm_smmu_ste ste; 214 + 215 + arm_smmu_test_make_cdtable_ste(&ste, fake_cdtab_dma_addr); 216 + arm_smmu_v3_test_ste_expect_hitless_transition(test, &ste, &bypass_ste, 217 + NUM_EXPECTED_SYNCS(3)); 218 + } 219 + 220 + static void arm_smmu_v3_write_ste_test_bypass_to_cdtable(struct kunit *test) 221 + { 222 + struct arm_smmu_ste ste; 223 + 224 + arm_smmu_test_make_cdtable_ste(&ste, fake_cdtab_dma_addr); 225 + arm_smmu_v3_test_ste_expect_hitless_transition(test, &bypass_ste, &ste, 226 + NUM_EXPECTED_SYNCS(3)); 227 + } 228 + 229 + static void arm_smmu_test_make_s2_ste(struct arm_smmu_ste *ste, 230 + bool ats_enabled) 231 + { 232 + struct arm_smmu_master master = { 233 + .smmu = &smmu, 234 + .ats_enabled = ats_enabled, 235 + }; 236 + struct io_pgtable io_pgtable = {}; 237 + struct arm_smmu_domain smmu_domain = { 238 + .pgtbl_ops = &io_pgtable.ops, 239 + }; 240 + 241 + io_pgtable.cfg.arm_lpae_s2_cfg.vttbr = 0xdaedbeefdeadbeefULL; 242 + io_pgtable.cfg.arm_lpae_s2_cfg.vtcr.ps = 1; 243 + io_pgtable.cfg.arm_lpae_s2_cfg.vtcr.tg = 2; 244 + io_pgtable.cfg.arm_lpae_s2_cfg.vtcr.sh = 3; 245 + io_pgtable.cfg.arm_lpae_s2_cfg.vtcr.orgn = 1; 246 + io_pgtable.cfg.arm_lpae_s2_cfg.vtcr.irgn = 2; 247 + io_pgtable.cfg.arm_lpae_s2_cfg.vtcr.sl = 3; 248 + io_pgtable.cfg.arm_lpae_s2_cfg.vtcr.tsz = 4; 249 + 250 + arm_smmu_make_s2_domain_ste(ste, &master, &smmu_domain); 251 + } 252 + 253 + static void arm_smmu_v3_write_ste_test_s2_to_abort(struct kunit *test) 254 + { 255 + struct arm_smmu_ste ste; 256 + 257 + arm_smmu_test_make_s2_ste(&ste, true); 258 + arm_smmu_v3_test_ste_expect_hitless_transition(test, &ste, &abort_ste, 259 + NUM_EXPECTED_SYNCS(2)); 260 + } 261 + 262 + static void arm_smmu_v3_write_ste_test_abort_to_s2(struct kunit *test) 263 + { 264 + struct arm_smmu_ste ste; 265 + 266 + arm_smmu_test_make_s2_ste(&ste, true); 267 + arm_smmu_v3_test_ste_expect_hitless_transition(test, &abort_ste, &ste, 268 + NUM_EXPECTED_SYNCS(2)); 269 + } 270 + 271 + static void arm_smmu_v3_write_ste_test_s2_to_bypass(struct kunit *test) 272 + { 273 + struct arm_smmu_ste 
ste; 274 + 275 + arm_smmu_test_make_s2_ste(&ste, true); 276 + arm_smmu_v3_test_ste_expect_hitless_transition(test, &ste, &bypass_ste, 277 + NUM_EXPECTED_SYNCS(2)); 278 + } 279 + 280 + static void arm_smmu_v3_write_ste_test_bypass_to_s2(struct kunit *test) 281 + { 282 + struct arm_smmu_ste ste; 283 + 284 + arm_smmu_test_make_s2_ste(&ste, true); 285 + arm_smmu_v3_test_ste_expect_hitless_transition(test, &bypass_ste, &ste, 286 + NUM_EXPECTED_SYNCS(2)); 287 + } 288 + 289 + static void arm_smmu_v3_test_cd_expect_transition( 290 + struct kunit *test, const struct arm_smmu_cd *cur, 291 + const struct arm_smmu_cd *target, unsigned int num_syncs_expected, 292 + bool hitless) 293 + { 294 + struct arm_smmu_cd cur_copy = *cur; 295 + struct arm_smmu_test_writer test_writer = { 296 + .writer = { 297 + .ops = &test_cd_ops, 298 + }, 299 + .test = test, 300 + .init_entry = cur->data, 301 + .target_entry = target->data, 302 + .entry = cur_copy.data, 303 + .num_syncs = 0, 304 + .invalid_entry_written = false, 305 + 306 + }; 307 + 308 + pr_debug("CD initial value: "); 309 + print_hex_dump_debug(" ", DUMP_PREFIX_NONE, 16, 8, cur_copy.data, 310 + sizeof(cur_copy), false); 311 + arm_smmu_v3_test_debug_print_used_bits(&test_writer.writer, cur->data); 312 + pr_debug("CD target value: "); 313 + print_hex_dump_debug(" ", DUMP_PREFIX_NONE, 16, 8, target->data, 314 + sizeof(cur_copy), false); 315 + arm_smmu_v3_test_debug_print_used_bits(&test_writer.writer, 316 + target->data); 317 + 318 + arm_smmu_write_entry(&test_writer.writer, cur_copy.data, target->data); 319 + 320 + KUNIT_EXPECT_EQ(test, test_writer.invalid_entry_written, !hitless); 321 + KUNIT_EXPECT_EQ(test, test_writer.num_syncs, num_syncs_expected); 322 + KUNIT_EXPECT_MEMEQ(test, target->data, cur_copy.data, sizeof(cur_copy)); 323 + } 324 + 325 + static void arm_smmu_v3_test_cd_expect_non_hitless_transition( 326 + struct kunit *test, const struct arm_smmu_cd *cur, 327 + const struct arm_smmu_cd *target, unsigned int num_syncs_expected) 328 + { 329 + arm_smmu_v3_test_cd_expect_transition(test, cur, target, 330 + num_syncs_expected, false); 331 + } 332 + 333 + static void arm_smmu_v3_test_cd_expect_hitless_transition( 334 + struct kunit *test, const struct arm_smmu_cd *cur, 335 + const struct arm_smmu_cd *target, unsigned int num_syncs_expected) 336 + { 337 + arm_smmu_v3_test_cd_expect_transition(test, cur, target, 338 + num_syncs_expected, true); 339 + } 340 + 341 + static void arm_smmu_test_make_s1_cd(struct arm_smmu_cd *cd, unsigned int asid) 342 + { 343 + struct arm_smmu_master master = { 344 + .smmu = &smmu, 345 + }; 346 + struct io_pgtable io_pgtable = {}; 347 + struct arm_smmu_domain smmu_domain = { 348 + .pgtbl_ops = &io_pgtable.ops, 349 + .cd = { 350 + .asid = asid, 351 + }, 352 + }; 353 + 354 + io_pgtable.cfg.arm_lpae_s1_cfg.ttbr = 0xdaedbeefdeadbeefULL; 355 + io_pgtable.cfg.arm_lpae_s1_cfg.tcr.ips = 1; 356 + io_pgtable.cfg.arm_lpae_s1_cfg.tcr.tg = 2; 357 + io_pgtable.cfg.arm_lpae_s1_cfg.tcr.sh = 3; 358 + io_pgtable.cfg.arm_lpae_s1_cfg.tcr.orgn = 1; 359 + io_pgtable.cfg.arm_lpae_s1_cfg.tcr.irgn = 2; 360 + io_pgtable.cfg.arm_lpae_s1_cfg.tcr.tsz = 4; 361 + io_pgtable.cfg.arm_lpae_s1_cfg.mair = 0xabcdef012345678ULL; 362 + 363 + arm_smmu_make_s1_cd(cd, &master, &smmu_domain); 364 + } 365 + 366 + static void arm_smmu_v3_write_cd_test_s1_clear(struct kunit *test) 367 + { 368 + struct arm_smmu_cd cd = {}; 369 + struct arm_smmu_cd cd_2; 370 + 371 + arm_smmu_test_make_s1_cd(&cd_2, 1997); 372 + arm_smmu_v3_test_cd_expect_non_hitless_transition( 373 + test, 
&cd, &cd_2, NUM_EXPECTED_SYNCS(2)); 374 + arm_smmu_v3_test_cd_expect_non_hitless_transition( 375 + test, &cd_2, &cd, NUM_EXPECTED_SYNCS(2)); 376 + } 377 + 378 + static void arm_smmu_v3_write_cd_test_s1_change_asid(struct kunit *test) 379 + { 380 + struct arm_smmu_cd cd = {}; 381 + struct arm_smmu_cd cd_2; 382 + 383 + arm_smmu_test_make_s1_cd(&cd, 778); 384 + arm_smmu_test_make_s1_cd(&cd_2, 1997); 385 + arm_smmu_v3_test_cd_expect_hitless_transition(test, &cd, &cd_2, 386 + NUM_EXPECTED_SYNCS(1)); 387 + arm_smmu_v3_test_cd_expect_hitless_transition(test, &cd_2, &cd, 388 + NUM_EXPECTED_SYNCS(1)); 389 + } 390 + 391 + static void arm_smmu_test_make_sva_cd(struct arm_smmu_cd *cd, unsigned int asid) 392 + { 393 + struct arm_smmu_master master = { 394 + .smmu = &smmu, 395 + }; 396 + 397 + arm_smmu_make_sva_cd(cd, &master, &sva_mm, asid); 398 + } 399 + 400 + static void arm_smmu_test_make_sva_release_cd(struct arm_smmu_cd *cd, 401 + unsigned int asid) 402 + { 403 + struct arm_smmu_master master = { 404 + .smmu = &smmu, 405 + }; 406 + 407 + arm_smmu_make_sva_cd(cd, &master, NULL, asid); 408 + } 409 + 410 + static void arm_smmu_v3_write_cd_test_sva_clear(struct kunit *test) 411 + { 412 + struct arm_smmu_cd cd = {}; 413 + struct arm_smmu_cd cd_2; 414 + 415 + arm_smmu_test_make_sva_cd(&cd_2, 1997); 416 + arm_smmu_v3_test_cd_expect_non_hitless_transition( 417 + test, &cd, &cd_2, NUM_EXPECTED_SYNCS(2)); 418 + arm_smmu_v3_test_cd_expect_non_hitless_transition( 419 + test, &cd_2, &cd, NUM_EXPECTED_SYNCS(2)); 420 + } 421 + 422 + static void arm_smmu_v3_write_cd_test_sva_release(struct kunit *test) 423 + { 424 + struct arm_smmu_cd cd; 425 + struct arm_smmu_cd cd_2; 426 + 427 + arm_smmu_test_make_sva_cd(&cd, 1997); 428 + arm_smmu_test_make_sva_release_cd(&cd_2, 1997); 429 + arm_smmu_v3_test_cd_expect_hitless_transition(test, &cd, &cd_2, 430 + NUM_EXPECTED_SYNCS(2)); 431 + arm_smmu_v3_test_cd_expect_hitless_transition(test, &cd_2, &cd, 432 + NUM_EXPECTED_SYNCS(2)); 433 + } 434 + 435 + static struct kunit_case arm_smmu_v3_test_cases[] = { 436 + KUNIT_CASE(arm_smmu_v3_write_ste_test_bypass_to_abort), 437 + KUNIT_CASE(arm_smmu_v3_write_ste_test_abort_to_bypass), 438 + KUNIT_CASE(arm_smmu_v3_write_ste_test_cdtable_to_abort), 439 + KUNIT_CASE(arm_smmu_v3_write_ste_test_abort_to_cdtable), 440 + KUNIT_CASE(arm_smmu_v3_write_ste_test_cdtable_to_bypass), 441 + KUNIT_CASE(arm_smmu_v3_write_ste_test_bypass_to_cdtable), 442 + KUNIT_CASE(arm_smmu_v3_write_ste_test_s2_to_abort), 443 + KUNIT_CASE(arm_smmu_v3_write_ste_test_abort_to_s2), 444 + KUNIT_CASE(arm_smmu_v3_write_ste_test_s2_to_bypass), 445 + KUNIT_CASE(arm_smmu_v3_write_ste_test_bypass_to_s2), 446 + KUNIT_CASE(arm_smmu_v3_write_cd_test_s1_clear), 447 + KUNIT_CASE(arm_smmu_v3_write_cd_test_s1_change_asid), 448 + KUNIT_CASE(arm_smmu_v3_write_cd_test_sva_clear), 449 + KUNIT_CASE(arm_smmu_v3_write_cd_test_sva_release), 450 + {}, 451 + }; 452 + 453 + static int arm_smmu_v3_test_suite_init(struct kunit_suite *test) 454 + { 455 + arm_smmu_make_bypass_ste(&smmu, &bypass_ste); 456 + arm_smmu_make_abort_ste(&abort_ste); 457 + return 0; 458 + } 459 + 460 + static struct kunit_suite arm_smmu_v3_test_module = { 461 + .name = "arm-smmu-v3-kunit-test", 462 + .suite_init = arm_smmu_v3_test_suite_init, 463 + .test_cases = arm_smmu_v3_test_cases, 464 + }; 465 + kunit_test_suites(&arm_smmu_v3_test_module); 466 + 467 + MODULE_IMPORT_NS(EXPORTED_FOR_KUNIT_TESTING); 468 + MODULE_LICENSE("GPL v2");
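These tests work because arm_smmu_write_entry() only reaches the outside world through arm_smmu_entry_writer_ops (get_used() plus sync()), so the test installs a writer whose sync() callback snapshots the in-progress entry and checks that, restricted to its used bits, it always matches either the initial or the target value. A stripped-down sketch of that test-double pattern; the names here are invented, the real structures are arm_smmu_entry_writer and arm_smmu_entry_writer_ops:

#include <linux/kernel.h>
#include <linux/types.h>

struct entry_writer;

struct entry_writer_ops {
	void (*get_used)(const __le64 *entry, __le64 *used_bits);
	void (*sync)(struct entry_writer *writer);
};

struct entry_writer {
	const struct entry_writer_ops *ops;
};

/* Test double: counts syncs instead of issuing CMDQ_OP_CFGI_* commands */
struct recording_writer {
	struct entry_writer writer;
	unsigned int num_syncs;
};

static void recording_sync(struct entry_writer *writer)
{
	struct recording_writer *rw =
		container_of(writer, struct recording_writer, writer);

	/* The real test also validates the intermediate entry contents here */
	rw->num_syncs++;
}

static const struct entry_writer_ops recording_ops = {
	.sync = recording_sync,
	/* .get_used would point at the production arm_smmu_get_*_used() */
};

The same ops indirection is what lets the CD and STE paths share arm_smmu_write_entry() in the driver proper, as seen in the arm-smmu-v3.c changes below.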
+297 -279
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
··· 26 26 #include <linux/pci.h> 27 27 #include <linux/pci-ats.h> 28 28 #include <linux/platform_device.h> 29 + #include <kunit/visibility.h> 29 30 30 31 #include "arm-smmu-v3.h" 31 32 #include "../../dma-iommu.h" 32 - 33 - static bool disable_bypass = true; 34 - module_param(disable_bypass, bool, 0444); 35 - MODULE_PARM_DESC(disable_bypass, 36 - "Disable bypass streams such that incoming transactions from devices that are not attached to an iommu domain will report an abort back to the device and will not be allowed to pass through the SMMU."); 37 33 38 34 static bool disable_msipolling; 39 35 module_param(disable_msipolling, bool, 0444); ··· 43 47 ARM_SMMU_MAX_MSIS, 44 48 }; 45 49 46 - static void arm_smmu_sync_ste_for_sid(struct arm_smmu_device *smmu, 47 - ioasid_t sid); 50 + #define NUM_ENTRY_QWORDS 8 51 + static_assert(sizeof(struct arm_smmu_ste) == NUM_ENTRY_QWORDS * sizeof(u64)); 52 + static_assert(sizeof(struct arm_smmu_cd) == NUM_ENTRY_QWORDS * sizeof(u64)); 48 53 49 54 static phys_addr_t arm_smmu_msi_cfg[ARM_SMMU_MAX_MSIS][3] = { 50 55 [EVTQ_MSI_INDEX] = { ··· 73 76 DEFINE_XARRAY_ALLOC1(arm_smmu_asid_xa); 74 77 DEFINE_MUTEX(arm_smmu_asid_lock); 75 78 76 - /* 77 - * Special value used by SVA when a process dies, to quiesce a CD without 78 - * disabling it. 79 - */ 80 - struct arm_smmu_ctx_desc quiet_cd = { 0 }; 81 - 82 79 static struct arm_smmu_option_prop arm_smmu_options[] = { 83 80 { ARM_SMMU_OPT_SKIP_PREFETCH, "hisilicon,broken-prefetch-cmd" }, 84 81 { ARM_SMMU_OPT_PAGE0_REGS_ONLY, "cavium,cn9900-broken-page1-regspace"}, ··· 81 90 82 91 static int arm_smmu_domain_finalise(struct arm_smmu_domain *smmu_domain, 83 92 struct arm_smmu_device *smmu); 93 + static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master); 84 94 85 95 static void parse_driver_options(struct arm_smmu_device *smmu) 86 96 { ··· 969 977 * would be nice if this was complete according to the spec, but minimally it 970 978 * has to capture the bits this driver uses. 
971 979 */ 972 - static void arm_smmu_get_ste_used(const struct arm_smmu_ste *ent, 973 - struct arm_smmu_ste *used_bits) 980 + VISIBLE_IF_KUNIT 981 + void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits) 974 982 { 975 - unsigned int cfg = FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(ent->data[0])); 983 + unsigned int cfg = FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(ent[0])); 976 984 977 - used_bits->data[0] = cpu_to_le64(STRTAB_STE_0_V); 978 - if (!(ent->data[0] & cpu_to_le64(STRTAB_STE_0_V))) 985 + used_bits[0] = cpu_to_le64(STRTAB_STE_0_V); 986 + if (!(ent[0] & cpu_to_le64(STRTAB_STE_0_V))) 979 987 return; 980 988 981 - used_bits->data[0] |= cpu_to_le64(STRTAB_STE_0_CFG); 989 + used_bits[0] |= cpu_to_le64(STRTAB_STE_0_CFG); 982 990 983 991 /* S1 translates */ 984 992 if (cfg & BIT(0)) { 985 - used_bits->data[0] |= cpu_to_le64(STRTAB_STE_0_S1FMT | 986 - STRTAB_STE_0_S1CTXPTR_MASK | 987 - STRTAB_STE_0_S1CDMAX); 988 - used_bits->data[1] |= 993 + used_bits[0] |= cpu_to_le64(STRTAB_STE_0_S1FMT | 994 + STRTAB_STE_0_S1CTXPTR_MASK | 995 + STRTAB_STE_0_S1CDMAX); 996 + used_bits[1] |= 989 997 cpu_to_le64(STRTAB_STE_1_S1DSS | STRTAB_STE_1_S1CIR | 990 998 STRTAB_STE_1_S1COR | STRTAB_STE_1_S1CSH | 991 999 STRTAB_STE_1_S1STALLD | STRTAB_STE_1_STRW | 992 1000 STRTAB_STE_1_EATS); 993 - used_bits->data[2] |= cpu_to_le64(STRTAB_STE_2_S2VMID); 1001 + used_bits[2] |= cpu_to_le64(STRTAB_STE_2_S2VMID); 994 1002 } 995 1003 996 1004 /* S2 translates */ 997 1005 if (cfg & BIT(1)) { 998 - used_bits->data[1] |= 1006 + used_bits[1] |= 999 1007 cpu_to_le64(STRTAB_STE_1_EATS | STRTAB_STE_1_SHCFG); 1000 - used_bits->data[2] |= 1008 + used_bits[2] |= 1001 1009 cpu_to_le64(STRTAB_STE_2_S2VMID | STRTAB_STE_2_VTCR | 1002 1010 STRTAB_STE_2_S2AA64 | STRTAB_STE_2_S2ENDI | 1003 1011 STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2R); 1004 - used_bits->data[3] |= cpu_to_le64(STRTAB_STE_3_S2TTB_MASK); 1012 + used_bits[3] |= cpu_to_le64(STRTAB_STE_3_S2TTB_MASK); 1005 1013 } 1006 1014 1007 1015 if (cfg == STRTAB_STE_0_CFG_BYPASS) 1008 - used_bits->data[1] |= cpu_to_le64(STRTAB_STE_1_SHCFG); 1016 + used_bits[1] |= cpu_to_le64(STRTAB_STE_1_SHCFG); 1009 1017 } 1018 + EXPORT_SYMBOL_IF_KUNIT(arm_smmu_get_ste_used); 1010 1019 1011 1020 /* 1012 1021 * Figure out if we can do a hitless update of entry to become target. Returns a ··· 1015 1022 * unused_update is an intermediate value of entry that has unused bits set to 1016 1023 * their new values. 1017 1024 */ 1018 - static u8 arm_smmu_entry_qword_diff(const struct arm_smmu_ste *entry, 1019 - const struct arm_smmu_ste *target, 1020 - struct arm_smmu_ste *unused_update) 1025 + static u8 arm_smmu_entry_qword_diff(struct arm_smmu_entry_writer *writer, 1026 + const __le64 *entry, const __le64 *target, 1027 + __le64 *unused_update) 1021 1028 { 1022 - struct arm_smmu_ste target_used = {}; 1023 - struct arm_smmu_ste cur_used = {}; 1029 + __le64 target_used[NUM_ENTRY_QWORDS] = {}; 1030 + __le64 cur_used[NUM_ENTRY_QWORDS] = {}; 1024 1031 u8 used_qword_diff = 0; 1025 1032 unsigned int i; 1026 1033 1027 - arm_smmu_get_ste_used(entry, &cur_used); 1028 - arm_smmu_get_ste_used(target, &target_used); 1034 + writer->ops->get_used(entry, cur_used); 1035 + writer->ops->get_used(target, target_used); 1029 1036 1030 - for (i = 0; i != ARRAY_SIZE(target_used.data); i++) { 1037 + for (i = 0; i != NUM_ENTRY_QWORDS; i++) { 1031 1038 /* 1032 1039 * Check that masks are up to date, the make functions are not 1033 1040 * allowed to set a bit to 1 if the used function doesn't say it 1034 1041 * is used. 
1035 1042 */ 1036 - WARN_ON_ONCE(target->data[i] & ~target_used.data[i]); 1043 + WARN_ON_ONCE(target[i] & ~target_used[i]); 1037 1044 1038 1045 /* Bits can change because they are not currently being used */ 1039 - unused_update->data[i] = (entry->data[i] & cur_used.data[i]) | 1040 - (target->data[i] & ~cur_used.data[i]); 1046 + unused_update[i] = (entry[i] & cur_used[i]) | 1047 + (target[i] & ~cur_used[i]); 1041 1048 /* 1042 1049 * Each bit indicates that a used bit in a qword needs to be 1043 1050 * changed after unused_update is applied. 1044 1051 */ 1045 - if ((unused_update->data[i] & target_used.data[i]) != 1046 - target->data[i]) 1052 + if ((unused_update[i] & target_used[i]) != target[i]) 1047 1053 used_qword_diff |= 1 << i; 1048 1054 } 1049 1055 return used_qword_diff; 1050 1056 } 1051 1057 1052 - static bool entry_set(struct arm_smmu_device *smmu, ioasid_t sid, 1053 - struct arm_smmu_ste *entry, 1054 - const struct arm_smmu_ste *target, unsigned int start, 1058 + static bool entry_set(struct arm_smmu_entry_writer *writer, __le64 *entry, 1059 + const __le64 *target, unsigned int start, 1055 1060 unsigned int len) 1056 1061 { 1057 1062 bool changed = false; 1058 1063 unsigned int i; 1059 1064 1060 1065 for (i = start; len != 0; len--, i++) { 1061 - if (entry->data[i] != target->data[i]) { 1062 - WRITE_ONCE(entry->data[i], target->data[i]); 1066 + if (entry[i] != target[i]) { 1067 + WRITE_ONCE(entry[i], target[i]); 1063 1068 changed = true; 1064 1069 } 1065 1070 } 1066 1071 1067 1072 if (changed) 1068 - arm_smmu_sync_ste_for_sid(smmu, sid); 1073 + writer->ops->sync(writer); 1069 1074 return changed; 1070 1075 } 1071 1076 ··· 1093 1102 * V=0 process. This relies on the IGNORED behavior described in the 1094 1103 * specification. 1095 1104 */ 1096 - static void arm_smmu_write_ste(struct arm_smmu_master *master, u32 sid, 1097 - struct arm_smmu_ste *entry, 1098 - const struct arm_smmu_ste *target) 1105 + VISIBLE_IF_KUNIT 1106 + void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer, __le64 *entry, 1107 + const __le64 *target) 1099 1108 { 1100 - unsigned int num_entry_qwords = ARRAY_SIZE(target->data); 1101 - struct arm_smmu_device *smmu = master->smmu; 1102 - struct arm_smmu_ste unused_update; 1109 + __le64 unused_update[NUM_ENTRY_QWORDS]; 1103 1110 u8 used_qword_diff; 1104 1111 1105 1112 used_qword_diff = 1106 - arm_smmu_entry_qword_diff(entry, target, &unused_update); 1113 + arm_smmu_entry_qword_diff(writer, entry, target, unused_update); 1107 1114 if (hweight8(used_qword_diff) == 1) { 1108 1115 /* 1109 1116 * Only one qword needs its used bits to be changed. This is a 1110 - * hitless update, update all bits the current STE is ignoring 1111 - * to their new values, then update a single "critical qword" to 1112 - * change the STE and finally 0 out any bits that are now unused 1113 - * in the target configuration. 1117 + * hitless update, update all bits the current STE/CD is 1118 + * ignoring to their new values, then update a single "critical 1119 + * qword" to change the STE/CD and finally 0 out any bits that 1120 + * are now unused in the target configuration. 1114 1121 */ 1115 1122 unsigned int critical_qword_index = ffs(used_qword_diff) - 1; 1116 1123 ··· 1117 1128 * writing it in the next step anyways. This can save a sync 1118 1129 * when the only change is in that qword. 
1119 1130 */ 1120 - unused_update.data[critical_qword_index] = 1121 - entry->data[critical_qword_index]; 1122 - entry_set(smmu, sid, entry, &unused_update, 0, num_entry_qwords); 1123 - entry_set(smmu, sid, entry, target, critical_qword_index, 1); 1124 - entry_set(smmu, sid, entry, target, 0, num_entry_qwords); 1131 + unused_update[critical_qword_index] = 1132 + entry[critical_qword_index]; 1133 + entry_set(writer, entry, unused_update, 0, NUM_ENTRY_QWORDS); 1134 + entry_set(writer, entry, target, critical_qword_index, 1); 1135 + entry_set(writer, entry, target, 0, NUM_ENTRY_QWORDS); 1125 1136 } else if (used_qword_diff) { 1126 1137 /* 1127 1138 * At least two qwords need their inuse bits to be changed. This 1128 1139 * requires a breaking update, zero the V bit, write all qwords 1129 1140 * but 0, then set qword 0 1130 1141 */ 1131 - unused_update.data[0] = entry->data[0] & 1132 - cpu_to_le64(~STRTAB_STE_0_V); 1133 - entry_set(smmu, sid, entry, &unused_update, 0, 1); 1134 - entry_set(smmu, sid, entry, target, 1, num_entry_qwords - 1); 1135 - entry_set(smmu, sid, entry, target, 0, 1); 1142 + unused_update[0] = 0; 1143 + entry_set(writer, entry, unused_update, 0, 1); 1144 + entry_set(writer, entry, target, 1, NUM_ENTRY_QWORDS - 1); 1145 + entry_set(writer, entry, target, 0, 1); 1136 1146 } else { 1137 1147 /* 1138 1148 * No inuse bit changed. Sanity check that all unused bits are 0 ··· 1139 1151 * compute_qword_diff(). 1140 1152 */ 1141 1153 WARN_ON_ONCE( 1142 - entry_set(smmu, sid, entry, target, 0, num_entry_qwords)); 1143 - } 1144 - 1145 - /* It's likely that we'll want to use the new STE soon */ 1146 - if (!(smmu->options & ARM_SMMU_OPT_SKIP_PREFETCH)) { 1147 - struct arm_smmu_cmdq_ent 1148 - prefetch_cmd = { .opcode = CMDQ_OP_PREFETCH_CFG, 1149 - .prefetch = { 1150 - .sid = sid, 1151 - } }; 1152 - 1153 - arm_smmu_cmdq_issue_cmd(smmu, &prefetch_cmd); 1154 + entry_set(writer, entry, target, 0, NUM_ENTRY_QWORDS)); 1154 1155 } 1155 1156 } 1157 + EXPORT_SYMBOL_IF_KUNIT(arm_smmu_write_entry); 1156 1158 1157 1159 static void arm_smmu_sync_cd(struct arm_smmu_master *master, 1158 1160 int ssid, bool leaf) ··· 1188 1210 u64 val = (l1_desc->l2ptr_dma & CTXDESC_L1_DESC_L2PTR_MASK) | 1189 1211 CTXDESC_L1_DESC_V; 1190 1212 1191 - /* See comment in arm_smmu_write_ctx_desc() */ 1213 + /* The HW has 64 bit atomicity with stores to the L2 CD table */ 1192 1214 WRITE_ONCE(*dst, cpu_to_le64(val)); 1193 1215 } 1194 1216 1195 - static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_master *master, u32 ssid) 1217 + struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master, 1218 + u32 ssid) 1196 1219 { 1197 - __le64 *l1ptr; 1198 - unsigned int idx; 1199 1220 struct arm_smmu_l1_ctx_desc *l1_desc; 1200 - struct arm_smmu_device *smmu = master->smmu; 1201 1221 struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table; 1222 + 1223 + if (!cd_table->cdtab) 1224 + return NULL; 1202 1225 1203 1226 if (cd_table->s1fmt == STRTAB_STE_0_S1FMT_LINEAR) 1204 - return cd_table->cdtab + ssid * CTXDESC_CD_DWORDS; 1227 + return (struct arm_smmu_cd *)(cd_table->cdtab + 1228 + ssid * CTXDESC_CD_DWORDS); 1205 1229 1206 - idx = ssid >> CTXDESC_SPLIT; 1207 - l1_desc = &cd_table->l1_desc[idx]; 1208 - if (!l1_desc->l2ptr) { 1209 - if (arm_smmu_alloc_cd_leaf_table(smmu, l1_desc)) 1210 - return NULL; 1211 - 1212 - l1ptr = cd_table->cdtab + idx * CTXDESC_L1_DESC_DWORDS; 1213 - arm_smmu_write_cd_l1_desc(l1ptr, l1_desc); 1214 - /* An invalid L1CD can be cached */ 1215 - arm_smmu_sync_cd(master, ssid, false); 1216 - } 1217 - idx = 
ssid & (CTXDESC_L2_ENTRIES - 1); 1218 - return l1_desc->l2ptr + idx * CTXDESC_CD_DWORDS; 1230 + l1_desc = &cd_table->l1_desc[ssid / CTXDESC_L2_ENTRIES]; 1231 + if (!l1_desc->l2ptr) 1232 + return NULL; 1233 + return &l1_desc->l2ptr[ssid % CTXDESC_L2_ENTRIES]; 1219 1234 } 1220 1235 1221 - int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid, 1222 - struct arm_smmu_ctx_desc *cd) 1236 + struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master, 1237 + u32 ssid) 1223 1238 { 1224 - /* 1225 - * This function handles the following cases: 1226 - * 1227 - * (1) Install primary CD, for normal DMA traffic (SSID = IOMMU_NO_PASID = 0). 1228 - * (2) Install a secondary CD, for SID+SSID traffic. 1229 - * (3) Update ASID of a CD. Atomically write the first 64 bits of the 1230 - * CD, then invalidate the old entry and mappings. 1231 - * (4) Quiesce the context without clearing the valid bit. Disable 1232 - * translation, and ignore any translation fault. 1233 - * (5) Remove a secondary CD. 1234 - */ 1235 - u64 val; 1236 - bool cd_live; 1237 - __le64 *cdptr; 1238 1239 struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table; 1239 1240 struct arm_smmu_device *smmu = master->smmu; 1240 1241 1241 - if (WARN_ON(ssid >= (1 << cd_table->s1cdmax))) 1242 - return -E2BIG; 1242 + might_sleep(); 1243 + iommu_group_mutex_assert(master->dev); 1243 1244 1244 - cdptr = arm_smmu_get_cd_ptr(master, ssid); 1245 - if (!cdptr) 1246 - return -ENOMEM; 1247 - 1248 - val = le64_to_cpu(cdptr[0]); 1249 - cd_live = !!(val & CTXDESC_CD_0_V); 1250 - 1251 - if (!cd) { /* (5) */ 1252 - val = 0; 1253 - } else if (cd == &quiet_cd) { /* (4) */ 1254 - if (!(smmu->features & ARM_SMMU_FEAT_STALL_FORCE)) 1255 - val &= ~(CTXDESC_CD_0_S | CTXDESC_CD_0_R); 1256 - val |= CTXDESC_CD_0_TCR_EPD0; 1257 - } else if (cd_live) { /* (3) */ 1258 - val &= ~CTXDESC_CD_0_ASID; 1259 - val |= FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid); 1260 - /* 1261 - * Until CD+TLB invalidation, both ASIDs may be used for tagging 1262 - * this substream's traffic 1263 - */ 1264 - } else { /* (1) and (2) */ 1265 - cdptr[1] = cpu_to_le64(cd->ttbr & CTXDESC_CD_1_TTB0_MASK); 1266 - cdptr[2] = 0; 1267 - cdptr[3] = cpu_to_le64(cd->mair); 1268 - 1269 - /* 1270 - * STE may be live, and the SMMU might read dwords of this CD in any 1271 - * order. Ensure that it observes valid values before reading 1272 - * V=1. 1273 - */ 1274 - arm_smmu_sync_cd(master, ssid, true); 1275 - 1276 - val = cd->tcr | 1277 - #ifdef __BIG_ENDIAN 1278 - CTXDESC_CD_0_ENDI | 1279 - #endif 1280 - CTXDESC_CD_0_R | CTXDESC_CD_0_A | 1281 - (cd->mm ? 
0 : CTXDESC_CD_0_ASET) | 1282 - CTXDESC_CD_0_AA64 | 1283 - FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid) | 1284 - CTXDESC_CD_0_V; 1285 - 1286 - if (cd_table->stall_enabled) 1287 - val |= CTXDESC_CD_0_S; 1245 + if (!cd_table->cdtab) { 1246 + if (arm_smmu_alloc_cd_tables(master)) 1247 + return NULL; 1288 1248 } 1289 1249 1250 + if (cd_table->s1fmt == STRTAB_STE_0_S1FMT_64K_L2) { 1251 + unsigned int idx = ssid / CTXDESC_L2_ENTRIES; 1252 + struct arm_smmu_l1_ctx_desc *l1_desc; 1253 + 1254 + l1_desc = &cd_table->l1_desc[idx]; 1255 + if (!l1_desc->l2ptr) { 1256 + __le64 *l1ptr; 1257 + 1258 + if (arm_smmu_alloc_cd_leaf_table(smmu, l1_desc)) 1259 + return NULL; 1260 + 1261 + l1ptr = cd_table->cdtab + idx * CTXDESC_L1_DESC_DWORDS; 1262 + arm_smmu_write_cd_l1_desc(l1ptr, l1_desc); 1263 + /* An invalid L1CD can be cached */ 1264 + arm_smmu_sync_cd(master, ssid, false); 1265 + } 1266 + } 1267 + return arm_smmu_get_cd_ptr(master, ssid); 1268 + } 1269 + 1270 + struct arm_smmu_cd_writer { 1271 + struct arm_smmu_entry_writer writer; 1272 + unsigned int ssid; 1273 + }; 1274 + 1275 + VISIBLE_IF_KUNIT 1276 + void arm_smmu_get_cd_used(const __le64 *ent, __le64 *used_bits) 1277 + { 1278 + used_bits[0] = cpu_to_le64(CTXDESC_CD_0_V); 1279 + if (!(ent[0] & cpu_to_le64(CTXDESC_CD_0_V))) 1280 + return; 1281 + memset(used_bits, 0xFF, sizeof(struct arm_smmu_cd)); 1282 + 1290 1283 /* 1291 - * The SMMU accesses 64-bit values atomically. See IHI0070Ca 3.21.3 1292 - * "Configuration structures and configuration invalidation completion" 1293 - * 1294 - * The size of single-copy atomic reads made by the SMMU is 1295 - * IMPLEMENTATION DEFINED but must be at least 64 bits. Any single 1296 - * field within an aligned 64-bit span of a structure can be altered 1297 - * without first making the structure invalid. 
1284 + * If EPD0 is set by the make function it means 1285 + * T0SZ/TG0/IR0/OR0/SH0/TTB0 are IGNORED 1298 1286 */ 1299 - WRITE_ONCE(cdptr[0], cpu_to_le64(val)); 1300 - arm_smmu_sync_cd(master, ssid, true); 1301 - return 0; 1287 + if (ent[0] & cpu_to_le64(CTXDESC_CD_0_TCR_EPD0)) { 1288 + used_bits[0] &= ~cpu_to_le64( 1289 + CTXDESC_CD_0_TCR_T0SZ | CTXDESC_CD_0_TCR_TG0 | 1290 + CTXDESC_CD_0_TCR_IRGN0 | CTXDESC_CD_0_TCR_ORGN0 | 1291 + CTXDESC_CD_0_TCR_SH0); 1292 + used_bits[1] &= ~cpu_to_le64(CTXDESC_CD_1_TTB0_MASK); 1293 + } 1294 + } 1295 + EXPORT_SYMBOL_IF_KUNIT(arm_smmu_get_cd_used); 1296 + 1297 + static void arm_smmu_cd_writer_sync_entry(struct arm_smmu_entry_writer *writer) 1298 + { 1299 + struct arm_smmu_cd_writer *cd_writer = 1300 + container_of(writer, struct arm_smmu_cd_writer, writer); 1301 + 1302 + arm_smmu_sync_cd(writer->master, cd_writer->ssid, true); 1303 + } 1304 + 1305 + static const struct arm_smmu_entry_writer_ops arm_smmu_cd_writer_ops = { 1306 + .sync = arm_smmu_cd_writer_sync_entry, 1307 + .get_used = arm_smmu_get_cd_used, 1308 + }; 1309 + 1310 + void arm_smmu_write_cd_entry(struct arm_smmu_master *master, int ssid, 1311 + struct arm_smmu_cd *cdptr, 1312 + const struct arm_smmu_cd *target) 1313 + { 1314 + struct arm_smmu_cd_writer cd_writer = { 1315 + .writer = { 1316 + .ops = &arm_smmu_cd_writer_ops, 1317 + .master = master, 1318 + }, 1319 + .ssid = ssid, 1320 + }; 1321 + 1322 + arm_smmu_write_entry(&cd_writer.writer, cdptr->data, target->data); 1323 + } 1324 + 1325 + void arm_smmu_make_s1_cd(struct arm_smmu_cd *target, 1326 + struct arm_smmu_master *master, 1327 + struct arm_smmu_domain *smmu_domain) 1328 + { 1329 + struct arm_smmu_ctx_desc *cd = &smmu_domain->cd; 1330 + const struct io_pgtable_cfg *pgtbl_cfg = 1331 + &io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops)->cfg; 1332 + typeof(&pgtbl_cfg->arm_lpae_s1_cfg.tcr) tcr = 1333 + &pgtbl_cfg->arm_lpae_s1_cfg.tcr; 1334 + 1335 + memset(target, 0, sizeof(*target)); 1336 + 1337 + target->data[0] = cpu_to_le64( 1338 + FIELD_PREP(CTXDESC_CD_0_TCR_T0SZ, tcr->tsz) | 1339 + FIELD_PREP(CTXDESC_CD_0_TCR_TG0, tcr->tg) | 1340 + FIELD_PREP(CTXDESC_CD_0_TCR_IRGN0, tcr->irgn) | 1341 + FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0, tcr->orgn) | 1342 + FIELD_PREP(CTXDESC_CD_0_TCR_SH0, tcr->sh) | 1343 + #ifdef __BIG_ENDIAN 1344 + CTXDESC_CD_0_ENDI | 1345 + #endif 1346 + CTXDESC_CD_0_TCR_EPD1 | 1347 + CTXDESC_CD_0_V | 1348 + FIELD_PREP(CTXDESC_CD_0_TCR_IPS, tcr->ips) | 1349 + CTXDESC_CD_0_AA64 | 1350 + (master->stall_enabled ? 
CTXDESC_CD_0_S : 0) | 1351 + CTXDESC_CD_0_R | 1352 + CTXDESC_CD_0_A | 1353 + CTXDESC_CD_0_ASET | 1354 + FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid) 1355 + ); 1356 + target->data[1] = cpu_to_le64(pgtbl_cfg->arm_lpae_s1_cfg.ttbr & 1357 + CTXDESC_CD_1_TTB0_MASK); 1358 + target->data[3] = cpu_to_le64(pgtbl_cfg->arm_lpae_s1_cfg.mair); 1359 + } 1360 + EXPORT_SYMBOL_IF_KUNIT(arm_smmu_make_s1_cd); 1361 + 1362 + void arm_smmu_clear_cd(struct arm_smmu_master *master, ioasid_t ssid) 1363 + { 1364 + struct arm_smmu_cd target = {}; 1365 + struct arm_smmu_cd *cdptr; 1366 + 1367 + if (!master->cd_table.cdtab) 1368 + return; 1369 + cdptr = arm_smmu_get_cd_ptr(master, ssid); 1370 + if (WARN_ON(!cdptr)) 1371 + return; 1372 + arm_smmu_write_cd_entry(master, ssid, cdptr, &target); 1302 1373 } 1303 1374 1304 1375 static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master) ··· 1358 1331 struct arm_smmu_device *smmu = master->smmu; 1359 1332 struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table; 1360 1333 1361 - cd_table->stall_enabled = master->stall_enabled; 1362 1334 cd_table->s1cdmax = master->ssid_bits; 1363 1335 max_contexts = 1 << cd_table->s1cdmax; 1364 1336 ··· 1455 1429 val |= FIELD_PREP(STRTAB_L1_DESC_SPAN, desc->span); 1456 1430 val |= desc->l2ptr_dma & STRTAB_L1_DESC_L2PTR_MASK; 1457 1431 1458 - /* See comment in arm_smmu_write_ctx_desc() */ 1432 + /* The HW has 64 bit atomicity with stores to the L2 STE table */ 1459 1433 WRITE_ONCE(*dst, cpu_to_le64(val)); 1460 1434 } 1461 1435 1462 - static void arm_smmu_sync_ste_for_sid(struct arm_smmu_device *smmu, u32 sid) 1436 + struct arm_smmu_ste_writer { 1437 + struct arm_smmu_entry_writer writer; 1438 + u32 sid; 1439 + }; 1440 + 1441 + static void arm_smmu_ste_writer_sync_entry(struct arm_smmu_entry_writer *writer) 1463 1442 { 1443 + struct arm_smmu_ste_writer *ste_writer = 1444 + container_of(writer, struct arm_smmu_ste_writer, writer); 1464 1445 struct arm_smmu_cmdq_ent cmd = { 1465 1446 .opcode = CMDQ_OP_CFGI_STE, 1466 1447 .cfgi = { 1467 - .sid = sid, 1448 + .sid = ste_writer->sid, 1468 1449 .leaf = true, 1469 1450 }, 1470 1451 }; 1471 1452 1472 - arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd); 1453 + arm_smmu_cmdq_issue_cmd_with_sync(writer->master->smmu, &cmd); 1473 1454 } 1474 1455 1475 - static void arm_smmu_make_abort_ste(struct arm_smmu_ste *target) 1456 + static const struct arm_smmu_entry_writer_ops arm_smmu_ste_writer_ops = { 1457 + .sync = arm_smmu_ste_writer_sync_entry, 1458 + .get_used = arm_smmu_get_ste_used, 1459 + }; 1460 + 1461 + static void arm_smmu_write_ste(struct arm_smmu_master *master, u32 sid, 1462 + struct arm_smmu_ste *ste, 1463 + const struct arm_smmu_ste *target) 1464 + { 1465 + struct arm_smmu_device *smmu = master->smmu; 1466 + struct arm_smmu_ste_writer ste_writer = { 1467 + .writer = { 1468 + .ops = &arm_smmu_ste_writer_ops, 1469 + .master = master, 1470 + }, 1471 + .sid = sid, 1472 + }; 1473 + 1474 + arm_smmu_write_entry(&ste_writer.writer, ste->data, target->data); 1475 + 1476 + /* It's likely that we'll want to use the new STE soon */ 1477 + if (!(smmu->options & ARM_SMMU_OPT_SKIP_PREFETCH)) { 1478 + struct arm_smmu_cmdq_ent 1479 + prefetch_cmd = { .opcode = CMDQ_OP_PREFETCH_CFG, 1480 + .prefetch = { 1481 + .sid = sid, 1482 + } }; 1483 + 1484 + arm_smmu_cmdq_issue_cmd(smmu, &prefetch_cmd); 1485 + } 1486 + } 1487 + 1488 + VISIBLE_IF_KUNIT 1489 + void arm_smmu_make_abort_ste(struct arm_smmu_ste *target) 1476 1490 { 1477 1491 memset(target, 0, sizeof(*target)); 1478 1492 target->data[0] = cpu_to_le64( 1479 1493 
STRTAB_STE_0_V | 1480 1494 FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT)); 1481 1495 } 1496 + EXPORT_SYMBOL_IF_KUNIT(arm_smmu_make_abort_ste); 1482 1497 1483 - static void arm_smmu_make_bypass_ste(struct arm_smmu_device *smmu, 1484 - struct arm_smmu_ste *target) 1498 + VISIBLE_IF_KUNIT 1499 + void arm_smmu_make_bypass_ste(struct arm_smmu_device *smmu, 1500 + struct arm_smmu_ste *target) 1485 1501 { 1486 1502 memset(target, 0, sizeof(*target)); 1487 1503 target->data[0] = cpu_to_le64( ··· 1534 1466 target->data[1] = cpu_to_le64(FIELD_PREP(STRTAB_STE_1_SHCFG, 1535 1467 STRTAB_STE_1_SHCFG_INCOMING)); 1536 1468 } 1469 + EXPORT_SYMBOL_IF_KUNIT(arm_smmu_make_bypass_ste); 1537 1470 1538 - static void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target, 1539 - struct arm_smmu_master *master) 1471 + VISIBLE_IF_KUNIT 1472 + void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target, 1473 + struct arm_smmu_master *master) 1540 1474 { 1541 1475 struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table; 1542 1476 struct arm_smmu_device *smmu = master->smmu; ··· 1586 1516 cpu_to_le64(FIELD_PREP(STRTAB_STE_2_S2VMID, 0)); 1587 1517 } 1588 1518 } 1519 + EXPORT_SYMBOL_IF_KUNIT(arm_smmu_make_cdtable_ste); 1589 1520 1590 - static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target, 1591 - struct arm_smmu_master *master, 1592 - struct arm_smmu_domain *smmu_domain) 1521 + VISIBLE_IF_KUNIT 1522 + void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target, 1523 + struct arm_smmu_master *master, 1524 + struct arm_smmu_domain *smmu_domain) 1593 1525 { 1594 1526 struct arm_smmu_s2_cfg *s2_cfg = &smmu_domain->s2_cfg; 1595 1527 const struct io_pgtable_cfg *pgtbl_cfg = ··· 1634 1562 target->data[3] = cpu_to_le64(pgtbl_cfg->arm_lpae_s2_cfg.vttbr & 1635 1563 STRTAB_STE_3_S2TTB_MASK); 1636 1564 } 1565 + EXPORT_SYMBOL_IF_KUNIT(arm_smmu_make_s2_domain_ste); 1637 1566 1638 1567 /* 1639 1568 * This can safely directly manipulate the STE memory without a sync sequence 1640 1569 * because the STE table has not been installed in the SMMU yet. 
1641 1570 */ 1642 - static void arm_smmu_init_initial_stes(struct arm_smmu_device *smmu, 1643 - struct arm_smmu_ste *strtab, 1571 + static void arm_smmu_init_initial_stes(struct arm_smmu_ste *strtab, 1644 1572 unsigned int nent) 1645 1573 { 1646 1574 unsigned int i; 1647 1575 1648 1576 for (i = 0; i < nent; ++i) { 1649 - if (disable_bypass) 1650 - arm_smmu_make_abort_ste(strtab); 1651 - else 1652 - arm_smmu_make_bypass_ste(smmu, strtab); 1577 + arm_smmu_make_abort_ste(strtab); 1653 1578 strtab++; 1654 1579 } 1655 1580 } ··· 1674 1605 return -ENOMEM; 1675 1606 } 1676 1607 1677 - arm_smmu_init_initial_stes(smmu, desc->l2ptr, 1 << STRTAB_SPLIT); 1608 + arm_smmu_init_initial_stes(desc->l2ptr, 1 << STRTAB_SPLIT); 1678 1609 arm_smmu_write_strtab_l1_desc(strtab, desc); 1679 1610 return 0; 1680 1611 } ··· 2299 2230 } 2300 2231 2301 2232 static int arm_smmu_domain_finalise_s1(struct arm_smmu_device *smmu, 2302 - struct arm_smmu_domain *smmu_domain, 2303 - struct io_pgtable_cfg *pgtbl_cfg) 2233 + struct arm_smmu_domain *smmu_domain) 2304 2234 { 2305 2235 int ret; 2306 2236 u32 asid; 2307 2237 struct arm_smmu_ctx_desc *cd = &smmu_domain->cd; 2308 - typeof(&pgtbl_cfg->arm_lpae_s1_cfg.tcr) tcr = &pgtbl_cfg->arm_lpae_s1_cfg.tcr; 2309 2238 2310 2239 refcount_set(&cd->refs, 1); 2311 2240 ··· 2311 2244 mutex_lock(&arm_smmu_asid_lock); 2312 2245 ret = xa_alloc(&arm_smmu_asid_xa, &asid, cd, 2313 2246 XA_LIMIT(1, (1 << smmu->asid_bits) - 1), GFP_KERNEL); 2314 - if (ret) 2315 - goto out_unlock; 2316 - 2317 2247 cd->asid = (u16)asid; 2318 - cd->ttbr = pgtbl_cfg->arm_lpae_s1_cfg.ttbr; 2319 - cd->tcr = FIELD_PREP(CTXDESC_CD_0_TCR_T0SZ, tcr->tsz) | 2320 - FIELD_PREP(CTXDESC_CD_0_TCR_TG0, tcr->tg) | 2321 - FIELD_PREP(CTXDESC_CD_0_TCR_IRGN0, tcr->irgn) | 2322 - FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0, tcr->orgn) | 2323 - FIELD_PREP(CTXDESC_CD_0_TCR_SH0, tcr->sh) | 2324 - FIELD_PREP(CTXDESC_CD_0_TCR_IPS, tcr->ips) | 2325 - CTXDESC_CD_0_TCR_EPD1 | CTXDESC_CD_0_AA64; 2326 - cd->mair = pgtbl_cfg->arm_lpae_s1_cfg.mair; 2327 - 2328 - mutex_unlock(&arm_smmu_asid_lock); 2329 - return 0; 2330 - 2331 - out_unlock: 2332 2248 mutex_unlock(&arm_smmu_asid_lock); 2333 2249 return ret; 2334 2250 } 2335 2251 2336 2252 static int arm_smmu_domain_finalise_s2(struct arm_smmu_device *smmu, 2337 - struct arm_smmu_domain *smmu_domain, 2338 - struct io_pgtable_cfg *pgtbl_cfg) 2253 + struct arm_smmu_domain *smmu_domain) 2339 2254 { 2340 2255 int vmid; 2341 2256 struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg; ··· 2341 2292 struct io_pgtable_cfg pgtbl_cfg; 2342 2293 struct io_pgtable_ops *pgtbl_ops; 2343 2294 int (*finalise_stage_fn)(struct arm_smmu_device *smmu, 2344 - struct arm_smmu_domain *smmu_domain, 2345 - struct io_pgtable_cfg *pgtbl_cfg); 2295 + struct arm_smmu_domain *smmu_domain); 2346 2296 2347 2297 /* Restrict the stage to what we can actually support */ 2348 2298 if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1)) ··· 2384 2336 smmu_domain->domain.geometry.aperture_end = (1UL << pgtbl_cfg.ias) - 1; 2385 2337 smmu_domain->domain.geometry.force_aperture = true; 2386 2338 2387 - ret = finalise_stage_fn(smmu, smmu_domain, &pgtbl_cfg); 2339 + ret = finalise_stage_fn(smmu, smmu_domain); 2388 2340 if (ret < 0) { 2389 2341 free_io_pgtable_ops(pgtbl_ops); 2390 2342 return ret; ··· 2467 2419 pdev = to_pci_dev(master->dev); 2468 2420 2469 2421 atomic_inc(&smmu_domain->nr_ats_masters); 2470 - arm_smmu_atc_inv_domain(smmu_domain, IOMMU_NO_PASID, 0, 0); 2422 + /* 2423 + * ATC invalidation of PASID 0 causes the entire ATC to be flushed. 
2424 + */ 2425 + arm_smmu_atc_inv_master(master); 2471 2426 if (pci_enable_ats(pdev, stu)) 2472 2427 dev_err(master->dev, "Failed to enable ATS (STU %zu)\n", stu); 2473 2428 } ··· 2566 2515 struct arm_smmu_device *smmu; 2567 2516 struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); 2568 2517 struct arm_smmu_master *master; 2518 + struct arm_smmu_cd *cdptr; 2569 2519 2570 2520 if (!fwspec) 2571 2521 return -ENOENT; ··· 2595 2543 if (ret) 2596 2544 return ret; 2597 2545 2546 + if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) { 2547 + cdptr = arm_smmu_alloc_cd_ptr(master, IOMMU_NO_PASID); 2548 + if (!cdptr) 2549 + return -ENOMEM; 2550 + } 2551 + 2598 2552 /* 2599 2553 * Prevent arm_smmu_share_asid() from trying to change the ASID 2600 2554 * of either the old or new domain while we are working on it. ··· 2618 2560 spin_unlock_irqrestore(&smmu_domain->devices_lock, flags); 2619 2561 2620 2562 switch (smmu_domain->stage) { 2621 - case ARM_SMMU_DOMAIN_S1: 2622 - if (!master->cd_table.cdtab) { 2623 - ret = arm_smmu_alloc_cd_tables(master); 2624 - if (ret) 2625 - goto out_list_del; 2626 - } else { 2627 - /* 2628 - * arm_smmu_write_ctx_desc() relies on the entry being 2629 - * invalid to work, clear any existing entry. 2630 - */ 2631 - ret = arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, 2632 - NULL); 2633 - if (ret) 2634 - goto out_list_del; 2635 - } 2563 + case ARM_SMMU_DOMAIN_S1: { 2564 + struct arm_smmu_cd target_cd; 2636 2565 2637 - ret = arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, &smmu_domain->cd); 2638 - if (ret) 2639 - goto out_list_del; 2640 - 2566 + arm_smmu_make_s1_cd(&target_cd, master, smmu_domain); 2567 + arm_smmu_write_cd_entry(master, IOMMU_NO_PASID, cdptr, 2568 + &target_cd); 2641 2569 arm_smmu_make_cdtable_ste(&target, master); 2642 2570 arm_smmu_install_ste_for_dev(master, &target); 2643 2571 break; 2572 + } 2644 2573 case ARM_SMMU_DOMAIN_S2: 2645 2574 arm_smmu_make_s2_domain_ste(&target, master, smmu_domain); 2646 2575 arm_smmu_install_ste_for_dev(master, &target); 2647 - if (master->cd_table.cdtab) 2648 - arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, 2649 - NULL); 2576 + arm_smmu_clear_cd(master, IOMMU_NO_PASID); 2650 2577 break; 2651 2578 } 2652 2579 2653 2580 arm_smmu_enable_ats(master, smmu_domain); 2654 - goto out_unlock; 2655 - 2656 - out_list_del: 2657 - spin_lock_irqsave(&smmu_domain->devices_lock, flags); 2658 - list_del_init(&master->domain_head); 2659 - spin_unlock_irqrestore(&smmu_domain->devices_lock, flags); 2660 - 2661 - out_unlock: 2662 2581 mutex_unlock(&arm_smmu_asid_lock); 2663 - return ret; 2582 + return 0; 2664 2583 } 2665 2584 2666 2585 static int arm_smmu_attach_dev_ste(struct device *dev, ··· 2671 2636 * arm_smmu_domain->devices to avoid races updating the same context 2672 2637 * descriptor from arm_smmu_share_asid(). 
2673 2638 */ 2674 - if (master->cd_table.cdtab) 2675 - arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, NULL); 2639 + arm_smmu_clear_cd(master, IOMMU_NO_PASID); 2676 2640 return 0; 2677 2641 } 2678 2642 ··· 2949 2915 iopf_queue_remove_device(master->smmu->evtq.iopf, dev); 2950 2916 2951 2917 /* Put the STE back to what arm_smmu_init_strtab() sets */ 2952 - if (disable_bypass && !dev->iommu->require_direct) 2953 - arm_smmu_attach_dev_blocked(&arm_smmu_blocked_domain, dev); 2954 - else 2918 + if (dev->iommu->require_direct) 2955 2919 arm_smmu_attach_dev_identity(&arm_smmu_identity_domain, dev); 2920 + else 2921 + arm_smmu_attach_dev_blocked(&arm_smmu_blocked_domain, dev); 2956 2922 2957 2923 arm_smmu_disable_pasid(master); 2958 2924 arm_smmu_remove_master(master); ··· 3087 3053 return 0; 3088 3054 } 3089 3055 3090 - static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid) 3056 + static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid, 3057 + struct iommu_domain *domain) 3091 3058 { 3092 - struct iommu_domain *domain; 3093 - 3094 - domain = iommu_get_domain_for_dev_pasid(dev, pasid, IOMMU_DOMAIN_SVA); 3095 - if (WARN_ON(IS_ERR(domain)) || !domain) 3096 - return; 3097 - 3098 3059 arm_smmu_sva_remove_dev_pasid(domain, dev, pasid); 3099 3060 } 3100 3061 ··· 3302 3273 reg |= FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE, smmu->sid_bits); 3303 3274 cfg->strtab_base_cfg = reg; 3304 3275 3305 - arm_smmu_init_initial_stes(smmu, strtab, cfg->num_l1_ents); 3276 + arm_smmu_init_initial_stes(strtab, cfg->num_l1_ents); 3306 3277 return 0; 3307 3278 } 3308 3279 ··· 3431 3402 smmu->priq.q.irq = msi_get_virq(dev, PRIQ_MSI_INDEX); 3432 3403 3433 3404 /* Add callback to free MSIs on teardown */ 3434 - devm_add_action(dev, arm_smmu_free_msis, dev); 3405 + devm_add_action_or_reset(dev, arm_smmu_free_msis, dev); 3435 3406 } 3436 3407 3437 3408 static void arm_smmu_setup_unique_irqs(struct arm_smmu_device *smmu) ··· 3532 3503 return ret; 3533 3504 } 3534 3505 3535 - static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass) 3506 + static int arm_smmu_device_reset(struct arm_smmu_device *smmu) 3536 3507 { 3537 3508 int ret; 3538 3509 u32 reg, enables; ··· 3542 3513 reg = readl_relaxed(smmu->base + ARM_SMMU_CR0); 3543 3514 if (reg & CR0_SMMUEN) { 3544 3515 dev_warn(smmu->dev, "SMMU currently enabled! 
Resetting...\n"); 3545 - WARN_ON(is_kdump_kernel() && !disable_bypass); 3546 3516 arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0); 3547 3517 } 3548 3518 ··· 3648 3620 if (is_kdump_kernel()) 3649 3621 enables &= ~(CR0_EVTQEN | CR0_PRIQEN); 3650 3622 3651 - /* Enable the SMMU interface, or ensure bypass */ 3652 - if (!bypass || disable_bypass) { 3653 - enables |= CR0_SMMUEN; 3654 - } else { 3655 - ret = arm_smmu_update_gbpa(smmu, 0, GBPA_ABORT); 3656 - if (ret) 3657 - return ret; 3658 - } 3623 + /* Enable the SMMU interface */ 3624 + enables |= CR0_SMMUEN; 3659 3625 ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0, 3660 3626 ARM_SMMU_CR0ACK); 3661 3627 if (ret) { ··· 4041 4019 resource_size_t ioaddr; 4042 4020 struct arm_smmu_device *smmu; 4043 4021 struct device *dev = &pdev->dev; 4044 - bool bypass; 4045 4022 4046 4023 smmu = devm_kzalloc(dev, sizeof(*smmu), GFP_KERNEL); 4047 4024 if (!smmu) ··· 4051 4030 ret = arm_smmu_device_dt_probe(pdev, smmu); 4052 4031 } else { 4053 4032 ret = arm_smmu_device_acpi_probe(pdev, smmu); 4054 - if (ret == -ENODEV) 4055 - return ret; 4056 4033 } 4057 - 4058 - /* Set bypass mode according to firmware probing result */ 4059 - bypass = !!ret; 4034 + if (ret) 4035 + return ret; 4060 4036 4061 4037 /* Base address */ 4062 4038 res = platform_get_resource(pdev, IORESOURCE_MEM, 0); ··· 4117 4099 arm_smmu_rmr_install_bypass_ste(smmu); 4118 4100 4119 4101 /* Reset the device */ 4120 - ret = arm_smmu_device_reset(smmu, bypass); 4102 + ret = arm_smmu_device_reset(smmu); 4121 4103 if (ret) 4122 4104 return ret; 4123 4105
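Note: the SMMUv3 rework above funnels both STE and CD programming through a small writer object whose get_used()/sync() callbacks describe the entry type. The following is a standalone userspace sketch, not the kernel code: it only illustrates the ops indirection and the "write the payload qwords first, flip the qword holding the valid bit last, sync in between" ordering, deliberately ignoring the full hitless-update rules. All names in it are invented for the example.

/* Toy model of the entry-writer indirection; build with: cc -o writer writer.c */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define ENTRY_DWORDS 8
#define ENTRY_V (1ULL << 0)

struct toy_writer;

struct toy_writer_ops {
	/* Which bits of a candidate entry would the walker actually read? */
	void (*get_used)(const uint64_t *entry, uint64_t *used);
	/* Make prior writes visible (CFGI_CD/CFGI_STE + CMD_SYNC in the driver). */
	void (*sync)(struct toy_writer *writer);
};

struct toy_writer {
	const struct toy_writer_ops *ops;
	const char *what;		/* stands in for the master/sid/ssid context */
};

static void toy_get_used(const uint64_t *entry, uint64_t *used)
{
	memset(used, 0, ENTRY_DWORDS * sizeof(*used));
	used[0] = ENTRY_V;		/* the valid bit is always examined */
	if (entry[0] & ENTRY_V)		/* everything else only when valid */
		memset(used, 0xff, ENTRY_DWORDS * sizeof(*used));
}

static void toy_sync(struct toy_writer *writer)
{
	printf("  sync(%s)\n", writer->what);
}

/* Invalid -> valid transition: payload qwords first, qword 0 with V last. */
static void toy_install(struct toy_writer *w, uint64_t *live,
			const uint64_t *target)
{
	uint64_t used[ENTRY_DWORDS];
	unsigned int i;

	w->ops->get_used(target, used);
	for (i = 1; i < ENTRY_DWORDS; i++)
		if (used[i])
			live[i] = target[i];
	w->ops->sync(w);
	live[0] = target[0];
	w->ops->sync(w);
}

int main(void)
{
	static const struct toy_writer_ops ops = {
		.get_used = toy_get_used,
		.sync = toy_sync,
	};
	struct toy_writer w = { .ops = &ops, .what = "ssid 0" };
	uint64_t live[ENTRY_DWORDS] = { 0 };
	uint64_t target[ENTRY_DWORDS] = { ENTRY_V, 0x1234000, 0, 0x44ff0004 };

	toy_install(&w, live, target);
	printf("qword0 after install: %#llx\n", (unsigned long long)live[0]);
	return 0;
}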
+49 -11
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
··· 275 275 * 2lvl: at most 1024 L1 entries, 276 276 * 1024 lazy entries per table. 277 277 */ 278 - #define CTXDESC_SPLIT 10 279 - #define CTXDESC_L2_ENTRIES (1 << CTXDESC_SPLIT) 278 + #define CTXDESC_L2_ENTRIES 1024 280 279 281 280 #define CTXDESC_L1_DESC_DWORDS 1 282 281 #define CTXDESC_L1_DESC_V (1UL << 0) 283 282 #define CTXDESC_L1_DESC_L2PTR_MASK GENMASK_ULL(51, 12) 284 283 285 284 #define CTXDESC_CD_DWORDS 8 285 + 286 + struct arm_smmu_cd { 287 + __le64 data[CTXDESC_CD_DWORDS]; 288 + }; 289 + 286 290 #define CTXDESC_CD_0_TCR_T0SZ GENMASK_ULL(5, 0) 287 291 #define CTXDESC_CD_0_TCR_TG0 GENMASK_ULL(7, 6) 288 292 #define CTXDESC_CD_0_TCR_IRGN0 GENMASK_ULL(9, 8) ··· 587 583 588 584 struct arm_smmu_ctx_desc { 589 585 u16 asid; 590 - u64 ttbr; 591 - u64 tcr; 592 - u64 mair; 593 586 594 587 refcount_t refs; 595 588 struct mm_struct *mm; 596 589 }; 597 590 598 591 struct arm_smmu_l1_ctx_desc { 599 - __le64 *l2ptr; 592 + struct arm_smmu_cd *l2ptr; 600 593 dma_addr_t l2ptr_dma; 601 594 }; 602 595 ··· 605 604 u8 s1fmt; 606 605 /* log2 of the maximum number of CDs supported by this table */ 607 606 u8 s1cdmax; 608 - /* Whether CD entries in this table have the stall bit set. */ 609 - u8 stall_enabled:1; 610 607 }; 611 608 612 609 struct arm_smmu_s2_cfg { ··· 736 737 struct list_head mmu_notifiers; 737 738 }; 738 739 740 + /* The following are exposed for testing purposes. */ 741 + struct arm_smmu_entry_writer_ops; 742 + struct arm_smmu_entry_writer { 743 + const struct arm_smmu_entry_writer_ops *ops; 744 + struct arm_smmu_master *master; 745 + }; 746 + 747 + struct arm_smmu_entry_writer_ops { 748 + void (*get_used)(const __le64 *entry, __le64 *used); 749 + void (*sync)(struct arm_smmu_entry_writer *writer); 750 + }; 751 + 752 + #if IS_ENABLED(CONFIG_KUNIT) 753 + void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits); 754 + void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer, __le64 *cur, 755 + const __le64 *target); 756 + void arm_smmu_get_cd_used(const __le64 *ent, __le64 *used_bits); 757 + void arm_smmu_make_abort_ste(struct arm_smmu_ste *target); 758 + void arm_smmu_make_bypass_ste(struct arm_smmu_device *smmu, 759 + struct arm_smmu_ste *target); 760 + void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target, 761 + struct arm_smmu_master *master); 762 + void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target, 763 + struct arm_smmu_master *master, 764 + struct arm_smmu_domain *smmu_domain); 765 + void arm_smmu_make_sva_cd(struct arm_smmu_cd *target, 766 + struct arm_smmu_master *master, struct mm_struct *mm, 767 + u16 asid); 768 + #endif 769 + 739 770 static inline struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) 740 771 { 741 772 return container_of(dom, struct arm_smmu_domain, domain); ··· 773 744 774 745 extern struct xarray arm_smmu_asid_xa; 775 746 extern struct mutex arm_smmu_asid_lock; 776 - extern struct arm_smmu_ctx_desc quiet_cd; 777 747 778 - int arm_smmu_write_ctx_desc(struct arm_smmu_master *smmu_master, int ssid, 779 - struct arm_smmu_ctx_desc *cd); 748 + void arm_smmu_clear_cd(struct arm_smmu_master *master, ioasid_t ssid); 749 + struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master, 750 + u32 ssid); 751 + struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master, 752 + u32 ssid); 753 + void arm_smmu_make_s1_cd(struct arm_smmu_cd *target, 754 + struct arm_smmu_master *master, 755 + struct arm_smmu_domain *smmu_domain); 756 + void arm_smmu_write_cd_entry(struct arm_smmu_master *master, int ssid, 757 + 
struct arm_smmu_cd *cdptr, 758 + const struct arm_smmu_cd *target); 759 + 780 760 void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid); 781 761 void arm_smmu_tlb_inv_range_asid(unsigned long iova, size_t size, int asid, 782 762 size_t granule, bool leaf,
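Note: struct arm_smmu_cd introduced in this header is just eight little-endian qwords, and the make_* helpers compose qword 0 by OR-ing FIELD_PREP()ed values. Below is a minimal userspace sketch of that packing style, using only the T0SZ/TG0/IRGN0 field positions that appear in this hunk and a hand-rolled FIELD_PREP/FIELD_GET (the kernel's come from <linux/bitfield.h>); the chosen values are illustrative.

#include <stdint.h>
#include <stdio.h>

/* Userspace stand-ins for GENMASK_ULL()/FIELD_PREP()/FIELD_GET(). */
#define GENMASK_ULL(h, l) \
	(((~0ULL) << (l)) & (~0ULL >> (63 - (h))))
#define FIELD_PREP(mask, val) \
	(((uint64_t)(val) << __builtin_ctzll(mask)) & (mask))
#define FIELD_GET(mask, reg) \
	(((reg) & (mask)) >> __builtin_ctzll(mask))

/* Field positions taken from the hunk above. */
#define CD_0_TCR_T0SZ	GENMASK_ULL(5, 0)
#define CD_0_TCR_TG0	GENMASK_ULL(7, 6)
#define CD_0_TCR_IRGN0	GENMASK_ULL(9, 8)

struct toy_cd {
	uint64_t data[8];	/* __le64 in the real structure */
};

int main(void)
{
	struct toy_cd cd = { 0 };

	/* 48-bit VA space (T0SZ = 64 - 48), 4K granule, write-back walks. */
	cd.data[0] = FIELD_PREP(CD_0_TCR_T0SZ, 64 - 48) |
		     FIELD_PREP(CD_0_TCR_TG0, 0) |
		     FIELD_PREP(CD_0_TCR_IRGN0, 1);

	printf("qword0 = %#llx, T0SZ read back = %llu\n",
	       (unsigned long long)cd.data[0],
	       (unsigned long long)FIELD_GET(CD_0_TCR_T0SZ, cd.data[0]));
	return 0;
}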
+496
drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c
··· 1 1 // SPDX-License-Identifier: GPL-2.0-only 2 2 /* 3 3 * Copyright (c) 2022 Qualcomm Innovation Center, Inc. All rights reserved. 4 + * Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. 4 5 */ 5 6 7 + #include <linux/cleanup.h> 6 8 #include <linux/device.h> 9 + #include <linux/interconnect.h> 7 10 #include <linux/firmware/qcom/qcom_scm.h> 11 + #include <linux/iopoll.h> 12 + #include <linux/list.h> 13 + #include <linux/mod_devicetable.h> 14 + #include <linux/mutex.h> 15 + #include <linux/platform_device.h> 8 16 #include <linux/ratelimit.h> 17 + #include <linux/spinlock.h> 9 18 10 19 #include "arm-smmu.h" 11 20 #include "arm-smmu-qcom.h" 21 + 22 + #define TBU_DBG_TIMEOUT_US 100 23 + #define DEBUG_AXUSER_REG 0x30 24 + #define DEBUG_AXUSER_CDMID GENMASK_ULL(43, 36) 25 + #define DEBUG_AXUSER_CDMID_VAL 0xff 26 + #define DEBUG_PAR_REG 0x28 27 + #define DEBUG_PAR_FAULT_VAL BIT(0) 28 + #define DEBUG_PAR_PA GENMASK_ULL(47, 12) 29 + #define DEBUG_SID_HALT_REG 0x0 30 + #define DEBUG_SID_HALT_VAL BIT(16) 31 + #define DEBUG_SID_HALT_SID GENMASK(9, 0) 32 + #define DEBUG_SR_HALT_ACK_REG 0x20 33 + #define DEBUG_SR_HALT_ACK_VAL BIT(1) 34 + #define DEBUG_SR_ECATS_RUNNING_VAL BIT(0) 35 + #define DEBUG_TXN_AXCACHE GENMASK(5, 2) 36 + #define DEBUG_TXN_AXPROT GENMASK(8, 6) 37 + #define DEBUG_TXN_AXPROT_PRIV 0x1 38 + #define DEBUG_TXN_AXPROT_NSEC 0x2 39 + #define DEBUG_TXN_TRIGG_REG 0x18 40 + #define DEBUG_TXN_TRIGGER BIT(0) 41 + #define DEBUG_VA_ADDR_REG 0x8 42 + 43 + static LIST_HEAD(tbu_list); 44 + static DEFINE_MUTEX(tbu_list_lock); 45 + static DEFINE_SPINLOCK(atos_lock); 46 + 47 + struct qcom_tbu { 48 + struct device *dev; 49 + struct device_node *smmu_np; 50 + u32 sid_range[2]; 51 + struct list_head list; 52 + struct clk *clk; 53 + struct icc_path *path; 54 + void __iomem *base; 55 + spinlock_t halt_lock; /* multiple halt or resume can't execute concurrently */ 56 + int halt_count; 57 + }; 58 + 59 + static struct qcom_smmu *to_qcom_smmu(struct arm_smmu_device *smmu) 60 + { 61 + return container_of(smmu, struct qcom_smmu, smmu); 62 + } 12 63 13 64 void qcom_smmu_tlb_sync_debug(struct arm_smmu_device *smmu) 14 65 { ··· 100 49 tbu_pwr_status, sync_inv_ack, sync_inv_progress); 101 50 } 102 51 } 52 + 53 + static struct qcom_tbu *qcom_find_tbu(struct qcom_smmu *qsmmu, u32 sid) 54 + { 55 + struct qcom_tbu *tbu; 56 + u32 start, end; 57 + 58 + guard(mutex)(&tbu_list_lock); 59 + 60 + if (list_empty(&tbu_list)) 61 + return NULL; 62 + 63 + list_for_each_entry(tbu, &tbu_list, list) { 64 + start = tbu->sid_range[0]; 65 + end = start + tbu->sid_range[1]; 66 + 67 + if (qsmmu->smmu.dev->of_node == tbu->smmu_np && 68 + start <= sid && sid < end) 69 + return tbu; 70 + } 71 + dev_err(qsmmu->smmu.dev, "Unable to find TBU for sid 0x%x\n", sid); 72 + 73 + return NULL; 74 + } 75 + 76 + static int qcom_tbu_halt(struct qcom_tbu *tbu, struct arm_smmu_domain *smmu_domain) 77 + { 78 + struct arm_smmu_device *smmu = smmu_domain->smmu; 79 + int ret = 0, idx = smmu_domain->cfg.cbndx; 80 + u32 val, fsr, status; 81 + 82 + guard(spinlock_irqsave)(&tbu->halt_lock); 83 + if (tbu->halt_count) { 84 + tbu->halt_count++; 85 + return ret; 86 + } 87 + 88 + val = readl_relaxed(tbu->base + DEBUG_SID_HALT_REG); 89 + val |= DEBUG_SID_HALT_VAL; 90 + writel_relaxed(val, tbu->base + DEBUG_SID_HALT_REG); 91 + 92 + fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR); 93 + if ((fsr & ARM_SMMU_FSR_FAULT) && (fsr & ARM_SMMU_FSR_SS)) { 94 + u32 sctlr_orig, sctlr; 95 + 96 + /* 97 + * We are in a fault. 
Our request to halt the bus will not 98 + * complete until transactions in front of us (such as the fault 99 + * itself) have completed. Disable iommu faults and terminate 100 + * any existing transactions. 101 + */ 102 + sctlr_orig = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_SCTLR); 103 + sctlr = sctlr_orig & ~(ARM_SMMU_SCTLR_CFCFG | ARM_SMMU_SCTLR_CFIE); 104 + arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, sctlr); 105 + arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr); 106 + arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_RESUME, ARM_SMMU_RESUME_TERMINATE); 107 + arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, sctlr_orig); 108 + } 109 + 110 + if (readl_poll_timeout_atomic(tbu->base + DEBUG_SR_HALT_ACK_REG, status, 111 + (status & DEBUG_SR_HALT_ACK_VAL), 112 + 0, TBU_DBG_TIMEOUT_US)) { 113 + dev_err(tbu->dev, "Timeout while trying to halt TBU!\n"); 114 + ret = -ETIMEDOUT; 115 + 116 + val = readl_relaxed(tbu->base + DEBUG_SID_HALT_REG); 117 + val &= ~DEBUG_SID_HALT_VAL; 118 + writel_relaxed(val, tbu->base + DEBUG_SID_HALT_REG); 119 + 120 + return ret; 121 + } 122 + 123 + tbu->halt_count = 1; 124 + 125 + return ret; 126 + } 127 + 128 + static void qcom_tbu_resume(struct qcom_tbu *tbu) 129 + { 130 + u32 val; 131 + 132 + guard(spinlock_irqsave)(&tbu->halt_lock); 133 + if (!tbu->halt_count) { 134 + WARN(1, "%s: halt_count is 0", dev_name(tbu->dev)); 135 + return; 136 + } 137 + 138 + if (tbu->halt_count > 1) { 139 + tbu->halt_count--; 140 + return; 141 + } 142 + 143 + val = readl_relaxed(tbu->base + DEBUG_SID_HALT_REG); 144 + val &= ~DEBUG_SID_HALT_VAL; 145 + writel_relaxed(val, tbu->base + DEBUG_SID_HALT_REG); 146 + 147 + tbu->halt_count = 0; 148 + } 149 + 150 + static phys_addr_t qcom_tbu_trigger_atos(struct arm_smmu_domain *smmu_domain, 151 + struct qcom_tbu *tbu, dma_addr_t iova, u32 sid) 152 + { 153 + bool atos_timedout = false; 154 + phys_addr_t phys = 0; 155 + ktime_t timeout; 156 + u64 val; 157 + 158 + /* Set address and stream-id */ 159 + val = readq_relaxed(tbu->base + DEBUG_SID_HALT_REG); 160 + val &= ~DEBUG_SID_HALT_SID; 161 + val |= FIELD_PREP(DEBUG_SID_HALT_SID, sid); 162 + writeq_relaxed(val, tbu->base + DEBUG_SID_HALT_REG); 163 + writeq_relaxed(iova, tbu->base + DEBUG_VA_ADDR_REG); 164 + val = FIELD_PREP(DEBUG_AXUSER_CDMID, DEBUG_AXUSER_CDMID_VAL); 165 + writeq_relaxed(val, tbu->base + DEBUG_AXUSER_REG); 166 + 167 + /* Write-back read and write-allocate */ 168 + val = FIELD_PREP(DEBUG_TXN_AXCACHE, 0xf); 169 + 170 + /* Non-secure access */ 171 + val |= FIELD_PREP(DEBUG_TXN_AXPROT, DEBUG_TXN_AXPROT_NSEC); 172 + 173 + /* Privileged access */ 174 + val |= FIELD_PREP(DEBUG_TXN_AXPROT, DEBUG_TXN_AXPROT_PRIV); 175 + 176 + val |= DEBUG_TXN_TRIGGER; 177 + writeq_relaxed(val, tbu->base + DEBUG_TXN_TRIGG_REG); 178 + 179 + timeout = ktime_add_us(ktime_get(), TBU_DBG_TIMEOUT_US); 180 + for (;;) { 181 + val = readl_relaxed(tbu->base + DEBUG_SR_HALT_ACK_REG); 182 + if (!(val & DEBUG_SR_ECATS_RUNNING_VAL)) 183 + break; 184 + val = readl_relaxed(tbu->base + DEBUG_PAR_REG); 185 + if (val & DEBUG_PAR_FAULT_VAL) 186 + break; 187 + if (ktime_compare(ktime_get(), timeout) > 0) { 188 + atos_timedout = true; 189 + break; 190 + } 191 + } 192 + 193 + val = readq_relaxed(tbu->base + DEBUG_PAR_REG); 194 + if (val & DEBUG_PAR_FAULT_VAL) 195 + dev_err(tbu->dev, "ATOS generated a fault interrupt! 
PAR = %llx, SID=0x%x\n", 196 + val, sid); 197 + else if (atos_timedout) 198 + dev_err_ratelimited(tbu->dev, "ATOS translation timed out!\n"); 199 + else 200 + phys = FIELD_GET(DEBUG_PAR_PA, val); 201 + 202 + /* Reset hardware */ 203 + writeq_relaxed(0, tbu->base + DEBUG_TXN_TRIGG_REG); 204 + writeq_relaxed(0, tbu->base + DEBUG_VA_ADDR_REG); 205 + val = readl_relaxed(tbu->base + DEBUG_SID_HALT_REG); 206 + val &= ~DEBUG_SID_HALT_SID; 207 + writel_relaxed(val, tbu->base + DEBUG_SID_HALT_REG); 208 + 209 + return phys; 210 + } 211 + 212 + static phys_addr_t qcom_iova_to_phys(struct arm_smmu_domain *smmu_domain, 213 + dma_addr_t iova, u32 sid) 214 + { 215 + struct arm_smmu_device *smmu = smmu_domain->smmu; 216 + struct qcom_smmu *qsmmu = to_qcom_smmu(smmu); 217 + int idx = smmu_domain->cfg.cbndx; 218 + struct qcom_tbu *tbu; 219 + u32 sctlr_orig, sctlr; 220 + phys_addr_t phys = 0; 221 + int attempt = 0; 222 + int ret; 223 + u64 fsr; 224 + 225 + tbu = qcom_find_tbu(qsmmu, sid); 226 + if (!tbu) 227 + return 0; 228 + 229 + ret = icc_set_bw(tbu->path, 0, UINT_MAX); 230 + if (ret) 231 + return ret; 232 + 233 + ret = clk_prepare_enable(tbu->clk); 234 + if (ret) 235 + goto disable_icc; 236 + 237 + ret = qcom_tbu_halt(tbu, smmu_domain); 238 + if (ret) 239 + goto disable_clk; 240 + 241 + /* 242 + * ATOS/ECATS can trigger the fault interrupt, so disable it temporarily 243 + * and check for an interrupt manually. 244 + */ 245 + sctlr_orig = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_SCTLR); 246 + sctlr = sctlr_orig & ~(ARM_SMMU_SCTLR_CFCFG | ARM_SMMU_SCTLR_CFIE); 247 + arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, sctlr); 248 + 249 + fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR); 250 + if (fsr & ARM_SMMU_FSR_FAULT) { 251 + /* Clear pending interrupts */ 252 + arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr); 253 + 254 + /* 255 + * TBU halt takes care of resuming any stalled transcation. 256 + * Kept it here for completeness sake. 257 + */ 258 + if (fsr & ARM_SMMU_FSR_SS) 259 + arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_RESUME, 260 + ARM_SMMU_RESUME_TERMINATE); 261 + } 262 + 263 + /* Only one concurrent atos operation */ 264 + scoped_guard(spinlock_irqsave, &atos_lock) { 265 + /* 266 + * If the translation fails, attempt the lookup more time." 
267 + */ 268 + do { 269 + phys = qcom_tbu_trigger_atos(smmu_domain, tbu, iova, sid); 270 + 271 + fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR); 272 + if (fsr & ARM_SMMU_FSR_FAULT) { 273 + /* Clear pending interrupts */ 274 + arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr); 275 + 276 + if (fsr & ARM_SMMU_FSR_SS) 277 + arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_RESUME, 278 + ARM_SMMU_RESUME_TERMINATE); 279 + } 280 + } while (!phys && attempt++ < 2); 281 + 282 + arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, sctlr_orig); 283 + } 284 + qcom_tbu_resume(tbu); 285 + 286 + /* Read to complete prior write transcations */ 287 + readl_relaxed(tbu->base + DEBUG_SR_HALT_ACK_REG); 288 + 289 + disable_clk: 290 + clk_disable_unprepare(tbu->clk); 291 + disable_icc: 292 + icc_set_bw(tbu->path, 0, 0); 293 + 294 + return phys; 295 + } 296 + 297 + static phys_addr_t qcom_smmu_iova_to_phys_hard(struct arm_smmu_domain *smmu_domain, dma_addr_t iova) 298 + { 299 + struct arm_smmu_device *smmu = smmu_domain->smmu; 300 + int idx = smmu_domain->cfg.cbndx; 301 + u32 frsynra; 302 + u16 sid; 303 + 304 + frsynra = arm_smmu_gr1_read(smmu, ARM_SMMU_GR1_CBFRSYNRA(idx)); 305 + sid = FIELD_GET(ARM_SMMU_CBFRSYNRA_SID, frsynra); 306 + 307 + return qcom_iova_to_phys(smmu_domain, iova, sid); 308 + } 309 + 310 + static phys_addr_t qcom_smmu_verify_fault(struct arm_smmu_domain *smmu_domain, dma_addr_t iova, u32 fsr) 311 + { 312 + struct io_pgtable *iop = io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops); 313 + struct arm_smmu_device *smmu = smmu_domain->smmu; 314 + phys_addr_t phys_post_tlbiall; 315 + phys_addr_t phys; 316 + 317 + phys = qcom_smmu_iova_to_phys_hard(smmu_domain, iova); 318 + io_pgtable_tlb_flush_all(iop); 319 + phys_post_tlbiall = qcom_smmu_iova_to_phys_hard(smmu_domain, iova); 320 + 321 + if (phys != phys_post_tlbiall) { 322 + dev_err(smmu->dev, 323 + "ATOS results differed across TLBIALL... (before: %pa after: %pa)\n", 324 + &phys, &phys_post_tlbiall); 325 + } 326 + 327 + return (phys == 0 ? phys_post_tlbiall : phys); 328 + } 329 + 330 + irqreturn_t qcom_smmu_context_fault(int irq, void *dev) 331 + { 332 + struct arm_smmu_domain *smmu_domain = dev; 333 + struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops; 334 + struct arm_smmu_device *smmu = smmu_domain->smmu; 335 + u32 fsr, fsynr, cbfrsynra, resume = 0; 336 + int idx = smmu_domain->cfg.cbndx; 337 + phys_addr_t phys_soft; 338 + unsigned long iova; 339 + int ret, tmp; 340 + 341 + static DEFINE_RATELIMIT_STATE(_rs, 342 + DEFAULT_RATELIMIT_INTERVAL, 343 + DEFAULT_RATELIMIT_BURST); 344 + 345 + fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR); 346 + if (!(fsr & ARM_SMMU_FSR_FAULT)) 347 + return IRQ_NONE; 348 + 349 + fsynr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSYNR0); 350 + iova = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_FAR); 351 + cbfrsynra = arm_smmu_gr1_read(smmu, ARM_SMMU_GR1_CBFRSYNRA(idx)); 352 + 353 + if (list_empty(&tbu_list)) { 354 + ret = report_iommu_fault(&smmu_domain->domain, NULL, iova, 355 + fsynr & ARM_SMMU_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ); 356 + 357 + if (ret == -ENOSYS) 358 + dev_err_ratelimited(smmu->dev, 359 + "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n", 360 + fsr, iova, fsynr, cbfrsynra, idx); 361 + 362 + arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr); 363 + return IRQ_HANDLED; 364 + } 365 + 366 + phys_soft = ops->iova_to_phys(ops, iova); 367 + 368 + tmp = report_iommu_fault(&smmu_domain->domain, NULL, iova, 369 + fsynr & ARM_SMMU_FSYNR0_WNR ? 
IOMMU_FAULT_WRITE : IOMMU_FAULT_READ); 370 + if (!tmp || tmp == -EBUSY) { 371 + dev_dbg(smmu->dev, 372 + "Context fault handled by client: iova=0x%08lx, fsr=0x%x, fsynr=0x%x, cb=%d\n", 373 + iova, fsr, fsynr, idx); 374 + dev_dbg(smmu->dev, "soft iova-to-phys=%pa\n", &phys_soft); 375 + ret = IRQ_HANDLED; 376 + resume = ARM_SMMU_RESUME_TERMINATE; 377 + } else { 378 + phys_addr_t phys_atos = qcom_smmu_verify_fault(smmu_domain, iova, fsr); 379 + 380 + if (__ratelimit(&_rs)) { 381 + dev_err(smmu->dev, 382 + "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n", 383 + fsr, iova, fsynr, cbfrsynra, idx); 384 + dev_err(smmu->dev, 385 + "FSR = %08x [%s%s%s%s%s%s%s%s%s], SID=0x%x\n", 386 + fsr, 387 + (fsr & 0x02) ? "TF " : "", 388 + (fsr & 0x04) ? "AFF " : "", 389 + (fsr & 0x08) ? "PF " : "", 390 + (fsr & 0x10) ? "EF " : "", 391 + (fsr & 0x20) ? "TLBMCF " : "", 392 + (fsr & 0x40) ? "TLBLKF " : "", 393 + (fsr & 0x80) ? "MHF " : "", 394 + (fsr & 0x40000000) ? "SS " : "", 395 + (fsr & 0x80000000) ? "MULTI " : "", 396 + cbfrsynra); 397 + 398 + dev_err(smmu->dev, 399 + "soft iova-to-phys=%pa\n", &phys_soft); 400 + if (!phys_soft) 401 + dev_err(smmu->dev, 402 + "SOFTWARE TABLE WALK FAILED! Looks like %s accessed an unmapped address!\n", 403 + dev_name(smmu->dev)); 404 + if (phys_atos) 405 + dev_err(smmu->dev, "hard iova-to-phys (ATOS)=%pa\n", 406 + &phys_atos); 407 + else 408 + dev_err(smmu->dev, "hard iova-to-phys (ATOS) failed\n"); 409 + } 410 + ret = IRQ_NONE; 411 + resume = ARM_SMMU_RESUME_TERMINATE; 412 + } 413 + 414 + /* 415 + * If the client returns -EBUSY, do not clear FSR and do not RESUME 416 + * if stalled. This is required to keep the IOMMU client stalled on 417 + * the outstanding fault. This gives the client a chance to take any 418 + * debug action and then terminate the stalled transaction. 419 + * So, the sequence in case of stall on fault should be: 420 + * 1) Do not clear FSR or write to RESUME here 421 + * 2) Client takes any debug action 422 + * 3) Client terminates the stalled transaction and resumes the IOMMU 423 + * 4) Client clears FSR. The FSR should only be cleared after 3) and 424 + * not before so that the fault remains outstanding. This ensures 425 + * SCTLR.HUPCF has the desired effect if subsequent transactions also 426 + * need to be terminated. 
427 + */ 428 + if (tmp != -EBUSY) { 429 + /* Clear the faulting FSR */ 430 + arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, fsr); 431 + 432 + /* Retry or terminate any stalled transactions */ 433 + if (fsr & ARM_SMMU_FSR_SS) 434 + arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_RESUME, resume); 435 + } 436 + 437 + return ret; 438 + } 439 + 440 + static int qcom_tbu_probe(struct platform_device *pdev) 441 + { 442 + struct of_phandle_args args = { .args_count = 2 }; 443 + struct device_node *np = pdev->dev.of_node; 444 + struct device *dev = &pdev->dev; 445 + struct qcom_tbu *tbu; 446 + 447 + tbu = devm_kzalloc(dev, sizeof(*tbu), GFP_KERNEL); 448 + if (!tbu) 449 + return -ENOMEM; 450 + 451 + tbu->dev = dev; 452 + INIT_LIST_HEAD(&tbu->list); 453 + spin_lock_init(&tbu->halt_lock); 454 + 455 + if (of_parse_phandle_with_args(np, "qcom,stream-id-range", "#iommu-cells", 0, &args)) { 456 + dev_err(dev, "Cannot parse the 'qcom,stream-id-range' DT property\n"); 457 + return -EINVAL; 458 + } 459 + 460 + tbu->smmu_np = args.np; 461 + tbu->sid_range[0] = args.args[0]; 462 + tbu->sid_range[1] = args.args[1]; 463 + of_node_put(args.np); 464 + 465 + tbu->base = devm_of_iomap(dev, np, 0, NULL); 466 + if (IS_ERR(tbu->base)) 467 + return PTR_ERR(tbu->base); 468 + 469 + tbu->clk = devm_clk_get_optional(dev, NULL); 470 + if (IS_ERR(tbu->clk)) 471 + return PTR_ERR(tbu->clk); 472 + 473 + tbu->path = devm_of_icc_get(dev, NULL); 474 + if (IS_ERR(tbu->path)) 475 + return PTR_ERR(tbu->path); 476 + 477 + guard(mutex)(&tbu_list_lock); 478 + list_add_tail(&tbu->list, &tbu_list); 479 + 480 + return 0; 481 + } 482 + 483 + static const struct of_device_id qcom_tbu_of_match[] = { 484 + { .compatible = "qcom,sc7280-tbu" }, 485 + { .compatible = "qcom,sdm845-tbu" }, 486 + { } 487 + }; 488 + 489 + static struct platform_driver qcom_tbu_driver = { 490 + .driver = { 491 + .name = "qcom_tbu", 492 + .of_match_table = qcom_tbu_of_match, 493 + }, 494 + .probe = qcom_tbu_probe, 495 + }; 496 + builtin_platform_driver(qcom_tbu_driver);
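Note: qcom_find_tbu() above matches a faulting stream ID against each TBU's qcom,stream-id-range as a half-open [base, base + size) interval. A self-contained sketch of just that lookup follows; the structure, TBU names and ranges are invented for the example and the real values come from the device tree.

#include <stdint.h>
#include <stdio.h>
#include <stddef.h>

struct toy_tbu {
	const char *name;
	uint32_t sid_base;
	uint32_t sid_count;	/* second cell of qcom,stream-id-range */
};

/* Return the TBU whose [base, base + count) range contains @sid, if any. */
static const struct toy_tbu *find_tbu(const struct toy_tbu *tbus, size_t n,
				      uint32_t sid)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (sid >= tbus[i].sid_base &&
		    sid < tbus[i].sid_base + tbus[i].sid_count)
			return &tbus[i];
	return NULL;
}

int main(void)
{
	static const struct toy_tbu tbus[] = {
		{ "tbu_a", 0x0000, 0x400 },
		{ "tbu_b", 0x0400, 0x400 },
		{ "tbu_c", 0x0800, 0x400 },
	};
	uint32_t sid = 0x0421;
	const struct toy_tbu *tbu = find_tbu(tbus, 3, sid);

	printf("sid %#x -> %s\n", sid, tbu ? tbu->name : "no TBU");
	return 0;
}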
+8
drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
··· 413 413 .reset = arm_mmu500_reset, 414 414 .write_s2cr = qcom_smmu_write_s2cr, 415 415 .tlb_sync = qcom_smmu_tlb_sync, 416 + #ifdef CONFIG_ARM_SMMU_QCOM_DEBUG 417 + .context_fault = qcom_smmu_context_fault, 418 + .context_fault_needs_threaded_irq = true, 419 + #endif 416 420 }; 417 421 418 422 static const struct arm_smmu_impl sdm845_smmu_500_impl = { ··· 426 422 .reset = qcom_sdm845_smmu500_reset, 427 423 .write_s2cr = qcom_smmu_write_s2cr, 428 424 .tlb_sync = qcom_smmu_tlb_sync, 425 + #ifdef CONFIG_ARM_SMMU_QCOM_DEBUG 426 + .context_fault = qcom_smmu_context_fault, 427 + .context_fault_needs_threaded_irq = true, 428 + #endif 429 429 }; 430 430 431 431 static const struct arm_smmu_impl qcom_adreno_smmu_v2_impl = {
+2
drivers/iommu/arm/arm-smmu/arm-smmu-qcom.h
··· 30 30 const struct arm_smmu_impl *adreno_impl; 31 31 }; 32 32 33 + irqreturn_t qcom_smmu_context_fault(int irq, void *dev); 34 + 33 35 #ifdef CONFIG_ARM_SMMU_QCOM_DEBUG 34 36 void qcom_smmu_tlb_sync_debug(struct arm_smmu_device *smmu); 35 37 #else
+12 -8
drivers/iommu/arm/arm-smmu/arm-smmu.c
··· 806 806 else 807 807 context_fault = arm_smmu_context_fault; 808 808 809 - ret = devm_request_irq(smmu->dev, irq, context_fault, IRQF_SHARED, 810 - "arm-smmu-context-fault", smmu_domain); 809 + if (smmu->impl && smmu->impl->context_fault_needs_threaded_irq) 810 + ret = devm_request_threaded_irq(smmu->dev, irq, NULL, 811 + context_fault, 812 + IRQF_ONESHOT | IRQF_SHARED, 813 + "arm-smmu-context-fault", 814 + smmu_domain); 815 + else 816 + ret = devm_request_irq(smmu->dev, irq, context_fault, IRQF_SHARED, 817 + "arm-smmu-context-fault", smmu_domain); 818 + 811 819 if (ret < 0) { 812 820 dev_err(smmu->dev, "failed to request context IRQ %d (%u)\n", 813 821 cfg->irptndx, irq); ··· 867 859 arm_smmu_rpm_put(smmu); 868 860 } 869 861 870 - static struct iommu_domain *arm_smmu_domain_alloc(unsigned type) 862 + static struct iommu_domain *arm_smmu_domain_alloc_paging(struct device *dev) 871 863 { 872 864 struct arm_smmu_domain *smmu_domain; 873 865 874 - if (type != IOMMU_DOMAIN_UNMANAGED) { 875 - if (using_legacy_binding || type != IOMMU_DOMAIN_DMA) 876 - return NULL; 877 - } 878 866 /* 879 867 * Allocate the domain and initialise some of its data structures. 880 868 * We can't really do anything meaningful until we've added a ··· 1600 1596 .identity_domain = &arm_smmu_identity_domain, 1601 1597 .blocked_domain = &arm_smmu_blocked_domain, 1602 1598 .capable = arm_smmu_capable, 1603 - .domain_alloc = arm_smmu_domain_alloc, 1599 + .domain_alloc_paging = arm_smmu_domain_alloc_paging, 1604 1600 .probe_device = arm_smmu_probe_device, 1605 1601 .release_device = arm_smmu_release_device, 1606 1602 .probe_finalize = arm_smmu_probe_finalize,
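Note: the core now lets an implementation both substitute its own context-fault handler and declare that it must run in thread context (the Qualcomm handler can sleep in the TBU/ATOS path). The snippet below is a userspace model of that "optional override plus capability flag" selection only, with made-up names; it does not register real interrupts.

#include <stdbool.h>
#include <stdio.h>

typedef int (*fault_handler_t)(int irq, void *cookie);

struct toy_impl {
	fault_handler_t context_fault;		/* NULL -> use the default */
	bool context_fault_needs_threaded_irq;
};

static int default_context_fault(int irq, void *cookie)
{
	printf("default handler, irq %d\n", irq);
	return 0;
}

static int quirky_context_fault(int irq, void *cookie)
{
	printf("impl handler (may sleep), irq %d\n", irq);
	return 0;
}

/* Mirror of the request-IRQ decision: which handler, and hard-IRQ or thread. */
static void request_context_irq(const struct toy_impl *impl, int irq)
{
	fault_handler_t handler = default_context_fault;
	bool threaded = false;

	if (impl && impl->context_fault)
		handler = impl->context_fault;
	if (impl && impl->context_fault_needs_threaded_irq)
		threaded = true;

	printf("irq %d: %s handler registered\n",
	       irq, threaded ? "threaded" : "hard-irq");
	handler(irq, NULL);
}

int main(void)
{
	static const struct toy_impl quirky = {
		.context_fault = quirky_context_fault,
		.context_fault_needs_threaded_irq = true,
	};

	request_context_irq(NULL, 100);		/* no implementation quirks */
	request_context_irq(&quirky, 101);	/* Qualcomm-style override */
	return 0;
}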
+3
drivers/iommu/arm/arm-smmu/arm-smmu.h
··· 136 136 #define ARM_SMMU_CBAR_VMID GENMASK(7, 0) 137 137 138 138 #define ARM_SMMU_GR1_CBFRSYNRA(n) (0x400 + ((n) << 2)) 139 + #define ARM_SMMU_CBFRSYNRA_SID GENMASK(15, 0) 139 140 140 141 #define ARM_SMMU_GR1_CBA2R(n) (0x800 + ((n) << 2)) 141 142 #define ARM_SMMU_CBA2R_VMID16 GENMASK(31, 16) ··· 239 238 #define ARM_SMMU_CB_ATSR 0x8f0 240 239 #define ARM_SMMU_ATSR_ACTIVE BIT(0) 241 240 241 + #define ARM_SMMU_RESUME_TERMINATE BIT(0) 242 242 243 243 /* Maximum number of context banks per SMMU */ 244 244 #define ARM_SMMU_MAX_CBS 128 ··· 438 436 int (*def_domain_type)(struct device *dev); 439 437 irqreturn_t (*global_fault)(int irq, void *dev); 440 438 irqreturn_t (*context_fault)(int irq, void *dev); 439 + bool context_fault_needs_threaded_irq; 441 440 int (*alloc_context_bank)(struct arm_smmu_domain *smmu_domain, 442 441 struct arm_smmu_device *smmu, 443 442 struct device *dev, int start);
+19 -27
drivers/iommu/dma-iommu.c
··· 32 32 #include <trace/events/swiotlb.h> 33 33 34 34 #include "dma-iommu.h" 35 + #include "iommu-pages.h" 35 36 36 37 struct iommu_dma_msi_page { 37 38 struct list_head list; ··· 157 156 if (fq->entries[idx].counter >= counter) 158 157 break; 159 158 160 - put_pages_list(&fq->entries[idx].freelist); 159 + iommu_put_pages_list(&fq->entries[idx].freelist); 161 160 free_iova_fast(&cookie->iovad, 162 161 fq->entries[idx].iova_pfn, 163 162 fq->entries[idx].pages); ··· 255 254 int idx; 256 255 257 256 fq_ring_for_each(idx, fq) 258 - put_pages_list(&fq->entries[idx].freelist); 257 + iommu_put_pages_list(&fq->entries[idx].freelist); 259 258 vfree(fq); 260 259 } 261 260 ··· 268 267 struct iova_fq *fq = per_cpu_ptr(percpu_fq, cpu); 269 268 270 269 fq_ring_for_each(idx, fq) 271 - put_pages_list(&fq->entries[idx].freelist); 270 + iommu_put_pages_list(&fq->entries[idx].freelist); 272 271 } 273 272 274 273 free_percpu(percpu_fq); ··· 661 660 /** 662 661 * iommu_dma_init_domain - Initialise a DMA mapping domain 663 662 * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie() 664 - * @base: IOVA at which the mappable address space starts 665 - * @limit: Last address of the IOVA space 666 663 * @dev: Device the domain is being initialised for 667 664 * 668 - * @base and @limit + 1 should be exact multiples of IOMMU page granularity to 669 - * avoid rounding surprises. If necessary, we reserve the page at address 0 665 + * If the geometry and dma_range_map include address 0, we reserve that page 670 666 * to ensure it is an invalid IOVA. It is safe to reinitialise a domain, but 671 667 * any change which could make prior IOVAs invalid will fail. 672 668 */ 673 - static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, 674 - dma_addr_t limit, struct device *dev) 669 + static int iommu_dma_init_domain(struct iommu_domain *domain, struct device *dev) 675 670 { 676 671 struct iommu_dma_cookie *cookie = domain->iova_cookie; 672 + const struct bus_dma_region *map = dev->dma_range_map; 677 673 unsigned long order, base_pfn; 678 674 struct iova_domain *iovad; 679 675 int ret; ··· 682 684 683 685 /* Use the smallest supported page size for IOVA granularity */ 684 686 order = __ffs(domain->pgsize_bitmap); 685 - base_pfn = max_t(unsigned long, 1, base >> order); 687 + base_pfn = 1; 686 688 687 689 /* Check the domain allows at least some access to the device... */ 688 - if (domain->geometry.force_aperture) { 690 + if (map) { 691 + dma_addr_t base = dma_range_map_min(map); 689 692 if (base > domain->geometry.aperture_end || 690 - limit < domain->geometry.aperture_start) { 693 + dma_range_map_max(map) < domain->geometry.aperture_start) { 691 694 pr_warn("specified DMA range outside IOMMU capability\n"); 692 695 return -EFAULT; 693 696 } 694 697 /* ...then finally give it a kicking to make sure it fits */ 695 - base_pfn = max_t(unsigned long, base_pfn, 696 - domain->geometry.aperture_start >> order); 698 + base_pfn = max(base, domain->geometry.aperture_start) >> order; 697 699 } 698 700 699 701 /* start_pfn is always nonzero for an already-initialised domain */ ··· 1742 1744 .max_mapping_size = iommu_dma_max_mapping_size, 1743 1745 }; 1744 1746 1745 - /* 1746 - * The IOMMU core code allocates the default DMA domain, which the underlying 1747 - * IOMMU driver needs to support via the dma-iommu layer. 
1748 - */ 1749 - void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 dma_limit) 1747 + void iommu_setup_dma_ops(struct device *dev) 1750 1748 { 1751 1749 struct iommu_domain *domain = iommu_get_domain_for_dev(dev); 1752 1750 1753 - if (!domain) 1754 - goto out_err; 1751 + if (dev_is_pci(dev)) 1752 + dev->iommu->pci_32bit_workaround = !iommu_dma_forcedac; 1755 1753 1756 - /* 1757 - * The IOMMU core code allocates the default DMA domain, which the 1758 - * underlying IOMMU driver needs to support via the dma-iommu layer. 1759 - */ 1760 1754 if (iommu_is_dma_domain(domain)) { 1761 - if (iommu_dma_init_domain(domain, dma_base, dma_limit, dev)) 1755 + if (iommu_dma_init_domain(domain, dev)) 1762 1756 goto out_err; 1763 1757 dev->dma_ops = &iommu_dma_ops; 1758 + } else if (dev->dma_ops == &iommu_dma_ops) { 1759 + /* Clean up if we've switched *from* a DMA domain */ 1760 + dev->dma_ops = NULL; 1764 1761 } 1765 1762 1766 1763 return; ··· 1763 1770 pr_warn("Failed to set up IOMMU for device %s; retaining platform DMA ops\n", 1764 1771 dev_name(dev)); 1765 1772 } 1766 - EXPORT_SYMBOL_GPL(iommu_setup_dma_ops); 1767 1773 1768 1774 static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev, 1769 1775 phys_addr_t msi_addr, struct iommu_domain *domain)
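Note: iommu_dma_init_domain() no longer takes base/limit arguments; it derives the usable IOVA floor from the device's dma_range_map and the domain aperture. The following standalone sketch re-derives that arithmetic under simplifying assumptions (a single contiguous bus range, invented values) so the rounding can be experimented with.

#include <stdint.h>
#include <stdio.h>

struct toy_aperture {
	uint64_t start;
	uint64_t end;		/* inclusive */
};

/*
 * Return the first usable IOVA pfn, or 0 on error, from the smallest
 * supported page size, the device's addressable bus range and the
 * domain aperture. pfn 0 is additionally avoided, as in the driver.
 */
static uint64_t first_iova_pfn(uint64_t pgsize_bitmap,
			       uint64_t map_min, uint64_t map_max,
			       const struct toy_aperture *ap)
{
	unsigned int order = __builtin_ctzll(pgsize_bitmap);
	uint64_t base_pfn;

	if (map_min > ap->end || map_max < ap->start) {
		fprintf(stderr, "DMA range outside IOMMU capability\n");
		return 0;
	}
	base_pfn = (map_min > ap->start ? map_min : ap->start) >> order;
	return base_pfn ? base_pfn : 1;
}

int main(void)
{
	/* 4K/2M/1G page sizes, device that can only address >= 1 GiB. */
	struct toy_aperture ap = { .start = 0x1000, .end = ~0ULL >> 16 };
	uint64_t pfn = first_iova_pfn(0x40201000, 0x40000000, 0xffffffff, &ap);

	printf("IOVA allocation starts at pfn %#llx\n", (unsigned long long)pfn);
	return 0;
}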
+6 -8
drivers/iommu/dma-iommu.h
··· 9 9 10 10 #ifdef CONFIG_IOMMU_DMA 11 11 12 + void iommu_setup_dma_ops(struct device *dev); 13 + 12 14 int iommu_get_dma_cookie(struct iommu_domain *domain); 13 15 void iommu_put_dma_cookie(struct iommu_domain *domain); 14 16 ··· 19 17 void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list); 20 18 21 19 extern bool iommu_dma_forcedac; 22 - static inline void iommu_dma_set_pci_32bit_workaround(struct device *dev) 23 - { 24 - dev->iommu->pci_32bit_workaround = !iommu_dma_forcedac; 25 - } 26 20 27 21 #else /* CONFIG_IOMMU_DMA */ 22 + 23 + static inline void iommu_setup_dma_ops(struct device *dev) 24 + { 25 + } 28 26 29 27 static inline int iommu_dma_init_fq(struct iommu_domain *domain) 30 28 { ··· 41 39 } 42 40 43 41 static inline void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list) 44 - { 45 - } 46 - 47 - static inline void iommu_dma_set_pci_32bit_workaround(struct device *dev) 48 42 { 49 43 } 50 44
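Note: the header keeps call sites clean by giving iommu_setup_dma_ops() an empty inline stub when CONFIG_IOMMU_DMA is off, rather than exporting per-feature helpers. The pattern, reduced to a compile-and-run toy with a stand-in config macro (nothing here is the kernel API):

#include <stdio.h>

/* Build with -DCONFIG_TOY_DMA to get the real implementation. */
#ifdef CONFIG_TOY_DMA
static void toy_setup_dma_ops(const char *dev)
{
	printf("%s: installing IOMMU DMA ops\n", dev);
}
#else
static inline void toy_setup_dma_ops(const char *dev)
{
	/* Feature compiled out: the caller needs no #ifdef of its own. */
	(void)dev;
}
#endif

int main(void)
{
	toy_setup_dma_ops("pci0000:00");	/* always safe to call */
	return 0;
}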
+8 -6
drivers/iommu/exynos-iommu.c
··· 22 22 #include <linux/pm_runtime.h> 23 23 #include <linux/slab.h> 24 24 25 + #include "iommu-pages.h" 26 + 25 27 typedef u32 sysmmu_iova_t; 26 28 typedef u32 sysmmu_pte_t; 27 29 static struct iommu_domain exynos_identity_domain; ··· 902 900 if (!domain) 903 901 return NULL; 904 902 905 - domain->pgtable = (sysmmu_pte_t *)__get_free_pages(GFP_KERNEL, 2); 903 + domain->pgtable = iommu_alloc_pages(GFP_KERNEL, 2); 906 904 if (!domain->pgtable) 907 905 goto err_pgtable; 908 906 909 - domain->lv2entcnt = (short *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 1); 907 + domain->lv2entcnt = iommu_alloc_pages(GFP_KERNEL, 1); 910 908 if (!domain->lv2entcnt) 911 909 goto err_counter; 912 910 ··· 932 930 return &domain->domain; 933 931 934 932 err_lv2ent: 935 - free_pages((unsigned long)domain->lv2entcnt, 1); 933 + iommu_free_pages(domain->lv2entcnt, 1); 936 934 err_counter: 937 - free_pages((unsigned long)domain->pgtable, 2); 935 + iommu_free_pages(domain->pgtable, 2); 938 936 err_pgtable: 939 937 kfree(domain); 940 938 return NULL; ··· 975 973 phys_to_virt(base)); 976 974 } 977 975 978 - free_pages((unsigned long)domain->pgtable, 2); 979 - free_pages((unsigned long)domain->lv2entcnt, 1); 976 + iommu_free_pages(domain->pgtable, 2); 977 + iommu_free_pages(domain->lv2entcnt, 1); 980 978 kfree(domain); 981 979 } 982 980
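Note: the exynos conversion swaps __get_free_pages()/free_pages() for the iommu-pages.h wrappers so page-table allocations are accounted and visible in memory statistics. A userspace analogue of that "allocate, zero, and count" wrapper is sketched below; the helper names and the global counter are invented for the example.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096UL

/* Global counter standing in for the system's secondary-pagetable stat. */
static unsigned long toy_sec_pagetable_pages;

/* Allocate 2^order zeroed pages and account them. */
static void *toy_iommu_alloc_pages(unsigned int order)
{
	size_t bytes = TOY_PAGE_SIZE << order;
	void *p = aligned_alloc(TOY_PAGE_SIZE, bytes);

	if (!p)
		return NULL;
	memset(p, 0, bytes);
	toy_sec_pagetable_pages += 1UL << order;
	return p;
}

static void toy_iommu_free_pages(void *p, unsigned int order)
{
	if (!p)
		return;
	toy_sec_pagetable_pages -= 1UL << order;
	free(p);
}

int main(void)
{
	void *lv1 = toy_iommu_alloc_pages(2);	/* 4 pages, as for the lv1 table */
	void *cnt = toy_iommu_alloc_pages(1);	/* 2 pages for the entry counters */

	printf("accounted pages: %lu\n", toy_sec_pagetable_pages);
	toy_iommu_free_pages(cnt, 1);
	toy_iommu_free_pages(lv1, 2);
	printf("accounted pages after free: %lu\n", toy_sec_pagetable_pages);
	return 0;
}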
+1 -1
drivers/iommu/intel/Makefile
··· 1 1 # SPDX-License-Identifier: GPL-2.0 2 2 obj-$(CONFIG_DMAR_TABLE) += dmar.o 3 - obj-$(CONFIG_INTEL_IOMMU) += iommu.o pasid.o nested.o 3 + obj-$(CONFIG_INTEL_IOMMU) += iommu.o pasid.o nested.o cache.o 4 4 obj-$(CONFIG_DMAR_TABLE) += trace.o cap_audit.o 5 5 obj-$(CONFIG_DMAR_PERF) += perf.o 6 6 obj-$(CONFIG_INTEL_IOMMU_DEBUGFS) += debugfs.o
+419
drivers/iommu/intel/cache.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* 3 + * cache.c - Intel VT-d cache invalidation 4 + * 5 + * Copyright (C) 2024 Intel Corporation 6 + * 7 + * Author: Lu Baolu <baolu.lu@linux.intel.com> 8 + */ 9 + 10 + #define pr_fmt(fmt) "DMAR: " fmt 11 + 12 + #include <linux/dmar.h> 13 + #include <linux/iommu.h> 14 + #include <linux/memory.h> 15 + #include <linux/pci.h> 16 + #include <linux/spinlock.h> 17 + 18 + #include "iommu.h" 19 + #include "pasid.h" 20 + #include "trace.h" 21 + 22 + /* Check if an existing cache tag can be reused for a new association. */ 23 + static bool cache_tage_match(struct cache_tag *tag, u16 domain_id, 24 + struct intel_iommu *iommu, struct device *dev, 25 + ioasid_t pasid, enum cache_tag_type type) 26 + { 27 + if (tag->type != type) 28 + return false; 29 + 30 + if (tag->domain_id != domain_id || tag->pasid != pasid) 31 + return false; 32 + 33 + if (type == CACHE_TAG_IOTLB || type == CACHE_TAG_NESTING_IOTLB) 34 + return tag->iommu == iommu; 35 + 36 + if (type == CACHE_TAG_DEVTLB || type == CACHE_TAG_NESTING_DEVTLB) 37 + return tag->dev == dev; 38 + 39 + return false; 40 + } 41 + 42 + /* Assign a cache tag with specified type to domain. */ 43 + static int cache_tag_assign(struct dmar_domain *domain, u16 did, 44 + struct device *dev, ioasid_t pasid, 45 + enum cache_tag_type type) 46 + { 47 + struct device_domain_info *info = dev_iommu_priv_get(dev); 48 + struct intel_iommu *iommu = info->iommu; 49 + struct cache_tag *tag, *temp; 50 + unsigned long flags; 51 + 52 + tag = kzalloc(sizeof(*tag), GFP_KERNEL); 53 + if (!tag) 54 + return -ENOMEM; 55 + 56 + tag->type = type; 57 + tag->iommu = iommu; 58 + tag->domain_id = did; 59 + tag->pasid = pasid; 60 + tag->users = 1; 61 + 62 + if (type == CACHE_TAG_DEVTLB || type == CACHE_TAG_NESTING_DEVTLB) 63 + tag->dev = dev; 64 + else 65 + tag->dev = iommu->iommu.dev; 66 + 67 + spin_lock_irqsave(&domain->cache_lock, flags); 68 + list_for_each_entry(temp, &domain->cache_tags, node) { 69 + if (cache_tage_match(temp, did, iommu, dev, pasid, type)) { 70 + temp->users++; 71 + spin_unlock_irqrestore(&domain->cache_lock, flags); 72 + kfree(tag); 73 + trace_cache_tag_assign(temp); 74 + return 0; 75 + } 76 + } 77 + list_add_tail(&tag->node, &domain->cache_tags); 78 + spin_unlock_irqrestore(&domain->cache_lock, flags); 79 + trace_cache_tag_assign(tag); 80 + 81 + return 0; 82 + } 83 + 84 + /* Unassign a cache tag with specified type from domain. 
*/ 85 + static void cache_tag_unassign(struct dmar_domain *domain, u16 did, 86 + struct device *dev, ioasid_t pasid, 87 + enum cache_tag_type type) 88 + { 89 + struct device_domain_info *info = dev_iommu_priv_get(dev); 90 + struct intel_iommu *iommu = info->iommu; 91 + struct cache_tag *tag; 92 + unsigned long flags; 93 + 94 + spin_lock_irqsave(&domain->cache_lock, flags); 95 + list_for_each_entry(tag, &domain->cache_tags, node) { 96 + if (cache_tage_match(tag, did, iommu, dev, pasid, type)) { 97 + trace_cache_tag_unassign(tag); 98 + if (--tag->users == 0) { 99 + list_del(&tag->node); 100 + kfree(tag); 101 + } 102 + break; 103 + } 104 + } 105 + spin_unlock_irqrestore(&domain->cache_lock, flags); 106 + } 107 + 108 + static int __cache_tag_assign_domain(struct dmar_domain *domain, u16 did, 109 + struct device *dev, ioasid_t pasid) 110 + { 111 + struct device_domain_info *info = dev_iommu_priv_get(dev); 112 + int ret; 113 + 114 + ret = cache_tag_assign(domain, did, dev, pasid, CACHE_TAG_IOTLB); 115 + if (ret || !info->ats_enabled) 116 + return ret; 117 + 118 + ret = cache_tag_assign(domain, did, dev, pasid, CACHE_TAG_DEVTLB); 119 + if (ret) 120 + cache_tag_unassign(domain, did, dev, pasid, CACHE_TAG_IOTLB); 121 + 122 + return ret; 123 + } 124 + 125 + static void __cache_tag_unassign_domain(struct dmar_domain *domain, u16 did, 126 + struct device *dev, ioasid_t pasid) 127 + { 128 + struct device_domain_info *info = dev_iommu_priv_get(dev); 129 + 130 + cache_tag_unassign(domain, did, dev, pasid, CACHE_TAG_IOTLB); 131 + 132 + if (info->ats_enabled) 133 + cache_tag_unassign(domain, did, dev, pasid, CACHE_TAG_DEVTLB); 134 + } 135 + 136 + static int __cache_tag_assign_parent_domain(struct dmar_domain *domain, u16 did, 137 + struct device *dev, ioasid_t pasid) 138 + { 139 + struct device_domain_info *info = dev_iommu_priv_get(dev); 140 + int ret; 141 + 142 + ret = cache_tag_assign(domain, did, dev, pasid, CACHE_TAG_NESTING_IOTLB); 143 + if (ret || !info->ats_enabled) 144 + return ret; 145 + 146 + ret = cache_tag_assign(domain, did, dev, pasid, CACHE_TAG_NESTING_DEVTLB); 147 + if (ret) 148 + cache_tag_unassign(domain, did, dev, pasid, CACHE_TAG_NESTING_IOTLB); 149 + 150 + return ret; 151 + } 152 + 153 + static void __cache_tag_unassign_parent_domain(struct dmar_domain *domain, u16 did, 154 + struct device *dev, ioasid_t pasid) 155 + { 156 + struct device_domain_info *info = dev_iommu_priv_get(dev); 157 + 158 + cache_tag_unassign(domain, did, dev, pasid, CACHE_TAG_NESTING_IOTLB); 159 + 160 + if (info->ats_enabled) 161 + cache_tag_unassign(domain, did, dev, pasid, CACHE_TAG_NESTING_DEVTLB); 162 + } 163 + 164 + static u16 domain_get_id_for_dev(struct dmar_domain *domain, struct device *dev) 165 + { 166 + struct device_domain_info *info = dev_iommu_priv_get(dev); 167 + struct intel_iommu *iommu = info->iommu; 168 + 169 + /* 170 + * The driver assigns different domain IDs for all domains except 171 + * the SVA type. 172 + */ 173 + if (domain->domain.type == IOMMU_DOMAIN_SVA) 174 + return FLPT_DEFAULT_DID; 175 + 176 + return domain_id_iommu(domain, iommu); 177 + } 178 + 179 + /* 180 + * Assign cache tags to a domain when it's associated with a device's 181 + * PASID using a specific domain ID. 182 + * 183 + * On success (return value of 0), cache tags are created and added to the 184 + * domain's cache tag list. On failure (negative return value), an error 185 + * code is returned indicating the reason for the failure. 
186 + */ 187 + int cache_tag_assign_domain(struct dmar_domain *domain, 188 + struct device *dev, ioasid_t pasid) 189 + { 190 + u16 did = domain_get_id_for_dev(domain, dev); 191 + int ret; 192 + 193 + ret = __cache_tag_assign_domain(domain, did, dev, pasid); 194 + if (ret || domain->domain.type != IOMMU_DOMAIN_NESTED) 195 + return ret; 196 + 197 + ret = __cache_tag_assign_parent_domain(domain->s2_domain, did, dev, pasid); 198 + if (ret) 199 + __cache_tag_unassign_domain(domain, did, dev, pasid); 200 + 201 + return ret; 202 + } 203 + 204 + /* 205 + * Remove the cache tags associated with a device's PASID when the domain is 206 + * detached from the device. 207 + * 208 + * The cache tags must be previously assigned to the domain by calling the 209 + * assign interface. 210 + */ 211 + void cache_tag_unassign_domain(struct dmar_domain *domain, 212 + struct device *dev, ioasid_t pasid) 213 + { 214 + u16 did = domain_get_id_for_dev(domain, dev); 215 + 216 + __cache_tag_unassign_domain(domain, did, dev, pasid); 217 + if (domain->domain.type == IOMMU_DOMAIN_NESTED) 218 + __cache_tag_unassign_parent_domain(domain->s2_domain, did, dev, pasid); 219 + } 220 + 221 + static unsigned long calculate_psi_aligned_address(unsigned long start, 222 + unsigned long end, 223 + unsigned long *_pages, 224 + unsigned long *_mask) 225 + { 226 + unsigned long pages = aligned_nrpages(start, end - start + 1); 227 + unsigned long aligned_pages = __roundup_pow_of_two(pages); 228 + unsigned long bitmask = aligned_pages - 1; 229 + unsigned long mask = ilog2(aligned_pages); 230 + unsigned long pfn = IOVA_PFN(start); 231 + 232 + /* 233 + * PSI masks the low order bits of the base address. If the 234 + * address isn't aligned to the mask, then compute a mask value 235 + * needed to ensure the target range is flushed. 236 + */ 237 + if (unlikely(bitmask & pfn)) { 238 + unsigned long end_pfn = pfn + pages - 1, shared_bits; 239 + 240 + /* 241 + * Since end_pfn <= pfn + bitmask, the only way bits 242 + * higher than bitmask can differ in pfn and end_pfn is 243 + * by carrying. This means after masking out bitmask, 244 + * high bits starting with the first set bit in 245 + * shared_bits are all equal in both pfn and end_pfn. 246 + */ 247 + shared_bits = ~(pfn ^ end_pfn) & ~bitmask; 248 + mask = shared_bits ? __ffs(shared_bits) : BITS_PER_LONG; 249 + } 250 + 251 + *_pages = aligned_pages; 252 + *_mask = mask; 253 + 254 + return ALIGN_DOWN(start, VTD_PAGE_SIZE << mask); 255 + } 256 + 257 + /* 258 + * Invalidates a range of IOVA from @start (inclusive) to @end (inclusive) 259 + * when the memory mappings in the target domain have been modified. 260 + */ 261 + void cache_tag_flush_range(struct dmar_domain *domain, unsigned long start, 262 + unsigned long end, int ih) 263 + { 264 + unsigned long pages, mask, addr; 265 + struct cache_tag *tag; 266 + unsigned long flags; 267 + 268 + addr = calculate_psi_aligned_address(start, end, &pages, &mask); 269 + 270 + spin_lock_irqsave(&domain->cache_lock, flags); 271 + list_for_each_entry(tag, &domain->cache_tags, node) { 272 + struct intel_iommu *iommu = tag->iommu; 273 + struct device_domain_info *info; 274 + u16 sid; 275 + 276 + switch (tag->type) { 277 + case CACHE_TAG_IOTLB: 278 + case CACHE_TAG_NESTING_IOTLB: 279 + if (domain->use_first_level) { 280 + qi_flush_piotlb(iommu, tag->domain_id, 281 + tag->pasid, addr, pages, ih); 282 + } else { 283 + /* 284 + * Fallback to domain selective flush if no 285 + * PSI support or the size is too big. 
286 + */ 287 + if (!cap_pgsel_inv(iommu->cap) || 288 + mask > cap_max_amask_val(iommu->cap)) 289 + iommu->flush.flush_iotlb(iommu, tag->domain_id, 290 + 0, 0, DMA_TLB_DSI_FLUSH); 291 + else 292 + iommu->flush.flush_iotlb(iommu, tag->domain_id, 293 + addr | ih, mask, 294 + DMA_TLB_PSI_FLUSH); 295 + } 296 + break; 297 + case CACHE_TAG_NESTING_DEVTLB: 298 + /* 299 + * Address translation cache in device side caches the 300 + * result of nested translation. There is no easy way 301 + * to identify the exact set of nested translations 302 + * affected by a change in S2. So just flush the entire 303 + * device cache. 304 + */ 305 + addr = 0; 306 + mask = MAX_AGAW_PFN_WIDTH; 307 + fallthrough; 308 + case CACHE_TAG_DEVTLB: 309 + info = dev_iommu_priv_get(tag->dev); 310 + sid = PCI_DEVID(info->bus, info->devfn); 311 + 312 + if (tag->pasid == IOMMU_NO_PASID) 313 + qi_flush_dev_iotlb(iommu, sid, info->pfsid, 314 + info->ats_qdep, addr, mask); 315 + else 316 + qi_flush_dev_iotlb_pasid(iommu, sid, info->pfsid, 317 + tag->pasid, info->ats_qdep, 318 + addr, mask); 319 + 320 + quirk_extra_dev_tlb_flush(info, addr, mask, tag->pasid, info->ats_qdep); 321 + break; 322 + } 323 + 324 + trace_cache_tag_flush_range(tag, start, end, addr, pages, mask); 325 + } 326 + spin_unlock_irqrestore(&domain->cache_lock, flags); 327 + } 328 + 329 + /* 330 + * Invalidates all ranges of IOVA when the memory mappings in the target 331 + * domain have been modified. 332 + */ 333 + void cache_tag_flush_all(struct dmar_domain *domain) 334 + { 335 + struct cache_tag *tag; 336 + unsigned long flags; 337 + 338 + spin_lock_irqsave(&domain->cache_lock, flags); 339 + list_for_each_entry(tag, &domain->cache_tags, node) { 340 + struct intel_iommu *iommu = tag->iommu; 341 + struct device_domain_info *info; 342 + u16 sid; 343 + 344 + switch (tag->type) { 345 + case CACHE_TAG_IOTLB: 346 + case CACHE_TAG_NESTING_IOTLB: 347 + if (domain->use_first_level) 348 + qi_flush_piotlb(iommu, tag->domain_id, 349 + tag->pasid, 0, -1, 0); 350 + else 351 + iommu->flush.flush_iotlb(iommu, tag->domain_id, 352 + 0, 0, DMA_TLB_DSI_FLUSH); 353 + break; 354 + case CACHE_TAG_DEVTLB: 355 + case CACHE_TAG_NESTING_DEVTLB: 356 + info = dev_iommu_priv_get(tag->dev); 357 + sid = PCI_DEVID(info->bus, info->devfn); 358 + 359 + qi_flush_dev_iotlb(iommu, sid, info->pfsid, info->ats_qdep, 360 + 0, MAX_AGAW_PFN_WIDTH); 361 + quirk_extra_dev_tlb_flush(info, 0, MAX_AGAW_PFN_WIDTH, 362 + IOMMU_NO_PASID, info->ats_qdep); 363 + break; 364 + } 365 + 366 + trace_cache_tag_flush_all(tag); 367 + } 368 + spin_unlock_irqrestore(&domain->cache_lock, flags); 369 + } 370 + 371 + /* 372 + * Invalidate a range of IOVA when new mappings are created in the target 373 + * domain. 374 + * 375 + * - VT-d spec, Section 6.1 Caching Mode: When the CM field is reported as 376 + * Set, any software updates to remapping structures other than first- 377 + * stage mapping requires explicit invalidation of the caches. 378 + * - VT-d spec, Section 6.8 Write Buffer Flushing: For hardware that requires 379 + * write buffer flushing, software must explicitly perform write-buffer 380 + * flushing, if cache invalidation is not required. 
381 + */ 382 + void cache_tag_flush_range_np(struct dmar_domain *domain, unsigned long start, 383 + unsigned long end) 384 + { 385 + unsigned long pages, mask, addr; 386 + struct cache_tag *tag; 387 + unsigned long flags; 388 + 389 + addr = calculate_psi_aligned_address(start, end, &pages, &mask); 390 + 391 + spin_lock_irqsave(&domain->cache_lock, flags); 392 + list_for_each_entry(tag, &domain->cache_tags, node) { 393 + struct intel_iommu *iommu = tag->iommu; 394 + 395 + if (!cap_caching_mode(iommu->cap) || domain->use_first_level) { 396 + iommu_flush_write_buffer(iommu); 397 + continue; 398 + } 399 + 400 + if (tag->type == CACHE_TAG_IOTLB || 401 + tag->type == CACHE_TAG_NESTING_IOTLB) { 402 + /* 403 + * Fallback to domain selective flush if no 404 + * PSI support or the size is too big. 405 + */ 406 + if (!cap_pgsel_inv(iommu->cap) || 407 + mask > cap_max_amask_val(iommu->cap)) 408 + iommu->flush.flush_iotlb(iommu, tag->domain_id, 409 + 0, 0, DMA_TLB_DSI_FLUSH); 410 + else 411 + iommu->flush.flush_iotlb(iommu, tag->domain_id, 412 + addr, mask, 413 + DMA_TLB_PSI_FLUSH); 414 + } 415 + 416 + trace_cache_tag_flush_range_np(tag, start, end, addr, pages, mask); 417 + } 418 + spin_unlock_irqrestore(&domain->cache_lock, flags); 419 + }
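The mask widening in calculate_psi_aligned_address() above is easiest to follow with concrete numbers. The stand-alone sketch below reimplements the same logic with plain libc building blocks; the 4 KiB page size, helper names and example values are assumptions made purely for illustration (the driver itself uses VTD_PAGE_SIZE, __roundup_pow_of_two(), __ffs() and ALIGN_DOWN()).

#include <stdio.h>

#define PAGE_SHIFT 12	/* assume 4 KiB VT-d pages for the example */

static unsigned long roundup_pow_of_two(unsigned long n)
{
	unsigned long p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Same widening trick as calculate_psi_aligned_address(): when the start
 * pfn is not aligned to the (power-of-two) page count, grow the mask until
 * one naturally aligned block covers pfn..end_pfn. */
static unsigned long psi_aligned_address(unsigned long start, unsigned long end,
					 unsigned long *out_pages,
					 unsigned long *out_mask)
{
	unsigned long pages = ((end - start) >> PAGE_SHIFT) + 1;
	unsigned long aligned_pages = roundup_pow_of_two(pages);
	unsigned long bitmask = aligned_pages - 1;
	unsigned long mask = __builtin_ctzl(aligned_pages);
	unsigned long pfn = start >> PAGE_SHIFT;

	if (bitmask & pfn) {
		unsigned long end_pfn = pfn + pages - 1;
		unsigned long shared_bits = ~(pfn ^ end_pfn) & ~bitmask;

		mask = shared_bits ? __builtin_ctzl(shared_bits) : 64;
	}

	*out_pages = aligned_pages;
	*out_mask = mask;
	if (mask >= 64)
		return 0;	/* degenerates into a full-address-space flush */
	return (pfn >> mask) << (mask + PAGE_SHIFT);
}

int main(void)
{
	unsigned long pages, mask;
	/* Four pages starting at pfn 5 are not 4-page aligned, so the mask
	 * widens from 2 to 4: one 16-page flush rooted at pfn 0 covers the
	 * whole range pfn 5..8. */
	unsigned long addr = psi_aligned_address(5UL << PAGE_SHIFT,
						 (8UL << PAGE_SHIFT) + 0xfff,
						 &pages, &mask);

	printf("addr=0x%lx aligned_pages=%lu mask=%lu\n", addr, pages, mask);
	return 0;
}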
-7
drivers/iommu/intel/debugfs.c
··· 706 706 dmar_latency_disable(iommu, DMAR_LATENCY_INV_IOTLB); 707 707 dmar_latency_disable(iommu, DMAR_LATENCY_INV_DEVTLB); 708 708 dmar_latency_disable(iommu, DMAR_LATENCY_INV_IEC); 709 - dmar_latency_disable(iommu, DMAR_LATENCY_PRQ); 710 709 } 711 710 rcu_read_unlock(); 712 711 break; ··· 725 726 rcu_read_lock(); 726 727 for_each_active_iommu(iommu, drhd) 727 728 dmar_latency_enable(iommu, DMAR_LATENCY_INV_IEC); 728 - rcu_read_unlock(); 729 - break; 730 - case 4: 731 - rcu_read_lock(); 732 - for_each_active_iommu(iommu, drhd) 733 - dmar_latency_enable(iommu, DMAR_LATENCY_PRQ); 734 729 rcu_read_unlock(); 735 730 break; 736 731 default:
+16 -10
drivers/iommu/intel/dmar.c
··· 32 32 33 33 #include "iommu.h" 34 34 #include "../irq_remapping.h" 35 + #include "../iommu-pages.h" 35 36 #include "perf.h" 36 37 #include "trace.h" 37 38 #include "perfmon.h" ··· 1068 1067 goto error_free_seq_id; 1069 1068 } 1070 1069 1071 - err = -EINVAL; 1072 1070 if (!cap_sagaw(iommu->cap) && 1073 1071 (!ecap_smts(iommu->ecap) || ecap_slts(iommu->ecap))) { 1074 1072 pr_info("%s: No supported address widths. Not attempting DMA translation.\n", ··· 1187 1187 } 1188 1188 1189 1189 if (iommu->qi) { 1190 - free_page((unsigned long)iommu->qi->desc); 1190 + iommu_free_page(iommu->qi->desc); 1191 1191 kfree(iommu->qi->desc_status); 1192 1192 kfree(iommu->qi); 1193 1193 } ··· 1755 1755 int dmar_enable_qi(struct intel_iommu *iommu) 1756 1756 { 1757 1757 struct q_inval *qi; 1758 - struct page *desc_page; 1758 + void *desc; 1759 + int order; 1759 1760 1760 1761 if (!ecap_qis(iommu->ecap)) 1761 1762 return -ENOENT; ··· 1777 1776 * Need two pages to accommodate 256 descriptors of 256 bits each 1778 1777 * if the remapping hardware supports scalable mode translation. 1779 1778 */ 1780 - desc_page = alloc_pages_node(iommu->node, GFP_ATOMIC | __GFP_ZERO, 1781 - !!ecap_smts(iommu->ecap)); 1782 - if (!desc_page) { 1779 + order = ecap_smts(iommu->ecap) ? 1 : 0; 1780 + desc = iommu_alloc_pages_node(iommu->node, GFP_ATOMIC, order); 1781 + if (!desc) { 1783 1782 kfree(qi); 1784 1783 iommu->qi = NULL; 1785 1784 return -ENOMEM; 1786 1785 } 1787 1786 1788 - qi->desc = page_address(desc_page); 1787 + qi->desc = desc; 1789 1788 1790 1789 qi->desc_status = kcalloc(QI_LENGTH, sizeof(int), GFP_ATOMIC); 1791 1790 if (!qi->desc_status) { 1792 - free_page((unsigned long) qi->desc); 1791 + iommu_free_page(qi->desc); 1793 1792 kfree(qi); 1794 1793 iommu->qi = NULL; 1795 1794 return -ENOMEM; ··· 2123 2122 return ret; 2124 2123 } 2125 2124 2126 - int __init enable_drhd_fault_handling(void) 2125 + int enable_drhd_fault_handling(unsigned int cpu) 2127 2126 { 2128 2127 struct dmar_drhd_unit *drhd; 2129 2128 struct intel_iommu *iommu; ··· 2133 2132 */ 2134 2133 for_each_iommu(iommu, drhd) { 2135 2134 u32 fault_status; 2136 - int ret = dmar_set_interrupt(iommu); 2135 + int ret; 2136 + 2137 + if (iommu->irq || iommu->node != cpu_to_node(cpu)) 2138 + continue; 2139 + 2140 + ret = dmar_set_interrupt(iommu); 2137 2141 2138 2142 if (ret) { 2139 2143 pr_err("DRHD %Lx: failed to enable fault, interrupt, ret %d\n",
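The order chosen in dmar_enable_qi() above follows directly from the descriptor sizes: the queue holds QI_LENGTH (256) descriptors, a scalable-mode descriptor is 256 bits and a legacy descriptor is 128 bits. A quick stand-alone check of that arithmetic, assuming 4 KiB pages:

#include <stdio.h>

int main(void)
{
	unsigned int qi_length = 256;		/* descriptors per invalidation queue */
	unsigned int smts_bytes = 256 / 8;	/* scalable mode: 256-bit descriptors */
	unsigned int legacy_bytes = 128 / 8;	/* legacy mode: 128-bit descriptors */

	/* 256 * 32 = 8192 bytes -> two 4 KiB pages -> order 1,
	 * 256 * 16 = 4096 bytes -> one page        -> order 0. */
	printf("scalable-mode queue: %u bytes, order %u\n",
	       qi_length * smts_bytes, qi_length * smts_bytes > 4096 ? 1 : 0);
	printf("legacy queue:        %u bytes, order %u\n",
	       qi_length * legacy_bytes, qi_length * legacy_bytes > 4096 ? 1 : 0);
	return 0;
}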
+61 -322
drivers/iommu/intel/iommu.c
··· 27 27 #include "iommu.h" 28 28 #include "../dma-iommu.h" 29 29 #include "../irq_remapping.h" 30 + #include "../iommu-pages.h" 30 31 #include "pasid.h" 31 32 #include "cap_audit.h" 32 33 #include "perfmon.h" ··· 54 53 #define DOMAIN_MAX_PFN(gaw) ((unsigned long) min_t(uint64_t, \ 55 54 __DOMAIN_MAX_PFN(gaw), (unsigned long)-1)) 56 55 #define DOMAIN_MAX_ADDR(gaw) (((uint64_t)__DOMAIN_MAX_PFN(gaw)) << VTD_PAGE_SHIFT) 57 - 58 - /* IO virtual address start page frame number */ 59 - #define IOVA_START_PFN (1) 60 - 61 - #define IOVA_PFN(addr) ((addr) >> PAGE_SHIFT) 62 56 63 57 static void __init check_tylersburg_isoch(void); 64 58 static int rwbf_quirk; ··· 217 221 int intel_iommu_enabled = 0; 218 222 EXPORT_SYMBOL_GPL(intel_iommu_enabled); 219 223 220 - static int dmar_map_gfx = 1; 221 224 static int intel_iommu_superpage = 1; 222 225 static int iommu_identity_mapping; 223 226 static int iommu_skip_te_disable; 227 + static int disable_igfx_iommu; 224 228 225 - #define IDENTMAP_GFX 2 226 229 #define IDENTMAP_AZALIA 4 227 230 228 231 const struct iommu_ops intel_iommu_ops; ··· 260 265 no_platform_optin = 1; 261 266 pr_info("IOMMU disabled\n"); 262 267 } else if (!strncmp(str, "igfx_off", 8)) { 263 - dmar_map_gfx = 0; 268 + disable_igfx_iommu = 1; 264 269 pr_info("Disable GFX device mapping\n"); 265 270 } else if (!strncmp(str, "forcedac", 8)) { 266 271 pr_warn("intel_iommu=forcedac deprecated; use iommu.forcedac instead\n"); ··· 292 297 return 1; 293 298 } 294 299 __setup("intel_iommu=", intel_iommu_setup); 295 - 296 - void *alloc_pgtable_page(int node, gfp_t gfp) 297 - { 298 - struct page *page; 299 - void *vaddr = NULL; 300 - 301 - page = alloc_pages_node(node, gfp | __GFP_ZERO, 0); 302 - if (page) 303 - vaddr = page_address(page); 304 - return vaddr; 305 - } 306 - 307 - void free_pgtable_page(void *vaddr) 308 - { 309 - free_page((unsigned long)vaddr); 310 - } 311 300 312 301 static int domain_type_is_si(struct dmar_domain *domain) 313 302 { ··· 524 545 if (!alloc) 525 546 return NULL; 526 547 527 - context = alloc_pgtable_page(iommu->node, GFP_ATOMIC); 548 + context = iommu_alloc_page_node(iommu->node, GFP_ATOMIC); 528 549 if (!context) 529 550 return NULL; 530 551 ··· 698 719 for (i = 0; i < ROOT_ENTRY_NR; i++) { 699 720 context = iommu_context_addr(iommu, i, 0, 0); 700 721 if (context) 701 - free_pgtable_page(context); 722 + iommu_free_page(context); 702 723 703 724 if (!sm_supported(iommu)) 704 725 continue; 705 726 706 727 context = iommu_context_addr(iommu, i, 0x80, 0); 707 728 if (context) 708 - free_pgtable_page(context); 729 + iommu_free_page(context); 709 730 } 710 731 711 - free_pgtable_page(iommu->root_entry); 732 + iommu_free_page(iommu->root_entry); 712 733 iommu->root_entry = NULL; 713 734 } 714 735 ··· 844 865 break; 845 866 846 867 if (!dma_pte_present(pte)) { 847 - uint64_t pteval; 868 + uint64_t pteval, tmp; 848 869 849 - tmp_page = alloc_pgtable_page(domain->nid, gfp); 870 + tmp_page = iommu_alloc_page_node(domain->nid, gfp); 850 871 851 872 if (!tmp_page) 852 873 return NULL; ··· 856 877 if (domain->use_first_level) 857 878 pteval |= DMA_FL_PTE_XD | DMA_FL_PTE_US | DMA_FL_PTE_ACCESS; 858 879 859 - if (cmpxchg64(&pte->val, 0ULL, pteval)) 880 + tmp = 0ULL; 881 + if (!try_cmpxchg64(&pte->val, &tmp, pteval)) 860 882 /* Someone else set it while we were thinking; use theirs. 
*/ 861 - free_pgtable_page(tmp_page); 883 + iommu_free_page(tmp_page); 862 884 else 863 885 domain_flush_cache(domain, pte, sizeof(*pte)); 864 886 } ··· 972 992 last_pfn < level_pfn + level_size(level) - 1)) { 973 993 dma_clear_pte(pte); 974 994 domain_flush_cache(domain, pte, sizeof(*pte)); 975 - free_pgtable_page(level_pte); 995 + iommu_free_page(level_pte); 976 996 } 977 997 next: 978 998 pfn += level_size(level); ··· 996 1016 997 1017 /* free pgd */ 998 1018 if (start_pfn == 0 && last_pfn == DOMAIN_MAX_PFN(domain->gaw)) { 999 - free_pgtable_page(domain->pgd); 1019 + iommu_free_page(domain->pgd); 1000 1020 domain->pgd = NULL; 1001 1021 } 1002 1022 } ··· 1098 1118 { 1099 1119 struct root_entry *root; 1100 1120 1101 - root = alloc_pgtable_page(iommu->node, GFP_ATOMIC); 1121 + root = iommu_alloc_page_node(iommu->node, GFP_ATOMIC); 1102 1122 if (!root) { 1103 1123 pr_err("Allocating root entry for %s failed\n", 1104 1124 iommu->name); ··· 1374 1394 quirk_extra_dev_tlb_flush(info, addr, mask, IOMMU_NO_PASID, qdep); 1375 1395 } 1376 1396 1377 - static void iommu_flush_dev_iotlb(struct dmar_domain *domain, 1378 - u64 addr, unsigned mask) 1379 - { 1380 - struct dev_pasid_info *dev_pasid; 1381 - struct device_domain_info *info; 1382 - unsigned long flags; 1383 - 1384 - if (!domain->has_iotlb_device) 1385 - return; 1386 - 1387 - spin_lock_irqsave(&domain->lock, flags); 1388 - list_for_each_entry(info, &domain->devices, link) 1389 - __iommu_flush_dev_iotlb(info, addr, mask); 1390 - 1391 - list_for_each_entry(dev_pasid, &domain->dev_pasids, link_domain) { 1392 - info = dev_iommu_priv_get(dev_pasid->dev); 1393 - 1394 - if (!info->ats_enabled) 1395 - continue; 1396 - 1397 - qi_flush_dev_iotlb_pasid(info->iommu, 1398 - PCI_DEVID(info->bus, info->devfn), 1399 - info->pfsid, dev_pasid->pasid, 1400 - info->ats_qdep, addr, 1401 - mask); 1402 - } 1403 - spin_unlock_irqrestore(&domain->lock, flags); 1404 - } 1405 - 1406 - static void domain_flush_pasid_iotlb(struct intel_iommu *iommu, 1407 - struct dmar_domain *domain, u64 addr, 1408 - unsigned long npages, bool ih) 1409 - { 1410 - u16 did = domain_id_iommu(domain, iommu); 1411 - struct dev_pasid_info *dev_pasid; 1412 - unsigned long flags; 1413 - 1414 - spin_lock_irqsave(&domain->lock, flags); 1415 - list_for_each_entry(dev_pasid, &domain->dev_pasids, link_domain) 1416 - qi_flush_piotlb(iommu, did, dev_pasid->pasid, addr, npages, ih); 1417 - 1418 - if (!list_empty(&domain->devices)) 1419 - qi_flush_piotlb(iommu, did, IOMMU_NO_PASID, addr, npages, ih); 1420 - spin_unlock_irqrestore(&domain->lock, flags); 1421 - } 1422 - 1423 - static void __iommu_flush_iotlb_psi(struct intel_iommu *iommu, u16 did, 1424 - unsigned long pfn, unsigned int pages, 1425 - int ih) 1426 - { 1427 - unsigned int aligned_pages = __roundup_pow_of_two(pages); 1428 - unsigned long bitmask = aligned_pages - 1; 1429 - unsigned int mask = ilog2(aligned_pages); 1430 - u64 addr = (u64)pfn << VTD_PAGE_SHIFT; 1431 - 1432 - /* 1433 - * PSI masks the low order bits of the base address. If the 1434 - * address isn't aligned to the mask, then compute a mask value 1435 - * needed to ensure the target range is flushed. 1436 - */ 1437 - if (unlikely(bitmask & pfn)) { 1438 - unsigned long end_pfn = pfn + pages - 1, shared_bits; 1439 - 1440 - /* 1441 - * Since end_pfn <= pfn + bitmask, the only way bits 1442 - * higher than bitmask can differ in pfn and end_pfn is 1443 - * by carrying. 
This means after masking out bitmask, 1444 - * high bits starting with the first set bit in 1445 - * shared_bits are all equal in both pfn and end_pfn. 1446 - */ 1447 - shared_bits = ~(pfn ^ end_pfn) & ~bitmask; 1448 - mask = shared_bits ? __ffs(shared_bits) : BITS_PER_LONG; 1449 - } 1450 - 1451 - /* 1452 - * Fallback to domain selective flush if no PSI support or 1453 - * the size is too big. 1454 - */ 1455 - if (!cap_pgsel_inv(iommu->cap) || mask > cap_max_amask_val(iommu->cap)) 1456 - iommu->flush.flush_iotlb(iommu, did, 0, 0, 1457 - DMA_TLB_DSI_FLUSH); 1458 - else 1459 - iommu->flush.flush_iotlb(iommu, did, addr | ih, mask, 1460 - DMA_TLB_PSI_FLUSH); 1461 - } 1462 - 1463 - static void iommu_flush_iotlb_psi(struct intel_iommu *iommu, 1464 - struct dmar_domain *domain, 1465 - unsigned long pfn, unsigned int pages, 1466 - int ih, int map) 1467 - { 1468 - unsigned int aligned_pages = __roundup_pow_of_two(pages); 1469 - unsigned int mask = ilog2(aligned_pages); 1470 - uint64_t addr = (uint64_t)pfn << VTD_PAGE_SHIFT; 1471 - u16 did = domain_id_iommu(domain, iommu); 1472 - 1473 - if (WARN_ON(!pages)) 1474 - return; 1475 - 1476 - if (ih) 1477 - ih = 1 << 6; 1478 - 1479 - if (domain->use_first_level) 1480 - domain_flush_pasid_iotlb(iommu, domain, addr, pages, ih); 1481 - else 1482 - __iommu_flush_iotlb_psi(iommu, did, pfn, pages, ih); 1483 - 1484 - /* 1485 - * In caching mode, changes of pages from non-present to present require 1486 - * flush. However, device IOTLB doesn't need to be flushed in this case. 1487 - */ 1488 - if (!cap_caching_mode(iommu->cap) || !map) 1489 - iommu_flush_dev_iotlb(domain, addr, mask); 1490 - } 1491 - 1492 - /* Notification for newly created mappings */ 1493 - static void __mapping_notify_one(struct intel_iommu *iommu, struct dmar_domain *domain, 1494 - unsigned long pfn, unsigned int pages) 1495 - { 1496 - /* 1497 - * It's a non-present to present mapping. Only flush if caching mode 1498 - * and second level. 1499 - */ 1500 - if (cap_caching_mode(iommu->cap) && !domain->use_first_level) 1501 - iommu_flush_iotlb_psi(iommu, domain, pfn, pages, 0, 1); 1502 - else 1503 - iommu_flush_write_buffer(iommu); 1504 - } 1505 - 1506 - /* 1507 - * Flush the relevant caches in nested translation if the domain 1508 - * also serves as a parent 1509 - */ 1510 - static void parent_domain_flush(struct dmar_domain *domain, 1511 - unsigned long pfn, 1512 - unsigned long pages, int ih) 1513 - { 1514 - struct dmar_domain *s1_domain; 1515 - 1516 - spin_lock(&domain->s1_lock); 1517 - list_for_each_entry(s1_domain, &domain->s1_domains, s2_link) { 1518 - struct device_domain_info *device_info; 1519 - struct iommu_domain_info *info; 1520 - unsigned long flags; 1521 - unsigned long i; 1522 - 1523 - xa_for_each(&s1_domain->iommu_array, i, info) 1524 - __iommu_flush_iotlb_psi(info->iommu, info->did, 1525 - pfn, pages, ih); 1526 - 1527 - if (!s1_domain->has_iotlb_device) 1528 - continue; 1529 - 1530 - spin_lock_irqsave(&s1_domain->lock, flags); 1531 - list_for_each_entry(device_info, &s1_domain->devices, link) 1532 - /* 1533 - * Address translation cache in device side caches the 1534 - * result of nested translation. There is no easy way 1535 - * to identify the exact set of nested translations 1536 - * affected by a change in S2. So just flush the entire 1537 - * device cache. 
1538 - */ 1539 - __iommu_flush_dev_iotlb(device_info, 0, 1540 - MAX_AGAW_PFN_WIDTH); 1541 - spin_unlock_irqrestore(&s1_domain->lock, flags); 1542 - } 1543 - spin_unlock(&domain->s1_lock); 1544 - } 1545 - 1546 1397 static void intel_flush_iotlb_all(struct iommu_domain *domain) 1547 1398 { 1548 - struct dmar_domain *dmar_domain = to_dmar_domain(domain); 1549 - struct iommu_domain_info *info; 1550 - unsigned long idx; 1551 - 1552 - xa_for_each(&dmar_domain->iommu_array, idx, info) { 1553 - struct intel_iommu *iommu = info->iommu; 1554 - u16 did = domain_id_iommu(dmar_domain, iommu); 1555 - 1556 - if (dmar_domain->use_first_level) 1557 - domain_flush_pasid_iotlb(iommu, dmar_domain, 0, -1, 0); 1558 - else 1559 - iommu->flush.flush_iotlb(iommu, did, 0, 0, 1560 - DMA_TLB_DSI_FLUSH); 1561 - 1562 - if (!cap_caching_mode(iommu->cap)) 1563 - iommu_flush_dev_iotlb(dmar_domain, 0, MAX_AGAW_PFN_WIDTH); 1564 - } 1565 - 1566 - if (dmar_domain->nested_parent) 1567 - parent_domain_flush(dmar_domain, 0, -1, 0); 1399 + cache_tag_flush_all(to_dmar_domain(domain)); 1568 1400 } 1569 1401 1570 1402 static void iommu_disable_protect_mem_regions(struct intel_iommu *iommu) ··· 1542 1750 domain->has_iotlb_device = false; 1543 1751 INIT_LIST_HEAD(&domain->devices); 1544 1752 INIT_LIST_HEAD(&domain->dev_pasids); 1753 + INIT_LIST_HEAD(&domain->cache_tags); 1545 1754 spin_lock_init(&domain->lock); 1755 + spin_lock_init(&domain->cache_lock); 1546 1756 xa_init(&domain->iommu_array); 1547 1757 1548 1758 return domain; ··· 1555 1761 struct iommu_domain_info *info, *curr; 1556 1762 unsigned long ndomains; 1557 1763 int num, ret = -ENOSPC; 1764 + 1765 + if (domain->domain.type == IOMMU_DOMAIN_SVA) 1766 + return 0; 1558 1767 1559 1768 info = kzalloc(sizeof(*info), GFP_KERNEL); 1560 1769 if (!info) ··· 1606 1809 { 1607 1810 struct iommu_domain_info *info; 1608 1811 1812 + if (domain->domain.type == IOMMU_DOMAIN_SVA) 1813 + return; 1814 + 1609 1815 spin_lock(&iommu->lock); 1610 1816 info = xa_load(&domain->iommu_array, iommu->seq_id); 1611 1817 if (--info->refcnt == 0) { ··· 1641 1841 LIST_HEAD(freelist); 1642 1842 1643 1843 domain_unmap(domain, 0, DOMAIN_MAX_PFN(domain->gaw), &freelist); 1644 - put_pages_list(&freelist); 1844 + iommu_put_pages_list(&freelist); 1645 1845 } 1646 1846 1647 1847 if (WARN_ON(!list_empty(&domain->devices))) ··· 1788 1988 domain_context_mapping_cb, domain); 1789 1989 } 1790 1990 1791 - /* Returns a number of VTD pages, but aligned to MM page size */ 1792 - static unsigned long aligned_nrpages(unsigned long host_addr, size_t size) 1793 - { 1794 - host_addr &= ~PAGE_MASK; 1795 - return PAGE_ALIGN(host_addr + size) >> VTD_PAGE_SHIFT; 1796 - } 1797 - 1798 1991 /* Return largest possible superpage level for a given mapping */ 1799 1992 static int hardware_largepage_caps(struct dmar_domain *domain, unsigned long iov_pfn, 1800 1993 unsigned long phy_pfn, unsigned long pages) ··· 1824 2031 unsigned long end_pfn, int level) 1825 2032 { 1826 2033 unsigned long lvl_pages = lvl_to_nr_pages(level); 1827 - struct iommu_domain_info *info; 1828 2034 struct dma_pte *pte = NULL; 1829 - unsigned long i; 1830 2035 1831 2036 while (start_pfn <= end_pfn) { 1832 2037 if (!pte) ··· 1836 2045 start_pfn + lvl_pages - 1, 1837 2046 level + 1); 1838 2047 1839 - xa_for_each(&domain->iommu_array, i, info) 1840 - iommu_flush_iotlb_psi(info->iommu, domain, 1841 - start_pfn, lvl_pages, 1842 - 0, 0); 1843 - if (domain->nested_parent) 1844 - parent_domain_flush(domain, start_pfn, 1845 - lvl_pages, 0); 2048 + cache_tag_flush_range(domain, 
start_pfn << VTD_PAGE_SHIFT, 2049 + end_pfn << VTD_PAGE_SHIFT, 0); 1846 2050 } 1847 2051 1848 2052 pte++; ··· 1914 2128 /* We don't need lock here, nobody else 1915 2129 * touches the iova range 1916 2130 */ 1917 - tmp = cmpxchg64_local(&pte->val, 0ULL, pteval); 1918 - if (tmp) { 2131 + tmp = 0ULL; 2132 + if (!try_cmpxchg64_local(&pte->val, &tmp, pteval)) { 1919 2133 static int dumps = 5; 1920 2134 pr_crit("ERROR: DMA PTE for vPFN 0x%lx already set (to %llx not %llx)\n", 1921 2135 iov_pfn, tmp, (unsigned long long)pteval); ··· 2113 2327 ret = domain_attach_iommu(domain, iommu); 2114 2328 if (ret) 2115 2329 return ret; 2330 + 2331 + ret = cache_tag_assign_domain(domain, dev, IOMMU_NO_PASID); 2332 + if (ret) { 2333 + domain_detach_iommu(domain, iommu); 2334 + return ret; 2335 + } 2336 + 2116 2337 info->domain = domain; 2117 2338 spin_lock_irqsave(&domain->lock, flags); 2118 2339 list_add(&info->link, &domain->devices); ··· 2194 2401 struct pci_dev *pdev = to_pci_dev(dev); 2195 2402 2196 2403 if ((iommu_identity_mapping & IDENTMAP_AZALIA) && IS_AZALIA(pdev)) 2197 - return IOMMU_DOMAIN_IDENTITY; 2198 - 2199 - if ((iommu_identity_mapping & IDENTMAP_GFX) && IS_GFX_DEVICE(pdev)) 2200 2404 return IOMMU_DOMAIN_IDENTITY; 2201 2405 } 2202 2406 ··· 2287 2497 if (!old_ce) 2288 2498 goto out; 2289 2499 2290 - new_ce = alloc_pgtable_page(iommu->node, GFP_KERNEL); 2500 + new_ce = iommu_alloc_page_node(iommu->node, GFP_KERNEL); 2291 2501 if (!new_ce) 2292 2502 goto out_unmap; 2293 2503 ··· 2495 2705 iommu_set_root_entry(iommu); 2496 2706 } 2497 2707 2498 - if (!dmar_map_gfx) 2499 - iommu_identity_mapping |= IDENTMAP_GFX; 2500 - 2501 2708 check_tylersburg_isoch(); 2502 2709 2503 2710 ret = si_domain_init(hw_pass_through); ··· 2585 2798 /* This IOMMU has *only* gfx devices. 
Either bypass it or 2586 2799 set the gfx_mapped flag, as appropriate */ 2587 2800 drhd->gfx_dedicated = 1; 2588 - if (!dmar_map_gfx) 2801 + if (disable_igfx_iommu) 2589 2802 drhd->ignored = 1; 2590 2803 } 2591 2804 } ··· 3201 3414 case MEM_OFFLINE: 3202 3415 case MEM_CANCEL_ONLINE: 3203 3416 { 3204 - struct dmar_drhd_unit *drhd; 3205 - struct intel_iommu *iommu; 3206 3417 LIST_HEAD(freelist); 3207 3418 3208 3419 domain_unmap(si_domain, start_vpfn, last_vpfn, &freelist); 3209 - 3210 - rcu_read_lock(); 3211 - for_each_active_iommu(iommu, drhd) 3212 - iommu_flush_iotlb_psi(iommu, si_domain, 3213 - start_vpfn, mhp->nr_pages, 3214 - list_empty(&freelist), 0); 3215 - rcu_read_unlock(); 3216 - put_pages_list(&freelist); 3420 + iommu_put_pages_list(&freelist); 3217 3421 } 3218 3422 break; 3219 3423 } ··· 3593 3815 list_del(&info->link); 3594 3816 spin_unlock_irqrestore(&info->domain->lock, flags); 3595 3817 3818 + cache_tag_unassign_domain(info->domain, dev, IOMMU_NO_PASID); 3596 3819 domain_detach_iommu(info->domain, iommu); 3597 3820 info->domain = NULL; 3598 3821 } ··· 3612 3833 domain->max_addr = 0; 3613 3834 3614 3835 /* always allocate the top pgd */ 3615 - domain->pgd = alloc_pgtable_page(domain->nid, GFP_ATOMIC); 3836 + domain->pgd = iommu_alloc_page_node(domain->nid, GFP_ATOMIC); 3616 3837 if (!domain->pgd) 3617 3838 return -ENOMEM; 3618 3839 domain_flush_cache(domain, domain->pgd, PAGE_SIZE); ··· 3661 3882 return domain; 3662 3883 case IOMMU_DOMAIN_IDENTITY: 3663 3884 return &si_domain->domain; 3664 - case IOMMU_DOMAIN_SVA: 3665 - return intel_svm_domain_alloc(); 3666 3885 default: 3667 3886 return NULL; 3668 3887 } ··· 3764 3987 pte = dmar_domain->pgd; 3765 3988 if (dma_pte_present(pte)) { 3766 3989 dmar_domain->pgd = phys_to_virt(dma_pte_addr(pte)); 3767 - free_pgtable_page(pte); 3990 + iommu_free_page(pte); 3768 3991 } 3769 3992 dmar_domain->agaw--; 3770 3993 } ··· 3899 4122 static void intel_iommu_tlb_sync(struct iommu_domain *domain, 3900 4123 struct iommu_iotlb_gather *gather) 3901 4124 { 3902 - struct dmar_domain *dmar_domain = to_dmar_domain(domain); 3903 - unsigned long iova_pfn = IOVA_PFN(gather->start); 3904 - size_t size = gather->end - gather->start; 3905 - struct iommu_domain_info *info; 3906 - unsigned long start_pfn; 3907 - unsigned long nrpages; 3908 - unsigned long i; 3909 - 3910 - nrpages = aligned_nrpages(gather->start, size); 3911 - start_pfn = mm_to_dma_pfn_start(iova_pfn); 3912 - 3913 - xa_for_each(&dmar_domain->iommu_array, i, info) 3914 - iommu_flush_iotlb_psi(info->iommu, dmar_domain, 3915 - start_pfn, nrpages, 3916 - list_empty(&gather->freelist), 0); 3917 - 3918 - if (dmar_domain->nested_parent) 3919 - parent_domain_flush(dmar_domain, start_pfn, nrpages, 3920 - list_empty(&gather->freelist)); 3921 - put_pages_list(&gather->freelist); 4125 + cache_tag_flush_range(to_dmar_domain(domain), gather->start, 4126 + gather->end, list_empty(&gather->freelist)); 4127 + iommu_put_pages_list(&gather->freelist); 3922 4128 } 3923 4129 3924 4130 static phys_addr_t intel_iommu_iova_to_phys(struct iommu_domain *domain, ··· 4110 4350 intel_iommu_debugfs_remove_dev(info); 4111 4351 kfree(info); 4112 4352 set_dma_ops(dev, NULL); 4113 - } 4114 - 4115 - static void intel_iommu_probe_finalize(struct device *dev) 4116 - { 4117 - set_dma_ops(dev, NULL); 4118 - iommu_setup_dma_ops(dev, 0, U64_MAX); 4119 4353 } 4120 4354 4121 4355 static void intel_iommu_get_resv_regions(struct device *device, ··· 4333 4579 static int intel_iommu_iotlb_sync_map(struct iommu_domain *domain, 4334 4580 
unsigned long iova, size_t size) 4335 4581 { 4336 - struct dmar_domain *dmar_domain = to_dmar_domain(domain); 4337 - unsigned long pages = aligned_nrpages(iova, size); 4338 - unsigned long pfn = iova >> VTD_PAGE_SHIFT; 4339 - struct iommu_domain_info *info; 4340 - unsigned long i; 4582 + cache_tag_flush_range_np(to_dmar_domain(domain), iova, iova + size - 1); 4341 4583 4342 - xa_for_each(&dmar_domain->iommu_array, i, info) 4343 - __mapping_notify_one(info->iommu, dmar_domain, pfn, pages); 4344 4584 return 0; 4345 4585 } 4346 4586 4347 - static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid) 4587 + static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid, 4588 + struct iommu_domain *domain) 4348 4589 { 4349 4590 struct device_domain_info *info = dev_iommu_priv_get(dev); 4591 + struct dmar_domain *dmar_domain = to_dmar_domain(domain); 4350 4592 struct dev_pasid_info *curr, *dev_pasid = NULL; 4351 4593 struct intel_iommu *iommu = info->iommu; 4352 - struct dmar_domain *dmar_domain; 4353 - struct iommu_domain *domain; 4354 4594 unsigned long flags; 4355 4595 4356 - domain = iommu_get_domain_for_dev_pasid(dev, pasid, 0); 4357 - if (WARN_ON_ONCE(!domain)) 4358 - goto out_tear_down; 4359 - 4360 - /* 4361 - * The SVA implementation needs to handle its own stuffs like the mm 4362 - * notification. Before consolidating that code into iommu core, let 4363 - * the intel sva code handle it. 4364 - */ 4365 - if (domain->type == IOMMU_DOMAIN_SVA) { 4366 - intel_svm_remove_dev_pasid(dev, pasid); 4367 - goto out_tear_down; 4368 - } 4369 - 4370 - dmar_domain = to_dmar_domain(domain); 4371 4596 spin_lock_irqsave(&dmar_domain->lock, flags); 4372 4597 list_for_each_entry(curr, &dmar_domain->dev_pasids, link_domain) { 4373 4598 if (curr->dev == dev && curr->pasid == pasid) { ··· 4358 4625 WARN_ON_ONCE(!dev_pasid); 4359 4626 spin_unlock_irqrestore(&dmar_domain->lock, flags); 4360 4627 4628 + cache_tag_unassign_domain(dmar_domain, dev, pasid); 4361 4629 domain_detach_iommu(dmar_domain, iommu); 4362 4630 intel_iommu_debugfs_remove_dev_pasid(dev_pasid); 4363 4631 kfree(dev_pasid); 4364 - out_tear_down: 4365 4632 intel_pasid_tear_down_entry(iommu, dev, pasid, false); 4366 4633 intel_drain_pasid_prq(dev, pasid); 4367 4634 } ··· 4397 4664 if (ret) 4398 4665 goto out_free; 4399 4666 4667 + ret = cache_tag_assign_domain(dmar_domain, dev, pasid); 4668 + if (ret) 4669 + goto out_detach_iommu; 4670 + 4400 4671 if (domain_type_is_si(dmar_domain)) 4401 4672 ret = intel_pasid_setup_pass_through(iommu, dev, pasid); 4402 4673 else if (dmar_domain->use_first_level) ··· 4410 4673 ret = intel_pasid_setup_second_level(iommu, dmar_domain, 4411 4674 dev, pasid); 4412 4675 if (ret) 4413 - goto out_detach_iommu; 4676 + goto out_unassign_tag; 4414 4677 4415 4678 dev_pasid->dev = dev; 4416 4679 dev_pasid->pasid = pasid; ··· 4422 4685 intel_iommu_debugfs_create_dev_pasid(dev_pasid); 4423 4686 4424 4687 return 0; 4688 + out_unassign_tag: 4689 + cache_tag_unassign_domain(dmar_domain, dev, pasid); 4425 4690 out_detach_iommu: 4426 4691 domain_detach_iommu(dmar_domain, iommu); 4427 4692 out_free: ··· 4580 4841 .hw_info = intel_iommu_hw_info, 4581 4842 .domain_alloc = intel_iommu_domain_alloc, 4582 4843 .domain_alloc_user = intel_iommu_domain_alloc_user, 4844 + .domain_alloc_sva = intel_svm_domain_alloc, 4583 4845 .probe_device = intel_iommu_probe_device, 4584 - .probe_finalize = intel_iommu_probe_finalize, 4585 4846 .release_device = intel_iommu_release_device, 4586 4847 .get_resv_regions = 
intel_iommu_get_resv_regions, 4587 4848 .device_group = intel_iommu_device_group, ··· 4614 4875 return; 4615 4876 4616 4877 pci_info(dev, "Disabling IOMMU for graphics on this chipset\n"); 4617 - dmar_map_gfx = 0; 4878 + disable_igfx_iommu = 1; 4618 4879 } 4619 4880 4620 4881 /* G4x/GM45 integrated gfx dmar support is totally busted. */ ··· 4695 4956 4696 4957 if (!(ggc & GGC_MEMORY_VT_ENABLED)) { 4697 4958 pci_info(dev, "BIOS has allocated no shadow GTT; disabling IOMMU for graphics\n"); 4698 - dmar_map_gfx = 0; 4699 - } else if (dmar_map_gfx) { 4959 + disable_igfx_iommu = 1; 4960 + } else if (!disable_igfx_iommu) { 4700 4961 /* we have to ensure the gfx device is idle before we flush */ 4701 4962 pci_info(dev, "Disabling batched IOTLB flush on Ironlake\n"); 4702 4963 iommu_set_dma_strict();
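Two hunks above switch cmpxchg64()/cmpxchg64_local() to the try_ variants. The behavioural difference is worth spelling out: the try_ form returns a bool and writes the value it actually observed back through its second argument, so the caller no longer compares old values by hand. A user-space analogue with C11 atomics (the PTE value is made up for illustration):

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	_Atomic uint64_t pte_val = 0;	/* an empty PTE slot */
	uint64_t expected = 0;
	uint64_t new_val = 0x1003;	/* illustrative page address | R/W bits */

	if (atomic_compare_exchange_strong(&pte_val, &expected, new_val)) {
		printf("installed pte 0x%llx\n", (unsigned long long)new_val);
	} else {
		/* Lost the race: "expected" now holds the winner's value,
		 * just as try_cmpxchg64() leaves the old value in tmp. */
		printf("lost race, slot holds 0x%llx\n",
		       (unsigned long long)expected);
	}
	return 0;
}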
+61 -27
drivers/iommu/intel/iommu.h
··· 35 35 #define VTD_PAGE_MASK (((u64)-1) << VTD_PAGE_SHIFT) 36 36 #define VTD_PAGE_ALIGN(addr) (((addr) + VTD_PAGE_SIZE - 1) & VTD_PAGE_MASK) 37 37 38 + #define IOVA_PFN(addr) ((addr) >> PAGE_SHIFT) 39 + 38 40 #define VTD_STRIDE_SHIFT (9) 39 41 #define VTD_STRIDE_MASK (((u64)-1) << VTD_STRIDE_SHIFT) 40 42 ··· 457 455 458 456 /* Page group response descriptor QW0 */ 459 457 #define QI_PGRP_PASID_P(p) (((u64)(p)) << 4) 460 - #define QI_PGRP_PDP(p) (((u64)(p)) << 5) 461 458 #define QI_PGRP_RESP_CODE(res) (((u64)(res)) << 12) 462 459 #define QI_PGRP_DID(rid) (((u64)(rid)) << 16) 463 460 #define QI_PGRP_PASID(pasid) (((u64)(pasid)) << 32) ··· 608 607 struct list_head devices; /* all devices' list */ 609 608 struct list_head dev_pasids; /* all attached pasids */ 610 609 610 + spinlock_t cache_lock; /* Protect the cache tag list */ 611 + struct list_head cache_tags; /* Cache tag list */ 612 + 611 613 int iommu_superpage;/* Level of superpages supported: 612 614 0 == 4KiB (no superpages), 1 == 2MiB, 613 615 2 == 1GiB, 3 == 512GiB, 4 == 1TiB */ ··· 647 643 struct iommu_hwpt_vtd_s1 s1_cfg; 648 644 /* link to parent domain siblings */ 649 645 struct list_head s2_link; 646 + }; 647 + 648 + /* SVA domain */ 649 + struct { 650 + struct mmu_notifier notifier; 650 651 }; 651 652 }; 652 653 ··· 1047 1038 context->lo |= BIT_ULL(4); 1048 1039 } 1049 1040 1041 + /* Returns a number of VTD pages, but aligned to MM page size */ 1042 + static inline unsigned long aligned_nrpages(unsigned long host_addr, size_t size) 1043 + { 1044 + host_addr &= ~PAGE_MASK; 1045 + return PAGE_ALIGN(host_addr + size) >> VTD_PAGE_SHIFT; 1046 + } 1047 + 1048 + /* Return a size from number of VTD pages. */ 1049 + static inline unsigned long nrpages_to_size(unsigned long npages) 1050 + { 1051 + return npages << VTD_PAGE_SHIFT; 1052 + } 1053 + 1050 1054 /* Convert value to context PASID directory size field coding. */ 1051 1055 #define context_pdts(pds) (((pds) & 0x7) << 9) 1052 1056 ··· 1107 1085 1108 1086 int dmar_ir_support(void); 1109 1087 1110 - void *alloc_pgtable_page(int node, gfp_t gfp); 1111 - void free_pgtable_page(void *vaddr); 1112 1088 void iommu_flush_write_buffer(struct intel_iommu *iommu); 1113 1089 struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *parent, 1114 1090 const struct iommu_user_data *user_data); 1115 1091 struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid); 1092 + 1093 + enum cache_tag_type { 1094 + CACHE_TAG_IOTLB, 1095 + CACHE_TAG_DEVTLB, 1096 + CACHE_TAG_NESTING_IOTLB, 1097 + CACHE_TAG_NESTING_DEVTLB, 1098 + }; 1099 + 1100 + struct cache_tag { 1101 + struct list_head node; 1102 + enum cache_tag_type type; 1103 + struct intel_iommu *iommu; 1104 + /* 1105 + * The @dev field represents the location of the cache. For IOTLB, it 1106 + * resides on the IOMMU hardware. @dev stores the device pointer to 1107 + * the IOMMU hardware. For DevTLB, it locates in the PCIe endpoint. 1108 + * @dev stores the device pointer to that endpoint. 
1109 + */ 1110 + struct device *dev; 1111 + u16 domain_id; 1112 + ioasid_t pasid; 1113 + unsigned int users; 1114 + }; 1115 + 1116 + int cache_tag_assign_domain(struct dmar_domain *domain, 1117 + struct device *dev, ioasid_t pasid); 1118 + void cache_tag_unassign_domain(struct dmar_domain *domain, 1119 + struct device *dev, ioasid_t pasid); 1120 + void cache_tag_flush_range(struct dmar_domain *domain, unsigned long start, 1121 + unsigned long end, int ih); 1122 + void cache_tag_flush_all(struct dmar_domain *domain); 1123 + void cache_tag_flush_range_np(struct dmar_domain *domain, unsigned long start, 1124 + unsigned long end); 1116 1125 1117 1126 #ifdef CONFIG_INTEL_IOMMU_SVM 1118 1127 void intel_svm_check(struct intel_iommu *iommu); ··· 1151 1098 int intel_svm_finish_prq(struct intel_iommu *iommu); 1152 1099 void intel_svm_page_response(struct device *dev, struct iopf_fault *evt, 1153 1100 struct iommu_page_response *msg); 1154 - struct iommu_domain *intel_svm_domain_alloc(void); 1155 - void intel_svm_remove_dev_pasid(struct device *dev, ioasid_t pasid); 1101 + struct iommu_domain *intel_svm_domain_alloc(struct device *dev, 1102 + struct mm_struct *mm); 1156 1103 void intel_drain_pasid_prq(struct device *dev, u32 pasid); 1157 - 1158 - struct intel_svm_dev { 1159 - struct list_head list; 1160 - struct rcu_head rcu; 1161 - struct device *dev; 1162 - struct intel_iommu *iommu; 1163 - u16 did; 1164 - u16 sid, qdep; 1165 - }; 1166 - 1167 - struct intel_svm { 1168 - struct mmu_notifier notifier; 1169 - struct mm_struct *mm; 1170 - u32 pasid; 1171 - struct list_head devs; 1172 - }; 1173 1104 #else 1174 1105 static inline void intel_svm_check(struct intel_iommu *iommu) {} 1175 1106 static inline void intel_drain_pasid_prq(struct device *dev, u32 pasid) {} 1176 - static inline struct iommu_domain *intel_svm_domain_alloc(void) 1107 + static inline struct iommu_domain *intel_svm_domain_alloc(struct device *dev, 1108 + struct mm_struct *mm) 1177 1109 { 1178 - return NULL; 1179 - } 1180 - 1181 - static inline void intel_svm_remove_dev_pasid(struct device *dev, ioasid_t pasid) 1182 - { 1110 + return ERR_PTR(-ENODEV); 1183 1111 } 1184 1112 #endif 1185 1113
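The cache tag interface declared above is deliberately small: attach paths create tags, detach paths drop them, and every invalidation funnels through the three flush helpers. A hypothetical caller, condensed from the attach and unmap paths elsewhere in this series (the function name and the 2 MiB range are made up for illustration):

static int example_cache_tag_usage(struct dmar_domain *domain,
				   struct device *dev)
{
	int ret;

	/* Attach: create an IOTLB tag (plus a DevTLB tag when ATS is
	 * enabled) for the device at the RID, i.e. IOMMU_NO_PASID. */
	ret = cache_tag_assign_domain(domain, dev, IOMMU_NO_PASID);
	if (ret)
		return ret;

	/* After unmapping [0, 2 MiB), invalidate exactly that range on
	 * every cache the domain is now tagged with. */
	cache_tag_flush_range(domain, 0, SZ_2M - 1, 0);

	/* Map paths use the _np variant, which only matters on
	 * caching-mode or write-buffer-flushing hardware. */
	cache_tag_flush_range_np(domain, 0, SZ_2M - 1);

	/* Detach: drop the (refcounted) tags again. */
	cache_tag_unassign_domain(domain, dev, IOMMU_NO_PASID);

	return 0;
}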
+8 -8
drivers/iommu/intel/irq_remapping.c
··· 23 23 24 24 #include "iommu.h" 25 25 #include "../irq_remapping.h" 26 + #include "../iommu-pages.h" 26 27 #include "cap_audit.h" 27 28 28 29 enum irq_mode { ··· 530 529 struct ir_table *ir_table; 531 530 struct fwnode_handle *fn; 532 531 unsigned long *bitmap; 533 - struct page *pages; 532 + void *ir_table_base; 534 533 535 534 if (iommu->ir_table) 536 535 return 0; ··· 539 538 if (!ir_table) 540 539 return -ENOMEM; 541 540 542 - pages = alloc_pages_node(iommu->node, GFP_KERNEL | __GFP_ZERO, 543 - INTR_REMAP_PAGE_ORDER); 544 - if (!pages) { 541 + ir_table_base = iommu_alloc_pages_node(iommu->node, GFP_KERNEL, 542 + INTR_REMAP_PAGE_ORDER); 543 + if (!ir_table_base) { 545 544 pr_err("IR%d: failed to allocate pages of order %d\n", 546 545 iommu->seq_id, INTR_REMAP_PAGE_ORDER); 547 546 goto out_free_table; ··· 576 575 else 577 576 iommu->ir_domain->msi_parent_ops = &dmar_msi_parent_ops; 578 577 579 - ir_table->base = page_address(pages); 578 + ir_table->base = ir_table_base; 580 579 ir_table->bitmap = bitmap; 581 580 iommu->ir_table = ir_table; 582 581 ··· 625 624 out_free_bitmap: 626 625 bitmap_free(bitmap); 627 626 out_free_pages: 628 - __free_pages(pages, INTR_REMAP_PAGE_ORDER); 627 + iommu_free_pages(ir_table_base, INTR_REMAP_PAGE_ORDER); 629 628 out_free_table: 630 629 kfree(ir_table); 631 630 ··· 646 645 irq_domain_free_fwnode(fn); 647 646 iommu->ir_domain = NULL; 648 647 } 649 - free_pages((unsigned long)iommu->ir_table->base, 650 - INTR_REMAP_PAGE_ORDER); 648 + iommu_free_pages(iommu->ir_table->base, INTR_REMAP_PAGE_ORDER); 651 649 bitmap_free(iommu->ir_table->bitmap); 652 650 kfree(iommu->ir_table); 653 651 iommu->ir_table = NULL;
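The same conversion repeats across dmar.c, irq_remapping.c, pasid.c and svm.c: open-coded alloc_pages_node()/page_address()/free_pages() round-trips become the iommu-pages helpers, which hand out zeroed, node-local memory as virtual addresses and free it by address. A condensed sketch of the pairing as these hunks use it (the function names are hypothetical):

static void *example_ir_table_alloc(struct intel_iommu *iommu, int order)
{
	void *base;

	/* Zeroed, node-local allocation returned as a virtual address,
	 * so no struct page / page_address() round-trip is needed. */
	base = iommu_alloc_pages_node(iommu->node, GFP_KERNEL, order);
	if (!base)
		return NULL;

	return base;
}

static void example_ir_table_free(void *base, int order)
{
	/* Multi-page (order > 0) allocations pair with iommu_free_pages();
	 * single pages from iommu_alloc_page_node() with iommu_free_page(). */
	iommu_free_pages(base, order);
}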
+17 -52
drivers/iommu/intel/nested.c
··· 52 52 return ret; 53 53 } 54 54 55 + ret = cache_tag_assign_domain(dmar_domain, dev, IOMMU_NO_PASID); 56 + if (ret) 57 + goto detach_iommu; 58 + 55 59 ret = intel_pasid_setup_nested(iommu, dev, 56 60 IOMMU_NO_PASID, dmar_domain); 57 - if (ret) { 58 - domain_detach_iommu(dmar_domain, iommu); 59 - dev_err_ratelimited(dev, "Failed to setup pasid entry\n"); 60 - return ret; 61 - } 61 + if (ret) 62 + goto unassign_tag; 62 63 63 64 info->domain = dmar_domain; 64 65 spin_lock_irqsave(&dmar_domain->lock, flags); ··· 69 68 domain_update_iotlb(dmar_domain); 70 69 71 70 return 0; 71 + unassign_tag: 72 + cache_tag_unassign_domain(dmar_domain, dev, IOMMU_NO_PASID); 73 + detach_iommu: 74 + domain_detach_iommu(dmar_domain, iommu); 75 + 76 + return ret; 72 77 } 73 78 74 79 static void intel_nested_domain_free(struct iommu_domain *domain) ··· 86 79 list_del(&dmar_domain->s2_link); 87 80 spin_unlock(&s2_domain->s1_lock); 88 81 kfree(dmar_domain); 89 - } 90 - 91 - static void nested_flush_dev_iotlb(struct dmar_domain *domain, u64 addr, 92 - unsigned int mask) 93 - { 94 - struct device_domain_info *info; 95 - unsigned long flags; 96 - u16 sid, qdep; 97 - 98 - spin_lock_irqsave(&domain->lock, flags); 99 - list_for_each_entry(info, &domain->devices, link) { 100 - if (!info->ats_enabled) 101 - continue; 102 - sid = info->bus << 8 | info->devfn; 103 - qdep = info->ats_qdep; 104 - qi_flush_dev_iotlb(info->iommu, sid, info->pfsid, 105 - qdep, addr, mask); 106 - quirk_extra_dev_tlb_flush(info, addr, mask, 107 - IOMMU_NO_PASID, qdep); 108 - } 109 - spin_unlock_irqrestore(&domain->lock, flags); 110 - } 111 - 112 - static void intel_nested_flush_cache(struct dmar_domain *domain, u64 addr, 113 - u64 npages, bool ih) 114 - { 115 - struct iommu_domain_info *info; 116 - unsigned int mask; 117 - unsigned long i; 118 - 119 - xa_for_each(&domain->iommu_array, i, info) 120 - qi_flush_piotlb(info->iommu, 121 - domain_id_iommu(domain, info->iommu), 122 - IOMMU_NO_PASID, addr, npages, ih); 123 - 124 - if (!domain->has_iotlb_device) 125 - return; 126 - 127 - if (npages == U64_MAX) 128 - mask = 64 - VTD_PAGE_SHIFT; 129 - else 130 - mask = ilog2(__roundup_pow_of_two(npages)); 131 - 132 - nested_flush_dev_iotlb(domain, addr, mask); 133 82 } 134 83 135 84 static int intel_nested_cache_invalidate_user(struct iommu_domain *domain, ··· 120 157 break; 121 158 } 122 159 123 - intel_nested_flush_cache(dmar_domain, inv_entry.addr, 124 - inv_entry.npages, 125 - inv_entry.flags & IOMMU_VTD_INV_FLAGS_LEAF); 160 + cache_tag_flush_range(dmar_domain, inv_entry.addr, 161 + inv_entry.addr + nrpages_to_size(inv_entry.npages) - 1, 162 + inv_entry.flags & IOMMU_VTD_INV_FLAGS_LEAF); 126 163 processed++; 127 164 } 128 165 ··· 169 206 domain->domain.type = IOMMU_DOMAIN_NESTED; 170 207 INIT_LIST_HEAD(&domain->devices); 171 208 INIT_LIST_HEAD(&domain->dev_pasids); 209 + INIT_LIST_HEAD(&domain->cache_tags); 172 210 spin_lock_init(&domain->lock); 211 + spin_lock_init(&domain->cache_lock); 173 212 xa_init(&domain->iommu_array); 174 213 175 214 spin_lock(&s2_domain->s1_lock);
+9 -9
drivers/iommu/intel/pasid.c
··· 20 20 21 21 #include "iommu.h" 22 22 #include "pasid.h" 23 + #include "../iommu-pages.h" 23 24 24 25 /* 25 26 * Intel IOMMU system wide PASID name space: ··· 39 38 { 40 39 struct device_domain_info *info; 41 40 struct pasid_table *pasid_table; 42 - struct page *pages; 41 + struct pasid_dir_entry *dir; 43 42 u32 max_pasid = 0; 44 43 int order, size; 45 44 ··· 60 59 61 60 size = max_pasid >> (PASID_PDE_SHIFT - 3); 62 61 order = size ? get_order(size) : 0; 63 - pages = alloc_pages_node(info->iommu->node, 64 - GFP_KERNEL | __GFP_ZERO, order); 65 - if (!pages) { 62 + dir = iommu_alloc_pages_node(info->iommu->node, GFP_KERNEL, order); 63 + if (!dir) { 66 64 kfree(pasid_table); 67 65 return -ENOMEM; 68 66 } 69 67 70 - pasid_table->table = page_address(pages); 68 + pasid_table->table = dir; 71 69 pasid_table->order = order; 72 70 pasid_table->max_pasid = 1 << (order + PAGE_SHIFT + 3); 73 71 info->pasid_table = pasid_table; ··· 97 97 max_pde = pasid_table->max_pasid >> PASID_PDE_SHIFT; 98 98 for (i = 0; i < max_pde; i++) { 99 99 table = get_pasid_table_from_pde(&dir[i]); 100 - free_pgtable_page(table); 100 + iommu_free_page(table); 101 101 } 102 102 103 - free_pages((unsigned long)pasid_table->table, pasid_table->order); 103 + iommu_free_pages(pasid_table->table, pasid_table->order); 104 104 kfree(pasid_table); 105 105 } 106 106 ··· 146 146 retry: 147 147 entries = get_pasid_table_from_pde(&dir[dir_index]); 148 148 if (!entries) { 149 - entries = alloc_pgtable_page(info->iommu->node, GFP_ATOMIC); 149 + entries = iommu_alloc_page_node(info->iommu->node, GFP_ATOMIC); 150 150 if (!entries) 151 151 return NULL; 152 152 ··· 158 158 */ 159 159 if (cmpxchg64(&dir[dir_index].val, 0ULL, 160 160 (u64)virt_to_phys(entries) | PASID_PTE_PRESENT)) { 161 - free_pgtable_page(entries); 161 + iommu_free_page(entries); 162 162 goto retry; 163 163 } 164 164 if (!ecap_coherent(info->iommu->ecap)) {
-1
drivers/iommu/intel/perf.h
··· 11 11 DMAR_LATENCY_INV_IOTLB = 0, 12 12 DMAR_LATENCY_INV_DEVTLB, 13 13 DMAR_LATENCY_INV_IEC, 14 - DMAR_LATENCY_PRQ, 15 14 DMAR_LATENCY_NUM 16 15 }; 17 16
+86 -297
drivers/iommu/intel/svm.c
··· 22 22 #include "iommu.h" 23 23 #include "pasid.h" 24 24 #include "perf.h" 25 + #include "../iommu-pages.h" 25 26 #include "trace.h" 26 27 27 28 static irqreturn_t prq_event_thread(int irq, void *d); 28 29 29 - static DEFINE_XARRAY_ALLOC(pasid_private_array); 30 - static int pasid_private_add(ioasid_t pasid, void *priv) 31 - { 32 - return xa_alloc(&pasid_private_array, &pasid, priv, 33 - XA_LIMIT(pasid, pasid), GFP_ATOMIC); 34 - } 35 - 36 - static void pasid_private_remove(ioasid_t pasid) 37 - { 38 - xa_erase(&pasid_private_array, pasid); 39 - } 40 - 41 - static void *pasid_private_find(ioasid_t pasid) 42 - { 43 - return xa_load(&pasid_private_array, pasid); 44 - } 45 - 46 - static struct intel_svm_dev * 47 - svm_lookup_device_by_dev(struct intel_svm *svm, struct device *dev) 48 - { 49 - struct intel_svm_dev *sdev = NULL, *t; 50 - 51 - rcu_read_lock(); 52 - list_for_each_entry_rcu(t, &svm->devs, list) { 53 - if (t->dev == dev) { 54 - sdev = t; 55 - break; 56 - } 57 - } 58 - rcu_read_unlock(); 59 - 60 - return sdev; 61 - } 62 - 63 30 int intel_svm_enable_prq(struct intel_iommu *iommu) 64 31 { 65 32 struct iopf_queue *iopfq; 66 - struct page *pages; 67 33 int irq, ret; 68 34 69 - pages = alloc_pages_node(iommu->node, GFP_KERNEL | __GFP_ZERO, PRQ_ORDER); 70 - if (!pages) { 35 + iommu->prq = iommu_alloc_pages_node(iommu->node, GFP_KERNEL, PRQ_ORDER); 36 + if (!iommu->prq) { 71 37 pr_warn("IOMMU: %s: Failed to allocate page request queue\n", 72 38 iommu->name); 73 39 return -ENOMEM; 74 40 } 75 - iommu->prq = page_address(pages); 76 41 77 42 irq = dmar_alloc_hwirq(IOMMU_IRQ_ID_OFFSET_PRQ + iommu->seq_id, iommu->node, iommu); 78 43 if (irq <= 0) { ··· 82 117 dmar_free_hwirq(irq); 83 118 iommu->pr_irq = 0; 84 119 free_prq: 85 - free_pages((unsigned long)iommu->prq, PRQ_ORDER); 120 + iommu_free_pages(iommu->prq, PRQ_ORDER); 86 121 iommu->prq = NULL; 87 122 88 123 return ret; ··· 105 140 iommu->iopf_queue = NULL; 106 141 } 107 142 108 - free_pages((unsigned long)iommu->prq, PRQ_ORDER); 143 + iommu_free_pages(iommu->prq, PRQ_ORDER); 109 144 iommu->prq = NULL; 110 145 111 146 return 0; ··· 133 168 iommu->flags |= VTD_FLAG_SVM_CAPABLE; 134 169 } 135 170 136 - static void __flush_svm_range_dev(struct intel_svm *svm, 137 - struct intel_svm_dev *sdev, 138 - unsigned long address, 139 - unsigned long pages, int ih) 140 - { 141 - struct device_domain_info *info = dev_iommu_priv_get(sdev->dev); 142 - 143 - if (WARN_ON(!pages)) 144 - return; 145 - 146 - qi_flush_piotlb(sdev->iommu, sdev->did, svm->pasid, address, pages, ih); 147 - if (info->ats_enabled) { 148 - qi_flush_dev_iotlb_pasid(sdev->iommu, sdev->sid, info->pfsid, 149 - svm->pasid, sdev->qdep, address, 150 - order_base_2(pages)); 151 - quirk_extra_dev_tlb_flush(info, address, order_base_2(pages), 152 - svm->pasid, sdev->qdep); 153 - } 154 - } 155 - 156 - static void intel_flush_svm_range_dev(struct intel_svm *svm, 157 - struct intel_svm_dev *sdev, 158 - unsigned long address, 159 - unsigned long pages, int ih) 160 - { 161 - unsigned long shift = ilog2(__roundup_pow_of_two(pages)); 162 - unsigned long align = (1ULL << (VTD_PAGE_SHIFT + shift)); 163 - unsigned long start = ALIGN_DOWN(address, align); 164 - unsigned long end = ALIGN(address + (pages << VTD_PAGE_SHIFT), align); 165 - 166 - while (start < end) { 167 - __flush_svm_range_dev(svm, sdev, start, align >> VTD_PAGE_SHIFT, ih); 168 - start += align; 169 - } 170 - } 171 - 172 - static void intel_flush_svm_range(struct intel_svm *svm, unsigned long address, 173 - unsigned long pages, int ih) 174 
- { 175 - struct intel_svm_dev *sdev; 176 - 177 - rcu_read_lock(); 178 - list_for_each_entry_rcu(sdev, &svm->devs, list) 179 - intel_flush_svm_range_dev(svm, sdev, address, pages, ih); 180 - rcu_read_unlock(); 181 - } 182 - 183 - static void intel_flush_svm_all(struct intel_svm *svm) 184 - { 185 - struct device_domain_info *info; 186 - struct intel_svm_dev *sdev; 187 - 188 - rcu_read_lock(); 189 - list_for_each_entry_rcu(sdev, &svm->devs, list) { 190 - info = dev_iommu_priv_get(sdev->dev); 191 - 192 - qi_flush_piotlb(sdev->iommu, sdev->did, svm->pasid, 0, -1UL, 0); 193 - if (info->ats_enabled) { 194 - qi_flush_dev_iotlb_pasid(sdev->iommu, sdev->sid, info->pfsid, 195 - svm->pasid, sdev->qdep, 196 - 0, 64 - VTD_PAGE_SHIFT); 197 - quirk_extra_dev_tlb_flush(info, 0, 64 - VTD_PAGE_SHIFT, 198 - svm->pasid, sdev->qdep); 199 - } 200 - } 201 - rcu_read_unlock(); 202 - } 203 - 204 171 /* Pages have been freed at this point */ 205 172 static void intel_arch_invalidate_secondary_tlbs(struct mmu_notifier *mn, 206 173 struct mm_struct *mm, 207 174 unsigned long start, unsigned long end) 208 175 { 209 - struct intel_svm *svm = container_of(mn, struct intel_svm, notifier); 176 + struct dmar_domain *domain = container_of(mn, struct dmar_domain, notifier); 210 177 211 - if (start == 0 && end == -1UL) { 212 - intel_flush_svm_all(svm); 178 + if (start == 0 && end == ULONG_MAX) { 179 + cache_tag_flush_all(domain); 213 180 return; 214 181 } 215 182 216 - intel_flush_svm_range(svm, start, 217 - (end - start + PAGE_SIZE - 1) >> VTD_PAGE_SHIFT, 0); 183 + /* 184 + * The mm_types defines vm_end as the first byte after the end address, 185 + * different from IOMMU subsystem using the last address of an address 186 + * range. 187 + */ 188 + cache_tag_flush_range(domain, start, end - 1, 0); 218 189 } 219 190 220 191 static void intel_mm_release(struct mmu_notifier *mn, struct mm_struct *mm) 221 192 { 222 - struct intel_svm *svm = container_of(mn, struct intel_svm, notifier); 223 - struct intel_svm_dev *sdev; 193 + struct dmar_domain *domain = container_of(mn, struct dmar_domain, notifier); 194 + struct dev_pasid_info *dev_pasid; 195 + struct device_domain_info *info; 196 + unsigned long flags; 224 197 225 198 /* This might end up being called from exit_mmap(), *before* the page 226 199 * tables are cleared. And __mmu_notifier_release() will delete us from ··· 172 269 * page) so that we end up taking a fault that the hardware really 173 270 * *has* to handle gracefully without affecting other processes. 
174 271 */ 175 - rcu_read_lock(); 176 - list_for_each_entry_rcu(sdev, &svm->devs, list) 177 - intel_pasid_tear_down_entry(sdev->iommu, sdev->dev, 178 - svm->pasid, true); 179 - rcu_read_unlock(); 272 + spin_lock_irqsave(&domain->lock, flags); 273 + list_for_each_entry(dev_pasid, &domain->dev_pasids, link_domain) { 274 + info = dev_iommu_priv_get(dev_pasid->dev); 275 + intel_pasid_tear_down_entry(info->iommu, dev_pasid->dev, 276 + dev_pasid->pasid, true); 277 + } 278 + spin_unlock_irqrestore(&domain->lock, flags); 180 279 280 + } 281 + 282 + static void intel_mm_free_notifier(struct mmu_notifier *mn) 283 + { 284 + kfree(container_of(mn, struct dmar_domain, notifier)); 181 285 } 182 286 183 287 static const struct mmu_notifier_ops intel_mmuops = { 184 288 .release = intel_mm_release, 185 289 .arch_invalidate_secondary_tlbs = intel_arch_invalidate_secondary_tlbs, 290 + .free_notifier = intel_mm_free_notifier, 186 291 }; 187 - 188 - static int pasid_to_svm_sdev(struct device *dev, unsigned int pasid, 189 - struct intel_svm **rsvm, 190 - struct intel_svm_dev **rsdev) 191 - { 192 - struct intel_svm_dev *sdev = NULL; 193 - struct intel_svm *svm; 194 - 195 - if (pasid == IOMMU_PASID_INVALID || pasid >= PASID_MAX) 196 - return -EINVAL; 197 - 198 - svm = pasid_private_find(pasid); 199 - if (IS_ERR(svm)) 200 - return PTR_ERR(svm); 201 - 202 - if (!svm) 203 - goto out; 204 - 205 - /* 206 - * If we found svm for the PASID, there must be at least one device 207 - * bond. 208 - */ 209 - if (WARN_ON(list_empty(&svm->devs))) 210 - return -EINVAL; 211 - sdev = svm_lookup_device_by_dev(svm, dev); 212 - 213 - out: 214 - *rsvm = svm; 215 - *rsdev = sdev; 216 - 217 - return 0; 218 - } 219 292 220 293 static int intel_svm_set_dev_pasid(struct iommu_domain *domain, 221 294 struct device *dev, ioasid_t pasid) 222 295 { 223 296 struct device_domain_info *info = dev_iommu_priv_get(dev); 297 + struct dmar_domain *dmar_domain = to_dmar_domain(domain); 224 298 struct intel_iommu *iommu = info->iommu; 225 299 struct mm_struct *mm = domain->mm; 226 - struct intel_svm_dev *sdev; 227 - struct intel_svm *svm; 300 + struct dev_pasid_info *dev_pasid; 228 301 unsigned long sflags; 302 + unsigned long flags; 229 303 int ret = 0; 230 304 231 - svm = pasid_private_find(pasid); 232 - if (!svm) { 233 - svm = kzalloc(sizeof(*svm), GFP_KERNEL); 234 - if (!svm) 235 - return -ENOMEM; 305 + dev_pasid = kzalloc(sizeof(*dev_pasid), GFP_KERNEL); 306 + if (!dev_pasid) 307 + return -ENOMEM; 236 308 237 - svm->pasid = pasid; 238 - svm->mm = mm; 239 - INIT_LIST_HEAD_RCU(&svm->devs); 309 + dev_pasid->dev = dev; 310 + dev_pasid->pasid = pasid; 240 311 241 - svm->notifier.ops = &intel_mmuops; 242 - ret = mmu_notifier_register(&svm->notifier, mm); 243 - if (ret) { 244 - kfree(svm); 245 - return ret; 246 - } 247 - 248 - ret = pasid_private_add(svm->pasid, svm); 249 - if (ret) { 250 - mmu_notifier_unregister(&svm->notifier, mm); 251 - kfree(svm); 252 - return ret; 253 - } 254 - } 255 - 256 - sdev = kzalloc(sizeof(*sdev), GFP_KERNEL); 257 - if (!sdev) { 258 - ret = -ENOMEM; 259 - goto free_svm; 260 - } 261 - 262 - sdev->dev = dev; 263 - sdev->iommu = iommu; 264 - sdev->did = FLPT_DEFAULT_DID; 265 - sdev->sid = PCI_DEVID(info->bus, info->devfn); 266 - if (info->ats_enabled) { 267 - sdev->qdep = info->ats_qdep; 268 - if (sdev->qdep >= QI_DEV_EIOTLB_MAX_INVS) 269 - sdev->qdep = 0; 270 - } 312 + ret = cache_tag_assign_domain(to_dmar_domain(domain), dev, pasid); 313 + if (ret) 314 + goto free_dev_pasid; 271 315 272 316 /* Setup the pasid table: */ 273 317 
sflags = cpu_feature_enabled(X86_FEATURE_LA57) ? PASID_FLAG_FL5LP : 0; 274 318 ret = intel_pasid_setup_first_level(iommu, dev, mm->pgd, pasid, 275 319 FLPT_DEFAULT_DID, sflags); 276 320 if (ret) 277 - goto free_sdev; 321 + goto unassign_tag; 278 322 279 - list_add_rcu(&sdev->list, &svm->devs); 323 + spin_lock_irqsave(&dmar_domain->lock, flags); 324 + list_add(&dev_pasid->link_domain, &dmar_domain->dev_pasids); 325 + spin_unlock_irqrestore(&dmar_domain->lock, flags); 280 326 281 327 return 0; 282 328 283 - free_sdev: 284 - kfree(sdev); 285 - free_svm: 286 - if (list_empty(&svm->devs)) { 287 - mmu_notifier_unregister(&svm->notifier, mm); 288 - pasid_private_remove(pasid); 289 - kfree(svm); 290 - } 329 + unassign_tag: 330 + cache_tag_unassign_domain(to_dmar_domain(domain), dev, pasid); 331 + free_dev_pasid: 332 + kfree(dev_pasid); 291 333 292 334 return ret; 293 - } 294 - 295 - void intel_svm_remove_dev_pasid(struct device *dev, u32 pasid) 296 - { 297 - struct intel_svm_dev *sdev; 298 - struct intel_svm *svm; 299 - struct mm_struct *mm; 300 - 301 - if (pasid_to_svm_sdev(dev, pasid, &svm, &sdev)) 302 - return; 303 - mm = svm->mm; 304 - 305 - if (sdev) { 306 - list_del_rcu(&sdev->list); 307 - kfree_rcu(sdev, rcu); 308 - 309 - if (list_empty(&svm->devs)) { 310 - if (svm->notifier.ops) 311 - mmu_notifier_unregister(&svm->notifier, mm); 312 - pasid_private_remove(svm->pasid); 313 - kfree(svm); 314 - } 315 - } 316 335 } 317 336 318 337 /* Page request queue descriptor */ ··· 243 418 struct { 244 419 u64 type:8; 245 420 u64 pasid_present:1; 246 - u64 priv_data_present:1; 247 - u64 rsvd:6; 421 + u64 rsvd:7; 248 422 u64 rid:16; 249 423 u64 pasid:20; 250 424 u64 exe_req:1; ··· 262 438 }; 263 439 u64 qw_1; 264 440 }; 265 - u64 priv_data[2]; 441 + u64 qw_2; 442 + u64 qw_3; 266 443 }; 267 444 268 445 static bool is_canonical_address(u64 addr) ··· 397 572 event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID; 398 573 event.fault.prm.flags |= IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID; 399 574 } 400 - if (desc->priv_data_present) { 401 - /* 402 - * Set last page in group bit if private data is present, 403 - * page response is required as it does for LPIG. 404 - * iommu_report_device_fault() doesn't understand this vendor 405 - * specific requirement thus we set last_page as a workaround. 406 - */ 407 - event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE; 408 - event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA; 409 - event.fault.prm.private_data[0] = desc->priv_data[0]; 410 - event.fault.prm.private_data[1] = desc->priv_data[1]; 411 - } else if (dmar_latency_enabled(iommu, DMAR_LATENCY_PRQ)) { 412 - /* 413 - * If the private data fields are not used by hardware, use it 414 - * to monitor the prq handle latency. 415 - */ 416 - event.fault.prm.private_data[0] = ktime_to_ns(ktime_get()); 417 - } 418 575 419 576 iommu_report_device_fault(dev, &event); 420 577 } ··· 404 597 static void handle_bad_prq_event(struct intel_iommu *iommu, 405 598 struct page_req_dsc *req, int result) 406 599 { 407 - struct qi_desc desc; 600 + struct qi_desc desc = { }; 408 601 409 602 pr_err("%s: Invalid page request: %08llx %08llx\n", 410 603 iommu->name, ((unsigned long long *)req)[0], 411 604 ((unsigned long long *)req)[1]); 412 605 413 - /* 414 - * Per VT-d spec. v3.0 ch7.7, system software must 415 - * respond with page group response if private data 416 - * is present (PDP) or last page in group (LPIG) bit 417 - * is set. This is an additional VT-d feature beyond 418 - * PCI ATS spec. 
419 - */ 420 - if (!req->lpig && !req->priv_data_present) 606 + if (!req->lpig) 421 607 return; 422 608 423 609 desc.qw0 = QI_PGRP_PASID(req->pasid) | 424 610 QI_PGRP_DID(req->rid) | 425 611 QI_PGRP_PASID_P(req->pasid_present) | 426 - QI_PGRP_PDP(req->priv_data_present) | 427 612 QI_PGRP_RESP_CODE(result) | 428 613 QI_PGRP_RESP_TYPE; 429 614 desc.qw1 = QI_PGRP_IDX(req->prg_index) | 430 615 QI_PGRP_LPIG(req->lpig); 431 - 432 - if (req->priv_data_present) { 433 - desc.qw2 = req->priv_data[0]; 434 - desc.qw3 = req->priv_data[1]; 435 - } else { 436 - desc.qw2 = 0; 437 - desc.qw3 = 0; 438 - } 439 616 440 617 qi_submit_sync(iommu, &desc, 1, 0); 441 618 } ··· 488 697 489 698 intel_svm_prq_report(iommu, dev, req); 490 699 trace_prq_report(iommu, dev, req->qw_0, req->qw_1, 491 - req->priv_data[0], req->priv_data[1], 700 + req->qw_2, req->qw_3, 492 701 iommu->prq_seq_number++); 493 702 mutex_unlock(&iommu->iopf_lock); 494 703 prq_advance: ··· 527 736 struct intel_iommu *iommu = info->iommu; 528 737 u8 bus = info->bus, devfn = info->devfn; 529 738 struct iommu_fault_page_request *prm; 530 - bool private_present; 739 + struct qi_desc desc; 531 740 bool pasid_present; 532 741 bool last_page; 533 742 u16 sid; ··· 535 744 prm = &evt->fault.prm; 536 745 sid = PCI_DEVID(bus, devfn); 537 746 pasid_present = prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID; 538 - private_present = prm->flags & IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA; 539 747 last_page = prm->flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE; 540 748 541 - /* 542 - * Per VT-d spec. v3.0 ch7.7, system software must respond 543 - * with page group response if private data is present (PDP) 544 - * or last page in group (LPIG) bit is set. This is an 545 - * additional VT-d requirement beyond PCI ATS spec. 546 - */ 547 - if (last_page || private_present) { 548 - struct qi_desc desc; 749 + desc.qw0 = QI_PGRP_PASID(prm->pasid) | QI_PGRP_DID(sid) | 750 + QI_PGRP_PASID_P(pasid_present) | 751 + QI_PGRP_RESP_CODE(msg->code) | 752 + QI_PGRP_RESP_TYPE; 753 + desc.qw1 = QI_PGRP_IDX(prm->grpid) | QI_PGRP_LPIG(last_page); 754 + desc.qw2 = 0; 755 + desc.qw3 = 0; 549 756 550 - desc.qw0 = QI_PGRP_PASID(prm->pasid) | QI_PGRP_DID(sid) | 551 - QI_PGRP_PASID_P(pasid_present) | 552 - QI_PGRP_PDP(private_present) | 553 - QI_PGRP_RESP_CODE(msg->code) | 554 - QI_PGRP_RESP_TYPE; 555 - desc.qw1 = QI_PGRP_IDX(prm->grpid) | QI_PGRP_LPIG(last_page); 556 - desc.qw2 = 0; 557 - desc.qw3 = 0; 558 - 559 - if (private_present) { 560 - desc.qw2 = prm->private_data[0]; 561 - desc.qw3 = prm->private_data[1]; 562 - } else if (prm->private_data[0]) { 563 - dmar_latency_update(iommu, DMAR_LATENCY_PRQ, 564 - ktime_to_ns(ktime_get()) - prm->private_data[0]); 565 - } 566 - 567 - qi_submit_sync(iommu, &desc, 1, 0); 568 - } 757 + qi_submit_sync(iommu, &desc, 1, 0); 569 758 } 570 759 571 760 static void intel_svm_domain_free(struct iommu_domain *domain) 572 761 { 573 - kfree(to_dmar_domain(domain)); 762 + struct dmar_domain *dmar_domain = to_dmar_domain(domain); 763 + 764 + /* dmar_domain free is deferred to the mmu free_notifier callback. 
*/ 765 + mmu_notifier_put(&dmar_domain->notifier); 574 766 } 575 767 576 768 static const struct iommu_domain_ops intel_svm_domain_ops = { ··· 561 787 .free = intel_svm_domain_free 562 788 }; 563 789 564 - struct iommu_domain *intel_svm_domain_alloc(void) 790 + struct iommu_domain *intel_svm_domain_alloc(struct device *dev, 791 + struct mm_struct *mm) 565 792 { 566 793 struct dmar_domain *domain; 794 + int ret; 567 795 568 796 domain = kzalloc(sizeof(*domain), GFP_KERNEL); 569 797 if (!domain) 570 - return NULL; 798 + return ERR_PTR(-ENOMEM); 799 + 571 800 domain->domain.ops = &intel_svm_domain_ops; 801 + domain->use_first_level = true; 802 + INIT_LIST_HEAD(&domain->dev_pasids); 803 + INIT_LIST_HEAD(&domain->cache_tags); 804 + spin_lock_init(&domain->cache_lock); 805 + spin_lock_init(&domain->lock); 806 + 807 + domain->notifier.ops = &intel_mmuops; 808 + ret = mmu_notifier_register(&domain->notifier, mm); 809 + if (ret) { 810 + kfree(domain); 811 + return ERR_PTR(ret); 812 + } 572 813 573 814 return &domain->domain; 574 815 }
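In the hunk above, intel_svm_domain_alloc() now registers an mmu_notifier on the target mm and intel_svm_domain_free() only drops that reference; the memory itself is reclaimed from the notifier's free_notifier callback once no readers remain. A minimal sketch of that deferred-free pattern follows, with hypothetical names (my_sva_domain, my_mmuops) standing in for the driver's structures; it is not the driver code itself.

#include <linux/iommu.h>
#include <linux/mmu_notifier.h>
#include <linux/slab.h>

struct my_sva_domain {
	struct iommu_domain domain;
	struct mmu_notifier notifier;	/* registered against the mm at alloc time */
};

static void my_free_notifier(struct mmu_notifier *mn)
{
	/* Runs once all mmu_notifier readers are done with the domain. */
	kfree(container_of(mn, struct my_sva_domain, notifier));
}

static const struct mmu_notifier_ops my_mmuops = {
	.free_notifier = my_free_notifier,
};

static int my_sva_domain_init(struct my_sva_domain *d, struct mm_struct *mm)
{
	d->notifier.ops = &my_mmuops;
	return mmu_notifier_register(&d->notifier, mm);	/* paired with mmu_notifier_put() */
}

static void my_sva_domain_free(struct iommu_domain *domain)
{
	struct my_sva_domain *d = container_of(domain, struct my_sva_domain, domain);

	/* Defer the kfree() to my_free_notifier() instead of freeing here. */
	mmu_notifier_put(&d->notifier);
}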
+97
drivers/iommu/intel/trace.h
··· 89 89 __entry->dw1, __entry->dw2, __entry->dw3) 90 90 ) 91 91 ); 92 + 93 + DECLARE_EVENT_CLASS(cache_tag_log, 94 + TP_PROTO(struct cache_tag *tag), 95 + TP_ARGS(tag), 96 + TP_STRUCT__entry( 97 + __string(iommu, tag->iommu->name) 98 + __string(dev, dev_name(tag->dev)) 99 + __field(u16, type) 100 + __field(u16, domain_id) 101 + __field(u32, pasid) 102 + __field(u32, users) 103 + ), 104 + TP_fast_assign( 105 + __assign_str(iommu, tag->iommu->name); 106 + __assign_str(dev, dev_name(tag->dev)); 107 + __entry->type = tag->type; 108 + __entry->domain_id = tag->domain_id; 109 + __entry->pasid = tag->pasid; 110 + __entry->users = tag->users; 111 + ), 112 + TP_printk("%s/%s type %s did %d pasid %d ref %d", 113 + __get_str(iommu), __get_str(dev), 114 + __print_symbolic(__entry->type, 115 + { CACHE_TAG_IOTLB, "iotlb" }, 116 + { CACHE_TAG_DEVTLB, "devtlb" }, 117 + { CACHE_TAG_NESTING_IOTLB, "nesting_iotlb" }, 118 + { CACHE_TAG_NESTING_DEVTLB, "nesting_devtlb" }), 119 + __entry->domain_id, __entry->pasid, __entry->users 120 + ) 121 + ); 122 + 123 + DEFINE_EVENT(cache_tag_log, cache_tag_assign, 124 + TP_PROTO(struct cache_tag *tag), 125 + TP_ARGS(tag) 126 + ); 127 + 128 + DEFINE_EVENT(cache_tag_log, cache_tag_unassign, 129 + TP_PROTO(struct cache_tag *tag), 130 + TP_ARGS(tag) 131 + ); 132 + 133 + DEFINE_EVENT(cache_tag_log, cache_tag_flush_all, 134 + TP_PROTO(struct cache_tag *tag), 135 + TP_ARGS(tag) 136 + ); 137 + 138 + DECLARE_EVENT_CLASS(cache_tag_flush, 139 + TP_PROTO(struct cache_tag *tag, unsigned long start, unsigned long end, 140 + unsigned long addr, unsigned long pages, unsigned long mask), 141 + TP_ARGS(tag, start, end, addr, pages, mask), 142 + TP_STRUCT__entry( 143 + __string(iommu, tag->iommu->name) 144 + __string(dev, dev_name(tag->dev)) 145 + __field(u16, type) 146 + __field(u16, domain_id) 147 + __field(u32, pasid) 148 + __field(unsigned long, start) 149 + __field(unsigned long, end) 150 + __field(unsigned long, addr) 151 + __field(unsigned long, pages) 152 + __field(unsigned long, mask) 153 + ), 154 + TP_fast_assign( 155 + __assign_str(iommu, tag->iommu->name); 156 + __assign_str(dev, dev_name(tag->dev)); 157 + __entry->type = tag->type; 158 + __entry->domain_id = tag->domain_id; 159 + __entry->pasid = tag->pasid; 160 + __entry->start = start; 161 + __entry->end = end; 162 + __entry->addr = addr; 163 + __entry->pages = pages; 164 + __entry->mask = mask; 165 + ), 166 + TP_printk("%s %s[%d] type %s did %d [0x%lx-0x%lx] addr 0x%lx pages 0x%lx mask 0x%lx", 167 + __get_str(iommu), __get_str(dev), __entry->pasid, 168 + __print_symbolic(__entry->type, 169 + { CACHE_TAG_IOTLB, "iotlb" }, 170 + { CACHE_TAG_DEVTLB, "devtlb" }, 171 + { CACHE_TAG_NESTING_IOTLB, "nesting_iotlb" }, 172 + { CACHE_TAG_NESTING_DEVTLB, "nesting_devtlb" }), 173 + __entry->domain_id, __entry->start, __entry->end, 174 + __entry->addr, __entry->pages, __entry->mask 175 + ) 176 + ); 177 + 178 + DEFINE_EVENT(cache_tag_flush, cache_tag_flush_range, 179 + TP_PROTO(struct cache_tag *tag, unsigned long start, unsigned long end, 180 + unsigned long addr, unsigned long pages, unsigned long mask), 181 + TP_ARGS(tag, start, end, addr, pages, mask) 182 + ); 183 + 184 + DEFINE_EVENT(cache_tag_flush, cache_tag_flush_range_np, 185 + TP_PROTO(struct cache_tag *tag, unsigned long start, unsigned long end, 186 + unsigned long addr, unsigned long pages, unsigned long mask), 187 + TP_ARGS(tag, start, end, addr, pages, mask) 188 + ); 92 189 #endif /* _TRACE_INTEL_IOMMU_H */ 93 190 94 191 /* This part must be outside protection */
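Each DEFINE_EVENT above expands into a trace_<name>() helper that the cache-tag code can call, and the events become visible under the intel_iommu group in tracefs. A hedged sketch of possible call sites follows; struct cache_tag, its fields and the exact include paths come from the driver's local headers rather than this hunk, so the function names here are illustrative only.

/* Exactly one .c file in the driver defines CREATE_TRACE_POINTS before this include. */
#include "trace.h"

static void example_tag_added(struct cache_tag *tag)
{
	trace_cache_tag_assign(tag);	/* logs iommu/dev/type/did/pasid/ref */
}

static void example_range_invalidated(struct cache_tag *tag,
				      unsigned long start, unsigned long end,
				      unsigned long addr, unsigned long pages,
				      unsigned long mask)
{
	trace_cache_tag_flush_range(tag, start, end, addr, pages, mask);
}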
+6 -9
drivers/iommu/io-pgtable-arm.c
··· 21 21 #include <asm/barrier.h> 22 22 23 23 #include "io-pgtable-arm.h" 24 + #include "iommu-pages.h" 24 25 25 26 #define ARM_LPAE_MAX_ADDR_BITS 52 26 27 #define ARM_LPAE_S2_MAX_CONCAT_PAGES 16 ··· 199 198 200 199 VM_BUG_ON((gfp & __GFP_HIGHMEM)); 201 200 202 - if (cfg->alloc) { 201 + if (cfg->alloc) 203 202 pages = cfg->alloc(cookie, size, gfp); 204 - } else { 205 - struct page *p; 206 - 207 - p = alloc_pages_node(dev_to_node(dev), gfp | __GFP_ZERO, order); 208 - pages = p ? page_address(p) : NULL; 209 - } 203 + else 204 + pages = iommu_alloc_pages_node(dev_to_node(dev), gfp, order); 210 205 211 206 if (!pages) 212 207 return NULL; ··· 230 233 if (cfg->free) 231 234 cfg->free(cookie, pages, size); 232 235 else 233 - free_pages((unsigned long)pages, order); 236 + iommu_free_pages(pages, order); 234 237 235 238 return NULL; 236 239 } ··· 246 249 if (cfg->free) 247 250 cfg->free(cookie, pages, size); 248 251 else 249 - free_pages((unsigned long)pages, get_order(size)); 252 + iommu_free_pages(pages, get_order(size)); 250 253 } 251 254 252 255 static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries,
+14 -23
drivers/iommu/io-pgtable-dart.c
··· 23 23 #include <linux/types.h> 24 24 25 25 #include <asm/barrier.h> 26 + #include "iommu-pages.h" 26 27 27 28 #define DART1_MAX_ADDR_BITS 36 28 29 ··· 107 106 return paddr; 108 107 } 109 108 110 - static void *__dart_alloc_pages(size_t size, gfp_t gfp, 111 - struct io_pgtable_cfg *cfg) 109 + static void *__dart_alloc_pages(size_t size, gfp_t gfp) 112 110 { 113 111 int order = get_order(size); 114 - struct page *p; 115 112 116 113 VM_BUG_ON((gfp & __GFP_HIGHMEM)); 117 - p = alloc_pages(gfp | __GFP_ZERO, order); 118 - if (!p) 119 - return NULL; 120 - 121 - return page_address(p); 114 + return iommu_alloc_pages(gfp, order); 122 115 } 123 116 124 117 static int dart_init_pte(struct dart_io_pgtable *data, ··· 257 262 258 263 /* no L2 table present */ 259 264 if (!pte) { 260 - cptep = __dart_alloc_pages(tblsz, gfp, cfg); 265 + cptep = __dart_alloc_pages(tblsz, gfp); 261 266 if (!cptep) 262 267 return -ENOMEM; 263 268 264 269 pte = dart_install_table(cptep, ptep, 0, data); 265 270 if (pte) 266 - free_pages((unsigned long)cptep, get_order(tblsz)); 271 + iommu_free_pages(cptep, get_order(tblsz)); 267 272 268 273 /* L2 table is present (now) */ 269 274 pte = READ_ONCE(*ptep); ··· 414 419 cfg->apple_dart_cfg.n_ttbrs = 1 << data->tbl_bits; 415 420 416 421 for (i = 0; i < cfg->apple_dart_cfg.n_ttbrs; ++i) { 417 - data->pgd[i] = __dart_alloc_pages(DART_GRANULE(data), GFP_KERNEL, 418 - cfg); 422 + data->pgd[i] = __dart_alloc_pages(DART_GRANULE(data), GFP_KERNEL); 419 423 if (!data->pgd[i]) 420 424 goto out_free_data; 421 425 cfg->apple_dart_cfg.ttbr[i] = virt_to_phys(data->pgd[i]); ··· 423 429 return &data->iop; 424 430 425 431 out_free_data: 426 - while (--i >= 0) 427 - free_pages((unsigned long)data->pgd[i], 428 - get_order(DART_GRANULE(data))); 432 + while (--i >= 0) { 433 + iommu_free_pages(data->pgd[i], 434 + get_order(DART_GRANULE(data))); 435 + } 429 436 kfree(data); 430 437 return NULL; 431 438 } ··· 434 439 static void apple_dart_free_pgtable(struct io_pgtable *iop) 435 440 { 436 441 struct dart_io_pgtable *data = io_pgtable_to_data(iop); 442 + int order = get_order(DART_GRANULE(data)); 437 443 dart_iopte *ptep, *end; 438 444 int i; 439 445 ··· 445 449 while (ptep != end) { 446 450 dart_iopte pte = *ptep++; 447 451 448 - if (pte) { 449 - unsigned long page = 450 - (unsigned long)iopte_deref(pte, data); 451 - 452 - free_pages(page, get_order(DART_GRANULE(data))); 453 - } 452 + if (pte) 453 + iommu_free_pages(iopte_deref(pte, data), order); 454 454 } 455 - free_pages((unsigned long)data->pgd[i], 456 - get_order(DART_GRANULE(data))); 455 + iommu_free_pages(data->pgd[i], order); 457 456 } 458 457 459 458 kfree(data);
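apple_dart_free_pgtable() above now computes the page order once from DART_GRANULE() and reuses it for every table it frees; allocation and free must use the same order, both for __free_pages() correctness and to keep the accounting added in iommu-pages.h balanced. A small sketch of that byte-size-to-order pairing, with an illustrative granule and hypothetical helper names:

#include <linux/mm.h>		/* get_order() */
#include "iommu-pages.h"

static void *example_alloc_granule(size_t granule, gfp_t gfp)
{
	/* e.g. a 16 KiB granule maps to order 2 on 4 KiB pages */
	return iommu_alloc_pages(gfp, get_order(granule));
}

static void example_free_granule(void *table, size_t granule)
{
	/* must match the order used at allocation time */
	iommu_free_pages(table, get_order(granule));
}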
+186
drivers/iommu/iommu-pages.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0-only */ 2 + /* 3 + * Copyright (c) 2024, Google LLC. 4 + * Pasha Tatashin <pasha.tatashin@soleen.com> 5 + */ 6 + 7 + #ifndef __IOMMU_PAGES_H 8 + #define __IOMMU_PAGES_H 9 + 10 + #include <linux/vmstat.h> 11 + #include <linux/gfp.h> 12 + #include <linux/mm.h> 13 + 14 + /* 15 + * All page allocations that should be reported to as "iommu-pagetables" to 16 + * userspace must use one of the functions below. This includes allocations of 17 + * page-tables and other per-iommu_domain configuration structures. 18 + * 19 + * This is necessary for the proper accounting as IOMMU state can be rather 20 + * large, i.e. multiple gigabytes in size. 21 + */ 22 + 23 + /** 24 + * __iommu_alloc_account - account for newly allocated page. 25 + * @page: head struct page of the page. 26 + * @order: order of the page 27 + */ 28 + static inline void __iommu_alloc_account(struct page *page, int order) 29 + { 30 + const long pgcnt = 1l << order; 31 + 32 + mod_node_page_state(page_pgdat(page), NR_IOMMU_PAGES, pgcnt); 33 + mod_lruvec_page_state(page, NR_SECONDARY_PAGETABLE, pgcnt); 34 + } 35 + 36 + /** 37 + * __iommu_free_account - account a page that is about to be freed. 38 + * @page: head struct page of the page. 39 + * @order: order of the page 40 + */ 41 + static inline void __iommu_free_account(struct page *page, int order) 42 + { 43 + const long pgcnt = 1l << order; 44 + 45 + mod_node_page_state(page_pgdat(page), NR_IOMMU_PAGES, -pgcnt); 46 + mod_lruvec_page_state(page, NR_SECONDARY_PAGETABLE, -pgcnt); 47 + } 48 + 49 + /** 50 + * __iommu_alloc_pages - allocate a zeroed page of a given order. 51 + * @gfp: buddy allocator flags 52 + * @order: page order 53 + * 54 + * returns the head struct page of the allocated page. 55 + */ 56 + static inline struct page *__iommu_alloc_pages(gfp_t gfp, int order) 57 + { 58 + struct page *page; 59 + 60 + page = alloc_pages(gfp | __GFP_ZERO, order); 61 + if (unlikely(!page)) 62 + return NULL; 63 + 64 + __iommu_alloc_account(page, order); 65 + 66 + return page; 67 + } 68 + 69 + /** 70 + * __iommu_free_pages - free page of a given order 71 + * @page: head struct page of the page 72 + * @order: page order 73 + */ 74 + static inline void __iommu_free_pages(struct page *page, int order) 75 + { 76 + if (!page) 77 + return; 78 + 79 + __iommu_free_account(page, order); 80 + __free_pages(page, order); 81 + } 82 + 83 + /** 84 + * iommu_alloc_pages_node - allocate a zeroed page of a given order from 85 + * specific NUMA node. 86 + * @nid: memory NUMA node id 87 + * @gfp: buddy allocator flags 88 + * @order: page order 89 + * 90 + * returns the virtual address of the allocated page 91 + */ 92 + static inline void *iommu_alloc_pages_node(int nid, gfp_t gfp, int order) 93 + { 94 + struct page *page = alloc_pages_node(nid, gfp | __GFP_ZERO, order); 95 + 96 + if (unlikely(!page)) 97 + return NULL; 98 + 99 + __iommu_alloc_account(page, order); 100 + 101 + return page_address(page); 102 + } 103 + 104 + /** 105 + * iommu_alloc_pages - allocate a zeroed page of a given order 106 + * @gfp: buddy allocator flags 107 + * @order: page order 108 + * 109 + * returns the virtual address of the allocated page 110 + */ 111 + static inline void *iommu_alloc_pages(gfp_t gfp, int order) 112 + { 113 + struct page *page = __iommu_alloc_pages(gfp, order); 114 + 115 + if (unlikely(!page)) 116 + return NULL; 117 + 118 + return page_address(page); 119 + } 120 + 121 + /** 122 + * iommu_alloc_page_node - allocate a zeroed page at specific NUMA node. 
123 + * @nid: memory NUMA node id 124 + * @gfp: buddy allocator flags 125 + * 126 + * returns the virtual address of the allocated page 127 + */ 128 + static inline void *iommu_alloc_page_node(int nid, gfp_t gfp) 129 + { 130 + return iommu_alloc_pages_node(nid, gfp, 0); 131 + } 132 + 133 + /** 134 + * iommu_alloc_page - allocate a zeroed page 135 + * @gfp: buddy allocator flags 136 + * 137 + * returns the virtual address of the allocated page 138 + */ 139 + static inline void *iommu_alloc_page(gfp_t gfp) 140 + { 141 + return iommu_alloc_pages(gfp, 0); 142 + } 143 + 144 + /** 145 + * iommu_free_pages - free page of a given order 146 + * @virt: virtual address of the page to be freed. 147 + * @order: page order 148 + */ 149 + static inline void iommu_free_pages(void *virt, int order) 150 + { 151 + if (!virt) 152 + return; 153 + 154 + __iommu_free_pages(virt_to_page(virt), order); 155 + } 156 + 157 + /** 158 + * iommu_free_page - free page 159 + * @virt: virtual address of the page to be freed. 160 + */ 161 + static inline void iommu_free_page(void *virt) 162 + { 163 + iommu_free_pages(virt, 0); 164 + } 165 + 166 + /** 167 + * iommu_put_pages_list - free a list of pages. 168 + * @page: the head of the lru list to be freed. 169 + * 170 + * There are no locking requirement for these pages, as they are going to be 171 + * put on a free list as soon as refcount reaches 0. Pages are put on this LRU 172 + * list once they are removed from the IOMMU page tables. However, they can 173 + * still be access through debugfs. 174 + */ 175 + static inline void iommu_put_pages_list(struct list_head *page) 176 + { 177 + while (!list_empty(page)) { 178 + struct page *p = list_entry(page->prev, struct page, lru); 179 + 180 + list_del(&p->lru); 181 + __iommu_free_account(p, 0); 182 + put_page(p); 183 + } 184 + } 185 + 186 + #endif /* __IOMMU_PAGES_H */
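These helpers are meant as drop-in replacements for open-coded alloc_pages()/free_pages() calls in page-table code; the only addition is the NR_IOMMU_PAGES / NR_SECONDARY_PAGETABLE bookkeeping. A hedged sketch of typical use in a driver (function names are illustrative):

#include <linux/device.h>
#include "iommu-pages.h"

/* One zeroed 4 KiB table, allocated close to the device's NUMA node. */
static void *example_alloc_table(struct device *dev, gfp_t gfp)
{
	return iommu_alloc_pages_node(dev_to_node(dev), gfp, 0);
}

static void example_free_table(void *table)
{
	/* iommu_free_pages() tolerates NULL, so no separate check is needed. */
	iommu_free_pages(table, 0);
}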
+11 -5
drivers/iommu/iommu-sva.c
··· 108 108 109 109 /* Allocate a new domain and set it on device pasid. */ 110 110 domain = iommu_sva_domain_alloc(dev, mm); 111 - if (!domain) { 112 - ret = -ENOMEM; 111 + if (IS_ERR(domain)) { 112 + ret = PTR_ERR(domain); 113 113 goto out_free_handle; 114 114 } 115 115 ··· 283 283 const struct iommu_ops *ops = dev_iommu_ops(dev); 284 284 struct iommu_domain *domain; 285 285 286 - domain = ops->domain_alloc(IOMMU_DOMAIN_SVA); 287 - if (!domain) 288 - return NULL; 286 + if (ops->domain_alloc_sva) { 287 + domain = ops->domain_alloc_sva(dev, mm); 288 + if (IS_ERR(domain)) 289 + return domain; 290 + } else { 291 + domain = ops->domain_alloc(IOMMU_DOMAIN_SVA); 292 + if (!domain) 293 + return ERR_PTR(-ENOMEM); 294 + } 289 295 290 296 domain->type = IOMMU_DOMAIN_SVA; 291 297 mmgrab(mm);
+26 -22
drivers/iommu/iommu.c
··· 581 581 if (list_empty(&group->entry)) 582 582 list_add_tail(&group->entry, group_list); 583 583 } 584 - mutex_unlock(&group->mutex); 585 584 586 - if (dev_is_pci(dev)) 587 - iommu_dma_set_pci_32bit_workaround(dev); 585 + if (group->default_domain) 586 + iommu_setup_dma_ops(dev); 587 + 588 + mutex_unlock(&group->mutex); 588 589 589 590 return 0; 590 591 ··· 1829 1828 mutex_unlock(&group->mutex); 1830 1829 return ret; 1831 1830 } 1831 + for_each_group_device(group, gdev) 1832 + iommu_setup_dma_ops(gdev->dev); 1832 1833 mutex_unlock(&group->mutex); 1833 1834 1834 1835 /* ··· 3069 3066 if (ret) 3070 3067 goto out_unlock; 3071 3068 3072 - /* 3073 - * Release the mutex here because ops->probe_finalize() call-back of 3074 - * some vendor IOMMU drivers calls arm_iommu_attach_device() which 3075 - * in-turn might call back into IOMMU core code, where it tries to take 3076 - * group->mutex, resulting in a deadlock. 3077 - */ 3078 - mutex_unlock(&group->mutex); 3079 - 3080 3069 /* Make sure dma_ops is appropriatley set */ 3081 3070 for_each_group_device(group, gdev) 3082 - iommu_group_do_probe_finalize(gdev->dev); 3083 - return count; 3071 + iommu_setup_dma_ops(gdev->dev); 3084 3072 3085 3073 out_unlock: 3086 3074 mutex_unlock(&group->mutex); ··· 3311 3317 static int __iommu_set_group_pasid(struct iommu_domain *domain, 3312 3318 struct iommu_group *group, ioasid_t pasid) 3313 3319 { 3314 - struct group_device *device; 3315 - int ret = 0; 3320 + struct group_device *device, *last_gdev; 3321 + int ret; 3316 3322 3317 3323 for_each_group_device(group, device) { 3318 3324 ret = domain->ops->set_dev_pasid(domain, device->dev, pasid); 3319 3325 if (ret) 3320 - break; 3326 + goto err_revert; 3321 3327 } 3322 3328 3329 + return 0; 3330 + 3331 + err_revert: 3332 + last_gdev = device; 3333 + for_each_group_device(group, device) { 3334 + const struct iommu_ops *ops = dev_iommu_ops(device->dev); 3335 + 3336 + if (device == last_gdev) 3337 + break; 3338 + ops->remove_dev_pasid(device->dev, pasid, domain); 3339 + } 3323 3340 return ret; 3324 3341 } 3325 3342 3326 3343 static void __iommu_remove_group_pasid(struct iommu_group *group, 3327 - ioasid_t pasid) 3344 + ioasid_t pasid, 3345 + struct iommu_domain *domain) 3328 3346 { 3329 3347 struct group_device *device; 3330 3348 const struct iommu_ops *ops; 3331 3349 3332 3350 for_each_group_device(group, device) { 3333 3351 ops = dev_iommu_ops(device->dev); 3334 - ops->remove_dev_pasid(device->dev, pasid); 3352 + ops->remove_dev_pasid(device->dev, pasid, domain); 3335 3353 } 3336 3354 } 3337 3355 ··· 3389 3383 } 3390 3384 3391 3385 ret = __iommu_set_group_pasid(domain, group, pasid); 3392 - if (ret) { 3393 - __iommu_remove_group_pasid(group, pasid); 3386 + if (ret) 3394 3387 xa_erase(&group->pasid_array, pasid); 3395 - } 3396 3388 out_unlock: 3397 3389 mutex_unlock(&group->mutex); 3398 3390 return ret; ··· 3413 3409 struct iommu_group *group = dev->iommu_group; 3414 3410 3415 3411 mutex_lock(&group->mutex); 3416 - __iommu_remove_group_pasid(group, pasid); 3412 + __iommu_remove_group_pasid(group, pasid, domain); 3417 3413 WARN_ON(xa_erase(&group->pasid_array, pasid) != domain); 3418 3414 mutex_unlock(&group->mutex); 3419 3415 }
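__iommu_set_group_pasid() above now rolls back a partially applied attach: on failure it remembers which device failed and walks the group again, undoing every device before that one (the failing device itself is assumed to have left no state behind). The same unwind shape in isolation, as plain C with stand-in apply()/undo() helpers:

#include <stdio.h>

static int apply(int i) { return i == 3 ? -1 : 0; }	/* pretend the 4th entry fails */
static void undo(int i) { printf("undo %d\n", i); }

static int apply_all(int n)
{
	int i, failed;

	for (i = 0; i < n; i++)
		if (apply(i) < 0)
			goto err_revert;
	return 0;

err_revert:
	failed = i;
	for (i = 0; i < failed; i++)	/* stop before the entry that failed */
		undo(i);
	return -1;
}

int main(void)
{
	return apply_all(5) ? 1 : 0;
}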
+4 -1
drivers/iommu/irq_remapping.c
··· 154 154 if (!remap_ops->enable_faulting) 155 155 return -ENODEV; 156 156 157 - return remap_ops->enable_faulting(); 157 + cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "dmar:enable_fault_handling", 158 + remap_ops->enable_faulting, NULL); 159 + 160 + return remap_ops->enable_faulting(smp_processor_id()); 158 161 } 159 162 160 163 void panic_if_irq_remap(const char *msg)
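The enable_faulting() callback is now also registered as a CPU hotplug state, so fault reporting gets set up on CPUs that come online after boot as well as on the current one. A small sketch of the same registration with a hypothetical callback; the state name and pr_info() are illustrative:

#include <linux/cpuhotplug.h>
#include <linux/printk.h>

/* Matches the new int (*enable_faulting)(unsigned int cpu) shape. */
static int example_enable_faulting(unsigned int cpu)
{
	pr_info("fault handling enabled on CPU %u\n", cpu);
	return 0;
}

static int example_init(void)
{
	/*
	 * CPUHP_AP_ONLINE_DYN allocates a dynamic state; the startup callback
	 * runs on every CPU that is, or later comes, online. Returns the
	 * allocated state number (>= 0) on success.
	 */
	return cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
				 "example:enable_fault_handling",
				 example_enable_faulting, NULL);
}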
+1 -1
drivers/iommu/irq_remapping.h
··· 41 41 int (*reenable)(int); 42 42 43 43 /* Enable fault handling */ 44 - int (*enable_faulting)(void); 44 + int (*enable_faulting)(unsigned int); 45 45 }; 46 46 47 47 extern struct irq_remap_ops intel_irq_remap_ops;
+8 -6
drivers/iommu/rockchip-iommu.c
··· 26 26 #include <linux/slab.h> 27 27 #include <linux/spinlock.h> 28 28 29 + #include "iommu-pages.h" 30 + 29 31 /** MMU register offsets */ 30 32 #define RK_MMU_DTE_ADDR 0x00 /* Directory table address */ 31 33 #define RK_MMU_STATUS 0x04 ··· 729 727 if (rk_dte_is_pt_valid(dte)) 730 728 goto done; 731 729 732 - page_table = (u32 *)get_zeroed_page(GFP_ATOMIC | rk_ops->gfp_flags); 730 + page_table = iommu_alloc_page(GFP_ATOMIC | rk_ops->gfp_flags); 733 731 if (!page_table) 734 732 return ERR_PTR(-ENOMEM); 735 733 736 734 pt_dma = dma_map_single(dma_dev, page_table, SPAGE_SIZE, DMA_TO_DEVICE); 737 735 if (dma_mapping_error(dma_dev, pt_dma)) { 738 736 dev_err(dma_dev, "DMA mapping error while allocating page table\n"); 739 - free_page((unsigned long)page_table); 737 + iommu_free_page(page_table); 740 738 return ERR_PTR(-ENOMEM); 741 739 } 742 740 ··· 1063 1061 * Each level1 (dt) and level2 (pt) table has 1024 4-byte entries. 1064 1062 * Allocate one 4 KiB page for each table. 1065 1063 */ 1066 - rk_domain->dt = (u32 *)get_zeroed_page(GFP_KERNEL | rk_ops->gfp_flags); 1064 + rk_domain->dt = iommu_alloc_page(GFP_KERNEL | rk_ops->gfp_flags); 1067 1065 if (!rk_domain->dt) 1068 1066 goto err_free_domain; 1069 1067 ··· 1085 1083 return &rk_domain->domain; 1086 1084 1087 1085 err_free_dt: 1088 - free_page((unsigned long)rk_domain->dt); 1086 + iommu_free_page(rk_domain->dt); 1089 1087 err_free_domain: 1090 1088 kfree(rk_domain); 1091 1089 ··· 1106 1104 u32 *page_table = phys_to_virt(pt_phys); 1107 1105 dma_unmap_single(dma_dev, pt_phys, 1108 1106 SPAGE_SIZE, DMA_TO_DEVICE); 1109 - free_page((unsigned long)page_table); 1107 + iommu_free_page(page_table); 1110 1108 } 1111 1109 } 1112 1110 1113 1111 dma_unmap_single(dma_dev, rk_domain->dt_dma, 1114 1112 SPAGE_SIZE, DMA_TO_DEVICE); 1115 - free_page((unsigned long)rk_domain->dt); 1113 + iommu_free_page(rk_domain->dt); 1116 1114 1117 1115 kfree(rk_domain); 1118 1116 }
-6
drivers/iommu/s390-iommu.c
··· 695 695 return size; 696 696 } 697 697 698 - static void s390_iommu_probe_finalize(struct device *dev) 699 - { 700 - iommu_setup_dma_ops(dev, 0, U64_MAX); 701 - } 702 - 703 698 struct zpci_iommu_ctrs *zpci_get_iommu_ctrs(struct zpci_dev *zdev) 704 699 { 705 700 if (!zdev || !zdev->s390_domain) ··· 780 785 .capable = s390_iommu_capable, 781 786 .domain_alloc_paging = s390_domain_alloc_paging, 782 787 .probe_device = s390_iommu_probe_device, 783 - .probe_finalize = s390_iommu_probe_finalize, 784 788 .release_device = s390_iommu_release_device, 785 789 .device_group = generic_device_group, 786 790 .pgsize_bitmap = SZ_4K,
+4 -3
drivers/iommu/sun50i-iommu.c
··· 26 26 #include <linux/spinlock.h> 27 27 #include <linux/types.h> 28 28 29 + #include "iommu-pages.h" 30 + 29 31 #define IOMMU_RESET_REG 0x010 30 32 #define IOMMU_RESET_RELEASE_ALL 0xffffffff 31 33 #define IOMMU_ENABLE_REG 0x020 ··· 681 679 if (!sun50i_domain) 682 680 return NULL; 683 681 684 - sun50i_domain->dt = (u32 *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 685 - get_order(DT_SIZE)); 682 + sun50i_domain->dt = iommu_alloc_pages(GFP_KERNEL, get_order(DT_SIZE)); 686 683 if (!sun50i_domain->dt) 687 684 goto err_free_domain; 688 685 ··· 703 702 { 704 703 struct sun50i_iommu_domain *sun50i_domain = to_sun50i_domain(domain); 705 704 706 - free_pages((unsigned long)sun50i_domain->dt, get_order(DT_SIZE)); 705 + iommu_free_pages(sun50i_domain->dt, get_order(DT_SIZE)); 707 706 sun50i_domain->dt = NULL; 708 707 709 708 kfree(sun50i_domain);
+10 -8
drivers/iommu/tegra-smmu.c
··· 19 19 #include <soc/tegra/ahb.h> 20 20 #include <soc/tegra/mc.h> 21 21 22 + #include "iommu-pages.h" 23 + 22 24 struct tegra_smmu_group { 23 25 struct list_head list; 24 26 struct tegra_smmu *smmu; ··· 284 282 285 283 as->attr = SMMU_PD_READABLE | SMMU_PD_WRITABLE | SMMU_PD_NONSECURE; 286 284 287 - as->pd = alloc_page(GFP_KERNEL | __GFP_DMA | __GFP_ZERO); 285 + as->pd = __iommu_alloc_pages(GFP_KERNEL | __GFP_DMA, 0); 288 286 if (!as->pd) { 289 287 kfree(as); 290 288 return NULL; ··· 292 290 293 291 as->count = kcalloc(SMMU_NUM_PDE, sizeof(u32), GFP_KERNEL); 294 292 if (!as->count) { 295 - __free_page(as->pd); 293 + __iommu_free_pages(as->pd, 0); 296 294 kfree(as); 297 295 return NULL; 298 296 } ··· 300 298 as->pts = kcalloc(SMMU_NUM_PDE, sizeof(*as->pts), GFP_KERNEL); 301 299 if (!as->pts) { 302 300 kfree(as->count); 303 - __free_page(as->pd); 301 + __iommu_free_pages(as->pd, 0); 304 302 kfree(as); 305 303 return NULL; 306 304 } ··· 601 599 dma = dma_map_page(smmu->dev, page, 0, SMMU_SIZE_PT, 602 600 DMA_TO_DEVICE); 603 601 if (dma_mapping_error(smmu->dev, dma)) { 604 - __free_page(page); 602 + __iommu_free_pages(page, 0); 605 603 return NULL; 606 604 } 607 605 608 606 if (!smmu_dma_addr_valid(smmu, dma)) { 609 607 dma_unmap_page(smmu->dev, dma, SMMU_SIZE_PT, 610 608 DMA_TO_DEVICE); 611 - __free_page(page); 609 + __iommu_free_pages(page, 0); 612 610 return NULL; 613 611 } 614 612 ··· 651 649 tegra_smmu_set_pde(as, iova, 0); 652 650 653 651 dma_unmap_page(smmu->dev, pte_dma, SMMU_SIZE_PT, DMA_TO_DEVICE); 654 - __free_page(page); 652 + __iommu_free_pages(page, 0); 655 653 as->pts[pde] = NULL; 656 654 } 657 655 } ··· 690 688 if (gfpflags_allow_blocking(gfp)) 691 689 spin_unlock_irqrestore(&as->lock, *flags); 692 690 693 - page = alloc_page(gfp | __GFP_DMA | __GFP_ZERO); 691 + page = __iommu_alloc_pages(gfp | __GFP_DMA, 0); 694 692 695 693 if (gfpflags_allow_blocking(gfp)) 696 694 spin_lock_irqsave(&as->lock, *flags); ··· 702 700 */ 703 701 if (as->pts[pde]) { 704 702 if (page) 705 - __free_page(page); 703 + __iommu_free_pages(page, 0); 706 704 707 705 page = as->pts[pde]; 708 706 }
-10
drivers/iommu/virtio-iommu.c
··· 1025 1025 return ERR_PTR(ret); 1026 1026 } 1027 1027 1028 - static void viommu_probe_finalize(struct device *dev) 1029 - { 1030 - #ifndef CONFIG_ARCH_HAS_SETUP_DMA_OPS 1031 - /* First clear the DMA ops in case we're switching from a DMA domain */ 1032 - set_dma_ops(dev, NULL); 1033 - iommu_setup_dma_ops(dev, 0, U64_MAX); 1034 - #endif 1035 - } 1036 - 1037 1028 static void viommu_release_device(struct device *dev) 1038 1029 { 1039 1030 struct viommu_endpoint *vdev = dev_iommu_priv_get(dev); ··· 1064 1073 .capable = viommu_capable, 1065 1074 .domain_alloc = viommu_domain_alloc, 1066 1075 .probe_device = viommu_probe_device, 1067 - .probe_finalize = viommu_probe_finalize, 1068 1076 .release_device = viommu_release_device, 1069 1077 .device_group = viommu_device_group, 1070 1078 .get_resv_regions = viommu_get_resv_regions,
+7 -35
drivers/of/device.c
··· 95 95 { 96 96 const struct bus_dma_region *map = NULL; 97 97 struct device_node *bus_np; 98 - u64 dma_start = 0; 99 - u64 mask, end, size = 0; 98 + u64 mask, end = 0; 100 99 bool coherent; 101 100 int iommu_ret; 102 101 int ret; ··· 116 117 if (!force_dma) 117 118 return ret == -ENODEV ? 0 : ret; 118 119 } else { 119 - const struct bus_dma_region *r = map; 120 - u64 dma_end = 0; 121 - 122 120 /* Determine the overall bounds of all DMA regions */ 123 - for (dma_start = ~0; r->size; r++) { 124 - /* Take lower and upper limits */ 125 - if (r->dma_start < dma_start) 126 - dma_start = r->dma_start; 127 - if (r->dma_start + r->size > dma_end) 128 - dma_end = r->dma_start + r->size; 129 - } 130 - size = dma_end - dma_start; 131 - 132 - /* 133 - * Add a work around to treat the size as mask + 1 in case 134 - * it is defined in DT as a mask. 135 - */ 136 - if (size & 1) { 137 - dev_warn(dev, "Invalid size 0x%llx for dma-range(s)\n", 138 - size); 139 - size = size + 1; 140 - } 141 - 142 - if (!size) { 143 - dev_err(dev, "Adjusted size 0x%llx invalid\n", size); 144 - kfree(map); 145 - return -EINVAL; 146 - } 121 + end = dma_range_map_max(map); 147 122 } 148 123 149 124 /* ··· 131 158 dev->dma_mask = &dev->coherent_dma_mask; 132 159 } 133 160 134 - if (!size && dev->coherent_dma_mask) 135 - size = max(dev->coherent_dma_mask, dev->coherent_dma_mask + 1); 136 - else if (!size) 137 - size = 1ULL << 32; 161 + if (!end && dev->coherent_dma_mask) 162 + end = dev->coherent_dma_mask; 163 + else if (!end) 164 + end = (1ULL << 32) - 1; 138 165 139 166 /* 140 167 * Limit coherent and dma mask based on size and default mask 141 168 * set by the driver. 142 169 */ 143 - end = dma_start + size - 1; 144 170 mask = DMA_BIT_MASK(ilog2(end) + 1); 145 171 dev->coherent_dma_mask &= mask; 146 172 *dev->dma_mask &= mask; ··· 173 201 } else 174 202 dev_dbg(dev, "device is behind an iommu\n"); 175 203 176 - arch_setup_dma_ops(dev, dma_start, size, coherent); 204 + arch_setup_dma_ops(dev, coherent); 177 205 178 206 if (iommu_ret) 179 207 of_dma_set_restricted_buffer(dev, np);
+2 -2
include/linux/acpi_iort.h
··· 39 39 void iort_put_rmr_sids(struct fwnode_handle *iommu_fwnode, 40 40 struct list_head *head); 41 41 /* IOMMU interface */ 42 - int iort_dma_get_ranges(struct device *dev, u64 *size); 42 + int iort_dma_get_ranges(struct device *dev, u64 *limit); 43 43 int iort_iommu_configure_id(struct device *dev, const u32 *id_in); 44 44 void iort_iommu_get_resv_regions(struct device *dev, struct list_head *head); 45 45 phys_addr_t acpi_iort_dma_get_max_cpu_address(void); ··· 55 55 static inline 56 56 void iort_put_rmr_sids(struct fwnode_handle *iommu_fwnode, struct list_head *head) { } 57 57 /* IOMMU interface */ 58 - static inline int iort_dma_get_ranges(struct device *dev, u64 *size) 58 + static inline int iort_dma_get_ranges(struct device *dev, u64 *limit) 59 59 { return -ENODEV; } 60 60 static inline int iort_iommu_configure_id(struct device *dev, const u32 *id_in) 61 61 { return -ENODEV; }
+18
include/linux/dma-direct.h
··· 54 54 return (phys_addr_t)-1; 55 55 } 56 56 57 + static inline dma_addr_t dma_range_map_min(const struct bus_dma_region *map) 58 + { 59 + dma_addr_t ret = (dma_addr_t)U64_MAX; 60 + 61 + for (; map->size; map++) 62 + ret = min(ret, map->dma_start); 63 + return ret; 64 + } 65 + 66 + static inline dma_addr_t dma_range_map_max(const struct bus_dma_region *map) 67 + { 68 + dma_addr_t ret = 0; 69 + 70 + for (; map->size; map++) 71 + ret = max(ret, map->dma_start + map->size - 1); 72 + return ret; 73 + } 74 + 57 75 #ifdef CONFIG_ARCH_HAS_PHYS_TO_DMA 58 76 #include <asm/dma-direct.h> 59 77 #ifndef phys_to_dma_unencrypted
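The of/device.c hunk earlier now relies on dma_range_map_max() instead of open-coding the bounds walk, and derives the DMA mask directly from the highest reachable address. A hedged sketch of that mask derivation, mirroring the replaced logic (the fallback values follow the hunk; the helper name is illustrative):

#include <linux/dma-direct.h>
#include <linux/dma-mapping.h>	/* DMA_BIT_MASK() */
#include <linux/log2.h>

static u64 example_mask_from_map(const struct bus_dma_region *map,
				 u64 coherent_dma_mask)
{
	u64 end = map ? dma_range_map_max(map) : 0;

	/* No dma-ranges: fall back to the driver's mask, or to 32 bits. */
	if (!end)
		end = coherent_dma_mask ? coherent_dma_mask : (1ULL << 32) - 1;

	return DMA_BIT_MASK(ilog2(end) + 1);
}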
+2 -4
include/linux/dma-map-ops.h
··· 426 426 #endif 427 427 428 428 #ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS 429 - void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, 430 - bool coherent); 429 + void arch_setup_dma_ops(struct device *dev, bool coherent); 431 430 #else 432 - static inline void arch_setup_dma_ops(struct device *dev, u64 dma_base, 433 - u64 size, bool coherent) 431 + static inline void arch_setup_dma_ops(struct device *dev, bool coherent) 434 432 { 435 433 } 436 434 #endif /* CONFIG_ARCH_HAS_SETUP_DMA_OPS */
+1 -1
include/linux/dmar.h
··· 117 117 int count); 118 118 /* Intel IOMMU detection */ 119 119 void detect_intel_iommu(void); 120 - extern int enable_drhd_fault_handling(void); 120 + extern int enable_drhd_fault_handling(unsigned int cpu); 121 121 extern int dmar_device_add(acpi_handle handle); 122 122 extern int dmar_device_remove(acpi_handle handle); 123 123
+6 -10
include/linux/iommu.h
··· 69 69 struct iommu_fault_page_request { 70 70 #define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID (1 << 0) 71 71 #define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE (1 << 1) 72 - #define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA (1 << 2) 73 - #define IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID (1 << 3) 72 + #define IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID (1 << 2) 74 73 u32 flags; 75 74 u32 pasid; 76 75 u32 grpid; ··· 517 518 * Upon failure, ERR_PTR must be returned. 518 519 * @domain_alloc_paging: Allocate an iommu_domain that can be used for 519 520 * UNMANAGED, DMA, and DMA_FQ domain types. 521 + * @domain_alloc_sva: Allocate an iommu_domain for Shared Virtual Addressing. 520 522 * @probe_device: Add device to iommu driver handling 521 523 * @release_device: Remove device from iommu driver handling 522 524 * @probe_finalize: Do final setup work after the device is added to an IOMMU ··· 558 558 struct device *dev, u32 flags, struct iommu_domain *parent, 559 559 const struct iommu_user_data *user_data); 560 560 struct iommu_domain *(*domain_alloc_paging)(struct device *dev); 561 + struct iommu_domain *(*domain_alloc_sva)(struct device *dev, 562 + struct mm_struct *mm); 561 563 562 564 struct iommu_device *(*probe_device)(struct device *dev); 563 565 void (*release_device)(struct device *dev); ··· 580 578 struct iommu_page_response *msg); 581 579 582 580 int (*def_domain_type)(struct device *dev); 583 - void (*remove_dev_pasid)(struct device *dev, ioasid_t pasid); 581 + void (*remove_dev_pasid)(struct device *dev, ioasid_t pasid, 582 + struct iommu_domain *domain); 584 583 585 584 const struct iommu_domain_ops *default_domain_ops; 586 585 unsigned long pgsize_bitmap; ··· 1448 1445 #ifdef CONFIG_IOMMU_DMA 1449 1446 #include <linux/msi.h> 1450 1447 1451 - /* Setup call for arch DMA mapping code */ 1452 - void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 dma_limit); 1453 - 1454 1448 int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base); 1455 1449 1456 1450 int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr); ··· 1457 1457 1458 1458 struct msi_desc; 1459 1459 struct msi_msg; 1460 - 1461 - static inline void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 dma_limit) 1462 - { 1463 - } 1464 1460 1465 1461 static inline int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base) 1466 1462 {
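For drivers, the visible API changes above are the new domain_alloc_sva() hook, which must return ERR_PTR() on failure rather than NULL, and the extra domain argument to remove_dev_pasid(). A stub sketch of the updated op signatures; the my_* names are placeholders, not code from this series:

#include <linux/err.h>
#include <linux/iommu.h>

static struct iommu_domain *my_domain_alloc_sva(struct device *dev,
						struct mm_struct *mm)
{
	/* Allocate and initialise the SVA domain; ERR_PTR() on failure. */
	return ERR_PTR(-ENODEV);
}

static void my_remove_dev_pasid(struct device *dev, ioasid_t pasid,
				struct iommu_domain *domain)
{
	/* The domain being detached is now passed in explicitly. */
}

static const struct iommu_ops my_iommu_ops = {
	.domain_alloc_sva  = my_domain_alloc_sva,
	.remove_dev_pasid  = my_remove_dev_pasid,
	/* remaining callbacks elided */
};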
+4 -1
include/linux/mmzone.h
··· 205 205 NR_KERNEL_SCS_KB, /* measured in KiB */ 206 206 #endif 207 207 NR_PAGETABLE, /* used for pagetables */ 208 - NR_SECONDARY_PAGETABLE, /* secondary pagetables, e.g. KVM pagetables */ 208 + NR_SECONDARY_PAGETABLE, /* secondary pagetables, KVM & IOMMU */ 209 + #ifdef CONFIG_IOMMU_SUPPORT 210 + NR_IOMMU_PAGES, /* # of pages allocated by IOMMU */ 211 + #endif 209 212 #ifdef CONFIG_SWAP 210 213 NR_SWAPCACHE, 211 214 #endif
+3
mm/vmstat.c
··· 1242 1242 #endif 1243 1243 "nr_page_table_pages", 1244 1244 "nr_sec_page_table_pages", 1245 + #ifdef CONFIG_IOMMU_SUPPORT 1246 + "nr_iommu_pages", 1247 + #endif 1245 1248 #ifdef CONFIG_SWAP 1246 1249 "nr_swapcached", 1247 1250 #endif