Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd updates from Jason Gunthorpe:
"This brings three new iommufd capabilities:

- Dirty tracking for DMA.

IOMMUs on AMD/ARM/Intel platforms can now record in the IOPTEs
within the IO page table whether a DMA has written to a page. This
can be used to generate a record of what memory is being dirtied by
DMA activity during a VM migration. A VMM like qemu will combine the
IOMMU dirty bits with the CPU's dirty log to determine what memory to
transfer.

VFIO already has a DMA dirty tracking framework that requires PCI
devices to implement tracking HW internally. The iommufd version
provides an alternative that the VMM can select, if available. The
two are designed to have very similar APIs.

- Userspace controlled attributes for hardware page tables
(HWPT/iommu_domain). There are currently a few generic attributes
for HWPTs (dirty tracking support, and being the parent of a nest).
This is an entry point for the userspace IOMMU driver to control the
HW in detail.

- Nested translation support for HWPTs. This is a 2D translation
scheme, similar to the CPU's, where a DMA goes through a first stage
to determine an intermediate address which is then translated through
a second stage to a physical address.

As with CPU translation, the first-stage table would exist in
VM-controlled memory, and the second stage is in the kernel and
matches the VM's guest-to-physical map.

As every IOMMU has a unique set of parameters to describe the S1 IO
page table and its associated attributes, the userspace IOMMU driver
has to marshal the information into the correct format.

This is 1/3 of the feature: it allows creating the nested
translation and binding it to VFIO devices; however, the API to
support IOTLB and ATC invalidation of the stage 1 IO page table,
and forwarding of IO faults, are still in progress.

The series includes AMD and Intel support for dirty tracking, and
Intel support for nested translation.

Along the way are a number of internal items:

- New iommu core items: ops->domain_alloc_user(),
ops->set_dirty_tracking, ops->read_and_clear_dirty(),
IOMMU_DOMAIN_NESTED, and iommu_copy_struct_from_user

- UAF fix in iopt_area_split()

- Spelling fixes and some test suite improvements"
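
To make the dirty-tracking flow above concrete, here is a minimal C sketch of the VMM-side merge. It assumes the IOMMU dirty bits and the CPU dirty log have both already been read out as bitmaps with one bit per guest page, packed into 64-bit words. The function names are hypothetical and are not QEMU or iommufd APIs; the rule shown is simply that a page is re-sent if either the CPU or a DMA dirtied it.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical VMM helper: fold the IOMMU (DMA) dirty bitmap into the
 * CPU dirty log. Bit i of word w stands for guest page w * 64 + i.
 */
static void merge_dirty_bitmaps(uint64_t *cpu_log, const uint64_t *iommu_bits,
				size_t nwords)
{
	for (size_t w = 0; w < nwords; w++)
		cpu_log[w] |= iommu_bits[w];
}

/* Count the pages that must be transferred in this migration pass. */
static size_t count_dirty_pages(const uint64_t *log, size_t nwords)
{
	size_t n = 0;

	for (size_t w = 0; w < nwords; w++)
		for (uint64_t v = log[w]; v; v &= v - 1)
			n++;	/* clears the lowest set bit each round */
	return n;
}
```

A real VMM would also have to re-arm both sources between passes; that part is omitted here.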

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (52 commits)
iommufd: Organize the mock domain alloc functions closer to Joerg's tree
iommufd/selftest: Fix page-size check in iommufd_test_dirty()
iommufd: Add iopt_area_alloc()
iommufd: Fix missing update of domains_itree after splitting iopt_area
iommu/vt-d: Disallow read-only mappings to nest parent domain
iommu/vt-d: Add nested domain allocation
iommu/vt-d: Set the nested domain to a device
iommu/vt-d: Make domain attach helpers to be extern
iommu/vt-d: Add helper to setup pasid nested translation
iommu/vt-d: Add helper for nested domain allocation
iommu/vt-d: Extend dmar_domain to support nested domain
iommufd: Add data structure for Intel VT-d stage-1 domain allocation
iommu/vt-d: Enhance capability check for nested parent domain allocation
iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with nested HWPTs
iommufd/selftest: Add nested domain allocation for mock domain
iommu: Add iommu_copy_struct_from_user helper
iommufd: Add a nested HW pagetable object
iommu: Pass in parent domain with user_data to domain_alloc_user op
iommufd: Share iommufd_hwpt_alloc with IOMMUFD_OBJ_HWPT_NESTED
iommufd: Derive iommufd_hwpt_paging from iommufd_hw_pagetable
...

+2723 -219
drivers/iommu/Kconfig (+4)
···
 config IOMMU_API
 	bool
 
+config IOMMUFD_DRIVER
+	bool
+	default n
+
 menuconfig IOMMU_SUPPORT
 	bool "IOMMU Hardware Support"
 	depends on MMU
drivers/iommu/amd/Kconfig (+1)
···
 	select IOMMU_API
 	select IOMMU_IOVA
 	select IOMMU_IO_PGTABLE
+	select IOMMUFD_DRIVER if IOMMUFD
 	depends on X86_64 && PCI && ACPI && HAVE_CMPXCHG_DOUBLE
 	help
 	  With this option you can enable support for AMD IOMMU hardware in
drivers/iommu/amd/amd_iommu_types.h (+12)
···
 #define FEATURE_GATS_MASK (3ULL)
 #define FEATURE_GAM_VAPIC BIT_ULL(21)
 #define FEATURE_GIOSUP BIT_ULL(48)
+#define FEATURE_HASUP BIT_ULL(49)
 #define FEATURE_EPHSUP BIT_ULL(50)
+#define FEATURE_HDSUP BIT_ULL(52)
 #define FEATURE_SNP BIT_ULL(63)
 
 #define FEATURE_PASID_SHIFT 32
···
 /* macros and definitions for device table entries */
 #define DEV_ENTRY_VALID 0x00
 #define DEV_ENTRY_TRANSLATION 0x01
+#define DEV_ENTRY_HAD 0x07
 #define DEV_ENTRY_PPR 0x34
 #define DEV_ENTRY_IR 0x3d
 #define DEV_ENTRY_IW 0x3e
···
 	(1ULL << (12 + (9 * (level))))
 
 /*
+ * The IOPTE dirty bit
+ */
+#define IOMMU_PTE_HD_BIT (6)
+
+/*
  * Bit value definition for I/O PTE fields
  */
 #define IOMMU_PTE_PR BIT_ULL(0)
+#define IOMMU_PTE_HD BIT_ULL(IOMMU_PTE_HD_BIT)
 #define IOMMU_PTE_U BIT_ULL(59)
 #define IOMMU_PTE_FC BIT_ULL(60)
 #define IOMMU_PTE_IR BIT_ULL(61)
···
  */
 #define DTE_FLAG_V BIT_ULL(0)
 #define DTE_FLAG_TV BIT_ULL(1)
+#define DTE_FLAG_HAD (3ULL << 7)
 #define DTE_FLAG_GIOV BIT_ULL(54)
 #define DTE_FLAG_GV BIT_ULL(55)
 #define DTE_GLX_SHIFT (56)
···
 
 #define IOMMU_PAGE_MASK (((1ULL << 52) - 1) & ~0xfffULL)
 #define IOMMU_PTE_PRESENT(pte) ((pte) & IOMMU_PTE_PR)
+#define IOMMU_PTE_DIRTY(pte) ((pte) & IOMMU_PTE_HD)
 #define IOMMU_PTE_PAGE(pte) (iommu_phys_to_virt((pte) & IOMMU_PAGE_MASK))
 #define IOMMU_PTE_MODE(pte) (((pte) >> 9) & 0x07)
 
···
 	int nid;			/* Node ID */
 	u64 *gcr3_tbl;			/* Guest CR3 table */
 	unsigned long flags;		/* flags to find out type of domain */
+	bool dirty_tracking;		/* dirty tracking is enabled in the domain */
 	unsigned dev_cnt;		/* devices assigned to this domain */
 	unsigned dev_iommu[MAX_IOMMUS]; /* per-IOMMU reference count */
 };
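
The new AMD definitions can be exercised in isolation. The C sketch below restates FEATURE_HDSUP, IOMMU_PTE_HD and DTE_FLAG_HAD from the hunk above (with a local BIT_ULL stand-in, since the kernel header is not available in userspace) and wraps them in hypothetical helpers that mirror how the driver tests the feature bit, tests a PTE's dirty bit, and sets or clears the two-bit HAD field in the first 64-bit word of a DTE (data[0]).

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the kernel's BIT_ULL(). */
#define BIT_ULL(n)		(1ULL << (n))

/* Values copied from the amd_iommu_types.h hunk above. */
#define FEATURE_HDSUP		BIT_ULL(52)	/* host dirty support */
#define IOMMU_PTE_HD_BIT	(6)
#define IOMMU_PTE_HD		BIT_ULL(IOMMU_PTE_HD_BIT)
#define DTE_FLAG_HAD		(3ULL << 7)	/* two-bit field, bits 7-8 */

/* Hypothetical helpers mirroring how the driver uses these masks. */
static bool hd_supported(uint64_t features)
{
	return (features & FEATURE_HDSUP) != 0;
}

static bool pte_dirty(uint64_t pte)
{
	return (pte & IOMMU_PTE_HD) != 0;
}

static uint64_t dte_set_had(uint64_t pte_root, bool enable)
{
	return enable ? (pte_root | DTE_FLAG_HAD) : (pte_root & ~DTE_FLAG_HAD);
}
```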
drivers/iommu/amd/io_pgtable.c (+68)
···
 	return (__pte & ~offset_mask) | (iova & offset_mask);
 }
 
+static bool pte_test_and_clear_dirty(u64 *ptep, unsigned long size,
+				     unsigned long flags)
+{
+	bool test_only = flags & IOMMU_DIRTY_NO_CLEAR;
+	bool dirty = false;
+	int i, count;
+
+	/*
+	 * 2.2.3.2 Host Dirty Support
+	 * When a non-default page size is used , software must OR the
+	 * Dirty bits in all of the replicated host PTEs used to map
+	 * the page. The IOMMU does not guarantee the Dirty bits are
+	 * set in all of the replicated PTEs. Any portion of the page
+	 * may have been written even if the Dirty bit is set in only
+	 * one of the replicated PTEs.
+	 */
+	count = PAGE_SIZE_PTE_COUNT(size);
+	for (i = 0; i < count && test_only; i++) {
+		if (test_bit(IOMMU_PTE_HD_BIT, (unsigned long *)&ptep[i])) {
+			dirty = true;
+			break;
+		}
+	}
+
+	for (i = 0; i < count && !test_only; i++) {
+		if (test_and_clear_bit(IOMMU_PTE_HD_BIT,
+				       (unsigned long *)&ptep[i])) {
+			dirty = true;
+		}
+	}
+
+	return dirty;
+}
+
+static int iommu_v1_read_and_clear_dirty(struct io_pgtable_ops *ops,
+					 unsigned long iova, size_t size,
+					 unsigned long flags,
+					 struct iommu_dirty_bitmap *dirty)
+{
+	struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(ops);
+	unsigned long end = iova + size - 1;
+
+	do {
+		unsigned long pgsize = 0;
+		u64 *ptep, pte;
+
+		ptep = fetch_pte(pgtable, iova, &pgsize);
+		if (ptep)
+			pte = READ_ONCE(*ptep);
+		if (!ptep || !IOMMU_PTE_PRESENT(pte)) {
+			pgsize = pgsize ?: PTE_LEVEL_PAGE_SIZE(0);
+			iova += pgsize;
+			continue;
+		}
+
+		/*
+		 * Mark the whole IOVA range as dirty even if only one of
+		 * the replicated PTEs were marked dirty.
+		 */
+		if (pte_test_and_clear_dirty(ptep, pgsize, flags))
+			iommu_dirty_bitmap_record(dirty, iova, pgsize);
+		iova += pgsize;
+	} while (iova < end);
+
+	return 0;
+}
+
 /*
  * ----------------------------------------------------
  */
···
 	pgtable->iop.ops.map_pages = iommu_v1_map_pages;
 	pgtable->iop.ops.unmap_pages = iommu_v1_unmap_pages;
 	pgtable->iop.ops.iova_to_phys = iommu_v1_iova_to_phys;
+	pgtable->iop.ops.read_and_clear_dirty = iommu_v1_read_and_clear_dirty;
 
 	return &pgtable->iop;
 }
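
The replicated-PTE rule quoted from section 2.2.3.2 is the subtle part of pte_test_and_clear_dirty(). A minimal userspace sketch of the same logic follows. It uses plain (non-atomic) loads and stores where the driver uses test_bit()/test_and_clear_bit(), and DIRTY_NO_CLEAR is an illustrative stand-in for IOMMU_DIRTY_NO_CLEAR whose value is assumed, not taken from the kernel headers. In test-only mode one dirty replica is enough to stop; otherwise every replica must be visited so that all of them end up clean.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_HD		(1ULL << 6)	/* IOMMU_PTE_HD in the driver */
#define DIRTY_NO_CLEAR	(1UL << 0)	/* stand-in, value assumed */

/*
 * A large page mapped by `count` replicated PTEs is dirty if ANY
 * replica has its dirty bit set (the bits are ORed together).
 */
static bool replicated_ptes_dirty(uint64_t *ptep, size_t count,
				  unsigned long flags)
{
	bool dirty = false;

	for (size_t i = 0; i < count; i++) {
		if (!(ptep[i] & PTE_HD))
			continue;
		dirty = true;
		if (flags & DIRTY_NO_CLEAR)
			break;			/* test only: one hit suffices */
		ptep[i] &= ~PTE_HD;		/* clear and keep scanning */
	}
	return dirty;
}
```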
drivers/iommu/amd/iommu.c (+144 -3)
···
 #include <asm/iommu.h>
 #include <asm/gart.h>
 #include <asm/dma.h>
+#include <uapi/linux/iommufd.h>
 
 #include "amd_iommu.h"
 #include "../dma-iommu.h"
···
 LIST_HEAD(acpihid_map);
 
 const struct iommu_ops amd_iommu_ops;
+const struct iommu_dirty_ops amd_dirty_ops;
 
 static ATOMIC_NOTIFIER_HEAD(ppr_notifier);
 int amd_iommu_max_glx_val = -1;
···
 		pte_root |= 1ULL << DEV_ENTRY_PPR;
 	}
 
+	if (domain->dirty_tracking)
+		pte_root |= DTE_FLAG_HAD;
+
 	if (domain->flags & PD_IOMMUV2_MASK) {
 		u64 gcr3 = iommu_virt_to_phys(domain->gcr3_tbl);
 		u64 glx = domain->glx;
···
 	return ((1ULL << PM_LEVEL_SHIFT(amd_iommu_gpt_level)) - 1);
 }
 
-static struct iommu_domain *amd_iommu_domain_alloc(unsigned type)
+static bool amd_iommu_hd_support(struct amd_iommu *iommu)
 {
+	return iommu && (iommu->features & FEATURE_HDSUP);
+}
+
+static struct iommu_domain *do_iommu_domain_alloc(unsigned int type,
+						  struct device *dev, u32 flags)
+{
+	bool dirty_tracking = flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
 	struct protection_domain *domain;
+	struct amd_iommu *iommu = NULL;
+
+	if (dev) {
+		iommu = rlookup_amd_iommu(dev);
+		if (!iommu)
+			return ERR_PTR(-ENODEV);
+	}
 
 	/*
 	 * Since DTE[Mode]=0 is prohibited on SNP-enabled system,
 	 * default to use IOMMU_DOMAIN_DMA[_FQ].
 	 */
 	if (amd_iommu_snp_en && (type == IOMMU_DOMAIN_IDENTITY))
-		return NULL;
+		return ERR_PTR(-EINVAL);
+
+	if (dirty_tracking && !amd_iommu_hd_support(iommu))
+		return ERR_PTR(-EOPNOTSUPP);
 
 	domain = protection_domain_alloc(type);
 	if (!domain)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 
 	domain->domain.geometry.aperture_start = 0;
 	domain->domain.geometry.aperture_end = dma_max_address();
 	domain->domain.geometry.force_aperture = true;
 
+	if (iommu) {
+		domain->domain.type = type;
+		domain->domain.pgsize_bitmap = iommu->iommu.ops->pgsize_bitmap;
+		domain->domain.ops = iommu->iommu.ops->default_domain_ops;
+
+		if (dirty_tracking)
+			domain->domain.dirty_ops = &amd_dirty_ops;
+	}
+
 	return &domain->domain;
+}
+
+static struct iommu_domain *amd_iommu_domain_alloc(unsigned int type)
+{
+	struct iommu_domain *domain;
+
+	domain = do_iommu_domain_alloc(type, NULL, 0);
+	if (IS_ERR(domain))
+		return NULL;
+
+	return domain;
+}
+
+static struct iommu_domain *
+amd_iommu_domain_alloc_user(struct device *dev, u32 flags,
+			    struct iommu_domain *parent,
+			    const struct iommu_user_data *user_data)
+
+{
+	unsigned int type = IOMMU_DOMAIN_UNMANAGED;
+
+	if ((flags & ~IOMMU_HWPT_ALLOC_DIRTY_TRACKING) || parent || user_data)
+		return ERR_PTR(-EOPNOTSUPP);
+
+	return do_iommu_domain_alloc(type, dev, flags);
 }
 
 static void amd_iommu_domain_free(struct iommu_domain *dom)
···
 		return 0;
 
 	dev_data->defer_attach = false;
+
+	/*
+	 * Restrict to devices with compatible IOMMU hardware support
+	 * when enforcement of dirty tracking is enabled.
+	 */
+	if (dom->dirty_ops && !amd_iommu_hd_support(iommu))
+		return -EINVAL;
 
 	if (dev_data->domain)
 		detach_device(dev);
···
 		return true;
 	case IOMMU_CAP_DEFERRED_FLUSH:
 		return true;
+	case IOMMU_CAP_DIRTY_TRACKING: {
+		struct amd_iommu *iommu = rlookup_amd_iommu(dev);
+
+		return amd_iommu_hd_support(iommu);
+	}
 	default:
 		break;
 	}
 
 	return false;
+}
+
+static int amd_iommu_set_dirty_tracking(struct iommu_domain *domain,
+					bool enable)
+{
+	struct protection_domain *pdomain = to_pdomain(domain);
+	struct dev_table_entry *dev_table;
+	struct iommu_dev_data *dev_data;
+	bool domain_flush = false;
+	struct amd_iommu *iommu;
+	unsigned long flags;
+	u64 pte_root;
+
+	spin_lock_irqsave(&pdomain->lock, flags);
+	if (!(pdomain->dirty_tracking ^ enable)) {
+		spin_unlock_irqrestore(&pdomain->lock, flags);
+		return 0;
+	}
+
+	list_for_each_entry(dev_data, &pdomain->dev_list, list) {
+		iommu = rlookup_amd_iommu(dev_data->dev);
+		if (!iommu)
+			continue;
+
+		dev_table = get_dev_table(iommu);
+		pte_root = dev_table[dev_data->devid].data[0];
+
+		pte_root = (enable ? pte_root | DTE_FLAG_HAD :
+				     pte_root & ~DTE_FLAG_HAD);
+
+		/* Flush device DTE */
+		dev_table[dev_data->devid].data[0] = pte_root;
+		device_flush_dte(dev_data);
+		domain_flush = true;
+	}
+
+	/* Flush IOTLB to mark IOPTE dirty on the next translation(s) */
+	if (domain_flush) {
+		amd_iommu_domain_flush_tlb_pde(pdomain);
+		amd_iommu_domain_flush_complete(pdomain);
+	}
+	pdomain->dirty_tracking = enable;
+	spin_unlock_irqrestore(&pdomain->lock, flags);
+
+	return 0;
+}
+
+static int amd_iommu_read_and_clear_dirty(struct iommu_domain *domain,
+					  unsigned long iova, size_t size,
+					  unsigned long flags,
+					  struct iommu_dirty_bitmap *dirty)
+{
+	struct protection_domain *pdomain = to_pdomain(domain);
+	struct io_pgtable_ops *ops = &pdomain->iop.iop.ops;
+	unsigned long lflags;
+
+	if (!ops || !ops->read_and_clear_dirty)
+		return -EOPNOTSUPP;
+
+	spin_lock_irqsave(&pdomain->lock, lflags);
+	if (!pdomain->dirty_tracking && dirty->bitmap) {
+		spin_unlock_irqrestore(&pdomain->lock, lflags);
+		return -EINVAL;
+	}
+	spin_unlock_irqrestore(&pdomain->lock, lflags);
+
+	return ops->read_and_clear_dirty(ops, iova, size, flags, dirty);
 }
 
 static void amd_iommu_get_resv_regions(struct device *dev,
···
 	return true;
 }
 
+const struct iommu_dirty_ops amd_dirty_ops = {
+	.set_dirty_tracking = amd_iommu_set_dirty_tracking,
+	.read_and_clear_dirty = amd_iommu_read_and_clear_dirty,
+};
+
 const struct iommu_ops amd_iommu_ops = {
 	.capable = amd_iommu_capable,
 	.domain_alloc = amd_iommu_domain_alloc,
+	.domain_alloc_user = amd_iommu_domain_alloc_user,
 	.probe_device = amd_iommu_probe_device,
 	.release_device = amd_iommu_release_device,
 	.probe_finalize = amd_iommu_probe_finalize,
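
The AMD allocation path above boils down to a few checks. The C sketch below condenses them into one hypothetical function, check_alloc_user(), which returns -EOPNOTSUPP for unknown flags, for any nesting request (a parent or user_data), and for dirty tracking on hardware without host dirty support. The flag values are stand-ins for the uapi IOMMU_HWPT_ALLOC_* constants. It also shows that the `!(pdomain->dirty_tracking ^ enable)` early return in amd_iommu_set_dirty_tracking() is simply an equality test on two bools.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the uapi IOMMU_HWPT_ALLOC_* flags (values assumed). */
#define HWPT_ALLOC_NEST_PARENT		(1U << 0)
#define HWPT_ALLOC_DIRTY_TRACKING	(1U << 1)

/*
 * Hypothetical condensation of amd_iommu_domain_alloc_user() plus the
 * dirty-tracking check in do_iommu_domain_alloc().
 */
static int check_alloc_user(uint32_t flags, bool has_parent,
			    bool has_user_data, bool hw_dirty_support)
{
	/* Only dirty tracking is understood; nesting is not supported. */
	if ((flags & ~HWPT_ALLOC_DIRTY_TRACKING) || has_parent || has_user_data)
		return -EOPNOTSUPP;
	/* Dirty tracking needs the host dirty (HD) hardware feature. */
	if ((flags & HWPT_ALLOC_DIRTY_TRACKING) && !hw_dirty_support)
		return -EOPNOTSUPP;
	return 0;
}

/* For bools, (current ^ enable) is the same test as (current != enable). */
static bool dirty_toggle_needed(bool current_state, bool enable)
{
	return current_state ^ enable;
}
```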
drivers/iommu/intel/Kconfig (+1)
···
 	select DMA_OPS
 	select IOMMU_API
 	select IOMMU_IOVA
+	select IOMMUFD_DRIVER if IOMMUFD
 	select NEED_DMA_MAP_STATE
 	select DMAR_TABLE
 	select SWIOTLB
drivers/iommu/intel/Makefile (+1 -1)
···
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_DMAR_TABLE) += dmar.o
-obj-$(CONFIG_INTEL_IOMMU) += iommu.o pasid.o
+obj-$(CONFIG_INTEL_IOMMU) += iommu.o pasid.o nested.o
 obj-$(CONFIG_DMAR_TABLE) += trace.o cap_audit.o
 obj-$(CONFIG_DMAR_PERF) += perf.o
 obj-$(CONFIG_INTEL_IOMMU_DEBUGFS) += debugfs.o
drivers/iommu/intel/iommu.c (+147 -9)
···
 #define for_each_rmrr_units(rmrr) \
 	list_for_each_entry(rmrr, &dmar_rmrr_units, list)
 
-static void device_block_translation(struct device *dev);
 static void intel_iommu_domain_free(struct iommu_domain *domain);
 
 int dmar_disabled = !IS_ENABLED(CONFIG_INTEL_IOMMU_DEFAULT_ON);
···
 #define IDENTMAP_AZALIA 4
 
 const struct iommu_ops intel_iommu_ops;
+const struct iommu_dirty_ops intel_dirty_ops;
 
 static bool translation_pre_enabled(struct intel_iommu *iommu)
 {
···
 }
 
 /* Some capabilities may be different across iommus */
-static void domain_update_iommu_cap(struct dmar_domain *domain)
+void domain_update_iommu_cap(struct dmar_domain *domain)
 {
 	domain_update_iommu_coherency(domain);
 	domain->iommu_superpage = domain_update_iommu_superpage(domain, NULL);
···
 	return domain;
 }
 
-static int domain_attach_iommu(struct dmar_domain *domain,
-			       struct intel_iommu *iommu)
+int domain_attach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu)
 {
 	struct iommu_domain_info *info, *curr;
 	unsigned long ndomains;
···
 	return ret;
 }
 
-static void domain_detach_iommu(struct dmar_domain *domain,
-				struct intel_iommu *iommu)
+void domain_detach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu)
 {
 	struct iommu_domain_info *info;
 
···
 
 	if ((prot & (DMA_PTE_READ|DMA_PTE_WRITE)) == 0)
 		return -EINVAL;
+
+	if (!(prot & DMA_PTE_WRITE) && domain->nested_parent) {
+		pr_err_ratelimited("Read-only mapping is disallowed on the domain which serves as the parent in a nested configuration, due to HW errata (ERRATA_772415_SPR17)\n");
+		return -EINVAL;
+	}
 
 	attr = prot & (DMA_PTE_READ | DMA_PTE_WRITE | DMA_PTE_SNP);
 	attr |= DMA_FL_PTE_PRESENT;
···
  * all DMA requests without PASID from the device are blocked. If the page
  * table has been set, clean up the data structures.
  */
-static void device_block_translation(struct device *dev)
+void device_block_translation(struct device *dev)
 {
 	struct device_domain_info *info = dev_iommu_priv_get(dev);
 	struct intel_iommu *iommu = info->iommu;
···
 	return NULL;
 }
 
+static struct iommu_domain *
+intel_iommu_domain_alloc_user(struct device *dev, u32 flags,
+			      struct iommu_domain *parent,
+			      const struct iommu_user_data *user_data)
+{
+	struct device_domain_info *info = dev_iommu_priv_get(dev);
+	bool dirty_tracking = flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
+	bool nested_parent = flags & IOMMU_HWPT_ALLOC_NEST_PARENT;
+	struct intel_iommu *iommu = info->iommu;
+	struct iommu_domain *domain;
+
+	/* Must be NESTING domain */
+	if (parent) {
+		if (!nested_supported(iommu) || flags)
+			return ERR_PTR(-EOPNOTSUPP);
+		return intel_nested_domain_alloc(parent, user_data);
+	}
+
+	if (flags &
+	    (~(IOMMU_HWPT_ALLOC_NEST_PARENT | IOMMU_HWPT_ALLOC_DIRTY_TRACKING)))
+		return ERR_PTR(-EOPNOTSUPP);
+	if (nested_parent && !nested_supported(iommu))
+		return ERR_PTR(-EOPNOTSUPP);
+	if (user_data || (dirty_tracking && !ssads_supported(iommu)))
+		return ERR_PTR(-EOPNOTSUPP);
+
+	/*
+	 * domain_alloc_user op needs to fully initialize a domain before
+	 * return, so uses iommu_domain_alloc() here for simple.
+	 */
+	domain = iommu_domain_alloc(dev->bus);
+	if (!domain)
+		return ERR_PTR(-ENOMEM);
+
+	if (nested_parent)
+		to_dmar_domain(domain)->nested_parent = true;
+
+	if (dirty_tracking) {
+		if (to_dmar_domain(domain)->use_first_level) {
+			iommu_domain_free(domain);
+			return ERR_PTR(-EOPNOTSUPP);
+		}
+		domain->dirty_ops = &intel_dirty_ops;
+	}
+
+	return domain;
+}
+
 static void intel_iommu_domain_free(struct iommu_domain *domain)
 {
 	if (domain != &si_domain->domain && domain != &blocking_domain)
 		domain_exit(to_dmar_domain(domain));
 }
 
-static int prepare_domain_attach_device(struct iommu_domain *domain,
-					struct device *dev)
+int prepare_domain_attach_device(struct iommu_domain *domain,
+				 struct device *dev)
 {
 	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
 	struct intel_iommu *iommu;
···
 		return -ENODEV;
 
 	if (dmar_domain->force_snooping && !ecap_sc_support(iommu->ecap))
+		return -EINVAL;
+
+	if (domain->dirty_ops && !ssads_supported(iommu))
 		return -EINVAL;
 
 	/* check if this iommu agaw is sufficient for max mapped address */
···
 		return dmar_platform_optin();
 	case IOMMU_CAP_ENFORCE_CACHE_COHERENCY:
 		return ecap_sc_support(info->iommu->ecap);
+	case IOMMU_CAP_DIRTY_TRACKING:
+		return ssads_supported(info->iommu);
 	default:
 		return false;
 	}
···
 	if (!pasid_supported(iommu) || dev_is_real_dma_subdevice(dev))
 		return -EOPNOTSUPP;
 
+	if (domain->dirty_ops)
+		return -EINVAL;
+
 	if (context_copied(iommu, info->bus, info->devfn))
 		return -EBUSY;
 
···
 	if (!vtd)
 		return ERR_PTR(-ENOMEM);
 
+	vtd->flags = IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17;
 	vtd->cap_reg = iommu->cap;
 	vtd->ecap_reg = iommu->ecap;
 	*length = sizeof(*vtd);
···
 	return vtd;
 }
 
+static int intel_iommu_set_dirty_tracking(struct iommu_domain *domain,
+					  bool enable)
+{
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	struct device_domain_info *info;
+	int ret;
+
+	spin_lock(&dmar_domain->lock);
+	if (dmar_domain->dirty_tracking == enable)
+		goto out_unlock;
+
+	list_for_each_entry(info, &dmar_domain->devices, link) {
+		ret = intel_pasid_setup_dirty_tracking(info->iommu,
+						       info->domain, info->dev,
+						       IOMMU_NO_PASID, enable);
+		if (ret)
+			goto err_unwind;
+	}
+
+	dmar_domain->dirty_tracking = enable;
+out_unlock:
+	spin_unlock(&dmar_domain->lock);
+
+	return 0;
+
+err_unwind:
+	list_for_each_entry(info, &dmar_domain->devices, link)
+		intel_pasid_setup_dirty_tracking(info->iommu, dmar_domain,
+						 info->dev, IOMMU_NO_PASID,
+						 dmar_domain->dirty_tracking);
+	spin_unlock(&dmar_domain->lock);
+	return ret;
+}
+
+static int intel_iommu_read_and_clear_dirty(struct iommu_domain *domain,
+					    unsigned long iova, size_t size,
+					    unsigned long flags,
+					    struct iommu_dirty_bitmap *dirty)
+{
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	unsigned long end = iova + size - 1;
+	unsigned long pgsize;
+
+	/*
+	 * IOMMUFD core calls into a dirty tracking disabled domain without an
+	 * IOVA bitmap set in order to clean dirty bits in all PTEs that might
+	 * have occurred when we stopped dirty tracking. This ensures that we
+	 * never inherit dirtied bits from a previous cycle.
+	 */
+	if (!dmar_domain->dirty_tracking && dirty->bitmap)
+		return -EINVAL;
+
+	do {
+		struct dma_pte *pte;
+		int lvl = 0;
+
+		pte = pfn_to_dma_pte(dmar_domain, iova >> VTD_PAGE_SHIFT, &lvl,
+				     GFP_ATOMIC);
+		pgsize = level_size(lvl) << VTD_PAGE_SHIFT;
+		if (!pte || !dma_pte_present(pte)) {
+			iova += pgsize;
+			continue;
+		}
+
+		if (dma_sl_pte_test_and_clear_dirty(pte, flags))
+			iommu_dirty_bitmap_record(dirty, iova, pgsize);
+		iova += pgsize;
+	} while (iova < end);
+
+	return 0;
+}
+
+const struct iommu_dirty_ops intel_dirty_ops = {
+	.set_dirty_tracking = intel_iommu_set_dirty_tracking,
+	.read_and_clear_dirty = intel_iommu_read_and_clear_dirty,
+};
+
 const struct iommu_ops intel_iommu_ops = {
 	.capable = intel_iommu_capable,
 	.hw_info = intel_iommu_hw_info,
 	.domain_alloc = intel_iommu_domain_alloc,
+	.domain_alloc_user = intel_iommu_domain_alloc_user,
 	.probe_device = intel_iommu_probe_device,
 	.probe_finalize = intel_iommu_probe_finalize,
 	.release_device = intel_iommu_release_device,
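
Both read_and_clear_dirty() implementations share one loop shape: look up the leaf PTE for the current IOVA, skip unmapped holes, record a dirty leaf at its full page size, and advance by that size. The C sketch below models the loop with a hypothetical flat array of leaf mappings instead of a real page table, and records into a one-bit-per-4-KiB bitmap. It also reproduces the rule from the Intel comment: with tracking disabled, a call is accepted only without a bitmap, and then it just clears stale dirty bits. Holes are skipped in 4 KiB steps here (the drivers can skip larger unmapped ranges), and the walk is assumed to start on a leaf boundary.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT	12
#define TOY_PAGE_SIZE	(1UL << TOY_PAGE_SHIFT)

/* Hypothetical leaf mapping covering [iova, iova + size). */
struct toy_pte {
	unsigned long iova;
	unsigned long size;	/* 4 KiB, 2 MiB, ... */
	bool dirty;
};

/* Return the leaf covering iova, or NULL for an unmapped hole. */
static struct toy_pte *toy_lookup(struct toy_pte *tbl, size_t n,
				  unsigned long iova)
{
	for (size_t i = 0; i < n; i++)
		if (iova >= tbl[i].iova && iova - tbl[i].iova < tbl[i].size)
			return &tbl[i];
	return NULL;
}

/* Set one bit per 4 KiB page in [iova, iova + size), relative to base. */
static void toy_record(uint64_t *bitmap, unsigned long base,
		       unsigned long iova, unsigned long size)
{
	for (unsigned long a = iova; a < iova + size; a += TOY_PAGE_SIZE) {
		unsigned long bit = (a - base) >> TOY_PAGE_SHIFT;

		bitmap[bit / 64] |= 1ULL << (bit % 64);
	}
}

static int toy_read_and_clear_dirty(struct toy_pte *tbl, size_t n,
				    bool tracking, unsigned long iova,
				    unsigned long size, uint64_t *bitmap,
				    unsigned long base)
{
	unsigned long end = iova + size - 1;

	/* With tracking off, only a bitmap-less "clear" call is allowed. */
	if (!tracking && bitmap)
		return -EINVAL;

	do {
		struct toy_pte *pte = toy_lookup(tbl, n, iova);

		if (!pte) {		/* hole: step over one 4 KiB page */
			iova += TOY_PAGE_SIZE;
			continue;
		}
		if (pte->dirty) {
			pte->dirty = false;	/* read and clear */
			if (bitmap)
				toy_record(bitmap, base, iova, pte->size);
		}
		iova += pte->size;	/* advance by the whole leaf */
	} while (iova < end);

	return 0;
}
```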
drivers/iommu/intel/iommu.h (+58 -6)
···
 
 #include <asm/cacheflush.h>
 #include <asm/iommu.h>
+#include <uapi/linux/iommufd.h>
 
 /*
  * VT-d hardware uses 4KiB page size regardless of host page size.
···
 #define DMA_FL_PTE_ACCESS BIT_ULL(5)
 #define DMA_FL_PTE_DIRTY BIT_ULL(6)
 #define DMA_FL_PTE_XD BIT_ULL(63)
+
+#define DMA_SL_PTE_DIRTY_BIT 9
+#define DMA_SL_PTE_DIRTY BIT_ULL(DMA_SL_PTE_DIRTY_BIT)
 
 #define ADDR_WIDTH_5LEVEL (57)
 #define ADDR_WIDTH_4LEVEL (48)
···
 #define sm_supported(iommu) (intel_iommu_sm && ecap_smts((iommu)->ecap))
 #define pasid_supported(iommu) (sm_supported(iommu) && \
 				ecap_pasid((iommu)->ecap))
+#define ssads_supported(iommu) (sm_supported(iommu) && \
+				ecap_slads((iommu)->ecap))
+#define nested_supported(iommu) (sm_supported(iommu) && \
+				 ecap_nest((iommu)->ecap))
 
 struct pasid_entry;
 struct pasid_state_entry;
···
 					 * otherwise, goes through the second
 					 * level.
 					 */
+	u8 dirty_tracking:1;		/* Dirty tracking is enabled */
+	u8 nested_parent:1;		/* Has other domains nested on it */
 
 	spinlock_t lock;		/* Protect device tracking lists */
 	struct list_head devices;	/* all devices' list */
 	struct list_head dev_pasids;	/* all attached pasids */
 
-	struct dma_pte *pgd;		/* virtual address */
-	int gaw;			/* max guest address width */
-
-	/* adjusted guest address width, 0 is level 2 30-bit */
-	int agaw;
 	int iommu_superpage;/* Level of superpages supported:
 				   0 == 4KiB (no superpages), 1 == 2MiB,
 				   2 == 1GiB, 3 == 512GiB, 4 == 1TiB */
-	u64 max_addr;			/* maximum mapped address */
+	union {
+		/* DMA remapping domain */
+		struct {
+			/* virtual address */
+			struct dma_pte *pgd;
+			/* max guest address width */
+			int gaw;
+			/*
+			 * adjusted guest address width:
+			 *   0: level 2 30-bit
+			 *   1: level 3 39-bit
+			 *   2: level 4 48-bit
+			 *   3: level 5 57-bit
+			 */
+			int agaw;
+			/* maximum mapped address */
+			u64 max_addr;
+		};
+
+		/* Nested user domain */
+		struct {
+			/* parent page table which the user domain is nested on */
+			struct dmar_domain *s2_domain;
+			/* user page table pointer (in GPA) */
+			unsigned long s1_pgtbl;
+			/* page table attributes */
+			struct iommu_hwpt_vtd_s1 s1_cfg;
+		};
+	};
 
 	struct iommu_domain domain;	/* generic domain data structure for
 					   iommu core */
···
 	return (pte->val & 3) != 0;
 }
 
+static inline bool dma_sl_pte_test_and_clear_dirty(struct dma_pte *pte,
+						   unsigned long flags)
+{
+	if (flags & IOMMU_DIRTY_NO_CLEAR)
+		return (pte->val & DMA_SL_PTE_DIRTY) != 0;
+
+	return test_and_clear_bit(DMA_SL_PTE_DIRTY_BIT,
+				  (unsigned long *)&pte->val);
+}
+
 static inline bool dma_pte_superpage(struct dma_pte *pte)
 {
 	return (pte->val & DMA_PTE_LARGE_PAGE);
···
  */
 #define QI_OPT_WAIT_DRAIN BIT(0)
 
+int domain_attach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu);
+void domain_detach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu);
+void device_block_translation(struct device *dev);
+int prepare_domain_attach_device(struct iommu_domain *domain,
+				 struct device *dev);
+void domain_update_iommu_cap(struct dmar_domain *domain);
+
 int dmar_ir_support(void);
 
 void *alloc_pgtable_page(int node, gfp_t gfp);
 void free_pgtable_page(void *vaddr);
 void iommu_flush_write_buffer(struct intel_iommu *iommu);
 struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn);
+struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *parent,
+					       const struct iommu_user_data *user_data);
 
 #ifdef CONFIG_INTEL_IOMMU_SVM
 void intel_svm_check(struct intel_iommu *iommu);
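
The agaw encoding spelled out in the new struct dmar_domain comment follows a simple rule: a 4 KiB page supplies 12 offset bits and each table level resolves 9 more, so agaw n means n + 2 levels and 12 + 9(n + 2) = 30 + 9n address bits. The C sketch below encodes that arithmetic with hypothetical helper names, together with the width check intel_nested_attach_dev() performs before attaching a stage-1 domain (the IOMMU's agaw must not be smaller than that of the stage-2 parent).

```c
#include <stdbool.h>

#define TOY_LEVEL_STRIDE	9	/* address bits resolved per level */

/* agaw n: n + 2 table levels (0 -> 2-level, 3 -> 5-level). */
static int toy_agaw_to_level(int agaw)
{
	return agaw + 2;
}

/* agaw n: 12 offset bits + 9 bits per level = 30 + 9n address bits. */
static int toy_agaw_to_width(int agaw)
{
	return 30 + agaw * TOY_LEVEL_STRIDE;
}

/* Mirrors the agaw comparison in intel_nested_attach_dev(). */
static bool toy_s2_fits_iommu(int iommu_agaw, int s2_agaw)
{
	return iommu_agaw >= s2_agaw;
}
```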
+117
drivers/iommu/intel/nested.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* 3 + * nested.c - nested mode translation support 4 + * 5 + * Copyright (C) 2023 Intel Corporation 6 + * 7 + * Author: Lu Baolu <baolu.lu@linux.intel.com> 8 + * Jacob Pan <jacob.jun.pan@linux.intel.com> 9 + * Yi Liu <yi.l.liu@intel.com> 10 + */ 11 + 12 + #define pr_fmt(fmt) "DMAR: " fmt 13 + 14 + #include <linux/iommu.h> 15 + #include <linux/pci.h> 16 + #include <linux/pci-ats.h> 17 + 18 + #include "iommu.h" 19 + #include "pasid.h" 20 + 21 + static int intel_nested_attach_dev(struct iommu_domain *domain, 22 + struct device *dev) 23 + { 24 + struct device_domain_info *info = dev_iommu_priv_get(dev); 25 + struct dmar_domain *dmar_domain = to_dmar_domain(domain); 26 + struct intel_iommu *iommu = info->iommu; 27 + unsigned long flags; 28 + int ret = 0; 29 + 30 + if (info->domain) 31 + device_block_translation(dev); 32 + 33 + if (iommu->agaw < dmar_domain->s2_domain->agaw) { 34 + dev_err_ratelimited(dev, "Adjusted guest address width not compatible\n"); 35 + return -ENODEV; 36 + } 37 + 38 + /* 39 + * Stage-1 domain cannot work alone, it is nested on a s2_domain. 40 + * The s2_domain will be used in nested translation, hence needs 41 + * to ensure the s2_domain is compatible with this IOMMU. 
42 + */ 43 + ret = prepare_domain_attach_device(&dmar_domain->s2_domain->domain, dev); 44 + if (ret) { 45 + dev_err_ratelimited(dev, "s2 domain is not compatible\n"); 46 + return ret; 47 + } 48 + 49 + ret = domain_attach_iommu(dmar_domain, iommu); 50 + if (ret) { 51 + dev_err_ratelimited(dev, "Failed to attach domain to iommu\n"); 52 + return ret; 53 + } 54 + 55 + ret = intel_pasid_setup_nested(iommu, dev, 56 + IOMMU_NO_PASID, dmar_domain); 57 + if (ret) { 58 + domain_detach_iommu(dmar_domain, iommu); 59 + dev_err_ratelimited(dev, "Failed to setup pasid entry\n"); 60 + return ret; 61 + } 62 + 63 + info->domain = dmar_domain; 64 + spin_lock_irqsave(&dmar_domain->lock, flags); 65 + list_add(&info->link, &dmar_domain->devices); 66 + spin_unlock_irqrestore(&dmar_domain->lock, flags); 67 + 68 + return 0; 69 + } 70 + 71 + static void intel_nested_domain_free(struct iommu_domain *domain) 72 + { 73 + kfree(to_dmar_domain(domain)); 74 + } 75 + 76 + static const struct iommu_domain_ops intel_nested_domain_ops = { 77 + .attach_dev = intel_nested_attach_dev, 78 + .free = intel_nested_domain_free, 79 + }; 80 + 81 + struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *parent, 82 + const struct iommu_user_data *user_data) 83 + { 84 + struct dmar_domain *s2_domain = to_dmar_domain(parent); 85 + struct iommu_hwpt_vtd_s1 vtd; 86 + struct dmar_domain *domain; 87 + int ret; 88 + 89 + /* Must be nested domain */ 90 + if (user_data->type != IOMMU_HWPT_DATA_VTD_S1) 91 + return ERR_PTR(-EOPNOTSUPP); 92 + if (parent->ops != intel_iommu_ops.default_domain_ops || 93 + !s2_domain->nested_parent) 94 + return ERR_PTR(-EINVAL); 95 + 96 + ret = iommu_copy_struct_from_user(&vtd, user_data, 97 + IOMMU_HWPT_DATA_VTD_S1, __reserved); 98 + if (ret) 99 + return ERR_PTR(ret); 100 + 101 + domain = kzalloc(sizeof(*domain), GFP_KERNEL_ACCOUNT); 102 + if (!domain) 103 + return ERR_PTR(-ENOMEM); 104 + 105 + domain->use_first_level = true; 106 + domain->s2_domain = s2_domain; 107 + 
domain->s1_pgtbl = vtd.pgtbl_addr; 108 + domain->s1_cfg = vtd; 109 + domain->domain.ops = &intel_nested_domain_ops; 110 + domain->domain.type = IOMMU_DOMAIN_NESTED; 111 + INIT_LIST_HEAD(&domain->devices); 112 + INIT_LIST_HEAD(&domain->dev_pasids); 113 + spin_lock_init(&domain->lock); 114 + xa_init(&domain->iommu_array); 115 + 116 + return &domain->domain; 117 + }
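nested.c places a stage-1 domain on top of a kernel-owned s2_domain. The stage-1 table (s1_pgtbl) lives in memory the VM controls. This is why intel_nested_attach_dev() checks the s2 domain's address width and compatibility before it programs the PASID entry.

The toy model below sketches the two-stage lookup under heavy simplifying assumptions: single-level tables, 4 KiB pages, and hypothetical toy_* names that are not kernel APIs. Real VT-d hardware walks multi-level tables. It also translates the guest addresses of the stage-1 table itself through stage 2, which the sketch leaves out.

```c
#include <assert.h> /* used by the usage checks */
#include <stdbool.h>
#include <stdint.h>

/* Toy model: 4 KiB pages, single-level tables indexed by page number. */
#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_MASK ((UINT64_C(1) << TOY_PAGE_SHIFT) - 1)
#define TOY_NR_PAGES 16

/* A toy "page table": entry i holds the output page number for input page i. */
struct toy_table {
	bool present[TOY_NR_PAGES];
	uint64_t out_pfn[TOY_NR_PAGES];
};

/* Hypothetical nested domain: a stage-1 table nested on a stage-2 table. */
struct toy_nested_domain {
	const struct toy_table *s1; /* guest controlled: IOVA -> GPA */
	const struct toy_table *s2; /* host controlled:  GPA  -> HPA */
};

/* Translate one address through one stage; false means a translation fault. */
static bool toy_walk(const struct toy_table *t, uint64_t in, uint64_t *out)
{
	uint64_t pfn = in >> TOY_PAGE_SHIFT;

	if (pfn >= TOY_NR_PAGES || !t->present[pfn])
		return false;
	*out = (t->out_pfn[pfn] << TOY_PAGE_SHIFT) | (in & TOY_PAGE_MASK);
	return true;
}

/* Nested translation: stage 1 yields a GPA, which stage 2 turns into an HPA. */
static bool toy_nested_translate(const struct toy_nested_domain *d,
				 uint64_t iova, uint64_t *hpa)
{
	uint64_t gpa;

	if (!toy_walk(d->s1, iova, &gpa))
		return false; /* stage-1 fault */
	return toy_walk(d->s2, gpa, hpa); /* stage-2 fault if false */
}
```

A fault at either stage stops the lookup. A guest mapping is only usable if the host has also mapped the resulting guest physical page in the stage-2 domain.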
+221
drivers/iommu/intel/pasid.c
··· 277 277 WRITE_ONCE(*ptr, (old & ~mask) | bits); 278 278 } 279 279 280 + static inline u64 pasid_get_bits(u64 *ptr) 281 + { 282 + return READ_ONCE(*ptr); 283 + } 284 + 280 285 /* 281 286 * Setup the DID(Domain Identifier) field (Bit 64~79) of scalable mode 282 287 * PASID entry. ··· 338 333 static inline void pasid_set_fault_enable(struct pasid_entry *pe) 339 334 { 340 335 pasid_set_bits(&pe->val[0], 1 << 1, 0); 336 + } 337 + 338 + /* 339 + * Enable second level A/D bits by setting the SLADE (Second Level 340 + * Access Dirty Enable) field (Bit 9) of a scalable mode PASID 341 + * entry. 342 + */ 343 + static inline void pasid_set_ssade(struct pasid_entry *pe) 344 + { 345 + pasid_set_bits(&pe->val[0], 1 << 9, 1 << 9); 346 + } 347 + 348 + /* 349 + * Disable second level A/D bits by clearing the SLADE (Second Level 350 + * Access Dirty Enable) field (Bit 9) of a scalable mode PASID 351 + * entry. 352 + */ 353 + static inline void pasid_clear_ssade(struct pasid_entry *pe) 354 + { 355 + pasid_set_bits(&pe->val[0], 1 << 9, 0); 356 + } 357 + 358 + /* 359 + * Checks if second level A/D bits specifically the SLADE (Second Level 360 + * Access Dirty Enable) field (Bit 9) of a scalable mode PASID 361 + * entry is set. 362 + */ 363 + static inline bool pasid_get_ssade(struct pasid_entry *pe) 364 + { 365 + return pasid_get_bits(&pe->val[0]) & (1 << 9); 366 + } 367 + 368 + /* 369 + * Setup the SRE(Supervisor Request Enable) field (Bit 128) of a 370 + * scalable mode PASID entry. 371 + */ 372 + static inline void pasid_set_sre(struct pasid_entry *pe) 373 + { 374 + pasid_set_bits(&pe->val[2], 1 << 0, 1); 341 375 } 342 376 343 377 /* ··· 444 400 pasid_set_flpm(struct pasid_entry *pe, u64 value) 445 401 { 446 402 pasid_set_bits(&pe->val[2], GENMASK_ULL(3, 2), value << 2); 403 + } 404 + 405 + /* 406 + * Setup the Extended Access Flag Enable (EAFE) field (Bit 135) 407 + * of a scalable mode PASID entry. 
408 + */ 409 + static inline void pasid_set_eafe(struct pasid_entry *pe) 410 + { 411 + pasid_set_bits(&pe->val[2], 1 << 7, 1 << 7); 447 412 } 448 413 449 414 static void ··· 680 627 pasid_set_translation_type(pte, PASID_ENTRY_PGTT_SL_ONLY); 681 628 pasid_set_fault_enable(pte); 682 629 pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap)); 630 + if (domain->dirty_tracking) 631 + pasid_set_ssade(pte); 683 632 684 633 pasid_set_present(pte); 685 634 spin_unlock(&iommu->lock); 686 635 687 636 pasid_flush_caches(iommu, pte, pasid, did); 637 + 638 + return 0; 639 + } 640 + 641 + /* 642 + * Set up dirty tracking on a second only or nested translation type. 643 + */ 644 + int intel_pasid_setup_dirty_tracking(struct intel_iommu *iommu, 645 + struct dmar_domain *domain, 646 + struct device *dev, u32 pasid, 647 + bool enabled) 648 + { 649 + struct pasid_entry *pte; 650 + u16 did, pgtt; 651 + 652 + spin_lock(&iommu->lock); 653 + 654 + pte = intel_pasid_get_entry(dev, pasid); 655 + if (!pte) { 656 + spin_unlock(&iommu->lock); 657 + dev_err_ratelimited( 658 + dev, "Failed to get pasid entry of PASID %d\n", pasid); 659 + return -ENODEV; 660 + } 661 + 662 + did = domain_id_iommu(domain, iommu); 663 + pgtt = pasid_pte_get_pgtt(pte); 664 + if (pgtt != PASID_ENTRY_PGTT_SL_ONLY && 665 + pgtt != PASID_ENTRY_PGTT_NESTED) { 666 + spin_unlock(&iommu->lock); 667 + dev_err_ratelimited( 668 + dev, 669 + "Dirty tracking not supported on translation type %d\n", 670 + pgtt); 671 + return -EOPNOTSUPP; 672 + } 673 + 674 + if (pasid_get_ssade(pte) == enabled) { 675 + spin_unlock(&iommu->lock); 676 + return 0; 677 + } 678 + 679 + if (enabled) 680 + pasid_set_ssade(pte); 681 + else 682 + pasid_clear_ssade(pte); 683 + spin_unlock(&iommu->lock); 684 + 685 + if (!ecap_coherent(iommu->ecap)) 686 + clflush_cache_range(pte, sizeof(*pte)); 687 + 688 + /* 689 + * From VT-d spec table 25 "Guidance to Software for Invalidations": 690 + * 691 + * - PASID-selective-within-Domain PASID-cache invalidation 692 + * 
If (PGTT=SS or Nested) 693 + * - Domain-selective IOTLB invalidation 694 + * Else 695 + * - PASID-selective PASID-based IOTLB invalidation 696 + * - If (pasid is RID_PASID) 697 + * - Global Device-TLB invalidation to affected functions 698 + * Else 699 + * - PASID-based Device-TLB invalidation (with S=1 and 700 + * Addr[63:12]=0x7FFFFFFF_FFFFF) to affected functions 701 + */ 702 + pasid_cache_invalidation_with_pasid(iommu, did, pasid); 703 + 704 + iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH); 705 + 706 + /* Device IOTLB doesn't need to be flushed in caching mode. */ 707 + if (!cap_caching_mode(iommu->cap)) 708 + devtlb_invalidation_with_pasid(iommu, dev, pasid); 688 709 689 710 return 0; 690 711 } ··· 839 712 /* Device IOTLB doesn't need to be flushed in caching mode. */ 840 713 if (!cap_caching_mode(iommu->cap)) 841 714 devtlb_invalidation_with_pasid(iommu, dev, pasid); 715 + } 716 + 717 + /** 718 + * intel_pasid_setup_nested() - Set up PASID entry for nested translation. 719 + * @iommu: IOMMU which the device belong to 720 + * @dev: Device to be set up for translation 721 + * @pasid: PASID to be programmed in the device PASID table 722 + * @domain: User stage-1 domain nested on a stage-2 domain 723 + * 724 + * This is used for nested translation. The input domain should be 725 + * nested type and nested on a parent with 'is_nested_parent' flag 726 + * set. 
727 + */ 728 + int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev, 729 + u32 pasid, struct dmar_domain *domain) 730 + { 731 + struct iommu_hwpt_vtd_s1 *s1_cfg = &domain->s1_cfg; 732 + pgd_t *s1_gpgd = (pgd_t *)(uintptr_t)domain->s1_pgtbl; 733 + struct dmar_domain *s2_domain = domain->s2_domain; 734 + u16 did = domain_id_iommu(domain, iommu); 735 + struct dma_pte *pgd = s2_domain->pgd; 736 + struct pasid_entry *pte; 737 + 738 + /* Address width should match the address width supported by hardware */ 739 + switch (s1_cfg->addr_width) { 740 + case ADDR_WIDTH_4LEVEL: 741 + break; 742 + case ADDR_WIDTH_5LEVEL: 743 + if (!cap_fl5lp_support(iommu->cap)) { 744 + dev_err_ratelimited(dev, 745 + "5-level paging not supported\n"); 746 + return -EINVAL; 747 + } 748 + break; 749 + default: 750 + dev_err_ratelimited(dev, "Invalid stage-1 address width %d\n", 751 + s1_cfg->addr_width); 752 + return -EINVAL; 753 + } 754 + 755 + if ((s1_cfg->flags & IOMMU_VTD_S1_SRE) && !ecap_srs(iommu->ecap)) { 756 + pr_err_ratelimited("No supervisor request support on %s\n", 757 + iommu->name); 758 + return -EINVAL; 759 + } 760 + 761 + if ((s1_cfg->flags & IOMMU_VTD_S1_EAFE) && !ecap_eafs(iommu->ecap)) { 762 + pr_err_ratelimited("No extended access flag support on %s\n", 763 + iommu->name); 764 + return -EINVAL; 765 + } 766 + 767 + spin_lock(&iommu->lock); 768 + pte = intel_pasid_get_entry(dev, pasid); 769 + if (!pte) { 770 + spin_unlock(&iommu->lock); 771 + return -ENODEV; 772 + } 773 + if (pasid_pte_is_present(pte)) { 774 + spin_unlock(&iommu->lock); 775 + return -EBUSY; 776 + } 777 + 778 + pasid_clear_entry(pte); 779 + 780 + if (s1_cfg->addr_width == ADDR_WIDTH_5LEVEL) 781 + pasid_set_flpm(pte, 1); 782 + 783 + pasid_set_flptr(pte, (uintptr_t)s1_gpgd); 784 + 785 + if (s1_cfg->flags & IOMMU_VTD_S1_SRE) { 786 + pasid_set_sre(pte); 787 + if (s1_cfg->flags & IOMMU_VTD_S1_WPE) 788 + pasid_set_wpe(pte); 789 + } 790 + 791 + if (s1_cfg->flags & IOMMU_VTD_S1_EAFE) 792 + 
pasid_set_eafe(pte); 793 + 794 + if (s2_domain->force_snooping) 795 + pasid_set_pgsnp(pte); 796 + 797 + pasid_set_slptr(pte, virt_to_phys(pgd)); 798 + pasid_set_fault_enable(pte); 799 + pasid_set_domain_id(pte, did); 800 + pasid_set_address_width(pte, s2_domain->agaw); 801 + pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap)); 802 + pasid_set_translation_type(pte, PASID_ENTRY_PGTT_NESTED); 803 + pasid_set_present(pte); 804 + spin_unlock(&iommu->lock); 805 + 806 + pasid_flush_caches(iommu, pte, pasid, did); 807 + 808 + return 0; 842 809 }
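The pasid.c helpers all funnel through pasid_set_bits(), a masked read-modify-write on one 64-bit word of the PASID entry. The bit numbers in their comments are absolute positions within the entry: bit 128 (SRE) is bit 0 of val[2], and bit 135 (EAFE) is bit 7 of val[2].

Below is a minimal userspace sketch of that mapping. It assumes an 8-word entry and uses hypothetical toy_* names. It drops READ_ONCE()/WRITE_ONCE() and the iommu->lock that the kernel holds around these updates. toy_update_ssade() mirrors the early return in intel_pasid_setup_dirty_tracking(), where an unchanged SSADE bit skips the cache flush and invalidations.

```c
#include <assert.h> /* used by the usage checks */
#include <stdbool.h>
#include <stdint.h>

/* Absolute bit positions taken from the comments in the pasid.c hunk. */
#define TOY_SSADE_BIT 9   /* val[0] bit 9 */
#define TOY_SRE_BIT   128 /* val[2] bit 0 */
#define TOY_EAFE_BIT  135 /* val[2] bit 7 */

/* Hypothetical stand-in for struct pasid_entry: a 512-bit entry as 8 words. */
struct toy_pasid_entry {
	uint64_t val[8];
};

/* Same masked update as pasid_set_bits(), without READ_ONCE/WRITE_ONCE. */
static void toy_set_bits(uint64_t *ptr, uint64_t mask, uint64_t bits)
{
	*ptr = (*ptr & ~mask) | bits;
}

/* Set or clear one absolute entry bit by locating its word and offset. */
static void toy_set_entry_bit(struct toy_pasid_entry *pe, unsigned int bit,
			      bool on)
{
	uint64_t mask = UINT64_C(1) << (bit % 64);

	toy_set_bits(&pe->val[bit / 64], mask, on ? mask : 0);
}

static bool toy_get_entry_bit(const struct toy_pasid_entry *pe,
			      unsigned int bit)
{
	return pe->val[bit / 64] & (UINT64_C(1) << (bit % 64));
}

/*
 * Mirrors the early return in intel_pasid_setup_dirty_tracking(): returns
 * true only when SSADE actually changed, i.e. when the caller would go on
 * to flush the entry and issue invalidations.
 */
static bool toy_update_ssade(struct toy_pasid_entry *pe, bool enabled)
{
	if (toy_get_entry_bit(pe, TOY_SSADE_BIT) == enabled)
		return false;
	toy_set_entry_bit(pe, TOY_SSADE_BIT, enabled);
	return true;
}
```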
+6
drivers/iommu/intel/pasid.h
··· 106 106 int intel_pasid_setup_second_level(struct intel_iommu *iommu, 107 107 struct dmar_domain *domain, 108 108 struct device *dev, u32 pasid); 109 + int intel_pasid_setup_dirty_tracking(struct intel_iommu *iommu, 110 + struct dmar_domain *domain, 111 + struct device *dev, u32 pasid, 112 + bool enabled); 109 113 int intel_pasid_setup_pass_through(struct intel_iommu *iommu, 110 114 struct dmar_domain *domain, 111 115 struct device *dev, u32 pasid); 116 + int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev, 117 + u32 pasid, struct dmar_domain *domain); 112 118 void intel_pasid_tear_down_entry(struct intel_iommu *iommu, 113 119 struct device *dev, u32 pasid, 114 120 bool fault_ignore);
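intel_pasid_setup_dirty_tracking(), declared in this header, follows the invalidation guidance it quotes from VT-d spec table 25. The sketch below encodes only that quoted table, as a pure function that returns the ordered list of invalidations. The enum names are hypothetical labels. Skipping the device-TLB step in caching mode mirrors the code in the pasid.c hunk, not the table itself.

```c
#include <assert.h> /* used by the usage checks */
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the invalidation steps named in the quoted table. */
enum toy_inval {
	TOY_PASID_CACHE_PASID_IN_DOMAIN,
	TOY_IOTLB_DOMAIN_SELECTIVE,
	TOY_PIOTLB_PASID_SELECTIVE,
	TOY_DEVTLB_GLOBAL,
	TOY_DEVTLB_PASID,
};

enum toy_pgtt { TOY_PGTT_FL_ONLY, TOY_PGTT_SL_ONLY, TOY_PGTT_NESTED, TOY_PGTT_PT };

/*
 * Fill @steps with the invalidations to issue after changing a PASID entry
 * and return how many were written (at most 3).
 */
static size_t toy_plan_invalidation(enum toy_pgtt pgtt, bool is_rid_pasid,
				    bool caching_mode, enum toy_inval steps[3])
{
	size_t n = 0;

	steps[n++] = TOY_PASID_CACHE_PASID_IN_DOMAIN;
	if (pgtt == TOY_PGTT_SL_ONLY || pgtt == TOY_PGTT_NESTED)
		steps[n++] = TOY_IOTLB_DOMAIN_SELECTIVE;
	else
		steps[n++] = TOY_PIOTLB_PASID_SELECTIVE;
	/* As in the diff, the device TLB step is skipped in caching mode. */
	if (!caching_mode)
		steps[n++] = is_rid_pasid ? TOY_DEVTLB_GLOBAL : TOY_DEVTLB_PASID;
	return n;
}
```

Since intel_pasid_setup_dirty_tracking() rejects any translation type other than second-stage-only or nested, it only ever takes the domain-selective IOTLB branch.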
+1
drivers/iommu/iommufd/Makefile
··· 11 11 iommufd-$(CONFIG_IOMMUFD_TEST) += selftest.o 12 12 13 13 obj-$(CONFIG_IOMMUFD) += iommufd.o 14 + obj-$(CONFIG_IOMMUFD_DRIVER) += iova_bitmap.o
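The Makefile change builds iova_bitmap.o whenever CONFIG_IOMMUFD_DRIVER is set. It backs dirty-bitmap reporting such as iommufd_hwpt_get_dirty_bitmap() in hw_pagetable.c.

The sketch below models the reporting contract as this diff suggests it. The IOMMU marks IOPTEs dirty on DMA writes. A read then reports each dirty page as bit n of a bitmap whose bit 0 is the requested base IOVA, and clears the dirty state unless a NO_CLEAR-style flag is passed. The one-bit-per-page_size layout, the fixed 128-page range, and all toy_* names are assumptions. The kernel's handling of the user-space bitmap buffer is not modelled.

```c
#include <assert.h> /* used by the usage checks */
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGES 128

/* Hypothetical model of per-IOPTE dirty bits for a small IOVA range. */
struct toy_iopt {
	uint64_t base_iova;
	uint64_t page_size;
	bool dirty[TOY_PAGES];
};

/* What the IOMMU does on a DMA write: mark the IOPTE covering iova dirty. */
static void toy_dma_write(struct toy_iopt *t, uint64_t iova)
{
	uint64_t idx = (iova - t->base_iova) / t->page_size;

	if (idx < TOY_PAGES)
		t->dirty[idx] = true;
}

/*
 * Report dirty pages in [iova, iova + length) as a bitmap where bit n stands
 * for the page at iova + n * page_size; clear the dirty bits unless no_clear.
 * iova is assumed page aligned and inside the tracked range.
 */
static void toy_read_and_clear_dirty(struct toy_iopt *t, uint64_t iova,
				     uint64_t length, bool no_clear,
				     uint64_t *bitmap)
{
	uint64_t first = (iova - t->base_iova) / t->page_size;
	uint64_t npages = length / t->page_size;

	for (uint64_t n = 0; n < npages && first + n < TOY_PAGES; n++) {
		if (!t->dirty[first + n])
			continue;
		bitmap[n / 64] |= UINT64_C(1) << (n % 64);
		if (!no_clear)
			t->dirty[first + n] = false;
	}
}
```

A VMM would merge such a bitmap with the CPU's dirty log to decide which pages to transfer again.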
+114 -60
drivers/iommu/iommufd/device.c
··· 293 293 EXPORT_SYMBOL_NS_GPL(iommufd_device_to_id, IOMMUFD); 294 294 295 295 static int iommufd_group_setup_msi(struct iommufd_group *igroup, 296 - struct iommufd_hw_pagetable *hwpt) 296 + struct iommufd_hwpt_paging *hwpt_paging) 297 297 { 298 298 phys_addr_t sw_msi_start = igroup->sw_msi_start; 299 299 int rc; ··· 311 311 * matches what the IRQ layer actually expects in a newly created 312 312 * domain. 313 313 */ 314 - if (sw_msi_start != PHYS_ADDR_MAX && !hwpt->msi_cookie) { 315 - rc = iommu_get_msi_cookie(hwpt->domain, sw_msi_start); 314 + if (sw_msi_start != PHYS_ADDR_MAX && !hwpt_paging->msi_cookie) { 315 + rc = iommu_get_msi_cookie(hwpt_paging->common.domain, 316 + sw_msi_start); 316 317 if (rc) 317 318 return rc; 318 319 ··· 321 320 * iommu_get_msi_cookie() can only be called once per domain, 322 321 * it returns -EBUSY on later calls. 323 322 */ 324 - hwpt->msi_cookie = true; 323 + hwpt_paging->msi_cookie = true; 324 + } 325 + return 0; 326 + } 327 + 328 + static int iommufd_hwpt_paging_attach(struct iommufd_hwpt_paging *hwpt_paging, 329 + struct iommufd_device *idev) 330 + { 331 + int rc; 332 + 333 + lockdep_assert_held(&idev->igroup->lock); 334 + 335 + rc = iopt_table_enforce_dev_resv_regions(&hwpt_paging->ioas->iopt, 336 + idev->dev, 337 + &idev->igroup->sw_msi_start); 338 + if (rc) 339 + return rc; 340 + 341 + if (list_empty(&idev->igroup->device_list)) { 342 + rc = iommufd_group_setup_msi(idev->igroup, hwpt_paging); 343 + if (rc) { 344 + iopt_remove_reserved_iova(&hwpt_paging->ioas->iopt, 345 + idev->dev); 346 + return rc; 347 + } 325 348 } 326 349 return 0; 327 350 } ··· 362 337 goto err_unlock; 363 338 } 364 339 365 - /* Try to upgrade the domain we have */ 366 - if (idev->enforce_cache_coherency) { 367 - rc = iommufd_hw_pagetable_enforce_cc(hwpt); 340 + if (hwpt_is_paging(hwpt)) { 341 + rc = iommufd_hwpt_paging_attach(to_hwpt_paging(hwpt), idev); 368 342 if (rc) 369 343 goto err_unlock; 370 344 } 371 - 372 - rc = 
iopt_table_enforce_dev_resv_regions(&hwpt->ioas->iopt, idev->dev, 373 - &idev->igroup->sw_msi_start); 374 - if (rc) 375 - goto err_unlock; 376 345 377 346 /* 378 347 * Only attach to the group once for the first device that is in the ··· 376 357 * attachment. 377 358 */ 378 359 if (list_empty(&idev->igroup->device_list)) { 379 - rc = iommufd_group_setup_msi(idev->igroup, hwpt); 380 - if (rc) 381 - goto err_unresv; 382 - 383 360 rc = iommu_attach_group(hwpt->domain, idev->igroup->group); 384 361 if (rc) 385 362 goto err_unresv; ··· 386 371 mutex_unlock(&idev->igroup->lock); 387 372 return 0; 388 373 err_unresv: 389 - iopt_remove_reserved_iova(&hwpt->ioas->iopt, idev->dev); 374 + if (hwpt_is_paging(hwpt)) 375 + iopt_remove_reserved_iova(&to_hwpt_paging(hwpt)->ioas->iopt, 376 + idev->dev); 390 377 err_unlock: 391 378 mutex_unlock(&idev->igroup->lock); 392 379 return rc; ··· 405 388 iommu_detach_group(hwpt->domain, idev->igroup->group); 406 389 idev->igroup->hwpt = NULL; 407 390 } 408 - iopt_remove_reserved_iova(&hwpt->ioas->iopt, idev->dev); 391 + if (hwpt_is_paging(hwpt)) 392 + iopt_remove_reserved_iova(&to_hwpt_paging(hwpt)->ioas->iopt, 393 + idev->dev); 409 394 mutex_unlock(&idev->igroup->lock); 410 395 411 396 /* Caller must destroy hwpt */ ··· 426 407 return NULL; 427 408 } 428 409 410 + static void 411 + iommufd_group_remove_reserved_iova(struct iommufd_group *igroup, 412 + struct iommufd_hwpt_paging *hwpt_paging) 413 + { 414 + struct iommufd_device *cur; 415 + 416 + lockdep_assert_held(&igroup->lock); 417 + 418 + list_for_each_entry(cur, &igroup->device_list, group_item) 419 + iopt_remove_reserved_iova(&hwpt_paging->ioas->iopt, cur->dev); 420 + } 421 + 422 + static int 423 + iommufd_group_do_replace_paging(struct iommufd_group *igroup, 424 + struct iommufd_hwpt_paging *hwpt_paging) 425 + { 426 + struct iommufd_hw_pagetable *old_hwpt = igroup->hwpt; 427 + struct iommufd_device *cur; 428 + int rc; 429 + 430 + lockdep_assert_held(&igroup->lock); 431 + 432 + if 
(!hwpt_is_paging(old_hwpt) || 433 + hwpt_paging->ioas != to_hwpt_paging(old_hwpt)->ioas) { 434 + list_for_each_entry(cur, &igroup->device_list, group_item) { 435 + rc = iopt_table_enforce_dev_resv_regions( 436 + &hwpt_paging->ioas->iopt, cur->dev, NULL); 437 + if (rc) 438 + goto err_unresv; 439 + } 440 + } 441 + 442 + rc = iommufd_group_setup_msi(igroup, hwpt_paging); 443 + if (rc) 444 + goto err_unresv; 445 + return 0; 446 + 447 + err_unresv: 448 + iommufd_group_remove_reserved_iova(igroup, hwpt_paging); 449 + return rc; 450 + } 451 + 429 452 static struct iommufd_hw_pagetable * 430 453 iommufd_device_do_replace(struct iommufd_device *idev, 431 454 struct iommufd_hw_pagetable *hwpt) 432 455 { 433 456 struct iommufd_group *igroup = idev->igroup; 434 457 struct iommufd_hw_pagetable *old_hwpt; 435 - unsigned int num_devices = 0; 436 - struct iommufd_device *cur; 458 + unsigned int num_devices; 437 459 int rc; 438 460 439 461 mutex_lock(&idev->igroup->lock); ··· 489 429 return NULL; 490 430 } 491 431 492 - /* Try to upgrade the domain we have */ 493 - list_for_each_entry(cur, &igroup->device_list, group_item) { 494 - num_devices++; 495 - if (cur->enforce_cache_coherency) { 496 - rc = iommufd_hw_pagetable_enforce_cc(hwpt); 497 - if (rc) 498 - goto err_unlock; 499 - } 500 - } 501 - 502 432 old_hwpt = igroup->hwpt; 503 - if (hwpt->ioas != old_hwpt->ioas) { 504 - list_for_each_entry(cur, &igroup->device_list, group_item) { 505 - rc = iopt_table_enforce_dev_resv_regions( 506 - &hwpt->ioas->iopt, cur->dev, NULL); 507 - if (rc) 508 - goto err_unresv; 509 - } 433 + if (hwpt_is_paging(hwpt)) { 434 + rc = iommufd_group_do_replace_paging(igroup, 435 + to_hwpt_paging(hwpt)); 436 + if (rc) 437 + goto err_unlock; 510 438 } 511 - 512 - rc = iommufd_group_setup_msi(idev->igroup, hwpt); 513 - if (rc) 514 - goto err_unresv; 515 439 516 440 rc = iommu_group_replace_domain(igroup->group, hwpt->domain); 517 441 if (rc) 518 442 goto err_unresv; 519 443 520 - if (hwpt->ioas != 
old_hwpt->ioas) { 521 - list_for_each_entry(cur, &igroup->device_list, group_item) 522 - iopt_remove_reserved_iova(&old_hwpt->ioas->iopt, 523 - cur->dev); 524 - } 444 + if (hwpt_is_paging(old_hwpt) && 445 + (!hwpt_is_paging(hwpt) || 446 + to_hwpt_paging(hwpt)->ioas != to_hwpt_paging(old_hwpt)->ioas)) 447 + iommufd_group_remove_reserved_iova(igroup, 448 + to_hwpt_paging(old_hwpt)); 525 449 526 450 igroup->hwpt = hwpt; 527 451 452 + num_devices = list_count_nodes(&igroup->device_list); 528 453 /* 529 454 * Move the refcounts held by the device_list to the new hwpt. Retain a 530 455 * refcount for this thread as the caller will free it. ··· 523 478 /* Caller must destroy old_hwpt */ 524 479 return old_hwpt; 525 480 err_unresv: 526 - list_for_each_entry(cur, &igroup->device_list, group_item) 527 - iopt_remove_reserved_iova(&hwpt->ioas->iopt, cur->dev); 481 + if (hwpt_is_paging(hwpt)) 482 + iommufd_group_remove_reserved_iova(igroup, 483 + to_hwpt_paging(old_hwpt)); 528 484 err_unlock: 529 485 mutex_unlock(&idev->igroup->lock); 530 486 return ERR_PTR(rc); ··· 553 507 */ 554 508 bool immediate_attach = do_attach == iommufd_device_do_attach; 555 509 struct iommufd_hw_pagetable *destroy_hwpt; 510 + struct iommufd_hwpt_paging *hwpt_paging; 556 511 struct iommufd_hw_pagetable *hwpt; 557 512 558 513 /* ··· 562 515 * other. 
563 516 */ 564 517 mutex_lock(&ioas->mutex); 565 - list_for_each_entry(hwpt, &ioas->hwpt_list, hwpt_item) { 566 - if (!hwpt->auto_domain) 518 + list_for_each_entry(hwpt_paging, &ioas->hwpt_list, hwpt_item) { 519 + if (!hwpt_paging->auto_domain) 567 520 continue; 568 521 522 + hwpt = &hwpt_paging->common; 569 523 if (!iommufd_lock_obj(&hwpt->obj)) 570 524 continue; 571 525 destroy_hwpt = (*do_attach)(idev, hwpt); ··· 587 539 goto out_unlock; 588 540 } 589 541 590 - hwpt = iommufd_hw_pagetable_alloc(idev->ictx, ioas, idev, 591 - immediate_attach); 592 - if (IS_ERR(hwpt)) { 593 - destroy_hwpt = ERR_CAST(hwpt); 542 + hwpt_paging = iommufd_hwpt_paging_alloc(idev->ictx, ioas, idev, 0, 543 + immediate_attach, NULL); 544 + if (IS_ERR(hwpt_paging)) { 545 + destroy_hwpt = ERR_CAST(hwpt_paging); 594 546 goto out_unlock; 595 547 } 548 + hwpt = &hwpt_paging->common; 596 549 597 550 if (!immediate_attach) { 598 551 destroy_hwpt = (*do_attach)(idev, hwpt); ··· 603 554 destroy_hwpt = NULL; 604 555 } 605 556 606 - hwpt->auto_domain = true; 557 + hwpt_paging->auto_domain = true; 607 558 *pt_id = hwpt->obj.id; 608 559 609 560 iommufd_object_finalize(idev->ictx, &hwpt->obj); ··· 628 579 return PTR_ERR(pt_obj); 629 580 630 581 switch (pt_obj->type) { 631 - case IOMMUFD_OBJ_HW_PAGETABLE: { 582 + case IOMMUFD_OBJ_HWPT_NESTED: 583 + case IOMMUFD_OBJ_HWPT_PAGING: { 632 584 struct iommufd_hw_pagetable *hwpt = 633 585 container_of(pt_obj, struct iommufd_hw_pagetable, obj); 634 586 ··· 667 617 /** 668 618 * iommufd_device_attach - Connect a device to an iommu_domain 669 619 * @idev: device to attach 670 - * @pt_id: Input a IOMMUFD_OBJ_IOAS, or IOMMUFD_OBJ_HW_PAGETABLE 671 - * Output the IOMMUFD_OBJ_HW_PAGETABLE ID 620 + * @pt_id: Input a IOMMUFD_OBJ_IOAS, or IOMMUFD_OBJ_HWPT_PAGING 621 + * Output the IOMMUFD_OBJ_HWPT_PAGING ID 672 622 * 673 623 * This connects the device to an iommu_domain, either automatically or manually 674 624 * selected. Once this completes the device could do DMA. 
··· 696 646 /** 697 647 * iommufd_device_replace - Change the device's iommu_domain 698 648 * @idev: device to change 699 - * @pt_id: Input a IOMMUFD_OBJ_IOAS, or IOMMUFD_OBJ_HW_PAGETABLE 700 - * Output the IOMMUFD_OBJ_HW_PAGETABLE ID 649 + * @pt_id: Input a IOMMUFD_OBJ_IOAS, or IOMMUFD_OBJ_HWPT_PAGING 650 + * Output the IOMMUFD_OBJ_HWPT_PAGING ID 701 651 * 702 652 * This is the same as:: 703 653 * ··· 1234 1184 * the kernel capability is. It could be larger than the input buffer. 1235 1185 */ 1236 1186 cmd->data_len = data_len; 1187 + 1188 + cmd->out_capabilities = 0; 1189 + if (device_iommu_capable(idev->dev, IOMMU_CAP_DIRTY_TRACKING)) 1190 + cmd->out_capabilities |= IOMMU_HW_CAP_DIRTY_TRACKING; 1237 1191 1238 1192 rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd)); 1239 1193 out_free:
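The device.c rework only touches reserved IOVA regions for PAGING hwpts, because a NESTED hwpt has no IOAS of its own. When a group's hwpt is replaced, two conditions decide what happens to reserved regions. They are used in iommufd_group_do_replace_paging() and iommufd_device_do_replace(), and can be written as pure predicates. The sketch below uses a hypothetical toy_hwpt summary in place of the real objects and ignores locking and error unwinding.

```c
#include <assert.h> /* used by the usage checks */
#include <stdbool.h>

/* Hypothetical summary of an HWPT: is it PAGING, and which IOAS backs it. */
struct toy_hwpt {
	bool paging; /* false models a NESTED hwpt, which has no IOAS */
	int ioas_id; /* only meaningful when paging is true */
};

/* Before the swap: must the new IOAS get the group's reserved regions? */
static bool toy_need_enforce_new(struct toy_hwpt old, struct toy_hwpt new_hwpt)
{
	return new_hwpt.paging &&
	       (!old.paging || new_hwpt.ioas_id != old.ioas_id);
}

/* After a successful swap: must the old IOAS drop the reserved regions? */
static bool toy_need_remove_old(struct toy_hwpt old, struct toy_hwpt new_hwpt)
{
	return old.paging &&
	       (!new_hwpt.paging || new_hwpt.ioas_id != old.ioas_id);
}
```

Replacing a PAGING hwpt with another one on the same IOAS leaves the reservations in place. This is why both predicates compare IOAS identity rather than hwpt identity.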
+249 -55
drivers/iommu/iommufd/hw_pagetable.c
··· 5 5 #include <linux/iommu.h> 6 6 #include <uapi/linux/iommufd.h> 7 7 8 + #include "../iommu-priv.h" 8 9 #include "iommufd_private.h" 9 10 10 - void iommufd_hw_pagetable_destroy(struct iommufd_object *obj) 11 + void iommufd_hwpt_paging_destroy(struct iommufd_object *obj) 11 12 { 12 - struct iommufd_hw_pagetable *hwpt = 13 - container_of(obj, struct iommufd_hw_pagetable, obj); 13 + struct iommufd_hwpt_paging *hwpt_paging = 14 + container_of(obj, struct iommufd_hwpt_paging, common.obj); 14 15 15 - if (!list_empty(&hwpt->hwpt_item)) { 16 - mutex_lock(&hwpt->ioas->mutex); 17 - list_del(&hwpt->hwpt_item); 18 - mutex_unlock(&hwpt->ioas->mutex); 16 + if (!list_empty(&hwpt_paging->hwpt_item)) { 17 + mutex_lock(&hwpt_paging->ioas->mutex); 18 + list_del(&hwpt_paging->hwpt_item); 19 + mutex_unlock(&hwpt_paging->ioas->mutex); 19 20 20 - iopt_table_remove_domain(&hwpt->ioas->iopt, hwpt->domain); 21 + iopt_table_remove_domain(&hwpt_paging->ioas->iopt, 22 + hwpt_paging->common.domain); 21 23 } 22 24 23 - if (hwpt->domain) 24 - iommu_domain_free(hwpt->domain); 25 + if (hwpt_paging->common.domain) 26 + iommu_domain_free(hwpt_paging->common.domain); 25 27 26 - refcount_dec(&hwpt->ioas->obj.users); 28 + refcount_dec(&hwpt_paging->ioas->obj.users); 27 29 } 28 30 29 - void iommufd_hw_pagetable_abort(struct iommufd_object *obj) 31 + void iommufd_hwpt_paging_abort(struct iommufd_object *obj) 30 32 { 31 - struct iommufd_hw_pagetable *hwpt = 32 - container_of(obj, struct iommufd_hw_pagetable, obj); 33 + struct iommufd_hwpt_paging *hwpt_paging = 34 + container_of(obj, struct iommufd_hwpt_paging, common.obj); 33 35 34 36 /* The ioas->mutex must be held until finalize is called. 
*/ 35 - lockdep_assert_held(&hwpt->ioas->mutex); 37 + lockdep_assert_held(&hwpt_paging->ioas->mutex); 36 38 37 - if (!list_empty(&hwpt->hwpt_item)) { 38 - list_del_init(&hwpt->hwpt_item); 39 - iopt_table_remove_domain(&hwpt->ioas->iopt, hwpt->domain); 39 + if (!list_empty(&hwpt_paging->hwpt_item)) { 40 + list_del_init(&hwpt_paging->hwpt_item); 41 + iopt_table_remove_domain(&hwpt_paging->ioas->iopt, 42 + hwpt_paging->common.domain); 40 43 } 41 - iommufd_hw_pagetable_destroy(obj); 44 + iommufd_hwpt_paging_destroy(obj); 42 45 } 43 46 44 - int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt) 47 + void iommufd_hwpt_nested_destroy(struct iommufd_object *obj) 45 48 { 46 - if (hwpt->enforce_cache_coherency) 49 + struct iommufd_hwpt_nested *hwpt_nested = 50 + container_of(obj, struct iommufd_hwpt_nested, common.obj); 51 + 52 + if (hwpt_nested->common.domain) 53 + iommu_domain_free(hwpt_nested->common.domain); 54 + 55 + refcount_dec(&hwpt_nested->parent->common.obj.users); 56 + } 57 + 58 + void iommufd_hwpt_nested_abort(struct iommufd_object *obj) 59 + { 60 + iommufd_hwpt_nested_destroy(obj); 61 + } 62 + 63 + static int 64 + iommufd_hwpt_paging_enforce_cc(struct iommufd_hwpt_paging *hwpt_paging) 65 + { 66 + struct iommu_domain *paging_domain = hwpt_paging->common.domain; 67 + 68 + if (hwpt_paging->enforce_cache_coherency) 47 69 return 0; 48 70 49 - if (hwpt->domain->ops->enforce_cache_coherency) 50 - hwpt->enforce_cache_coherency = 51 - hwpt->domain->ops->enforce_cache_coherency( 52 - hwpt->domain); 53 - if (!hwpt->enforce_cache_coherency) 71 + if (paging_domain->ops->enforce_cache_coherency) 72 + hwpt_paging->enforce_cache_coherency = 73 + paging_domain->ops->enforce_cache_coherency( 74 + paging_domain); 75 + if (!hwpt_paging->enforce_cache_coherency) 54 76 return -EINVAL; 55 77 return 0; 56 78 } 57 79 58 80 /** 59 - * iommufd_hw_pagetable_alloc() - Get an iommu_domain for a device 81 + * iommufd_hwpt_paging_alloc() - Get a PAGING iommu_domain for a device 
60 82 * @ictx: iommufd context 61 83 * @ioas: IOAS to associate the domain with 62 84 * @idev: Device to get an iommu_domain for 85 + * @flags: Flags from userspace 63 86 * @immediate_attach: True if idev should be attached to the hwpt 87 + * @user_data: The user provided driver specific data describing the domain to 88 + * create 64 89 * 65 90 * Allocate a new iommu_domain and return it as a hw_pagetable. The HWPT 66 91 * will be linked to the given ioas and upon return the underlying iommu_domain ··· 95 70 * iommufd_object_abort_and_destroy() or iommufd_object_finalize() is called on 96 71 * the returned hwpt. 97 72 */ 98 - struct iommufd_hw_pagetable * 99 - iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, 100 - struct iommufd_device *idev, bool immediate_attach) 73 + struct iommufd_hwpt_paging * 74 + iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, 75 + struct iommufd_device *idev, u32 flags, 76 + bool immediate_attach, 77 + const struct iommu_user_data *user_data) 101 78 { 79 + const u32 valid_flags = IOMMU_HWPT_ALLOC_NEST_PARENT | 80 + IOMMU_HWPT_ALLOC_DIRTY_TRACKING; 81 + const struct iommu_ops *ops = dev_iommu_ops(idev->dev); 82 + struct iommufd_hwpt_paging *hwpt_paging; 102 83 struct iommufd_hw_pagetable *hwpt; 103 84 int rc; 104 85 105 86 lockdep_assert_held(&ioas->mutex); 106 87 107 - hwpt = iommufd_object_alloc(ictx, hwpt, IOMMUFD_OBJ_HW_PAGETABLE); 108 - if (IS_ERR(hwpt)) 109 - return hwpt; 88 + if ((flags || user_data) && !ops->domain_alloc_user) 89 + return ERR_PTR(-EOPNOTSUPP); 90 + if (flags & ~valid_flags) 91 + return ERR_PTR(-EOPNOTSUPP); 110 92 111 - INIT_LIST_HEAD(&hwpt->hwpt_item); 93 + hwpt_paging = __iommufd_object_alloc( 94 + ictx, hwpt_paging, IOMMUFD_OBJ_HWPT_PAGING, common.obj); 95 + if (IS_ERR(hwpt_paging)) 96 + return ERR_CAST(hwpt_paging); 97 + hwpt = &hwpt_paging->common; 98 + 99 + INIT_LIST_HEAD(&hwpt_paging->hwpt_item); 112 100 /* Pairs with iommufd_hw_pagetable_destroy() 
*/ 113 101 refcount_inc(&ioas->obj.users); 114 - hwpt->ioas = ioas; 102 + hwpt_paging->ioas = ioas; 103 + hwpt_paging->nest_parent = flags & IOMMU_HWPT_ALLOC_NEST_PARENT; 115 104 116 - hwpt->domain = iommu_domain_alloc(idev->dev->bus); 117 - if (!hwpt->domain) { 118 - rc = -ENOMEM; 119 - goto out_abort; 105 + if (ops->domain_alloc_user) { 106 + hwpt->domain = ops->domain_alloc_user(idev->dev, flags, NULL, 107 + user_data); 108 + if (IS_ERR(hwpt->domain)) { 109 + rc = PTR_ERR(hwpt->domain); 110 + hwpt->domain = NULL; 111 + goto out_abort; 112 + } 113 + } else { 114 + hwpt->domain = iommu_domain_alloc(idev->dev->bus); 115 + if (!hwpt->domain) { 116 + rc = -ENOMEM; 117 + goto out_abort; 118 + } 120 119 } 121 120 122 121 /* ··· 149 100 * doing any maps. It is an iommu driver bug to report 150 101 * IOMMU_CAP_ENFORCE_CACHE_COHERENCY but fail enforce_cache_coherency on 151 102 * a new domain. 103 + * 104 + * The cache coherency mode must be configured here and unchanged later. 105 + * Note that a HWPT (non-CC) created for a device (non-CC) can be later 106 + * reused by another device (either non-CC or CC). However, A HWPT (CC) 107 + * created for a device (CC) cannot be reused by another device (non-CC) 108 + * but only devices (CC). Instead user space in this case would need to 109 + * allocate a separate HWPT (non-CC). 
152 110 */ 153 111 if (idev->enforce_cache_coherency) { 154 - rc = iommufd_hw_pagetable_enforce_cc(hwpt); 112 + rc = iommufd_hwpt_paging_enforce_cc(hwpt_paging); 155 113 if (WARN_ON(rc)) 156 114 goto out_abort; 157 115 } ··· 175 119 goto out_abort; 176 120 } 177 121 178 - rc = iopt_table_add_domain(&hwpt->ioas->iopt, hwpt->domain); 122 + rc = iopt_table_add_domain(&ioas->iopt, hwpt->domain); 179 123 if (rc) 180 124 goto out_detach; 181 - list_add_tail(&hwpt->hwpt_item, &hwpt->ioas->hwpt_list); 182 - return hwpt; 125 + list_add_tail(&hwpt_paging->hwpt_item, &ioas->hwpt_list); 126 + return hwpt_paging; 183 127 184 128 out_detach: 185 129 if (immediate_attach) ··· 189 133 return ERR_PTR(rc); 190 134 } 191 135 136 + /** 137 + * iommufd_hwpt_nested_alloc() - Get a NESTED iommu_domain for a device 138 + * @ictx: iommufd context 139 + * @parent: Parent PAGING-type hwpt to associate the domain with 140 + * @idev: Device to get an iommu_domain for 141 + * @flags: Flags from userspace 142 + * @user_data: user_data pointer. Must be valid 143 + * 144 + * Allocate a new iommu_domain (must be IOMMU_DOMAIN_NESTED) and return it as 145 + * a NESTED hw_pagetable. The given parent PAGING-type hwpt must be capable of 146 + * being a parent. 
147 + */ 148 + static struct iommufd_hwpt_nested * 149 + iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx, 150 + struct iommufd_hwpt_paging *parent, 151 + struct iommufd_device *idev, u32 flags, 152 + const struct iommu_user_data *user_data) 153 + { 154 + const struct iommu_ops *ops = dev_iommu_ops(idev->dev); 155 + struct iommufd_hwpt_nested *hwpt_nested; 156 + struct iommufd_hw_pagetable *hwpt; 157 + int rc; 158 + 159 + if (flags || !user_data->len || !ops->domain_alloc_user) 160 + return ERR_PTR(-EOPNOTSUPP); 161 + if (parent->auto_domain || !parent->nest_parent) 162 + return ERR_PTR(-EINVAL); 163 + 164 + hwpt_nested = __iommufd_object_alloc( 165 + ictx, hwpt_nested, IOMMUFD_OBJ_HWPT_NESTED, common.obj); 166 + if (IS_ERR(hwpt_nested)) 167 + return ERR_CAST(hwpt_nested); 168 + hwpt = &hwpt_nested->common; 169 + 170 + refcount_inc(&parent->common.obj.users); 171 + hwpt_nested->parent = parent; 172 + 173 + hwpt->domain = ops->domain_alloc_user(idev->dev, flags, 174 + parent->common.domain, user_data); 175 + if (IS_ERR(hwpt->domain)) { 176 + rc = PTR_ERR(hwpt->domain); 177 + hwpt->domain = NULL; 178 + goto out_abort; 179 + } 180 + 181 + if (WARN_ON_ONCE(hwpt->domain->type != IOMMU_DOMAIN_NESTED)) { 182 + rc = -EINVAL; 183 + goto out_abort; 184 + } 185 + return hwpt_nested; 186 + 187 + out_abort: 188 + iommufd_object_abort_and_destroy(ictx, &hwpt->obj); 189 + return ERR_PTR(rc); 190 + } 191 + 192 192 int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) 193 193 { 194 194 struct iommu_hwpt_alloc *cmd = ucmd->cmd; 195 + const struct iommu_user_data user_data = { 196 + .type = cmd->data_type, 197 + .uptr = u64_to_user_ptr(cmd->data_uptr), 198 + .len = cmd->data_len, 199 + }; 195 200 struct iommufd_hw_pagetable *hwpt; 201 + struct iommufd_ioas *ioas = NULL; 202 + struct iommufd_object *pt_obj; 196 203 struct iommufd_device *idev; 197 - struct iommufd_ioas *ioas; 198 204 int rc; 199 205 200 - if (cmd->flags || cmd->__reserved) 206 + if (cmd->__reserved) 201 207 return 
-EOPNOTSUPP; 208 + if (cmd->data_type == IOMMU_HWPT_DATA_NONE && cmd->data_len) 209 + return -EINVAL; 202 210 203 211 idev = iommufd_get_device(ucmd, cmd->dev_id); 204 212 if (IS_ERR(idev)) 205 213 return PTR_ERR(idev); 206 214 207 - ioas = iommufd_get_ioas(ucmd->ictx, cmd->pt_id); 208 - if (IS_ERR(ioas)) { 209 - rc = PTR_ERR(ioas); 215 + pt_obj = iommufd_get_object(ucmd->ictx, cmd->pt_id, IOMMUFD_OBJ_ANY); 216 + if (IS_ERR(pt_obj)) { 217 + rc = -EINVAL; 210 218 goto out_put_idev; 211 219 } 212 220 213 - mutex_lock(&ioas->mutex); 214 - hwpt = iommufd_hw_pagetable_alloc(ucmd->ictx, ioas, idev, false); 215 - if (IS_ERR(hwpt)) { 216 - rc = PTR_ERR(hwpt); 217 - goto out_unlock; 221 + if (pt_obj->type == IOMMUFD_OBJ_IOAS) { 222 + struct iommufd_hwpt_paging *hwpt_paging; 223 + 224 + ioas = container_of(pt_obj, struct iommufd_ioas, obj); 225 + mutex_lock(&ioas->mutex); 226 + hwpt_paging = iommufd_hwpt_paging_alloc( 227 + ucmd->ictx, ioas, idev, cmd->flags, false, 228 + user_data.len ? &user_data : NULL); 229 + if (IS_ERR(hwpt_paging)) { 230 + rc = PTR_ERR(hwpt_paging); 231 + goto out_unlock; 232 + } 233 + hwpt = &hwpt_paging->common; 234 + } else if (pt_obj->type == IOMMUFD_OBJ_HWPT_PAGING) { 235 + struct iommufd_hwpt_nested *hwpt_nested; 236 + 237 + hwpt_nested = iommufd_hwpt_nested_alloc( 238 + ucmd->ictx, 239 + container_of(pt_obj, struct iommufd_hwpt_paging, 240 + common.obj), 241 + idev, cmd->flags, &user_data); 242 + if (IS_ERR(hwpt_nested)) { 243 + rc = PTR_ERR(hwpt_nested); 244 + goto out_unlock; 245 + } 246 + hwpt = &hwpt_nested->common; 247 + } else { 248 + rc = -EINVAL; 249 + goto out_put_pt; 218 250 } 219 251 220 252 cmd->out_hwpt_id = hwpt->obj.id; ··· 315 171 out_hwpt: 316 172 iommufd_object_abort_and_destroy(ucmd->ictx, &hwpt->obj); 317 173 out_unlock: 318 - mutex_unlock(&ioas->mutex); 319 - iommufd_put_object(&ioas->obj); 174 + if (ioas) 175 + mutex_unlock(&ioas->mutex); 176 + out_put_pt: 177 + iommufd_put_object(pt_obj); 320 178 out_put_idev: 321 179 
iommufd_put_object(&idev->obj); 180 + return rc; 181 + } 182 + 183 + int iommufd_hwpt_set_dirty_tracking(struct iommufd_ucmd *ucmd) 184 + { 185 + struct iommu_hwpt_set_dirty_tracking *cmd = ucmd->cmd; 186 + struct iommufd_hwpt_paging *hwpt_paging; 187 + struct iommufd_ioas *ioas; 188 + int rc = -EOPNOTSUPP; 189 + bool enable; 190 + 191 + if (cmd->flags & ~IOMMU_HWPT_DIRTY_TRACKING_ENABLE) 192 + return rc; 193 + 194 + hwpt_paging = iommufd_get_hwpt_paging(ucmd, cmd->hwpt_id); 195 + if (IS_ERR(hwpt_paging)) 196 + return PTR_ERR(hwpt_paging); 197 + 198 + ioas = hwpt_paging->ioas; 199 + enable = cmd->flags & IOMMU_HWPT_DIRTY_TRACKING_ENABLE; 200 + 201 + rc = iopt_set_dirty_tracking(&ioas->iopt, hwpt_paging->common.domain, 202 + enable); 203 + 204 + iommufd_put_object(&hwpt_paging->common.obj); 205 + return rc; 206 + } 207 + 208 + int iommufd_hwpt_get_dirty_bitmap(struct iommufd_ucmd *ucmd) 209 + { 210 + struct iommu_hwpt_get_dirty_bitmap *cmd = ucmd->cmd; 211 + struct iommufd_hwpt_paging *hwpt_paging; 212 + struct iommufd_ioas *ioas; 213 + int rc = -EOPNOTSUPP; 214 + 215 + if ((cmd->flags & ~(IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR)) || 216 + cmd->__reserved) 217 + return -EOPNOTSUPP; 218 + 219 + hwpt_paging = iommufd_get_hwpt_paging(ucmd, cmd->hwpt_id); 220 + if (IS_ERR(hwpt_paging)) 221 + return PTR_ERR(hwpt_paging); 222 + 223 + ioas = hwpt_paging->ioas; 224 + rc = iopt_read_and_clear_dirty_data( 225 + &ioas->iopt, hwpt_paging->common.domain, cmd->flags, cmd); 226 + 227 + iommufd_put_object(&hwpt_paging->common.obj); 322 228 return rc; 323 229 }
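iommufd_hwpt_alloc() now dispatches on the object named by pt_id. An IOAS yields a PAGING hwpt, a PAGING hwpt yields a NESTED child, and anything else is rejected. The sketch below summarizes the checks performed before the driver's domain_alloc_user() is called.

Several parts are assumptions. The flag values are assumed to match the IOMMU_HWPT_ALLOC_* uapi enum. The toy_alloc_req fields are a hypothetical digest of the ioctl inputs. Object lookup failures and driver-side checks are not modelled; one such driver-side check is intel_nested_domain_alloc() rejecting a data type other than IOMMU_HWPT_DATA_VTD_S1.

```c
#include <assert.h> /* used by the usage checks */
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror IOMMU_HWPT_ALLOC_NEST_PARENT / _DIRTY_TRACKING. */
#define TOY_ALLOC_NEST_PARENT    (1u << 0)
#define TOY_ALLOC_DIRTY_TRACKING (1u << 1)

enum toy_pt_type { TOY_PT_IOAS, TOY_PT_HWPT_PAGING, TOY_PT_OTHER };

/* Hypothetical digest of the inputs iommufd_hwpt_alloc() looks at. */
struct toy_alloc_req {
	enum toy_pt_type pt_type;   /* type of the object named by pt_id */
	uint32_t flags;
	bool reserved_nonzero;      /* cmd->__reserved != 0 */
	bool data_type_none;        /* data_type == IOMMU_HWPT_DATA_NONE */
	uint32_t data_len;
	bool has_domain_alloc_user; /* driver implements domain_alloc_user */
	bool parent_auto_domain;    /* parent fields used for HWPT_PAGING only */
	bool parent_nest_parent;
};

/* Return 0 if the request would reach the driver, else the errno used. */
static int toy_check_hwpt_alloc(const struct toy_alloc_req *r)
{
	const uint32_t valid = TOY_ALLOC_NEST_PARENT | TOY_ALLOC_DIRTY_TRACKING;

	if (r->reserved_nonzero)
		return -EOPNOTSUPP;
	if (r->data_type_none && r->data_len)
		return -EINVAL;

	switch (r->pt_type) {
	case TOY_PT_IOAS:
		/* PAGING hwpt: flags or user data need domain_alloc_user(). */
		if ((r->flags || r->data_len) && !r->has_domain_alloc_user)
			return -EOPNOTSUPP;
		if (r->flags & ~valid)
			return -EOPNOTSUPP;
		return 0;
	case TOY_PT_HWPT_PAGING:
		/* NESTED hwpt: no flags, mandatory user data, a real nest parent. */
		if (r->flags || !r->data_len || !r->has_domain_alloc_user)
			return -EOPNOTSUPP;
		if (r->parent_auto_domain || !r->parent_nest_parent)
			return -EINVAL;
		return 0;
	default:
		return -EINVAL;
	}
}
```

The split between -EOPNOTSUPP and -EINVAL follows the hunk. Unsupported requests get -EOPNOTSUPP. A parent that cannot host a nested domain, such as an auto-created domain or one allocated without IOMMU_HWPT_ALLOC_NEST_PARENT, gets -EINVAL.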
+197 -3
drivers/iommu/iommufd/io_pagetable.c
··· 15 15 #include <linux/err.h> 16 16 #include <linux/slab.h> 17 17 #include <linux/errno.h> 18 + #include <uapi/linux/iommufd.h> 18 19 19 20 #include "io_pagetable.h" 20 21 #include "double_span.h" ··· 222 221 return 0; 223 222 } 224 223 224 + static struct iopt_area *iopt_area_alloc(void) 225 + { 226 + struct iopt_area *area; 227 + 228 + area = kzalloc(sizeof(*area), GFP_KERNEL_ACCOUNT); 229 + if (!area) 230 + return NULL; 231 + RB_CLEAR_NODE(&area->node.rb); 232 + RB_CLEAR_NODE(&area->pages_node.rb); 233 + return area; 234 + } 235 + 225 236 static int iopt_alloc_area_pages(struct io_pagetable *iopt, 226 237 struct list_head *pages_list, 227 238 unsigned long length, unsigned long *dst_iova, ··· 244 231 int rc = 0; 245 232 246 233 list_for_each_entry(elm, pages_list, next) { 247 - elm->area = kzalloc(sizeof(*elm->area), GFP_KERNEL_ACCOUNT); 234 + elm->area = iopt_area_alloc(); 248 235 if (!elm->area) 249 236 return -ENOMEM; 250 237 } ··· 423 410 return rc; 424 411 } 425 412 return 0; 413 + } 414 + 415 + struct iova_bitmap_fn_arg { 416 + unsigned long flags; 417 + struct io_pagetable *iopt; 418 + struct iommu_domain *domain; 419 + struct iommu_dirty_bitmap *dirty; 420 + }; 421 + 422 + static int __iommu_read_and_clear_dirty(struct iova_bitmap *bitmap, 423 + unsigned long iova, size_t length, 424 + void *opaque) 425 + { 426 + struct iopt_area *area; 427 + struct iopt_area_contig_iter iter; 428 + struct iova_bitmap_fn_arg *arg = opaque; 429 + struct iommu_domain *domain = arg->domain; 430 + struct iommu_dirty_bitmap *dirty = arg->dirty; 431 + const struct iommu_dirty_ops *ops = domain->dirty_ops; 432 + unsigned long last_iova = iova + length - 1; 433 + unsigned long flags = arg->flags; 434 + int ret; 435 + 436 + iopt_for_each_contig_area(&iter, area, arg->iopt, iova, last_iova) { 437 + unsigned long last = min(last_iova, iopt_area_last_iova(area)); 438 + 439 + ret = ops->read_and_clear_dirty(domain, iter.cur_iova, 440 + last - iter.cur_iova + 1, flags, 441 + 
dirty); 442 + if (ret) 443 + return ret; 444 + } 445 + 446 + if (!iopt_area_contig_done(&iter)) 447 + return -EINVAL; 448 + return 0; 449 + } 450 + 451 + static int 452 + iommu_read_and_clear_dirty(struct iommu_domain *domain, 453 + struct io_pagetable *iopt, unsigned long flags, 454 + struct iommu_hwpt_get_dirty_bitmap *bitmap) 455 + { 456 + const struct iommu_dirty_ops *ops = domain->dirty_ops; 457 + struct iommu_iotlb_gather gather; 458 + struct iommu_dirty_bitmap dirty; 459 + struct iova_bitmap_fn_arg arg; 460 + struct iova_bitmap *iter; 461 + int ret = 0; 462 + 463 + if (!ops || !ops->read_and_clear_dirty) 464 + return -EOPNOTSUPP; 465 + 466 + iter = iova_bitmap_alloc(bitmap->iova, bitmap->length, 467 + bitmap->page_size, 468 + u64_to_user_ptr(bitmap->data)); 469 + if (IS_ERR(iter)) 470 + return -ENOMEM; 471 + 472 + iommu_dirty_bitmap_init(&dirty, iter, &gather); 473 + 474 + arg.flags = flags; 475 + arg.iopt = iopt; 476 + arg.domain = domain; 477 + arg.dirty = &dirty; 478 + iova_bitmap_for_each(iter, &arg, __iommu_read_and_clear_dirty); 479 + 480 + if (!(flags & IOMMU_DIRTY_NO_CLEAR)) 481 + iommu_iotlb_sync(domain, &gather); 482 + 483 + iova_bitmap_free(iter); 484 + 485 + return ret; 486 + } 487 + 488 + int iommufd_check_iova_range(struct io_pagetable *iopt, 489 + struct iommu_hwpt_get_dirty_bitmap *bitmap) 490 + { 491 + size_t iommu_pgsize = iopt->iova_alignment; 492 + u64 last_iova; 493 + 494 + if (check_add_overflow(bitmap->iova, bitmap->length - 1, &last_iova)) 495 + return -EOVERFLOW; 496 + 497 + if (bitmap->iova > ULONG_MAX || last_iova > ULONG_MAX) 498 + return -EOVERFLOW; 499 + 500 + if ((bitmap->iova & (iommu_pgsize - 1)) || 501 + ((last_iova + 1) & (iommu_pgsize - 1))) 502 + return -EINVAL; 503 + 504 + if (!bitmap->page_size) 505 + return -EINVAL; 506 + 507 + if ((bitmap->iova & (bitmap->page_size - 1)) || 508 + ((last_iova + 1) & (bitmap->page_size - 1))) 509 + return -EINVAL; 510 + 511 + return 0; 512 + } 513 + 514 + int 
iopt_read_and_clear_dirty_data(struct io_pagetable *iopt, 515 + struct iommu_domain *domain, 516 + unsigned long flags, 517 + struct iommu_hwpt_get_dirty_bitmap *bitmap) 518 + { 519 + int ret; 520 + 521 + ret = iommufd_check_iova_range(iopt, bitmap); 522 + if (ret) 523 + return ret; 524 + 525 + down_read(&iopt->iova_rwsem); 526 + ret = iommu_read_and_clear_dirty(domain, iopt, flags, bitmap); 527 + up_read(&iopt->iova_rwsem); 528 + 529 + return ret; 530 + } 531 + 532 + static int iopt_clear_dirty_data(struct io_pagetable *iopt, 533 + struct iommu_domain *domain) 534 + { 535 + const struct iommu_dirty_ops *ops = domain->dirty_ops; 536 + struct iommu_iotlb_gather gather; 537 + struct iommu_dirty_bitmap dirty; 538 + struct iopt_area *area; 539 + int ret = 0; 540 + 541 + lockdep_assert_held_read(&iopt->iova_rwsem); 542 + 543 + iommu_dirty_bitmap_init(&dirty, NULL, &gather); 544 + 545 + for (area = iopt_area_iter_first(iopt, 0, ULONG_MAX); area; 546 + area = iopt_area_iter_next(area, 0, ULONG_MAX)) { 547 + if (!area->pages) 548 + continue; 549 + 550 + ret = ops->read_and_clear_dirty(domain, iopt_area_iova(area), 551 + iopt_area_length(area), 0, 552 + &dirty); 553 + if (ret) 554 + break; 555 + } 556 + 557 + iommu_iotlb_sync(domain, &gather); 558 + return ret; 559 + } 560 + 561 + int iopt_set_dirty_tracking(struct io_pagetable *iopt, 562 + struct iommu_domain *domain, bool enable) 563 + { 564 + const struct iommu_dirty_ops *ops = domain->dirty_ops; 565 + int ret = 0; 566 + 567 + if (!ops) 568 + return -EOPNOTSUPP; 569 + 570 + down_read(&iopt->iova_rwsem); 571 + 572 + /* Clear dirty bits from PTEs to ensure a clean snapshot */ 573 + if (enable) { 574 + ret = iopt_clear_dirty_data(iopt, domain); 575 + if (ret) 576 + goto out_unlock; 577 + } 578 + 579 + ret = ops->set_dirty_tracking(domain, enable); 580 + 581 + out_unlock: 582 + up_read(&iopt->iova_rwsem); 583 + return ret; 426 584 } 427 585 428 586 int iopt_get_pages(struct io_pagetable *iopt, unsigned long iova, ··· 1189 
1005 iopt_area_start_byte(area, new_start) & (alignment - 1)) 1190 1006 return -EINVAL; 1191 1007 1192 - lhs = kzalloc(sizeof(*area), GFP_KERNEL_ACCOUNT); 1008 + lhs = iopt_area_alloc(); 1193 1009 if (!lhs) 1194 1010 return -ENOMEM; 1195 1011 1196 - rhs = kzalloc(sizeof(*area), GFP_KERNEL_ACCOUNT); 1012 + rhs = iopt_area_alloc(); 1197 1013 if (!rhs) { 1198 1014 rc = -ENOMEM; 1199 1015 goto err_free_lhs; ··· 1231 1047 last_iova - new_start + 1, area->iommu_prot); 1232 1048 if (WARN_ON(rc)) 1233 1049 goto err_remove_lhs; 1050 + 1051 + /* 1052 + * If the original area has filled a domain, domains_itree has to be 1053 + * updated. 1054 + */ 1055 + if (area->storage_domain) { 1056 + interval_tree_remove(&area->pages_node, &pages->domains_itree); 1057 + interval_tree_insert(&lhs->pages_node, &pages->domains_itree); 1058 + interval_tree_insert(&rhs->pages_node, &pages->domains_itree); 1059 + } 1234 1060 1235 1061 lhs->storage_domain = area->storage_domain; 1236 1062 lhs->pages = area->pages;
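iommufd_check_iova_range() and __iommu_read_and_clear_dirty() do two separate jobs. The first validates the user's (iova, length, page_size) triple. The second walks the mapped areas covering that range, clamps each driver call to the end of the current area, and fails if it finds a hole. The sketch below models both in userspace under stated assumptions:

- A sorted array stands in for the interval tree behind iopt_for_each_contig_area().
- The callback stands in for ops->read_and_clear_dirty().
- All `model_*` names are hypothetical.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Subset of the user request validated by iommufd_check_iova_range(). */
struct model_dirty_req {
	uint64_t iova, length, page_size;
};

int model_check_iova_range(uint64_t iova_alignment, const struct model_dirty_req *b)
{
	uint64_t last_iova;

	/* The kernel's check_add_overflow() wraps this GCC/Clang builtin. */
	if (__builtin_add_overflow(b->iova, b->length - 1, &last_iova))
		return -EOVERFLOW;
	if (b->iova > ULONG_MAX || last_iova > ULONG_MAX)
		return -EOVERFLOW;
	/* Both ends must sit on the IOAS alignment... */
	if ((b->iova & (iova_alignment - 1)) || ((last_iova + 1) & (iova_alignment - 1)))
		return -EINVAL;
	/* ...and on the bitmap's page_size granularity. */
	if (!b->page_size)
		return -EINVAL;
	if ((b->iova & (b->page_size - 1)) || ((last_iova + 1) & (b->page_size - 1)))
		return -EINVAL;
	return 0;
}

/* Sorted, non-overlapping mapped areas, inclusive [start, last]. */
struct model_area {
	unsigned long start, last;
};

typedef int (*model_chunk_fn)(unsigned long iova, unsigned long len, void *opaque);

/* Call fn once per area-clamped chunk of [iova, last_iova]; -EINVAL on a hole. */
int model_walk_contig(const struct model_area *areas, size_t n,
		      unsigned long iova, unsigned long last_iova,
		      model_chunk_fn fn, void *opaque)
{
	unsigned long cur = iova;

	for (size_t i = 0; i < n; i++) {
		unsigned long last;
		int ret;

		if (areas[i].last < cur)
			continue;               /* area lies entirely before the cursor */
		if (areas[i].start > cur)
			return -EINVAL;         /* gap: range not fully mapped */
		last = areas[i].last < last_iova ? areas[i].last : last_iova;
		ret = fn(cur, last - cur + 1, opaque);
		if (ret)
			return ret;
		if (last == last_iova)
			return 0;
		cur = last + 1;
	}
	return -EINVAL;                         /* ran out of areas early */
}

/* Example callback: record up to 8 chunks. */
struct model_chunks {
	unsigned long iova[8], len[8];
	int n;
};

int model_record_chunk(unsigned long iova, unsigned long len, void *opaque)
{
	struct model_chunks *c = opaque;

	if (c->n >= 8)
		return -ENOSPC;
	c->iova[c->n] = iova;
	c->len[c->n] = len;
	c->n++;
	return 0;
}
```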
+70 -14
drivers/iommu/iommufd/iommufd_private.h
··· 8 8 #include <linux/xarray.h> 9 9 #include <linux/refcount.h> 10 10 #include <linux/uaccess.h> 11 + #include <linux/iommu.h> 12 + #include <linux/iova_bitmap.h> 13 + #include <uapi/linux/iommufd.h> 11 14 12 15 struct iommu_domain; 13 16 struct iommu_group; ··· 73 70 unsigned long length, unsigned long *unmapped); 74 71 int iopt_unmap_all(struct io_pagetable *iopt, unsigned long *unmapped); 75 72 73 + int iopt_read_and_clear_dirty_data(struct io_pagetable *iopt, 74 + struct iommu_domain *domain, 75 + unsigned long flags, 76 + struct iommu_hwpt_get_dirty_bitmap *bitmap); 77 + int iopt_set_dirty_tracking(struct io_pagetable *iopt, 78 + struct iommu_domain *domain, bool enable); 79 + 76 80 void iommufd_access_notify_unmap(struct io_pagetable *iopt, unsigned long iova, 77 81 unsigned long length); 78 82 int iopt_table_add_domain(struct io_pagetable *iopt, ··· 123 113 IOMMUFD_OBJ_NONE, 124 114 IOMMUFD_OBJ_ANY = IOMMUFD_OBJ_NONE, 125 115 IOMMUFD_OBJ_DEVICE, 126 - IOMMUFD_OBJ_HW_PAGETABLE, 116 + IOMMUFD_OBJ_HWPT_PAGING, 117 + IOMMUFD_OBJ_HWPT_NESTED, 127 118 IOMMUFD_OBJ_IOAS, 128 119 IOMMUFD_OBJ_ACCESS, 129 120 #ifdef CONFIG_IOMMUFD_TEST ··· 182 171 size_t size, 183 172 enum iommufd_object_type type); 184 173 185 - #define iommufd_object_alloc(ictx, ptr, type) \ 174 + #define __iommufd_object_alloc(ictx, ptr, type, obj) \ 186 175 container_of(_iommufd_object_alloc( \ 187 176 ictx, \ 188 177 sizeof(*(ptr)) + BUILD_BUG_ON_ZERO( \ ··· 190 179 obj) != 0), \ 191 180 type), \ 192 181 typeof(*(ptr)), obj) 182 + 183 + #define iommufd_object_alloc(ictx, ptr, type) \ 184 + __iommufd_object_alloc(ictx, ptr, type, obj) 193 185 194 186 /* 195 187 * The IO Address Space (IOAS) pagetable is a virtual page table backed by the ··· 236 222 struct iommufd_ctx *ictx); 237 223 238 224 int iommufd_vfio_ioas(struct iommufd_ucmd *ucmd); 225 + int iommufd_check_iova_range(struct io_pagetable *iopt, 226 + struct iommu_hwpt_get_dirty_bitmap *bitmap); 239 227 240 228 /* 241 229 * A HW pagetable 
is called an iommu_domain inside the kernel. This user object ··· 247 231 */ 248 232 struct iommufd_hw_pagetable { 249 233 struct iommufd_object obj; 250 - struct iommufd_ioas *ioas; 251 234 struct iommu_domain *domain; 235 + }; 236 + 237 + struct iommufd_hwpt_paging { 238 + struct iommufd_hw_pagetable common; 239 + struct iommufd_ioas *ioas; 252 240 bool auto_domain : 1; 253 241 bool enforce_cache_coherency : 1; 254 242 bool msi_cookie : 1; 243 + bool nest_parent : 1; 255 244 /* Head at iommufd_ioas::hwpt_list */ 256 245 struct list_head hwpt_item; 257 246 }; 258 247 259 - struct iommufd_hw_pagetable * 260 - iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, 261 - struct iommufd_device *idev, bool immediate_attach); 262 - int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt); 248 + struct iommufd_hwpt_nested { 249 + struct iommufd_hw_pagetable common; 250 + struct iommufd_hwpt_paging *parent; 251 + }; 252 + 253 + static inline bool hwpt_is_paging(struct iommufd_hw_pagetable *hwpt) 254 + { 255 + return hwpt->obj.type == IOMMUFD_OBJ_HWPT_PAGING; 256 + } 257 + 258 + static inline struct iommufd_hwpt_paging * 259 + to_hwpt_paging(struct iommufd_hw_pagetable *hwpt) 260 + { 261 + return container_of(hwpt, struct iommufd_hwpt_paging, common); 262 + } 263 + 264 + static inline struct iommufd_hwpt_paging * 265 + iommufd_get_hwpt_paging(struct iommufd_ucmd *ucmd, u32 id) 266 + { 267 + return container_of(iommufd_get_object(ucmd->ictx, id, 268 + IOMMUFD_OBJ_HWPT_PAGING), 269 + struct iommufd_hwpt_paging, common.obj); 270 + } 271 + int iommufd_hwpt_set_dirty_tracking(struct iommufd_ucmd *ucmd); 272 + int iommufd_hwpt_get_dirty_bitmap(struct iommufd_ucmd *ucmd); 273 + 274 + struct iommufd_hwpt_paging * 275 + iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, 276 + struct iommufd_device *idev, u32 flags, 277 + bool immediate_attach, 278 + const struct iommu_user_data *user_data); 263 279 int 
iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt, 264 280 struct iommufd_device *idev); 265 281 struct iommufd_hw_pagetable * 266 282 iommufd_hw_pagetable_detach(struct iommufd_device *idev); 267 - void iommufd_hw_pagetable_destroy(struct iommufd_object *obj); 268 - void iommufd_hw_pagetable_abort(struct iommufd_object *obj); 283 + void iommufd_hwpt_paging_destroy(struct iommufd_object *obj); 284 + void iommufd_hwpt_paging_abort(struct iommufd_object *obj); 285 + void iommufd_hwpt_nested_destroy(struct iommufd_object *obj); 286 + void iommufd_hwpt_nested_abort(struct iommufd_object *obj); 269 287 int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd); 270 288 271 289 static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx, 272 290 struct iommufd_hw_pagetable *hwpt) 273 291 { 274 - lockdep_assert_not_held(&hwpt->ioas->mutex); 275 - if (hwpt->auto_domain) 276 - iommufd_object_deref_user(ictx, &hwpt->obj); 277 - else 278 - refcount_dec(&hwpt->obj.users); 292 + if (hwpt->obj.type == IOMMUFD_OBJ_HWPT_PAGING) { 293 + struct iommufd_hwpt_paging *hwpt_paging = to_hwpt_paging(hwpt); 294 + 295 + lockdep_assert_not_held(&hwpt_paging->ioas->mutex); 296 + 297 + if (hwpt_paging->auto_domain) { 298 + iommufd_object_deref_user(ictx, &hwpt->obj); 299 + return; 300 + } 301 + } 302 + refcount_dec(&hwpt->obj.users); 279 303 } 280 304 281 305 struct iommufd_group {
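Splitting struct iommufd_hw_pagetable into a shared `common` header relies on the kernel's container_of() idiom. That header is embedded in both iommufd_hwpt_paging and iommufd_hwpt_nested, and the object type decides which outer struct is valid. Below is a userspace sketch of the pattern, including the branch now taken by iommufd_hw_pagetable_put(). The `model_*` types and the `users` counter are hypothetical simplifications, not the kernel definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace re-definition of the container_of() idiom. */
#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

enum model_obj_type { MODEL_OBJ_HWPT_PAGING, MODEL_OBJ_HWPT_NESTED };

/* Shared header, like struct iommufd_hw_pagetable. */
struct model_hwpt {
	enum model_obj_type type;
	int users;                      /* stand-in for obj.users */
};

struct model_hwpt_paging {
	struct model_hwpt common;
	int ioas_id;                    /* only paging HWPTs link to an IOAS */
	bool auto_domain;
};

struct model_hwpt_nested {
	struct model_hwpt common;
	struct model_hwpt_paging *parent;
};

static struct model_hwpt_paging *model_to_paging(struct model_hwpt *h)
{
	return model_container_of(h, struct model_hwpt_paging, common);
}

/* A nested HWPT reaches the IOAS only through its paging parent. */
int model_hwpt_ioas_id(struct model_hwpt *h)
{
	if (h->type == MODEL_OBJ_HWPT_PAGING)
		return model_to_paging(h)->ioas_id;
	return model_container_of(h, struct model_hwpt_nested, common)->parent->ioas_id;
}

/*
 * Mirrors the branch in iommufd_hw_pagetable_put(): returns true when the
 * auto-domain path would be taken, otherwise drops one user reference.
 */
bool model_hwpt_put(struct model_hwpt *h)
{
	if (h->type == MODEL_OBJ_HWPT_PAGING && model_to_paging(h)->auto_domain)
		return true;
	h->users--;
	return false;
}
```

Keeping the type check next to each container_of() is what makes the downcast safe. A nested HWPT has no `ioas` member of its own, so code such as the lockdep assertion must go through the paging view.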
+39
drivers/iommu/iommufd/iommufd_test.h
···
 	IOMMU_TEST_OP_SET_TEMP_MEMORY_LIMIT,
 	IOMMU_TEST_OP_MOCK_DOMAIN_REPLACE,
 	IOMMU_TEST_OP_ACCESS_REPLACE_IOAS,
+	IOMMU_TEST_OP_MOCK_DOMAIN_FLAGS,
+	IOMMU_TEST_OP_DIRTY,
 };

 enum {
···
 	MOCK_FLAGS_ACCESS_CREATE_NEEDS_PIN_PAGES = 1 << 0,
 };

+enum {
+	MOCK_FLAGS_DEVICE_NO_DIRTY = 1 << 0,
+};
+
+enum {
+	MOCK_NESTED_DOMAIN_IOTLB_ID_MAX = 3,
+	MOCK_NESTED_DOMAIN_IOTLB_NUM = 4,
+};
+
 struct iommu_test_cmd {
 	__u32 size;
 	__u32 op;
···
 			/* out_idev_id is the standard iommufd_bind object */
 			__u32 out_idev_id;
 		} mock_domain;
+		struct {
+			__u32 out_stdev_id;
+			__u32 out_hwpt_id;
+			__u32 out_idev_id;
+			/* Expand mock_domain to set mock device flags */
+			__u32 dev_flags;
+		} mock_domain_flags;
 		struct {
 			__u32 pt_id;
 		} mock_domain_replace;
···
 		struct {
 			__u32 ioas_id;
 		} access_replace_ioas;
+		struct {
+			__u32 flags;
+			__aligned_u64 iova;
+			__aligned_u64 length;
+			__aligned_u64 page_size;
+			__aligned_u64 uptr;
+			__aligned_u64 out_nr_dirty;
+		} dirty;
 	};
 	__u32 last;
 };
···
 struct iommu_test_hw_info {
 	__u32 flags;
 	__u32 test_reg;
+};
+
+/* Should not be equal to any defined value in enum iommu_hwpt_data_type */
+#define IOMMU_HWPT_DATA_SELFTEST 0xdead
+#define IOMMU_TEST_IOTLB_DEFAULT 0xbadbeef
+
+/**
+ * struct iommu_hwpt_selftest
+ *
+ * @iotlb: default mock iotlb value, IOMMU_TEST_IOTLB_DEFAULT
+ */
+struct iommu_hwpt_selftest {
+	__u32 iotlb;
 };

 #endif
+14 -3
drivers/iommu/iommufd/main.c
···
 	struct iommu_destroy destroy;
 	struct iommu_hw_info info;
 	struct iommu_hwpt_alloc hwpt;
+	struct iommu_hwpt_get_dirty_bitmap get_dirty_bitmap;
+	struct iommu_hwpt_set_dirty_tracking set_dirty_tracking;
 	struct iommu_ioas_alloc alloc;
 	struct iommu_ioas_allow_iovas allow_iovas;
 	struct iommu_ioas_copy ioas_copy;
···
 		 __reserved),
 	IOCTL_OP(IOMMU_HWPT_ALLOC, iommufd_hwpt_alloc, struct iommu_hwpt_alloc,
 		 __reserved),
+	IOCTL_OP(IOMMU_HWPT_GET_DIRTY_BITMAP, iommufd_hwpt_get_dirty_bitmap,
+		 struct iommu_hwpt_get_dirty_bitmap, data),
+	IOCTL_OP(IOMMU_HWPT_SET_DIRTY_TRACKING, iommufd_hwpt_set_dirty_tracking,
+		 struct iommu_hwpt_set_dirty_tracking, __reserved),
 	IOCTL_OP(IOMMU_IOAS_ALLOC, iommufd_ioas_alloc_ioctl,
 		 struct iommu_ioas_alloc, out_ioas_id),
 	IOCTL_OP(IOMMU_IOAS_ALLOW_IOVAS, iommufd_ioas_allow_iovas,
···
 	[IOMMUFD_OBJ_IOAS] = {
 		.destroy = iommufd_ioas_destroy,
 	},
-	[IOMMUFD_OBJ_HW_PAGETABLE] = {
-		.destroy = iommufd_hw_pagetable_destroy,
-		.abort = iommufd_hw_pagetable_abort,
+	[IOMMUFD_OBJ_HWPT_PAGING] = {
+		.destroy = iommufd_hwpt_paging_destroy,
+		.abort = iommufd_hwpt_paging_abort,
+	},
+	[IOMMUFD_OBJ_HWPT_NESTED] = {
+		.destroy = iommufd_hwpt_nested_destroy,
+		.abort = iommufd_hwpt_nested_abort,
 	},
 #ifdef CONFIG_IOMMUFD_TEST
 	[IOMMUFD_OBJ_SELFTEST] = {
···
 MODULE_ALIAS("devname:vfio/vfio");
 #endif
 MODULE_IMPORT_NS(IOMMUFD_INTERNAL);
+MODULE_IMPORT_NS(IOMMUFD);
 MODULE_DESCRIPTION("I/O Address Space Management for passthrough devices");
 MODULE_LICENSE("GPL");
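Each IOCTL_OP() entry names the last field a caller must supply. Our understanding is that the dispatcher turns that field into a minimum accepted `size` via offsetofend(). The sketch below computes those minimums for the two new dirty-tracking commands. The `model_*` structs are assumed mirrors of the uapi layouts: they use the fields the handlers read, but their field order is an assumption, not a copy of include/uapi/linux/iommufd.h. The size check is also a simplification.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of the first byte past `member`, like the kernel's offsetofend(). */
#define model_offsetofend(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* Assumed mirror of struct iommu_hwpt_set_dirty_tracking. */
struct model_set_dirty_tracking {
	uint32_t size;
	uint32_t flags;
	uint32_t hwpt_id;
	uint32_t __reserved;
};

/* Assumed mirror of struct iommu_hwpt_get_dirty_bitmap (u64s are 8-aligned here). */
struct model_get_dirty_bitmap {
	uint32_t size;
	uint32_t hwpt_id;
	uint32_t flags;
	uint32_t __reserved;
	uint64_t iova;
	uint64_t length;
	uint64_t page_size;
	uint64_t data;          /* user pointer to the output bitmap */
};

static const size_t model_get_min =
	model_offsetofend(struct model_get_dirty_bitmap, data);
static const size_t model_set_min =
	model_offsetofend(struct model_set_dirty_tracking, __reserved);

/* Reject a user-declared size that stops before the last mandatory field. */
int model_check_user_size(uint32_t user_size, size_t min_size)
{
	return user_size < min_size ? -EINVAL : 0;
}
```

Anchoring the minimum on the last field lets a later kernel append optional fields without breaking older callers, who keep passing the smaller `size`.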
+2
drivers/iommu/iommufd/pages.c
···
 				area, domain, iopt_area_index(area),
 				iopt_area_last_index(area));

+	if (IS_ENABLED(CONFIG_IOMMUFD_TEST))
+		WARN_ON(RB_EMPTY_NODE(&area->pages_node.rb));
 	interval_tree_remove(&area->pages_node, &pages->domains_itree);
 	iopt_area_unfill_domain(area, pages, area->storage_domain);
 	area->storage_domain = NULL;
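The new WARN_ON(RB_EMPTY_NODE(...)) check only works because iopt_area_alloc() (added in the io_pagetable.c hunk) calls RB_CLEAR_NODE() on both tree nodes. RB_EMPTY_NODE() tests whether a node's parent word points back at the node itself, so a merely zeroed node does not read as "empty". The userspace sketch below reproduces that sentinel idea with hypothetical `model_*` names. It is not the kernel's rbtree implementation.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Minimal stand-in for struct rb_node: only the parent/color word matters here. */
struct model_rb_node {
	uintptr_t parent_color;
};

/* Same idea as RB_CLEAR_NODE(): self-pointer means "not in any tree". */
static void model_rb_clear_node(struct model_rb_node *n)
{
	n->parent_color = (uintptr_t)n;
}

/* Same idea as RB_EMPTY_NODE(). */
static bool model_rb_empty_node(const struct model_rb_node *n)
{
	return n->parent_color == (uintptr_t)n;
}

struct model_iopt_area {
	struct model_rb_node node;
	struct model_rb_node pages_node;
};

/* Like iopt_area_alloc(): zeroed allocation, then mark both nodes detached. */
struct model_iopt_area *model_iopt_area_alloc(void)
{
	struct model_iopt_area *area = calloc(1, sizeof(*area));

	if (!area)
		return NULL;
	model_rb_clear_node(&area->node);
	model_rb_clear_node(&area->pages_node);
	return area;
}
```

With the sentinel in place, removing an area from domains_itree that was never inserted trips the test-only warning instead of silently corrupting the tree. The area-split fix in io_pagetable.c guards against exactly that by inserting `lhs` and `rhs` into domains_itree.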
+306 -22
drivers/iommu/iommufd/selftest.c
··· 20 20 static DECLARE_FAULT_ATTR(fail_iommufd); 21 21 static struct dentry *dbgfs_root; 22 22 static struct platform_device *selftest_iommu_dev; 23 + static const struct iommu_ops mock_ops; 24 + static struct iommu_domain_ops domain_nested_ops; 23 25 24 26 size_t iommufd_test_memory_limit = 65536; 25 27 26 28 enum { 29 + MOCK_DIRTY_TRACK = 1, 27 30 MOCK_IO_PAGE_SIZE = PAGE_SIZE / 2, 28 31 29 32 /* ··· 39 36 _MOCK_PFN_START = MOCK_PFN_MASK + 1, 40 37 MOCK_PFN_START_IOVA = _MOCK_PFN_START, 41 38 MOCK_PFN_LAST_IOVA = _MOCK_PFN_START, 39 + MOCK_PFN_DIRTY_IOVA = _MOCK_PFN_START << 1, 42 40 }; 43 41 44 42 /* ··· 90 86 } 91 87 92 88 struct mock_iommu_domain { 89 + unsigned long flags; 93 90 struct iommu_domain domain; 94 91 struct xarray pfns; 92 + }; 93 + 94 + struct mock_iommu_domain_nested { 95 + struct iommu_domain domain; 96 + struct mock_iommu_domain *parent; 97 + u32 iotlb[MOCK_NESTED_DOMAIN_IOTLB_NUM]; 95 98 }; 96 99 97 100 enum selftest_obj_type { ··· 107 96 108 97 struct mock_dev { 109 98 struct device dev; 99 + unsigned long flags; 110 100 }; 111 101 112 102 struct selftest_obj { ··· 130 118 static int mock_domain_nop_attach(struct iommu_domain *domain, 131 119 struct device *dev) 132 120 { 121 + struct mock_dev *mdev = container_of(dev, struct mock_dev, dev); 122 + 123 + if (domain->dirty_ops && (mdev->flags & MOCK_FLAGS_DEVICE_NO_DIRTY)) 124 + return -EINVAL; 125 + 133 126 return 0; 134 127 } 135 128 ··· 163 146 return info; 164 147 } 165 148 166 - static struct iommu_domain *mock_domain_alloc(unsigned int iommu_domain_type) 149 + static int mock_domain_set_dirty_tracking(struct iommu_domain *domain, 150 + bool enable) 151 + { 152 + struct mock_iommu_domain *mock = 153 + container_of(domain, struct mock_iommu_domain, domain); 154 + unsigned long flags = mock->flags; 155 + 156 + if (enable && !domain->dirty_ops) 157 + return -EINVAL; 158 + 159 + /* No change? 
*/ 160 + if (!(enable ^ !!(flags & MOCK_DIRTY_TRACK))) 161 + return 0; 162 + 163 + flags = (enable ? flags | MOCK_DIRTY_TRACK : flags & ~MOCK_DIRTY_TRACK); 164 + 165 + mock->flags = flags; 166 + return 0; 167 + } 168 + 169 + static int mock_domain_read_and_clear_dirty(struct iommu_domain *domain, 170 + unsigned long iova, size_t size, 171 + unsigned long flags, 172 + struct iommu_dirty_bitmap *dirty) 173 + { 174 + struct mock_iommu_domain *mock = 175 + container_of(domain, struct mock_iommu_domain, domain); 176 + unsigned long i, max = size / MOCK_IO_PAGE_SIZE; 177 + void *ent, *old; 178 + 179 + if (!(mock->flags & MOCK_DIRTY_TRACK) && dirty->bitmap) 180 + return -EINVAL; 181 + 182 + for (i = 0; i < max; i++) { 183 + unsigned long cur = iova + i * MOCK_IO_PAGE_SIZE; 184 + 185 + ent = xa_load(&mock->pfns, cur / MOCK_IO_PAGE_SIZE); 186 + if (ent && (xa_to_value(ent) & MOCK_PFN_DIRTY_IOVA)) { 187 + /* Clear dirty */ 188 + if (!(flags & IOMMU_DIRTY_NO_CLEAR)) { 189 + unsigned long val; 190 + 191 + val = xa_to_value(ent) & ~MOCK_PFN_DIRTY_IOVA; 192 + old = xa_store(&mock->pfns, 193 + cur / MOCK_IO_PAGE_SIZE, 194 + xa_mk_value(val), GFP_KERNEL); 195 + WARN_ON_ONCE(ent != old); 196 + } 197 + iommu_dirty_bitmap_record(dirty, cur, 198 + MOCK_IO_PAGE_SIZE); 199 + } 200 + } 201 + 202 + return 0; 203 + } 204 + 205 + const struct iommu_dirty_ops dirty_ops = { 206 + .set_dirty_tracking = mock_domain_set_dirty_tracking, 207 + .read_and_clear_dirty = mock_domain_read_and_clear_dirty, 208 + }; 209 + 210 + static struct iommu_domain *mock_domain_alloc_paging(struct device *dev) 167 211 { 168 212 struct mock_iommu_domain *mock; 169 - 170 - if (iommu_domain_type == IOMMU_DOMAIN_BLOCKED) 171 - return &mock_blocking_domain; 172 - 173 - if (iommu_domain_type != IOMMU_DOMAIN_UNMANAGED) 174 - return NULL; 175 213 176 214 mock = kzalloc(sizeof(*mock), GFP_KERNEL); 177 215 if (!mock) ··· 234 162 mock->domain.geometry.aperture_start = MOCK_APERTURE_START; 235 163 
mock->domain.geometry.aperture_end = MOCK_APERTURE_LAST; 236 164 mock->domain.pgsize_bitmap = MOCK_IO_PAGE_SIZE; 165 + mock->domain.ops = mock_ops.default_domain_ops; 166 + mock->domain.type = IOMMU_DOMAIN_UNMANAGED; 237 167 xa_init(&mock->pfns); 238 168 return &mock->domain; 169 + } 170 + 171 + static struct iommu_domain * 172 + __mock_domain_alloc_nested(struct mock_iommu_domain *mock_parent, 173 + const struct iommu_hwpt_selftest *user_cfg) 174 + { 175 + struct mock_iommu_domain_nested *mock_nested; 176 + int i; 177 + 178 + mock_nested = kzalloc(sizeof(*mock_nested), GFP_KERNEL); 179 + if (!mock_nested) 180 + return ERR_PTR(-ENOMEM); 181 + mock_nested->parent = mock_parent; 182 + mock_nested->domain.ops = &domain_nested_ops; 183 + mock_nested->domain.type = IOMMU_DOMAIN_NESTED; 184 + for (i = 0; i < MOCK_NESTED_DOMAIN_IOTLB_NUM; i++) 185 + mock_nested->iotlb[i] = user_cfg->iotlb; 186 + return &mock_nested->domain; 187 + } 188 + 189 + static struct iommu_domain *mock_domain_alloc(unsigned int iommu_domain_type) 190 + { 191 + if (iommu_domain_type == IOMMU_DOMAIN_BLOCKED) 192 + return &mock_blocking_domain; 193 + if (iommu_domain_type == IOMMU_DOMAIN_UNMANAGED) 194 + return mock_domain_alloc_paging(NULL); 195 + return NULL; 196 + } 197 + 198 + static struct iommu_domain * 199 + mock_domain_alloc_user(struct device *dev, u32 flags, 200 + struct iommu_domain *parent, 201 + const struct iommu_user_data *user_data) 202 + { 203 + struct mock_iommu_domain *mock_parent; 204 + struct iommu_hwpt_selftest user_cfg; 205 + int rc; 206 + 207 + /* must be mock_domain */ 208 + if (!parent) { 209 + struct mock_dev *mdev = container_of(dev, struct mock_dev, dev); 210 + bool has_dirty_flag = flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING; 211 + bool no_dirty_ops = mdev->flags & MOCK_FLAGS_DEVICE_NO_DIRTY; 212 + struct iommu_domain *domain; 213 + 214 + if (flags & (~(IOMMU_HWPT_ALLOC_NEST_PARENT | 215 + IOMMU_HWPT_ALLOC_DIRTY_TRACKING))) 216 + return ERR_PTR(-EOPNOTSUPP); 217 + if 
(user_data || (has_dirty_flag && no_dirty_ops)) 218 + return ERR_PTR(-EOPNOTSUPP); 219 + domain = mock_domain_alloc_paging(NULL); 220 + if (!domain) 221 + return ERR_PTR(-ENOMEM); 222 + if (has_dirty_flag) 223 + container_of(domain, struct mock_iommu_domain, domain) 224 + ->domain.dirty_ops = &dirty_ops; 225 + return domain; 226 + } 227 + 228 + /* must be mock_domain_nested */ 229 + if (user_data->type != IOMMU_HWPT_DATA_SELFTEST || flags) 230 + return ERR_PTR(-EOPNOTSUPP); 231 + if (!parent || parent->ops != mock_ops.default_domain_ops) 232 + return ERR_PTR(-EINVAL); 233 + 234 + mock_parent = container_of(parent, struct mock_iommu_domain, domain); 235 + if (!mock_parent) 236 + return ERR_PTR(-EINVAL); 237 + 238 + rc = iommu_copy_struct_from_user(&user_cfg, user_data, 239 + IOMMU_HWPT_DATA_SELFTEST, iotlb); 240 + if (rc) 241 + return ERR_PTR(rc); 242 + 243 + return __mock_domain_alloc_nested(mock_parent, &user_cfg); 239 244 } 240 245 241 246 static void mock_domain_free(struct iommu_domain *domain) ··· 392 243 393 244 for (cur = 0; cur != pgsize; cur += MOCK_IO_PAGE_SIZE) { 394 245 ent = xa_erase(&mock->pfns, iova / MOCK_IO_PAGE_SIZE); 395 - WARN_ON(!ent); 246 + 396 247 /* 397 248 * iommufd generates unmaps that must be a strict 398 249 * superset of the map's performend So every starting ··· 402 253 * passed to map_pages 403 254 */ 404 255 if (first) { 405 - WARN_ON(!(xa_to_value(ent) & 406 - MOCK_PFN_START_IOVA)); 256 + WARN_ON(ent && !(xa_to_value(ent) & 257 + MOCK_PFN_START_IOVA)); 407 258 first = false; 408 259 } 409 260 if (pgcount == 1 && cur + MOCK_IO_PAGE_SIZE == pgsize) 410 - WARN_ON(!(xa_to_value(ent) & 411 - MOCK_PFN_LAST_IOVA)); 261 + WARN_ON(ent && !(xa_to_value(ent) & 262 + MOCK_PFN_LAST_IOVA)); 412 263 413 264 iova += MOCK_IO_PAGE_SIZE; 414 265 ret += MOCK_IO_PAGE_SIZE; ··· 432 283 433 284 static bool mock_domain_capable(struct device *dev, enum iommu_cap cap) 434 285 { 435 - return cap == IOMMU_CAP_CACHE_COHERENCY; 286 + struct mock_dev *mdev = 
container_of(dev, struct mock_dev, dev); 287 + 288 + switch (cap) { 289 + case IOMMU_CAP_CACHE_COHERENCY: 290 + return true; 291 + case IOMMU_CAP_DIRTY_TRACKING: 292 + return !(mdev->flags & MOCK_FLAGS_DEVICE_NO_DIRTY); 293 + default: 294 + break; 295 + } 296 + 297 + return false; 436 298 } 437 299 438 300 static void mock_domain_set_plaform_dma_ops(struct device *dev) ··· 467 307 .pgsize_bitmap = MOCK_IO_PAGE_SIZE, 468 308 .hw_info = mock_domain_hw_info, 469 309 .domain_alloc = mock_domain_alloc, 310 + .domain_alloc_user = mock_domain_alloc_user, 470 311 .capable = mock_domain_capable, 471 312 .set_platform_dma_ops = mock_domain_set_plaform_dma_ops, 472 313 .device_group = generic_device_group, ··· 482 321 }, 483 322 }; 484 323 324 + static void mock_domain_free_nested(struct iommu_domain *domain) 325 + { 326 + struct mock_iommu_domain_nested *mock_nested = 327 + container_of(domain, struct mock_iommu_domain_nested, domain); 328 + 329 + kfree(mock_nested); 330 + } 331 + 332 + static struct iommu_domain_ops domain_nested_ops = { 333 + .free = mock_domain_free_nested, 334 + .attach_dev = mock_domain_nop_attach, 335 + }; 336 + 337 + static inline struct iommufd_hw_pagetable * 338 + __get_md_pagetable(struct iommufd_ucmd *ucmd, u32 mockpt_id, u32 hwpt_type) 339 + { 340 + struct iommufd_object *obj; 341 + 342 + obj = iommufd_get_object(ucmd->ictx, mockpt_id, hwpt_type); 343 + if (IS_ERR(obj)) 344 + return ERR_CAST(obj); 345 + return container_of(obj, struct iommufd_hw_pagetable, obj); 346 + } 347 + 485 348 static inline struct iommufd_hw_pagetable * 486 349 get_md_pagetable(struct iommufd_ucmd *ucmd, u32 mockpt_id, 487 350 struct mock_iommu_domain **mock) 488 351 { 489 352 struct iommufd_hw_pagetable *hwpt; 490 - struct iommufd_object *obj; 491 353 492 - obj = iommufd_get_object(ucmd->ictx, mockpt_id, 493 - IOMMUFD_OBJ_HW_PAGETABLE); 494 - if (IS_ERR(obj)) 495 - return ERR_CAST(obj); 496 - hwpt = container_of(obj, struct iommufd_hw_pagetable, obj); 497 - if 
(hwpt->domain->ops != mock_ops.default_domain_ops) { 354 + hwpt = __get_md_pagetable(ucmd, mockpt_id, IOMMUFD_OBJ_HWPT_PAGING); 355 + if (IS_ERR(hwpt)) 356 + return hwpt; 357 + if (hwpt->domain->type != IOMMU_DOMAIN_UNMANAGED || 358 + hwpt->domain->ops != mock_ops.default_domain_ops) { 498 359 iommufd_put_object(&hwpt->obj); 499 360 return ERR_PTR(-EINVAL); 500 361 } 501 362 *mock = container_of(hwpt->domain, struct mock_iommu_domain, domain); 363 + return hwpt; 364 + } 365 + 366 + static inline struct iommufd_hw_pagetable * 367 + get_md_pagetable_nested(struct iommufd_ucmd *ucmd, u32 mockpt_id, 368 + struct mock_iommu_domain_nested **mock_nested) 369 + { 370 + struct iommufd_hw_pagetable *hwpt; 371 + 372 + hwpt = __get_md_pagetable(ucmd, mockpt_id, IOMMUFD_OBJ_HWPT_NESTED); 373 + if (IS_ERR(hwpt)) 374 + return hwpt; 375 + if (hwpt->domain->type != IOMMU_DOMAIN_NESTED || 376 + hwpt->domain->ops != &domain_nested_ops) { 377 + iommufd_put_object(&hwpt->obj); 378 + return ERR_PTR(-EINVAL); 379 + } 380 + *mock_nested = container_of(hwpt->domain, 381 + struct mock_iommu_domain_nested, domain); 502 382 return hwpt; 503 383 } 504 384 ··· 564 362 kfree(mdev); 565 363 } 566 364 567 - static struct mock_dev *mock_dev_create(void) 365 + static struct mock_dev *mock_dev_create(unsigned long dev_flags) 568 366 { 569 367 struct mock_dev *mdev; 570 368 int rc; 369 + 370 + if (dev_flags & ~(MOCK_FLAGS_DEVICE_NO_DIRTY)) 371 + return ERR_PTR(-EINVAL); 571 372 572 373 mdev = kzalloc(sizeof(*mdev), GFP_KERNEL); 573 374 if (!mdev) 574 375 return ERR_PTR(-ENOMEM); 575 376 576 377 device_initialize(&mdev->dev); 378 + mdev->flags = dev_flags; 577 379 mdev->dev.release = mock_dev_release; 578 380 mdev->dev.bus = &iommufd_mock_bus_type.bus; 579 381 ··· 613 407 struct iommufd_device *idev; 614 408 struct selftest_obj *sobj; 615 409 u32 pt_id = cmd->id; 410 + u32 dev_flags = 0; 616 411 u32 idev_id; 617 412 int rc; 618 413 ··· 624 417 sobj->idev.ictx = ucmd->ictx; 625 418 sobj->type = 
TYPE_IDEV; 626 419 627 - sobj->idev.mock_dev = mock_dev_create(); 420 + if (cmd->op == IOMMU_TEST_OP_MOCK_DOMAIN_FLAGS) 421 + dev_flags = cmd->mock_domain_flags.dev_flags; 422 + 423 + sobj->idev.mock_dev = mock_dev_create(dev_flags); 628 424 if (IS_ERR(sobj->idev.mock_dev)) { 629 425 rc = PTR_ERR(sobj->idev.mock_dev); 630 426 goto out_sobj; ··· 1187 977 static_assert((unsigned int)MOCK_ACCESS_RW_SLOW_PATH == 1188 978 __IOMMUFD_ACCESS_RW_SLOW_PATH); 1189 979 980 + static int iommufd_test_dirty(struct iommufd_ucmd *ucmd, unsigned int mockpt_id, 981 + unsigned long iova, size_t length, 982 + unsigned long page_size, void __user *uptr, 983 + u32 flags) 984 + { 985 + unsigned long bitmap_size, i, max; 986 + struct iommu_test_cmd *cmd = ucmd->cmd; 987 + struct iommufd_hw_pagetable *hwpt; 988 + struct mock_iommu_domain *mock; 989 + int rc, count = 0; 990 + void *tmp; 991 + 992 + if (!page_size || !length || iova % page_size || length % page_size || 993 + !uptr) 994 + return -EINVAL; 995 + 996 + hwpt = get_md_pagetable(ucmd, mockpt_id, &mock); 997 + if (IS_ERR(hwpt)) 998 + return PTR_ERR(hwpt); 999 + 1000 + if (!(mock->flags & MOCK_DIRTY_TRACK)) { 1001 + rc = -EINVAL; 1002 + goto out_put; 1003 + } 1004 + 1005 + max = length / page_size; 1006 + bitmap_size = max / BITS_PER_BYTE; 1007 + 1008 + tmp = kvzalloc(bitmap_size, GFP_KERNEL_ACCOUNT); 1009 + if (!tmp) { 1010 + rc = -ENOMEM; 1011 + goto out_put; 1012 + } 1013 + 1014 + if (copy_from_user(tmp, uptr, bitmap_size)) { 1015 + rc = -EFAULT; 1016 + goto out_free; 1017 + } 1018 + 1019 + for (i = 0; i < max; i++) { 1020 + unsigned long cur = iova + i * page_size; 1021 + void *ent, *old; 1022 + 1023 + if (!test_bit(i, (unsigned long *)tmp)) 1024 + continue; 1025 + 1026 + ent = xa_load(&mock->pfns, cur / page_size); 1027 + if (ent) { 1028 + unsigned long val; 1029 + 1030 + val = xa_to_value(ent) | MOCK_PFN_DIRTY_IOVA; 1031 + old = xa_store(&mock->pfns, cur / page_size, 1032 + xa_mk_value(val), GFP_KERNEL); 1033 + WARN_ON_ONCE(ent 
!= old); 1034 + count++; 1035 + } 1036 + } 1037 + 1038 + cmd->dirty.out_nr_dirty = count; 1039 + rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd)); 1040 + out_free: 1041 + kvfree(tmp); 1042 + out_put: 1043 + iommufd_put_object(&hwpt->obj); 1044 + return rc; 1045 + } 1046 + 1190 1047 void iommufd_selftest_destroy(struct iommufd_object *obj) 1191 1048 { 1192 1049 struct selftest_obj *sobj = container_of(obj, struct selftest_obj, obj); ··· 1277 1000 cmd->add_reserved.start, 1278 1001 cmd->add_reserved.length); 1279 1002 case IOMMU_TEST_OP_MOCK_DOMAIN: 1003 + case IOMMU_TEST_OP_MOCK_DOMAIN_FLAGS: 1280 1004 return iommufd_test_mock_domain(ucmd, cmd); 1281 1005 case IOMMU_TEST_OP_MOCK_DOMAIN_REPLACE: 1282 1006 return iommufd_test_mock_domain_replace( ··· 1319 1041 return -EINVAL; 1320 1042 iommufd_test_memory_limit = cmd->memory_limit.limit; 1321 1043 return 0; 1044 + case IOMMU_TEST_OP_DIRTY: 1045 + return iommufd_test_dirty(ucmd, cmd->id, cmd->dirty.iova, 1046 + cmd->dirty.length, 1047 + cmd->dirty.page_size, 1048 + u64_to_user_ptr(cmd->dirty.uptr), 1049 + cmd->dirty.flags); 1322 1050 default: 1323 1051 return -EOPNOTSUPP; 1324 1052 }
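The selftest encodes dirtiness as an extra tag bit (MOCK_PFN_DIRTY_IOVA) in each xarray value. IOMMU_TEST_OP_DIRTY sets the tag on pages selected by a user bitmap. mock_domain_read_and_clear_dirty() then reports tagged pages and clears the tag unless IOMMU_DIRTY_NO_CLEAR is passed. Below is a userspace model of that round trip. A fixed array replaces the xarray, one bitmap bit maps to one mock page, and the `MODEL_*` constants are hypothetical stand-ins for the kernel values.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NPAGES    16
#define MODEL_DIRTY_TAG (1UL << 20)  /* stand-in for MOCK_PFN_DIRTY_IOVA */
#define MODEL_NO_CLEAR  1UL          /* stand-in for IOMMU_DIRTY_NO_CLEAR */

struct model_mock_domain {
	bool tracking;                     /* MOCK_DIRTY_TRACK in mock->flags */
	unsigned long pfns[MODEL_NPAGES];  /* 0 == unmapped (no xarray entry) */
};

static bool model_test_bit(const uint8_t *bm, size_t i)
{
	return bm[i / 8] & (1u << (i % 8));
}

/* Like IOMMU_TEST_OP_DIRTY: tag each mapped page whose bit is set; return count. */
int model_mark_dirty(struct model_mock_domain *d, const uint8_t *ubitmap, size_t nbits)
{
	int count = 0;

	if (!d->tracking)
		return -EINVAL;
	for (size_t i = 0; i < nbits && i < MODEL_NPAGES; i++) {
		if (!model_test_bit(ubitmap, i) || !d->pfns[i])
			continue;                  /* not requested, or unmapped */
		d->pfns[i] |= MODEL_DIRTY_TAG;
		count++;
	}
	return count;
}

/* Like mock_domain_read_and_clear_dirty(): report tagged pages into out. */
int model_read_and_clear(struct model_mock_domain *d, uint8_t *out, unsigned long flags)
{
	int n = 0;

	if (!d->tracking)
		return -EINVAL;
	for (size_t i = 0; i < MODEL_NPAGES; i++) {
		if (!(d->pfns[i] & MODEL_DIRTY_TAG))
			continue;
		if (!(flags & MODEL_NO_CLEAR))
			d->pfns[i] &= ~MODEL_DIRTY_TAG;  /* clear unless NO_CLEAR */
		out[i / 8] |= (uint8_t)(1u << (i % 8));
		n++;
	}
	return n;
}
```

A read with the no-clear flag leaves the tag in place, so the next read still reports the page. A clearing read then resets it, and a third read sees nothing.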
+3 -3
drivers/iommu/iommufd/vfio_compat.c
···

 static int iommufd_vfio_cc_iommu(struct iommufd_ctx *ictx)
 {
-	struct iommufd_hw_pagetable *hwpt;
+	struct iommufd_hwpt_paging *hwpt_paging;
 	struct iommufd_ioas *ioas;
 	int rc = 1;

···
 		return PTR_ERR(ioas);

 	mutex_lock(&ioas->mutex);
-	list_for_each_entry(hwpt, &ioas->hwpt_list, hwpt_item) {
-		if (!hwpt->enforce_cache_coherency) {
+	list_for_each_entry(hwpt_paging, &ioas->hwpt_list, hwpt_item) {
+		if (!hwpt_paging->enforce_cache_coherency) {
 			rc = 0;
 			break;
 		}
+1 -2
drivers/vfio/Makefile
···
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_VFIO) += vfio.o
 
-vfio-y += vfio_main.o \
-	  iova_bitmap.o
+vfio-y += vfio_main.o
 vfio-$(CONFIG_VFIO_DEVICE_CDEV) += device_cdev.o
 vfio-$(CONFIG_VFIO_GROUP) += group.o
 vfio-$(CONFIG_IOMMUFD) += iommufd.o
+4 -1
drivers/vfio/iova_bitmap.c → drivers/iommu/iommufd/iova_bitmap.c
···
 	iova_bitmap_free(bitmap);
 	return ERR_PTR(rc);
 }
+EXPORT_SYMBOL_NS_GPL(iova_bitmap_alloc, IOMMUFD);
 
 /**
  * iova_bitmap_free() - Frees an IOVA bitmap object
···
 
 	kfree(bitmap);
 }
+EXPORT_SYMBOL_NS_GPL(iova_bitmap_free, IOMMUFD);
 
 /*
  * Returns the remaining bitmap indexes from mapped_total_index to process for
···
 
 	return ret;
 }
+EXPORT_SYMBOL_NS_GPL(iova_bitmap_for_each, IOMMUFD);
 
 /**
  * iova_bitmap_set() - Records an IOVA range in bitmap
···
 		cur_bit += nbits;
 	} while (cur_bit <= last_bit);
 }
-EXPORT_SYMBOL_GPL(iova_bitmap_set);
+EXPORT_SYMBOL_NS_GPL(iova_bitmap_set, IOMMUFD);
+1
drivers/vfio/pci/mlx5/Kconfig
···
 	tristate "VFIO support for MLX5 PCI devices"
 	depends on MLX5_CORE
 	select VFIO_PCI_CORE
+	select IOMMUFD_DRIVER
 	help
 	  This provides migration support for MLX5 devices using the VFIO
 	  framework.
+1
drivers/vfio/pci/mlx5/main.c
···
 
 module_pci_driver(mlx5vf_pci_driver);
 
+MODULE_IMPORT_NS(IOMMUFD);
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Max Gurtovoy <mgurtovoy@nvidia.com>");
 MODULE_AUTHOR("Yishai Hadas <yishaih@nvidia.com>");
+1
drivers/vfio/pci/pds/Kconfig
···
 	tristate "VFIO support for PDS PCI devices"
 	depends on PDS_CORE && PCI_IOV
 	select VFIO_PCI_CORE
+	select IOMMUFD_DRIVER
 	help
 	  This provides generic PCI support for PDS devices using the VFIO
 	  framework.
+1
drivers/vfio/pci/pds/pci_drv.c
···
 
 module_pci_driver(pds_vfio_pci_driver);
 
+MODULE_IMPORT_NS(IOMMUFD);
 MODULE_DESCRIPTION(PDS_VFIO_DRV_DESCRIPTION);
 MODULE_AUTHOR("Brett Creeley <brett.creeley@amd.com>");
 MODULE_LICENSE("GPL");
+1
drivers/vfio/vfio_main.c
···
 module_init(vfio_init);
 module_exit(vfio_cleanup);
 
+MODULE_IMPORT_NS(IOMMUFD);
 MODULE_VERSION(DRIVER_VERSION);
 MODULE_LICENSE("GPL v2");
 MODULE_AUTHOR(DRIVER_AUTHOR);
+4
include/linux/io-pgtable.h
···
 			      struct iommu_iotlb_gather *gather);
 	phys_addr_t (*iova_to_phys)(struct io_pgtable_ops *ops,
 				    unsigned long iova);
+	int (*read_and_clear_dirty)(struct io_pgtable_ops *ops,
+				    unsigned long iova, size_t size,
+				    unsigned long flags,
+				    struct iommu_dirty_bitmap *dirty);
 };
 
 /**
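The new read_and_clear_dirty hook walks the IO page table over [iova, iova + size) and reports which pages have their dirty bit set. The toy model below is only a sketch of that idea, assuming a single-level table held in a plain uint64_t array with made-up present/dirty bit positions (each real page-table format defines its own). The TOY_* names and toy_read_and_clear_dirty() are hypothetical; the real hook reports through struct iommu_dirty_bitmap rather than a raw array, and a real driver must also flush the IOTLB after clearing dirty bits, which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Made-up PTE layout for the toy table: bit 0 present, bit 1 dirty. */
#define TOY_PTE_PRESENT (1ULL << 0)
#define TOY_PTE_DIRTY (1ULL << 1)
/* Same value as IOMMU_DIRTY_NO_CLEAR in include/linux/iommu.h. */
#define TOY_DIRTY_NO_CLEAR (1UL << 0)

/*
 * Walk the toy single-level table over [iova, iova + size) and set bit i of
 * out_bitmap when page i of the range is present and dirty. The dirty bit
 * is cleared as it is read unless TOY_DIRTY_NO_CLEAR is passed.
 * Returns the number of dirty pages reported.
 */
static size_t toy_read_and_clear_dirty(uint64_t *ptes, uint64_t page_size,
				       uint64_t iova, uint64_t size,
				       unsigned long flags,
				       uint64_t *out_bitmap)
{
	uint64_t first, npages;
	size_t found = 0;

	assert(page_size && iova % page_size == 0);
	first = iova / page_size;
	npages = size / page_size;
	for (uint64_t i = 0; i < npages; i++) {
		uint64_t *pte = &ptes[first + i];

		if (!(*pte & TOY_PTE_PRESENT) || !(*pte & TOY_PTE_DIRTY))
			continue;
		out_bitmap[i / 64] |= 1ULL << (i % 64);
		found++;
		if (!(flags & TOY_DIRTY_NO_CLEAR))
			*pte &= ~TOY_PTE_DIRTY;
	}
	return found;
}
```

Calling it first with TOY_DIRTY_NO_CLEAR and then without reports the same pages both times; only the second call resets them, so a third call finds nothing.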
+145 -1
include/linux/iommu.h
···
 #include <linux/errno.h>
 #include <linux/err.h>
 #include <linux/of.h>
+#include <linux/iova_bitmap.h>
 #include <uapi/linux/iommu.h>
 
 #define IOMMU_READ	(1 << 0)
···
 struct device;
 struct iommu_domain;
 struct iommu_domain_ops;
+struct iommu_dirty_ops;
 struct notifier_block;
 struct iommu_sva;
 struct iommu_fault_event;
···
 
 #define __IOMMU_DOMAIN_SVA	(1U << 4)  /* Shared process address space */
 
+#define __IOMMU_DOMAIN_NESTED	(1U << 6)  /* User-managed address space nested
+					      on a stage-2 translation */
+
 #define IOMMU_DOMAIN_ALLOC_FLAGS ~__IOMMU_DOMAIN_DMA_FQ
 /*
  * This are the possible domain-types
···
 				 __IOMMU_DOMAIN_DMA_API |	\
 				 __IOMMU_DOMAIN_DMA_FQ)
 #define IOMMU_DOMAIN_SVA	(__IOMMU_DOMAIN_SVA)
+#define IOMMU_DOMAIN_NESTED	(__IOMMU_DOMAIN_NESTED)
 
 struct iommu_domain {
 	unsigned type;
 	const struct iommu_domain_ops *ops;
+	const struct iommu_dirty_ops *dirty_ops;
+
 	unsigned long pgsize_bitmap;	/* Bitmap of page sizes in use */
 	struct iommu_domain_geometry geometry;
 	struct iommu_dma_cookie *iova_cookie;
···
 	 * usefully support the non-strict DMA flush queue.
 	 */
 	IOMMU_CAP_DEFERRED_FLUSH,
+	IOMMU_CAP_DIRTY_TRACKING,	/* IOMMU supports dirty tracking */
 };
 
 /* These are the possible reserved region types */
···
 };
 
 /**
+ * struct iommu_dirty_bitmap - Dirty IOVA bitmap state
+ * @bitmap: IOVA bitmap
+ * @gather: Range information for a pending IOTLB flush
+ */
+struct iommu_dirty_bitmap {
+	struct iova_bitmap *bitmap;
+	struct iommu_iotlb_gather *gather;
+};
+
+/* Read but do not clear any dirty bits */
+#define IOMMU_DIRTY_NO_CLEAR (1 << 0)
+
+/**
+ * struct iommu_dirty_ops - domain specific dirty tracking operations
+ * @set_dirty_tracking: Enable or Disable dirty tracking on the iommu domain
+ * @read_and_clear_dirty: Walk IOMMU page tables for dirtied PTEs marshalled
+ *                        into a bitmap, with a bit represented as a page.
+ *                        Reads the dirty PTE bits and clears it from IO
+ *                        pagetables.
+ */
+struct iommu_dirty_ops {
+	int (*set_dirty_tracking)(struct iommu_domain *domain, bool enabled);
+	int (*read_and_clear_dirty)(struct iommu_domain *domain,
+				    unsigned long iova, size_t size,
+				    unsigned long flags,
+				    struct iommu_dirty_bitmap *dirty);
+};
+
+/**
+ * struct iommu_user_data - iommu driver specific user space data info
+ * @type: The data type of the user buffer
+ * @uptr: Pointer to the user buffer for copy_from_user()
+ * @len: The length of the user buffer in bytes
+ *
+ * A user space data is an uAPI that is defined in include/uapi/linux/iommufd.h
+ * @type, @uptr and @len should be just copied from an iommufd core uAPI struct.
+ */
+struct iommu_user_data {
+	unsigned int type;
+	void __user *uptr;
+	size_t len;
+};
+
+/**
+ * __iommu_copy_struct_from_user - Copy iommu driver specific user space data
+ * @dst_data: Pointer to an iommu driver specific user data that is defined in
+ *            include/uapi/linux/iommufd.h
+ * @src_data: Pointer to a struct iommu_user_data for user space data info
+ * @data_type: The data type of the @dst_data. Must match with @src_data.type
+ * @data_len: Length of current user data structure, i.e. sizeof(struct _dst)
+ * @min_len: Initial length of user data structure for backward compatibility.
+ *           This should be offsetofend using the last member in the user data
+ *           struct that was initially added to include/uapi/linux/iommufd.h
+ */
+static inline int __iommu_copy_struct_from_user(
+	void *dst_data, const struct iommu_user_data *src_data,
+	unsigned int data_type, size_t data_len, size_t min_len)
+{
+	if (src_data->type != data_type)
+		return -EINVAL;
+	if (WARN_ON(!dst_data || !src_data))
+		return -EINVAL;
+	if (src_data->len < min_len || data_len < src_data->len)
+		return -EINVAL;
+	return copy_struct_from_user(dst_data, data_len, src_data->uptr,
+				     src_data->len);
+}
+
+/**
+ * iommu_copy_struct_from_user - Copy iommu driver specific user space data
+ * @kdst: Pointer to an iommu driver specific user data that is defined in
+ *        include/uapi/linux/iommufd.h
+ * @user_data: Pointer to a struct iommu_user_data for user space data info
+ * @data_type: The data type of the @kdst. Must match with @user_data->type
+ * @min_last: The last memember of the data structure @kdst points in the
+ *            initial version.
+ * Return 0 for success, otherwise -error.
+ */
+#define iommu_copy_struct_from_user(kdst, user_data, data_type, min_last) \
+	__iommu_copy_struct_from_user(kdst, user_data, data_type,         \
+				      sizeof(*kdst),                      \
+				      offsetofend(typeof(*kdst), min_last))
+
+/**
  * struct iommu_ops - iommu ops and capabilities
  * @capable: check capability
  * @hw_info: report iommu hardware information. The data buffer returned by this
  *           op is allocated in the iommu driver and freed by the caller after
  *           use. The information type is one of enum iommu_hw_info_type defined
  *           in include/uapi/linux/iommufd.h.
- * @domain_alloc: allocate iommu domain
+ * @domain_alloc: allocate and return an iommu domain if success. Otherwise
+ *                NULL is returned. The domain is not fully initialized until
+ *                the caller iommu_domain_alloc() returns.
+ * @domain_alloc_user: Allocate an iommu domain corresponding to the input
+ *                     parameters as defined in include/uapi/linux/iommufd.h.
+ *                     Unlike @domain_alloc, it is called only by IOMMUFD and
+ *                     must fully initialize the new domain before return.
+ *                     Upon success, if the @user_data is valid and the @parent
+ *                     points to a kernel-managed domain, the new domain must be
+ *                     IOMMU_DOMAIN_NESTED type; otherwise, the @parent must be
+ *                     NULL while the @user_data can be optionally provided, the
+ *                     new domain must support __IOMMU_DOMAIN_PAGING.
+ *                     Upon failure, ERR_PTR must be returned.
  * @probe_device: Add device to iommu driver handling
  * @release_device: Remove device from iommu driver handling
  * @probe_finalize: Do final setup work after the device is added to an IOMMU
···
 
 	/* Domain allocation and freeing by the iommu driver */
 	struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type);
+	struct iommu_domain *(*domain_alloc_user)(
+		struct device *dev, u32 flags, struct iommu_domain *parent,
+		const struct iommu_user_data *user_data);
 
 	struct iommu_device *(*probe_device)(struct device *dev);
 	void (*release_device)(struct device *dev);
···
 	return gather && gather->queued;
 }
 
+static inline void iommu_dirty_bitmap_init(struct iommu_dirty_bitmap *dirty,
+					   struct iova_bitmap *bitmap,
+					   struct iommu_iotlb_gather *gather)
+{
+	if (gather)
+		iommu_iotlb_gather_init(gather);
+
+	dirty->bitmap = bitmap;
+	dirty->gather = gather;
+}
+
+static inline void iommu_dirty_bitmap_record(struct iommu_dirty_bitmap *dirty,
+					     unsigned long iova,
+					     unsigned long length)
+{
+	if (dirty->bitmap)
+		iova_bitmap_set(dirty->bitmap, iova, length);
+
+	if (dirty->gather)
+		iommu_iotlb_gather_add_range(dirty->gather, iova, length);
+}
+
 /* PCI device grouping function */
 extern struct iommu_group *pci_device_group(struct device *dev);
 /* Generic device grouping function */
···
 struct iommu_device {};
 struct iommu_fault_param {};
 struct iommu_iotlb_gather {};
+struct iommu_dirty_bitmap {};
+struct iommu_dirty_ops {};
 
 static inline bool iommu_present(const struct bus_type *bus)
 {
···
 static inline bool iommu_iotlb_gather_queued(struct iommu_iotlb_gather *gather)
 {
 	return false;
+}
+
+static inline void iommu_dirty_bitmap_init(struct iommu_dirty_bitmap *dirty,
+					   struct iova_bitmap *bitmap,
+					   struct iommu_iotlb_gather *gather)
+{
+}
+
+static inline void iommu_dirty_bitmap_record(struct iommu_dirty_bitmap *dirty,
+					     unsigned long iova,
+					     unsigned long length)
+{
 }
 
 static inline void iommu_device_unregister(struct iommu_device *iommu)
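The length rules in __iommu_copy_struct_from_user() are what let a driver struct grow while older userspace keeps working: the user buffer must be at least min_len bytes (offsetofend of the last original member) and at most data_len bytes (the current sizeof). Within that window copy_struct_from_user() copies the user bytes and zero-fills the rest of the kernel struct. The sketch below mimics that in userspace with memcpy()/memset(), assuming a plain pointer in place of the __user buffer; sketch_user_data, sketch_drv_data and sketch_copy_struct() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Same idea as the kernel's offsetofend() helper. */
#define SKETCH_OFFSETOFEND(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* Hypothetical stand-in for struct iommu_user_data (no __user pointer). */
struct sketch_user_data {
	unsigned int type;
	const void *uptr;
	size_t len;
};

/* A made-up driver struct: version 1 ended at @b, version 2 appended @c. */
struct sketch_drv_data {
	unsigned int a;
	unsigned int b;
	unsigned long long c;
};

/*
 * Mirrors the type and length checks of __iommu_copy_struct_from_user().
 * Because src->len can never exceed data_len here, the copy reduces to
 * "copy what the user gave, zero the remainder".
 */
static int sketch_copy_struct(void *dst, const struct sketch_user_data *src,
			      unsigned int data_type, size_t data_len,
			      size_t min_len)
{
	assert(dst && src);
	if (src->type != data_type)
		return -EINVAL;
	if (src->len < min_len || data_len < src->len)
		return -EINVAL;
	memcpy(dst, src->uptr, src->len);
	memset((char *)dst + src->len, 0, data_len - src->len);
	return 0;
}
```

A version-1 caller that passes only a and b therefore sees c read as zero in the kernel copy, while a buffer shorter than the original struct is rejected with -EINVAL.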
+26
include/linux/iova_bitmap.h
···
 #define _IOVA_BITMAP_H_
 
 #include <linux/types.h>
+#include <linux/errno.h>
 
 struct iova_bitmap;
 
···
 				 unsigned long iova, size_t length,
 				 void *opaque);
 
+#if IS_ENABLED(CONFIG_IOMMUFD_DRIVER)
 struct iova_bitmap *iova_bitmap_alloc(unsigned long iova, size_t length,
 				      unsigned long page_size,
 				      u64 __user *data);
···
 			 iova_bitmap_fn_t fn);
 void iova_bitmap_set(struct iova_bitmap *bitmap,
 		     unsigned long iova, size_t length);
+#else
+static inline struct iova_bitmap *iova_bitmap_alloc(unsigned long iova,
+						    size_t length,
+						    unsigned long page_size,
+						    u64 __user *data)
+{
+	return NULL;
+}
+
+static inline void iova_bitmap_free(struct iova_bitmap *bitmap)
+{
+}
+
+static inline int iova_bitmap_for_each(struct iova_bitmap *bitmap, void *opaque,
+				       iova_bitmap_fn_t fn)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline void iova_bitmap_set(struct iova_bitmap *bitmap,
+				   unsigned long iova, size_t length)
+{
+}
+#endif
 
 #endif
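With CONFIG_IOMMUFD_DRIVER enabled, iova_bitmap_set() records an IOVA range as one bit per page, counted from the bitmap's base IOVA; with it disabled, the stubs make recording a no-op. The bit arithmetic can be illustrated with a flat array. This is a simplified sketch, assuming the whole bitmap is addressable at once (the real implementation works on pinned windows of the user bitmap), and sketch_bitmap_set() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Flat-array model of recording [iova, iova + length) in a dirty bitmap:
 * bit n covers [base + n * page_size, base + (n + 1) * page_size), so every
 * page the range touches, even partially, gets its bit set.
 */
static void sketch_bitmap_set(uint64_t *bits, uint64_t base,
			      uint64_t page_size, uint64_t iova,
			      uint64_t length)
{
	uint64_t first, last;

	assert(page_size && length && iova >= base);
	first = (iova - base) / page_size;
	last = (iova + length - 1 - base) / page_size;
	for (uint64_t n = first; n <= last; n++)
		bits[n / 64] |= 1ULL << (n % 64);
}
```

A two-byte range straddling a page boundary sets two bits, and a range starting at page 63 spills into the second 64-bit word.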
+176 -4
include/uapi/linux/iommufd.h
···
 	IOMMUFD_CMD_VFIO_IOAS,
 	IOMMUFD_CMD_HWPT_ALLOC,
 	IOMMUFD_CMD_GET_HW_INFO,
+	IOMMUFD_CMD_HWPT_SET_DIRTY_TRACKING,
+	IOMMUFD_CMD_HWPT_GET_DIRTY_BITMAP,
 };
 
 /**
···
 #define IOMMU_VFIO_IOAS _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VFIO_IOAS)
 
 /**
+ * enum iommufd_hwpt_alloc_flags - Flags for HWPT allocation
+ * @IOMMU_HWPT_ALLOC_NEST_PARENT: If set, allocate a HWPT that can serve as
+ *                                the parent HWPT in a nesting configuration.
+ * @IOMMU_HWPT_ALLOC_DIRTY_TRACKING: Dirty tracking support for device IOMMU is
+ *                                   enforced on device attachment
+ */
+enum iommufd_hwpt_alloc_flags {
+	IOMMU_HWPT_ALLOC_NEST_PARENT = 1 << 0,
+	IOMMU_HWPT_ALLOC_DIRTY_TRACKING = 1 << 1,
+};
+
+/**
+ * enum iommu_hwpt_vtd_s1_flags - Intel VT-d stage-1 page table
+ *                                entry attributes
+ * @IOMMU_VTD_S1_SRE: Supervisor request
+ * @IOMMU_VTD_S1_EAFE: Extended access enable
+ * @IOMMU_VTD_S1_WPE: Write protect enable
+ */
+enum iommu_hwpt_vtd_s1_flags {
+	IOMMU_VTD_S1_SRE = 1 << 0,
+	IOMMU_VTD_S1_EAFE = 1 << 1,
+	IOMMU_VTD_S1_WPE = 1 << 2,
+};
+
+/**
+ * struct iommu_hwpt_vtd_s1 - Intel VT-d stage-1 page table
+ *                            info (IOMMU_HWPT_DATA_VTD_S1)
+ * @flags: Combination of enum iommu_hwpt_vtd_s1_flags
+ * @pgtbl_addr: The base address of the stage-1 page table.
+ * @addr_width: The address width of the stage-1 page table
+ * @__reserved: Must be 0
+ */
+struct iommu_hwpt_vtd_s1 {
+	__aligned_u64 flags;
+	__aligned_u64 pgtbl_addr;
+	__u32 addr_width;
+	__u32 __reserved;
+};
+
+/**
+ * enum iommu_hwpt_data_type - IOMMU HWPT Data Type
+ * @IOMMU_HWPT_DATA_NONE: no data
+ * @IOMMU_HWPT_DATA_VTD_S1: Intel VT-d stage-1 page table
+ */
+enum iommu_hwpt_data_type {
+	IOMMU_HWPT_DATA_NONE,
+	IOMMU_HWPT_DATA_VTD_S1,
+};
+
+/**
  * struct iommu_hwpt_alloc - ioctl(IOMMU_HWPT_ALLOC)
  * @size: sizeof(struct iommu_hwpt_alloc)
- * @flags: Must be 0
+ * @flags: Combination of enum iommufd_hwpt_alloc_flags
  * @dev_id: The device to allocate this HWPT for
- * @pt_id: The IOAS to connect this HWPT to
+ * @pt_id: The IOAS or HWPT to connect this HWPT to
  * @out_hwpt_id: The ID of the new HWPT
  * @__reserved: Must be 0
+ * @data_type: One of enum iommu_hwpt_data_type
+ * @data_len: Length of the type specific data
+ * @data_uptr: User pointer to the type specific data
  *
  * Explicitly allocate a hardware page table object. This is the same object
  * type that is returned by iommufd_device_attach() and represents the
  * underlying iommu driver's iommu_domain kernel object.
  *
- * A HWPT will be created with the IOVA mappings from the given IOAS.
+ * A kernel-managed HWPT will be created with the mappings from the given
+ * IOAS via the @pt_id. The @data_type for this allocation must be set to
+ * IOMMU_HWPT_DATA_NONE. The HWPT can be allocated as a parent HWPT for a
+ * nesting configuration by passing IOMMU_HWPT_ALLOC_NEST_PARENT via @flags.
+ *
+ * A user-managed nested HWPT will be created from a given parent HWPT via
+ * @pt_id, in which the parent HWPT must be allocated previously via the
+ * same ioctl from a given IOAS (@pt_id). In this case, the @data_type
+ * must be set to a pre-defined type corresponding to an I/O page table
+ * type supported by the underlying IOMMU hardware.
+ *
+ * If the @data_type is set to IOMMU_HWPT_DATA_NONE, @data_len and
+ * @data_uptr should be zero. Otherwise, both @data_len and @data_uptr
+ * must be given.
  */
 struct iommu_hwpt_alloc {
 	__u32 size;
···
 	__u32 pt_id;
 	__u32 out_hwpt_id;
 	__u32 __reserved;
+	__u32 data_type;
+	__u32 data_len;
+	__aligned_u64 data_uptr;
 };
 #define IOMMU_HWPT_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_ALLOC)
 
 /**
+ * enum iommu_hw_info_vtd_flags - Flags for VT-d hw_info
+ * @IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17: If set, disallow read-only mappings
+ *                                         on a nested_parent domain.
+ *                                         https://www.intel.com/content/www/us/en/content-details/772415/content-details.html
+ */
+enum iommu_hw_info_vtd_flags {
+	IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17 = 1 << 0,
+};
+
+/**
  * struct iommu_hw_info_vtd - Intel VT-d hardware information
  *
- * @flags: Must be 0
+ * @flags: Combination of enum iommu_hw_info_vtd_flags
  * @__reserved: Must be 0
  *
  * @cap_reg: Value of Intel VT-d capability register defined in VT-d spec
···
 };
 
 /**
+ * enum iommufd_hw_capabilities
+ * @IOMMU_HW_CAP_DIRTY_TRACKING: IOMMU hardware support for dirty tracking
+ *                               If available, it means the following APIs
+ *                               are supported:
+ *
+ *                                   IOMMU_HWPT_GET_DIRTY_BITMAP
+ *                                   IOMMU_HWPT_SET_DIRTY_TRACKING
+ *
+ */
+enum iommufd_hw_capabilities {
+	IOMMU_HW_CAP_DIRTY_TRACKING = 1 << 0,
+};
+
+/**
  * struct iommu_hw_info - ioctl(IOMMU_GET_HW_INFO)
  * @size: sizeof(struct iommu_hw_info)
  * @flags: Must be 0
···
  *             the iommu type specific hardware information data
  * @out_data_type: Output the iommu hardware info type as defined in the enum
  *                 iommu_hw_info_type.
+ * @out_capabilities: Output the generic iommu capability info type as defined
+ *                    in the enum iommu_hw_capabilities.
  * @__reserved: Must be 0
  *
  * Query an iommu type specific hardware information data from an iommu behind
···
 	__aligned_u64 data_uptr;
 	__u32 out_data_type;
 	__u32 __reserved;
+	__aligned_u64 out_capabilities;
 };
 #define IOMMU_GET_HW_INFO _IO(IOMMUFD_TYPE, IOMMUFD_CMD_GET_HW_INFO)
+
+/*
+ * enum iommufd_hwpt_set_dirty_tracking_flags - Flags for steering dirty
+ *                                              tracking
+ * @IOMMU_HWPT_DIRTY_TRACKING_ENABLE: Enable dirty tracking
+ */
+enum iommufd_hwpt_set_dirty_tracking_flags {
+	IOMMU_HWPT_DIRTY_TRACKING_ENABLE = 1,
+};
+
+/**
+ * struct iommu_hwpt_set_dirty_tracking - ioctl(IOMMU_HWPT_SET_DIRTY_TRACKING)
+ * @size: sizeof(struct iommu_hwpt_set_dirty_tracking)
+ * @flags: Combination of enum iommufd_hwpt_set_dirty_tracking_flags
+ * @hwpt_id: HW pagetable ID that represents the IOMMU domain
+ * @__reserved: Must be 0
+ *
+ * Toggle dirty tracking on an HW pagetable.
+ */
+struct iommu_hwpt_set_dirty_tracking {
+	__u32 size;
+	__u32 flags;
+	__u32 hwpt_id;
+	__u32 __reserved;
+};
+#define IOMMU_HWPT_SET_DIRTY_TRACKING _IO(IOMMUFD_TYPE, \
+					  IOMMUFD_CMD_HWPT_SET_DIRTY_TRACKING)
+
+/**
+ * enum iommufd_hwpt_get_dirty_bitmap_flags - Flags for getting dirty bits
+ * @IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR: Just read the PTEs without clearing
+ *                                        any dirty bits metadata. This flag
+ *                                        can be passed in the expectation
+ *                                        where the next operation is an unmap
+ *                                        of the same IOVA range.
+ *
+ */
+enum iommufd_hwpt_get_dirty_bitmap_flags {
+	IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR = 1,
+};
+
+/**
+ * struct iommu_hwpt_get_dirty_bitmap - ioctl(IOMMU_HWPT_GET_DIRTY_BITMAP)
+ * @size: sizeof(struct iommu_hwpt_get_dirty_bitmap)
+ * @hwpt_id: HW pagetable ID that represents the IOMMU domain
+ * @flags: Combination of enum iommufd_hwpt_get_dirty_bitmap_flags
+ * @__reserved: Must be 0
+ * @iova: base IOVA of the bitmap first bit
+ * @length: IOVA range size
+ * @page_size: page size granularity of each bit in the bitmap
+ * @data: bitmap where to set the dirty bits. The bitmap bits each
+ *        represent a page_size which you deviate from an arbitrary iova.
+ *
+ * Checking a given IOVA is dirty:
+ *
+ *  data[(iova / page_size) / 64] & (1ULL << ((iova / page_size) % 64))
+ *
+ * Walk the IOMMU pagetables for a given IOVA range to return a bitmap
+ * with the dirty IOVAs. In doing so it will also by default clear any
+ * dirty bit metadata set in the IOPTE.
+ */
+struct iommu_hwpt_get_dirty_bitmap {
+	__u32 size;
+	__u32 hwpt_id;
+	__u32 flags;
+	__u32 __reserved;
+	__aligned_u64 iova;
+	__aligned_u64 length;
+	__aligned_u64 page_size;
+	__aligned_u64 data;
+};
+#define IOMMU_HWPT_GET_DIRTY_BITMAP _IO(IOMMUFD_TYPE, \
+					IOMMUFD_CMD_HWPT_GET_DIRTY_BITMAP)
+
 #endif
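The comment on struct iommu_hwpt_get_dirty_bitmap gives the test for a single IOVA. Because @iova is documented as the IOVA of the bitmap's first bit, the userspace sketch below assumes the page index is taken relative to that base; it agrees with the comment's formula when the base is 0. dirty_bitmap_bytes() and iova_is_dirty() are hypothetical helpers, and rounding the buffer up to whole 64-bit words is an assumption made because the formula addresses @data as an array of 64-bit words.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Bytes to allocate for @data: one bit per page_size chunk of @length,
 * rounded up to whole 64-bit words (a conservative sizing assumption).
 */
static uint64_t dirty_bitmap_bytes(uint64_t length, uint64_t page_size)
{
	uint64_t pages = length / page_size;

	return (pages + 63) / 64 * sizeof(uint64_t);
}

/*
 * The uAPI comment's check, with the page index taken relative to the
 * @iova passed to IOMMU_HWPT_GET_DIRTY_BITMAP (the bitmap's first bit).
 */
static bool iova_is_dirty(const uint64_t *data, uint64_t base_iova,
			  uint64_t page_size, uint64_t iova)
{
	uint64_t bit = (iova - base_iova) / page_size;

	return data[bit / 64] & (1ULL << (bit % 64));
}
```

For a 128M range at 4K granularity this sizes the bitmap at 4096 bytes, and any IOVA inside a dirty page (not only its first byte) reports as dirty.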
+362 -17
tools/testing/selftests/iommu/iommufd.c
···
 
 TEST_F(iommufd, cmd_length)
 {
-#define TEST_LENGTH(_struct, _ioctl)                                     \
+#define TEST_LENGTH(_struct, _ioctl, _last)                              \
 	{                                                                \
+		size_t min_size = offsetofend(struct _struct, _last);    \
 		struct {                                                 \
 			struct _struct cmd;                              \
 			uint8_t extra;                                   \
-		} cmd = { .cmd = { .size = sizeof(struct _struct) - 1 }, \
+		} cmd = { .cmd = { .size = min_size - 1 },               \
 			  .extra = UINT8_MAX };                          \
 		int old_errno;                                           \
 		int rc;                                                  \
···
 		}                                                        \
 	}
 
-	TEST_LENGTH(iommu_destroy, IOMMU_DESTROY);
-	TEST_LENGTH(iommu_hw_info, IOMMU_GET_HW_INFO);
-	TEST_LENGTH(iommu_ioas_alloc, IOMMU_IOAS_ALLOC);
-	TEST_LENGTH(iommu_ioas_iova_ranges, IOMMU_IOAS_IOVA_RANGES);
-	TEST_LENGTH(iommu_ioas_allow_iovas, IOMMU_IOAS_ALLOW_IOVAS);
-	TEST_LENGTH(iommu_ioas_map, IOMMU_IOAS_MAP);
-	TEST_LENGTH(iommu_ioas_copy, IOMMU_IOAS_COPY);
-	TEST_LENGTH(iommu_ioas_unmap, IOMMU_IOAS_UNMAP);
-	TEST_LENGTH(iommu_option, IOMMU_OPTION);
-	TEST_LENGTH(iommu_vfio_ioas, IOMMU_VFIO_IOAS);
+	TEST_LENGTH(iommu_destroy, IOMMU_DESTROY, id);
+	TEST_LENGTH(iommu_hw_info, IOMMU_GET_HW_INFO, __reserved);
+	TEST_LENGTH(iommu_hwpt_alloc, IOMMU_HWPT_ALLOC, __reserved);
+	TEST_LENGTH(iommu_ioas_alloc, IOMMU_IOAS_ALLOC, out_ioas_id);
+	TEST_LENGTH(iommu_ioas_iova_ranges, IOMMU_IOAS_IOVA_RANGES,
+		    out_iova_alignment);
+	TEST_LENGTH(iommu_ioas_allow_iovas, IOMMU_IOAS_ALLOW_IOVAS,
+		    allowed_iovas);
+	TEST_LENGTH(iommu_ioas_map, IOMMU_IOAS_MAP, iova);
+	TEST_LENGTH(iommu_ioas_copy, IOMMU_IOAS_COPY, src_iova);
+	TEST_LENGTH(iommu_ioas_unmap, IOMMU_IOAS_UNMAP, length);
+	TEST_LENGTH(iommu_option, IOMMU_OPTION, val64);
+	TEST_LENGTH(iommu_vfio_ioas, IOMMU_VFIO_IOAS, __reserved);
 #undef TEST_LENGTH
 }
 
···
 	} else {
 		/* Can allocate and manually free an IOAS table */
 		test_ioctl_destroy(self->ioas_id);
+	}
+}
+
+TEST_F(iommufd_ioas, alloc_hwpt_nested)
+{
+	const uint32_t min_data_len =
+		offsetofend(struct iommu_hwpt_selftest, iotlb);
+	struct iommu_hwpt_selftest data = {
+		.iotlb = IOMMU_TEST_IOTLB_DEFAULT,
+	};
+	uint32_t nested_hwpt_id[2] = {};
+	uint32_t parent_hwpt_id = 0;
+	uint32_t parent_hwpt_id_not_work = 0;
+	uint32_t test_hwpt_id = 0;
+
+	if (self->device_id) {
+		/* Negative tests */
+		test_err_hwpt_alloc(ENOENT, self->ioas_id, self->device_id, 0,
+				    &test_hwpt_id);
+		test_err_hwpt_alloc(EINVAL, self->device_id, self->device_id, 0,
+				    &test_hwpt_id);
+
+		test_cmd_hwpt_alloc(self->device_id, self->ioas_id,
+				    IOMMU_HWPT_ALLOC_NEST_PARENT,
+				    &parent_hwpt_id);
+
+		test_cmd_hwpt_alloc(self->device_id, self->ioas_id, 0,
+				    &parent_hwpt_id_not_work);
+
+		/* Negative nested tests */
+		test_err_hwpt_alloc_nested(EINVAL, self->device_id,
+					   parent_hwpt_id, 0,
+					   &nested_hwpt_id[0],
+					   IOMMU_HWPT_DATA_NONE, &data,
+					   sizeof(data));
+		test_err_hwpt_alloc_nested(EOPNOTSUPP, self->device_id,
+					   parent_hwpt_id, 0,
+					   &nested_hwpt_id[0],
+					   IOMMU_HWPT_DATA_SELFTEST + 1, &data,
+					   sizeof(data));
+		test_err_hwpt_alloc_nested(EINVAL, self->device_id,
+					   parent_hwpt_id, 0,
+					   &nested_hwpt_id[0],
+					   IOMMU_HWPT_DATA_SELFTEST, &data,
+					   min_data_len - 1);
+		test_err_hwpt_alloc_nested(EFAULT, self->device_id,
+					   parent_hwpt_id, 0,
+					   &nested_hwpt_id[0],
+					   IOMMU_HWPT_DATA_SELFTEST, NULL,
+					   sizeof(data));
+		test_err_hwpt_alloc_nested(
+			EOPNOTSUPP, self->device_id, parent_hwpt_id,
+			IOMMU_HWPT_ALLOC_NEST_PARENT, &nested_hwpt_id[0],
+			IOMMU_HWPT_DATA_SELFTEST, &data, sizeof(data));
+		test_err_hwpt_alloc_nested(EINVAL, self->device_id,
+					   parent_hwpt_id_not_work, 0,
+					   &nested_hwpt_id[0],
+					   IOMMU_HWPT_DATA_SELFTEST, &data,
+					   sizeof(data));
+
+		/* Allocate two nested hwpts sharing one common parent hwpt */
+		test_cmd_hwpt_alloc_nested(self->device_id, parent_hwpt_id, 0,
+					   &nested_hwpt_id[0],
+					   IOMMU_HWPT_DATA_SELFTEST, &data,
+					   sizeof(data));
+		test_cmd_hwpt_alloc_nested(self->device_id, parent_hwpt_id, 0,
+					   &nested_hwpt_id[1],
+					   IOMMU_HWPT_DATA_SELFTEST, &data,
+					   sizeof(data));
+
+		/* Negative test: a nested hwpt on top of a nested hwpt */
+		test_err_hwpt_alloc_nested(EINVAL, self->device_id,
+					   nested_hwpt_id[0], 0, &test_hwpt_id,
+					   IOMMU_HWPT_DATA_SELFTEST, &data,
+					   sizeof(data));
+		/* Negative test: parent hwpt now cannot be freed */
+		EXPECT_ERRNO(EBUSY,
+			     _test_ioctl_destroy(self->fd, parent_hwpt_id));
+
+		/* Attach device to nested_hwpt_id[0] that then will be busy */
+		test_cmd_mock_domain_replace(self->stdev_id, nested_hwpt_id[0]);
+		EXPECT_ERRNO(EBUSY,
+			     _test_ioctl_destroy(self->fd, nested_hwpt_id[0]));
+
+		/* Switch from nested_hwpt_id[0] to nested_hwpt_id[1] */
+		test_cmd_mock_domain_replace(self->stdev_id, nested_hwpt_id[1]);
+		EXPECT_ERRNO(EBUSY,
+			     _test_ioctl_destroy(self->fd, nested_hwpt_id[1]));
+		test_ioctl_destroy(nested_hwpt_id[0]);
+
+		/* Detach from nested_hwpt_id[1] and destroy it */
+		test_cmd_mock_domain_replace(self->stdev_id, parent_hwpt_id);
+		test_ioctl_destroy(nested_hwpt_id[1]);
+
+		/* Detach from the parent hw_pagetable and destroy it */
+		test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id);
+		test_ioctl_destroy(parent_hwpt_id);
+		test_ioctl_destroy(parent_hwpt_id_not_work);
+	} else {
+		test_err_hwpt_alloc(ENOENT, self->device_id, self->ioas_id, 0,
+				    &parent_hwpt_id);
+		test_err_hwpt_alloc_nested(ENOENT, self->device_id,
+					   parent_hwpt_id, 0,
+					   &nested_hwpt_id[0],
+					   IOMMU_HWPT_DATA_SELFTEST, &data,
+					   sizeof(data));
+		test_err_hwpt_alloc_nested(ENOENT, self->device_id,
+					   parent_hwpt_id, 0,
+					   &nested_hwpt_id[1],
+					   IOMMU_HWPT_DATA_SELFTEST, &data,
+					   sizeof(data));
+		test_err_mock_domain_replace(ENOENT, self->stdev_id,
+					     nested_hwpt_id[0]);
+		test_err_mock_domain_replace(ENOENT, self->stdev_id,
+					     nested_hwpt_id[1]);
 	}
 }
 
···
 	int i;
 
 	for (i = 0; i != variant->mock_domains; i++) {
+		uint32_t hwpt_id[2];
 		uint32_t stddev_id;
-		uint32_t hwpt_id;
 
-		test_cmd_hwpt_alloc(self->idev_ids[0], self->ioas_id, &hwpt_id);
-		test_cmd_mock_domain(hwpt_id, &stddev_id, NULL, NULL);
+		test_err_hwpt_alloc(EOPNOTSUPP,
+				    self->idev_ids[i], self->ioas_id,
+				    ~IOMMU_HWPT_ALLOC_NEST_PARENT, &hwpt_id[0]);
+		test_cmd_hwpt_alloc(self->idev_ids[i], self->ioas_id,
+				    0, &hwpt_id[0]);
+		test_cmd_hwpt_alloc(self->idev_ids[i], self->ioas_id,
+				    IOMMU_HWPT_ALLOC_NEST_PARENT, &hwpt_id[1]);
+
+		/* Do a hw_pagetable rotation test */
+		test_cmd_mock_domain_replace(self->stdev_ids[i], hwpt_id[0]);
+		EXPECT_ERRNO(EBUSY, _test_ioctl_destroy(self->fd, hwpt_id[0]));
+		test_cmd_mock_domain_replace(self->stdev_ids[i], hwpt_id[1]);
+		EXPECT_ERRNO(EBUSY, _test_ioctl_destroy(self->fd, hwpt_id[1]));
+		test_cmd_mock_domain_replace(self->stdev_ids[i], self->ioas_id);
+		test_ioctl_destroy(hwpt_id[1]);
+
+		test_cmd_mock_domain(hwpt_id[0], &stddev_id, NULL, NULL);
 		test_ioctl_destroy(stddev_id);
-		test_ioctl_destroy(hwpt_id);
+		test_ioctl_destroy(hwpt_id[0]);
 	}
+}
+
+FIXTURE(iommufd_dirty_tracking)
+{
+	int fd;
+	uint32_t ioas_id;
+	uint32_t hwpt_id;
+	uint32_t stdev_id;
+	uint32_t idev_id;
+	unsigned long page_size;
+	unsigned long bitmap_size;
+	void *bitmap;
+	void *buffer;
+};
+
+FIXTURE_VARIANT(iommufd_dirty_tracking)
+{
+	unsigned long buffer_size;
+};
+
+FIXTURE_SETUP(iommufd_dirty_tracking)
+{
+	void *vrc;
+	int rc;
+
+	self->fd = open("/dev/iommu", O_RDWR);
+	ASSERT_NE(-1, self->fd);
+
+	rc = posix_memalign(&self->buffer, HUGEPAGE_SIZE, variant->buffer_size);
+	if (rc || !self->buffer) {
+		SKIP(return, "Skipping buffer_size=%lu due to errno=%d",
+		     variant->buffer_size, rc);
+	}
+
+	assert((uintptr_t)self->buffer % HUGEPAGE_SIZE == 0);
+	vrc = mmap(self->buffer, variant->buffer_size, PROT_READ | PROT_WRITE,
+		   MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+	assert(vrc == self->buffer);
+
+	self->page_size = MOCK_PAGE_SIZE;
+	self->bitmap_size =
+		variant->buffer_size / self->page_size / BITS_PER_BYTE;
+
+	/* Provision with an extra (MOCK_PAGE_SIZE) for the unaligned case */
+	rc = posix_memalign(&self->bitmap, PAGE_SIZE,
+			    self->bitmap_size + MOCK_PAGE_SIZE);
+	assert(!rc);
+	assert(self->bitmap);
+	assert((uintptr_t)self->bitmap % PAGE_SIZE == 0);
+
+	test_ioctl_ioas_alloc(&self->ioas_id);
+	test_cmd_mock_domain(self->ioas_id, &self->stdev_id, &self->hwpt_id,
+			     &self->idev_id);
+}
+
+FIXTURE_TEARDOWN(iommufd_dirty_tracking)
+{
+	munmap(self->buffer, variant->buffer_size);
+	munmap(self->bitmap, self->bitmap_size);
+	teardown_iommufd(self->fd, _metadata);
+}
+
+FIXTURE_VARIANT_ADD(iommufd_dirty_tracking, domain_dirty128k)
+{
+	/* one u32 index bitmap */
+	.buffer_size = 128UL * 1024UL,
+};
+
+FIXTURE_VARIANT_ADD(iommufd_dirty_tracking, domain_dirty256k)
+{
+	/* one u64 index bitmap */
+	.buffer_size = 256UL * 1024UL,
+};
+
+FIXTURE_VARIANT_ADD(iommufd_dirty_tracking, domain_dirty640k)
+{
+	/* two u64 index and trailing end bitmap */
+	.buffer_size = 640UL * 1024UL,
+};
+
+FIXTURE_VARIANT_ADD(iommufd_dirty_tracking, domain_dirty128M)
+{
1512 + 	/* 4K bitmap (128M IOVA range) */
1513 + 	.buffer_size = 128UL * 1024UL * 1024UL,
1514 + };
1515 +
1516 + FIXTURE_VARIANT_ADD(iommufd_dirty_tracking, domain_dirty256M)
1517 + {
1518 + 	/* 8K bitmap (256M IOVA range) */
1519 + 	.buffer_size = 256UL * 1024UL * 1024UL,
1520 + };
1521 +
1522 + TEST_F(iommufd_dirty_tracking, enforce_dirty)
1523 + {
1524 + 	uint32_t ioas_id, stddev_id, idev_id;
1525 + 	uint32_t hwpt_id, _hwpt_id;
1526 + 	uint32_t dev_flags;
1527 +
1528 + 	/* Regular case */
1529 + 	dev_flags = MOCK_FLAGS_DEVICE_NO_DIRTY;
1530 + 	test_cmd_hwpt_alloc(self->idev_id, self->ioas_id,
1531 + 			    IOMMU_HWPT_ALLOC_DIRTY_TRACKING, &hwpt_id);
1532 + 	test_cmd_mock_domain(hwpt_id, &stddev_id, NULL, NULL);
1533 + 	test_err_mock_domain_flags(EINVAL, hwpt_id, dev_flags, &stddev_id,
1534 + 				   NULL);
1535 + 	test_ioctl_destroy(stddev_id);
1536 + 	test_ioctl_destroy(hwpt_id);
1537 +
1538 + 	/* IOMMU device does not support dirty tracking */
1539 + 	test_ioctl_ioas_alloc(&ioas_id);
1540 + 	test_cmd_mock_domain_flags(ioas_id, dev_flags, &stddev_id, &_hwpt_id,
1541 + 				   &idev_id);
1542 + 	test_err_hwpt_alloc(EOPNOTSUPP, idev_id, ioas_id,
1543 + 			    IOMMU_HWPT_ALLOC_DIRTY_TRACKING, &hwpt_id);
1544 + 	test_ioctl_destroy(stddev_id);
1545 + }
1546 +
1547 + TEST_F(iommufd_dirty_tracking, set_dirty_tracking)
1548 + {
1549 + 	uint32_t stddev_id;
1550 + 	uint32_t hwpt_id;
1551 +
1552 + 	test_cmd_hwpt_alloc(self->idev_id, self->ioas_id,
1553 + 			    IOMMU_HWPT_ALLOC_DIRTY_TRACKING, &hwpt_id);
1554 + 	test_cmd_mock_domain(hwpt_id, &stddev_id, NULL, NULL);
1555 + 	test_cmd_set_dirty_tracking(hwpt_id, true);
1556 + 	test_cmd_set_dirty_tracking(hwpt_id, false);
1557 +
1558 + 	test_ioctl_destroy(stddev_id);
1559 + 	test_ioctl_destroy(hwpt_id);
1560 + }
1561 +
1562 + TEST_F(iommufd_dirty_tracking, device_dirty_capability)
1563 + {
1564 + 	uint32_t caps = 0;
1565 + 	uint32_t stddev_id;
1566 + 	uint32_t hwpt_id;
1567 +
1568 + 	test_cmd_hwpt_alloc(self->idev_id, self->ioas_id, 0, &hwpt_id);
1569 + 	test_cmd_mock_domain(hwpt_id, &stddev_id, NULL, NULL);
1570 + 	test_cmd_get_hw_capabilities(self->idev_id, caps,
1571 + 				     IOMMU_HW_CAP_DIRTY_TRACKING);
1572 + 	ASSERT_EQ(IOMMU_HW_CAP_DIRTY_TRACKING,
1573 + 		  caps & IOMMU_HW_CAP_DIRTY_TRACKING);
1574 +
1575 + 	test_ioctl_destroy(stddev_id);
1576 + 	test_ioctl_destroy(hwpt_id);
1577 + }
1578 +
1579 + TEST_F(iommufd_dirty_tracking, get_dirty_bitmap)
1580 + {
1581 + 	uint32_t stddev_id;
1582 + 	uint32_t hwpt_id;
1583 + 	uint32_t ioas_id;
1584 +
1585 + 	test_ioctl_ioas_alloc(&ioas_id);
1586 + 	test_ioctl_ioas_map_fixed_id(ioas_id, self->buffer,
1587 + 				     variant->buffer_size, MOCK_APERTURE_START);
1588 +
1589 + 	test_cmd_hwpt_alloc(self->idev_id, ioas_id,
1590 + 			    IOMMU_HWPT_ALLOC_DIRTY_TRACKING, &hwpt_id);
1591 + 	test_cmd_mock_domain(hwpt_id, &stddev_id, NULL, NULL);
1592 +
1593 + 	test_cmd_set_dirty_tracking(hwpt_id, true);
1594 +
1595 + 	test_mock_dirty_bitmaps(hwpt_id, variant->buffer_size,
1596 + 				MOCK_APERTURE_START, self->page_size,
1597 + 				self->bitmap, self->bitmap_size, 0, _metadata);
1598 +
1599 + 	/* PAGE_SIZE unaligned bitmap */
1600 + 	test_mock_dirty_bitmaps(hwpt_id, variant->buffer_size,
1601 + 				MOCK_APERTURE_START, self->page_size,
1602 + 				self->bitmap + MOCK_PAGE_SIZE,
1603 + 				self->bitmap_size, 0, _metadata);
1604 +
1605 + 	test_ioctl_destroy(stddev_id);
1606 + 	test_ioctl_destroy(hwpt_id);
1607 + }
1608 +
1609 + TEST_F(iommufd_dirty_tracking, get_dirty_bitmap_no_clear)
1610 + {
1611 + 	uint32_t stddev_id;
1612 + 	uint32_t hwpt_id;
1613 + 	uint32_t ioas_id;
1614 +
1615 + 	test_ioctl_ioas_alloc(&ioas_id);
1616 + 	test_ioctl_ioas_map_fixed_id(ioas_id, self->buffer,
1617 + 				     variant->buffer_size, MOCK_APERTURE_START);
1618 +
1619 + 	test_cmd_hwpt_alloc(self->idev_id, ioas_id,
1620 + 			    IOMMU_HWPT_ALLOC_DIRTY_TRACKING, &hwpt_id);
1621 + 	test_cmd_mock_domain(hwpt_id, &stddev_id, NULL, NULL);
1622 +
1623 + 	test_cmd_set_dirty_tracking(hwpt_id, true);
1624 +
1625 + 	test_mock_dirty_bitmaps(hwpt_id, variant->buffer_size,
1626 + 				MOCK_APERTURE_START, self->page_size,
1627 + 				self->bitmap, self->bitmap_size,
1628 + 				IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR,
1629 + 				_metadata);
1630 +
1631 + 	/* Unaligned bitmap */
1632 + 	test_mock_dirty_bitmaps(hwpt_id, variant->buffer_size,
1633 + 				MOCK_APERTURE_START, self->page_size,
1634 + 				self->bitmap + MOCK_PAGE_SIZE,
1635 + 				self->bitmap_size,
1636 + 				IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR,
1637 + 				_metadata);
1638 +
1639 + 	test_ioctl_destroy(stddev_id);
1640 + 	test_ioctl_destroy(hwpt_id);
1534 1641 }
1535 1642
1536 1643 /* VFIO compatibility IOCTLs */
···
2074 1729 	ASSERT_EQ(0, ioctl(self->fd, VFIO_IOMMU_UNMAP_DMA, &unmap_cmd));
2075 1730 	ASSERT_EQ(BUFFER_SIZE, unmap_cmd.size);
2076 1731
2077 - 	/* UNMAP_FLAG_ALL requres 0 iova/size */
1732 + 	/* UNMAP_FLAG_ALL requires 0 iova/size */
2078 1733 	ASSERT_EQ(0, ioctl(self->fd, VFIO_IOMMU_MAP_DMA, &map_cmd));
2079 1734 	unmap_cmd.flags = VFIO_DMA_UNMAP_FLAG_ALL;
2080 1735 	EXPECT_ERRNO(EINVAL, ioctl(self->fd, VFIO_IOMMU_UNMAP_DMA, &unmap_cmd));
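The dirty-tracking fixture sizes its user bitmap as `variant->buffer_size / self->page_size / BITS_PER_BYTE`, i.e. one bit per IO page. The standalone sketch below reproduces that arithmetic under an assumed 4 KiB of IOVA per bit (an illustration only; the test itself uses `MOCK_PAGE_SIZE`, whose value is defined elsewhere in the selftests). Under that assumption the results line up with the variant comments: 128K needs 4 bytes (one u32), 256K needs 8 bytes (one u64), 640K needs 20 bytes (two u64 words plus a trailing partial word), and 128M/256M need 4K/8K bitmaps. The `example_*` helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB of IOVA per dirty bit. The selftest uses
 * MOCK_PAGE_SIZE instead, which is not reproduced here. */
#define EXAMPLE_PAGE_SIZE 4096UL
#define EXAMPLE_BITS_PER_BYTE 8UL

/* Hypothetical helper: bytes of bitmap needed for one bit per page,
 * mirroring buffer_size / page_size / BITS_PER_BYTE in the fixture. */
static size_t example_bitmap_bytes(size_t buffer_size, size_t page_size)
{
	return buffer_size / page_size / EXAMPLE_BITS_PER_BYTE;
}

/* Hypothetical helper: number of 64-bit words a bitmap of the given
 * byte size touches, counting a trailing partial word as a full word. */
static size_t example_bitmap_u64_words(size_t bitmap_bytes)
{
	return (bitmap_bytes + sizeof(unsigned long long) - 1) /
	       sizeof(unsigned long long);
}
```

For the 640K variant, `example_bitmap_bytes(640UL * 1024, EXAMPLE_PAGE_SIZE)` gives 20 bytes, which spans three u64 words with the last one only partly used; that trailing partial word is the edge case the variant comment calls out.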
+4 -3
tools/testing/selftests/iommu/iommufd_fail_nth.c
···
105 105
106 106 	/*
107 107 	 * This is just an arbitrary limit based on the current kernel
108 - 	 * situation. Changes in the kernel can dramtically change the number of
108 + 	 * situation. Changes in the kernel can dramatically change the number of
109 109 	 * required fault injection sites, so if this hits it doesn't
110 110 	 * necessarily mean a test failure, just that the limit has to be made
111 111 	 * bigger.
···
612 612 				  &idev_id))
613 613 		return -1;
614 614
615 - 	if (_test_cmd_get_hw_info(self->fd, idev_id, &info, sizeof(info)))
615 + 	if (_test_cmd_get_hw_info(self->fd, idev_id, &info, sizeof(info), NULL))
616 616 		return -1;
617 617
618 - 	if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, &hwpt_id))
618 + 	if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, 0, &hwpt_id,
619 + 				 IOMMU_HWPT_DATA_NONE, 0, 0))
619 620 		return -1;
620 621
621 622 	if (_test_cmd_mock_domain_replace(self->fd, stdev_id, ioas_id2, NULL))
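In the fail_nth hunk, `_test_cmd_get_hw_info()` gains a trailing `capabilities` pointer and the caller passes `NULL` because it does not need the result; the helper in `iommufd_utils.h` only copies `cmd.out_capabilities` back when that pointer is non-NULL. A minimal sketch of that optional-output pattern, plus the mask check used by `device_dirty_capability`, follows. The function, the `reported` argument standing in for the kernel's reply, and `EXAMPLE_CAP_DIRTY_TRACKING` are all hypothetical, not the real uapi.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bit; not the real uapi constant. */
#define EXAMPLE_CAP_DIRTY_TRACKING (1U << 0)

/*
 * Hypothetical stand-in for a hardware-info query. @reported plays the
 * role of the value the kernel would fill in; @capabilities is optional
 * and is only written when the caller supplies a pointer.
 */
static int example_get_hw_info(uint32_t reported, uint32_t *capabilities)
{
	if (capabilities)
		*capabilities = reported;
	return 0;
}

/* True only if every bit in @mask is present in @caps. */
static bool example_has_caps(uint32_t caps, uint32_t mask)
{
	return (caps & mask) == mask;
}
```

Comparing `(caps & mask) == mask` rather than testing `caps & mask` for non-zero stays correct if the mask ever combines several bits; it is the same comparison the test expresses as `ASSERT_EQ(IOMMU_HW_CAP_DIRTY_TRACKING, caps & IOMMU_HW_CAP_DIRTY_TRACKING)`.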
+221 -12
tools/testing/selftests/iommu/iommufd_utils.h
···
16 16 /* Hack to make assertions more readable */
17 17 #define _IOMMU_TEST_CMD(x) IOMMU_TEST_CMD
18 18
19 + /* Imported from include/asm-generic/bitops/generic-non-atomic.h */
20 + #define BITS_PER_BYTE 8
21 + #define BITS_PER_LONG __BITS_PER_LONG
22 + #define BIT_MASK(nr) (1UL << ((nr) % __BITS_PER_LONG))
23 + #define BIT_WORD(nr) ((nr) / __BITS_PER_LONG)
24 +
25 + static inline void set_bit(unsigned int nr, unsigned long *addr)
26 + {
27 + 	unsigned long mask = BIT_MASK(nr);
28 + 	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
29 +
30 + 	*p |= mask;
31 + }
32 +
33 + static inline bool test_bit(unsigned int nr, unsigned long *addr)
34 + {
35 + 	return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG - 1)));
36 + }
37 +
19 38 static void *buffer;
20 39 static unsigned long BUFFER_SIZE;
21 40
···
93 74 	EXPECT_ERRNO(_errno, _test_cmd_mock_domain(self->fd, ioas_id, \
94 75 						   stdev_id, hwpt_id, NULL))
95 76
77 + static int _test_cmd_mock_domain_flags(int fd, unsigned int ioas_id,
78 + 				       __u32 stdev_flags, __u32 *stdev_id,
79 + 				       __u32 *hwpt_id, __u32 *idev_id)
80 + {
81 + 	struct iommu_test_cmd cmd = {
82 + 		.size = sizeof(cmd),
83 + 		.op = IOMMU_TEST_OP_MOCK_DOMAIN_FLAGS,
84 + 		.id = ioas_id,
85 + 		.mock_domain_flags = { .dev_flags = stdev_flags },
86 + 	};
87 + 	int ret;
88 +
89 + 	ret = ioctl(fd, IOMMU_TEST_CMD, &cmd);
90 + 	if (ret)
91 + 		return ret;
92 + 	if (stdev_id)
93 + 		*stdev_id = cmd.mock_domain_flags.out_stdev_id;
94 + 	assert(cmd.id != 0);
95 + 	if (hwpt_id)
96 + 		*hwpt_id = cmd.mock_domain_flags.out_hwpt_id;
97 + 	if (idev_id)
98 + 		*idev_id = cmd.mock_domain_flags.out_idev_id;
99 + 	return 0;
100 + }
101 + #define test_cmd_mock_domain_flags(ioas_id, flags, stdev_id, hwpt_id, idev_id) \
102 + 	ASSERT_EQ(0, _test_cmd_mock_domain_flags(self->fd, ioas_id, flags, \
103 + 						 stdev_id, hwpt_id, idev_id))
104 + #define test_err_mock_domain_flags(_errno, ioas_id, flags, stdev_id, hwpt_id) \
105 + 	EXPECT_ERRNO(_errno, \
106 + 		     _test_cmd_mock_domain_flags(self->fd, ioas_id, flags, \
107 + 						 stdev_id, hwpt_id, NULL))
108 +
96 109 static int _test_cmd_mock_domain_replace(int fd, __u32 stdev_id, __u32 pt_id,
97 110 					 __u32 *hwpt_id)
98 111 {
···
154 103 				   pt_id, NULL))
155 104
156 105 static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id,
157 - 				__u32 *hwpt_id)
106 + 				__u32 flags, __u32 *hwpt_id, __u32 data_type,
107 + 				void *data, size_t data_len)
158 108 {
159 109 	struct iommu_hwpt_alloc cmd = {
160 110 		.size = sizeof(cmd),
111 + 		.flags = flags,
161 112 		.dev_id = device_id,
162 113 		.pt_id = pt_id,
114 + 		.data_type = data_type,
115 + 		.data_len = data_len,
116 + 		.data_uptr = (uint64_t)data,
163 117 	};
164 118 	int ret;
165 119
···
176 120 	return 0;
177 121 }
178 122
179 - #define test_cmd_hwpt_alloc(device_id, pt_id, hwpt_id) \
180 - 	ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, hwpt_id))
123 + #define test_cmd_hwpt_alloc(device_id, pt_id, flags, hwpt_id) \
124 + 	ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags, \
125 + 					  hwpt_id, IOMMU_HWPT_DATA_NONE, NULL, \
126 + 					  0))
127 + #define test_err_hwpt_alloc(_errno, device_id, pt_id, flags, hwpt_id) \
128 + 	EXPECT_ERRNO(_errno, _test_cmd_hwpt_alloc( \
129 + 				     self->fd, device_id, pt_id, flags, \
130 + 				     hwpt_id, IOMMU_HWPT_DATA_NONE, NULL, 0))
131 +
132 + #define test_cmd_hwpt_alloc_nested(device_id, pt_id, flags, hwpt_id, \
133 + 				   data_type, data, data_len) \
134 + 	ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags, \
135 + 					  hwpt_id, data_type, data, data_len))
136 + #define test_err_hwpt_alloc_nested(_errno, device_id, pt_id, flags, hwpt_id, \
137 + 				   data_type, data, data_len) \
138 + 	EXPECT_ERRNO(_errno, \
139 + 		     _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags, \
140 + 					  hwpt_id, data_type, data, data_len))
181 141
182 142 static int _test_cmd_access_replace_ioas(int fd, __u32 access_id,
183 143 					 unsigned int ioas_id)
···
213 141 }
214 142 #define test_cmd_access_replace_ioas(access_id, ioas_id) \
215 143 	ASSERT_EQ(0, _test_cmd_access_replace_ioas(self->fd, access_id, ioas_id))
144 +
145 + static int _test_cmd_set_dirty_tracking(int fd, __u32 hwpt_id, bool enabled)
146 + {
147 + 	struct iommu_hwpt_set_dirty_tracking cmd = {
148 + 		.size = sizeof(cmd),
149 + 		.flags = enabled ? IOMMU_HWPT_DIRTY_TRACKING_ENABLE : 0,
150 + 		.hwpt_id = hwpt_id,
151 + 	};
152 + 	int ret;
153 +
154 + 	ret = ioctl(fd, IOMMU_HWPT_SET_DIRTY_TRACKING, &cmd);
155 + 	if (ret)
156 + 		return -errno;
157 + 	return 0;
158 + }
159 + #define test_cmd_set_dirty_tracking(hwpt_id, enabled) \
160 + 	ASSERT_EQ(0, _test_cmd_set_dirty_tracking(self->fd, hwpt_id, enabled))
161 +
162 + static int _test_cmd_get_dirty_bitmap(int fd, __u32 hwpt_id, size_t length,
163 + 				      __u64 iova, size_t page_size,
164 + 				      __u64 *bitmap, __u32 flags)
165 + {
166 + 	struct iommu_hwpt_get_dirty_bitmap cmd = {
167 + 		.size = sizeof(cmd),
168 + 		.hwpt_id = hwpt_id,
169 + 		.flags = flags,
170 + 		.iova = iova,
171 + 		.length = length,
172 + 		.page_size = page_size,
173 + 		.data = (uintptr_t)bitmap,
174 + 	};
175 + 	int ret;
176 +
177 + 	ret = ioctl(fd, IOMMU_HWPT_GET_DIRTY_BITMAP, &cmd);
178 + 	if (ret)
179 + 		return ret;
180 + 	return 0;
181 + }
182 +
183 + #define test_cmd_get_dirty_bitmap(fd, hwpt_id, length, iova, page_size, \
184 + 				  bitmap, flags) \
185 + 	ASSERT_EQ(0, _test_cmd_get_dirty_bitmap(fd, hwpt_id, length, iova, \
186 + 						page_size, bitmap, flags))
187 +
188 + static int _test_cmd_mock_domain_set_dirty(int fd, __u32 hwpt_id, size_t length,
189 + 					   __u64 iova, size_t page_size,
190 + 					   __u64 *bitmap, __u64 *dirty)
191 + {
192 + 	struct iommu_test_cmd cmd = {
193 + 		.size = sizeof(cmd),
194 + 		.op = IOMMU_TEST_OP_DIRTY,
195 + 		.id = hwpt_id,
196 + 		.dirty = {
197 + 			.iova = iova,
198 + 			.length = length,
199 + 			.page_size = page_size,
200 + 			.uptr = (uintptr_t)bitmap,
201 + 		}
202 + 	};
203 + 	int ret;
204 +
205 + 	ret = ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_DIRTY), &cmd);
206 + 	if (ret)
207 + 		return -ret;
208 + 	if (dirty)
209 + 		*dirty = cmd.dirty.out_nr_dirty;
210 + 	return 0;
211 + }
212 +
213 + #define test_cmd_mock_domain_set_dirty(fd, hwpt_id, length, iova, page_size, \
214 + 				       bitmap, nr) \
215 + 	ASSERT_EQ(0, \
216 + 		  _test_cmd_mock_domain_set_dirty(fd, hwpt_id, length, iova, \
217 + 						  page_size, bitmap, nr))
218 +
219 + static int _test_mock_dirty_bitmaps(int fd, __u32 hwpt_id, size_t length,
220 + 				    __u64 iova, size_t page_size, __u64 *bitmap,
221 + 				    __u64 bitmap_size, __u32 flags,
222 + 				    struct __test_metadata *_metadata)
223 + {
224 + 	unsigned long i, count, nbits = bitmap_size * BITS_PER_BYTE;
225 + 	unsigned long nr = nbits / 2;
226 + 	__u64 out_dirty = 0;
227 +
228 + 	/* Mark all even bits as dirty in the mock domain */
229 + 	for (count = 0, i = 0; i < nbits; count += !(i % 2), i++)
230 + 		if (!(i % 2))
231 + 			set_bit(i, (unsigned long *)bitmap);
232 + 	ASSERT_EQ(nr, count);
233 +
234 + 	test_cmd_mock_domain_set_dirty(fd, hwpt_id, length, iova, page_size,
235 + 				       bitmap, &out_dirty);
236 + 	ASSERT_EQ(nr, out_dirty);
237 +
238 + 	/* Expect all even bits as dirty in the user bitmap */
239 + 	memset(bitmap, 0, bitmap_size);
240 + 	test_cmd_get_dirty_bitmap(fd, hwpt_id, length, iova, page_size, bitmap,
241 + 				  flags);
242 + 	for (count = 0, i = 0; i < nbits; count += !(i % 2), i++)
243 + 		ASSERT_EQ(!(i % 2), test_bit(i, (unsigned long *)bitmap));
244 + 	ASSERT_EQ(count, out_dirty);
245 +
246 + 	memset(bitmap, 0, bitmap_size);
247 + 	test_cmd_get_dirty_bitmap(fd, hwpt_id, length, iova, page_size, bitmap,
248 + 				  flags);
249 +
250 + 	/* It as read already -- expect all zeroes */
251 + 	for (i = 0; i < nbits; i++) {
252 + 		ASSERT_EQ(!(i % 2) && (flags &
253 + 				       IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR),
254 + 			  test_bit(i, (unsigned long *)bitmap));
255 + 	}
256 +
257 + 	return 0;
258 + }
259 + #define test_mock_dirty_bitmaps(hwpt_id, length, iova, page_size, bitmap, \
260 + 				bitmap_size, flags, _metadata) \
261 + 	ASSERT_EQ(0, _test_mock_dirty_bitmaps(self->fd, hwpt_id, length, iova, \
262 + 					      page_size, bitmap, bitmap_size, \
263 + 					      flags, _metadata))
216 264
217 265 static int _test_cmd_create_access(int fd, unsigned int ioas_id,
218 266 				   __u32 *access_id, unsigned int flags)
···
458 266 					IOMMU_IOAS_MAP_READABLE)); \
459 267 	})
460 268
269 + #define test_ioctl_ioas_map_fixed_id(ioas_id, buffer, length, iova) \
270 + 	({ \
271 + 		__u64 __iova = iova; \
272 + 		ASSERT_EQ(0, \
273 + 			  _test_ioctl_ioas_map( \
274 + 				  self->fd, ioas_id, buffer, length, &__iova, \
275 + 				  IOMMU_IOAS_MAP_FIXED_IOVA | \
276 + 					  IOMMU_IOAS_MAP_WRITEABLE | \
277 + 					  IOMMU_IOAS_MAP_READABLE)); \
278 + 	})
279 +
461 280 #define test_err_ioctl_ioas_map_fixed(_errno, buffer, length, iova) \
462 281 	({ \
463 282 		__u64 __iova = iova; \
···
557 354 #endif
558 355
559 356 /* @data can be NULL */
560 - static int _test_cmd_get_hw_info(int fd, __u32 device_id,
561 - 				 void *data, size_t data_len)
357 + static int _test_cmd_get_hw_info(int fd, __u32 device_id, void *data,
358 + 				 size_t data_len, uint32_t *capabilities)
562 359 {
563 360 	struct iommu_test_hw_info *info = (struct iommu_test_hw_info *)data;
564 361 	struct iommu_hw_info cmd = {
···
566 363 		.dev_id = device_id,
567 364 		.data_len = data_len,
568 365 		.data_uptr = (uint64_t)data,
366 + 		.out_capabilities = 0,
569 367 	};
570 368 	int ret;
571 369
···
603 399 		assert(!info->flags);
604 400 	}
605 401
402 + 	if (capabilities)
403 + 		*capabilities = cmd.out_capabilities;
404 +
606 405 	return 0;
607 406 }
608 407
609 - #define test_cmd_get_hw_info(device_id, data, data_len) \
610 - 	ASSERT_EQ(0, _test_cmd_get_hw_info(self->fd, device_id, \
611 - 					   data, data_len))
408 + #define test_cmd_get_hw_info(device_id, data, data_len) \
409 + 	ASSERT_EQ(0, _test_cmd_get_hw_info(self->fd, device_id, data, \
410 + 					   data_len, NULL))
612 411
613 - #define test_err_get_hw_info(_errno, device_id, data, data_len) \
614 - 	EXPECT_ERRNO(_errno, \
615 - 		     _test_cmd_get_hw_info(self->fd, device_id, \
616 - 					   data, data_len))
412 + #define test_err_get_hw_info(_errno, device_id, data, data_len) \
413 + 	EXPECT_ERRNO(_errno, _test_cmd_get_hw_info(self->fd, device_id, data, \
414 + 						   data_len, NULL))
415 +
416 + #define test_cmd_get_hw_capabilities(device_id, caps, mask) \
417 + 	ASSERT_EQ(0, _test_cmd_get_hw_info(self->fd, device_id, NULL, 0, &caps))
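`_test_mock_dirty_bitmaps()` marks every even bit dirty in the mock domain, reads the bitmap back, and then reads it a second time, expecting zeroes unless `IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR` was passed. The sketch below models only that expectation in user space: a plain array stands in for the domain's dirty state, and `example_read_dirty()` ORs it into the caller's bitmap, then clears it unless `no_clear` is set. This is a simplified illustration of the semantics the test checks, not the kernel implementation. All `example_*` names are hypothetical; `example_set_bit()` and `example_test_bit()` use the same word/bit split as the imported `set_bit()` and `test_bit()` helpers.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define EXAMPLE_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Word nr / BITS_PER_LONG, bit nr % BITS_PER_LONG, as in set_bit(). */
static void example_set_bit(unsigned long nr, unsigned long *addr)
{
	addr[nr / EXAMPLE_BITS_PER_LONG] |= 1UL << (nr % EXAMPLE_BITS_PER_LONG);
}

static bool example_test_bit(unsigned long nr, const unsigned long *addr)
{
	return 1UL & (addr[nr / EXAMPLE_BITS_PER_LONG] >>
		      (nr % EXAMPLE_BITS_PER_LONG));
}

/* Mark every even bit dirty, as the selftest does; returns how many. */
static unsigned long example_mark_even(unsigned long *bitmap,
				       unsigned long nbits)
{
	unsigned long i, count = 0;

	for (i = 0; i < nbits; i += 2) {
		example_set_bit(i, bitmap);
		count++;
	}
	return count;
}

/*
 * Hypothetical model of a dirty read: OR the domain's bits into the user
 * bitmap, then clear the domain's bits unless no_clear is set.
 */
static void example_read_dirty(unsigned long *domain, unsigned long *user,
			       size_t nlongs, bool no_clear)
{
	size_t w;

	for (w = 0; w < nlongs; w++) {
		user[w] |= domain[w];
		if (!no_clear)
			domain[w] = 0;
	}
}
```

With `no_clear` false, a second read after zeroing the user bitmap yields all zeroes, which is what the final loop in `_test_mock_dirty_bitmaps()` asserts when the flag is absent; with `no_clear` true the even bits survive the first read and show up again.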