Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

iommupt: Add the basic structure of the iommu implementation

The existing IOMMU page table implementations duplicate all of the working
algorithms for each format. By using the generic page table API, a single C
version of the IOMMU algorithms can be created and reused for all of the
different formats used in the drivers. The implementation will provide a
single C version of the iommu domain operations: iova_to_phys, map, unmap,
and read_and_clear_dirty.

Further, new algorithms and techniques become easy to add across the
entire fleet of drivers and formats.

The C functions are drop-in compatible with the existing iommu_domain_ops
through the IOMMU_PT_DOMAIN_OPS() macro. Each per-format compilation unit
exports symbols following the pattern pt_iommu_FMT_map_pages(), which the
macro maps directly to the iommu_domain_ops members. This avoids the extra
function pointer indirection that io-pgtable has.
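
A minimal user-space sketch of the idea, assuming nothing about the real IOMMU_PT_DOMAIN_OPS() beyond what is described above: a token-pasting macro expands a format name into the per-format symbol names, so each ops member points directly at the per-format function. struct demo_domain_ops and DEMO_PT_DOMAIN_OPS() are hypothetical stand-ins, not the kernel definitions, and the two pt_iommu_amdv1_* functions are dummies named after the pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct iommu_domain_ops (not the kernel's). */
struct demo_domain_ops {
	int (*map_pages)(unsigned long iova, size_t len);
	size_t (*unmap_pages)(unsigned long iova, size_t len);
};

/* Dummy per-format functions following the pt_iommu_FMT_op pattern. */
static int pt_iommu_amdv1_map_pages(unsigned long iova, size_t len)
{
	(void)iova;
	return len ? 0 : -1;
}

static size_t pt_iommu_amdv1_unmap_pages(unsigned long iova, size_t len)
{
	(void)iova;
	return len;
}

/*
 * Hypothetical analogue of IOMMU_PT_DOMAIN_OPS(): ## turns the format name
 * into the exported symbol names, so each member points straight at the
 * per-format function with no intermediate dispatch table.
 */
#define DEMO_PT_DOMAIN_OPS(fmt)                  \
	.map_pages = pt_iommu_##fmt##_map_pages, \
	.unmap_pages = pt_iommu_##fmt##_unmap_pages

static const struct demo_domain_ops amdv1_domain_ops = {
	DEMO_PT_DOMAIN_OPS(amdv1),
};
```

Because the initializer names the per-format functions directly, a call through amdv1_domain_ops.map_pages reaches the format code with a single indirect call.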

The top-level struct used by the drivers is pt_iommu_table_FMT. It
contains the other structs so that container_of() can move between the
driver, iommu page table, generic page table, and generic format layers.

    struct pt_iommu_table_amdv1 {
        struct pt_iommu {
            struct iommu_domain domain;
        } iommu;
        struct pt_amdv1 {
            struct pt_common common;
        } amdpt;
    };
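
The sketch below is a simplified user-space model rather than the kernel headers. It shows how container_of() recovers the outer table from a pointer to any embedded layer. The trimmed struct bodies, the local container_of() (which omits the kernel's type checking) and the helpers table_from_common() and table_from_domain() are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of(); the kernel version also type-checks ptr. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Trimmed stand-ins for the nested structs shown above. */
struct iommu_domain { unsigned long pgsize_bitmap; };
struct pt_common { unsigned int features; };
struct pt_iommu { struct iommu_domain domain; };
struct pt_amdv1 { struct pt_common common; };

struct pt_iommu_table_amdv1 {
	struct pt_iommu iommu;
	struct pt_amdv1 amdpt;
};

/* Generic page table layer -> top-level table (hypothetical helper). */
static struct pt_iommu_table_amdv1 *table_from_common(struct pt_common *common)
{
	return container_of(common, struct pt_iommu_table_amdv1, amdpt.common);
}

/* IOMMU core layer -> top-level table (hypothetical helper). */
static struct pt_iommu_table_amdv1 *table_from_domain(struct iommu_domain *domain)
{
	return container_of(domain, struct pt_iommu_table_amdv1, iommu.domain);
}
```

Each layer only ever holds a pointer to its own embedded struct; the fixed layout of the outer struct is what makes the walk back up possible.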

The driver is expected to place pt_iommu_table_FMT in a union with its own
existing domain struct:

    struct driver_domain {
        union {
            struct iommu_domain domain;
            struct pt_iommu_table_amdv1 amdv1;
        };
    };
    PT_IOMMU_CHECK_DOMAIN(struct driver_domain, amdv1.iommu, domain);

This creates an alias, so 'domain' does not have to be renamed throughout
the driver code.
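
The compile-time alias check can be modelled in plain C11. This is a sketch with trimmed stand-in structs: PT_IOMMU_CHECK_DOMAIN_DEMO() copies the body of PT_IOMMU_CHECK_DOMAIN() from include/linux/generic_pt/iommu.h but uses the two-argument C11 static_assert, and driver_private is a hypothetical driver field. Because the macro appends .domain to its second argument, that argument has to name the struct pt_iommu member, here amdv1.iommu.

```c
#include <assert.h>
#include <stddef.h>

/* Trimmed stand-ins for the kernel structs. */
struct iommu_domain { unsigned long pgsize_bitmap; };
struct pt_iommu { struct iommu_domain domain; int nid; };
struct pt_amdv1 { unsigned int features; };
struct pt_iommu_table_amdv1 {
	struct pt_iommu iommu;
	struct pt_amdv1 amdpt;
};

/* The driver overlays its old 'domain' member with the page table. */
struct driver_domain {
	union {
		struct iommu_domain domain;
		struct pt_iommu_table_amdv1 amdv1;
	};
	int driver_private; /* hypothetical driver-specific field */
};

/* Fails to compile unless both member paths name the same bytes. */
#define PT_IOMMU_CHECK_DOMAIN_DEMO(s, pt_iommu_memb, domain_memb) \
	static_assert(offsetof(s, pt_iommu_memb.domain) ==       \
		      offsetof(s, domain_memb), "domain alias mismatch")

PT_IOMMU_CHECK_DOMAIN_DEMO(struct driver_domain, amdv1.iommu, domain);
```

Existing driver code keeps using d->domain, while the page table code reaches the same iommu_domain as d->amdv1.iommu.domain.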

This allows all the layers to access all the necessary functions to
implement their different roles with no change to any of the existing
iommu core code.

Implement the basic starting point: pt_iommu_init(), get_info() and
deinit().
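
The ordering that makes pt_iommu_deinit() safe to call before init, or after a failed init, can be sketched as follows. This is a hypothetical user-space model: struct demo_table, demo_init() and demo_table_deinit() are illustrative names, and calloc()/free() stand in for the iommu page allocator. Only the rule "set ops last, and run deinit only when ops is non-NULL" is taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct demo_table;

struct demo_ops {
	void (*deinit)(struct demo_table *t);
};

/* Hypothetical miniature of struct pt_iommu. */
struct demo_table {
	const struct demo_ops *ops; /* non-NULL only after a successful init */
	void *top;                  /* stands in for the top-level table memory */
};

static int deinit_calls;

static void demo_deinit(struct demo_table *t)
{
	free(t->top);
	t->top = NULL;
	deinit_calls++;
}

static const struct demo_ops demo_ops = { .deinit = demo_deinit };

static int demo_init(struct demo_table *t, size_t top_size)
{
	/* Clear state first, as pt_iommu_zero() clears ops in the real code. */
	t->ops = NULL;
	t->top = NULL;

	if (top_size == 0)
		return -EINVAL;
	t->top = calloc(1, top_size);
	if (!t->top)
		return -ENOMEM;

	/* Must be last: a non-NULL ops means there is something to undo. */
	t->ops = &demo_ops;
	return 0;
}

/* Same shape as pt_iommu_deinit(): a NULL ops pointer means a no-op. */
static void demo_table_deinit(struct demo_table *t)
{
	if (t->ops)
		t->ops->deinit(t);
}
```

With this ordering the caller's error path can always call the deinit helper unconditionally, whichever step of init failed.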

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Tested-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>

Authored by Jason Gunthorpe and committed by Joerg Roedel.
cdb39d91 ab0b5728

4 files changed, +461 lines
drivers/iommu/generic_pt/Kconfig (+13)
@@ -17,4 +17,17 @@
 	  kernels.
 
 	  The kunit tests require this to be enabled to get full coverage.
+
+config IOMMU_PT
+	tristate "IOMMU Page Tables"
+	select IOMMU_API
+	depends on IOMMU_SUPPORT
+	depends on GENERIC_PT
+	help
+	  Generic library for building IOMMU page tables
+
+	  IOMMU_PT provides an implementation of the page table operations
+	  related to struct iommu_domain using GENERIC_PT. It provides a single
+	  implementation of the page table operations that can be shared by
+	  multiple drivers.
 endif
drivers/iommu/generic_pt/fmt/iommu_template.h (+39)
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES
+ *
+ * Template to build the iommu module and kunit from the format and
+ * implementation headers.
+ *
+ * The format should have:
+ *  #define PT_FMT <name>
+ *  #define PT_SUPPORTED_FEATURES (BIT(PT_FEAT_xx) | BIT(PT_FEAT_yy))
+ * And optionally:
+ *  #define PT_FORCE_ENABLED_FEATURES ..
+ *  #define PT_FMT_VARIANT <suffix>
+ */
+#include <linux/args.h>
+#include <linux/stringify.h>
+
+#ifdef PT_FMT_VARIANT
+#define PTPFX_RAW \
+	CONCATENATE(CONCATENATE(PT_FMT, _), PT_FMT_VARIANT)
+#else
+#define PTPFX_RAW PT_FMT
+#endif
+
+#define PTPFX CONCATENATE(PTPFX_RAW, _)
+
+#define _PT_FMT_H PT_FMT.h
+#define PT_FMT_H __stringify(_PT_FMT_H)
+
+#define _PT_DEFS_H CONCATENATE(defs_, _PT_FMT_H)
+#define PT_DEFS_H __stringify(_PT_DEFS_H)
+
+#include <linux/generic_pt/common.h>
+#include PT_DEFS_H
+#include "../pt_defs.h"
+#include PT_FMT_H
+#include "../pt_common.h"
+
+#include "../iommu_pt.h"
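
To make the macro construction in iommu_template.h concrete, here is a stand-alone sketch of how PTPFX turns PT_FMT and PT_FMT_VARIANT into an exported symbol name. DEMO_CONCATENATE() and DEMO_STRINGIFY() are local re-creations of the usual two-level paste and stringify helpers (the kernel's CONCATENATE() and __stringify() come from linux/args.h and linux/stringify.h), and the variant name mock is a hypothetical example.

```c
#include <assert.h>
#include <string.h>

/* Two levels so that macro arguments are expanded before ## or # applies. */
#define DEMO_PASTE(a, b) a##b
#define DEMO_CONCATENATE(a, b) DEMO_PASTE(a, b)
#define DEMO_STR(x) #x
#define DEMO_STRINGIFY(x) DEMO_STR(x)

/* What a format file would define before including the template. */
#define PT_FMT amdv1
#define PT_FMT_VARIANT mock /* hypothetical variant suffix */

/* Same construction as iommu_template.h. */
#ifdef PT_FMT_VARIANT
#define PTPFX_RAW DEMO_CONCATENATE(DEMO_CONCATENATE(PT_FMT, _), PT_FMT_VARIANT)
#else
#define PTPFX_RAW PT_FMT
#endif
#define PTPFX DEMO_CONCATENATE(PTPFX_RAW, _)

/* Same pattern as iommu_pt.h: this becomes pt_iommu_amdv1_mock_init. */
#define pt_iommu_init DEMO_CONCATENATE(DEMO_CONCATENATE(pt_iommu_, PTPFX), init)

static int pt_iommu_init(void)
{
	return 42;
}

/* The module description string is built the same way. */
static const char *const pfx_name = DEMO_STRINGIFY(PTPFX_RAW);
```

The second level of indirection matters: pasting PT_FMT directly with ## would produce the literal token PT_FMT_ rather than amdv1_, because operands of ## are not macro-expanded first.
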
drivers/iommu/generic_pt/iommu_pt.h (+259)
@@ -0,0 +1,259 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES
+ *
+ * "Templated C code" for implementing the iommu operations for page tables.
+ * This is compiled multiple times, over all the page table formats to pick up
+ * the per-format definitions.
+ */
+#ifndef __GENERIC_PT_IOMMU_PT_H
+#define __GENERIC_PT_IOMMU_PT_H
+
+#include "pt_iter.h"
+
+#include <linux/export.h>
+#include <linux/iommu.h>
+#include "../iommu-pages.h"
+
+#define DOMAIN_NS(op) CONCATENATE(CONCATENATE(pt_iommu_, PTPFX), op)
+
+struct pt_iommu_collect_args {
+	struct iommu_pages_list free_list;
+};
+
+static int __collect_tables(struct pt_range *range, void *arg,
+			    unsigned int level, struct pt_table_p *table)
+{
+	struct pt_state pts = pt_init(range, level, table);
+	struct pt_iommu_collect_args *collect = arg;
+	int ret;
+
+	if (!pt_can_have_table(&pts))
+		return 0;
+
+	for_each_pt_level_entry(&pts) {
+		if (pts.type == PT_ENTRY_TABLE) {
+			iommu_pages_list_add(&collect->free_list, pts.table_lower);
+			ret = pt_descend(&pts, arg, __collect_tables);
+			if (ret)
+				return ret;
+			continue;
+		}
+	}
+	return 0;
+}
+
+static inline struct pt_table_p *table_alloc_top(struct pt_common *common,
+						 uintptr_t top_of_table,
+						 gfp_t gfp)
+{
+	struct pt_iommu *iommu_table = iommu_from_common(common);
+
+	/*
+	 * Top doesn't need the free list or otherwise, so it technically
+	 * doesn't need to use iommu pages. Use the API anyhow as the top is
+	 * usually not smaller than PAGE_SIZE to keep things simple.
+	 */
+	return iommu_alloc_pages_node_sz(
+		iommu_table->nid, gfp,
+		log2_to_int(pt_top_memsize_lg2(common, top_of_table)));
+}
+
+static void NS(get_info)(struct pt_iommu *iommu_table,
+			 struct pt_iommu_info *info)
+{
+	struct pt_common *common = common_from_iommu(iommu_table);
+	struct pt_range range = pt_top_range(common);
+	struct pt_state pts = pt_init_top(&range);
+	pt_vaddr_t pgsize_bitmap = 0;
+
+	if (pt_feature(common, PT_FEAT_DYNAMIC_TOP)) {
+		for (pts.level = 0; pts.level <= PT_MAX_TOP_LEVEL;
+		     pts.level++) {
+			if (pt_table_item_lg2sz(&pts) >= common->max_vasz_lg2)
+				break;
+			pgsize_bitmap |= pt_possible_sizes(&pts);
+		}
+	} else {
+		for (pts.level = 0; pts.level <= range.top_level; pts.level++)
+			pgsize_bitmap |= pt_possible_sizes(&pts);
+	}
+
+	/* Hide page sizes larger than the maximum OA */
+	info->pgsize_bitmap = oalog2_mod(pgsize_bitmap, common->max_oasz_lg2);
+}
+
+static void NS(deinit)(struct pt_iommu *iommu_table)
+{
+	struct pt_common *common = common_from_iommu(iommu_table);
+	struct pt_range range = pt_all_range(common);
+	struct pt_iommu_collect_args collect = {
+		.free_list = IOMMU_PAGES_LIST_INIT(collect.free_list),
+	};
+
+	iommu_pages_list_add(&collect.free_list, range.top_table);
+	pt_walk_range(&range, __collect_tables, &collect);
+
+	/*
+	 * The driver has to already have fenced the HW access to the page table
+	 * and invalidated any caching referring to this memory.
+	 */
+	iommu_put_pages_list(&collect.free_list);
+}
+
+static const struct pt_iommu_ops NS(ops) = {
+	.get_info = NS(get_info),
+	.deinit = NS(deinit),
+};
+
+static int pt_init_common(struct pt_common *common)
+{
+	struct pt_range top_range = pt_top_range(common);
+
+	if (PT_WARN_ON(top_range.top_level > PT_MAX_TOP_LEVEL))
+		return -EINVAL;
+
+	if (top_range.top_level == PT_MAX_TOP_LEVEL ||
+	    common->max_vasz_lg2 == top_range.max_vasz_lg2)
+		common->features &= ~BIT(PT_FEAT_DYNAMIC_TOP);
+
+	if (top_range.max_vasz_lg2 == PT_VADDR_MAX_LG2)
+		common->features |= BIT(PT_FEAT_FULL_VA);
+
+	/* Requested features must match features compiled into this format */
+	if ((common->features & ~(unsigned int)PT_SUPPORTED_FEATURES) ||
+	    (!IS_ENABLED(CONFIG_DEBUG_GENERIC_PT) &&
+	     (common->features & PT_FORCE_ENABLED_FEATURES) !=
+		     PT_FORCE_ENABLED_FEATURES))
+		return -EOPNOTSUPP;
+
+	if (common->max_oasz_lg2 == 0)
+		common->max_oasz_lg2 = pt_max_oa_lg2(common);
+	else
+		common->max_oasz_lg2 = min(common->max_oasz_lg2,
+					   pt_max_oa_lg2(common));
+	return 0;
+}
+
+static int pt_iommu_init_domain(struct pt_iommu *iommu_table,
+				struct iommu_domain *domain)
+{
+	struct pt_common *common = common_from_iommu(iommu_table);
+	struct pt_iommu_info info;
+	struct pt_range range;
+
+	NS(get_info)(iommu_table, &info);
+
+	domain->type = __IOMMU_DOMAIN_PAGING;
+	domain->pgsize_bitmap = info.pgsize_bitmap;
+
+	if (pt_feature(common, PT_FEAT_DYNAMIC_TOP))
+		range = _pt_top_range(common,
+				      _pt_top_set(NULL, PT_MAX_TOP_LEVEL));
+	else
+		range = pt_top_range(common);
+
+	/* A 64-bit high address space table on a 32-bit system cannot work. */
+	domain->geometry.aperture_start = (unsigned long)range.va;
+	if ((pt_vaddr_t)domain->geometry.aperture_start != range.va)
+		return -EOVERFLOW;
+
+	/*
+	 * The aperture is limited to what the API can do after considering all
+	 * the different types dma_addr_t/unsigned long/pt_vaddr_t that are used
+	 * to store a VA. Set the aperture to something that is valid for all
+	 * cases. Saturate instead of truncate the end if the types are smaller
+	 * than the top range. aperture_end should be called aperture_last.
+	 */
+	domain->geometry.aperture_end = (unsigned long)range.last_va;
+	if ((pt_vaddr_t)domain->geometry.aperture_end != range.last_va) {
+		domain->geometry.aperture_end = ULONG_MAX;
+		domain->pgsize_bitmap &= ULONG_MAX;
+	}
+	domain->geometry.force_aperture = true;
+
+	return 0;
+}
+
+static void pt_iommu_zero(struct pt_iommu_table *fmt_table)
+{
+	struct pt_iommu *iommu_table = &fmt_table->iommu;
+	struct pt_iommu cfg = *iommu_table;
+
+	static_assert(offsetof(struct pt_iommu_table, iommu.domain) == 0);
+	memset_after(fmt_table, 0, iommu.domain);
+
+	/* The caller can initialize some of these values */
+	iommu_table->nid = cfg.nid;
+}
+
+#define pt_iommu_table_cfg CONCATENATE(pt_iommu_table, _cfg)
+#define pt_iommu_init CONCATENATE(CONCATENATE(pt_iommu_, PTPFX), init)
+
+int pt_iommu_init(struct pt_iommu_table *fmt_table,
+		  const struct pt_iommu_table_cfg *cfg, gfp_t gfp)
+{
+	struct pt_iommu *iommu_table = &fmt_table->iommu;
+	struct pt_common *common = common_from_iommu(iommu_table);
+	struct pt_table_p *table_mem;
+	int ret;
+
+	if (cfg->common.hw_max_vasz_lg2 > PT_MAX_VA_ADDRESS_LG2 ||
+	    !cfg->common.hw_max_vasz_lg2 || !cfg->common.hw_max_oasz_lg2)
+		return -EINVAL;
+
+	pt_iommu_zero(fmt_table);
+	common->features = cfg->common.features;
+	common->max_vasz_lg2 = cfg->common.hw_max_vasz_lg2;
+	common->max_oasz_lg2 = cfg->common.hw_max_oasz_lg2;
+	ret = pt_iommu_fmt_init(fmt_table, cfg);
+	if (ret)
+		return ret;
+
+	if (cfg->common.hw_max_oasz_lg2 > pt_max_oa_lg2(common))
+		return -EINVAL;
+
+	ret = pt_init_common(common);
+	if (ret)
+		return ret;
+
+	if (pt_feature(common, PT_FEAT_SIGN_EXTEND) &&
+	    (pt_feature(common, PT_FEAT_FULL_VA) ||
+	     pt_feature(common, PT_FEAT_DYNAMIC_TOP)))
+		return -EINVAL;
+
+	ret = pt_iommu_init_domain(iommu_table, &iommu_table->domain);
+	if (ret)
+		return ret;
+
+	table_mem = table_alloc_top(common, common->top_of_table, gfp);
+	if (IS_ERR(table_mem))
+		return PTR_ERR(table_mem);
+	pt_top_set(common, table_mem, pt_top_get_level(common));
+
+	/* Must be last, see pt_iommu_deinit() */
+	iommu_table->ops = &NS(ops);
+	return 0;
+}
+EXPORT_SYMBOL_NS_GPL(pt_iommu_init, "GENERIC_PT_IOMMU");
+
+#ifdef pt_iommu_fmt_hw_info
+#define pt_iommu_table_hw_info CONCATENATE(pt_iommu_table, _hw_info)
+#define pt_iommu_hw_info CONCATENATE(CONCATENATE(pt_iommu_, PTPFX), hw_info)
+void pt_iommu_hw_info(struct pt_iommu_table *fmt_table,
+		      struct pt_iommu_table_hw_info *info)
+{
+	struct pt_iommu *iommu_table = &fmt_table->iommu;
+	struct pt_common *common = common_from_iommu(iommu_table);
+	struct pt_range top_range = pt_top_range(common);
+
+	pt_iommu_fmt_hw_info(fmt_table, &top_range, info);
+}
+EXPORT_SYMBOL_NS_GPL(pt_iommu_hw_info, "GENERIC_PT_IOMMU");
+#endif
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("IOMMU Page table implementation for " __stringify(PTPFX_RAW));
+MODULE_IMPORT_NS("GENERIC_PT");
+
+#endif /* __GENERIC_PT_IOMMU_PT_H */
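
The aperture handling in pt_iommu_init_domain() can be modelled with fixed-width types. This sketch assumes a 32-bit unsigned long (demo_ulong) and a 64-bit pt_vaddr_t (demo_vaddr) so that the narrowing is visible on any host. struct demo_geometry and demo_set_aperture() are hypothetical names, and the pgsize_bitmap masking done in the same function is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Stand-ins: a 32-bit "unsigned long" and a 64-bit pt_vaddr_t. */
typedef uint32_t demo_ulong;
typedef uint64_t demo_vaddr;
#define DEMO_ULONG_MAX UINT32_MAX

struct demo_geometry {
	demo_ulong aperture_start;
	demo_ulong aperture_end; /* inclusive: the last usable VA */
};

/*
 * Same rules as pt_iommu_init_domain(): a start that does not survive the
 * round trip through the narrow type is an error, while an end that does
 * not fit is saturated rather than truncated.
 */
static int demo_set_aperture(struct demo_geometry *geo, demo_vaddr va,
			     demo_vaddr last_va)
{
	geo->aperture_start = (demo_ulong)va;
	if ((demo_vaddr)geo->aperture_start != va)
		return -EOVERFLOW;

	geo->aperture_end = (demo_ulong)last_va;
	if ((demo_vaddr)geo->aperture_end != last_va)
		geo->aperture_end = DEMO_ULONG_MAX;
	return 0;
}
```

Saturating keeps the aperture a valid, contiguous prefix of the table's range; plain truncation of an end such as 0x123456fff would instead yield 0x23456fff and silently shrink the usable space.
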
include/linux/generic_pt/iommu.h (+150)
@@ -0,0 +1,150 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES
+ */
+#ifndef __GENERIC_PT_IOMMU_H
+#define __GENERIC_PT_IOMMU_H
+
+#include <linux/generic_pt/common.h>
+#include <linux/iommu.h>
+#include <linux/mm_types.h>
+
+struct pt_iommu_ops;
+
+/**
+ * DOC: IOMMU Radix Page Table
+ *
+ * The IOMMU implementation of the Generic Page Table provides an ops struct
+ * that is useful to go with an iommu_domain to serve the DMA API, IOMMUFD and
+ * the generic map/unmap interface.
+ *
+ * This interface uses a caller provided locking approach. The caller must have
+ * a VA range lock concept that prevents concurrent threads from calling ops on
+ * the same VA. Generally the range lock must be at least as large as a single
+ * map call.
+ */
+
+/**
+ * struct pt_iommu - Base structure for IOMMU page tables
+ *
+ * The format-specific struct will include this as the first member.
+ */
+struct pt_iommu {
+	/**
+	 * @domain: The core IOMMU domain. The driver should use a union to
+	 * overlay this memory with its previously existing domain struct to
+	 * create an alias.
+	 */
+	struct iommu_domain domain;
+
+	/**
+	 * @ops: Function pointers to access the API
+	 */
+	const struct pt_iommu_ops *ops;
+
+	/**
+	 * @nid: Node ID to use for table memory allocations. The IOMMU driver
+	 * may want to set the NID to the device's NID, if there are multiple
+	 * table walkers.
+	 */
+	int nid;
+};
+
+/**
+ * struct pt_iommu_info - Details about the IOMMU page table
+ *
+ * Returned from pt_iommu_ops->get_info()
+ */
+struct pt_iommu_info {
+	/**
+	 * @pgsize_bitmap: A bitmask where each set bit indicates
+	 * a page size that can be natively stored in the page table.
+	 */
+	u64 pgsize_bitmap;
+};
+
+struct pt_iommu_ops {
+	/**
+	 * @get_info: Return the pt_iommu_info structure
+	 * @iommu_table: Table to query
+	 *
+	 * Return some basic static information about the page table.
+	 */
+	void (*get_info)(struct pt_iommu *iommu_table,
+			 struct pt_iommu_info *info);
+
+	/**
+	 * @deinit: Undo a format specific init operation
+	 * @iommu_table: Table to destroy
+	 *
+	 * Release all of the memory. The caller must have already removed the
+	 * table from all HW access and all caches.
+	 */
+	void (*deinit)(struct pt_iommu *iommu_table);
+};
+
+static inline void pt_iommu_deinit(struct pt_iommu *iommu_table)
+{
+	/*
+	 * It is safe to call pt_iommu_deinit() before an init, or if init
+	 * fails. The ops pointer will only become non-NULL if deinit needs to be
+	 * run.
+	 */
+	if (iommu_table->ops)
+		iommu_table->ops->deinit(iommu_table);
+}
+
+/**
+ * struct pt_iommu_cfg - Common configuration values for all formats
+ */
+struct pt_iommu_cfg {
+	/**
+	 * @features: Features required. Only these features will be turned on.
+	 * The feature list should reflect what the IOMMU HW is capable of.
+	 */
+	unsigned int features;
+	/**
+	 * @hw_max_vasz_lg2: Maximum VA the IOMMU HW can support. This will
+	 * imply the top level of the table.
+	 */
+	u8 hw_max_vasz_lg2;
+	/**
+	 * @hw_max_oasz_lg2: Maximum OA the IOMMU HW can support. The format
+	 * might select a lower maximum OA.
+	 */
+	u8 hw_max_oasz_lg2;
+};
+
+/* Generate the exported function signatures from iommu_pt.h */
+#define IOMMU_PROTOTYPES(fmt)                                             \
+	int pt_iommu_##fmt##_init(struct pt_iommu_##fmt *table,           \
+				  const struct pt_iommu_##fmt##_cfg *cfg, \
+				  gfp_t gfp);                             \
+	void pt_iommu_##fmt##_hw_info(struct pt_iommu_##fmt *table,       \
+				      struct pt_iommu_##fmt##_hw_info *info)
+#define IOMMU_FORMAT(fmt, member)       \
+	struct pt_iommu_##fmt {         \
+		struct pt_iommu iommu;  \
+		struct pt_##fmt member; \
+	};                              \
+	IOMMU_PROTOTYPES(fmt)
+
+/*
+ * The driver should setup its domain struct like
+ *	union {
+ *		struct iommu_domain domain;
+ *		struct pt_iommu_xxx xx;
+ *	};
+ * PT_IOMMU_CHECK_DOMAIN(struct mock_iommu_domain, xx.iommu, domain);
+ *
+ * Which creates an alias between driver_domain.domain and
+ * driver_domain.xx.iommu.domain. This is to avoid a mass rename of existing
+ * driver_domain.domain users.
+ */
+#define PT_IOMMU_CHECK_DOMAIN(s, pt_iommu_memb, domain_memb) \
+	static_assert(offsetof(s, pt_iommu_memb.domain) ==   \
+		      offsetof(s, domain_memb))
+
+#undef IOMMU_PROTOTYPES
+#undef IOMMU_FORMAT
+#endif
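
As an illustration of the pgsize_bitmap field documented above, the following sketch builds the bitmap the way the get_info() implementation in iommu_pt.h does: it ORs together the sizes each level can store, then hides sizes beyond the output address width. It assumes a hypothetical x86-64-like layout (4 KiB, 2 MiB and 1 GiB leaves), treats oalog2_mod() as keeping only the bits below max_oasz_lg2, and ignores the PT_FEAT_DYNAMIC_TOP branch. demo_pgsize_bitmap() and demo_min_pgsize() are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical per-level leaf sizes (log2) for a 4-level radix table:
 * 4 KiB, 2 MiB and 1 GiB leaves; 0 marks a level that holds no leaves.
 */
static const unsigned int demo_leaf_lg2[] = { 12, 21, 30, 0 };
#define DEMO_LEVELS 4

/* OR together every page size any level up to top_level can store. */
static uint64_t demo_pgsize_bitmap(unsigned int top_level,
				   unsigned int max_oasz_lg2)
{
	uint64_t bitmap = 0;

	for (unsigned int level = 0; level <= top_level && level < DEMO_LEVELS;
	     level++)
		if (demo_leaf_lg2[level])
			bitmap |= UINT64_C(1) << demo_leaf_lg2[level];

	/* Keep only the bits below max_oasz_lg2 (assumed oalog2_mod()). */
	if (max_oasz_lg2 < 64)
		bitmap &= (UINT64_C(1) << max_oasz_lg2) - 1;
	return bitmap;
}

/* Smallest supported page size: the lowest set bit, or 0 if none. */
static uint64_t demo_min_pgsize(uint64_t bitmap)
{
	return bitmap & -bitmap;
}
```

Encoding sizes as single bits lets a caller pick the largest size that fits an aligned range with simple mask operations instead of walking a list.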