Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'drm-xe-next-2026-03-25' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next

Hi Dave and Sima,

Here goes our third, and perhaps final, drm-xe-next PR towards 7.1.

In the big things we have:
- THP support in drm_pagemap
- xe_vm_get_property_ioctl

Thanks,
Matt

UAPI Changes:
- Implement xe_vm_get_property_ioctl (Jonathan)

Cross-subsystem Changes:
- Enable THP support in drm_pagemap (Francois, Brost)

Core Changes:
- Improve VF FLR synchronization for Xe VFIO (Piotr)

Driver Changes:
- Fix confusion with locals on context creation (Tomasz, Fixes)
- Add new SVM copy GT stats per size (Francois)
- always keep track of remap prev/next (Auld, Fixes)
- AuxCCS handling and render compression modifiers (Tvrtko)
- Implement recent spec updates to Wa_16025250150 (Roper)
- xe3p_lpg: L2 flush optimization (Tejas)
- vf: Improve getting clean NULL context (Wajdeczko)
- pf: Fix use-after-free in migration restore (Winiarski, Fixes)
- Fix format specifier for printing pointer differences (Nathan Chancellor, Fixes)
- Extend Wa_14026781792 for xe3lpg (Niton)
- xe3p_lpg: Add Wa_16029437861 (Varun)
- Fix spelling mistakes and comment style in ttm_resource.c (Varun)
- Merge drm/drm-next into drm-xe-next (Thomas)
- Fix missing runtime PM reference in ccs_mode_store (Sanjay, Fixes)
- Fix uninitialized new_ts when capturing context timestamp (Umesh)
- Allow reading after disabling OA stream (Ashutosh)
- Page Reclamation Fixes (Brian Nguyen, Fixes)
- Include running dword offset in default_lrc dumps (Roper)
- Assert/Deassert I2C IRQ (Raag)
- Fixup reset, wedge, unload corner cases (Zhanjun, Brost)
- Fail immediately on GuC load error (Daniele)
- Fix kernel-doc for DRM_XE_VM_BIND_FLAG_DECOMPRESS (Niton, Fixes)
- Drop redundant entries for Wa_16021867713 & Wa_14019449301 (Roper, Fixes)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/acS5xmWC3ivPTmyV@gsse-cloud1.jf.intel.com

+1340 -293
+5 -2
drivers/gpu/drm/drm_gpusvm.c
···
 		order = drm_gpusvm_hmm_pfn_to_order(pfns[i], i, npages);
 		if (is_device_private_page(page) ||
 		    is_device_coherent_page(page)) {
+			struct drm_pagemap_zdd *__zdd =
+				drm_pagemap_page_zone_device_data(page);
+
 			if (!ctx->allow_mixed &&
-			    zdd != page->zone_device_data && i > 0) {
+			    zdd != __zdd && i > 0) {
 				err = -EOPNOTSUPP;
 				goto err_unmap;
 			}
-			zdd = page->zone_device_data;
+			zdd = __zdd;
 			if (pagemap != page_pgmap(page)) {
 				if (pagemap) {
 					err = -EOPNOTSUPP;
+128 -29
drivers/gpu/drm/drm_pagemap.c
···
 }

 /**
- * drm_pagemap_migration_unlock_put_page() - Put a migration page
- * @page: Pointer to the page to put
+ * drm_pagemap_migration_unlock_put_folio() - Put a migration folio
+ * @folio: Pointer to the folio to put
  *
- * This function unlocks and puts a page.
+ * This function unlocks and puts a folio.
  */
-static void drm_pagemap_migration_unlock_put_page(struct page *page)
+static void drm_pagemap_migration_unlock_put_folio(struct folio *folio)
 {
-	unlock_page(page);
-	put_page(page);
+	folio_unlock(folio);
+	folio_put(folio);
 }

 /**
···
 {
 	unsigned long i;

-	for (i = 0; i < npages; ++i) {
+	for (i = 0; i < npages;) {
 		struct page *page;
+		struct folio *folio;
+		unsigned int order = 0;

 		if (!migrate_pfn[i])
-			continue;
+			goto next;

 		page = migrate_pfn_to_page(migrate_pfn[i]);
-		drm_pagemap_migration_unlock_put_page(page);
+		folio = page_folio(page);
+		order = folio_order(folio);
+
+		drm_pagemap_migration_unlock_put_folio(folio);
 		migrate_pfn[i] = 0;
+
+next:
+		i += NR_PAGES(order);
 	}
 }

 /**
  * drm_pagemap_get_devmem_page() - Get a reference to a device memory page
  * @page: Pointer to the page
+ * @order: Order
  * @zdd: Pointer to the GPU SVM zone device data
  *
  * This function associates the given page with the specified GPU SVM zone
  * device data and initializes it for zone device usage.
  */
 static void drm_pagemap_get_devmem_page(struct page *page,
+					unsigned int order,
 					struct drm_pagemap_zdd *zdd)
 {
-	page->zone_device_data = drm_pagemap_zdd_get(zdd);
-	zone_device_page_init(page, page_pgmap(page), 0);
+	zone_device_folio_init((struct folio *)page, zdd->dpagemap->pagemap,
+			       order);
+	folio_set_zone_device_data(page_folio(page), drm_pagemap_zdd_get(zdd));
 }

 /**
···
 		order = folio_order(folio);

 		if (is_device_private_page(page)) {
-			struct drm_pagemap_zdd *zdd = page->zone_device_data;
+			struct drm_pagemap_zdd *zdd = drm_pagemap_page_zone_device_data(page);
 			struct drm_pagemap *dpagemap = zdd->dpagemap;
 			struct drm_pagemap_addr addr;

···
 			goto next;

 		if (is_zone_device_page(page)) {
-			struct drm_pagemap_zdd *zdd = page->zone_device_data;
+			struct drm_pagemap_zdd *zdd = drm_pagemap_page_zone_device_data(page);
 			struct drm_pagemap *dpagemap = zdd->dpagemap;

 			dpagemap->ops->device_unmap(dpagemap, dev, &pagemap_addr[i]);
···
 }

 /**
+ * drm_pagemap_cpages() - Count collected pages
+ * @migrate_pfn: Array of migrate_pfn entries to account
+ * @npages: Number of entries in @migrate_pfn
+ *
+ * Compute the total number of minimum-sized pages represented by the
+ * collected entries in @migrate_pfn. The total is derived from the
+ * order encoded in each entry.
+ *
+ * Return: Total number of minimum-sized pages.
+ */
+static int drm_pagemap_cpages(unsigned long *migrate_pfn, unsigned long npages)
+{
+	unsigned long i, cpages = 0;
+
+	for (i = 0; i < npages;) {
+		struct page *page = migrate_pfn_to_page(migrate_pfn[i]);
+		struct folio *folio;
+		unsigned int order = 0;
+
+		if (page) {
+			folio = page_folio(page);
+			order = folio_order(folio);
+			cpages += NR_PAGES(order);
+		} else if (migrate_pfn[i] & MIGRATE_PFN_COMPOUND) {
+			order = HPAGE_PMD_ORDER;
+			cpages += NR_PAGES(order);
+		}
+
+		i += NR_PAGES(order);
+	}
+
+	return cpages;
+}
+
+/**
  * drm_pagemap_migrate_to_devmem() - Migrate a struct mm_struct range to device memory
  * @devmem_allocation: The device memory allocation to migrate to.
  * The caller should hold a reference to the device memory allocation,
···
 		.end		= end,
 		.pgmap_owner	= pagemap->owner,
 		.flags		= MIGRATE_VMA_SELECT_SYSTEM | MIGRATE_VMA_SELECT_DEVICE_COHERENT |
-		MIGRATE_VMA_SELECT_DEVICE_PRIVATE,
+		MIGRATE_VMA_SELECT_DEVICE_PRIVATE | MIGRATE_VMA_SELECT_COMPOUND,
 	};
 	unsigned long i, npages = npages_in_range(start, end);
 	unsigned long own_pages = 0, migrated_pages = 0;
···
 		goto err_free;
 	}

-	if (migrate.cpages != npages) {
+	if (migrate.cpages != npages &&
+	    drm_pagemap_cpages(migrate.src, npages) != npages) {
 		/*
 		 * Some pages to migrate. But we want to migrate all or
 		 * nothing. Raced or unknown device pages.
···

 	own_pages = 0;

-	for (i = 0; i < npages; ++i) {
+	for (i = 0; i < npages;) {
+		unsigned long j;
 		struct page *page = pfn_to_page(migrate.dst[i]);
 		struct page *src_page = migrate_pfn_to_page(migrate.src[i]);
-		cur.start = i;
+		unsigned int order = 0;

+		cur.start = i;
 		pages[i] = NULL;
 		if (src_page && is_device_private_page(src_page)) {
-			struct drm_pagemap_zdd *src_zdd = src_page->zone_device_data;
+			struct drm_pagemap_zdd *src_zdd =
+				drm_pagemap_page_zone_device_data(src_page);

 			if (page_pgmap(src_page) == pagemap &&
 			    !mdetails->can_migrate_same_pagemap) {
 				migrate.dst[i] = 0;
 				own_pages++;
-				continue;
+				goto next;
 			}
 			if (mdetails->source_peer_migrates) {
 				cur.dpagemap = src_zdd->dpagemap;
···
 			pages[i] = page;
 		}
 		migrate.dst[i] = migrate_pfn(migrate.dst[i]);
-		drm_pagemap_get_devmem_page(page, zdd);
+
+		if (migrate.src[i] & MIGRATE_PFN_COMPOUND) {
+			drm_WARN_ONCE(dpagemap->drm, src_page &&
+				      folio_order(page_folio(src_page)) != HPAGE_PMD_ORDER,
+				      "Unexpected folio order\n");
+
+			order = HPAGE_PMD_ORDER;
+			migrate.dst[i] |= MIGRATE_PFN_COMPOUND;
+
+			for (j = 1; j < NR_PAGES(order) && i + j < npages; j++)
+				migrate.dst[i + j] = 0;
+		}
+
+		drm_pagemap_get_devmem_page(page, order, zdd);

 		/* If we switched the migrating drm_pagemap, migrate previous pages now */
 		err = drm_pagemap_migrate_range(devmem_allocation, migrate.src, migrate.dst,
···
 			npages = i + 1;
 			goto err_finalize;
 		}
+
+next:
+		i += NR_PAGES(order);
 	}
+
 	cur.start = npages;
 	cur.ops = NULL; /* Force migration */
 	err = drm_pagemap_migrate_range(devmem_allocation, migrate.src, migrate.dst,
···
 			goto next;

 		if (fault_page) {
-			if (src_page->zone_device_data !=
-			    fault_page->zone_device_data)
+			if (drm_pagemap_page_zone_device_data(src_page) !=
+			    drm_pagemap_page_zone_device_data(fault_page))
 				goto next;
 		}

···
 		page = folio_page(folio, 0);
 		mpfn[i] = migrate_pfn(page_to_pfn(page));

+		if (order)
+			mpfn[i] |= MIGRATE_PFN_COMPOUND;
 next:
 		if (page)
 			addr += page_size(page);
···
 	if (err)
 		goto err_finalize;

-	for (i = 0; i < npages; ++i)
+	for (i = 0; i < npages;) {
+		unsigned int order = 0;
+
 		pages[i] = migrate_pfn_to_page(src[i]);
+		if (pages[i])
+			order = folio_order(page_folio(pages[i]));
+
+		i += NR_PAGES(order);
+	}

 	err = ops->copy_to_ram(pages, pagemap_addr, npages, NULL);
 	if (err)
···
 		.vma		= vas,
 		.pgmap_owner	= page_pgmap(page)->owner,
 		.flags		= MIGRATE_VMA_SELECT_DEVICE_PRIVATE |
-		MIGRATE_VMA_SELECT_DEVICE_COHERENT,
+		MIGRATE_VMA_SELECT_DEVICE_COHERENT |
+		MIGRATE_VMA_SELECT_COMPOUND,
 		.fault_page	= page,
 	};
 	struct drm_pagemap_migrate_details mdetails = {};
···
 	void *buf;
 	int i, err = 0;

-	zdd = page->zone_device_data;
+	zdd = drm_pagemap_page_zone_device_data(page);
 	if (time_before64(get_jiffies_64(), zdd->devmem_allocation->timeslice_expiration))
 		return 0;

···
 	if (err)
 		goto err_finalize;

-	for (i = 0; i < npages; ++i)
+	for (i = 0; i < npages;) {
+		unsigned int order = 0;
+
 		pages[i] = migrate_pfn_to_page(migrate.src[i]);
+		if (pages[i])
+			order = folio_order(page_folio(pages[i]));
+
+		i += NR_PAGES(order);
+	}

 	err = ops->copy_to_ram(pages, pagemap_addr, npages, NULL);
 	if (err)
···
  */
 static void drm_pagemap_folio_free(struct folio *folio)
 {
-	drm_pagemap_zdd_put(folio->page.zone_device_data);
+	struct page *page = folio_page(folio, 0);
+
+	drm_pagemap_zdd_put(drm_pagemap_page_zone_device_data(page));
 }

 /**
···
  */
 static vm_fault_t drm_pagemap_migrate_to_ram(struct vm_fault *vmf)
 {
-	struct drm_pagemap_zdd *zdd = vmf->page->zone_device_data;
+	struct drm_pagemap_zdd *zdd = drm_pagemap_page_zone_device_data(vmf->page);
 	int err;

 	err = __drm_pagemap_migrate_to_ram(vmf->vma,
···
 	return err ? VM_FAULT_SIGBUS : 0;
 }

+static void drm_pagemap_folio_split(struct folio *orig_folio, struct folio *new_folio)
+{
+	struct drm_pagemap_zdd *zdd;
+
+	if (!new_folio)
+		return;
+
+	new_folio->pgmap = orig_folio->pgmap;
+	zdd = folio_zone_device_data(orig_folio);
+	folio_set_zone_device_data(new_folio, drm_pagemap_zdd_get(zdd));
+}
+
 static const struct dev_pagemap_ops drm_pagemap_pagemap_ops = {
 	.folio_free = drm_pagemap_folio_free,
 	.migrate_to_ram = drm_pagemap_migrate_to_ram,
+	.folio_split = drm_pagemap_folio_split,
 };

 /**
···
  */
 struct drm_pagemap *drm_pagemap_page_to_dpagemap(struct page *page)
 {
-	struct drm_pagemap_zdd *zdd = page->zone_device_data;
+	struct drm_pagemap_zdd *zdd = drm_pagemap_page_zone_device_data(page);

 	return zdd->devmem_allocation->dpagemap;
 }
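The folio-aware loops in this file share one stepping rule: a slot that heads a folio of order N accounts for 2^N base pages, and the index then jumps past the tail slots. The following is a minimal userspace sketch of that counting rule, not the kernel code. `sketch_entry`, `sketch_count_pages`, `sketch_demo` and `SKETCH_NR_PAGES` are hypothetical stand-ins, and the sketch leaves out the MIGRATE_PFN_COMPOUND fallback that drm_pagemap_cpages() applies when a slot has no page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for NR_PAGES(): base pages covered by one folio. */
#define SKETCH_NR_PAGES(order) (1UL << (order))

/* Hypothetical, simplified view of one migrate_pfn slot. */
struct sketch_entry {
	int present;		/* non-zero if this slot heads a collected folio */
	unsigned int order;	/* folio order; 0 is a single base page */
};

/* Count base pages, skipping the tail slots covered by each folio. */
static unsigned long sketch_count_pages(const struct sketch_entry *e, size_t n)
{
	unsigned long total = 0;
	size_t i = 0;

	while (i < n) {
		unsigned int order = 0;

		if (e[i].present) {
			order = e[i].order;
			total += SKETCH_NR_PAGES(order);
		}
		/* Empty slots advance by one; folio heads advance by 2^order. */
		i += SKETCH_NR_PAGES(order);
	}

	return total;
}

/* Example layout: one order-2 folio, one hole, one base page. */
static unsigned long sketch_demo(void)
{
	struct sketch_entry e[6] = {
		[0] = { .present = 1, .order = 2 },	/* covers slots 0..3 */
		[5] = { .present = 1, .order = 0 },	/* slot 4 is a hole */
	};

	return sketch_count_pages(e, 6);
}
```

In this layout the hole at slot 4 contributes nothing, so the count is lower than the slot count. That is the situation the new `migrate.cpages != npages` double check in drm_pagemap_migrate_to_devmem() appears intended to tell apart from a range fully covered by compound entries.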
+14 -14
drivers/gpu/drm/ttm/ttm_resource.c
···
 #include <drm/drm_print.h>
 #include <drm/drm_util.h>

-/* Detach the cursor from the bulk move list*/
+/* Detach the cursor from the bulk move list */
 static void
 ttm_resource_cursor_clear_bulk(struct ttm_resource_cursor *cursor)
 {
···
  * ttm_resource_cursor_fini() - Finalize the LRU list cursor usage
  * @cursor: The struct ttm_resource_cursor to finalize.
  *
- * The function pulls the LRU list cursor off any lists it was previusly
+ * The function pulls the LRU list cursor off any lists it was previously
  * attached to. Needs to be called with the LRU lock held. The function
- * can be called multiple times after eachother.
+ * can be called multiple times after each other.
  */
 void ttm_resource_cursor_fini(struct ttm_resource_cursor *cursor)
 {
···
 }

 /**
- * ttm_resource_init - resource object constructure
- * @bo: buffer object this resources is allocated for
+ * ttm_resource_init - resource object constructor
+ * @bo: buffer object this resource is allocated for
  * @place: placement of the resource
- * @res: the resource object to inistilize
+ * @res: the resource object to initialize
  *
  * Initialize a new resource object. Counterpart of ttm_resource_fini().
  */
···
  * @size: How many bytes the new allocation needs.
  *
  * Test if @res intersects with @place and @size. Used for testing if evictions
- * are valueable or not.
+ * are valuable or not.
  *
  * Returns true if the res placement intersects with @place and @size.
  */
···
  * @bdev: ttm device this manager belongs to
  * @size: size of managed resources in arbitrary units
  *
- * Initialise core parts of a manager object.
+ * Initialize core parts of a manager object.
  */
 void ttm_resource_manager_init(struct ttm_resource_manager *man,
 			       struct ttm_device *bdev,
···
 /*
  * ttm_resource_manager_evict_all
  *
- * @bdev - device to use
- * @man - manager to use
+ * @bdev: device to use
+ * @man: manager to use
  *
  * Evict all the objects out of a memory manager until it is empty.
  * Part of memory manager cleanup sequence.
···

 /**
  * ttm_kmap_iter_linear_io_fini - Clean up an iterator for linear io memory
- * @iter_io: The iterator to initialize
+ * @iter_io: The iterator to finalize
  * @bdev: The TTM device
  * @mem: The ttm resource representing the iomap.
  *
···
 /**
  * ttm_resource_manager_create_debugfs - Create debugfs entry for specified
  * resource manager.
- * @man: The TTM resource manager for which the debugfs stats file be creates
+ * @man: The TTM resource manager for which the debugfs stats file to be created
  * @parent: debugfs directory in which the file will reside
  * @name: The filename to create.
  *
- * This function setups up a debugfs file that can be used to look
+ * This function sets up a debugfs file that can be used to look
  * at debug statistics of the specified ttm_resource_manager.
  */
 void ttm_resource_manager_create_debugfs(struct ttm_resource_manager *man,
-					 struct dentry * parent,
+					 struct dentry *parent,
 					 const char *name)
 {
 #if defined(CONFIG_DEBUG_FS)
+8 -4
drivers/gpu/drm/xe/display/intel_fbdev_fb.c
···
 	if (intel_fbdev_fb_prefer_stolen(drm, size)) {
 		obj = xe_bo_create_pin_map_novm(xe, xe_device_get_root_tile(xe),
 						size,
-						ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
+						ttm_bo_type_kernel,
+						XE_BO_FLAG_FORCE_WC |
 						XE_BO_FLAG_STOLEN |
-						XE_BO_FLAG_GGTT, false);
+						XE_BO_FLAG_GGTT,
+						false);
 		if (!IS_ERR(obj))
 			drm_info(&xe->drm, "Allocated fbdev into stolen\n");
 		else
···

 	if (IS_ERR(obj)) {
 		obj = xe_bo_create_pin_map_novm(xe, xe_device_get_root_tile(xe), size,
-						ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
+						ttm_bo_type_kernel,
+						XE_BO_FLAG_FORCE_WC |
 						XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
-						XE_BO_FLAG_GGTT, false);
+						XE_BO_FLAG_GGTT,
+						false);
 	}

 	if (IS_ERR(obj)) {
+8
drivers/gpu/drm/xe/display/xe_display.c
···
 	.synchronize = irq_synchronize,
 };

+static bool has_auxccs(struct drm_device *drm)
+{
+	struct xe_device *xe = to_xe_device(drm);
+
+	return xe->info.platform == XE_ALDERLAKE_P;
+}
+
 static const struct intel_display_parent_interface parent = {
 	.bo = &xe_display_bo_interface,
 	.dsb = &xe_display_dsb_interface,
···
 	.pcode = &xe_display_pcode_interface,
 	.rpm = &xe_display_rpm_interface,
 	.stolen = &xe_display_stolen_interface,
+	.has_auxccs = has_auxccs,
 };

 /**
+3 -3
drivers/gpu/drm/xe/display/xe_display_bo.c
···
 	if (ret)
 		goto err;

-	if (!(bo->flags & XE_BO_FLAG_SCANOUT)) {
+	if (!(bo->flags & XE_BO_FLAG_FORCE_WC)) {
 		/*
-		 * XE_BO_FLAG_SCANOUT should ideally be set at creation, or is
+		 * XE_BO_FLAG_FORCE_WC should ideally be set at creation, or is
 		 * automatically set when creating FB. We cannot change caching
 		 * mode when the bo is VM_BINDed, so we can only set
 		 * coherency with display when unbound.
···
 			ret = -EINVAL;
 			goto err;
 		}
-		bo->flags |= XE_BO_FLAG_SCANOUT;
+		bo->flags |= XE_BO_FLAG_FORCE_WC;
 	}
 	ttm_bo_unreserve(&bo->ttm);
 	return 0;
+3 -1
drivers/gpu/drm/xe/display/xe_dsb_buffer.c
···
 					PAGE_ALIGN(size),
 					ttm_bo_type_kernel,
 					XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
-					XE_BO_FLAG_SCANOUT | XE_BO_FLAG_GGTT, false);
+					XE_BO_FLAG_FORCE_WC |
+					XE_BO_FLAG_GGTT,
+					false);
 	if (IS_ERR(obj)) {
 		ret = PTR_ERR(obj);
 		goto err_pin_map;
+85 -33
drivers/gpu/drm/xe/display/xe_fb_pin.c
···
 	*dpt_ofs = ALIGN(*dpt_ofs, 4096);
 }

-static void
-write_dpt_remapped(struct xe_bo *bo, struct iosys_map *map, u32 *dpt_ofs,
-		   u32 bo_ofs, u32 width, u32 height, u32 src_stride,
-		   u32 dst_stride)
+static unsigned int
+write_dpt_padding(struct iosys_map *map, unsigned int dest, unsigned int pad)
+{
+	/* The DE ignores the PTEs for the padding tiles */
+	return dest + pad * sizeof(u64);
+}
+
+static unsigned int
+write_dpt_remapped_linear(struct xe_bo *bo, struct iosys_map *map,
+			  unsigned int dest,
+			  const struct intel_remapped_plane_info *plane)
 {
 	struct xe_device *xe = xe_bo_device(bo);
 	struct xe_ggtt *ggtt = xe_device_get_root_tile(xe)->mem.ggtt;
-	u32 column, row;
-	u64 pte = xe_ggtt_encode_pte_flags(ggtt, bo, xe->pat.idx[XE_CACHE_NONE]);
+	const u64 pte = xe_ggtt_encode_pte_flags(ggtt, bo,
+						 xe->pat.idx[XE_CACHE_NONE]);
+	unsigned int offset = plane->offset * XE_PAGE_SIZE;
+	unsigned int size = plane->size;

-	for (row = 0; row < height; row++) {
-		u32 src_idx = src_stride * row + bo_ofs;
+	while (size--) {
+		u64 addr = xe_bo_addr(bo, offset, XE_PAGE_SIZE);

-		for (column = 0; column < width; column++) {
-			u64 addr = xe_bo_addr(bo, src_idx * XE_PAGE_SIZE, XE_PAGE_SIZE);
-			iosys_map_wr(map, *dpt_ofs, u64, pte | addr);
-
-			*dpt_ofs += 8;
-			src_idx++;
-		}
-
-		/* The DE ignores the PTEs for the padding tiles */
-		*dpt_ofs += (dst_stride - width) * 8;
+		iosys_map_wr(map, dest, u64, addr | pte);
+		dest += sizeof(u64);
+		offset += XE_PAGE_SIZE;
 	}

-	/* Align to next page */
-	*dpt_ofs = ALIGN(*dpt_ofs, 4096);
+	return dest;
+}
+
+static unsigned int
+write_dpt_remapped_tiled(struct xe_bo *bo, struct iosys_map *map,
+			 unsigned int dest,
+			 const struct intel_remapped_plane_info *plane)
+{
+	struct xe_device *xe = xe_bo_device(bo);
+	struct xe_ggtt *ggtt = xe_device_get_root_tile(xe)->mem.ggtt;
+	const u64 pte = xe_ggtt_encode_pte_flags(ggtt, bo,
+						 xe->pat.idx[XE_CACHE_NONE]);
+	unsigned int offset, column, row;
+
+	for (row = 0; row < plane->height; row++) {
+		offset = (plane->offset + plane->src_stride * row) *
+			 XE_PAGE_SIZE;
+
+		for (column = 0; column < plane->width; column++) {
+			u64 addr = xe_bo_addr(bo, offset, XE_PAGE_SIZE);
+
+			iosys_map_wr(map, dest, u64, addr | pte);
+			dest += sizeof(u64);
+			offset += XE_PAGE_SIZE;
+		}
+
+		dest = write_dpt_padding(map, dest,
+					 plane->dst_stride - plane->width);
+	}
+
+	return dest;
+}
+
+static void
+write_dpt_remapped(struct xe_bo *bo,
+		   const struct intel_remapped_info *remap_info,
+		   struct iosys_map *map)
+{
+	unsigned int i, dest = 0;
+
+	for (i = 0; i < ARRAY_SIZE(remap_info->plane); i++) {
+		const struct intel_remapped_plane_info *plane =
+			&remap_info->plane[i];
+
+		if (!plane->linear && !plane->width && !plane->height)
+			continue;
+
+		if (dest && remap_info->plane_alignment) {
+			const unsigned int index = dest / sizeof(u64);
+			const unsigned int pad =
+				ALIGN(index, remap_info->plane_alignment) -
+				index;
+
+			dest = write_dpt_padding(map, dest, pad);
+		}
+
+		if (plane->linear)
+			dest = write_dpt_remapped_linear(bo, map, dest, plane);
+		else
+			dest = write_dpt_remapped_tiled(bo, map, dest, plane);
+	}
 }

 static int __xe_pin_fb_vma_dpt(const struct intel_framebuffer *fb,
···
 						   ttm_bo_type_kernel,
 						   XE_BO_FLAG_SYSTEM |
 						   XE_BO_FLAG_GGTT |
-						   XE_BO_FLAG_PAGETABLE,
+						   XE_BO_FLAG_PAGETABLE |
+						   XE_BO_FLAG_FORCE_WC,
 						   alignment, false);
 	if (IS_ERR(dpt))
 		return PTR_ERR(dpt);
···
 			iosys_map_wr(&dpt->vmap, x * 8, u64, pte | addr);
 		}
 	} else if (view->type == I915_GTT_VIEW_REMAPPED) {
-		const struct intel_remapped_info *remap_info = &view->remapped;
-		u32 i, dpt_ofs = 0;
-
-		for (i = 0; i < ARRAY_SIZE(remap_info->plane); i++)
-			write_dpt_remapped(bo, &dpt->vmap, &dpt_ofs,
-					   remap_info->plane[i].offset,
-					   remap_info->plane[i].width,
-					   remap_info->plane[i].height,
-					   remap_info->plane[i].src_stride,
-					   remap_info->plane[i].dst_stride);
-
+		write_dpt_remapped(bo, &view->remapped, &dpt->vmap);
 	} else {
 		const struct intel_rotation_info *rot_info = &view->rotated;
 		u32 i, dpt_ofs = 0;
···
 		return 0;

 	/* We reject creating !SCANOUT fb's, so this is weird.. */
-	drm_WARN_ON(bo->ttm.base.dev, !(bo->flags & XE_BO_FLAG_SCANOUT));
+	drm_WARN_ON(bo->ttm.base.dev, !(bo->flags & XE_BO_FLAG_FORCE_WC));

 	vma = __xe_pin_fb_vma(intel_fb, &new_plane_state->view.gtt, alignment);

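The reworked write_dpt_remapped() pads between planes by converting the byte offset into a PTE index, rounding that index up to the plane alignment, and skipping the difference. Here is a small userspace sketch of that arithmetic, assuming 8-byte PTEs and a power-of-two alignment (as the kernel's ALIGN() requires). `sketch_pad_to_alignment` and `SKETCH_ALIGN` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for ALIGN(); only valid for power-of-two alignments. */
#define SKETCH_ALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))

/*
 * Return the byte offset at which the next plane would start, given the
 * current byte offset and an alignment expressed in PTE entries.
 */
static unsigned int sketch_pad_to_alignment(unsigned int dest,
					    unsigned int alignment)
{
	unsigned int index, pad;

	/* Mirrors the "dest && plane_alignment" guard: nothing to pad. */
	if (!dest || !alignment)
		return dest;

	index = dest / sizeof(uint64_t);		/* bytes -> PTE index */
	pad = SKETCH_ALIGN(index, alignment) - index;	/* entries to skip */

	return dest + pad * sizeof(uint64_t);		/* entries -> bytes */
}
```

For example, with three PTEs already written (24 bytes) and an alignment of four entries, one entry is skipped and the next plane starts at byte 32. As in the diff, the skipped entries are not written at all.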
+1 -1
drivers/gpu/drm/xe/display/xe_initial_plane.c
···
 	if (plane_config->size == 0)
 		return NULL;

-	flags = XE_BO_FLAG_SCANOUT | XE_BO_FLAG_GGTT;
+	flags = XE_BO_FLAG_FORCE_WC | XE_BO_FLAG_GGTT;

 	base = round_down(plane_config->base, page_size);
 	if (IS_DGFX(xe)) {
+19
drivers/gpu/drm/xe/instructions/xe_mi_commands.h
···
 #define MI_FORCE_WAKEUP			__MI_INSTR(0x1D)
 #define MI_MATH(n)			(__MI_INSTR(0x1A) | XE_INSTR_NUM_DW((n) + 1))

+#define MI_SEMAPHORE_WAIT		(__MI_INSTR(0x1c) | XE_INSTR_NUM_DW(5))
+#define   MI_SEMW_GGTT			REG_BIT(22)
+#define   MI_SEMW_POLL			REG_BIT(15)
+#define   MI_SEMW_COMPARE_OP_MASK	REG_GENMASK(14, 12)
+#define     COMPARE_OP_SAD_GT_SDD	0
+#define     COMPARE_OP_SAD_GTE_SDD	1
+#define     COMPARE_OP_SAD_LT_SDD	2
+#define     COMPARE_OP_SAD_LTE_SDD	3
+#define     COMPARE_OP_SAD_EQ_SDD	4
+#define     COMPARE_OP_SAD_NEQ_SDD	5
+#define   MI_SEMW_COMPARE(OP)		REG_FIELD_PREP(MI_SEMW_COMPARE_OP_MASK, COMPARE_OP_##OP)
+#define   MI_SEMW_TOKEN(token)		REG_FIELD_PREP(REG_GENMASK(9, 2), (token))
+
 #define MI_STORE_DATA_IMM		__MI_INSTR(0x20)
 #define   MI_SDI_GGTT			REG_BIT(22)
 #define   MI_SDI_LEN_DW			GENMASK(9, 0)
···
 #define MI_SET_APPID			__MI_INSTR(0x0e)
 #define MI_SET_APPID_SESSION_ID_MASK	REG_GENMASK(6, 0)
 #define MI_SET_APPID_SESSION_ID(x)	REG_FIELD_PREP(MI_SET_APPID_SESSION_ID_MASK, x)
+
+#define MI_SEMAPHORE_WAIT_TOKEN		(__MI_INSTR(0x1c) | XE_INSTR_NUM_DW(5)) /* XeLP+ */
+#define   MI_SEMAPHORE_REGISTER_POLL	REG_BIT(16)
+#define   MI_SEMAPHORE_POLL		REG_BIT(15)
+#define   MI_SEMAPHORE_CMP_OP_MASK	REG_GENMASK(14, 12)
+#define   MI_SEMAPHORE_SAD_EQ_SDD	REG_FIELD_PREP(MI_SEMAPHORE_CMP_OP_MASK, 4)

 #endif
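The new MI_SEMW_COMPARE() and MI_SEMW_TOKEN() macros place a small value into a bit field of the command dword: bits 14:12 for the compare operation and bits 9:2 for the token. The following userspace sketch shows that shift-and-mask step at run time. `SKETCH_GENMASK` and `sketch_field_prep` are hypothetical simplifications for illustration; the kernel's REG_GENMASK() and REG_FIELD_PREP() are compile-time macros with extra checking.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit mask with bits h..l (inclusive) set, 31 >= h >= l. */
#define SKETCH_GENMASK(h, l) \
	((0xFFFFFFFFu >> (31 - (h))) & (0xFFFFFFFFu << (l)))

/* Shift val up to the lowest set bit of mask and drop anything outside it. */
static uint32_t sketch_field_prep(uint32_t mask, uint32_t val)
{
	unsigned int shift = 0;

	if (!mask)
		return 0;

	while (!((mask >> shift) & 1u))
		shift++;

	return (val << shift) & mask;
}
```

With the values from the hunk, compare operation 4 (SAD_EQ_SDD) in bits 14:12 encodes as 0x4000, and a token of 0xff in bits 9:2 encodes as 0x3fc.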
+8
drivers/gpu/drm/xe/regs/xe_engine_regs.h
···
 #define RING_BBADDR(base)			XE_REG((base) + 0x140)
 #define RING_BBADDR_UDW(base)			XE_REG((base) + 0x168)

+#define PR_CTR_CTRL(base)			XE_REG((base) + 0x178)
+#define   CTR_COUNT_SELECT_FF			REG_BIT(31)
+#define   CTR_LOGIC_OP_MASK			REG_GENMASK(30, 0)
+#define     CTR_START				0
+#define     CTR_STOP				1
+#define   CTR_LOGIC_OP(OP)			REG_FIELD_PREP(CTR_LOGIC_OP_MASK, CTR_##OP)
+#define PR_CTR_THRSH(base)			XE_REG((base) + 0x17c)
+
 #define BCS_SWCTRL(base)			XE_REG((base) + 0x200, XE_REG_OPTION_MASKED)
 #define   BCS_SWCTRL_DISABLE_256B		REG_BIT(2)

+1
drivers/gpu/drm/xe/regs/xe_gt_regs.h
···
 #define   ENABLE_SMP_LD_RENDER_SURFACE_CONTROL	REG_BIT(44 - 32)
 #define   FORCE_SLM_FENCE_SCOPE_TO_TILE		REG_BIT(42 - 32)
 #define   FORCE_UGM_FENCE_SCOPE_TO_TILE		REG_BIT(41 - 32)
+#define   L3_128B_256B_WRT_DIS			REG_BIT(40 - 32)
 #define   MAXREQS_PER_BANK			REG_GENMASK(39 - 32, 37 - 32)
 #define   DISABLE_128B_EVICTION_COMMAND_UDW	REG_BIT(36 - 32)
 #define   LSCFE_SAME_ADDRESS_ATOMICS_COALESCING_DISABLE	REG_BIT(35 - 32)
+15 -9
drivers/gpu/drm/xe/xe_bo.c
···
 		WARN_ON((bo->flags & XE_BO_FLAG_USER) && !bo->cpu_caching);

 		/*
-		 * Display scanout is always non-coherent with the CPU cache.
-		 *
 		 * For Xe_LPG and beyond up to NVL-P (excluding), PPGTT PTE
 		 * lookups are also non-coherent and require a CPU:WC mapping.
 		 */
-		if ((!bo->cpu_caching && bo->flags & XE_BO_FLAG_SCANOUT) ||
-		    (!xe->info.has_cached_pt && bo->flags & XE_BO_FLAG_PAGETABLE))
+		if ((!bo->cpu_caching && bo->flags & XE_BO_FLAG_FORCE_WC) ||
+		    (!xe->info.has_cached_pt && bo->flags & XE_BO_FLAG_PAGETABLE))
 			caching = ttm_write_combined;
 	}

···

 		if (!xe_vm_in_fault_mode(vm)) {
 			drm_gpuvm_bo_evict(vm_bo, true);
-			continue;
+			/*
+			 * L2 cache may not be flushed, so ensure that is done in
+			 * xe_vm_invalidate_vma() below
+			 */
+			if (!xe_device_is_l2_flush_optimized(xe))
+				continue;
 		}

 		if (!idle) {
···
 	if (args->flags & DRM_XE_GEM_CREATE_FLAG_DEFER_BACKING)
 		bo_flags |= XE_BO_FLAG_DEFER_BACKING;

+	/*
+	 * Display scanout is always non-coherent with the CPU cache.
+	 */
 	if (args->flags & DRM_XE_GEM_CREATE_FLAG_SCANOUT)
-		bo_flags |= XE_BO_FLAG_SCANOUT;
+		bo_flags |= XE_BO_FLAG_FORCE_WC;

 	if (args->flags & DRM_XE_GEM_CREATE_FLAG_NO_COMPRESSION) {
 		if (XE_IOCTL_DBG(xe, GRAPHICS_VER(xe) < 20))
···

 	/* CCS formats need physical placement at a 64K alignment in VRAM. */
 	if ((bo_flags & XE_BO_FLAG_VRAM_MASK) &&
-	    (bo_flags & XE_BO_FLAG_SCANOUT) &&
+	    (args->flags & DRM_XE_GEM_CREATE_FLAG_SCANOUT) &&
 	    !(xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K) &&
 	    IS_ALIGNED(args->size, SZ_64K))
 		bo_flags |= XE_BO_FLAG_NEEDS_64K;
···
 			 args->cpu_caching != DRM_XE_GEM_CPU_CACHING_WC))
 		return -EINVAL;

-	if (XE_IOCTL_DBG(xe, bo_flags & XE_BO_FLAG_SCANOUT &&
+	if (XE_IOCTL_DBG(xe, bo_flags & XE_BO_FLAG_FORCE_WC &&
 			 args->cpu_caching == DRM_XE_GEM_CPU_CACHING_WB))
 		return -EINVAL;

···
 	bo = xe_bo_create_user(xe, NULL, args->size,
 			       DRM_XE_GEM_CPU_CACHING_WC,
 			       XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
-			       XE_BO_FLAG_SCANOUT |
+			       XE_BO_FLAG_FORCE_WC |
 			       XE_BO_FLAG_NEEDS_CPU_ACCESS, NULL);
 	if (IS_ERR(bo))
 		return PTR_ERR(bo);
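After the rename, the first hunk of xe_bo.c selects a write-combined CPU mapping in two cases: the buffer has no explicit CPU caching mode and carries XE_BO_FLAG_FORCE_WC, or the device lacks cached page tables and the buffer is a page table. A standalone sketch of that predicate follows. The `SKETCH_` flag names and `sketch_wants_write_combined` are hypothetical; the bit positions follow the BIT(10) and BIT(12) values visible in the xe_bo.h part of this merge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirrors of XE_BO_FLAG_FORCE_WC and XE_BO_FLAG_PAGETABLE. */
#define SKETCH_BO_FLAG_FORCE_WC		(UINT32_C(1) << 10)
#define SKETCH_BO_FLAG_PAGETABLE	(UINT32_C(1) << 12)

/*
 * Decide whether a buffer should be mapped write-combined on the CPU.
 * has_cpu_caching: the buffer has an explicit CPU caching mode set.
 * has_cached_pt:   the device can walk page tables from cached memory.
 */
static bool sketch_wants_write_combined(uint32_t flags, bool has_cpu_caching,
					bool has_cached_pt)
{
	return (!has_cpu_caching && (flags & SKETCH_BO_FLAG_FORCE_WC)) ||
	       (!has_cached_pt && (flags & SKETCH_BO_FLAG_PAGETABLE));
}
```

The sketch covers only the condition shown in that hunk. The surrounding function in xe_bo.c makes other caching decisions that are not visible in this diff.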
+1 -1
drivers/gpu/drm/xe/xe_bo.h
···
 #define XE_BO_FLAG_PINNED		BIT(7)
 #define XE_BO_FLAG_NO_RESV_EVICT	BIT(8)
 #define XE_BO_FLAG_DEFER_BACKING	BIT(9)
-#define XE_BO_FLAG_SCANOUT		BIT(10)
+#define XE_BO_FLAG_FORCE_WC		BIT(10)
 #define XE_BO_FLAG_FIXED_PLACEMENT	BIT(11)
 #define XE_BO_FLAG_PAGETABLE		BIT(12)
 #define XE_BO_FLAG_NEEDS_CPU_ACCESS	BIT(13)
+33
drivers/gpu/drm/xe/xe_device.c
···
 			  DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(XE_EXEC_QUEUE_SET_PROPERTY, xe_exec_queue_set_property_ioctl,
 			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(XE_VM_GET_PROPERTY, xe_vm_get_property_ioctl,
+			  DRM_RENDER_ALLOW),
 };

 static long xe_drm_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
···
 	}
 }

+/**
+ * xe_device_is_l2_flush_optimized - if L2 flush is optimized by HW
+ * @xe: The device to check.
+ *
+ * Return: true if the HW device optimizing L2 flush, false otherwise.
+ */
+bool xe_device_is_l2_flush_optimized(struct xe_device *xe)
+{
+	/* XA is *always* flushed, like at the end-of-submssion (and maybe other
+	 * places), just that internally as an optimisation hw doesn't need to make
+	 * that a full flush (which will also include XA) when Media is
+	 * off/powergated, since it doesn't need to worry about GT caches vs Media
+	 * coherency, and only CPU vs GPU coherency, so can make that flush a
+	 * targeted XA flush, since stuff tagged with XA now means it's shared with
+	 * the CPU. The main implication is that we now need to somehow flush non-XA before
+	 * freeing system memory pages, otherwise dirty cachelines could be flushed after the free
+	 * (like if Media suddenly turns on and does a full flush)
+	 */
+	if (GRAPHICS_VER(xe) >= 35 && !IS_DGFX(xe))
+		return true;
+	return false;
+}
+
 void xe_device_l2_flush(struct xe_device *xe)
 {
 	struct xe_gt *gt;
···
 void xe_device_td_flush(struct xe_device *xe)
 {
 	struct xe_gt *root_gt;
+
+	/*
+	 * From Xe3p onward the HW takes care of flush of TD entries also along
+	 * with flushing XA entries, which will be at the usual sync points,
+	 * like at the end of submission, so no manual flush is needed here.
+	 */
+	if (GRAPHICS_VER(xe) >= 35)
+		return;

 	if (!IS_DGFX(xe) || GRAPHICS_VER(xe) < 20)
 		return;
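The two version checks added to xe_device.c can be exercised in isolation. The following is a minimal userspace sketch, not driver code: struct fake_xe_device and both fake_* helpers are hypothetical stand-ins for GRAPHICS_VER(), IS_DGFX() and the two functions touched above, and the remainder of xe_device_td_flush() is reduced to a yes/no answer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the state behind GRAPHICS_VER() and IS_DGFX(). */
struct fake_xe_device {
	int graphics_ver;	/* value GRAPHICS_VER(xe) would return */
	bool is_dgfx;		/* value IS_DGFX(xe) would return */
};

/* Same condition as xe_device_is_l2_flush_optimized() in the hunk above. */
static bool fake_l2_flush_optimized(const struct fake_xe_device *xe)
{
	assert(xe != NULL);
	return xe->graphics_ver >= 35 && !xe->is_dgfx;
}

/*
 * Same early returns as xe_device_td_flush(): returns true only when the
 * function would go on to issue a manual TD flush.
 */
static bool fake_td_flush_needed(const struct fake_xe_device *xe)
{
	assert(xe != NULL);

	/* New check: the HW flushes TD entries together with XA entries. */
	if (xe->graphics_ver >= 35)
		return false;

	/* Pre-existing check: only discrete parts with version >= 20. */
	if (!xe->is_dgfx || xe->graphics_ver < 20)
		return false;

	return true;
}
```

Note that the L2 predicate is limited to integrated parts, while the TD flush shortcut applies to every device with graphics version 35 or later.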
+1
drivers/gpu/drm/xe/xe_device.h
···
 u64 xe_device_canonicalize_addr(struct xe_device *xe, u64 address);
 u64 xe_device_uncanonicalize_addr(struct xe_device *xe, u64 address);

+bool xe_device_is_l2_flush_optimized(struct xe_device *xe);
 void xe_device_td_flush(struct xe_device *xe);
 void xe_device_l2_flush(struct xe_device *xe);
+9 -6
drivers/gpu/drm/xe/xe_ggtt.c
···
  * give us the correct placement for free.
  */

+#define XE_GGTT_FLAGS_64K BIT(0)
+#define XE_GGTT_FLAGS_ONLINE BIT(1)
+
 /**
  * struct xe_ggtt_node - A node in GGTT.
  *
···
 	 * @flags: Flags for this GGTT
 	 * Acceptable flags:
 	 * - %XE_GGTT_FLAGS_64K - if PTE size is 64K. Otherwise, regular is 4K.
+	 * - %XE_GGTT_FLAGS_ONLINE - is GGTT online, protected by ggtt->lock
+	 *   after init
 	 */
 	unsigned int flags;
 	/** @scratch: Internal object allocation used as a scratch page */
···
 {
 	struct xe_ggtt *ggtt = arg;

+	scoped_guard(mutex, &ggtt->lock)
+		ggtt->flags &= ~XE_GGTT_FLAGS_ONLINE;
 	drain_workqueue(ggtt->wq);
 }

···
 	if (err)
 		return err;

+	ggtt->flags |= XE_GGTT_FLAGS_ONLINE;
 	return devm_add_action_or_reset(xe->drm.dev, dev_fini_ggtt, ggtt);
 }
 ALLOW_ERROR_INJECTION(xe_ggtt_init_early, ERRNO); /* See xe_pci_probe() */
···
 static void ggtt_node_remove(struct xe_ggtt_node *node)
 {
 	struct xe_ggtt *ggtt = node->ggtt;
-	struct xe_device *xe = tile_to_xe(ggtt->tile);
 	bool bound;
-	int idx;
-
-	bound = drm_dev_enter(&xe->drm, &idx);

 	mutex_lock(&ggtt->lock);
+	bound = ggtt->flags & XE_GGTT_FLAGS_ONLINE;
 	if (bound)
 		xe_ggtt_clear(ggtt, xe_ggtt_node_addr(node), xe_ggtt_node_size(node));
 	drm_mm_remove_node(&node->base);
···

 	if (node->invalidate_on_remove)
 		xe_ggtt_invalidate(ggtt);
-
-	drm_dev_exit(idx);

 free_node:
 	ggtt_node_fini(node);
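The xe_ggtt.c change replaces the drm_dev_enter()/drm_dev_exit() bracket with an ONLINE bit that is set at the end of early init and cleared by the devm teardown action. A minimal single-threaded sketch of that lifecycle follows, under loud assumptions: every fake_* name is hypothetical, the ggtt->lock mutex that protects the flag in the driver is elided, and xe_ggtt_clear() is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FAKE_GGTT_FLAGS_64K	(1u << 0)
#define FAKE_GGTT_FLAGS_ONLINE	(1u << 1)

/* Hypothetical model of the bits of struct xe_ggtt used by the patch. */
struct fake_ggtt {
	unsigned int flags;
	unsigned int pte_clears;	/* counts simulated xe_ggtt_clear() calls */
};

/* End of early init: the GGTT becomes usable. */
static void fake_ggtt_init_early(struct fake_ggtt *ggtt)
{
	assert(ggtt != NULL);
	ggtt->flags |= FAKE_GGTT_FLAGS_ONLINE;
}

/* devm teardown action: the hardware is going away. */
static void fake_dev_fini_ggtt(struct fake_ggtt *ggtt)
{
	assert(ggtt != NULL);
	/* The driver takes ggtt->lock around this update. */
	ggtt->flags &= ~FAKE_GGTT_FLAGS_ONLINE;
}

/* Node removal: only touch PTEs while the GGTT is still online. */
static bool fake_ggtt_node_remove(struct fake_ggtt *ggtt)
{
	bool bound;

	assert(ggtt != NULL);
	bound = ggtt->flags & FAKE_GGTT_FLAGS_ONLINE;
	if (bound)
		ggtt->pte_clears++;

	/* Bookkeeping (drm_mm_remove_node() in the driver) happens either way. */
	return bound;
}
```

Nodes removed after teardown still get their software bookkeeping released; only the hardware PTE writes are skipped.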
+6 -3
drivers/gpu/drm/xe/xe_gt.c
···
 static void gt_reset_worker(struct work_struct *w);

 static int emit_job_sync(struct xe_exec_queue *q, struct xe_bb *bb,
-			 long timeout_jiffies)
+			 long timeout_jiffies, bool force_reset)
 {
 	struct xe_sched_job *job;
 	struct dma_fence *fence;
···
 	job = xe_bb_create_job(q, bb);
 	if (IS_ERR(job))
 		return PTR_ERR(job);
+
+	job->ring_ops_force_reset = force_reset;

 	xe_sched_job_arm(job);
 	fence = dma_fence_get(&job->drm.s_fence->finished);
···
 	if (IS_ERR(bb))
 		return PTR_ERR(bb);

-	ret = emit_job_sync(q, bb, HZ);
+	ret = emit_job_sync(q, bb, HZ, false);
 	xe_bb_free(bb, NULL);

 	return ret;
···

 	bb->len = cs - bb->cs;

-	ret = emit_job_sync(q, bb, HZ);
+	/* only VFs need to trigger reset to get a clean NULL context */
+	ret = emit_job_sync(q, bb, HZ, IS_SRIOV_VF(gt_to_xe(gt)));

 	xe_bb_free(bb, NULL);
+2
drivers/gpu/drm/xe/xe_gt_ccs_mode.c
··· 12 12 #include "xe_gt_printk.h" 13 13 #include "xe_gt_sysfs.h" 14 14 #include "xe_mmio.h" 15 + #include "xe_pm.h" 15 16 #include "xe_sriov.h" 16 17 #include "xe_sriov_pf.h" 17 18 ··· 164 163 xe_gt_info(gt, "Setting compute mode to %d\n", num_engines); 165 164 gt->ccs_mode = num_engines; 166 165 xe_gt_record_user_engines(gt); 166 + guard(xe_pm_runtime)(xe); 167 167 xe_gt_reset(gt); 168 168 169 169 /* We may end PF lockdown once CCS mode is default again */
+62 -16
drivers/gpu/drm/xe/xe_gt_sriov_pf_control.c
··· 171 171 case XE_GT_SRIOV_STATE_##_X: return #_X 172 172 CASE2STR(WIP); 173 173 CASE2STR(FLR_WIP); 174 + CASE2STR(FLR_PREPARE); 174 175 CASE2STR(FLR_SEND_START); 175 176 CASE2STR(FLR_WAIT_GUC); 176 177 CASE2STR(FLR_GUC_DONE); ··· 1487 1486 * The VF FLR state machine looks like:: 1488 1487 * 1489 1488 * (READY,PAUSED,STOPPED)<------------<--------------o 1490 - * | \ 1491 - * flr \ 1492 - * | \ 1493 - * ....V..........................FLR_WIP........... \ 1494 - * : \ : \ 1489 + * | | \ 1490 + * flr prepare \ 1491 + * | | \ 1492 + * ....V.............V............FLR_WIP........... \ 1493 + * : | | : \ 1494 + * : | FLR_PREPARE : | 1495 + * : | / : | 1496 + * : \ flr : | 1497 + * : \ / : | 1495 1498 * : \ o----<----busy : | 1496 1499 * : \ / / : | 1497 1500 * : FLR_SEND_START---failed----->-----------o--->(FLR_FAILED)<---o ··· 1544 1539 pf_queue_vf(gt, vfid); 1545 1540 } 1546 1541 1547 - static void pf_enter_vf_flr_wip(struct xe_gt *gt, unsigned int vfid) 1542 + static bool pf_exit_vf_flr_prepare(struct xe_gt *gt, unsigned int vfid) 1548 1543 { 1549 - if (!pf_enter_vf_state(gt, vfid, XE_GT_SRIOV_STATE_FLR_WIP)) { 1550 - xe_gt_sriov_dbg(gt, "VF%u FLR is already in progress\n", vfid); 1551 - return; 1552 - } 1544 + if (!pf_exit_vf_state(gt, vfid, XE_GT_SRIOV_STATE_FLR_PREPARE)) 1545 + return false; 1546 + 1547 + pf_enter_vf_flr_send_start(gt, vfid); 1548 + return true; 1549 + } 1550 + 1551 + static bool pf_enter_vf_flr_wip(struct xe_gt *gt, unsigned int vfid) 1552 + { 1553 + if (!pf_enter_vf_state(gt, vfid, XE_GT_SRIOV_STATE_FLR_WIP)) 1554 + return false; 1553 1555 1554 1556 pf_enter_vf_wip(gt, vfid); 1555 - pf_enter_vf_flr_send_start(gt, vfid); 1557 + return true; 1556 1558 } 1557 1559 1558 1560 static void pf_exit_vf_flr_wip(struct xe_gt *gt, unsigned int vfid) 1559 1561 { 1560 1562 if (pf_exit_vf_state(gt, vfid, XE_GT_SRIOV_STATE_FLR_WIP)) { 1563 + pf_escape_vf_state(gt, vfid, XE_GT_SRIOV_STATE_FLR_PREPARE); 1561 1564 pf_escape_vf_state(gt, vfid, 
XE_GT_SRIOV_STATE_FLR_SEND_FINISH); 1562 1565 pf_escape_vf_state(gt, vfid, XE_GT_SRIOV_STATE_FLR_RESET_MMIO); 1563 1566 pf_escape_vf_state(gt, vfid, XE_GT_SRIOV_STATE_FLR_RESET_DATA); ··· 1773 1760 } 1774 1761 1775 1762 /** 1763 + * xe_gt_sriov_pf_control_prepare_flr() - Notify PF that VF FLR request was issued. 1764 + * @gt: the &xe_gt 1765 + * @vfid: the VF identifier 1766 + * 1767 + * This is an optional early notification path used to mark pending FLR before 1768 + * the GuC notifies the PF with a FLR event. 1769 + * 1770 + * This function is for PF only. 1771 + * 1772 + * Return: 0 on success or a negative error code on failure. 1773 + */ 1774 + int xe_gt_sriov_pf_control_prepare_flr(struct xe_gt *gt, unsigned int vfid) 1775 + { 1776 + if (!pf_enter_vf_flr_wip(gt, vfid)) 1777 + return -EALREADY; 1778 + 1779 + pf_enter_vf_state(gt, vfid, XE_GT_SRIOV_STATE_FLR_PREPARE); 1780 + return 0; 1781 + } 1782 + 1783 + static int pf_begin_vf_flr(struct xe_gt *gt, unsigned int vfid) 1784 + { 1785 + if (pf_enter_vf_flr_wip(gt, vfid)) { 1786 + pf_enter_vf_flr_send_start(gt, vfid); 1787 + return 0; 1788 + } 1789 + 1790 + if (pf_exit_vf_flr_prepare(gt, vfid)) 1791 + return 0; 1792 + 1793 + xe_gt_sriov_dbg(gt, "VF%u FLR is already in progress\n", vfid); 1794 + return -EALREADY; 1795 + } 1796 + 1797 + /** 1776 1798 * xe_gt_sriov_pf_control_trigger_flr - Start a VF FLR sequence. 1777 1799 * @gt: the &xe_gt 1778 1800 * @vfid: the VF identifier ··· 1818 1770 */ 1819 1771 int xe_gt_sriov_pf_control_trigger_flr(struct xe_gt *gt, unsigned int vfid) 1820 1772 { 1821 - pf_enter_vf_flr_wip(gt, vfid); 1822 - 1823 - return 0; 1773 + return pf_begin_vf_flr(gt, vfid); 1824 1774 } 1825 1775 1826 1776 /** ··· 1925 1879 1926 1880 if (needs_dispatch_flr(xe)) { 1927 1881 for_each_gt(gtit, xe, gtid) 1928 - pf_enter_vf_flr_wip(gtit, vfid); 1882 + pf_begin_vf_flr(gtit, vfid); 1929 1883 } else { 1930 - pf_enter_vf_flr_wip(gt, vfid); 1884 + pf_begin_vf_flr(gt, vfid); 1931 1885 } 1932 1886 } 1933 1887
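The interaction between the new early "prepare" notification and the regular FLR trigger can be sketched as a small bit-flag state machine. This is a simplified model under stated assumptions: the fake_* names are hypothetical, only three of the driver's states are represented, per-VF locking and the worker queueing done by pf_enter_vf_wip() and pf_enter_vf_flr_send_start() are left out, and the enter/exit helpers assume the driver's helpers return true only when the bit actually changed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum fake_vf_state {
	FAKE_STATE_FLR_WIP,
	FAKE_STATE_FLR_PREPARE,
	FAKE_STATE_FLR_SEND_START,
	FAKE_STATE_COUNT
};

/* Hypothetical per-VF state word, one bit per state. */
struct fake_vf {
	unsigned long state;
};

static bool fake_in_state(const struct fake_vf *vf, enum fake_vf_state bit)
{
	assert(bit < FAKE_STATE_COUNT);
	return vf->state & (1ul << bit);
}

/* Returns true only if the bit was newly set. */
static bool fake_enter_state(struct fake_vf *vf, enum fake_vf_state bit)
{
	bool was_set = fake_in_state(vf, bit);

	vf->state |= 1ul << bit;
	return !was_set;
}

/* Returns true only if the bit was set and is now cleared. */
static bool fake_exit_state(struct fake_vf *vf, enum fake_vf_state bit)
{
	bool was_set = fake_in_state(vf, bit);

	vf->state &= ~(1ul << bit);
	return was_set;
}

/* Models xe_gt_sriov_pf_control_prepare_flr(). */
static int fake_prepare_flr(struct fake_vf *vf)
{
	if (!fake_enter_state(vf, FAKE_STATE_FLR_WIP))
		return -EALREADY;

	fake_enter_state(vf, FAKE_STATE_FLR_PREPARE);
	return 0;
}

/* Models pf_begin_vf_flr(). */
static int fake_begin_flr(struct fake_vf *vf)
{
	/* No FLR pending yet: start one directly. */
	if (fake_enter_state(vf, FAKE_STATE_FLR_WIP)) {
		fake_enter_state(vf, FAKE_STATE_FLR_SEND_START);
		return 0;
	}

	/* FLR was announced early: consume PREPARE and continue. */
	if (fake_exit_state(vf, FAKE_STATE_FLR_PREPARE)) {
		fake_enter_state(vf, FAKE_STATE_FLR_SEND_START);
		return 0;
	}

	/* FLR is already past the prepare step. */
	return -EALREADY;
}
```

In this model a prepare followed by a trigger yields exactly one FLR sequence, a second trigger reports -EALREADY, and a trigger without a prior prepare behaves as it did before the patch.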
+1
drivers/gpu/drm/xe/xe_gt_sriov_pf_control.h
···
 int xe_gt_sriov_pf_control_trigger_restore_vf(struct xe_gt *gt, unsigned int vfid);
 int xe_gt_sriov_pf_control_finish_restore_vf(struct xe_gt *gt, unsigned int vfid);
 int xe_gt_sriov_pf_control_stop_vf(struct xe_gt *gt, unsigned int vfid);
+int xe_gt_sriov_pf_control_prepare_flr(struct xe_gt *gt, unsigned int vfid);
 int xe_gt_sriov_pf_control_trigger_flr(struct xe_gt *gt, unsigned int vfid);
 int xe_gt_sriov_pf_control_sync_flr(struct xe_gt *gt, unsigned int vfid, bool sync);
 int xe_gt_sriov_pf_control_wait_flr(struct xe_gt *gt, unsigned int vfid);
+2
drivers/gpu/drm/xe/xe_gt_sriov_pf_control_types.h
···
  *
  * @XE_GT_SRIOV_STATE_WIP: indicates that some operations are in progress.
  * @XE_GT_SRIOV_STATE_FLR_WIP: indicates that a VF FLR is in progress.
+ * @XE_GT_SRIOV_STATE_FLR_PREPARE: indicates that the PF received early VF FLR prepare notification.
  * @XE_GT_SRIOV_STATE_FLR_SEND_START: indicates that the PF wants to send a FLR START command.
  * @XE_GT_SRIOV_STATE_FLR_WAIT_GUC: indicates that the PF awaits for a response from the GuC.
  * @XE_GT_SRIOV_STATE_FLR_GUC_DONE: indicates that the PF has received a response from the GuC.
···
 	XE_GT_SRIOV_STATE_WIP = 1,

 	XE_GT_SRIOV_STATE_FLR_WIP,
+	XE_GT_SRIOV_STATE_FLR_PREPARE,
 	XE_GT_SRIOV_STATE_FLR_SEND_START,
 	XE_GT_SRIOV_STATE_FLR_WAIT_GUC,
 	XE_GT_SRIOV_STATE_FLR_GUC_DONE,
+6
drivers/gpu/drm/xe/xe_gt_stats.c
···
 	DEF_STAT_STR(SVM_64K_CPU_COPY_US, "svm_64K_cpu_copy_us"),
 	DEF_STAT_STR(SVM_2M_CPU_COPY_US, "svm_2M_cpu_copy_us"),
 	DEF_STAT_STR(SVM_DEVICE_COPY_KB, "svm_device_copy_kb"),
+	DEF_STAT_STR(SVM_4K_DEVICE_COPY_KB, "svm_4K_device_copy_kb"),
+	DEF_STAT_STR(SVM_64K_DEVICE_COPY_KB, "svm_64K_device_copy_kb"),
+	DEF_STAT_STR(SVM_2M_DEVICE_COPY_KB, "svm_2M_device_copy_kb"),
 	DEF_STAT_STR(SVM_CPU_COPY_KB, "svm_cpu_copy_kb"),
+	DEF_STAT_STR(SVM_4K_CPU_COPY_KB, "svm_4K_cpu_copy_kb"),
+	DEF_STAT_STR(SVM_64K_CPU_COPY_KB, "svm_64K_cpu_copy_kb"),
+	DEF_STAT_STR(SVM_2M_CPU_COPY_KB, "svm_2M_cpu_copy_kb"),
 	DEF_STAT_STR(SVM_4K_GET_PAGES_US, "svm_4K_get_pages_us"),
 	DEF_STAT_STR(SVM_64K_GET_PAGES_US, "svm_64K_get_pages_us"),
 	DEF_STAT_STR(SVM_2M_GET_PAGES_US, "svm_2M_get_pages_us"),
+6
drivers/gpu/drm/xe/xe_gt_stats_types.h
···
 	XE_GT_STATS_ID_SVM_64K_CPU_COPY_US,
 	XE_GT_STATS_ID_SVM_2M_CPU_COPY_US,
 	XE_GT_STATS_ID_SVM_DEVICE_COPY_KB,
+	XE_GT_STATS_ID_SVM_4K_DEVICE_COPY_KB,
+	XE_GT_STATS_ID_SVM_64K_DEVICE_COPY_KB,
+	XE_GT_STATS_ID_SVM_2M_DEVICE_COPY_KB,
 	XE_GT_STATS_ID_SVM_CPU_COPY_KB,
+	XE_GT_STATS_ID_SVM_4K_CPU_COPY_KB,
+	XE_GT_STATS_ID_SVM_64K_CPU_COPY_KB,
+	XE_GT_STATS_ID_SVM_2M_CPU_COPY_KB,
 	XE_GT_STATS_ID_SVM_4K_GET_PAGES_US,
 	XE_GT_STATS_ID_SVM_64K_GET_PAGES_US,
 	XE_GT_STATS_ID_SVM_2M_GET_PAGES_US,
+30 -5
drivers/gpu/drm/xe/xe_guc.c
··· 98 98 if (xe_guc_using_main_gamctrl_queues(guc)) 99 99 flags |= GUC_CTL_MAIN_GAMCTRL_QUEUES; 100 100 101 + if (GRAPHICS_VER(xe) >= 35 && !IS_DGFX(xe) && xe_gt_is_media_type(guc_to_gt(guc))) 102 + flags |= GUC_CTL_ENABLE_L2FLUSH_OPT; 103 + 101 104 return flags; 102 105 } 103 106 ··· 1179 1176 struct xe_guc_pc *guc_pc = &gt->uc.guc.pc; 1180 1177 u32 before_freq, act_freq, cur_freq; 1181 1178 u32 status = 0, tries = 0; 1179 + int load_result, ret; 1182 1180 ktime_t before; 1183 1181 u64 delta_ms; 1184 - int ret; 1185 1182 1186 1183 before_freq = xe_guc_pc_get_act_freq(guc_pc); 1187 1184 before = ktime_get(); 1188 1185 1189 - ret = poll_timeout_us(ret = guc_load_done(gt, &status, &tries), ret, 1186 + ret = poll_timeout_us(load_result = guc_load_done(gt, &status, &tries), load_result, 1190 1187 10 * USEC_PER_MSEC, 1191 1188 GUC_LOAD_TIMEOUT_SEC * USEC_PER_SEC, false); 1192 1189 ··· 1194 1191 act_freq = xe_guc_pc_get_act_freq(guc_pc); 1195 1192 cur_freq = xe_guc_pc_get_cur_freq_fw(guc_pc); 1196 1193 1197 - if (ret) { 1194 + if (ret || load_result <= 0) { 1198 1195 xe_gt_err(gt, "load failed: status = 0x%08X, time = %lldms, freq = %dMHz (req %dMHz)\n", 1199 1196 status, delta_ms, xe_guc_pc_get_act_freq(guc_pc), 1200 1197 xe_guc_pc_get_cur_freq_fw(guc_pc)); ··· 1402 1399 return 0; 1403 1400 } 1404 1401 1405 - int xe_guc_suspend(struct xe_guc *guc) 1402 + /** 1403 + * xe_guc_softreset() - Soft reset GuC 1404 + * @guc: The GuC object 1405 + * 1406 + * Send soft reset command to GuC through mmio send. 
1407 + * 1408 + * Return: 0 if success, otherwise error code 1409 + */ 1410 + int xe_guc_softreset(struct xe_guc *guc) 1406 1411 { 1407 - struct xe_gt *gt = guc_to_gt(guc); 1408 1412 u32 action[] = { 1409 1413 XE_GUC_ACTION_CLIENT_SOFT_RESET, 1410 1414 }; 1411 1415 int ret; 1412 1416 1417 + if (!xe_uc_fw_is_running(&guc->fw)) 1418 + return 0; 1419 + 1413 1420 ret = xe_guc_mmio_send(guc, action, ARRAY_SIZE(action)); 1421 + if (ret) 1422 + return ret; 1423 + 1424 + return 0; 1425 + } 1426 + 1427 + int xe_guc_suspend(struct xe_guc *guc) 1428 + { 1429 + struct xe_gt *gt = guc_to_gt(guc); 1430 + int ret; 1431 + 1432 + ret = xe_guc_softreset(guc); 1414 1433 if (ret) { 1415 1434 xe_gt_err(gt, "GuC suspend failed: %pe\n", ERR_PTR(ret)); 1416 1435 return ret;
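The GuC load fix in xe_guc.c comes down to keeping the polling helper's own return value separate from the value of the operation being polled. A minimal userspace sketch of that pattern follows; fake_poll(), struct fake_fw and the other fake_* names are hypothetical, and it assumes the same convention the patched check implies (0 while still loading, positive on success, negative on failure).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical polled operation: 0 = still busy, > 0 = done, < 0 = failed. */
typedef int (*fake_op_fn)(void *ctx);

/*
 * Poll until the operation reports something other than "busy" or the
 * tries run out. The return value only says whether polling ended in time;
 * the operation's own verdict is stored separately in *op_result.
 */
static int fake_poll(fake_op_fn op, void *ctx, int max_tries, int *op_result)
{
	assert(max_tries > 0);

	for (int i = 0; i < max_tries; i++) {
		*op_result = op(ctx);
		if (*op_result)
			return 0;
	}

	return -ETIMEDOUT;
}

/* Same shape as the patched check: "if (ret || load_result <= 0)". */
static bool fake_load_failed(int poll_ret, int load_result)
{
	return poll_ret || load_result <= 0;
}

/* Hypothetical firmware that reports final_status after ready_after polls. */
struct fake_fw {
	int calls;
	int ready_after;
	int final_status;
};

static int fake_load_done(void *ctx)
{
	struct fake_fw *fw = ctx;

	fw->calls++;
	return fw->calls >= fw->ready_after ? fw->final_status : 0;
}
```

If a single variable holds both results, a negative status from the operation ends the poll and is then overwritten by the helper's "condition met" return of 0, so the failure is not reported until something else times out. Keeping the two apart lets the caller fail immediately.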
+1
drivers/gpu/drm/xe/xe_guc.h
···
 void xe_guc_runtime_suspend(struct xe_guc *guc);
 void xe_guc_runtime_resume(struct xe_guc *guc);
 int xe_guc_suspend(struct xe_guc *guc);
+int xe_guc_softreset(struct xe_guc *guc);
 void xe_guc_notify(struct xe_guc *guc);
 int xe_guc_auth_huc(struct xe_guc *guc, u32 rsa_addr);
 int xe_guc_mmio_send(struct xe_guc *guc, const u32 *request, u32 len);
+9 -15
drivers/gpu/drm/xe/xe_guc_ct.c
··· 31 31 #include "xe_guc_submit.h" 32 32 #include "xe_guc_tlb_inval.h" 33 33 #include "xe_map.h" 34 + #include "xe_page_reclaim.h" 34 35 #include "xe_pm.h" 35 36 #include "xe_sleep.h" 36 37 #include "xe_sriov_vf.h" ··· 353 352 { 354 353 struct xe_guc_ct *ct = arg; 355 354 355 + xe_guc_ct_stop(ct); 356 356 guc_ct_change_state(ct, XE_GUC_CT_STATE_DISABLED); 357 357 } 358 358 ··· 1631 1629 ret = xe_guc_pagefault_handler(guc, payload, adj_len); 1632 1630 break; 1633 1631 case XE_GUC_ACTION_TLB_INVALIDATION_DONE: 1634 - case XE_GUC_ACTION_PAGE_RECLAMATION_DONE: 1635 - /* 1636 - * Page reclamation is an extension of TLB invalidation. Both 1637 - * operations share the same seqno and fence. When either 1638 - * action completes, we need to signal the corresponding 1639 - * fence. Since the handling logic (lookup fence by seqno, 1640 - * fence signalling) is identical, we use the same handler 1641 - * for both G2H events. 1642 - */ 1643 1632 ret = xe_guc_tlb_inval_done_handler(guc, payload, adj_len); 1633 + break; 1634 + case XE_GUC_ACTION_PAGE_RECLAMATION_DONE: 1635 + ret = xe_guc_page_reclaim_done_handler(guc, payload, adj_len); 1644 1636 break; 1645 1637 case XE_GUC_ACTION_GUC2PF_RELAY_FROM_VF: 1646 1638 ret = xe_guc_relay_process_guc2pf(&guc->relay, hxg, hxg_len); ··· 1843 1847 ret = xe_guc_pagefault_handler(guc, payload, adj_len); 1844 1848 break; 1845 1849 case XE_GUC_ACTION_TLB_INVALIDATION_DONE: 1846 - case XE_GUC_ACTION_PAGE_RECLAMATION_DONE: 1847 - /* 1848 - * Seqno and fence handling of page reclamation and TLB 1849 - * invalidation is identical, so we can use the same handler 1850 - * for both actions. 
1851 - */ 1852 1850 __g2h_release_space(ct, len); 1853 1851 ret = xe_guc_tlb_inval_done_handler(guc, payload, adj_len); 1852 + break; 1853 + case XE_GUC_ACTION_PAGE_RECLAMATION_DONE: 1854 + __g2h_release_space(ct, len); 1855 + ret = xe_guc_page_reclaim_done_handler(guc, payload, adj_len); 1854 1856 break; 1855 1857 default: 1856 1858 xe_gt_warn(gt, "NOT_POSSIBLE\n");
+1
drivers/gpu/drm/xe/xe_guc_fwif.h
···
 #define GUC_CTL_ENABLE_PSMI_LOGGING BIT(7)
 #define GUC_CTL_MAIN_GAMCTRL_QUEUES BIT(9)
 #define GUC_CTL_DISABLE_SCHEDULER BIT(14)
+#define GUC_CTL_ENABLE_L2FLUSH_OPT BIT(15)

 #define GUC_CTL_DEBUG 3
 #define GUC_LOG_VERBOSITY REG_GENMASK(1, 0)
+62 -25
drivers/gpu/drm/xe/xe_guc_submit.c
··· 47 47 48 48 #define XE_GUC_EXEC_QUEUE_CGP_CONTEXT_ERROR_LEN 6 49 49 50 + static int guc_submit_reset_prepare(struct xe_guc *guc); 51 + 50 52 static struct xe_guc * 51 53 exec_queue_to_guc(struct xe_exec_queue *q) 52 54 { ··· 240 238 EXEC_QUEUE_STATE_BANNED)); 241 239 } 242 240 243 - static void guc_submit_fini(struct drm_device *drm, void *arg) 241 + static void guc_submit_sw_fini(struct drm_device *drm, void *arg) 244 242 { 245 243 struct xe_guc *guc = arg; 246 244 struct xe_device *xe = guc_to_xe(guc); ··· 256 254 xe_gt_assert(gt, ret); 257 255 258 256 xa_destroy(&guc->submission_state.exec_queue_lookup); 257 + } 258 + 259 + static void guc_submit_fini(void *arg) 260 + { 261 + struct xe_guc *guc = arg; 262 + 263 + /* Forcefully kill any remaining exec queues */ 264 + xe_guc_ct_stop(&guc->ct); 265 + guc_submit_reset_prepare(guc); 266 + xe_guc_softreset(guc); 267 + xe_guc_submit_stop(guc); 268 + xe_uc_fw_sanitize(&guc->fw); 269 + xe_guc_submit_pause_abort(guc); 259 270 } 260 271 261 272 static void guc_submit_wedged_fini(void *arg) ··· 340 325 341 326 guc->submission_state.initialized = true; 342 327 343 - return drmm_add_action_or_reset(&xe->drm, guc_submit_fini, guc); 328 + err = drmm_add_action_or_reset(&xe->drm, guc_submit_sw_fini, guc); 329 + if (err) 330 + return err; 331 + 332 + return devm_add_action_or_reset(xe->drm.dev, guc_submit_fini, guc); 344 333 } 345 334 346 335 /* ··· 1319 1300 */ 1320 1301 void xe_guc_submit_wedge(struct xe_guc *guc) 1321 1302 { 1303 + struct xe_device *xe = guc_to_xe(guc); 1322 1304 struct xe_gt *gt = guc_to_gt(guc); 1323 1305 struct xe_exec_queue *q; 1324 1306 unsigned long index; ··· 1334 1314 if (!guc->submission_state.initialized) 1335 1315 return; 1336 1316 1337 - err = devm_add_action_or_reset(guc_to_xe(guc)->drm.dev, 1338 - guc_submit_wedged_fini, guc); 1339 - if (err) { 1340 - xe_gt_err(gt, "Failed to register clean-up in wedged.mode=%s; " 1341 - "Although device is wedged.\n", 1342 - 
xe_wedged_mode_to_string(XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET)); 1343 - return; 1344 - } 1317 + if (xe->wedged.mode == XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET) { 1318 + err = devm_add_action_or_reset(guc_to_xe(guc)->drm.dev, 1319 + guc_submit_wedged_fini, guc); 1320 + if (err) { 1321 + xe_gt_err(gt, "Failed to register clean-up on wedged.mode=%s; " 1322 + "Although device is wedged.\n", 1323 + xe_wedged_mode_to_string(XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET)); 1324 + return; 1325 + } 1345 1326 1346 - mutex_lock(&guc->submission_state.lock); 1347 - xa_for_each(&guc->submission_state.exec_queue_lookup, index, q) 1348 - if (xe_exec_queue_get_unless_zero(q)) 1349 - set_exec_queue_wedged(q); 1350 - mutex_unlock(&guc->submission_state.lock); 1327 + mutex_lock(&guc->submission_state.lock); 1328 + xa_for_each(&guc->submission_state.exec_queue_lookup, index, q) 1329 + if (xe_exec_queue_get_unless_zero(q)) 1330 + set_exec_queue_wedged(q); 1331 + mutex_unlock(&guc->submission_state.lock); 1332 + } else { 1333 + /* Forcefully kill any remaining exec queues, signal fences */ 1334 + guc_submit_reset_prepare(guc); 1335 + xe_guc_submit_stop(guc); 1336 + xe_guc_softreset(guc); 1337 + xe_uc_fw_sanitize(&guc->fw); 1338 + xe_guc_submit_pause_abort(guc); 1339 + } 1351 1340 } 1352 1341 1353 1342 static bool guc_submit_hint_wedged(struct xe_guc *guc) ··· 2327 2298 static void guc_exec_queue_stop(struct xe_guc *guc, struct xe_exec_queue *q) 2328 2299 { 2329 2300 struct xe_gpu_scheduler *sched = &q->guc->sched; 2301 + bool do_destroy = false; 2330 2302 2331 2303 /* Stop scheduling + flush any DRM scheduler operations */ 2332 2304 xe_sched_submission_stop(sched); ··· 2335 2305 /* Clean up lost G2H + reset engine state */ 2336 2306 if (exec_queue_registered(q)) { 2337 2307 if (exec_queue_destroyed(q)) 2338 - __guc_exec_queue_destroy(guc, q); 2308 + do_destroy = true; 2339 2309 } 2340 2310 if (q->guc->suspend_pending) { 2341 2311 set_exec_queue_suspended(q); ··· 2371 2341 
xe_guc_exec_queue_trigger_cleanup(q); 2372 2342 } 2373 2343 } 2344 + 2345 + if (do_destroy) 2346 + __guc_exec_queue_destroy(guc, q); 2374 2347 } 2375 2348 2376 - int xe_guc_submit_reset_prepare(struct xe_guc *guc) 2349 + static int guc_submit_reset_prepare(struct xe_guc *guc) 2377 2350 { 2378 2351 int ret; 2379 - 2380 - if (xe_gt_WARN_ON(guc_to_gt(guc), vf_recovery(guc))) 2381 - return 0; 2382 - 2383 - if (!guc->submission_state.initialized) 2384 - return 0; 2385 2352 2386 2353 /* 2387 2354 * Using an atomic here rather than submission_state.lock as this ··· 2392 2365 wake_up_all(&guc->ct.wq); 2393 2366 2394 2367 return ret; 2368 + } 2369 + 2370 + int xe_guc_submit_reset_prepare(struct xe_guc *guc) 2371 + { 2372 + if (xe_gt_WARN_ON(guc_to_gt(guc), vf_recovery(guc))) 2373 + return 0; 2374 + 2375 + if (!guc->submission_state.initialized) 2376 + return 0; 2377 + 2378 + return guc_submit_reset_prepare(guc); 2395 2379 } 2396 2380 2397 2381 void xe_guc_submit_reset_wait(struct xe_guc *guc) ··· 2801 2763 continue; 2802 2764 2803 2765 xe_sched_submission_start(sched); 2804 - if (exec_queue_killed_or_banned_or_wedged(q)) 2805 - xe_guc_exec_queue_trigger_cleanup(q); 2766 + guc_exec_queue_kill(q); 2806 2767 } 2807 2768 mutex_unlock(&guc->submission_state.lock); 2808 2769 }
+12 -3
drivers/gpu/drm/xe/xe_i2c.c
···
  */
 void xe_i2c_irq_handler(struct xe_device *xe, u32 master_ctl)
 {
-	if (!xe_i2c_irq_present(xe))
+	struct xe_mmio *mmio = xe_root_tile_mmio(xe);
+
+	if (!(master_ctl & I2C_IRQ) || !xe_i2c_irq_present(xe))
 		return;

-	if (master_ctl & I2C_IRQ)
-		generic_handle_irq_safe(xe->i2c->adapter_irq);
+	/* Forward interrupt to I2C adapter */
+	generic_handle_irq_safe(xe->i2c->adapter_irq);
+
+	/* Deassert after I2C adapter clears the interrupt */
+	xe_mmio_rmw32(mmio, I2C_CONFIG_CMD, 0, PCI_COMMAND_INTX_DISABLE);
+	/* Reassert to allow subsequent interrupt generation */
+	xe_mmio_rmw32(mmio, I2C_CONFIG_CMD, PCI_COMMAND_INTX_DISABLE, 0);
 }

 void xe_i2c_irq_reset(struct xe_device *xe)
···
 	if (!xe_i2c_irq_present(xe))
 		return;

+	xe_mmio_rmw32(mmio, I2C_CONFIG_CMD, 0, PCI_COMMAND_INTX_DISABLE);
 	xe_mmio_rmw32(mmio, I2C_BRIDGE_PCICFGCTL, ACPI_INTR_EN, 0);
 }

···
 		return;

 	xe_mmio_rmw32(mmio, I2C_BRIDGE_PCICFGCTL, 0, ACPI_INTR_EN);
+	xe_mmio_rmw32(mmio, I2C_CONFIG_CMD, PCI_COMMAND_INTX_DISABLE, 0);
 }

 static int xe_i2c_irq_map(struct irq_domain *h, unsigned int virq,
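The deassert/reassert sequence in the I2C handler is two read-modify-write operations on the same register. The sketch below models it on a plain variable instead of MMIO. It assumes the argument order visible in the calls above (bits to clear first, bits to set second) and that the helper returns the previous value; fake_rmw32() and fake_i2c_irq_ack() are hypothetical names, and FAKE_PCI_COMMAND_INTX_DISABLE reuses the value of the standard PCI command register INTx disable bit (bit 10).

```c
#include <assert.h>
#include <stdint.h>

/* INTx disable bit of the PCI command register (bit 10). */
#define FAKE_PCI_COMMAND_INTX_DISABLE 0x400u

/*
 * Read-modify-write on a simulated 32-bit register: clear the bits in
 * 'clr', then set the bits in 'set'. Returns the value read before the
 * update (an assumption about the real helper).
 */
static uint32_t fake_rmw32(uint32_t *reg, uint32_t clr, uint32_t set)
{
	uint32_t old = *reg;

	*reg = (old & ~clr) | set;
	return old;
}

/* Pulse the INTx disable bit after the adapter has handled the interrupt. */
static void fake_i2c_irq_ack(uint32_t *config_cmd)
{
	/* Deassert: set INTX_DISABLE so the line is dropped. */
	fake_rmw32(config_cmd, 0, FAKE_PCI_COMMAND_INTX_DISABLE);
	assert(*config_cmd & FAKE_PCI_COMMAND_INTX_DISABLE);

	/* Reassert: clear INTX_DISABLE so later interrupts can fire. */
	fake_rmw32(config_cmd, FAKE_PCI_COMMAND_INTX_DISABLE, 0);
}
```

The pulse leaves every other bit in the register untouched, which is the reason for using a read-modify-write rather than a plain write.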
+66 -30
drivers/gpu/drm/xe/xe_lrc.c
··· 28 28 #include "xe_map.h" 29 29 #include "xe_memirq.h" 30 30 #include "xe_mmio.h" 31 + #include "xe_ring_ops.h" 31 32 #include "xe_sriov.h" 32 33 #include "xe_trace_lrc.h" 33 34 #include "xe_vm.h" ··· 93 92 94 93 if (xe_configfs_get_ctx_restore_mid_bb(to_pci_dev(xe->drm.dev), 95 94 class, NULL)) 95 + return true; 96 + 97 + if (gt->ring_ops[class]->emit_aux_table_inv) 96 98 return true; 97 99 98 100 return false; ··· 1221 1217 return cmd - batch; 1222 1218 } 1223 1219 1220 + static ssize_t setup_invalidate_auxccs_wa(struct xe_lrc *lrc, 1221 + struct xe_hw_engine *hwe, 1222 + u32 *batch, size_t max_len) 1223 + { 1224 + struct xe_gt *gt = lrc->gt; 1225 + u32 *(*emit)(struct xe_gt *gt, u32 *cmd) = 1226 + gt->ring_ops[hwe->class]->emit_aux_table_inv; 1227 + 1228 + if (!emit) 1229 + return 0; 1230 + 1231 + if (xe_gt_WARN_ON(gt, max_len < 8)) 1232 + return -ENOSPC; 1233 + 1234 + return emit(gt, batch) - batch; 1235 + } 1236 + 1224 1237 struct bo_setup { 1225 1238 ssize_t (*setup)(struct xe_lrc *lrc, struct xe_hw_engine *hwe, 1226 1239 u32 *batch, size_t max_size); ··· 1370 1349 { 1371 1350 static const struct bo_setup rcs_funcs[] = { 1372 1351 { .setup = setup_timestamp_wa }, 1352 + { .setup = setup_invalidate_auxccs_wa }, 1373 1353 { .setup = setup_configfs_mid_ctx_restore_bb }, 1374 1354 }; 1375 1355 static const struct bo_setup xcs_funcs[] = { 1356 + { .setup = setup_invalidate_auxccs_wa }, 1376 1357 { .setup = setup_configfs_mid_ctx_restore_bb }, 1377 1358 }; 1378 1359 struct bo_setup_state state = { ··· 1630 1607 bo = xe_bo_create_pin_map_novm(xe, tile, bo_size, 1631 1608 ttm_bo_type_kernel, 1632 1609 bo_flags, false); 1633 - if (IS_ERR(lrc->bo)) 1634 - return PTR_ERR(lrc->bo); 1610 + if (IS_ERR(bo)) 1611 + return PTR_ERR(bo); 1635 1612 1636 1613 lrc->bo = bo; 1637 1614 ··· 1925 1902 1926 1903 static int dump_mi_command(struct drm_printer *p, 1927 1904 struct xe_gt *gt, 1905 + u32 *start, 1928 1906 u32 *dw, 1929 1907 int remaining_dw) 1930 1908 { ··· 1941 1917 
while (num_noop < remaining_dw && 1942 1918 (*(++dw) & REG_GENMASK(31, 23)) == MI_NOOP) 1943 1919 num_noop++; 1944 - drm_printf(p, "[%#010x] MI_NOOP (%d dwords)\n", inst_header, num_noop); 1920 + drm_printf(p, "LRC[%#5tx] = [%#010x] MI_NOOP (%d dwords)\n", 1921 + dw - num_noop - start, inst_header, num_noop); 1945 1922 return num_noop; 1946 1923 1947 1924 case MI_TOPOLOGY_FILTER: 1948 - drm_printf(p, "[%#010x] MI_TOPOLOGY_FILTER\n", inst_header); 1925 + drm_printf(p, "LRC[%#5tx] = [%#010x] MI_TOPOLOGY_FILTER\n", 1926 + dw - start, inst_header); 1949 1927 return 1; 1950 1928 1951 1929 case MI_BATCH_BUFFER_END: 1952 - drm_printf(p, "[%#010x] MI_BATCH_BUFFER_END\n", inst_header); 1930 + drm_printf(p, "LRC[%#5tx] = [%#010x] MI_BATCH_BUFFER_END\n", 1931 + dw - start, inst_header); 1953 1932 /* Return 'remaining_dw' to consume the rest of the LRC */ 1954 1933 return remaining_dw; 1955 1934 } ··· 1966 1939 1967 1940 switch (inst_header & MI_OPCODE) { 1968 1941 case MI_LOAD_REGISTER_IMM: 1969 - drm_printf(p, "[%#010x] MI_LOAD_REGISTER_IMM: %d regs\n", 1970 - inst_header, (numdw - 1) / 2); 1942 + drm_printf(p, "LRC[%#5tx] = [%#010x] MI_LOAD_REGISTER_IMM: %d regs\n", 1943 + dw - start, inst_header, (numdw - 1) / 2); 1971 1944 for (int i = 1; i < numdw; i += 2) 1972 - drm_printf(p, " - %#6x = %#010x\n", dw[i], dw[i + 1]); 1945 + drm_printf(p, "LRC[%#5tx] = - %#6x = %#010x\n", 1946 + &dw[i] - start, dw[i], dw[i + 1]); 1973 1947 return numdw; 1974 1948 1975 1949 case MI_LOAD_REGISTER_MEM & MI_OPCODE: 1976 - drm_printf(p, "[%#010x] MI_LOAD_REGISTER_MEM: %s%s\n", 1977 - inst_header, 1950 + drm_printf(p, "LRC[%#5tx] = [%#010x] MI_LOAD_REGISTER_MEM: %s%s\n", 1951 + dw - start, inst_header, 1978 1952 dw[0] & MI_LRI_LRM_CS_MMIO ? "CS_MMIO " : "", 1979 1953 dw[0] & MI_LRM_USE_GGTT ? 
"USE_GGTT " : ""); 1980 1954 if (numdw == 4) 1981 - drm_printf(p, " - %#6x = %#010llx\n", 1955 + drm_printf(p, "LRC[%#5tx] = - %#6x = %#010llx\n", 1956 + dw - start, 1982 1957 dw[1], ((u64)(dw[3]) << 32 | (u64)(dw[2]))); 1983 1958 else 1984 - drm_printf(p, " - %*ph (%s)\n", 1985 - (int)sizeof(u32) * (numdw - 1), dw + 1, 1986 - numdw < 4 ? "truncated" : "malformed"); 1959 + drm_printf(p, "LRC[%#5tx] = - %*ph (%s)\n", 1960 + dw - start, (int)sizeof(u32) * (numdw - 1), 1961 + dw + 1, numdw < 4 ? "truncated" : "malformed"); 1987 1962 return numdw; 1988 1963 1989 1964 case MI_FORCE_WAKEUP: 1990 - drm_printf(p, "[%#010x] MI_FORCE_WAKEUP\n", inst_header); 1965 + drm_printf(p, "LRC[%#5tx] = [%#010x] MI_FORCE_WAKEUP\n", 1966 + dw - start, inst_header); 1991 1967 return numdw; 1992 1968 1993 1969 default: 1994 - drm_printf(p, "[%#010x] unknown MI opcode %#x, likely %d dwords\n", 1995 - inst_header, opcode, numdw); 1970 + drm_printf(p, "LRC[%#5tx] = [%#010x] unknown MI opcode %#x, likely %d dwords\n", 1971 + dw - start, inst_header, opcode, numdw); 1996 1972 return numdw; 1997 1973 } 1998 1974 } 1999 1975 2000 1976 static int dump_gfxpipe_command(struct drm_printer *p, 2001 1977 struct xe_gt *gt, 1978 + u32 *start, 2002 1979 u32 *dw, 2003 1980 int remaining_dw) 2004 1981 { ··· 2021 1990 switch (*dw & GFXPIPE_MATCH_MASK) { 2022 1991 #define MATCH(cmd) \ 2023 1992 case cmd: \ 2024 - drm_printf(p, "[%#010x] " #cmd " (%d dwords)\n", *dw, numdw); \ 1993 + drm_printf(p, "LRC[%#5tx] = [%#010x] " #cmd " (%d dwords)\n", \ 1994 + dw - start, *dw, numdw); \ 2025 1995 return numdw 2026 1996 #define MATCH3D(cmd) \ 2027 1997 case CMD_##cmd: \ 2028 - drm_printf(p, "[%#010x] " #cmd " (%d dwords)\n", *dw, numdw); \ 1998 + drm_printf(p, "LRC[%#5tx] = [%#010x] " #cmd " (%d dwords)\n", \ 1999 + dw - start, *dw, numdw); \ 2029 2000 return numdw 2030 2001 2031 2002 MATCH(STATE_BASE_ADDRESS); ··· 2159 2126 MATCH3D(3DSTATE_SLICE_TABLE_STATE_POINTER_2); 2160 2127 2161 2128 default: 2162 - 
drm_printf(p, "[%#010x] unknown GFXPIPE command (pipeline=%#x, opcode=%#x, subopcode=%#x), likely %d dwords\n", 2163 - *dw, pipeline, opcode, subopcode, numdw); 2129 + drm_printf(p, "LRC[%#5tx] = [%#010x] unknown GFXPIPE command (pipeline=%#x, opcode=%#x, subopcode=%#x), likely %d dwords\n", 2130 + dw - start, *dw, pipeline, opcode, subopcode, numdw); 2164 2131 return numdw; 2165 2132 } 2166 2133 } 2167 2134 2168 2135 static int dump_gfx_state_command(struct drm_printer *p, 2169 2136 struct xe_gt *gt, 2137 + u32 *start, 2170 2138 u32 *dw, 2171 2139 int remaining_dw) 2172 2140 { ··· 2185 2151 MATCH(STATE_WRITE_INLINE); 2186 2152 2187 2153 default: 2188 - drm_printf(p, "[%#010x] unknown GFX_STATE command (opcode=%#x), likely %d dwords\n", 2189 - *dw, opcode, numdw); 2154 + drm_printf(p, "LRC[%#5tx] = [%#010x] unknown GFX_STATE command (opcode=%#x), likely %d dwords\n", 2155 + dw - start, *dw, opcode, numdw); 2190 2156 return numdw; 2191 2157 } 2192 2158 } ··· 2195 2161 struct xe_gt *gt, 2196 2162 enum xe_engine_class hwe_class) 2197 2163 { 2198 - u32 *dw; 2164 + u32 *dw, *start; 2199 2165 int remaining_dw, num_dw; 2200 2166 2201 2167 if (!gt->default_lrc[hwe_class]) { ··· 2208 2174 * hardware status page. 
2209 2175 */ 2210 2176 dw = gt->default_lrc[hwe_class] + LRC_PPHWSP_SIZE; 2177 + start = dw; 2211 2178 remaining_dw = (xe_gt_lrc_size(gt, hwe_class) - LRC_PPHWSP_SIZE) / 4; 2212 2179 2213 2180 while (remaining_dw > 0) { 2214 2181 if ((*dw & XE_INSTR_CMD_TYPE) == XE_INSTR_MI) { 2215 - num_dw = dump_mi_command(p, gt, dw, remaining_dw); 2182 + num_dw = dump_mi_command(p, gt, start, dw, remaining_dw); 2216 2183 } else if ((*dw & XE_INSTR_CMD_TYPE) == XE_INSTR_GFXPIPE) { 2217 - num_dw = dump_gfxpipe_command(p, gt, dw, remaining_dw); 2184 + num_dw = dump_gfxpipe_command(p, gt, start, dw, remaining_dw); 2218 2185 } else if ((*dw & XE_INSTR_CMD_TYPE) == XE_INSTR_GFX_STATE) { 2219 - num_dw = dump_gfx_state_command(p, gt, dw, remaining_dw); 2186 + num_dw = dump_gfx_state_command(p, gt, start, dw, remaining_dw); 2220 2187 } else { 2221 2188 num_dw = min(instr_dw(*dw), remaining_dw); 2222 - drm_printf(p, "[%#10x] Unknown instruction of type %#x, likely %d dwords\n", 2189 + drm_printf(p, "LRC[%#5tx] = [%#10x] Unknown instruction of type %#x, likely %d dwords\n", 2190 + dw - start, 2223 2191 *dw, REG_FIELD_GET(XE_INSTR_CMD_TYPE, *dw), 2224 2192 num_dw); 2225 2193 } ··· 2599 2563 * @lrc: Pointer to the lrc. 2600 2564 * 2601 2565 * Return latest ctx timestamp. With support for active contexts, the 2602 - * calculation may bb slightly racy, so follow a read-again logic to ensure that 2566 + * calculation may be slightly racy, so follow a read-again logic to ensure that 2603 2567 * the context is still active before returning the right timestamp. 2604 2568 * 2605 2569 * Returns: New ctx timestamp value 2606 2570 */ 2607 2571 u64 xe_lrc_timestamp(struct xe_lrc *lrc) 2608 2572 { 2609 - u64 lrc_ts, reg_ts, new_ts; 2573 + u64 lrc_ts, reg_ts, new_ts = lrc->ctx_timestamp; 2610 2574 u32 engine_id; 2611 2575 2612 2576 lrc_ts = xe_lrc_ctx_timestamp(lrc);
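The LRC dump change threads a start pointer through each dump helper so that every printed line carries the running offset of the current dword. The prefix can be reproduced in userspace with the same format string, where %tx is the printf conversion for a pointer difference. A minimal sketch, assuming hypothetical names (fake_dump_dword() and its buffer arguments are not part of the driver):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format the "LRC[offset] = [header]" prefix used by the patched dump
 * helpers. Subtracting two uint32_t pointers yields an offset in dwords,
 * not bytes. Returns the number of characters snprintf() produced.
 */
static int fake_dump_dword(char *out, size_t len,
			   const uint32_t *start, const uint32_t *dw)
{
	ptrdiff_t off = dw - start;

	assert(off >= 0);
	return snprintf(out, len, "LRC[%#5tx] = [%#010x]",
			off, (unsigned int)*dw);
}
```

For a dword at index 16 holding 0x11000001 this produces `LRC[ 0x10] = [0x11000001]`. The `#` flag adds the 0x prefix and the field width of 5 pads short offsets with leading spaces so the columns line up.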
+5 -2
drivers/gpu/drm/xe/xe_oa.c
···
 	size_t offset = 0;
 	int ret;

-	/* Can't read from disabled streams */
-	if (!stream->enabled || !stream->sample)
+	if (!stream->sample)
 		return -EINVAL;

 	if (!(file->f_flags & O_NONBLOCK)) {
···

 	if (stream->sample)
 		hrtimer_cancel(&stream->poll_check_timer);
+
+	/* Update stream->oa_buffer.tail to allow any final reports to be read */
+	if (xe_oa_buffer_check_unlocked(stream))
+		wake_up(&stream->poll_wq);
 }

 static int xe_oa_enable_preempt_timeslice(struct xe_oa_stream *stream)
+26
drivers/gpu/drm/xe/xe_page_reclaim.c
···
 #include "xe_page_reclaim.h"
 
 #include "xe_gt_stats.h"
+#include "xe_guc_tlb_inval.h"
 #include "xe_macros.h"
 #include "xe_pat.h"
 #include "xe_sa.h"
···
  * flushes.
  * - pat_index is transient display (1)
  *
+ * For cases of NULL VMA, there should be no corresponding PRL entry
+ * so skip over.
+ *
  * Return: true when page reclamation is unnecessary, false otherwise.
  */
 bool xe_page_reclaim_skip(struct xe_tile *tile, struct xe_vma *vma)
 {
 	u8 l3_policy;
+
+	if (xe_vma_is_null(vma))
+		return true;
 
 	l3_policy = xe_pat_index_get_l3_policy(tile->xe, vma->attr.pat_index);
 
···
 	}
 
 	return page ? 0 : -ENOMEM;
+}
+
+/**
+ * xe_guc_page_reclaim_done_handler() - Page reclaim done handler
+ * @guc: guc
+ * @msg: message indicating page reclamation done
+ * @len: length of message
+ *
+ * Page reclamation is an extension of TLB invalidation. Both
+ * operations share the same seqno and fence. When either
+ * action completes, we need to signal the corresponding
+ * fence. Since the handling logic is currently identical, this
+ * function delegates to the TLB invalidation handler.
+ *
+ * Return: 0 on success, -EPROTO for malformed messages.
+ */
+int xe_guc_page_reclaim_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
+{
+	return xe_guc_tlb_inval_done_handler(guc, msg, len);
 }
+3
drivers/gpu/drm/xe/xe_page_reclaim.h
···
 struct xe_tlb_inval_fence;
 struct xe_tile;
 struct xe_gt;
+struct xe_guc;
 struct xe_vma;
 
 struct xe_guc_page_reclaim_entry {
···
 	if (entries)
 		put_page(virt_to_page(entries));
 }
+
+int xe_guc_page_reclaim_done_handler(struct xe_guc *guc, u32 *msg, u32 len);
 
 #endif /* _XE_PAGE_RECLAIM_H_ */
+32
drivers/gpu/drm/xe/xe_pagefault.c
···
 		goto unlock_vm;
 	}
 
+	if (xe_vma_read_only(vma) &&
+	    pf->consumer.access_type != XE_PAGEFAULT_ACCESS_TYPE_READ) {
+		err = -EPERM;
+		goto unlock_vm;
+	}
+
 	atomic = xe_pagefault_access_is_atomic(pf->consumer.access_type);
 
 	if (xe_vma_is_cpu_addr_mirror(vma))
···
 		  pf->consumer.engine_instance);
 }
 
+static void xe_pagefault_save_to_vm(struct xe_device *xe, struct xe_pagefault *pf)
+{
+	struct xe_vm *vm;
+
+	/*
+	 * Pagefault may be asociated to VM that is not in fault mode.
+	 * Perform asid_to_vm behavior, except if VM is not in fault
+	 * mode, return VM anyways.
+	 */
+	down_read(&xe->usm.lock);
+	vm = xa_load(&xe->usm.asid_to_vm, pf->consumer.asid);
+	if (vm)
+		xe_vm_get(vm);
+	else
+		vm = ERR_PTR(-EINVAL);
+	up_read(&xe->usm.lock);
+
+	if (IS_ERR(vm))
+		return;
+
+	xe_vm_add_fault_entry_pf(vm, pf);
+
+	xe_vm_put(vm);
+}
+
 static void xe_pagefault_queue_work(struct work_struct *w)
 {
 	struct xe_pagefault_queue *pf_queue =
···
 
 		err = xe_pagefault_service(&pf);
 		if (err) {
+			xe_pagefault_save_to_vm(gt_to_xe(pf.gt), &pf);
 			if (!(pf.consumer.access_type & XE_PAGEFAULT_ACCESS_PREFETCH)) {
 				xe_pagefault_print(&pf);
 				xe_gt_info(pf.gt, "Fault response: Unsuccessful %pe\n",
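The first hunk above rejects any non-read fault that lands on a read-only VMA before the fault is serviced. A minimal standalone sketch of that rule follows, assuming a simplified access-type enum; the enum values and `check_fault_access` are illustrative stand-ins, not the driver's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical access types; the real encoding lives in the xe pagefault code. */
enum fault_access {
	FAULT_ACCESS_READ,
	FAULT_ACCESS_WRITE,
	FAULT_ACCESS_ATOMIC,
};

/*
 * Return 0 when the access may be serviced, -EPERM when a read-only
 * mapping is hit by anything other than a read.
 */
static int check_fault_access(bool vma_read_only, enum fault_access type)
{
	if (vma_read_only && type != FAULT_ACCESS_READ)
		return -EPERM;

	return 0;
}
```

Note that atomics are refused as well as plain writes, since the comparison is against the read type rather than against the write type.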
+7 -7
drivers/gpu/drm/xe/xe_pat.c
···
 };
 
 static const struct xe_pat_table_entry xelp_pat_table[] = {
-	[0] = { XELP_PAT_WB, XE_COH_AT_LEAST_1WAY },
+	[0] = { XELP_PAT_WB, XE_COH_1WAY },
 	[1] = { XELP_PAT_WC, XE_COH_NONE },
 	[2] = { XELP_PAT_WT, XE_COH_NONE },
 	[3] = { XELP_PAT_UC, XE_COH_NONE },
···
 	[0] = { XELP_PAT_UC, XE_COH_NONE },
 	[1] = { XELP_PAT_WC, XE_COH_NONE },
 	[2] = { XELP_PAT_WT, XE_COH_NONE },
-	[3] = { XELP_PAT_WB, XE_COH_AT_LEAST_1WAY },
+	[3] = { XELP_PAT_WB, XE_COH_1WAY },
 	[4] = { XEHPC_PAT_CLOS(1) | XELP_PAT_WT, XE_COH_NONE },
-	[5] = { XEHPC_PAT_CLOS(1) | XELP_PAT_WB, XE_COH_AT_LEAST_1WAY },
+	[5] = { XEHPC_PAT_CLOS(1) | XELP_PAT_WB, XE_COH_1WAY },
 	[6] = { XEHPC_PAT_CLOS(2) | XELP_PAT_WT, XE_COH_NONE },
-	[7] = { XEHPC_PAT_CLOS(2) | XELP_PAT_WB, XE_COH_AT_LEAST_1WAY },
+	[7] = { XEHPC_PAT_CLOS(2) | XELP_PAT_WB, XE_COH_1WAY },
 };
 
 static const struct xe_pat_table_entry xelpg_pat_table[] = {
 	[0] = { XELPG_PAT_0_WB, XE_COH_NONE },
 	[1] = { XELPG_PAT_1_WT, XE_COH_NONE },
 	[2] = { XELPG_PAT_3_UC, XE_COH_NONE },
-	[3] = { XELPG_PAT_0_WB | XELPG_2_COH_1W, XE_COH_AT_LEAST_1WAY },
-	[4] = { XELPG_PAT_0_WB | XELPG_3_COH_2W, XE_COH_AT_LEAST_1WAY },
+	[3] = { XELPG_PAT_0_WB | XELPG_2_COH_1W, XE_COH_1WAY },
+	[4] = { XELPG_PAT_0_WB | XELPG_3_COH_2W, XE_COH_2WAY },
 };
 
 /*
···
 			REG_FIELD_PREP(XE2_L3_POLICY, l3_policy) | \
 			REG_FIELD_PREP(XE2_L4_POLICY, l4_policy) | \
 			REG_FIELD_PREP(XE2_COH_MODE, __coh_mode), \
-		.coh_mode = __coh_mode ? XE_COH_AT_LEAST_1WAY : XE_COH_NONE, \
+		.coh_mode = __coh_mode ? __coh_mode : XE_COH_NONE, \
 		.valid = 1 \
 	}
 
+3 -2
drivers/gpu/drm/xe/xe_pat.h
···
 	/**
 	 * @coh_mode: The GPU coherency mode that @value maps to.
 	 */
-#define XE_COH_NONE 1
-#define XE_COH_AT_LEAST_1WAY 2
+#define XE_COH_NONE 1
+#define XE_COH_1WAY 2
+#define XE_COH_2WAY 3
 	u16 coh_mode;
 
 	/**
+35 -15
drivers/gpu/drm/xe/xe_pt.c
···
 		err = vma_check_userptr(vm, op->map.vma, pt_update);
 		break;
 	case DRM_GPUVA_OP_REMAP:
-		if (op->remap.prev)
+		if (op->remap.prev && !op->remap.skip_prev)
 			err = vma_check_userptr(vm, op->remap.prev, pt_update);
-		if (!err && op->remap.next)
+		if (!err && op->remap.next && !op->remap.skip_next)
 			err = vma_check_userptr(vm, op->remap.next, pt_update);
 		break;
 	case DRM_GPUVA_OP_UNMAP:
···
 	XE_WARN_ON(!level);
 	/* Check for leaf node */
 	if (xe_walk->prl && xe_page_reclaim_list_valid(xe_walk->prl) &&
-	    (!xe_child->base.children || !xe_child->base.children[first])) {
+	    xe_child->level <= MAX_HUGEPTE_LEVEL) {
 		struct iosys_map *leaf_map = &xe_child->bo->vmap;
 		pgoff_t count = xe_pt_num_entries(addr, next, xe_child->level, walk);
 
 		for (pgoff_t i = 0; i < count; i++) {
-			u64 pte = xe_map_rd(xe, leaf_map, (first + i) * sizeof(u64), u64);
+			u64 pte;
 			int ret;
+
+			/*
+			 * If not a leaf pt, skip unless non-leaf pt is interleaved between
+			 * leaf ptes which causes the page walk to skip over the child leaves
+			 */
+			if (xe_child->base.children && xe_child->base.children[first + i]) {
+				u64 pt_size = 1ULL << walk->shifts[xe_child->level];
+				bool edge_pt = (i == 0 && !IS_ALIGNED(addr, pt_size)) ||
+					       (i == count - 1 && !IS_ALIGNED(next, pt_size));
+
+				if (!edge_pt) {
+					xe_page_reclaim_list_abort(xe_walk->tile->primary_gt,
+								   xe_walk->prl,
+								   "PT is skipped by walk at level=%u offset=%lu",
+								   xe_child->level, first + i);
+					break;
+				}
+				continue;
+			}
+
+			pte = xe_map_rd(xe, leaf_map, (first + i) * sizeof(u64), u64);
 
 			/*
 			 * In rare scenarios, pte may not be written yet due to racy conditions.
···
 			}
 
 			/* Ensure it is a defined page */
-			xe_tile_assert(xe_walk->tile,
-				       xe_child->level == 0 ||
-				       (pte & (XE_PTE_PS64 | XE_PDE_PS_2M | XE_PDPE_PS_1G)));
+			xe_tile_assert(xe_walk->tile, xe_child->level == 0 ||
+				       (pte & (XE_PDE_PS_2M | XE_PDPE_PS_1G)));
 
 			/* An entry should be added for 64KB but contigious 4K have XE_PTE_PS64 */
 			if (pte & XE_PTE_PS64)
···
 	killed = xe_pt_check_kill(addr, next, level - 1, xe_child, action, walk);
 
 	/*
-	 * Verify PRL is active and if entry is not a leaf pte (base.children conditions),
-	 * there is a potential need to invalidate the PRL if any PTE (num_live) are dropped.
+	 * Verify if any PTE are potentially dropped at non-leaf levels, either from being
+	 * killed or the page walk covers the region.
 	 */
-	if (xe_walk->prl && level > 1 && xe_child->num_live &&
-	    xe_child->base.children && xe_child->base.children[first]) {
+	if (xe_walk->prl && xe_page_reclaim_list_valid(xe_walk->prl) &&
+	    xe_child->level > MAX_HUGEPTE_LEVEL && xe_child->num_live) {
 		bool covered = xe_pt_covers(addr, next, xe_child->level, &xe_walk->base);
 
 		/*
···
 
 		err = unbind_op_prepare(tile, pt_update_ops, old);
 
-		if (!err && op->remap.prev) {
+		if (!err && op->remap.prev && !op->remap.skip_prev) {
 			err = bind_op_prepare(vm, tile, pt_update_ops,
 					      op->remap.prev, false);
 			pt_update_ops->wait_vm_bookkeep = true;
 		}
-		if (!err && op->remap.next) {
+		if (!err && op->remap.next && !op->remap.skip_next) {
 			err = bind_op_prepare(vm, tile, pt_update_ops,
 					      op->remap.next, false);
 			pt_update_ops->wait_vm_bookkeep = true;
···
 
 		unbind_op_commit(vm, tile, pt_update_ops, old, fence, fence2);
 
-		if (op->remap.prev)
+		if (op->remap.prev && !op->remap.skip_prev)
 			bind_op_commit(vm, tile, pt_update_ops, op->remap.prev,
 				       fence, fence2, false);
-		if (op->remap.next)
+		if (op->remap.next && !op->remap.skip_next)
 			bind_op_commit(vm, tile, pt_update_ops, op->remap.next,
 				       fence, fence2, false);
 		break;
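The `edge_pt` test in the page-reclaim walk tolerates a non-leaf child only at the first or last index of the walked range, and only when the range starts or ends partway through that child. A small sketch of the predicate follows, assuming power-of-two entry sizes; `is_aligned` and `is_edge_pt` are hypothetical names that mirror the expression in the hunk rather than driver functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Power-of-two alignment check, in the spirit of the kernel's IS_ALIGNED(). */
static bool is_aligned(uint64_t x, uint64_t align)
{
	return (x & (align - 1)) == 0;
}

/*
 * Entry `i` of `count` entries covering [addr, next) is an "edge" entry when
 * it is the first one and the range starts inside it, or the last one and
 * the range ends inside it. `pt_size` is the span covered by one entry.
 */
static bool is_edge_pt(size_t i, size_t count,
		       uint64_t addr, uint64_t next, uint64_t pt_size)
{
	return (i == 0 && !is_aligned(addr, pt_size)) ||
	       (i == count - 1 && !is_aligned(next, pt_size));
}
```

For a 2 MiB entry size and a range of 1 MiB to 5 MiB, three entries are touched: the outer two are edges and the middle one is not, so a non-leaf child in the middle slot is the case that aborts the PRL.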
+122 -21
drivers/gpu/drm/xe/xe_ring_ops.c
··· 48 48 return MI_ARB_CHECK | BIT(8) | state; 49 49 } 50 50 51 - static int emit_aux_table_inv(struct xe_gt *gt, struct xe_reg reg, 52 - u32 *dw, int i) 51 + static u32 * 52 + __emit_aux_table_inv(u32 *cmd, const struct xe_reg reg, u32 adj_offset) 53 53 { 54 - dw[i++] = MI_LOAD_REGISTER_IMM | MI_LRI_NUM_REGS(1) | MI_LRI_MMIO_REMAP_EN; 55 - dw[i++] = reg.addr + gt->mmio.adj_offset; 56 - dw[i++] = AUX_INV; 57 - dw[i++] = MI_NOOP; 54 + *cmd++ = MI_LOAD_REGISTER_IMM | MI_LRI_NUM_REGS(1) | 55 + MI_LRI_MMIO_REMAP_EN; 56 + *cmd++ = reg.addr + adj_offset; 57 + *cmd++ = AUX_INV; 58 + *cmd++ = MI_SEMAPHORE_WAIT_TOKEN | MI_SEMAPHORE_REGISTER_POLL | 59 + MI_SEMAPHORE_POLL | MI_SEMAPHORE_SAD_EQ_SDD; 60 + *cmd++ = 0; 61 + *cmd++ = reg.addr + adj_offset; 62 + *cmd++ = 0; 63 + *cmd++ = 0; 58 64 59 - return i; 65 + return cmd; 66 + } 67 + 68 + static u32 *emit_aux_table_inv_render_compute(struct xe_gt *gt, u32 *cmd) 69 + { 70 + return __emit_aux_table_inv(cmd, CCS_AUX_INV, gt->mmio.adj_offset); 71 + } 72 + 73 + static u32 *emit_aux_table_inv_video_decode(struct xe_gt *gt, u32 *cmd) 74 + { 75 + return __emit_aux_table_inv(cmd, VD0_AUX_INV, gt->mmio.adj_offset); 76 + } 77 + 78 + static u32 *emit_aux_table_inv_video_enhance(struct xe_gt *gt, u32 *cmd) 79 + { 80 + return __emit_aux_table_inv(cmd, VE0_AUX_INV, gt->mmio.adj_offset); 81 + } 82 + 83 + static int emit_aux_table_inv(struct xe_hw_engine *hwe, u32 *dw, int i) 84 + { 85 + struct xe_gt *gt = hwe->gt; 86 + u32 *(*emit)(struct xe_gt *gt, u32 *cmd) = 87 + gt->ring_ops[hwe->class]->emit_aux_table_inv; 88 + 89 + if (emit) 90 + return emit(gt, dw + i) - dw; 91 + else 92 + return i; 60 93 } 61 94 62 95 static int emit_user_interrupt(u32 *dw, int i) ··· 289 256 return i; 290 257 } 291 258 259 + static int emit_fake_watchdog(struct xe_lrc *lrc, u32 *dw, int i) 260 + { 261 + /* 262 + * Setup a watchdog with impossible condition to always trigger an 263 + * hardware interrupt that would force the GuC to reset the engine. 
264 + */ 265 + 266 + dw[i++] = MI_LOAD_REGISTER_IMM | MI_LRI_NUM_REGS(2) | MI_LRI_LRM_CS_MMIO; 267 + dw[i++] = PR_CTR_THRSH(0).addr; 268 + dw[i++] = 2; /* small threshold */ 269 + dw[i++] = PR_CTR_CTRL(0).addr; 270 + dw[i++] = CTR_LOGIC_OP(START); 271 + 272 + dw[i++] = MI_SEMAPHORE_WAIT | MI_SEMW_GGTT | MI_SEMW_POLL | MI_SEMW_COMPARE(SAD_EQ_SDD); 273 + dw[i++] = 0xdead; /* this should never be seen */ 274 + dw[i++] = lower_32_bits(xe_lrc_ggtt_addr(lrc)); 275 + dw[i++] = upper_32_bits(xe_lrc_ggtt_addr(lrc)); 276 + dw[i++] = 0; /* unused token */ 277 + 278 + dw[i++] = MI_LOAD_REGISTER_IMM | MI_LRI_NUM_REGS(1) | MI_LRI_LRM_CS_MMIO; 279 + dw[i++] = PR_CTR_CTRL(0).addr; 280 + dw[i++] = CTR_LOGIC_OP(STOP); 281 + 282 + return i; 283 + } 284 + 292 285 /* for engines that don't require any special HW handling (no EUs, no aux inval, etc) */ 293 286 static void __emit_job_gen12_simple(struct xe_sched_job *job, struct xe_lrc *lrc, 294 287 u64 batch_addr, u32 *head, u32 seqno) ··· 324 265 struct xe_gt *gt = job->q->gt; 325 266 326 267 *head = lrc->ring.tail; 268 + 269 + if (job->ring_ops_force_reset) 270 + i = emit_fake_watchdog(lrc, dw, i); 327 271 328 272 i = emit_copy_timestamp(gt_to_xe(gt), lrc, dw, i); 329 273 ··· 367 305 * PVC is a special case that has no compression of either type 368 306 * (FlatCCS or AuxCCS). Also, AuxCCS is no longer used from Xe2 369 307 * onward, so any future platforms with no FlatCCS will not have 370 - * AuxCCS either. 308 + * AuxCCS, and we explicitly do not want to support it on MTL. 
371 309 */ 372 - if (GRAPHICS_VER(xe) >= 20 || xe->info.platform == XE_PVC) 310 + if (GRAPHICS_VERx100(xe) >= 1270 || xe->info.platform == XE_PVC) 373 311 return false; 374 312 375 313 return !xe->info.has_flat_ccs; ··· 382 320 u32 ppgtt_flag = get_ppgtt_flag(job); 383 321 struct xe_gt *gt = job->q->gt; 384 322 struct xe_device *xe = gt_to_xe(gt); 385 - bool decode = job->q->class == XE_ENGINE_CLASS_VIDEO_DECODE; 386 323 387 324 *head = lrc->ring.tail; 325 + 326 + if (job->ring_ops_force_reset) 327 + i = emit_fake_watchdog(lrc, dw, i); 388 328 389 329 i = emit_copy_timestamp(xe, lrc, dw, i); 390 330 391 331 dw[i++] = preparser_disable(true); 392 332 393 333 /* hsdes: 1809175790 */ 394 - if (has_aux_ccs(xe)) { 395 - if (decode) 396 - i = emit_aux_table_inv(gt, VD0_AUX_INV, dw, i); 397 - else 398 - i = emit_aux_table_inv(gt, VE0_AUX_INV, dw, i); 399 - } 334 + i = emit_aux_table_inv(job->q->hwe, dw, i); 400 335 401 336 if (job->ring_ops_flush_tlb) 402 337 i = emit_flush_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc), ··· 440 381 441 382 *head = lrc->ring.tail; 442 383 384 + if (job->ring_ops_force_reset) 385 + i = emit_fake_watchdog(lrc, dw, i); 386 + 443 387 i = emit_copy_timestamp(xe, lrc, dw, i); 388 + 389 + /* 390 + * On AuxCCS platforms the invalidation of the Aux table requires 391 + * quiescing the memory traffic beforehand. 
392 + */ 393 + if (has_aux_ccs(xe)) 394 + i = emit_render_cache_flush(job, dw, i); 444 395 445 396 dw[i++] = preparser_disable(true); 446 397 if (lacks_render) ··· 462 393 i = emit_pipe_invalidate(job->q, mask_flags, job->ring_ops_flush_tlb, dw, i); 463 394 464 395 /* hsdes: 1809175790 */ 465 - if (has_aux_ccs(xe)) 466 - i = emit_aux_table_inv(gt, CCS_AUX_INV, dw, i); 396 + i = emit_aux_table_inv(job->q->hwe, dw, i); 467 397 468 398 dw[i++] = preparser_disable(false); 469 399 ··· 500 432 u32 dw[MAX_JOB_SIZE_DW], i = 0; 501 433 502 434 *head = lrc->ring.tail; 435 + 436 + xe_gt_assert(gt, !job->ring_ops_force_reset); 503 437 504 438 i = emit_copy_timestamp(xe, lrc, dw, i); 505 439 ··· 589 519 .emit_job = emit_job_gen12_copy, 590 520 }; 591 521 592 - static const struct xe_ring_ops ring_ops_gen12_video = { 522 + static const struct xe_ring_ops ring_ops_gen12_video_decode = { 523 + .emit_job = emit_job_gen12_video, 524 + }; 525 + 526 + static const struct xe_ring_ops ring_ops_gen12_video_enhance = { 593 527 .emit_job = emit_job_gen12_video, 594 528 }; 595 529 ··· 601 527 .emit_job = emit_job_gen12_render_compute, 602 528 }; 603 529 530 + static const struct xe_ring_ops auxccs_ring_ops_gen12_video_decode = { 531 + .emit_job = emit_job_gen12_video, 532 + .emit_aux_table_inv = emit_aux_table_inv_video_decode, 533 + }; 534 + 535 + static const struct xe_ring_ops auxccs_ring_ops_gen12_video_enhance = { 536 + .emit_job = emit_job_gen12_video, 537 + .emit_aux_table_inv = emit_aux_table_inv_video_enhance, 538 + }; 539 + 540 + static const struct xe_ring_ops auxccs_ring_ops_gen12_render_compute = { 541 + .emit_job = emit_job_gen12_render_compute, 542 + .emit_aux_table_inv = emit_aux_table_inv_render_compute, 543 + }; 544 + 604 545 const struct xe_ring_ops * 605 546 xe_ring_ops_get(struct xe_gt *gt, enum xe_engine_class class) 606 547 { 548 + struct xe_device *xe = gt_to_xe(gt); 549 + 607 550 switch (class) { 608 551 case XE_ENGINE_CLASS_OTHER: 609 552 return 
&ring_ops_gen12_gsc; 610 553 case XE_ENGINE_CLASS_COPY: 611 554 return &ring_ops_gen12_copy; 612 555 case XE_ENGINE_CLASS_VIDEO_DECODE: 556 + if (has_aux_ccs(xe)) 557 + return &auxccs_ring_ops_gen12_video_decode; 558 + else 559 + return &ring_ops_gen12_video_decode; 613 560 case XE_ENGINE_CLASS_VIDEO_ENHANCE: 614 - return &ring_ops_gen12_video; 561 + if (has_aux_ccs(xe)) 562 + return &auxccs_ring_ops_gen12_video_enhance; 563 + else 564 + return &ring_ops_gen12_video_enhance; 615 565 case XE_ENGINE_CLASS_RENDER: 616 566 case XE_ENGINE_CLASS_COMPUTE: 617 - return &ring_ops_gen12_render_compute; 567 + if (has_aux_ccs(xe)) 568 + return &auxccs_ring_ops_gen12_render_compute; 569 + else 570 + return &ring_ops_gen12_render_compute; 618 571 default: 619 572 return NULL; 620 573 }
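The reworked `emit_aux_table_inv()` bridges two calling styles: the per-class callback takes a write pointer and returns the advanced pointer, while the job emitters keep tracking an index `i` into `dw[]`. The sketch below shows that conversion with an optional callback, assuming a stripped-down ops struct; `struct ring_ops`, `emit_two_dwords` and `emit_optional` are illustrative names, and the dword values are placeholders rather than real commands.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ops table with one optional callback. */
struct ring_ops {
	/* Writes commands at `cmd` and returns a pointer past the last dword. */
	uint32_t *(*emit_aux_table_inv)(uint32_t *cmd);
};

/* Placeholder emitter: writes two dummy dwords. */
static uint32_t *emit_two_dwords(uint32_t *cmd)
{
	*cmd++ = 0xA;
	*cmd++ = 0xB;
	return cmd;
}

/*
 * Index-based wrapper: call the pointer-based callback if present and turn
 * the returned pointer back into an index; otherwise leave `i` unchanged.
 */
static int emit_optional(const struct ring_ops *ops, uint32_t *dw, int i)
{
	if (ops->emit_aux_table_inv)
		return (int)(ops->emit_aux_table_inv(dw + i) - dw);

	return i;
}
```

Leaving the callback NULL for platforms without AuxCCS means the job emitters can call the wrapper unconditionally, which is what lets the diff drop the `has_aux_ccs()` checks from the emit paths.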
+7 -1
drivers/gpu/drm/xe/xe_ring_ops_types.h
···
 #ifndef _XE_RING_OPS_TYPES_H_
 #define _XE_RING_OPS_TYPES_H_
 
+#include <linux/types.h>
+
+struct xe_gt;
 struct xe_sched_job;
 
-#define MAX_JOB_SIZE_DW 58
+#define MAX_JOB_SIZE_DW 74
 #define MAX_JOB_SIZE_BYTES (MAX_JOB_SIZE_DW * 4)
 
 /**
···
 struct xe_ring_ops {
 	/** @emit_job: Write job to ring */
 	void (*emit_job)(struct xe_sched_job *job);
+
+	/** @emit_aux_table_inv: Emit aux table invalidation to the ring */
+	u32 *(*emit_aux_table_inv)(struct xe_gt *gt, u32 *cmd);
 };
 
 #endif
+2
drivers/gpu/drm/xe/xe_sched_job_types.h
···
 	u64 sample_timestamp;
 	/** @ring_ops_flush_tlb: The ring ops need to flush TLB before payload. */
 	bool ring_ops_flush_tlb;
+	/** @ring_ops_force_reset: The ring ops need to trigger a reset before payload. */
+	bool ring_ops_force_reset;
 	/** @ggtt: mapped in ggtt. */
 	bool ggtt;
 	/** @restore_replay: job being replayed for restore */
+2
drivers/gpu/drm/xe/xe_sriov_packet.c
···
 	ret = xe_sriov_pf_migration_restore_produce(xe, vfid, *data);
 	if (ret) {
 		xe_sriov_packet_free(*data);
+		*data = NULL;
+
 		return ret;
 	}
 
+24
drivers/gpu/drm/xe/xe_sriov_pf_control.c
···
 }
 
 /**
+ * xe_sriov_pf_control_prepare_flr() - Notify PF that VF FLR prepare has started.
+ * @xe: the &xe_device
+ * @vfid: the VF identifier
+ *
+ * This function is for PF only.
+ *
+ * Return: 0 on success or a negative error code on failure.
+ */
+int xe_sriov_pf_control_prepare_flr(struct xe_device *xe, unsigned int vfid)
+{
+	struct xe_gt *gt;
+	unsigned int id;
+	int result = 0;
+	int err;
+
+	for_each_gt(gt, xe, id) {
+		err = xe_gt_sriov_pf_control_prepare_flr(gt, vfid);
+		result = result ? -EUCLEAN : err;
+	}
+
+	return result;
+}
+
+/**
  * xe_sriov_pf_control_wait_flr() - Wait for a VF reset (FLR) to complete.
  * @xe: the &xe_device
  * @vfid: the VF identifier
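The loop in `xe_sriov_pf_control_prepare_flr()` folds per-GT results with `result = result ? -EUCLEAN : err`. A standalone sketch of that folding over a plain array follows, so its behaviour can be checked in isolation; `aggregate_errors` is a hypothetical helper, and the `EUCLEAN` fallback value is an assumption for non-Linux hosts only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef EUCLEAN
#define EUCLEAN 117	/* assumed fallback; Linux defines this in errno.h */
#endif

/*
 * Fold per-GT results the way the loop above does: keep going through all
 * entries, and once a failure has been recorded, report -EUCLEAN for the
 * combined result no matter what later entries return.
 */
static int aggregate_errors(const int *errs, size_t n)
{
	int result = 0;

	for (size_t i = 0; i < n; i++)
		result = result ? -EUCLEAN : errs[i];

	return result;
}
```

One consequence worth noting: a failure on the last GT is returned unchanged, while a failure on an earlier GT is reported as `-EUCLEAN` even if every later GT succeeds.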
+1
drivers/gpu/drm/xe/xe_sriov_pf_control.h
···
 int xe_sriov_pf_control_resume_vf(struct xe_device *xe, unsigned int vfid);
 int xe_sriov_pf_control_stop_vf(struct xe_device *xe, unsigned int vfid);
 int xe_sriov_pf_control_reset_vf(struct xe_device *xe, unsigned int vfid);
+int xe_sriov_pf_control_prepare_flr(struct xe_device *xe, unsigned int vfid);
 int xe_sriov_pf_control_wait_flr(struct xe_device *xe, unsigned int vfid);
 int xe_sriov_pf_control_sync_flr(struct xe_device *xe, unsigned int vfid);
 int xe_sriov_pf_control_trigger_save_vf(struct xe_device *xe, unsigned int vfid);
+1
drivers/gpu/drm/xe/xe_sriov_vfio.c
···
 	EXPORT_SYMBOL_FOR_MODULES(xe_sriov_vfio_##_func, "xe-vfio-pci")
 
 DEFINE_XE_SRIOV_VFIO_FUNCTION(int, wait_flr_done, control_wait_flr);
+DEFINE_XE_SRIOV_VFIO_FUNCTION(int, flr_prepare, control_prepare_flr);
 DEFINE_XE_SRIOV_VFIO_FUNCTION(int, suspend_device, control_pause_vf);
 DEFINE_XE_SRIOV_VFIO_FUNCTION(int, resume_device, control_resume_vf);
 DEFINE_XE_SRIOV_VFIO_FUNCTION(int, stop_copy_enter, control_trigger_save_vf);
+25 -2
drivers/gpu/drm/xe/xe_svm.c
···
 				      const enum xe_svm_copy_dir dir,
 				      int kb)
 {
-	if (dir == XE_SVM_COPY_TO_VRAM)
+	if (dir == XE_SVM_COPY_TO_VRAM) {
+		switch (kb) {
+		case 4:
+			xe_gt_stats_incr(gt, XE_GT_STATS_ID_SVM_4K_DEVICE_COPY_KB, kb);
+			break;
+		case 64:
+			xe_gt_stats_incr(gt, XE_GT_STATS_ID_SVM_64K_DEVICE_COPY_KB, kb);
+			break;
+		case 2048:
+			xe_gt_stats_incr(gt, XE_GT_STATS_ID_SVM_2M_DEVICE_COPY_KB, kb);
+			break;
+		}
 		xe_gt_stats_incr(gt, XE_GT_STATS_ID_SVM_DEVICE_COPY_KB, kb);
-	else
+	} else {
+		switch (kb) {
+		case 4:
+			xe_gt_stats_incr(gt, XE_GT_STATS_ID_SVM_4K_CPU_COPY_KB, kb);
+			break;
+		case 64:
+			xe_gt_stats_incr(gt, XE_GT_STATS_ID_SVM_64K_CPU_COPY_KB, kb);
+			break;
+		case 2048:
+			xe_gt_stats_incr(gt, XE_GT_STATS_ID_SVM_2M_CPU_COPY_KB, kb);
+			break;
+		}
 		xe_gt_stats_incr(gt, XE_GT_STATS_ID_SVM_CPU_COPY_KB, kb);
+	}
 }
 
 static void xe_svm_copy_us_stats_incr(struct xe_gt *gt,
+7 -15
drivers/gpu/drm/xe/xe_uc.c
···
 
 	err = xe_gt_sriov_vf_connect(uc_to_gt(uc));
 	if (err)
-		goto err_out;
+		return err;
 
 	uc->guc.submission_state.enabled = true;
 
 	err = xe_guc_opt_in_features_enable(&uc->guc);
 	if (err)
-		goto err_out;
+		return err;
 
 	err = xe_gt_record_default_lrcs(uc_to_gt(uc));
 	if (err)
-		goto err_out;
+		return err;
 
 	return 0;
-
-err_out:
-	xe_guc_sanitize(&uc->guc);
-	return err;
 }
 
 /*
···
 
 	ret = xe_gt_record_default_lrcs(uc_to_gt(uc));
 	if (ret)
-		goto err_out;
+		return ret;
 
 	ret = xe_guc_post_load_init(&uc->guc);
 	if (ret)
-		goto err_out;
+		return ret;
 
 	ret = xe_guc_pc_start(&uc->guc.pc);
 	if (ret)
-		goto err_out;
+		return ret;
 
 	ret = xe_guc_rc_enable(&uc->guc);
 	if (ret)
-		goto err_out;
+		return ret;
 
 	xe_guc_engine_activity_enable_stats(&uc->guc);
 
···
 	xe_gsc_load_start(&uc->gsc);
 
 	return 0;
-
-err_out:
-	xe_guc_sanitize(&uc->guc);
-	return ret;
 }
 
 int xe_uc_reset_prepare(struct xe_uc *uc)
+218 -5
drivers/gpu/drm/xe/xe_vm.c
··· 27 27 #include "xe_device.h" 28 28 #include "xe_drm_client.h" 29 29 #include "xe_exec_queue.h" 30 + #include "xe_gt.h" 30 31 #include "xe_migrate.h" 31 32 #include "xe_pat.h" 32 33 #include "xe_pm.h" ··· 576 575 free_preempt_fences(&preempt_fences); 577 576 578 577 trace_xe_vm_rebind_worker_exit(vm); 578 + } 579 + 580 + /** 581 + * xe_vm_add_fault_entry_pf() - Add pagefault to vm fault list 582 + * @vm: The VM. 583 + * @pf: The pagefault. 584 + * 585 + * This function takes the data from the pagefault @pf and saves it to @vm->faults.list. 586 + * 587 + * The function exits silently if the list is full, and reports a warning if the pagefault 588 + * could not be saved to the list. 589 + */ 590 + void xe_vm_add_fault_entry_pf(struct xe_vm *vm, struct xe_pagefault *pf) 591 + { 592 + struct xe_vm_fault_entry *e; 593 + struct xe_hw_engine *hwe; 594 + 595 + /* Do not report faults on reserved engines */ 596 + hwe = xe_gt_hw_engine(pf->gt, pf->consumer.engine_class, 597 + pf->consumer.engine_instance, false); 598 + if (!hwe || xe_hw_engine_is_reserved(hwe)) 599 + return; 600 + 601 + e = kzalloc_obj(*e); 602 + if (!e) { 603 + drm_warn(&vm->xe->drm, 604 + "Could not allocate memory for fault!\n"); 605 + return; 606 + } 607 + 608 + guard(spinlock)(&vm->faults.lock); 609 + 610 + /* 611 + * Limit the number of faults in the fault list to prevent 612 + * memory overuse. 613 + */ 614 + if (vm->faults.len >= MAX_FAULTS_SAVED_PER_VM) { 615 + kfree(e); 616 + return; 617 + } 618 + 619 + e->address = pf->consumer.page_addr; 620 + /* 621 + * TODO: 622 + * Address precision is currently always SZ_4K, but this may change 623 + * in the future. 
624 + */ 625 + e->address_precision = SZ_4K; 626 + e->access_type = pf->consumer.access_type; 627 + e->fault_type = FIELD_GET(XE_PAGEFAULT_TYPE_MASK, 628 + pf->consumer.fault_type_level), 629 + e->fault_level = FIELD_GET(XE_PAGEFAULT_LEVEL_MASK, 630 + pf->consumer.fault_type_level), 631 + 632 + list_add_tail(&e->list, &vm->faults.list); 633 + vm->faults.len++; 634 + } 635 + 636 + static void xe_vm_clear_fault_entries(struct xe_vm *vm) 637 + { 638 + struct xe_vm_fault_entry *e, *tmp; 639 + 640 + guard(spinlock)(&vm->faults.lock); 641 + list_for_each_entry_safe(e, tmp, &vm->faults.list, list) { 642 + list_del(&e->list); 643 + kfree(e); 644 + } 645 + vm->faults.len = 0; 579 646 } 580 647 581 648 static int xe_vma_ops_alloc(struct xe_vma_ops *vops, bool array_of_binds) ··· 1607 1538 INIT_LIST_HEAD(&vm->userptr.invalidated); 1608 1539 spin_lock_init(&vm->userptr.invalidated_lock); 1609 1540 1541 + INIT_LIST_HEAD(&vm->faults.list); 1542 + spin_lock_init(&vm->faults.lock); 1543 + 1610 1544 ttm_lru_bulk_move_init(&vm->lru_bulk_move); 1611 1545 1612 1546 INIT_WORK(&vm->destroy_work, vm_destroy_work_func); ··· 1925 1853 xe_assert(xe, lookup == vm); 1926 1854 } 1927 1855 up_write(&xe->usm.lock); 1856 + 1857 + xe_vm_clear_fault_entries(vm); 1928 1858 1929 1859 for_each_tile(tile, xe, id) 1930 1860 xe_range_fence_tree_fini(&vm->rftree[id]); ··· 2658 2584 if (!err && op->remap.skip_prev) { 2659 2585 op->remap.prev->tile_present = 2660 2586 tile_present; 2661 - op->remap.prev = NULL; 2662 2587 } 2663 2588 } 2664 2589 if (op->remap.next) { ··· 2667 2594 if (!err && op->remap.skip_next) { 2668 2595 op->remap.next->tile_present = 2669 2596 tile_present; 2670 - op->remap.next = NULL; 2671 2597 } 2672 2598 } 2673 2599 2674 - /* Adjust for partial unbind after removing VMA from VM */ 2600 + /* 2601 + * Adjust for partial unbind after removing VMA from VM. In case 2602 + * of unwind we might need to undo this later. 
2603 + */ 2675 2604 if (!err) { 2676 2605 op->base.remap.unmap->va->va.addr = op->remap.start; 2677 2606 op->base.remap.unmap->va->va.range = op->remap.range; ··· 2792 2717 2793 2718 op->remap.start = xe_vma_start(old); 2794 2719 op->remap.range = xe_vma_size(old); 2720 + op->remap.old_start = op->remap.start; 2721 + op->remap.old_range = op->remap.range; 2795 2722 2796 2723 flags |= op->base.remap.unmap->va->flags & XE_VMA_CREATE_MASK; 2797 2724 if (op->base.remap.prev) { ··· 2942 2865 xe_svm_notifier_lock(vm); 2943 2866 vma->gpuva.flags &= ~XE_VMA_DESTROYED; 2944 2867 xe_svm_notifier_unlock(vm); 2945 - if (post_commit) 2868 + if (post_commit) { 2869 + /* 2870 + * Restore the old va range, in case of the 2871 + * prev/next skip optimisation. Otherwise what 2872 + * we re-insert here could be smaller than the 2873 + * original range. 2874 + */ 2875 + op->base.remap.unmap->va->va.addr = 2876 + op->remap.old_start; 2877 + op->base.remap.unmap->va->va.range = 2878 + op->remap.old_range; 2946 2879 xe_vm_insert_vma(vm, vma); 2880 + } 2947 2881 } 2948 2882 break; 2949 2883 } ··· 3553 3465 goto free_bind_ops; 3554 3466 } 3555 3467 3556 - if (XE_WARN_ON(coh_mode > XE_COH_AT_LEAST_1WAY)) { 3468 + if (XE_WARN_ON(coh_mode > XE_COH_2WAY)) { 3557 3469 err = -EINVAL; 3558 3470 goto free_bind_ops; 3559 3471 } ··· 3580 3492 op == DRM_XE_VM_BIND_OP_MAP_USERPTR) || 3581 3493 XE_IOCTL_DBG(xe, coh_mode == XE_COH_NONE && 3582 3494 op == DRM_XE_VM_BIND_OP_MAP_USERPTR) || 3495 + XE_IOCTL_DBG(xe, xe_device_is_l2_flush_optimized(xe) && 3496 + (op == DRM_XE_VM_BIND_OP_MAP_USERPTR || 3497 + is_cpu_addr_mirror) && 3498 + (pat_index != 19 && coh_mode != XE_COH_2WAY)) || 3583 3499 XE_IOCTL_DBG(xe, comp_en && 3584 3500 op == DRM_XE_VM_BIND_OP_MAP_USERPTR) || 3585 3501 XE_IOCTL_DBG(xe, op == DRM_XE_VM_BIND_OP_MAP_USERPTR && ··· 3723 3631 */ 3724 3632 comp_en = xe_pat_index_get_comp_en(xe, pat_index); 3725 3633 if (XE_IOCTL_DBG(xe, bo->ttm.base.import_attach && comp_en)) 3634 + return -EINVAL; 
+
+	if (XE_IOCTL_DBG(xe, bo->ttm.base.import_attach && xe_device_is_l2_flush_optimized(xe) &&
+			 (pat_index != 19 && coh_mode != XE_COH_2WAY)))
 		return -EINVAL;
 
 	/* If a BO is protected it can only be mapped if the key is still valid */
···
 put_vm:
 	xe_vm_put(vm);
 	return err;
+}
+
+/*
+ * Map access type, fault type, and fault level from current bspec
+ * specification to user spec abstraction. The current mapping is
+ * approximately 1-to-1, with access type being the only notable
+ * exception as it carries additional data with respect to prefetch
+ * status that needs to be masked out.
+ */
+static u8 xe_to_user_access_type(u8 access_type)
+{
+	return access_type & XE_PAGEFAULT_ACCESS_TYPE_MASK;
+}
+
+static u8 xe_to_user_fault_type(u8 fault_type)
+{
+	return fault_type;
+}
+
+static u8 xe_to_user_fault_level(u8 fault_level)
+{
+	return fault_level;
+}
+
+static int fill_faults(struct xe_vm *vm,
+		       struct drm_xe_vm_get_property *args)
+{
+	struct xe_vm_fault __user *usr_ptr = u64_to_user_ptr(args->data);
+	struct xe_vm_fault *fault_list, fault_entry = { 0 };
+	struct xe_vm_fault_entry *entry;
+	int ret = 0, i = 0, count, entry_size;
+
+	entry_size = sizeof(struct xe_vm_fault);
+	count = args->size / entry_size;
+
+	fault_list = kcalloc(count, sizeof(struct xe_vm_fault), GFP_KERNEL);
+	if (!fault_list)
+		return -ENOMEM;
+
+	spin_lock(&vm->faults.lock);
+	list_for_each_entry(entry, &vm->faults.list, list) {
+		if (i == count)
+			break;
+
+		fault_entry.address = xe_device_canonicalize_addr(vm->xe, entry->address);
+		fault_entry.address_precision = entry->address_precision;
+
+		fault_entry.access_type = xe_to_user_access_type(entry->access_type);
+		fault_entry.fault_type = xe_to_user_fault_type(entry->fault_type);
+		fault_entry.fault_level = xe_to_user_fault_level(entry->fault_level);
+
+		memcpy(&fault_list[i], &fault_entry, entry_size);
+
+		i++;
+	}
+	spin_unlock(&vm->faults.lock);
+
+	ret = copy_to_user(usr_ptr, fault_list, args->size);
+
+	kfree(fault_list);
+	return ret ? -EFAULT : 0;
+}
+
+static int xe_vm_get_property_helper(struct xe_vm *vm,
+				     struct drm_xe_vm_get_property *args)
+{
+	size_t size;
+
+	switch (args->property) {
+	case DRM_XE_VM_GET_PROPERTY_FAULTS:
+		spin_lock(&vm->faults.lock);
+		size = size_mul(sizeof(struct xe_vm_fault), vm->faults.len);
+		spin_unlock(&vm->faults.lock);
+
+		if (!args->size) {
+			args->size = size;
+			return 0;
+		}
+
+		/*
+		 * Number of faults may increase between calls to
+		 * xe_vm_get_property_ioctl, so just report the number of
+		 * faults the user requests if it's less than or equal to
+		 * the number of faults in the VM fault array.
+		 *
+		 * We should also at least assert that the args->size value
+		 * is a multiple of the xe_vm_fault struct size.
+		 */
+		if (args->size > size || args->size % sizeof(struct xe_vm_fault))
+			return -EINVAL;
+
+		return fill_faults(vm, args);
+	}
+	return -EINVAL;
+}
+
+int xe_vm_get_property_ioctl(struct drm_device *drm, void *data,
+			     struct drm_file *file)
+{
+	struct xe_device *xe = to_xe_device(drm);
+	struct xe_file *xef = to_xe_file(file);
+	struct drm_xe_vm_get_property *args = data;
+	struct xe_vm *vm;
+	int ret = 0;
+
+	if (XE_IOCTL_DBG(xe, (args->reserved[0] || args->reserved[1] ||
+			      args->reserved[2])))
+		return -EINVAL;
+
+	vm = xe_vm_lookup(xef, args->vm_id);
+	if (XE_IOCTL_DBG(xe, !vm))
+		return -ENOENT;
+
+	ret = xe_vm_get_property_helper(vm, args);
+
+	xe_vm_put(vm);
+	return ret;
 }
 
 /**
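The size handling in xe_vm_get_property_helper() above can be illustrated with a small userspace model. This is only a sketch, not driver code: model_get_faults_size() and FAULT_ENTRY_SIZE are hypothetical names, and the 48-byte entry size is an assumption derived from the struct xe_vm_fault layout in the uapi header of this diff.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed size of one struct xe_vm_fault, derived from the uapi layout in
 * this diff: 8 (address) + 4 (precision) + 4 x 1 (u8 fields) + 4 x 8
 * (reserved) bytes.
 */
#define FAULT_ENTRY_SIZE 48u

/*
 * Hypothetical model of the DRM_XE_VM_GET_PROPERTY_FAULTS branch.
 * On entry *size is the size requested by the caller; stored_faults is the
 * number of faults currently saved for the VM.
 * Returns 0 on success (writing the required size when *size was 0),
 * or -EINVAL for a request the driver would reject.
 */
static int model_get_faults_size(uint32_t *size, unsigned int stored_faults)
{
	size_t available = (size_t)FAULT_ENTRY_SIZE * stored_faults;

	/* A zero size is a query: report how many bytes are available. */
	if (*size == 0) {
		*size = (uint32_t)available;
		return 0;
	}

	/* Reject oversized requests and sizes that split an entry. */
	if (*size > available || *size % FAULT_ENTRY_SIZE)
		return -EINVAL;

	return 0;
}
```

Under these assumptions, a VM holding three faults reports 144 bytes for a zero-size query, accepts a 96-byte request (two entries), and rejects both a 100-byte request (not a multiple of the entry size) and a 192-byte request (more than is stored).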
+12
drivers/gpu/drm/xe/xe_vm.h
···
 #include "xe_map.h"
 #include "xe_vm_types.h"
 
+/**
+ * MAX_FAULTS_SAVED_PER_VM - Maximum number of faults each vm can store before future
+ * faults are discarded to prevent memory overuse
+ */
+#define MAX_FAULTS_SAVED_PER_VM	50
+
 struct drm_device;
 struct drm_printer;
 struct drm_file;
···
 
 struct xe_exec_queue;
 struct xe_file;
+struct xe_pagefault;
 struct xe_sync_entry;
 struct xe_svm_range;
 struct drm_exec;
···
 int xe_vm_bind_ioctl(struct drm_device *dev, void *data,
 		     struct drm_file *file);
 int xe_vm_query_vmas_attrs_ioctl(struct drm_device *dev, void *data, struct drm_file *file);
+int xe_vm_get_property_ioctl(struct drm_device *dev, void *data,
+			     struct drm_file *file);
+
 void xe_vm_close_and_put(struct xe_vm *vm);
 
 static inline bool xe_vm_in_fault_mode(struct xe_vm *vm)
···
 void xe_vm_snapshot_capture_delayed(struct xe_vm_snapshot *snap);
 void xe_vm_snapshot_print(struct xe_vm_snapshot *snap, struct drm_printer *p);
 void xe_vm_snapshot_free(struct xe_vm_snapshot *snap);
+
+void xe_vm_add_fault_entry_pf(struct xe_vm *vm, struct xe_pagefault *pf);
 
 /**
  * xe_vm_set_validating() - Register this task as currently making bos resident
+24 -1
drivers/gpu/drm/xe/xe_vm_madvise.c
···
 	if (XE_IOCTL_DBG(xe, !coh_mode))
 		return false;
 
-	if (XE_WARN_ON(coh_mode > XE_COH_AT_LEAST_1WAY))
+	if (XE_WARN_ON(coh_mode > XE_COH_2WAY))
 		return false;
 
 	if (XE_IOCTL_DBG(xe, args->pat_index.pad))
···
 	struct xe_vmas_in_madvise_range madvise_range = {.addr = args->start,
 							 .range = args->range, };
 	struct xe_madvise_details details;
+	u16 pat_index, coh_mode;
 	struct xe_vm *vm;
 	struct drm_exec exec;
 	int err, attr_type;
···
 	if (err || !madvise_range.num_vmas)
 		goto madv_fini;
 
+	if (args->type == DRM_XE_MEM_RANGE_ATTR_PAT) {
+		pat_index = array_index_nospec(args->pat_index.val, xe->pat.n_entries);
+		coh_mode = xe_pat_index_get_coh_mode(xe, pat_index);
+		if (XE_IOCTL_DBG(xe, madvise_range.has_svm_userptr_vmas &&
+				 xe_device_is_l2_flush_optimized(xe) &&
+				 (pat_index != 19 && coh_mode != XE_COH_2WAY))) {
+			err = -EINVAL;
+			goto madv_fini;
+		}
+	}
+
 	if (madvise_range.has_bo_vmas) {
 		if (args->type == DRM_XE_MEM_RANGE_ATTR_ATOMIC) {
 			if (!check_bo_args_are_sane(vm, madvise_range.vmas,
···
 
 				if (!bo)
 					continue;
+
+				if (args->type == DRM_XE_MEM_RANGE_ATTR_PAT) {
+					if (XE_IOCTL_DBG(xe, bo->ttm.base.import_attach &&
+							 xe_device_is_l2_flush_optimized(xe) &&
+							 (pat_index != 19 &&
+							  coh_mode != XE_COH_2WAY))) {
+						err = -EINVAL;
+						goto err_fini;
+					}
+				}
+
 				err = drm_exec_lock_obj(&exec, &bo->ttm.base);
 				drm_exec_retry_on_contention(&exec);
 				if (err)
+33
drivers/gpu/drm/xe/xe_vm_types.h
···
 struct drm_pagemap;
 
 struct xe_bo;
+struct xe_pagefault;
 struct xe_svm_range;
 struct xe_sync_entry;
 struct xe_user_fence;
···
 };
 
 struct xe_device;
+
+/**
+ * struct xe_vm_fault_entry - Elements of vm->faults.list
+ * @list: link into @xe_vm.faults.list
+ * @address: address of the fault
+ * @address_precision: precision of faulted address
+ * @access_type: type of address access that resulted in fault
+ * @fault_type: type of fault reported
+ * @fault_level: fault level of the fault
+ */
+struct xe_vm_fault_entry {
+	struct list_head list;
+	u64 address;
+	u32 address_precision;
+	u8 access_type;
+	u8 fault_type;
+	u8 fault_level;
+};
 
 struct xe_vm {
 	/** @gpuvm: base GPUVM used to track VMAs */
···
 		bool capture_once;
 	} error_capture;
 
+	/** @faults: List of all faults associated with this VM */
+	struct {
+		/** @faults.lock: lock protecting @faults.list */
+		spinlock_t lock;
+		/** @faults.list: list of xe_vm_fault_entry entries */
+		struct list_head list;
+		/** @faults.len: length of @faults.list */
+		unsigned int len;
+	} faults;
+
 	/**
 	 * @validation: Validation data only valid with the vm resv held.
 	 * Note: This is really task state of the task holding the vm resv,
···
 	u64 start;
 	/** @range: range of the VMA unmap */
 	u64 range;
+	/** @old_start: Original start of the VMA we unmap */
+	u64 old_start;
+	/** @old_range: Original range of the VMA we unmap */
+	u64 old_range;
 	/** @skip_prev: skip prev rebind */
 	bool skip_prev;
 	/** @skip_next: skip next rebind */
+7 -20
drivers/gpu/drm/xe/xe_wa.c
···
 			     LSN_DIM_Z_WGT_MASK,
 			     LSN_LNI_WGT(1) | LSN_LNE_WGT(1) |
 			     LSN_DIM_X_WGT(1) | LSN_DIM_Y_WGT(1) |
-			     LSN_DIM_Z_WGT(1)))
-	},
-
-	/* Xe2_HPM */
-
-	{ XE_RTP_NAME("16021867713"),
-	  XE_RTP_RULES(MEDIA_VERSION(1301),
-		       ENGINE_CLASS(VIDEO_DECODE)),
-	  XE_RTP_ACTIONS(SET(VDBOX_CGCTL3F1C(0), MFXPIPE_CLKGATE_DIS)),
-	  XE_RTP_ENTRY_FLAG(FOREACH_ENGINE),
-	},
-	{ XE_RTP_NAME("14019449301"),
-	  XE_RTP_RULES(MEDIA_VERSION(1301), ENGINE_CLASS(VIDEO_DECODE)),
-	  XE_RTP_ACTIONS(SET(VDBOX_CGCTL3F08(0), CG3DDISHRS_CLKGATE_DIS)),
-	  XE_RTP_ENTRY_FLAG(FOREACH_ENGINE),
+			     LSN_DIM_Z_WGT(1)),
+			 SET(LSC_CHICKEN_BIT_0_UDW, L3_128B_256B_WRT_DIS))
 	},
 
 	/* Xe3_LPG */
···
 	  XE_RTP_ACTIONS(SET(MMIOATSREQLIMIT_GAM_WALK_3D,
 			     DIS_ATS_WRONLY_PG))
 	},
-	{ XE_RTP_NAME("14026144927"),
+	{ XE_RTP_NAME("14026144927, 16029437861"),
 	  XE_RTP_RULES(GRAPHICS_VERSION(3510), GRAPHICS_STEP(A0, B0)),
 	  XE_RTP_ACTIONS(SET(L3SQCREG2, L3_SQ_DISABLE_COAMA_2WAY_COH |
 			     L3_SQ_DISABLE_COAMA))
···
 	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(2001, 2004), ENGINE_CLASS(RENDER)),
 	  XE_RTP_ACTIONS(SET(CHICKEN_RASTER_1, DIS_CLIP_NEGATIVE_BOUNDING_BOX))
 	},
+	{ XE_RTP_NAME("14026781792"),
+	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(3000, 3510), ENGINE_CLASS(RENDER)),
+	  XE_RTP_ACTIONS(SET(FF_MODE, DIS_TE_PATCH_CTRL))
+	},
 
 	/* DG1 */
 
···
 	  XE_RTP_RULES(GRAPHICS_VERSION(3000), GRAPHICS_STEP(A0, B0),
 		       ENGINE_CLASS(RENDER)),
 	  XE_RTP_ACTIONS(SET(CHICKEN_RASTER_1, DIS_CLIP_NEGATIVE_BOUNDING_BOX))
-	},
-	{ XE_RTP_NAME("14026781792"),
-	  XE_RTP_RULES(GRAPHICS_VERSION(3510), ENGINE_CLASS(RENDER)),
-	  XE_RTP_ACTIONS(SET(FF_MODE, DIS_TE_PATCH_CTRL))
 	},
 };
 
+14
drivers/vfio/pci/xe/main.c
···
 	spin_unlock(&xe_vdev->reset_lock);
 }
 
+static void xe_vfio_pci_reset_prepare(struct pci_dev *pdev)
+{
+	struct xe_vfio_pci_core_device *xe_vdev = pci_get_drvdata(pdev);
+	int ret;
+
+	if (!pdev->is_virtfn)
+		return;
+
+	ret = xe_sriov_vfio_flr_prepare(xe_vdev->xe, xe_vdev->vfid);
+	if (ret)
+		dev_err(&pdev->dev, "Failed to prepare FLR: %d\n", ret);
+}
+
 static void xe_vfio_pci_reset_done(struct pci_dev *pdev)
 {
 	struct xe_vfio_pci_core_device *xe_vdev = pci_get_drvdata(pdev);
···
 }
 
 static const struct pci_error_handlers xe_vfio_pci_err_handlers = {
+	.reset_prepare = xe_vfio_pci_reset_prepare,
 	.reset_done = xe_vfio_pci_reset_done,
 	.error_detected = vfio_pci_core_aer_err_detected,
 };
+21
include/drm/drm_pagemap.h
···
 
 #include <linux/dma-direction.h>
 #include <linux/hmm.h>
+#include <linux/memremap.h>
 #include <linux/types.h>
 
 #define NR_PAGES(order) (1U << (order))
···
 void drm_pagemap_destroy(struct drm_pagemap *dpagemap, bool is_atomic_or_reclaim);
 
 int drm_pagemap_reinit(struct drm_pagemap *dpagemap);
+
+/**
+ * drm_pagemap_page_zone_device_data() - Page to zone_device_data
+ * @page: Pointer to the page
+ *
+ * Return: Page's zone_device_data
+ */
+static inline struct drm_pagemap_zdd *drm_pagemap_page_zone_device_data(struct page *page)
+{
+	struct folio *folio = page_folio(page);
+
+	return folio_zone_device_data(folio);
+}
+
+#else
+
+static inline struct drm_pagemap_zdd *drm_pagemap_page_zone_device_data(struct page *page)
+{
+	return NULL;
+}
 
 #endif /* IS_ENABLED(CONFIG_ZONE_DEVICE) */
 
+11
include/drm/intel/xe_sriov_vfio.h
···
 bool xe_sriov_vfio_migration_supported(struct xe_device *xe);
 
 /**
+ * xe_sriov_vfio_flr_prepare() - Notify PF that VF FLR prepare has started.
+ * @xe: the PF &xe_device obtained by calling xe_sriov_vfio_get_pf()
+ * @vfid: the VF identifier (can't be 0)
+ *
+ * This function marks VF FLR as pending before PF receives GuC FLR event.
+ *
+ * Return: 0 on success or a negative error code on failure.
+ */
+int xe_sriov_vfio_flr_prepare(struct xe_device *xe, unsigned int vfid);
+
+/**
  * xe_sriov_vfio_wait_flr_done() - Wait for VF FLR completion.
  * @xe: the PF &xe_device obtained by calling xe_sriov_vfio_get_pf()
  * @vfid: the VF identifier (can't be 0)
+90 -2
include/uapi/drm/xe_drm.h
···
  * - &DRM_IOCTL_XE_OBSERVATION
  * - &DRM_IOCTL_XE_MADVISE
  * - &DRM_IOCTL_XE_VM_QUERY_MEM_RANGE_ATTRS
+ * - &DRM_IOCTL_XE_VM_GET_PROPERTY
  */
 
 /*
···
 #define DRM_XE_MADVISE	0x0c
 #define DRM_XE_VM_QUERY_MEM_RANGE_ATTRS	0x0d
 #define DRM_XE_EXEC_QUEUE_SET_PROPERTY	0x0e
+#define DRM_XE_VM_GET_PROPERTY	0x0f
 
 /* Must be kept compact -- no holes */
 
···
 #define DRM_IOCTL_XE_MADVISE	DRM_IOW(DRM_COMMAND_BASE + DRM_XE_MADVISE, struct drm_xe_madvise)
 #define DRM_IOCTL_XE_VM_QUERY_MEM_RANGE_ATTRS	DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_VM_QUERY_MEM_RANGE_ATTRS, struct drm_xe_vm_query_mem_range_attr)
 #define DRM_IOCTL_XE_EXEC_QUEUE_SET_PROPERTY	DRM_IOW(DRM_COMMAND_BASE + DRM_XE_EXEC_QUEUE_SET_PROPERTY, struct drm_xe_exec_queue_set_property)
+#define DRM_IOCTL_XE_VM_GET_PROPERTY	DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_VM_GET_PROPERTY, struct drm_xe_vm_get_property)
 
 /**
  * DOC: Xe IOCTL Extensions
···
  * not invoke autoreset. Neither will stack variables going out of scope.
  * Therefore it's recommended to always explicitly reset the madvises when
  * freeing the memory backing a region used in a &DRM_IOCTL_XE_MADVISE call.
- * - DRM_XE_VM_BIND_FLAG_DECOMPRESS - Request on-device decompression for a MAP.
+ * - %DRM_XE_VM_BIND_FLAG_DECOMPRESS - Request on-device decompression for a MAP.
  * When set on a MAP bind operation, request the driver schedule an on-device
  * in-place decompression (via the migrate/resolve path) for the GPU mapping
  * created by this bind. Only valid for DRM_XE_VM_BIND_OP_MAP; usage on
···
  * incoherent GT access is possible.
  *
  * Note: For userptr and externally imported dma-buf the kernel expects
- * either 1WAY or 2WAY for the @pat_index.
+ * either 1WAY or 2WAY for the @pat_index. Starting from NVL-P, for
+ * userptr, svm, madvise and externally imported dma-buf the kernel expects
+ * either 2WAY or 1WAY and XA @pat_index.
  *
  * For DRM_XE_VM_BIND_FLAG_NULL bindings there are no KMD restrictions
  * on the @pat_index. For such mappings there is no actual memory being
···
 
 	/** @reserved: Reserved */
 	__u64 reserved[2];
+};
+
+/** struct xe_vm_fault - Describes faults for %DRM_XE_VM_GET_PROPERTY_FAULTS */
+struct xe_vm_fault {
+	/** @address: Canonical address of the fault */
+	__u64 address;
+	/** @address_precision: Precision of faulted address */
+	__u32 address_precision;
+	/** @access_type: Type of address access that resulted in fault */
+#define FAULT_ACCESS_TYPE_READ	0
+#define FAULT_ACCESS_TYPE_WRITE	1
+#define FAULT_ACCESS_TYPE_ATOMIC	2
+	__u8 access_type;
+	/** @fault_type: Type of fault reported */
+#define FAULT_TYPE_NOT_PRESENT	0
+#define FAULT_TYPE_WRITE_ACCESS	1
+#define FAULT_TYPE_ATOMIC_ACCESS	2
+	__u8 fault_type;
+	/** @fault_level: fault level of the fault */
+#define FAULT_LEVEL_PTE	0
+#define FAULT_LEVEL_PDE	1
+#define FAULT_LEVEL_PDP	2
+#define FAULT_LEVEL_PML4	3
+#define FAULT_LEVEL_PML5	4
+	__u8 fault_level;
+	/** @pad: MBZ */
+	__u8 pad;
+	/** @reserved: MBZ */
+	__u64 reserved[4];
+};
+
+/**
+ * struct drm_xe_vm_get_property - Input of &DRM_IOCTL_XE_VM_GET_PROPERTY
+ *
+ * The user provides a VM and a property to query among DRM_XE_VM_GET_PROPERTY_*,
+ * and sets the values in the vm_id and property members, respectively. This
+ * determines both the VM to get the property of, as well as the property to
+ * report.
+ *
+ * If size is set to 0, the driver fills it with the required size for the
+ * requested property. The user is expected here to allocate memory for the
+ * property structure and to provide a pointer to the allocated memory using the
+ * data member. For some properties, this may be zero, in which case, the
+ * value of the property will be saved to the value member and size will remain
+ * zero on return.
+ *
+ * If size is not zero, then the IOCTL will attempt to copy the requested
+ * property into the data member.
+ *
+ * The IOCTL will return -ENOENT if the VM could not be identified from the
+ * provided VM ID, or -EINVAL if the IOCTL fails for any other reason, such as
+ * providing an invalid size for the given property or if the property data
+ * could not be copied to the memory allocated to the data member.
+ *
+ * The property member can be:
+ * - %DRM_XE_VM_GET_PROPERTY_FAULTS
+ */
+struct drm_xe_vm_get_property {
+	/** @extensions: Pointer to the first extension struct, if any */
+	__u64 extensions;
+
+	/** @vm_id: The ID of the VM to query the properties of */
+	__u32 vm_id;
+
+#define DRM_XE_VM_GET_PROPERTY_FAULTS	0
+	/** @property: property to get */
+	__u32 property;
+
+	/** @size: Size to allocate for @data */
+	__u32 size;
+
+	/** @pad: MBZ */
+	__u32 pad;
+
+	union {
+		/** @data: Pointer to user-defined array of flexible size and type */
+		__u64 data;
+		/** @value: Return value for scalar queries */
+		__u64 value;
+	};
+
+	/** @reserved: MBZ */
+	__u64 reserved[3];
 };
 
 /**
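The two-call protocol described in the struct drm_xe_vm_get_property documentation above might be used from userspace roughly as follows. This is a minimal sketch under stated assumptions, not code from the series: query_vm_faults() is a hypothetical helper, the structs are local copies of the uapi definitions in this diff (installed headers may not carry them yet), and the ioctl number is DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_VM_GET_PROPERTY, ...) expanded by hand.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>

/* Local copy of the uapi struct from this diff (assumption: layout unchanged). */
struct xe_vm_fault {
	uint64_t address;
	uint32_t address_precision;
	uint8_t access_type;
	uint8_t fault_type;
	uint8_t fault_level;
	uint8_t pad;
	uint64_t reserved[4];
};

/* Local copy of the ioctl argument struct from this diff. */
struct drm_xe_vm_get_property {
	uint64_t extensions;
	uint32_t vm_id;
	uint32_t property;
	uint32_t size;
	uint32_t pad;
	union {
		uint64_t data;
		uint64_t value;
	};
	uint64_t reserved[3];
};

static_assert(sizeof(struct xe_vm_fault) == 48, "unexpected xe_vm_fault layout");

#define DRM_XE_VM_GET_PROPERTY_FAULTS 0
/* DRM ioctl type 'd', DRM_COMMAND_BASE 0x40, DRM_XE_VM_GET_PROPERTY 0x0f. */
#define DRM_IOCTL_XE_VM_GET_PROPERTY \
	_IOWR('d', 0x40 + 0x0f, struct drm_xe_vm_get_property)

/*
 * Hypothetical helper: returns the number of fault entries read (>= 0) and
 * stores a malloc'ed array in *out (NULL when there are none), or a negative
 * errno value on failure. The caller frees *out.
 */
static int query_vm_faults(int fd, uint32_t vm_id, struct xe_vm_fault **out)
{
	/* Zero-initialised so that pad and reserved[] stay zero (MBZ). */
	struct drm_xe_vm_get_property args = {
		.vm_id = vm_id,
		.property = DRM_XE_VM_GET_PROPERTY_FAULTS,
	};
	struct xe_vm_fault *faults;

	*out = NULL;

	/* First call: size == 0 asks the driver for the required size. */
	if (ioctl(fd, DRM_IOCTL_XE_VM_GET_PROPERTY, &args))
		return -errno;
	if (!args.size)
		return 0;

	faults = calloc(1, args.size);
	if (!faults)
		return -ENOMEM;

	/* Second call: same size, now with a buffer for the entries. */
	args.data = (uint64_t)(uintptr_t)faults;
	if (ioctl(fd, DRM_IOCTL_XE_VM_GET_PROPERTY, &args)) {
		int err = -errno;

		free(faults);
		return err;
	}

	*out = faults;
	return (int)(args.size / sizeof(*faults));
}
```

A caller would pass an open Xe DRM file descriptor and a VM id obtained from VM creation, then iterate over the returned entries. Reusing the size reported by the first call for the second call fits the helper's check in xe_vm.c, which rejects only requests larger than what is currently stored or not a multiple of the entry size.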