Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

device-dax: introduce 'seed' devices

Add a seed device concept for dynamic dax regions so that a region can be
split amongst multiple sub-instances. The seed device, similar to
libnvdimm seed devices, is a device that starts with zero capacity
allocated and unbound to a driver. In contrast to libnvdimm seed devices,
explicit 'create' and 'delete' interfaces are added to the region to
trigger seeds to be created and unused devices to be reclaimed. The
explicit create and delete replace the behavior that libnvdimm
implements: implicit create as a side effect of probe, and implicit
delete when writing 0 to the size.
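
As an illustration of the explicit interface described above, here is a
minimal userspace sketch, not part of the patch. It assumes the caller
already knows the sysfs directory that holds the region's 'create', 'seed'
and 'delete' attributes (the commit does not spell out that path, so it is
passed in as a parameter). The helper names attr_write(), attr_read(),
region_create_seed() and region_delete() are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write a string to <dir>/<attr>; returns 0 on success, -1 on failure. */
static int attr_write(const char *dir, const char *attr, const char *val)
{
	char path[4096];
	FILE *f;
	int rc = 0;

	assert(dir && attr && val);
	snprintf(path, sizeof(path), "%s/%s", dir, attr);
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(val, f) == EOF)
		rc = -1;
	/* stdio buffers the write, so a rejected store shows up here */
	if (fclose(f) == EOF)
		rc = -1;
	return rc;
}

/* Read the first line of <dir>/<attr> into buf, without the newline. */
static int attr_read(const char *dir, const char *attr, char *buf, size_t len)
{
	char path[4096];
	FILE *f;

	assert(dir && attr && buf && len > 0);
	snprintf(path, sizeof(path), "%s/%s", dir, attr);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f))
		buf[0] = '\0';
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/*
 * Request a new seed by writing 1 to 'create', then read 'create' back.
 * Per the patch, reading 'create' reports the most recently created device.
 */
static int region_create_seed(const char *region_dir, char *name, size_t len)
{
	if (attr_write(region_dir, "create", "1"))
		return -1;
	return attr_read(region_dir, "create", name, len);
}

/* Reclaim an unused device by writing its name (e.g. "dax0.1") to 'delete'. */
static int region_delete(const char *region_dir, const char *name)
{
	return attr_write(region_dir, "delete", name);
}
```

A caller would pass the region directory and a buffer, for example
region_create_seed(dir, name, sizeof(name)) followed later by
region_delete(dir, name). Error handling is reduced to 0 / -1 here; a real
tool would likely report errno.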

Delete can be performed on any 0-sized and idle device. This avoids the
gymnastics of needing to move device_unregister() to its own async
context. Specifically, it avoids the deadlock of deleting a device via
one of its own attributes. It is also less surprising to userspace, which
never sees an extra device it did not request.
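
The delete rule can be sketched as a small userspace model. This is an
illustration only, not kernel code: struct model_dev and the model_*()
functions are hypothetical stand-ins for the fields and checks that
delete_store() performs in the diff, including the diff's extra rule that
device id 0 is always preserved.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state delete_store() inspects. */
struct model_dev {
	int id;                  /* instance id, negative once invalidated */
	bool bound;              /* true while a driver is attached */
	unsigned long long size; /* allocated capacity in bytes */
};

/* Return 0 if the device may be deleted, -EBUSY otherwise. */
static int model_delete_check(const struct model_dev *d)
{
	if (d->bound || d->size)
		return -EBUSY;
	/* id 0 is preserved; a negative id was already invalidated */
	if (d->id <= 0)
		return -EBUSY;
	return 0;
}

/* A return value >= 0 means this call invalidated the id. */
static int model_free_id(struct model_dev *d)
{
	int old = d->id;

	if (d->id < 0)
		return -1;
	d->id = -1;
	return old;
}

/* Model of a delete request: check, then invalidate exactly once. */
static int model_delete(struct model_dev *d)
{
	int rc = model_delete_check(d);

	if (rc)
		return rc;
	rc = model_free_id(d);
	assert(rc > 0); /* the check above guarantees a positive id */
	return 0;
}
```

In this model a second delete of the same device fails with -EBUSY because
the id is already negative, which mirrors how the patch lets only one
caller win the race to invalidate a device.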

For now just add the device creation, teardown, and ->probe() prevention.
A later patch will arrange for the 'dax/size' attribute to be writable to
allocate capacity from the region.
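
The ->probe() prevention can be sketched in userspace as follows. In the
diff, a seed gets an empty range whose end sits one below its start, and
probe is refused with -ENXIO for an empty range or an invalidated id. The
struct model_range type and the model_*() names are hypothetical; the
length formula (end - start + 1 on unsigned 64-bit values) is what I
understand the kernel's range_len() to compute.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for struct range: both ends are inclusive. */
struct model_range {
	uint64_t start;
	uint64_t end;
};

static uint64_t model_range_len(const struct model_range *r)
{
	return r->end - r->start + 1;
}

/* Build the zero-length range used for a seed device. */
static struct model_range model_seed_range(uint64_t region_start)
{
	struct model_range r = {
		.start = region_start,
		.end = region_start - 1,
	};

	/* holds even for region_start == 0 thanks to unsigned wraparound */
	assert(model_range_len(&r) == 0);
	return r;
}

/* Refuse to probe a device that has no capacity or no valid id. */
static int model_probe_gate(const struct model_range *r, int id)
{
	if (model_range_len(r) == 0 || id < 0)
		return -ENXIO;
	return 0;
}
```

Encoding "empty" as end = start - 1 keeps the length computation branch
free, so no separate "is seed" flag is needed in this model.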

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jia He <justin.he@arm.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: David Airlie <airlied@linux.ie>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Yan <yanaijie@huawei.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/159643101583.4062302.12255093902950754962.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/160106113873.30709.15168756050631539431.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

authored by Dan Williams and committed by Linus Torvalds
0f3da14a f11cf813

+272 -40
+262 -39
drivers/dax/bus.c
···
 {
 	struct dax_device_driver *dax_drv = to_dax_drv(dev->driver);
 	struct dev_dax *dev_dax = to_dev_dax(dev);
+	struct dax_region *dax_region = dev_dax->region;
+	struct range *range = &dev_dax->range;
+	int rc;

-	return dax_drv->probe(dev_dax);
+	if (range_len(range) == 0 || dev_dax->id < 0)
+		return -ENXIO;
+
+	rc = dax_drv->probe(dev_dax);
+
+	if (rc || is_static(dax_region))
+		return rc;
+
+	/*
+	 * Track new seed creation only after successful probe of the
+	 * previous seed.
+	 */
+	if (dax_region->seed == dev)
+		dax_region->seed = NULL;
+
+	return 0;
 }

 static int dax_bus_remove(struct device *dev)
···
 }
 static DEVICE_ATTR_RO(available_size);

+static ssize_t seed_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct dax_region *dax_region = dev_get_drvdata(dev);
+	struct device *seed;
+	ssize_t rc;
+
+	if (is_static(dax_region))
+		return -EINVAL;
+
+	device_lock(dev);
+	seed = dax_region->seed;
+	rc = sprintf(buf, "%s\n", seed ? dev_name(seed) : "");
+	device_unlock(dev);
+
+	return rc;
+}
+static DEVICE_ATTR_RO(seed);
+
+static ssize_t create_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct dax_region *dax_region = dev_get_drvdata(dev);
+	struct device *youngest;
+	ssize_t rc;
+
+	if (is_static(dax_region))
+		return -EINVAL;
+
+	device_lock(dev);
+	youngest = dax_region->youngest;
+	rc = sprintf(buf, "%s\n", youngest ? dev_name(youngest) : "");
+	device_unlock(dev);
+
+	return rc;
+}
+
+static ssize_t create_store(struct device *dev, struct device_attribute *attr,
+		const char *buf, size_t len)
+{
+	struct dax_region *dax_region = dev_get_drvdata(dev);
+	unsigned long long avail;
+	ssize_t rc;
+	int val;
+
+	if (is_static(dax_region))
+		return -EINVAL;
+
+	rc = kstrtoint(buf, 0, &val);
+	if (rc)
+		return rc;
+	if (val != 1)
+		return -EINVAL;
+
+	device_lock(dev);
+	avail = dax_region_avail_size(dax_region);
+	if (avail == 0)
+		rc = -ENOSPC;
+	else {
+		struct dev_dax_data data = {
+			.dax_region = dax_region,
+			.size = 0,
+			.id = -1,
+		};
+		struct dev_dax *dev_dax = devm_create_dev_dax(&data);
+
+		if (IS_ERR(dev_dax))
+			rc = PTR_ERR(dev_dax);
+		else {
+			/*
+			 * In support of crafting multiple new devices
+			 * simultaneously multiple seeds can be created,
+			 * but only the first one that has not been
+			 * successfully bound is tracked as the region
+			 * seed.
+			 */
+			if (!dax_region->seed)
+				dax_region->seed = &dev_dax->dev;
+			dax_region->youngest = &dev_dax->dev;
+			rc = len;
+		}
+	}
+	device_unlock(dev);
+
+	return rc;
+}
+static DEVICE_ATTR_RW(create);
+
+void kill_dev_dax(struct dev_dax *dev_dax)
+{
+	struct dax_device *dax_dev = dev_dax->dax_dev;
+	struct inode *inode = dax_inode(dax_dev);
+
+	kill_dax(dax_dev);
+	unmap_mapping_range(inode->i_mapping, 0, 0, 1);
+}
+EXPORT_SYMBOL_GPL(kill_dev_dax);
+
+static void free_dev_dax_range(struct dev_dax *dev_dax)
+{
+	struct dax_region *dax_region = dev_dax->region;
+	struct range *range = &dev_dax->range;
+
+	device_lock_assert(dax_region->dev);
+	if (range_len(range))
+		__release_region(&dax_region->res, range->start,
+				range_len(range));
+}
+
+static void unregister_dev_dax(void *dev)
+{
+	struct dev_dax *dev_dax = to_dev_dax(dev);
+
+	dev_dbg(dev, "%s\n", __func__);
+
+	kill_dev_dax(dev_dax);
+	free_dev_dax_range(dev_dax);
+	device_del(dev);
+	put_device(dev);
+}
+
+/* a return value >= 0 indicates this invocation invalidated the id */
+static int __free_dev_dax_id(struct dev_dax *dev_dax)
+{
+	struct dax_region *dax_region = dev_dax->region;
+	struct device *dev = &dev_dax->dev;
+	int rc = dev_dax->id;
+
+	device_lock_assert(dev);
+
+	if (is_static(dax_region) || dev_dax->id < 0)
+		return -1;
+	ida_free(&dax_region->ida, dev_dax->id);
+	dev_dax->id = -1;
+	return rc;
+}
+
+static int free_dev_dax_id(struct dev_dax *dev_dax)
+{
+	struct device *dev = &dev_dax->dev;
+	int rc;
+
+	device_lock(dev);
+	rc = __free_dev_dax_id(dev_dax);
+	device_unlock(dev);
+	return rc;
+}
+
+static ssize_t delete_store(struct device *dev, struct device_attribute *attr,
+		const char *buf, size_t len)
+{
+	struct dax_region *dax_region = dev_get_drvdata(dev);
+	struct dev_dax *dev_dax;
+	struct device *victim;
+	bool do_del = false;
+	int rc;
+
+	if (is_static(dax_region))
+		return -EINVAL;
+
+	victim = device_find_child_by_name(dax_region->dev, buf);
+	if (!victim)
+		return -ENXIO;
+
+	device_lock(dev);
+	device_lock(victim);
+	dev_dax = to_dev_dax(victim);
+	if (victim->driver || range_len(&dev_dax->range))
+		rc = -EBUSY;
+	else {
+		/*
+		 * Invalidate the device so it does not become active
+		 * again, but always preserve device-id-0 so that
+		 * /sys/bus/dax/ is guaranteed to be populated while any
+		 * dax_region is registered.
+		 */
+		if (dev_dax->id > 0) {
+			do_del = __free_dev_dax_id(dev_dax) >= 0;
+			rc = len;
+			if (dax_region->seed == victim)
+				dax_region->seed = NULL;
+			if (dax_region->youngest == victim)
+				dax_region->youngest = NULL;
+		} else
+			rc = -EBUSY;
+	}
+	device_unlock(victim);
+
+	/* won the race to invalidate the device, clean it up */
+	if (do_del)
+		devm_release_action(dev, unregister_dev_dax, victim);
+	device_unlock(dev);
+	put_device(victim);
+
+	return rc;
+}
+static DEVICE_ATTR_WO(delete);
+
 static umode_t dax_region_visible(struct kobject *kobj, struct attribute *a,
 		int n)
 {
 	struct device *dev = container_of(kobj, struct device, kobj);
 	struct dax_region *dax_region = dev_get_drvdata(dev);

-	if (is_static(dax_region) && a == &dev_attr_available_size.attr)
-		return 0;
+	if (is_static(dax_region))
+		if (a == &dev_attr_available_size.attr
+				|| a == &dev_attr_create.attr
+				|| a == &dev_attr_seed.attr
+				|| a == &dev_attr_delete.attr)
+			return 0;
 	return a->mode;
 }

···
 	&dev_attr_available_size.attr,
 	&dev_attr_region_size.attr,
 	&dev_attr_align.attr,
+	&dev_attr_create.attr,
+	&dev_attr_seed.attr,
+	&dev_attr_delete.attr,
 	&dev_attr_id.attr,
 	NULL,
 };
···
 	dax_region->align = align;
 	dax_region->dev = parent;
 	dax_region->target_node = target_node;
+	ida_init(&dax_region->ida);
 	dax_region->res = (struct resource) {
 		.start = res->start,
 		.end = res->end,
···
 	struct resource *alloc;

 	device_lock_assert(dax_region->dev);
+
+	/* handle the seed alloc special case */
+	if (!size) {
+		dev_dax->range = (struct range) {
+			.start = res->start,
+			.end = res->start - 1,
+		};
+		return 0;
+	}

 	/* TODO: handle multiple allocations per region */
 	if (res->child)
···
 	NULL,
 };

-void kill_dev_dax(struct dev_dax *dev_dax)
-{
-	struct dax_device *dax_dev = dev_dax->dax_dev;
-	struct inode *inode = dax_inode(dax_dev);
-
-	kill_dax(dax_dev);
-	unmap_mapping_range(inode->i_mapping, 0, 0, 1);
-}
-EXPORT_SYMBOL_GPL(kill_dev_dax);
-
-static void free_dev_dax_range(struct dev_dax *dev_dax)
-{
-	struct dax_region *dax_region = dev_dax->region;
-	struct range *range = &dev_dax->range;
-
-	device_lock_assert(dax_region->dev);
-	__release_region(&dax_region->res, range->start, range_len(range));
-}
-
 static void dev_dax_release(struct device *dev)
 {
 	struct dev_dax *dev_dax = to_dev_dax(dev);
 	struct dax_region *dax_region = dev_dax->region;
 	struct dax_device *dax_dev = dev_dax->dax_dev;

-	dax_region_put(dax_region);
 	put_dax(dax_dev);
+	free_dev_dax_id(dev_dax);
+	dax_region_put(dax_region);
 	kfree(dev_dax->pgmap);
 	kfree(dev_dax);
 }
···
 	.release = dev_dax_release,
 	.groups = dax_attribute_groups,
 };
-
-static void unregister_dev_dax(void *dev)
-{
-	struct dev_dax *dev_dax = to_dev_dax(dev);
-
-	dev_dbg(dev, "%s\n", __func__);
-
-	kill_dev_dax(dev_dax);
-	free_dev_dax_range(dev_dax);
-	device_del(dev);
-	put_device(dev);
-}

 struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
 {
···
 	struct device *dev;
 	int rc;

-	if (data->id < 0)
-		return ERR_PTR(-EINVAL);
-
 	dev_dax = kzalloc(sizeof(*dev_dax), GFP_KERNEL);
 	if (!dev_dax)
 		return ERR_PTR(-ENOMEM);

+	if (is_static(dax_region)) {
+		if (dev_WARN_ONCE(parent, data->id < 0,
+				"dynamic id specified to static region\n")) {
+			rc = -EINVAL;
+			goto err_id;
+		}
+
+		dev_dax->id = data->id;
+	} else {
+		if (dev_WARN_ONCE(parent, data->id >= 0,
+				"static id specified to dynamic region\n")) {
+			rc = -EINVAL;
+			goto err_id;
+		}
+
+		rc = ida_alloc(&dax_region->ida, GFP_KERNEL);
+		if (rc < 0)
+			goto err_id;
+		dev_dax->id = rc;
+	}
+
 	dev_dax->region = dax_region;
 	dev = &dev_dax->dev;
 	device_initialize(dev);
-	dev_set_name(dev, "dax%d.%d", dax_region->id, data->id);
+	dev_set_name(dev, "dax%d.%d", dax_region->id, dev_dax->id);

 	rc = alloc_dev_dax_range(dev_dax, data->size);
 	if (rc)
···
 err_pgmap:
 	free_dev_dax_range(dev_dax);
 err_range:
+	free_dev_dax_id(dev_dax);
+err_id:
 	kfree(dev_dax);

 	return ERR_PTR(rc);
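
The seed and youngest bookkeeping in the drivers/dax/bus.c changes above
(create_store(), dax_bus_probe() and delete_store()) can be modeled in
userspace as a minimal sketch. The struct model_region type and the
model_*() function names are hypothetical; devices are represented by
opaque pointers and compared by identity, as the patch does with
struct device pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two tracking pointers in dax_region. */
struct model_region {
	const void *seed;     /* tracked seed, cleared once it binds */
	const void *youngest; /* most recently created device */
};

/* Creation: only track a new seed when none is currently tracked. */
static void model_create(struct model_region *r, const void *dev)
{
	assert(dev != NULL);
	if (!r->seed)
		r->seed = dev;
	r->youngest = dev;
}

/* Successful probe: stop tracking the device if it was the seed. */
static void model_probe_ok(struct model_region *r, const void *dev)
{
	if (r->seed == dev)
		r->seed = NULL;
}

/* Deletion: drop any reference to the deleted device. */
static void model_deleted(struct model_region *r, const void *dev)
{
	if (r->seed == dev)
		r->seed = NULL;
	if (r->youngest == dev)
		r->youngest = NULL;
}
```

One consequence visible in the model: if two seeds are created and the
first one binds, the seed pointer becomes NULL rather than moving to the
second device, and the next creation becomes the tracked seed.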
+9
drivers/dax/dax-private.h
···

 #include <linux/device.h>
 #include <linux/cdev.h>
+#include <linux/idr.h>

 /* private routines between core files */
 struct dax_device;
···
  * @kref: to pin while other agents have a need to do lookups
  * @dev: parent device backing this region
  * @align: allocation and mapping alignment for child dax devices
+ * @ida: instance id allocator
  * @res: resource tree to track instance allocations
+ * @seed: allow userspace to find the first unbound seed device
+ * @youngest: allow userspace to find the most recently created device
  */
 struct dax_region {
 	int id;
···
 	struct kref kref;
 	struct device *dev;
 	unsigned int align;
+	struct ida ida;
 	struct resource res;
+	struct device *seed;
+	struct device *youngest;
 };

 /**
···
  * @region - parent region
  * @dax_dev - core dax functionality
  * @target_node: effective numa node if dev_dax memory range is onlined
+ * @id: ida allocated id
  * @dev - device core
  * @pgmap - pgmap for memmap setup / lifetime (driver owned)
  * @range: resource range for the instance
···
 	struct dax_region *region;
 	struct dax_device *dax_dev;
 	int target_node;
+	int id;
 	struct device dev;
 	struct dev_pagemap *pgmap;
 	struct range range;
+1 -1
drivers/dax/hmem/hmem.c
···

 	data = (struct dev_dax_data) {
 		.dax_region = dax_region,
-		.id = 0,
+		.id = -1,
 		.size = resource_size(res),
 	};
 	dev_dax = devm_create_dev_dax(&data);