Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'for-7.1/block-20260411' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull block updates from Jens Axboe:

- Add shared memory zero-copy I/O support for ublk, bypassing per-I/O
copies between kernel and userspace by matching registered buffer
PFNs at I/O time. Includes selftests.

- Refactor bio integrity to support filesystem initiated integrity
operations and arbitrary buffer alignment.

- Clean up bio allocation, splitting bio_alloc_bioset() into clear fast
and slow paths. Add bio_await() and bio_submit_or_kill() helpers,
unify synchronous bi_end_io callbacks.

- Fix zone write plug refcount handling and plug removal races. Add
support for serializing zone writes at QD=1 for rotational zoned
devices, yielding significant throughput improvements.

- Add SED-OPAL ioctls for Single User Mode management and a STACK_RESET
command.

- Add io_uring passthrough (uring_cmd) support to the BSG layer.

- Replace pp_buf in partition scanning with struct seq_buf.

- zloop improvements and cleanups.

- drbd genl cleanup, switching to pre_doit/post_doit.

- NVMe pull request via Keith:
    - Fabrics authentication updates
    - Enhanced block queue limits support
    - Workqueue usage updates
    - A new write zeroes device quirk
    - Tagset cleanup fix for loop device

- MD pull requests via Yu Kuai:
    - Fix raid5 soft lockup in retry_aligned_read()
    - Fix raid10 deadlock with check operation and nowait requests
    - Fix raid1 overlapping writes on writemostly disks
    - Fix sysfs deadlock on array_state=clear
    - Proactive RAID-5 parity building with llbitmap, with
      write_zeroes_unmap optimization for initial sync
    - Fix llbitmap barrier ordering, rdev skipping, and bitmap_ops
      version mismatch fallback
    - Fix bcache use-after-free and uninitialized closure
    - Validate raid5 journal metadata payload size
    - Various cleanups

- Various other fixes, improvements, and cleanups

* tag 'for-7.1/block-20260411' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (146 commits)
ublk: fix tautological comparison warning in ublk_ctrl_reg_buf
scsi: bsg: fix buffer overflow in scsi_bsg_uring_cmd()
block: refactor blkdev_zone_mgmt_ioctl
MAINTAINERS: update ublk driver maintainer email
Documentation: ublk: address review comments for SHMEM_ZC docs
ublk: allow buffer registration before device is started
ublk: replace xarray with IDA for shmem buffer index allocation
ublk: simplify PFN range loop in __ublk_ctrl_reg_buf
ublk: verify all pages in multi-page bvec fall within registered range
ublk: widen ublk_shmem_buf_reg.len to __u64 for 4GB buffer support
xfs: use bio_await in xfs_zone_gc_reset_sync
block: add a bio_submit_or_kill helper
block: factor out a bio_await helper
block: unify the synchronous bi_end_io callbacks
xfs: fix number of GC bvecs
selftests/ublk: add read-only buffer registration test
selftests/ublk: add filesystem fio verify test for shmem_zc
selftests/ublk: add hugetlbfs shmem_zc test for loop target
selftests/ublk: add shared memory zero-copy test
selftests/ublk: add UBLK_F_SHMEM_ZC support for loop target
...

+5504 -3124
+15
Documentation/ABI/stable/sysfs-block
···
 zone commands, they will be treated as regular block devices and
 zoned will report "none".
 
+What:		/sys/block/<disk>/queue/zoned_qd1_writes
+Date:		January 2026
+Contact:	Damien Le Moal <dlemoal@kernel.org>
+Description:
+		[RW] zoned_qd1_writes indicates if write operations to a zoned
+		block device are being handled using a single issuer context (a
+		kernel thread) operating at a maximum queue depth of 1. This
+		attribute is visible only for zoned block devices. The default
+		value for zoned block devices that are not rotational devices
+		(e.g. ZNS SSDs or zoned UFS devices) is 0. For rotational zoned
+		block devices (e.g. SMR HDDs) the default value is 1. Since
+		this default may not be appropriate for some devices, e.g.
+		remotely connected devices over high latency networks, the user
+		can disable this feature by setting this attribute to 0.
+
 
 What:		/sys/block/<disk>/hidden
 Date:		March 2023
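A minimal userspace sketch of how a tool might read the new attribute and set it to 0, as the ABI text above allows. The helper names (`zoned_qd1_writes_path`, `read_sysfs_uint`, `write_sysfs_uint`) and the disk name are hypothetical; only the attribute path comes from the documentation hunk.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build the attribute path for a disk name such as
 * "sdb". Rejects names containing '/' so the path cannot escape /sys/block. */
int zoned_qd1_writes_path(char *buf, size_t len, const char *disk)
{
	int n;

	if (strchr(disk, '/'))
		return -1;
	n = snprintf(buf, len, "/sys/block/%s/queue/zoned_qd1_writes", disk);
	return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Hypothetical helper: read one unsigned integer from a sysfs-style file. */
int read_sysfs_uint(const char *path, unsigned int *val)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%u", val) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/* Hypothetical helper: write one unsigned integer to a sysfs-style file. */
int write_sysfs_uint(const char *path, unsigned int val)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = (fprintf(f, "%u\n", val) > 0);
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Since the attribute is visible only for zoned block devices, a caller should treat a failed `read_sysfs_uint()` as "not a zoned device" rather than as a hard error.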
+13
Documentation/ABI/testing/sysfs-nvme
···
+What:		/sys/devices/virtual/nvme-fabrics/ctl/.../tls_configured_key
+Date:		November 2025
+KernelVersion:	6.19
+Contact:	Linux NVMe mailing list <linux-nvme@lists.infradead.org>
+Description:
+		The file is avaliable when using a secure concatanation
+		connection to a NVMe target. Reading the file will return
+		the serial of the currently negotiated key.
+
+		Writing 0 to the file will trigger a PSK reauthentication
+		(REPLACETLSPSK) with the target. After a reauthentication
+		the value returned by tls_configured_key will be the new
+		serial.
+9 -1
Documentation/admin-guide/blockdev/zoned_loop.rst
···
 /dev/zloop-control device::
 
     $ cat /dev/zloop-control
-    add id=%d,capacity_mb=%u,zone_size_mb=%u,zone_capacity_mb=%u,conv_zones=%u,base_dir=%s,nr_queues=%u,queue_depth=%u,buffered_io
+    add id=%d,capacity_mb=%u,zone_size_mb=%u,zone_capacity_mb=%u,conv_zones=%u,max_open_zones=%u,base_dir=%s,nr_queues=%u,queue_depth=%u,buffered_io,zone_append=%u,ordered_zone_append,discard_write_cache
     remove id=%d
 
 In more details, the options that can be used with the "add" command are as
···
 conv_zones          Total number of conventioanl zones starting from
                     sector 0
                     Default: 8
+max_open_zones      Maximum number of open sequential write required zones
+                    (0 for no limit).
+                    Default: 0
 base_dir            Path to the base directory where to create the directory
                     containing the zone files of the device.
                     Default=/var/local/zloop.
···
                     (extents), as when enabled, this can significantly reduce
                     the number of data extents needed to for a file data
                     mapping.
+discard_write_cache Discard all data that was not explicitly persisted using a
+                    flush operation when the device is removed by truncating
+                    each zone file to the size recorded during the last flush
+                    operation. This simulates power fail events where
+                    uncommitted data is lost.
 =================== =========================================================
 
 3) Deleting a Zoned Device
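A small sketch of composing an "add" command that uses the two options introduced in this hunk, `max_open_zones` and `discard_write_cache`. The option names come from the format string above; the `zloop_add_opts` struct and `zloop_format_add()` helper are hypothetical and cover only a subset of the options.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical container for a subset of the documented "add" options. */
struct zloop_add_opts {
	int id;
	unsigned int capacity_mb;
	unsigned int zone_size_mb;
	unsigned int max_open_zones;	/* 0 means no limit, per the docs */
	bool discard_write_cache;	/* flag option, takes no value */
};

/* Format the command into buf. Returns the string length, or -1 if the
 * buffer is too small. */
int zloop_format_add(char *buf, size_t len, const struct zloop_add_opts *o)
{
	int n = snprintf(buf, len,
			 "add id=%d,capacity_mb=%u,zone_size_mb=%u,max_open_zones=%u%s",
			 o->id, o->capacity_mb, o->zone_size_mb,
			 o->max_open_zones,
			 o->discard_write_cache ? ",discard_write_cache" : "");

	return (n > 0 && (size_t)n < len) ? n : -1;
}
```

The resulting string would presumably be written to `/dev/zloop-control`, assuming the control device accepts commands in the form shown by the `cat` output above.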
+1 -1
Documentation/block/inline-encryption.rst
···
 large, multiple bounce bios may be required; see the code for details.
 
 For decryption, blk-crypto-fallback "wraps" the bio's completion callback
-(``bi_complete``) and private data (``bi_private``) with its own, unsets the
+(``bi_end_io``) and private data (``bi_private``) with its own, unsets the
 bio's encryption context, then submits the bio. If the read completes
 successfully, blk-crypto-fallback restores the bio's original completion
 callback and private data, then decrypts the bio's data in-place using the
+119
Documentation/block/ublk.rst
···
 in case that too many ublk devices are handled by this single io_ring_ctx
 and each one has very large queue depth
 
+Shared Memory Zero Copy (UBLK_F_SHMEM_ZC)
+------------------------------------------
+
+The ``UBLK_F_SHMEM_ZC`` feature provides an alternative zero-copy path
+that works by sharing physical memory pages between the client application
+and the ublk server. Unlike the io_uring fixed buffer approach above,
+shared memory zero copy does not require io_uring buffer registration
+per I/O — instead, it relies on the kernel matching physical pages
+at I/O time. This allows the ublk server to access the shared
+buffer directly, which is unlikely for the io_uring fixed buffer
+approach.
+
+Motivation
+~~~~~~~~~~
+
+Shared memory zero copy takes a different approach: if the client
+application and the ublk server both map the same physical memory, there is
+nothing to copy. The kernel detects the shared pages automatically and
+tells the server where the data already lives.
+
+``UBLK_F_SHMEM_ZC`` can be thought of as a supplement for optimized client
+applications — when the client is willing to allocate I/O buffers from
+shared memory, the entire data path becomes zero-copy.
+
+Use Cases
+~~~~~~~~~
+
+This feature is useful when the client application can be configured to
+use a specific shared memory region for its I/O buffers:
+
+- **Custom storage clients** that allocate I/O buffers from shared memory
+  (memfd, hugetlbfs) and issue direct I/O to the ublk device
+- **Database engines** that use pre-allocated buffer pools with O_DIRECT
+
+How It Works
+~~~~~~~~~~~~
+
+1. The ublk server and client both ``mmap()`` the same file (memfd or
+   hugetlbfs) with ``MAP_SHARED``. This gives both processes access to the
+   same physical pages.
+
+2. The ublk server registers its mapping with the kernel::
+
+     struct ublk_shmem_buf_reg buf = { .addr = mmap_va, .len = size };
+     ublk_ctrl_cmd(UBLK_U_CMD_REG_BUF, .addr = &buf);
+
+   The kernel pins the pages and builds a PFN lookup tree.
+
+3. When the client issues direct I/O (``O_DIRECT``) to ``/dev/ublkb*``,
+   the kernel checks whether the I/O buffer pages match any registered
+   pages by comparing PFNs.
+
+4. On a match, the kernel sets ``UBLK_IO_F_SHMEM_ZC`` in the I/O
+   descriptor and encodes the buffer index and offset in ``addr``::
+
+     if (iod->op_flags & UBLK_IO_F_SHMEM_ZC) {
+         /* Data is already in our shared mapping — zero copy */
+         index = ublk_shmem_zc_index(iod->addr);
+         offset = ublk_shmem_zc_offset(iod->addr);
+         buf = shmem_table[index].mmap_base + offset;
+     }
+
+5. If pages do not match (e.g., the client used a non-shared buffer),
+   the I/O falls back to the normal copy path silently.
+
+The shared memory can be set up via two methods:
+
+- **Socket-based**: the client sends a memfd to the ublk server via
+  ``SCM_RIGHTS`` on a unix socket. The server mmaps and registers it.
+- **Hugetlbfs-based**: both processes ``mmap(MAP_SHARED)`` the same
+  hugetlbfs file. No IPC needed — same file gives same physical pages.
+
+Advantages
+~~~~~~~~~~
+
+- **Simple**: no per-I/O buffer registration or unregistration commands.
+  Once the shared buffer is registered, all matching I/O is zero-copy
+  automatically.
+- **Direct buffer access**: the ublk server can read and write the shared
+  buffer directly via its own mmap, without going through io_uring fixed
+  buffer operations. This is more friendly for server implementations.
+- **Fast**: PFN matching is a single maple tree lookup per bvec. No
+  io_uring command round-trips for buffer management.
+- **Compatible**: non-matching I/O silently falls back to the copy path.
+  The device works normally for any client, with zero-copy as an
+  optimization when shared memory is available.
+
+Limitations
+~~~~~~~~~~~
+
+- **Requires client cooperation**: the client must allocate its I/O
+  buffers from the shared memory region. This requires a custom or
+  configured client — standard applications using their own buffers
+  will not benefit.
+- **Direct I/O only**: buffered I/O (without ``O_DIRECT``) goes through
+  the page cache, which allocates its own pages. These kernel-allocated
+  pages will never match the registered shared buffer. Only ``O_DIRECT``
+  puts the client's buffer pages directly into the block I/O.
+- **Contiguous data only**: each I/O request's data must be contiguous
+  within a single registered buffer. Scatter/gather I/O that spans
+  multiple non-adjacent registered buffers cannot use the zero-copy path.
+
+Control Commands
+~~~~~~~~~~~~~~~~
+
+- ``UBLK_U_CMD_REG_BUF``
+
+  Register a shared memory buffer. ``ctrl_cmd.addr`` points to a
+  ``struct ublk_shmem_buf_reg`` containing the buffer virtual address and size.
+  Returns the assigned buffer index (>= 0) on success. The kernel pins
+  pages and builds the PFN lookup tree. Queue freeze is handled
+  internally.
+
+- ``UBLK_U_CMD_UNREG_BUF``
+
+  Unregister a previously registered buffer. ``ctrl_cmd.data[0]`` is the
+  buffer index. Unpins pages and removes PFN entries from the lookup
+  tree.
+
 References
 ==========
 
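Step 1 of "How It Works" in the ublk.rst hunk relies on two `MAP_SHARED` mappings of one file being backed by the same physical pages. The sketch below illustrates only that property, inside a single process and with a memfd; it performs no ublk registration, and the function name `shared_mapping_demo` is hypothetical. It assumes Linux with glibc 2.27 or later for `memfd_create()`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map one memfd twice and check that data written through the "client"
 * mapping is visible through the "server" mapping without any copy.
 * Returns 0 on success, -1 on failure. */
int shared_mapping_demo(size_t size)
{
	char *client, *server;
	int ret = -1;
	int fd;

	assert(size >= sizeof("payload"));

	fd = memfd_create("ublk-shmem-demo", 0);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)size) < 0)
		goto out_close;

	client = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (client == MAP_FAILED)
		goto out_close;
	server = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (server == MAP_FAILED)
		goto out_unmap_client;

	/* Different virtual addresses, same underlying pages. */
	strcpy(client, "payload");
	if (client != server && strcmp(server, "payload") == 0)
		ret = 0;

	munmap(server, size);
out_unmap_client:
	munmap(client, size);
out_close:
	close(fd);
	return ret;
}
```

In the real feature the two mappings live in different processes, and the server additionally registers its mapping through `UBLK_U_CMD_REG_BUF` so the kernel can match PFNs at I/O time.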
+1 -1
MAINTAINERS
··· 27015 27015 F: fs/ubifs/ 27016 27016 27017 27017 UBLK USERSPACE BLOCK DRIVER 27018 - M: Ming Lei <ming.lei@redhat.com> 27018 + M: Ming Lei <tom.leiming@gmail.com> 27019 27019 L: linux-block@vger.kernel.org 27020 27020 S: Maintained 27021 27021 F: Documentation/block/ublk.rst
+159 -160
block/bio.c
··· 18 18 #include <linux/highmem.h> 19 19 #include <linux/blk-crypto.h> 20 20 #include <linux/xarray.h> 21 + #include <linux/kmemleak.h> 21 22 22 23 #include <trace/events/block.h> 23 24 #include "blk.h" ··· 34 33 unsigned int nr; 35 34 unsigned int nr_irq; 36 35 }; 36 + 37 + #define BIO_INLINE_VECS 4 37 38 38 39 static struct biovec_slab { 39 40 int nr_vecs; ··· 117 114 return bs->front_pad + sizeof(struct bio) + bs->back_pad; 118 115 } 119 116 117 + static inline void *bio_slab_addr(struct bio *bio) 118 + { 119 + return (void *)bio - bio->bi_pool->front_pad; 120 + } 121 + 120 122 static struct kmem_cache *bio_find_or_create_slab(struct bio_set *bs) 121 123 { 122 124 unsigned int size = bs_bio_slab_size(bs); ··· 167 159 mutex_unlock(&bio_slab_lock); 168 160 } 169 161 170 - void bvec_free(mempool_t *pool, struct bio_vec *bv, unsigned short nr_vecs) 171 - { 172 - BUG_ON(nr_vecs > BIO_MAX_VECS); 173 - 174 - if (nr_vecs == BIO_MAX_VECS) 175 - mempool_free(bv, pool); 176 - else if (nr_vecs > BIO_INLINE_VECS) 177 - kmem_cache_free(biovec_slab(nr_vecs)->slab, bv); 178 - } 179 - 180 162 /* 181 163 * Make the first allocation restricted and don't dump info on allocation 182 164 * failures, since we'll fall back to the mempool in case of failure. 183 165 */ 184 - static inline gfp_t bvec_alloc_gfp(gfp_t gfp) 166 + static inline gfp_t try_alloc_gfp(gfp_t gfp) 185 167 { 186 168 return (gfp & ~(__GFP_DIRECT_RECLAIM | __GFP_IO)) | 187 169 __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN; 188 - } 189 - 190 - struct bio_vec *bvec_alloc(mempool_t *pool, unsigned short *nr_vecs, 191 - gfp_t gfp_mask) 192 - { 193 - struct biovec_slab *bvs = biovec_slab(*nr_vecs); 194 - 195 - if (WARN_ON_ONCE(!bvs)) 196 - return NULL; 197 - 198 - /* 199 - * Upgrade the nr_vecs request to take full advantage of the allocation. 200 - * We also rely on this in the bvec_free path. 201 - */ 202 - *nr_vecs = bvs->nr_vecs; 203 - 204 - /* 205 - * Try a slab allocation first for all smaller allocations. 
If that 206 - * fails and __GFP_DIRECT_RECLAIM is set retry with the mempool. 207 - * The mempool is sized to handle up to BIO_MAX_VECS entries. 208 - */ 209 - if (*nr_vecs < BIO_MAX_VECS) { 210 - struct bio_vec *bvl; 211 - 212 - bvl = kmem_cache_alloc(bvs->slab, bvec_alloc_gfp(gfp_mask)); 213 - if (likely(bvl) || !(gfp_mask & __GFP_DIRECT_RECLAIM)) 214 - return bvl; 215 - *nr_vecs = BIO_MAX_VECS; 216 - } 217 - 218 - return mempool_alloc(pool, gfp_mask); 219 170 } 220 171 221 172 void bio_uninit(struct bio *bio) ··· 198 231 void *p = bio; 199 232 200 233 WARN_ON_ONCE(!bs); 234 + WARN_ON_ONCE(bio->bi_max_vecs > BIO_MAX_VECS); 201 235 202 236 bio_uninit(bio); 203 - bvec_free(&bs->bvec_pool, bio->bi_io_vec, bio->bi_max_vecs); 237 + if (bio->bi_max_vecs == BIO_MAX_VECS) 238 + mempool_free(bio->bi_io_vec, &bs->bvec_pool); 239 + else if (bio->bi_max_vecs > BIO_INLINE_VECS) 240 + kmem_cache_free(biovec_slab(bio->bi_max_vecs)->slab, 241 + bio->bi_io_vec); 204 242 mempool_free(p - bs->front_pad, &bs->bio_pool); 205 243 } 206 244 ··· 402 430 } 403 431 } 404 432 433 + /* 434 + * submit_bio_noacct() converts recursion to iteration; this means if we're 435 + * running beneath it, any bios we allocate and submit will not be submitted 436 + * (and thus freed) until after we return. 437 + * 438 + * This exposes us to a potential deadlock if we allocate multiple bios from the 439 + * same bio_set while running underneath submit_bio_noacct(). If we were to 440 + * allocate multiple bios (say a stacking block driver that was splitting bios), 441 + * we would deadlock if we exhausted the mempool's reserve. 442 + * 443 + * We solve this, and guarantee forward progress by punting the bios on 444 + * current->bio_list to a per bio_set rescuer workqueue before blocking to wait 445 + * for elements being returned to the mempool. 
446 + */ 405 447 static void punt_bios_to_rescuer(struct bio_set *bs) 406 448 { 407 449 struct bio_list punt, nopunt; 408 450 struct bio *bio; 409 451 410 - if (WARN_ON_ONCE(!bs->rescue_workqueue)) 452 + if (!current->bio_list || !bs->rescue_workqueue) 411 453 return; 454 + if (bio_list_empty(&current->bio_list[0]) && 455 + bio_list_empty(&current->bio_list[1])) 456 + return; 457 + 412 458 /* 413 459 * In order to guarantee forward progress we must punt only bios that 414 460 * were allocated from this bio_set; otherwise, if there was a bio on ··· 473 483 local_irq_restore(flags); 474 484 } 475 485 476 - static struct bio *bio_alloc_percpu_cache(struct block_device *bdev, 477 - unsigned short nr_vecs, blk_opf_t opf, gfp_t gfp, 478 - struct bio_set *bs) 486 + static struct bio *bio_alloc_percpu_cache(struct bio_set *bs) 479 487 { 480 488 struct bio_alloc_cache *cache; 481 489 struct bio *bio; ··· 491 503 cache->free_list = bio->bi_next; 492 504 cache->nr--; 493 505 put_cpu(); 494 - 495 - if (nr_vecs) 496 - bio_init_inline(bio, bdev, nr_vecs, opf); 497 - else 498 - bio_init(bio, bdev, NULL, nr_vecs, opf); 499 506 bio->bi_pool = bs; 507 + 508 + kmemleak_alloc(bio_slab_addr(bio), 509 + kmem_cache_size(bs->bio_slab), 1, GFP_NOIO); 500 510 return bio; 501 511 } 502 512 ··· 503 517 * @bdev: block device to allocate the bio for (can be %NULL) 504 518 * @nr_vecs: number of bvecs to pre-allocate 505 519 * @opf: operation and flags for bio 506 - * @gfp_mask: the GFP_* mask given to the slab allocator 520 + * @gfp: the GFP_* mask given to the slab allocator 507 521 * @bs: the bio_set to allocate from. 508 522 * 509 523 * Allocate a bio from the mempools in @bs. ··· 533 547 * Returns: Pointer to new bio on success, NULL on failure. 
534 548 */ 535 549 struct bio *bio_alloc_bioset(struct block_device *bdev, unsigned short nr_vecs, 536 - blk_opf_t opf, gfp_t gfp_mask, 537 - struct bio_set *bs) 550 + blk_opf_t opf, gfp_t gfp, struct bio_set *bs) 538 551 { 539 - gfp_t saved_gfp = gfp_mask; 540 - struct bio *bio; 552 + struct bio_vec *bvecs = NULL; 553 + struct bio *bio = NULL; 554 + gfp_t saved_gfp = gfp; 541 555 void *p; 542 556 543 557 /* should not use nobvec bioset for nr_vecs > 0 */ 544 558 if (WARN_ON_ONCE(!mempool_initialized(&bs->bvec_pool) && nr_vecs > 0)) 545 559 return NULL; 546 560 561 + gfp = try_alloc_gfp(gfp); 547 562 if (bs->cache && nr_vecs <= BIO_INLINE_VECS) { 548 - opf |= REQ_ALLOC_CACHE; 549 - bio = bio_alloc_percpu_cache(bdev, nr_vecs, opf, 550 - gfp_mask, bs); 551 - if (bio) 552 - return bio; 553 563 /* 554 - * No cached bio available, bio returned below marked with 555 - * REQ_ALLOC_CACHE to participate in per-cpu alloc cache. 564 + * Set REQ_ALLOC_CACHE even if no cached bio is available to 565 + * return the allocated bio to the percpu cache when done. 556 566 */ 557 - } else 558 - opf &= ~REQ_ALLOC_CACHE; 559 - 560 - /* 561 - * submit_bio_noacct() converts recursion to iteration; this means if 562 - * we're running beneath it, any bios we allocate and submit will not be 563 - * submitted (and thus freed) until after we return. 564 - * 565 - * This exposes us to a potential deadlock if we allocate multiple bios 566 - * from the same bio_set() while running underneath submit_bio_noacct(). 567 - * If we were to allocate multiple bios (say a stacking block driver 568 - * that was splitting bios), we would deadlock if we exhausted the 569 - * mempool's reserve. 570 - * 571 - * We solve this, and guarantee forward progress, with a rescuer 572 - * workqueue per bio_set. 
If we go to allocate and there are bios on 573 - * current->bio_list, we first try the allocation without 574 - * __GFP_DIRECT_RECLAIM; if that fails, we punt those bios we would be 575 - * blocking to the rescuer workqueue before we retry with the original 576 - * gfp_flags. 577 - */ 578 - if (current->bio_list && 579 - (!bio_list_empty(&current->bio_list[0]) || 580 - !bio_list_empty(&current->bio_list[1])) && 581 - bs->rescue_workqueue) 582 - gfp_mask &= ~__GFP_DIRECT_RECLAIM; 583 - 584 - p = mempool_alloc(&bs->bio_pool, gfp_mask); 585 - if (!p && gfp_mask != saved_gfp) { 586 - punt_bios_to_rescuer(bs); 587 - gfp_mask = saved_gfp; 588 - p = mempool_alloc(&bs->bio_pool, gfp_mask); 589 - } 590 - if (unlikely(!p)) 591 - return NULL; 592 - if (!mempool_is_saturated(&bs->bio_pool)) 593 - opf &= ~REQ_ALLOC_CACHE; 594 - 595 - bio = p + bs->front_pad; 596 - if (nr_vecs > BIO_INLINE_VECS) { 597 - struct bio_vec *bvl = NULL; 598 - 599 - bvl = bvec_alloc(&bs->bvec_pool, &nr_vecs, gfp_mask); 600 - if (!bvl && gfp_mask != saved_gfp) { 601 - punt_bios_to_rescuer(bs); 602 - gfp_mask = saved_gfp; 603 - bvl = bvec_alloc(&bs->bvec_pool, &nr_vecs, gfp_mask); 604 - } 605 - if (unlikely(!bvl)) 606 - goto err_free; 607 - 608 - bio_init(bio, bdev, bvl, nr_vecs, opf); 609 - } else if (nr_vecs) { 610 - bio_init_inline(bio, bdev, BIO_INLINE_VECS, opf); 567 + opf |= REQ_ALLOC_CACHE; 568 + bio = bio_alloc_percpu_cache(bs); 611 569 } else { 612 - bio_init(bio, bdev, NULL, 0, opf); 570 + opf &= ~REQ_ALLOC_CACHE; 571 + p = kmem_cache_alloc(bs->bio_slab, gfp); 572 + if (p) 573 + bio = p + bs->front_pad; 613 574 } 614 575 576 + if (bio && nr_vecs > BIO_INLINE_VECS) { 577 + struct biovec_slab *bvs = biovec_slab(nr_vecs); 578 + 579 + /* 580 + * Upgrade nr_vecs to take full advantage of the allocation. 581 + * We also rely on this in bio_free(). 
582 + */ 583 + nr_vecs = bvs->nr_vecs; 584 + bvecs = kmem_cache_alloc(bvs->slab, gfp); 585 + if (unlikely(!bvecs)) { 586 + kmem_cache_free(bs->bio_slab, p); 587 + bio = NULL; 588 + } 589 + } 590 + 591 + if (unlikely(!bio)) { 592 + /* 593 + * Give up if we are not allow to sleep as non-blocking mempool 594 + * allocations just go back to the slab allocation. 595 + */ 596 + if (!(saved_gfp & __GFP_DIRECT_RECLAIM)) 597 + return NULL; 598 + 599 + punt_bios_to_rescuer(bs); 600 + 601 + /* 602 + * Don't rob the mempools by returning to the per-CPU cache if 603 + * we're tight on memory. 604 + */ 605 + opf &= ~REQ_ALLOC_CACHE; 606 + 607 + p = mempool_alloc(&bs->bio_pool, saved_gfp); 608 + bio = p + bs->front_pad; 609 + if (nr_vecs > BIO_INLINE_VECS) { 610 + nr_vecs = BIO_MAX_VECS; 611 + bvecs = mempool_alloc(&bs->bvec_pool, saved_gfp); 612 + } 613 + } 614 + 615 + if (nr_vecs && nr_vecs <= BIO_INLINE_VECS) 616 + bio_init_inline(bio, bdev, nr_vecs, opf); 617 + else 618 + bio_init(bio, bdev, bvecs, nr_vecs, opf); 615 619 bio->bi_pool = bs; 616 620 return bio; 617 - 618 - err_free: 619 - mempool_free(p, &bs->bio_pool); 620 - return NULL; 621 621 } 622 622 EXPORT_SYMBOL(bio_alloc_bioset); 623 623 ··· 737 765 while ((bio = cache->free_list) != NULL) { 738 766 cache->free_list = bio->bi_next; 739 767 cache->nr--; 768 + kmemleak_alloc(bio_slab_addr(bio), 769 + kmem_cache_size(bio->bi_pool->bio_slab), 770 + 1, GFP_KERNEL); 740 771 bio_free(bio); 741 772 if (++i == nr) 742 773 break; ··· 803 828 bio->bi_bdev = NULL; 804 829 cache->free_list = bio; 805 830 cache->nr++; 831 + kmemleak_free(bio_slab_addr(bio)); 806 832 } else if (in_hardirq()) { 807 833 lockdep_assert_irqs_disabled(); 808 834 ··· 811 835 bio->bi_next = cache->free_list_irq; 812 836 cache->free_list_irq = bio; 813 837 cache->nr_irq++; 838 + kmemleak_free(bio_slab_addr(bio)); 814 839 } else { 815 840 goto out_free; 816 841 } ··· 874 897 * @gfp: allocation priority 875 898 * @bs: bio_set to allocate from 876 899 * 877 - * 
Allocate a new bio that is a clone of @bio_src. The caller owns the returned 878 - * bio, but not the actual data it points to. 879 - * 880 - * The caller must ensure that the return bio is not freed before @bio_src. 900 + * Allocate a new bio that is a clone of @bio_src. This reuses the bio_vecs 901 + * pointed to by @bio_src->bi_io_vec, and clones the iterator pointing to 902 + * the current position in it. The caller owns the returned bio, but not 903 + * the bio_vecs, and must ensure the bio is freed before the memory 904 + * pointed to by @bio_Src->bi_io_vecs. 881 905 */ 882 906 struct bio *bio_alloc_clone(struct block_device *bdev, struct bio *bio_src, 883 907 gfp_t gfp, struct bio_set *bs) ··· 907 929 * @gfp: allocation priority 908 930 * 909 931 * Initialize a new bio in caller provided memory that is a clone of @bio_src. 910 - * The caller owns the returned bio, but not the actual data it points to. 911 - * 912 - * The caller must ensure that @bio_src is not freed before @bio. 932 + * The same bio_vecs reuse and bio lifetime rules as bio_alloc_clone() apply. 
913 933 */ 914 934 int bio_init_clone(struct block_device *bdev, struct bio *bio, 915 935 struct bio *bio_src, gfp_t gfp) ··· 1039 1063 unsigned int len, unsigned int offset) 1040 1064 { 1041 1065 if (WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED))) 1066 + return 0; 1067 + if (WARN_ON_ONCE(len == 0)) 1042 1068 return 0; 1043 1069 if (bio->bi_iter.bi_size > BIO_MAX_SIZE - len) 1044 1070 return 0; ··· 1462 1484 bio_iov_iter_unbounce_read(bio, is_error, mark_dirty); 1463 1485 } 1464 1486 1465 - static void submit_bio_wait_endio(struct bio *bio) 1487 + static void bio_wait_end_io(struct bio *bio) 1466 1488 { 1467 1489 complete(bio->bi_private); 1468 1490 } 1491 + 1492 + /** 1493 + * bio_await - call a function on a bio, and wait until it completes 1494 + * @bio: the bio which describes the I/O 1495 + * @submit: function called to submit the bio 1496 + * @priv: private data passed to @submit 1497 + * 1498 + * Wait for the bio as well as any bio chained off it after executing the 1499 + * passed in callback @submit. The wait for the bio is set up before calling 1500 + * @submit to ensure that the completion is captured. If @submit is %NULL, 1501 + * submit_bio() is used instead to submit the bio. 1502 + * 1503 + * Note: this overrides the bi_private and bi_end_io fields in the bio. 
1504 + */ 1505 + void bio_await(struct bio *bio, void *priv, 1506 + void (*submit)(struct bio *bio, void *priv)) 1507 + { 1508 + DECLARE_COMPLETION_ONSTACK_MAP(done, 1509 + bio->bi_bdev->bd_disk->lockdep_map); 1510 + 1511 + bio->bi_private = &done; 1512 + bio->bi_end_io = bio_wait_end_io; 1513 + bio->bi_opf |= REQ_SYNC; 1514 + if (submit) 1515 + submit(bio, priv); 1516 + else 1517 + submit_bio(bio); 1518 + blk_wait_io(&done); 1519 + } 1520 + EXPORT_SYMBOL_GPL(bio_await); 1469 1521 1470 1522 /** 1471 1523 * submit_bio_wait - submit a bio, and wait until it completes ··· 1510 1502 */ 1511 1503 int submit_bio_wait(struct bio *bio) 1512 1504 { 1513 - DECLARE_COMPLETION_ONSTACK_MAP(done, 1514 - bio->bi_bdev->bd_disk->lockdep_map); 1515 - 1516 - bio->bi_private = &done; 1517 - bio->bi_end_io = submit_bio_wait_endio; 1518 - bio->bi_opf |= REQ_SYNC; 1519 - submit_bio(bio); 1520 - blk_wait_io(&done); 1521 - 1505 + bio_await(bio, NULL, NULL); 1522 1506 return blk_status_to_errno(bio->bi_status); 1523 1507 } 1524 1508 EXPORT_SYMBOL(submit_bio_wait); 1509 + 1510 + static void bio_endio_cb(struct bio *bio, void *priv) 1511 + { 1512 + bio_endio(bio); 1513 + } 1514 + 1515 + /* 1516 + * Submit @bio synchronously, or call bio_endio on it if the current process 1517 + * is being killed. 
1518 + */ 1519 + int bio_submit_or_kill(struct bio *bio, unsigned int flags) 1520 + { 1521 + if ((flags & BLKDEV_ZERO_KILLABLE) && fatal_signal_pending(current)) { 1522 + bio_await(bio, NULL, bio_endio_cb); 1523 + return -EINTR; 1524 + } 1525 + 1526 + return submit_bio_wait(bio); 1527 + } 1525 1528 1526 1529 /** 1527 1530 * bdev_rw_virt - synchronously read into / write from kernel mapping ··· 1563 1544 return error; 1564 1545 } 1565 1546 EXPORT_SYMBOL_GPL(bdev_rw_virt); 1566 - 1567 - static void bio_wait_end_io(struct bio *bio) 1568 - { 1569 - complete(bio->bi_private); 1570 - bio_put(bio); 1571 - } 1572 - 1573 - /* 1574 - * bio_await_chain - ends @bio and waits for every chained bio to complete 1575 - */ 1576 - void bio_await_chain(struct bio *bio) 1577 - { 1578 - DECLARE_COMPLETION_ONSTACK_MAP(done, 1579 - bio->bi_bdev->bd_disk->lockdep_map); 1580 - 1581 - bio->bi_private = &done; 1582 - bio->bi_end_io = bio_wait_end_io; 1583 - bio_endio(bio); 1584 - blk_wait_io(&done); 1585 - } 1586 1547 1587 1548 void __bio_advance(struct bio *bio, unsigned bytes) 1588 1549 {
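The reworked `bio_alloc_bioset()` in the block/bio.c diff above first attempts an opportunistic slab allocation and only falls back to the reserve pool when the caller is allowed to sleep. A simplified userspace sketch of that two-phase shape follows. All names (`reserve_pool`, `alloc_two_phase`, `fast_alloc_fail`) are hypothetical, the rescuer punting is omitted, and an empty reserve returns NULL here whereas a real mempool would sleep.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define RESERVE_SLOTS 2
#define OBJ_SIZE 64

/* Hypothetical stand-in for a mempool: a few preallocated objects. */
struct reserve_pool {
	unsigned char slots[RESERVE_SLOTS][OBJ_SIZE];
	int nr_free;		/* slots [0, nr_free) are available */
};

typedef void *(*fast_alloc_fn)(size_t size);

/* Opportunistic allocator that always fails; models memory pressure. */
void *fast_alloc_fail(size_t size)
{
	(void)size;
	return NULL;
}

/* Try the fast allocator first; use the reserve only if that fails and
 * the caller may block. *from_reserve reports which path was taken. */
void *alloc_two_phase(fast_alloc_fn fast, struct reserve_pool *pool,
		      bool may_block, bool *from_reserve)
{
	void *p = fast(OBJ_SIZE);

	*from_reserve = false;
	if (p)
		return p;	/* fast path succeeded */
	if (!may_block)
		return NULL;	/* non-blocking callers give up here */
	if (pool->nr_free == 0)
		return NULL;	/* a real mempool would sleep instead */
	*from_reserve = true;
	return pool->slots[--pool->nr_free];
}
```

Passing `malloc` as the fast allocator exercises the fast path; passing `fast_alloc_fail` with `may_block` set exercises the reserve fallback.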
+16
block/blk-cgroup.c
···
 #include <linux/backing-dev.h>
 #include <linux/slab.h>
 #include <linux/delay.h>
+#include <linux/wait_bit.h>
 #include <linux/atomic.h>
 #include <linux/ctype.h>
 #include <linux/resume_user_mode.h>
···
 
 	q->root_blkg = NULL;
 	spin_unlock_irq(&q->queue_lock);
+
+	wake_up_var(&q->root_blkg);
 }
 
 static void blkg_iostat_set(struct blkg_iostat *dst, struct blkg_iostat *src)
···
 	struct blkcg_gq *new_blkg, *blkg;
 	bool preloaded;
 
+	/*
+	 * If the queue is shared across disk rebind (e.g., SCSI), the
+	 * previous disk's blkcg state is cleaned up asynchronously via
+	 * disk_release() -> blkcg_exit_disk(). Wait for that cleanup to
+	 * finish (indicated by root_blkg becoming NULL) before setting up
+	 * new blkcg state. Otherwise, we may overwrite q->root_blkg while
+	 * the old one is still alive, and radix_tree_insert() in
+	 * blkg_create() will fail with -EEXIST because the old entries
+	 * still occupy the same queue id slot in blkcg->blkg_tree.
+	 */
+	wait_var_event(&q->root_blkg, !READ_ONCE(q->root_blkg));
+
 	new_blkg = blkg_alloc(&blkcg_root, disk, GFP_KERNEL);
 	if (!new_blkg)
 		return -ENOMEM;
···
 	return;
 out:
 	rcu_read_unlock();
+	put_disk(disk);
 }
 
 /**
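The comment added in the blk-cgroup.c hunk explains that setting up new blkcg state before the old state is torn down makes the tree insert fail with `-EEXIST`. The sketch below models only that failure mode with a fixed-size table keyed by queue id. The type and function names are hypothetical stand-ins and do not reflect the kernel's radix tree API.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_QUEUE_IDS 8

/* Hypothetical stand-in for blkcg->blkg_tree: one slot per queue id. */
struct blkg_tree_stub {
	const void *slot[MAX_QUEUE_IDS];
};

/* Insert fails with -EEXIST while an older entry still occupies the slot. */
int blkg_tree_insert(struct blkg_tree_stub *t, unsigned int queue_id,
		     const void *blkg)
{
	if (queue_id >= MAX_QUEUE_IDS)
		return -EINVAL;
	if (t->slot[queue_id])
		return -EEXIST;
	t->slot[queue_id] = blkg;
	return 0;
}

/* Models the asynchronous cleanup finishing for one queue id. */
void blkg_tree_remove(struct blkg_tree_stub *t, unsigned int queue_id)
{
	if (queue_id < MAX_QUEUE_IDS)
		t->slot[queue_id] = NULL;
}
```

Inserting for a queue id, inserting again before `blkg_tree_remove()`, and then retrying after it reproduces the ordering problem that the `wait_var_event()` / `wake_up_var()` pair in the diff is meant to close.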
+20 -20
block/blk-crypto-sysfs.c
@@ -18,7 +18,7 @@
 struct blk_crypto_attr {
 	struct attribute attr;
 	ssize_t (*show)(struct blk_crypto_profile *profile,
-			struct blk_crypto_attr *attr, char *page);
+			const struct blk_crypto_attr *attr, char *page);
 };
 
 static struct blk_crypto_profile *kobj_to_crypto_profile(struct kobject *kobj)
@@ -26,39 +26,39 @@
 	return container_of(kobj, struct blk_crypto_kobj, kobj)->profile;
 }
 
-static struct blk_crypto_attr *attr_to_crypto_attr(struct attribute *attr)
+static const struct blk_crypto_attr *attr_to_crypto_attr(const struct attribute *attr)
 {
-	return container_of(attr, struct blk_crypto_attr, attr);
+	return container_of_const(attr, struct blk_crypto_attr, attr);
 }
 
 static ssize_t hw_wrapped_keys_show(struct blk_crypto_profile *profile,
-				    struct blk_crypto_attr *attr, char *page)
+				    const struct blk_crypto_attr *attr, char *page)
 {
 	/* Always show supported, since the file doesn't exist otherwise. */
 	return sysfs_emit(page, "supported\n");
 }
 
 static ssize_t max_dun_bits_show(struct blk_crypto_profile *profile,
-				 struct blk_crypto_attr *attr, char *page)
+				 const struct blk_crypto_attr *attr, char *page)
 {
 	return sysfs_emit(page, "%u\n", 8 * profile->max_dun_bytes_supported);
 }
 
 static ssize_t num_keyslots_show(struct blk_crypto_profile *profile,
-				 struct blk_crypto_attr *attr, char *page)
+				 const struct blk_crypto_attr *attr, char *page)
 {
 	return sysfs_emit(page, "%u\n", profile->num_slots);
 }
 
 static ssize_t raw_keys_show(struct blk_crypto_profile *profile,
-			     struct blk_crypto_attr *attr, char *page)
+			     const struct blk_crypto_attr *attr, char *page)
 {
 	/* Always show supported, since the file doesn't exist otherwise. */
 	return sysfs_emit(page, "supported\n");
 }
 
 #define BLK_CRYPTO_RO_ATTR(_name) \
-	static struct blk_crypto_attr _name##_attr = __ATTR_RO(_name)
+	static const struct blk_crypto_attr _name##_attr = __ATTR_RO(_name)
 
 BLK_CRYPTO_RO_ATTR(hw_wrapped_keys);
 BLK_CRYPTO_RO_ATTR(max_dun_bits);
@@ -66,10 +66,10 @@
 BLK_CRYPTO_RO_ATTR(raw_keys);
 
 static umode_t blk_crypto_is_visible(struct kobject *kobj,
-				     struct attribute *attr, int n)
+				     const struct attribute *attr, int n)
 {
 	struct blk_crypto_profile *profile = kobj_to_crypto_profile(kobj);
-	struct blk_crypto_attr *a = attr_to_crypto_attr(attr);
+	const struct blk_crypto_attr *a = attr_to_crypto_attr(attr);
 
 	if (a == &hw_wrapped_keys_attr &&
 	    !(profile->key_types_supported & BLK_CRYPTO_KEY_TYPE_HW_WRAPPED))
@@ -81,7 +81,7 @@
 	return 0444;
 }
 
-static struct attribute *blk_crypto_attrs[] = {
+static const struct attribute *const blk_crypto_attrs[] = {
 	&hw_wrapped_keys_attr.attr,
 	&max_dun_bits_attr.attr,
 	&num_keyslots_attr.attr,
@@ -90,8 +90,8 @@
 };
 
 static const struct attribute_group blk_crypto_attr_group = {
-	.attrs = blk_crypto_attrs,
-	.is_visible = blk_crypto_is_visible,
+	.attrs_const = blk_crypto_attrs,
+	.is_visible_const = blk_crypto_is_visible,
 };
 
 /*
@@ -99,13 +99,13 @@
  * modes, these are initialized at boot time by blk_crypto_sysfs_init().
  */
 static struct blk_crypto_attr __blk_crypto_mode_attrs[BLK_ENCRYPTION_MODE_MAX];
-static struct attribute *blk_crypto_mode_attrs[BLK_ENCRYPTION_MODE_MAX + 1];
+static const struct attribute *blk_crypto_mode_attrs[BLK_ENCRYPTION_MODE_MAX + 1];
 
 static umode_t blk_crypto_mode_is_visible(struct kobject *kobj,
-					  struct attribute *attr, int n)
+					  const struct attribute *attr, int n)
 {
 	struct blk_crypto_profile *profile = kobj_to_crypto_profile(kobj);
-	struct blk_crypto_attr *a = attr_to_crypto_attr(attr);
+	const struct blk_crypto_attr *a = attr_to_crypto_attr(attr);
 	int mode_num = a - __blk_crypto_mode_attrs;
 
 	if (profile->modes_supported[mode_num])
@@ -114,7 +114,7 @@
 }
 
 static ssize_t blk_crypto_mode_show(struct blk_crypto_profile *profile,
-				    struct blk_crypto_attr *attr, char *page)
+				    const struct blk_crypto_attr *attr, char *page)
 {
 	int mode_num = attr - __blk_crypto_mode_attrs;
 
@@ -123,8 +123,8 @@
 
 static const struct attribute_group blk_crypto_modes_attr_group = {
 	.name = "modes",
-	.attrs = blk_crypto_mode_attrs,
-	.is_visible = blk_crypto_mode_is_visible,
+	.attrs_const = blk_crypto_mode_attrs,
+	.is_visible_const = blk_crypto_mode_is_visible,
 };
 
 static const struct attribute_group *blk_crypto_attr_groups[] = {
@@ -137,7 +137,7 @@
 				    struct attribute *attr, char *page)
 {
 	struct blk_crypto_profile *profile = kobj_to_crypto_profile(kobj);
-	struct blk_crypto_attr *a = attr_to_crypto_attr(attr);
+	const struct blk_crypto_attr *a = attr_to_crypto_attr(attr);
 
 	return a->show(profile, a, page);
 }
+3 -3
block/blk-ia-ranges.c
@@ -30,17 +30,17 @@
 	ssize_t (*show)(struct blk_independent_access_range *iar, char *buf);
 };
 
-static struct blk_ia_range_sysfs_entry blk_ia_range_sector_entry = {
+static const struct blk_ia_range_sysfs_entry blk_ia_range_sector_entry = {
 	.attr = { .name = "sector", .mode = 0444 },
 	.show = blk_ia_range_sector_show,
 };
 
-static struct blk_ia_range_sysfs_entry blk_ia_range_nr_sectors_entry = {
+static const struct blk_ia_range_sysfs_entry blk_ia_range_nr_sectors_entry = {
 	.attr = { .name = "nr_sectors", .mode = 0444 },
 	.show = blk_ia_range_nr_sectors_show,
 };
 
-static struct attribute *blk_ia_range_attrs[] = {
+static const struct attribute *const blk_ia_range_attrs[] = {
 	&blk_ia_range_sector_entry.attr,
 	&blk_ia_range_nr_sectors_entry.attr,
 	NULL,
+17 -6
block/blk-iocost.c
@@ -1596,7 +1596,8 @@
 	return HRTIMER_NORESTART;
 }
 
-static void ioc_lat_stat(struct ioc *ioc, u32 *missed_ppm_ar, u32 *rq_wait_pct_p)
+static void ioc_lat_stat(struct ioc *ioc, u32 *missed_ppm_ar, u32 *rq_wait_pct_p,
+			 u32 *nr_done)
 {
 	u32 nr_met[2] = { };
 	u32 nr_missed[2] = { };
@@ -1633,6 +1634,8 @@
 
 	*rq_wait_pct_p = div64_u64(rq_wait_ns * 100,
 				   ioc->period_us * NSEC_PER_USEC);
+
+	*nr_done = nr_met[READ] + nr_met[WRITE] + nr_missed[READ] + nr_missed[WRITE];
 }
 
 /* was iocg idle this period? */
@@ -2250,12 +2253,12 @@
 	u64 usage_us_sum = 0;
 	u32 ppm_rthr;
 	u32 ppm_wthr;
-	u32 missed_ppm[2], rq_wait_pct;
+	u32 missed_ppm[2], rq_wait_pct, nr_done;
 	u64 period_vtime;
 	int prev_busy_level;
 
 	/* how were the latencies during the period? */
-	ioc_lat_stat(ioc, missed_ppm, &rq_wait_pct);
+	ioc_lat_stat(ioc, missed_ppm, &rq_wait_pct, &nr_done);
 
 	/* take care of active iocgs */
 	spin_lock_irq(&ioc->lock);
@@ -2397,9 +2400,17 @@
 	 * and should increase vtime rate.
 	 */
 	prev_busy_level = ioc->busy_level;
-	if (rq_wait_pct > RQ_WAIT_BUSY_PCT ||
-	    missed_ppm[READ] > ppm_rthr ||
-	    missed_ppm[WRITE] > ppm_wthr) {
+	if (!nr_done && nr_lagging) {
+		/*
+		 * When there are lagging IOs but no completions, we don't
+		 * know if the IO latency will meet the QoS targets. The
+		 * disk might be saturated or not. We should not reset
+		 * busy_level to 0 (which would prevent vrate from scaling
+		 * up or down), but rather to keep it unchanged.
+		 */
+	} else if (rq_wait_pct > RQ_WAIT_BUSY_PCT ||
+		   missed_ppm[READ] > ppm_rthr ||
+		   missed_ppm[WRITE] > ppm_wthr) {
 		/* clearly missing QoS targets, slow down vrate */
 		ioc->busy_level = max(ioc->busy_level, 0);
 		ioc->busy_level++;
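The busy_level decision in the last blk-iocost.c hunk can be modelled outside the kernel. The following is a simplified userspace sketch, not the kernel code: the struct layouts and the name `demo_next_busy_level` are hypothetical, and the final fall-through to 0 is an assumption that stands in for the remaining branches the hunk does not show. Only the first two branches mirror the diff.

```c
#include <stdint.h>

/* Hypothetical per-period inputs, named after the variables in the hunk. */
struct demo_period_stats {
	uint32_t nr_done;      /* completions counted this period (met + missed) */
	uint32_t nr_lagging;   /* IOs issued earlier that have not completed yet */
	uint32_t rq_wait_pct;
	uint32_t missed_ppm_r;
	uint32_t missed_ppm_w;
};

/* Hypothetical thresholds standing in for RQ_WAIT_BUSY_PCT, ppm_rthr, ppm_wthr. */
struct demo_qos_limits {
	uint32_t rq_wait_busy_pct;
	uint32_t ppm_rthr;
	uint32_t ppm_wthr;
};

int demo_next_busy_level(int busy_level, const struct demo_period_stats *s,
			 const struct demo_qos_limits *l)
{
	/* Lagging IOs but no completions: nothing was measured, keep the level. */
	if (!s->nr_done && s->nr_lagging)
		return busy_level;

	/* Clearly missing QoS targets: clamp to >= 0, then step up. */
	if (s->rq_wait_pct > l->rq_wait_busy_pct ||
	    s->missed_ppm_r > l->ppm_rthr ||
	    s->missed_ppm_w > l->ppm_wthr)
		return (busy_level > 0 ? busy_level : 0) + 1;

	/* Assumption: every other case is collapsed into a reset to 0 here. */
	return 0;
}
```

With this model, a period with `nr_done == 0` and `nr_lagging > 0` returns the input level untouched, which is the behaviour the new comment describes.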
+2 -14
block/blk-lib.c
@@ -155,13 +155,7 @@
 		__blkdev_issue_write_zeroes(bdev, sector, nr_sects, gfp, &bio,
 				flags, limit);
 		if (bio) {
-			if ((flags & BLKDEV_ZERO_KILLABLE) &&
-			    fatal_signal_pending(current)) {
-				bio_await_chain(bio);
-				blk_finish_plug(&plug);
-				return -EINTR;
-			}
-			ret = submit_bio_wait(bio);
+			ret = bio_submit_or_kill(bio, flags);
 			bio_put(bio);
 		}
 		blk_finish_plug(&plug);
@@ -236,13 +230,7 @@
 	blk_start_plug(&plug);
 	__blkdev_issue_zero_pages(bdev, sector, nr_sects, gfp, &bio, flags);
 	if (bio) {
-		if ((flags & BLKDEV_ZERO_KILLABLE) &&
-		    fatal_signal_pending(current)) {
-			bio_await_chain(bio);
-			blk_finish_plug(&plug);
-			return -EINTR;
-		}
-		ret = submit_bio_wait(bio);
+		ret = bio_submit_or_kill(bio, flags);
 		bio_put(bio);
 	}
 	blk_finish_plug(&plug);
+1
block/blk-mq-debugfs.c
@@ -97,6 +97,7 @@
 	QUEUE_FLAG_NAME(NO_ELV_SWITCH),
 	QUEUE_FLAG_NAME(QOS_ENABLED),
 	QUEUE_FLAG_NAME(BIO_ISSUE_TIME),
+	QUEUE_FLAG_NAME(ZONED_QD1_WRITES),
 };
 #undef QUEUE_FLAG_NAME
 
+5 -5
block/blk-mq-sysfs.c
@@ -53,7 +53,7 @@
 	struct request_queue *q;
 	ssize_t res;
 
-	entry = container_of(attr, struct blk_mq_hw_ctx_sysfs_entry, attr);
+	entry = container_of_const(attr, struct blk_mq_hw_ctx_sysfs_entry, attr);
 	hctx = container_of(kobj, struct blk_mq_hw_ctx, kobj);
 	q = hctx->queue;
 
@@ -101,20 +101,20 @@
 	return pos + ret;
 }
 
-static struct blk_mq_hw_ctx_sysfs_entry blk_mq_hw_sysfs_nr_tags = {
+static const struct blk_mq_hw_ctx_sysfs_entry blk_mq_hw_sysfs_nr_tags = {
 	.attr = {.name = "nr_tags", .mode = 0444 },
 	.show = blk_mq_hw_sysfs_nr_tags_show,
 };
-static struct blk_mq_hw_ctx_sysfs_entry blk_mq_hw_sysfs_nr_reserved_tags = {
+static const struct blk_mq_hw_ctx_sysfs_entry blk_mq_hw_sysfs_nr_reserved_tags = {
 	.attr = {.name = "nr_reserved_tags", .mode = 0444 },
 	.show = blk_mq_hw_sysfs_nr_reserved_tags_show,
 };
-static struct blk_mq_hw_ctx_sysfs_entry blk_mq_hw_sysfs_cpus = {
+static const struct blk_mq_hw_ctx_sysfs_entry blk_mq_hw_sysfs_cpus = {
 	.attr = {.name = "cpu_list", .mode = 0444 },
 	.show = blk_mq_hw_sysfs_cpus_show,
 };
 
-static struct attribute *default_hw_ctx_attrs[] = {
+static const struct attribute *const default_hw_ctx_attrs[] = {
 	&blk_mq_hw_sysfs_nr_tags.attr,
 	&blk_mq_hw_sysfs_nr_reserved_tags.attr,
 	&blk_mq_hw_sysfs_cpus.attr,
+19
block/blk-mq.c
@@ -3424,6 +3424,25 @@
  */
 void blk_steal_bios(struct bio_list *list, struct request *rq)
 {
+	struct bio *bio;
+
+	for (bio = rq->bio; bio; bio = bio->bi_next) {
+		if (bio->bi_opf & REQ_POLLED) {
+			bio->bi_opf &= ~REQ_POLLED;
+			bio->bi_cookie = BLK_QC_T_NONE;
+		}
+		/*
+		 * The alternate request queue that we may end up submitting
+		 * the bio to may be frozen temporarily, in this case REQ_NOWAIT
+		 * will fail the I/O immediately with EAGAIN to the issuer.
+		 * We are not in the issuer context which cannot block. Clear
+		 * the flag to avoid spurious EAGAIN I/O failures.
+		 */
+		bio->bi_opf &= ~REQ_NOWAIT;
+		bio_clear_flag(bio, BIO_QOS_THROTTLED);
+		bio_clear_flag(bio, BIO_QOS_MERGED);
+	}
+
 	if (rq->bio) {
 		if (list->tail)
 			list->tail->bi_next = rq->bio;
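The per-bio cleanup added to blk_steal_bios() can be illustrated with a small userspace model. This is a sketch under stated assumptions: `struct demo_bio`, the `DEMO_*` bit values and `demo_sanitize_stolen()` are hypothetical stand-ins, and the real kernel flag values and struct bio layout differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit assignments; they do not match the kernel's values. */
#define DEMO_REQ_POLLED         (1u << 0)
#define DEMO_REQ_NOWAIT         (1u << 1)
#define DEMO_BIO_QOS_THROTTLED  (1u << 0)
#define DEMO_BIO_QOS_MERGED     (1u << 1)
#define DEMO_COOKIE_NONE        UINT32_MAX

/* Minimal stand-in for the fields of struct bio touched by the hunk. */
struct demo_bio {
	struct demo_bio *next;
	uint32_t opf;     /* operation flags, models bi_opf */
	uint32_t flags;   /* bio state flags, models bi_flags */
	uint32_t cookie;  /* polling cookie, models bi_cookie */
};

/* Strip state that only made sense on the queue the bios came from. */
void demo_sanitize_stolen(struct demo_bio *head)
{
	for (struct demo_bio *bio = head; bio; bio = bio->next) {
		/* Polling state is tied to the original queue. */
		if (bio->opf & DEMO_REQ_POLLED) {
			bio->opf &= ~DEMO_REQ_POLLED;
			bio->cookie = DEMO_COOKIE_NONE;
		}
		/* The resubmitting context may block, so NOWAIT is dropped. */
		bio->opf &= ~DEMO_REQ_NOWAIT;
		/* QoS accounting restarts on the queue the bio is resubmitted to. */
		bio->flags &= ~(DEMO_BIO_QOS_THROTTLED | DEMO_BIO_QOS_MERGED);
	}
}
```

Unrelated bits in `opf` and `flags` are left alone, matching the hunk, which only clears the four named flags and resets the cookie for polled bios.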
+8 -4
block/blk-settings.c
@@ -189,11 +189,11 @@
 	}
 
 	/*
-	 * The PI generation / validation helpers do not expect intervals to
-	 * straddle multiple bio_vecs. Enforce alignment so that those are
+	 * Some IO controllers can not handle data intervals straddling
+	 * multiple bio_vecs. For those, enforce alignment so that those are
 	 * never generated, and that each buffer is aligned as expected.
 	 */
-	if (bi->csum_type) {
+	if (!(bi->flags & BLK_SPLIT_INTERVAL_CAPABLE) && bi->csum_type) {
 		lim->dma_alignment = max(lim->dma_alignment,
 					(1U << bi->interval_exp) - 1);
 	}
@@ -992,10 +992,14 @@
 		if ((ti->flags & BLK_INTEGRITY_REF_TAG) !=
 		    (bi->flags & BLK_INTEGRITY_REF_TAG))
 			goto incompatible;
+		if ((ti->flags & BLK_SPLIT_INTERVAL_CAPABLE) &&
+		    !(bi->flags & BLK_SPLIT_INTERVAL_CAPABLE))
+			ti->flags &= ~BLK_SPLIT_INTERVAL_CAPABLE;
 	} else {
 		ti->flags = BLK_INTEGRITY_STACKED;
 		ti->flags |= (bi->flags & BLK_INTEGRITY_DEVICE_CAPABLE) |
-			     (bi->flags & BLK_INTEGRITY_REF_TAG);
+			     (bi->flags & BLK_INTEGRITY_REF_TAG) |
+			     (bi->flags & BLK_SPLIT_INTERVAL_CAPABLE);
 		ti->csum_type = bi->csum_type;
 		ti->pi_tuple_size = bi->pi_tuple_size;
 		ti->metadata_size = bi->metadata_size;
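The two blk-settings.c hunks apply one rule in two places: a device that advertises BLK_SPLIT_INTERVAL_CAPABLE is exempt from the interval alignment requirement, and a stacked device keeps the capability only if every underlying device has it. A minimal userspace sketch of both, assuming a hypothetical bit value and hypothetical helper names (`demo_dma_alignment`, `demo_stack_flags`); the `top_initialized` parameter stands in for the branch condition that the hunk does not show:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit; the kernel's BLK_SPLIT_INTERVAL_CAPABLE value may differ. */
#define DEMO_SPLIT_INTERVAL_CAPABLE (1u << 0)

/*
 * Alignment mask after applying the integrity rule: devices that cannot
 * handle split intervals need buffers aligned to the protection interval.
 */
uint32_t demo_dma_alignment(uint32_t cur, uint32_t flags, uint32_t csum_type,
			    uint32_t interval_exp)
{
	uint32_t need = (1u << interval_exp) - 1;

	if (!(flags & DEMO_SPLIT_INTERVAL_CAPABLE) && csum_type)
		return cur > need ? cur : need;
	return cur;
}

/*
 * Capability bit of the stacked (top) device after adding one bottom device.
 * The first bottom device seeds the bit; later ones can only clear it.
 */
uint32_t demo_stack_flags(uint32_t top, bool top_initialized, uint32_t bottom)
{
	if (!top_initialized)
		return bottom & DEMO_SPLIT_INTERVAL_CAPABLE;
	if ((top & DEMO_SPLIT_INTERVAL_CAPABLE) &&
	    !(bottom & DEMO_SPLIT_INTERVAL_CAPABLE))
		top &= ~DEMO_SPLIT_INTERVAL_CAPABLE;
	return top;
}
```

For a 4096-byte interval (`interval_exp == 12`) on a non-capable device with a checksum type set, the required mask is 4095. Once any bottom device lacks the capability, the stacked device never regains it.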
+65 -24
block/blk-sysfs.c
··· 390 390 return queue_var_show(disk_nr_zones(disk), page); 391 391 } 392 392 393 + static ssize_t queue_zoned_qd1_writes_show(struct gendisk *disk, char *page) 394 + { 395 + return queue_var_show(!!blk_queue_zoned_qd1_writes(disk->queue), 396 + page); 397 + } 398 + 399 + static ssize_t queue_zoned_qd1_writes_store(struct gendisk *disk, 400 + const char *page, size_t count) 401 + { 402 + struct request_queue *q = disk->queue; 403 + unsigned long qd1_writes; 404 + unsigned int memflags; 405 + ssize_t ret; 406 + 407 + ret = queue_var_store(&qd1_writes, page, count); 408 + if (ret < 0) 409 + return ret; 410 + 411 + memflags = blk_mq_freeze_queue(q); 412 + blk_mq_quiesce_queue(q); 413 + if (qd1_writes) 414 + blk_queue_flag_set(QUEUE_FLAG_ZONED_QD1_WRITES, q); 415 + else 416 + blk_queue_flag_clear(QUEUE_FLAG_ZONED_QD1_WRITES, q); 417 + blk_mq_unquiesce_queue(q); 418 + blk_mq_unfreeze_queue(q, memflags); 419 + 420 + return count; 421 + } 422 + 393 423 static ssize_t queue_iostats_passthrough_show(struct gendisk *disk, char *page) 394 424 { 395 425 return queue_var_show(!!blk_queue_passthrough_stat(disk->queue), page); ··· 581 551 return 0; 582 552 } 583 553 584 - #define QUEUE_RO_ENTRY(_prefix, _name) \ 585 - static struct queue_sysfs_entry _prefix##_entry = { \ 586 - .attr = { .name = _name, .mode = 0444 }, \ 587 - .show = _prefix##_show, \ 554 + #define QUEUE_RO_ENTRY(_prefix, _name) \ 555 + static const struct queue_sysfs_entry _prefix##_entry = { \ 556 + .attr = { .name = _name, .mode = 0444 }, \ 557 + .show = _prefix##_show, \ 588 558 }; 589 559 590 - #define QUEUE_RW_ENTRY(_prefix, _name) \ 591 - static struct queue_sysfs_entry _prefix##_entry = { \ 592 - .attr = { .name = _name, .mode = 0644 }, \ 593 - .show = _prefix##_show, \ 594 - .store = _prefix##_store, \ 560 + #define QUEUE_RW_ENTRY(_prefix, _name) \ 561 + static const struct queue_sysfs_entry _prefix##_entry = { \ 562 + .attr = { .name = _name, .mode = 0644 }, \ 563 + .show = _prefix##_show, \ 564 + 
.store = _prefix##_store, \ 595 565 }; 596 566 597 567 #define QUEUE_LIM_RO_ENTRY(_prefix, _name) \ 598 - static struct queue_sysfs_entry _prefix##_entry = { \ 568 + static const struct queue_sysfs_entry _prefix##_entry = { \ 599 569 .attr = { .name = _name, .mode = 0444 }, \ 600 570 .show_limit = _prefix##_show, \ 601 571 } 602 572 603 573 #define QUEUE_LIM_RW_ENTRY(_prefix, _name) \ 604 - static struct queue_sysfs_entry _prefix##_entry = { \ 574 + static const struct queue_sysfs_entry _prefix##_entry = { \ 605 575 .attr = { .name = _name, .mode = 0644 }, \ 606 576 .show_limit = _prefix##_show, \ 607 577 .store_limit = _prefix##_store, \ ··· 647 617 QUEUE_LIM_RO_ENTRY(queue_zone_write_granularity, "zone_write_granularity"); 648 618 649 619 QUEUE_LIM_RO_ENTRY(queue_zoned, "zoned"); 620 + QUEUE_RW_ENTRY(queue_zoned_qd1_writes, "zoned_qd1_writes"); 650 621 QUEUE_RO_ENTRY(queue_nr_zones, "nr_zones"); 651 622 QUEUE_LIM_RO_ENTRY(queue_max_open_zones, "max_open_zones"); 652 623 QUEUE_LIM_RO_ENTRY(queue_max_active_zones, "max_active_zones"); ··· 665 634 QUEUE_LIM_RO_ENTRY(queue_dma_alignment, "dma_alignment"); 666 635 667 636 /* legacy alias for logical_block_size: */ 668 - static struct queue_sysfs_entry queue_hw_sector_size_entry = { 637 + static const struct queue_sysfs_entry queue_hw_sector_size_entry = { 669 638 .attr = {.name = "hw_sector_size", .mode = 0444 }, 670 639 .show_limit = queue_logical_block_size_show, 671 640 }; ··· 731 700 #endif 732 701 733 702 /* Common attributes for bio-based and request-based queues. */ 734 - static struct attribute *queue_attrs[] = { 703 + static const struct attribute *const queue_attrs[] = { 735 704 /* 736 705 * Attributes which are protected with q->limits_lock. 
737 706 */ ··· 785 754 &queue_nomerges_entry.attr, 786 755 &queue_poll_entry.attr, 787 756 &queue_poll_delay_entry.attr, 757 + &queue_zoned_qd1_writes_entry.attr, 788 758 789 759 NULL, 790 760 }; 791 761 792 762 /* Request-based queue attributes that are not relevant for bio-based queues. */ 793 - static struct attribute *blk_mq_queue_attrs[] = { 763 + static const struct attribute *const blk_mq_queue_attrs[] = { 794 764 /* 795 765 * Attributes which require some form of locking other than 796 766 * q->sysfs_lock. ··· 811 779 NULL, 812 780 }; 813 781 814 - static umode_t queue_attr_visible(struct kobject *kobj, struct attribute *attr, 782 + static umode_t queue_attr_visible(struct kobject *kobj, const struct attribute *attr, 815 783 int n) 816 784 { 817 785 struct gendisk *disk = container_of(kobj, struct gendisk, queue_kobj); 818 786 struct request_queue *q = disk->queue; 819 787 820 788 if ((attr == &queue_max_open_zones_entry.attr || 821 - attr == &queue_max_active_zones_entry.attr) && 789 + attr == &queue_max_active_zones_entry.attr || 790 + attr == &queue_zoned_qd1_writes_entry.attr) && 822 791 !blk_queue_is_zoned(q)) 823 792 return 0; 824 793 ··· 827 794 } 828 795 829 796 static umode_t blk_mq_queue_attr_visible(struct kobject *kobj, 830 - struct attribute *attr, int n) 797 + const struct attribute *attr, int n) 831 798 { 832 799 struct gendisk *disk = container_of(kobj, struct gendisk, queue_kobj); 833 800 struct request_queue *q = disk->queue; ··· 841 808 return attr->mode; 842 809 } 843 810 844 - static struct attribute_group queue_attr_group = { 845 - .attrs = queue_attrs, 846 - .is_visible = queue_attr_visible, 811 + static const struct attribute_group queue_attr_group = { 812 + .attrs_const = queue_attrs, 813 + .is_visible_const = queue_attr_visible, 847 814 }; 848 815 849 - static struct attribute_group blk_mq_queue_attr_group = { 850 - .attrs = blk_mq_queue_attrs, 851 - .is_visible = blk_mq_queue_attr_visible, 816 + static const struct attribute_group 
blk_mq_queue_attr_group = { 817 + .attrs_const = blk_mq_queue_attrs, 818 + .is_visible_const = blk_mq_queue_attr_visible, 852 819 }; 853 820 854 - #define to_queue(atr) container_of((atr), struct queue_sysfs_entry, attr) 821 + #define to_queue(atr) container_of_const((atr), struct queue_sysfs_entry, attr) 855 822 856 823 static ssize_t 857 824 queue_attr_show(struct kobject *kobj, struct attribute *attr, char *page) ··· 966 933 if (queue_is_mq(q)) 967 934 blk_mq_debugfs_register(q); 968 935 blk_debugfs_unlock(q, memflags); 936 + 937 + /* 938 + * For blk-mq rotational zoned devices, default to using QD=1 939 + * writes. For non-mq rotational zoned devices, the device driver can 940 + * set an appropriate default. 941 + */ 942 + if (queue_is_mq(q) && blk_queue_rot(q) && blk_queue_is_zoned(q)) 943 + blk_queue_flag_set(QUEUE_FLAG_ZONED_QD1_WRITES, q); 969 944 970 945 ret = disk_register_independent_access_ranges(disk); 971 946 if (ret)
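The blk-sysfs.c changes above expose the new flag as a read-write queue attribute named zoned_qd1_writes and default it to on for blk-mq rotational zoned devices. A userspace sketch of how a tool might toggle it follows. The helper names are hypothetical and error handling is minimal; the path assumes the usual /sys/block/<disk>/queue/ location of queue attributes.

```c
#include <stdbool.h>
#include <stdio.h>

/* Mirrors the default chosen at registration: mq + rotational + zoned. */
bool demo_default_qd1_writes(bool is_mq, bool is_rotational, bool is_zoned)
{
	return is_mq && is_rotational && is_zoned;
}

/* Build the attribute path; returns 0 on success, -1 if buf is too small. */
int demo_qd1_attr_path(char *buf, size_t len, const char *disk)
{
	int n = snprintf(buf, len, "/sys/block/%s/queue/zoned_qd1_writes", disk);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "1" or "0" to the attribute; returns 0 on success, -1 on error. */
int demo_set_qd1_writes(const char *disk, bool enable)
{
	char path[256];
	FILE *f;
	int ret = 0;

	if (demo_qd1_attr_path(path, sizeof(path), disk))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;	/* e.g. the attribute is hidden on non-zoned disks */
	if (fputs(enable ? "1" : "0", f) == EOF)
		ret = -1;
	if (fclose(f) == EOF)
		ret = -1;
	return ret;
}
```

Because queue_attr_visible() returns 0 for this attribute on non-zoned queues, the open is expected to fail there. On a zoned disk the store handler freezes and quiesces the queue around the flag change, so the write may block briefly.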
+3 -2
block/blk-wbt.c
@@ -782,10 +782,11 @@
 		return;
 
 	rwb = wbt_alloc();
-	if (WARN_ON_ONCE(!rwb))
+	if (!rwb)
 		return;
 
-	if (WARN_ON_ONCE(wbt_init(disk, rwb))) {
+	if (wbt_init(disk, rwb)) {
+		pr_warn("%s: failed to enable wbt\n", disk->disk_name);
 		wbt_free(rwb);
 		return;
 	}
+295 -182
block/blk-zoned.c
··· 16 16 #include <linux/spinlock.h> 17 17 #include <linux/refcount.h> 18 18 #include <linux/mempool.h> 19 + #include <linux/kthread.h> 20 + #include <linux/freezer.h> 19 21 20 22 #include <trace/events/block.h> 21 23 ··· 42 40 /* 43 41 * Per-zone write plug. 44 42 * @node: hlist_node structure for managing the plug using a hash table. 43 + * @entry: list_head structure for listing the plug in the disk list of active 44 + * zone write plugs. 45 45 * @bio_list: The list of BIOs that are currently plugged. 46 46 * @bio_work: Work struct to handle issuing of plugged BIOs 47 47 * @rcu_head: RCU head to free zone write plugs with an RCU grace period. ··· 66 62 */ 67 63 struct blk_zone_wplug { 68 64 struct hlist_node node; 65 + struct list_head entry; 69 66 struct bio_list bio_list; 70 67 struct work_struct bio_work; 71 68 struct rcu_head rcu_head; ··· 104 99 * being executed or the zone write plug bio list is not empty. 105 100 * - BLK_ZONE_WPLUG_NEED_WP_UPDATE: Indicates that we lost track of a zone 106 101 * write pointer offset and need to update it. 107 - * - BLK_ZONE_WPLUG_UNHASHED: Indicates that the zone write plug was removed 108 - * from the disk hash table and that the initial reference to the zone 109 - * write plug set when the plug was first added to the hash table has been 110 - * dropped. This flag is set when a zone is reset, finished or become full, 111 - * to prevent new references to the zone write plug to be taken for 112 - * newly incoming BIOs. A zone write plug flagged with this flag will be 113 - * freed once all remaining references from BIOs or functions are dropped. 102 + * - BLK_ZONE_WPLUG_DEAD: Indicates that the zone write plug will be 103 + * removed from the disk hash table of zone write plugs when the last 104 + * reference on the zone write plug is dropped. 
If set, this flag also 105 + * indicates that the initial extra reference on the zone write plug was 106 + * dropped, meaning that the reference count indicates the current number of 107 + * active users (code context or BIOs and requests in flight). This flag is 108 + * set when a zone is reset, finished or becomes full. 114 109 */ 115 110 #define BLK_ZONE_WPLUG_PLUGGED (1U << 0) 116 111 #define BLK_ZONE_WPLUG_NEED_WP_UPDATE (1U << 1) 117 - #define BLK_ZONE_WPLUG_UNHASHED (1U << 2) 112 + #define BLK_ZONE_WPLUG_DEAD (1U << 2) 118 113 119 114 /** 120 115 * blk_zone_cond_str - Return a zone condition name string ··· 417 412 return 0; 418 413 } 419 414 420 - static int blkdev_truncate_zone_range(struct block_device *bdev, 421 - blk_mode_t mode, const struct blk_zone_range *zrange) 415 + static int blkdev_reset_zone(struct block_device *bdev, blk_mode_t mode, 416 + struct blk_zone_range *zrange) 422 417 { 423 418 loff_t start, end; 419 + int ret = -EINVAL; 424 420 421 + inode_lock(bdev->bd_mapping->host); 422 + filemap_invalidate_lock(bdev->bd_mapping); 425 423 if (zrange->sector + zrange->nr_sectors <= zrange->sector || 426 424 zrange->sector + zrange->nr_sectors > get_capacity(bdev->bd_disk)) 427 425 /* Out of range */ 428 - return -EINVAL; 426 + goto out_unlock; 429 427 430 428 start = zrange->sector << SECTOR_SHIFT; 431 429 end = ((zrange->sector + zrange->nr_sectors) << SECTOR_SHIFT) - 1; 432 430 433 - return truncate_bdev_range(bdev, mode, start, end); 431 + ret = truncate_bdev_range(bdev, mode, start, end); 432 + if (ret) 433 + goto out_unlock; 434 + 435 + ret = blkdev_zone_mgmt(bdev, REQ_OP_ZONE_RESET, zrange->sector, 436 + zrange->nr_sectors); 437 + out_unlock: 438 + filemap_invalidate_unlock(bdev->bd_mapping); 439 + inode_unlock(bdev->bd_mapping->host); 440 + return ret; 434 441 } 435 442 436 443 /* ··· 455 438 void __user *argp = (void __user *)arg; 456 439 struct blk_zone_range zrange; 457 440 enum req_op op; 458 - int ret; 459 441 460 442 if (!argp) 461 
443 return -EINVAL; ··· 470 454 471 455 switch (cmd) { 472 456 case BLKRESETZONE: 473 - op = REQ_OP_ZONE_RESET; 474 - 475 - /* Invalidate the page cache, including dirty pages. */ 476 - inode_lock(bdev->bd_mapping->host); 477 - filemap_invalidate_lock(bdev->bd_mapping); 478 - ret = blkdev_truncate_zone_range(bdev, mode, &zrange); 479 - if (ret) 480 - goto fail; 481 - break; 457 + return blkdev_reset_zone(bdev, mode, &zrange); 482 458 case BLKOPENZONE: 483 459 op = REQ_OP_ZONE_OPEN; 484 460 break; ··· 484 476 return -ENOTTY; 485 477 } 486 478 487 - ret = blkdev_zone_mgmt(bdev, op, zrange.sector, zrange.nr_sectors); 488 - 489 - fail: 490 - if (cmd == BLKRESETZONE) { 491 - filemap_invalidate_unlock(bdev->bd_mapping); 492 - inode_unlock(bdev->bd_mapping->host); 493 - } 494 - 495 - return ret; 479 + return blkdev_zone_mgmt(bdev, op, zrange.sector, zrange.nr_sectors); 496 480 } 497 481 498 482 static bool disk_zone_is_last(struct gendisk *disk, struct blk_zone *zone) ··· 492 492 return zone->start + zone->len >= get_capacity(disk); 493 493 } 494 494 495 - static bool disk_zone_is_full(struct gendisk *disk, 496 - unsigned int zno, unsigned int offset_in_zone) 497 - { 498 - if (zno < disk->nr_zones - 1) 499 - return offset_in_zone >= disk->zone_capacity; 500 - return offset_in_zone >= disk->last_zone_capacity; 501 - } 502 - 503 495 static bool disk_zone_wplug_is_full(struct gendisk *disk, 504 496 struct blk_zone_wplug *zwplug) 505 497 { 506 - return disk_zone_is_full(disk, zwplug->zone_no, zwplug->wp_offset); 498 + if (zwplug->zone_no < disk->nr_zones - 1) 499 + return zwplug->wp_offset >= disk->zone_capacity; 500 + return zwplug->wp_offset >= disk->last_zone_capacity; 507 501 } 508 502 509 503 static bool disk_insert_zone_wplug(struct gendisk *disk, ··· 514 520 * are racing with other submission context, so we may already have a 515 521 * zone write plug for the same zone. 
516 522 */ 517 - spin_lock_irqsave(&disk->zone_wplugs_lock, flags); 523 + spin_lock_irqsave(&disk->zone_wplugs_hash_lock, flags); 518 524 hlist_for_each_entry_rcu(zwplg, &disk->zone_wplugs_hash[idx], node) { 519 525 if (zwplg->zone_no == zwplug->zone_no) { 520 - spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags); 526 + spin_unlock_irqrestore(&disk->zone_wplugs_hash_lock, 527 + flags); 521 528 return false; 522 529 } 523 530 } ··· 530 535 * necessarilly in the active condition. 531 536 */ 532 537 zones_cond = rcu_dereference_check(disk->zones_cond, 533 - lockdep_is_held(&disk->zone_wplugs_lock)); 538 + lockdep_is_held(&disk->zone_wplugs_hash_lock)); 534 539 if (zones_cond) 535 540 zwplug->cond = zones_cond[zwplug->zone_no]; 536 541 else ··· 538 543 539 544 hlist_add_head_rcu(&zwplug->node, &disk->zone_wplugs_hash[idx]); 540 545 atomic_inc(&disk->nr_zone_wplugs); 541 - spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags); 546 + spin_unlock_irqrestore(&disk->zone_wplugs_hash_lock, flags); 542 547 543 548 return true; 544 549 } ··· 582 587 mempool_free(zwplug, zwplug->disk->zone_wplugs_pool); 583 588 } 584 589 585 - static inline void disk_put_zone_wplug(struct blk_zone_wplug *zwplug) 590 + static void disk_free_zone_wplug(struct blk_zone_wplug *zwplug) 586 591 { 587 - if (refcount_dec_and_test(&zwplug->ref)) { 588 - WARN_ON_ONCE(!bio_list_empty(&zwplug->bio_list)); 589 - WARN_ON_ONCE(zwplug->flags & BLK_ZONE_WPLUG_PLUGGED); 590 - WARN_ON_ONCE(!(zwplug->flags & BLK_ZONE_WPLUG_UNHASHED)); 591 - 592 - call_rcu(&zwplug->rcu_head, disk_free_zone_wplug_rcu); 593 - } 594 - } 595 - 596 - static inline bool disk_should_remove_zone_wplug(struct gendisk *disk, 597 - struct blk_zone_wplug *zwplug) 598 - { 599 - lockdep_assert_held(&zwplug->lock); 600 - 601 - /* If the zone write plug was already removed, we are done. */ 602 - if (zwplug->flags & BLK_ZONE_WPLUG_UNHASHED) 603 - return false; 604 - 605 - /* If the zone write plug is still plugged, it cannot be removed. 
*/ 606 - if (zwplug->flags & BLK_ZONE_WPLUG_PLUGGED) 607 - return false; 608 - 609 - /* 610 - * Completions of BIOs with blk_zone_write_plug_bio_endio() may 611 - * happen after handling a request completion with 612 - * blk_zone_write_plug_finish_request() (e.g. with split BIOs 613 - * that are chained). In such case, disk_zone_wplug_unplug_bio() 614 - * should not attempt to remove the zone write plug until all BIO 615 - * completions are seen. Check by looking at the zone write plug 616 - * reference count, which is 2 when the plug is unused (one reference 617 - * taken when the plug was allocated and another reference taken by the 618 - * caller context). 619 - */ 620 - if (refcount_read(&zwplug->ref) > 2) 621 - return false; 622 - 623 - /* We can remove zone write plugs for zones that are empty or full. */ 624 - return !zwplug->wp_offset || disk_zone_wplug_is_full(disk, zwplug); 625 - } 626 - 627 - static void disk_remove_zone_wplug(struct gendisk *disk, 628 - struct blk_zone_wplug *zwplug) 629 - { 592 + struct gendisk *disk = zwplug->disk; 630 593 unsigned long flags; 631 594 632 - /* If the zone write plug was already removed, we have nothing to do. */ 633 - if (zwplug->flags & BLK_ZONE_WPLUG_UNHASHED) 634 - return; 595 + WARN_ON_ONCE(!(zwplug->flags & BLK_ZONE_WPLUG_DEAD)); 596 + WARN_ON_ONCE(zwplug->flags & BLK_ZONE_WPLUG_PLUGGED); 597 + WARN_ON_ONCE(!bio_list_empty(&zwplug->bio_list)); 635 598 636 - /* 637 - * Mark the zone write plug as unhashed and drop the extra reference we 638 - * took when the plug was inserted in the hash table. Also update the 639 - * disk zone condition array with the current condition of the zone 640 - * write plug. 
641 - */ 642 - zwplug->flags |= BLK_ZONE_WPLUG_UNHASHED; 643 - spin_lock_irqsave(&disk->zone_wplugs_lock, flags); 599 + spin_lock_irqsave(&disk->zone_wplugs_hash_lock, flags); 644 600 blk_zone_set_cond(rcu_dereference_check(disk->zones_cond, 645 - lockdep_is_held(&disk->zone_wplugs_lock)), 601 + lockdep_is_held(&disk->zone_wplugs_hash_lock)), 646 602 zwplug->zone_no, zwplug->cond); 647 603 hlist_del_init_rcu(&zwplug->node); 648 604 atomic_dec(&disk->nr_zone_wplugs); 649 - spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags); 605 + spin_unlock_irqrestore(&disk->zone_wplugs_hash_lock, flags); 606 + 607 + call_rcu(&zwplug->rcu_head, disk_free_zone_wplug_rcu); 608 + } 609 + 610 + static inline void disk_put_zone_wplug(struct blk_zone_wplug *zwplug) 611 + { 612 + if (refcount_dec_and_test(&zwplug->ref)) 613 + disk_free_zone_wplug(zwplug); 614 + } 615 + 616 + /* 617 + * Flag the zone write plug as dead and drop the initial reference we got when 618 + * the zone write plug was added to the hash table. The zone write plug will be 619 + * unhashed when its last reference is dropped. 620 + */ 621 + static void disk_mark_zone_wplug_dead(struct blk_zone_wplug *zwplug) 622 + { 623 + lockdep_assert_held(&zwplug->lock); 624 + 625 + if (!(zwplug->flags & BLK_ZONE_WPLUG_DEAD)) { 626 + zwplug->flags |= BLK_ZONE_WPLUG_DEAD; 627 + disk_put_zone_wplug(zwplug); 628 + } 629 + } 630 + 631 + static bool disk_zone_wplug_submit_bio(struct gendisk *disk, 632 + struct blk_zone_wplug *zwplug); 633 + 634 + static void blk_zone_wplug_bio_work(struct work_struct *work) 635 + { 636 + struct blk_zone_wplug *zwplug = 637 + container_of(work, struct blk_zone_wplug, bio_work); 638 + 639 + disk_zone_wplug_submit_bio(zwplug->disk, zwplug); 640 + 641 + /* Drop the reference we took in disk_zone_wplug_schedule_work(). 
*/ 650 642 disk_put_zone_wplug(zwplug); 651 643 } 652 644 653 - static void blk_zone_wplug_bio_work(struct work_struct *work); 654 - 655 645 /* 656 - * Get a reference on the write plug for the zone containing @sector. 657 - * If the plug does not exist, it is allocated and hashed. 658 - * Return a pointer to the zone write plug with the plug spinlock held. 646 + * Get a zone write plug for the zone containing @sector. 647 + * If the plug does not exist, it is allocated and inserted in the disk hash 648 + * table. 659 649 */ 660 - static struct blk_zone_wplug *disk_get_and_lock_zone_wplug(struct gendisk *disk, 661 - sector_t sector, gfp_t gfp_mask, 662 - unsigned long *flags) 650 + static struct blk_zone_wplug *disk_get_or_alloc_zone_wplug(struct gendisk *disk, 651 + sector_t sector, gfp_t gfp_mask) 663 652 { 664 653 unsigned int zno = disk_zone_no(disk, sector); 665 654 struct blk_zone_wplug *zwplug; 666 655 667 656 again: 668 657 zwplug = disk_get_zone_wplug(disk, sector); 669 - if (zwplug) { 670 - /* 671 - * Check that a BIO completion or a zone reset or finish 672 - * operation has not already removed the zone write plug from 673 - * the hash table and dropped its reference count. In such case, 674 - * we need to get a new plug so start over from the beginning. 
675 - */ 676 - spin_lock_irqsave(&zwplug->lock, *flags); 677 - if (zwplug->flags & BLK_ZONE_WPLUG_UNHASHED) { 678 - spin_unlock_irqrestore(&zwplug->lock, *flags); 679 - disk_put_zone_wplug(zwplug); 680 - goto again; 681 - } 658 + if (zwplug) 682 659 return zwplug; 683 - } 684 660 685 661 /* 686 662 * Allocate and initialize a zone write plug with an extra reference ··· 670 704 zwplug->wp_offset = bdev_offset_from_zone_start(disk->part0, sector); 671 705 bio_list_init(&zwplug->bio_list); 672 706 INIT_WORK(&zwplug->bio_work, blk_zone_wplug_bio_work); 707 + INIT_LIST_HEAD(&zwplug->entry); 673 708 zwplug->disk = disk; 674 - 675 - spin_lock_irqsave(&zwplug->lock, *flags); 676 709 677 710 /* 678 711 * Insert the new zone write plug in the hash table. This can fail only ··· 679 714 * in such case. 680 715 */ 681 716 if (!disk_insert_zone_wplug(disk, zwplug)) { 682 - spin_unlock_irqrestore(&zwplug->lock, *flags); 683 717 mempool_free(zwplug, disk->zone_wplugs_pool); 684 718 goto again; 685 719 } ··· 703 739 */ 704 740 static void disk_zone_wplug_abort(struct blk_zone_wplug *zwplug) 705 741 { 742 + struct gendisk *disk = zwplug->disk; 706 743 struct bio *bio; 707 744 708 745 lockdep_assert_held(&zwplug->lock); ··· 717 752 blk_zone_wplug_bio_io_error(zwplug, bio); 718 753 719 754 zwplug->flags &= ~BLK_ZONE_WPLUG_PLUGGED; 755 + 756 + /* 757 + * If we are using the per disk zone write plugs worker thread, remove 758 + * the zone write plug from the work list and drop the reference we 759 + * took when the zone write plug was added to that list. 
760 + */ 761 + if (blk_queue_zoned_qd1_writes(disk->queue)) { 762 + spin_lock(&disk->zone_wplugs_list_lock); 763 + if (!list_empty(&zwplug->entry)) { 764 + list_del_init(&zwplug->entry); 765 + disk_put_zone_wplug(zwplug); 766 + } 767 + spin_unlock(&disk->zone_wplugs_list_lock); 768 + } 720 769 } 721 770 722 771 /* ··· 767 788 disk_zone_wplug_update_cond(disk, zwplug); 768 789 769 790 disk_zone_wplug_abort(zwplug); 770 - 771 - /* 772 - * The zone write plug now has no BIO plugged: remove it from the 773 - * hash table so that it cannot be seen. The plug will be freed 774 - * when the last reference is dropped. 775 - */ 776 - if (disk_should_remove_zone_wplug(disk, zwplug)) 777 - disk_remove_zone_wplug(disk, zwplug); 791 + if (!zwplug->wp_offset || disk_zone_wplug_is_full(disk, zwplug)) 792 + disk_mark_zone_wplug_dead(zwplug); 778 793 } 779 794 780 795 static unsigned int blk_zone_wp_offset(struct blk_zone *zone) ··· 1165 1192 } 1166 1193 } 1167 1194 1168 - static void disk_zone_wplug_schedule_bio_work(struct gendisk *disk, 1169 - struct blk_zone_wplug *zwplug) 1195 + static void disk_zone_wplug_schedule_work(struct gendisk *disk, 1196 + struct blk_zone_wplug *zwplug) 1170 1197 { 1171 1198 lockdep_assert_held(&zwplug->lock); 1172 1199 1173 1200 /* 1174 - * Take a reference on the zone write plug and schedule the submission 1175 - * of the next plugged BIO. blk_zone_wplug_bio_work() will release the 1176 - * reference we take here. 1201 + * Schedule the submission of the next plugged BIO. Taking a reference 1202 + * to the zone write plug is required as the bio_work belongs to the 1203 + * plug, and thus we must ensure that the write plug does not go away 1204 + * while the work is being scheduled but has not run yet. 1205 + * blk_zone_wplug_bio_work() will release the reference we take here, 1206 + * and we also drop this reference if the work is already scheduled. 
1177 1207 */ 1178 1208 WARN_ON_ONCE(!(zwplug->flags & BLK_ZONE_WPLUG_PLUGGED)); 1209 + WARN_ON_ONCE(blk_queue_zoned_qd1_writes(disk->queue)); 1179 1210 refcount_inc(&zwplug->ref); 1180 - queue_work(disk->zone_wplugs_wq, &zwplug->bio_work); 1211 + if (!queue_work(disk->zone_wplugs_wq, &zwplug->bio_work)) 1212 + disk_put_zone_wplug(zwplug); 1181 1213 } 1182 1214 1183 1215 static inline void disk_zone_wplug_add_bio(struct gendisk *disk, ··· 1219 1241 bio_list_add(&zwplug->bio_list, bio); 1220 1242 trace_disk_zone_wplug_add_bio(zwplug->disk->queue, zwplug->zone_no, 1221 1243 bio->bi_iter.bi_sector, bio_sectors(bio)); 1244 + 1245 + /* 1246 + * If we are using the disk zone write plugs worker instead of the per 1247 + * zone write plug BIO work, add the zone write plug to the work list 1248 + * if it is not already there. Make sure to also get an extra reference 1249 + * on the zone write plug so that it does not go away until it is 1250 + * removed from the work list. 1251 + */ 1252 + if (blk_queue_zoned_qd1_writes(disk->queue)) { 1253 + spin_lock(&disk->zone_wplugs_list_lock); 1254 + if (list_empty(&zwplug->entry)) { 1255 + list_add_tail(&zwplug->entry, &disk->zone_wplugs_list); 1256 + refcount_inc(&zwplug->ref); 1257 + } 1258 + spin_unlock(&disk->zone_wplugs_list_lock); 1259 + } 1222 1260 } 1223 1261 1224 1262 /* ··· 1432 1438 if (bio->bi_opf & REQ_NOWAIT) 1433 1439 gfp_mask = GFP_NOWAIT; 1434 1440 1435 - zwplug = disk_get_and_lock_zone_wplug(disk, sector, gfp_mask, &flags); 1441 + zwplug = disk_get_or_alloc_zone_wplug(disk, sector, gfp_mask); 1436 1442 if (!zwplug) { 1437 1443 if (bio->bi_opf & REQ_NOWAIT) 1438 1444 bio_wouldblock_error(bio); 1439 1445 else 1440 1446 bio_io_error(bio); 1447 + return true; 1448 + } 1449 + 1450 + spin_lock_irqsave(&zwplug->lock, flags); 1451 + 1452 + /* 1453 + * If we got a zone write plug marked as dead, then the user is issuing 1454 + * writes to a full zone, or without synchronizing with zone reset or 1455 + * zone finish 
operations. In such case, fail the BIO to signal this 1456 + * invalid usage. 1457 + */ 1458 + if (zwplug->flags & BLK_ZONE_WPLUG_DEAD) { 1459 + spin_unlock_irqrestore(&zwplug->lock, flags); 1460 + disk_put_zone_wplug(zwplug); 1461 + bio_io_error(bio); 1441 1462 return true; 1442 1463 } 1443 1464 ··· 1467 1458 bio->bi_opf &= ~REQ_NOWAIT; 1468 1459 goto queue_bio; 1469 1460 } 1461 + 1462 + /* 1463 + * For rotational devices, we will use the gendisk zone write plugs 1464 + * work instead of the per zone write plug BIO work, so queue the BIO. 1465 + */ 1466 + if (blk_queue_zoned_qd1_writes(disk->queue)) 1467 + goto queue_bio; 1470 1468 1471 1469 /* If the zone is already plugged, add the BIO to the BIO plug list. */ 1472 1470 if (zwplug->flags & BLK_ZONE_WPLUG_PLUGGED) ··· 1497 1481 1498 1482 if (!(zwplug->flags & BLK_ZONE_WPLUG_PLUGGED)) { 1499 1483 zwplug->flags |= BLK_ZONE_WPLUG_PLUGGED; 1500 - disk_zone_wplug_schedule_bio_work(disk, zwplug); 1484 + if (blk_queue_zoned_qd1_writes(disk->queue)) 1485 + wake_up_process(disk->zone_wplugs_worker); 1486 + else 1487 + disk_zone_wplug_schedule_work(disk, zwplug); 1501 1488 } 1502 1489 1503 1490 spin_unlock_irqrestore(&zwplug->lock, flags); ··· 1546 1527 disk->disk_name, zwplug->zone_no); 1547 1528 disk_zone_wplug_abort(zwplug); 1548 1529 } 1549 - disk_remove_zone_wplug(disk, zwplug); 1530 + disk_mark_zone_wplug_dead(zwplug); 1550 1531 spin_unlock_irqrestore(&zwplug->lock, flags); 1551 1532 1552 1533 disk_put_zone_wplug(zwplug); ··· 1641 1622 1642 1623 spin_lock_irqsave(&zwplug->lock, flags); 1643 1624 1644 - /* Schedule submission of the next plugged BIO if we have one. 
*/ 1645 - if (!bio_list_empty(&zwplug->bio_list)) { 1646 - disk_zone_wplug_schedule_bio_work(disk, zwplug); 1647 - spin_unlock_irqrestore(&zwplug->lock, flags); 1648 - return; 1649 - } 1650 - 1651 - zwplug->flags &= ~BLK_ZONE_WPLUG_PLUGGED; 1652 - 1653 1625 /* 1654 - * If the zone is full (it was fully written or finished, or empty 1655 - * (it was reset), remove its zone write plug from the hash table. 1626 + * For rotational devices, signal the BIO completion to the zone write 1627 + * plug work. Otherwise, schedule submission of the next plugged BIO 1628 + * if we have one. 1656 1629 */ 1657 - if (disk_should_remove_zone_wplug(disk, zwplug)) 1658 - disk_remove_zone_wplug(disk, zwplug); 1630 + if (bio_list_empty(&zwplug->bio_list)) 1631 + zwplug->flags &= ~BLK_ZONE_WPLUG_PLUGGED; 1632 + 1633 + if (blk_queue_zoned_qd1_writes(disk->queue)) 1634 + complete(&disk->zone_wplugs_worker_bio_done); 1635 + else if (!bio_list_empty(&zwplug->bio_list)) 1636 + disk_zone_wplug_schedule_work(disk, zwplug); 1637 + 1638 + if (!zwplug->wp_offset || disk_zone_wplug_is_full(disk, zwplug)) 1639 + disk_mark_zone_wplug_dead(zwplug); 1659 1640 1660 1641 spin_unlock_irqrestore(&zwplug->lock, flags); 1661 1642 } ··· 1746 1727 disk_put_zone_wplug(zwplug); 1747 1728 } 1748 1729 1749 - static void blk_zone_wplug_bio_work(struct work_struct *work) 1730 + static bool disk_zone_wplug_submit_bio(struct gendisk *disk, 1731 + struct blk_zone_wplug *zwplug) 1750 1732 { 1751 - struct blk_zone_wplug *zwplug = 1752 - container_of(work, struct blk_zone_wplug, bio_work); 1753 1733 struct block_device *bdev; 1754 1734 unsigned long flags; 1755 1735 struct bio *bio; ··· 1764 1746 if (!bio) { 1765 1747 zwplug->flags &= ~BLK_ZONE_WPLUG_PLUGGED; 1766 1748 spin_unlock_irqrestore(&zwplug->lock, flags); 1767 - goto put_zwplug; 1749 + return false; 1768 1750 } 1769 1751 1770 1752 trace_blk_zone_wplug_bio(zwplug->disk->queue, zwplug->zone_no, ··· 1778 1760 goto again; 1779 1761 } 1780 1762 1781 - bdev = 
bio->bi_bdev; 1782 - 1783 1763 /* 1784 1764 * blk-mq devices will reuse the extra reference on the request queue 1785 1765 * usage counter we took when the BIO was plugged, but the submission 1786 1766 * path for BIO-based devices will not do that. So drop this extra 1787 1767 * reference here. 1788 1768 */ 1769 + if (blk_queue_zoned_qd1_writes(disk->queue)) 1770 + reinit_completion(&disk->zone_wplugs_worker_bio_done); 1771 + bdev = bio->bi_bdev; 1789 1772 if (bdev_test_flag(bdev, BD_HAS_SUBMIT_BIO)) { 1790 1773 bdev->bd_disk->fops->submit_bio(bio); 1791 1774 blk_queue_exit(bdev->bd_disk->queue); ··· 1794 1775 blk_mq_submit_bio(bio); 1795 1776 } 1796 1777 1797 - put_zwplug: 1798 - /* Drop the reference we took in disk_zone_wplug_schedule_bio_work(). */ 1799 - disk_put_zone_wplug(zwplug); 1778 + return true; 1779 + } 1780 + 1781 + static struct blk_zone_wplug *disk_get_zone_wplugs_work(struct gendisk *disk) 1782 + { 1783 + struct blk_zone_wplug *zwplug; 1784 + 1785 + spin_lock_irq(&disk->zone_wplugs_list_lock); 1786 + zwplug = list_first_entry_or_null(&disk->zone_wplugs_list, 1787 + struct blk_zone_wplug, entry); 1788 + if (zwplug) 1789 + list_del_init(&zwplug->entry); 1790 + spin_unlock_irq(&disk->zone_wplugs_list_lock); 1791 + 1792 + return zwplug; 1793 + } 1794 + 1795 + static int disk_zone_wplugs_worker(void *data) 1796 + { 1797 + struct gendisk *disk = data; 1798 + struct blk_zone_wplug *zwplug; 1799 + unsigned int noio_flag; 1800 + 1801 + noio_flag = memalloc_noio_save(); 1802 + set_user_nice(current, MIN_NICE); 1803 + set_freezable(); 1804 + 1805 + for (;;) { 1806 + set_current_state(TASK_INTERRUPTIBLE | TASK_FREEZABLE); 1807 + 1808 + zwplug = disk_get_zone_wplugs_work(disk); 1809 + if (zwplug) { 1810 + /* 1811 + * Process all BIOs of this zone write plug and then 1812 + * drop the reference we took when adding the zone write 1813 + * plug to the active list. 
1814 + */ 1815 + set_current_state(TASK_RUNNING); 1816 + while (disk_zone_wplug_submit_bio(disk, zwplug)) 1817 + blk_wait_io(&disk->zone_wplugs_worker_bio_done); 1818 + disk_put_zone_wplug(zwplug); 1819 + continue; 1820 + } 1821 + 1822 + /* 1823 + * Only sleep if nothing sets the state to running. Else check 1824 + * for zone write plugs work again as a newly submitted BIO 1825 + * might have added a zone write plug to the work list. 1826 + */ 1827 + if (get_current_state() == TASK_RUNNING) { 1828 + try_to_freeze(); 1829 + } else { 1830 + if (kthread_should_stop()) { 1831 + set_current_state(TASK_RUNNING); 1832 + break; 1833 + } 1834 + schedule(); 1835 + } 1836 + } 1837 + 1838 + WARN_ON_ONCE(!list_empty(&disk->zone_wplugs_list)); 1839 + memalloc_noio_restore(noio_flag); 1840 + 1841 + return 0; 1800 1842 } 1801 1843 1802 1844 void disk_init_zone_resources(struct gendisk *disk) 1803 1845 { 1804 - spin_lock_init(&disk->zone_wplugs_lock); 1846 + spin_lock_init(&disk->zone_wplugs_hash_lock); 1847 + spin_lock_init(&disk->zone_wplugs_list_lock); 1848 + INIT_LIST_HEAD(&disk->zone_wplugs_list); 1849 + init_completion(&disk->zone_wplugs_worker_bio_done); 1805 1850 } 1806 1851 1807 1852 /* ··· 1881 1798 unsigned int pool_size) 1882 1799 { 1883 1800 unsigned int i; 1801 + int ret = -ENOMEM; 1884 1802 1885 1803 atomic_set(&disk->nr_zone_wplugs, 0); 1886 1804 disk->zone_wplugs_hash_bits = ··· 1907 1823 if (!disk->zone_wplugs_wq) 1908 1824 goto destroy_pool; 1909 1825 1826 + disk->zone_wplugs_worker = 1827 + kthread_create(disk_zone_wplugs_worker, disk, 1828 + "%s_zwplugs_worker", disk->disk_name); 1829 + if (IS_ERR(disk->zone_wplugs_worker)) { 1830 + ret = PTR_ERR(disk->zone_wplugs_worker); 1831 + disk->zone_wplugs_worker = NULL; 1832 + goto destroy_wq; 1833 + } 1834 + wake_up_process(disk->zone_wplugs_worker); 1835 + 1910 1836 return 0; 1911 1837 1838 + destroy_wq: 1839 + destroy_workqueue(disk->zone_wplugs_wq); 1840 + disk->zone_wplugs_wq = NULL; 1912 1841 destroy_pool: 1913 
1842 mempool_destroy(disk->zone_wplugs_pool); 1914 1843 disk->zone_wplugs_pool = NULL; ··· 1929 1832 kfree(disk->zone_wplugs_hash); 1930 1833 disk->zone_wplugs_hash = NULL; 1931 1834 disk->zone_wplugs_hash_bits = 0; 1932 - return -ENOMEM; 1835 + return ret; 1933 1836 } 1934 1837 1935 1838 static void disk_destroy_zone_wplugs_hash_table(struct gendisk *disk) ··· 1945 1848 while (!hlist_empty(&disk->zone_wplugs_hash[i])) { 1946 1849 zwplug = hlist_entry(disk->zone_wplugs_hash[i].first, 1947 1850 struct blk_zone_wplug, node); 1948 - refcount_inc(&zwplug->ref); 1949 - disk_remove_zone_wplug(disk, zwplug); 1950 - disk_put_zone_wplug(zwplug); 1851 + spin_lock_irq(&zwplug->lock); 1852 + disk_mark_zone_wplug_dead(zwplug); 1853 + spin_unlock_irq(&zwplug->lock); 1951 1854 } 1952 1855 } 1953 1856 ··· 1969 1872 { 1970 1873 unsigned long flags; 1971 1874 1972 - spin_lock_irqsave(&disk->zone_wplugs_lock, flags); 1875 + spin_lock_irqsave(&disk->zone_wplugs_hash_lock, flags); 1973 1876 zones_cond = rcu_replace_pointer(disk->zones_cond, zones_cond, 1974 - lockdep_is_held(&disk->zone_wplugs_lock)); 1975 - spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags); 1877 + lockdep_is_held(&disk->zone_wplugs_hash_lock)); 1878 + spin_unlock_irqrestore(&disk->zone_wplugs_hash_lock, flags); 1976 1879 1977 1880 kfree_rcu_mightsleep(zones_cond); 1978 1881 } 1979 1882 1980 1883 void disk_free_zone_resources(struct gendisk *disk) 1981 1884 { 1885 + if (disk->zone_wplugs_worker) 1886 + kthread_stop(disk->zone_wplugs_worker); 1887 + WARN_ON_ONCE(!list_empty(&disk->zone_wplugs_list)); 1888 + 1982 1889 if (disk->zone_wplugs_wq) { 1983 1890 destroy_workqueue(disk->zone_wplugs_wq); 1984 1891 disk->zone_wplugs_wq = NULL; ··· 2011 1910 { 2012 1911 struct queue_limits *lim = &disk->queue->limits; 2013 1912 unsigned int pool_size; 1913 + int ret = 0; 2014 1914 2015 1915 args->disk = disk; 2016 1916 args->nr_zones = ··· 2034 1932 pool_size = 2035 1933 min(BLK_ZONE_WPLUG_DEFAULT_POOL_SIZE, args->nr_zones); 
2036 1934 2037 - if (!disk->zone_wplugs_hash) 2038 - return disk_alloc_zone_resources(disk, pool_size); 1935 + if (!disk->zone_wplugs_hash) { 1936 + ret = disk_alloc_zone_resources(disk, pool_size); 1937 + if (ret) 1938 + kfree(args->zones_cond); 1939 + } 2039 1940 2040 - return 0; 1941 + return ret; 2041 1942 } 2042 1943 2043 1944 /* ··· 2072 1967 disk->zone_capacity = args->zone_capacity; 2073 1968 disk->last_zone_capacity = args->last_zone_capacity; 2074 1969 disk_set_zones_cond_array(disk, args->zones_cond); 1970 + args->zones_cond = NULL; 2075 1971 2076 1972 /* 2077 1973 * Some devices can advertise zone resource limits that are larger than ··· 2184 2078 struct gendisk *disk = args->disk; 2185 2079 struct blk_zone_wplug *zwplug; 2186 2080 unsigned int wp_offset; 2187 - unsigned long flags; 2188 2081 2189 2082 /* 2190 2083 * Remember the capacity of the first sequential zone and check ··· 2213 2108 if (!wp_offset || wp_offset >= zone->capacity) 2214 2109 return 0; 2215 2110 2216 - zwplug = disk_get_and_lock_zone_wplug(disk, zone->wp, GFP_NOIO, &flags); 2111 + zwplug = disk_get_or_alloc_zone_wplug(disk, zone->wp, GFP_NOIO); 2217 2112 if (!zwplug) 2218 2113 return -ENOMEM; 2219 - spin_unlock_irqrestore(&zwplug->lock, flags); 2220 2114 disk_put_zone_wplug(zwplug); 2221 2115 2222 2116 return 0; ··· 2353 2249 } 2354 2250 memalloc_noio_restore(noio_flag); 2355 2251 2252 + if (ret <= 0) 2253 + goto free_resources; 2254 + 2356 2255 /* 2357 2256 * If zones where reported, make sure that the entire disk capacity 2358 2257 * has been checked. 
2359 2258 */ 2360 - if (ret > 0 && args.sector != capacity) { 2259 + if (args.sector != capacity) { 2361 2260 pr_warn("%s: Missing zones from sector %llu\n", 2362 2261 disk->disk_name, args.sector); 2363 2262 ret = -ENODEV; 2263 + goto free_resources; 2364 2264 } 2365 2265 2366 - if (ret > 0) 2367 - return disk_update_zone_resources(disk, &args); 2266 + ret = disk_update_zone_resources(disk, &args); 2267 + if (ret) 2268 + goto free_resources; 2368 2269 2270 + return 0; 2271 + 2272 + free_resources: 2369 2273 pr_warn("%s: failed to revalidate zones\n", disk->disk_name); 2370 2274 2275 + kfree(args.zones_cond); 2371 2276 memflags = blk_mq_freeze_queue(q); 2372 2277 disk_free_zone_resources(disk); 2373 2278 blk_mq_unfreeze_queue(q, memflags);
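The work-list hunks in block/blk-zoned.c above pair each list membership with exactly one reference: disk_zone_wplug_add_bio() takes a reference only when the plug is not yet on the list, and that reference is dropped once the plug has been removed from the list. A minimal userspace sketch of that pairing follows. It assumes a single-threaded model with no spinlocks, and every demo_* name is hypothetical rather than a kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct list_head; not the kernel type. */
struct demo_node {
	struct demo_node *prev, *next;
};

/* Hypothetical stand-in for a zone write plug. */
struct demo_plug {
	int ref;                /* plain counter instead of refcount_t */
	struct demo_node entry; /* self-linked while not on the work list */
};

static void demo_node_init(struct demo_node *n)
{
	n->prev = n->next = n;
}

/* True when the node points at itself, i.e. it is on no list. */
static bool demo_node_unlinked(const struct demo_node *n)
{
	return n->next == n;
}

static void demo_list_add_tail(struct demo_node *n, struct demo_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void demo_list_del_init(struct demo_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	demo_node_init(n);
}

/* Queue the plug for the worker: one reference per list membership. */
static void demo_plug_queue(struct demo_plug *p, struct demo_node *work_list)
{
	if (demo_node_unlinked(&p->entry)) {
		demo_list_add_tail(&p->entry, work_list);
		p->ref++;
	}
}

/* Remove the plug from the work list and drop the membership reference. */
static void demo_plug_dequeue(struct demo_plug *p)
{
	if (!demo_node_unlinked(&p->entry)) {
		demo_list_del_init(&p->entry);
		assert(p->ref > 0);
		p->ref--;
	}
}
```

Queueing the same plug twice leaves the counter unchanged the second time, which is why the diff tests list_empty(&zwplug->entry) before calling refcount_inc().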
+1 -6
block/blk.h
···
 		struct task_struct *owner);
 int __bio_queue_enter(struct request_queue *q, struct bio *bio);
 void submit_bio_noacct_nocheck(struct bio *bio, bool split);
-void bio_await_chain(struct bio *bio);
+int bio_submit_or_kill(struct bio *bio, unsigned int flags);

 static inline bool blk_try_enter_queue(struct request_queue *q, bool pm)
 {
···

 struct block_device *blkdev_get_no_open(dev_t dev, bool autoload);
 void blkdev_put_no_open(struct block_device *bdev);
-
-#define BIO_INLINE_VECS 4
-struct bio_vec *bvec_alloc(mempool_t *pool, unsigned short *nr_vecs,
-		gfp_t gfp_mask);
-void bvec_free(mempool_t *pool, struct bio_vec *bv, unsigned short nr_vecs);

 bool bvec_try_merge_hw_page(struct request_queue *q, struct bio_vec *bv,
 		struct page *page, unsigned len, unsigned offset);
+1 -1
block/bsg-lib.c
···

 	blk_queue_rq_timeout(q, BLK_DEFAULT_SG_TIMEOUT);

-	bset->bd = bsg_register_queue(q, dev, name, bsg_transport_sg_io_fn);
+	bset->bd = bsg_register_queue(q, dev, name, bsg_transport_sg_io_fn, NULL);
 	if (IS_ERR(bset->bd)) {
 		ret = PTR_ERR(bset->bd);
 		goto out_cleanup_queue;
+32 -1
block/bsg.c
···
 #include <linux/idr.h>
 #include <linux/bsg.h>
 #include <linux/slab.h>
+#include <linux/io_uring/cmd.h>

 #include <scsi/scsi.h>
 #include <scsi/scsi_ioctl.h>
···
 	unsigned int timeout;
 	unsigned int reserved_size;
 	bsg_sg_io_fn *sg_io_fn;
+	bsg_uring_cmd_fn *uring_cmd_fn;
 };

 static inline struct bsg_device *to_bsg_device(struct inode *inode)
···
 	}
 }

+static int bsg_check_uring_features(unsigned int issue_flags)
+{
+	/* BSG passthrough requires big SQE/CQE support */
+	if ((issue_flags & (IO_URING_F_SQE128|IO_URING_F_CQE32)) !=
+	    (IO_URING_F_SQE128|IO_URING_F_CQE32))
+		return -EOPNOTSUPP;
+	return 0;
+}
+
+static int bsg_uring_cmd(struct io_uring_cmd *ioucmd, unsigned int issue_flags)
+{
+	struct bsg_device *bd = to_bsg_device(file_inode(ioucmd->file));
+	bool open_for_write = ioucmd->file->f_mode & FMODE_WRITE;
+	struct request_queue *q = bd->queue;
+	int ret;
+
+	ret = bsg_check_uring_features(issue_flags);
+	if (ret)
+		return ret;
+
+	if (!bd->uring_cmd_fn)
+		return -EOPNOTSUPP;
+
+	return bd->uring_cmd_fn(q, ioucmd, issue_flags, open_for_write);
+}
+
 static const struct file_operations bsg_fops = {
 	.open = bsg_open,
 	.release = bsg_release,
 	.unlocked_ioctl = bsg_ioctl,
 	.compat_ioctl = compat_ptr_ioctl,
+	.uring_cmd = bsg_uring_cmd,
 	.owner = THIS_MODULE,
 	.llseek = default_llseek,
 };
···
 EXPORT_SYMBOL_GPL(bsg_unregister_queue);

 struct bsg_device *bsg_register_queue(struct request_queue *q,
-		struct device *parent, const char *name, bsg_sg_io_fn *sg_io_fn)
+		struct device *parent, const char *name, bsg_sg_io_fn *sg_io_fn,
+		bsg_uring_cmd_fn *uring_cmd_fn)
 {
 	struct bsg_device *bd;
 	int ret;
···
 	bd->reserved_size = INT_MAX;
 	bd->queue = q;
 	bd->sg_io_fn = sg_io_fn;
+	bd->uring_cmd_fn = uring_cmd_fn;

 	ret = ida_alloc_max(&bsg_minor_ida, BSG_MAX_DEVS - 1, GFP_KERNEL);
 	if (ret < 0) {
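The gating logic added to block/bsg.c has two steps: both the big-SQE and big-CQE flags must be set, and the transport must have registered a uring_cmd handler. Otherwise the command fails with -EOPNOTSUPP. The userspace sketch below illustrates that order of checks. The DEMO_F_* bit values are placeholders chosen for illustration (the real IO_URING_F_SQE128 and IO_URING_F_CQE32 values live in the kernel's io_uring headers), and the demo_* functions are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Placeholder bit values for illustration only, not the kernel's. */
#define DEMO_F_SQE128 (1u << 0)
#define DEMO_F_CQE32  (1u << 1)

/* Hypothetical handler type standing in for bsg_uring_cmd_fn. */
typedef int (*demo_uring_cmd_fn)(unsigned int issue_flags);

/* Returns 0 only when both the big-SQE and big-CQE bits are set. */
static int demo_check_uring_features(unsigned int issue_flags)
{
	const unsigned int need = DEMO_F_SQE128 | DEMO_F_CQE32;

	if ((issue_flags & need) != need)
		return -EOPNOTSUPP;
	return 0;
}

/* Same order of checks as the diff: feature flags first, then handler. */
static int demo_uring_cmd(demo_uring_cmd_fn fn, unsigned int issue_flags)
{
	int ret = demo_check_uring_features(issue_flags);

	if (ret)
		return ret;
	if (fn == NULL)
		return -EOPNOTSUPP;
	return fn(issue_flags);
}

/* Trivial handler that accepts every command. */
static int demo_ok_handler(unsigned int issue_flags)
{
	(void)issue_flags;
	return 0;
}
```

Comparing the masked value against the full mask, rather than testing it for non-zero, is what rejects a ring that enables only one of the two features.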
+2 -1
block/disk-events.c
···
  * Should be called when the media changes for @disk. Generates a uevent
  * and attempts to free all dentries and inodes and invalidates all block
  * device page cache entries in that case.
+ *
+ * Callers that need a partition re-scan should arrange for one explicitly.
  */
 void disk_force_media_change(struct gendisk *disk)
 {
 	disk_event_uevent(disk, DISK_EVENT_MEDIA_CHANGE);
 	inc_diskseq(disk);
 	bdev_mark_dead(disk->part0, true);
-	set_bit(GD_NEED_PART_SCAN, &disk->state);
 }
 EXPORT_SYMBOL_GPL(disk_force_media_change);

+2 -9
block/ioctl.c
···
 	nr_sects = len >> SECTOR_SHIFT;

 	blk_start_plug(&plug);
-	while (1) {
-		if (fatal_signal_pending(current)) {
-			if (prev)
-				bio_await_chain(prev);
-			err = -EINTR;
-			goto out_unplug;
-		}
+	while (!fatal_signal_pending(current)) {
 		bio = blk_alloc_discard_bio(bdev, &sector, &nr_sects,
 				GFP_KERNEL);
 		if (!bio)
···
 		prev = bio_chain_and_submit(prev, bio);
 	}
 	if (prev) {
-		err = submit_bio_wait(prev);
+		err = bio_submit_or_kill(prev, BLKDEV_ZERO_KILLABLE);
 		if (err == -EOPNOTSUPP)
 			err = 0;
 		bio_put(prev);
 	}
-out_unplug:
 	blk_finish_plug(&plug);
 fail:
 	filemap_invalidate_unlock(bdev->bd_mapping);
+24
block/opal_proto.h
···
 enum {
 	TCG_SECP_00 = 0,
 	TCG_SECP_01,
+	TCG_SECP_02,
 };

 /*
···
 	OPAL_LOCKING_INFO_TABLE,
 	OPAL_ENTERPRISE_LOCKING_INFO_TABLE,
 	OPAL_DATASTORE,
+	OPAL_LOCKING_TABLE,
 	/* C_PIN_TABLE object ID's */
 	OPAL_C_PIN_MSID,
 	OPAL_C_PIN_SID,
···
 	OPAL_AUTHENTICATE,
 	OPAL_RANDOM,
 	OPAL_ERASE,
+	OPAL_REACTIVATE,
 };

 enum opal_token {
···

 enum opal_parameter {
 	OPAL_SUM_SET_LIST = 0x060000,
+	OPAL_SUM_RANGE_POLICY = 0x060001,
+	OPAL_SUM_ADMIN1_PIN = 0x060002,
 };

 enum opal_revertlsp {
···
 	struct opal_compacket cp;
 	struct opal_packet pkt;
 	struct opal_data_subpacket subpkt;
+};
+
+/*
+ * TCG_Storage_Architecture_Core_Spec_v2.01_r1.00
+ * Section: 3.3.4.7.5 STACK_RESET
+ */
+#define OPAL_STACK_RESET 0x0002
+
+struct opal_stack_reset {
+	u8 extendedComID[4];
+	__be32 request_code;
+};
+
+struct opal_stack_reset_response {
+	u8 extendedComID[4];
+	__be32 request_code;
+	u8 reserved0[2];
+	__be16 data_length;
+	__be32 response;
 };

 #define FC_TPER 0x0001
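The two STACK_RESET structures added above are fixed-size wire layouts with big-endian fields: 8 bytes for the request and 16 bytes for the response. The sketch below mirrors them in userspace to make the sizes and byte order concrete. The demo_* names are hypothetical. How the kernel actually fills extendedComID is not shown in this hunk, so copying caller-supplied bytes is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same numeric value as OPAL_STACK_RESET in the hunk above. */
#define DEMO_STACK_RESET 0x0002u

/* Userspace mirrors of the kernel structs; big-endian fields are kept
 * as byte arrays so the layout does not depend on host endianness. */
struct demo_stack_reset {
	uint8_t extended_comid[4];
	uint8_t request_code[4];   /* __be32 in the kernel header */
};

struct demo_stack_reset_response {
	uint8_t extended_comid[4];
	uint8_t request_code[4];   /* __be32 */
	uint8_t reserved0[2];
	uint8_t data_length[2];    /* __be16 */
	uint8_t response[4];       /* __be32 */
};

static_assert(sizeof(struct demo_stack_reset) == 8,
	      "request is 8 bytes on the wire");
static_assert(sizeof(struct demo_stack_reset_response) == 16,
	      "response is 16 bytes on the wire");

/* Store a 32-bit value most-significant byte first. */
static void demo_put_be32(uint8_t out[4], uint32_t v)
{
	out[0] = (uint8_t)(v >> 24);
	out[1] = (uint8_t)(v >> 16);
	out[2] = (uint8_t)(v >> 8);
	out[3] = (uint8_t)v;
}

/* Load a 32-bit value stored most-significant byte first. */
static uint32_t demo_get_be32(const uint8_t in[4])
{
	return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
	       ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}

/* Assumption: the ComID bytes are supplied by the caller as-is. */
static void demo_build_stack_reset(struct demo_stack_reset *req,
				   const uint8_t comid[4])
{
	memcpy(req->extended_comid, comid, 4);
	demo_put_be32(req->request_code, DEMO_STACK_RESET);
}
```

With this layout the request code 0x0002 is stored as the bytes 00 00 00 02, whatever the host byte order.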
+14 -18
block/partitions/acorn.c
··· 40 40 (le32_to_cpu(dr->disc_size) >> 9); 41 41 42 42 if (name) { 43 - strlcat(state->pp_buf, " [", PAGE_SIZE); 44 - strlcat(state->pp_buf, name, PAGE_SIZE); 45 - strlcat(state->pp_buf, "]", PAGE_SIZE); 43 + seq_buf_printf(&state->pp_buf, " [%s]", name); 46 44 } 47 45 put_partition(state, slot, first_sector, nr_sects); 48 46 return dr; ··· 76 78 if (!rr) 77 79 return -1; 78 80 79 - strlcat(state->pp_buf, " [RISCiX]", PAGE_SIZE); 81 + seq_buf_puts(&state->pp_buf, " [RISCiX]"); 80 82 81 83 82 84 if (rr->magic == RISCIX_MAGIC) { 83 85 unsigned long size = nr_sects > 2 ? 2 : nr_sects; 84 86 int part; 85 87 86 - strlcat(state->pp_buf, " <", PAGE_SIZE); 88 + seq_buf_puts(&state->pp_buf, " <"); 87 89 88 90 put_partition(state, slot++, first_sect, size); 89 91 for (part = 0; part < 8; part++) { ··· 92 94 put_partition(state, slot++, 93 95 le32_to_cpu(rr->part[part].start), 94 96 le32_to_cpu(rr->part[part].length)); 95 - strlcat(state->pp_buf, "(", PAGE_SIZE); 96 - strlcat(state->pp_buf, rr->part[part].name, PAGE_SIZE); 97 - strlcat(state->pp_buf, ")", PAGE_SIZE); 97 + seq_buf_printf(&state->pp_buf, "(%s)", rr->part[part].name); 98 98 } 99 99 } 100 100 101 - strlcat(state->pp_buf, " >\n", PAGE_SIZE); 101 + seq_buf_puts(&state->pp_buf, " >\n"); 102 102 } else { 103 103 put_partition(state, slot++, first_sect, nr_sects); 104 104 } ··· 126 130 struct linux_part *linuxp; 127 131 unsigned long size = nr_sects > 2 ? 
2 : nr_sects; 128 132 129 - strlcat(state->pp_buf, " [Linux]", PAGE_SIZE); 133 + seq_buf_puts(&state->pp_buf, " [Linux]"); 130 134 131 135 put_partition(state, slot++, first_sect, size); 132 136 ··· 134 138 if (!linuxp) 135 139 return -1; 136 140 137 - strlcat(state->pp_buf, " <", PAGE_SIZE); 141 + seq_buf_puts(&state->pp_buf, " <"); 138 142 while (linuxp->magic == cpu_to_le32(LINUX_NATIVE_MAGIC) || 139 143 linuxp->magic == cpu_to_le32(LINUX_SWAP_MAGIC)) { 140 144 if (slot == state->limit) ··· 144 148 le32_to_cpu(linuxp->nr_sects)); 145 149 linuxp ++; 146 150 } 147 - strlcat(state->pp_buf, " >", PAGE_SIZE); 151 + seq_buf_puts(&state->pp_buf, " >"); 148 152 149 153 put_dev_sector(sect); 150 154 return slot; ··· 289 293 break; 290 294 } 291 295 } 292 - strlcat(state->pp_buf, "\n", PAGE_SIZE); 296 + seq_buf_puts(&state->pp_buf, "\n"); 293 297 return 1; 294 298 } 295 299 #endif ··· 362 366 return 0; 363 367 } 364 368 365 - strlcat(state->pp_buf, " [ICS]", PAGE_SIZE); 369 + seq_buf_puts(&state->pp_buf, " [ICS]"); 366 370 367 371 for (slot = 1, p = (const struct ics_part *)data; p->size; p++) { 368 372 u32 start = le32_to_cpu(p->start); ··· 396 400 } 397 401 398 402 put_dev_sector(sect); 399 - strlcat(state->pp_buf, "\n", PAGE_SIZE); 403 + seq_buf_puts(&state->pp_buf, "\n"); 400 404 return 1; 401 405 } 402 406 #endif ··· 456 460 return 0; 457 461 } 458 462 459 - strlcat(state->pp_buf, " [POWERTEC]", PAGE_SIZE); 463 + seq_buf_puts(&state->pp_buf, " [POWERTEC]"); 460 464 461 465 for (i = 0, p = (const struct ptec_part *)data; i < 12; i++, p++) { 462 466 u32 start = le32_to_cpu(p->start); ··· 467 471 } 468 472 469 473 put_dev_sector(sect); 470 - strlcat(state->pp_buf, "\n", PAGE_SIZE); 474 + seq_buf_puts(&state->pp_buf, "\n"); 471 475 return 1; 472 476 } 473 477 #endif ··· 538 542 539 543 size = get_capacity(state->disk); 540 544 put_partition(state, slot++, start, size - start); 541 - strlcat(state->pp_buf, "\n", PAGE_SIZE); 545 + seq_buf_puts(&state->pp_buf, "\n"); 542 
546 } 543 547 544 548 return i ? 1 : 0;
+8 -13
block/partitions/aix.c
··· 173 173 if (d) { 174 174 struct lvm_rec *p = (struct lvm_rec *)d; 175 175 u16 lvm_version = be16_to_cpu(p->version); 176 - char tmp[64]; 177 176 178 177 if (lvm_version == 1) { 179 178 int pp_size_log2 = be16_to_cpu(p->pp_size); 180 179 181 180 pp_bytes_size = 1 << pp_size_log2; 182 181 pp_blocks_size = pp_bytes_size / 512; 183 - snprintf(tmp, sizeof(tmp), 184 - " AIX LVM header version %u found\n", 185 - lvm_version); 182 + seq_buf_printf(&state->pp_buf, 183 + " AIX LVM header version %u found\n", 184 + lvm_version); 186 185 vgda_len = be32_to_cpu(p->vgda_len); 187 186 vgda_sector = be32_to_cpu(p->vgda_psn[0]); 188 187 } else { 189 - snprintf(tmp, sizeof(tmp), 190 - " unsupported AIX LVM version %d found\n", 191 - lvm_version); 188 + seq_buf_printf(&state->pp_buf, 189 + " unsupported AIX LVM version %d found\n", 190 + lvm_version); 192 191 } 193 - strlcat(state->pp_buf, tmp, PAGE_SIZE); 194 192 put_dev_sector(sect); 195 193 } 196 194 if (vgda_sector && (d = read_part_sector(state, vgda_sector, &sect))) { ··· 249 251 continue; 250 252 } 251 253 if (lp_ix == lvip[lv_ix].pps_per_lv) { 252 - char tmp[70]; 253 - 254 254 put_partition(state, lv_ix + 1, 255 255 (i + 1 - lp_ix) * pp_blocks_size + psn_part1, 256 256 lvip[lv_ix].pps_per_lv * pp_blocks_size); 257 - snprintf(tmp, sizeof(tmp), " <%s>\n", 258 - n[lv_ix].name); 259 - strlcat(state->pp_buf, tmp, PAGE_SIZE); 257 + seq_buf_printf(&state->pp_buf, " <%s>\n", 258 + n[lv_ix].name); 260 259 lvip[lv_ix].lv_is_contiguous = 1; 261 260 ret = 1; 262 261 next_lp_ix = 1;
+15 -20
block/partitions/amiga.c
··· 81 81 /* blksize is blocks per 512 byte standard block */ 82 82 blksize = be32_to_cpu( rdb->rdb_BlockBytes ) / 512; 83 83 84 - { 85 - char tmp[7 + 10 + 1 + 1]; 86 - 87 - /* Be more informative */ 88 - snprintf(tmp, sizeof(tmp), " RDSK (%d)", blksize * 512); 89 - strlcat(state->pp_buf, tmp, PAGE_SIZE); 90 - } 84 + /* Be more informative */ 85 + seq_buf_printf(&state->pp_buf, " RDSK (%d)", blksize * 512); 91 86 blk = be32_to_cpu(rdb->rdb_PartitionList); 92 87 put_dev_sector(sect); 93 88 for (part = 1; (s32) blk>0 && part<=16; part++, put_dev_sector(sect)) { ··· 174 179 { 175 180 /* Be even more informative to aid mounting */ 176 181 char dostype[4]; 177 - char tmp[42]; 178 182 179 183 __be32 *dt = (__be32 *)dostype; 180 184 *dt = pb->pb_Environment[16]; 181 185 if (dostype[3] < ' ') 182 - snprintf(tmp, sizeof(tmp), " (%c%c%c^%c)", 183 - dostype[0], dostype[1], 184 - dostype[2], dostype[3] + '@' ); 186 + seq_buf_printf(&state->pp_buf, 187 + " (%c%c%c^%c)", 188 + dostype[0], dostype[1], 189 + dostype[2], 190 + dostype[3] + '@'); 185 191 else 186 - snprintf(tmp, sizeof(tmp), " (%c%c%c%c)", 187 - dostype[0], dostype[1], 188 - dostype[2], dostype[3]); 189 - strlcat(state->pp_buf, tmp, PAGE_SIZE); 190 - snprintf(tmp, sizeof(tmp), "(res %d spb %d)", 191 - be32_to_cpu(pb->pb_Environment[6]), 192 - be32_to_cpu(pb->pb_Environment[4])); 193 - strlcat(state->pp_buf, tmp, PAGE_SIZE); 192 + seq_buf_printf(&state->pp_buf, 193 + " (%c%c%c%c)", 194 + dostype[0], dostype[1], 195 + dostype[2], dostype[3]); 196 + seq_buf_printf(&state->pp_buf, "(res %d spb %d)", 197 + be32_to_cpu(pb->pb_Environment[6]), 198 + be32_to_cpu(pb->pb_Environment[4])); 194 199 } 195 200 res = 1; 196 201 } 197 - strlcat(state->pp_buf, "\n", PAGE_SIZE); 202 + seq_buf_puts(&state->pp_buf, "\n"); 198 203 199 204 rdb_done: 200 205 return res;
+6 -6
block/partitions/atari.c
··· 70 70 } 71 71 72 72 pi = &rs->part[0]; 73 - strlcat(state->pp_buf, " AHDI", PAGE_SIZE); 73 + seq_buf_puts(&state->pp_buf, " AHDI"); 74 74 for (slot = 1; pi < &rs->part[4] && slot < state->limit; slot++, pi++) { 75 75 struct rootsector *xrs; 76 76 Sector sect2; ··· 89 89 #ifdef ICD_PARTS 90 90 part_fmt = 1; 91 91 #endif 92 - strlcat(state->pp_buf, " XGM<", PAGE_SIZE); 92 + seq_buf_puts(&state->pp_buf, " XGM<"); 93 93 partsect = extensect = be32_to_cpu(pi->st); 94 94 while (1) { 95 95 xrs = read_part_sector(state, partsect, &sect2); ··· 128 128 break; 129 129 } 130 130 } 131 - strlcat(state->pp_buf, " >", PAGE_SIZE); 131 + seq_buf_puts(&state->pp_buf, " >"); 132 132 } 133 133 #ifdef ICD_PARTS 134 134 if ( part_fmt!=1 ) { /* no extended partitions -> test ICD-format */ 135 135 pi = &rs->icdpart[0]; 136 136 /* sanity check: no ICD format if first partition invalid */ 137 137 if (OK_id(pi->id)) { 138 - strlcat(state->pp_buf, " ICD<", PAGE_SIZE); 138 + seq_buf_puts(&state->pp_buf, " ICD<"); 139 139 for (; pi < &rs->icdpart[8] && slot < state->limit; slot++, pi++) { 140 140 /* accept only GEM,BGM,RAW,LNX,SWP partitions */ 141 141 if (!((pi->flg & 1) && OK_id(pi->id))) ··· 144 144 be32_to_cpu(pi->st), 145 145 be32_to_cpu(pi->siz)); 146 146 } 147 - strlcat(state->pp_buf, " >", PAGE_SIZE); 147 + seq_buf_puts(&state->pp_buf, " >"); 148 148 } 149 149 } 150 150 #endif 151 151 put_dev_sector(sect); 152 152 153 - strlcat(state->pp_buf, "\n", PAGE_SIZE); 153 + seq_buf_puts(&state->pp_buf, "\n"); 154 154 155 155 return 1; 156 156 }
+3 -5
block/partitions/check.h
···
 /* SPDX-License-Identifier: GPL-2.0 */
 #include <linux/pagemap.h>
 #include <linux/blkdev.h>
+#include <linux/seq_buf.h>
 #include "../blk.h"

 /*
···
 	int next;
 	int limit;
 	bool access_beyond_eod;
-	char *pp_buf;
+	struct seq_buf pp_buf;
 };

 typedef struct {
···
 put_partition(struct parsed_partitions *p, int n, sector_t from, sector_t size)
 {
 	if (n < p->limit) {
-		char tmp[1 + BDEVNAME_SIZE + 10 + 1];
-
 		p->parts[n].from = from;
 		p->parts[n].size = size;
-		snprintf(tmp, sizeof(tmp), " %s%d", p->name, n);
-		strlcat(p->pp_buf, tmp, PAGE_SIZE);
+		seq_buf_printf(&p->pp_buf, " %s%d", p->name, n);
 	}
 }

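The put_partition() change above replaces a two-step pattern (format into a sized temporary with snprintf(), then strlcat() onto a page buffer) with a single formatted append into a buffer that tracks its own length. A userspace sketch of that idea follows. It is a hypothetical analogue, not the kernel's struct seq_buf, and its overflow policy (discard the partial append and refuse further writes) is a simplification chosen for this example.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical bounded buffer that remembers how much it holds. */
struct demo_buf {
	char *buffer;
	size_t size;     /* total capacity, including the NUL */
	size_t len;      /* bytes used, excluding the NUL */
	int overflowed;  /* set once an append did not fit */
};

static void demo_buf_init(struct demo_buf *b, char *mem, size_t size)
{
	assert(size > 0);
	b->buffer = mem;
	b->size = size;
	b->len = 0;
	b->overflowed = 0;
	mem[0] = '\0';
}

/* Append formatted text; returns 0 on success, -1 if it did not fit. */
static int demo_buf_printf(struct demo_buf *b, const char *fmt, ...)
{
	size_t room = b->size - b->len; /* always >= 1: len stays < size */
	va_list ap;
	int n;

	if (b->overflowed)
		return -1;

	va_start(ap, fmt);
	n = vsnprintf(b->buffer + b->len, room, fmt, ap);
	va_end(ap);

	if (n < 0 || (size_t)n >= room) {
		b->buffer[b->len] = '\0'; /* drop the truncated append */
		b->overflowed = 1;
		return -1;
	}
	b->len += (size_t)n;
	return 0;
}
```

Because the buffer carries its own length and capacity, callers no longer need a correctly sized temporary array or a repeated PAGE_SIZE argument, which is the simplification visible throughout the partition parsers in this diff.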
+2 -4
block/partitions/cmdline.c
···
 		struct parsed_partitions *state)
 {
 	struct partition_meta_info *info;
-	char tmp[sizeof(info->volname) + 4];

 	if (slot >= state->limit)
 		return 1;
···

 	strscpy(info->volname, subpart->name, sizeof(info->volname));

-	snprintf(tmp, sizeof(tmp), "(%s)", info->volname);
-	strlcat(state->pp_buf, tmp, PAGE_SIZE);
+	seq_buf_printf(&state->pp_buf, "(%s)", info->volname);

 	state->parts[slot].has_info = true;

···
 	cmdline_parts_set(parts, disk_size, state);
 	cmdline_parts_verifier(1, state);

-	strlcat(state->pp_buf, "\n", PAGE_SIZE);
+	seq_buf_puts(&state->pp_buf, "\n");

 	return 1;
 }
+16 -15
block/partitions/core.c
···
 #include <linux/major.h>
 #include <linux/slab.h>
 #include <linux/string.h>
+#include <linux/sysfs.h>
 #include <linux/ctype.h>
 #include <linux/vmalloc.h>
 #include <linux/raid/detect.h>
···
 	state = allocate_partitions(hd);
 	if (!state)
 		return NULL;
-	state->pp_buf = (char *)__get_free_page(GFP_KERNEL);
-	if (!state->pp_buf) {
+	state->pp_buf.buffer = (char *)__get_free_page(GFP_KERNEL);
+	if (!state->pp_buf.buffer) {
 		free_partitions(state);
 		return NULL;
 	}
-	state->pp_buf[0] = '\0';
+	seq_buf_init(&state->pp_buf, state->pp_buf.buffer, PAGE_SIZE);

 	state->disk = hd;
 	strscpy(state->name, hd->disk_name);
-	snprintf(state->pp_buf, PAGE_SIZE, " %s:", state->name);
+	seq_buf_printf(&state->pp_buf, " %s:", state->name);
 	if (isdigit(state->name[strlen(state->name)-1]))
 		sprintf(state->name, "p");

···

 	}
 	if (res > 0) {
-		printk(KERN_INFO "%s", state->pp_buf);
+		printk(KERN_INFO "%s", seq_buf_str(&state->pp_buf));

-		free_page((unsigned long)state->pp_buf);
+		free_page((unsigned long)state->pp_buf.buffer);
 		return state;
 	}
 	if (state->access_beyond_eod)
···
 	if (err)
 		res = err;
 	if (res) {
-		strlcat(state->pp_buf,
-			" unable to read partition table\n", PAGE_SIZE);
-		printk(KERN_INFO "%s", state->pp_buf);
+		seq_buf_puts(&state->pp_buf,
+			     " unable to read partition table\n");
+		printk(KERN_INFO "%s", seq_buf_str(&state->pp_buf));
 	}

-	free_page((unsigned long)state->pp_buf);
+	free_page((unsigned long)state->pp_buf.buffer);
 	free_partitions(state);
 	return ERR_PTR(res);
 }
···
 static ssize_t part_partition_show(struct device *dev,
 				   struct device_attribute *attr, char *buf)
 {
-	return sprintf(buf, "%d\n", bdev_partno(dev_to_bdev(dev)));
+	return sysfs_emit(buf, "%d\n", bdev_partno(dev_to_bdev(dev)));
 }

 static ssize_t part_start_show(struct device *dev,
 			       struct device_attribute *attr, char *buf)
 {
-	return sprintf(buf, "%llu\n", dev_to_bdev(dev)->bd_start_sect);
+	return sysfs_emit(buf, "%llu\n", dev_to_bdev(dev)->bd_start_sect);
 }

 static ssize_t part_ro_show(struct device *dev,
 			    struct device_attribute *attr, char *buf)
 {
-	return sprintf(buf, "%d\n", bdev_read_only(dev_to_bdev(dev)));
+	return sysfs_emit(buf, "%d\n", bdev_read_only(dev_to_bdev(dev)));
 }

 static ssize_t part_alignment_offset_show(struct device *dev,
 					  struct device_attribute *attr, char *buf)
 {
-	return sprintf(buf, "%u\n", bdev_alignment_offset(dev_to_bdev(dev)));
+	return sysfs_emit(buf, "%u\n", bdev_alignment_offset(dev_to_bdev(dev)));
 }

 static ssize_t part_discard_alignment_show(struct device *dev,
 					   struct device_attribute *attr, char *buf)
 {
-	return sprintf(buf, "%u\n", bdev_discard_alignment(dev_to_bdev(dev)));
+	return sysfs_emit(buf, "%u\n", bdev_discard_alignment(dev_to_bdev(dev)));
 }

 static DEVICE_ATTR(partition, 0444, part_partition_show, NULL);
+1 -1
block/partitions/efi.c
···
 	}
 	kfree(ptes);
 	kfree(gpt);
-	strlcat(state->pp_buf, "\n", PAGE_SIZE);
+	seq_buf_puts(&state->pp_buf, "\n");
 	return 1;
 }
+10 -17
block/partitions/ibm.c
···
 {
 	sector_t blk;
 	int counter;
-	char tmp[64];
 	Sector sect;
 	unsigned char *data;
 	loff_t offset, size;
 	struct vtoc_format1_label f1;
 	int secperblk;

-	snprintf(tmp, sizeof(tmp), "VOL1/%8s:", name);
-	strlcat(state->pp_buf, tmp, PAGE_SIZE);
+	seq_buf_printf(&state->pp_buf, "VOL1/%8s:", name);
 	/*
 	 * get start of VTOC from the disk label and then search for format1
 	 * and format8 labels
···
 		blk++;
 		data = read_part_sector(state, blk * secperblk, &sect);
 	}
-	strlcat(state->pp_buf, "\n", PAGE_SIZE);
+	seq_buf_puts(&state->pp_buf, "\n");

 	if (!data)
 		return -1;
···
 			 dasd_information2_t *info)
 {
 	loff_t offset, geo_size, size;
-	char tmp[64];
 	int secperblk;

-	snprintf(tmp, sizeof(tmp), "LNX1/%8s:", name);
-	strlcat(state->pp_buf, tmp, PAGE_SIZE);
+	seq_buf_printf(&state->pp_buf, "LNX1/%8s:", name);
 	secperblk = blocksize >> 9;
 	if (label->lnx.ldl_version == 0xf2) {
 		size = label->lnx.formatted_blocks * secperblk;
···
 		size = nr_sectors;
 		if (size != geo_size) {
 			if (!info) {
-				strlcat(state->pp_buf, "\n", PAGE_SIZE);
+				seq_buf_puts(&state->pp_buf, "\n");
 				return 1;
 			}
 			if (!strcmp(info->type, "ECKD"))
···
 	/* first and only partition starts in the first block after the label */
 	offset = labelsect + secperblk;
 	put_partition(state, 1, offset, size - offset);
-	strlcat(state->pp_buf, "\n", PAGE_SIZE);
+	seq_buf_puts(&state->pp_buf, "\n");
 	return 1;
 }

···
 			 sector_t labelsect)
 {
 	loff_t offset, size;
-	char tmp[64];
 	int secperblk;

 	/*
···
 	blocksize = label->cms.block_size;
 	secperblk = blocksize >> 9;
 	if (label->cms.disk_offset != 0) {
-		snprintf(tmp, sizeof(tmp), "CMS1/%8s(MDSK):", name);
-		strlcat(state->pp_buf, tmp, PAGE_SIZE);
+		seq_buf_printf(&state->pp_buf, "CMS1/%8s(MDSK):", name);
 		/* disk is reserved minidisk */
 		offset = label->cms.disk_offset * secperblk;
 		size = (label->cms.block_count - 1) * secperblk;
 	} else {
-		snprintf(tmp, sizeof(tmp), "CMS1/%8s:", name);
-		strlcat(state->pp_buf, tmp, PAGE_SIZE);
+		seq_buf_printf(&state->pp_buf, "CMS1/%8s:", name);
 		/*
 		 * Special case for FBA devices:
 		 * If an FBA device is CMS formatted with blocksize > 512 byte
···
 	}

 	put_partition(state, 1, offset, size-offset);
-	strlcat(state->pp_buf, "\n", PAGE_SIZE);
+	seq_buf_puts(&state->pp_buf, "\n");
 	return 1;
 }

···
 		 */
 		res = 1;
 		if (info->format == DASD_FORMAT_LDL) {
-			strlcat(state->pp_buf, "(nonl)", PAGE_SIZE);
+			seq_buf_puts(&state->pp_buf, "(nonl)");
 			size = nr_sectors;
 			offset = (info->label_block + 1) * (blocksize >> 9);
 			put_partition(state, 1, offset, size-offset);
-			strlcat(state->pp_buf, "\n", PAGE_SIZE);
+			seq_buf_puts(&state->pp_buf, "\n");
 		}
 	} else
 		res = 0;
+1 -1
block/partitions/karma.c
···
 		}
 		slot++;
 	}
-	strlcat(state->pp_buf, "\n", PAGE_SIZE);
+	seq_buf_puts(&state->pp_buf, "\n");
 	put_dev_sector(sect);
 	return 1;
 }
+2 -2
block/partitions/ldm.c
···
 		return false;
 	}

-	strlcat(pp->pp_buf, " [LDM]", PAGE_SIZE);
+	seq_buf_puts(&pp->pp_buf, " [LDM]");

 	/* Create the data partitions */
 	list_for_each (item, &ldb->v_part) {
···
 		part_num++;
 	}

-	strlcat(pp->pp_buf, "\n", PAGE_SIZE);
+	seq_buf_puts(&pp->pp_buf, "\n");
 	return true;
 }

+2 -2
block/partitions/mac.c
···
 	if (blocks_in_map >= state->limit)
 		blocks_in_map = state->limit - 1;

-	strlcat(state->pp_buf, " [mac]", PAGE_SIZE);
+	seq_buf_puts(&state->pp_buf, " [mac]");
 	for (slot = 1; slot <= blocks_in_map; ++slot) {
 		int pos = slot * secsize;
 		put_dev_sector(sect);
···
 #endif

 	put_dev_sector(sect);
-	strlcat(state->pp_buf, "\n", PAGE_SIZE);
+	seq_buf_puts(&state->pp_buf, "\n");
 	return 1;
 }
+23 -44
block/partitions/msdos.c
···
 		put_dev_sector(sect);
 		return;
 	}
-	{
-		char tmp[1 + BDEVNAME_SIZE + 10 + 11 + 1];
-
-		snprintf(tmp, sizeof(tmp), " %s%d: <solaris:", state->name, origin);
-		strlcat(state->pp_buf, tmp, PAGE_SIZE);
-	}
+	seq_buf_printf(&state->pp_buf, " %s%d: <solaris:", state->name, origin);
 	if (le32_to_cpu(v->v_version) != 1) {
-		char tmp[64];
-
-		snprintf(tmp, sizeof(tmp), " cannot handle version %d vtoc>\n",
-			 le32_to_cpu(v->v_version));
-		strlcat(state->pp_buf, tmp, PAGE_SIZE);
+		seq_buf_printf(&state->pp_buf,
+			       " cannot handle version %d vtoc>\n",
+			       le32_to_cpu(v->v_version));
 		put_dev_sector(sect);
 		return;
 	}
···
 	max_nparts = le16_to_cpu(v->v_nparts) > 8 ? SOLARIS_X86_NUMSLICE : 8;
 	for (i = 0; i < max_nparts && state->next < state->limit; i++) {
 		struct solaris_x86_slice *s = &v->v_slice[i];
-		char tmp[3 + 10 + 1 + 1];

 		if (s->s_size == 0)
 			continue;
-		snprintf(tmp, sizeof(tmp), " [s%d]", i);
-		strlcat(state->pp_buf, tmp, PAGE_SIZE);
+		seq_buf_printf(&state->pp_buf, " [s%d]", i);
 		/* solaris partitions are relative to current MS-DOS
 		 * one; must add the offset of the current partition */
 		put_partition(state, state->next++,
···
 			      le32_to_cpu(s->s_size));
 	}
 	put_dev_sector(sect);
-	strlcat(state->pp_buf, " >\n", PAGE_SIZE);
+	seq_buf_puts(&state->pp_buf, " >\n");
 #endif
 }

···
 	Sector sect;
 	struct bsd_disklabel *l;
 	struct bsd_partition *p;
-	char tmp[64];

 	l = read_part_sector(state, offset + 1, &sect);
 	if (!l)
···
 		return;
 	}

-	snprintf(tmp, sizeof(tmp), " %s%d: <%s:", state->name, origin, flavour);
-	strlcat(state->pp_buf, tmp, PAGE_SIZE);
+	seq_buf_printf(&state->pp_buf, " %s%d: <%s:", state->name, origin, flavour);

 	if (le16_to_cpu(l->d_npartitions) < max_partitions)
 		max_partitions = le16_to_cpu(l->d_npartitions);
···
 			/* full parent partition, we have it already */
 			continue;
 		if (offset > bsd_start || offset+size < bsd_start+bsd_size) {
-			strlcat(state->pp_buf, "bad subpartition - ignored\n", PAGE_SIZE);
+			seq_buf_puts(&state->pp_buf, "bad subpartition - ignored\n");
 			continue;
 		}
 		put_partition(state, state->next++, bsd_start, bsd_size);
 	}
 	put_dev_sector(sect);
-	if (le16_to_cpu(l->d_npartitions) > max_partitions) {
-		snprintf(tmp, sizeof(tmp), " (ignored %d more)",
-			 le16_to_cpu(l->d_npartitions) - max_partitions);
-		strlcat(state->pp_buf, tmp, PAGE_SIZE);
-	}
-	strlcat(state->pp_buf, " >\n", PAGE_SIZE);
+	if (le16_to_cpu(l->d_npartitions) > max_partitions)
+		seq_buf_printf(&state->pp_buf, " (ignored %d more)",
+			       le16_to_cpu(l->d_npartitions) - max_partitions);
+	seq_buf_puts(&state->pp_buf, " >\n");
 }
 #endif

···
 		put_dev_sector(sect);
 		return;
 	}
-	{
-		char tmp[1 + BDEVNAME_SIZE + 10 + 12 + 1];
-
-		snprintf(tmp, sizeof(tmp), " %s%d: <unixware:", state->name, origin);
-		strlcat(state->pp_buf, tmp, PAGE_SIZE);
-	}
+	seq_buf_printf(&state->pp_buf, " %s%d: <unixware:", state->name, origin);
 	p = &l->vtoc.v_slice[1];
 	/* I omit the 0th slice as it is the same as whole disk. */
 	while (p - &l->vtoc.v_slice[0] < UNIXWARE_NUMSLICE) {
···
 		p++;
 	}
 	put_dev_sector(sect);
-	strlcat(state->pp_buf, " >\n", PAGE_SIZE);
+	seq_buf_puts(&state->pp_buf, " >\n");
 #endif
 }

···
 	 * the normal boot sector. */
 	if (msdos_magic_present(data + 510) &&
 	    p->sys_ind == MINIX_PARTITION) { /* subpartition table present */
-		char tmp[1 + BDEVNAME_SIZE + 10 + 9 + 1];
-
-		snprintf(tmp, sizeof(tmp), " %s%d: <minix:", state->name, origin);
-		strlcat(state->pp_buf, tmp, PAGE_SIZE);
+		seq_buf_printf(&state->pp_buf, " %s%d: <minix:", state->name, origin);
 		for (i = 0; i < MINIX_NR_SUBPARTITIONS; i++, p++) {
 			if (state->next == state->limit)
 				break;
···
 				put_partition(state, state->next++,
 					      start_sect(p), nr_sects(p));
 		}
-		strlcat(state->pp_buf, " >\n", PAGE_SIZE);
+		seq_buf_puts(&state->pp_buf, " >\n");
 	}
 	put_dev_sector(sect);
 #endif /* CONFIG_MINIX_SUBPARTITION */
···
 #ifdef CONFIG_AIX_PARTITION
 		return aix_partition(state);
 #else
-		strlcat(state->pp_buf, " [AIX]", PAGE_SIZE);
+		seq_buf_puts(&state->pp_buf, " [AIX]");
 		return 0;
 #endif
 	}
···
 		fb = (struct fat_boot_sector *) data;
 		if (slot == 1 && fb->reserved && fb->fats
 			&& fat_valid_media(fb->media)) {
-			strlcat(state->pp_buf, "\n", PAGE_SIZE);
+			seq_buf_puts(&state->pp_buf, "\n");
 			put_dev_sector(sect);
 			return 1;
 		} else {
···
 			n = min(size, max(sector_size, n));
 			put_partition(state, slot, start, n);

-			strlcat(state->pp_buf, " <", PAGE_SIZE);
+			seq_buf_puts(&state->pp_buf, " <");
 			parse_extended(state, start, size, disksig);
-			strlcat(state->pp_buf, " >", PAGE_SIZE);
+			seq_buf_puts(&state->pp_buf, " >");
 			continue;
 		}
 		put_partition(state, slot, start, size);
···
 		if (p->sys_ind == LINUX_RAID_PARTITION)
 			state->parts[slot].flags = ADDPART_FLAG_RAID;
 		if (p->sys_ind == DM6_PARTITION)
-			strlcat(state->pp_buf, "[DM]", PAGE_SIZE);
+			seq_buf_puts(&state->pp_buf, "[DM]");
 		if (p->sys_ind == EZD_PARTITION)
-			strlcat(state->pp_buf, "[EZD]", PAGE_SIZE);
+			seq_buf_puts(&state->pp_buf, "[EZD]");
 	}

-	strlcat(state->pp_buf, "\n", PAGE_SIZE);
+	seq_buf_puts(&state->pp_buf, "\n");

 	/* second pass - output for each on a separate line */
 	p = (struct msdos_partition *) (0x1be + data);
+2 -4
block/partitions/of.c
···
 				  struct device_node *np)
 {
 	struct partition_meta_info *info;
-	char tmp[sizeof(info->volname) + 4];
 	const char *partname;
 	int len;

···
 	partname = of_get_property(np, "name", &len);
 	strscpy(info->volname, partname, sizeof(info->volname));

-	snprintf(tmp, sizeof(tmp), "(%s)", info->volname);
-	strlcat(state->pp_buf, tmp, PAGE_SIZE);
+	seq_buf_printf(&state->pp_buf, "(%s)", info->volname);
 }

 int of_partition(struct parsed_partitions *state)
···
 		slot++;
 	}

-	strlcat(state->pp_buf, "\n", PAGE_SIZE);
+	seq_buf_puts(&state->pp_buf, "\n");

 	return 1;
 }
+1 -1
block/partitions/osf.c
···
 			      le32_to_cpu(partition->p_size));
 		slot++;
 	}
-	strlcat(state->pp_buf, "\n", PAGE_SIZE);
+	seq_buf_puts(&state->pp_buf, "\n");
 	put_dev_sector(sect);
 	return 1;
 }
+1 -1
block/partitions/sgi.c
···
 		}
 		slot++;
 	}
-	strlcat(state->pp_buf, "\n", PAGE_SIZE);
+	seq_buf_puts(&state->pp_buf, "\n");
 	put_dev_sector(sect);
 	return 1;
 }
+1 -1
block/partitions/sun.c
···
 		}
 		slot++;
 	}
-	strlcat(state->pp_buf, "\n", PAGE_SIZE);
+	seq_buf_puts(&state->pp_buf, "\n");
 	put_dev_sector(sect);
 	return 1;
 }
+3 -6
block/partitions/sysv68.c
···
 	unsigned char *data;
 	struct dkblk0 *b;
 	struct slice *slice;
-	char tmp[64];

 	data = read_part_sector(state, 0, &sect);
 	if (!data)
···
 		return -1;

 	slices -= 1; /* last slice is the whole disk */
-	snprintf(tmp, sizeof(tmp), "sysV68: %s(s%u)", state->name, slices);
-	strlcat(state->pp_buf, tmp, PAGE_SIZE);
+	seq_buf_printf(&state->pp_buf, "sysV68: %s(s%u)", state->name, slices);
 	slice = (struct slice *)data;
 	for (i = 0; i < slices; i++, slice++) {
 		if (slot == state->limit)
···
 			put_partition(state, slot,
 				be32_to_cpu(slice->blkoff),
 				be32_to_cpu(slice->nblocks));
-			snprintf(tmp, sizeof(tmp), "(s%u)", i);
-			strlcat(state->pp_buf, tmp, PAGE_SIZE);
+			seq_buf_printf(&state->pp_buf, "(s%u)", i);
 		}
 		slot++;
 	}
-	strlcat(state->pp_buf, "\n", PAGE_SIZE);
+	seq_buf_puts(&state->pp_buf, "\n");
 	put_dev_sector(sect);
 	return 1;
 }
+1 -1
block/partitions/ultrix.c
···
 					      label->pt_part[i].pi_blkoff,
 					      label->pt_part[i].pi_nblocks);
 		put_dev_sector(sect);
-		strlcat(state->pp_buf, "\n", PAGE_SIZE);
+		seq_buf_puts(&state->pp_buf, "\n");
 		return 1;
 	} else {
 		put_dev_sector(sect);
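Note on the pattern in the partition parsers above: each site used to format into a temporary `tmp[]` and then `strlcat()` it into a page-sized string, and now appends directly to a buffer that tracks its own length. The following is a minimal userspace sketch of that idea, not the kernel's `seq_buf` implementation. Every `sketch_*` name is hypothetical, and the overflow behaviour (truncate and set a flag) is a simplifying assumption.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace stand-in for a length-tracking append buffer. */
struct sketch_buf {
	char *buffer;		/* caller-provided storage */
	size_t size;		/* capacity, including the trailing NUL */
	size_t len;		/* bytes written so far, excluding the NUL */
	int overflowed;		/* set once an append did not fit */
};

static void sketch_buf_init(struct sketch_buf *s, char *storage, size_t size)
{
	assert(size > 0);	/* room for at least the terminating NUL */
	s->buffer = storage;
	s->size = size;
	s->len = 0;
	s->overflowed = 0;
	s->buffer[0] = '\0';
}

/* Formatted append: no temporary buffer and no rescan of existing text. */
static void sketch_buf_printf(struct sketch_buf *s, const char *fmt, ...)
{
	size_t room = s->size - s->len;	/* always >= 1 because len <= size - 1 */
	va_list args;
	int n;

	va_start(args, fmt);
	n = vsnprintf(s->buffer + s->len, room, fmt, args);
	va_end(args);

	if (n < 0) {
		/* Formatting error: keep the previous contents intact. */
		s->buffer[s->len] = '\0';
		s->overflowed = 1;
	} else if ((size_t)n >= room) {
		/* Output was truncated to fit; remember that it happened. */
		s->len = s->size - 1;
		s->overflowed = 1;
	} else {
		s->len += (size_t)n;
	}
}

/* Plain string append, expressed through the formatted variant. */
static void sketch_buf_puts(struct sketch_buf *s, const char *str)
{
	sketch_buf_printf(s, "%s", str);
}

/* The storage is kept NUL-terminated, so it can be printed directly. */
static const char *sketch_buf_str(const struct sketch_buf *s)
{
	return s->buffer;
}
```

A caller would initialise the buffer once, append with `sketch_buf_printf()` and `sketch_buf_puts()`, and print the result of `sketch_buf_str()`. Because the write offset is stored, an append does not need to walk the existing string the way `strlcat()` does.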
+411 -35
block/sed-opal.c
···
 		{ 0x00, 0x00, 0x08, 0x01, 0x00, 0x00, 0x00, 0x00 },
 	[OPAL_DATASTORE] =
 		{ 0x00, 0x00, 0x10, 0x01, 0x00, 0x00, 0x00, 0x00 },
+	[OPAL_LOCKING_TABLE] =
+		{ 0x00, 0x00, 0x08, 0x02, 0x00, 0x00, 0x00, 0x00 },

 	/* C_PIN_TABLE object ID's */
 	[OPAL_C_PIN_MSID] =
···
 		{ 0x00, 0x00, 0x00, 0x06, 0x00, 0x00, 0x06, 0x01 },
 	[OPAL_ERASE] =
 		{ 0x00, 0x00, 0x00, 0x06, 0x00, 0x00, 0x08, 0x03 },
+	[OPAL_REACTIVATE] =
+		{ 0x00, 0x00, 0x00, 0x06, 0x00, 0x00, 0x08, 0x01 },
 };

 static int end_opal_session_error(struct opal_dev *dev);
···
 	return err;
 }

-static int setup_locking_range(struct opal_dev *dev, void *data)
+static int setup_enable_range(struct opal_dev *dev, void *data)
 {
 	u8 uid[OPAL_UID_LENGTH];
 	struct opal_user_lr_setup *setup = data;
···

 	if (lr == 0)
 		err = enable_global_lr(dev, uid, setup);
-	else {
-		err = cmd_start(dev, uid, opalmethod[OPAL_SET]);
-
-		add_token_u8(&err, dev, OPAL_STARTNAME);
-		add_token_u8(&err, dev, OPAL_VALUES);
-		add_token_u8(&err, dev, OPAL_STARTLIST);
-
-		add_token_u8(&err, dev, OPAL_STARTNAME);
-		add_token_u8(&err, dev, OPAL_RANGESTART);
-		add_token_u64(&err, dev, setup->range_start);
-		add_token_u8(&err, dev, OPAL_ENDNAME);
-
-		add_token_u8(&err, dev, OPAL_STARTNAME);
-		add_token_u8(&err, dev, OPAL_RANGELENGTH);
-		add_token_u64(&err, dev, setup->range_length);
-		add_token_u8(&err, dev, OPAL_ENDNAME);
-
-		add_token_u8(&err, dev, OPAL_STARTNAME);
-		add_token_u8(&err, dev, OPAL_READLOCKENABLED);
-		add_token_u64(&err, dev, !!setup->RLE);
-		add_token_u8(&err, dev, OPAL_ENDNAME);
-
-		add_token_u8(&err, dev, OPAL_STARTNAME);
-		add_token_u8(&err, dev, OPAL_WRITELOCKENABLED);
-		add_token_u64(&err, dev, !!setup->WLE);
-		add_token_u8(&err, dev, OPAL_ENDNAME);
-
-		add_token_u8(&err, dev, OPAL_ENDLIST);
-		add_token_u8(&err, dev, OPAL_ENDNAME);
-	}
+	else
+		err = generic_lr_enable_disable(dev, uid, !!setup->RLE, !!setup->WLE, 0, 0);
 	if (err) {
-		pr_debug("Error building Setup Locking range command.\n");
+		pr_debug("Failed to create enable lr command.\n");
+		return err;
+	}
+
+	return finalize_and_send(dev, parse_and_check_status);
+}
+
+static int setup_locking_range_start_length(struct opal_dev *dev, void *data)
+{
+	int err;
+	u8 uid[OPAL_UID_LENGTH];
+	struct opal_user_lr_setup *setup = data;
+
+	err = build_locking_range(uid, sizeof(uid), setup->session.opal_key.lr);
+	if (err)
+		return err;
+
+	err = cmd_start(dev, uid, opalmethod[OPAL_SET]);
+
+	add_token_u8(&err, dev, OPAL_STARTNAME);
+	add_token_u8(&err, dev, OPAL_VALUES);
+	add_token_u8(&err, dev, OPAL_STARTLIST);
+
+	add_token_u8(&err, dev, OPAL_STARTNAME);
+	add_token_u8(&err, dev, OPAL_RANGESTART);
+	add_token_u64(&err, dev, setup->range_start);
+	add_token_u8(&err, dev, OPAL_ENDNAME);
+
+	add_token_u8(&err, dev, OPAL_STARTNAME);
+	add_token_u8(&err, dev, OPAL_RANGELENGTH);
+	add_token_u64(&err, dev, setup->range_length);
+	add_token_u8(&err, dev, OPAL_ENDNAME);
+
+	add_token_u8(&err, dev, OPAL_ENDLIST);
+	add_token_u8(&err, dev, OPAL_ENDNAME);
+
+	if (err) {
+		pr_debug("Error building Setup Locking RangeStartLength command.\n");
 		return err;
 	}

···

 static int response_get_column(const struct parsed_resp *resp,
 			       int *iter,
-			       u8 column,
+			       u64 column,
 			       u64 *value)
 {
 	const struct opal_resp_tok *tok;
···
 	n++;

 	if (response_get_u64(resp, n) != column) {
-		pr_debug("Token %d does not match expected column %u.\n",
+		pr_debug("Token %d does not match expected column %llu.\n",
 			 n, column);
 		return OPAL_INVAL_PARAM;
 	}
···
 {
 	return start_generic_opal_session(dev, OPAL_ANYBODY_UID,
 					  OPAL_ADMINSP_UID, NULL, 0);
+}
+
+static int start_anybodyLSP_opal_session(struct opal_dev *dev, void *data)
+{
+	return start_generic_opal_session(dev, OPAL_ANYBODY_UID,
+					  OPAL_LOCKINGSP_UID, NULL, 0);
 }

 static int start_SIDASP_opal_session(struct opal_dev *dev, void *data)
···
 	if (err) {
 		pr_debug("Error building Activate LockingSP command.\n");
 		return err;
+	}
+
+	return finalize_and_send(dev, parse_and_check_status);
+}
+
+static int reactivate_lsp(struct opal_dev *dev, void *data)
+{
+	struct opal_lr_react *opal_react = data;
+	u8 user_lr[OPAL_UID_LENGTH];
+	int err, i;
+
+	err = cmd_start(dev, opaluid[OPAL_THISSP_UID],
+			opalmethod[OPAL_REACTIVATE]);
+
+	if (err) {
+		pr_debug("Error building Reactivate LockingSP command.\n");
+		return err;
+	}
+
+	/*
+	 * If neither 'entire_table' nor 'num_lrs' is set, the device
+	 * gets reactivated with SUM disabled. Only Admin1PIN will change
+	 * if set.
+	 */
+	if (opal_react->entire_table) {
+		/* Entire Locking table (all locking ranges) will be put in SUM. */
+		add_token_u8(&err, dev, OPAL_STARTNAME);
+		add_token_u64(&err, dev, OPAL_SUM_SET_LIST);
+		add_token_bytestring(&err, dev, opaluid[OPAL_LOCKING_TABLE], OPAL_UID_LENGTH);
+		add_token_u8(&err, dev, OPAL_ENDNAME);
+	} else if (opal_react->num_lrs) {
+		/* Subset of Locking table (selected locking range(s)) to be put in SUM */
+		err = build_locking_range(user_lr, sizeof(user_lr),
+					  opal_react->lr[0]);
+		if (err)
+			return err;
+
+		add_token_u8(&err, dev, OPAL_STARTNAME);
+		add_token_u64(&err, dev, OPAL_SUM_SET_LIST);
+
+		add_token_u8(&err, dev, OPAL_STARTLIST);
+		add_token_bytestring(&err, dev, user_lr, OPAL_UID_LENGTH);
+		for (i = 1; i < opal_react->num_lrs; i++) {
+			user_lr[7] = opal_react->lr[i];
+			add_token_bytestring(&err, dev, user_lr, OPAL_UID_LENGTH);
+		}
+		add_token_u8(&err, dev, OPAL_ENDLIST);
+		add_token_u8(&err, dev, OPAL_ENDNAME);
+	}
+
+	/* Skipping the rangle policy parameter is same as setting its value to zero */
+	if (opal_react->range_policy && (opal_react->num_lrs || opal_react->entire_table)) {
+		add_token_u8(&err, dev, OPAL_STARTNAME);
+		add_token_u64(&err, dev, OPAL_SUM_RANGE_POLICY);
+		add_token_u8(&err, dev, 1);
+		add_token_u8(&err, dev, OPAL_ENDNAME);
+	}
+
+	/*
+	 * Optional parameter. If set, it changes the Admin1 PIN even when SUM
+	 * is being disabled.
+	 */
+	if (opal_react->new_admin_key.key_len) {
+		add_token_u8(&err, dev, OPAL_STARTNAME);
+		add_token_u64(&err, dev, OPAL_SUM_ADMIN1_PIN);
+		add_token_bytestring(&err, dev, opal_react->new_admin_key.key,
+				     opal_react->new_admin_key.key_len);
+		add_token_u8(&err, dev, OPAL_ENDNAME);
 	}

 	return finalize_and_send(dev, parse_and_check_status);
···
 	return ret;
 }

+static int opal_reactivate_lsp(struct opal_dev *dev,
+			       struct opal_lr_react *opal_lr_react)
+{
+	const struct opal_step active_steps[] = {
+		{ start_admin1LSP_opal_session, &opal_lr_react->key },
+		{ reactivate_lsp, opal_lr_react },
+		/* No end_opal_session. The controller terminates the session */
+	};
+	int ret;
+
+	/* use either 'entire_table' parameter or set of locking ranges */
+	if (opal_lr_react->num_lrs > OPAL_MAX_LRS ||
+	    (opal_lr_react->num_lrs && opal_lr_react->entire_table))
+		return -EINVAL;
+
+	ret = opal_get_key(dev, &opal_lr_react->key);
+	if (ret)
+		return ret;
+	mutex_lock(&dev->dev_lock);
+	setup_opal_dev(dev);
+	ret = execute_steps(dev, active_steps, ARRAY_SIZE(active_steps));
+	mutex_unlock(&dev->dev_lock);
+
+	return ret;
+}
+
 static int opal_setup_locking_range(struct opal_dev *dev,
 				    struct opal_user_lr_setup *opal_lrs)
 {
 	const struct opal_step lr_steps[] = {
 		{ start_auth_opal_session, &opal_lrs->session },
-		{ setup_locking_range, opal_lrs },
+		{ setup_locking_range_start_length, opal_lrs },
+		{ setup_enable_range, opal_lrs },
+		{ end_opal_session, }
+	}, lr_global_steps[] = {
+		{ start_auth_opal_session, &opal_lrs->session },
+		{ setup_enable_range, opal_lrs },
+		{ end_opal_session, }
+	};
+	int ret;
+
+	ret = opal_get_key(dev, &opal_lrs->session.opal_key);
+	if (ret)
+		return ret;
+	mutex_lock(&dev->dev_lock);
+	setup_opal_dev(dev);
+	if (opal_lrs->session.opal_key.lr == 0)
+		ret = execute_steps(dev, lr_global_steps, ARRAY_SIZE(lr_global_steps));
+	else
+		ret = execute_steps(dev, lr_steps, ARRAY_SIZE(lr_steps));
+	mutex_unlock(&dev->dev_lock);
+
+	return ret;
+}
+
+static int opal_setup_locking_range_start_length(struct opal_dev *dev,
+						 struct opal_user_lr_setup *opal_lrs)
+{
+	const struct opal_step lr_steps[] = {
+		{ start_auth_opal_session, &opal_lrs->session },
+		{ setup_locking_range_start_length, opal_lrs },
+		{ end_opal_session, }
+	};
+	int ret;
+
+	/* we can not set global locking range offset or length */
+	if (opal_lrs->session.opal_key.lr == 0)
+		return -EINVAL;
+
+	ret = opal_get_key(dev, &opal_lrs->session.opal_key);
+	if (ret)
+		return ret;
+	mutex_lock(&dev->dev_lock);
+	setup_opal_dev(dev);
+	ret = execute_steps(dev, lr_steps, ARRAY_SIZE(lr_steps));
+	mutex_unlock(&dev->dev_lock);
+
+	return ret;
+}
+
+static int opal_enable_disable_range(struct opal_dev *dev,
+				     struct opal_user_lr_setup *opal_lrs)
+{
+	const struct opal_step lr_steps[] = {
+		{ start_auth_opal_session, &opal_lrs->session },
+		{ setup_enable_range, opal_lrs },
 		{ end_opal_session, }
 	};
 	int ret;
···
 	return 0;
 }

+static int get_sum_ranges(struct opal_dev *dev, void *data)
+{
+	const char *lr_uid;
+	size_t lr_uid_len;
+	u64 val;
+	const struct opal_resp_tok *tok;
+	int err, tok_n = 2;
+	struct opal_sum_ranges *sranges = data;
+	const __u8 lr_all[OPAL_MAX_LRS] = { 0, 1, 2, 3, 4, 5, 6, 7, 8 };
+
+	err = generic_get_columns(dev, opaluid[OPAL_LOCKING_INFO_TABLE], OPAL_SUM_SET_LIST,
+				  OPAL_SUM_RANGE_POLICY);
+	if (err) {
+		pr_debug("Couldn't get locking info table columns %d to %d.\n",
+			 OPAL_SUM_SET_LIST, OPAL_SUM_RANGE_POLICY);
+		return err;
+	}
+
+	tok = response_get_token(&dev->parsed, tok_n);
+	if (IS_ERR(tok))
+		return PTR_ERR(tok);
+
+	if (!response_token_matches(tok, OPAL_STARTNAME)) {
+		pr_debug("Unexpected response token type %d.\n", tok_n);
+		return OPAL_INVAL_PARAM;
+	}
+	tok_n++;
+
+	if (response_get_u64(&dev->parsed, tok_n) != OPAL_SUM_SET_LIST) {
+		pr_debug("Token %d does not match expected column %u.\n",
+			 tok_n, OPAL_SUM_SET_LIST);
+		return OPAL_INVAL_PARAM;
+	}
+	tok_n++;
+
+	tok = response_get_token(&dev->parsed, tok_n);
+	if (IS_ERR(tok))
+		return PTR_ERR(tok);
+
+	/*
+	 * The OPAL_SUM_SET_LIST response contains two distinct values:
+	 *
+	 * - the list of individual locking ranges (UIDs) put in SUM. The list
+	 *   may also be empty signaling the SUM is disabled.
+	 *
+	 * - the Locking table UID if the entire Locking table is put in SUM.
+	 */
+	if (response_token_matches(tok, OPAL_STARTLIST)) {
+		sranges->num_lrs = 0;
+
+		tok_n++;
+		tok = response_get_token(&dev->parsed, tok_n);
+		if (IS_ERR(tok))
+			return PTR_ERR(tok);
+
+		while (!response_token_matches(tok, OPAL_ENDLIST)) {
+			lr_uid_len = response_get_string(&dev->parsed, tok_n, &lr_uid);
+			if (lr_uid_len != OPAL_UID_LENGTH) {
+				pr_debug("Unexpected response token type %d.\n", tok_n);
+				return OPAL_INVAL_PARAM;
+			}
+
+			if (memcmp(lr_uid, opaluid[OPAL_LOCKINGRANGE_GLOBAL], OPAL_UID_LENGTH)) {
+				if (lr_uid[5] != LOCKING_RANGE_NON_GLOBAL) {
+					pr_debug("Unexpected byte %d at LR UUID position 5.\n",
+						 lr_uid[5]);
+					return OPAL_INVAL_PARAM;
+				}
+				sranges->lr[sranges->num_lrs++] = lr_uid[7];
+			} else
+				sranges->lr[sranges->num_lrs++] = 0;
+
+			tok_n++;
+			tok = response_get_token(&dev->parsed, tok_n);
+			if (IS_ERR(tok))
+				return PTR_ERR(tok);
+		}
+	} else {
+		/* Only OPAL_LOCKING_TABLE UID is an alternative to OPAL_STARTLIST here. */
+		lr_uid_len = response_get_string(&dev->parsed, tok_n, &lr_uid);
+		if (lr_uid_len != OPAL_UID_LENGTH) {
+			pr_debug("Unexpected response token type %d.\n", tok_n);
+			return OPAL_INVAL_PARAM;
+		}
+
+		if (memcmp(lr_uid, opaluid[OPAL_LOCKING_TABLE], OPAL_UID_LENGTH)) {
+			pr_debug("Unexpected response UID.\n");
+			return OPAL_INVAL_PARAM;
+		}
+
+		/* sed-opal kernel API already provides following limit in Activate command */
+		sranges->num_lrs = OPAL_MAX_LRS;
+		memcpy(sranges->lr, lr_all, OPAL_MAX_LRS);
+	}
+	tok_n++;
+
+	tok = response_get_token(&dev->parsed, tok_n);
+	if (IS_ERR(tok))
+		return PTR_ERR(tok);
+
+	if (!response_token_matches(tok, OPAL_ENDNAME)) {
+		pr_debug("Unexpected response token type %d.\n", tok_n);
+		return OPAL_INVAL_PARAM;
+	}
+	tok_n++;
+
+	err = response_get_column(&dev->parsed, &tok_n, OPAL_SUM_RANGE_POLICY, &val);
+	if (err)
+		return err;
+
+	sranges->range_policy = val ? 1 : 0;
+
+	return 0;
+}
+
+static int opal_get_sum_ranges(struct opal_dev *dev, struct opal_sum_ranges *opal_sum_rngs,
+			       void __user *data)
+{
+	const struct opal_step admin_steps[] = {
+		{ start_admin1LSP_opal_session, &opal_sum_rngs->key },
+		{ get_sum_ranges, opal_sum_rngs },
+		{ end_opal_session, }
+	}, anybody_steps[] = {
+		{ start_anybodyLSP_opal_session, NULL },
+		{ get_sum_ranges, opal_sum_rngs },
+		{ end_opal_session, }
+	};
+	int ret;
+
+	mutex_lock(&dev->dev_lock);
+	setup_opal_dev(dev);
+	if (opal_sum_rngs->key.key_len)
+		/* Use Admin1 session (authenticated by PIN) to retrieve LockingInfo columns */
+		ret = execute_steps(dev, admin_steps, ARRAY_SIZE(admin_steps));
+	else
+		/* Use Anybody session (no key) to retrieve LockingInfo columns */
+		ret = execute_steps(dev, anybody_steps, ARRAY_SIZE(anybody_steps));
+	mutex_unlock(&dev->dev_lock);
+
+	/* skip session info when copying back to uspace */
+	if (!ret && copy_to_user(data + offsetof(struct opal_sum_ranges, num_lrs),
+				 (void *)opal_sum_rngs + offsetof(struct opal_sum_ranges, num_lrs),
+				 sizeof(*opal_sum_rngs) - offsetof(struct opal_sum_ranges, num_lrs))) {
+		pr_debug("Error copying SUM ranges info to userspace\n");
+		return -EFAULT;
+	}
+
+	return ret;
+}
+
+static int opal_stack_reset(struct opal_dev *dev)
+{
+	struct opal_stack_reset *req;
+	struct opal_stack_reset_response *resp;
+	int ret;
+
+	mutex_lock(&dev->dev_lock);
+
+	memset(dev->cmd, 0, IO_BUFFER_LENGTH);
+	req = (struct opal_stack_reset *)dev->cmd;
+	req->extendedComID[0] = dev->comid >> 8;
+	req->extendedComID[1] = dev->comid & 0xFF;
+	req->request_code = cpu_to_be32(OPAL_STACK_RESET);
+
+	ret = dev->send_recv(dev->data, dev->comid, TCG_SECP_02,
+			     dev->cmd, IO_BUFFER_LENGTH, true);
+	if (ret) {
+		pr_debug("Error sending stack reset: %d\n", ret);
+		goto out;
+	}
+
+	memset(dev->resp, 0, IO_BUFFER_LENGTH);
+	ret = dev->send_recv(dev->data, dev->comid, TCG_SECP_02,
+			     dev->resp, IO_BUFFER_LENGTH, false);
+	if (ret) {
+		pr_debug("Error receiving stack reset response: %d\n", ret);
+		goto out;
+	}
+
+	resp = (struct opal_stack_reset_response *)dev->resp;
+	if (be16_to_cpu(resp->data_length) != 4) {
+		pr_debug("Stack reset pending\n");
+		ret = -EBUSY;
+		goto out;
+	}
+	if (be32_to_cpu(resp->response) != 0) {
+		pr_debug("Stack reset failed: %u\n", be32_to_cpu(resp->response));
+		ret = -EIO;
+	}
+out:
+	mutex_unlock(&dev->dev_lock);
+	return ret;
+}
+
 int sed_ioctl(struct opal_dev *dev, unsigned int cmd, void __user *arg)
 {
 	void *p;
···
 		break;
 	case IOC_OPAL_SET_SID_PW:
 		ret = opal_set_new_sid_pw(dev, p);
+		break;
+	case IOC_OPAL_REACTIVATE_LSP:
+		ret = opal_reactivate_lsp(dev, p);
+		break;
+	case IOC_OPAL_LR_SET_START_LEN:
+		ret = opal_setup_locking_range_start_length(dev, p);
+		break;
+	case IOC_OPAL_ENABLE_DISABLE_LR:
+		ret = opal_enable_disable_range(dev, p);
+		break;
+	case IOC_OPAL_GET_SUM_STATUS:
+		ret = opal_get_sum_ranges(dev, p, arg);
+		break;
+	case IOC_OPAL_STACK_RESET:
+		ret = opal_stack_reset(dev);
 		break;

 	default:
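Note on the Reactivate parameters: the check at the top of `opal_reactivate_lsp()` together with the branches in `reactivate_lsp()` amounts to a small decision table over `entire_table` and `num_lrs`. The sketch below restates that table as standalone C. It is only an illustration: every `sketch_*` and `SKETCH_*` name is hypothetical, the struct is not the uapi `struct opal_lr_react`, and the limit of 9 is an assumption inferred from the nine-entry `lr_all[]` initializer in `get_sum_ranges()`.

```c
#include <errno.h>

/* Assumption: mirrors OPAL_MAX_LRS as implied by lr_all[] = { 0, ..., 8 }. */
#define SKETCH_MAX_LRS 9

/* Hypothetical reduced request: only the two fields that select the mode. */
struct sketch_react {
	unsigned char entire_table;	/* non-zero: whole Locking table in SUM */
	unsigned char num_lrs;		/* number of individually listed ranges */
};

enum sketch_sum_mode {
	SKETCH_SUM_DISABLED,		/* neither field set */
	SKETCH_SUM_ENTIRE_TABLE,	/* entire_table set, num_lrs zero */
	SKETCH_SUM_SELECTED_RANGES,	/* num_lrs set, entire_table zero */
};

/*
 * Returns the selected mode, or -EINVAL when too many ranges are listed
 * or when both selectors are given at once.
 */
static int sketch_classify_reactivate(const struct sketch_react *req)
{
	if (req->num_lrs > SKETCH_MAX_LRS ||
	    (req->num_lrs && req->entire_table))
		return -EINVAL;

	if (req->entire_table)
		return SKETCH_SUM_ENTIRE_TABLE;
	if (req->num_lrs)
		return SKETCH_SUM_SELECTED_RANGES;
	return SKETCH_SUM_DISABLED;
}
```

In the diff, the range policy token is only emitted for the last two modes, while the optional new Admin1 PIN is emitted in all three.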
+474 -380
block/t10-pi.c
···
 #include <linux/unaligned.h>
 #include "blk.h"
 
-struct blk_integrity_iter {
-	void *prot_buf;
-	void *data_buf;
-	sector_t seed;
-	unsigned int data_size;
-	unsigned short interval;
-	const char *disk_name;
+#define APP_TAG_ESCAPE 0xffff
+#define REF_TAG_ESCAPE 0xffffffff
+
+/*
+ * This union is used for onstack allocations when the pi field is split across
+ * segments. blk_validate_integrity_limits() guarantees pi_tuple_size matches
+ * the sizeof one of these two types.
+ */
+union pi_tuple {
+	struct crc64_pi_tuple crc64_pi;
+	struct t10_pi_tuple t10_pi;
 };
 
-static __be16 t10_pi_csum(__be16 csum, void *data, unsigned int len,
-		unsigned char csum_type)
+struct blk_integrity_iter {
+	struct bio *bio;
+	struct bio_integrity_payload *bip;
+	struct blk_integrity *bi;
+	struct bvec_iter data_iter;
+	struct bvec_iter prot_iter;
+	unsigned int interval_remaining;
+	u64 seed;
+	u64 csum;
+};
+
+static void blk_calculate_guard(struct blk_integrity_iter *iter, void *data,
+				unsigned int len)
 {
-	if (csum_type == BLK_INTEGRITY_CSUM_IP)
-		return (__force __be16)ip_compute_csum(data, len);
-	return cpu_to_be16(crc_t10dif_update(be16_to_cpu(csum), data, len));
+	switch (iter->bi->csum_type) {
+	case BLK_INTEGRITY_CSUM_CRC64:
+		iter->csum = crc64_nvme(iter->csum, data, len);
+		break;
+	case BLK_INTEGRITY_CSUM_CRC:
+		iter->csum = crc_t10dif_update(iter->csum, data, len);
+		break;
+	case BLK_INTEGRITY_CSUM_IP:
+		iter->csum = (__force u32)csum_partial(data, len,
+						(__force __wsum)iter->csum);
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		iter->csum = U64_MAX;
+		break;
+	}
+}
+
+static void blk_integrity_csum_finish(struct blk_integrity_iter *iter)
+{
+	switch (iter->bi->csum_type) {
+	case BLK_INTEGRITY_CSUM_IP:
+		iter->csum = (__force u16)csum_fold((__force __wsum)iter->csum);
+		break;
+	default:
+		break;
+	}
 }
 
 /*
- * Type 1 and Type 2 protection use the same format: 16 bit guard tag,
- * 16 bit app tag, 32 bit reference tag. Type 3 does not define the ref
- * tag.
+ * Update the csum for formats that have metadata padding in front of the data
+ * integrity field
  */
-static void t10_pi_generate(struct blk_integrity_iter *iter,
-		struct blk_integrity *bi)
+static void blk_integrity_csum_offset(struct blk_integrity_iter *iter)
 {
-	u8 offset = bi->pi_offset;
-	unsigned int i;
+	unsigned int offset = iter->bi->pi_offset;
+	struct bio_vec *bvec = iter->bip->bip_vec;
 
-	for (i = 0 ; i < iter->data_size ; i += iter->interval) {
-		struct t10_pi_tuple *pi = iter->prot_buf + offset;
+	while (offset > 0) {
+		struct bio_vec pbv = bvec_iter_bvec(bvec, iter->prot_iter);
+		unsigned int len = min(pbv.bv_len, offset);
+		void *prot_buf = bvec_kmap_local(&pbv);
 
-		pi->guard_tag = t10_pi_csum(0, iter->data_buf, iter->interval,
-					    bi->csum_type);
-		if (offset)
-			pi->guard_tag = t10_pi_csum(pi->guard_tag,
-					iter->prot_buf, offset, bi->csum_type);
-		pi->app_tag = 0;
+		blk_calculate_guard(iter, prot_buf, len);
+		kunmap_local(prot_buf);
+		offset -= len;
+		bvec_iter_advance_single(bvec, &iter->prot_iter, len);
+	}
+	blk_integrity_csum_finish(iter);
+}
 
-		if (bi->flags & BLK_INTEGRITY_REF_TAG)
-			pi->ref_tag = cpu_to_be32(lower_32_bits(iter->seed));
-		else
-			pi->ref_tag = 0;
+static void blk_integrity_copy_from_tuple(struct bio_integrity_payload *bip,
+					  struct bvec_iter *iter, void *tuple,
+					  unsigned int tuple_size)
+{
+	while (tuple_size) {
+		struct bio_vec pbv = bvec_iter_bvec(bip->bip_vec, *iter);
+		unsigned int len = min(tuple_size, pbv.bv_len);
+		void *prot_buf = bvec_kmap_local(&pbv);
 
-		iter->data_buf += iter->interval;
-		iter->prot_buf += bi->metadata_size;
-		iter->seed++;
+		memcpy(prot_buf, tuple, len);
+		kunmap_local(prot_buf);
+		bvec_iter_advance_single(bip->bip_vec, iter, len);
+		tuple_size -= len;
+		tuple += len;
 	}
 }
 
-static blk_status_t t10_pi_verify(struct blk_integrity_iter *iter,
-		struct blk_integrity *bi)
+static void blk_integrity_copy_to_tuple(struct bio_integrity_payload *bip,
+					struct bvec_iter *iter, void *tuple,
+					unsigned int tuple_size)
 {
-	u8 offset = bi->pi_offset;
-	unsigned int i;
+	while (tuple_size) {
+		struct bio_vec pbv = bvec_iter_bvec(bip->bip_vec, *iter);
+		unsigned int len = min(tuple_size, pbv.bv_len);
+		void *prot_buf = bvec_kmap_local(&pbv);
 
-	for (i = 0 ; i < iter->data_size ; i += iter->interval) {
-		struct t10_pi_tuple *pi = iter->prot_buf + offset;
-		__be16 csum;
-
-		if (bi->flags & BLK_INTEGRITY_REF_TAG) {
-			if (pi->app_tag == T10_PI_APP_ESCAPE)
-				goto next;
-
-			if (be32_to_cpu(pi->ref_tag) !=
-			    lower_32_bits(iter->seed)) {
-				pr_err("%s: ref tag error at location %llu " \
-				       "(rcvd %u)\n", iter->disk_name,
-				       (unsigned long long)
-				       iter->seed, be32_to_cpu(pi->ref_tag));
-				return BLK_STS_PROTECTION;
-			}
-		} else {
-			if (pi->app_tag == T10_PI_APP_ESCAPE &&
-			    pi->ref_tag == T10_PI_REF_ESCAPE)
-				goto next;
-		}
-
-		csum = t10_pi_csum(0, iter->data_buf, iter->interval,
-				   bi->csum_type);
-		if (offset)
-			csum = t10_pi_csum(csum, iter->prot_buf, offset,
-					   bi->csum_type);
-
-		if (pi->guard_tag != csum) {
-			pr_err("%s: guard tag error at sector %llu " \
-			       "(rcvd %04x, want %04x)\n", iter->disk_name,
-			       (unsigned long long)iter->seed,
-			       be16_to_cpu(pi->guard_tag), be16_to_cpu(csum));
-			return BLK_STS_PROTECTION;
-		}
-
-next:
-		iter->data_buf += iter->interval;
-		iter->prot_buf += bi->metadata_size;
-		iter->seed++;
-	}
-
-	return BLK_STS_OK;
-}
-
-/**
- * t10_pi_type1_prepare - prepare PI prior submitting request to device
- * @rq: request with PI that should be prepared
- *
- * For Type 1/Type 2, the virtual start sector is the one that was
- * originally submitted by the block layer for the ref_tag usage. Due to
- * partitioning, MD/DM cloning, etc. the actual physical start sector is
- * likely to be different. Remap protection information to match the
- * physical LBA.
- */
-static void t10_pi_type1_prepare(struct request *rq)
-{
-	struct blk_integrity *bi = &rq->q->limits.integrity;
-	const int tuple_sz = bi->metadata_size;
-	u32 ref_tag = t10_pi_ref_tag(rq);
-	u8 offset = bi->pi_offset;
-	struct bio *bio;
-
-	__rq_for_each_bio(bio, rq) {
-		struct bio_integrity_payload *bip = bio_integrity(bio);
-		u32 virt = bip_get_seed(bip) & 0xffffffff;
-		struct bio_vec iv;
-		struct bvec_iter iter;
-
-		/* Already remapped? */
-		if (bip->bip_flags & BIP_MAPPED_INTEGRITY)
-			break;
-
-		bip_for_each_vec(iv, bip, iter) {
-			unsigned int j;
-			void *p;
-
-			p = bvec_kmap_local(&iv);
-			for (j = 0; j < iv.bv_len; j += tuple_sz) {
-				struct t10_pi_tuple *pi = p + offset;
-
-				if (be32_to_cpu(pi->ref_tag) == virt)
-					pi->ref_tag = cpu_to_be32(ref_tag);
-				virt++;
-				ref_tag++;
-				p += tuple_sz;
-			}
-			kunmap_local(p);
-		}
-
-		bip->bip_flags |= BIP_MAPPED_INTEGRITY;
-	}
-}
-
-/**
- * t10_pi_type1_complete - prepare PI prior returning request to the blk layer
- * @rq: request with PI that should be prepared
- * @nr_bytes: total bytes to prepare
- *
- * For Type 1/Type 2, the virtual start sector is the one that was
- * originally submitted by the block layer for the ref_tag usage. Due to
- * partitioning, MD/DM cloning, etc. the actual physical start sector is
- * likely to be different. Since the physical start sector was submitted
- * to the device, we should remap it back to virtual values expected by the
- * block layer.
- */
-static void t10_pi_type1_complete(struct request *rq, unsigned int nr_bytes)
-{
-	struct blk_integrity *bi = &rq->q->limits.integrity;
-	unsigned intervals = nr_bytes >> bi->interval_exp;
-	const int tuple_sz = bi->metadata_size;
-	u32 ref_tag = t10_pi_ref_tag(rq);
-	u8 offset = bi->pi_offset;
-	struct bio *bio;
-
-	__rq_for_each_bio(bio, rq) {
-		struct bio_integrity_payload *bip = bio_integrity(bio);
-		u32 virt = bip_get_seed(bip) & 0xffffffff;
-		struct bio_vec iv;
-		struct bvec_iter iter;
-
-		bip_for_each_vec(iv, bip, iter) {
-			unsigned int j;
-			void *p;
-
-			p = bvec_kmap_local(&iv);
-			for (j = 0; j < iv.bv_len && intervals; j += tuple_sz) {
-				struct t10_pi_tuple *pi = p + offset;
-
-				if (be32_to_cpu(pi->ref_tag) == ref_tag)
-					pi->ref_tag = cpu_to_be32(virt);
-				virt++;
-				ref_tag++;
-				intervals--;
-				p += tuple_sz;
-			}
-			kunmap_local(p);
-		}
-	}
-}
-
-static __be64 ext_pi_crc64(u64 crc, void *data, unsigned int len)
-{
-	return cpu_to_be64(crc64_nvme(crc, data, len));
-}
-
-static void ext_pi_crc64_generate(struct blk_integrity_iter *iter,
-		struct blk_integrity *bi)
-{
-	u8 offset = bi->pi_offset;
-	unsigned int i;
-
-	for (i = 0 ; i < iter->data_size ; i += iter->interval) {
-		struct crc64_pi_tuple *pi = iter->prot_buf + offset;
-
-		pi->guard_tag = ext_pi_crc64(0, iter->data_buf, iter->interval);
-		if (offset)
-			pi->guard_tag = ext_pi_crc64(be64_to_cpu(pi->guard_tag),
-					iter->prot_buf, offset);
-		pi->app_tag = 0;
-
-		if (bi->flags & BLK_INTEGRITY_REF_TAG)
-			put_unaligned_be48(iter->seed, pi->ref_tag);
-		else
-			put_unaligned_be48(0ULL, pi->ref_tag);
-
-		iter->data_buf += iter->interval;
-		iter->prot_buf += bi->metadata_size;
-		iter->seed++;
+		memcpy(tuple, prot_buf, len);
+		kunmap_local(prot_buf);
+		bvec_iter_advance_single(bip->bip_vec, iter, len);
+		tuple_size -= len;
+		tuple += len;
 	}
 }
 
···
 	return memcmp(ref_tag, ref_escape, sizeof(ref_escape)) == 0;
 }
 
-static blk_status_t ext_pi_crc64_verify(struct blk_integrity_iter *iter,
-		struct blk_integrity *bi)
+static blk_status_t blk_verify_ext_pi(struct blk_integrity_iter *iter,
+				      struct crc64_pi_tuple *pi)
 {
-	u8 offset = bi->pi_offset;
-	unsigned int i;
+	u64 seed = lower_48_bits(iter->seed);
+	u64 guard = get_unaligned_be64(&pi->guard_tag);
+	u64 ref = get_unaligned_be48(pi->ref_tag);
+	u16 app = get_unaligned_be16(&pi->app_tag);
 
-	for (i = 0; i < iter->data_size; i += iter->interval) {
-		struct crc64_pi_tuple *pi = iter->prot_buf + offset;
-		u64 ref, seed;
-		__be64 csum;
-
-		if (bi->flags & BLK_INTEGRITY_REF_TAG) {
-			if (pi->app_tag == T10_PI_APP_ESCAPE)
-				goto next;
-
-			ref = get_unaligned_be48(pi->ref_tag);
-			seed = lower_48_bits(iter->seed);
-			if (ref != seed) {
-				pr_err("%s: ref tag error at location %llu (rcvd %llu)\n",
-					iter->disk_name, seed, ref);
-				return BLK_STS_PROTECTION;
-			}
-		} else {
-			if (pi->app_tag == T10_PI_APP_ESCAPE &&
-			    ext_pi_ref_escape(pi->ref_tag))
-				goto next;
-		}
-
-		csum = ext_pi_crc64(0, iter->data_buf, iter->interval);
-		if (offset)
-			csum = ext_pi_crc64(be64_to_cpu(csum), iter->prot_buf,
-					    offset);
-
-		if (pi->guard_tag != csum) {
-			pr_err("%s: guard tag error at sector %llu " \
-			       "(rcvd %016llx, want %016llx)\n",
-				iter->disk_name, (unsigned long long)iter->seed,
-				be64_to_cpu(pi->guard_tag), be64_to_cpu(csum));
+	if (iter->bi->flags & BLK_INTEGRITY_REF_TAG) {
+		if (app == APP_TAG_ESCAPE)
+			return BLK_STS_OK;
+		if (ref != seed) {
+			pr_err("%s: ref tag error at location %llu (rcvd %llu)\n",
+				iter->bio->bi_bdev->bd_disk->disk_name, seed,
+				ref);
 			return BLK_STS_PROTECTION;
 		}
+	} else if (app == APP_TAG_ESCAPE && ext_pi_ref_escape(pi->ref_tag)) {
+		return BLK_STS_OK;
+	}
 
-next:
-		iter->data_buf += iter->interval;
-		iter->prot_buf += bi->metadata_size;
-		iter->seed++;
+	if (guard != iter->csum) {
+		pr_err("%s: guard tag error at sector %llu (rcvd %016llx, want %016llx)\n",
+			iter->bio->bi_bdev->bd_disk->disk_name, iter->seed,
+			guard, iter->csum);
+		return BLK_STS_PROTECTION;
 	}
 
 	return BLK_STS_OK;
 }
 
-static void ext_pi_type1_prepare(struct request *rq)
+static blk_status_t blk_verify_pi(struct blk_integrity_iter *iter,
+				  struct t10_pi_tuple *pi, u16 guard)
 {
-	struct blk_integrity *bi = &rq->q->limits.integrity;
-	const int tuple_sz = bi->metadata_size;
-	u64 ref_tag = ext_pi_ref_tag(rq);
-	u8 offset = bi->pi_offset;
-	struct bio *bio;
+	u32 seed = lower_32_bits(iter->seed);
+	u32 ref = get_unaligned_be32(&pi->ref_tag);
+	u16 app = get_unaligned_be16(&pi->app_tag);
 
-	__rq_for_each_bio(bio, rq) {
-		struct bio_integrity_payload *bip = bio_integrity(bio);
-		u64 virt = lower_48_bits(bip_get_seed(bip));
-		struct bio_vec iv;
-		struct bvec_iter iter;
-
-		/* Already remapped? */
-		if (bip->bip_flags & BIP_MAPPED_INTEGRITY)
-			break;
-
-		bip_for_each_vec(iv, bip, iter) {
-			unsigned int j;
-			void *p;
-
-			p = bvec_kmap_local(&iv);
-			for (j = 0; j < iv.bv_len; j += tuple_sz) {
-				struct crc64_pi_tuple *pi = p + offset;
-				u64 ref = get_unaligned_be48(pi->ref_tag);
-
-				if (ref == virt)
-					put_unaligned_be48(ref_tag, pi->ref_tag);
-				virt++;
-				ref_tag++;
-				p += tuple_sz;
-			}
-			kunmap_local(p);
+	if (iter->bi->flags & BLK_INTEGRITY_REF_TAG) {
+		if (app == APP_TAG_ESCAPE)
+			return BLK_STS_OK;
+		if (ref != seed) {
+			pr_err("%s: ref tag error at location %u (rcvd %u)\n",
+				iter->bio->bi_bdev->bd_disk->disk_name, seed,
+				ref);
+			return BLK_STS_PROTECTION;
 		}
+	} else if (app == APP_TAG_ESCAPE && ref == REF_TAG_ESCAPE) {
+		return BLK_STS_OK;
+	}
 
-		bip->bip_flags |= BIP_MAPPED_INTEGRITY;
+	if (guard != (u16)iter->csum) {
+		pr_err("%s: guard tag error at sector %llu (rcvd %04x, want %04x)\n",
+			iter->bio->bi_bdev->bd_disk->disk_name, iter->seed,
+			guard, (u16)iter->csum);
+		return BLK_STS_PROTECTION;
+	}
+
+	return BLK_STS_OK;
+}
+
+static blk_status_t blk_verify_t10_pi(struct blk_integrity_iter *iter,
+				      struct t10_pi_tuple *pi)
+{
+	u16 guard = get_unaligned_be16(&pi->guard_tag);
+
+	return blk_verify_pi(iter, pi, guard);
+}
+
+static blk_status_t blk_verify_ip_pi(struct blk_integrity_iter *iter,
+				     struct t10_pi_tuple *pi)
+{
+	u16 guard = get_unaligned((u16 *)&pi->guard_tag);
+
+	return blk_verify_pi(iter, pi, guard);
+}
+
+static blk_status_t blk_integrity_verify(struct blk_integrity_iter *iter,
+					 union pi_tuple *tuple)
+{
+	switch (iter->bi->csum_type) {
+	case BLK_INTEGRITY_CSUM_CRC64:
+		return blk_verify_ext_pi(iter, &tuple->crc64_pi);
+	case BLK_INTEGRITY_CSUM_CRC:
+		return blk_verify_t10_pi(iter, &tuple->t10_pi);
+	case BLK_INTEGRITY_CSUM_IP:
+		return blk_verify_ip_pi(iter, &tuple->t10_pi);
+	default:
+		return BLK_STS_OK;
 	}
 }
 
-static void ext_pi_type1_complete(struct request *rq, unsigned int nr_bytes)
+static void blk_set_ext_pi(struct blk_integrity_iter *iter,
+			   struct crc64_pi_tuple *pi)
 {
-	struct blk_integrity *bi = &rq->q->limits.integrity;
-	unsigned intervals = nr_bytes >> bi->interval_exp;
-	const int tuple_sz = bi->metadata_size;
-	u64 ref_tag = ext_pi_ref_tag(rq);
-	u8 offset = bi->pi_offset;
-	struct bio *bio;
+	put_unaligned_be64(iter->csum, &pi->guard_tag);
+	put_unaligned_be16(0, &pi->app_tag);
+	put_unaligned_be48(iter->seed, &pi->ref_tag);
+}
 
-	__rq_for_each_bio(bio, rq) {
-		struct bio_integrity_payload *bip = bio_integrity(bio);
-		u64 virt = lower_48_bits(bip_get_seed(bip));
-		struct bio_vec iv;
-		struct bvec_iter iter;
+static void blk_set_pi(struct blk_integrity_iter *iter,
+		       struct t10_pi_tuple *pi, __be16 csum)
+{
+	put_unaligned(csum, &pi->guard_tag);
+	put_unaligned_be16(0, &pi->app_tag);
+	put_unaligned_be32(iter->seed, &pi->ref_tag);
+}
 
-		bip_for_each_vec(iv, bip, iter) {
-			unsigned int j;
-			void *p;
+static void blk_set_t10_pi(struct blk_integrity_iter *iter,
+			   struct t10_pi_tuple *pi)
+{
+	blk_set_pi(iter, pi, cpu_to_be16((u16)iter->csum));
+}
 
-			p = bvec_kmap_local(&iv);
-			for (j = 0; j < iv.bv_len && intervals; j += tuple_sz) {
-				struct crc64_pi_tuple *pi = p + offset;
-				u64 ref = get_unaligned_be48(pi->ref_tag);
+static void blk_set_ip_pi(struct blk_integrity_iter *iter,
+			  struct t10_pi_tuple *pi)
+{
+	blk_set_pi(iter, pi, (__force __be16)(u16)iter->csum);
+}
 
-				if (ref == ref_tag)
-					put_unaligned_be48(virt, pi->ref_tag);
-				virt++;
-				ref_tag++;
-				intervals--;
-				p += tuple_sz;
-			}
-			kunmap_local(p);
-		}
+static void blk_integrity_set(struct blk_integrity_iter *iter,
+			      union pi_tuple *tuple)
+{
+	switch (iter->bi->csum_type) {
+	case BLK_INTEGRITY_CSUM_CRC64:
+		return blk_set_ext_pi(iter, &tuple->crc64_pi);
+	case BLK_INTEGRITY_CSUM_CRC:
+		return blk_set_t10_pi(iter, &tuple->t10_pi);
+	case BLK_INTEGRITY_CSUM_IP:
+		return blk_set_ip_pi(iter, &tuple->t10_pi);
+	default:
+		WARN_ON_ONCE(1);
+		return;
 	}
+}
+
+static blk_status_t blk_integrity_interval(struct blk_integrity_iter *iter,
+					   bool verify)
+{
+	blk_status_t ret = BLK_STS_OK;
+	union pi_tuple tuple;
+	void *ptuple = &tuple;
+	struct bio_vec pbv;
+
+	blk_integrity_csum_offset(iter);
+	pbv = bvec_iter_bvec(iter->bip->bip_vec, iter->prot_iter);
+	if (pbv.bv_len >= iter->bi->pi_tuple_size) {
+		ptuple = bvec_kmap_local(&pbv);
+		bvec_iter_advance_single(iter->bip->bip_vec, &iter->prot_iter,
+				iter->bi->metadata_size - iter->bi->pi_offset);
+	} else if (verify) {
+		blk_integrity_copy_to_tuple(iter->bip, &iter->prot_iter,
+				ptuple, iter->bi->pi_tuple_size);
+	}
+
+	if (verify)
+		ret = blk_integrity_verify(iter, ptuple);
+	else
+		blk_integrity_set(iter, ptuple);
+
+	if (likely(ptuple != &tuple)) {
+		kunmap_local(ptuple);
+	} else if (!verify) {
+		blk_integrity_copy_from_tuple(iter->bip, &iter->prot_iter,
+				ptuple, iter->bi->pi_tuple_size);
+	}
+
+	iter->interval_remaining = 1 << iter->bi->interval_exp;
+	iter->csum = 0;
+	iter->seed++;
+	return ret;
+}
+
+static blk_status_t blk_integrity_iterate(struct bio *bio,
+					  struct bvec_iter *data_iter,
+					  bool verify)
+{
+	struct blk_integrity *bi = blk_get_integrity(bio->bi_bdev->bd_disk);
+	struct bio_integrity_payload *bip = bio_integrity(bio);
+	struct blk_integrity_iter iter = {
+		.bio = bio,
+		.bip = bip,
+		.bi = bi,
+		.data_iter = *data_iter,
+		.prot_iter = bip->bip_iter,
+		.interval_remaining = 1 << bi->interval_exp,
+		.seed = data_iter->bi_sector,
+		.csum = 0,
+	};
+	blk_status_t ret = BLK_STS_OK;
+
+	while (iter.data_iter.bi_size && ret == BLK_STS_OK) {
+		struct bio_vec bv = bvec_iter_bvec(iter.bio->bi_io_vec,
+						   iter.data_iter);
+		void *kaddr = bvec_kmap_local(&bv);
+		void *data = kaddr;
+		unsigned int len;
+
+		bvec_iter_advance_single(iter.bio->bi_io_vec, &iter.data_iter,
+					 bv.bv_len);
+		while (bv.bv_len && ret == BLK_STS_OK) {
+			len = min(iter.interval_remaining, bv.bv_len);
+			blk_calculate_guard(&iter, data, len);
+			bv.bv_len -= len;
+			data += len;
+			iter.interval_remaining -= len;
+			if (!iter.interval_remaining)
+				ret = blk_integrity_interval(&iter, verify);
+		}
+		kunmap_local(kaddr);
+	}
+
+	return ret;
 }
 
 void bio_integrity_generate(struct bio *bio)
 {
 	struct blk_integrity *bi = blk_get_integrity(bio->bi_bdev->bd_disk);
-	struct bio_integrity_payload *bip = bio_integrity(bio);
-	struct blk_integrity_iter iter;
-	struct bvec_iter bviter;
-	struct bio_vec bv;
 
-	iter.disk_name = bio->bi_bdev->bd_disk->disk_name;
-	iter.interval = 1 << bi->interval_exp;
-	iter.seed = bio->bi_iter.bi_sector;
-	iter.prot_buf = bvec_virt(bip->bip_vec);
-	bio_for_each_segment(bv, bio, bviter) {
-		void *kaddr = bvec_kmap_local(&bv);
-
-		iter.data_buf = kaddr;
-		iter.data_size = bv.bv_len;
-		switch (bi->csum_type) {
-		case BLK_INTEGRITY_CSUM_CRC64:
-			ext_pi_crc64_generate(&iter, bi);
-			break;
-		case BLK_INTEGRITY_CSUM_CRC:
-		case BLK_INTEGRITY_CSUM_IP:
-			t10_pi_generate(&iter, bi);
-			break;
-		default:
-			break;
-		}
-		kunmap_local(kaddr);
+	switch (bi->csum_type) {
+	case BLK_INTEGRITY_CSUM_CRC64:
+	case BLK_INTEGRITY_CSUM_CRC:
+	case BLK_INTEGRITY_CSUM_IP:
+		blk_integrity_iterate(bio, &bio->bi_iter, false);
+		break;
+	default:
+		break;
 	}
 }
 
 blk_status_t bio_integrity_verify(struct bio *bio, struct bvec_iter *saved_iter)
 {
 	struct blk_integrity *bi = blk_get_integrity(bio->bi_bdev->bd_disk);
-	struct bio_integrity_payload *bip = bio_integrity(bio);
-	struct blk_integrity_iter iter;
-	struct bvec_iter bviter;
-	struct bio_vec bv;
 
-	/*
-	 * At the moment verify is called bi_iter has been advanced during split
-	 * and completion, so use the copy created during submission here.
-	 */
-	iter.disk_name = bio->bi_bdev->bd_disk->disk_name;
-	iter.interval = 1 << bi->interval_exp;
-	iter.seed = saved_iter->bi_sector;
-	iter.prot_buf = bvec_virt(bip->bip_vec);
-	__bio_for_each_segment(bv, bio, bviter, *saved_iter) {
-		void *kaddr = bvec_kmap_local(&bv);
-		blk_status_t ret = BLK_STS_OK;
-
-		iter.data_buf = kaddr;
-		iter.data_size = bv.bv_len;
-		switch (bi->csum_type) {
-		case BLK_INTEGRITY_CSUM_CRC64:
-			ret = ext_pi_crc64_verify(&iter, bi);
-			break;
-		case BLK_INTEGRITY_CSUM_CRC:
-		case BLK_INTEGRITY_CSUM_IP:
-			ret = t10_pi_verify(&iter, bi);
-			break;
-		default:
-			break;
-		}
-		kunmap_local(kaddr);
-
-		if (ret)
-			return ret;
+	switch (bi->csum_type) {
+	case BLK_INTEGRITY_CSUM_CRC64:
+	case BLK_INTEGRITY_CSUM_CRC:
+	case BLK_INTEGRITY_CSUM_IP:
+		return blk_integrity_iterate(bio, saved_iter, true);
+	default:
+		break;
 	}
 
 	return BLK_STS_OK;
 }
 
-void blk_integrity_prepare(struct request *rq)
+/*
+ * Advance @iter past the protection offset for protection formats that
+ * contain front padding on the metadata region.
+ */
+static void blk_pi_advance_offset(struct blk_integrity *bi,
+				  struct bio_integrity_payload *bip,
+				  struct bvec_iter *iter)
+{
+	unsigned int offset = bi->pi_offset;
+
+	while (offset > 0) {
+		struct bio_vec bv = mp_bvec_iter_bvec(bip->bip_vec, *iter);
+		unsigned int len = min(bv.bv_len, offset);
+
+		bvec_iter_advance_single(bip->bip_vec, iter, len);
+		offset -= len;
+	}
+}
+
+static void *blk_tuple_remap_begin(union pi_tuple *tuple,
+				   struct blk_integrity *bi,
+				   struct bio_integrity_payload *bip,
+				   struct bvec_iter *iter)
+{
+	struct bvec_iter titer;
+	struct bio_vec pbv;
+
+	blk_pi_advance_offset(bi, bip, iter);
+	pbv = bvec_iter_bvec(bip->bip_vec, *iter);
+	if (likely(pbv.bv_len >= bi->pi_tuple_size))
+		return bvec_kmap_local(&pbv);
+
+	/*
+	 * We need to preserve the state of the original iter for the
+	 * copy_from_tuple at the end, so make a temp iter for here.
+	 */
+	titer = *iter;
+	blk_integrity_copy_to_tuple(bip, &titer, tuple, bi->pi_tuple_size);
+	return tuple;
+}
+
+static void blk_tuple_remap_end(union pi_tuple *tuple, void *ptuple,
+				struct blk_integrity *bi,
+				struct bio_integrity_payload *bip,
+				struct bvec_iter *iter)
+{
+	unsigned int len = bi->metadata_size - bi->pi_offset;
+
+	if (likely(ptuple != tuple)) {
+		kunmap_local(ptuple);
+	} else {
+		blk_integrity_copy_from_tuple(bip, iter, ptuple,
+				bi->pi_tuple_size);
+		len -= bi->pi_tuple_size;
+	}
+
+	bvec_iter_advance(bip->bip_vec, iter, len);
+}
+
+static void blk_set_ext_unmap_ref(struct crc64_pi_tuple *pi, u64 virt,
+				  u64 ref_tag)
+{
+	u64 ref = get_unaligned_be48(&pi->ref_tag);
+
+	if (ref == lower_48_bits(ref_tag) && ref != lower_48_bits(virt))
+		put_unaligned_be48(virt, pi->ref_tag);
+}
+
+static void blk_set_t10_unmap_ref(struct t10_pi_tuple *pi, u32 virt,
+				  u32 ref_tag)
+{
+	u32 ref = get_unaligned_be32(&pi->ref_tag);
+
+	if (ref == ref_tag && ref != virt)
+		put_unaligned_be32(virt, &pi->ref_tag);
+}
+
+static void blk_reftag_remap_complete(struct blk_integrity *bi,
+				      union pi_tuple *tuple, u64 virt, u64 ref)
+{
+	switch (bi->csum_type) {
+	case BLK_INTEGRITY_CSUM_CRC64:
+		blk_set_ext_unmap_ref(&tuple->crc64_pi, virt, ref);
+		break;
+	case BLK_INTEGRITY_CSUM_CRC:
+	case BLK_INTEGRITY_CSUM_IP:
+		blk_set_t10_unmap_ref(&tuple->t10_pi, virt, ref);
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		break;
+	}
+}
+
+static void blk_set_ext_map_ref(struct crc64_pi_tuple *pi, u64 virt,
+				u64 ref_tag)
+{
+	u64 ref = get_unaligned_be48(&pi->ref_tag);
+
+	if (ref == lower_48_bits(virt) && ref != ref_tag)
+		put_unaligned_be48(ref_tag, pi->ref_tag);
+}
+
+static void blk_set_t10_map_ref(struct t10_pi_tuple *pi, u32 virt, u32 ref_tag)
+{
+	u32 ref = get_unaligned_be32(&pi->ref_tag);
+
+	if (ref == virt && ref != ref_tag)
+		put_unaligned_be32(ref_tag, &pi->ref_tag);
+}
+
+static void blk_reftag_remap_prepare(struct blk_integrity *bi,
+				     union pi_tuple *tuple,
+				     u64 virt, u64 ref)
+{
+	switch (bi->csum_type) {
+	case BLK_INTEGRITY_CSUM_CRC64:
+		blk_set_ext_map_ref(&tuple->crc64_pi, virt, ref);
+		break;
+	case BLK_INTEGRITY_CSUM_CRC:
+	case BLK_INTEGRITY_CSUM_IP:
+		blk_set_t10_map_ref(&tuple->t10_pi, virt, ref);
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		break;
+	}
+}
+
+static void __blk_reftag_remap(struct bio *bio, struct blk_integrity *bi,
+			       unsigned *intervals, u64 *ref, bool prep)
+{
+	struct bio_integrity_payload *bip = bio_integrity(bio);
+	struct bvec_iter iter = bip->bip_iter;
+	u64 virt = bip_get_seed(bip);
+	union pi_tuple *ptuple;
+	union pi_tuple tuple;
+
+	if (prep && bip->bip_flags & BIP_MAPPED_INTEGRITY) {
+		*ref += bio->bi_iter.bi_size >> bi->interval_exp;
+		return;
+	}
+
+	while (iter.bi_size && *intervals) {
+		ptuple = blk_tuple_remap_begin(&tuple, bi, bip, &iter);
+
+		if (prep)
+			blk_reftag_remap_prepare(bi, ptuple, virt, *ref);
+		else
+			blk_reftag_remap_complete(bi, ptuple, virt, *ref);
+
+		blk_tuple_remap_end(&tuple, ptuple, bi, bip, &iter);
+		(*intervals)--;
+		(*ref)++;
+		virt++;
+	}
+
+	if (prep)
+		bip->bip_flags |= BIP_MAPPED_INTEGRITY;
+}
+
+static void blk_integrity_remap(struct request *rq, unsigned int nr_bytes,
+				bool prep)
 {
 	struct blk_integrity *bi = &rq->q->limits.integrity;
+	u64 ref = blk_rq_pos(rq) >> (bi->interval_exp - SECTOR_SHIFT);
+	unsigned intervals = nr_bytes >> bi->interval_exp;
+	struct bio *bio;
 
 	if (!(bi->flags & BLK_INTEGRITY_REF_TAG))
 		return;
 
-	if (bi->csum_type == BLK_INTEGRITY_CSUM_CRC64)
-		ext_pi_type1_prepare(rq);
-	else
-		t10_pi_type1_prepare(rq);
+	__rq_for_each_bio(bio, rq) {
+		__blk_reftag_remap(bio, bi, &intervals, &ref, prep);
+		if (!intervals)
+			break;
+	}
+}
+
+void blk_integrity_prepare(struct request *rq)
+{
+	blk_integrity_remap(rq, blk_rq_bytes(rq), true);
 }
 
 void blk_integrity_complete(struct request *rq, unsigned int nr_bytes)
 {
-	struct blk_integrity *bi = &rq->q->limits.integrity;
-
-	if (!(bi->flags & BLK_INTEGRITY_REF_TAG))
-		return;
-
-	if (bi->csum_type == BLK_INTEGRITY_CSUM_CRC64)
-		ext_pi_type1_complete(rq, nr_bytes);
-	else
-		t10_pi_type1_complete(rq, nr_bytes);
+	blk_integrity_remap(rq, nr_bytes, false);
 }
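The reference-tag remapping in block/t10-pi.c above applies one symmetric rule per integrity interval. On prepare, a tag equal to the virtual value is rewritten to the physical one; on complete, a tag equal to the physical value is rewritten back. In both directions the store is skipped when the two values already agree. The following is a minimal user-space sketch of that rule for 32-bit T10 reference tags only. It assumes a flat array of 4-byte big-endian tags with no guard or app tag, no pi_offset padding, and no tuples split across segments; remap_ref_tags, load_be32 and store_be32 are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 32-bit value from a byte array. */
static uint32_t load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Write a 32-bit value to a byte array in big-endian order. */
static void store_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/*
 * Remap n consecutive ref tags in place (hypothetical helper).
 * prep != 0: virtual -> physical, as in blk_set_t10_map_ref().
 * prep == 0: physical -> virtual, as in blk_set_t10_unmap_ref().
 * Tags that do not match the expected source value are left untouched.
 */
static void remap_ref_tags(uint8_t *tags, size_t n, uint32_t virt,
			   uint32_t phys, int prep)
{
	assert(tags != NULL || n == 0);

	for (size_t i = 0; i < n; i++, virt++, phys++) {
		uint8_t *p = tags + 4 * i;
		uint32_t ref = load_be32(p);
		uint32_t from = prep ? virt : phys;
		uint32_t to = prep ? phys : virt;

		if (ref == from && ref != to)
			store_be32(p, to);
	}
}
```

Running the helper with prep set and then cleared over the same range restores the original tags, which is the round trip that blk_integrity_prepare() and blk_integrity_complete() rely on.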
-6
crypto/Kconfig
···
 	select CRYPTO_ALGAPI
 	select CRYPTO_ACOMP2
 
-config CRYPTO_HKDF
-	tristate
-	select CRYPTO_SHA256 if CRYPTO_SELFTESTS
-	select CRYPTO_SHA512 if CRYPTO_SELFTESTS
-	select CRYPTO_HASH2
-
 config CRYPTO_MANAGER
 	tristate
 	default CRYPTO_ALGAPI if CRYPTO_SELFTESTS
-1
crypto/Makefile
···
 obj-$(CONFIG_CRYPTO_AKCIPHER2) += akcipher.o
 obj-$(CONFIG_CRYPTO_SIG2) += sig.o
 obj-$(CONFIG_CRYPTO_KPP2) += kpp.o
-obj-$(CONFIG_CRYPTO_HKDF) += hkdf.o
 
 dh_generic-y := dh.o
 dh_generic-y += dh_helper.o
-573
crypto/hkdf.c
···
-// SPDX-License-Identifier: GPL-2.0
-/*
- * Implementation of HKDF ("HMAC-based Extract-and-Expand Key Derivation
- * Function"), aka RFC 5869. See also the original paper (Krawczyk 2010):
- * "Cryptographic Extraction and Key Derivation: The HKDF Scheme".
- *
- * Copyright 2019 Google LLC
- */
-
-#include <crypto/internal/hash.h>
-#include <crypto/sha2.h>
-#include <crypto/hkdf.h>
-#include <linux/module.h>
-
-/*
- * HKDF consists of two steps:
- *
- * 1. HKDF-Extract: extract a pseudorandom key from the input keying material
- *    and optional salt.
- * 2. HKDF-Expand: expand the pseudorandom key into output keying material of
- *    any length, parameterized by an application-specific info string.
- *
- */
-
-/**
- * hkdf_extract - HKDF-Extract (RFC 5869 section 2.2)
- * @hmac_tfm: an HMAC transform using the hash function desired for HKDF. The
- *            caller is responsible for setting the @prk afterwards.
- * @ikm: input keying material
- * @ikmlen: length of @ikm
- * @salt: input salt value
- * @saltlen: length of @salt
- * @prk: resulting pseudorandom key
- *
- * Extracts a pseudorandom key @prk from the input keying material
- * @ikm with length @ikmlen and salt @salt with length @saltlen.
- * The length of @prk is given by the digest size of @hmac_tfm.
- * For an 'unsalted' version of HKDF-Extract @salt must be set
- * to all zeroes and @saltlen must be set to the length of @prk.
- *
- * Returns 0 on success with the pseudorandom key stored in @prk,
- * or a negative errno value otherwise.
- */
-int hkdf_extract(struct crypto_shash *hmac_tfm, const u8 *ikm,
-		 unsigned int ikmlen, const u8 *salt, unsigned int saltlen,
-		 u8 *prk)
-{
-	int err;
-
-	err = crypto_shash_setkey(hmac_tfm, salt, saltlen);
-	if (!err)
-		err = crypto_shash_tfm_digest(hmac_tfm, ikm, ikmlen, prk);
-
-	return err;
-}
-EXPORT_SYMBOL_GPL(hkdf_extract);
-
-/**
- * hkdf_expand - HKDF-Expand (RFC 5869 section 2.3)
- * @hmac_tfm: hash context keyed with pseudorandom key
- * @info: application-specific information
- * @infolen: length of @info
- * @okm: output keying material
- * @okmlen: length of @okm
- *
- * This expands the pseudorandom key, which was already keyed into @hmac_tfm,
- * into @okmlen bytes of output keying material parameterized by the
- * application-specific @info of length @infolen bytes.
- * This is thread-safe and may be called by multiple threads in parallel.
- *
- * Returns 0 on success with output keying material stored in @okm,
- * or a negative errno value otherwise.
- */
-int hkdf_expand(struct crypto_shash *hmac_tfm,
-		const u8 *info, unsigned int infolen,
-		u8 *okm, unsigned int okmlen)
-{
-	SHASH_DESC_ON_STACK(desc, hmac_tfm);
-	unsigned int i, hashlen = crypto_shash_digestsize(hmac_tfm);
-	int err;
-	const u8 *prev = NULL;
-	u8 counter = 1;
-	u8 tmp[HASH_MAX_DIGESTSIZE] = {};
-
-	if (WARN_ON(okmlen > 255 * hashlen))
-		return -EINVAL;
-
-	desc->tfm = hmac_tfm;
-
-	for (i = 0; i < okmlen; i += hashlen) {
-		err = crypto_shash_init(desc);
-		if (err)
-			goto out;
-
-		if (prev) {
-			err = crypto_shash_update(desc, prev, hashlen);
-			if (err)
-				goto out;
-		}
-
-		if (infolen) {
-			err = crypto_shash_update(desc, info, infolen);
-			if (err)
-				goto out;
-		}
-
-		BUILD_BUG_ON(sizeof(counter) != 1);
-		if (okmlen - i < hashlen) {
-			err = crypto_shash_finup(desc, &counter, 1, tmp);
-			if (err)
-				goto out;
-			memcpy(&okm[i], tmp, okmlen - i);
-			memzero_explicit(tmp, sizeof(tmp));
-		} else {
-			err = crypto_shash_finup(desc, &counter, 1, &okm[i]);
-			if (err)
-				goto out;
-		}
-		counter++;
-		prev = &okm[i];
-	}
-	err = 0;
-out:
-	if (unlikely(err))
-		memzero_explicit(okm, okmlen); /* so caller doesn't need to */
-	shash_desc_zero(desc);
-	memzero_explicit(tmp, HASH_MAX_DIGESTSIZE);
-	return err;
-}
-EXPORT_SYMBOL_GPL(hkdf_expand);
-
-struct hkdf_testvec {
-	const char *test;
-	const u8 *ikm;
-	const u8 *salt;
-	const u8 *info;
-	const u8 *prk;
-	const u8 *okm;
-	u16 ikm_size;
-	u16 salt_size;
-	u16 info_size;
-	u16 prk_size;
-	u16 okm_size;
-};
-
-/*
- * HKDF test vectors from RFC5869
- *
- * Additional HKDF test vectors from
- * https://github.com/brycx/Test-Vector-Generation/blob/master/HKDF/hkdf-hmac-sha2-test-vectors.md
- */
-static const struct
hkdf_testvec hkdf_sha256_tv[] = { 153 - { 154 - .test = "basic hdkf test", 155 - .ikm = "\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b" 156 - "\x0b\x0b\x0b\x0b\x0b\x0b", 157 - .ikm_size = 22, 158 - .salt = "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c", 159 - .salt_size = 13, 160 - .info = "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9", 161 - .info_size = 10, 162 - .prk = "\x07\x77\x09\x36\x2c\x2e\x32\xdf\x0d\xdc\x3f\x0d\xc4\x7b\xba\x63" 163 - "\x90\xb6\xc7\x3b\xb5\x0f\x9c\x31\x22\xec\x84\x4a\xd7\xc2\xb3\xe5", 164 - .prk_size = 32, 165 - .okm = "\x3c\xb2\x5f\x25\xfa\xac\xd5\x7a\x90\x43\x4f\x64\xd0\x36\x2f\x2a" 166 - "\x2d\x2d\x0a\x90\xcf\x1a\x5a\x4c\x5d\xb0\x2d\x56\xec\xc4\xc5\xbf" 167 - "\x34\x00\x72\x08\xd5\xb8\x87\x18\x58\x65", 168 - .okm_size = 42, 169 - }, { 170 - .test = "hkdf test with long input", 171 - .ikm = "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" 172 - "\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" 173 - "\x20\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f" 174 - "\x30\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f" 175 - "\x40\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f", 176 - .ikm_size = 80, 177 - .salt = "\x60\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" 178 - "\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f" 179 - "\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f" 180 - "\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f" 181 - "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf", 182 - .salt_size = 80, 183 - .info = "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf" 184 - "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf" 185 - "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf" 186 - "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef" 187 - 
"\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff", 188 - .info_size = 80, 189 - .prk = "\x06\xa6\xb8\x8c\x58\x53\x36\x1a\x06\x10\x4c\x9c\xeb\x35\xb4\x5c" 190 - "\xef\x76\x00\x14\x90\x46\x71\x01\x4a\x19\x3f\x40\xc1\x5f\xc2\x44", 191 - .prk_size = 32, 192 - .okm = "\xb1\x1e\x39\x8d\xc8\x03\x27\xa1\xc8\xe7\xf7\x8c\x59\x6a\x49\x34" 193 - "\x4f\x01\x2e\xda\x2d\x4e\xfa\xd8\xa0\x50\xcc\x4c\x19\xaf\xa9\x7c" 194 - "\x59\x04\x5a\x99\xca\xc7\x82\x72\x71\xcb\x41\xc6\x5e\x59\x0e\x09" 195 - "\xda\x32\x75\x60\x0c\x2f\x09\xb8\x36\x77\x93\xa9\xac\xa3\xdb\x71" 196 - "\xcc\x30\xc5\x81\x79\xec\x3e\x87\xc1\x4c\x01\xd5\xc1\xf3\x43\x4f" 197 - "\x1d\x87", 198 - .okm_size = 82, 199 - }, { 200 - .test = "hkdf test with zero salt and info", 201 - .ikm = "\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b" 202 - "\x0b\x0b\x0b\x0b\x0b\x0b", 203 - .ikm_size = 22, 204 - .salt = NULL, 205 - .salt_size = 0, 206 - .info = NULL, 207 - .info_size = 0, 208 - .prk = "\x19\xef\x24\xa3\x2c\x71\x7b\x16\x7f\x33\xa9\x1d\x6f\x64\x8b\xdf" 209 - "\x96\x59\x67\x76\xaf\xdb\x63\x77\xac\x43\x4c\x1c\x29\x3c\xcb\x04", 210 - .prk_size = 32, 211 - .okm = "\x8d\xa4\xe7\x75\xa5\x63\xc1\x8f\x71\x5f\x80\x2a\x06\x3c\x5a\x31" 212 - "\xb8\xa1\x1f\x5c\x5e\xe1\x87\x9e\xc3\x45\x4e\x5f\x3c\x73\x8d\x2d" 213 - "\x9d\x20\x13\x95\xfa\xa4\xb6\x1a\x96\xc8", 214 - .okm_size = 42, 215 - }, { 216 - .test = "hkdf test with short input", 217 - .ikm = "\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b", 218 - .ikm_size = 11, 219 - .salt = "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c", 220 - .salt_size = 13, 221 - .info = "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9", 222 - .info_size = 10, 223 - .prk = "\x82\x65\xf6\x9d\x7f\xf7\xe5\x01\x37\x93\x01\x5c\xa0\xef\x92\x0c" 224 - "\xb1\x68\x21\x99\xc8\xbc\x3a\x00\xda\x0c\xab\x47\xb7\xb0\x0f\xdf", 225 - .prk_size = 32, 226 - .okm = "\x58\xdc\xe1\x0d\x58\x01\xcd\xfd\xa8\x31\x72\x6b\xfe\xbc\xb7\x43" 227 - 
"\xd1\x4a\x7e\xe8\x3a\xa0\x57\xa9\x3d\x59\xb0\xa1\x31\x7f\xf0\x9d" 228 - "\x10\x5c\xce\xcf\x53\x56\x92\xb1\x4d\xd5", 229 - .okm_size = 42, 230 - }, { 231 - .test = "unsalted hkdf test with zero info", 232 - .ikm = "\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c" 233 - "\x0c\x0c\x0c\x0c\x0c\x0c", 234 - .ikm_size = 22, 235 - .salt = "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00" 236 - "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", 237 - .salt_size = 32, 238 - .info = NULL, 239 - .info_size = 0, 240 - .prk = "\xaa\x84\x1e\x1f\x35\x74\xf3\x2d\x13\xfb\xa8\x00\x5f\xcd\x9b\x8d" 241 - "\x77\x67\x82\xa5\xdf\xa1\x92\x38\x92\xfd\x8b\x63\x5d\x3a\x89\xdf", 242 - .prk_size = 32, 243 - .okm = "\x59\x68\x99\x17\x9a\xb1\xbc\x00\xa7\xc0\x37\x86\xff\x43\xee\x53" 244 - "\x50\x04\xbe\x2b\xb9\xbe\x68\xbc\x14\x06\x63\x6f\x54\xbd\x33\x8a" 245 - "\x66\xa2\x37\xba\x2a\xcb\xce\xe3\xc9\xa7", 246 - .okm_size = 42, 247 - } 248 - }; 249 - 250 - static const struct hkdf_testvec hkdf_sha384_tv[] = { 251 - { 252 - .test = "basic hkdf test", 253 - .ikm = "\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b" 254 - "\x0b\x0b\x0b\x0b\x0b\x0b", 255 - .ikm_size = 22, 256 - .salt = "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c", 257 - .salt_size = 13, 258 - .info = "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9", 259 - .info_size = 10, 260 - .prk = "\x70\x4b\x39\x99\x07\x79\xce\x1d\xc5\x48\x05\x2c\x7d\xc3\x9f\x30" 261 - "\x35\x70\xdd\x13\xfb\x39\xf7\xac\xc5\x64\x68\x0b\xef\x80\xe8\xde" 262 - "\xc7\x0e\xe9\xa7\xe1\xf3\xe2\x93\xef\x68\xec\xeb\x07\x2a\x5a\xde", 263 - .prk_size = 48, 264 - .okm = "\x9b\x50\x97\xa8\x60\x38\xb8\x05\x30\x90\x76\xa4\x4b\x3a\x9f\x38" 265 - "\x06\x3e\x25\xb5\x16\xdc\xbf\x36\x9f\x39\x4c\xfa\xb4\x36\x85\xf7" 266 - "\x48\xb6\x45\x77\x63\xe4\xf0\x20\x4f\xc5", 267 - .okm_size = 42, 268 - }, { 269 - .test = "hkdf test with long input", 270 - .ikm = "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" 
271 - "\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" 272 - "\x20\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f" 273 - "\x30\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f" 274 - "\x40\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f", 275 - .ikm_size = 80, 276 - .salt = "\x60\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" 277 - "\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f" 278 - "\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f" 279 - "\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f" 280 - "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf", 281 - .salt_size = 80, 282 - .info = "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf" 283 - "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf" 284 - "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf" 285 - "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef" 286 - "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff", 287 - .info_size = 80, 288 - .prk = "\xb3\x19\xf6\x83\x1d\xff\x93\x14\xef\xb6\x43\xba\xa2\x92\x63\xb3" 289 - "\x0e\x4a\x8d\x77\x9f\xe3\x1e\x9c\x90\x1e\xfd\x7d\xe7\x37\xc8\x5b" 290 - "\x62\xe6\x76\xd4\xdc\x87\xb0\x89\x5c\x6a\x7d\xc9\x7b\x52\xce\xbb", 291 - .prk_size = 48, 292 - .okm = "\x48\x4c\xa0\x52\xb8\xcc\x72\x4f\xd1\xc4\xec\x64\xd5\x7b\x4e\x81" 293 - "\x8c\x7e\x25\xa8\xe0\xf4\x56\x9e\xd7\x2a\x6a\x05\xfe\x06\x49\xee" 294 - "\xbf\x69\xf8\xd5\xc8\x32\x85\x6b\xf4\xe4\xfb\xc1\x79\x67\xd5\x49" 295 - "\x75\x32\x4a\x94\x98\x7f\x7f\x41\x83\x58\x17\xd8\x99\x4f\xdb\xd6" 296 - "\xf4\xc0\x9c\x55\x00\xdc\xa2\x4a\x56\x22\x2f\xea\x53\xd8\x96\x7a" 297 - "\x8b\x2e", 298 - .okm_size = 82, 299 - }, { 300 - .test = "hkdf test with zero salt and info", 301 - .ikm = "\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b" 302 - "\x0b\x0b\x0b\x0b\x0b\x0b", 303 - .ikm_size = 22, 304 - .salt = NULL, 305 - 
.salt_size = 0, 306 - .info = NULL, 307 - .info_size = 0, 308 - .prk = "\x10\xe4\x0c\xf0\x72\xa4\xc5\x62\x6e\x43\xdd\x22\xc1\xcf\x72\x7d" 309 - "\x4b\xb1\x40\x97\x5c\x9a\xd0\xcb\xc8\xe4\x5b\x40\x06\x8f\x8f\x0b" 310 - "\xa5\x7c\xdb\x59\x8a\xf9\xdf\xa6\x96\x3a\x96\x89\x9a\xf0\x47\xe5", 311 - .prk_size = 48, 312 - .okm = "\xc8\xc9\x6e\x71\x0f\x89\xb0\xd7\x99\x0b\xca\x68\xbc\xde\xc8\xcf" 313 - "\x85\x40\x62\xe5\x4c\x73\xa7\xab\xc7\x43\xfa\xde\x9b\x24\x2d\xaa" 314 - "\xcc\x1c\xea\x56\x70\x41\x5b\x52\x84\x9c", 315 - .okm_size = 42, 316 - }, { 317 - .test = "hkdf test with short input", 318 - .ikm = "\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b", 319 - .ikm_size = 11, 320 - .salt = "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c", 321 - .salt_size = 13, 322 - .info = "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9", 323 - .info_size = 10, 324 - .prk = "\x6d\x31\x69\x98\x28\x79\x80\x88\xb3\x59\xda\xd5\x0b\x8f\x01\xb0" 325 - "\x15\xf1\x7a\xa3\xbd\x4e\x27\xa6\xe9\xf8\x73\xb7\x15\x85\xca\x6a" 326 - "\x00\xd1\xf0\x82\x12\x8a\xdb\x3c\xf0\x53\x0b\x57\xc0\xf9\xac\x72", 327 - .prk_size = 48, 328 - .okm = "\xfb\x7e\x67\x43\xeb\x42\xcd\xe9\x6f\x1b\x70\x77\x89\x52\xab\x75" 329 - "\x48\xca\xfe\x53\x24\x9f\x7f\xfe\x14\x97\xa1\x63\x5b\x20\x1f\xf1" 330 - "\x85\xb9\x3e\x95\x19\x92\xd8\x58\xf1\x1a", 331 - .okm_size = 42, 332 - }, { 333 - .test = "unsalted hkdf test with zero info", 334 - .ikm = "\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c" 335 - "\x0c\x0c\x0c\x0c\x0c\x0c", 336 - .ikm_size = 22, 337 - .salt = "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00" 338 - "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00" 339 - "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", 340 - .salt_size = 48, 341 - .info = NULL, 342 - .info_size = 0, 343 - .prk = "\x9d\x2d\xa5\x06\x6f\x05\xd1\x6c\x59\xfe\xdf\x6c\x5f\x32\xc7\x5e" 344 - "\xda\x9a\x47\xa7\x9c\x93\x6a\xa4\x4c\xb7\x63\xa8\xe2\x2f\xfb\xfc" 345 - 
"\xd8\xfe\x55\x43\x58\x53\x47\x21\x90\x39\xd1\x68\x28\x36\x33\xf5", 346 - .prk_size = 48, 347 - .okm = "\x6a\xd7\xc7\x26\xc8\x40\x09\x54\x6a\x76\xe0\x54\x5d\xf2\x66\x78" 348 - "\x7e\x2b\x2c\xd6\xca\x43\x73\xa1\xf3\x14\x50\xa7\xbd\xf9\x48\x2b" 349 - "\xfa\xb8\x11\xf5\x54\x20\x0e\xad\x8f\x53", 350 - .okm_size = 42, 351 - } 352 - }; 353 - 354 - static const struct hkdf_testvec hkdf_sha512_tv[] = { 355 - { 356 - .test = "basic hkdf test", 357 - .ikm = "\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b" 358 - "\x0b\x0b\x0b\x0b\x0b\x0b", 359 - .ikm_size = 22, 360 - .salt = "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c", 361 - .salt_size = 13, 362 - .info = "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9", 363 - .info_size = 10, 364 - .prk = "\x66\x57\x99\x82\x37\x37\xde\xd0\x4a\x88\xe4\x7e\x54\xa5\x89\x0b" 365 - "\xb2\xc3\xd2\x47\xc7\xa4\x25\x4a\x8e\x61\x35\x07\x23\x59\x0a\x26" 366 - "\xc3\x62\x38\x12\x7d\x86\x61\xb8\x8c\xf8\x0e\xf8\x02\xd5\x7e\x2f" 367 - "\x7c\xeb\xcf\x1e\x00\xe0\x83\x84\x8b\xe1\x99\x29\xc6\x1b\x42\x37", 368 - .prk_size = 64, 369 - .okm = "\x83\x23\x90\x08\x6c\xda\x71\xfb\x47\x62\x5b\xb5\xce\xb1\x68\xe4" 370 - "\xc8\xe2\x6a\x1a\x16\xed\x34\xd9\xfc\x7f\xe9\x2c\x14\x81\x57\x93" 371 - "\x38\xda\x36\x2c\xb8\xd9\xf9\x25\xd7\xcb", 372 - .okm_size = 42, 373 - }, { 374 - .test = "hkdf test with long input", 375 - .ikm = "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" 376 - "\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" 377 - "\x20\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f" 378 - "\x30\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f" 379 - "\x40\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f", 380 - .ikm_size = 80, 381 - .salt = "\x60\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" 382 - "\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f" 383 - "\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f" 384 - 
"\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f" 385 - "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf", 386 - .salt_size = 80, 387 - .info = "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf" 388 - "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf" 389 - "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf" 390 - "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef" 391 - "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff", 392 - .info_size = 80, 393 - .prk = "\x35\x67\x25\x42\x90\x7d\x4e\x14\x2c\x00\xe8\x44\x99\xe7\x4e\x1d" 394 - "\xe0\x8b\xe8\x65\x35\xf9\x24\xe0\x22\x80\x4a\xd7\x75\xdd\xe2\x7e" 395 - "\xc8\x6c\xd1\xe5\xb7\xd1\x78\xc7\x44\x89\xbd\xbe\xb3\x07\x12\xbe" 396 - "\xb8\x2d\x4f\x97\x41\x6c\x5a\x94\xea\x81\xeb\xdf\x3e\x62\x9e\x4a", 397 - .prk_size = 64, 398 - .okm = "\xce\x6c\x97\x19\x28\x05\xb3\x46\xe6\x16\x1e\x82\x1e\xd1\x65\x67" 399 - "\x3b\x84\xf4\x00\xa2\xb5\x14\xb2\xfe\x23\xd8\x4c\xd1\x89\xdd\xf1" 400 - "\xb6\x95\xb4\x8c\xbd\x1c\x83\x88\x44\x11\x37\xb3\xce\x28\xf1\x6a" 401 - "\xa6\x4b\xa3\x3b\xa4\x66\xb2\x4d\xf6\xcf\xcb\x02\x1e\xcf\xf2\x35" 402 - "\xf6\xa2\x05\x6c\xe3\xaf\x1d\xe4\x4d\x57\x20\x97\xa8\x50\x5d\x9e" 403 - "\x7a\x93", 404 - .okm_size = 82, 405 - }, { 406 - .test = "hkdf test with zero salt and info", 407 - .ikm = "\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b" 408 - "\x0b\x0b\x0b\x0b\x0b\x0b", 409 - .ikm_size = 22, 410 - .salt = NULL, 411 - .salt_size = 0, 412 - .info = NULL, 413 - .info_size = 0, 414 - .prk = "\xfd\x20\x0c\x49\x87\xac\x49\x13\x13\xbd\x4a\x2a\x13\x28\x71\x21" 415 - "\x24\x72\x39\xe1\x1c\x9e\xf8\x28\x02\x04\x4b\x66\xef\x35\x7e\x5b" 416 - "\x19\x44\x98\xd0\x68\x26\x11\x38\x23\x48\x57\x2a\x7b\x16\x11\xde" 417 - "\x54\x76\x40\x94\x28\x63\x20\x57\x8a\x86\x3f\x36\x56\x2b\x0d\xf6", 418 - .prk_size = 64, 419 - .okm = "\xf5\xfa\x02\xb1\x82\x98\xa7\x2a\x8c\x23\x89\x8a\x87\x03\x47\x2c" 420 - 
"\x6e\xb1\x79\xdc\x20\x4c\x03\x42\x5c\x97\x0e\x3b\x16\x4b\xf9\x0f" 421 - "\xff\x22\xd0\x48\x36\xd0\xe2\x34\x3b\xac", 422 - .okm_size = 42, 423 - }, { 424 - .test = "hkdf test with short input", 425 - .ikm = "\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b\x0b", 426 - .ikm_size = 11, 427 - .salt = "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c", 428 - .salt_size = 13, 429 - .info = "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9", 430 - .info_size = 10, 431 - .prk = "\x67\x40\x9c\x9c\xac\x28\xb5\x2e\xe9\xfa\xd9\x1c\x2f\xda\x99\x9f" 432 - "\x7c\xa2\x2e\x34\x34\xf0\xae\x77\x28\x63\x83\x65\x68\xad\x6a\x7f" 433 - "\x10\xcf\x11\x3b\xfd\xdd\x56\x01\x29\xa5\x94\xa8\xf5\x23\x85\xc2" 434 - "\xd6\x61\xd7\x85\xd2\x9c\xe9\x3a\x11\x40\x0c\x92\x06\x83\x18\x1d", 435 - .prk_size = 64, 436 - .okm = "\x74\x13\xe8\x99\x7e\x02\x06\x10\xfb\xf6\x82\x3f\x2c\xe1\x4b\xff" 437 - "\x01\x87\x5d\xb1\xca\x55\xf6\x8c\xfc\xf3\x95\x4d\xc8\xaf\xf5\x35" 438 - "\x59\xbd\x5e\x30\x28\xb0\x80\xf7\xc0\x68", 439 - .okm_size = 42, 440 - }, { 441 - .test = "unsalted hkdf test with zero info", 442 - .ikm = "\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c" 443 - "\x0c\x0c\x0c\x0c\x0c\x0c", 444 - .ikm_size = 22, 445 - .salt = "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00" 446 - "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00" 447 - "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00" 448 - "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", 449 - .salt_size = 64, 450 - .info = NULL, 451 - .info_size = 0, 452 - .prk = "\x53\x46\xb3\x76\xbf\x3a\xa9\xf8\x4f\x8f\x6e\xd5\xb1\xc4\xf4\x89" 453 - "\x17\x2e\x24\x4d\xac\x30\x3d\x12\xf6\x8e\xcc\x76\x6e\xa6\x00\xaa" 454 - "\x88\x49\x5e\x7f\xb6\x05\x80\x31\x22\xfa\x13\x69\x24\xa8\x40\xb1" 455 - "\xf0\x71\x9d\x2d\x5f\x68\xe2\x9b\x24\x22\x99\xd7\x58\xed\x68\x0c", 456 - .prk_size = 64, 457 - .okm = "\x14\x07\xd4\x60\x13\xd9\x8b\xc6\xde\xce\xfc\xfe\xe5\x5f\x0f\x90" 458 - 
"\xb0\xc7\xf6\x3d\x68\xeb\x1a\x80\xea\xf0\x7e\x95\x3c\xfc\x0a\x3a" 459 - "\x52\x40\xa1\x55\xd6\xe4\xda\xa9\x65\xbb", 460 - .okm_size = 42, 461 - } 462 - }; 463 - 464 - static int hkdf_test(const char *shash, const struct hkdf_testvec *tv) 465 - { struct crypto_shash *tfm = NULL; 466 - u8 *prk = NULL, *okm = NULL; 467 - unsigned int prk_size; 468 - const char *driver; 469 - int err; 470 - 471 - tfm = crypto_alloc_shash(shash, 0, 0); 472 - if (IS_ERR(tfm)) { 473 - pr_err("%s(%s): failed to allocate transform: %ld\n", 474 - tv->test, shash, PTR_ERR(tfm)); 475 - return PTR_ERR(tfm); 476 - } 477 - driver = crypto_shash_driver_name(tfm); 478 - 479 - prk_size = crypto_shash_digestsize(tfm); 480 - prk = kzalloc(prk_size, GFP_KERNEL); 481 - if (!prk) { 482 - err = -ENOMEM; 483 - goto out_free; 484 - } 485 - 486 - if (tv->prk_size != prk_size) { 487 - pr_err("%s(%s): prk size mismatch (vec %u, digest %u\n", 488 - tv->test, driver, tv->prk_size, prk_size); 489 - err = -EINVAL; 490 - goto out_free; 491 - } 492 - 493 - err = hkdf_extract(tfm, tv->ikm, tv->ikm_size, 494 - tv->salt, tv->salt_size, prk); 495 - if (err) { 496 - pr_err("%s(%s): hkdf_extract failed with %d\n", 497 - tv->test, driver, err); 498 - goto out_free; 499 - } 500 - 501 - if (memcmp(prk, tv->prk, tv->prk_size)) { 502 - pr_err("%s(%s): hkdf_extract prk mismatch\n", 503 - tv->test, driver); 504 - print_hex_dump(KERN_ERR, "prk: ", DUMP_PREFIX_NONE, 505 - 16, 1, prk, tv->prk_size, false); 506 - err = -EINVAL; 507 - goto out_free; 508 - } 509 - 510 - okm = kzalloc(tv->okm_size, GFP_KERNEL); 511 - if (!okm) { 512 - err = -ENOMEM; 513 - goto out_free; 514 - } 515 - 516 - err = crypto_shash_setkey(tfm, tv->prk, tv->prk_size); 517 - if (err) { 518 - pr_err("%s(%s): failed to set prk, error %d\n", 519 - tv->test, driver, err); 520 - goto out_free; 521 - } 522 - 523 - err = hkdf_expand(tfm, tv->info, tv->info_size, 524 - okm, tv->okm_size); 525 - if (err) { 526 - pr_err("%s(%s): hkdf_expand() failed with %d\n", 527 - 
tv->test, driver, err); 528 - } else if (memcmp(okm, tv->okm, tv->okm_size)) { 529 - pr_err("%s(%s): hkdf_expand() okm mismatch\n", 530 - tv->test, driver); 531 - print_hex_dump(KERN_ERR, "okm: ", DUMP_PREFIX_NONE, 532 - 16, 1, okm, tv->okm_size, false); 533 - err = -EINVAL; 534 - } 535 - out_free: 536 - kfree(okm); 537 - kfree(prk); 538 - crypto_free_shash(tfm); 539 - return err; 540 - } 541 - 542 - static int __init crypto_hkdf_module_init(void) 543 - { 544 - int ret = 0, i; 545 - 546 - if (!IS_ENABLED(CONFIG_CRYPTO_SELFTESTS)) 547 - return 0; 548 - 549 - for (i = 0; i < ARRAY_SIZE(hkdf_sha256_tv); i++) { 550 - ret = hkdf_test("hmac(sha256)", &hkdf_sha256_tv[i]); 551 - if (ret) 552 - return ret; 553 - } 554 - for (i = 0; i < ARRAY_SIZE(hkdf_sha384_tv); i++) { 555 - ret = hkdf_test("hmac(sha384)", &hkdf_sha384_tv[i]); 556 - if (ret) 557 - return ret; 558 - } 559 - for (i = 0; i < ARRAY_SIZE(hkdf_sha512_tv); i++) { 560 - ret = hkdf_test("hmac(sha512)", &hkdf_sha512_tv[i]); 561 - if (ret) 562 - return ret; 563 - } 564 - return 0; 565 - } 566 - 567 - static void __exit crypto_hkdf_module_exit(void) {} 568 - 569 - late_initcall(crypto_hkdf_module_init); 570 - module_exit(crypto_hkdf_module_exit); 571 - 572 - MODULE_LICENSE("GPL"); 573 - MODULE_DESCRIPTION("HMAC-based Key Derivation Function (HKDF)");
-1
drivers/block/drbd/Makefile
@@ -3,7 +3,6 @@
 drbd-y += drbd_worker.o drbd_receiver.o drbd_req.o drbd_actlog.o
 drbd-y += drbd_main.o drbd_strings.o drbd_nl.o
 drbd-y += drbd_interval.o drbd_state.o
-drbd-y += drbd_nla.o
 drbd-$(CONFIG_DEBUG_FS) += drbd_debugfs.o
 
 obj-$(CONFIG_BLK_DEV_DRBD) += drbd.o
+2 -2
drivers/block/drbd/drbd_main.c
@@ -874,7 +874,7 @@
 	if (uuid && uuid != UUID_JUST_CREATED)
 		uuid = uuid + UUID_NEW_BM_OFFSET;
 	else
-		get_random_bytes(&uuid, sizeof(u64));
+		uuid = get_random_u64();
 	drbd_uuid_set(device, UI_BITMAP, uuid);
 	drbd_print_uuids(device, "updated sync UUID");
 	drbd_md_sync(device);
@@ -3337,7 +3337,7 @@
 	u64 val;
 	unsigned long long bm_uuid;
 
-	get_random_bytes(&val, sizeof(u64));
+	val = get_random_u64();
 
 	spin_lock_irq(&device->ldev->md.uuid_lock);
 	bm_uuid = device->ldev->md.uuid[UI_BITMAP];
+333 -269
drivers/block/drbd/drbd_nl.c
··· 74 74 int drbd_adm_get_initial_state(struct sk_buff *skb, struct netlink_callback *cb); 75 75 76 76 #include <linux/drbd_genl_api.h> 77 - #include "drbd_nla.h" 77 + 78 + static int drbd_pre_doit(const struct genl_split_ops *ops, 79 + struct sk_buff *skb, struct genl_info *info); 80 + static void drbd_post_doit(const struct genl_split_ops *ops, 81 + struct sk_buff *skb, struct genl_info *info); 82 + 83 + #define GENL_MAGIC_FAMILY_PRE_DOIT drbd_pre_doit 84 + #define GENL_MAGIC_FAMILY_POST_DOIT drbd_post_doit 85 + 78 86 #include <linux/genl_magic_func.h> 79 87 80 88 static atomic_t drbd_genl_seq = ATOMIC_INIT(2); /* two. */ ··· 152 144 return 0; 153 145 } 154 146 155 - /* This would be a good candidate for a "pre_doit" hook, 156 - * and per-family private info->pointers. 157 - * But we need to stay compatible with older kernels. 158 - * If it returns successfully, adm_ctx members are valid. 159 - * 147 + /* Flags for drbd_adm_prepare() */ 148 + #define DRBD_ADM_NEED_MINOR (1 << 0) 149 + #define DRBD_ADM_NEED_RESOURCE (1 << 1) 150 + #define DRBD_ADM_NEED_CONNECTION (1 << 2) 151 + 152 + /* Per-command flags for drbd_pre_doit() */ 153 + static const unsigned int drbd_genl_cmd_flags[] = { 154 + [DRBD_ADM_GET_STATUS] = DRBD_ADM_NEED_MINOR, 155 + [DRBD_ADM_NEW_MINOR] = DRBD_ADM_NEED_RESOURCE, 156 + [DRBD_ADM_DEL_MINOR] = DRBD_ADM_NEED_MINOR, 157 + [DRBD_ADM_NEW_RESOURCE] = 0, 158 + [DRBD_ADM_DEL_RESOURCE] = DRBD_ADM_NEED_RESOURCE, 159 + [DRBD_ADM_RESOURCE_OPTS] = DRBD_ADM_NEED_RESOURCE, 160 + [DRBD_ADM_CONNECT] = DRBD_ADM_NEED_RESOURCE, 161 + [DRBD_ADM_CHG_NET_OPTS] = DRBD_ADM_NEED_CONNECTION, 162 + [DRBD_ADM_DISCONNECT] = DRBD_ADM_NEED_CONNECTION, 163 + [DRBD_ADM_ATTACH] = DRBD_ADM_NEED_MINOR, 164 + [DRBD_ADM_CHG_DISK_OPTS] = DRBD_ADM_NEED_MINOR, 165 + [DRBD_ADM_RESIZE] = DRBD_ADM_NEED_MINOR, 166 + [DRBD_ADM_PRIMARY] = DRBD_ADM_NEED_MINOR, 167 + [DRBD_ADM_SECONDARY] = DRBD_ADM_NEED_MINOR, 168 + [DRBD_ADM_NEW_C_UUID] = DRBD_ADM_NEED_MINOR, 169 + [DRBD_ADM_START_OV] = 
DRBD_ADM_NEED_MINOR, 170 + [DRBD_ADM_DETACH] = DRBD_ADM_NEED_MINOR, 171 + [DRBD_ADM_INVALIDATE] = DRBD_ADM_NEED_MINOR, 172 + [DRBD_ADM_INVAL_PEER] = DRBD_ADM_NEED_MINOR, 173 + [DRBD_ADM_PAUSE_SYNC] = DRBD_ADM_NEED_MINOR, 174 + [DRBD_ADM_RESUME_SYNC] = DRBD_ADM_NEED_MINOR, 175 + [DRBD_ADM_SUSPEND_IO] = DRBD_ADM_NEED_MINOR, 176 + [DRBD_ADM_RESUME_IO] = DRBD_ADM_NEED_MINOR, 177 + [DRBD_ADM_OUTDATE] = DRBD_ADM_NEED_MINOR, 178 + [DRBD_ADM_GET_TIMEOUT_TYPE] = DRBD_ADM_NEED_MINOR, 179 + [DRBD_ADM_DOWN] = DRBD_ADM_NEED_RESOURCE, 180 + }; 181 + 182 + /* 160 183 * At this point, we still rely on the global genl_lock(). 161 184 * If we want to avoid that, and allow "genl_family.parallel_ops", we may need 162 185 * to add additional synchronization against object destruction/modification. 163 186 */ 164 - #define DRBD_ADM_NEED_MINOR 1 165 - #define DRBD_ADM_NEED_RESOURCE 2 166 - #define DRBD_ADM_NEED_CONNECTION 4 167 187 static int drbd_adm_prepare(struct drbd_config_context *adm_ctx, 168 188 struct sk_buff *skb, struct genl_info *info, unsigned flags) 169 189 { 170 190 struct drbd_genlmsghdr *d_in = genl_info_userhdr(info); 171 191 const u8 cmd = info->genlhdr->cmd; 172 192 int err; 173 - 174 - memset(adm_ctx, 0, sizeof(*adm_ctx)); 175 193 176 194 /* genl_rcv_msg only checks for CAP_NET_ADMIN on "GENL_ADMIN_PERM" :( */ 177 195 if (cmd != DRBD_ADM_GET_STATUS && !capable(CAP_NET_ADMIN)) ··· 238 204 goto fail; 239 205 240 206 /* and assign stuff to the adm_ctx */ 241 - nla = nested_attr_tb[__nla_type(T_ctx_volume)]; 207 + nla = nested_attr_tb[T_ctx_volume]; 242 208 if (nla) 243 209 adm_ctx->volume = nla_get_u32(nla); 244 - nla = nested_attr_tb[__nla_type(T_ctx_resource_name)]; 210 + nla = nested_attr_tb[T_ctx_resource_name]; 245 211 if (nla) 246 212 adm_ctx->resource_name = nla_data(nla); 247 - adm_ctx->my_addr = nested_attr_tb[__nla_type(T_ctx_my_addr)]; 248 - adm_ctx->peer_addr = nested_attr_tb[__nla_type(T_ctx_peer_addr)]; 213 + adm_ctx->my_addr = 
nested_attr_tb[T_ctx_my_addr]; 214 + adm_ctx->peer_addr = nested_attr_tb[T_ctx_peer_addr]; 249 215 if ((adm_ctx->my_addr && 250 216 nla_len(adm_ctx->my_addr) > sizeof(adm_ctx->connection->my_addr)) || 251 217 (adm_ctx->peer_addr && ··· 334 300 return err; 335 301 } 336 302 337 - static int drbd_adm_finish(struct drbd_config_context *adm_ctx, 338 - struct genl_info *info, int retcode) 303 + static int drbd_pre_doit(const struct genl_split_ops *ops, 304 + struct sk_buff *skb, struct genl_info *info) 339 305 { 306 + struct drbd_config_context *adm_ctx; 307 + u8 cmd = info->genlhdr->cmd; 308 + unsigned int flags; 309 + int err; 310 + 311 + adm_ctx = kzalloc_obj(*adm_ctx); 312 + if (!adm_ctx) 313 + return -ENOMEM; 314 + 315 + flags = (cmd < ARRAY_SIZE(drbd_genl_cmd_flags)) 316 + ? drbd_genl_cmd_flags[cmd] : 0; 317 + 318 + err = drbd_adm_prepare(adm_ctx, skb, info, flags); 319 + if (err && !adm_ctx->reply_skb) { 320 + /* Fatal error before reply_skb was allocated. */ 321 + kfree(adm_ctx); 322 + return err; 323 + } 324 + if (err) 325 + adm_ctx->reply_dh->ret_code = err; 326 + 327 + info->user_ptr[0] = adm_ctx; 328 + return 0; 329 + } 330 + 331 + static void drbd_post_doit(const struct genl_split_ops *ops, 332 + struct sk_buff *skb, struct genl_info *info) 333 + { 334 + struct drbd_config_context *adm_ctx = info->user_ptr[0]; 335 + 336 + if (!adm_ctx) 337 + return; 338 + 339 + if (adm_ctx->reply_skb) 340 + drbd_adm_send_reply(adm_ctx->reply_skb, info); 341 + 340 342 if (adm_ctx->device) { 341 343 kref_put(&adm_ctx->device->kref, drbd_destroy_device); 342 344 adm_ctx->device = NULL; ··· 386 316 adm_ctx->resource = NULL; 387 317 } 388 318 389 - if (!adm_ctx->reply_skb) 390 - return -ENOMEM; 391 - 392 - adm_ctx->reply_dh->ret_code = retcode; 393 - drbd_adm_send_reply(adm_ctx->reply_skb, info); 394 - return 0; 319 + kfree(adm_ctx); 395 320 } 396 321 397 322 static void setup_khelper_env(struct drbd_connection *connection, char **envp) ··· 824 759 static const char 
*from_attrs_err_to_txt(int err) 825 760 { 826 761 return err == -ENOMSG ? "required attribute missing" : 827 - err == -EOPNOTSUPP ? "unknown mandatory attribute" : 828 762 err == -EEXIST ? "can not change invariant setting" : 829 763 "invalid attribute value"; 830 764 } 831 765 832 766 int drbd_adm_set_role(struct sk_buff *skb, struct genl_info *info) 833 767 { 834 - struct drbd_config_context adm_ctx; 768 + struct drbd_config_context *adm_ctx = info->user_ptr[0]; 835 769 struct set_role_parms parms; 836 770 int err; 837 771 enum drbd_ret_code retcode; 838 772 enum drbd_state_rv rv; 839 773 840 - retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_MINOR); 841 - if (!adm_ctx.reply_skb) 842 - return retcode; 774 + if (!adm_ctx->reply_skb) 775 + return 0; 776 + retcode = adm_ctx->reply_dh->ret_code; 843 777 if (retcode != NO_ERROR) 844 778 goto out; 845 779 ··· 847 783 err = set_role_parms_from_attrs(&parms, info); 848 784 if (err) { 849 785 retcode = ERR_MANDATORY_TAG; 850 - drbd_msg_put_info(adm_ctx.reply_skb, from_attrs_err_to_txt(err)); 786 + drbd_msg_put_info(adm_ctx->reply_skb, from_attrs_err_to_txt(err)); 851 787 goto out; 852 788 } 853 789 } 854 790 genl_unlock(); 855 - mutex_lock(&adm_ctx.resource->adm_mutex); 791 + mutex_lock(&adm_ctx->resource->adm_mutex); 856 792 857 793 if (info->genlhdr->cmd == DRBD_ADM_PRIMARY) 858 - rv = drbd_set_role(adm_ctx.device, R_PRIMARY, parms.assume_uptodate); 794 + rv = drbd_set_role(adm_ctx->device, R_PRIMARY, parms.assume_uptodate); 859 795 else 860 - rv = drbd_set_role(adm_ctx.device, R_SECONDARY, 0); 796 + rv = drbd_set_role(adm_ctx->device, R_SECONDARY, 0); 861 797 862 - mutex_unlock(&adm_ctx.resource->adm_mutex); 798 + mutex_unlock(&adm_ctx->resource->adm_mutex); 863 799 genl_lock(); 864 - drbd_adm_finish(&adm_ctx, info, rv); 800 + adm_ctx->reply_dh->ret_code = rv; 865 801 return 0; 866 802 out: 867 - drbd_adm_finish(&adm_ctx, info, retcode); 803 + adm_ctx->reply_dh->ret_code = retcode; 868 804 return 0; 869 
805 } 870 806 ··· 1576 1512 1577 1513 int drbd_adm_disk_opts(struct sk_buff *skb, struct genl_info *info) 1578 1514 { 1579 - struct drbd_config_context adm_ctx; 1515 + struct drbd_config_context *adm_ctx = info->user_ptr[0]; 1580 1516 enum drbd_ret_code retcode; 1581 1517 struct drbd_device *device; 1582 1518 struct disk_conf *new_disk_conf, *old_disk_conf; ··· 1584 1520 int err; 1585 1521 unsigned int fifo_size; 1586 1522 1587 - retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_MINOR); 1588 - if (!adm_ctx.reply_skb) 1589 - return retcode; 1523 + if (!adm_ctx->reply_skb) 1524 + return 0; 1525 + retcode = adm_ctx->reply_dh->ret_code; 1590 1526 if (retcode != NO_ERROR) 1591 1527 goto finish; 1592 1528 1593 - device = adm_ctx.device; 1594 - mutex_lock(&adm_ctx.resource->adm_mutex); 1529 + device = adm_ctx->device; 1530 + mutex_lock(&adm_ctx->resource->adm_mutex); 1595 1531 1596 1532 /* we also need a disk 1597 1533 * to change the options on */ ··· 1615 1551 err = disk_conf_from_attrs_for_change(new_disk_conf, info); 1616 1552 if (err && err != -ENOMSG) { 1617 1553 retcode = ERR_MANDATORY_TAG; 1618 - drbd_msg_put_info(adm_ctx.reply_skb, from_attrs_err_to_txt(err)); 1554 + drbd_msg_put_info(adm_ctx->reply_skb, from_attrs_err_to_txt(err)); 1619 1555 goto fail_unlock; 1620 1556 } 1621 1557 ··· 1641 1577 if (err) { 1642 1578 /* Could be just "busy". Ignore? 1643 1579 * Introduce dedicated error code? 
*/ 1644 - drbd_msg_put_info(adm_ctx.reply_skb, 1580 + drbd_msg_put_info(adm_ctx->reply_skb, 1645 1581 "Try again without changing current al-extents setting"); 1646 1582 retcode = ERR_NOMEM; 1647 1583 goto fail_unlock; ··· 1704 1640 success: 1705 1641 put_ldev(device); 1706 1642 out: 1707 - mutex_unlock(&adm_ctx.resource->adm_mutex); 1643 + mutex_unlock(&adm_ctx->resource->adm_mutex); 1708 1644 finish: 1709 - drbd_adm_finish(&adm_ctx, info, retcode); 1645 + adm_ctx->reply_dh->ret_code = retcode; 1710 1646 return 0; 1711 1647 } 1712 1648 ··· 1798 1734 1799 1735 int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info) 1800 1736 { 1801 - struct drbd_config_context adm_ctx; 1737 + struct drbd_config_context *adm_ctx = info->user_ptr[0]; 1802 1738 struct drbd_device *device; 1803 1739 struct drbd_peer_device *peer_device; 1804 1740 struct drbd_connection *connection; ··· 1815 1751 enum drbd_state_rv rv; 1816 1752 struct net_conf *nc; 1817 1753 1818 - retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_MINOR); 1819 - if (!adm_ctx.reply_skb) 1820 - return retcode; 1754 + if (!adm_ctx->reply_skb) 1755 + return 0; 1756 + retcode = adm_ctx->reply_dh->ret_code; 1821 1757 if (retcode != NO_ERROR) 1822 1758 goto finish; 1823 1759 1824 - device = adm_ctx.device; 1825 - mutex_lock(&adm_ctx.resource->adm_mutex); 1760 + device = adm_ctx->device; 1761 + mutex_lock(&adm_ctx->resource->adm_mutex); 1826 1762 peer_device = first_peer_device(device); 1827 1763 connection = peer_device->connection; 1828 1764 conn_reconfig_start(connection); ··· 1867 1803 err = disk_conf_from_attrs(new_disk_conf, info); 1868 1804 if (err) { 1869 1805 retcode = ERR_MANDATORY_TAG; 1870 - drbd_msg_put_info(adm_ctx.reply_skb, from_attrs_err_to_txt(err)); 1806 + drbd_msg_put_info(adm_ctx->reply_skb, from_attrs_err_to_txt(err)); 1871 1807 goto fail; 1872 1808 } 1873 1809 ··· 2018 1954 drbd_warn(device, "truncating a consistent device during attach (%llu < %llu)\n", nsz, eff); 2019 1955 } else { 
 				drbd_warn(device, "refusing to truncate a consistent device (%llu < %llu)\n", nsz, eff);
-				drbd_msg_sprintf_info(adm_ctx.reply_skb,
+				drbd_msg_sprintf_info(adm_ctx->reply_skb,
 					"To-be-attached device has last effective > current size, and is consistent\n"
 					"(%llu > %llu sectors). Refusing to attach.", eff, nsz);
 				retcode = ERR_IMPLICIT_SHRINK;
···
 	kobject_uevent(&disk_to_dev(device->vdisk)->kobj, KOBJ_CHANGE);
 	put_ldev(device);
 	conn_reconfig_done(connection);
-	mutex_unlock(&adm_ctx.resource->adm_mutex);
-	drbd_adm_finish(&adm_ctx, info, retcode);
+	mutex_unlock(&adm_ctx->resource->adm_mutex);
+	adm_ctx->reply_dh->ret_code = retcode;
 	return 0;

 force_diskless_dec:
···
 	kfree(new_disk_conf);
 	lc_destroy(resync_lru);
 	kfree(new_plan);
-	mutex_unlock(&adm_ctx.resource->adm_mutex);
+	mutex_unlock(&adm_ctx->resource->adm_mutex);
 finish:
-	drbd_adm_finish(&adm_ctx, info, retcode);
+	adm_ctx->reply_dh->ret_code = retcode;
 	return 0;
 }

···
  * Only then we have finally detached. */
 int drbd_adm_detach(struct sk_buff *skb, struct genl_info *info)
 {
-	struct drbd_config_context adm_ctx;
+	struct drbd_config_context *adm_ctx = info->user_ptr[0];
 	enum drbd_ret_code retcode;
 	struct detach_parms parms = { };
 	int err;

-	retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_MINOR);
-	if (!adm_ctx.reply_skb)
-		return retcode;
+	if (!adm_ctx->reply_skb)
+		return 0;
+	retcode = adm_ctx->reply_dh->ret_code;
 	if (retcode != NO_ERROR)
 		goto out;

···
 		err = detach_parms_from_attrs(&parms, info);
 		if (err) {
 			retcode = ERR_MANDATORY_TAG;
-			drbd_msg_put_info(adm_ctx.reply_skb, from_attrs_err_to_txt(err));
+			drbd_msg_put_info(adm_ctx->reply_skb, from_attrs_err_to_txt(err));
 			goto out;
 		}
 	}

-	mutex_lock(&adm_ctx.resource->adm_mutex);
-	retcode = adm_detach(adm_ctx.device, parms.force_detach);
-	mutex_unlock(&adm_ctx.resource->adm_mutex);
+	mutex_lock(&adm_ctx->resource->adm_mutex);
+	retcode = adm_detach(adm_ctx->device, parms.force_detach);
+	mutex_unlock(&adm_ctx->resource->adm_mutex);
 out:
-	drbd_adm_finish(&adm_ctx, info, retcode);
+	adm_ctx->reply_dh->ret_code = retcode;
 	return 0;
 }

···

 int drbd_adm_net_opts(struct sk_buff *skb, struct genl_info *info)
 {
-	struct drbd_config_context adm_ctx;
+	struct drbd_config_context *adm_ctx = info->user_ptr[0];
 	enum drbd_ret_code retcode;
 	struct drbd_connection *connection;
 	struct net_conf *old_net_conf, *new_net_conf = NULL;
···
 	int rsr; /* re-sync running */
 	struct crypto crypto = { };

-	retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_CONNECTION);
-	if (!adm_ctx.reply_skb)
-		return retcode;
+	if (!adm_ctx->reply_skb)
+		return 0;
+	retcode = adm_ctx->reply_dh->ret_code;
 	if (retcode != NO_ERROR)
 		goto finish;

-	connection = adm_ctx.connection;
-	mutex_lock(&adm_ctx.resource->adm_mutex);
+	connection = adm_ctx->connection;
+	mutex_lock(&adm_ctx->resource->adm_mutex);

 	new_net_conf = kzalloc_obj(struct net_conf);
 	if (!new_net_conf) {
···
 	old_net_conf = connection->net_conf;

 	if (!old_net_conf) {
-		drbd_msg_put_info(adm_ctx.reply_skb, "net conf missing, try connect");
+		drbd_msg_put_info(adm_ctx->reply_skb, "net conf missing, try connect");
 		retcode = ERR_INVALID_REQUEST;
 		goto fail;
 	}
···
 	err = net_conf_from_attrs_for_change(new_net_conf, info);
 	if (err && err != -ENOMSG) {
 		retcode = ERR_MANDATORY_TAG;
-		drbd_msg_put_info(adm_ctx.reply_skb, from_attrs_err_to_txt(err));
+		drbd_msg_put_info(adm_ctx->reply_skb, from_attrs_err_to_txt(err));
 		goto fail;
 	}

···
 done:
 	conn_reconfig_done(connection);
 out:
-	mutex_unlock(&adm_ctx.resource->adm_mutex);
+	mutex_unlock(&adm_ctx->resource->adm_mutex);
 finish:
-	drbd_adm_finish(&adm_ctx, info, retcode);
+	adm_ctx->reply_dh->ret_code = retcode;
 	return 0;
 }

···
 	struct connection_info connection_info;
 	enum drbd_notification_type flags;
 	unsigned int peer_devices = 0;
-	struct drbd_config_context adm_ctx;
+	struct drbd_config_context *adm_ctx = info->user_ptr[0];
 	struct drbd_peer_device *peer_device;
 	struct net_conf *old_net_conf, *new_net_conf = NULL;
 	struct crypto crypto = { };
···
 	int i;
 	int err;

-	retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_RESOURCE);
-
-	if (!adm_ctx.reply_skb)
-		return retcode;
+	if (!adm_ctx->reply_skb)
+		return 0;
+	retcode = adm_ctx->reply_dh->ret_code;
 	if (retcode != NO_ERROR)
 		goto out;
-	if (!(adm_ctx.my_addr && adm_ctx.peer_addr)) {
-		drbd_msg_put_info(adm_ctx.reply_skb, "connection endpoint(s) missing");
+	if (!(adm_ctx->my_addr && adm_ctx->peer_addr)) {
+		drbd_msg_put_info(adm_ctx->reply_skb, "connection endpoint(s) missing");
 		retcode = ERR_INVALID_REQUEST;
 		goto out;
 	}
···
 	 * concurrent reconfiguration/addition/deletion */
 	for_each_resource(resource, &drbd_resources) {
 		for_each_connection(connection, resource) {
-			if (nla_len(adm_ctx.my_addr) == connection->my_addr_len &&
-			    !memcmp(nla_data(adm_ctx.my_addr), &connection->my_addr,
+			if (nla_len(adm_ctx->my_addr) == connection->my_addr_len &&
+			    !memcmp(nla_data(adm_ctx->my_addr), &connection->my_addr,
 				    connection->my_addr_len)) {
 				retcode = ERR_LOCAL_ADDR;
 				goto out;
 			}

-			if (nla_len(adm_ctx.peer_addr) == connection->peer_addr_len &&
-			    !memcmp(nla_data(adm_ctx.peer_addr), &connection->peer_addr,
+			if (nla_len(adm_ctx->peer_addr) == connection->peer_addr_len &&
+			    !memcmp(nla_data(adm_ctx->peer_addr), &connection->peer_addr,
 				    connection->peer_addr_len)) {
 				retcode = ERR_PEER_ADDR;
 				goto out;
···
 		}
 	}

-	mutex_lock(&adm_ctx.resource->adm_mutex);
-	connection = first_connection(adm_ctx.resource);
+	mutex_lock(&adm_ctx->resource->adm_mutex);
+	connection = first_connection(adm_ctx->resource);
 	conn_reconfig_start(connection);

 	if (connection->cstate > C_STANDALONE) {
···
 	err = net_conf_from_attrs(new_net_conf, info);
 	if (err && err != -ENOMSG) {
 		retcode = ERR_MANDATORY_TAG;
-		drbd_msg_put_info(adm_ctx.reply_skb, from_attrs_err_to_txt(err));
+		drbd_msg_put_info(adm_ctx->reply_skb, from_attrs_err_to_txt(err));
 		goto fail;
 	}

···

 	drbd_flush_workqueue(&connection->sender_work);

-	mutex_lock(&adm_ctx.resource->conf_update);
+	mutex_lock(&adm_ctx->resource->conf_update);
 	old_net_conf = connection->net_conf;
 	if (old_net_conf) {
 		retcode = ERR_NET_CONFIGURED;
-		mutex_unlock(&adm_ctx.resource->conf_update);
+		mutex_unlock(&adm_ctx->resource->conf_update);
 		goto fail;
 	}
 	rcu_assign_pointer(connection->net_conf, new_net_conf);
···
 	connection->csums_tfm = crypto.csums_tfm;
 	connection->verify_tfm = crypto.verify_tfm;

-	connection->my_addr_len = nla_len(adm_ctx.my_addr);
-	memcpy(&connection->my_addr, nla_data(adm_ctx.my_addr), connection->my_addr_len);
-	connection->peer_addr_len = nla_len(adm_ctx.peer_addr);
-	memcpy(&connection->peer_addr, nla_data(adm_ctx.peer_addr), connection->peer_addr_len);
+	connection->my_addr_len = nla_len(adm_ctx->my_addr);
+	memcpy(&connection->my_addr, nla_data(adm_ctx->my_addr), connection->my_addr_len);
+	connection->peer_addr_len = nla_len(adm_ctx->peer_addr);
+	memcpy(&connection->peer_addr, nla_data(adm_ctx->peer_addr), connection->peer_addr_len);

 	idr_for_each_entry(&connection->peer_devices, peer_device, i) {
 		peer_devices++;
···
 		notify_peer_device_state(NULL, 0, peer_device, &peer_device_info, NOTIFY_CREATE | flags);
 	}
 	mutex_unlock(&notification_mutex);
-	mutex_unlock(&adm_ctx.resource->conf_update);
+	mutex_unlock(&adm_ctx->resource->conf_update);

 	rcu_read_lock();
 	idr_for_each_entry(&connection->peer_devices, peer_device, i) {
···
 	rv = conn_request_state(connection, NS(conn, C_UNCONNECTED), CS_VERBOSE);

 	conn_reconfig_done(connection);
-	mutex_unlock(&adm_ctx.resource->adm_mutex);
-	drbd_adm_finish(&adm_ctx, info, rv);
+	mutex_unlock(&adm_ctx->resource->adm_mutex);
+	adm_ctx->reply_dh->ret_code = rv;
 	return 0;

 fail:
···
 	kfree(new_net_conf);

 	conn_reconfig_done(connection);
-	mutex_unlock(&adm_ctx.resource->adm_mutex);
+	mutex_unlock(&adm_ctx->resource->adm_mutex);
 out:
-	drbd_adm_finish(&adm_ctx, info, retcode);
+	adm_ctx->reply_dh->ret_code = retcode;
 	return 0;
 }

···

 int drbd_adm_disconnect(struct sk_buff *skb, struct genl_info *info)
 {
-	struct drbd_config_context adm_ctx;
+	struct drbd_config_context *adm_ctx = info->user_ptr[0];
 	struct disconnect_parms parms;
 	struct drbd_connection *connection;
 	enum drbd_state_rv rv;
 	enum drbd_ret_code retcode;
 	int err;

-	retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_CONNECTION);
-	if (!adm_ctx.reply_skb)
-		return retcode;
+	if (!adm_ctx->reply_skb)
+		return 0;
+	retcode = adm_ctx->reply_dh->ret_code;
 	if (retcode != NO_ERROR)
 		goto fail;

-	connection = adm_ctx.connection;
+	connection = adm_ctx->connection;
 	memset(&parms, 0, sizeof(parms));
 	if (info->attrs[DRBD_NLA_DISCONNECT_PARMS]) {
 		err = disconnect_parms_from_attrs(&parms, info);
 		if (err) {
 			retcode = ERR_MANDATORY_TAG;
-			drbd_msg_put_info(adm_ctx.reply_skb, from_attrs_err_to_txt(err));
+			drbd_msg_put_info(adm_ctx->reply_skb, from_attrs_err_to_txt(err));
 			goto fail;
 		}
 	}

-	mutex_lock(&adm_ctx.resource->adm_mutex);
+	mutex_lock(&adm_ctx->resource->adm_mutex);
 	rv = conn_try_disconnect(connection, parms.force_disconnect);
-	mutex_unlock(&adm_ctx.resource->adm_mutex);
+	mutex_unlock(&adm_ctx->resource->adm_mutex);
 	if (rv < SS_SUCCESS) {
-		drbd_adm_finish(&adm_ctx, info, rv);
+		adm_ctx->reply_dh->ret_code = rv;
 		return 0;
 	}
 	retcode = NO_ERROR;
 fail:
-	drbd_adm_finish(&adm_ctx, info, retcode);
+	adm_ctx->reply_dh->ret_code = retcode;
 	return 0;
 }

···

 int drbd_adm_resize(struct sk_buff *skb, struct genl_info *info)
 {
-	struct drbd_config_context adm_ctx;
+	struct drbd_config_context *adm_ctx = info->user_ptr[0];
 	struct disk_conf *old_disk_conf, *new_disk_conf = NULL;
 	struct resize_parms rs;
 	struct drbd_device *device;
···
 	sector_t u_size;
 	int err;

-	retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_MINOR);
-	if (!adm_ctx.reply_skb)
-		return retcode;
+	if (!adm_ctx->reply_skb)
+		return 0;
+	retcode = adm_ctx->reply_dh->ret_code;
 	if (retcode != NO_ERROR)
 		goto finish;

-	mutex_lock(&adm_ctx.resource->adm_mutex);
-	device = adm_ctx.device;
+	mutex_lock(&adm_ctx->resource->adm_mutex);
+	device = adm_ctx->device;
 	if (!get_ldev(device)) {
 		retcode = ERR_NO_DISK;
 		goto fail;
···
 		err = resize_parms_from_attrs(&rs, info);
 		if (err) {
 			retcode = ERR_MANDATORY_TAG;
-			drbd_msg_put_info(adm_ctx.reply_skb, from_attrs_err_to_txt(err));
+			drbd_msg_put_info(adm_ctx->reply_skb, from_attrs_err_to_txt(err));
 			goto fail_ldev;
 		}
 	}
···
 	}

 fail:
-	mutex_unlock(&adm_ctx.resource->adm_mutex);
+	mutex_unlock(&adm_ctx->resource->adm_mutex);
 finish:
-	drbd_adm_finish(&adm_ctx, info, retcode);
+	adm_ctx->reply_dh->ret_code = retcode;
 	return 0;

 fail_ldev:
···

 int drbd_adm_resource_opts(struct sk_buff *skb, struct genl_info *info)
 {
-	struct drbd_config_context adm_ctx;
+	struct drbd_config_context *adm_ctx = info->user_ptr[0];
 	enum drbd_ret_code retcode;
 	struct res_opts res_opts;
 	int err;

-	retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_RESOURCE);
-	if (!adm_ctx.reply_skb)
-		return retcode;
+	if (!adm_ctx->reply_skb)
+		return 0;
+	retcode = adm_ctx->reply_dh->ret_code;
 	if (retcode != NO_ERROR)
 		goto fail;

-	res_opts = adm_ctx.resource->res_opts;
+	res_opts = adm_ctx->resource->res_opts;
 	if (should_set_defaults(info))
 		set_res_opts_defaults(&res_opts);

 	err = res_opts_from_attrs(&res_opts, info);
 	if (err && err != -ENOMSG) {
 		retcode = ERR_MANDATORY_TAG;
-		drbd_msg_put_info(adm_ctx.reply_skb, from_attrs_err_to_txt(err));
+		drbd_msg_put_info(adm_ctx->reply_skb, from_attrs_err_to_txt(err));
 		goto fail;
 	}

-	mutex_lock(&adm_ctx.resource->adm_mutex);
-	err = set_resource_options(adm_ctx.resource, &res_opts);
+	mutex_lock(&adm_ctx->resource->adm_mutex);
+	err = set_resource_options(adm_ctx->resource, &res_opts);
 	if (err) {
 		retcode = ERR_INVALID_REQUEST;
 		if (err == -ENOMEM)
 			retcode = ERR_NOMEM;
 	}
-	mutex_unlock(&adm_ctx.resource->adm_mutex);
+	mutex_unlock(&adm_ctx->resource->adm_mutex);

 fail:
-	drbd_adm_finish(&adm_ctx, info, retcode);
+	adm_ctx->reply_dh->ret_code = retcode;
 	return 0;
 }

 int drbd_adm_invalidate(struct sk_buff *skb, struct genl_info *info)
 {
-	struct drbd_config_context adm_ctx;
+	struct drbd_config_context *adm_ctx = info->user_ptr[0];
 	struct drbd_device *device;
 	int retcode; /* enum drbd_ret_code rsp. enum drbd_state_rv */

-	retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_MINOR);
-	if (!adm_ctx.reply_skb)
-		return retcode;
+	if (!adm_ctx->reply_skb)
+		return 0;
+	retcode = adm_ctx->reply_dh->ret_code;
 	if (retcode != NO_ERROR)
 		goto out;

-	device = adm_ctx.device;
+	device = adm_ctx->device;
 	if (!get_ldev(device)) {
 		retcode = ERR_NO_DISK;
 		goto out;
 	}

-	mutex_lock(&adm_ctx.resource->adm_mutex);
+	mutex_lock(&adm_ctx->resource->adm_mutex);

 	/* If there is still bitmap IO pending, probably because of a previous
 	 * resync just being finished, wait for it before requesting a new resync.
···
 	} else
 		retcode = drbd_request_state(device, NS(conn, C_STARTING_SYNC_T));
 	drbd_resume_io(device);
-	mutex_unlock(&adm_ctx.resource->adm_mutex);
+	mutex_unlock(&adm_ctx->resource->adm_mutex);
 	put_ldev(device);
 out:
-	drbd_adm_finish(&adm_ctx, info, retcode);
+	adm_ctx->reply_dh->ret_code = retcode;
 	return 0;
 }

 static int drbd_adm_simple_request_state(struct sk_buff *skb, struct genl_info *info,
 		union drbd_state mask, union drbd_state val)
 {
-	struct drbd_config_context adm_ctx;
+	struct drbd_config_context *adm_ctx = info->user_ptr[0];
 	enum drbd_ret_code retcode;

-	retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_MINOR);
-	if (!adm_ctx.reply_skb)
-		return retcode;
+	if (!adm_ctx->reply_skb)
+		return 0;
+	retcode = adm_ctx->reply_dh->ret_code;
 	if (retcode != NO_ERROR)
 		goto out;

-	mutex_lock(&adm_ctx.resource->adm_mutex);
-	retcode = drbd_request_state(adm_ctx.device, mask, val);
-	mutex_unlock(&adm_ctx.resource->adm_mutex);
+	mutex_lock(&adm_ctx->resource->adm_mutex);
+	retcode = drbd_request_state(adm_ctx->device, mask, val);
+	mutex_unlock(&adm_ctx->resource->adm_mutex);
 out:
-	drbd_adm_finish(&adm_ctx, info, retcode);
+	adm_ctx->reply_dh->ret_code = retcode;
 	return 0;
 }

···

 int drbd_adm_invalidate_peer(struct sk_buff *skb, struct genl_info *info)
 {
-	struct drbd_config_context adm_ctx;
+	struct drbd_config_context *adm_ctx = info->user_ptr[0];
 	int retcode; /* drbd_ret_code, drbd_state_rv */
 	struct drbd_device *device;

-	retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_MINOR);
-	if (!adm_ctx.reply_skb)
-		return retcode;
+	if (!adm_ctx->reply_skb)
+		return 0;
+	retcode = adm_ctx->reply_dh->ret_code;
 	if (retcode != NO_ERROR)
 		goto out;

-	device = adm_ctx.device;
+	device = adm_ctx->device;
 	if (!get_ldev(device)) {
 		retcode = ERR_NO_DISK;
 		goto out;
 	}

-	mutex_lock(&adm_ctx.resource->adm_mutex);
+	mutex_lock(&adm_ctx->resource->adm_mutex);

 	/* If there is still bitmap IO pending, probably because of a previous
 	 * resync just being finished, wait for it before requesting a new resync.
···
 	} else
 		retcode = drbd_request_state(device, NS(conn, C_STARTING_SYNC_S));
 	drbd_resume_io(device);
-	mutex_unlock(&adm_ctx.resource->adm_mutex);
+	mutex_unlock(&adm_ctx->resource->adm_mutex);
 	put_ldev(device);
 out:
-	drbd_adm_finish(&adm_ctx, info, retcode);
+	adm_ctx->reply_dh->ret_code = retcode;
 	return 0;
 }

 int drbd_adm_pause_sync(struct sk_buff *skb, struct genl_info *info)
 {
-	struct drbd_config_context adm_ctx;
+	struct drbd_config_context *adm_ctx = info->user_ptr[0];
 	enum drbd_ret_code retcode;

-	retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_MINOR);
-	if (!adm_ctx.reply_skb)
-		return retcode;
+	if (!adm_ctx->reply_skb)
+		return 0;
+	retcode = adm_ctx->reply_dh->ret_code;
 	if (retcode != NO_ERROR)
 		goto out;

-	mutex_lock(&adm_ctx.resource->adm_mutex);
-	if (drbd_request_state(adm_ctx.device, NS(user_isp, 1)) == SS_NOTHING_TO_DO)
+	mutex_lock(&adm_ctx->resource->adm_mutex);
+	if (drbd_request_state(adm_ctx->device, NS(user_isp, 1)) == SS_NOTHING_TO_DO)
 		retcode = ERR_PAUSE_IS_SET;
-	mutex_unlock(&adm_ctx.resource->adm_mutex);
+	mutex_unlock(&adm_ctx->resource->adm_mutex);
 out:
-	drbd_adm_finish(&adm_ctx, info, retcode);
+	adm_ctx->reply_dh->ret_code = retcode;
 	return 0;
 }

 int drbd_adm_resume_sync(struct sk_buff *skb, struct genl_info *info)
 {
-	struct drbd_config_context adm_ctx;
+	struct drbd_config_context *adm_ctx = info->user_ptr[0];
 	union drbd_dev_state s;
 	enum drbd_ret_code retcode;

-	retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_MINOR);
-	if (!adm_ctx.reply_skb)
-		return retcode;
+	if (!adm_ctx->reply_skb)
+		return 0;
+	retcode = adm_ctx->reply_dh->ret_code;
 	if (retcode != NO_ERROR)
 		goto out;

-	mutex_lock(&adm_ctx.resource->adm_mutex);
-	if (drbd_request_state(adm_ctx.device, NS(user_isp, 0)) == SS_NOTHING_TO_DO) {
-		s = adm_ctx.device->state;
+	mutex_lock(&adm_ctx->resource->adm_mutex);
+	if (drbd_request_state(adm_ctx->device, NS(user_isp, 0)) == SS_NOTHING_TO_DO) {
+		s = adm_ctx->device->state;
 		if (s.conn == C_PAUSED_SYNC_S || s.conn == C_PAUSED_SYNC_T) {
 			retcode = s.aftr_isp ? ERR_PIC_AFTER_DEP :
 				  s.peer_isp ? ERR_PIC_PEER_DEP : ERR_PAUSE_IS_CLEAR;
···
 			retcode = ERR_PAUSE_IS_CLEAR;
 		}
 	}
-	mutex_unlock(&adm_ctx.resource->adm_mutex);
+	mutex_unlock(&adm_ctx->resource->adm_mutex);
 out:
-	drbd_adm_finish(&adm_ctx, info, retcode);
+	adm_ctx->reply_dh->ret_code = retcode;
 	return 0;
 }

···

 int drbd_adm_resume_io(struct sk_buff *skb, struct genl_info *info)
 {
-	struct drbd_config_context adm_ctx;
+	struct drbd_config_context *adm_ctx = info->user_ptr[0];
 	struct drbd_device *device;
 	int retcode; /* enum drbd_ret_code rsp. enum drbd_state_rv */

-	retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_MINOR);
-	if (!adm_ctx.reply_skb)
-		return retcode;
+	if (!adm_ctx->reply_skb)
+		return 0;
+	retcode = adm_ctx->reply_dh->ret_code;
 	if (retcode != NO_ERROR)
 		goto out;

-	mutex_lock(&adm_ctx.resource->adm_mutex);
-	device = adm_ctx.device;
+	mutex_lock(&adm_ctx->resource->adm_mutex);
+	device = adm_ctx->device;
 	if (test_bit(NEW_CUR_UUID, &device->flags)) {
 		if (get_ldev_if_state(device, D_ATTACHING)) {
 			drbd_uuid_new_current(device);
···
 			 * matching real data uuid exists).
 			 */
 			u64 val;
-			get_random_bytes(&val, sizeof(u64));
+			val = get_random_u64();
 			drbd_set_ed_uuid(device, val);
 			drbd_warn(device, "Resumed without access to data; please tear down before attempting to re-configure.\n");
 		}
···
 			tl_restart(first_peer_device(device)->connection, FAIL_FROZEN_DISK_IO);
 	}
 	drbd_resume_io(device);
-	mutex_unlock(&adm_ctx.resource->adm_mutex);
+	mutex_unlock(&adm_ctx->resource->adm_mutex);
 out:
-	drbd_adm_finish(&adm_ctx, info, retcode);
+	adm_ctx->reply_dh->ret_code = retcode;
 	return 0;
 }

···
 static struct nlattr *find_cfg_context_attr(const struct nlmsghdr *nlh, int attr)
 {
 	const unsigned hdrlen = GENL_HDRLEN + GENL_MAGIC_FAMILY_HDRSZ;
-	const int maxtype = ARRAY_SIZE(drbd_cfg_context_nl_policy) - 1;
 	struct nlattr *nla;

 	nla = nla_find(nlmsg_attrdata(nlh, hdrlen), nlmsg_attrlen(nlh, hdrlen),
 		       DRBD_NLA_CFG_CONTEXT);
 	if (!nla)
 		return NULL;
-	return drbd_nla_find_nested(maxtype, nla, __nla_type(attr));
+	return nla_find_nested(nla, attr);
 }

 static void resource_to_info(struct resource_info *, struct drbd_resource *);
···
 		if (resource_filter) {
 			retcode = ERR_RES_NOT_KNOWN;
 			resource = drbd_find_resource(nla_data(resource_filter));
-			if (!resource)
+			if (!resource) {
+				rcu_read_lock();
 				goto put_result;
+			}
 			cb->args[0] = (long)resource;
 		}
 	}
···
 		if (resource_filter) {
 			retcode = ERR_RES_NOT_KNOWN;
 			resource = drbd_find_resource(nla_data(resource_filter));
-			if (!resource)
+			if (!resource) {
+				rcu_read_lock();
 				goto put_result;
+			}
 		}
 		cb->args[0] = (long)resource;
 	}
···

 int drbd_adm_get_status(struct sk_buff *skb, struct genl_info *info)
 {
-	struct drbd_config_context adm_ctx;
+	struct drbd_config_context *adm_ctx = info->user_ptr[0];
 	enum drbd_ret_code retcode;
 	int err;

-	retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_MINOR);
-	if (!adm_ctx.reply_skb)
-		return retcode;
+	if (!adm_ctx->reply_skb)
+		return 0;
+	retcode = adm_ctx->reply_dh->ret_code;
 	if (retcode != NO_ERROR)
 		goto out;

-	err = nla_put_status_info(adm_ctx.reply_skb, adm_ctx.device, NULL);
+	err = nla_put_status_info(adm_ctx->reply_skb, adm_ctx->device, NULL);
 	if (err) {
-		nlmsg_free(adm_ctx.reply_skb);
+		nlmsg_free(adm_ctx->reply_skb);
+		adm_ctx->reply_skb = NULL;
 		return err;
 	}
 out:
-	drbd_adm_finish(&adm_ctx, info, retcode);
+	adm_ctx->reply_dh->ret_code = retcode;
 	return 0;
 }

···
 	struct nlattr *nla;
 	const char *resource_name;
 	struct drbd_resource *resource;
-	int maxtype;

 	/* Is this a followup call? */
 	if (cb->args[0]) {
···
 	/* No explicit context given. Dump all. */
 	if (!nla)
 		goto dump;
-	maxtype = ARRAY_SIZE(drbd_cfg_context_nl_policy) - 1;
-	nla = drbd_nla_find_nested(maxtype, nla, __nla_type(T_ctx_resource_name));
-	if (IS_ERR(nla))
-		return PTR_ERR(nla);
+	nla = nla_find_nested(nla, T_ctx_resource_name);
 	/* context given, but no name present? */
 	if (!nla)
 		return -EINVAL;
···

 int drbd_adm_get_timeout_type(struct sk_buff *skb, struct genl_info *info)
 {
-	struct drbd_config_context adm_ctx;
+	struct drbd_config_context *adm_ctx = info->user_ptr[0];
 	enum drbd_ret_code retcode;
 	struct timeout_parms tp;
 	int err;

-	retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_MINOR);
-	if (!adm_ctx.reply_skb)
-		return retcode;
+	if (!adm_ctx->reply_skb)
+		return 0;
+	retcode = adm_ctx->reply_dh->ret_code;
 	if (retcode != NO_ERROR)
 		goto out;

 	tp.timeout_type =
-		adm_ctx.device->state.pdsk == D_OUTDATED ? UT_PEER_OUTDATED :
-		test_bit(USE_DEGR_WFC_T, &adm_ctx.device->flags) ? UT_DEGRADED :
+		adm_ctx->device->state.pdsk == D_OUTDATED ? UT_PEER_OUTDATED :
+		test_bit(USE_DEGR_WFC_T, &adm_ctx->device->flags) ? UT_DEGRADED :
 		UT_DEFAULT;

-	err = timeout_parms_to_priv_skb(adm_ctx.reply_skb, &tp);
+	err = timeout_parms_to_priv_skb(adm_ctx->reply_skb, &tp);
 	if (err) {
-		nlmsg_free(adm_ctx.reply_skb);
+		nlmsg_free(adm_ctx->reply_skb);
+		adm_ctx->reply_skb = NULL;
 		return err;
 	}
 out:
-	drbd_adm_finish(&adm_ctx, info, retcode);
+	adm_ctx->reply_dh->ret_code = retcode;
 	return 0;
 }

 int drbd_adm_start_ov(struct sk_buff *skb, struct genl_info *info)
 {
-	struct drbd_config_context adm_ctx;
+	struct drbd_config_context *adm_ctx = info->user_ptr[0];
 	struct drbd_device *device;
 	enum drbd_ret_code retcode;
 	struct start_ov_parms parms;

-	retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_MINOR);
-	if (!adm_ctx.reply_skb)
-		return retcode;
+	if (!adm_ctx->reply_skb)
+		return 0;
+	retcode = adm_ctx->reply_dh->ret_code;
 	if (retcode != NO_ERROR)
 		goto out;

-	device = adm_ctx.device;
+	device = adm_ctx->device;

 	/* resume from last known position, if possible */
 	parms.ov_start_sector = device->ov_start_sector;
···
 		int err = start_ov_parms_from_attrs(&parms, info);
 		if (err) {
 			retcode = ERR_MANDATORY_TAG;
-			drbd_msg_put_info(adm_ctx.reply_skb, from_attrs_err_to_txt(err));
+			drbd_msg_put_info(adm_ctx->reply_skb, from_attrs_err_to_txt(err));
 			goto out;
 		}
 	}
-	mutex_lock(&adm_ctx.resource->adm_mutex);
+	mutex_lock(&adm_ctx->resource->adm_mutex);

 	/* w_make_ov_request expects position to be aligned */
 	device->ov_start_sector = parms.ov_start_sector & ~(BM_SECT_PER_BIT-1);
···
 	retcode = drbd_request_state(device, NS(conn, C_VERIFY_S));
 	drbd_resume_io(device);

-	mutex_unlock(&adm_ctx.resource->adm_mutex);
+	mutex_unlock(&adm_ctx->resource->adm_mutex);
 out:
-	drbd_adm_finish(&adm_ctx, info, retcode);
+	adm_ctx->reply_dh->ret_code = retcode;
 	return 0;
 }


 int drbd_adm_new_c_uuid(struct sk_buff *skb, struct genl_info *info)
 {
-	struct drbd_config_context adm_ctx;
+	struct drbd_config_context *adm_ctx = info->user_ptr[0];
 	struct drbd_device *device;
 	enum drbd_ret_code retcode;
 	int skip_initial_sync = 0;
 	int err;
 	struct new_c_uuid_parms args;

-	retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_MINOR);
-	if (!adm_ctx.reply_skb)
-		return retcode;
+	if (!adm_ctx->reply_skb)
+		return 0;
+	retcode = adm_ctx->reply_dh->ret_code;
 	if (retcode != NO_ERROR)
 		goto out_nolock;

-	device = adm_ctx.device;
+	device = adm_ctx->device;
 	memset(&args, 0, sizeof(args));
 	if (info->attrs[DRBD_NLA_NEW_C_UUID_PARMS]) {
 		err = new_c_uuid_parms_from_attrs(&args, info);
 		if (err) {
 			retcode = ERR_MANDATORY_TAG;
-			drbd_msg_put_info(adm_ctx.reply_skb, from_attrs_err_to_txt(err));
+			drbd_msg_put_info(adm_ctx->reply_skb, from_attrs_err_to_txt(err));
 			goto out_nolock;
 		}
 	}

-	mutex_lock(&adm_ctx.resource->adm_mutex);
+	mutex_lock(&adm_ctx->resource->adm_mutex);
 	mutex_lock(device->state_mutex); /* Protects us against serialized state changes. */

 	if (!get_ldev(device)) {
···
 	put_ldev(device);
 out:
 	mutex_unlock(device->state_mutex);
-	mutex_unlock(&adm_ctx.resource->adm_mutex);
+	mutex_unlock(&adm_ctx->resource->adm_mutex);
 out_nolock:
-	drbd_adm_finish(&adm_ctx, info, retcode);
+	adm_ctx->reply_dh->ret_code = retcode;
 	return 0;
 }

···
 int drbd_adm_new_resource(struct sk_buff *skb, struct genl_info *info)
 {
 	struct drbd_connection *connection;
-	struct drbd_config_context adm_ctx;
+	struct drbd_config_context *adm_ctx = info->user_ptr[0];
 	enum drbd_ret_code retcode;
 	struct res_opts res_opts;
 	int err;

-	retcode = drbd_adm_prepare(&adm_ctx, skb, info, 0);
-	if (!adm_ctx.reply_skb)
-		return retcode;
+	if (!adm_ctx->reply_skb)
+		return 0;
+	retcode = adm_ctx->reply_dh->ret_code;
 	if (retcode != NO_ERROR)
 		goto out;

···
 	err = res_opts_from_attrs(&res_opts, info);
 	if (err && err != -ENOMSG) {
 		retcode = ERR_MANDATORY_TAG;
-		drbd_msg_put_info(adm_ctx.reply_skb, from_attrs_err_to_txt(err));
+		drbd_msg_put_info(adm_ctx->reply_skb, from_attrs_err_to_txt(err));
 		goto out;
 	}

-	retcode = drbd_check_resource_name(&adm_ctx);
+	retcode = drbd_check_resource_name(adm_ctx);
 	if (retcode != NO_ERROR)
 		goto out;

-	if (adm_ctx.resource) {
+	if (adm_ctx->resource) {
 		if (info->nlhdr->nlmsg_flags & NLM_F_EXCL) {
 			retcode = ERR_INVALID_REQUEST;
-			drbd_msg_put_info(adm_ctx.reply_skb, "resource exists");
+			drbd_msg_put_info(adm_ctx->reply_skb, "resource exists");
 		}
 		/* else: still NO_ERROR */
 		goto out;
···

 	/* not yet safe for genl_family.parallel_ops */
 	mutex_lock(&resources_mutex);
-	connection = conn_create(adm_ctx.resource_name, &res_opts);
+	connection = conn_create(adm_ctx->resource_name, &res_opts);
 	mutex_unlock(&resources_mutex);

 	if (connection) {
···
 		retcode = ERR_NOMEM;

 out:
-	drbd_adm_finish(&adm_ctx, info, retcode);
+	adm_ctx->reply_dh->ret_code = retcode;
 	return 0;
 }

···

 int drbd_adm_new_minor(struct sk_buff *skb, struct genl_info *info)
 {
-	struct drbd_config_context adm_ctx;
+	struct drbd_config_context *adm_ctx = info->user_ptr[0];
 	struct drbd_genlmsghdr *dh = genl_info_userhdr(info);
 	enum drbd_ret_code retcode;

-	retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_RESOURCE);
-	if (!adm_ctx.reply_skb)
-		return retcode;
+	if (!adm_ctx->reply_skb)
+		return 0;
+	retcode = adm_ctx->reply_dh->ret_code;
 	if (retcode != NO_ERROR)
 		goto out;

 	if (dh->minor > MINORMASK) {
-		drbd_msg_put_info(adm_ctx.reply_skb, "requested minor out of range");
+		drbd_msg_put_info(adm_ctx->reply_skb, "requested minor out of range");
 		retcode = ERR_INVALID_REQUEST;
 		goto out;
 	}
-	if (adm_ctx.volume > DRBD_VOLUME_MAX) {
-		drbd_msg_put_info(adm_ctx.reply_skb, "requested volume id out of range");
+
if (adm_ctx->volume > DRBD_VOLUME_MAX) { 4305 + drbd_msg_put_info(adm_ctx->reply_skb, "requested volume id out of range"); 4370 4306 retcode = ERR_INVALID_REQUEST; 4371 4307 goto out; 4372 4308 } 4373 4309 4374 4310 /* drbd_adm_prepare made sure already 4375 4311 * that first_peer_device(device)->connection and device->vnr match the request. */ 4376 - if (adm_ctx.device) { 4312 + if (adm_ctx->device) { 4377 4313 if (info->nlhdr->nlmsg_flags & NLM_F_EXCL) 4378 4314 retcode = ERR_MINOR_OR_VOLUME_EXISTS; 4379 4315 /* else: still NO_ERROR */ 4380 4316 goto out; 4381 4317 } 4382 4318 4383 - mutex_lock(&adm_ctx.resource->adm_mutex); 4384 - retcode = drbd_create_device(&adm_ctx, dh->minor); 4319 + mutex_lock(&adm_ctx->resource->adm_mutex); 4320 + retcode = drbd_create_device(adm_ctx, dh->minor); 4385 4321 if (retcode == NO_ERROR) { 4386 4322 struct drbd_device *device; 4387 4323 struct drbd_peer_device *peer_device; ··· 4412 4348 } 4413 4349 mutex_unlock(&notification_mutex); 4414 4350 } 4415 - mutex_unlock(&adm_ctx.resource->adm_mutex); 4351 + mutex_unlock(&adm_ctx->resource->adm_mutex); 4416 4352 out: 4417 - drbd_adm_finish(&adm_ctx, info, retcode); 4353 + adm_ctx->reply_dh->ret_code = retcode; 4418 4354 return 0; 4419 4355 } 4420 4356 ··· 4457 4393 4458 4394 int drbd_adm_del_minor(struct sk_buff *skb, struct genl_info *info) 4459 4395 { 4460 - struct drbd_config_context adm_ctx; 4396 + struct drbd_config_context *adm_ctx = info->user_ptr[0]; 4461 4397 enum drbd_ret_code retcode; 4462 4398 4463 - retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_MINOR); 4464 - if (!adm_ctx.reply_skb) 4465 - return retcode; 4399 + if (!adm_ctx->reply_skb) 4400 + return 0; 4401 + retcode = adm_ctx->reply_dh->ret_code; 4466 4402 if (retcode != NO_ERROR) 4467 4403 goto out; 4468 4404 4469 - mutex_lock(&adm_ctx.resource->adm_mutex); 4470 - retcode = adm_del_minor(adm_ctx.device); 4471 - mutex_unlock(&adm_ctx.resource->adm_mutex); 4405 + mutex_lock(&adm_ctx->resource->adm_mutex); 
4406 + retcode = adm_del_minor(adm_ctx->device); 4407 + mutex_unlock(&adm_ctx->resource->adm_mutex); 4472 4408 out: 4473 - drbd_adm_finish(&adm_ctx, info, retcode); 4409 + adm_ctx->reply_dh->ret_code = retcode; 4474 4410 return 0; 4475 4411 } 4476 4412 ··· 4506 4442 4507 4443 int drbd_adm_down(struct sk_buff *skb, struct genl_info *info) 4508 4444 { 4509 - struct drbd_config_context adm_ctx; 4445 + struct drbd_config_context *adm_ctx = info->user_ptr[0]; 4510 4446 struct drbd_resource *resource; 4511 4447 struct drbd_connection *connection; 4512 4448 struct drbd_device *device; 4513 4449 int retcode; /* enum drbd_ret_code rsp. enum drbd_state_rv */ 4514 4450 unsigned i; 4515 4451 4516 - retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_RESOURCE); 4517 - if (!adm_ctx.reply_skb) 4518 - return retcode; 4452 + if (!adm_ctx->reply_skb) 4453 + return 0; 4454 + retcode = adm_ctx->reply_dh->ret_code; 4519 4455 if (retcode != NO_ERROR) 4520 4456 goto finish; 4521 4457 4522 - resource = adm_ctx.resource; 4458 + resource = adm_ctx->resource; 4523 4459 mutex_lock(&resource->adm_mutex); 4524 4460 /* demote */ 4525 4461 for_each_connection(connection, resource) { ··· 4528 4464 idr_for_each_entry(&connection->peer_devices, peer_device, i) { 4529 4465 retcode = drbd_set_role(peer_device->device, R_SECONDARY, 0); 4530 4466 if (retcode < SS_SUCCESS) { 4531 - drbd_msg_put_info(adm_ctx.reply_skb, "failed to demote"); 4467 + drbd_msg_put_info(adm_ctx->reply_skb, "failed to demote"); 4532 4468 goto out; 4533 4469 } 4534 4470 } 4535 4471 4536 4472 retcode = conn_try_disconnect(connection, 0); 4537 4473 if (retcode < SS_SUCCESS) { 4538 - drbd_msg_put_info(adm_ctx.reply_skb, "failed to disconnect"); 4474 + drbd_msg_put_info(adm_ctx->reply_skb, "failed to disconnect"); 4539 4475 goto out; 4540 4476 } 4541 4477 } ··· 4544 4480 idr_for_each_entry(&resource->devices, device, i) { 4545 4481 retcode = adm_detach(device, 0); 4546 4482 if (retcode < SS_SUCCESS || retcode > NO_ERROR) { 
4547 - drbd_msg_put_info(adm_ctx.reply_skb, "failed to detach"); 4483 + drbd_msg_put_info(adm_ctx->reply_skb, "failed to detach"); 4548 4484 goto out; 4549 4485 } 4550 4486 } ··· 4554 4490 retcode = adm_del_minor(device); 4555 4491 if (retcode != NO_ERROR) { 4556 4492 /* "can not happen" */ 4557 - drbd_msg_put_info(adm_ctx.reply_skb, "failed to delete volume"); 4493 + drbd_msg_put_info(adm_ctx->reply_skb, "failed to delete volume"); 4558 4494 goto out; 4559 4495 } 4560 4496 } ··· 4563 4499 out: 4564 4500 mutex_unlock(&resource->adm_mutex); 4565 4501 finish: 4566 - drbd_adm_finish(&adm_ctx, info, retcode); 4502 + adm_ctx->reply_dh->ret_code = retcode; 4567 4503 return 0; 4568 4504 } 4569 4505 4570 4506 int drbd_adm_del_resource(struct sk_buff *skb, struct genl_info *info) 4571 4507 { 4572 - struct drbd_config_context adm_ctx; 4508 + struct drbd_config_context *adm_ctx = info->user_ptr[0]; 4573 4509 struct drbd_resource *resource; 4574 4510 enum drbd_ret_code retcode; 4575 4511 4576 - retcode = drbd_adm_prepare(&adm_ctx, skb, info, DRBD_ADM_NEED_RESOURCE); 4577 - if (!adm_ctx.reply_skb) 4578 - return retcode; 4512 + if (!adm_ctx->reply_skb) 4513 + return 0; 4514 + retcode = adm_ctx->reply_dh->ret_code; 4579 4515 if (retcode != NO_ERROR) 4580 4516 goto finish; 4581 - resource = adm_ctx.resource; 4517 + resource = adm_ctx->resource; 4582 4518 4583 4519 mutex_lock(&resource->adm_mutex); 4584 4520 retcode = adm_del_resource(resource); 4585 4521 mutex_unlock(&resource->adm_mutex); 4586 4522 finish: 4587 - drbd_adm_finish(&adm_ctx, info, retcode); 4523 + adm_ctx->reply_dh->ret_code = retcode; 4588 4524 return 0; 4589 4525 } 4590 4526
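The handler changes above all follow one pattern: the per-command `drbd_adm_prepare()` / `drbd_adm_finish()` pair is gone, the context is read from `info->user_ptr[0]`, and the handler only stores its result in `adm_ctx->reply_dh->ret_code` before returning 0. The hooks that fill in and consume that context are not part of this section, so the following is only a userspace sketch of the control flow under stated assumptions. Every name in it (`mock_pre_doit`, `mock_doit`, `mock_post_doit`, `struct info_mock`, the `RC_*` codes) is hypothetical and does not match the real DRBD or generic netlink signatures.

```c
#include <stdlib.h>

/* Hypothetical return codes; the real DRBD values are not reproduced here. */
enum ret_code { RC_OK, RC_NO_RESOURCE, RC_NOMEM };

/* Simplified stand-in for the reply header carried in the reply skb. */
struct reply_hdr { enum ret_code ret_code; };

/* Simplified stand-in for struct drbd_config_context. */
struct config_context {
	struct reply_hdr reply;		/* storage that reply_dh points at */
	struct reply_hdr *reply_dh;
	int have_resource;		/* result of the lookup done up front */
	int work_done;			/* set by the handler when it did its job */
};

/* Simplified stand-in for struct genl_info. */
struct info_mock {
	void *user_ptr[2];
	int want_resource;		/* pretend request attribute */
};

static enum ret_code last_sent;		/* what the post hook "sent" to userspace */

/* Pre hook: allocate the context, do the lookup, record a preliminary code. */
static int mock_pre_doit(struct info_mock *info)
{
	struct config_context *ctx = calloc(1, sizeof(*ctx));

	if (!ctx)
		return -1;
	ctx->reply_dh = &ctx->reply;
	ctx->have_resource = info->want_resource;
	ctx->reply_dh->ret_code = ctx->have_resource ? RC_OK : RC_NO_RESOURCE;
	info->user_ptr[0] = ctx;
	return 0;
}

/* Handler: same shape as the converted drbd_adm_*() functions in the diff. */
static int mock_doit(struct info_mock *info)
{
	struct config_context *ctx = info->user_ptr[0];
	enum ret_code retcode = ctx->reply_dh->ret_code;

	if (retcode != RC_OK)
		goto out;

	ctx->work_done = 1;		/* the command-specific work would go here */
out:
	ctx->reply_dh->ret_code = retcode;
	return 0;
}

/* Post hook: deliver the reply and release the context. */
static void mock_post_doit(struct info_mock *info)
{
	struct config_context *ctx = info->user_ptr[0];

	if (!ctx)
		return;
	last_sent = ctx->reply_dh->ret_code;
	free(ctx);
	info->user_ptr[0] = NULL;
}

/* Drive one command through the three stages and return the code "sent". */
static enum ret_code run_command(int want_resource)
{
	struct info_mock info = { .want_resource = want_resource };

	if (mock_pre_doit(&info))
		return RC_NOMEM;
	mock_doit(&info);
	mock_post_doit(&info);
	return last_sent;
}
```

In this sketch `run_command(1)` reports `RC_OK` and `run_command(0)` reports `RC_NO_RESOURCE` without the handler doing any work. That is the effect of the `retcode = adm_ctx->reply_dh->ret_code; if (retcode != NO_ERROR) goto out;` prologue that each converted handler now carries.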
-56
drivers/block/drbd/drbd_nla.c
···
-// SPDX-License-Identifier: GPL-2.0-only
-#include <linux/kernel.h>
-#include <net/netlink.h>
-#include <linux/drbd_genl_api.h>
-#include "drbd_nla.h"
-
-static int drbd_nla_check_mandatory(int maxtype, struct nlattr *nla)
-{
-	struct nlattr *head = nla_data(nla);
-	int len = nla_len(nla);
-	int rem;
-
-	/*
-	 * validate_nla (called from nla_parse_nested) ignores attributes
-	 * beyond maxtype, and does not understand the DRBD_GENLA_F_MANDATORY flag.
-	 * In order to have it validate attributes with the DRBD_GENLA_F_MANDATORY
-	 * flag set also, check and remove that flag before calling
-	 * nla_parse_nested.
-	 */
-
-	nla_for_each_attr(nla, head, len, rem) {
-		if (nla->nla_type & DRBD_GENLA_F_MANDATORY) {
-			nla->nla_type &= ~DRBD_GENLA_F_MANDATORY;
-			if (nla_type(nla) > maxtype)
-				return -EOPNOTSUPP;
-		}
-	}
-	return 0;
-}
-
-int drbd_nla_parse_nested(struct nlattr *tb[], int maxtype, struct nlattr *nla,
-			  const struct nla_policy *policy)
-{
-	int err;
-
-	err = drbd_nla_check_mandatory(maxtype, nla);
-	if (!err)
-		err = nla_parse_nested_deprecated(tb, maxtype, nla, policy,
-						  NULL);
-
-	return err;
-}
-
-struct nlattr *drbd_nla_find_nested(int maxtype, struct nlattr *nla, int attrtype)
-{
-	int err;
-	/*
-	 * If any nested attribute has the DRBD_GENLA_F_MANDATORY flag set and
-	 * we don't know about that attribute, reject all the nested
-	 * attributes.
-	 */
-	err = drbd_nla_check_mandatory(maxtype, nla);
-	if (err)
-		return ERR_PTR(err);
-	return nla_find_nested(nla, attrtype);
-}
-9
drivers/block/drbd/drbd_nla.h
···
-/* SPDX-License-Identifier: GPL-2.0-only */
-#ifndef __DRBD_NLA_H
-#define __DRBD_NLA_H
-
-extern int drbd_nla_parse_nested(struct nlattr *tb[], int maxtype, struct nlattr *nla,
-				 const struct nla_policy *policy);
-extern struct nlattr *drbd_nla_find_nested(int maxtype, struct nlattr *nla, int attrtype);
-
-#endif /* __DRBD_NLA_H */
+434 -43
drivers/block/ublk_drv.c
··· 46 46 #include <linux/kref.h> 47 47 #include <linux/kfifo.h> 48 48 #include <linux/blk-integrity.h> 49 + #include <linux/maple_tree.h> 50 + #include <linux/xarray.h> 49 51 #include <uapi/linux/fs.h> 50 52 #include <uapi/linux/ublk_cmd.h> 51 53 ··· 60 58 #define UBLK_CMD_UPDATE_SIZE _IOC_NR(UBLK_U_CMD_UPDATE_SIZE) 61 59 #define UBLK_CMD_QUIESCE_DEV _IOC_NR(UBLK_U_CMD_QUIESCE_DEV) 62 60 #define UBLK_CMD_TRY_STOP_DEV _IOC_NR(UBLK_U_CMD_TRY_STOP_DEV) 61 + #define UBLK_CMD_REG_BUF _IOC_NR(UBLK_U_CMD_REG_BUF) 62 + #define UBLK_CMD_UNREG_BUF _IOC_NR(UBLK_U_CMD_UNREG_BUF) 63 + 64 + /* Default max shmem buffer size: 4GB (may be increased in future) */ 65 + #define UBLK_SHMEM_BUF_SIZE_MAX (1ULL << 32) 63 66 64 67 #define UBLK_IO_REGISTER_IO_BUF _IOC_NR(UBLK_U_IO_REGISTER_IO_BUF) 65 68 #define UBLK_IO_UNREGISTER_IO_BUF _IOC_NR(UBLK_U_IO_UNREGISTER_IO_BUF) ··· 88 81 | (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) ? UBLK_F_INTEGRITY : 0) \ 89 82 | UBLK_F_SAFE_STOP_DEV \ 90 83 | UBLK_F_BATCH_IO \ 91 - | UBLK_F_NO_AUTO_PART_SCAN) 84 + | UBLK_F_NO_AUTO_PART_SCAN \ 85 + | UBLK_F_SHMEM_ZC) 92 86 93 87 #define UBLK_F_ALL_RECOVERY_FLAGS (UBLK_F_USER_RECOVERY \ 94 88 | UBLK_F_USER_RECOVERY_REISSUE \ ··· 297 289 struct ublk_io ios[] __counted_by(q_depth); 298 290 }; 299 291 292 + /* Maple tree value: maps a PFN range to buffer location */ 293 + struct ublk_buf_range { 294 + unsigned short buf_index; 295 + unsigned short flags; 296 + unsigned int base_offset; /* byte offset within buffer */ 297 + }; 298 + 300 299 struct ublk_device { 301 300 struct gendisk *ub_disk; 302 301 ··· 338 323 339 324 bool block_open; /* protected by open_mutex */ 340 325 326 + /* shared memory zero copy */ 327 + struct maple_tree buf_tree; 328 + struct ida buf_ida; 329 + 341 330 struct ublk_queue *queues[]; 342 331 }; 343 332 ··· 353 334 354 335 static void ublk_io_release(void *priv); 355 336 static void ublk_stop_dev_unlocked(struct ublk_device *ub); 337 + static bool ublk_try_buf_match(struct ublk_device *ub, 
struct request *rq, 338 + u32 *buf_idx, u32 *buf_off); 339 + static void ublk_buf_cleanup(struct ublk_device *ub); 356 340 static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq); 357 341 static inline struct request *__ublk_check_and_get_req(struct ublk_device *ub, 358 342 u16 q_id, u16 tag, struct ublk_io *io); ··· 418 396 static inline bool ublk_dev_support_zero_copy(const struct ublk_device *ub) 419 397 { 420 398 return ub->dev_info.flags & UBLK_F_SUPPORT_ZERO_COPY; 399 + } 400 + 401 + static inline bool ublk_support_shmem_zc(const struct ublk_queue *ubq) 402 + { 403 + return ubq->flags & UBLK_F_SHMEM_ZC; 404 + } 405 + 406 + static inline bool ublk_iod_is_shmem_zc(const struct ublk_queue *ubq, 407 + unsigned int tag) 408 + { 409 + return ublk_get_iod(ubq, tag)->op_flags & UBLK_IO_F_SHMEM_ZC; 410 + } 411 + 412 + static inline bool ublk_dev_support_shmem_zc(const struct ublk_device *ub) 413 + { 414 + return ub->dev_info.flags & UBLK_F_SHMEM_ZC; 421 415 } 422 416 423 417 static inline bool ublk_support_auto_buf_reg(const struct ublk_queue *ubq) ··· 846 808 847 809 static int ublk_integrity_flags(u32 flags) 848 810 { 849 - int ret_flags = 0; 811 + int ret_flags = BLK_SPLIT_INTERVAL_CAPABLE; 850 812 851 813 if (flags & LBMD_PI_CAP_INTEGRITY) { 852 814 flags &= ~LBMD_PI_CAP_INTEGRITY; ··· 1498 1460 iod->op_flags = ublk_op | ublk_req_build_flags(req); 1499 1461 iod->nr_sectors = blk_rq_sectors(req); 1500 1462 iod->start_sector = blk_rq_pos(req); 1463 + 1464 + /* Try shmem zero-copy match before setting addr */ 1465 + if (ublk_support_shmem_zc(ubq) && ublk_rq_has_data(req)) { 1466 + u32 buf_idx, buf_off; 1467 + 1468 + if (ublk_try_buf_match(ubq->dev, req, 1469 + &buf_idx, &buf_off)) { 1470 + iod->op_flags |= UBLK_IO_F_SHMEM_ZC; 1471 + iod->addr = ublk_shmem_zc_addr(buf_idx, buf_off); 1472 + return BLK_STS_OK; 1473 + } 1474 + } 1475 + 1501 1476 iod->addr = io->buf.addr; 1502 1477 1503 1478 return BLK_STS_OK; ··· 1554 1503 */ 1555 1504 if 
(req_op(req) != REQ_OP_READ && req_op(req) != REQ_OP_WRITE && 1556 1505 req_op(req) != REQ_OP_DRV_IN) 1506 + goto exit; 1507 + 1508 + /* shmem zero copy: no data to unmap, pages already shared */ 1509 + if (ublk_iod_is_shmem_zc(req->mq_hctx->driver_data, req->tag)) 1557 1510 goto exit; 1558 1511 1559 1512 /* for READ request, writing data in iod->addr to rq buffers */ ··· 1718 1663 static bool ublk_start_io(const struct ublk_queue *ubq, struct request *req, 1719 1664 struct ublk_io *io) 1720 1665 { 1721 - unsigned mapped_bytes = ublk_map_io(ubq, req, io); 1666 + unsigned mapped_bytes; 1667 + 1668 + /* shmem zero copy: skip data copy, pages already shared */ 1669 + if (ublk_iod_is_shmem_zc(ubq, req->tag)) 1670 + return true; 1671 + 1672 + mapped_bytes = ublk_map_io(ubq, req, io); 1722 1673 1723 1674 /* partially mapped, update io descriptor */ 1724 1675 if (unlikely(mapped_bytes != blk_rq_bytes(req))) { ··· 1850 1789 * Filter out UBLK_BATCH_IO_UNUSED_TAG entries from tag_buf. 1851 1790 * Returns the new length after filtering. 1852 1791 */ 1853 - static unsigned int ublk_filter_unused_tags(unsigned short *tag_buf, 1792 + static noinline unsigned int ublk_filter_unused_tags(unsigned short *tag_buf, 1854 1793 unsigned int len) 1855 1794 { 1856 1795 unsigned int i, j; ··· 1864 1803 } 1865 1804 1866 1805 return j; 1806 + } 1807 + 1808 + static noinline void ublk_batch_dispatch_fail(struct ublk_queue *ubq, 1809 + const struct ublk_batch_io_data *data, 1810 + unsigned short *tag_buf, size_t len, int ret) 1811 + { 1812 + int i, res; 1813 + 1814 + /* 1815 + * Undo prep state for all IOs since userspace never received them. 1816 + * This restores IOs to pre-prepared state so they can be cleanly 1817 + * re-prepared when tags are pulled from FIFO again. 
1818 + */ 1819 + for (i = 0; i < len; i++) { 1820 + struct ublk_io *io = &ubq->ios[tag_buf[i]]; 1821 + int index = -1; 1822 + 1823 + ublk_io_lock(io); 1824 + if (io->flags & UBLK_IO_FLAG_AUTO_BUF_REG) 1825 + index = io->buf.auto_reg.index; 1826 + io->flags &= ~(UBLK_IO_FLAG_OWNED_BY_SRV | UBLK_IO_FLAG_AUTO_BUF_REG); 1827 + io->flags |= UBLK_IO_FLAG_ACTIVE; 1828 + ublk_io_unlock(io); 1829 + 1830 + if (index != -1) 1831 + io_buffer_unregister_bvec(data->cmd, index, 1832 + data->issue_flags); 1833 + } 1834 + 1835 + res = kfifo_in_spinlocked_noirqsave(&ubq->evts_fifo, 1836 + tag_buf, len, &ubq->evts_lock); 1837 + 1838 + pr_warn_ratelimited("%s: copy tags or post CQE failure, move back " 1839 + "tags(%d %zu) ret %d\n", __func__, res, len, 1840 + ret); 1867 1841 } 1868 1842 1869 1843 #define MAX_NR_TAG 128 ··· 1944 1848 1945 1849 sel.val = ublk_batch_copy_io_tags(fcmd, sel.addr, tag_buf, len * tag_sz); 1946 1850 ret = ublk_batch_fetch_post_cqe(fcmd, &sel, data->issue_flags); 1947 - if (unlikely(ret < 0)) { 1948 - int i, res; 1949 - 1950 - /* 1951 - * Undo prep state for all IOs since userspace never received them. 1952 - * This restores IOs to pre-prepared state so they can be cleanly 1953 - * re-prepared when tags are pulled from FIFO again. 
1954 - */ 1955 - for (i = 0; i < len; i++) { 1956 - struct ublk_io *io = &ubq->ios[tag_buf[i]]; 1957 - int index = -1; 1958 - 1959 - ublk_io_lock(io); 1960 - if (io->flags & UBLK_IO_FLAG_AUTO_BUF_REG) 1961 - index = io->buf.auto_reg.index; 1962 - io->flags &= ~(UBLK_IO_FLAG_OWNED_BY_SRV | UBLK_IO_FLAG_AUTO_BUF_REG); 1963 - io->flags |= UBLK_IO_FLAG_ACTIVE; 1964 - ublk_io_unlock(io); 1965 - 1966 - if (index != -1) 1967 - io_buffer_unregister_bvec(data->cmd, index, 1968 - data->issue_flags); 1969 - } 1970 - 1971 - res = kfifo_in_spinlocked_noirqsave(&ubq->evts_fifo, 1972 - tag_buf, len, &ubq->evts_lock); 1973 - 1974 - pr_warn_ratelimited("%s: copy tags or post CQE failure, move back " 1975 - "tags(%d %zu) ret %d\n", __func__, res, len, 1976 - ret); 1977 - } 1851 + if (unlikely(ret < 0)) 1852 + ublk_batch_dispatch_fail(ubq, data, tag_buf, len, ret); 1978 1853 return ret; 1979 1854 } 1980 1855 ··· 2977 2910 ublk_cancel_dev(ub); 2978 2911 } 2979 2912 2913 + static void ublk_reset_io_flags(struct ublk_queue *ubq, struct ublk_io *io) 2914 + { 2915 + /* UBLK_IO_FLAG_CANCELED can be cleared now */ 2916 + spin_lock(&ubq->cancel_lock); 2917 + io->flags &= ~UBLK_IO_FLAG_CANCELED; 2918 + spin_unlock(&ubq->cancel_lock); 2919 + } 2920 + 2980 2921 /* reset per-queue io flags */ 2981 2922 static void ublk_queue_reset_io_flags(struct ublk_queue *ubq) 2982 2923 { 2983 - int j; 2984 - 2985 - /* UBLK_IO_FLAG_CANCELED can be cleared now */ 2986 2924 spin_lock(&ubq->cancel_lock); 2987 - for (j = 0; j < ubq->q_depth; j++) 2988 - ubq->ios[j].flags &= ~UBLK_IO_FLAG_CANCELED; 2989 2925 ubq->canceling = false; 2990 2926 spin_unlock(&ubq->cancel_lock); 2991 2927 ubq->fail_io = false; 2992 2928 } 2993 2929 2994 2930 /* device can only be started after all IOs are ready */ 2995 - static void ublk_mark_io_ready(struct ublk_device *ub, u16 q_id) 2931 + static void ublk_mark_io_ready(struct ublk_device *ub, u16 q_id, 2932 + struct ublk_io *io) 2996 2933 __must_hold(&ub->mutex) 2997 2934 { 2998 2935 
struct ublk_queue *ubq = ublk_get_queue(ub, q_id); ··· 3005 2934 ub->unprivileged_daemons = true; 3006 2935 3007 2936 ubq->nr_io_ready++; 2937 + ublk_reset_io_flags(ubq, io); 3008 2938 3009 2939 /* Check if this specific queue is now fully ready */ 3010 2940 if (ublk_queue_ready(ubq)) { ··· 3268 3196 if (!ret) 3269 3197 ret = ublk_config_io_buf(ub, io, cmd, buf_addr, NULL); 3270 3198 if (!ret) 3271 - ublk_mark_io_ready(ub, q_id); 3199 + ublk_mark_io_ready(ub, q_id, io); 3272 3200 mutex_unlock(&ub->mutex); 3273 3201 return ret; 3274 3202 } ··· 3676 3604 ublk_io_unlock(io); 3677 3605 3678 3606 if (!ret) 3679 - ublk_mark_io_ready(data->ub, ubq->q_id); 3607 + ublk_mark_io_ready(data->ub, ubq->q_id, io); 3680 3608 3681 3609 return ret; 3682 3610 } ··· 4272 4200 { 4273 4201 struct ublk_device *ub = container_of(dev, struct ublk_device, cdev_dev); 4274 4202 4203 + ublk_buf_cleanup(ub); 4275 4204 blk_mq_free_tag_set(&ub->tag_set); 4276 4205 ublk_deinit_queues(ub); 4277 4206 ublk_free_dev_number(ub); ··· 4694 4621 mutex_init(&ub->mutex); 4695 4622 spin_lock_init(&ub->lock); 4696 4623 mutex_init(&ub->cancel_mutex); 4624 + mt_init(&ub->buf_tree); 4625 + ida_init(&ub->buf_ida); 4697 4626 INIT_WORK(&ub->partition_scan_work, ublk_partition_scan_work); 4698 4627 4699 4628 ret = ublk_alloc_dev_number(ub, header->dev_id); ··· 5246 5171 return err; 5247 5172 } 5248 5173 5174 + /* 5175 + * Lock for maple tree modification: acquire ub->mutex, then freeze queue 5176 + * if device is started. If device is not yet started, only mutex is 5177 + * needed since no I/O path can access the tree. 5178 + * 5179 + * This ordering (mutex -> freeze) is safe because ublk_stop_dev_unlocked() 5180 + * already holds ub->mutex when calling del_gendisk() which freezes the queue. 
5181 + */ 5182 + static unsigned int ublk_lock_buf_tree(struct ublk_device *ub) 5183 + { 5184 + unsigned int memflags = 0; 5185 + 5186 + mutex_lock(&ub->mutex); 5187 + if (ub->ub_disk) 5188 + memflags = blk_mq_freeze_queue(ub->ub_disk->queue); 5189 + 5190 + return memflags; 5191 + } 5192 + 5193 + static void ublk_unlock_buf_tree(struct ublk_device *ub, unsigned int memflags) 5194 + { 5195 + if (ub->ub_disk) 5196 + blk_mq_unfreeze_queue(ub->ub_disk->queue, memflags); 5197 + mutex_unlock(&ub->mutex); 5198 + } 5199 + 5200 + /* Erase coalesced PFN ranges from the maple tree matching buf_index */ 5201 + static void ublk_buf_erase_ranges(struct ublk_device *ub, int buf_index) 5202 + { 5203 + MA_STATE(mas, &ub->buf_tree, 0, ULONG_MAX); 5204 + struct ublk_buf_range *range; 5205 + 5206 + mas_lock(&mas); 5207 + mas_for_each(&mas, range, ULONG_MAX) { 5208 + if (range->buf_index == buf_index) { 5209 + mas_erase(&mas); 5210 + kfree(range); 5211 + } 5212 + } 5213 + mas_unlock(&mas); 5214 + } 5215 + 5216 + static int __ublk_ctrl_reg_buf(struct ublk_device *ub, 5217 + struct page **pages, unsigned long nr_pages, 5218 + int index, unsigned short flags) 5219 + { 5220 + unsigned long i; 5221 + int ret; 5222 + 5223 + for (i = 0; i < nr_pages; i++) { 5224 + unsigned long pfn = page_to_pfn(pages[i]); 5225 + unsigned long start = i; 5226 + struct ublk_buf_range *range; 5227 + 5228 + /* Find run of consecutive PFNs */ 5229 + while (i + 1 < nr_pages && 5230 + page_to_pfn(pages[i + 1]) == pfn + (i - start) + 1) 5231 + i++; 5232 + 5233 + range = kzalloc(sizeof(*range), GFP_KERNEL); 5234 + if (!range) { 5235 + ret = -ENOMEM; 5236 + goto unwind; 5237 + } 5238 + range->buf_index = index; 5239 + range->flags = flags; 5240 + range->base_offset = start << PAGE_SHIFT; 5241 + 5242 + ret = mtree_insert_range(&ub->buf_tree, pfn, 5243 + pfn + (i - start), 5244 + range, GFP_KERNEL); 5245 + if (ret) { 5246 + kfree(range); 5247 + goto unwind; 5248 + } 5249 + } 5250 + return 0; 5251 + 5252 + unwind: 5253 + 
ublk_buf_erase_ranges(ub, index); 5254 + return ret; 5255 + } 5256 + 5257 + /* 5258 + * Register a shared memory buffer for zero-copy I/O. 5259 + * Pins pages, builds PFN maple tree, freezes/unfreezes the queue 5260 + * internally. Returns buffer index (>= 0) on success. 5261 + */ 5262 + static int ublk_ctrl_reg_buf(struct ublk_device *ub, 5263 + struct ublksrv_ctrl_cmd *header) 5264 + { 5265 + void __user *argp = (void __user *)(unsigned long)header->addr; 5266 + struct ublk_shmem_buf_reg buf_reg; 5267 + unsigned long nr_pages; 5268 + struct page **pages = NULL; 5269 + unsigned int gup_flags; 5270 + unsigned int memflags; 5271 + long pinned; 5272 + int index; 5273 + int ret; 5274 + 5275 + if (!ublk_dev_support_shmem_zc(ub)) 5276 + return -EOPNOTSUPP; 5277 + 5278 + memset(&buf_reg, 0, sizeof(buf_reg)); 5279 + if (copy_from_user(&buf_reg, argp, 5280 + min_t(size_t, header->len, sizeof(buf_reg)))) 5281 + return -EFAULT; 5282 + 5283 + if (buf_reg.flags & ~UBLK_SHMEM_BUF_READ_ONLY) 5284 + return -EINVAL; 5285 + 5286 + if (buf_reg.reserved) 5287 + return -EINVAL; 5288 + 5289 + if (!buf_reg.len || buf_reg.len > UBLK_SHMEM_BUF_SIZE_MAX || 5290 + !PAGE_ALIGNED(buf_reg.len) || !PAGE_ALIGNED(buf_reg.addr)) 5291 + return -EINVAL; 5292 + 5293 + nr_pages = buf_reg.len >> PAGE_SHIFT; 5294 + 5295 + /* Pin pages before any locks (may sleep) */ 5296 + pages = kvmalloc_array(nr_pages, sizeof(*pages), GFP_KERNEL); 5297 + if (!pages) 5298 + return -ENOMEM; 5299 + 5300 + gup_flags = FOLL_LONGTERM; 5301 + if (!(buf_reg.flags & UBLK_SHMEM_BUF_READ_ONLY)) 5302 + gup_flags |= FOLL_WRITE; 5303 + 5304 + pinned = pin_user_pages_fast(buf_reg.addr, nr_pages, gup_flags, pages); 5305 + if (pinned < 0) { 5306 + ret = pinned; 5307 + goto err_free_pages; 5308 + } 5309 + if (pinned != nr_pages) { 5310 + ret = -EFAULT; 5311 + goto err_unpin; 5312 + } 5313 + 5314 + memflags = ublk_lock_buf_tree(ub); 5315 + 5316 + index = ida_alloc_max(&ub->buf_ida, USHRT_MAX, GFP_KERNEL); 5317 + if (index < 0) { 5318 + 
ret = index; 5319 + goto err_unlock; 5320 + } 5321 + 5322 + ret = __ublk_ctrl_reg_buf(ub, pages, nr_pages, index, buf_reg.flags); 5323 + if (ret) { 5324 + ida_free(&ub->buf_ida, index); 5325 + goto err_unlock; 5326 + } 5327 + 5328 + ublk_unlock_buf_tree(ub, memflags); 5329 + kvfree(pages); 5330 + return index; 5331 + 5332 + err_unlock: 5333 + ublk_unlock_buf_tree(ub, memflags); 5334 + err_unpin: 5335 + unpin_user_pages(pages, pinned); 5336 + err_free_pages: 5337 + kvfree(pages); 5338 + return ret; 5339 + } 5340 + 5341 + static int __ublk_ctrl_unreg_buf(struct ublk_device *ub, int buf_index) 5342 + { 5343 + MA_STATE(mas, &ub->buf_tree, 0, ULONG_MAX); 5344 + struct ublk_buf_range *range; 5345 + struct page *pages[32]; 5346 + int ret = -ENOENT; 5347 + 5348 + mas_lock(&mas); 5349 + mas_for_each(&mas, range, ULONG_MAX) { 5350 + unsigned long base, nr, off; 5351 + 5352 + if (range->buf_index != buf_index) 5353 + continue; 5354 + 5355 + ret = 0; 5356 + base = mas.index; 5357 + nr = mas.last - base + 1; 5358 + mas_erase(&mas); 5359 + 5360 + for (off = 0; off < nr; ) { 5361 + unsigned int batch = min_t(unsigned long, 5362 + nr - off, 32); 5363 + unsigned int j; 5364 + 5365 + for (j = 0; j < batch; j++) 5366 + pages[j] = pfn_to_page(base + off + j); 5367 + unpin_user_pages(pages, batch); 5368 + off += batch; 5369 + } 5370 + kfree(range); 5371 + } 5372 + mas_unlock(&mas); 5373 + 5374 + return ret; 5375 + } 5376 + 5377 + static int ublk_ctrl_unreg_buf(struct ublk_device *ub, 5378 + struct ublksrv_ctrl_cmd *header) 5379 + { 5380 + int index = (int)header->data[0]; 5381 + unsigned int memflags; 5382 + int ret; 5383 + 5384 + if (!ublk_dev_support_shmem_zc(ub)) 5385 + return -EOPNOTSUPP; 5386 + 5387 + if (index < 0 || index > USHRT_MAX) 5388 + return -EINVAL; 5389 + 5390 + memflags = ublk_lock_buf_tree(ub); 5391 + 5392 + ret = __ublk_ctrl_unreg_buf(ub, index); 5393 + if (!ret) 5394 + ida_free(&ub->buf_ida, index); 5395 + 5396 + ublk_unlock_buf_tree(ub, memflags); 5397 + return 
ret; 5398 + } 5399 + 5400 + static void ublk_buf_cleanup(struct ublk_device *ub) 5401 + { 5402 + MA_STATE(mas, &ub->buf_tree, 0, ULONG_MAX); 5403 + struct ublk_buf_range *range; 5404 + struct page *pages[32]; 5405 + 5406 + mas_for_each(&mas, range, ULONG_MAX) { 5407 + unsigned long base = mas.index; 5408 + unsigned long nr = mas.last - base + 1; 5409 + unsigned long off; 5410 + 5411 + for (off = 0; off < nr; ) { 5412 + unsigned int batch = min_t(unsigned long, 5413 + nr - off, 32); 5414 + unsigned int j; 5415 + 5416 + for (j = 0; j < batch; j++) 5417 + pages[j] = pfn_to_page(base + off + j); 5418 + unpin_user_pages(pages, batch); 5419 + off += batch; 5420 + } 5421 + kfree(range); 5422 + } 5423 + mtree_destroy(&ub->buf_tree); 5424 + ida_destroy(&ub->buf_ida); 5425 + } 5426 + 5427 + /* Check if request pages match a registered shared memory buffer */ 5428 + static bool ublk_try_buf_match(struct ublk_device *ub, 5429 + struct request *rq, 5430 + u32 *buf_idx, u32 *buf_off) 5431 + { 5432 + struct req_iterator iter; 5433 + struct bio_vec bv; 5434 + int index = -1; 5435 + unsigned long expected_offset = 0; 5436 + bool first = true; 5437 + 5438 + rq_for_each_bvec(bv, rq, iter) { 5439 + unsigned long pfn = page_to_pfn(bv.bv_page); 5440 + unsigned long end_pfn = pfn + 5441 + ((bv.bv_offset + bv.bv_len - 1) >> PAGE_SHIFT); 5442 + struct ublk_buf_range *range; 5443 + unsigned long off; 5444 + MA_STATE(mas, &ub->buf_tree, pfn, pfn); 5445 + 5446 + range = mas_walk(&mas); 5447 + if (!range) 5448 + return false; 5449 + 5450 + /* verify all pages in this bvec fall within the range */ 5451 + if (end_pfn > mas.last) 5452 + return false; 5453 + 5454 + off = range->base_offset + 5455 + (pfn - mas.index) * PAGE_SIZE + bv.bv_offset; 5456 + 5457 + if (first) { 5458 + /* Read-only buffer can't serve READ (kernel writes) */ 5459 + if ((range->flags & UBLK_SHMEM_BUF_READ_ONLY) && 5460 + req_op(rq) != REQ_OP_WRITE) 5461 + return false; 5462 + index = range->buf_index; 5463 + expected_offset 
= off; 5464 + *buf_off = off; 5465 + first = false; 5466 + } else { 5467 + if (range->buf_index != index) 5468 + return false; 5469 + if (off != expected_offset) 5470 + return false; 5471 + } 5472 + expected_offset += bv.bv_len; 5473 + } 5474 + 5475 + if (first) 5476 + return false; 5477 + 5478 + *buf_idx = index; 5479 + return true; 5480 + } 5481 + 5249 5482 static int ublk_ctrl_uring_cmd_permission(struct ublk_device *ub, 5250 5483 u32 cmd_op, struct ublksrv_ctrl_cmd *header) 5251 5484 { ··· 5611 5228 case UBLK_CMD_UPDATE_SIZE: 5612 5229 case UBLK_CMD_QUIESCE_DEV: 5613 5230 case UBLK_CMD_TRY_STOP_DEV: 5231 + case UBLK_CMD_REG_BUF: 5232 + case UBLK_CMD_UNREG_BUF: 5614 5233 mask = MAY_READ | MAY_WRITE; 5615 5234 break; 5616 5235 default: ··· 5736 5351 break; 5737 5352 case UBLK_CMD_TRY_STOP_DEV: 5738 5353 ret = ublk_ctrl_try_stop_dev(ub); 5354 + break; 5355 + case UBLK_CMD_REG_BUF: 5356 + ret = ublk_ctrl_reg_buf(ub, &header); 5357 + break; 5358 + case UBLK_CMD_UNREG_BUF: 5359 + ret = ublk_ctrl_unreg_buf(ub, &header); 5739 5360 break; 5740 5361 default: 5741 5362 ret = -EOPNOTSUPP;
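The registration and matching paths in the ublk_drv.c hunks above rest on one idea. `__ublk_ctrl_reg_buf()` folds runs of consecutive PFNs into ranges that remember their byte offset inside the buffer, and `ublk_try_buf_match()` later turns a page's PFN back into a buffer offset. The sketch below illustrates that arithmetic in plain userspace C, assuming 4 KiB pages. It replaces the maple tree with a flat array and a linear search, and `struct pfn_range`, `coalesce_pfns()` and `lookup_offset()` are hypothetical names, not kernel APIs.

```c
#include <stddef.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages, for illustration only */

/* Hypothetical stand-in for one maple tree entry: an inclusive PFN range. */
struct pfn_range {
	unsigned long first_pfn;
	unsigned long last_pfn;
	unsigned long base_offset;	/* byte offset of first_pfn in the buffer */
};

/*
 * Fold runs of consecutive PFNs into ranges, following the shape of the
 * loop in __ublk_ctrl_reg_buf(). Returns the number of ranges written.
 */
static size_t coalesce_pfns(const unsigned long *pfns, size_t nr,
			    struct pfn_range *out, size_t max_out)
{
	size_t i, n = 0;

	for (i = 0; i < nr && n < max_out; i++) {
		size_t start = i;

		/* Extend the run while the next page is physically adjacent. */
		while (i + 1 < nr &&
		       pfns[i + 1] == pfns[start] + (i - start) + 1)
			i++;

		out[n].first_pfn = pfns[start];
		out[n].last_pfn = pfns[start] + (i - start);
		out[n].base_offset = (unsigned long)start << PAGE_SHIFT;
		n++;
	}
	return n;
}

/*
 * Translate (pfn, offset within page) into a byte offset inside the
 * registered buffer, or -1 if the PFN is not covered by any range.
 * The kernel code does this lookup with mas_walk() on the maple tree.
 */
static long lookup_offset(const struct pfn_range *ranges, size_t n,
			  unsigned long pfn, unsigned long page_off)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (pfn >= ranges[i].first_pfn && pfn <= ranges[i].last_pfn)
			return (long)(ranges[i].base_offset +
				      ((pfn - ranges[i].first_pfn) << PAGE_SHIFT) +
				      page_off);
	}
	return -1;
}
```

For a six-page buffer pinned at PFNs 100, 101, 102, 500, 200, 201, `coalesce_pfns()` produces three ranges (100-102 at offset 0, 500 at offset 12288, 200-201 at offset 16384). `lookup_offset()` then maps PFN 201 with an in-page offset of 16 to buffer offset 20496. The kernel version additionally checks that every bvec of a request resolves to the same buffer index and that the offsets are contiguous before it marks the I/O as `UBLK_IO_F_SHMEM_ZC`.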
+382 -129
drivers/block/zloop.c
··· 17 17 #include <linux/mutex.h> 18 18 #include <linux/parser.h> 19 19 #include <linux/seq_file.h> 20 + #include <linux/xattr.h> 20 21 21 22 /* 22 23 * Options for adding (and removing) a device. ··· 35 34 ZLOOP_OPT_BUFFERED_IO = (1 << 8), 36 35 ZLOOP_OPT_ZONE_APPEND = (1 << 9), 37 36 ZLOOP_OPT_ORDERED_ZONE_APPEND = (1 << 10), 37 + ZLOOP_OPT_DISCARD_WRITE_CACHE = (1 << 11), 38 + ZLOOP_OPT_MAX_OPEN_ZONES = (1 << 12), 38 39 }; 39 40 40 41 static const match_table_t zloop_opt_tokens = { ··· 51 48 { ZLOOP_OPT_BUFFERED_IO, "buffered_io" }, 52 49 { ZLOOP_OPT_ZONE_APPEND, "zone_append=%u" }, 53 50 { ZLOOP_OPT_ORDERED_ZONE_APPEND, "ordered_zone_append" }, 51 + { ZLOOP_OPT_DISCARD_WRITE_CACHE, "discard_write_cache" }, 52 + { ZLOOP_OPT_MAX_OPEN_ZONES, "max_open_zones=%u" }, 54 53 { ZLOOP_OPT_ERR, NULL } 55 54 }; 56 55 ··· 61 56 #define ZLOOP_DEF_ZONE_SIZE ((256ULL * SZ_1M) >> SECTOR_SHIFT) 62 57 #define ZLOOP_DEF_NR_ZONES 64 63 58 #define ZLOOP_DEF_NR_CONV_ZONES 8 59 + #define ZLOOP_DEF_MAX_OPEN_ZONES 0 64 60 #define ZLOOP_DEF_BASE_DIR "/var/local/zloop" 65 61 #define ZLOOP_DEF_NR_QUEUES 1 66 62 #define ZLOOP_DEF_QUEUE_DEPTH 128 ··· 79 73 sector_t zone_size; 80 74 sector_t zone_capacity; 81 75 unsigned int nr_conv_zones; 76 + unsigned int max_open_zones; 82 77 char *base_dir; 83 78 unsigned int nr_queues; 84 79 unsigned int queue_depth; 85 80 bool buffered_io; 86 81 bool zone_append; 87 82 bool ordered_zone_append; 83 + bool discard_write_cache; 88 84 }; 89 85 90 86 /* ··· 103 95 ZLOOP_ZONE_SEQ_ERROR, 104 96 }; 105 97 98 + /* 99 + * Zone descriptor. 
100 + * Locking order: z.lock -> z.wp_lock -> zlo.open_zones_lock 101 + */ 106 102 struct zloop_zone { 103 + struct list_head open_zone_entry; 107 104 struct file *file; 108 105 109 106 unsigned long flags; ··· 132 119 bool buffered_io; 133 120 bool zone_append; 134 121 bool ordered_zone_append; 122 + bool discard_write_cache; 135 123 136 124 const char *base_dir; 137 125 struct file *data_dir; ··· 142 128 sector_t zone_capacity; 143 129 unsigned int nr_zones; 144 130 unsigned int nr_conv_zones; 131 + unsigned int max_open_zones; 145 132 unsigned int block_size; 133 + 134 + spinlock_t open_zones_lock; 135 + struct list_head open_zones_lru_list; 136 + unsigned int nr_open_zones; 146 137 147 138 struct zloop_zone zones[] __counted_by(nr_zones); 148 139 }; ··· 170 151 struct zloop_device *zlo = rq->q->queuedata; 171 152 172 153 return blk_rq_pos(rq) >> zlo->zone_shift; 154 + } 155 + 156 + /* 157 + * Open an already open zone. This is mostly a no-op, except for the imp open -> 158 + * exp open condition change that may happen. We also move a zone at the tail of 159 + * the list of open zones so that if we need to 160 + * implicitly close one open zone, we can do so in LRU order. 
161 + */ 162 + static inline void zloop_lru_rotate_open_zone(struct zloop_device *zlo, 163 + struct zloop_zone *zone) 164 + { 165 + if (zlo->max_open_zones) { 166 + spin_lock(&zlo->open_zones_lock); 167 + list_move_tail(&zone->open_zone_entry, 168 + &zlo->open_zones_lru_list); 169 + spin_unlock(&zlo->open_zones_lock); 170 + } 171 + } 172 + 173 + static inline void zloop_lru_remove_open_zone(struct zloop_device *zlo, 174 + struct zloop_zone *zone) 175 + { 176 + if (zone->cond == BLK_ZONE_COND_IMP_OPEN || 177 + zone->cond == BLK_ZONE_COND_EXP_OPEN) { 178 + spin_lock(&zlo->open_zones_lock); 179 + list_del_init(&zone->open_zone_entry); 180 + zlo->nr_open_zones--; 181 + spin_unlock(&zlo->open_zones_lock); 182 + } 183 + } 184 + 185 + static inline bool zloop_can_open_zone(struct zloop_device *zlo) 186 + { 187 + return !zlo->max_open_zones || zlo->nr_open_zones < zlo->max_open_zones; 188 + } 189 + 190 + /* 191 + * If we have reached the maximum open zones limit, attempt to close an 192 + * implicitly open zone (if we have any) so that we can implicitly open another 193 + * zone without exceeding the maximum number of open zones. 
194 + */ 195 + static bool zloop_close_imp_open_zone(struct zloop_device *zlo) 196 + { 197 + struct zloop_zone *zone; 198 + 199 + lockdep_assert_held(&zlo->open_zones_lock); 200 + 201 + if (zloop_can_open_zone(zlo)) 202 + return true; 203 + 204 + list_for_each_entry(zone, &zlo->open_zones_lru_list, open_zone_entry) { 205 + if (zone->cond == BLK_ZONE_COND_IMP_OPEN) { 206 + zone->cond = BLK_ZONE_COND_CLOSED; 207 + list_del_init(&zone->open_zone_entry); 208 + zlo->nr_open_zones--; 209 + return true; 210 + } 211 + } 212 + 213 + return false; 214 + } 215 + 216 + static bool zloop_open_closed_or_empty_zone(struct zloop_device *zlo, 217 + struct zloop_zone *zone, 218 + bool explicit) 219 + { 220 + spin_lock(&zlo->open_zones_lock); 221 + 222 + if (explicit) { 223 + /* 224 + * Explicit open: we cannot allow this if we have reached the 225 + * maximum open zones limit. 226 + */ 227 + if (!zloop_can_open_zone(zlo)) 228 + goto fail; 229 + zone->cond = BLK_ZONE_COND_EXP_OPEN; 230 + } else { 231 + /* 232 + * Implicit open case: if we have reached the maximum open zones 233 + * limit, try to close an implicitly open zone first. 
234 + */ 235 + if (!zloop_close_imp_open_zone(zlo)) 236 + goto fail; 237 + zone->cond = BLK_ZONE_COND_IMP_OPEN; 238 + } 239 + 240 + zlo->nr_open_zones++; 241 + list_add_tail(&zone->open_zone_entry, 242 + &zlo->open_zones_lru_list); 243 + 244 + spin_unlock(&zlo->open_zones_lock); 245 + 246 + return true; 247 + 248 + fail: 249 + spin_unlock(&zlo->open_zones_lock); 250 + 251 + return false; 252 + } 253 + 254 + static bool zloop_do_open_zone(struct zloop_device *zlo, 255 + struct zloop_zone *zone, bool explicit) 256 + { 257 + switch (zone->cond) { 258 + case BLK_ZONE_COND_IMP_OPEN: 259 + case BLK_ZONE_COND_EXP_OPEN: 260 + if (explicit) 261 + zone->cond = BLK_ZONE_COND_EXP_OPEN; 262 + zloop_lru_rotate_open_zone(zlo, zone); 263 + return true; 264 + case BLK_ZONE_COND_EMPTY: 265 + case BLK_ZONE_COND_CLOSED: 266 + return zloop_open_closed_or_empty_zone(zlo, zone, explicit); 267 + default: 268 + return false; 269 + } 173 270 } 174 271 175 272 static int zloop_update_seq_zone(struct zloop_device *zlo, unsigned int zone_no) ··· 321 186 322 187 spin_lock_irqsave(&zone->wp_lock, flags); 323 188 if (!file_sectors) { 189 + zloop_lru_remove_open_zone(zlo, zone); 324 190 zone->cond = BLK_ZONE_COND_EMPTY; 325 191 zone->wp = zone->start; 326 192 } else if (file_sectors == zlo->zone_capacity) { 193 + zloop_lru_remove_open_zone(zlo, zone); 327 194 zone->cond = BLK_ZONE_COND_FULL; 328 195 zone->wp = ULLONG_MAX; 329 196 } else { 330 - zone->cond = BLK_ZONE_COND_CLOSED; 197 + if (zone->cond != BLK_ZONE_COND_IMP_OPEN && 198 + zone->cond != BLK_ZONE_COND_EXP_OPEN) 199 + zone->cond = BLK_ZONE_COND_CLOSED; 331 200 zone->wp = zone->start + file_sectors; 332 201 } 333 202 spin_unlock_irqrestore(&zone->wp_lock, flags); ··· 355 216 goto unlock; 356 217 } 357 218 358 - switch (zone->cond) { 359 - case BLK_ZONE_COND_EXP_OPEN: 360 - break; 361 - case BLK_ZONE_COND_EMPTY: 362 - case BLK_ZONE_COND_CLOSED: 363 - case BLK_ZONE_COND_IMP_OPEN: 364 - zone->cond = BLK_ZONE_COND_EXP_OPEN; 365 - break; 366 - 
case BLK_ZONE_COND_FULL: 367 - default: 219 + if (!zloop_do_open_zone(zlo, zone, true)) 368 220 ret = -EIO; 369 - break; 370 - } 371 221 372 222 unlock: 373 223 mutex_unlock(&zone->lock); ··· 387 259 case BLK_ZONE_COND_IMP_OPEN: 388 260 case BLK_ZONE_COND_EXP_OPEN: 389 261 spin_lock_irqsave(&zone->wp_lock, flags); 262 + zloop_lru_remove_open_zone(zlo, zone); 390 263 if (zone->wp == zone->start) 391 264 zone->cond = BLK_ZONE_COND_EMPTY; 392 265 else ··· 429 300 } 430 301 431 302 spin_lock_irqsave(&zone->wp_lock, flags); 303 + zloop_lru_remove_open_zone(zlo, zone); 432 304 zone->cond = BLK_ZONE_COND_EMPTY; 433 305 zone->wp = zone->start; 434 306 clear_bit(ZLOOP_ZONE_SEQ_ERROR, &zone->flags); ··· 477 347 } 478 348 479 349 spin_lock_irqsave(&zone->wp_lock, flags); 350 + zloop_lru_remove_open_zone(zlo, zone); 480 351 zone->cond = BLK_ZONE_COND_FULL; 481 352 zone->wp = ULLONG_MAX; 482 353 clear_bit(ZLOOP_ZONE_SEQ_ERROR, &zone->flags); ··· 509 378 zloop_put_cmd(cmd); 510 379 } 511 380 512 - static void zloop_rw(struct zloop_cmd *cmd) 381 + static int zloop_do_rw(struct zloop_cmd *cmd) 513 382 { 514 383 struct request *rq = blk_mq_rq_from_pdu(cmd); 384 + int rw = req_op(rq) == REQ_OP_READ ? ITER_DEST : ITER_SOURCE; 385 + unsigned int nr_bvec = blk_rq_nr_bvec(rq); 515 386 struct zloop_device *zlo = rq->q->queuedata; 516 - unsigned int zone_no = rq_zone_no(rq); 517 - sector_t sector = blk_rq_pos(rq); 518 - sector_t nr_sectors = blk_rq_sectors(rq); 519 - bool is_append = req_op(rq) == REQ_OP_ZONE_APPEND; 520 - bool is_write = req_op(rq) == REQ_OP_WRITE || is_append; 521 - int rw = is_write ? 
ITER_SOURCE : ITER_DEST; 387 + struct zloop_zone *zone = &zlo->zones[rq_zone_no(rq)]; 522 388 struct req_iterator rq_iter; 523 - struct zloop_zone *zone; 524 389 struct iov_iter iter; 525 - struct bio_vec tmp; 526 - unsigned long flags; 527 - sector_t zone_end; 528 - unsigned int nr_bvec; 529 - int ret; 530 - 531 - atomic_set(&cmd->ref, 2); 532 - cmd->sector = sector; 533 - cmd->nr_sectors = nr_sectors; 534 - cmd->ret = 0; 535 - 536 - if (WARN_ON_ONCE(is_append && !zlo->zone_append)) { 537 - ret = -EIO; 538 - goto out; 539 - } 540 - 541 - /* We should never get an I/O beyond the device capacity. */ 542 - if (WARN_ON_ONCE(zone_no >= zlo->nr_zones)) { 543 - ret = -EIO; 544 - goto out; 545 - } 546 - zone = &zlo->zones[zone_no]; 547 - zone_end = zone->start + zlo->zone_capacity; 548 - 549 - /* 550 - * The block layer should never send requests that are not fully 551 - * contained within the zone. 552 - */ 553 - if (WARN_ON_ONCE(sector + nr_sectors > zone->start + zlo->zone_size)) { 554 - ret = -EIO; 555 - goto out; 556 - } 557 - 558 - if (test_and_clear_bit(ZLOOP_ZONE_SEQ_ERROR, &zone->flags)) { 559 - mutex_lock(&zone->lock); 560 - ret = zloop_update_seq_zone(zlo, zone_no); 561 - mutex_unlock(&zone->lock); 562 - if (ret) 563 - goto out; 564 - } 565 - 566 - if (!test_bit(ZLOOP_ZONE_CONV, &zone->flags) && is_write) { 567 - mutex_lock(&zone->lock); 568 - 569 - spin_lock_irqsave(&zone->wp_lock, flags); 570 - 571 - /* 572 - * Zone append operations always go at the current write 573 - * pointer, but regular write operations must already be 574 - * aligned to the write pointer when submitted. 575 - */ 576 - if (is_append) { 577 - /* 578 - * If ordered zone append is in use, we already checked 579 - * and set the target sector in zloop_queue_rq(). 
580 - */ 581 - if (!zlo->ordered_zone_append) { 582 - if (zone->cond == BLK_ZONE_COND_FULL || 583 - zone->wp + nr_sectors > zone_end) { 584 - spin_unlock_irqrestore(&zone->wp_lock, 585 - flags); 586 - ret = -EIO; 587 - goto unlock; 588 - } 589 - sector = zone->wp; 590 - } 591 - cmd->sector = sector; 592 - } else if (sector != zone->wp) { 593 - spin_unlock_irqrestore(&zone->wp_lock, flags); 594 - pr_err("Zone %u: unaligned write: sect %llu, wp %llu\n", 595 - zone_no, sector, zone->wp); 596 - ret = -EIO; 597 - goto unlock; 598 - } 599 - 600 - /* Implicitly open the target zone. */ 601 - if (zone->cond == BLK_ZONE_COND_CLOSED || 602 - zone->cond == BLK_ZONE_COND_EMPTY) 603 - zone->cond = BLK_ZONE_COND_IMP_OPEN; 604 - 605 - /* 606 - * Advance the write pointer, unless ordered zone append is in 607 - * use. If the write fails, the write pointer position will be 608 - * corrected when the next I/O starts execution. 609 - */ 610 - if (!is_append || !zlo->ordered_zone_append) { 611 - zone->wp += nr_sectors; 612 - if (zone->wp == zone_end) { 613 - zone->cond = BLK_ZONE_COND_FULL; 614 - zone->wp = ULLONG_MAX; 615 - } 616 - } 617 - 618 - spin_unlock_irqrestore(&zone->wp_lock, flags); 619 - } 620 - 621 - nr_bvec = blk_rq_nr_bvec(rq); 622 390 623 391 if (rq->bio != rq->biotail) { 624 - struct bio_vec *bvec; 392 + struct bio_vec tmp, *bvec; 625 393 626 394 cmd->bvec = kmalloc_objs(*cmd->bvec, nr_bvec, GFP_NOIO); 627 - if (!cmd->bvec) { 628 - ret = -EIO; 629 - goto unlock; 630 - } 395 + if (!cmd->bvec) 396 + return -EIO; 631 397 632 398 /* 633 399 * The bios of the request may be started from the middle of ··· 550 522 iter.iov_offset = rq->bio->bi_iter.bi_bvec_done; 551 523 } 552 524 553 - cmd->iocb.ki_pos = (sector - zone->start) << SECTOR_SHIFT; 525 + cmd->iocb.ki_pos = (cmd->sector - zone->start) << SECTOR_SHIFT; 554 526 cmd->iocb.ki_filp = zone->file; 555 527 cmd->iocb.ki_complete = zloop_rw_complete; 556 528 if (!zlo->buffered_io) ··· 558 530 cmd->iocb.ki_ioprio = 
IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0); 559 531 560 532 if (rw == ITER_SOURCE) 561 - ret = zone->file->f_op->write_iter(&cmd->iocb, &iter); 562 - else 563 - ret = zone->file->f_op->read_iter(&cmd->iocb, &iter); 564 - unlock: 565 - if (!test_bit(ZLOOP_ZONE_CONV, &zone->flags) && is_write) 533 + return zone->file->f_op->write_iter(&cmd->iocb, &iter); 534 + return zone->file->f_op->read_iter(&cmd->iocb, &iter); 535 + } 536 + 537 + static int zloop_seq_write_prep(struct zloop_cmd *cmd) 538 + { 539 + struct request *rq = blk_mq_rq_from_pdu(cmd); 540 + struct zloop_device *zlo = rq->q->queuedata; 541 + unsigned int zone_no = rq_zone_no(rq); 542 + sector_t nr_sectors = blk_rq_sectors(rq); 543 + bool is_append = req_op(rq) == REQ_OP_ZONE_APPEND; 544 + struct zloop_zone *zone = &zlo->zones[zone_no]; 545 + sector_t zone_end = zone->start + zlo->zone_capacity; 546 + unsigned long flags; 547 + int ret = 0; 548 + 549 + spin_lock_irqsave(&zone->wp_lock, flags); 550 + 551 + /* 552 + * Zone append operations always go at the current write pointer, but 553 + * regular write operations must already be aligned to the write pointer 554 + * when submitted. 555 + */ 556 + if (is_append) { 557 + /* 558 + * If ordered zone append is in use, we already checked and set 559 + * the target sector in zloop_queue_rq(). 560 + */ 561 + if (!zlo->ordered_zone_append) { 562 + if (zone->cond == BLK_ZONE_COND_FULL || 563 + zone->wp + nr_sectors > zone_end) { 564 + ret = -EIO; 565 + goto out_unlock; 566 + } 567 + cmd->sector = zone->wp; 568 + } 569 + } else { 570 + if (cmd->sector != zone->wp) { 571 + pr_err("Zone %u: unaligned write: sect %llu, wp %llu\n", 572 + zone_no, cmd->sector, zone->wp); 573 + ret = -EIO; 574 + goto out_unlock; 575 + } 576 + } 577 + 578 + /* Implicitly open the target zone. */ 579 + if (!zloop_do_open_zone(zlo, zone, false)) { 580 + ret = -EIO; 581 + goto out_unlock; 582 + } 583 + 584 + /* 585 + * Advance the write pointer, unless ordered zone append is in use. 
If 586 + * the write fails, the write pointer position will be corrected when 587 + * the next I/O starts execution. 588 + */ 589 + if (!is_append || !zlo->ordered_zone_append) { 590 + zone->wp += nr_sectors; 591 + if (zone->wp == zone_end) { 592 + zloop_lru_remove_open_zone(zlo, zone); 593 + zone->cond = BLK_ZONE_COND_FULL; 594 + zone->wp = ULLONG_MAX; 595 + } 596 + } 597 + out_unlock: 598 + spin_unlock_irqrestore(&zone->wp_lock, flags); 599 + return ret; 600 + } 601 + 602 + static void zloop_rw(struct zloop_cmd *cmd) 603 + { 604 + struct request *rq = blk_mq_rq_from_pdu(cmd); 605 + struct zloop_device *zlo = rq->q->queuedata; 606 + unsigned int zone_no = rq_zone_no(rq); 607 + sector_t nr_sectors = blk_rq_sectors(rq); 608 + bool is_append = req_op(rq) == REQ_OP_ZONE_APPEND; 609 + bool is_write = req_op(rq) == REQ_OP_WRITE || is_append; 610 + struct zloop_zone *zone; 611 + int ret = -EIO; 612 + 613 + atomic_set(&cmd->ref, 2); 614 + cmd->sector = blk_rq_pos(rq); 615 + cmd->nr_sectors = nr_sectors; 616 + cmd->ret = 0; 617 + 618 + if (WARN_ON_ONCE(is_append && !zlo->zone_append)) 619 + goto out; 620 + 621 + /* We should never get an I/O beyond the device capacity. */ 622 + if (WARN_ON_ONCE(zone_no >= zlo->nr_zones)) 623 + goto out; 624 + 625 + zone = &zlo->zones[zone_no]; 626 + 627 + /* 628 + * The block layer should never send requests that are not fully 629 + * contained within the zone. 
630 + */ 631 + if (WARN_ON_ONCE(cmd->sector + nr_sectors > 632 + zone->start + zlo->zone_size)) 633 + goto out; 634 + 635 + if (test_and_clear_bit(ZLOOP_ZONE_SEQ_ERROR, &zone->flags)) { 636 + mutex_lock(&zone->lock); 637 + ret = zloop_update_seq_zone(zlo, zone_no); 566 638 mutex_unlock(&zone->lock); 639 + if (ret) 640 + goto out; 641 + } 642 + 643 + if (!test_bit(ZLOOP_ZONE_CONV, &zone->flags) && is_write) { 644 + mutex_lock(&zone->lock); 645 + ret = zloop_seq_write_prep(cmd); 646 + if (!ret) 647 + ret = zloop_do_rw(cmd); 648 + mutex_unlock(&zone->lock); 649 + } else { 650 + ret = zloop_do_rw(cmd); 651 + } 567 652 out: 568 653 if (ret != -EIOCBQUEUED) 569 654 zloop_rw_complete(&cmd->iocb, ret); 570 655 zloop_put_cmd(cmd); 656 + } 657 + 658 + static inline bool zloop_zone_is_active(struct zloop_zone *zone) 659 + { 660 + switch (zone->cond) { 661 + case BLK_ZONE_COND_EXP_OPEN: 662 + case BLK_ZONE_COND_IMP_OPEN: 663 + case BLK_ZONE_COND_CLOSED: 664 + return true; 665 + default: 666 + return false; 667 + } 668 + } 669 + 670 + static int zloop_record_safe_wps(struct zloop_device *zlo) 671 + { 672 + unsigned int i; 673 + int ret; 674 + 675 + for (i = 0; i < zlo->nr_zones; i++) { 676 + struct zloop_zone *zone = &zlo->zones[i]; 677 + struct file *file = zone->file; 678 + 679 + if (!zloop_zone_is_active(zone)) 680 + continue; 681 + ret = vfs_setxattr(file_mnt_idmap(file), file_dentry(file), 682 + "user.zloop.wp", &zone->wp, sizeof(zone->wp), 0); 683 + if (ret) { 684 + pr_err("%pg: failed to record write pointer (%d)\n", 685 + zlo->disk->part0, ret); 686 + return ret; 687 + } 688 + } 689 + 690 + return 0; 571 691 } 572 692 573 693 /* ··· 725 549 { 726 550 struct super_block *sb = file_inode(zlo->data_dir)->i_sb; 727 551 int ret; 552 + 553 + if (zlo->discard_write_cache) { 554 + ret = zloop_record_safe_wps(zlo); 555 + if (ret) 556 + return ret; 557 + } 728 558 729 559 down_read(&sb->s_umount); 730 560 ret = sync_filesystem(sb); ··· 874 692 rq->__sector = zone->wp; 875 693 
zone->wp += blk_rq_sectors(rq); 876 694 if (zone->wp >= zone_end) { 695 + zloop_lru_remove_open_zone(zlo, zone); 877 696 zone->cond = BLK_ZONE_COND_FULL; 878 697 zone->wp = ULLONG_MAX; 879 698 } ··· 1072 889 int ret; 1073 890 1074 891 mutex_init(&zone->lock); 892 + INIT_LIST_HEAD(&zone->open_zone_entry); 1075 893 spin_lock_init(&zone->wp_lock); 1076 894 zone->start = (sector_t)zone_no << zlo->zone_shift; 1077 895 ··· 1193 1009 goto out; 1194 1010 } 1195 1011 1012 + if (opts->max_open_zones > nr_zones - opts->nr_conv_zones) { 1013 + pr_err("Invalid maximum number of open zones %u\n", 1014 + opts->max_open_zones); 1015 + goto out; 1016 + } 1017 + 1196 1018 zlo = kvzalloc_flex(*zlo, zones, nr_zones); 1197 1019 if (!zlo) { 1198 1020 ret = -ENOMEM; 1199 1021 goto out; 1200 1022 } 1201 1023 WRITE_ONCE(zlo->state, Zlo_creating); 1024 + spin_lock_init(&zlo->open_zones_lock); 1025 + INIT_LIST_HEAD(&zlo->open_zones_lru_list); 1202 1026 1203 1027 ret = mutex_lock_killable(&zloop_ctl_mutex); 1204 1028 if (ret) ··· 1234 1042 zlo->zone_capacity = zlo->zone_size; 1235 1043 zlo->nr_zones = nr_zones; 1236 1044 zlo->nr_conv_zones = opts->nr_conv_zones; 1045 + zlo->max_open_zones = opts->max_open_zones; 1237 1046 zlo->buffered_io = opts->buffered_io; 1238 1047 zlo->zone_append = opts->zone_append; 1239 1048 if (zlo->zone_append) 1240 1049 zlo->ordered_zone_append = opts->ordered_zone_append; 1050 + zlo->discard_write_cache = opts->discard_write_cache; 1241 1051 1242 1052 zlo->workqueue = alloc_workqueue("zloop%d", WQ_UNBOUND | WQ_FREEZABLE, 1243 1053 opts->nr_queues * opts->queue_depth, zlo->id); ··· 1282 1088 lim.logical_block_size = zlo->block_size; 1283 1089 if (zlo->zone_append) 1284 1090 lim.max_hw_zone_append_sectors = lim.max_hw_sectors; 1091 + lim.max_open_zones = zlo->max_open_zones; 1285 1092 1286 1093 zlo->tag_set.ops = &zloop_mq_ops; 1287 1094 zlo->tag_set.nr_hw_queues = opts->nr_queues; ··· 1363 1168 return ret; 1364 1169 } 1365 1170 1171 + static void 
zloop_truncate(struct file *file, loff_t pos) 1172 + { 1173 + struct mnt_idmap *idmap = file_mnt_idmap(file); 1174 + struct dentry *dentry = file_dentry(file); 1175 + struct iattr newattrs; 1176 + 1177 + newattrs.ia_size = pos; 1178 + newattrs.ia_valid = ATTR_SIZE; 1179 + 1180 + inode_lock(dentry->d_inode); 1181 + notify_change(idmap, dentry, &newattrs, NULL); 1182 + inode_unlock(dentry->d_inode); 1183 + } 1184 + 1185 + static void zloop_forget_cache(struct zloop_device *zlo) 1186 + { 1187 + unsigned int i; 1188 + int ret; 1189 + 1190 + pr_info("%pg: discarding volatile write cache\n", zlo->disk->part0); 1191 + 1192 + for (i = 0; i < zlo->nr_zones; i++) { 1193 + struct zloop_zone *zone = &zlo->zones[i]; 1194 + struct file *file = zone->file; 1195 + sector_t old_wp; 1196 + 1197 + if (!zloop_zone_is_active(zone)) 1198 + continue; 1199 + 1200 + ret = vfs_getxattr(file_mnt_idmap(file), file_dentry(file), 1201 + "user.zloop.wp", &old_wp, sizeof(old_wp)); 1202 + if (ret == -ENODATA) { 1203 + old_wp = 0; 1204 + } else if (ret != sizeof(old_wp)) { 1205 + pr_err("%pg: failed to retrieve write pointer (%d)\n", 1206 + zlo->disk->part0, ret); 1207 + continue; 1208 + } 1209 + if (old_wp < zone->wp) 1210 + zloop_truncate(file, old_wp); 1211 + } 1212 + } 1213 + 1366 1214 static int zloop_ctl_remove(struct zloop_options *opts) 1367 1215 { 1368 1216 struct zloop_device *zlo; ··· 1440 1202 return ret; 1441 1203 1442 1204 del_gendisk(zlo->disk); 1205 + 1206 + if (zlo->discard_write_cache) 1207 + zloop_forget_cache(zlo); 1208 + 1443 1209 put_disk(zlo->disk); 1444 1210 1445 1211 pr_info("Removed device %d\n", opts->id); ··· 1466 1224 opts->capacity = ZLOOP_DEF_ZONE_SIZE * ZLOOP_DEF_NR_ZONES; 1467 1225 opts->zone_size = ZLOOP_DEF_ZONE_SIZE; 1468 1226 opts->nr_conv_zones = ZLOOP_DEF_NR_CONV_ZONES; 1227 + opts->max_open_zones = ZLOOP_DEF_MAX_OPEN_ZONES; 1469 1228 opts->nr_queues = ZLOOP_DEF_NR_QUEUES; 1470 1229 opts->queue_depth = ZLOOP_DEF_QUEUE_DEPTH; 1471 1230 opts->buffered_io = 
ZLOOP_DEF_BUFFERED_IO; ··· 1545 1302 } 1546 1303 opts->nr_conv_zones = token; 1547 1304 break; 1305 + case ZLOOP_OPT_MAX_OPEN_ZONES: 1306 + if (match_uint(args, &token)) { 1307 + ret = -EINVAL; 1308 + goto out; 1309 + } 1310 + opts->max_open_zones = token; 1311 + break; 1548 1312 case ZLOOP_OPT_BASE_DIR: 1549 1313 p = match_strdup(args); 1550 1314 if (!p) { ··· 1602 1352 break; 1603 1353 case ZLOOP_OPT_ORDERED_ZONE_APPEND: 1604 1354 opts->ordered_zone_append = true; 1355 + break; 1356 + case ZLOOP_OPT_DISCARD_WRITE_CACHE: 1357 + opts->discard_write_cache = true; 1605 1358 break; 1606 1359 case ZLOOP_OPT_ERR: 1607 1360 default:
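The open-zone limit that the zloop changes above introduce can be modeled in user space. The block below is a minimal sketch under stated assumptions, not the driver code: `zone_model`, `dev_model`, `model_open_zone()` and the `COND_*` names are hypothetical, locking is left out, and a monotonically increasing counter stands in for the kernel's list-based LRU order. It illustrates the policy visible in the diff: an explicit open fails once `max_open_zones` is reached, while an implicit open first closes the least recently used implicitly open zone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical zone conditions, mirroring the BLK_ZONE_COND_* subset used here. */
enum cond { COND_EMPTY, COND_IMP_OPEN, COND_EXP_OPEN, COND_CLOSED, COND_FULL };

#define NR_ZONES 4

struct zone_model {
	enum cond cond;
	unsigned long lru_stamp;	/* larger value = used more recently */
};

struct dev_model {
	unsigned int max_open_zones;	/* 0 means "no limit" */
	unsigned int nr_open_zones;
	unsigned long clock;		/* assumption: replaces the LRU list */
	struct zone_model zones[NR_ZONES];
};

static bool is_open(const struct zone_model *z)
{
	return z->cond == COND_IMP_OPEN || z->cond == COND_EXP_OPEN;
}

static bool can_open_zone(const struct dev_model *d)
{
	return !d->max_open_zones || d->nr_open_zones < d->max_open_zones;
}

/* At the limit, close the least recently used implicitly open zone. */
static bool close_imp_open_zone(struct dev_model *d)
{
	struct zone_model *victim = NULL;
	size_t i;

	if (can_open_zone(d))
		return true;

	for (i = 0; i < NR_ZONES; i++) {
		struct zone_model *z = &d->zones[i];

		if (z->cond != COND_IMP_OPEN)
			continue;
		if (!victim || z->lru_stamp < victim->lru_stamp)
			victim = z;
	}
	if (!victim)
		return false;	/* only explicitly open zones remain */

	victim->cond = COND_CLOSED;
	d->nr_open_zones--;
	return true;
}

/* Returns true if the zone ends up open, false if the open must fail. */
static bool model_open_zone(struct dev_model *d, unsigned int zno,
			    bool explicit_open)
{
	struct zone_model *z;

	assert(zno < NR_ZONES);
	z = &d->zones[zno];

	if (is_open(z)) {
		/* Already open: only imp open -> exp open may change. */
		if (explicit_open)
			z->cond = COND_EXP_OPEN;
		z->lru_stamp = ++d->clock;
		return true;
	}
	if (z->cond != COND_EMPTY && z->cond != COND_CLOSED)
		return false;

	if (explicit_open) {
		if (!can_open_zone(d))
			return false;
		z->cond = COND_EXP_OPEN;
	} else {
		if (!close_imp_open_zone(d))
			return false;
		z->cond = COND_IMP_OPEN;
	}

	d->nr_open_zones++;
	z->lru_stamp = ++d->clock;
	return true;
}
```

With a zero-initialized `struct dev_model` whose `max_open_zones` is 2, implicitly opening zones 0 and 1, touching zone 0 again, and then implicitly opening zone 2 closes zone 1 in this model; a subsequent explicit open of zone 3 fails because the limit is still reached.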
+8
drivers/md/bcache/super.c
@@ -1373,6 +1373,14 @@
 
 	mutex_unlock(&bch_register_lock);
 
+	/*
+	 * Wait for any pending sb_write to complete before free.
+	 * The sb_bio is embedded in struct cached_dev, so we must
+	 * ensure no I/O is in progress.
+	 */
+	down(&dc->sb_write_mutex);
+	up(&dc->sb_write_mutex);
+
 	if (dc->sb_disk)
 		folio_put(virt_to_folio(dc->sb_disk));
 
+195 -18
drivers/md/md-llbitmap.c
··· 208 208 BitNeedSync, 209 209 /* data is synchronizing */ 210 210 BitSyncing, 211 + /* 212 + * Proactive sync requested for unwritten region (raid456 only). 213 + * Triggered via sysfs when user wants to pre-build XOR parity 214 + * for regions that have never been written. 215 + */ 216 + BitNeedSyncUnwritten, 217 + /* Proactive sync in progress for unwritten region */ 218 + BitSyncingUnwritten, 219 + /* 220 + * XOR parity has been pre-built for a region that has never had 221 + * user data written. When user writes to this region, it transitions 222 + * to BitDirty. 223 + */ 224 + BitCleanUnwritten, 211 225 BitStateCount, 212 226 BitNone = 0xff, 213 227 }; ··· 246 232 * BitNeedSync. 247 233 */ 248 234 BitmapActionStale, 235 + /* 236 + * Proactive sync trigger for raid456 - builds XOR parity for 237 + * Unwritten regions without requiring user data write first. 238 + */ 239 + BitmapActionProactiveSync, 240 + BitmapActionClearUnwritten, 249 241 BitmapActionCount, 250 242 /* Init state is BitUnwritten */ 251 243 BitmapActionInit, ··· 324 304 [BitmapActionDaemon] = BitNone, 325 305 [BitmapActionDiscard] = BitNone, 326 306 [BitmapActionStale] = BitNone, 307 + [BitmapActionProactiveSync] = BitNeedSyncUnwritten, 308 + [BitmapActionClearUnwritten] = BitNone, 327 309 }, 328 310 [BitClean] = { 329 311 [BitmapActionStartwrite] = BitDirty, ··· 336 314 [BitmapActionDaemon] = BitNone, 337 315 [BitmapActionDiscard] = BitUnwritten, 338 316 [BitmapActionStale] = BitNeedSync, 317 + [BitmapActionProactiveSync] = BitNone, 318 + [BitmapActionClearUnwritten] = BitNone, 339 319 }, 340 320 [BitDirty] = { 341 321 [BitmapActionStartwrite] = BitNone, ··· 348 324 [BitmapActionDaemon] = BitClean, 349 325 [BitmapActionDiscard] = BitUnwritten, 350 326 [BitmapActionStale] = BitNeedSync, 327 + [BitmapActionProactiveSync] = BitNone, 328 + [BitmapActionClearUnwritten] = BitNone, 351 329 }, 352 330 [BitNeedSync] = { 353 331 [BitmapActionStartwrite] = BitNone, ··· 360 334 [BitmapActionDaemon] = 
BitNone, 361 335 [BitmapActionDiscard] = BitUnwritten, 362 336 [BitmapActionStale] = BitNone, 337 + [BitmapActionProactiveSync] = BitNone, 338 + [BitmapActionClearUnwritten] = BitNone, 363 339 }, 364 340 [BitSyncing] = { 365 341 [BitmapActionStartwrite] = BitNone, ··· 372 344 [BitmapActionDaemon] = BitNone, 373 345 [BitmapActionDiscard] = BitUnwritten, 374 346 [BitmapActionStale] = BitNeedSync, 347 + [BitmapActionProactiveSync] = BitNone, 348 + [BitmapActionClearUnwritten] = BitNone, 349 + }, 350 + [BitNeedSyncUnwritten] = { 351 + [BitmapActionStartwrite] = BitNeedSync, 352 + [BitmapActionStartsync] = BitSyncingUnwritten, 353 + [BitmapActionEndsync] = BitNone, 354 + [BitmapActionAbortsync] = BitUnwritten, 355 + [BitmapActionReload] = BitUnwritten, 356 + [BitmapActionDaemon] = BitNone, 357 + [BitmapActionDiscard] = BitUnwritten, 358 + [BitmapActionStale] = BitUnwritten, 359 + [BitmapActionProactiveSync] = BitNone, 360 + [BitmapActionClearUnwritten] = BitUnwritten, 361 + }, 362 + [BitSyncingUnwritten] = { 363 + [BitmapActionStartwrite] = BitSyncing, 364 + [BitmapActionStartsync] = BitSyncingUnwritten, 365 + [BitmapActionEndsync] = BitCleanUnwritten, 366 + [BitmapActionAbortsync] = BitUnwritten, 367 + [BitmapActionReload] = BitUnwritten, 368 + [BitmapActionDaemon] = BitNone, 369 + [BitmapActionDiscard] = BitUnwritten, 370 + [BitmapActionStale] = BitUnwritten, 371 + [BitmapActionProactiveSync] = BitNone, 372 + [BitmapActionClearUnwritten] = BitUnwritten, 373 + }, 374 + [BitCleanUnwritten] = { 375 + [BitmapActionStartwrite] = BitDirty, 376 + [BitmapActionStartsync] = BitNone, 377 + [BitmapActionEndsync] = BitNone, 378 + [BitmapActionAbortsync] = BitNone, 379 + [BitmapActionReload] = BitNone, 380 + [BitmapActionDaemon] = BitNone, 381 + [BitmapActionDiscard] = BitUnwritten, 382 + [BitmapActionStale] = BitUnwritten, 383 + [BitmapActionProactiveSync] = BitNone, 384 + [BitmapActionClearUnwritten] = BitUnwritten, 375 385 }, 376 386 }; 377 387 ··· 442 376 pctl->state[pos] = 
level_456 ? BitNeedSync : BitDirty; 443 377 break; 444 378 case BitClean: 379 + case BitCleanUnwritten: 445 380 pctl->state[pos] = BitDirty; 446 381 break; 447 382 } ··· 450 383 } 451 384 452 385 static void llbitmap_set_page_dirty(struct llbitmap *llbitmap, int idx, 453 - int offset) 386 + int offset, bool infect) 454 387 { 455 388 struct llbitmap_page_ctl *pctl = llbitmap->pctl[idx]; 456 389 unsigned int io_size = llbitmap->io_size; ··· 465 398 * resync all the dirty bits, hence skip infect new dirty bits to 466 399 * prevent resync unnecessary data. 467 400 */ 468 - if (llbitmap->mddev->degraded) { 401 + if (llbitmap->mddev->degraded || !infect) { 469 402 set_bit(block, pctl->dirty); 470 403 return; 471 404 } ··· 505 438 506 439 llbitmap->pctl[idx]->state[bit] = state; 507 440 if (state == BitDirty || state == BitNeedSync) 508 - llbitmap_set_page_dirty(llbitmap, idx, bit); 441 + llbitmap_set_page_dirty(llbitmap, idx, bit, true); 442 + else if (state == BitNeedSyncUnwritten) 443 + llbitmap_set_page_dirty(llbitmap, idx, bit, false); 509 444 } 510 445 511 446 static struct page *llbitmap_read_page(struct llbitmap *llbitmap, int idx) ··· 528 459 rdev_for_each(rdev, mddev) { 529 460 sector_t sector; 530 461 531 - if (rdev->raid_disk < 0 || test_bit(Faulty, &rdev->flags)) 462 + if (rdev->raid_disk < 0 || test_bit(Faulty, &rdev->flags) || 463 + !test_bit(In_sync, &rdev->flags)) 532 464 continue; 533 465 534 466 sector = mddev->bitmap_info.offset + ··· 654 584 return 0; 655 585 } 656 586 587 + /* 588 + * Check if all underlying disks support write_zeroes with unmap. 
589 + */ 590 + static bool llbitmap_all_disks_support_wzeroes_unmap(struct llbitmap *llbitmap) 591 + { 592 + struct mddev *mddev = llbitmap->mddev; 593 + struct md_rdev *rdev; 594 + 595 + rdev_for_each(rdev, mddev) { 596 + if (rdev->raid_disk < 0 || test_bit(Faulty, &rdev->flags)) 597 + continue; 598 + 599 + if (bdev_write_zeroes_unmap_sectors(rdev->bdev) == 0) 600 + return false; 601 + } 602 + 603 + return true; 604 + } 605 + 606 + /* 607 + * Issue write_zeroes to all underlying disks to zero their data regions. 608 + * This ensures parity consistency for RAID-456 (0 XOR 0 = 0). 609 + * Returns true if all disks were successfully zeroed. 610 + */ 611 + static bool llbitmap_zero_all_disks(struct llbitmap *llbitmap) 612 + { 613 + struct mddev *mddev = llbitmap->mddev; 614 + struct md_rdev *rdev; 615 + sector_t dev_sectors = mddev->dev_sectors; 616 + int ret; 617 + 618 + rdev_for_each(rdev, mddev) { 619 + if (rdev->raid_disk < 0 || test_bit(Faulty, &rdev->flags)) 620 + continue; 621 + 622 + ret = blkdev_issue_zeroout(rdev->bdev, 623 + rdev->data_offset, 624 + dev_sectors, 625 + GFP_KERNEL, 0); 626 + if (ret) { 627 + pr_warn("md/llbitmap: failed to zero disk %pg: %d\n", 628 + rdev->bdev, ret); 629 + return false; 630 + } 631 + } 632 + 633 + return true; 634 + } 635 + 657 636 static void llbitmap_init_state(struct llbitmap *llbitmap) 658 637 { 638 + struct mddev *mddev = llbitmap->mddev; 659 639 enum llbitmap_state state = BitUnwritten; 660 640 unsigned long i; 661 641 662 - if (test_and_clear_bit(BITMAP_CLEAN, &llbitmap->flags)) 642 + if (test_and_clear_bit(BITMAP_CLEAN, &llbitmap->flags)) { 663 643 state = BitClean; 644 + } else if (raid_is_456(mddev) && 645 + llbitmap_all_disks_support_wzeroes_unmap(llbitmap)) { 646 + /* 647 + * All disks support write_zeroes with unmap. Zero all disks 648 + * to ensure parity consistency, then set BitCleanUnwritten 649 + * to skip initial sync. 
650 + */ 651 + if (llbitmap_zero_all_disks(llbitmap)) 652 + state = BitCleanUnwritten; 653 + } 664 654 665 655 for (i = 0; i < llbitmap->chunks; i++) 666 656 llbitmap_write(llbitmap, state, i); ··· 756 626 goto write_bitmap; 757 627 } 758 628 759 - if (c == BitNeedSync) 629 + if (c == BitNeedSync || c == BitNeedSyncUnwritten) 760 630 need_resync = !mddev->degraded; 761 631 762 632 state = state_machine[c][action]; 763 - 764 633 write_bitmap: 765 634 if (unlikely(mddev->degraded)) { 766 635 /* For degraded array, mark new data as need sync. */ ··· 786 657 } 787 658 788 659 llbitmap_write(llbitmap, state, start); 789 - 790 - if (state == BitNeedSync) 660 + if (state == BitNeedSync || state == BitNeedSyncUnwritten) 791 661 need_resync = !mddev->degraded; 792 662 else if (state == BitDirty && 793 663 !timer_pending(&llbitmap->pending_timer)) ··· 1197 1069 int page_start = (start + BITMAP_DATA_OFFSET) >> PAGE_SHIFT; 1198 1070 int page_end = (end + BITMAP_DATA_OFFSET) >> PAGE_SHIFT; 1199 1071 1200 - llbitmap_state_machine(llbitmap, start, end, BitmapActionStartwrite); 1201 - 1202 1072 while (page_start <= page_end) { 1203 1073 llbitmap_raise_barrier(llbitmap, page_start); 1204 1074 page_start++; 1205 1075 } 1076 + 1077 + llbitmap_state_machine(llbitmap, start, end, BitmapActionStartwrite); 1206 1078 } 1207 1079 1208 1080 static void llbitmap_end_write(struct mddev *mddev, sector_t offset, ··· 1229 1101 int page_start = (start + BITMAP_DATA_OFFSET) >> PAGE_SHIFT; 1230 1102 int page_end = (end + BITMAP_DATA_OFFSET) >> PAGE_SHIFT; 1231 1103 1232 - llbitmap_state_machine(llbitmap, start, end, BitmapActionDiscard); 1233 - 1234 1104 while (page_start <= page_end) { 1235 1105 llbitmap_raise_barrier(llbitmap, page_start); 1236 1106 page_start++; 1237 1107 } 1108 + 1109 + llbitmap_state_machine(llbitmap, start, end, BitmapActionDiscard); 1238 1110 } 1239 1111 1240 1112 static void llbitmap_end_discard(struct mddev *mddev, sector_t offset, ··· 1356 1228 unsigned long p = offset >> 
llbitmap->chunkshift;
 	enum llbitmap_state c = llbitmap_read(llbitmap, p);

-	return c == BitClean || c == BitDirty;
+	return c == BitClean || c == BitDirty || c == BitCleanUnwritten;
 }

 static sector_t llbitmap_skip_sync_blocks(struct mddev *mddev, sector_t offset)
···

 	/* always skip unwritten blocks */
 	if (c == BitUnwritten)
+		return blocks;
+
+	/* Skip CleanUnwritten - no user data, will be reset after recovery */
+	if (c == BitCleanUnwritten)
 		return blocks;

 	/* For degraded array, don't skip */
···
 {
 	struct llbitmap *llbitmap = mddev->bitmap;
 	unsigned long p = offset >> llbitmap->chunkshift;
+	enum llbitmap_state state;
+
+	/*
+	 * Before recovery starts, convert CleanUnwritten to Unwritten.
+	 * This ensures the new disk won't have stale parity data.
+	 */
+	if (offset == 0 && test_bit(MD_RECOVERY_RECOVER, &mddev->recovery) &&
+	    !test_bit(MD_RECOVERY_LAZY_RECOVER, &mddev->recovery))
+		llbitmap_state_machine(llbitmap, 0, llbitmap->chunks - 1,
+				       BitmapActionClearUnwritten);
+

 	/*
 	 * Handle one bit at a time, this is much simpler. And it doesn't matter
 	 * if md_do_sync() loop more times.
 	 */
 	*blocks = llbitmap->chunksize - (offset & (llbitmap->chunksize - 1));
-	return llbitmap_state_machine(llbitmap, p, p,
-				      BitmapActionStartsync) == BitSyncing;
+	state = llbitmap_state_machine(llbitmap, p, p, BitmapActionStartsync);
+	return state == BitSyncing || state == BitSyncingUnwritten;
 }

 /* Something is wrong, sync_thread stop at @offset */
···
 	}

 	mutex_unlock(&mddev->bitmap_info.mutex);
-	return sprintf(page, "unwritten %d\nclean %d\ndirty %d\nneed sync %d\nsyncing %d\n",
+	return sprintf(page,
+		       "unwritten %d\nclean %d\ndirty %d\n"
+		       "need sync %d\nsyncing %d\n"
+		       "need sync unwritten %d\nsyncing unwritten %d\n"
+		       "clean unwritten %d\n",
 		       bits[BitUnwritten], bits[BitClean], bits[BitDirty],
-		       bits[BitNeedSync], bits[BitSyncing]);
+		       bits[BitNeedSync], bits[BitSyncing],
+		       bits[BitNeedSyncUnwritten], bits[BitSyncingUnwritten],
+		       bits[BitCleanUnwritten]);
 }

 static struct md_sysfs_entry llbitmap_bits = __ATTR_RO(bits);
···

 static struct md_sysfs_entry llbitmap_barrier_idle = __ATTR_RW(barrier_idle);

+static ssize_t
+proactive_sync_store(struct mddev *mddev, const char *buf, size_t len)
+{
+	struct llbitmap *llbitmap;
+
+	/* Only for RAID-456 */
+	if (!raid_is_456(mddev))
+		return -EINVAL;
+
+	mutex_lock(&mddev->bitmap_info.mutex);
+	llbitmap = mddev->bitmap;
+	if (!llbitmap || !llbitmap->pctl) {
+		mutex_unlock(&mddev->bitmap_info.mutex);
+		return -ENODEV;
+	}
+
+	/* Trigger proactive sync on all Unwritten regions */
+	llbitmap_state_machine(llbitmap, 0, llbitmap->chunks - 1,
+			       BitmapActionProactiveSync);
+
+	mutex_unlock(&mddev->bitmap_info.mutex);
+	return len;
+}
+
+static struct md_sysfs_entry llbitmap_proactive_sync =
+	__ATTR(proactive_sync, 0200, NULL, proactive_sync_store);
+
 static struct attribute *md_llbitmap_attrs[] = {
 	&llbitmap_bits.attr,
 	&llbitmap_metadata.attr,
 	&llbitmap_daemon_sleep.attr,
 	&llbitmap_barrier_idle.attr,
+	&llbitmap_proactive_sync.attr,
 	NULL
 };

+145 -26
drivers/md/md.c
···
 static const struct kobj_type md_ktype;

 static DECLARE_WAIT_QUEUE_HEAD(resync_wait);
-static struct workqueue_struct *md_wq;

 /*
  * This workqueue is used for sync_work to register new sync_thread, and for
···
 static int remove_and_add_spares(struct mddev *mddev,
 				 struct md_rdev *this);
 static void mddev_detach(struct mddev *mddev);
-static void export_rdev(struct md_rdev *rdev, struct mddev *mddev);
+static void export_rdev(struct md_rdev *rdev);
 static void md_wakeup_thread_directly(struct md_thread __rcu **thread);

 /*
···

 		spin_lock_init(&serial_tmp->serial_lock);
 		serial_tmp->serial_rb = RB_ROOT_CACHED;
-		init_waitqueue_head(&serial_tmp->serial_io_wait);
 	}

 	rdev->serial = serial;
···
 	}

 	percpu_ref_kill(&mddev->active_io);
+
+	/*
+	 * RAID456 IO can sleep in wait_for_reshape while still holding an
+	 * active_io reference. If reshape is already interrupted or frozen,
+	 * wake those waiters so they can abort and drop the reference instead
+	 * of deadlocking suspend.
+	 */
+	if (mddev->pers && mddev->pers->prepare_suspend &&
+	    reshape_interrupted(mddev))
+		mddev->pers->prepare_suspend(mddev);
+
 	if (interruptible)
 		err = wait_event_interruptible(mddev->sb_wait,
 				percpu_ref_is_zero(&mddev->active_io));
···
 	list_for_each_entry_safe(rdev, tmp, &delete, same_set) {
 		list_del_init(&rdev->same_set);
 		kobject_del(&rdev->kobj);
-		export_rdev(rdev, mddev);
+		export_rdev(rdev);
 	}

 	if (!legacy_async_del_gendisk) {
···
 /* just for claiming the bdev */
 static struct md_rdev claim_rdev;

-static void export_rdev(struct md_rdev *rdev, struct mddev *mddev)
+static void export_rdev(struct md_rdev *rdev)
 {
 	pr_debug("md: export_rdev(%pg)\n", rdev->bdev);
 	md_rdev_clear(rdev);
···
 	if (!md_is_rdwr(mddev)) {
 		if (force_change)
 			set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags);
-		pr_err("%s: can't update sb for read-only array %s\n", __func__, mdname(mddev));
+		if (!mddev_is_dm(mddev))
+			pr_err_ratelimited("%s: can't update sb for read-only array %s\n",
+					   __func__, mdname(mddev));
 		return;
 	}

···
 	err = bind_rdev_to_array(rdev, mddev);
 out:
 	if (err)
-		export_rdev(rdev, mddev);
+		export_rdev(rdev);
 	mddev_unlock_and_resume(mddev);
 	if (!err)
 		md_new_event();
···
 	}
 	spin_unlock(&all_mddevs_lock);
 	rv = entry->store(mddev, page, length);
-	mddev_put(mddev);

+	/*
+	 * For "array_state=clear", dropping the extra kobject reference from
+	 * sysfs_break_active_protection() can trigger md kobject deletion.
+	 * Restore active protection before mddev_put() so deletion happens
+	 * after the sysfs write path fully unwinds.
+	 */
 	if (kn)
 		sysfs_unbreak_active_protection(kn);
+	mddev_put(mddev);

 	return rv;
 }
···

 static int start_dirty_degraded;

+/*
+ * Read bitmap superblock and return the bitmap_id based on disk version.
+ * This is used as fallback when default bitmap version and on-disk version
+ * doesn't match, and mdadm is not the latest version to set bitmap_type.
+ */
+static enum md_submodule_id md_bitmap_get_id_from_sb(struct mddev *mddev)
+{
+	struct md_rdev *rdev;
+	struct page *sb_page;
+	bitmap_super_t *sb;
+	enum md_submodule_id id = ID_BITMAP_NONE;
+	sector_t sector;
+	u32 version;
+
+	if (!mddev->bitmap_info.offset)
+		return ID_BITMAP_NONE;
+
+	sb_page = alloc_page(GFP_KERNEL);
+	if (!sb_page) {
+		pr_warn("md: %s: failed to allocate memory for bitmap\n",
+			mdname(mddev));
+		return ID_BITMAP_NONE;
+	}
+
+	sector = mddev->bitmap_info.offset;
+
+	rdev_for_each(rdev, mddev) {
+		u32 iosize;
+
+		if (!test_bit(In_sync, &rdev->flags) ||
+		    test_bit(Faulty, &rdev->flags) ||
+		    test_bit(Bitmap_sync, &rdev->flags))
+			continue;
+
+		iosize = roundup(sizeof(bitmap_super_t),
+				 bdev_logical_block_size(rdev->bdev));
+		if (sync_page_io(rdev, sector, iosize, sb_page, REQ_OP_READ,
+				 true))
+			goto read_ok;
+	}
+	pr_warn("md: %s: failed to read bitmap from any device\n",
+		mdname(mddev));
+	goto out;
+
+read_ok:
+	sb = kmap_local_page(sb_page);
+	if (sb->magic != cpu_to_le32(BITMAP_MAGIC)) {
+		pr_warn("md: %s: invalid bitmap magic 0x%x\n",
+			mdname(mddev), le32_to_cpu(sb->magic));
+		goto out_unmap;
+	}
+
+	version = le32_to_cpu(sb->version);
+	switch (version) {
+	case BITMAP_MAJOR_LO:
+	case BITMAP_MAJOR_HI:
+	case BITMAP_MAJOR_CLUSTERED:
+		id = ID_BITMAP;
+		break;
+	case BITMAP_MAJOR_LOCKLESS:
+		id = ID_LLBITMAP;
+		break;
+	default:
+		pr_warn("md: %s: unknown bitmap version %u\n",
+			mdname(mddev), version);
+		break;
+	}
+
+out_unmap:
+	kunmap_local(sb);
+out:
+	__free_page(sb_page);
+	return id;
+}
+
 static int md_bitmap_create(struct mddev *mddev)
 {
+	enum md_submodule_id orig_id = mddev->bitmap_id;
+	enum md_submodule_id sb_id;
+	int err;
+
 	if (mddev->bitmap_id == ID_BITMAP_NONE)
 		return -EINVAL;

 	if (!mddev_set_bitmap_ops(mddev))
 		return -ENOENT;

-	return mddev->bitmap_ops->create(mddev);
+	err = mddev->bitmap_ops->create(mddev);
+	if (!err)
+		return 0;
+
+	/*
+	 * Create failed, if default bitmap version and on-disk version
+	 * doesn't match, and mdadm is not the latest version to set
+	 * bitmap_type, set bitmap_ops based on the disk version.
+	 */
+	mddev_clear_bitmap_ops(mddev);
+
+	sb_id = md_bitmap_get_id_from_sb(mddev);
+	if (sb_id == ID_BITMAP_NONE || sb_id == orig_id)
+		return err;
+
+	pr_info("md: %s: bitmap version mismatch, switching from %d to %d\n",
+		mdname(mddev), orig_id, sb_id);
+
+	mddev->bitmap_id = sb_id;
+	if (!mddev_set_bitmap_ops(mddev)) {
+		mddev->bitmap_id = orig_id;
+		return -ENOENT;
+	}
+
+	err = mddev->bitmap_ops->create(mddev);
+	if (err) {
+		mddev_clear_bitmap_ops(mddev);
+		mddev->bitmap_id = orig_id;
+	}
+
+	return err;
 }

 static void md_bitmap_destroy(struct mddev *mddev)
···
 			rdev_for_each_list(rdev, tmp, &candidates) {
 				list_del_init(&rdev->same_set);
 				if (bind_rdev_to_array(rdev, mddev))
-					export_rdev(rdev, mddev);
+					export_rdev(rdev);
 			}
 			autorun_array(mddev);
 			mddev_unlock_and_resume(mddev);
···
 		 */
 		rdev_for_each_list(rdev, tmp, &candidates) {
 			list_del_init(&rdev->same_set);
-			export_rdev(rdev, mddev);
+			export_rdev(rdev);
 		}
 		mddev_put(mddev);
 	}
···
 				pr_warn("md: %pg has different UUID to %pg\n",
 					rdev->bdev,
 					rdev0->bdev);
-				export_rdev(rdev, mddev);
+				export_rdev(rdev);
 				return -EINVAL;
 			}
 		}
 		err = bind_rdev_to_array(rdev, mddev);
 		if (err)
-			export_rdev(rdev, mddev);
+			export_rdev(rdev);
 		return err;
 	}

···
 			/* This was a hot-add request, but events doesn't
 			 * match, so reject it.
 			 */
-			export_rdev(rdev, mddev);
+			export_rdev(rdev);
 			return -EINVAL;
 		}

···
 				}
 			}
 			if (has_journal || mddev->bitmap) {
-				export_rdev(rdev, mddev);
+				export_rdev(rdev);
 				return -EBUSY;
 			}
 			set_bit(Journal, &rdev->flags);
···
 				/* --add initiated by this node */
 				err = mddev->cluster_ops->add_new_disk(mddev, rdev);
 				if (err) {
-					export_rdev(rdev, mddev);
+					export_rdev(rdev);
 					return err;
 				}
 			}
···
 		err = bind_rdev_to_array(rdev, mddev);

 		if (err)
-			export_rdev(rdev, mddev);
+			export_rdev(rdev);

 		if (mddev_is_clustered(mddev)) {
 			if (info->state & (1 << MD_DISK_CANDIDATE)) {
···

 		err = bind_rdev_to_array(rdev, mddev);
 		if (err) {
-			export_rdev(rdev, mddev);
+			export_rdev(rdev);
 			return err;
 		}
 	}
···
 	return 0;

 abort_export:
-	export_rdev(rdev, mddev);
+	export_rdev(rdev);
 	return err;
 }

···
 		goto err_bitmap;

 	ret = -ENOMEM;
-	md_wq = alloc_workqueue("md", WQ_MEM_RECLAIM | WQ_PERCPU, 0);
-	if (!md_wq)
-		goto err_wq;
-
 	md_misc_wq = alloc_workqueue("md_misc", WQ_PERCPU, 0);
 	if (!md_misc_wq)
 		goto err_misc_wq;
···
 err_md:
 	destroy_workqueue(md_misc_wq);
 err_misc_wq:
-	destroy_workqueue(md_wq);
-err_wq:
 	md_llbitmap_exit();
 err_bitmap:
 	md_bitmap_exit();
···
 	spin_unlock(&all_mddevs_lock);

 	destroy_workqueue(md_misc_wq);
-	destroy_workqueue(md_wq);
 	md_bitmap_exit();
 }

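The md_bitmap_create() change in drivers/md/md.c follows a "try the configured backend, then fall back to what the disk says" pattern. The following is a minimal userspace sketch of that control flow only, not the kernel code: `struct array`, `backend_create()` and `create_with_fallback()` are hypothetical names invented for illustration, and the real function reads the on-disk version from the bitmap superblock rather than from a struct field.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical backend identifiers, standing in for the bitmap IDs. */
enum backend_id { ID_NONE, ID_CLASSIC, ID_LOCKLESS };

struct array {
	enum backend_id configured;	/* backend selected by default or by the tool */
	enum backend_id on_disk;	/* backend the on-disk metadata was written by */
};

/* Hypothetical create step: succeeds only if the backend matches the disk. */
static int backend_create(const struct array *a, enum backend_id id)
{
	return id == a->on_disk ? 0 : -1;
}

/*
 * Try the configured backend first. On failure, switch to the on-disk
 * backend if it is known and different, and restore the original
 * setting if the second attempt fails too.
 */
static int create_with_fallback(struct array *a)
{
	enum backend_id orig = a->configured;
	int err;

	assert(a != NULL);
	if (orig == ID_NONE)
		return -1;

	err = backend_create(a, orig);
	if (!err)
		return 0;

	/* Nothing to fall back to: keep the first error. */
	if (a->on_disk == ID_NONE || a->on_disk == orig)
		return err;

	a->configured = a->on_disk;
	err = backend_create(a, a->configured);
	if (err)
		a->configured = orig;
	return err;
}
```

The point of restoring `configured` on the second failure is that a failed fallback leaves the caller's state exactly as it found it, which mirrors how the diff resets `mddev->bitmap_id` to `orig_id`.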
+4 -1
drivers/md/md.h
···
 struct serial_in_rdev {
 	struct rb_root_cached serial_rb;
 	spinlock_t serial_lock;
-	wait_queue_head_t serial_io_wait;
 };

 /*
···
 	struct rb_node node;
 	sector_t start;		/* start sector of rb node */
 	sector_t last;		/* end sector of rb node */
+	sector_t wnode_start;	/* address of waiting nodes on the same list */
 	sector_t _subtree_last;	/* highest sector in subtree of rb node */
+	struct list_head list_node;
+	struct list_head waiters;
+	struct completion ready;
 };

 /*
+9 -9
drivers/md/raid0.c
···
 	}

 	err = -ENOMEM;
-	conf->strip_zone = kzalloc_objs(struct strip_zone, conf->nr_strip_zones);
+	conf->strip_zone = kvzalloc_objs(struct strip_zone, conf->nr_strip_zones);
 	if (!conf->strip_zone)
 		goto abort;
-	conf->devlist = kzalloc(array3_size(sizeof(struct md_rdev *),
-					    conf->nr_strip_zones,
-					    mddev->raid_disks),
-				GFP_KERNEL);
+	conf->devlist = kvzalloc(array3_size(sizeof(struct md_rdev *),
+					     conf->nr_strip_zones,
+					     mddev->raid_disks),
+				 GFP_KERNEL);
 	if (!conf->devlist)
 		goto abort;

···

 	return 0;
 abort:
-	kfree(conf->strip_zone);
-	kfree(conf->devlist);
+	kvfree(conf->strip_zone);
+	kvfree(conf->devlist);
 	kfree(conf);
 	*private_conf = ERR_PTR(err);
 	return err;
···
 {
 	struct r0conf *conf = priv;

-	kfree(conf->strip_zone);
-	kfree(conf->devlist);
+	kvfree(conf->strip_zone);
+	kvfree(conf->devlist);
 	kfree(conf);
 }

+39 -16
drivers/md/raid1.c
···
 		     START, LAST, static inline, raid1_rb);

 static int check_and_add_serial(struct md_rdev *rdev, struct r1bio *r1_bio,
-				struct serial_info *si, int idx)
+				struct serial_info *si)
 {
 	unsigned long flags;
 	int ret = 0;
 	sector_t lo = r1_bio->sector;
-	sector_t hi = lo + r1_bio->sectors;
+	sector_t hi = lo + r1_bio->sectors - 1;
+	int idx = sector_to_idx(r1_bio->sector);
 	struct serial_in_rdev *serial = &rdev->serial[idx];
+	struct serial_info *head_si;

 	spin_lock_irqsave(&serial->serial_lock, flags);
 	/* collision happened */
-	if (raid1_rb_iter_first(&serial->serial_rb, lo, hi))
-		ret = -EBUSY;
-	else {
+	head_si = raid1_rb_iter_first(&serial->serial_rb, lo, hi);
+	if (head_si && head_si != si) {
 		si->start = lo;
 		si->last = hi;
+		si->wnode_start = head_si->wnode_start;
+		list_add_tail(&si->list_node, &head_si->waiters);
+		ret = -EBUSY;
+	} else if (!head_si) {
+		si->start = lo;
+		si->last = hi;
+		si->wnode_start = si->start;
 		raid1_rb_insert(si, &serial->serial_rb);
 	}
 	spin_unlock_irqrestore(&serial->serial_lock, flags);
···
 {
 	struct mddev *mddev = rdev->mddev;
 	struct serial_info *si;
-	int idx = sector_to_idx(r1_bio->sector);
-	struct serial_in_rdev *serial = &rdev->serial[idx];

 	if (WARN_ON(!mddev->serial_info_pool))
 		return;
 	si = mempool_alloc(mddev->serial_info_pool, GFP_NOIO);
-	wait_event(serial->serial_io_wait,
-		   check_and_add_serial(rdev, r1_bio, si, idx) == 0);
+	INIT_LIST_HEAD(&si->waiters);
+	INIT_LIST_HEAD(&si->list_node);
+	init_completion(&si->ready);
+	while (check_and_add_serial(rdev, r1_bio, si)) {
+		wait_for_completion(&si->ready);
+		reinit_completion(&si->ready);
+	}
 }

 static void remove_serial(struct md_rdev *rdev, sector_t lo, sector_t hi)
 {
-	struct serial_info *si;
+	struct serial_info *si, *iter_si;
 	unsigned long flags;
 	int found = 0;
 	struct mddev *mddev = rdev->mddev;
···
 	for (si = raid1_rb_iter_first(&serial->serial_rb, lo, hi);
 	     si; si = raid1_rb_iter_next(si, lo, hi)) {
 		if (si->start == lo && si->last == hi) {
-			raid1_rb_remove(si, &serial->serial_rb);
-			mempool_free(si, mddev->serial_info_pool);
 			found = 1;
 			break;
 		}
 	}
-	if (!found)
+	if (found) {
+		raid1_rb_remove(si, &serial->serial_rb);
+		if (!list_empty(&si->waiters)) {
+			list_for_each_entry(iter_si, &si->waiters, list_node) {
+				if (iter_si->wnode_start == si->wnode_start) {
+					list_del_init(&iter_si->list_node);
+					list_splice_init(&si->waiters, &iter_si->waiters);
+					raid1_rb_insert(iter_si, &serial->serial_rb);
+					complete(&iter_si->ready);
+					break;
+				}
+			}
+		}
+		mempool_free(si, mddev->serial_info_pool);
+	} else {
 		WARN(1, "The write IO is not recorded for serialization\n");
+	}
 	spin_unlock_irqrestore(&serial->serial_lock, flags);
-	wake_up(&serial->serial_io_wait);
 }

 /*
···
 	int mirror = find_bio_disk(r1_bio, bio);
 	struct md_rdev *rdev = conf->mirrors[mirror].rdev;
 	sector_t lo = r1_bio->sector;
-	sector_t hi = r1_bio->sector + r1_bio->sectors;
+	sector_t hi = r1_bio->sector + r1_bio->sectors - 1;
 	bool ignore_error = !raid1_should_handle_error(bio) ||
 		(bio->bi_status && bio_op(bio) == REQ_OP_DISCARD);

···
 	if (info->rdev)
 		return false;

-	if (bdev_nonrot(rdev->bdev)) {
+	if (!bdev_rot(rdev->bdev)) {
 		set_bit(Nonrot, &rdev->flags);
 		WRITE_ONCE(conf->nonrot_disks, conf->nonrot_disks + 1);
 	}
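The raid1.c hunks change `hi` from `lo + sectors` to `lo + sectors - 1`. A plausible reading, assuming the serialization tree compares closed ranges [start, last] as the kernel's generic interval tree does, is that the old value made two back-to-back writes look like they overlap. The sketch below illustrates that off-by-one in plain userspace C; `struct range`, `write_range()` and `ranges_overlap()` are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Closed range [start, last]: both endpoints are part of the range. */
struct range {
	sector_t start;
	sector_t last;
};

/* Closed range covered by a write of `sectors` sectors starting at `lo`. */
static struct range write_range(sector_t lo, sector_t sectors)
{
	assert(sectors > 0);
	return (struct range){ .start = lo, .last = lo + sectors - 1 };
}

/* Standard overlap test for closed ranges. */
static bool ranges_overlap(struct range a, struct range b)
{
	return a.start <= b.last && b.start <= a.last;
}
```

With this convention, an 8-sector write at sector 0 covers [0, 7] and does not collide with an 8-sector write at sector 8. Using `lo + sectors` as the last sector would give [0, 8], which does collide with [8, 15] and would serialize writes that never touch the same sector.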
+3 -3
drivers/md/raid10.c
···
 		if (!do_balance)
 			break;

-		nonrot = bdev_nonrot(rdev->bdev);
+		nonrot = !bdev_rot(rdev->bdev);
 		has_nonrot_disk |= nonrot;
 		pending = atomic_read(&rdev->nr_pending);
 		if (min_pending > pending && nonrot) {
···
 	}

 	if (!regular_request_wait(mddev, conf, bio, r10_bio->sectors)) {
-		raid_end_bio_io(r10_bio);
+		free_r10bio(r10_bio);
 		return;
 	}

···

 	sectors = r10_bio->sectors;
 	if (!regular_request_wait(mddev, conf, bio, sectors)) {
-		raid_end_bio_io(r10_bio);
+		free_r10bio(r10_bio);
 		return;
 	}

+33 -15
drivers/md/raid5-cache.c
···
 		return -ENOMEM;

 	while (mb_offset < le32_to_cpu(mb->meta_size)) {
+		sector_t payload_len;
+
 		payload = (void *)mb + mb_offset;
 		payload_flush = (void *)mb + mb_offset;

 		if (le16_to_cpu(payload->header.type) == R5LOG_PAYLOAD_DATA) {
+			payload_len = sizeof(struct r5l_payload_data_parity) +
+				(sector_t)sizeof(__le32) *
+				(le32_to_cpu(payload->size) >> (PAGE_SHIFT - 9));
+			if (mb_offset + payload_len > le32_to_cpu(mb->meta_size))
+				goto mismatch;
 			if (r5l_recovery_verify_data_checksum(
 				    log, ctx, page, log_offset,
 				    payload->checksum[0]) < 0)
 				goto mismatch;
 		} else if (le16_to_cpu(payload->header.type) == R5LOG_PAYLOAD_PARITY) {
+			payload_len = sizeof(struct r5l_payload_data_parity) +
+				(sector_t)sizeof(__le32) *
+				(le32_to_cpu(payload->size) >> (PAGE_SHIFT - 9));
+			if (mb_offset + payload_len > le32_to_cpu(mb->meta_size))
+				goto mismatch;
 			if (r5l_recovery_verify_data_checksum(
 				    log, ctx, page, log_offset,
 				    payload->checksum[0]) < 0)
···
 				    payload->checksum[1]) < 0)
 				goto mismatch;
 		} else if (le16_to_cpu(payload->header.type) == R5LOG_PAYLOAD_FLUSH) {
-			/* nothing to do for R5LOG_PAYLOAD_FLUSH here */
+			payload_len = sizeof(struct r5l_payload_flush) +
+				(sector_t)le32_to_cpu(payload_flush->size);
+			if (mb_offset + payload_len > le32_to_cpu(mb->meta_size))
+				goto mismatch;
 		} else /* not R5LOG_PAYLOAD_DATA/PARITY/FLUSH */
 			goto mismatch;

-		if (le16_to_cpu(payload->header.type) == R5LOG_PAYLOAD_FLUSH) {
-			mb_offset += sizeof(struct r5l_payload_flush) +
-				le32_to_cpu(payload_flush->size);
-		} else {
-			/* DATA or PARITY payload */
+		if (le16_to_cpu(payload->header.type) != R5LOG_PAYLOAD_FLUSH) {
 			log_offset = r5l_ring_add(log, log_offset,
 						  le32_to_cpu(payload->size));
-			mb_offset += sizeof(struct r5l_payload_data_parity) +
-				sizeof(__le32) *
-				(le32_to_cpu(payload->size) >> (PAGE_SHIFT - 9));
 		}
-
+		mb_offset += payload_len;
 	}

 	put_page(page);
···
 	log_offset = r5l_ring_add(log, ctx->pos, BLOCK_SECTORS);

 	while (mb_offset < le32_to_cpu(mb->meta_size)) {
+		sector_t payload_len;
 		int dd;

 		payload = (void *)mb + mb_offset;
···

 		if (le16_to_cpu(payload->header.type) == R5LOG_PAYLOAD_FLUSH) {
 			int i, count;
+
+			payload_len = sizeof(struct r5l_payload_flush) +
+				(sector_t)le32_to_cpu(payload_flush->size);
+			if (mb_offset + payload_len >
+			    le32_to_cpu(mb->meta_size))
+				return -EINVAL;

 			count = le32_to_cpu(payload_flush->size) / sizeof(__le64);
 			for (i = 0; i < count; ++i) {
···
 				}
 			}

-			mb_offset += sizeof(struct r5l_payload_flush) +
-				le32_to_cpu(payload_flush->size);
+			mb_offset += payload_len;
 			continue;
 		}

 		/* DATA or PARITY payload */
+		payload_len = sizeof(struct r5l_payload_data_parity) +
+			(sector_t)sizeof(__le32) *
+			(le32_to_cpu(payload->size) >> (PAGE_SHIFT - 9));
+		if (mb_offset + payload_len > le32_to_cpu(mb->meta_size))
+			return -EINVAL;
+
 		stripe_sect = (le16_to_cpu(payload->header.type) == R5LOG_PAYLOAD_DATA) ?
 			raid5_compute_sector(
 				conf, le64_to_cpu(payload->location), 0, &dd,
···
 		log_offset = r5l_ring_add(log, log_offset,
 					  le32_to_cpu(payload->size));

-		mb_offset += sizeof(struct r5l_payload_data_parity) +
-			sizeof(__le32) *
-			(le32_to_cpu(payload->size) >> (PAGE_SHIFT - 9));
+		mb_offset += payload_len;
 	}

 	return 0;
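The raid5-cache.c hunks compute each payload's length once, reject it if it would run past `meta_size`, and only then advance `mb_offset`. The same idea can be sketched in userspace C as a walk over length-prefixed records. This is an illustration under assumptions, not the journal format: `struct rec_hdr` and `count_records()` are hypothetical, and unlike the kernel code the sketch also checks that the header itself fits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record header: a type tag followed by the body length. */
struct rec_hdr {
	uint16_t type;
	uint16_t body_len;
};

/*
 * Count the records stored in buf[0..meta_size). Returns -1 as soon as
 * a record claims to extend past meta_size, so a corrupted length can
 * never move the cursor outside the buffer.
 */
static int count_records(const uint8_t *buf, size_t meta_size)
{
	size_t off = 0;
	int n = 0;

	assert(buf != NULL);
	while (off < meta_size) {
		struct rec_hdr hdr;
		size_t rec_len;

		/* The header must fit in the remaining bytes. */
		if (meta_size - off < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + off, sizeof(hdr));

		/* Validate the full record length before advancing. */
		rec_len = sizeof(hdr) + hdr.body_len;
		if (rec_len > meta_size - off)
			return -1;

		off += rec_len;
		n++;
	}
	return n;
}
```

Comparing `rec_len` against the remaining space (`meta_size - off`) instead of computing `off + rec_len` avoids any wraparound in the addition; the diff takes a different route to a similar end by widening the arithmetic with a `(sector_t)` cast.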
+23 -16
drivers/md/raid5.c
···
 					break;
 			}
 			BUG_ON(other < 0);
+			if (test_bit(R5_LOCKED, &sh->dev[other].flags))
+				return 0;
 			pr_debug("Computing stripe %llu blocks %d,%d\n",
 			       (unsigned long long)sh->sector,
 			       disk_idx, other);
···
 	async_tx_quiesce(&tx);
 }

-/*
- * handle_stripe - do things to a stripe.
- *
- * We lock the stripe by setting STRIPE_ACTIVE and then examine the
- * state of various bits to see what needs to be done.
- * Possible results:
- *    return some read requests which now have data
- *    return some write requests which are safely on storage
- *    schedule a read on some buffers
- *    schedule a write of some buffers
- *    return confirmation of parity correctness
- *
- */
-
 static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
 {
 	struct r5conf *conf = sh->raid_conf;
···
 	set_bit(STRIPE_HANDLE, &head_sh->state);
 }

+/*
+ * handle_stripe - do things to a stripe.
+ *
+ * We lock the stripe by setting STRIPE_ACTIVE and then examine the
+ * state of various bits to see what needs to be done.
+ * Possible results:
+ *    return some read requests which now have data
+ *    return some write requests which are safely on storage
+ *    schedule a read on some buffers
+ *    schedule a write of some buffers
+ *    return confirmation of parity correctness
+ */
 static void handle_stripe(struct stripe_head *sh)
 {
 	struct stripe_head_state s;
···
 		}

 		if (!add_stripe_bio(sh, raid_bio, dd_idx, 0, 0)) {
-			raid5_release_stripe(sh);
+			int hash;
+
+			spin_lock_irq(&conf->device_lock);
+			hash = sh->hash_lock_index;
+			__release_stripe(conf, sh,
+					 &conf->temp_inactive_list[hash]);
+			spin_unlock_irq(&conf->device_lock);
 			conf->retry_read_aligned = raid_bio;
 			conf->retry_read_offset = scnt;
 			return handled;
···
 	rdev_for_each(rdev, mddev) {
 		if (test_bit(Journal, &rdev->flags))
 			continue;
-		if (bdev_nonrot(rdev->bdev)) {
+		if (!bdev_rot(rdev->bdev)) {
 			conf->batch_bio_dispatch = false;
 			break;
 		}
···
 	lim.logical_block_size = mddev->logical_block_size;
 	lim.io_min = mddev->chunk_sectors << 9;
 	lim.io_opt = lim.io_min * (conf->raid_disks - conf->max_degraded);
+	lim.chunk_sectors = lim.io_opt >> 9;
 	lim.features |= BLK_FEAT_RAID_PARTIAL_STRIPES_EXPENSIVE;
 	lim.discard_granularity = stripe;
 	lim.max_write_zeroes_sectors = 0;
-1
drivers/md/raid5.h
···
 }
 #endif

-void md_raid5_kick_device(struct r5conf *conf);
 int raid5_set_cache_size(struct mddev *mddev, int size);
 sector_t raid5_compute_blocknr(struct stripe_head *sh, int i, int previous);
 void raid5_release_stripe(struct stripe_head *sh);
+6
drivers/nvme/common/.kunitconfig
···
+CONFIG_KUNIT=y
+CONFIG_PCI=y
+CONFIG_BLOCK=y
+CONFIG_BLK_DEV_NVME=y
+CONFIG_NVME_HOST_AUTH=y
+CONFIG_NVME_AUTH_KUNIT_TEST=y
+10 -4
drivers/nvme/common/Kconfig
···
 config NVME_AUTH
 	tristate
 	select CRYPTO
-	select CRYPTO_HMAC
-	select CRYPTO_SHA256
-	select CRYPTO_SHA512
 	select CRYPTO_DH
 	select CRYPTO_DH_RFC7919_GROUPS
-	select CRYPTO_HKDF
+	select CRYPTO_LIB_SHA256
+	select CRYPTO_LIB_SHA512
+
+config NVME_AUTH_KUNIT_TEST
+	tristate "KUnit tests for NVMe authentication" if !KUNIT_ALL_TESTS
+	depends on KUNIT && NVME_AUTH
+	default KUNIT_ALL_TESTS
+	help
+	  Enable KUnit tests for some of the common code for NVMe over Fabrics
+	  In-Band Authentication.
+2
drivers/nvme/common/Makefile
···

 nvme-auth-y += auth.o
 nvme-keyring-y += keyring.o
+
+obj-$(CONFIG_NVME_AUTH_KUNIT_TEST) += tests/auth_kunit.o
+236 -355
drivers/nvme/common/auth.c
···
 #include <linux/prandom.h>
 #include <linux/scatterlist.h>
 #include <linux/unaligned.h>
-#include <crypto/hash.h>
 #include <crypto/dh.h>
-#include <crypto/hkdf.h>
+#include <crypto/sha2.h>
 #include <linux/nvme.h>
 #include <linux/nvme-auth.h>
-
-#define HKDF_MAX_HASHLEN 64

 static u32 nvme_dhchap_seqnum;
 static DEFINE_MUTEX(nvme_dhchap_mutex);
···
 }
 EXPORT_SYMBOL_GPL(nvme_auth_get_seqnum);

-static struct nvme_auth_dhgroup_map {
-	const char name[16];
-	const char kpp[16];
+static const struct nvme_auth_dhgroup_map {
+	char name[16];
+	char kpp[16];
 } dhgroup_map[] = {
 	[NVME_AUTH_DHGROUP_NULL] = {
 		.name = "null", .kpp = "null" },
···
 }
 EXPORT_SYMBOL_GPL(nvme_auth_dhgroup_id);

-static struct nvme_dhchap_hash_map {
+static const struct nvme_dhchap_hash_map {
 	int len;
-	const char hmac[15];
-	const char digest[8];
+	char hmac[15];
 } hash_map[] = {
 	[NVME_AUTH_HASH_SHA256] = {
 		.len = 32,
 		.hmac = "hmac(sha256)",
-		.digest = "sha256",
 	},
 	[NVME_AUTH_HASH_SHA384] = {
 		.len = 48,
 		.hmac = "hmac(sha384)",
-		.digest = "sha384",
 	},
 	[NVME_AUTH_HASH_SHA512] = {
 		.len = 64,
 		.hmac = "hmac(sha512)",
-		.digest = "sha512",
 	},
 };

···
 	return hash_map[hmac_id].hmac;
 }
 EXPORT_SYMBOL_GPL(nvme_auth_hmac_name);
-
-const char *nvme_auth_digest_name(u8 hmac_id)
-{
-	if (hmac_id >= ARRAY_SIZE(hash_map))
-		return NULL;
-	return hash_map[hmac_id].digest;
-}
-EXPORT_SYMBOL_GPL(nvme_auth_digest_name);

 u8 nvme_auth_hmac_id(const char *hmac_name)
 {
···
 }
 EXPORT_SYMBOL_GPL(nvme_auth_key_struct_size);

-struct nvme_dhchap_key *nvme_auth_extract_key(unsigned char *secret,
-					      u8 key_hash)
+struct nvme_dhchap_key *nvme_auth_extract_key(const char *secret, u8 key_hash)
 {
 	struct nvme_dhchap_key *key;
-	unsigned char *p;
+	const char *p;
 	u32 crc;
 	int ret, key_len;
 	size_t allocated_len = strlen(secret);
···
 		pr_debug("base64 key decoding error %d\n",
 			 key_len);
 		ret = key_len;
-		goto out_free_secret;
+		goto out_free_key;
 	}

 	if (key_len != 36 && key_len != 52 &&
 	    key_len != 68) {
 		pr_err("Invalid key len %d\n", key_len);
 		ret = -EINVAL;
-		goto out_free_secret;
+		goto out_free_key;
 	}

 	/* The last four bytes is the CRC in little-endian format */
···
 		pr_err("key crc mismatch (key %08x, crc %08x)\n",
 		       get_unaligned_le32(key->key + key_len), crc);
 		ret = -EKEYREJECTED;
-		goto out_free_secret;
+		goto out_free_key;
 	}
 	key->len = key_len;
 	key->hash = key_hash;
 	return key;
-out_free_secret:
+out_free_key:
 	nvme_auth_free_key(key);
 	return ERR_PTR(ret);
 }
···
 }
 EXPORT_SYMBOL_GPL(nvme_auth_free_key);

-struct nvme_dhchap_key *nvme_auth_transform_key(
-		struct nvme_dhchap_key *key, char *nqn)
+/*
+ * Start computing an HMAC value, given the algorithm ID and raw key.
+ *
+ * The context should be zeroized at the end of its lifetime. The caller can do
+ * that implicitly by calling nvme_auth_hmac_final(), or explicitly (needed when
+ * a context is abandoned without finalizing it) by calling memzero_explicit().
+ */
+int nvme_auth_hmac_init(struct nvme_auth_hmac_ctx *hmac, u8 hmac_id,
+			const u8 *key, size_t key_len)
 {
-	const char *hmac_name;
-	struct crypto_shash *key_tfm;
-	SHASH_DESC_ON_STACK(shash, key_tfm);
+	hmac->hmac_id = hmac_id;
+	switch (hmac_id) {
+	case NVME_AUTH_HASH_SHA256:
+		hmac_sha256_init_usingrawkey(&hmac->sha256, key, key_len);
+		return 0;
+	case NVME_AUTH_HASH_SHA384:
+		hmac_sha384_init_usingrawkey(&hmac->sha384, key, key_len);
+		return 0;
+	case NVME_AUTH_HASH_SHA512:
+		hmac_sha512_init_usingrawkey(&hmac->sha512, key, key_len);
+		return 0;
+	}
+	pr_warn("%s: invalid hash algorithm %d\n", __func__, hmac_id);
+	return -EINVAL;
+}
+EXPORT_SYMBOL_GPL(nvme_auth_hmac_init);
+
+void nvme_auth_hmac_update(struct nvme_auth_hmac_ctx *hmac, const u8 *data,
+			   size_t data_len)
+{
+	switch (hmac->hmac_id) {
+	case NVME_AUTH_HASH_SHA256:
+		hmac_sha256_update(&hmac->sha256, data, data_len);
+		return;
+	case NVME_AUTH_HASH_SHA384:
+		hmac_sha384_update(&hmac->sha384, data, data_len);
+		return;
+	case NVME_AUTH_HASH_SHA512:
+		hmac_sha512_update(&hmac->sha512, data, data_len);
+		return;
+	}
+	/* Unreachable because nvme_auth_hmac_init() validated hmac_id */
+	WARN_ON_ONCE(1);
+}
+EXPORT_SYMBOL_GPL(nvme_auth_hmac_update);
+
+/* Finish computing an HMAC value. Note that this zeroizes the HMAC context. */
+void nvme_auth_hmac_final(struct nvme_auth_hmac_ctx *hmac, u8 *out)
+{
+	switch (hmac->hmac_id) {
+	case NVME_AUTH_HASH_SHA256:
+		hmac_sha256_final(&hmac->sha256, out);
+		return;
+	case NVME_AUTH_HASH_SHA384:
+		hmac_sha384_final(&hmac->sha384, out);
+		return;
+	case NVME_AUTH_HASH_SHA512:
+		hmac_sha512_final(&hmac->sha512, out);
+		return;
+	}
+	/* Unreachable because nvme_auth_hmac_init() validated hmac_id */
+	WARN_ON_ONCE(1);
+}
+EXPORT_SYMBOL_GPL(nvme_auth_hmac_final);
+
+static int nvme_auth_hmac(u8 hmac_id, const u8 *key, size_t key_len,
+			  const u8 *data, size_t data_len, u8 *out)
+{
+	struct nvme_auth_hmac_ctx hmac;
+	int ret;
+
+	ret = nvme_auth_hmac_init(&hmac, hmac_id, key, key_len);
+	if (ret == 0) {
+		nvme_auth_hmac_update(&hmac, data, data_len);
+		nvme_auth_hmac_final(&hmac, out);
+	}
+	return ret;
+}
+
+static int nvme_auth_hash(u8 hmac_id, const u8 *data, size_t data_len, u8 *out)
+{
+	switch (hmac_id) {
+	case NVME_AUTH_HASH_SHA256:
+		sha256(data, data_len, out);
+		return 0;
+	case NVME_AUTH_HASH_SHA384:
+		sha384(data, data_len, out);
+		return 0;
+	case NVME_AUTH_HASH_SHA512:
+		sha512(data, data_len, out);
+		return 0;
+	}
+	pr_warn("%s: invalid hash algorithm %d\n", __func__, hmac_id);
+	return -EINVAL;
+}
+
+struct nvme_dhchap_key *nvme_auth_transform_key(
+		const struct nvme_dhchap_key *key, const char *nqn)
+{
+	struct nvme_auth_hmac_ctx hmac;
 	struct nvme_dhchap_key *transformed_key;
 	int ret, key_len;

···
 			return ERR_PTR(-ENOMEM);
 		return transformed_key;
 	}
-	hmac_name = nvme_auth_hmac_name(key->hash);
-	if (!hmac_name) {
-		pr_warn("Invalid key hash id %d\n", key->hash);
-		return ERR_PTR(-EINVAL);
-	}
-
-	key_tfm = crypto_alloc_shash(hmac_name, 0, 0);
-	if (IS_ERR(key_tfm))
-		return ERR_CAST(key_tfm);
-
-	key_len = crypto_shash_digestsize(key_tfm);
+	ret = nvme_auth_hmac_init(&hmac, key->hash, key->key, key->len);
+	if (ret)
+		return ERR_PTR(ret);
+	key_len = nvme_auth_hmac_hash_len(key->hash);
 	transformed_key = nvme_auth_alloc_key(key_len, key->hash);
 	if (!transformed_key) {
-		ret = -ENOMEM;
-		goto out_free_key;
+		memzero_explicit(&hmac, sizeof(hmac));
+		return ERR_PTR(-ENOMEM);
 	}
-
-	shash->tfm = key_tfm;
-	ret = crypto_shash_setkey(key_tfm, key->key, key->len);
-	if (ret < 0)
-		goto out_free_transformed_key;
-	ret = crypto_shash_init(shash);
-	if (ret < 0)
-		goto out_free_transformed_key;
-	ret = crypto_shash_update(shash, nqn, strlen(nqn));
-	if (ret < 0)
-		goto out_free_transformed_key;
-	ret = crypto_shash_update(shash, "NVMe-over-Fabrics", 17);
-	if (ret < 0)
-		goto out_free_transformed_key;
-	ret = crypto_shash_final(shash, transformed_key->key);
-	if (ret < 0)
-		goto out_free_transformed_key;
-
-	crypto_free_shash(key_tfm);
-
+	nvme_auth_hmac_update(&hmac, nqn, strlen(nqn));
+	nvme_auth_hmac_update(&hmac, "NVMe-over-Fabrics", 17);
+	nvme_auth_hmac_final(&hmac, transformed_key->key);
 	return transformed_key;
-
-out_free_transformed_key:
-	nvme_auth_free_key(transformed_key);
-out_free_key:
-	crypto_free_shash(key_tfm);
-
-	return ERR_PTR(ret);
 }
 EXPORT_SYMBOL_GPL(nvme_auth_transform_key);

-static int nvme_auth_hash_skey(int hmac_id, u8 *skey, size_t skey_len, u8 *hkey)
+int nvme_auth_augmented_challenge(u8 hmac_id, const u8 *skey, size_t skey_len,
+				  const u8 *challenge, u8 *aug, size_t hlen)
 {
-	const char *digest_name;
-	struct crypto_shash *tfm;
+	u8 hashed_key[NVME_AUTH_MAX_DIGEST_SIZE];
 	int ret;

-	digest_name = nvme_auth_digest_name(hmac_id);
-	if (!digest_name) {
-		pr_debug("%s: failed to get digest for %d\n", __func__,
-			 hmac_id);
-		return -EINVAL;
-	}
-	tfm = crypto_alloc_shash(digest_name, 0, 0);
-	if (IS_ERR(tfm))
-		return -ENOMEM;
-
-	ret = crypto_shash_tfm_digest(tfm, skey, skey_len, hkey);
-	if (ret < 0)
-		pr_debug("%s: Failed to hash digest len %zu\n", __func__,
-			 skey_len);
-
-	crypto_free_shash(tfm);
-	return ret;
-}
-
-int nvme_auth_augmented_challenge(u8 hmac_id, u8 *skey, size_t skey_len,
-		u8 *challenge, u8 *aug, size_t hlen)
-{
-	struct crypto_shash *tfm;
-	u8 *hashed_key;
-	const char *hmac_name;
-	int ret;
-
-	hashed_key = kmalloc(hlen, GFP_KERNEL);
-	if (!hashed_key)
-		return -ENOMEM;
-
-	ret = nvme_auth_hash_skey(hmac_id, skey,
-				  skey_len, hashed_key);
-	if (ret < 0)
-		goto out_free_key;
-
-	hmac_name = nvme_auth_hmac_name(hmac_id);
-	if (!hmac_name) {
-		pr_warn("%s: invalid hash algorithm %d\n",
-			__func__, hmac_id);
-		ret = -EINVAL;
-		goto out_free_key;
-	}
-
-	tfm = crypto_alloc_shash(hmac_name, 0, 0);
-	if (IS_ERR(tfm)) {
-		ret = PTR_ERR(tfm);
-		goto out_free_key;
-	}
-
-	ret = crypto_shash_setkey(tfm, hashed_key, hlen);
+	ret = nvme_auth_hash(hmac_id, skey, skey_len, hashed_key);
 	if (ret)
-		goto out_free_hash;
-
-	ret = crypto_shash_tfm_digest(tfm, challenge, hlen, aug);
-out_free_hash:
-	crypto_free_shash(tfm);
-out_free_key:
-	kfree_sensitive(hashed_key);
+		return ret;
+	ret = nvme_auth_hmac(hmac_id, hashed_key, hlen, challenge, hlen, aug);
+	memzero_explicit(hashed_key, sizeof(hashed_key));
 	return ret;
 }
 EXPORT_SYMBOL_GPL(nvme_auth_augmented_challenge);
···
 EXPORT_SYMBOL_GPL(nvme_auth_gen_pubkey);

 int nvme_auth_gen_shared_secret(struct crypto_kpp *dh_tfm,
-		u8 *ctrl_key, size_t ctrl_key_len,
+		const u8 *ctrl_key, size_t ctrl_key_len,
 		u8 *sess_key, size_t sess_key_len)
 {
 	struct kpp_request *req;
···
 }
 EXPORT_SYMBOL_GPL(nvme_auth_gen_shared_secret);

-int nvme_auth_generate_key(u8 *secret, struct nvme_dhchap_key **ret_key)
+int nvme_auth_parse_key(const char *secret, struct nvme_dhchap_key **ret_key)
 {
 	struct nvme_dhchap_key *key;
 	u8 key_hash;
···
 	*ret_key = key;
 	return 0;
 }
-EXPORT_SYMBOL_GPL(nvme_auth_generate_key);
+EXPORT_SYMBOL_GPL(nvme_auth_parse_key);

 /**
  * nvme_auth_generate_psk - Generate a PSK for TLS
···
  * Returns 0 on success with a valid generated PSK pointer in @ret_psk and
  * the length of @ret_psk in @ret_len, or a negative error number otherwise.
  */
-int nvme_auth_generate_psk(u8 hmac_id, u8 *skey, size_t skey_len,
-		u8 *c1, u8 *c2, size_t hash_len, u8 **ret_psk, size_t *ret_len)
+int nvme_auth_generate_psk(u8 hmac_id, const u8 *skey, size_t skey_len,
+			   const u8 *c1, const u8 *c2, size_t hash_len,
+			   u8 **ret_psk, size_t *ret_len)
 {
-	struct crypto_shash *tfm;
-	SHASH_DESC_ON_STACK(shash, tfm);
+	size_t psk_len = nvme_auth_hmac_hash_len(hmac_id);
+	struct nvme_auth_hmac_ctx hmac;
 	u8 *psk;
-	const char *hmac_name;
-	int ret, psk_len;
+	int ret;

 	if (!c1 || !c2)
 		return -EINVAL;

-	hmac_name = nvme_auth_hmac_name(hmac_id);
-	if (!hmac_name) {
-		pr_warn("%s: invalid hash algorithm %d\n",
-			__func__, hmac_id);
-		return -EINVAL;
-	}
-
-	tfm = crypto_alloc_shash(hmac_name, 0, 0);
-	if (IS_ERR(tfm))
-		return PTR_ERR(tfm);
-
-	psk_len = crypto_shash_digestsize(tfm);
+	ret = nvme_auth_hmac_init(&hmac, hmac_id, skey, skey_len);
+	if (ret)
+		return ret;
 	psk =
kzalloc(psk_len, GFP_KERNEL); 507 505 if (!psk) { 508 - ret = -ENOMEM; 509 - goto out_free_tfm; 506 + memzero_explicit(&hmac, sizeof(hmac)); 507 + return -ENOMEM; 510 508 } 511 - 512 - shash->tfm = tfm; 513 - ret = crypto_shash_setkey(tfm, skey, skey_len); 514 - if (ret) 515 - goto out_free_psk; 516 - 517 - ret = crypto_shash_init(shash); 518 - if (ret) 519 - goto out_free_psk; 520 - 521 - ret = crypto_shash_update(shash, c1, hash_len); 522 - if (ret) 523 - goto out_free_psk; 524 - 525 - ret = crypto_shash_update(shash, c2, hash_len); 526 - if (ret) 527 - goto out_free_psk; 528 - 529 - ret = crypto_shash_final(shash, psk); 530 - if (!ret) { 531 - *ret_psk = psk; 532 - *ret_len = psk_len; 533 - } 534 - 535 - out_free_psk: 536 - if (ret) 537 - kfree_sensitive(psk); 538 - out_free_tfm: 539 - crypto_free_shash(tfm); 540 - 541 - return ret; 509 + nvme_auth_hmac_update(&hmac, c1, hash_len); 510 + nvme_auth_hmac_update(&hmac, c2, hash_len); 511 + nvme_auth_hmac_final(&hmac, psk); 512 + *ret_psk = psk; 513 + *ret_len = psk_len; 514 + return 0; 542 515 } 543 516 EXPORT_SYMBOL_GPL(nvme_auth_generate_psk); 544 517 ··· 543 584 * Returns 0 on success with a valid digest pointer in @ret_digest, or a 544 585 * negative error number on failure. 
545 586 */ 546 - int nvme_auth_generate_digest(u8 hmac_id, u8 *psk, size_t psk_len, 547 - char *subsysnqn, char *hostnqn, u8 **ret_digest) 587 + int nvme_auth_generate_digest(u8 hmac_id, const u8 *psk, size_t psk_len, 588 + const char *subsysnqn, const char *hostnqn, 589 + char **ret_digest) 548 590 { 549 - struct crypto_shash *tfm; 550 - SHASH_DESC_ON_STACK(shash, tfm); 551 - u8 *digest, *enc; 552 - const char *hmac_name; 553 - size_t digest_len, hmac_len; 591 + struct nvme_auth_hmac_ctx hmac; 592 + u8 digest[NVME_AUTH_MAX_DIGEST_SIZE]; 593 + size_t hash_len = nvme_auth_hmac_hash_len(hmac_id); 594 + char *enc; 595 + size_t enc_len; 554 596 int ret; 555 597 556 598 if (WARN_ON(!subsysnqn || !hostnqn)) 557 599 return -EINVAL; 558 600 559 - hmac_name = nvme_auth_hmac_name(hmac_id); 560 - if (!hmac_name) { 601 + if (hash_len == 0) { 561 602 pr_warn("%s: invalid hash algorithm %d\n", 562 603 __func__, hmac_id); 563 604 return -EINVAL; 564 605 } 565 606 566 - switch (nvme_auth_hmac_hash_len(hmac_id)) { 607 + switch (hash_len) { 567 608 case 32: 568 - hmac_len = 44; 609 + enc_len = 44; 569 610 break; 570 611 case 48: 571 - hmac_len = 64; 612 + enc_len = 64; 572 613 break; 573 614 default: 574 615 pr_warn("%s: invalid hash algorithm '%s'\n", 575 - __func__, hmac_name); 616 + __func__, nvme_auth_hmac_name(hmac_id)); 576 617 return -EINVAL; 577 618 } 578 619 579 - enc = kzalloc(hmac_len + 1, GFP_KERNEL); 580 - if (!enc) 581 - return -ENOMEM; 582 - 583 - tfm = crypto_alloc_shash(hmac_name, 0, 0); 584 - if (IS_ERR(tfm)) { 585 - ret = PTR_ERR(tfm); 586 - goto out_free_enc; 587 - } 588 - 589 - digest_len = crypto_shash_digestsize(tfm); 590 - digest = kzalloc(digest_len, GFP_KERNEL); 591 - if (!digest) { 620 + enc = kzalloc(enc_len + 1, GFP_KERNEL); 621 + if (!enc) { 592 622 ret = -ENOMEM; 593 - goto out_free_tfm; 623 + goto out; 594 624 } 595 625 596 - shash->tfm = tfm; 597 - ret = crypto_shash_setkey(tfm, psk, psk_len); 626 + ret = nvme_auth_hmac_init(&hmac, hmac_id, psk, 
psk_len); 598 627 if (ret) 599 - goto out_free_digest; 628 + goto out; 629 + nvme_auth_hmac_update(&hmac, hostnqn, strlen(hostnqn)); 630 + nvme_auth_hmac_update(&hmac, " ", 1); 631 + nvme_auth_hmac_update(&hmac, subsysnqn, strlen(subsysnqn)); 632 + nvme_auth_hmac_update(&hmac, " NVMe-over-Fabrics", 18); 633 + nvme_auth_hmac_final(&hmac, digest); 600 634 601 - ret = crypto_shash_init(shash); 602 - if (ret) 603 - goto out_free_digest; 604 - 605 - ret = crypto_shash_update(shash, hostnqn, strlen(hostnqn)); 606 - if (ret) 607 - goto out_free_digest; 608 - 609 - ret = crypto_shash_update(shash, " ", 1); 610 - if (ret) 611 - goto out_free_digest; 612 - 613 - ret = crypto_shash_update(shash, subsysnqn, strlen(subsysnqn)); 614 - if (ret) 615 - goto out_free_digest; 616 - 617 - ret = crypto_shash_update(shash, " NVMe-over-Fabrics", 18); 618 - if (ret) 619 - goto out_free_digest; 620 - 621 - ret = crypto_shash_final(shash, digest); 622 - if (ret) 623 - goto out_free_digest; 624 - 625 - ret = base64_encode(digest, digest_len, enc, true, BASE64_STD); 626 - if (ret < hmac_len) { 635 + ret = base64_encode(digest, hash_len, enc, true, BASE64_STD); 636 + if (ret < enc_len) { 627 637 ret = -ENOKEY; 628 - goto out_free_digest; 638 + goto out; 629 639 } 630 640 *ret_digest = enc; 631 641 ret = 0; 632 642 633 - out_free_digest: 634 - kfree_sensitive(digest); 635 - out_free_tfm: 636 - crypto_free_shash(tfm); 637 - out_free_enc: 643 + out: 638 644 if (ret) 639 645 kfree_sensitive(enc); 640 - 646 + memzero_explicit(digest, sizeof(digest)); 641 647 return ret; 642 648 } 643 649 EXPORT_SYMBOL_GPL(nvme_auth_generate_digest); 644 - 645 - /** 646 - * hkdf_expand_label - HKDF-Expand-Label (RFC 8846 section 7.1) 647 - * @hmac_tfm: hash context keyed with pseudorandom key 648 - * @label: ASCII label without "tls13 " prefix 649 - * @labellen: length of @label 650 - * @context: context bytes 651 - * @contextlen: length of @context 652 - * @okm: output keying material 653 - * @okmlen: length of 
@okm 654 - * 655 - * Build the TLS 1.3 HkdfLabel structure and invoke hkdf_expand(). 656 - * 657 - * Returns 0 on success with output keying material stored in @okm, 658 - * or a negative errno value otherwise. 659 - */ 660 - static int hkdf_expand_label(struct crypto_shash *hmac_tfm, 661 - const u8 *label, unsigned int labellen, 662 - const u8 *context, unsigned int contextlen, 663 - u8 *okm, unsigned int okmlen) 664 - { 665 - int err; 666 - u8 *info; 667 - unsigned int infolen; 668 - const char *tls13_prefix = "tls13 "; 669 - unsigned int prefixlen = strlen(tls13_prefix); 670 - 671 - if (WARN_ON(labellen > (255 - prefixlen))) 672 - return -EINVAL; 673 - if (WARN_ON(contextlen > 255)) 674 - return -EINVAL; 675 - 676 - infolen = 2 + (1 + prefixlen + labellen) + (1 + contextlen); 677 - info = kzalloc(infolen, GFP_KERNEL); 678 - if (!info) 679 - return -ENOMEM; 680 - 681 - /* HkdfLabel.Length */ 682 - put_unaligned_be16(okmlen, info); 683 - 684 - /* HkdfLabel.Label */ 685 - info[2] = prefixlen + labellen; 686 - memcpy(info + 3, tls13_prefix, prefixlen); 687 - memcpy(info + 3 + prefixlen, label, labellen); 688 - 689 - /* HkdfLabel.Context */ 690 - info[3 + prefixlen + labellen] = contextlen; 691 - memcpy(info + 4 + prefixlen + labellen, context, contextlen); 692 - 693 - err = hkdf_expand(hmac_tfm, info, infolen, okm, okmlen); 694 - kfree_sensitive(info); 695 - return err; 696 - } 697 650 698 651 /** 699 652 * nvme_auth_derive_tls_psk - Derive TLS PSK ··· 634 763 * Returns 0 on success with a valid psk pointer in @ret_psk or a negative 635 764 * error number otherwise. 
636 765 */ 637 - int nvme_auth_derive_tls_psk(int hmac_id, u8 *psk, size_t psk_len, 638 - u8 *psk_digest, u8 **ret_psk) 766 + int nvme_auth_derive_tls_psk(int hmac_id, const u8 *psk, size_t psk_len, 767 + const char *psk_digest, u8 **ret_psk) 639 768 { 640 - struct crypto_shash *hmac_tfm; 641 - const char *hmac_name; 642 - const char *label = "nvme-tls-psk"; 643 - static const char default_salt[HKDF_MAX_HASHLEN]; 644 - size_t prk_len; 645 - const char *ctx; 646 - unsigned char *prk, *tls_key; 769 + static const u8 default_salt[NVME_AUTH_MAX_DIGEST_SIZE]; 770 + static const char label[] = "tls13 nvme-tls-psk"; 771 + const size_t label_len = sizeof(label) - 1; 772 + u8 prk[NVME_AUTH_MAX_DIGEST_SIZE]; 773 + size_t hash_len, ctx_len; 774 + u8 *hmac_data = NULL, *tls_key; 775 + size_t i; 647 776 int ret; 648 777 649 - hmac_name = nvme_auth_hmac_name(hmac_id); 650 - if (!hmac_name) { 778 + hash_len = nvme_auth_hmac_hash_len(hmac_id); 779 + if (hash_len == 0) { 651 780 pr_warn("%s: invalid hash algorithm %d\n", 652 781 __func__, hmac_id); 653 782 return -EINVAL; 654 783 } 655 784 if (hmac_id == NVME_AUTH_HASH_SHA512) { 656 785 pr_warn("%s: unsupported hash algorithm %s\n", 657 - __func__, hmac_name); 786 + __func__, nvme_auth_hmac_name(hmac_id)); 658 787 return -EINVAL; 659 788 } 660 789 661 - hmac_tfm = crypto_alloc_shash(hmac_name, 0, 0); 662 - if (IS_ERR(hmac_tfm)) 663 - return PTR_ERR(hmac_tfm); 664 - 665 - prk_len = crypto_shash_digestsize(hmac_tfm); 666 - prk = kzalloc(prk_len, GFP_KERNEL); 667 - if (!prk) { 668 - ret = -ENOMEM; 669 - goto out_free_shash; 790 + if (psk_len != hash_len) { 791 + pr_warn("%s: unexpected psk_len %zu\n", __func__, psk_len); 792 + return -EINVAL; 670 793 } 671 794 672 - if (WARN_ON(prk_len > HKDF_MAX_HASHLEN)) { 795 + /* HKDF-Extract */ 796 + ret = nvme_auth_hmac(hmac_id, default_salt, hash_len, psk, psk_len, 797 + prk); 798 + if (ret) 799 + goto out; 800 + 801 + /* 802 + * HKDF-Expand-Label (RFC 8446 section 7.1), with output length 
equal to 803 + * the hash length (so only a single HMAC operation is needed) 804 + */ 805 + 806 + hmac_data = kmalloc(/* output length */ 2 + 807 + /* label */ 1 + label_len + 808 + /* context (max) */ 1 + 3 + 1 + strlen(psk_digest) + 809 + /* counter */ 1, 810 + GFP_KERNEL); 811 + if (!hmac_data) { 812 + ret = -ENOMEM; 813 + goto out; 814 + } 815 + /* output length */ 816 + i = 0; 817 + hmac_data[i++] = hash_len >> 8; 818 + hmac_data[i++] = hash_len; 819 + 820 + /* label */ 821 + static_assert(label_len <= 255); 822 + hmac_data[i] = label_len; 823 + memcpy(&hmac_data[i + 1], label, label_len); 824 + i += 1 + label_len; 825 + 826 + /* context */ 827 + ctx_len = sprintf(&hmac_data[i + 1], "%02d %s", hmac_id, psk_digest); 828 + if (ctx_len > 255) { 673 829 ret = -EINVAL; 674 - goto out_free_prk; 830 + goto out; 675 831 } 676 - ret = hkdf_extract(hmac_tfm, psk, psk_len, 677 - default_salt, prk_len, prk); 678 - if (ret) 679 - goto out_free_prk; 832 + hmac_data[i] = ctx_len; 833 + i += 1 + ctx_len; 680 834 681 - ret = crypto_shash_setkey(hmac_tfm, prk, prk_len); 682 - if (ret) 683 - goto out_free_prk; 684 - 685 - ctx = kasprintf(GFP_KERNEL, "%02d %s", hmac_id, psk_digest); 686 - if (!ctx) { 687 - ret = -ENOMEM; 688 - goto out_free_prk; 689 - } 835 + /* counter (this overwrites the NUL terminator written by sprintf) */ 836 + hmac_data[i++] = 1; 690 837 691 838 tls_key = kzalloc(psk_len, GFP_KERNEL); 692 839 if (!tls_key) { 693 840 ret = -ENOMEM; 694 - goto out_free_ctx; 841 + goto out; 695 842 } 696 - ret = hkdf_expand_label(hmac_tfm, 697 - label, strlen(label), 698 - ctx, strlen(ctx), 699 - tls_key, psk_len); 843 + ret = nvme_auth_hmac(hmac_id, prk, hash_len, hmac_data, i, tls_key); 700 844 if (ret) { 701 - kfree(tls_key); 702 - goto out_free_ctx; 845 + kfree_sensitive(tls_key); 846 + goto out; 703 847 } 704 848 *ret_psk = tls_key; 705 - 706 - out_free_ctx: 707 - kfree(ctx); 708 - out_free_prk: 709 - kfree(prk); 710 - out_free_shash: 711 - crypto_free_shash(hmac_tfm); 
712 - 849 + out: 850 + kfree_sensitive(hmac_data); 851 + memzero_explicit(prk, sizeof(prk)); 713 852 return ret; 714 853 } 715 854 EXPORT_SYMBOL_GPL(nvme_auth_derive_tls_psk);
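The HKDF-Expand-Label input that nvme_auth_derive_tls_psk() assembles in the hunk above can be sketched in user space as follows. This is only an illustration of the byte layout shown in the patch (2-byte big-endian output length, length-prefixed "tls13 nvme-tls-psk" label, length-prefixed "%02d %s" context, trailing counter byte 1). The helper name build_hkdf_label_info() and its buffer-size parameter are assumptions made for this sketch and are not part of the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical user-space helper (not a kernel API): lay out the data that is
 * fed to the single HMAC call of HKDF-Expand, i.e. HkdfLabel || 0x01.
 * Returns the number of bytes written, or 0 if the input does not fit.
 */
static size_t build_hkdf_label_info(uint8_t *out, size_t out_size,
				    unsigned int hmac_id, size_t hash_len,
				    const char *psk_digest)
{
	static const char label[] = "tls13 nvme-tls-psk";
	const size_t label_len = sizeof(label) - 1;
	char ctx[256];
	int ctx_len;
	size_t i = 0;

	/* The context is "<two-digit hash id> <base64 PSK digest>". */
	ctx_len = snprintf(ctx, sizeof(ctx), "%02u %s", hmac_id, psk_digest);
	if (ctx_len < 0 || ctx_len > 255)
		return 0;
	if (out_size < 2 + 1 + label_len + 1 + (size_t)ctx_len + 1)
		return 0;

	/* Output length, big endian. */
	out[i++] = (uint8_t)(hash_len >> 8);
	out[i++] = (uint8_t)hash_len;

	/* Length-prefixed label. */
	out[i++] = (uint8_t)label_len;
	memcpy(&out[i], label, label_len);
	i += label_len;

	/* Length-prefixed context (no NUL terminator is copied). */
	out[i++] = (uint8_t)ctx_len;
	memcpy(&out[i], ctx, (size_t)ctx_len);
	i += (size_t)ctx_len;

	/* HKDF-Expand block counter; one block suffices when L == HashLen. */
	out[i++] = 1;

	assert(i == 2 + 1 + label_len + 1 + (size_t)ctx_len + 1);
	return i;
}
```

Because the requested output length equals the hash length, HMAC(PRK, this buffer) is the whole expansion, which is why the patch can replace hkdf_expand_label() with one nvme_auth_hmac() call.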
+175
drivers/nvme/common/tests/auth_kunit.c
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Unit tests for NVMe authentication functions
+ *
+ * Copyright 2026 Google LLC
+ */
+
+#include <crypto/sha2.h>
+#include <kunit/test.h>
+#include <linux/nvme.h>
+#include <linux/nvme-auth.h>
+#include <linux/slab.h>
+
+struct nvme_auth_test_values {
+	u8 hmac_id;
+	size_t hash_len;
+	u8 expected_psk[NVME_AUTH_MAX_DIGEST_SIZE];
+	char *expected_psk_digest;
+	u8 expected_tls_psk[NVME_AUTH_MAX_DIGEST_SIZE];
+};
+
+static void kfree_action(void *ptr)
+{
+	kfree(ptr);
+}
+
+static void kunit_add_kfree_action(struct kunit *test, void *ptr)
+{
+	KUNIT_ASSERT_EQ(test, 0,
+			kunit_add_action_or_reset(test, kfree_action, ptr));
+}
+
+/*
+ * Test the derivation of a TLS PSK from the initial skey. The vals parameter
+ * gives the expected value of tls_psk as well as the intermediate values psk
+ * and psk_digest. The inputs are implicitly the fixed values set below.
+ */
+static void
+test_nvme_auth_derive_tls_psk(struct kunit *test,
+			      const struct nvme_auth_test_values *vals)
+{
+	const u8 hmac_id = vals->hmac_id;
+	const size_t hash_len = vals->hash_len;
+	const size_t skey_len = hash_len;
+	u8 skey[NVME_AUTH_MAX_DIGEST_SIZE];
+	u8 c1[NVME_AUTH_MAX_DIGEST_SIZE];
+	u8 c2[NVME_AUTH_MAX_DIGEST_SIZE];
+	const char *subsysnqn = "subsysnqn";
+	const char *hostnqn = "hostnqn";
+	u8 *psk = NULL, *tls_psk = NULL;
+	char *psk_digest = NULL;
+	size_t psk_len;
+	int ret;
+
+	for (int i = 0; i < NVME_AUTH_MAX_DIGEST_SIZE; i++) {
+		skey[i] = 'A' + i;
+		c1[i] = i;
+		c2[i] = 0xff - i;
+	}
+
+	ret = nvme_auth_generate_psk(hmac_id, skey, skey_len, c1, c2, hash_len,
+				     &psk, &psk_len);
+	kunit_add_kfree_action(test, psk);
+	KUNIT_ASSERT_EQ(test, 0, ret);
+	KUNIT_ASSERT_EQ(test, hash_len, psk_len);
+	KUNIT_ASSERT_MEMEQ(test, vals->expected_psk, psk, psk_len);
+
+	ret = nvme_auth_generate_digest(hmac_id, psk, psk_len, subsysnqn,
+					hostnqn, &psk_digest);
+	kunit_add_kfree_action(test, psk_digest);
+	if (vals->expected_psk_digest == NULL) {
+		/*
+		 * Algorithm has an ID assigned but is not supported by
+		 * nvme_auth_generate_digest().
+		 */
+		KUNIT_ASSERT_EQ(test, -EINVAL, ret);
+		return;
+	}
+	KUNIT_ASSERT_EQ(test, 0, ret);
+	KUNIT_ASSERT_STREQ(test, vals->expected_psk_digest, psk_digest);
+
+	ret = nvme_auth_derive_tls_psk(hmac_id, psk, psk_len, psk_digest,
+				       &tls_psk);
+	kunit_add_kfree_action(test, tls_psk);
+	KUNIT_ASSERT_EQ(test, 0, ret);
+	KUNIT_ASSERT_MEMEQ(test, vals->expected_tls_psk, tls_psk, psk_len);
+}
+
+static void test_nvme_auth_derive_tls_psk_hmac_sha256(struct kunit *test)
+{
+	static const struct nvme_auth_test_values vals = {
+		.hmac_id = NVME_AUTH_HASH_SHA256,
+		.hash_len = SHA256_DIGEST_SIZE,
+		.expected_psk = {
+			0x17, 0x33, 0xc5, 0x9f, 0xa7, 0xf4, 0x8f, 0xcf,
+			0x37, 0xf5, 0xf2, 0x6f, 0xc4, 0xff, 0x02, 0x68,
+			0xad, 0x4f, 0x78, 0xe0, 0x30, 0xf4, 0xf3, 0xb0,
+			0xbf, 0xd1, 0xd4, 0x7e, 0x7b, 0xb1, 0x44, 0x7a,
+		},
+		.expected_psk_digest = "OldoKuTfKddMuyCznAZojkWD7P4D9/AtzDzLimtOxqI=",
+		.expected_tls_psk = {
+			0x3c, 0x17, 0xda, 0x62, 0x84, 0x74, 0xa0, 0x4d,
+			0x22, 0x47, 0xc4, 0xca, 0xb4, 0x79, 0x68, 0xc9,
+			0x15, 0x38, 0x81, 0x93, 0xf7, 0xc0, 0x71, 0xbd,
+			0x94, 0x89, 0xcc, 0x36, 0x66, 0xcd, 0x7c, 0xc8,
+		},
+	};
+
+	test_nvme_auth_derive_tls_psk(test, &vals);
+}
+
+static void test_nvme_auth_derive_tls_psk_hmac_sha384(struct kunit *test)
+{
+	static const struct nvme_auth_test_values vals = {
+		.hmac_id = NVME_AUTH_HASH_SHA384,
+		.hash_len = SHA384_DIGEST_SIZE,
+		.expected_psk = {
+			0xf1, 0x4b, 0x2d, 0xd3, 0x23, 0x4c, 0x45, 0x96,
+			0x94, 0xd3, 0xbc, 0x63, 0xf8, 0x96, 0x8b, 0xd6,
+			0xb3, 0x7c, 0x2c, 0x6d, 0xe8, 0x49, 0xe2, 0x2e,
+			0x11, 0x87, 0x49, 0x00, 0x1c, 0xe4, 0xbb, 0xe8,
+			0x64, 0x0b, 0x9e, 0x3a, 0x74, 0x8c, 0xb1, 0x1c,
+			0xe4, 0xb1, 0xd7, 0x1d, 0x35, 0x9c, 0xce, 0x39,
+		},
+		.expected_psk_digest = "cffMWk8TSS7HOQebjgYEIkrPrjWPV4JE5cdPB8WhEvY4JBW5YynKyv66XscN4A9n",
+		.expected_tls_psk = {
+			0x27, 0x74, 0x75, 0x32, 0x33, 0x53, 0x7b, 0x3f,
+			0xa5, 0x0e, 0xb7, 0xd1, 0x6a, 0x8e, 0x43, 0x45,
+			0x7d, 0x85, 0xf4, 0x90, 0x6c, 0x00, 0x5b, 0x22,
+			0x36, 0x61, 0x6c, 0x5d, 0x80, 0x93, 0x9d, 0x08,
+			0x98, 0xff, 0xf1, 0x5b, 0xb8, 0xb7, 0x71, 0x19,
+			0xd2, 0xbe, 0x0a, 0xac, 0x42, 0x3e, 0x75, 0x90,
+		},
+	};
+
+	test_nvme_auth_derive_tls_psk(test, &vals);
+}
+
+static void test_nvme_auth_derive_tls_psk_hmac_sha512(struct kunit *test)
+{
+	static const struct nvme_auth_test_values vals = {
+		.hmac_id = NVME_AUTH_HASH_SHA512,
+		.hash_len = SHA512_DIGEST_SIZE,
+		.expected_psk = {
+			0x9c, 0x9f, 0x08, 0x9a, 0x61, 0x8b, 0x47, 0xd2,
+			0xd7, 0x5f, 0x4b, 0x6c, 0x28, 0x07, 0x04, 0x24,
+			0x48, 0x7b, 0x44, 0x5d, 0xd9, 0x6e, 0x70, 0xc4,
+			0xc0, 0x9b, 0x55, 0xe8, 0xb6, 0x00, 0x01, 0x52,
+			0xa3, 0x36, 0x3c, 0x34, 0x54, 0x04, 0x3f, 0x38,
+			0xf0, 0xb8, 0x50, 0x36, 0xde, 0xd4, 0x06, 0x55,
+			0x35, 0x0a, 0xa8, 0x7b, 0x8b, 0x6a, 0x28, 0x2b,
+			0x5c, 0x1a, 0xca, 0xe1, 0x62, 0x33, 0xdd, 0x5b,
+		},
+		/* nvme_auth_generate_digest() doesn't support SHA-512 yet. */
+		.expected_psk_digest = NULL,
+	};
+
+	test_nvme_auth_derive_tls_psk(test, &vals);
+}
+
+static struct kunit_case nvme_auth_test_cases[] = {
+	KUNIT_CASE(test_nvme_auth_derive_tls_psk_hmac_sha256),
+	KUNIT_CASE(test_nvme_auth_derive_tls_psk_hmac_sha384),
+	KUNIT_CASE(test_nvme_auth_derive_tls_psk_hmac_sha512),
+	{},
+};
+
+static struct kunit_suite nvme_auth_test_suite = {
+	.name = "nvme-auth",
+	.test_cases = nvme_auth_test_cases,
+};
+kunit_test_suite(nvme_auth_test_suite);
+
+MODULE_DESCRIPTION("Unit tests for NVMe authentication functions");
+MODULE_LICENSE("GPL");
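The expected_psk_digest strings in these test vectors are 44 and 64 characters long, matching the enc_len values that nvme_auth_generate_digest() selects for 32-byte and 48-byte hashes. A minimal sketch of that relationship for padded base64 follows. The helper name base64_padded_len() is an assumption made for illustration and is not a kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of characters produced by padded base64
 * encoding of n input bytes. Every group of up to 3 bytes becomes 4 chars.
 */
static size_t base64_padded_len(size_t n)
{
	assert(n <= SIZE_MAX - 2);	/* guard the rounding addition */
	return 4 * ((n + 2) / 3);
}
```

By the same formula a 64-byte SHA-512 digest would need 88 characters. The switch in nvme_auth_generate_digest() has no case for that length, which is why the SHA-512 test vector expects -EINVAL instead of a digest string.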
+67 -120
drivers/nvme/host/auth.c
··· 7 7 #include <linux/base64.h> 8 8 #include <linux/prandom.h> 9 9 #include <linux/unaligned.h> 10 - #include <crypto/hash.h> 11 10 #include <crypto/dh.h> 12 11 #include "nvme.h" 13 12 #include "fabrics.h" ··· 21 22 struct list_head entry; 22 23 struct work_struct auth_work; 23 24 struct nvme_ctrl *ctrl; 24 - struct crypto_shash *shash_tfm; 25 25 struct crypto_kpp *dh_tfm; 26 26 struct nvme_dhchap_key *transformed_key; 27 27 void *buf; ··· 36 38 u8 hash_id; 37 39 u8 sc_c; 38 40 size_t hash_len; 39 - u8 c1[64]; 40 - u8 c2[64]; 41 - u8 response[64]; 41 + u8 c1[NVME_AUTH_MAX_DIGEST_SIZE]; 42 + u8 c2[NVME_AUTH_MAX_DIGEST_SIZE]; 43 + u8 response[NVME_AUTH_MAX_DIGEST_SIZE]; 42 44 u8 *ctrl_key; 43 45 u8 *host_key; 44 46 u8 *sess_key; ··· 123 125 { 124 126 struct nvmf_auth_dhchap_negotiate_data *data = chap->buf; 125 127 size_t size = sizeof(*data) + sizeof(union nvmf_auth_protocol); 128 + u8 dh_list_offset = NVME_AUTH_DHCHAP_MAX_DH_IDS; 129 + u8 *idlist = data->auth_protocol[0].dhchap.idlist; 126 130 127 131 if (size > CHAP_BUF_SIZE) { 128 132 chap->status = NVME_AUTH_DHCHAP_FAILURE_INCORRECT_PAYLOAD; ··· 141 141 data->sc_c = NVME_AUTH_SECP_NEWTLSPSK; 142 142 } else 143 143 data->sc_c = NVME_AUTH_SECP_NOSC; 144 + chap->sc_c = data->sc_c; 144 145 data->napd = 1; 145 146 data->auth_protocol[0].dhchap.authid = NVME_AUTH_DHCHAP_AUTH_ID; 146 147 data->auth_protocol[0].dhchap.halen = 3; 147 - data->auth_protocol[0].dhchap.dhlen = 6; 148 - data->auth_protocol[0].dhchap.idlist[0] = NVME_AUTH_HASH_SHA256; 149 - data->auth_protocol[0].dhchap.idlist[1] = NVME_AUTH_HASH_SHA384; 150 - data->auth_protocol[0].dhchap.idlist[2] = NVME_AUTH_HASH_SHA512; 151 - data->auth_protocol[0].dhchap.idlist[30] = NVME_AUTH_DHGROUP_NULL; 152 - data->auth_protocol[0].dhchap.idlist[31] = NVME_AUTH_DHGROUP_2048; 153 - data->auth_protocol[0].dhchap.idlist[32] = NVME_AUTH_DHGROUP_3072; 154 - data->auth_protocol[0].dhchap.idlist[33] = NVME_AUTH_DHGROUP_4096; 155 - data->auth_protocol[0].dhchap.idlist[34] = 
NVME_AUTH_DHGROUP_6144; 156 - data->auth_protocol[0].dhchap.idlist[35] = NVME_AUTH_DHGROUP_8192; 157 - 158 - chap->sc_c = data->sc_c; 148 + idlist[0] = NVME_AUTH_HASH_SHA256; 149 + idlist[1] = NVME_AUTH_HASH_SHA384; 150 + idlist[2] = NVME_AUTH_HASH_SHA512; 151 + if (chap->sc_c == NVME_AUTH_SECP_NOSC) 152 + idlist[dh_list_offset++] = NVME_AUTH_DHGROUP_NULL; 153 + idlist[dh_list_offset++] = NVME_AUTH_DHGROUP_2048; 154 + idlist[dh_list_offset++] = NVME_AUTH_DHGROUP_3072; 155 + idlist[dh_list_offset++] = NVME_AUTH_DHGROUP_4096; 156 + idlist[dh_list_offset++] = NVME_AUTH_DHGROUP_6144; 157 + idlist[dh_list_offset++] = NVME_AUTH_DHGROUP_8192; 158 + data->auth_protocol[0].dhchap.dhlen = 159 + dh_list_offset - NVME_AUTH_DHCHAP_MAX_DH_IDS; 159 160 160 161 return size; 161 162 } ··· 184 183 return -EPROTO; 185 184 } 186 185 187 - if (chap->hash_id == data->hashid && chap->shash_tfm && 188 - !strcmp(crypto_shash_alg_name(chap->shash_tfm), hmac_name) && 189 - crypto_shash_digestsize(chap->shash_tfm) == data->hl) { 186 + if (chap->hash_id == data->hashid && chap->hash_len == data->hl) { 190 187 dev_dbg(ctrl->device, 191 188 "qid %d: reuse existing hash %s\n", 192 189 chap->qid, hmac_name); 193 190 goto select_kpp; 194 191 } 195 192 196 - /* Reset if hash cannot be reused */ 197 - if (chap->shash_tfm) { 198 - crypto_free_shash(chap->shash_tfm); 199 - chap->hash_id = 0; 200 - chap->hash_len = 0; 201 - } 202 - chap->shash_tfm = crypto_alloc_shash(hmac_name, 0, 203 - CRYPTO_ALG_ALLOCATES_MEMORY); 204 - if (IS_ERR(chap->shash_tfm)) { 205 - dev_warn(ctrl->device, 206 - "qid %d: failed to allocate hash %s, error %ld\n", 207 - chap->qid, hmac_name, PTR_ERR(chap->shash_tfm)); 208 - chap->shash_tfm = NULL; 209 - chap->status = NVME_AUTH_DHCHAP_FAILURE_FAILED; 210 - return -ENOMEM; 211 - } 212 - 213 - if (crypto_shash_digestsize(chap->shash_tfm) != data->hl) { 193 + if (nvme_auth_hmac_hash_len(data->hashid) != data->hl) { 214 194 dev_warn(ctrl->device, 215 195 "qid %d: invalid hash length 
%d\n", 216 196 chap->qid, data->hl); 217 - crypto_free_shash(chap->shash_tfm); 218 - chap->shash_tfm = NULL; 219 197 chap->status = NVME_AUTH_DHCHAP_FAILURE_HASH_UNUSABLE; 220 198 return -EPROTO; 221 199 } ··· 414 434 static int nvme_auth_dhchap_setup_host_response(struct nvme_ctrl *ctrl, 415 435 struct nvme_dhchap_queue_context *chap) 416 436 { 417 - SHASH_DESC_ON_STACK(shash, chap->shash_tfm); 437 + struct nvme_auth_hmac_ctx hmac; 418 438 u8 buf[4], *challenge = chap->c1; 419 439 int ret; 420 440 ··· 434 454 __func__, chap->qid); 435 455 } 436 456 437 - ret = crypto_shash_setkey(chap->shash_tfm, 438 - chap->transformed_key->key, chap->transformed_key->len); 439 - if (ret) { 440 - dev_warn(ctrl->device, "qid %d: failed to set key, error %d\n", 441 - chap->qid, ret); 457 + ret = nvme_auth_hmac_init(&hmac, chap->hash_id, 458 + chap->transformed_key->key, 459 + chap->transformed_key->len); 460 + if (ret) 442 461 goto out; 443 - } 444 462 445 463 if (chap->dh_tfm) { 446 464 challenge = kmalloc(chap->hash_len, GFP_KERNEL); ··· 455 477 goto out; 456 478 } 457 479 458 - shash->tfm = chap->shash_tfm; 459 - ret = crypto_shash_init(shash); 460 - if (ret) 461 - goto out; 462 - ret = crypto_shash_update(shash, challenge, chap->hash_len); 463 - if (ret) 464 - goto out; 480 + nvme_auth_hmac_update(&hmac, challenge, chap->hash_len); 481 + 465 482 put_unaligned_le32(chap->s1, buf); 466 - ret = crypto_shash_update(shash, buf, 4); 467 - if (ret) 468 - goto out; 483 + nvme_auth_hmac_update(&hmac, buf, 4); 484 + 469 485 put_unaligned_le16(chap->transaction, buf); 470 - ret = crypto_shash_update(shash, buf, 2); 471 - if (ret) 472 - goto out; 486 + nvme_auth_hmac_update(&hmac, buf, 2); 487 + 473 488 *buf = chap->sc_c; 474 - ret = crypto_shash_update(shash, buf, 1); 475 - if (ret) 476 - goto out; 477 - ret = crypto_shash_update(shash, "HostHost", 8); 478 - if (ret) 479 - goto out; 480 - ret = crypto_shash_update(shash, ctrl->opts->host->nqn, 481 - strlen(ctrl->opts->host->nqn)); 482 - 
if (ret) 483 - goto out; 489 + nvme_auth_hmac_update(&hmac, buf, 1); 490 + nvme_auth_hmac_update(&hmac, "HostHost", 8); 491 + nvme_auth_hmac_update(&hmac, ctrl->opts->host->nqn, 492 + strlen(ctrl->opts->host->nqn)); 484 493 memset(buf, 0, sizeof(buf)); 485 - ret = crypto_shash_update(shash, buf, 1); 486 - if (ret) 487 - goto out; 488 - ret = crypto_shash_update(shash, ctrl->opts->subsysnqn, 489 - strlen(ctrl->opts->subsysnqn)); 490 - if (ret) 491 - goto out; 492 - ret = crypto_shash_final(shash, chap->response); 494 + nvme_auth_hmac_update(&hmac, buf, 1); 495 + nvme_auth_hmac_update(&hmac, ctrl->opts->subsysnqn, 496 + strlen(ctrl->opts->subsysnqn)); 497 + nvme_auth_hmac_final(&hmac, chap->response); 498 + ret = 0; 493 499 out: 494 500 if (challenge != chap->c1) 495 501 kfree(challenge); 502 + memzero_explicit(&hmac, sizeof(hmac)); 496 503 return ret; 497 504 } 498 505 499 506 static int nvme_auth_dhchap_setup_ctrl_response(struct nvme_ctrl *ctrl, 500 507 struct nvme_dhchap_queue_context *chap) 501 508 { 502 - SHASH_DESC_ON_STACK(shash, chap->shash_tfm); 509 + struct nvme_auth_hmac_ctx hmac; 503 510 struct nvme_dhchap_key *transformed_key; 504 511 u8 buf[4], *challenge = chap->c2; 505 512 int ret; ··· 496 533 return ret; 497 534 } 498 535 499 - ret = crypto_shash_setkey(chap->shash_tfm, 500 - transformed_key->key, transformed_key->len); 536 + ret = nvme_auth_hmac_init(&hmac, chap->hash_id, transformed_key->key, 537 + transformed_key->len); 501 538 if (ret) { 502 - dev_warn(ctrl->device, "qid %d: failed to set key, error %d\n", 539 + dev_warn(ctrl->device, "qid %d: failed to init hmac, error %d\n", 503 540 chap->qid, ret); 504 541 goto out; 505 542 } ··· 526 563 __func__, chap->qid, ctrl->opts->subsysnqn); 527 564 dev_dbg(ctrl->device, "%s: qid %d hostnqn %s\n", 528 565 __func__, chap->qid, ctrl->opts->host->nqn); 529 - shash->tfm = chap->shash_tfm; 530 - ret = crypto_shash_init(shash); 531 - if (ret) 532 - goto out; 533 - ret = crypto_shash_update(shash, challenge, 
chap->hash_len); 534 - if (ret) 535 - goto out; 566 + 567 + nvme_auth_hmac_update(&hmac, challenge, chap->hash_len); 568 + 536 569 put_unaligned_le32(chap->s2, buf); 537 - ret = crypto_shash_update(shash, buf, 4); 538 - if (ret) 539 - goto out; 570 + nvme_auth_hmac_update(&hmac, buf, 4); 571 + 540 572 put_unaligned_le16(chap->transaction, buf); 541 - ret = crypto_shash_update(shash, buf, 2); 542 - if (ret) 543 - goto out; 573 + nvme_auth_hmac_update(&hmac, buf, 2); 574 + 544 575 memset(buf, 0, 4); 545 - ret = crypto_shash_update(shash, buf, 1); 546 - if (ret) 547 - goto out; 548 - ret = crypto_shash_update(shash, "Controller", 10); 549 - if (ret) 550 - goto out; 551 - ret = crypto_shash_update(shash, ctrl->opts->subsysnqn, 552 - strlen(ctrl->opts->subsysnqn)); 553 - if (ret) 554 - goto out; 555 - ret = crypto_shash_update(shash, buf, 1); 556 - if (ret) 557 - goto out; 558 - ret = crypto_shash_update(shash, ctrl->opts->host->nqn, 559 - strlen(ctrl->opts->host->nqn)); 560 - if (ret) 561 - goto out; 562 - ret = crypto_shash_final(shash, chap->response); 576 + nvme_auth_hmac_update(&hmac, buf, 1); 577 + nvme_auth_hmac_update(&hmac, "Controller", 10); 578 + nvme_auth_hmac_update(&hmac, ctrl->opts->subsysnqn, 579 + strlen(ctrl->opts->subsysnqn)); 580 + nvme_auth_hmac_update(&hmac, buf, 1); 581 + nvme_auth_hmac_update(&hmac, ctrl->opts->host->nqn, 582 + strlen(ctrl->opts->host->nqn)); 583 + nvme_auth_hmac_final(&hmac, chap->response); 584 + ret = 0; 563 585 out: 564 586 if (challenge != chap->c2) 565 587 kfree(challenge); 588 + memzero_explicit(&hmac, sizeof(hmac)); 566 589 nvme_auth_free_key(transformed_key); 567 590 return ret; 568 591 } ··· 638 689 { 639 690 nvme_auth_reset_dhchap(chap); 640 691 chap->authenticated = false; 641 - if (chap->shash_tfm) 642 - crypto_free_shash(chap->shash_tfm); 643 692 if (chap->dh_tfm) 644 693 crypto_free_kpp(chap->dh_tfm); 645 694 } ··· 655 708 static int nvme_auth_secure_concat(struct nvme_ctrl *ctrl, 656 709 struct 
nvme_dhchap_queue_context *chap) 657 710 { 658 - u8 *psk, *digest, *tls_psk; 711 + u8 *psk, *tls_psk; 712 + char *digest; 659 713 struct key *tls_key; 660 714 size_t psk_len; 661 715 int ret = 0; ··· 1019 1071 INIT_WORK(&ctrl->dhchap_auth_work, nvme_ctrl_auth_work); 1020 1072 if (!ctrl->opts) 1021 1073 return 0; 1022 - ret = nvme_auth_generate_key(ctrl->opts->dhchap_secret, 1023 - &ctrl->host_key); 1074 + ret = nvme_auth_parse_key(ctrl->opts->dhchap_secret, &ctrl->host_key); 1024 1075 if (ret) 1025 1076 return ret; 1026 - ret = nvme_auth_generate_key(ctrl->opts->dhchap_ctrl_secret, 1027 - &ctrl->ctrl_key); 1077 + ret = nvme_auth_parse_key(ctrl->opts->dhchap_ctrl_secret, 1078 + &ctrl->ctrl_key); 1028 1079 if (ret) 1029 1080 goto err_free_dhchap_secret; 1030 1081
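The negotiate-data hunk in drivers/nvme/host/auth.c above now fills the DH group part of idlist through a running offset and derives dhlen from it, so the NULL group is advertised only when no secure channel concatenation is requested. A rough user-space sketch of that logic follows. The function name fill_idlist(), the 60-byte list size and the numeric ID values are assumptions made for illustration. Only the starting offset of 30 is taken from the indices used by the removed lines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative constants; values other than the offset are assumed. */
enum { DH_LIST_OFFSET = 30, IDLIST_SIZE = 60 };
enum { HASH_SHA256 = 1, HASH_SHA384 = 2, HASH_SHA512 = 3 };
enum {
	DHGROUP_NULL = 0,
	DHGROUP_2048 = 1,
	DHGROUP_3072 = 2,
	DHGROUP_4096 = 3,
	DHGROUP_6144 = 4,
	DHGROUP_8192 = 5,
};

/*
 * Hypothetical helper: fill the hash IDs at the front of the list and the
 * DH group IDs from DH_LIST_OFFSET onward. Returns the DH group count,
 * which plays the role of dhlen in the patch.
 */
static uint8_t fill_idlist(uint8_t idlist[IDLIST_SIZE], bool secure_concat)
{
	size_t off = DH_LIST_OFFSET;

	memset(idlist, 0, IDLIST_SIZE);
	idlist[0] = HASH_SHA256;
	idlist[1] = HASH_SHA384;
	idlist[2] = HASH_SHA512;

	/* The NULL group is offered only without secure concatenation. */
	if (!secure_concat)
		idlist[off++] = DHGROUP_NULL;
	idlist[off++] = DHGROUP_2048;
	idlist[off++] = DHGROUP_3072;
	idlist[off++] = DHGROUP_4096;
	idlist[off++] = DHGROUP_6144;
	idlist[off++] = DHGROUP_8192;

	assert(off <= IDLIST_SIZE);
	return (uint8_t)(off - DH_LIST_OFFSET);
}
```

Computing the count from the offset keeps dhlen consistent with the entries actually written, whereas the removed code hard-coded dhlen as 6.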
+60 -29
drivers/nvme/host/core.c
···
 		break;
 	}

+	bi->flags |= BLK_SPLIT_INTERVAL_CAPABLE;
 	bi->metadata_size = head->ms;
 	if (bi->csum_type) {
 		bi->pi_tuple_size = head->pi_size;
 		bi->pi_offset = info->pi_offset;
 	}
 	return true;
-}
-
-static void nvme_config_discard(struct nvme_ns *ns, struct queue_limits *lim)
-{
-	struct nvme_ctrl *ctrl = ns->ctrl;
-
-	if (ctrl->dmrsl && ctrl->dmrsl <= nvme_sect_to_lba(ns->head, UINT_MAX))
-		lim->max_hw_discard_sectors =
-			nvme_lba_to_sect(ns->head, ctrl->dmrsl);
-	else if (ctrl->oncs & NVME_CTRL_ONCS_DSM)
-		lim->max_hw_discard_sectors = UINT_MAX;
-	else
-		lim->max_hw_discard_sectors = 0;
-
-	lim->discard_granularity = lim->logical_block_size;
-
-	if (ctrl->dmrl)
-		lim->max_discard_segments = ctrl->dmrl;
-	else
-		lim->max_discard_segments = NVME_DSM_MAX_RANGES;
 }

 static bool nvme_ns_ids_equal(struct nvme_ns_ids *a, struct nvme_ns_ids *b)
···
 }

 static bool nvme_update_disk_info(struct nvme_ns *ns, struct nvme_id_ns *id,
-		struct queue_limits *lim)
+		struct nvme_id_ns_nvm *nvm, struct queue_limits *lim)
 {
 	struct nvme_ns_head *head = ns->head;
+	struct nvme_ctrl *ctrl = ns->ctrl;
 	u32 bs = 1U << head->lba_shift;
 	u32 atomic_bs, phys_bs, io_opt = 0;
+	u32 npdg = 1, npda = 1;
 	bool valid = true;
+	u8 optperf;

 	/*
 	 * The block layer can't support LBA sizes larger than the page size
···
 		phys_bs = bs;
 	atomic_bs = nvme_configure_atomic_write(ns, id, lim, bs);

-	if (id->nsfeat & NVME_NS_FEAT_IO_OPT) {
+	optperf = id->nsfeat >> NVME_NS_FEAT_OPTPERF_SHIFT;
+	if (ctrl->vs >= NVME_VS(2, 1, 0))
+		optperf &= NVME_NS_FEAT_OPTPERF_MASK_2_1;
+	else
+		optperf &= NVME_NS_FEAT_OPTPERF_MASK;
+	if (optperf) {
 		/* NPWG = Namespace Preferred Write Granularity */
 		phys_bs = bs * (1 + le16_to_cpu(id->npwg));
 		/* NOWS = Namespace Optimal Write Size */
···
 	lim->physical_block_size = min(phys_bs, atomic_bs);
 	lim->io_min = phys_bs;
 	lim->io_opt = io_opt;
-	if ((ns->ctrl->quirks & NVME_QUIRK_DEALLOCATE_ZEROES) &&
-	    (ns->ctrl->oncs & NVME_CTRL_ONCS_DSM))
+	if ((ctrl->quirks & NVME_QUIRK_DEALLOCATE_ZEROES) &&
+	    (ctrl->oncs & NVME_CTRL_ONCS_DSM))
 		lim->max_write_zeroes_sectors = UINT_MAX;
 	else
-		lim->max_write_zeroes_sectors = ns->ctrl->max_zeroes_sectors;
+		lim->max_write_zeroes_sectors = ctrl->max_zeroes_sectors;
+
+	if (ctrl->dmrsl && ctrl->dmrsl <= nvme_sect_to_lba(ns->head, UINT_MAX))
+		lim->max_hw_discard_sectors =
+			nvme_lba_to_sect(ns->head, ctrl->dmrsl);
+	else if (ctrl->oncs & NVME_CTRL_ONCS_DSM)
+		lim->max_hw_discard_sectors = UINT_MAX;
+	else
+		lim->max_hw_discard_sectors = 0;
+
+	/*
+	 * NVMe namespaces advertise both a preferred deallocate granularity
+	 * (for a discard length) and alignment (for a discard starting offset).
+	 * However, Linux block devices advertise a single discard_granularity.
+	 * From NVM Command Set specification 1.1 section 5.2.2, the NPDGL/NPDAL
+	 * fields in the NVM Command Set Specific Identify Namespace structure
+	 * are preferred to NPDG/NPDA in the Identify Namespace structure since
+	 * they can represent larger values. However, NPDGL or NPDAL may be 0 if
+	 * unsupported. NPDG and NPDA are 0's based.
+	 * From Figure 115 of NVM Command Set specification 1.1, NPDGL and NPDAL
+	 * are supported if the high bit of OPTPERF is set. NPDG is supported if
+	 * the low bit of OPTPERF is set. NPDA is supported if either is set.
+	 * NPDG should be a multiple of NPDA, and likewise NPDGL should be a
+	 * multiple of NPDAL, but the spec doesn't say anything about NPDG vs.
+	 * NPDAL or NPDGL vs. NPDA. So compute the maximum instead of assuming
+	 * NPDG(L) is the larger. If neither NPDG, NPDGL, NPDA, nor NPDAL are
+	 * supported, default the discard_granularity to the logical block size.
+	 */
+	if (optperf & 0x2 && nvm && nvm->npdgl)
+		npdg = le32_to_cpu(nvm->npdgl);
+	else if (optperf & 0x1)
+		npdg = from0based(id->npdg);
+	if (optperf & 0x2 && nvm && nvm->npdal)
+		npda = le32_to_cpu(nvm->npdal);
+	else if (optperf)
+		npda = from0based(id->npda);
+	if (check_mul_overflow(max(npdg, npda), lim->logical_block_size,
+			       &lim->discard_granularity))
+		lim->discard_granularity = lim->logical_block_size;
+
+	if (ctrl->dmrl)
+		lim->max_discard_segments = ctrl->dmrl;
+	else
+		lim->max_discard_segments = NVME_DSM_MAX_RANGES;
 	return valid;
 }

···
 	}
 	lbaf = nvme_lbaf_index(id->flbas);

-	if (ns->ctrl->ctratt & NVME_CTRL_ATTR_ELBAS) {
+	if (nvme_id_cns_ok(ns->ctrl, NVME_ID_CNS_CS_NS)) {
 		ret = nvme_identify_ns_nvm(ns->ctrl, info->nsid, &nvm);
 		if (ret < 0)
 			goto out;
···
 	nvme_set_ctrl_limits(ns->ctrl, &lim, false);
 	nvme_configure_metadata(ns->ctrl, ns->head, id, nvm, info);
 	nvme_set_chunk_sectors(ns, id, &lim);
-	if (!nvme_update_disk_info(ns, id, &lim))
+	if (!nvme_update_disk_info(ns, id, nvm, &lim))
 		capacity = 0;

-	nvme_config_discard(ns, &lim);
 	if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) &&
 	    ns->head->ids.csi == NVME_CSI_ZNS)
 		nvme_update_zone_info(ns, &lim, &zi);
···

 	ctrl->dmrl = id->dmrl;
 	ctrl->dmrsl = le32_to_cpu(id->dmrsl);
-	if (id->wzsl)
+	if (id->wzsl && !(ctrl->quirks & NVME_QUIRK_DISABLE_WRITE_ZEROES))
 		ctrl->max_zeroes_sectors = nvme_mps_to_sectors(ctrl, id->wzsl);

 free_data:
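The NPDG/NPDA selection added to nvme_update_disk_info() can be restated as a small userspace sketch. It is not the kernel code: `struct ns_hints`, `have_nvm`, and `discard_granularity()` are hypothetical names, all fields are assumed to be already in host byte order, and the destination is assumed to be 32 bits wide, as `discard_granularity` is in `struct queue_limits`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, host-endian view of the identify fields the hunk reads. */
struct ns_hints {
	uint8_t  optperf;  /* 2-bit OPTPERF value, already shifted and masked */
	uint16_t npdg;     /* 0's based granularity, in logical blocks */
	uint16_t npda;     /* 0's based alignment, in logical blocks */
	bool     have_nvm; /* NVM command set specific identify data was read */
	uint32_t npdgl;    /* large granularity, 0 if unsupported */
	uint32_t npdal;    /* large alignment, 0 if unsupported */
};

/* Returns the discard granularity in bytes for logical block size lbs. */
static inline uint32_t discard_granularity(const struct ns_hints *h,
					   uint32_t lbs)
{
	uint32_t npdg = 1, npda = 1, blocks;
	uint64_t bytes;

	assert(lbs != 0);

	/* Prefer the large fields when the high OPTPERF bit is set. */
	if ((h->optperf & 0x2) && h->have_nvm && h->npdgl)
		npdg = h->npdgl;
	else if (h->optperf & 0x1)
		npdg = (uint32_t)h->npdg + 1;

	if ((h->optperf & 0x2) && h->have_nvm && h->npdal)
		npda = h->npdal;
	else if (h->optperf)
		npda = (uint32_t)h->npda + 1;

	/* Take the larger of the two; neither is assumed to dominate. */
	blocks = npdg > npda ? npdg : npda;
	bytes = (uint64_t)blocks * lbs;

	/* Fall back to one logical block if the product overflows 32 bits. */
	return bytes > UINT32_MAX ? lbs : (uint32_t)bytes;
}
```

With OPTPERF = 0 the result is always one logical block, which matches the value the removed nvme_config_discard() set unconditionally.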
+1 -14
drivers/nvme/host/multipath.c
···
 	}

 	spin_lock_irqsave(&ns->head->requeue_lock, flags);
-	for (bio = req->bio; bio; bio = bio->bi_next) {
+	for (bio = req->bio; bio; bio = bio->bi_next)
 		bio_set_dev(bio, ns->head->disk->part0);
-		if (bio->bi_opf & REQ_POLLED) {
-			bio->bi_opf &= ~REQ_POLLED;
-			bio->bi_cookie = BLK_QC_T_NONE;
-		}
-		/*
-		 * The alternate request queue that we may end up submitting
-		 * the bio to may be frozen temporarily, in this case REQ_NOWAIT
-		 * will fail the I/O immediately with EAGAIN to the issuer.
-		 * We are not in the issuer context which cannot block. Clear
-		 * the flag to avoid spurious EAGAIN I/O failures.
-		 */
-		bio->bi_opf &= ~REQ_NOWAIT;
-	}
 	blk_steal_bios(&ns->head->requeue_list, req);
 	spin_unlock_irqrestore(&ns->head->requeue_lock, flags);

+6
drivers/nvme/host/nvme.h
···
 	return (len >> 2) - 1;
 }

+/* Decode a 2-byte "0's based"/"0-based" field */
+static inline u32 from0based(__le16 value)
+{
+	return (u32)le16_to_cpu(value) + 1;
+}
+
 static inline bool nvme_is_ana_error(u16 status)
 {
 	switch (status & NVME_SCT_SC_MASK) {
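A 0's based field stores a count minus one, so the stored value 0 means 1 and 0xFFFF means 65536. The widening cast in from0based() matters for that last case. A minimal userspace sketch, assuming the value is already in host byte order (the kernel helper additionally converts from little-endian); `encode_0based()` and `decode_0based()` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Encode a count in 1..65536 as a 16-bit 0's based field. */
static inline uint16_t encode_0based(uint32_t count)
{
	assert(count >= 1 && count <= 65536);
	return (uint16_t)(count - 1);
}

/*
 * Decode a 16-bit 0's based field. Widening to 32 bits before adding one
 * keeps 0xFFFF from wrapping around to 0.
 */
static inline uint32_t decode_0based(uint16_t stored)
{
	return (uint32_t)stored + 1;
}
```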
+2
drivers/nvme/host/pci.c
···
 		.driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
 	{ PCI_DEVICE(0x2646, 0x501E),   /* KINGSTON OM3PGP4xxxxQ OS21011 NVMe SSD */
 		.driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
+	{ PCI_DEVICE(0x2646, 0x502F),   /* KINGSTON OM3SGP4xxxxK NVMe SSD */
+		.driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
 	{ PCI_DEVICE(0x1f40, 0x1202),   /* Netac Technologies Co. NV3000 NVMe SSD */
 		.driver_data = NVME_QUIRK_BOGUS_NID, },
 	{ PCI_DEVICE(0x1f40, 0x5236),   /* Netac Technologies Co. NV7000 NVMe SSD */
+46 -4
drivers/nvme/host/sysfs.c
···
 	struct nvme_dhchap_key *key, *host_key;
 	int ret;

-	ret = nvme_auth_generate_key(dhchap_secret, &key);
+	ret = nvme_auth_parse_key(dhchap_secret, &key);
 	if (ret) {
 		kfree(dhchap_secret);
 		return ret;
···
 	struct nvme_dhchap_key *key, *ctrl_key;
 	int ret;

-	ret = nvme_auth_generate_key(dhchap_secret, &key);
+	ret = nvme_auth_parse_key(dhchap_secret, &key);
 	if (ret) {
 		kfree(dhchap_secret);
 		return ret;
···

 	return sysfs_emit(buf, "%08x\n", key_serial(key));
 }
-static DEVICE_ATTR_RO(tls_configured_key);
+
+static ssize_t tls_configured_key_store(struct device *dev,
+					struct device_attribute *attr,
+					const char *buf, size_t count)
+{
+	struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+	int error, qid;
+
+	error = kstrtoint(buf, 10, &qid);
+	if (error)
+		return error;
+
+	/*
+	 * We currently only allow userspace to write a `0` indicating
+	 * generate a new key.
+	 */
+	if (qid)
+		return -EINVAL;
+
+	if (!ctrl->opts || !ctrl->opts->concat)
+		return -EOPNOTSUPP;
+
+	error = nvme_auth_negotiate(ctrl, 0);
+	if (error < 0) {
+		nvme_reset_ctrl(ctrl);
+		return error;
+	}
+
+	error = nvme_auth_wait(ctrl, 0);
+	if (error < 0) {
+		nvme_reset_ctrl(ctrl);
+		return error;
+	}
+
+	/*
+	 * We need to reset the TLS connection, so let's just
+	 * reset the controller.
+	 */
+	nvme_reset_ctrl(ctrl);
+
+	return count;
+}
+static DEVICE_ATTR_RW(tls_configured_key);

 static ssize_t tls_keyring_show(struct device *dev,
 		struct device_attribute *attr, char *buf)
···
 	    !ctrl->opts->tls && !ctrl->opts->concat)
 		return 0;
 	if (a == &dev_attr_tls_configured_key.attr &&
-	    (!ctrl->opts->tls_key || ctrl->opts->concat))
+	    !ctrl->opts->concat)
 		return 0;
 	if (a == &dev_attr_tls_keyring.attr &&
 	    !ctrl->opts->keyring)
+3 -1
drivers/nvme/target/admin-cmd.c
···
 		status = NVME_SC_INTERNAL;
 		goto out;
 	}
+	if (req->ns->bdev)
+		nvmet_bdev_set_nvm_limits(req->ns->bdev, id);
 	status = nvmet_copy_to_sgl(req, 0, id, sizeof(*id));
 	kfree(id);
 out:
···

 	pr_debug("ctrl %d update keep-alive timer for %d secs\n",
 		ctrl->cntlid, ctrl->kato);
-	mod_delayed_work(system_wq, &ctrl->ka_work, ctrl->kato * HZ);
+	mod_delayed_work(system_percpu_wq, &ctrl->ka_work, ctrl->kato * HZ);
 out:
 	nvmet_req_complete(req, status);
 }
+58 -144
drivers/nvme/target/auth.c
···
 #include <linux/init.h>
 #include <linux/slab.h>
 #include <linux/err.h>
-#include <crypto/hash.h>
 #include <linux/crc32.h>
 #include <linux/base64.h>
 #include <linux/ctype.h>
···
 		pr_warn("Invalid DH-HMAC-CHAP hash id %d\n",
 			key_hash);
 		return -EINVAL;
-	}
-	if (key_hash > 0) {
-		/* Validate selected hash algorithm */
-		const char *hmac = nvme_auth_hmac_name(key_hash);
-
-		if (!crypto_has_shash(hmac, 0, 0)) {
-			pr_err("DH-HMAC-CHAP hash %s unsupported\n", hmac);
-			return -ENOTSUPP;
-		}
 	}
 	dhchap_secret = kstrdup(secret, GFP_KERNEL);
 	if (!dhchap_secret)
···
 	return ret;
 }

-u8 nvmet_setup_auth(struct nvmet_ctrl *ctrl, struct nvmet_sq *sq)
+u8 nvmet_setup_auth(struct nvmet_ctrl *ctrl, struct nvmet_sq *sq, bool reset)
 {
 	int ret = 0;
 	struct nvmet_host_link *p;
···
 		goto out_unlock;
 	}

-	if (nvmet_queue_tls_keyid(sq)) {
+	if (!reset && nvmet_queue_tls_keyid(sq)) {
 		pr_debug("host %s tls enabled\n", ctrl->hostnqn);
 		goto out_unlock;
 	}
···
 int nvmet_auth_host_hash(struct nvmet_req *req, u8 *response,
 			 unsigned int shash_len)
 {
-	struct crypto_shash *shash_tfm;
-	SHASH_DESC_ON_STACK(shash, shash_tfm);
+	struct nvme_auth_hmac_ctx hmac;
 	struct nvmet_ctrl *ctrl = req->sq->ctrl;
-	const char *hash_name;
 	u8 *challenge = req->sq->dhchap_c1;
 	struct nvme_dhchap_key *transformed_key;
 	u8 buf[4];
 	int ret;

-	hash_name = nvme_auth_hmac_name(ctrl->shash_id);
-	if (!hash_name) {
-		pr_warn("Hash ID %d invalid\n", ctrl->shash_id);
-		return -EINVAL;
-	}
-
-	shash_tfm = crypto_alloc_shash(hash_name, 0, 0);
-	if (IS_ERR(shash_tfm)) {
-		pr_err("failed to allocate shash %s\n", hash_name);
-		return PTR_ERR(shash_tfm);
-	}
-
-	if (shash_len != crypto_shash_digestsize(shash_tfm)) {
-		pr_err("%s: hash len mismatch (len %d digest %d)\n",
-			__func__, shash_len,
-			crypto_shash_digestsize(shash_tfm));
-		ret = -EINVAL;
-		goto out_free_tfm;
-	}
-
 	transformed_key = nvme_auth_transform_key(ctrl->host_key,
 						  ctrl->hostnqn);
-	if (IS_ERR(transformed_key)) {
-		ret = PTR_ERR(transformed_key);
-		goto out_free_tfm;
-	}
+	if (IS_ERR(transformed_key))
+		return PTR_ERR(transformed_key);

-	ret = crypto_shash_setkey(shash_tfm, transformed_key->key,
+	ret = nvme_auth_hmac_init(&hmac, ctrl->shash_id, transformed_key->key,
 				  transformed_key->len);
 	if (ret)
 		goto out_free_response;
+
+	if (shash_len != nvme_auth_hmac_hash_len(ctrl->shash_id)) {
+		pr_err("%s: hash len mismatch (len %u digest %zu)\n", __func__,
+		       shash_len, nvme_auth_hmac_hash_len(ctrl->shash_id));
+		ret = -EINVAL;
+		goto out_free_response;
+	}

 	if (ctrl->dh_gid != NVME_AUTH_DHGROUP_NULL) {
 		challenge = kmalloc(shash_len, GFP_KERNEL);
···
 						    req->sq->dhchap_c1,
 						    challenge, shash_len);
 		if (ret)
-			goto out;
+			goto out_free_challenge;
 	}

 	pr_debug("ctrl %d qid %d host response seq %u transaction %d\n",
 		 ctrl->cntlid, req->sq->qid, req->sq->dhchap_s1,
 		 req->sq->dhchap_tid);

-	shash->tfm = shash_tfm;
-	ret = crypto_shash_init(shash);
-	if (ret)
-		goto out;
-	ret = crypto_shash_update(shash, challenge, shash_len);
-	if (ret)
-		goto out;
+	nvme_auth_hmac_update(&hmac, challenge, shash_len);
+
 	put_unaligned_le32(req->sq->dhchap_s1, buf);
-	ret = crypto_shash_update(shash, buf, 4);
-	if (ret)
-		goto out;
+	nvme_auth_hmac_update(&hmac, buf, 4);
+
 	put_unaligned_le16(req->sq->dhchap_tid, buf);
-	ret = crypto_shash_update(shash, buf, 2);
-	if (ret)
-		goto out;
+	nvme_auth_hmac_update(&hmac, buf, 2);
+
 	*buf = req->sq->sc_c;
-	ret = crypto_shash_update(shash, buf, 1);
-	if (ret)
-		goto out;
-	ret = crypto_shash_update(shash, "HostHost", 8);
-	if (ret)
-		goto out;
+	nvme_auth_hmac_update(&hmac, buf, 1);
+	nvme_auth_hmac_update(&hmac, "HostHost", 8);
 	memset(buf, 0, 4);
-	ret = crypto_shash_update(shash, ctrl->hostnqn, strlen(ctrl->hostnqn));
-	if (ret)
-		goto out;
-	ret = crypto_shash_update(shash, buf, 1);
-	if (ret)
-		goto out;
-	ret = crypto_shash_update(shash, ctrl->subsys->subsysnqn,
-				  strlen(ctrl->subsys->subsysnqn));
-	if (ret)
-		goto out;
-	ret = crypto_shash_final(shash, response);
-out:
+	nvme_auth_hmac_update(&hmac, ctrl->hostnqn, strlen(ctrl->hostnqn));
+	nvme_auth_hmac_update(&hmac, buf, 1);
+	nvme_auth_hmac_update(&hmac, ctrl->subsys->subsysnqn,
+			      strlen(ctrl->subsys->subsysnqn));
+	nvme_auth_hmac_final(&hmac, response);
+	ret = 0;
+out_free_challenge:
 	if (challenge != req->sq->dhchap_c1)
 		kfree(challenge);
 out_free_response:
+	memzero_explicit(&hmac, sizeof(hmac));
 	nvme_auth_free_key(transformed_key);
-out_free_tfm:
-	crypto_free_shash(shash_tfm);
 	return ret;
 }

 int nvmet_auth_ctrl_hash(struct nvmet_req *req, u8 *response,
 			 unsigned int shash_len)
 {
-	struct crypto_shash *shash_tfm;
-	struct shash_desc *shash;
+	struct nvme_auth_hmac_ctx hmac;
 	struct nvmet_ctrl *ctrl = req->sq->ctrl;
-	const char *hash_name;
 	u8 *challenge = req->sq->dhchap_c2;
 	struct nvme_dhchap_key *transformed_key;
 	u8 buf[4];
 	int ret;

-	hash_name = nvme_auth_hmac_name(ctrl->shash_id);
-	if (!hash_name) {
-		pr_warn("Hash ID %d invalid\n", ctrl->shash_id);
-		return -EINVAL;
-	}
-
-	shash_tfm = crypto_alloc_shash(hash_name, 0, 0);
-	if (IS_ERR(shash_tfm)) {
-		pr_err("failed to allocate shash %s\n", hash_name);
-		return PTR_ERR(shash_tfm);
-	}
-
-	if (shash_len != crypto_shash_digestsize(shash_tfm)) {
-		pr_debug("%s: hash len mismatch (len %d digest %d)\n",
-			 __func__, shash_len,
-			 crypto_shash_digestsize(shash_tfm));
-		ret = -EINVAL;
-		goto out_free_tfm;
-	}
-
 	transformed_key = nvme_auth_transform_key(ctrl->ctrl_key,
 						  ctrl->subsys->subsysnqn);
-	if (IS_ERR(transformed_key)) {
-		ret = PTR_ERR(transformed_key);
-		goto out_free_tfm;
-	}
+	if (IS_ERR(transformed_key))
+		return PTR_ERR(transformed_key);

-	ret = crypto_shash_setkey(shash_tfm, transformed_key->key,
+	ret = nvme_auth_hmac_init(&hmac, ctrl->shash_id, transformed_key->key,
 				  transformed_key->len);
 	if (ret)
 		goto out_free_response;
+
+	if (shash_len != nvme_auth_hmac_hash_len(ctrl->shash_id)) {
+		pr_err("%s: hash len mismatch (len %u digest %zu)\n", __func__,
+		       shash_len, nvme_auth_hmac_hash_len(ctrl->shash_id));
+		ret = -EINVAL;
+		goto out_free_response;
+	}

 	if (ctrl->dh_gid != NVME_AUTH_DHGROUP_NULL) {
 		challenge = kmalloc(shash_len, GFP_KERNEL);
···
 			goto out_free_challenge;
 	}

-	shash = kzalloc(sizeof(*shash) + crypto_shash_descsize(shash_tfm),
-			GFP_KERNEL);
-	if (!shash) {
-		ret = -ENOMEM;
-		goto out_free_challenge;
-	}
-	shash->tfm = shash_tfm;
+	nvme_auth_hmac_update(&hmac, challenge, shash_len);

-	ret = crypto_shash_init(shash);
-	if (ret)
-		goto out;
-	ret = crypto_shash_update(shash, challenge, shash_len);
-	if (ret)
-		goto out;
 	put_unaligned_le32(req->sq->dhchap_s2, buf);
-	ret = crypto_shash_update(shash, buf, 4);
-	if (ret)
-		goto out;
+	nvme_auth_hmac_update(&hmac, buf, 4);
+
 	put_unaligned_le16(req->sq->dhchap_tid, buf);
-	ret = crypto_shash_update(shash, buf, 2);
-	if (ret)
-		goto out;
+	nvme_auth_hmac_update(&hmac, buf, 2);
+
 	memset(buf, 0, 4);
-	ret = crypto_shash_update(shash, buf, 1);
-	if (ret)
-		goto out;
-	ret = crypto_shash_update(shash, "Controller", 10);
-	if (ret)
-		goto out;
-	ret = crypto_shash_update(shash, ctrl->subsys->subsysnqn,
-			    strlen(ctrl->subsys->subsysnqn));
-	if (ret)
-		goto out;
-	ret = crypto_shash_update(shash, buf, 1);
-	if (ret)
-		goto out;
-	ret = crypto_shash_update(shash, ctrl->hostnqn, strlen(ctrl->hostnqn));
-	if (ret)
-		goto out;
-	ret = crypto_shash_final(shash, response);
-out:
-	kfree(shash);
+	nvme_auth_hmac_update(&hmac, buf, 1);
+	nvme_auth_hmac_update(&hmac, "Controller", 10);
+	nvme_auth_hmac_update(&hmac, ctrl->subsys->subsysnqn,
+			      strlen(ctrl->subsys->subsysnqn));
+	nvme_auth_hmac_update(&hmac, buf, 1);
+	nvme_auth_hmac_update(&hmac, ctrl->hostnqn, strlen(ctrl->hostnqn));
+	nvme_auth_hmac_final(&hmac, response);
+	ret = 0;
 out_free_challenge:
 	if (challenge != req->sq->dhchap_c2)
 		kfree(challenge);
 out_free_response:
+	memzero_explicit(&hmac, sizeof(hmac));
 	nvme_auth_free_key(transformed_key);
-out_free_tfm:
-	crypto_free_shash(shash_tfm);
 	return ret;
 }

···
 }

 int nvmet_auth_ctrl_sesskey(struct nvmet_req *req,
-			    u8 *pkey, int pkey_size)
+			    const u8 *pkey, int pkey_size)
 {
 	struct nvmet_ctrl *ctrl = req->sq->ctrl;
 	int ret;
···
 void nvmet_auth_insert_psk(struct nvmet_sq *sq)
 {
 	int hash_len = nvme_auth_hmac_hash_len(sq->ctrl->shash_id);
-	u8 *psk, *digest, *tls_psk;
+	u8 *psk, *tls_psk;
+	char *digest;
 	size_t psk_len;
 	int ret;
 #ifdef CONFIG_NVME_TARGET_TCP_TLS
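The sequence of nvme_auth_hmac_update() calls in nvmet_auth_host_hash() fixes the byte layout of the message that gets authenticated. The sketch below only assembles that input into a flat buffer so the field order and encodings are visible; it performs no HMAC. `put()` and `host_response_input()` are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append len bytes at offset off and return the new offset. */
static inline size_t put(uint8_t *dst, size_t off, const void *src, size_t len)
{
	memcpy(dst + off, src, len);
	return off + len;
}

/*
 * Lay out the host response input in the order the hunk feeds it to the
 * HMAC: challenge, sequence number (le32), transaction id (le16), sc_c,
 * "HostHost", host NQN, one zero byte, subsystem NQN.
 * Returns the number of bytes written.
 */
static inline size_t host_response_input(uint8_t *out, size_t cap,
		const uint8_t *challenge, size_t hash_len,
		uint32_t seq, uint16_t tid, uint8_t sc_c,
		const char *hostnqn, const char *subsysnqn)
{
	uint8_t buf[4];
	size_t off = 0;
	size_t need = hash_len + 4 + 2 + 1 + 8 +
		      strlen(hostnqn) + 1 + strlen(subsysnqn);

	assert(need <= cap);

	off = put(out, off, challenge, hash_len);

	/* Little-endian 32-bit sequence number. */
	buf[0] = (uint8_t)(seq & 0xff);
	buf[1] = (uint8_t)((seq >> 8) & 0xff);
	buf[2] = (uint8_t)((seq >> 16) & 0xff);
	buf[3] = (uint8_t)((seq >> 24) & 0xff);
	off = put(out, off, buf, 4);

	/* Little-endian 16-bit transaction id. */
	buf[0] = (uint8_t)(tid & 0xff);
	buf[1] = (uint8_t)(tid >> 8);
	off = put(out, off, buf, 2);

	off = put(out, off, &sc_c, 1);
	off = put(out, off, "HostHost", 8);
	off = put(out, off, hostnqn, strlen(hostnqn));

	/* A single zero byte separates the two NQNs. */
	buf[0] = 0;
	off = put(out, off, buf, 1);
	off = put(out, off, subsysnqn, strlen(subsysnqn));
	return off;
}
```

The controller response in nvmet_auth_ctrl_hash() follows the same pattern with a zero byte in place of sc_c, the label "Controller" (10 bytes), and the two NQNs in the opposite order.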
-3
drivers/nvme/target/configfs.c
···
 #include <linux/nvme-auth.h>
 #endif
 #include <linux/nvme-keyring.h>
-#include <crypto/hash.h>
 #include <crypto/kpp.h>
 #include <linux/nospec.h>

···
 	hmac_id = nvme_auth_hmac_id(page);
 	if (hmac_id == NVME_AUTH_HASH_INVALID)
 		return -EINVAL;
-	if (!crypto_has_shash(nvme_auth_hmac_name(hmac_id), 0, 0))
-		return -ENOTSUPP;
 	host->dhchap_hash_id = hmac_id;
 	return count;
 }
+4 -3
drivers/nvme/target/core.c
···
 	if (args->hostid)
 		uuid_copy(&ctrl->hostid, args->hostid);

-	dhchap_status = nvmet_setup_auth(ctrl, args->sq);
+	dhchap_status = nvmet_setup_auth(ctrl, args->sq, false);
 	if (dhchap_status) {
 		pr_err("Failed to setup authentication, dhchap status %u\n",
 		       dhchap_status);
···
 	if (!nvmet_bvec_cache)
 		return -ENOMEM;

-	zbd_wq = alloc_workqueue("nvmet-zbd-wq", WQ_MEM_RECLAIM, 0);
+	zbd_wq = alloc_workqueue("nvmet-zbd-wq", WQ_MEM_RECLAIM | WQ_PERCPU,
+			0);
 	if (!zbd_wq)
 		goto out_destroy_bvec_cache;

 	buffered_io_wq = alloc_workqueue("nvmet-buffered-io-wq",
-			WQ_MEM_RECLAIM, 0);
+			WQ_MEM_RECLAIM | WQ_PERCPU, 0);
 	if (!buffered_io_wq)
 		goto out_free_zbd_work_queue;

+8 -10
drivers/nvme/target/fabrics-cmd-auth.c
···
 #include <linux/blkdev.h>
 #include <linux/random.h>
 #include <linux/nvme-auth.h>
-#include <crypto/hash.h>
 #include <crypto/kpp.h>
 #include "nvmet.h"

···
 	for (i = 0; i < data->auth_protocol[0].dhchap.halen; i++) {
 		u8 host_hmac_id = data->auth_protocol[0].dhchap.idlist[i];

-		if (!fallback_hash_id &&
-		    crypto_has_shash(nvme_auth_hmac_name(host_hmac_id), 0, 0))
+		if (!fallback_hash_id && nvme_auth_hmac_hash_len(host_hmac_id))
 			fallback_hash_id = host_hmac_id;
 		if (ctrl->shash_id != host_hmac_id)
 			continue;
···
 			pr_debug("%s: ctrl %d qid %d reset negotiation\n",
 				 __func__, ctrl->cntlid, req->sq->qid);
 			if (!req->sq->qid) {
-				dhchap_status = nvmet_setup_auth(ctrl, req->sq);
+				dhchap_status = nvmet_setup_auth(ctrl, req->sq,
+								 true);
 				if (dhchap_status) {
 					pr_err("ctrl %d qid 0 failed to setup re-authentication\n",
 					       ctrl->cntlid);
···
 	    req->sq->dhchap_step != NVME_AUTH_DHCHAP_MESSAGE_FAILURE2) {
 		unsigned long auth_expire_secs = ctrl->kato ? ctrl->kato : 120;

-		mod_delayed_work(system_wq, &req->sq->auth_expired_work,
+		mod_delayed_work(system_percpu_wq, &req->sq->auth_expired_work,
 				 auth_expire_secs * HZ);
 		goto complete;
 	}
 	/* Final states, clear up variables */
-	nvmet_auth_sq_free(req->sq);
-	if (req->sq->dhchap_step == NVME_AUTH_DHCHAP_MESSAGE_FAILURE2)
+	if (req->sq->dhchap_step == NVME_AUTH_DHCHAP_MESSAGE_FAILURE2) {
+		nvmet_auth_sq_free(req->sq);
 		nvmet_ctrl_fatal_error(ctrl);
+	}

 complete:
 	nvmet_req_complete(req, status);
···
 	status = nvmet_copy_to_sgl(req, 0, d, al);
 	kfree(d);
 done:
-	if (req->sq->dhchap_step == NVME_AUTH_DHCHAP_MESSAGE_SUCCESS2)
-		nvmet_auth_sq_free(req->sq);
-	else if (req->sq->dhchap_step == NVME_AUTH_DHCHAP_MESSAGE_FAILURE1) {
+	if (req->sq->dhchap_step == NVME_AUTH_DHCHAP_MESSAGE_FAILURE1) {
 		nvmet_auth_sq_free(req->sq);
 		nvmet_ctrl_fatal_error(ctrl);
 	}
+3 -3
drivers/nvme/target/fc.c
···
 	if (!queue)
 		return NULL;

-	queue->work_q = alloc_workqueue("ntfc%d.%d.%d", 0, 0,
-				assoc->tgtport->fc_target_port.port_num,
-				assoc->a_id, qid);
+	queue->work_q = alloc_workqueue("ntfc%d.%d.%d", WQ_PERCPU, 0,
+					assoc->tgtport->fc_target_port.port_num,
+					assoc->a_id, qid);
 	if (!queue->work_q)
 		goto out_free_queue;

+15 -4
drivers/nvme/target/io-cmd-bdev.c
···
 	id->nacwu = lpp0b;

 	/*
-	 * Bit 4 indicates that the fields NPWG, NPWA, NPDG, NPDA, and
-	 * NOWS are defined for this namespace and should be used by
-	 * the host for I/O optimization.
+	 * OPTPERF = 11b indicates that the fields NPWG, NPWA, NPDG, NPDA,
+	 * NPDGL, NPDAL, and NOWS are defined for this namespace and should be
+	 * used by the host for I/O optimization.
 	 */
-	id->nsfeat |= 1 << 4;
+	id->nsfeat |= 0x3 << NVME_NS_FEAT_OPTPERF_SHIFT;
 	/* NPWG = Namespace Preferred Write Granularity. 0's based */
 	id->npwg = to0based(bdev_io_min(bdev) / bdev_logical_block_size(bdev));
 	/* NPWA = Namespace Preferred Write Alignment. 0's based */
···
 	/* Set WZDS and DRB if device supports unmapped write zeroes */
 	if (bdev_write_zeroes_unmap_sectors(bdev))
 		id->dlfeat = (1 << 3) | 0x1;
+}
+
+void nvmet_bdev_set_nvm_limits(struct block_device *bdev,
+			       struct nvme_id_ns_nvm *id)
+{
+	/*
+	 * NPDGL = Namespace Preferred Deallocate Granularity Large
+	 * NPDAL = Namespace Preferred Deallocate Alignment Large
+	 */
+	id->npdgl = id->npdal = cpu_to_le32(bdev_discard_granularity(bdev) /
+					    bdev_logical_block_size(bdev));
 }

 void nvmet_bdev_ns_disable(struct nvmet_ns *ns)
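On the target side the two changes pair up: NSFEAT advertises OPTPERF = 11b, and NPDGL/NPDAL carry the backing device's discard granularity in logical blocks. Unlike NPDG/NPDA, the large fields are not 0's based, so a device without a discard granularity yields 0, which the host-side code in this merge treats as unsupported. A userspace sketch of the arithmetic, assuming the OPTPERF field starts at bit 4 (inferred from the `1 << 4` it replaces); `SKETCH_OPTPERF_SHIFT`, `nsfeat_with_optperf()`, and `npdgl_blocks()` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit position of OPTPERF in NSFEAT; not taken from a kernel header. */
#define SKETCH_OPTPERF_SHIFT 4

/* Set both OPTPERF bits (11b) in an NSFEAT byte. */
static inline uint8_t nsfeat_with_optperf(uint8_t nsfeat)
{
	return (uint8_t)(nsfeat | (0x3 << SKETCH_OPTPERF_SHIFT));
}

/*
 * NPDGL/NPDAL value in logical blocks. The field is not 0's based, so a
 * granularity of 0 bytes maps to 0.
 */
static inline uint32_t npdgl_blocks(uint32_t granularity_bytes,
				    uint32_t logical_block_size)
{
	assert(logical_block_size != 0);
	return granularity_bytes / logical_block_size;
}
```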
-2
drivers/nvme/target/loop.c
···
 {
 	if (ctrl->ctrl.queue_count > 1) {
 		nvme_quiesce_io_queues(&ctrl->ctrl);
-		nvme_cancel_tagset(&ctrl->ctrl);
 		nvme_loop_destroy_io_queues(ctrl);
 	}

···
 	if (nvme_ctrl_state(&ctrl->ctrl) == NVME_CTRL_LIVE)
 		nvme_disable_ctrl(&ctrl->ctrl, true);

-	nvme_cancel_admin_tagset(&ctrl->ctrl);
 	nvme_loop_destroy_admin_queue(ctrl);
 }

+5 -3
drivers/nvme/target/nvmet.h
···
 u16 nvmet_parse_connect_cmd(struct nvmet_req *req);
 u32 nvmet_connect_cmd_data_len(struct nvmet_req *req);
 void nvmet_bdev_set_limits(struct block_device *bdev, struct nvme_id_ns *id);
+void nvmet_bdev_set_nvm_limits(struct block_device *bdev,
+			       struct nvme_id_ns_nvm *id);
 u16 nvmet_bdev_parse_io_cmd(struct nvmet_req *req);
 u16 nvmet_file_parse_io_cmd(struct nvmet_req *req);
 u16 nvmet_bdev_zns_parse_io_cmd(struct nvmet_req *req);
···
 int nvmet_auth_set_key(struct nvmet_host *host, const char *secret,
 		       bool set_ctrl);
 int nvmet_auth_set_host_hash(struct nvmet_host *host, const char *hash);
-u8 nvmet_setup_auth(struct nvmet_ctrl *ctrl, struct nvmet_sq *sq);
+u8 nvmet_setup_auth(struct nvmet_ctrl *ctrl, struct nvmet_sq *sq, bool reset);
 void nvmet_auth_sq_init(struct nvmet_sq *sq);
 void nvmet_destroy_auth(struct nvmet_ctrl *ctrl);
 void nvmet_auth_sq_free(struct nvmet_sq *sq);
···
 int nvmet_auth_ctrl_exponential(struct nvmet_req *req,
 				u8 *buf, int buf_size);
 int nvmet_auth_ctrl_sesskey(struct nvmet_req *req,
-			    u8 *buf, int buf_size);
+			    const u8 *pkey, int pkey_size);
 void nvmet_auth_insert_psk(struct nvmet_sq *sq);
 #else
 static inline u8 nvmet_setup_auth(struct nvmet_ctrl *ctrl,
-				  struct nvmet_sq *sq)
+				  struct nvmet_sq *sq, bool reset)
 {
 	return 0;
 }
+1 -1
drivers/nvme/target/tcp.c
···
 	int ret;

 	nvmet_tcp_wq = alloc_workqueue("nvmet_tcp_wq",
-				WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);
+				WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_PERCPU, 0);
 	if (!nvmet_tcp_wq)
 		return -ENOMEM;

+175 -1
drivers/scsi/scsi_bsg.c
··· 1 1 // SPDX-License-Identifier: GPL-2.0 2 2 #include <linux/bsg.h> 3 + #include <linux/io_uring/cmd.h> 3 4 #include <scsi/scsi.h> 4 5 #include <scsi/scsi_ioctl.h> 5 6 #include <scsi/scsi_cmnd.h> ··· 9 8 #include "scsi_priv.h" 10 9 11 10 #define uptr64(val) ((void __user *)(uintptr_t)(val)) 11 + 12 + /* 13 + * Per-command BSG SCSI PDU stored in io_uring_cmd.pdu[32]. 14 + * Holds temporary state between submission, completion and task_work. 15 + */ 16 + struct scsi_bsg_uring_cmd_pdu { 17 + struct bio *bio; /* mapped user buffer, unmap in task work */ 18 + struct request *req; /* block request, freed in task work */ 19 + u64 response_addr; /* user space response buffer address */ 20 + }; 21 + static_assert(sizeof(struct scsi_bsg_uring_cmd_pdu) <= sizeof_field(struct io_uring_cmd, pdu)); 22 + 23 + static inline struct scsi_bsg_uring_cmd_pdu *scsi_bsg_uring_cmd_pdu( 24 + struct io_uring_cmd *ioucmd) 25 + { 26 + return io_uring_cmd_to_pdu(ioucmd, struct scsi_bsg_uring_cmd_pdu); 27 + } 28 + 29 + /* Task work: build res2 (layout in uapi/linux/bsg.h) and copy sense to user. 
+ */
+static void scsi_bsg_uring_task_cb(struct io_tw_req tw_req, io_tw_token_t tw)
+{
+	struct io_uring_cmd *ioucmd = io_uring_cmd_from_tw(tw_req);
+	struct scsi_bsg_uring_cmd_pdu *pdu = scsi_bsg_uring_cmd_pdu(ioucmd);
+	struct request *rq = pdu->req;
+	struct scsi_cmnd *scmd = blk_mq_rq_to_pdu(rq);
+	u64 res2;
+	int ret = 0;
+	u8 driver_status = 0;
+	u8 sense_len_wr = 0;
+
+	if (pdu->bio)
+		blk_rq_unmap_user(pdu->bio);
+
+	if (scsi_status_is_check_condition(scmd->result)) {
+		driver_status = DRIVER_SENSE;
+		if (pdu->response_addr)
+			sense_len_wr = min_t(u8, scmd->sense_len,
+					     SCSI_SENSE_BUFFERSIZE);
+	}
+
+	if (sense_len_wr) {
+		if (copy_to_user(uptr64(pdu->response_addr), scmd->sense_buffer,
+				 sense_len_wr))
+			ret = -EFAULT;
+	}
+
+	res2 = bsg_scsi_res2_build(status_byte(scmd->result), driver_status,
+				   host_byte(scmd->result), sense_len_wr,
+				   scmd->resid_len);
+
+	blk_mq_free_request(rq);
+	io_uring_cmd_done32(ioucmd, ret, res2,
+			    IO_URING_CMD_TASK_WORK_ISSUE_FLAGS);
+}
+
+static enum rq_end_io_ret scsi_bsg_uring_cmd_done(struct request *req,
+						  blk_status_t status,
+						  const struct io_comp_batch *iocb)
+{
+	struct io_uring_cmd *ioucmd = req->end_io_data;
+
+	io_uring_cmd_do_in_task_lazy(ioucmd, scsi_bsg_uring_task_cb);
+	return RQ_END_IO_NONE;
+}
+
+static int scsi_bsg_map_user_buffer(struct request *req,
+				    struct io_uring_cmd *ioucmd,
+				    unsigned int issue_flags, gfp_t gfp_mask)
+{
+	const struct bsg_uring_cmd *cmd = io_uring_sqe128_cmd(ioucmd->sqe, struct bsg_uring_cmd);
+	bool is_write = cmd->dout_xfer_len > 0;
+	u64 buf_addr = is_write ? cmd->dout_xferp : cmd->din_xferp;
+	unsigned long buf_len = is_write ? cmd->dout_xfer_len : cmd->din_xfer_len;
+	struct iov_iter iter;
+	int ret;
+
+	if (ioucmd->flags & IORING_URING_CMD_FIXED) {
+		ret = io_uring_cmd_import_fixed(buf_addr, buf_len,
+						is_write ? WRITE : READ,
+						&iter, ioucmd, issue_flags);
+		if (ret < 0)
+			return ret;
+		ret = blk_rq_map_user_iov(req->q, req, NULL, &iter, gfp_mask);
+	} else {
+		ret = blk_rq_map_user(req->q, req, NULL, uptr64(buf_addr),
+				      buf_len, gfp_mask);
+	}
+
+	return ret;
+}
+
+static int scsi_bsg_uring_cmd(struct request_queue *q, struct io_uring_cmd *ioucmd,
+			      unsigned int issue_flags, bool open_for_write)
+{
+	struct scsi_bsg_uring_cmd_pdu *pdu = scsi_bsg_uring_cmd_pdu(ioucmd);
+	const struct bsg_uring_cmd *cmd = io_uring_sqe128_cmd(ioucmd->sqe, struct bsg_uring_cmd);
+	struct scsi_cmnd *scmd;
+	struct request *req;
+	blk_mq_req_flags_t blk_flags = 0;
+	gfp_t gfp_mask = GFP_KERNEL;
+	int ret;
+
+	if (cmd->protocol != BSG_PROTOCOL_SCSI ||
+	    cmd->subprotocol != BSG_SUB_PROTOCOL_SCSI_CMD)
+		return -EINVAL;
+
+	if (!cmd->request || cmd->request_len == 0)
+		return -EINVAL;
+
+	if (cmd->dout_xfer_len && cmd->din_xfer_len) {
+		pr_warn_once("BIDI support in bsg has been removed.\n");
+		return -EOPNOTSUPP;
+	}
+
+	if (cmd->dout_iovec_count > 0 || cmd->din_iovec_count > 0)
+		return -EOPNOTSUPP;
+
+	if (issue_flags & IO_URING_F_NONBLOCK) {
+		blk_flags = BLK_MQ_REQ_NOWAIT;
+		gfp_mask = GFP_NOWAIT;
+	}
+
+	req = scsi_alloc_request(q, cmd->dout_xfer_len ?
+				 REQ_OP_DRV_OUT : REQ_OP_DRV_IN, blk_flags);
+	if (IS_ERR(req))
+		return PTR_ERR(req);
+
+	scmd = blk_mq_rq_to_pdu(req);
+	if (cmd->request_len > sizeof(scmd->cmnd)) {
+		ret = -EINVAL;
+		goto out_free_req;
+	}
+	scmd->cmd_len = cmd->request_len;
+	scmd->allowed = SG_DEFAULT_RETRIES;
+
+	if (copy_from_user(scmd->cmnd, uptr64(cmd->request), cmd->request_len)) {
+		ret = -EFAULT;
+		goto out_free_req;
+	}
+
+	if (!scsi_cmd_allowed(scmd->cmnd, open_for_write)) {
+		ret = -EPERM;
+		goto out_free_req;
+	}
+
+	pdu->response_addr = cmd->response;
+	scmd->sense_len = cmd->max_response_len ?
+		min(cmd->max_response_len, SCSI_SENSE_BUFFERSIZE) : SCSI_SENSE_BUFFERSIZE;
+
+	if (cmd->dout_xfer_len || cmd->din_xfer_len) {
+		ret = scsi_bsg_map_user_buffer(req, ioucmd, issue_flags, gfp_mask);
+		if (ret)
+			goto out_free_req;
+		pdu->bio = req->bio;
+	} else {
+		pdu->bio = NULL;
+	}
+
+	req->timeout = cmd->timeout_ms ?
+		msecs_to_jiffies(cmd->timeout_ms) : BLK_DEFAULT_SG_TIMEOUT;
+
+	req->end_io = scsi_bsg_uring_cmd_done;
+	req->end_io_data = ioucmd;
+	pdu->req = req;
+
+	blk_execute_rq_nowait(req, false);
+	return -EIOCBQUEUED;
+
+out_free_req:
+	blk_mq_free_request(req);
+	return ret;
+}
 
 static int scsi_bsg_sg_io_fn(struct request_queue *q, struct sg_io_v4 *hdr,
 		bool open_for_write, unsigned int timeout)
···
 struct bsg_device *scsi_bsg_register_queue(struct scsi_device *sdev)
 {
 	return bsg_register_queue(sdev->request_queue, &sdev->sdev_gendev,
-			dev_name(&sdev->sdev_gendev), scsi_bsg_sg_io_fn);
+			dev_name(&sdev->sdev_gendev), scsi_bsg_sg_io_fn,
+			scsi_bsg_uring_cmd);
 }
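The argument checks at the top of scsi_bsg_uring_cmd() can be modelled in user space. The sketch below is not kernel code: struct model_bsg_cmd and the MODEL_* constants are hypothetical stand-ins, the protocol values are assumed to match include/uapi/linux/bsg.h, and the CDB and sense buffer sizes (32 and 96 bytes) are assumptions about sizeof(scmd->cmnd) and SCSI_SENSE_BUFFERSIZE. The CDB length check is folded into the same function for brevity, although the hunk above performs it after the request has been allocated.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed to match BSG_PROTOCOL_SCSI / BSG_SUB_PROTOCOL_SCSI_CMD. */
#define MODEL_BSG_PROTOCOL_SCSI		0
#define MODEL_BSG_SUB_PROTOCOL_SCSI_CMD	0
/* Assumed sizes of scmd->cmnd and the SCSI sense buffer. */
#define MODEL_MAX_CDB_LEN		32
#define MODEL_SENSE_BUFFERSIZE		96

/* Hypothetical stand-in holding only the fields the checks read. */
struct model_bsg_cmd {
	uint64_t request;		/* user pointer to the CDB */
	uint32_t request_len;
	uint32_t protocol;
	uint32_t subprotocol;
	uint32_t max_response_len;
	uint32_t dout_xfer_len;
	uint32_t din_xfer_len;
	uint32_t dout_iovec_count;
	uint32_t din_iovec_count;
};

/* Mirrors the order of the early-return checks in the hunk above. */
static int model_validate(const struct model_bsg_cmd *cmd)
{
	if (cmd->protocol != MODEL_BSG_PROTOCOL_SCSI ||
	    cmd->subprotocol != MODEL_BSG_SUB_PROTOCOL_SCSI_CMD)
		return -EINVAL;
	if (!cmd->request || cmd->request_len == 0)
		return -EINVAL;
	/* Bidirectional transfers are rejected. */
	if (cmd->dout_xfer_len && cmd->din_xfer_len)
		return -EOPNOTSUPP;
	/* Vectored buffers are not supported on this path. */
	if (cmd->dout_iovec_count > 0 || cmd->din_iovec_count > 0)
		return -EOPNOTSUPP;
	if (cmd->request_len > MODEL_MAX_CDB_LEN)
		return -EINVAL;
	return 0;
}

/* Sense length: clamp the caller's limit, or use the full buffer if unset. */
static uint32_t model_sense_len(uint32_t max_response_len)
{
	if (!max_response_len)
		return MODEL_SENSE_BUFFERSIZE;
	return max_response_len < MODEL_SENSE_BUFFERSIZE ?
		max_response_len : MODEL_SENSE_BUFFERSIZE;
}
```

A command with only din_xfer_len set passes; setting both transfer lengths yields -EOPNOTSUPP, matching the BIDI rejection in the diff.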
+1 -1
drivers/target/target_core_file.c
···
 		 */
 		dev->dev_attrib.max_write_same_len = 0xFFFF;
 
-		if (bdev_nonrot(bdev))
+		if (!bdev_rot(bdev))
 			dev->dev_attrib.is_nonrot = 1;
 	} else {
 		if (!(fd_dev->fbd_flags & FBDF_HAS_SIZE)) {
+1 -1
drivers/target/target_core_iblock.c
···
 	else
 		dev->dev_attrib.max_write_same_len = 0xFFFF;
 
-	if (bdev_nonrot(bd))
+	if (!bdev_rot(bd))
 		dev->dev_attrib.is_nonrot = 1;
 
 	target_configure_write_atomic_from_bdev(&dev->dev_attrib, bd);
+2 -2
fs/btrfs/volumes.c
···
 		set_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
 	}
 
-	if (!bdev_nonrot(file_bdev(bdev_file)))
+	if (bdev_rot(file_bdev(bdev_file)))
 		fs_devices->rotating = true;
 
 	if (bdev_max_discard_sectors(file_bdev(bdev_file)))
···
 
 	atomic64_add(device->total_bytes, &fs_info->free_chunk_space);
 
-	if (!bdev_nonrot(device->bdev))
+	if (bdev_rot(device->bdev))
 		fs_devices->rotating = true;
 
 	orig_super_total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
+1 -1
fs/ext4/mballoc-test.c
···
 	ext4_fsblk_t block;
 	int ret;
 
-	/* needed by ext4_mb_init->bdev_nonrot(sb->s_bdev) */
+	/* needed by ext4_mb_init->bdev_rot(sb->s_bdev) */
 	sb->s_bdev = kzalloc_obj(*sb->s_bdev);
 	if (sb->s_bdev == NULL)
 		return -ENOMEM;
+1 -1
fs/ext4/mballoc.c
···
 		spin_lock_init(&lg->lg_prealloc_lock);
 	}
 
-	if (bdev_nonrot(sb->s_bdev))
+	if (!bdev_rot(sb->s_bdev))
 		sbi->s_mb_max_linear_groups = 0;
 	else
 		sbi->s_mb_max_linear_groups = MB_DEFAULT_LINEAR_LIMIT;
+15 -23
fs/xfs/xfs_zone_gc.c
···
 	struct xfs_inode *ip;
 	struct bio *bio;
 	xfs_daddr_t daddr;
-	unsigned int len;
 	bool is_seq;
 
 	if (xfs_is_shutdown(mp))
···
 		return false;
 	}
 
-	len = XFS_FSB_TO_B(mp, irec.rm_blockcount);
-	bio = bio_alloc_bioset(bdev,
-			min(howmany(len, XFS_GC_BUF_SIZE) + 1, XFS_GC_NR_BUFS),
-			REQ_OP_READ, GFP_NOFS, &data->bio_set);
-
+	/*
+	 * Scratch allocation can wrap around to the same buffer again,
+	 * provision an extra bvec for that case.
+	 */
+	bio = bio_alloc_bioset(bdev, XFS_GC_NR_BUFS + 1, REQ_OP_READ, GFP_NOFS,
+			&data->bio_set);
 	chunk = container_of(bio, struct xfs_gc_bio, bio);
 	chunk->ip = ip;
 	chunk->offset = XFS_FSB_TO_B(mp, irec.rm_offset);
-	chunk->len = len;
+	chunk->len = XFS_FSB_TO_B(mp, irec.rm_blockcount);
 	chunk->old_startblock =
 		xfs_rgbno_to_rtb(iter->victim_rtg, irec.rm_startblock);
 	chunk->new_daddr = daddr;
···
 	bio->bi_iter.bi_sector = xfs_rtb_to_daddr(mp, chunk->old_startblock);
 	bio->bi_end_io = xfs_zone_gc_end_io;
 	xfs_zone_gc_add_data(chunk);
-	data->scratch_head = (data->scratch_head + len) % data->scratch_size;
-	data->scratch_available -= len;
+	data->scratch_head =
+		(data->scratch_head + chunk->len) % data->scratch_size;
+	data->scratch_available -= chunk->len;
 
 	XFS_STATS_INC(mp, xs_gc_read_calls);
 
···
 
 static void
 xfs_submit_zone_reset_bio(
-	struct xfs_rtgroup *rtg,
-	struct bio *bio)
+	struct bio *bio,
+	void *priv)
 {
+	struct xfs_rtgroup *rtg = priv;
 	struct xfs_mount *mp = rtg_mount(rtg);
 
 	trace_xfs_zone_reset(rtg);
···
 	submit_bio(bio);
 }
 
-static void xfs_bio_wait_endio(struct bio *bio)
-{
-	complete(bio->bi_private);
-}
-
 int
 xfs_zone_gc_reset_sync(
 	struct xfs_rtgroup *rtg)
 {
-	DECLARE_COMPLETION_ONSTACK(done);
 	struct bio bio;
 	int error;
 
 	bio_init(&bio, rtg_mount(rtg)->m_rtdev_targp->bt_bdev, NULL, 0,
 			REQ_OP_ZONE_RESET | REQ_SYNC);
-	bio.bi_private = &done;
-	bio.bi_end_io = xfs_bio_wait_endio;
-	xfs_submit_zone_reset_bio(rtg, &bio);
-	wait_for_completion_io(&done);
-
+	bio_await(&bio, rtg, xfs_submit_zone_reset_bio);
 	error = blk_status_to_errno(bio.bi_status);
 	bio_uninit(&bio);
 	return error;
···
 		chunk->data = data;
 		WRITE_ONCE(chunk->state, XFS_GC_BIO_NEW);
 		list_add_tail(&chunk->entry, &data->resetting);
-		xfs_submit_zone_reset_bio(rtg, bio);
+		xfs_submit_zone_reset_bio(bio, rtg);
 	} while (next);
 }
 
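The new comment in the read path says a scratch allocation "can wrap around to the same buffer again", which is why the bio is sized for XFS_GC_NR_BUFS + 1 vectors. A minimal user-space sketch of that arithmetic follows. It assumes the scratch space is a ring of equally sized buffers and that each contiguous piece inside one buffer needs its own vector; ring_segments() and its parameters are hypothetical and the sizes in the usage note are made up, not the real XFS_GC_BUF_SIZE or XFS_GC_NR_BUFS values.

```c
#include <stddef.h>

/*
 * Count how many per-buffer pieces a chunk of 'len' bytes occupies when
 * it starts at byte offset 'head' in a ring of 'nr_bufs' buffers of
 * 'buf_size' bytes each. Hypothetical helper for illustration only.
 */
static unsigned int ring_segments(size_t head, size_t len,
				  size_t buf_size, size_t nr_bufs)
{
	size_t ring_size = buf_size * nr_bufs;
	unsigned int segs = 0;

	while (len > 0) {
		size_t off = head % buf_size;	/* offset inside current buffer */
		size_t piece = buf_size - off;	/* room left in that buffer */

		if (piece > len)
			piece = len;
		segs++;
		len -= piece;
		head = (head + piece) % ring_size;	/* same wrap as scratch_head */
	}
	return segs;
}
```

With four 4096-byte buffers, a ring-sized chunk starting at offset 0 touches 4 pieces, but the same chunk starting at offset 2048 touches 5: it ends in the tail half of the buffer it started in. That is the "extra bvec" case.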
-20
include/crypto/hkdf.h
···
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * HKDF: HMAC-based Key Derivation Function (HKDF), RFC 5869
- *
- * Extracted from fs/crypto/hkdf.c, which has
- * Copyright 2019 Google LLC
- */
-
-#ifndef _CRYPTO_HKDF_H
-#define _CRYPTO_HKDF_H
-
-#include <crypto/hash.h>
-
-int hkdf_extract(struct crypto_shash *hmac_tfm, const u8 *ikm,
-		 unsigned int ikmlen, const u8 *salt, unsigned int saltlen,
-		 u8 *prk);
-int hkdf_expand(struct crypto_shash *hmac_tfm,
-		const u8 *info, unsigned int infolen,
-		u8 *okm, unsigned int okmlen);
-#endif
+3 -2
include/linux/bio.h
···
 extern int biovec_init_pool(mempool_t *pool, int pool_entries);
 
 struct bio *bio_alloc_bioset(struct block_device *bdev, unsigned short nr_vecs,
-			     blk_opf_t opf, gfp_t gfp_mask,
-			     struct bio_set *bs);
+			     blk_opf_t opf, gfp_t gfp, struct bio_set *bs);
 struct bio *bio_kmalloc(unsigned short nr_vecs, gfp_t gfp_mask);
 extern void bio_put(struct bio *);
 
···
 void bio_reset(struct bio *bio, struct block_device *bdev, blk_opf_t opf);
 void bio_reuse(struct bio *bio, blk_opf_t opf);
 void bio_chain(struct bio *, struct bio *);
+void bio_await(struct bio *bio, void *priv,
+	       void (*submit)(struct bio *bio, void *priv));
 
 int __must_check bio_add_page(struct bio *bio, struct page *page, unsigned len,
 			      unsigned off);
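The bio_await() prototype takes a submit callback plus an opaque priv pointer. Its implementation is not part of this hunk, so the following is only a user-space sketch of the calling convention, inferred from the open-coded completion that the xfs_zone_gc.c change removes. Every toy_* name is hypothetical, and the toy device completes inline, so nothing actually sleeps; the kernel helper would presumably block on a completion at the marked point.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct bio, reduced to what the model needs. */
struct toy_bio {
	int status;
	bool done;
	void (*end_io)(struct toy_bio *bio);
};

/* Completion handler installed by the helper, not by the caller. */
static void toy_wait_endio(struct toy_bio *bio)
{
	bio->done = true;
}

/*
 * Model of the helper: install the completion handler, let the caller's
 * callback submit the I/O, then wait for it to finish.
 */
static void toy_await(struct toy_bio *bio, void *priv,
		      void (*submit)(struct toy_bio *bio, void *priv))
{
	bio->done = false;
	bio->end_io = toy_wait_endio;
	submit(bio, priv);
	/* A real helper would wait here; this toy device completes inline. */
}

/* Example submit callback: priv carries the caller's context. */
static void toy_submit_reset(struct toy_bio *bio, void *priv)
{
	int *resets = priv;

	(*resets)++;		/* stand-in for per-zone bookkeeping */
	bio->status = 0;	/* pretend the device reported success */
	bio->end_io(bio);	/* inline completion */
}
```

The (bio, priv) argument order is why xfs_submit_zone_reset_bio() swaps its parameters in the XFS hunk: it has to match the callback type so it can be passed to bio_await() directly.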
+1
include/linux/blk-integrity.h
···
 	BLK_INTEGRITY_DEVICE_CAPABLE = 1 << 2,
 	BLK_INTEGRITY_REF_TAG = 1 << 3,
 	BLK_INTEGRITY_STACKED = 1 << 4,
+	BLK_SPLIT_INTERVAL_CAPABLE = 1 << 5,
 };
 
 const char *blk_integrity_profile_name(struct blk_integrity *bi);
+10 -7
include/linux/blkdev.h
···
 #include <linux/minmax.h>
 #include <linux/timer.h>
 #include <linux/workqueue.h>
+#include <linux/completion.h>
 #include <linux/wait.h>
 #include <linux/bio.h>
 #include <linux/gfp.h>
···
 	u8 __rcu *zones_cond;
 	unsigned int zone_wplugs_hash_bits;
 	atomic_t nr_zone_wplugs;
-	spinlock_t zone_wplugs_lock;
+	spinlock_t zone_wplugs_hash_lock;
 	struct mempool *zone_wplugs_pool;
 	struct hlist_head *zone_wplugs_hash;
 	struct workqueue_struct *zone_wplugs_wq;
+	spinlock_t zone_wplugs_list_lock;
+	struct list_head zone_wplugs_list;
+	struct task_struct *zone_wplugs_worker;
+	struct completion zone_wplugs_worker_bio_done;
 #endif /* CONFIG_BLK_DEV_ZONED */
 
 #if IS_ENABLED(CONFIG_CDROM)
···
 
 	/* hw dispatch queues */
 	unsigned int nr_hw_queues;
-	struct blk_mq_hw_ctx * __rcu *queue_hw_ctx;
+	struct blk_mq_hw_ctx * __rcu *queue_hw_ctx __counted_by_ptr(nr_hw_queues);
 
 	struct percpu_ref q_usage_counter;
 	struct lock_class_key io_lock_cls_key;
···
 	QUEUE_FLAG_NO_ELV_SWITCH, /* can't switch elevator any more */
 	QUEUE_FLAG_QOS_ENABLED, /* qos is enabled */
 	QUEUE_FLAG_BIO_ISSUE_TIME, /* record bio->issue_time_ns */
+	QUEUE_FLAG_ZONED_QD1_WRITES, /* Limit zoned devices writes to QD=1 */
 	QUEUE_FLAG_MAX
 };
 
···
 	test_bit(QUEUE_FLAG_DISABLE_WBT_DEF, &(q)->queue_flags)
 #define blk_queue_no_elv_switch(q) \
 	test_bit(QUEUE_FLAG_NO_ELV_SWITCH, &(q)->queue_flags)
+#define blk_queue_zoned_qd1_writes(q) \
+	test_bit(QUEUE_FLAG_ZONED_QD1_WRITES, &(q)->queue_flags)
 
 extern void blk_set_pm_only(struct request_queue *q);
 extern void blk_clear_pm_only(struct request_queue *q);
···
 static inline bool bdev_rot(struct block_device *bdev)
 {
 	return blk_queue_rot(bdev_get_queue(bdev));
-}
-
-static inline bool bdev_nonrot(struct block_device *bdev)
-{
-	return !bdev_rot(bdev);
 }
 
 static inline bool bdev_synchronous(struct block_device *bdev)
+5 -1
include/linux/bsg.h
···
 struct bsg_device;
 struct device;
 struct request_queue;
+struct io_uring_cmd;
 
 typedef int (bsg_sg_io_fn)(struct request_queue *, struct sg_io_v4 *hdr,
 		bool open_for_write, unsigned int timeout);
 
+typedef int (bsg_uring_cmd_fn)(struct request_queue *q, struct io_uring_cmd *ioucmd,
+		unsigned int issue_flags, bool open_for_write);
+
 struct bsg_device *bsg_register_queue(struct request_queue *q,
 		struct device *parent, const char *name,
-		bsg_sg_io_fn *sg_io_fn);
+		bsg_sg_io_fn *sg_io_fn, bsg_uring_cmd_fn *uring_cmd_fn);
 void bsg_unregister_queue(struct bsg_device *bcd);
 
 #endif /* _LINUX_BSG_H */
-9
include/linux/bvec.h
···
 		((bvl = mp_bvec_iter_bvec((bio_vec), (iter))), 1); \
 	     bvec_iter_advance_single((bio_vec), &(iter), (bvl).bv_len))
 
-/* for iterating one bio from start to end */
-#define BVEC_ITER_ALL_INIT (struct bvec_iter) \
-{ \
-	.bi_sector = 0, \
-	.bi_size = UINT_MAX, \
-	.bi_idx = 0, \
-	.bi_bvec_done = 0, \
-}
-
 static inline struct bio_vec *bvec_init_iter_all(struct bvec_iter_all *iter_all)
 {
 	iter_all->done = 0;
+102 -102
include/linux/drbd_genl.h
··· 87 87 */ 88 88 GENL_struct(DRBD_NLA_CFG_REPLY, 1, drbd_cfg_reply, 89 89 /* "arbitrary" size strings, nla_policy.len = 0 */ 90 - __str_field(1, DRBD_GENLA_F_MANDATORY, info_text, 0) 90 + __str_field(1, 0, info_text, 0) 91 91 ) 92 92 93 93 /* Configuration requests typically need a context to operate on. ··· 96 96 * and/or the replication group (aka resource) name, 97 97 * and the volume id within the resource. */ 98 98 GENL_struct(DRBD_NLA_CFG_CONTEXT, 2, drbd_cfg_context, 99 - __u32_field(1, DRBD_GENLA_F_MANDATORY, ctx_volume) 100 - __str_field(2, DRBD_GENLA_F_MANDATORY, ctx_resource_name, 128) 101 - __bin_field(3, DRBD_GENLA_F_MANDATORY, ctx_my_addr, 128) 102 - __bin_field(4, DRBD_GENLA_F_MANDATORY, ctx_peer_addr, 128) 99 + __u32_field(1, 0, ctx_volume) 100 + __str_field(2, 0, ctx_resource_name, 128) 101 + __bin_field(3, 0, ctx_my_addr, 128) 102 + __bin_field(4, 0, ctx_peer_addr, 128) 103 103 ) 104 104 105 105 GENL_struct(DRBD_NLA_DISK_CONF, 3, disk_conf, ··· 108 108 __s32_field(3, DRBD_F_REQUIRED | DRBD_F_INVARIANT, meta_dev_idx) 109 109 110 110 /* use the resize command to try and change the disk_size */ 111 - __u64_field(4, DRBD_GENLA_F_MANDATORY | DRBD_F_INVARIANT, disk_size) 111 + __u64_field(4, DRBD_F_INVARIANT, disk_size) 112 112 /* we could change the max_bio_bvecs, 113 113 * but it won't propagate through the stack */ 114 - __u32_field(5, DRBD_GENLA_F_MANDATORY | DRBD_F_INVARIANT, max_bio_bvecs) 114 + __u32_field(5, DRBD_F_INVARIANT, max_bio_bvecs) 115 115 116 - __u32_field_def(6, DRBD_GENLA_F_MANDATORY, on_io_error, DRBD_ON_IO_ERROR_DEF) 117 - __u32_field_def(7, DRBD_GENLA_F_MANDATORY, fencing, DRBD_FENCING_DEF) 116 + __u32_field_def(6, 0, on_io_error, DRBD_ON_IO_ERROR_DEF) 117 + __u32_field_def(7, 0, fencing, DRBD_FENCING_DEF) 118 118 119 - __u32_field_def(8, DRBD_GENLA_F_MANDATORY, resync_rate, DRBD_RESYNC_RATE_DEF) 120 - __s32_field_def(9, DRBD_GENLA_F_MANDATORY, resync_after, DRBD_MINOR_NUMBER_DEF) 121 - __u32_field_def(10, 
DRBD_GENLA_F_MANDATORY, al_extents, DRBD_AL_EXTENTS_DEF) 122 - __u32_field_def(11, DRBD_GENLA_F_MANDATORY, c_plan_ahead, DRBD_C_PLAN_AHEAD_DEF) 123 - __u32_field_def(12, DRBD_GENLA_F_MANDATORY, c_delay_target, DRBD_C_DELAY_TARGET_DEF) 124 - __u32_field_def(13, DRBD_GENLA_F_MANDATORY, c_fill_target, DRBD_C_FILL_TARGET_DEF) 125 - __u32_field_def(14, DRBD_GENLA_F_MANDATORY, c_max_rate, DRBD_C_MAX_RATE_DEF) 126 - __u32_field_def(15, DRBD_GENLA_F_MANDATORY, c_min_rate, DRBD_C_MIN_RATE_DEF) 127 - __u32_field_def(20, DRBD_GENLA_F_MANDATORY, disk_timeout, DRBD_DISK_TIMEOUT_DEF) 119 + __u32_field_def(8, 0, resync_rate, DRBD_RESYNC_RATE_DEF) 120 + __s32_field_def(9, 0, resync_after, DRBD_MINOR_NUMBER_DEF) 121 + __u32_field_def(10, 0, al_extents, DRBD_AL_EXTENTS_DEF) 122 + __u32_field_def(11, 0, c_plan_ahead, DRBD_C_PLAN_AHEAD_DEF) 123 + __u32_field_def(12, 0, c_delay_target, DRBD_C_DELAY_TARGET_DEF) 124 + __u32_field_def(13, 0, c_fill_target, DRBD_C_FILL_TARGET_DEF) 125 + __u32_field_def(14, 0, c_max_rate, DRBD_C_MAX_RATE_DEF) 126 + __u32_field_def(15, 0, c_min_rate, DRBD_C_MIN_RATE_DEF) 127 + __u32_field_def(20, 0, disk_timeout, DRBD_DISK_TIMEOUT_DEF) 128 128 __u32_field_def(21, 0 /* OPTIONAL */, read_balancing, DRBD_READ_BALANCING_DEF) 129 129 __u32_field_def(25, 0 /* OPTIONAL */, rs_discard_granularity, DRBD_RS_DISCARD_GRANULARITY_DEF) 130 130 131 - __flg_field_def(16, DRBD_GENLA_F_MANDATORY, disk_barrier, DRBD_DISK_BARRIER_DEF) 132 - __flg_field_def(17, DRBD_GENLA_F_MANDATORY, disk_flushes, DRBD_DISK_FLUSHES_DEF) 133 - __flg_field_def(18, DRBD_GENLA_F_MANDATORY, disk_drain, DRBD_DISK_DRAIN_DEF) 134 - __flg_field_def(19, DRBD_GENLA_F_MANDATORY, md_flushes, DRBD_MD_FLUSHES_DEF) 131 + __flg_field_def(16, 0, disk_barrier, DRBD_DISK_BARRIER_DEF) 132 + __flg_field_def(17, 0, disk_flushes, DRBD_DISK_FLUSHES_DEF) 133 + __flg_field_def(18, 0, disk_drain, DRBD_DISK_DRAIN_DEF) 134 + __flg_field_def(19, 0, md_flushes, DRBD_MD_FLUSHES_DEF) 135 135 __flg_field_def(23, 0 /* OPTIONAL 
*/, al_updates, DRBD_AL_UPDATES_DEF) 136 136 __flg_field_def(24, 0 /* OPTIONAL */, discard_zeroes_if_aligned, DRBD_DISCARD_ZEROES_IF_ALIGNED_DEF) 137 137 __flg_field_def(26, 0 /* OPTIONAL */, disable_write_same, DRBD_DISABLE_WRITE_SAME_DEF) 138 138 ) 139 139 140 140 GENL_struct(DRBD_NLA_RESOURCE_OPTS, 4, res_opts, 141 - __str_field_def(1, DRBD_GENLA_F_MANDATORY, cpu_mask, DRBD_CPU_MASK_SIZE) 142 - __u32_field_def(2, DRBD_GENLA_F_MANDATORY, on_no_data, DRBD_ON_NO_DATA_DEF) 141 + __str_field_def(1, 0, cpu_mask, DRBD_CPU_MASK_SIZE) 142 + __u32_field_def(2, 0, on_no_data, DRBD_ON_NO_DATA_DEF) 143 143 ) 144 144 145 145 GENL_struct(DRBD_NLA_NET_CONF, 5, net_conf, 146 - __str_field_def(1, DRBD_GENLA_F_MANDATORY | DRBD_F_SENSITIVE, 146 + __str_field_def(1, DRBD_F_SENSITIVE, 147 147 shared_secret, SHARED_SECRET_MAX) 148 - __str_field_def(2, DRBD_GENLA_F_MANDATORY, cram_hmac_alg, SHARED_SECRET_MAX) 149 - __str_field_def(3, DRBD_GENLA_F_MANDATORY, integrity_alg, SHARED_SECRET_MAX) 150 - __str_field_def(4, DRBD_GENLA_F_MANDATORY, verify_alg, SHARED_SECRET_MAX) 151 - __str_field_def(5, DRBD_GENLA_F_MANDATORY, csums_alg, SHARED_SECRET_MAX) 152 - __u32_field_def(6, DRBD_GENLA_F_MANDATORY, wire_protocol, DRBD_PROTOCOL_DEF) 153 - __u32_field_def(7, DRBD_GENLA_F_MANDATORY, connect_int, DRBD_CONNECT_INT_DEF) 154 - __u32_field_def(8, DRBD_GENLA_F_MANDATORY, timeout, DRBD_TIMEOUT_DEF) 155 - __u32_field_def(9, DRBD_GENLA_F_MANDATORY, ping_int, DRBD_PING_INT_DEF) 156 - __u32_field_def(10, DRBD_GENLA_F_MANDATORY, ping_timeo, DRBD_PING_TIMEO_DEF) 157 - __u32_field_def(11, DRBD_GENLA_F_MANDATORY, sndbuf_size, DRBD_SNDBUF_SIZE_DEF) 158 - __u32_field_def(12, DRBD_GENLA_F_MANDATORY, rcvbuf_size, DRBD_RCVBUF_SIZE_DEF) 159 - __u32_field_def(13, DRBD_GENLA_F_MANDATORY, ko_count, DRBD_KO_COUNT_DEF) 160 - __u32_field_def(14, DRBD_GENLA_F_MANDATORY, max_buffers, DRBD_MAX_BUFFERS_DEF) 161 - __u32_field_def(15, DRBD_GENLA_F_MANDATORY, max_epoch_size, DRBD_MAX_EPOCH_SIZE_DEF) 162 - __u32_field_def(16, 
DRBD_GENLA_F_MANDATORY, unplug_watermark, DRBD_UNPLUG_WATERMARK_DEF) 163 - __u32_field_def(17, DRBD_GENLA_F_MANDATORY, after_sb_0p, DRBD_AFTER_SB_0P_DEF) 164 - __u32_field_def(18, DRBD_GENLA_F_MANDATORY, after_sb_1p, DRBD_AFTER_SB_1P_DEF) 165 - __u32_field_def(19, DRBD_GENLA_F_MANDATORY, after_sb_2p, DRBD_AFTER_SB_2P_DEF) 166 - __u32_field_def(20, DRBD_GENLA_F_MANDATORY, rr_conflict, DRBD_RR_CONFLICT_DEF) 167 - __u32_field_def(21, DRBD_GENLA_F_MANDATORY, on_congestion, DRBD_ON_CONGESTION_DEF) 168 - __u32_field_def(22, DRBD_GENLA_F_MANDATORY, cong_fill, DRBD_CONG_FILL_DEF) 169 - __u32_field_def(23, DRBD_GENLA_F_MANDATORY, cong_extents, DRBD_CONG_EXTENTS_DEF) 170 - __flg_field_def(24, DRBD_GENLA_F_MANDATORY, two_primaries, DRBD_ALLOW_TWO_PRIMARIES_DEF) 171 - __flg_field(25, DRBD_GENLA_F_MANDATORY | DRBD_F_INVARIANT, discard_my_data) 172 - __flg_field_def(26, DRBD_GENLA_F_MANDATORY, tcp_cork, DRBD_TCP_CORK_DEF) 173 - __flg_field_def(27, DRBD_GENLA_F_MANDATORY, always_asbp, DRBD_ALWAYS_ASBP_DEF) 174 - __flg_field(28, DRBD_GENLA_F_MANDATORY | DRBD_F_INVARIANT, tentative) 175 - __flg_field_def(29, DRBD_GENLA_F_MANDATORY, use_rle, DRBD_USE_RLE_DEF) 176 - /* 9: __u32_field_def(30, DRBD_GENLA_F_MANDATORY, fencing_policy, DRBD_FENCING_DEF) */ 177 - /* 9: __str_field_def(31, DRBD_GENLA_F_MANDATORY, name, SHARED_SECRET_MAX) */ 148 + __str_field_def(2, 0, cram_hmac_alg, SHARED_SECRET_MAX) 149 + __str_field_def(3, 0, integrity_alg, SHARED_SECRET_MAX) 150 + __str_field_def(4, 0, verify_alg, SHARED_SECRET_MAX) 151 + __str_field_def(5, 0, csums_alg, SHARED_SECRET_MAX) 152 + __u32_field_def(6, 0, wire_protocol, DRBD_PROTOCOL_DEF) 153 + __u32_field_def(7, 0, connect_int, DRBD_CONNECT_INT_DEF) 154 + __u32_field_def(8, 0, timeout, DRBD_TIMEOUT_DEF) 155 + __u32_field_def(9, 0, ping_int, DRBD_PING_INT_DEF) 156 + __u32_field_def(10, 0, ping_timeo, DRBD_PING_TIMEO_DEF) 157 + __u32_field_def(11, 0, sndbuf_size, DRBD_SNDBUF_SIZE_DEF) 158 + __u32_field_def(12, 0, rcvbuf_size, 
DRBD_RCVBUF_SIZE_DEF) 159 + __u32_field_def(13, 0, ko_count, DRBD_KO_COUNT_DEF) 160 + __u32_field_def(14, 0, max_buffers, DRBD_MAX_BUFFERS_DEF) 161 + __u32_field_def(15, 0, max_epoch_size, DRBD_MAX_EPOCH_SIZE_DEF) 162 + __u32_field_def(16, 0, unplug_watermark, DRBD_UNPLUG_WATERMARK_DEF) 163 + __u32_field_def(17, 0, after_sb_0p, DRBD_AFTER_SB_0P_DEF) 164 + __u32_field_def(18, 0, after_sb_1p, DRBD_AFTER_SB_1P_DEF) 165 + __u32_field_def(19, 0, after_sb_2p, DRBD_AFTER_SB_2P_DEF) 166 + __u32_field_def(20, 0, rr_conflict, DRBD_RR_CONFLICT_DEF) 167 + __u32_field_def(21, 0, on_congestion, DRBD_ON_CONGESTION_DEF) 168 + __u32_field_def(22, 0, cong_fill, DRBD_CONG_FILL_DEF) 169 + __u32_field_def(23, 0, cong_extents, DRBD_CONG_EXTENTS_DEF) 170 + __flg_field_def(24, 0, two_primaries, DRBD_ALLOW_TWO_PRIMARIES_DEF) 171 + __flg_field(25, DRBD_F_INVARIANT, discard_my_data) 172 + __flg_field_def(26, 0, tcp_cork, DRBD_TCP_CORK_DEF) 173 + __flg_field_def(27, 0, always_asbp, DRBD_ALWAYS_ASBP_DEF) 174 + __flg_field(28, DRBD_F_INVARIANT, tentative) 175 + __flg_field_def(29, 0, use_rle, DRBD_USE_RLE_DEF) 176 + /* 9: __u32_field_def(30, 0, fencing_policy, DRBD_FENCING_DEF) */ 177 + /* 9: __str_field_def(31, 0, name, SHARED_SECRET_MAX) */ 178 178 /* 9: __u32_field(32, DRBD_F_REQUIRED | DRBD_F_INVARIANT, peer_node_id) */ 179 179 __flg_field_def(33, 0 /* OPTIONAL */, csums_after_crash_only, DRBD_CSUMS_AFTER_CRASH_ONLY_DEF) 180 180 __u32_field_def(34, 0 /* OPTIONAL */, sock_check_timeo, DRBD_SOCKET_CHECK_TIMEO_DEF) 181 181 ) 182 182 183 183 GENL_struct(DRBD_NLA_SET_ROLE_PARMS, 6, set_role_parms, 184 - __flg_field(1, DRBD_GENLA_F_MANDATORY, assume_uptodate) 184 + __flg_field(1, 0, assume_uptodate) 185 185 ) 186 186 187 187 GENL_struct(DRBD_NLA_RESIZE_PARMS, 7, resize_parms, 188 - __u64_field(1, DRBD_GENLA_F_MANDATORY, resize_size) 189 - __flg_field(2, DRBD_GENLA_F_MANDATORY, resize_force) 190 - __flg_field(3, DRBD_GENLA_F_MANDATORY, no_resync) 188 + __u64_field(1, 0, resize_size) 189 + 
__flg_field(2, 0, resize_force) 190 + __flg_field(3, 0, no_resync) 191 191 __u32_field_def(4, 0 /* OPTIONAL */, al_stripes, DRBD_AL_STRIPES_DEF) 192 192 __u32_field_def(5, 0 /* OPTIONAL */, al_stripe_size, DRBD_AL_STRIPE_SIZE_DEF) 193 193 ) ··· 195 195 GENL_struct(DRBD_NLA_STATE_INFO, 8, state_info, 196 196 /* the reason of the broadcast, 197 197 * if this is an event triggered broadcast. */ 198 - __u32_field(1, DRBD_GENLA_F_MANDATORY, sib_reason) 198 + __u32_field(1, 0, sib_reason) 199 199 __u32_field(2, DRBD_F_REQUIRED, current_state) 200 - __u64_field(3, DRBD_GENLA_F_MANDATORY, capacity) 201 - __u64_field(4, DRBD_GENLA_F_MANDATORY, ed_uuid) 200 + __u64_field(3, 0, capacity) 201 + __u64_field(4, 0, ed_uuid) 202 202 203 203 /* These are for broadcast from after state change work. 204 204 * prev_state and new_state are from the moment the state change took 205 205 * place, new_state is not neccessarily the same as current_state, 206 206 * there may have been more state changes since. Which will be 207 207 * broadcasted soon, in their respective after state change work. 
*/ 208 - __u32_field(5, DRBD_GENLA_F_MANDATORY, prev_state) 209 - __u32_field(6, DRBD_GENLA_F_MANDATORY, new_state) 208 + __u32_field(5, 0, prev_state) 209 + __u32_field(6, 0, new_state) 210 210 211 211 /* if we have a local disk: */ 212 - __bin_field(7, DRBD_GENLA_F_MANDATORY, uuids, (UI_SIZE*sizeof(__u64))) 213 - __u32_field(8, DRBD_GENLA_F_MANDATORY, disk_flags) 214 - __u64_field(9, DRBD_GENLA_F_MANDATORY, bits_total) 215 - __u64_field(10, DRBD_GENLA_F_MANDATORY, bits_oos) 212 + __bin_field(7, 0, uuids, (UI_SIZE*sizeof(__u64))) 213 + __u32_field(8, 0, disk_flags) 214 + __u64_field(9, 0, bits_total) 215 + __u64_field(10, 0, bits_oos) 216 216 /* and in case resync or online verify is active */ 217 - __u64_field(11, DRBD_GENLA_F_MANDATORY, bits_rs_total) 218 - __u64_field(12, DRBD_GENLA_F_MANDATORY, bits_rs_failed) 217 + __u64_field(11, 0, bits_rs_total) 218 + __u64_field(12, 0, bits_rs_failed) 219 219 220 220 /* for pre and post notifications of helper execution */ 221 - __str_field(13, DRBD_GENLA_F_MANDATORY, helper, 32) 222 - __u32_field(14, DRBD_GENLA_F_MANDATORY, helper_exit_code) 221 + __str_field(13, 0, helper, 32) 222 + __u32_field(14, 0, helper_exit_code) 223 223 224 224 __u64_field(15, 0, send_cnt) 225 225 __u64_field(16, 0, recv_cnt) ··· 233 233 ) 234 234 235 235 GENL_struct(DRBD_NLA_START_OV_PARMS, 9, start_ov_parms, 236 - __u64_field(1, DRBD_GENLA_F_MANDATORY, ov_start_sector) 237 - __u64_field(2, DRBD_GENLA_F_MANDATORY, ov_stop_sector) 236 + __u64_field(1, 0, ov_start_sector) 237 + __u64_field(2, 0, ov_stop_sector) 238 238 ) 239 239 240 240 GENL_struct(DRBD_NLA_NEW_C_UUID_PARMS, 10, new_c_uuid_parms, 241 - __flg_field(1, DRBD_GENLA_F_MANDATORY, clear_bm) 241 + __flg_field(1, 0, clear_bm) 242 242 ) 243 243 244 244 GENL_struct(DRBD_NLA_TIMEOUT_PARMS, 11, timeout_parms, ··· 246 246 ) 247 247 248 248 GENL_struct(DRBD_NLA_DISCONNECT_PARMS, 12, disconnect_parms, 249 - __flg_field(1, DRBD_GENLA_F_MANDATORY, force_disconnect) 249 + __flg_field(1, 0, 
force_disconnect) 250 250 ) 251 251 252 252 GENL_struct(DRBD_NLA_DETACH_PARMS, 13, detach_parms, 253 - __flg_field(1, DRBD_GENLA_F_MANDATORY, force_detach) 253 + __flg_field(1, 0, force_detach) 254 254 ) 255 255 256 256 GENL_struct(DRBD_NLA_RESOURCE_INFO, 15, resource_info, ··· 315 315 ) 316 316 317 317 GENL_struct(DRBD_NLA_NOTIFICATION_HEADER, 23, drbd_notification_header, 318 - __u32_field(1, DRBD_GENLA_F_MANDATORY, nh_type) 318 + __u32_field(1, 0, nh_type) 319 319 ) 320 320 321 321 GENL_struct(DRBD_NLA_HELPER, 24, drbd_helper_info, 322 - __str_field(1, DRBD_GENLA_F_MANDATORY, helper_name, 32) 323 - __u32_field(2, DRBD_GENLA_F_MANDATORY, helper_status) 322 + __str_field(1, 0, helper_name, 32) 323 + __u32_field(2, 0, helper_status) 324 324 ) 325 325 326 326 /* ··· 333 333 DRBD_EVENT, 1, events, 334 334 GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED) 335 335 GENL_tla_expected(DRBD_NLA_STATE_INFO, DRBD_F_REQUIRED) 336 - GENL_tla_expected(DRBD_NLA_NET_CONF, DRBD_GENLA_F_MANDATORY) 337 - GENL_tla_expected(DRBD_NLA_DISK_CONF, DRBD_GENLA_F_MANDATORY) 338 - GENL_tla_expected(DRBD_NLA_SYNCER_CONF, DRBD_GENLA_F_MANDATORY) 336 + GENL_tla_expected(DRBD_NLA_NET_CONF, 0) 337 + GENL_tla_expected(DRBD_NLA_DISK_CONF, 0) 338 + GENL_tla_expected(DRBD_NLA_SYNCER_CONF, 0) 339 339 ) 340 340 341 341 /* query kernel for specific or all info */ ··· 349 349 ), 350 350 /* To select the object .doit. 351 351 * Or a subset of objects in .dumpit. 
*/ 352 - GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_GENLA_F_MANDATORY) 352 + GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, 0) 353 353 ) 354 354 355 355 /* add DRBD minor devices as volumes to resources */ ··· 367 367 GENL_op(DRBD_ADM_RESOURCE_OPTS, 9, 368 368 GENL_doit(drbd_adm_resource_opts), 369 369 GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED) 370 - GENL_tla_expected(DRBD_NLA_RESOURCE_OPTS, DRBD_GENLA_F_MANDATORY) 370 + GENL_tla_expected(DRBD_NLA_RESOURCE_OPTS, 0) 371 371 ) 372 372 373 373 GENL_op( ··· 403 403 DRBD_ADM_RESIZE, 13, 404 404 GENL_doit(drbd_adm_resize), 405 405 GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED) 406 - GENL_tla_expected(DRBD_NLA_RESIZE_PARMS, DRBD_GENLA_F_MANDATORY) 406 + GENL_tla_expected(DRBD_NLA_RESIZE_PARMS, 0) 407 407 ) 408 408 409 409 GENL_op( ··· 424 424 DRBD_ADM_NEW_C_UUID, 16, 425 425 GENL_doit(drbd_adm_new_c_uuid), 426 426 GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED) 427 - GENL_tla_expected(DRBD_NLA_NEW_C_UUID_PARMS, DRBD_GENLA_F_MANDATORY) 427 + GENL_tla_expected(DRBD_NLA_NEW_C_UUID_PARMS, 0) 428 428 ) 429 429 430 430 GENL_op( 431 431 DRBD_ADM_START_OV, 17, 432 432 GENL_doit(drbd_adm_start_ov), 433 - GENL_tla_expected(DRBD_NLA_START_OV_PARMS, DRBD_GENLA_F_MANDATORY) 433 + GENL_tla_expected(DRBD_NLA_START_OV_PARMS, 0) 434 434 ) 435 435 436 436 GENL_op(DRBD_ADM_DETACH, 18, GENL_doit(drbd_adm_detach), 437 437 GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED) 438 - GENL_tla_expected(DRBD_NLA_DETACH_PARMS, DRBD_GENLA_F_MANDATORY)) 438 + GENL_tla_expected(DRBD_NLA_DETACH_PARMS, 0)) 439 439 440 440 GENL_op(DRBD_ADM_INVALIDATE, 19, GENL_doit(drbd_adm_invalidate), 441 441 GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED)) ··· 460 460 GENL_op_init( 461 461 .dumpit = drbd_adm_dump_resources, 462 462 ), 463 - GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_GENLA_F_MANDATORY) 464 - GENL_tla_expected(DRBD_NLA_RESOURCE_INFO, DRBD_GENLA_F_MANDATORY) 465 - GENL_tla_expected(DRBD_NLA_RESOURCE_STATISTICS, 
DRBD_GENLA_F_MANDATORY)) 463 + GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, 0) 464 + GENL_tla_expected(DRBD_NLA_RESOURCE_INFO, 0) 465 + GENL_tla_expected(DRBD_NLA_RESOURCE_STATISTICS, 0)) 466 466 467 467 GENL_op(DRBD_ADM_GET_DEVICES, 31, 468 468 GENL_op_init( 469 469 .dumpit = drbd_adm_dump_devices, 470 470 .done = drbd_adm_dump_devices_done, 471 471 ), 472 - GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_GENLA_F_MANDATORY) 473 - GENL_tla_expected(DRBD_NLA_DEVICE_INFO, DRBD_GENLA_F_MANDATORY) 474 - GENL_tla_expected(DRBD_NLA_DEVICE_STATISTICS, DRBD_GENLA_F_MANDATORY)) 472 + GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, 0) 473 + GENL_tla_expected(DRBD_NLA_DEVICE_INFO, 0) 474 + GENL_tla_expected(DRBD_NLA_DEVICE_STATISTICS, 0)) 475 475 476 476 GENL_op(DRBD_ADM_GET_CONNECTIONS, 32, 477 477 GENL_op_init( 478 478 .dumpit = drbd_adm_dump_connections, 479 479 .done = drbd_adm_dump_connections_done, 480 480 ), 481 - GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_GENLA_F_MANDATORY) 482 - GENL_tla_expected(DRBD_NLA_CONNECTION_INFO, DRBD_GENLA_F_MANDATORY) 483 - GENL_tla_expected(DRBD_NLA_CONNECTION_STATISTICS, DRBD_GENLA_F_MANDATORY)) 481 + GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, 0) 482 + GENL_tla_expected(DRBD_NLA_CONNECTION_INFO, 0) 483 + GENL_tla_expected(DRBD_NLA_CONNECTION_STATISTICS, 0)) 484 484 485 485 GENL_op(DRBD_ADM_GET_PEER_DEVICES, 33, 486 486 GENL_op_init( 487 487 .dumpit = drbd_adm_dump_peer_devices, 488 488 .done = drbd_adm_dump_peer_devices_done, 489 489 ), 490 - GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_GENLA_F_MANDATORY) 491 - GENL_tla_expected(DRBD_NLA_PEER_DEVICE_INFO, DRBD_GENLA_F_MANDATORY) 492 - GENL_tla_expected(DRBD_NLA_PEER_DEVICE_STATISTICS, DRBD_GENLA_F_MANDATORY)) 490 + GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, 0) 491 + GENL_tla_expected(DRBD_NLA_PEER_DEVICE_INFO, 0) 492 + GENL_tla_expected(DRBD_NLA_PEER_DEVICE_STATISTICS, 0)) 493 493 494 494 GENL_notification( 495 495 DRBD_RESOURCE_STATE, 34, events, ··· 524 524 GENL_op_init( 525 525 .dumpit = 
drbd_adm_get_initial_state, 526 526 ), 527 - GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_GENLA_F_MANDATORY)) 527 + GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, 0)) 528 528 529 529 GENL_notification( 530 530 DRBD_HELPER, 40, events,
+6 -1
include/linux/genl_magic_func.h
···
 	if (!tla) \
 		return -ENOMSG; \
 	DPRINT_TLA(#s_name, "<=-", #tag_name); \
-	err = drbd_nla_parse_nested(ntb, maxtype, tla, s_name ## _nl_policy); \
+	err = nla_parse_nested_deprecated(ntb, maxtype, tla, \
+					  s_name ## _nl_policy, NULL); \
 	if (err) \
 		return err; \
 	\
···
 #endif
 	.maxattr = ARRAY_SIZE(CONCATENATE(GENL_MAGIC_FAMILY, _tla_nl_policy))-1,
 	.policy = CONCATENATE(GENL_MAGIC_FAMILY, _tla_nl_policy),
+#ifdef GENL_MAGIC_FAMILY_PRE_DOIT
+	.pre_doit = GENL_MAGIC_FAMILY_PRE_DOIT,
+	.post_doit = GENL_MAGIC_FAMILY_POST_DOIT,
+#endif
 	.ops = ZZZ_genl_ops,
 	.n_ops = ARRAY_SIZE(ZZZ_genl_ops),
 	.mcgrps = ZZZ_genl_mcgrps,
+2 -13
include/linux/genl_magic_struct.h
···
  */
 
 /*
- * @DRBD_GENLA_F_MANDATORY: By default, netlink ignores attributes it does not
- * know about. This flag can be set in nlattr->nla_type to indicate that this
- * attribute must not be ignored.
- *
- * We check and remove this flag in drbd_nla_check_mandatory() before
- * validating the attribute types and lengths via nla_parse_nested().
- */
-#define DRBD_GENLA_F_MANDATORY (1 << 14)
-
-/*
  * Flags specific to drbd and not visible at the netlink layer, used in
  * <struct>_from_attrs and <struct>_to_skb:
  *
···
 #define DRBD_F_SENSITIVE (1 << 1)
 #define DRBD_F_INVARIANT (1 << 2)
 
-#define __nla_type(x) ((__u16)((x) & NLA_TYPE_MASK & ~DRBD_GENLA_F_MANDATORY))
 
 /* }}}1
  * MAGIC
···
 #undef __field
 #define __field(attr_nr, attr_flag, name, nla_type, type, \
 		__get, __put, __is_signed) \
-	T_ ## name = (__u16)(attr_nr | ((attr_flag) & DRBD_GENLA_F_MANDATORY)),
+	T_ ## name = (__u16)(attr_nr),
 
 #undef __array
 #define __array(attr_nr, attr_flag, name, nla_type, type, \
 		maxlen, __get, __put, __is_signed) \
-	T_ ## name = (__u16)(attr_nr | ((attr_flag) & DRBD_GENLA_F_MANDATORY)),
+	T_ ## name = (__u16)(attr_nr),
 
 #include GENL_MAGIC_INCLUDE_FILE
 
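The effect of dropping DRBD_GENLA_F_MANDATORY on the generated T_<name> attribute numbers can be shown with plain integer arithmetic. The sketch below is a user-space model, not the kernel macros: the MODEL_* constants and helper names are hypothetical, bit 14 is taken from the removed #define, and the mask is assumed to match NLA_TYPE_MASK from the netlink uapi header (it clears bits 15 and 14).

```c
#include <stdint.h>

#define MODEL_F_MANDATORY		(1u << 14)	/* removed DRBD flag */
#define MODEL_NLA_F_NESTED		(1u << 15)	/* assumed uapi value */
#define MODEL_NLA_F_NET_BYTEORDER	(1u << 14)	/* assumed uapi value */
#define MODEL_NLA_TYPE_MASK \
	(~(MODEL_NLA_F_NESTED | MODEL_NLA_F_NET_BYTEORDER))

/* Old __field/__array expansion: the flag was folded into the type. */
static uint16_t old_attr_type(unsigned int attr_nr, unsigned int attr_flag)
{
	return (uint16_t)(attr_nr | (attr_flag & MODEL_F_MANDATORY));
}

/* New expansion: the attribute number is used as-is. */
static uint16_t new_attr_type(unsigned int attr_nr)
{
	return (uint16_t)attr_nr;
}

/* Model of the removed __nla_type(): strip flag bits to get the number. */
static uint16_t old_nla_type(unsigned int x)
{
	return (uint16_t)(x & MODEL_NLA_TYPE_MASK & ~MODEL_F_MANDATORY);
}
```

For attribute 2 marked mandatory, the old expansion yields 0x4002 and needed old_nla_type() to recover 2; the new expansion yields 2 directly, so the stripping helper is no longer required.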
+27 -14
include/linux/nvme-auth.h
···
 #define _NVME_AUTH_H

 #include <crypto/kpp.h>
+#include <crypto/sha2.h>

 struct nvme_dhchap_key {
 	size_t len;
···
 u8 nvme_auth_dhgroup_id(const char *dhgroup_name);

 const char *nvme_auth_hmac_name(u8 hmac_id);
-const char *nvme_auth_digest_name(u8 hmac_id);
 size_t nvme_auth_hmac_hash_len(u8 hmac_id);
 u8 nvme_auth_hmac_id(const char *hmac_name);
+struct nvme_auth_hmac_ctx {
+	u8 hmac_id;
+	union {
+		struct hmac_sha256_ctx sha256;
+		struct hmac_sha384_ctx sha384;
+		struct hmac_sha512_ctx sha512;
+	};
+};
+int nvme_auth_hmac_init(struct nvme_auth_hmac_ctx *hmac, u8 hmac_id,
+			const u8 *key, size_t key_len);
+void nvme_auth_hmac_update(struct nvme_auth_hmac_ctx *hmac, const u8 *data,
+			   size_t data_len);
+void nvme_auth_hmac_final(struct nvme_auth_hmac_ctx *hmac, u8 *out);

 u32 nvme_auth_key_struct_size(u32 key_len);
-struct nvme_dhchap_key *nvme_auth_extract_key(unsigned char *secret,
-					      u8 key_hash);
+struct nvme_dhchap_key *nvme_auth_extract_key(const char *secret, u8 key_hash);
 void nvme_auth_free_key(struct nvme_dhchap_key *key);
 struct nvme_dhchap_key *nvme_auth_alloc_key(u32 len, u8 hash);
 struct nvme_dhchap_key *nvme_auth_transform_key(
-		struct nvme_dhchap_key *key, char *nqn);
-int nvme_auth_generate_key(u8 *secret, struct nvme_dhchap_key **ret_key);
-int nvme_auth_augmented_challenge(u8 hmac_id, u8 *skey, size_t skey_len,
-				  u8 *challenge, u8 *aug, size_t hlen);
+		const struct nvme_dhchap_key *key, const char *nqn);
+int nvme_auth_parse_key(const char *secret, struct nvme_dhchap_key **ret_key);
+int nvme_auth_augmented_challenge(u8 hmac_id, const u8 *skey, size_t skey_len,
+				  const u8 *challenge, u8 *aug, size_t hlen);
 int nvme_auth_gen_privkey(struct crypto_kpp *dh_tfm, u8 dh_gid);
 int nvme_auth_gen_pubkey(struct crypto_kpp *dh_tfm,
 			 u8 *host_key, size_t host_key_len);
 int nvme_auth_gen_shared_secret(struct crypto_kpp *dh_tfm,
-				u8 *ctrl_key, size_t ctrl_key_len,
+				const u8 *ctrl_key, size_t ctrl_key_len,
 				u8 *sess_key, size_t sess_key_len);
-int nvme_auth_generate_psk(u8 hmac_id, u8 *skey, size_t skey_len,
-			   u8 *c1, u8 *c2, size_t hash_len,
+int nvme_auth_generate_psk(u8 hmac_id, const u8 *skey, size_t skey_len,
+			   const u8 *c1, const u8 *c2, size_t hash_len,
 			   u8 **ret_psk, size_t *ret_len);
-int nvme_auth_generate_digest(u8 hmac_id, u8 *psk, size_t psk_len,
-			      char *subsysnqn, char *hostnqn, u8 **ret_digest);
-int nvme_auth_derive_tls_psk(int hmac_id, u8 *psk, size_t psk_len,
-			     u8 *psk_digest, u8 **ret_psk);
+int nvme_auth_generate_digest(u8 hmac_id, const u8 *psk, size_t psk_len,
+			      const char *subsysnqn, const char *hostnqn,
+			      char **ret_digest);
+int nvme_auth_derive_tls_psk(int hmac_id, const u8 *psk, size_t psk_len,
+			     const char *psk_digest, u8 **ret_psk);

 #endif /* _NVME_AUTH_H */
+22 -2
include/linux/nvme.h
···
 	__u8 pic;
 	__u8 rsvd9[3];
 	__le32 elbaf[64];
-	__u8 rsvd268[3828];
+	__le32 npdgl;
+	__le32 nprg;
+	__le32 npra;
+	__le32 nors;
+	__le32 npdal;
+	__u8 rsvd288[3808];
 };
+
+static_assert(sizeof(struct nvme_id_ns_nvm) == 4096);

 enum {
 	NVME_ID_NS_NVM_STS_MASK = 0x7f,
···
 enum {
 	NVME_NS_FEAT_THIN = 1 << 0,
 	NVME_NS_FEAT_ATOMICS = 1 << 1,
-	NVME_NS_FEAT_IO_OPT = 1 << 4,
+	NVME_NS_FEAT_OPTPERF_SHIFT = 4,
+	/* In NVMe version 2.0 and below, OPTPERF is only bit 4 of NSFEAT */
+	NVME_NS_FEAT_OPTPERF_MASK = 0x1,
+	/* Since version 2.1, OPTPERF is bits 4 and 5 of NSFEAT */
+	NVME_NS_FEAT_OPTPERF_MASK_2_1 = 0x3,
 	NVME_NS_ATTR_RO = 1 << 0,
 	NVME_NS_FLBAS_LBA_MASK = 0xf,
 	NVME_NS_FLBAS_LBA_UMASK = 0x60,
···
 	NVME_AUTH_HASH_INVALID = 0xff,
 };

+/* Maximum digest size for any NVME_AUTH_HASH_* value */
+enum {
+	NVME_AUTH_MAX_DIGEST_SIZE = 64,
+};
+
 /* Defined Diffie-Hellman group identifiers for DH-HMAC-CHAP authentication */
 enum {
 	NVME_AUTH_DHGROUP_NULL = 0x00,
···
 };

 #define NVME_PR_IGNORE_KEY (1 << 3)
+
+/* Section 8.3.4.5.2 of the NVMe 2.1 */
+#define NVME_AUTH_DHCHAP_MAX_HASH_IDS 30
+#define NVME_AUTH_DHCHAP_MAX_DH_IDS 30

 #endif /* _LINUX_NVME_H */
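The OPTPERF constants in the include/linux/nvme.h hunk replace the old single-bit NVME_NS_FEAT_IO_OPT flag with a shift and two masks. A minimal sketch of how the field could be extracted from NSFEAT follows. It assumes the caller already knows whether the controller reports NVMe 2.1 or later; the `is_2_1` parameter and both function names are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the enum added to include/linux/nvme.h. */
enum {
	NVME_NS_FEAT_OPTPERF_SHIFT	= 4,
	NVME_NS_FEAT_OPTPERF_MASK	= 0x1,	/* NVMe 2.0 and below: bit 4 */
	NVME_NS_FEAT_OPTPERF_MASK_2_1	= 0x3,	/* NVMe 2.1 and later: bits 4-5 */
};

/*
 * Hypothetical helper: return the OPTPERF field of NSFEAT. How the
 * version is detected is outside this sketch, so it is a parameter.
 */
static inline unsigned int nsfeat_optperf(uint8_t nsfeat, bool is_2_1)
{
	unsigned int mask = is_2_1 ? NVME_NS_FEAT_OPTPERF_MASK_2_1
				   : NVME_NS_FEAT_OPTPERF_MASK;

	return (nsfeat >> NVME_NS_FEAT_OPTPERF_SHIFT) & mask;
}

/* Usage example: bit 5 is only visible with the 2.1 mask. */
static void nsfeat_optperf_selfcheck(void)
{
	assert(nsfeat_optperf(0x10, false) == 1);
	assert(nsfeat_optperf(0x30, false) == 1);
	assert(nsfeat_optperf(0x30, true) == 3);
}
```

Keeping the shift separate from the masks lets the same expression serve both specification versions; only the mask changes.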
+5
include/linux/sed-opal.h
···
 	case IOC_OPAL_DISCOVERY:
 	case IOC_OPAL_REVERT_LSP:
 	case IOC_OPAL_SET_SID_PW:
+	case IOC_OPAL_REACTIVATE_LSP:
+	case IOC_OPAL_LR_SET_START_LEN:
+	case IOC_OPAL_ENABLE_DISABLE_LR:
+	case IOC_OPAL_GET_SUM_STATUS:
+	case IOC_OPAL_STACK_RESET:
 		return true;
 	}
 	return false;
+75
include/uapi/linux/bsg.h
···
 #ifndef _UAPIBSG_H
 #define _UAPIBSG_H

+#ifdef __KERNEL__
+#include <linux/build_bug.h>
+#endif /* __KERNEL__ */
 #include <linux/types.h>

 #define BSG_PROTOCOL_SCSI 0
···
 	__u32 padding;
 };

+struct bsg_uring_cmd {
+	__u64 request; /* [i], [*i] command descriptor address */
+	__u32 request_len; /* [i] command descriptor length in bytes */
+	__u32 protocol; /* [i] protocol type (BSG_PROTOCOL_*) */
+	__u32 subprotocol; /* [i] subprotocol type (BSG_SUB_PROTOCOL_*) */
+	__u32 max_response_len; /* [i] response buffer size in bytes */
+
+	__u64 response; /* [i], [*o] response data address */
+	__u64 dout_xferp; /* [i], [*i] */
+	__u32 dout_xfer_len; /* [i] bytes to be transferred to device */
+	__u32 dout_iovec_count; /* [i] 0 -> "flat" dout transfer else
+				 * dout_xferp points to array of iovec
+				 */
+	__u64 din_xferp; /* [i], [*o] */
+	__u32 din_xfer_len; /* [i] bytes to be transferred from device */
+	__u32 din_iovec_count; /* [i] 0 -> "flat" din transfer */
+
+	__u32 timeout_ms; /* [i] timeout in milliseconds */
+	__u8 reserved[12]; /* reserved for future extension */
+};
+
+#ifdef __KERNEL__
+/* Must match IORING_OP_URING_CMD payload size (e.g. SQE128). */
+static_assert(sizeof(struct bsg_uring_cmd) == 80);
+#endif /* __KERNEL__ */
+
+
+/*
+ * SCSI BSG io_uring completion (res2, 64-bit)
+ *
+ * When using BSG_PROTOCOL_SCSI + BSG_SUB_PROTOCOL_SCSI_CMD with
+ * IORING_OP_URING_CMD, the completion queue entry (CQE) contains:
+ * - result: errno (0 on success)
+ * - res2: packed SCSI status
+ *
+ * res2 bit layout:
+ * [0..7] device_status (SCSI status byte, e.g. CHECK_CONDITION)
+ * [8..15] driver_status (e.g. DRIVER_SENSE when sense data is valid)
+ * [16..23] host_status (e.g. DID_OK, DID_TIME_OUT)
+ * [24..31] sense_len_wr (bytes of sense data written to response buffer)
+ * [32..63] resid_len (residual transfer length)
+ */
+static inline __u8 bsg_scsi_res2_device_status(__u64 res2)
+{
+	return res2 & 0xff;
+}
+static inline __u8 bsg_scsi_res2_driver_status(__u64 res2)
+{
+	return res2 >> 8;
+}
+static inline __u8 bsg_scsi_res2_host_status(__u64 res2)
+{
+	return res2 >> 16;
+}
+static inline __u8 bsg_scsi_res2_sense_len(__u64 res2)
+{
+	return res2 >> 24;
+}
+static inline __u32 bsg_scsi_res2_resid_len(__u64 res2)
+{
+	return res2 >> 32;
+}
+static inline __u64 bsg_scsi_res2_build(__u8 device_status, __u8 driver_status,
+					__u8 host_status, __u8 sense_len_wr,
+					__u32 resid_len)
+{
+	return ((__u64)(__u32)(resid_len) << 32) |
+		((__u64)sense_len_wr << 24) |
+		((__u64)host_status << 16) |
+		((__u64)driver_status << 8) |
+		(__u64)device_status;
+}

 #endif /* _UAPIBSG_H */
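The res2 layout documented in the bsg.h hunk packs four status bytes and a 32-bit residual into one 64-bit completion value. The sketch below restates that packing with `<stdint.h>` types so it builds outside the kernel tree. The `scsi_res2` struct and the `res2_pack`/`res2_unpack` names are hypothetical stand-ins for the `bsg_scsi_res2_*` helpers, and the sample values are arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical unpacked view of the res2 fields listed in the header. */
struct scsi_res2 {
	uint8_t device_status;	/* bits  0..7  */
	uint8_t driver_status;	/* bits  8..15 */
	uint8_t host_status;	/* bits 16..23 */
	uint8_t sense_len_wr;	/* bits 24..31 */
	uint32_t resid_len;	/* bits 32..63 */
};

/* Same bit placement as bsg_scsi_res2_build() in the hunk. */
static inline uint64_t res2_pack(struct scsi_res2 s)
{
	return ((uint64_t)s.resid_len << 32) |
	       ((uint64_t)s.sense_len_wr << 24) |
	       ((uint64_t)s.host_status << 16) |
	       ((uint64_t)s.driver_status << 8) |
	       (uint64_t)s.device_status;
}

/* Inverse of res2_pack(); the uint8_t casts truncate to the low byte. */
static inline struct scsi_res2 res2_unpack(uint64_t res2)
{
	struct scsi_res2 s = {
		.device_status = (uint8_t)res2,
		.driver_status = (uint8_t)(res2 >> 8),
		.host_status = (uint8_t)(res2 >> 16),
		.sense_len_wr = (uint8_t)(res2 >> 24),
		.resid_len = (uint32_t)(res2 >> 32),
	};
	return s;
}

/* Usage example: a round trip preserves every field. */
static void res2_selfcheck(void)
{
	struct scsi_res2 in = { 0x02, 0x08, 0x00, 18, 512 };
	struct scsi_res2 out = res2_unpack(res2_pack(in));

	assert(out.device_status == 0x02 && out.driver_status == 0x08);
	assert(out.host_status == 0x00 && out.sense_len_wr == 18);
	assert(out.resid_len == 512);
}
```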
+30
include/uapi/linux/sed-opal.h
···
 	__u8 align[2]; /* Align to 8 byte boundary */
 };

+struct opal_lr_react {
+	struct opal_key key;
+	struct opal_key new_admin_key; /* Set new Admin1 PIN if key_len is > 0 */
+	__u8 num_lrs; /*
+		       * Configure selected ranges (from lr[]) in SUM.
+		       * If num_lrs > 0 the 'entire_table' must be 0
+		       */
+	__u8 lr[OPAL_MAX_LRS];
+	__u8 range_policy; /* Set RangeStartRangeLengthPolicy parameter */
+	__u8 entire_table; /* Set all locking objects in SUM */
+	__u8 align[4]; /* Align to 8 byte boundary */
+};
+
 struct opal_session_info {
 	__u32 sum;
 	__u32 who;
···
 	__u32 WLE; /* Write Lock Enabled */
 	__u32 l_state;
 	__u8 align[4];
+};
+
+struct opal_sum_ranges {
+	/*
+	 * Initiate Admin1 session if key_len > 0,
+	 * use Anybody session otherwise.
+	 */
+	struct opal_key key;
+	__u8 num_lrs;
+	__u8 lr[OPAL_MAX_LRS];
+	__u8 range_policy;
+	__u8 align[5]; /* Align to 8 byte boundary */
 };

 struct opal_lock_unlock {
···
 #define IOC_OPAL_DISCOVERY _IOW('p', 239, struct opal_discovery)
 #define IOC_OPAL_REVERT_LSP _IOW('p', 240, struct opal_revert_lsp)
 #define IOC_OPAL_SET_SID_PW _IOW('p', 241, struct opal_new_pw)
+#define IOC_OPAL_REACTIVATE_LSP _IOW('p', 242, struct opal_lr_react)
+#define IOC_OPAL_LR_SET_START_LEN _IOW('p', 243, struct opal_user_lr_setup)
+#define IOC_OPAL_ENABLE_DISABLE_LR _IOW('p', 244, struct opal_user_lr_setup)
+#define IOC_OPAL_GET_SUM_STATUS _IOW('p', 245, struct opal_sum_ranges)
+#define IOC_OPAL_STACK_RESET _IO('p', 246)

 #endif /* _UAPI_SED_OPAL_H */
+80
include/uapi/linux/ublk_cmd.h
···
 #define UBLK_U_CMD_TRY_STOP_DEV \
 	_IOWR('u', 0x17, struct ublksrv_ctrl_cmd)
 /*
+ * Register a shared memory buffer for zero-copy I/O.
+ * Input: ctrl_cmd.addr points to struct ublk_shmem_buf_reg (buffer VA + size)
+ * ctrl_cmd.len = sizeof(struct ublk_shmem_buf_reg)
+ * Result: >= 0 is the assigned buffer index, < 0 is error
+ *
+ * The kernel pins pages from the calling process's address space
+ * and inserts PFN ranges into a per-device maple tree. When a block
+ * request's pages match registered pages, the driver sets
+ * UBLK_IO_F_SHMEM_ZC and encodes the buffer index + offset in addr,
+ * allowing the server to access the data via its own mapping of the
+ * same shared memory — true zero copy.
+ *
+ * The memory can be backed by memfd, hugetlbfs, or any GUP-compatible
+ * shared mapping. Queue freeze is handled internally.
+ *
+ * The buffer VA and size are passed via a user buffer (not inline in
+ * ctrl_cmd) so that unprivileged devices can prepend the device path
+ * to ctrl_cmd.addr without corrupting the VA.
+ */
+#define UBLK_U_CMD_REG_BUF \
+	_IOWR('u', 0x18, struct ublksrv_ctrl_cmd)
+/*
+ * Unregister a shared memory buffer.
+ * Input: ctrl_cmd.data[0] = buffer index
+ */
+#define UBLK_U_CMD_UNREG_BUF \
+	_IOWR('u', 0x19, struct ublksrv_ctrl_cmd)
+
+/* Parameter buffer for UBLK_U_CMD_REG_BUF, pointed to by ctrl_cmd.addr */
+struct ublk_shmem_buf_reg {
+	__u64 addr; /* userspace virtual address of shared memory */
+	__u64 len; /* buffer size in bytes, page-aligned, default max 4GB */
+	__u32 flags;
+	__u32 reserved;
+};
+
+/* Pin pages without FOLL_WRITE; usable with write-sealed memfd */
+#define UBLK_SHMEM_BUF_READ_ONLY (1U << 0)
+/*
  * 64bits are enough now, and it should be easy to extend in case of
  * running out of feature flags
  */
···
 /* Disable automatic partition scanning when device is started */
 #define UBLK_F_NO_AUTO_PART_SCAN (1ULL << 18)

+/*
+ * Enable shared memory zero copy. When enabled, the server can register
+ * shared memory buffers via UBLK_U_CMD_REG_BUF. If a block request's
+ * pages match a registered buffer, UBLK_IO_F_SHMEM_ZC is set and addr
+ * encodes the buffer index + offset instead of a userspace buffer address.
+ */
+#define UBLK_F_SHMEM_ZC (1ULL << 19)
+
 /* device state */
 #define UBLK_S_DEV_DEAD 0
 #define UBLK_S_DEV_LIVE 1
···
 #define UBLK_IO_F_NEED_REG_BUF (1U << 17)
 /* Request has an integrity data buffer */
 #define UBLK_IO_F_INTEGRITY (1UL << 18)
+/*
+ * I/O buffer is in a registered shared memory buffer. When set, the addr
+ * field in ublksrv_io_desc encodes buffer index and byte offset instead
+ * of a userspace virtual address.
+ */
+#define UBLK_IO_F_SHMEM_ZC (1U << 19)

 /*
  * io cmd is described by this structure, and stored in share memory, indexed
···
 	struct ublk_param_segment seg;
 	struct ublk_param_integrity integrity;
 };
+
+/*
+ * Shared memory zero-copy addr encoding for UBLK_IO_F_SHMEM_ZC.
+ *
+ * When UBLK_IO_F_SHMEM_ZC is set, ublksrv_io_desc.addr is encoded as:
+ * bits [0:31] = byte offset within the buffer (up to 4GB)
+ * bits [32:47] = buffer index (up to 65536)
+ * bits [48:63] = reserved (must be zero)
+ */
+#define UBLK_SHMEM_ZC_OFF_MASK 0xffffffffULL
+#define UBLK_SHMEM_ZC_IDX_OFF 32
+#define UBLK_SHMEM_ZC_IDX_MASK 0xffffULL
+
+static inline __u64 ublk_shmem_zc_addr(__u16 index, __u32 offset)
+{
+	return ((__u64)index << UBLK_SHMEM_ZC_IDX_OFF) | offset;
+}
+
+static inline __u16 ublk_shmem_zc_index(__u64 addr)
+{
+	return (addr >> UBLK_SHMEM_ZC_IDX_OFF) & UBLK_SHMEM_ZC_IDX_MASK;
+}
+
+static inline __u32 ublk_shmem_zc_offset(__u64 addr)
+{
+	return (__u32)(addr & UBLK_SHMEM_ZC_OFF_MASK);
+}

 #endif
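The addr encoding described in the ublk_cmd.h hunk (offset in bits 0..31, buffer index in bits 32..47, reserved bits 48..63) can be exercised in isolation. The sketch below copies the three masks and adds a hypothetical `zc_resolve()` that turns an encoded addr into a pointer inside a server-side table. The `zc_buf` table type and the extra bounds and reserved-bit checks are assumptions made for illustration; they are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Masks copied from include/uapi/linux/ublk_cmd.h. */
#define ZC_OFF_MASK	0xffffffffULL
#define ZC_IDX_OFF	32
#define ZC_IDX_MASK	0xffffULL

/* Hypothetical server-side record of one registered buffer. */
struct zc_buf {
	void *base;	/* server's own mmap of the shared memory */
	size_t size;	/* length of that mapping in bytes */
};

static inline uint64_t zc_addr(uint16_t index, uint32_t offset)
{
	return ((uint64_t)index << ZC_IDX_OFF) | offset;
}

static inline uint16_t zc_index(uint64_t addr)
{
	return (addr >> ZC_IDX_OFF) & ZC_IDX_MASK;
}

static inline uint32_t zc_offset(uint64_t addr)
{
	return (uint32_t)(addr & ZC_OFF_MASK);
}

/*
 * Hypothetical lookup: map an encoded addr plus I/O length to a pointer,
 * or NULL if the reserved bits are set, the index is unknown, or the
 * range [offset, offset + len) falls outside the buffer.
 */
static void *zc_resolve(const struct zc_buf *tbl, size_t nr,
			uint64_t addr, uint32_t len)
{
	uint16_t idx = zc_index(addr);
	uint32_t off = zc_offset(addr);

	if (addr >> 48)
		return NULL;
	if (idx >= nr || !tbl[idx].base)
		return NULL;
	if (off > tbl[idx].size || len > tbl[idx].size - off)
		return NULL;
	return (char *)tbl[idx].base + off;
}

/* Usage example: encode, decode, and resolve against a small table. */
static void zc_selfcheck(void)
{
	static char mem[8192];
	struct zc_buf tbl[4] = { 0 };

	tbl[3].base = mem;
	tbl[3].size = sizeof(mem);
	assert(zc_index(zc_addr(3, 4096)) == 3);
	assert(zc_offset(zc_addr(3, 4096)) == 4096);
	assert(zc_resolve(tbl, 4, zc_addr(3, 4096), 4096) == mem + 4096);
}
```

Checking the length against the remaining buffer size before forming the pointer is a design choice of this sketch: it keeps a malformed descriptor from producing an out-of-range I/O buffer.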
+1 -1
mm/swapfile.c
···
 	if (si->bdev && bdev_synchronous(si->bdev))
 		si->flags |= SWP_SYNCHRONOUS_IO;

-	if (si->bdev && bdev_nonrot(si->bdev)) {
+	if (si->bdev && !bdev_rot(si->bdev)) {
 		si->flags |= SWP_SOLIDSTATE;
 	} else {
 		atomic_inc(&nr_rotate_swap);
+6
tools/testing/selftests/ublk/Makefile
···
 TEST_PROGS += test_generic_12.sh
 TEST_PROGS += test_generic_13.sh
 TEST_PROGS += test_generic_16.sh
+TEST_PROGS += test_generic_17.sh

 TEST_PROGS += test_batch_01.sh
 TEST_PROGS += test_batch_02.sh
···

 TEST_PROGS += test_part_01.sh
 TEST_PROGS += test_part_02.sh
+
+TEST_PROGS += test_shmemzc_01.sh
+TEST_PROGS += test_shmemzc_02.sh
+TEST_PROGS += test_shmemzc_03.sh
+TEST_PROGS += test_shmemzc_04.sh

 TEST_PROGS += test_stress_01.sh
 TEST_PROGS += test_stress_02.sh
+49 -3
tools/testing/selftests/ublk/fault_inject.c
···

 #include "kublk.h"

+struct fi_opts {
+	long long delay_ns;
+	bool die_during_fetch;
+};
+
 static int ublk_fault_inject_tgt_init(const struct dev_ctx *ctx,
 				      struct ublk_dev *dev)
 {
 	const struct ublksrv_ctrl_dev_info *info = &dev->dev_info;
 	unsigned long dev_size = 250UL << 30;
+	struct fi_opts *opts = NULL;

 	if (ctx->auto_zc_fallback) {
 		ublk_err("%s: not support auto_zc_fallback\n", __func__);
···
 	};
 	ublk_set_integrity_params(ctx, &dev->tgt.params);

-	dev->private_data = (void *)(unsigned long)(ctx->fault_inject.delay_us * 1000);
+	opts = calloc(1, sizeof(*opts));
+	if (!opts) {
+		ublk_err("%s: couldn't allocate memory for opts\n", __func__);
+		return -ENOMEM;
+	}
+
+	opts->delay_ns = ctx->fault_inject.delay_us * 1000;
+	opts->die_during_fetch = ctx->fault_inject.die_during_fetch;
+	dev->private_data = opts;
+
 	return 0;
+}
+
+static void ublk_fault_inject_pre_fetch_io(struct ublk_thread *t,
+					   struct ublk_queue *q, int tag,
+					   bool batch)
+{
+	struct fi_opts *opts = q->dev->private_data;
+
+	if (!opts->die_during_fetch)
+		return;
+
+	/*
+	 * Each queue fetches its IOs in increasing order of tags, so
+	 * dying just before we're about to fetch tag 1 (regardless of
+	 * what queue we're on) guarantees that we've fetched a nonempty
+	 * proper subset of the tags on that queue.
+	 */
+	if (tag == 1) {
+		/*
+		 * Ensure our commands are actually live in the kernel
+		 * before we die.
+		 */
+		io_uring_submit(&t->ring);
+		raise(SIGKILL);
+	}
 }

 static int ublk_fault_inject_queue_io(struct ublk_thread *t,
···
 {
 	const struct ublksrv_io_desc *iod = ublk_get_iod(q, tag);
 	struct io_uring_sqe *sqe;
+	struct fi_opts *opts = q->dev->private_data;
 	struct __kernel_timespec ts = {
-		.tv_nsec = (long long)q->dev->private_data,
+		.tv_nsec = opts->delay_ns,
 	};

 	ublk_io_alloc_sqes(t, &sqe, 1);
···
 {
 	static const struct option longopts[] = {
 		{ "delay_us", 1, NULL, 0 },
+		{ "die_during_fetch", 1, NULL, 0 },
 		{ 0, 0, 0, 0 }
 	};
 	int option_idx, opt;

 	ctx->fault_inject.delay_us = 0;
+	ctx->fault_inject.die_during_fetch = false;
 	while ((opt = getopt_long(argc, argv, "",
 				  longopts, &option_idx)) != -1) {
 		switch (opt) {
 		case 0:
 			if (!strcmp(longopts[option_idx].name, "delay_us"))
 				ctx->fault_inject.delay_us = strtoll(optarg, NULL, 10);
+			if (!strcmp(longopts[option_idx].name, "die_during_fetch"))
+				ctx->fault_inject.die_during_fetch = strtoll(optarg, NULL, 10);
 		}
 	}
 }

 static void ublk_fault_inject_usage(const struct ublk_tgt_ops *ops)
 {
-	printf("\tfault_inject: [--delay_us us (default 0)]\n");
+	printf("\tfault_inject: [--delay_us us (default 0)] [--die_during_fetch 1]\n");
 }

 const struct ublk_tgt_ops fault_inject_tgt_ops = {
 	.name = "fault_inject",
 	.init_tgt = ublk_fault_inject_tgt_init,
+	.pre_fetch_io = ublk_fault_inject_pre_fetch_io,
 	.queue_io = ublk_fault_inject_queue_io,
 	.tgt_io_done = ublk_fault_inject_tgt_io_done,
 	.parse_cmd_line = ublk_fault_inject_cmd_line,
+38
tools/testing/selftests/ublk/file_backed.c
···
 	return 1;
 }

+/*
+ * Shared memory zero-copy I/O: when UBLK_IO_F_SHMEM_ZC is set, the
+ * request's data lives in a registered shared memory buffer. Decode
+ * index + offset from iod->addr and use the server's mmap of that
+ * buffer as the I/O buffer for the backing file.
+ */
+static int loop_queue_shmem_zc_io(struct ublk_thread *t, struct ublk_queue *q,
+				   const struct ublksrv_io_desc *iod, int tag)
+{
+	unsigned ublk_op = ublksrv_get_op(iod);
+	enum io_uring_op op = ublk_to_uring_op(iod, 0);
+	__u64 file_offset = iod->start_sector << 9;
+	__u32 len = iod->nr_sectors << 9;
+	__u32 shmem_idx = ublk_shmem_zc_index(iod->addr);
+	__u32 shmem_off = ublk_shmem_zc_offset(iod->addr);
+	struct io_uring_sqe *sqe[1];
+	void *addr;
+
+	if (shmem_idx >= UBLK_BUF_MAX || !shmem_table[shmem_idx].mmap_base)
+		return -EINVAL;
+
+	addr = shmem_table[shmem_idx].mmap_base + shmem_off;
+
+	ublk_io_alloc_sqes(t, sqe, 1);
+	if (!sqe[0])
+		return -ENOMEM;
+
+	io_uring_prep_rw(op, sqe[0], ublk_get_registered_fd(q, 1),
+			 addr, len, file_offset);
+	io_uring_sqe_set_flags(sqe[0], IOSQE_FIXED_FILE);
+	sqe[0]->user_data = build_user_data(tag, ublk_op, 0, q->q_id, 1);
+	return 1;
+}
+
 static int loop_queue_tgt_rw_io(struct ublk_thread *t, struct ublk_queue *q,
 				const struct ublksrv_io_desc *iod, int tag)
 {
···
 	struct io_uring_sqe *sqe[3];
 	void *addr = io->buf_addr;
 	unsigned short buf_index = ublk_io_buf_idx(t, q, tag);
+
+	/* shared memory zero-copy path */
+	if (iod->op_flags & UBLK_IO_F_SHMEM_ZC)
+		return loop_queue_shmem_zc_io(t, q, iod, tag);

 	if (iod->op_flags & UBLK_IO_F_INTEGRITY) {
 		ublk_io_alloc_sqes(t, sqe, 1);
+352 -2
tools/testing/selftests/ublk/kublk.c
··· 4 4 */ 5 5 6 6 #include <linux/fs.h> 7 + #include <sys/un.h> 7 8 #include "kublk.h" 8 9 9 10 #define MAX_NR_TGT_ARG 64 ··· 797 796 q = &t->dev->q[q_id]; 798 797 io = &q->ios[tag]; 799 798 io->buf_index = j++; 799 + if (q->tgt_ops->pre_fetch_io) 800 + q->tgt_ops->pre_fetch_io(t, q, tag, false); 800 801 ublk_queue_io_cmd(t, io); 801 802 } 802 803 } else { ··· 810 807 for (i = 0; i < q->q_depth; i++) { 811 808 io = &q->ios[i]; 812 809 io->buf_index = i; 810 + if (q->tgt_ops->pre_fetch_io) 811 + q->tgt_ops->pre_fetch_io(t, q, i, false); 813 812 ublk_queue_io_cmd(t, io); 814 813 } 815 814 } ··· 988 983 if (t->q_map[i] == 0) 989 984 continue; 990 985 986 + if (q->tgt_ops->pre_fetch_io) 987 + q->tgt_ops->pre_fetch_io(t, q, 0, true); 988 + 991 989 ret = ublk_batch_queue_prep_io_cmds(t, q); 992 990 ublk_assert(ret >= 0); 993 991 } ··· 1093 1085 } 1094 1086 1095 1087 1088 + /* 1089 + * Shared memory registration socket listener. 1090 + * 1091 + * The parent daemon context listens on a per-device unix socket at 1092 + * /run/ublk/ublkb<dev_id>.sock for shared memory registration requests 1093 + * from clients. Clients send a memfd via SCM_RIGHTS; the server 1094 + * registers it with the kernel, mmaps it, and returns the assigned index. 
1095 + */ 1096 + #define UBLK_SHMEM_SOCK_DIR "/run/ublk" 1097 + 1098 + /* defined in kublk.h, shared with file_backed.c (loop target) */ 1099 + struct ublk_shmem_entry shmem_table[UBLK_BUF_MAX]; 1100 + int shmem_count; 1101 + 1102 + static void ublk_shmem_sock_path(int dev_id, char *buf, size_t len) 1103 + { 1104 + snprintf(buf, len, "%s/ublkb%d.sock", UBLK_SHMEM_SOCK_DIR, dev_id); 1105 + } 1106 + 1107 + static int ublk_shmem_sock_create(int dev_id) 1108 + { 1109 + struct sockaddr_un addr = { .sun_family = AF_UNIX }; 1110 + char path[108]; 1111 + int fd; 1112 + 1113 + mkdir(UBLK_SHMEM_SOCK_DIR, 0755); 1114 + ublk_shmem_sock_path(dev_id, path, sizeof(path)); 1115 + unlink(path); 1116 + 1117 + fd = socket(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK, 0); 1118 + if (fd < 0) 1119 + return -1; 1120 + 1121 + snprintf(addr.sun_path, sizeof(addr.sun_path), "%s", path); 1122 + if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) { 1123 + close(fd); 1124 + return -1; 1125 + } 1126 + 1127 + listen(fd, 4); 1128 + ublk_dbg(UBLK_DBG_DEV, "shmem socket created: %s\n", path); 1129 + return fd; 1130 + } 1131 + 1132 + static void ublk_shmem_sock_destroy(int dev_id, int sock_fd) 1133 + { 1134 + char path[108]; 1135 + 1136 + if (sock_fd >= 0) 1137 + close(sock_fd); 1138 + ublk_shmem_sock_path(dev_id, path, sizeof(path)); 1139 + unlink(path); 1140 + } 1141 + 1142 + /* Receive a memfd from a client via SCM_RIGHTS */ 1143 + static int ublk_shmem_recv_fd(int client_fd) 1144 + { 1145 + char buf[1]; 1146 + struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) }; 1147 + union { 1148 + char cmsg_buf[CMSG_SPACE(sizeof(int))]; 1149 + struct cmsghdr align; 1150 + } u; 1151 + struct msghdr msg = { 1152 + .msg_iov = &iov, 1153 + .msg_iovlen = 1, 1154 + .msg_control = u.cmsg_buf, 1155 + .msg_controllen = sizeof(u.cmsg_buf), 1156 + }; 1157 + struct cmsghdr *cmsg; 1158 + 1159 + if (recvmsg(client_fd, &msg, 0) <= 0) 1160 + return -1; 1161 + 1162 + cmsg = CMSG_FIRSTHDR(&msg); 1163 + if (!cmsg || 
cmsg->cmsg_level != SOL_SOCKET || 1164 + cmsg->cmsg_type != SCM_RIGHTS) 1165 + return -1; 1166 + 1167 + return *(int *)CMSG_DATA(cmsg); 1168 + } 1169 + 1170 + /* Register a shared memory buffer: store fd, mmap it, return index */ 1171 + static int ublk_shmem_register(int shmem_fd) 1172 + { 1173 + off_t size; 1174 + void *base; 1175 + int idx; 1176 + 1177 + if (shmem_count >= UBLK_BUF_MAX) 1178 + return -1; 1179 + 1180 + size = lseek(shmem_fd, 0, SEEK_END); 1181 + if (size <= 0) 1182 + return -1; 1183 + 1184 + base = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, 1185 + shmem_fd, 0); 1186 + if (base == MAP_FAILED) 1187 + return -1; 1188 + 1189 + idx = shmem_count++; 1190 + shmem_table[idx].fd = shmem_fd; 1191 + shmem_table[idx].mmap_base = base; 1192 + shmem_table[idx].size = size; 1193 + 1194 + ublk_dbg(UBLK_DBG_DEV, "shmem registered: index=%d fd=%d size=%zu\n", 1195 + idx, shmem_fd, (size_t)size); 1196 + return idx; 1197 + } 1198 + 1199 + static void ublk_shmem_unregister_all(void) 1200 + { 1201 + int i; 1202 + 1203 + for (i = 0; i < shmem_count; i++) { 1204 + if (shmem_table[i].mmap_base) { 1205 + munmap(shmem_table[i].mmap_base, 1206 + shmem_table[i].size); 1207 + close(shmem_table[i].fd); 1208 + shmem_table[i].mmap_base = NULL; 1209 + } 1210 + } 1211 + shmem_count = 0; 1212 + } 1213 + 1214 + static int ublk_ctrl_reg_buf(struct ublk_dev *dev, void *addr, size_t size, 1215 + __u32 flags) 1216 + { 1217 + struct ublk_shmem_buf_reg buf_reg = { 1218 + .addr = (unsigned long)addr, 1219 + .len = size, 1220 + .flags = flags, 1221 + }; 1222 + struct ublk_ctrl_cmd_data data = { 1223 + .cmd_op = UBLK_U_CMD_REG_BUF, 1224 + .flags = CTRL_CMD_HAS_BUF, 1225 + .addr = (unsigned long)&buf_reg, 1226 + .len = sizeof(buf_reg), 1227 + }; 1228 + 1229 + return __ublk_ctrl_cmd(dev, &data); 1230 + } 1231 + 1232 + /* 1233 + * Handle one client connection: receive memfd, mmap it, register 1234 + * the VA range with kernel, send back the assigned index. 
1235 + */ 1236 + static void ublk_shmem_handle_client(int sock_fd, struct ublk_dev *dev) 1237 + { 1238 + int client_fd, memfd, idx, ret; 1239 + int32_t reply; 1240 + off_t size; 1241 + void *base; 1242 + 1243 + client_fd = accept(sock_fd, NULL, NULL); 1244 + if (client_fd < 0) 1245 + return; 1246 + 1247 + memfd = ublk_shmem_recv_fd(client_fd); 1248 + if (memfd < 0) { 1249 + reply = -1; 1250 + goto out; 1251 + } 1252 + 1253 + /* mmap the memfd in server address space */ 1254 + size = lseek(memfd, 0, SEEK_END); 1255 + if (size <= 0) { 1256 + reply = -1; 1257 + close(memfd); 1258 + goto out; 1259 + } 1260 + base = mmap(NULL, size, PROT_READ | PROT_WRITE, 1261 + MAP_SHARED | MAP_POPULATE, memfd, 0); 1262 + if (base == MAP_FAILED) { 1263 + reply = -1; 1264 + close(memfd); 1265 + goto out; 1266 + } 1267 + 1268 + /* Register server's VA range with kernel for PFN matching */ 1269 + ret = ublk_ctrl_reg_buf(dev, base, size, 0); 1270 + if (ret < 0) { 1271 + ublk_dbg(UBLK_DBG_DEV, 1272 + "shmem_zc: kernel reg failed %d\n", ret); 1273 + munmap(base, size); 1274 + close(memfd); 1275 + reply = ret; 1276 + goto out; 1277 + } 1278 + 1279 + /* Store in table for I/O handling */ 1280 + idx = ublk_shmem_register(memfd); 1281 + if (idx >= 0) { 1282 + shmem_table[idx].mmap_base = base; 1283 + shmem_table[idx].size = size; 1284 + } 1285 + reply = idx; 1286 + out: 1287 + send(client_fd, &reply, sizeof(reply), 0); 1288 + close(client_fd); 1289 + } 1290 + 1291 + struct shmem_listener_info { 1292 + int dev_id; 1293 + int stop_efd; /* eventfd to signal listener to stop */ 1294 + int sock_fd; /* listener socket fd (output) */ 1295 + struct ublk_dev *dev; 1296 + }; 1297 + 1298 + /* 1299 + * Socket listener thread: runs in the parent daemon context alongside 1300 + * the I/O threads. Accepts shared memory registration requests from 1301 + * clients via SCM_RIGHTS. Exits when stop_efd is signaled. 
1302 + */ 1303 + static void *ublk_shmem_listener_fn(void *data) 1304 + { 1305 + struct shmem_listener_info *info = data; 1306 + struct pollfd pfds[2]; 1307 + 1308 + info->sock_fd = ublk_shmem_sock_create(info->dev_id); 1309 + if (info->sock_fd < 0) 1310 + return NULL; 1311 + 1312 + pfds[0].fd = info->sock_fd; 1313 + pfds[0].events = POLLIN; 1314 + pfds[1].fd = info->stop_efd; 1315 + pfds[1].events = POLLIN; 1316 + 1317 + while (1) { 1318 + int ret = poll(pfds, 2, -1); 1319 + 1320 + if (ret < 0) 1321 + break; 1322 + 1323 + /* Stop signal from parent */ 1324 + if (pfds[1].revents & POLLIN) 1325 + break; 1326 + 1327 + /* Client connection */ 1328 + if (pfds[0].revents & POLLIN) 1329 + ublk_shmem_handle_client(info->sock_fd, info->dev); 1330 + } 1331 + 1332 + return NULL; 1333 + } 1334 + 1335 + static int ublk_shmem_htlb_setup(const struct dev_ctx *ctx, 1336 + struct ublk_dev *dev) 1337 + { 1338 + int fd, idx, ret; 1339 + struct stat st; 1340 + void *base; 1341 + 1342 + fd = open(ctx->htlb_path, O_RDWR); 1343 + if (fd < 0) { 1344 + ublk_err("htlb: can't open %s\n", ctx->htlb_path); 1345 + return -errno; 1346 + } 1347 + 1348 + if (fstat(fd, &st) < 0 || st.st_size <= 0) { 1349 + ublk_err("htlb: invalid file size\n"); 1350 + close(fd); 1351 + return -EINVAL; 1352 + } 1353 + 1354 + base = mmap(NULL, st.st_size, 1355 + ctx->rdonly_shmem_buf ? PROT_READ : PROT_READ | PROT_WRITE, 1356 + MAP_SHARED | MAP_POPULATE, fd, 0); 1357 + if (base == MAP_FAILED) { 1358 + ublk_err("htlb: mmap failed\n"); 1359 + close(fd); 1360 + return -ENOMEM; 1361 + } 1362 + 1363 + ret = ublk_ctrl_reg_buf(dev, base, st.st_size, 1364 + ctx->rdonly_shmem_buf ? 
UBLK_SHMEM_BUF_READ_ONLY : 0); 1365 + if (ret < 0) { 1366 + ublk_err("htlb: reg_buf failed: %d\n", ret); 1367 + munmap(base, st.st_size); 1368 + close(fd); 1369 + return ret; 1370 + } 1371 + 1372 + if (shmem_count >= UBLK_BUF_MAX) { 1373 + munmap(base, st.st_size); 1374 + close(fd); 1375 + return -ENOMEM; 1376 + } 1377 + 1378 + idx = shmem_count++; 1379 + shmem_table[idx].fd = fd; 1380 + shmem_table[idx].mmap_base = base; 1381 + shmem_table[idx].size = st.st_size; 1382 + 1383 + ublk_dbg(UBLK_DBG_DEV, "htlb registered: index=%d size=%zu\n", 1384 + idx, (size_t)st.st_size); 1385 + return 0; 1386 + } 1387 + 1096 1388 static int ublk_start_daemon(const struct dev_ctx *ctx, struct ublk_dev *dev) 1097 1389 { 1098 1390 const struct ublksrv_ctrl_dev_info *dinfo = &dev->dev_info; 1391 + struct shmem_listener_info linfo = {}; 1099 1392 struct ublk_thread_info *tinfo; 1100 1393 unsigned long long extra_flags = 0; 1101 1394 cpu_set_t *affinity_buf; 1102 1395 unsigned char (*q_thread_map)[UBLK_MAX_QUEUES] = NULL; 1396 + uint64_t stop_val = 1; 1397 + pthread_t listener; 1103 1398 void *thread_ret; 1104 1399 sem_t ready; 1105 1400 int ret, i; ··· 1491 1180 goto fail_start; 1492 1181 } 1493 1182 1183 + if (ctx->htlb_path) { 1184 + ret = ublk_shmem_htlb_setup(ctx, dev); 1185 + if (ret < 0) { 1186 + ublk_err("htlb setup failed: %d\n", ret); 1187 + ublk_ctrl_stop_dev(dev); 1188 + goto fail_start; 1189 + } 1190 + } 1191 + 1494 1192 ublk_ctrl_get_info(dev); 1495 1193 if (ctx->fg) 1496 1194 ublk_ctrl_dump(dev); 1497 1195 else 1498 1196 ublk_send_dev_event(ctx, dev, dev->dev_info.dev_id); 1499 1197 fail_start: 1500 - /* wait until we are terminated */ 1501 - for (i = 0; i < dev->nthreads; i++) 1198 + /* 1199 + * Wait for I/O threads to exit. While waiting, a listener 1200 + * thread accepts shared memory registration requests from 1201 + * clients via a per-device unix socket (SCM_RIGHTS fd passing). 
1202 + */ 1203 + linfo.dev_id = dinfo->dev_id; 1204 + linfo.dev = dev; 1205 + linfo.stop_efd = eventfd(0, 0); 1206 + if (linfo.stop_efd >= 0) 1207 + pthread_create(&listener, NULL, 1208 + ublk_shmem_listener_fn, &linfo); 1209 + 1210 + for (i = 0; i < (int)dev->nthreads; i++) 1502 1211 pthread_join(tinfo[i].thread, &thread_ret); 1212 + 1213 + /* Signal listener thread to stop and wait for it */ 1214 + if (linfo.stop_efd >= 0) { 1215 + write(linfo.stop_efd, &stop_val, sizeof(stop_val)); 1216 + pthread_join(listener, NULL); 1217 + close(linfo.stop_efd); 1218 + ublk_shmem_sock_destroy(dinfo->dev_id, linfo.sock_fd); 1219 + } 1220 + ublk_shmem_unregister_all(); 1503 1221 free(tinfo); 1504 1222 fail: 1505 1223 for (i = 0; i < dinfo->nr_hw_queues; i++) ··· 1958 1618 FEAT_NAME(UBLK_F_SAFE_STOP_DEV), 1959 1619 FEAT_NAME(UBLK_F_BATCH_IO), 1960 1620 FEAT_NAME(UBLK_F_NO_AUTO_PART_SCAN), 1621 + FEAT_NAME(UBLK_F_SHMEM_ZC), 1961 1622 }; 1962 1623 struct ublk_dev *dev; 1963 1624 __u64 features = 0; ··· 2131 1790 { "safe", 0, NULL, 0 }, 2132 1791 { "batch", 0, NULL, 'b'}, 2133 1792 { "no_auto_part_scan", 0, NULL, 0 }, 1793 + { "shmem_zc", 0, NULL, 0 }, 1794 + { "htlb", 1, NULL, 0 }, 1795 + { "rdonly_shmem_buf", 0, NULL, 0 }, 2134 1796 { 0, 0, 0, 0 } 2135 1797 }; 2136 1798 const struct ublk_tgt_ops *ops = NULL; ··· 2249 1905 ctx.safe_stop = 1; 2250 1906 if (!strcmp(longopts[option_idx].name, "no_auto_part_scan")) 2251 1907 ctx.flags |= UBLK_F_NO_AUTO_PART_SCAN; 1908 + if (!strcmp(longopts[option_idx].name, "shmem_zc")) 1909 + ctx.flags |= UBLK_F_SHMEM_ZC; 1910 + if (!strcmp(longopts[option_idx].name, "htlb")) 1911 + ctx.htlb_path = strdup(optarg); 1912 + if (!strcmp(longopts[option_idx].name, "rdonly_shmem_buf")) 1913 + ctx.rdonly_shmem_buf = 1; 2252 1914 break; 2253 1915 case '?': 2254 1916 /*
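The kublk.c listener receives a memfd from a client with `recvmsg()` and an SCM_RIGHTS control message, reading one byte of ordinary payload alongside it. The sketch below shows both halves of that exchange over a `socketpair()`, which may help when writing a client for the per-device socket. The function names are hypothetical, the sender side is an assumption about what a client would do, and a pipe stands in for the memfd so the example stays portable.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one byte of payload plus one fd as SCM_RIGHTS ancillary data. */
static int send_fd(int sock, int fd)
{
	char byte = 0;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces correct alignment */
	} u;
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = u.buf,
		.msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive the fd; mirrors the shape of ublk_shmem_recv_fd() in the diff. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = u.buf,
		.msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cmsg;
	int fd;

	if (recvmsg(sock, &msg, 0) <= 0)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS)
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
	return fd;
}

/* Usage example: pass a pipe's read end across a socketpair. */
static int fd_passing_demo(void)
{
	int sv[2], pfd[2], got;
	char c = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0 || pipe(pfd) < 0)
		return -1;
	if (send_fd(sv[0], pfd[0]) < 0)
		return -1;
	got = recv_fd(sv[1]);
	if (got < 0)
		return -1;
	/* The receiver gets a new descriptor for the same open file. */
	assert(got != pfd[0]);
	if (write(pfd[1], "x", 1) != 1 || read(got, &c, 1) != 1)
		return -1;
	close(got);
	close(pfd[0]);
	close(pfd[1]);
	close(sv[0]);
	close(sv[1]);
	return c == 'x' ? 0 : -1;
}
```

Copying the descriptor in and out of `CMSG_DATA()` with `memcpy()` avoids relying on the alignment of the control buffer, which is a choice made for this sketch.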
+18
tools/testing/selftests/ublk/kublk.h
···
 struct fault_inject_ctx {
 	/* fault_inject */
 	unsigned long delay_us;
+	bool die_during_fetch;
 };
 
 struct dev_ctx {
···
 	unsigned int no_ublk_fixed_fd:1;
 	unsigned int safe_stop:1;
 	unsigned int no_auto_part_scan:1;
+	unsigned int rdonly_shmem_buf:1;
 	__u32 integrity_flags;
 	__u8 metadata_size;
 	__u8 pi_offset;
···
 
 	/* for 'update_size' command */
 	unsigned long long size;
+
+	char *htlb_path;
 
 	union {
 		struct stripe_ctx stripe;
···
 	int (*init_tgt)(const struct dev_ctx *ctx, struct ublk_dev *);
 	void (*deinit_tgt)(struct ublk_dev *);
 
+	void (*pre_fetch_io)(struct ublk_thread *t, struct ublk_queue *q,
+			     int tag, bool batch);
 	int (*queue_io)(struct ublk_thread *, struct ublk_queue *, int tag);
 	void (*tgt_io_done)(struct ublk_thread *, struct ublk_queue *,
 			    const struct io_uring_cqe *);
···
 		io->result = 0;
 	}
 }
+
+/* shared memory zero-copy support */
+#define UBLK_BUF_MAX 256
+
+struct ublk_shmem_entry {
+	int fd;
+	void *mmap_base;
+	size_t size;
+};
+
+extern struct ublk_shmem_entry shmem_table[UBLK_BUF_MAX];
+extern int shmem_count;
 
 extern const struct ublk_tgt_ops null_tgt_ops;
 extern const struct ublk_tgt_ops loop_tgt_ops;
+12 -5
tools/testing/selftests/ublk/test_common.sh
···
 _mkfs_mount_test()
 {
 	local dev=$1
+	shift
 	local err_code=0
 	local mnt_dir;
 
···
 	fi
 
 	mount -t ext4 "$dev" "$mnt_dir" > /dev/null 2>&1
-	umount "$dev"
-	err_code=$?
-	_remove_tmp_dir "$mnt_dir"
-	if [ $err_code -ne 0 ]; then
-		return $err_code
+	if [ $# -gt 0 ]; then
+		cd "$mnt_dir" && "$@"
+		err_code=$?
+		cd - > /dev/null
 	fi
+	umount "$dev"
+	if [ $err_code -eq 0 ]; then
+		err_code=$?
+	fi
+	_remove_tmp_dir "$mnt_dir"
+	return $err_code
 }
 
 _check_root() {
···
 	local base_dir=${TMPDIR:-./ublktest-dir}
 	mkdir -p "$base_dir"
 	UBLK_TEST_DIR=$(mktemp -d ${base_dir}/${TID}.XXXXXX)
+	UBLK_TEST_DIR=$(realpath ${UBLK_TEST_DIR})
 	UBLK_TMP=$(mktemp ${UBLK_TEST_DIR}/ublk_test_XXXXX)
 	[ "$UBLK_TEST_QUIET" -eq 0 ] && echo "ublk $type: $*"
 	echo "ublk selftest: $TID starting at $(date '+%F %T')" | tee /dev/kmsg
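A note on the exit-status handling in the `_mkfs_mount_test` hunk: in shell, `$?` always holds the status of the most recently executed command, so inside `if [ $err_code -eq 0 ]; then` it reflects the `[` test (which just succeeded) rather than the preceding `umount`. The following is a minimal sketch of that behaviour and of one possible alternative. `run_step` and `cleanup_step` are hypothetical stand-ins for the workload and `umount`; they are not part of test_common.sh.

```shell
#!/bin/bash
# Hypothetical stand-ins: the workload succeeds, the cleanup step fails.
run_step()     { return 0; }
cleanup_step() { return 3; }

# Same shape as the hunk: $? is read only after the `[` test has run.
status_after_test() {
	local err_code=0
	run_step; err_code=$?
	cleanup_step
	if [ $err_code -eq 0 ]; then
		err_code=$?	# status of `[`, not of cleanup_step
	fi
	return $err_code
}

# Alternative: save the cleanup status immediately, then decide.
status_saved_first() {
	local err_code=0 cleanup_rc
	run_step; err_code=$?
	cleanup_step; cleanup_rc=$?
	[ $err_code -eq 0 ] && err_code=$cleanup_rc
	return $err_code
}

status_after_test;  lost=$?
status_saved_first; kept=$?
echo "lost=$lost kept=$kept"
```

In the first variant the cleanup failure is dropped; in the second it is reported whenever the workload itself succeeded.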
+35
tools/testing/selftests/ublk/test_generic_17.sh
···
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+. "$(cd "$(dirname "$0")" && pwd)"/test_common.sh
+
+ERR_CODE=0
+
+_prep_test "fault_inject" "teardown after incomplete recovery"
+
+# First start and stop a ublk server with device configured for recovery
+dev_id=$(_add_ublk_dev -t fault_inject -r 1)
+_check_add_dev $TID $?
+state=$(__ublk_kill_daemon "${dev_id}" "QUIESCED")
+if [ "$state" != "QUIESCED" ]; then
+	echo "device isn't quiesced($state) after $action"
+	ERR_CODE=255
+fi
+
+# Then recover the device, but use --die_during_fetch to have the ublk
+# server die while a queue has some (but not all) I/Os fetched
+${UBLK_PROG} recover -n "${dev_id}" --foreground -t fault_inject --die_during_fetch 1
+RECOVER_RES=$?
+# 137 is the result when dying of SIGKILL
+if (( RECOVER_RES != 137 )); then
+	echo "recover command exited with unexpected code ${RECOVER_RES}!"
+	ERR_CODE=255
+fi
+
+# Clean up the device. This can only succeed once teardown of the above
+# exited ublk server completes. So if teardown never completes, we will
+# time out here
+_ublk_del_dev "${dev_id}"
+
+_cleanup_test "fault_inject"
+_show_result $TID $ERR_CODE
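The 137 checked by this test follows from the shell convention that a child terminated by signal N is reported with status 128 + N, and SIGKILL is signal 9. A small sketch of that arithmetic, independent of ublk (the self-killing child shell is only an illustration, not what kublk does):

```shell
#!/bin/bash
# A child shell that kills itself with SIGKILL; the parent sees 128 + 9.
bash -c 'kill -KILL $$'
rc=$?

# Recover the signal number and its name from the exit status.
sig=$(( rc - 128 ))
signame=$(kill -l "$sig")
echo "exit status: $rc, signal: $sig ($signame)"
```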
+72
tools/testing/selftests/ublk/test_shmemzc_01.sh
···
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Test: shmem_zc with hugetlbfs buffer on null target
+#
+# kublk and fio both mmap the same hugetlbfs file (MAP_SHARED),
+# so they share physical pages. The kernel PFN match enables
+# zero-copy I/O without socket-based fd passing.
+
+. "$(cd "$(dirname "$0")" && pwd)"/test_common.sh
+
+ERR_CODE=0
+
+_prep_test "shmem_zc" "null target hugetlbfs shmem zero-copy test"
+
+if ! _have_program fio; then
+	echo "SKIP: fio not available"
+	exit "$UBLK_SKIP_CODE"
+fi
+
+if ! grep -q hugetlbfs /proc/filesystems; then
+	echo "SKIP: hugetlbfs not supported"
+	exit "$UBLK_SKIP_CODE"
+fi
+
+# Allocate hugepages
+OLD_NR_HP=$(cat /proc/sys/vm/nr_hugepages)
+echo 10 > /proc/sys/vm/nr_hugepages
+NR_HP=$(cat /proc/sys/vm/nr_hugepages)
+if [ "$NR_HP" -lt 2 ]; then
+	echo "SKIP: cannot allocate hugepages"
+	echo "$OLD_NR_HP" > /proc/sys/vm/nr_hugepages
+	exit "$UBLK_SKIP_CODE"
+fi
+
+# Mount hugetlbfs
+HTLB_MNT=$(mktemp -d "${UBLK_TEST_DIR}/htlb_mnt_XXXXXX")
+if ! mount -t hugetlbfs none "$HTLB_MNT"; then
+	echo "SKIP: cannot mount hugetlbfs"
+	rmdir "$HTLB_MNT"
+	echo "$OLD_NR_HP" > /proc/sys/vm/nr_hugepages
+	exit "$UBLK_SKIP_CODE"
+fi
+
+HTLB_FILE="$HTLB_MNT/ublk_buf"
+fallocate -l 4M "$HTLB_FILE"
+
+dev_id=$(_add_ublk_dev -t null --shmem_zc --htlb "$HTLB_FILE")
+_check_add_dev $TID $?
+
+fio --name=htlb_zc \
+	--filename=/dev/ublkb"${dev_id}" \
+	--ioengine=io_uring \
+	--rw=randwrite \
+	--direct=1 \
+	--bs=4k \
+	--size=4M \
+	--iodepth=32 \
+	--mem=mmaphuge:"$HTLB_FILE" \
+	> /dev/null 2>&1
+ERR_CODE=$?
+
+# Delete device first so daemon releases the htlb mmap
+_ublk_del_dev "${dev_id}"
+
+rm -f "$HTLB_FILE"
+umount "$HTLB_MNT"
+rmdir "$HTLB_MNT"
+echo "$OLD_NR_HP" > /proc/sys/vm/nr_hugepages
+
+_cleanup_test "shmem_zc"
+
+_show_result $TID $ERR_CODE
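The test above saves `/proc/sys/vm/nr_hugepages`, overwrites it, and restores the old value by hand on each exit path (three separate `echo "$OLD_NR_HP"` lines). As an illustration only, the same save/modify/restore idea can be expressed once with an `EXIT` trap. In this sketch `with_knob` is a hypothetical helper and `KNOB` is a temporary file standing in for the sysctl so that it runs without root; neither exists in the selftests.

```shell
#!/bin/bash
# Hypothetical helper names; KNOB stands in for /proc/sys/vm/nr_hugepages
# so the sketch can run unprivileged against a temporary file.
KNOB=$(mktemp)
echo 0 > "$KNOB"

with_knob() {
	# Run in a subshell so the EXIT trap fires when the body finishes,
	# whichever path it takes out.
	(
		old=$(cat "$KNOB")
		trap 'echo "$old" > "$KNOB"' EXIT
		echo "$1" > "$KNOB"
		shift
		"$@"
		exit $?
	)
}

# While the command runs the knob holds the new value; afterwards
# the trap has put the saved value back.
during=$(with_knob 10 cat "$KNOB")
after=$(cat "$KNOB")
echo "during=$during after=$after"
rm -f "$KNOB"
```

The subshell keeps the trap from leaking into the caller, at the cost of not being able to set variables in the calling script from inside the wrapped command.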
+68
tools/testing/selftests/ublk/test_shmemzc_02.sh
···
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Test: shmem_zc with hugetlbfs buffer on loop target
+#
+# kublk and fio both mmap the same hugetlbfs file (MAP_SHARED),
+# so they share physical pages. The kernel PFN match enables
+# zero-copy I/O without socket-based fd passing.
+
+. "$(cd "$(dirname "$0")" && pwd)"/test_common.sh
+
+ERR_CODE=0
+
+_prep_test "shmem_zc" "loop target hugetlbfs shmem zero-copy test"
+
+if ! _have_program fio; then
+	echo "SKIP: fio not available"
+	exit "$UBLK_SKIP_CODE"
+fi
+
+if ! grep -q hugetlbfs /proc/filesystems; then
+	echo "SKIP: hugetlbfs not supported"
+	exit "$UBLK_SKIP_CODE"
+fi
+
+# Allocate hugepages
+OLD_NR_HP=$(cat /proc/sys/vm/nr_hugepages)
+echo 10 > /proc/sys/vm/nr_hugepages
+NR_HP=$(cat /proc/sys/vm/nr_hugepages)
+if [ "$NR_HP" -lt 2 ]; then
+	echo "SKIP: cannot allocate hugepages"
+	echo "$OLD_NR_HP" > /proc/sys/vm/nr_hugepages
+	exit "$UBLK_SKIP_CODE"
+fi
+
+# Mount hugetlbfs
+HTLB_MNT=$(mktemp -d "${UBLK_TEST_DIR}/htlb_mnt_XXXXXX")
+if ! mount -t hugetlbfs none "$HTLB_MNT"; then
+	echo "SKIP: cannot mount hugetlbfs"
+	rmdir "$HTLB_MNT"
+	echo "$OLD_NR_HP" > /proc/sys/vm/nr_hugepages
+	exit "$UBLK_SKIP_CODE"
+fi
+
+HTLB_FILE="$HTLB_MNT/ublk_buf"
+fallocate -l 4M "$HTLB_FILE"
+
+_create_backfile 0 128M
+BACKFILE="${UBLK_BACKFILES[0]}"
+
+dev_id=$(_add_ublk_dev -t loop --shmem_zc --htlb "$HTLB_FILE" "$BACKFILE")
+_check_add_dev $TID $?
+
+_run_fio_verify_io --filename=/dev/ublkb"${dev_id}" \
+	--size=128M \
+	--mem=mmaphuge:"$HTLB_FILE"
+ERR_CODE=$?
+
+# Delete device first so daemon releases the htlb mmap
+_ublk_del_dev "${dev_id}"
+
+rm -f "$HTLB_FILE"
+umount "$HTLB_MNT"
+rmdir "$HTLB_MNT"
+echo "$OLD_NR_HP" > /proc/sys/vm/nr_hugepages
+
+_cleanup_test "shmem_zc"
+
+_show_result $TID $ERR_CODE
+69
tools/testing/selftests/ublk/test_shmemzc_03.sh
···
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Test: shmem_zc with fio verify over filesystem on loop target
+#
+# mkfs + mount ext4 on the ublk device, then run fio verify on a
+# file inside that filesystem. Exercises the full stack:
+# filesystem -> block layer -> ublk shmem_zc -> loop target backing file.
+
+. "$(cd "$(dirname "$0")" && pwd)"/test_common.sh
+
+ERR_CODE=0
+
+_prep_test "shmem_zc" "loop target hugetlbfs shmem zero-copy fs verify test"
+
+if ! _have_program fio; then
+	echo "SKIP: fio not available"
+	exit "$UBLK_SKIP_CODE"
+fi
+
+if ! grep -q hugetlbfs /proc/filesystems; then
+	echo "SKIP: hugetlbfs not supported"
+	exit "$UBLK_SKIP_CODE"
+fi
+
+# Allocate hugepages
+OLD_NR_HP=$(cat /proc/sys/vm/nr_hugepages)
+echo 10 > /proc/sys/vm/nr_hugepages
+NR_HP=$(cat /proc/sys/vm/nr_hugepages)
+if [ "$NR_HP" -lt 2 ]; then
+	echo "SKIP: cannot allocate hugepages"
+	echo "$OLD_NR_HP" > /proc/sys/vm/nr_hugepages
+	exit "$UBLK_SKIP_CODE"
+fi
+
+# Mount hugetlbfs
+HTLB_MNT=$(mktemp -d "${UBLK_TEST_DIR}/htlb_mnt_XXXXXX")
+if ! mount -t hugetlbfs none "$HTLB_MNT"; then
+	echo "SKIP: cannot mount hugetlbfs"
+	rmdir "$HTLB_MNT"
+	echo "$OLD_NR_HP" > /proc/sys/vm/nr_hugepages
+	exit "$UBLK_SKIP_CODE"
+fi
+
+HTLB_FILE="$HTLB_MNT/ublk_buf"
+fallocate -l 4M "$HTLB_FILE"
+
+_create_backfile 0 256M
+BACKFILE="${UBLK_BACKFILES[0]}"
+
+dev_id=$(_add_ublk_dev -t loop --shmem_zc --htlb "$HTLB_FILE" "$BACKFILE")
+_check_add_dev $TID $?
+
+_mkfs_mount_test /dev/ublkb"${dev_id}" \
+	_run_fio_verify_io --filename=testfile \
+		--size=128M \
+		--mem=mmaphuge:"$HTLB_FILE"
+ERR_CODE=$?
+
+# Delete device first so daemon releases the htlb mmap
+_ublk_del_dev "${dev_id}"
+
+rm -f "$HTLB_FILE"
+umount "$HTLB_MNT"
+rmdir "$HTLB_MNT"
+echo "$OLD_NR_HP" > /proc/sys/vm/nr_hugepages
+
+_cleanup_test "shmem_zc"
+
+_show_result $TID $ERR_CODE
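This test relies on `_mkfs_mount_test` accepting an optional command after the device argument and running it inside the mounted filesystem. The general pattern (consume the fixed argument with `shift`, then execute the remainder as `"$@"` so quoting is preserved) can be sketched on its own. `run_in_dir` below is a hypothetical helper written for this illustration; it uses a subshell for the directory change rather than the `cd ... && ...; cd -` sequence in test_common.sh.

```shell
#!/bin/bash
# Hypothetical helper, loosely modelled on the shift + "$@" pattern.
run_in_dir() {
	local dir=$1
	shift
	local rc=0
	if [ $# -gt 0 ]; then
		# Subshell keeps the caller's working directory untouched.
		( cd "$dir" && "$@" )
		rc=$?
	fi
	return $rc
}

work=$(mktemp -d)
run_in_dir "$work" touch "file with spaces"
rc=$?
created=$(ls "$work")
echo "rc=$rc created=$created"
rm -rf "$work"
```

Because the trailing arguments are forwarded as `"$@"`, an argument containing spaces reaches the command as a single word.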
+72
tools/testing/selftests/ublk/test_shmemzc_04.sh
···
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Test: shmem_zc with read-only buffer registration on null target
+#
+# Same as test_shmemzc_01 but with --rdonly_shmem_buf: pages are pinned
+# without FOLL_WRITE (UBLK_BUF_F_READ). Write I/O works because
+# the server only reads from the shared buffer.
+
+. "$(cd "$(dirname "$0")" && pwd)"/test_common.sh
+
+ERR_CODE=0
+
+_prep_test "shmem_zc" "null target hugetlbfs shmem zero-copy rdonly_buf test"
+
+if ! _have_program fio; then
+	echo "SKIP: fio not available"
+	exit "$UBLK_SKIP_CODE"
+fi
+
+if ! grep -q hugetlbfs /proc/filesystems; then
+	echo "SKIP: hugetlbfs not supported"
+	exit "$UBLK_SKIP_CODE"
+fi
+
+# Allocate hugepages
+OLD_NR_HP=$(cat /proc/sys/vm/nr_hugepages)
+echo 10 > /proc/sys/vm/nr_hugepages
+NR_HP=$(cat /proc/sys/vm/nr_hugepages)
+if [ "$NR_HP" -lt 2 ]; then
+	echo "SKIP: cannot allocate hugepages"
+	echo "$OLD_NR_HP" > /proc/sys/vm/nr_hugepages
+	exit "$UBLK_SKIP_CODE"
+fi
+
+# Mount hugetlbfs
+HTLB_MNT=$(mktemp -d "${UBLK_TEST_DIR}/htlb_mnt_XXXXXX")
+if ! mount -t hugetlbfs none "$HTLB_MNT"; then
+	echo "SKIP: cannot mount hugetlbfs"
+	rmdir "$HTLB_MNT"
+	echo "$OLD_NR_HP" > /proc/sys/vm/nr_hugepages
+	exit "$UBLK_SKIP_CODE"
+fi
+
+HTLB_FILE="$HTLB_MNT/ublk_buf"
+fallocate -l 4M "$HTLB_FILE"
+
+dev_id=$(_add_ublk_dev -t null --shmem_zc --htlb "$HTLB_FILE" --rdonly_shmem_buf)
+_check_add_dev $TID $?
+
+fio --name=htlb_zc_rdonly \
+	--filename=/dev/ublkb"${dev_id}" \
+	--ioengine=io_uring \
+	--rw=randwrite \
+	--direct=1 \
+	--bs=4k \
+	--size=4M \
+	--iodepth=32 \
+	--mem=mmaphuge:"$HTLB_FILE" \
+	> /dev/null 2>&1
+ERR_CODE=$?
+
+# Delete device first so daemon releases the htlb mmap
+_ublk_del_dev "${dev_id}"
+
+rm -f "$HTLB_FILE"
+umount "$HTLB_MNT"
+rmdir "$HTLB_MNT"
+echo "$OLD_NR_HP" > /proc/sys/vm/nr_hugepages
+
+_cleanup_test "shmem_zc"
+
+_show_result $TID $ERR_CODE