Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'for-7.1/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Benjamin Marzinski:
"There are fixes for some corner case crashes in dm-cache and
dm-mirror, new setup functionality for dm-vdo, and miscellaneous minor
fixes and cleanups, especially to dm-verity.

dm-vdo:
- Make dm-vdo able to format the device itself, like other dm
targets, instead of needing a userspace formatting program
- Add some sanity checks and code cleanup

dm-cache:
- Fix crashes and hangs when operating in passthrough mode (which
have been around, unnoticed, since 4.12), as well as a late-arriving
fix for an error-path bug in the passthrough fix
- Fix a corner case memory leak

dm-verity:
- Another set of minor bugfixes and code cleanups to the forward
error correction code

dm-mirror:
- Fix minor initialization bug
- Fix an overflow crash on large devices with small region sizes

dm-crypt:
- Reimplement the elephant diffuser using the AES library, plus minor cleanups

dm-core:
- Claude found a buffer overflow in /dev/mapper/control ioctl handling
- Make dm_mod.waitfor correctly wait for partitions
- Minor code fixes and cleanups"

* tag 'for-7.1/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (62 commits)
dm cache: fix missing return in invalidate_committed's error path
dm: fix a buffer overflow in ioctl processing
dm-crypt: Make crypt_iv_operations::post return void
dm vdo: Fix spelling mistake "postive" -> "positive"
dm: provide helper to set stacked limits
dm-integrity: always set the io hints
dm-integrity: fix mismatched queue limits
dm-bufio: use kzalloc_flex
dm vdo: save the formatted metadata to disk
dm vdo: add formatting logic and initialization
dm vdo: add synchronous metadata I/O submission helper
dm vdo: add geometry block structure
dm vdo: add geometry block encoding
dm vdo: add upfront validation for logical size
dm vdo: add formatting parameters to table line
dm vdo: add super block initialization to encodings.c
dm vdo: add geometry block initialization to encodings.c
dm-crypt: Make crypt_iv_operations::wipe return void
dm-crypt: Reimplement elephant diffuser using AES library
dm-verity-fec: warn even when there were no errors
...

+1351 -863
+102 -20
Documentation/admin-guide/device-mapper/verity.rst
···
     that are not guaranteed to contain zeroes.

 use_fec_from_device <fec_dev>
-    Use forward error correction (FEC) to recover from corruption if hash
-    verification fails. Use encoding data from the specified device. This
-    may be the same device where data and hash blocks reside, in which case
-    fec_start must be outside data and hash areas.
+    Use forward error correction (FEC) parity data from the specified device to
+    try to automatically recover from corruption and I/O errors.

-    If the encoding data covers additional metadata, it must be accessible
-    on the hash device after the hash blocks.
+    If this option is given, then <fec_roots> and <fec_blocks> must also be
+    given. <hash_block_size> must also be equal to <data_block_size>.

-    Note: block sizes for data and hash devices must match. Also, if the
-    verity <dev> is encrypted the <fec_dev> should be too.
+    <fec_dev> can be the same as <dev>, in which case <fec_start> must be
+    outside the data area. It can also be the same as <hash_dev>, in which case
+    <fec_start> must be outside the hash and optional additional metadata areas.
+
+    If the data <dev> is encrypted, the <fec_dev> should be too.
+
+    For more information, see `Forward error correction`_.

 fec_roots <num>
-    Number of generator roots. This equals to the number of parity bytes in
-    the encoding data. For example, in RS(M, N) encoding, the number of roots
-    is M-N.
+    The number of parity bytes in each 255-byte Reed-Solomon codeword. The
+    Reed-Solomon code used will be an RS(255, k) code where k = 255 - fec_roots.
+
+    The supported values are 2 through 24 inclusive. Higher values provide
+    stronger error correction. However, the minimum value of 2 already provides
+    strong error correction due to the use of interleaving, so 2 is the
+    recommended value for most users. fec_roots=2 corresponds to an
+    RS(255, 253) code, which has a space overhead of about 0.8%.

 fec_blocks <num>
-    The number of encoding data blocks on the FEC device. The block size for
-    the FEC device is <data_block_size>.
+    The total number of <data_block_size> blocks that are error-checked using
+    FEC. This must be at least the sum of <num_data_blocks> and the number of
+    blocks needed by the hash tree. It can include additional metadata blocks,
+    which are assumed to be accessible on <hash_dev> following the hash blocks.
+
+    Note that this is *not* the number of parity blocks. The number of parity
+    blocks is inferred from <fec_blocks>, <fec_roots>, and <data_block_size>.

 fec_start <offset>
-    This is the offset, in <data_block_size> blocks, from the start of the
-    FEC device to the beginning of the encoding data.
+    This is the offset, in <data_block_size> blocks, from the start of <fec_dev>
+    to the beginning of the parity data.

 check_at_most_once
     Verify data blocks only the first time they are read from the data device,
···
 into the page cache. Block hashes are stored linearly, aligned to the nearest
 block size.

-If forward error correction (FEC) support is enabled any recovery of
-corrupted data will be verified using the cryptographic hash of the
-corresponding data. This is why combining error correction with
-integrity checking is essential.
-
 Hash Tree
 ---------
···
      /   ...   \           /   . . .   \            /       \
 blk_0 ... blk_127  blk_16256 blk_16383  blk_32640 . . . blk_32767

+Forward error correction
+------------------------
+
+dm-verity's optional forward error correction (FEC) support adds strong error
+correction capabilities to dm-verity. It allows systems that would be rendered
+inoperable by errors to continue operating, albeit with reduced performance.
+
+FEC uses Reed-Solomon (RS) codes that are interleaved across the entire
+device(s), allowing long bursts of corrupt or unreadable blocks to be recovered.
+
+dm-verity validates any FEC-corrected block against the wanted hash before using
+it. Therefore, FEC doesn't affect the security properties of dm-verity.
+
+The integration of FEC with dm-verity provides significant benefits over a
+separate error correction layer:
+
+- dm-verity invokes FEC only when a block's hash doesn't match the wanted hash
+  or the block cannot be read at all. As a result, FEC doesn't add overhead to
+  the common case where no error occurs.
+
+- dm-verity hashes are also used to identify erasure locations for RS decoding.
+  This allows correcting twice as many errors.
+
+FEC uses an RS(255, k) code where k = 255 - fec_roots. fec_roots is usually 2.
+This means that each k (usually 253) message bytes have fec_roots (usually 2)
+bytes of parity data added to get a 255-byte codeword. (Many external sources
+call RS codewords "blocks". Since dm-verity already uses the term "block" to
+mean something else, we'll use the clearer term "RS codeword".)
+
+FEC checks fec_blocks blocks of message data in total, consisting of:
+
+1. The data blocks from the data device
+2. The hash blocks from the hash device
+3. Optional additional metadata that follows the hash blocks on the hash device
+
+dm-verity assumes that the FEC parity data was computed as if the following
+procedure were followed:
+
+1. Concatenate the message data from the above sources.
+2. Zero-pad to the next multiple of k blocks. Let msg be the resulting byte
+   array, and msglen its length in bytes.
+3. For 0 <= i < msglen / k (for each RS codeword):
+   a. Select msg[i + j * msglen / k] for 0 <= j < k.
+      Consider these to be the 'k' message bytes of an RS codeword.
+   b. Compute the corresponding 'fec_roots' parity bytes of the RS codeword,
+      and concatenate them to the FEC parity data.
+
+Step 3a interleaves the RS codewords across the entire device using an
+interleaving degree of data_block_size * ceil(fec_blocks / k). This is the
+maximal interleaving, such that the message data consists of a region containing
+byte 0 of all the RS codewords, then a region containing byte 1 of all the RS
+codewords, and so on up to the region for byte 'k - 1'. Note that the number of
+codewords is set to a multiple of data_block_size; thus, the regions are
+block-aligned, and there is an implicit zero padding of up to 'k - 1' blocks.
+
+This interleaving allows long bursts of errors to be corrected. It provides
+much stronger error correction than storage devices typically provide, while
+keeping the space overhead low.
+
+The cost is slow decoding: correcting a single block usually requires reading
+254 extra blocks spread evenly across the device(s). However, that is
+acceptable because dm-verity uses FEC only when there is actually an error.
+
+The list below contains additional details about the RS codes used by
+dm-verity's FEC. Userspace programs that generate the parity data need to use
+these parameters for the parity data to match exactly:
+
+- Field used is GF(256)
+- Bytes are mapped to/from GF(256) elements in the natural way, where bits 0
+  through 7 (low-order to high-order) map to the coefficients of x^0 through x^7
+- Field generator polynomial is x^8 + x^4 + x^3 + x^2 + 1
+- The codes used are systematic, BCH-view codes
+- Primitive element alpha is 'x'
+- First consecutive root of code generator polynomial is 'x^0'

 On-disk format
 ==============
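To make step 3a concrete, here is a small illustrative C sketch (not kernel
code; the helper name and its parameters are invented for this note) that
prints which message-byte offsets make up one RS codeword under the documented
selection rule:

    #include <stdint.h>
    #include <stdio.h>

    #define FEC_ROOTS 2                    /* the common configuration */
    #define RS_K (255 - FEC_ROOTS)         /* message bytes per codeword */

    /*
     * For codeword i (0 <= i < msglen / RS_K), byte j of the codeword is
     * msg[i + j * (msglen / RS_K)], i.e. one byte from each of the k
     * block-aligned regions described above.
     */
    static void print_codeword_offsets(uint64_t msglen, uint64_t i)
    {
        uint64_t stride = msglen / RS_K;   /* region size in bytes */
        unsigned int j;

        for (j = 0; j < RS_K; j++)
            printf("codeword %llu, byte %u <- msg[%llu]\n",
                   (unsigned long long)i, j,
                   (unsigned long long)(i + j * stride));
    }

Repairing one corrupt block therefore touches one byte in each region, which is
why a single-block recovery reads roughly 254 other blocks spread across the
device(s), as noted above.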
+2
drivers/md/Kconfig
···
 	select BLOCK_HOLDER_DEPRECATED if SYSFS
 	select BLK_DEV_DM_BUILTIN
 	select BLK_MQ_STACKING
+	select CRYPTO_LIB_SHA256 if IMA
 	depends on DAX || DAX=n
 	help
 	  Device-mapper is a low level volume manager. It works by allowing
···
 	select CRYPTO
 	select CRYPTO_CBC
 	select CRYPTO_ESSIV
+	select CRYPTO_LIB_AES
 	select CRYPTO_LIB_MD5	# needed by lmk IV mode
 	help
 	  This device-mapper target allows you to create a device that
+2 -2
drivers/md/dm-bufio.c
···
 	 */
 	unsigned int num_locks;
 	bool no_sleep;
-	struct buffer_tree trees[];
+	struct buffer_tree trees[] __counted_by(num_locks);
 };

 static DEFINE_STATIC_KEY_FALSE(no_sleep_enabled);
···
 	}

 	num_locks = dm_num_hash_locks();
-	c = kzalloc(sizeof(*c) + (num_locks * sizeof(struct buffer_tree)), GFP_KERNEL);
+	c = kzalloc_flex(*c, cache.trees, num_locks);
 	if (!c) {
 		r = -ENOMEM;
 		goto bad_client;
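This hunk is an instance of the kernel's flexible-array hardening pattern:
__counted_by() lets the compiler bounds-check trees[], and the new
kzalloc_flex() helper replaces the hand-written size arithmetic. A minimal
sketch of the long-standing open-coded equivalent, using the struct_size()
helper (struct names here are stand-ins, not the real dm-bufio types):

    #include <linux/overflow.h>
    #include <linux/rbtree.h>
    #include <linux/slab.h>
    #include <linux/spinlock.h>

    /* Stand-in for dm-bufio's buffer_tree; the exact shape doesn't matter. */
    struct buffer_tree {
        struct rb_root root;
        spinlock_t lock;
    };

    struct tree_table {
        unsigned int num_locks;
        struct buffer_tree trees[] __counted_by(num_locks);
    };

    /*
     * struct_size() computes sizeof(*t) + n * sizeof(t->trees[0]) with
     * overflow checking; num_locks is set before trees[] is touched so
     * that __counted_by() bounds checks see the right count.
     */
    static struct tree_table *tree_table_alloc(unsigned int n)
    {
        struct tree_table *t = kzalloc(struct_size(t, trees, n), GFP_KERNEL);

        if (t)
            t->num_locks = n;
        return t;
    }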
+17 -16
drivers/md/dm-cache-metadata.c
···
 		return;						\
 	} while (0)

+#define WRITE_LOCK_OR_GOTO(cmd, label)			\
+	do {						\
+		if (!cmd_write_lock((cmd)))		\
+			goto label;			\
+	} while (0)
+
 #define WRITE_UNLOCK(cmd) \
 	up_write(&(cmd)->root_lock)
···
 	return r;
 }

-int dm_cache_metadata_all_clean(struct dm_cache_metadata *cmd, bool *result)
-{
-	int r;
-
-	READ_LOCK(cmd);
-	r = blocks_are_unmapped_or_clean(cmd, 0, cmd->cache_blocks, result);
-	READ_UNLOCK(cmd);
-
-	return r;
-}
-
 void dm_cache_metadata_set_read_only(struct dm_cache_metadata *cmd)
 {
 	WRITE_LOCK_VOID(cmd);
···
 	new_bm = dm_block_manager_create(cmd->bdev, DM_CACHE_METADATA_BLOCK_SIZE << SECTOR_SHIFT,
 					 CACHE_MAX_CONCURRENT_LOCKS);

-	WRITE_LOCK(cmd);
-	if (cmd->fail_io) {
-		WRITE_UNLOCK(cmd);
-		goto out;
-	}
+	/* cmd_write_lock() already checks fail_io with cmd->root_lock held */
+	WRITE_LOCK_OR_GOTO(cmd, out);

 	__destroy_persistent_data_objects(cmd, false);
 	old_bm = cmd->bm;
···
 		dm_block_manager_destroy(new_bm);

 	return r;
+}
+
+int dm_cache_metadata_clean_when_opened(struct dm_cache_metadata *cmd, bool *result)
+{
+	READ_LOCK(cmd);
+	*result = cmd->clean_when_opened;
+	READ_UNLOCK(cmd);
+
+	return 0;
 }
+5 -5
drivers/md/dm-cache-metadata.h
···
  */
 int dm_cache_write_hints(struct dm_cache_metadata *cmd, struct dm_cache_policy *p);

-/*
- * Query method. Are all the blocks in the cache clean?
- */
-int dm_cache_metadata_all_clean(struct dm_cache_metadata *cmd, bool *result);
-
 int dm_cache_metadata_needs_check(struct dm_cache_metadata *cmd, bool *result);
 int dm_cache_metadata_set_needs_check(struct dm_cache_metadata *cmd);
 void dm_cache_metadata_set_read_only(struct dm_cache_metadata *cmd);
 void dm_cache_metadata_set_read_write(struct dm_cache_metadata *cmd);
 int dm_cache_metadata_abort(struct dm_cache_metadata *cmd);
+
+/*
+ * Query method. Was the metadata cleanly shut down when opened?
+ */
+int dm_cache_metadata_clean_when_opened(struct dm_cache_metadata *cmd, bool *result);

 /*----------------------------------------------------------------*/
+4
drivers/md/dm-cache-policy-smq.c
···
 {
 	struct smq_policy *mq = to_smq_policy(p);
 	struct entry *e = get_entry(&mq->cache_alloc, from_cblock(cblock));
+	unsigned long flags;

 	if (!e->allocated)
 		return -ENODATA;

+	spin_lock_irqsave(&mq->lock, flags);
 	// FIXME: what if this block has pending background work?
 	del_queue(mq, e);
 	h_remove(&mq->table, e);
 	free_entry(&mq->cache_alloc, e);
+	spin_unlock_irqrestore(&mq->lock, flags);
+
 	return 0;
 }
+67 -25
drivers/md/dm-cache-target.c
···
 	struct cache *cache = mg->cache;

 	bio_list_init(&bios);
-	if (dm_cell_unlock_v2(cache->prison, mg->cell, &bios))
-		free_prison_cell(cache, mg->cell);
+	if (mg->cell) {
+		if (dm_cell_unlock_v2(cache->prison, mg->cell, &bios))
+			free_prison_cell(cache, mg->cell);
+	}

-	if (!success && mg->overwrite_bio)
-		bio_io_error(mg->overwrite_bio);
+	if (mg->overwrite_bio) {
+		// Set generic error if the bio hasn't been issued yet,
+		// e.g., invalidation or metadata commit failed before bio
+		// submission. Otherwise preserve the bio's own error status.
+		if (!success && !mg->overwrite_bio->bi_status)
+			mg->overwrite_bio->bi_status = BLK_STS_IOERR;
+		bio_endio(mg->overwrite_bio);
+	}

 	free_migration(mg);
 	defer_bios(cache, &bios);
···
 	return r;
 }

+static void invalidate_committed(struct work_struct *ws)
+{
+	struct dm_cache_migration *mg = ws_to_mg(ws);
+	struct cache *cache = mg->cache;
+	struct bio *bio = mg->overwrite_bio;
+	struct per_bio_data *pb = get_per_bio_data(bio);
+
+	if (mg->k.input) {
+		invalidate_complete(mg, false);
+		return;
+	}
+
+	init_continuation(&mg->k, invalidate_completed);
+	remap_to_origin_clear_discard(cache, bio, mg->invalidate_oblock);
+	dm_hook_bio(&pb->hook_info, bio, overwrite_endio, mg);
+	dm_submit_bio_remap(bio, NULL);
+}
+
 static void invalidate_remove(struct work_struct *ws)
 {
 	int r;
···
 		return;
 	}

-	init_continuation(&mg->k, invalidate_completed);
+	init_continuation(&mg->k, invalidate_committed);
 	continue_after_commit(&cache->committer, &mg->k);
-	remap_to_origin_clear_discard(cache, mg->overwrite_bio, mg->invalidate_oblock);
-	mg->overwrite_bio = NULL;
 	schedule_commit(&cache->committer);
 }
···
 			READ_WRITE_LOCK_LEVEL, prealloc, &mg->cell);
 	if (r < 0) {
 		free_prison_cell(cache, prealloc);
+
+		/* Defer the bio for retrying the cell lock */
+		if (mg->overwrite_bio) {
+			struct bio *bio = mg->overwrite_bio;
+
+			mg->overwrite_bio = NULL;
+			defer_bio(cache, bio);
+		}
+
 		invalidate_complete(mg, false);
 		return r;
 	}
···
 			bio_drop_shared_lock(cache, bio);
 			atomic_inc(&cache->stats.demotion);
 			invalidate_start(cache, cblock, block, bio);
+			return DM_MAPIO_SUBMITTED;
 		} else
 			remap_to_origin_clear_discard(cache, bio, block);
 	} else {
···
 		goto bad;
 	}

-	if (passthrough_mode(cache)) {
-		bool all_clean;
-
-		r = dm_cache_metadata_all_clean(cache->cmd, &all_clean);
-		if (r) {
-			*error = "dm_cache_metadata_all_clean() failed";
-			goto bad;
-		}
-
-		if (!all_clean) {
-			*error = "Cannot enter passthrough mode unless all blocks are clean";
-			r = -EINVAL;
-			goto bad;
-		}
-
+	if (passthrough_mode(cache))
 		policy_allow_migrations(cache->policy, false);
-	}

 	spin_lock_init(&cache->lock);
 	bio_list_init(&cache->deferred_bios);
···
 	struct cache *cache = context;

 	if (dirty) {
+		if (passthrough_mode(cache)) {
+			DMERR("%s: cannot enter passthrough mode unless all blocks are clean",
+			      cache_device_name(cache));
+			return -EBUSY;
+		}
+
 		set_bit(from_cblock(cblock), cache->dirty_bitset);
 		atomic_inc(&cache->nr_dirty);
 	} else
···

 static bool can_resume(struct cache *cache)
 {
+	bool clean_when_opened;
+	int r;
+
 	/*
 	 * Disallow retrying the resume operation for devices that failed the
 	 * first resume attempt, as the failure leaves the policy object partially
···
 		DMERR("%s: unable to resume cache due to missing proper cache table reload",
 		      cache_device_name(cache));
 		return false;
+	}
+
+	if (passthrough_mode(cache)) {
+		r = dm_cache_metadata_clean_when_opened(cache->cmd, &clean_when_opened);
+		if (r) {
+			DMERR("%s: failed to query metadata flags", cache_device_name(cache));
+			return false;
+		}
+
+		if (!clean_when_opened) {
+			DMERR("%s: unable to resume into passthrough mode after unclean shutdown",
+			      cache_device_name(cache));
+			return false;
+		}
 	}

 	return true;
···
 				   load_filtered_mapping, cache);
 	if (r) {
 		DMERR("%s: could not load cache mappings", cache_device_name(cache));
-		if (r != -EFBIG)
+		if (r != -EFBIG && r != -EBUSY)
 			metadata_operation_failed(cache, "dm_cache_load_mappings", r);
 		return r;
 	}
···
 static struct target_type cache_target = {
 	.name = "cache",
-	.version = {2, 3, 0},
+	.version = {2, 4, 0},
 	.module = THIS_MODULE,
 	.ctr = cache_ctr,
 	.dtr = cache_dtr,
+49 -91
drivers/md/dm-crypt.c
···
 #include <linux/ctype.h>
 #include <asm/page.h>
 #include <linux/unaligned.h>
+#include <crypto/aes.h>
 #include <crypto/hash.h>
 #include <crypto/md5.h>
 #include <crypto/skcipher.h>
···
 		   const char *opts);
 	void (*dtr)(struct crypt_config *cc);
 	int (*init)(struct crypt_config *cc);
-	int (*wipe)(struct crypt_config *cc);
+	void (*wipe)(struct crypt_config *cc);
 	int (*generator)(struct crypt_config *cc, u8 *iv,
 			 struct dm_crypt_request *dmreq);
-	int (*post)(struct crypt_config *cc, u8 *iv,
-		    struct dm_crypt_request *dmreq);
+	void (*post)(struct crypt_config *cc, u8 *iv,
+		     struct dm_crypt_request *dmreq);
 };

 struct iv_benbi_private {
···
 #define ELEPHANT_MAX_KEY_SIZE	32
 struct iv_elephant_private {
-	struct crypto_skcipher *tfm;
+	struct aes_enckey *key;
 };

 /*
···
 	return 0;
 }

-static int crypt_iv_lmk_wipe(struct crypt_config *cc)
+static void crypt_iv_lmk_wipe(struct crypt_config *cc)
 {
 	struct iv_lmk_private *lmk = &cc->iv_gen_private.lmk;

 	if (lmk->seed)
 		memset(lmk->seed, 0, LMK_SEED_SIZE);
-
-	return 0;
 }

 static void crypt_iv_lmk_one(struct crypt_config *cc, u8 *iv,
···
 	return 0;
 }

-static int crypt_iv_lmk_post(struct crypt_config *cc, u8 *iv,
-			     struct dm_crypt_request *dmreq)
+static void crypt_iv_lmk_post(struct crypt_config *cc, u8 *iv,
+			      struct dm_crypt_request *dmreq)
 {
 	struct scatterlist *sg;
 	u8 *dst;

 	if (bio_data_dir(dmreq->ctx->bio_in) == WRITE)
-		return 0;
+		return;

 	sg = crypt_get_sg_data(cc, dmreq->sg_out);
 	dst = kmap_local_page(sg_page(sg));
···
 	crypto_xor(dst + sg->offset, iv, cc->iv_size);

 	kunmap_local(dst);
-	return 0;
 }

 static void crypt_iv_tcw_dtr(struct crypt_config *cc)
···
 	return 0;
 }

-static int crypt_iv_tcw_wipe(struct crypt_config *cc)
+static void crypt_iv_tcw_wipe(struct crypt_config *cc)
 {
 	struct iv_tcw_private *tcw = &cc->iv_gen_private.tcw;

 	memset(tcw->iv_seed, 0, cc->iv_size);
 	memset(tcw->whitening, 0, TCW_WHITENING_SIZE);
-
-	return 0;
 }

 static void crypt_iv_tcw_whitening(struct crypt_config *cc,
···
 	return 0;
 }

-static int crypt_iv_tcw_post(struct crypt_config *cc, u8 *iv,
-			     struct dm_crypt_request *dmreq)
+static void crypt_iv_tcw_post(struct crypt_config *cc, u8 *iv,
+			      struct dm_crypt_request *dmreq)
 {
 	struct scatterlist *sg;
 	u8 *dst;

 	if (bio_data_dir(dmreq->ctx->bio_in) != WRITE)
-		return 0;
+		return;

 	/* Apply whitening on ciphertext */
 	sg = crypt_get_sg_data(cc, dmreq->sg_out);
 	dst = kmap_local_page(sg_page(sg));
 	crypt_iv_tcw_whitening(cc, dmreq, dst + sg->offset);
 	kunmap_local(dst);
-
-	return 0;
 }

 static int crypt_iv_random_gen(struct crypt_config *cc, u8 *iv,
···
 {
 	struct iv_elephant_private *elephant = &cc->iv_gen_private.elephant;

-	crypto_free_skcipher(elephant->tfm);
-	elephant->tfm = NULL;
+	kfree_sensitive(elephant->key);
+	elephant->key = NULL;
 }

 static int crypt_iv_elephant_ctr(struct crypt_config *cc, struct dm_target *ti,
···
 	struct iv_elephant_private *elephant = &cc->iv_gen_private.elephant;
 	int r;

-	elephant->tfm = crypto_alloc_skcipher("ecb(aes)", 0,
-					      CRYPTO_ALG_ALLOCATES_MEMORY);
-	if (IS_ERR(elephant->tfm)) {
-		r = PTR_ERR(elephant->tfm);
-		elephant->tfm = NULL;
-		return r;
-	}
+	elephant->key = kmalloc_obj(*elephant->key);
+	if (!elephant->key)
+		return -ENOMEM;

 	r = crypt_iv_eboiv_ctr(cc, ti, NULL);
 	if (r)
···
 	}
 }

-static int crypt_iv_elephant(struct crypt_config *cc, struct dm_crypt_request *dmreq)
+static void crypt_iv_elephant(struct crypt_config *cc,
+			      struct dm_crypt_request *dmreq)
 {
 	struct iv_elephant_private *elephant = &cc->iv_gen_private.elephant;
-	u8 *es, *ks, *data, *data2, *data_offset;
-	struct skcipher_request *req;
-	struct scatterlist *sg, *sg2, src, dst;
-	DECLARE_CRYPTO_WAIT(wait);
-	int i, r;
+	u8 *data, *data2, *data_offset;
+	struct scatterlist *sg, *sg2;
+	union {
+		__le64 w[2];
+		u8 b[16];
+	} es;
+	u8 ks[32] __aligned(__alignof(long));	/* Elephant sector key */
+	int i;

-	req = skcipher_request_alloc(elephant->tfm, GFP_NOIO);
-	es = kzalloc(16, GFP_NOIO); /* Key for AES */
-	ks = kzalloc(32, GFP_NOIO); /* Elephant sector key */
-
-	if (!req || !es || !ks) {
-		r = -ENOMEM;
-		goto out;
-	}
-
-	*(__le64 *)es = cpu_to_le64(dmreq->iv_sector * cc->sector_size);
+	es.w[0] = cpu_to_le64(dmreq->iv_sector * cc->sector_size);
+	es.w[1] = 0;

 	/* E(Ks, e(s)) */
-	sg_init_one(&src, es, 16);
-	sg_init_one(&dst, ks, 16);
-	skcipher_request_set_crypt(req, &src, &dst, 16, NULL);
-	skcipher_request_set_callback(req, 0, crypto_req_done, &wait);
-	r = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);
-	if (r)
-		goto out;
+	aes_encrypt(elephant->key, &ks[0], es.b);

 	/* E(Ks, e'(s)) */
-	es[15] = 0x80;
-	sg_init_one(&dst, &ks[16], 16);
-	r = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);
-	if (r)
-		goto out;
+	es.b[15] = 0x80;
+	aes_encrypt(elephant->key, &ks[16], es.b);

 	sg = crypt_get_sg_data(cc, dmreq->sg_out);
 	data = kmap_local_page(sg_page(sg));
···
 	}

 	kunmap_local(data);
-out:
-	kfree_sensitive(ks);
-	kfree_sensitive(es);
-	skcipher_request_free(req);
-	return r;
+	memzero_explicit(ks, sizeof(ks));
+	memzero_explicit(&es, sizeof(es));
 }

 static int crypt_iv_elephant_gen(struct crypt_config *cc, u8 *iv,
 				 struct dm_crypt_request *dmreq)
 {
-	int r;
-
-	if (bio_data_dir(dmreq->ctx->bio_in) == WRITE) {
-		r = crypt_iv_elephant(cc, dmreq);
-		if (r)
-			return r;
-	}
+	if (bio_data_dir(dmreq->ctx->bio_in) == WRITE)
+		crypt_iv_elephant(cc, dmreq);

 	return crypt_iv_eboiv_gen(cc, iv, dmreq);
 }

-static int crypt_iv_elephant_post(struct crypt_config *cc, u8 *iv,
-				  struct dm_crypt_request *dmreq)
+static void crypt_iv_elephant_post(struct crypt_config *cc, u8 *iv,
+				   struct dm_crypt_request *dmreq)
 {
 	if (bio_data_dir(dmreq->ctx->bio_in) != WRITE)
-		return crypt_iv_elephant(cc, dmreq);
-
-	return 0;
+		crypt_iv_elephant(cc, dmreq);
 }

 static int crypt_iv_elephant_init(struct crypt_config *cc)
···
 	struct iv_elephant_private *elephant = &cc->iv_gen_private.elephant;
 	int key_offset = cc->key_size - cc->key_extra_size;

-	return crypto_skcipher_setkey(elephant->tfm, &cc->key[key_offset], cc->key_extra_size);
+	return aes_prepareenckey(elephant->key, &cc->key[key_offset], cc->key_extra_size);
 }

-static int crypt_iv_elephant_wipe(struct crypt_config *cc)
+static void crypt_iv_elephant_wipe(struct crypt_config *cc)
 {
 	struct iv_elephant_private *elephant = &cc->iv_gen_private.elephant;
-	u8 key[ELEPHANT_MAX_KEY_SIZE];

-	memset(key, 0, cc->key_extra_size);
-	return crypto_skcipher_setkey(elephant->tfm, key, cc->key_extra_size);
+	memzero_explicit(elephant->key, sizeof(*elephant->key));
 }

 static const struct crypt_iv_operations crypt_iv_plain_ops = {
···
 	}

 	if (!r && cc->iv_gen_ops && cc->iv_gen_ops->post)
-		r = cc->iv_gen_ops->post(cc, org_iv, dmreq);
+		cc->iv_gen_ops->post(cc, org_iv, dmreq);

 	bio_advance_iter(ctx->bio_in, &ctx->iter_in, cc->sector_size);
 	bio_advance_iter(ctx->bio_out, &ctx->iter_out, cc->sector_size);
···
 	r = crypto_skcipher_decrypt(req);

 	if (!r && cc->iv_gen_ops && cc->iv_gen_ops->post)
-		r = cc->iv_gen_ops->post(cc, org_iv, dmreq);
+		cc->iv_gen_ops->post(cc, org_iv, dmreq);

 	bio_advance_iter(ctx->bio_in, &ctx->iter_in, cc->sector_size);
 	bio_advance_iter(ctx->bio_out, &ctx->iter_out, cc->sector_size);
···
 	}

 	if (!error && cc->iv_gen_ops && cc->iv_gen_ops->post)
-		error = cc->iv_gen_ops->post(cc, org_iv_of_dmreq(cc, dmreq), dmreq);
+		cc->iv_gen_ops->post(cc, org_iv_of_dmreq(cc, dmreq), dmreq);

 	if (error == -EBADMSG) {
 		sector_t s = le64_to_cpu(*org_sector_of_dmreq(cc, dmreq));
···
 	get_random_bytes(&cc->key, cc->key_size);

 	/* Wipe IV private keys */
-	if (cc->iv_gen_ops && cc->iv_gen_ops->wipe) {
-		r = cc->iv_gen_ops->wipe(cc);
-		if (r)
-			return r;
-	}
+	if (cc->iv_gen_ops && cc->iv_gen_ops->wipe)
+		cc->iv_gen_ops->wipe(cc);

 	kfree_sensitive(cc->key_string);
 	cc->key_string = NULL;
···
 {
 	struct crypt_config *cc = ti->private;

-	limits->logical_block_size =
-		max_t(unsigned int, limits->logical_block_size, cc->sector_size);
-	limits->physical_block_size =
-		max_t(unsigned int, limits->physical_block_size, cc->sector_size);
-	limits->io_min = max_t(unsigned int, limits->io_min, cc->sector_size);
+	dm_stack_bs_limits(limits, cc->sector_size);
 	limits->dma_alignment = limits->logical_block_size - 1;

 	/*
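The Elephant sector-key derivation in the hunk above (two AES-ECB encryptions
of the encoded sector number, E(Ks, e(s)) and E(Ks, e'(s))) no longer needs a
skcipher request at all. A sketch of the same derivation written against the
long-standing AES library interface (crypto_aes_ctx, aes_expandkey,
aes_encrypt); the patch itself uses the newer aes_enckey/aes_prepareenckey
variant of that library:

    #include <crypto/aes.h>
    #include <linux/kernel.h>
    #include <linux/string.h>

    /* Derive the 32-byte Elephant sector key ks from the sector's byte
     * offset; 'key' must already have been expanded with aes_expandkey(). */
    static void elephant_sector_key(const struct crypto_aes_ctx *key,
                                    u64 sector_bytes, u8 ks[32])
    {
        union {
            __le64 w[2];
            u8 b[16];
        } es = { .w = { cpu_to_le64(sector_bytes), 0 } };

        aes_encrypt(key, &ks[0], es.b);    /* E(Ks, e(s)) */
        es.b[15] = 0x80;
        aes_encrypt(key, &ks[16], es.b);   /* E(Ks, e'(s)) */
        memzero_explicit(&es, sizeof(es));
    }

Because the library call is synchronous and never fails, the ->post and ->wipe
hooks could be changed to return void, which is what ripples through the rest
of this diff.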
+10 -44
drivers/md/dm-ima.c
···
 #include <linux/ima.h>
 #include <linux/sched/mm.h>
-#include <crypto/hash.h>
-#include <linux/crypto.h>
-#include <crypto/hash_info.h>
+#include <crypto/sha2.h>

 #define DM_MSG_PREFIX "ima"
···
 	size_t device_data_buf_len, target_metadata_buf_len, target_data_buf_len, l = 0;
 	char *target_metadata_buf = NULL, *target_data_buf = NULL, *digest_buf = NULL;
 	char *ima_buf = NULL, *device_data_buf = NULL;
-	int digest_size, last_target_measured = -1, r;
+	int last_target_measured = -1;
 	status_type_t type = STATUSTYPE_IMA;
 	size_t cur_total_buf_len = 0;
 	unsigned int num_targets, i;
-	SHASH_DESC_ON_STACK(shash, NULL);
-	struct crypto_shash *tfm = NULL;
-	u8 *digest = NULL;
+	struct sha256_ctx hash_ctx;
+	u8 digest[SHA256_DIGEST_SIZE];
 	bool noio = false;
-	/*
-	 * In below hash_alg_prefix_len assignment +1 is for the additional char (':'),
-	 * when prefixing the hash value with the hash algorithm name. e.g. sha256:<hash_value>.
-	 */
-	const size_t hash_alg_prefix_len = strlen(DM_IMA_TABLE_HASH_ALG) + 1;
 	char table_load_event_name[] = "dm_table_load";

 	ima_buf = dm_ima_alloc(DM_IMA_MEASUREMENT_BUF_LEN, noio);
···
 	if (dm_ima_alloc_and_copy_device_data(table->md, &device_data_buf, num_targets, noio))
 		goto error;

-	tfm = crypto_alloc_shash(DM_IMA_TABLE_HASH_ALG, 0, 0);
-	if (IS_ERR(tfm))
-		goto error;
-
-	shash->tfm = tfm;
-	digest_size = crypto_shash_digestsize(tfm);
-	digest = dm_ima_alloc(digest_size, noio);
-	if (!digest)
-		goto error;
-
-	r = crypto_shash_init(shash);
-	if (r)
-		goto error;
+	sha256_init(&hash_ctx);

 	memcpy(ima_buf + l, DM_IMA_VERSION_STR, table->md->ima.dm_version_str_len);
 	l += table->md->ima.dm_version_str_len;
···
 		 */
 		if (unlikely(cur_total_buf_len >= DM_IMA_MEASUREMENT_BUF_LEN)) {
 			dm_ima_measure_data(table_load_event_name, ima_buf, l, noio);
-			r = crypto_shash_update(shash, (const u8 *)ima_buf, l);
-			if (r < 0)
-				goto error;
+			sha256_update(&hash_ctx, (const u8 *)ima_buf, l);

 			memset(ima_buf, 0, DM_IMA_MEASUREMENT_BUF_LEN);
 			l = 0;
···
 	if (!last_target_measured) {
 		dm_ima_measure_data(table_load_event_name, ima_buf, l, noio);

-		r = crypto_shash_update(shash, (const u8 *)ima_buf, l);
-		if (r < 0)
-			goto error;
+		sha256_update(&hash_ctx, (const u8 *)ima_buf, l);
 	}

 	/*
···
 	 * so that the table data can be verified against the future device state change
 	 * events, e.g. resume, rename, remove, table-clear etc.
 	 */
-	r = crypto_shash_final(shash, digest);
-	if (r < 0)
-		goto error;
+	sha256_final(&hash_ctx, digest);

-	digest_buf = dm_ima_alloc((digest_size*2) + hash_alg_prefix_len + 1, noio);
-
+	digest_buf = kasprintf(GFP_KERNEL, "sha256:%*phN", SHA256_DIGEST_SIZE,
+			       digest);
 	if (!digest_buf)
 		goto error;
-
-	snprintf(digest_buf, hash_alg_prefix_len + 1, "%s:", DM_IMA_TABLE_HASH_ALG);
-
-	for (i = 0; i < digest_size; i++)
-		snprintf((digest_buf + hash_alg_prefix_len + (i*2)), 3, "%02x", digest[i]);

 	if (table->md->ima.active_table.hash != table->md->ima.inactive_table.hash)
 		kfree(table->md->ima.inactive_table.hash);
···
 	kfree(digest_buf);
 	kfree(device_data_buf);
 exit:
-	kfree(digest);
-	if (tfm)
-		crypto_free_shash(tfm);
 	kfree(ima_buf);
 	kfree(target_metadata_buf);
 	kfree(target_data_buf);
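Two library conversions are folded together here: the dynamic crypto_shash is
replaced by the direct SHA-256 library calls (sha256_init/update/final on a
stack context), and the hand-rolled hex loop is replaced by the kernel's %*phN
printk extension, which prints a buffer as contiguous hex bytes (limited to 64
bytes, which comfortably covers a 32-byte digest). A minimal sketch of the
formatting half, with an invented helper name:

    #include <linux/kernel.h>
    #include <linux/slab.h>

    /* Format a 32-byte digest as "sha256:<64 hex chars>"; the caller
     * frees the returned string with kfree(). */
    static char *format_digest(const u8 digest[32])
    {
        return kasprintf(GFP_KERNEL, "sha256:%*phN", 32, digest);
    }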
-1
drivers/md/dm-ima.h
···
 #define DM_IMA_TARGET_METADATA_BUF_LEN	128
 #define DM_IMA_TARGET_DATA_BUF_LEN	2048
 #define DM_IMA_DEVICE_CAPACITY_BUF_LEN	128
-#define DM_IMA_TABLE_HASH_ALG		"sha256"

 #define __dm_ima_stringify(s) #s
 #define __dm_ima_str(s) __dm_ima_stringify(s)
+3 -1
drivers/md/dm-init.c
···
 		}
 	}

-	if (waitfor[0])
+	if (waitfor[0]) {
+		wait_for_device_probe();
 		DMINFO("all devices available");
+	}

 	list_for_each_entry(dev, &devices, list) {
 		if (dm_early_create(&dev->dmi, dev->table,
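This is the "make dm_mod.waitfor correctly wait for partitions" fix from the
merge description: adding wait_for_device_probe() ensures asynchronously
probed block devices (and their partitions) have actually appeared before
dm-init proceeds. Per Documentation/admin-guide/device-mapper/dm-init.rst,
waitfor takes a comma-separated device list alongside dm-mod.create=. A
hypothetical boot line (device names are illustrative):

    dm-mod.waitfor=/dev/sda2 dm-mod.create="lroot,,,rw, 0 4096 linear /dev/sda2 0" root=/dev/dm-0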
+3 -7
drivers/md/dm-integrity.c
···
 {
 	struct dm_integrity_c *ic = ti->private;

-	if (ic->sectors_per_block > 1) {
-		limits->logical_block_size = ic->sectors_per_block << SECTOR_SHIFT;
-		limits->physical_block_size = ic->sectors_per_block << SECTOR_SHIFT;
-		limits->io_min = ic->sectors_per_block << SECTOR_SHIFT;
-		limits->dma_alignment = limits->logical_block_size - 1;
-		limits->discard_granularity = ic->sectors_per_block << SECTOR_SHIFT;
-	}
+	dm_stack_bs_limits(limits, ic->sectors_per_block << SECTOR_SHIFT);
+	limits->dma_alignment = limits->logical_block_size - 1;
+	limits->discard_granularity = ic->sectors_per_block << SECTOR_SHIFT;

 	if (!ic->internal_hash) {
 		struct blk_integrity *bi = &limits->integrity;
+29 -10
drivers/md/dm-ioctl.c
···
 static struct rb_root name_rb_tree = RB_ROOT;
 static struct rb_root uuid_rb_tree = RB_ROOT;

-static void dm_hash_remove_all(bool keep_open_devices, bool mark_deferred, bool only_deferred);
+#define DM_REMOVE_KEEP_OPEN_DEVICES	1
+#define DM_REMOVE_MARK_DEFERRED		2
+#define DM_REMOVE_ONLY_DEFERRED		4
+#define DM_REMOVE_INTERRUPTIBLE		8
+static int dm_hash_remove_all(unsigned flags);

 /*
  * Guards access to both hash tables.
···
 static void dm_hash_exit(void)
 {
-	dm_hash_remove_all(false, false, false);
+	dm_hash_remove_all(0);
 }

 /*
···
 	return table;
 }

-static void dm_hash_remove_all(bool keep_open_devices, bool mark_deferred, bool only_deferred)
+static int dm_hash_remove_all(unsigned flags)
 {
 	int dev_skipped;
 	struct rb_node *n;
···
 	down_write(&_hash_lock);

 	for (n = rb_first(&name_rb_tree); n; n = rb_next(n)) {
+		if (flags & DM_REMOVE_INTERRUPTIBLE && fatal_signal_pending(current)) {
+			up_write(&_hash_lock);
+			return -EINTR;
+		}
+
 		hc = container_of(n, struct hash_cell, name_node);
 		md = hc->md;
 		dm_get(md);

-		if (keep_open_devices &&
-		    dm_lock_for_deletion(md, mark_deferred, only_deferred)) {
+		if (flags & DM_REMOVE_KEEP_OPEN_DEVICES &&
+		    dm_lock_for_deletion(md, !!(flags & DM_REMOVE_MARK_DEFERRED), !!(flags & DM_REMOVE_ONLY_DEFERRED))) {
 			dm_put(md);
 			dev_skipped++;
 			continue;
···
 		}
 		dm_ima_measure_on_device_remove(md, true);
 		dm_put(md);
-		if (likely(keep_open_devices))
+		if (likely(flags & DM_REMOVE_KEEP_OPEN_DEVICES))
 			dm_destroy(md);
 		else
 			dm_destroy_immediate(md);
···
 	up_write(&_hash_lock);

-	if (dev_skipped)
+	if (dev_skipped && !(flags & DM_REMOVE_ONLY_DEFERRED))
 		DMWARN("remove_all left %d open device(s)", dev_skipped);
+
+	return 0;
 }

 /*
···
 void dm_deferred_remove(void)
 {
-	dm_hash_remove_all(true, false, true);
+	dm_hash_remove_all(DM_REMOVE_KEEP_OPEN_DEVICES | DM_REMOVE_ONLY_DEFERRED);
 }

 /*
···
 static int remove_all(struct file *filp, struct dm_ioctl *param, size_t param_size)
 {
-	dm_hash_remove_all(true, !!(param->flags & DM_DEFERRED_REMOVE), false);
+	int r;
+	int flags = DM_REMOVE_KEEP_OPEN_DEVICES | DM_REMOVE_INTERRUPTIBLE;
+
+	if (param->flags & DM_DEFERRED_REMOVE)
+		flags |= DM_REMOVE_MARK_DEFERRED;
+
+	r = dm_hash_remove_all(flags);
 	param->data_size = 0;
-	return 0;
+	return r;
 }

 /*
···
 		used = param->data_start + (outptr - outbuf);

 		outptr = align_ptr(outptr);
+		if (!outptr || outptr > outbuf + len) {
+			param->flags |= DM_BUFFER_FULL_FLAG;
+			break;
+		}
 		spec->next = outptr - outbuf;
 	}
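The final hunk is the fix for the ioctl buffer overflow called out in the
merge description: align_ptr() can step an output pointer that already sits
at the end of the ioctl output buffer past its bounds (or wrap it to NULL on
overflow) before the offset is stored into spec->next. A minimal sketch of
the re-check, with a hypothetical helper name:

    #include <linux/types.h>

    /* The aligned pointer is only usable if it still lies within the
     * output buffer; otherwise the caller must set DM_BUFFER_FULL_FLAG
     * and stop emitting dependency records. */
    static inline bool deps_ptr_in_bounds(char *aligned, char *outbuf, size_t len)
    {
        return aligned && aligned <= outbuf + len;
    }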
+5 -1
drivers/md/dm-log.c
···

 	struct log_c *lc;
 	uint32_t region_size;
-	unsigned int region_count;
+	sector_t region_count;
 	size_t bitset_size, buf_size;
 	int r;
 	char dummy;
···
 	}

 	region_count = dm_sector_div_up(ti->len, region_size);
+	if (region_count > UINT_MAX) {
+		DMWARN("region count exceeds limit of %u", UINT_MAX);
+		return -EINVAL;
+	}

 	lc = kmalloc_obj(*lc);
 	if (!lc) {
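This is the "overflow crash on large devices with small region sizes" fix
from the merge description. The arithmetic is plain truncation: region_count
is ti->len divided by region_size, both in 512-byte sectors, so a 4 TiB
mirror (2^33 sectors) with a pathologically small region size of one sector
(if userspace requests it) yields 2^33 regions, which wrapped to 0 in the old
32-bit unsigned counter and led to a zero-sized region bitset. Widening the
intermediate to sector_t and rejecting counts above UINT_MAX closes the hole.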
+2 -7
drivers/md/dm-mpath.c
···
 	struct bio_list queued_bios;

 	struct timer_list nopath_timer;	/* Timeout for queue_if_no_path */
-	bool is_suspending;
 };

 /*
···
 {
 	struct multipath *m = ti->private;

-	spin_lock_irq(&m->lock);
-	m->is_suspending = true;
-	spin_unlock_irq(&m->lock);
 	/* FIXME: bio-based shouldn't need to always disable queue_if_no_path */
 	if (m->queue_mode == DM_TYPE_BIO_BASED || !dm_noflush_suspending(m->ti))
 		queue_if_no_path(m, false, true, __func__);
···
 	struct multipath *m = ti->private;

 	spin_lock_irq(&m->lock);
-	m->is_suspending = false;
 	if (test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags)) {
 		set_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags);
 		clear_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags);
···
 		if (m->current_pg == m->last_probed_pg)
 			goto skip_probe;
 	}
-	if (!m->current_pg || m->is_suspending ||
+	if (!m->current_pg || dm_suspended(m->ti) ||
 	    test_bit(MPATHF_QUEUE_IO, &m->flags))
 		goto skip_probe;
 	set_bit(MPATHF_DELAY_PG_SWITCH, &m->flags);
···
 	list_for_each_entry(pgpath, &pg->pgpaths, list) {
 		if (pg != READ_ONCE(m->current_pg) ||
-		    READ_ONCE(m->is_suspending))
+		    dm_suspended(m->ti))
 			goto out;
 		if (!pgpath->is_active)
 			continue;
+3 -3
drivers/md/dm-raid1.c
···
 		return NULL;
 	}

-	*args_used = 2 + param_count;
-
-	if (argc < *args_used) {
+	if (param_count > argc - 2) {
 		ti->error = "Insufficient mirror log arguments";
 		return NULL;
 	}
+
+	*args_used = 2 + param_count;

 	dl = dm_dirty_log_create(argv[0], ti, mirror_flush, param_count,
 				 argv + 2);
+1 -1
drivers/md/dm-vdo/action-manager.c
···
 			  struct action_manager **manager_ptr)
 {
 	struct action_manager *manager;
-	int result = vdo_allocate(1, struct action_manager, __func__, &manager);
+	int result = vdo_allocate(1, __func__, &manager);

 	if (result != VDO_SUCCESS)
 		return result;
+10 -22
drivers/md/dm-vdo/block-map.c
···
 	u64 size = cache->page_count * (u64) VDO_BLOCK_SIZE;
 	int result;

-	result = vdo_allocate(cache->page_count, struct page_info, "page infos",
-			      &cache->infos);
+	result = vdo_allocate(cache->page_count, "page infos", &cache->infos);
 	if (result != VDO_SUCCESS)
 		return result;
···
 	forest->segments = index + 1;

-	result = vdo_allocate(forest->segments, struct boundary,
-			      "forest boundary array", &forest->boundaries);
+	result = vdo_allocate(forest->segments, "forest boundary array", &forest->boundaries);
 	if (result != VDO_SUCCESS)
 		return result;

-	result = vdo_allocate(forest->segments, struct tree_page *,
-			      "forest page pointers", &forest->pages);
+	result = vdo_allocate(forest->segments, "forest page pointers", &forest->pages);
 	if (result != VDO_SUCCESS)
 		return result;

-	result = vdo_allocate(new_pages, struct tree_page,
-			      "new forest pages", &forest->pages[index]);
+	result = vdo_allocate(new_pages, "new forest pages", &forest->pages[index]);
 	if (result != VDO_SUCCESS)
 		return result;
···
 		struct block_map_tree *tree = &(forest->trees[root]);
 		height_t height;

-		int result = vdo_allocate(forest->segments,
-					  struct block_map_tree_segment,
-					  "tree root segments", &tree->segments);
+		result = vdo_allocate(forest->segments, "tree root segments", &tree->segments);
 		if (result != VDO_SUCCESS)
 			return result;
···
 		return VDO_SUCCESS;
 	}

-	result = vdo_allocate_extended(struct forest, map->root_count,
-				       struct block_map_tree, __func__,
-				       &forest);
+	result = vdo_allocate_extended(map->root_count, trees, __func__, &forest);
 	if (result != VDO_SUCCESS)
 		return result;
···
 	struct cursors *cursors;
 	int result;

-	result = vdo_allocate_extended(struct cursors, map->root_count,
-				       struct cursor, __func__, &cursors);
+	result = vdo_allocate_extended(map->root_count, cursors, __func__, &cursors);
 	if (result != VDO_SUCCESS) {
 		vdo_fail_completion(completion, result);
 		return;
···
 	zone->thread_id = vdo->thread_config.logical_threads[zone_number];
 	zone->block_map = map;

-	result = vdo_allocate_extended(struct dirty_lists, maximum_age,
-				       dirty_era_t, __func__,
-				       &zone->dirty_lists);
+	result = vdo_allocate_extended(maximum_age, eras, __func__, &zone->dirty_lists);
 	if (result != VDO_SUCCESS)
 		return result;
···
 	if (result != VDO_SUCCESS)
 		return result;

-	result = vdo_allocate_extended(struct block_map,
-				       vdo->thread_config.logical_zone_count,
-				       struct block_map_zone, __func__, &map);
+	result = vdo_allocate_extended(vdo->thread_config.logical_zone_count,
+				       zones, __func__, &map);
 	if (result != VDO_SUCCESS)
 		return result;
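Judging by the converted call sites throughout the vdo files in this pull,
vdo_allocate() no longer takes an explicit element type (it is now inferred
from the output pointer), and vdo_allocate_extended() now names the
flexible-array member (trees, cursors, eras, zones) rather than passing the
containing type and trailing element type separately; the count argument is
unchanged. This mirrors the kzalloc_flex()/__counted_by() conversion seen in
dm-bufio above.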
+1 -1
drivers/md/dm-vdo/block-map.h
···
 	block_count_t next_entry_count;

 	zone_count_t zone_count;
-	struct block_map_zone zones[];
+	struct block_map_zone zones[] __counted_by(zone_count);
 };

 /**
+11 -2
drivers/md/dm-vdo/constants.h
···
 	/* The default size of each slab journal, in blocks */
 	DEFAULT_VDO_SLAB_JOURNAL_SIZE = 224,

+	/* The recovery journal starting sequence number set at format time */
+	RECOVERY_JOURNAL_STARTING_SEQUENCE_NUMBER = 1,
+
 	/*
 	 * The initial size of lbn_operations and pbn_operations, which is based upon the expected
 	 * maximum number of outstanding VIOs. This value was chosen to make it highly unlikely
···
 	/* The maximum number of physical zones */
 	MAX_VDO_PHYSICAL_ZONES = 16,

-	/* The base-2 logarithm of the maximum blocks in one slab */
-	MAX_VDO_SLAB_BITS = 23,
+	/* The default blocks in one slab */
+	DEFAULT_VDO_SLAB_BLOCKS = 1U << 19,
+
+	/* The minimum blocks in one slab */
+	MIN_VDO_SLAB_BLOCKS = 1U << 13,
+
+	/* The maximum blocks in one slab */
+	MAX_VDO_SLAB_BLOCKS = 1U << 23,

 	/* The maximum number of slabs the slab depot supports */
 	MAX_VDO_SLABS = 8192,
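For scale: at VDO's 4 KiB block size, these constants put slabs at 32 MiB
minimum (2^13 blocks), 2 GiB default (2^19 blocks), and 32 GiB maximum
(2^23 blocks, matching the old MAX_VDO_SLAB_BITS of 23).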
+1 -2
drivers/md/dm-vdo/data-vio.c
···
 	struct data_vio_pool *pool;
 	data_vio_count_t i;

-	result = vdo_allocate_extended(struct data_vio_pool, pool_size, struct data_vio,
-				       __func__, &pool);
+	result = vdo_allocate_extended(pool_size, data_vios, __func__, &pool);
 	if (result != VDO_SUCCESS)
 		return result;
+3 -5
drivers/md/dm-vdo/dedupe.c
···
 	/* The number of zones */
 	zone_count_t zone_count;
 	/* The hash zones themselves */
-	struct hash_zone zones[];
+	struct hash_zone zones[] __counted_by(zone_count);
 };

 /* These are in milliseconds. */
···
 	vdo_set_completion_callback(&zone->completion, timeout_index_operations_callback,
 				    zone->thread_id);
 	INIT_LIST_HEAD(&zone->lock_pool);
-	result = vdo_allocate(LOCK_POOL_CAPACITY, struct hash_lock, "hash_lock array",
-			      &zone->lock_array);
+	result = vdo_allocate(LOCK_POOL_CAPACITY, "hash_lock array", &zone->lock_array);
 	if (result != VDO_SUCCESS)
 		return result;
···
 	if (zone_count == 0)
 		return VDO_SUCCESS;

-	result = vdo_allocate_extended(struct hash_zones, zone_count, struct hash_zone,
-				       __func__, &zones);
+	result = vdo_allocate_extended(zone_count, zones, __func__, &zones);
 	if (result != VDO_SUCCESS)
 		return result;
+138 -13
drivers/md/dm-vdo/dm-vdo-target.c
···
 #include <linux/delay.h>
 #include <linux/device-mapper.h>
 #include <linux/err.h>
+#include <linux/log2.h>
 #include <linux/module.h>
 #include <linux/mutex.h>
 #include <linux/spinlock.h>
···
 	LOAD_PHASE_DRAIN_JOURNAL,
 	LOAD_PHASE_WAIT_FOR_READ_ONLY,
 	PRE_LOAD_PHASE_START,
+	PRE_LOAD_PHASE_FORMAT_START,
+	PRE_LOAD_PHASE_FORMAT_SUPER,
+	PRE_LOAD_PHASE_FORMAT_GEOMETRY,
+	PRE_LOAD_PHASE_FORMAT_END,
+	PRE_LOAD_PHASE_LOAD_SUPER,
 	PRE_LOAD_PHASE_LOAD_COMPONENTS,
 	PRE_LOAD_PHASE_END,
 	PREPARE_GROW_PHYSICAL_PHASE_START,
···
 	"LOAD_PHASE_DRAIN_JOURNAL",
 	"LOAD_PHASE_WAIT_FOR_READ_ONLY",
 	"PRE_LOAD_PHASE_START",
+	"PRE_LOAD_PHASE_FORMAT_START",
+	"PRE_LOAD_PHASE_FORMAT_SUPER",
+	"PRE_LOAD_PHASE_FORMAT_GEOMETRY",
+	"PRE_LOAD_PHASE_FORMAT_END",
+	"PRE_LOAD_PHASE_LOAD_SUPER",
 	"PRE_LOAD_PHASE_LOAD_COMPONENTS",
 	"PRE_LOAD_PHASE_END",
 	"PREPARE_GROW_PHYSICAL_PHASE_START",
···
 		substring_count++;
 	}

-	result = vdo_allocate(substring_count + 1, char *, "string-splitting array",
-			      &substrings);
+	result = vdo_allocate(substring_count + 1, "string-splitting array", &substrings);
 	if (result != VDO_SUCCESS)
 		return result;
···
 		if (*s == separator) {
 			ptrdiff_t length = s - string;

-			result = vdo_allocate(length + 1, char, "split string",
+			result = vdo_allocate(length + 1, "split string",
 					      &substrings[current_substring]);
 			if (result != VDO_SUCCESS) {
 				free_string_array(substrings);
···
 	BUG_ON(current_substring != (substring_count - 1));
 	length = strlen(string);

-	result = vdo_allocate(length + 1, char, "split string",
-			      &substrings[current_substring]);
+	result = vdo_allocate(length + 1, "split string", &substrings[current_substring]);
 	if (result != VDO_SUCCESS) {
 		free_string_array(substrings);
 		return result;
···
 	for (i = 0; (i < array_length) && (substring_array[i] != NULL); i++)
 		string_length += strlen(substring_array[i]) + 1;

-	result = vdo_allocate(string_length, char, __func__, &output);
+	result = vdo_allocate(string_length, __func__, &output);
 	if (result != VDO_SUCCESS)
 		return result;
···
 		return VDO_BAD_CONFIGURATION;

 	*bool_ptr = value;
+	return VDO_SUCCESS;
+}
+
+/**
+ * parse_memory() - Parse a string into an index memory value.
+ * @memory_str: The string value to convert to a memory value.
+ * @memory_ptr: A pointer to return the memory value in.
+ *
+ * Return: VDO_SUCCESS or an error
+ */
+static int __must_check parse_memory(const char *memory_str,
+				     uds_memory_config_size_t *memory_ptr)
+{
+	uds_memory_config_size_t memory;
+
+	if (strcmp(memory_str, "0.25") == 0) {
+		memory = UDS_MEMORY_CONFIG_256MB;
+	} else if ((strcmp(memory_str, "0.5") == 0) || (strcmp(memory_str, "0.50") == 0)) {
+		memory = UDS_MEMORY_CONFIG_512MB;
+	} else if (strcmp(memory_str, "0.75") == 0) {
+		memory = UDS_MEMORY_CONFIG_768MB;
+	} else {
+		unsigned int value;
+		int result;
+
+		result = kstrtouint(memory_str, 10, &value);
+		if (result) {
+			vdo_log_error("optional parameter error: invalid memory size, must be a positive integer");
+			return -EINVAL;
+		}
+
+		if (value > UDS_MEMORY_CONFIG_MAX) {
+			vdo_log_error("optional parameter error: invalid memory size, must not be greater than %d",
+				      UDS_MEMORY_CONFIG_MAX);
+			return -EINVAL;
+		}
+
+		memory = value;
+	}
+
+	*memory_ptr = memory;
+	return VDO_SUCCESS;
+}
+
+/**
+ * parse_slab_size() - Parse a string option into a slab size value.
+ * @slab_str: The string value representing slab size.
+ * @slab_size_ptr: A pointer to return the slab size in.
+ *
+ * Return: VDO_SUCCESS or an error
+ */
+static int __must_check parse_slab_size(const char *slab_str, block_count_t *slab_size_ptr)
+{
+	block_count_t value;
+	int result;
+
+	result = kstrtoull(slab_str, 10, &value);
+	if (result) {
+		vdo_log_error("optional parameter error: invalid slab size, must be a positive integer");
+		return -EINVAL;
+	}
+
+	if (value < MIN_VDO_SLAB_BLOCKS || value > MAX_VDO_SLAB_BLOCKS || (!is_power_of_2(value))) {
+		vdo_log_error("optional parameter error: invalid slab size, must be a power of two between %u and %u",
+			      MIN_VDO_SLAB_BLOCKS, MAX_VDO_SLAB_BLOCKS);
+		return -EINVAL;
+	}
+
+	*slab_size_ptr = value;
 	return VDO_SUCCESS;
 }
···
 	}
 	/* Max discard sectors in blkdev_issue_discard is UINT_MAX >> 9 */
 	if (value > (UINT_MAX / VDO_BLOCK_SIZE)) {
-		vdo_log_error("optional parameter error:  at most %d max discard blocks are allowed",
+		vdo_log_error("optional parameter error: at most %d max discard blocks are allowed",
 			      UINT_MAX / VDO_BLOCK_SIZE);
 		return -EINVAL;
 	}
···
 	if (strcmp(key, "compression") == 0)
 		return parse_bool(value, "on", "off", &config->compression);

-	/* The remaining arguments must have integral values. */
+	if (strcmp(key, "indexSparse") == 0)
+		return parse_bool(value, "on", "off", &config->index_sparse);
+
+	if (strcmp(key, "indexMemory") == 0)
+		return parse_memory(value, &config->index_memory);
+
+	if (strcmp(key, "slabSize") == 0)
+		return parse_slab_size(value, &config->slab_blocks);
+
+	/* The remaining arguments must have non-negative integral values. */
 	result = kstrtouint(value, 10, &count);
 	if (result) {
 		vdo_log_error("optional config string error: integer value needed, found \"%s\"",
···
 	struct device_config *config = NULL;
 	int result;

+	if (logical_bytes > (MAXIMUM_VDO_LOGICAL_BLOCKS * VDO_BLOCK_SIZE)) {
+		handle_parse_error(config, error_ptr,
+				   "Logical size exceeds the maximum");
+		return VDO_BAD_CONFIGURATION;
+	}
+
 	if ((logical_bytes % VDO_BLOCK_SIZE) != 0) {
 		handle_parse_error(config, error_ptr,
 				   "Logical size must be a multiple of 4096");
···
 		return VDO_BAD_CONFIGURATION;
 	}

-	result = vdo_allocate(1, struct device_config, "device_config", &config);
+	result = vdo_allocate(1, "device_config", &config);
 	if (result != VDO_SUCCESS) {
 		handle_parse_error(config, error_ptr,
 				   "Could not allocate config structure");
···
 	config->max_discard_blocks = 1;
 	config->deduplication = true;
 	config->compression = false;
+	config->index_memory = UDS_MEMORY_CONFIG_256MB;
+	config->index_sparse = false;
+	config->slab_blocks = DEFAULT_VDO_SLAB_BLOCKS;

 	arg_set.argc = argc;
 	arg_set.argv = argv;
···
 	/* Get the physical blocks, if known. */
 	if (config->version >= 1) {
 		result = kstrtoull(dm_shift_arg(&arg_set), 10, &config->physical_blocks);
-		if (result != VDO_SUCCESS) {
+		if (result) {
 			handle_parse_error(config, error_ptr,
 					   "Invalid physical block count");
 			return VDO_BAD_CONFIGURATION;
···
 	/* Get the page cache size. */
 	result = kstrtouint(dm_shift_arg(&arg_set), 10, &config->cache_size);
-	if (result != VDO_SUCCESS) {
+	if (result) {
 		handle_parse_error(config, error_ptr,
 				   "Invalid block map page cache size");
 		return VDO_BAD_CONFIGURATION;
···
 	/* Get the block map era length. */
 	result = kstrtouint(dm_shift_arg(&arg_set), 10, &config->block_map_maximum_age);
-	if (result != VDO_SUCCESS) {
+	if (result) {
 		handle_parse_error(config, error_ptr, "Invalid block map maximum age");
 		return VDO_BAD_CONFIGURATION;
 	}
···
 		vdo_continue_completion(completion, result);
 		return;
 	}

+		if (vdo->needs_formatting)
+			vdo->admin.phase = PRE_LOAD_PHASE_FORMAT_START;
+		else
+			vdo->admin.phase = PRE_LOAD_PHASE_LOAD_SUPER;
+
+		vdo_continue_completion(completion, VDO_SUCCESS);
+		return;
+
+	case PRE_LOAD_PHASE_FORMAT_START:
+		vdo_continue_completion(completion, vdo_clear_layout(vdo));
+		return;
+
+	case PRE_LOAD_PHASE_FORMAT_SUPER:
+		vdo_save_super_block(vdo, completion);
+		return;
+
+	case PRE_LOAD_PHASE_FORMAT_GEOMETRY:
+		vdo_save_geometry_block(vdo, completion);
+		return;
+
+	case PRE_LOAD_PHASE_FORMAT_END:
+		/* cleanup layout before load adds to it */
+		vdo_uninitialize_layout(&vdo->states.layout);
+		vdo_continue_completion(completion, VDO_SUCCESS);
+		return;
+
+	case PRE_LOAD_PHASE_LOAD_SUPER:
 		vdo_load_super_block(vdo, completion);
 		return;
···
 	vdo_log_debug("Logical blocks = %llu", logical_blocks);
 	vdo_log_debug("Physical block size = %llu", (u64) block_size);
 	vdo_log_debug("Physical blocks = %llu", config->physical_blocks);
+	vdo_log_debug("Slab size = %llu", config->slab_blocks);
 	vdo_log_debug("Block map cache blocks = %u", config->cache_size);
 	vdo_log_debug("Block map maximum age = %u", config->block_map_maximum_age);
 	vdo_log_debug("Deduplication = %s", (config->deduplication ? "on" : "off"));
 	vdo_log_debug("Compression = %s", (config->compression ? "on" : "off"));
+	vdo_log_debug("Index memory = %u", config->index_memory);
+	vdo_log_debug("Index sparse = %s", (config->index_sparse ? "on" : "off"));

 	vdo = vdo_find_matching(vdo_uses_device, config);
 	if (vdo != NULL) {
···
 static struct target_type vdo_target_bio = {
 	.features = DM_TARGET_SINGLETON,
 	.name = "vdo",
-	.version = { 9, 1, 0 },
+	.version = { 9, 2, 0 },
 	.module = THIS_MODULE,
 	.ctr = vdo_ctr,
 	.dtr = vdo_dtr,
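The new optional table parameters (slabSize, indexMemory, indexSparse) are
what let dm-vdo format the device itself. A sketch of how they might be
exercised from userspace; the positional arguments are elided placeholders
here, and the authoritative V4 table layout is in
Documentation/admin-guide/device-mapper/vdo.rst:

    # hypothetical invocation; consult vdo.rst for the positional fields
    dmsetup create vdo0 --table \
      "0 <logical_sectors> vdo V4 <storage_dev> <storage_blocks> 4096 <cache_blocks> <era_length> \
       slabSize 524288 indexMemory 0.25 indexSparse off deduplication on compression off"

slabSize 524288 is 2^19 blocks, the new DEFAULT_VDO_SLAB_BLOCKS, and must be
a power of two within the MIN/MAX bounds enforced by parse_slab_size() above;
indexMemory accepts the fractional sizes 0.25/0.5/0.75 (in GB) or a whole
number of gigabytes, per parse_memory().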
+211 -10
drivers/md/dm-vdo/encodings.c
··· 12 12 #include "permassert.h" 13 13 14 14 #include "constants.h" 15 + #include "indexer.h" 15 16 #include "status-codes.h" 16 17 #include "types.h" 17 - 18 - /** The maximum logical space is 4 petabytes, which is 1 terablock. */ 19 - static const block_count_t MAXIMUM_VDO_LOGICAL_BLOCKS = 1024ULL * 1024 * 1024 * 1024; 20 - 21 - /** The maximum physical space is 256 terabytes, which is 64 gigablocks. */ 22 - static const block_count_t MAXIMUM_VDO_PHYSICAL_BLOCKS = 1024ULL * 1024 * 1024 * 64; 23 18 24 19 struct geometry_block { 25 20 char magic_number[VDO_GEOMETRY_MAGIC_NUMBER_SIZE]; ··· 285 290 .mem = mem, 286 291 .sparse = sparse, 287 292 }; 293 + } 294 + 295 + /** 296 + * vdo_encode_volume_geometry() - Encode the on-disk representation of a volume geometry into a buffer. 297 + * @buffer: A buffer to store the encoding. 298 + * @geometry: The geometry to encode. 299 + * @version: The geometry block version to encode. 300 + * 301 + * Return: VDO_SUCCESS or an error. 302 + */ 303 + int vdo_encode_volume_geometry(u8 *buffer, const struct volume_geometry *geometry, 304 + u32 version) 305 + { 306 + int result; 307 + enum volume_region_id id; 308 + u32 checksum; 309 + size_t offset = 0; 310 + const struct header *header; 311 + 312 + memcpy(buffer, VDO_GEOMETRY_MAGIC_NUMBER, VDO_GEOMETRY_MAGIC_NUMBER_SIZE); 313 + offset += VDO_GEOMETRY_MAGIC_NUMBER_SIZE; 314 + 315 + header = (version > 4) ? &GEOMETRY_BLOCK_HEADER_5_0 : &GEOMETRY_BLOCK_HEADER_4_0; 316 + vdo_encode_header(buffer, &offset, header); 317 + 318 + /* This is for backwards compatibility */ 319 + encode_u32_le(buffer, &offset, geometry->unused); 320 + encode_u64_le(buffer, &offset, geometry->nonce); 321 + memcpy(buffer + offset, (unsigned char *) &geometry->uuid, sizeof(uuid_t)); 322 + offset += sizeof(uuid_t); 323 + 324 + if (version > 4) 325 + encode_u64_le(buffer, &offset, geometry->bio_offset); 326 + 327 + for (id = 0; id < VDO_VOLUME_REGION_COUNT; id++) { 328 + encode_u32_le(buffer, &offset, geometry->regions[id].id); 329 + encode_u64_le(buffer, &offset, geometry->regions[id].start_block); 330 + } 331 + 332 + encode_u32_le(buffer, &offset, geometry->index_config.mem); 333 + encode_u32_le(buffer, &offset, 0); 334 + 335 + if (geometry->index_config.sparse) 336 + buffer[offset++] = 1; 337 + else 338 + buffer[offset++] = 0; 339 + 340 + result = VDO_ASSERT(header->size == offset + sizeof(u32), 341 + "should have encoded up to the geometry checksum"); 342 + if (result != VDO_SUCCESS) 343 + return result; 344 + 345 + checksum = vdo_crc32(buffer, offset); 346 + encode_u32_le(buffer, &offset, checksum); 347 + 348 + return VDO_SUCCESS; 288 349 } 289 350 290 351 /** ··· 849 798 struct partition *partition; 850 799 int result; 851 800 852 - result = vdo_allocate(1, struct partition, __func__, &partition); 801 + result = vdo_allocate(1, __func__, &partition); 853 802 if (result != VDO_SUCCESS) 854 803 return result; 855 804 ··· 1270 1219 if (result != VDO_SUCCESS) 1271 1220 return result; 1272 1221 1273 - result = VDO_ASSERT(config->slab_size <= (1 << MAX_VDO_SLAB_BITS), 1274 - "slab size must be less than or equal to 2^%d", 1275 - MAX_VDO_SLAB_BITS); 1222 + result = VDO_ASSERT(config->slab_size <= MAX_VDO_SLAB_BLOCKS, 1223 + "slab size must be a power of two less than or equal to %d", 1224 + MAX_VDO_SLAB_BLOCKS); 1276 1225 if (result != VDO_SUCCESS) 1277 1226 return result; 1278 1227 ··· 1536 1485 return result; 1537 1486 1538 1487 return ((checksum != saved_checksum) ? 
VDO_CHECKSUM_MISMATCH : VDO_SUCCESS); 1488 + } 1489 + 1490 + /** 1491 + * vdo_initialize_component_states() - Initialize the components so they can be written out. 1492 + * @vdo_config: The config used for component state initialization. 1493 + * @geometry: The volume geometry used to calculate the data region offset. 1494 + * @nonce: The nonce to use to identify the vdo. 1495 + * @states: The component states to initialize. 1496 + * 1497 + * Return: VDO_SUCCESS or an error code. 1498 + */ 1499 + int vdo_initialize_component_states(const struct vdo_config *vdo_config, 1500 + const struct volume_geometry *geometry, 1501 + nonce_t nonce, 1502 + struct vdo_component_states *states) 1503 + { 1504 + int result; 1505 + struct slab_config slab_config; 1506 + struct partition *partition; 1507 + 1508 + states->vdo.config = *vdo_config; 1509 + states->vdo.nonce = nonce; 1510 + states->volume_version = VDO_VOLUME_VERSION_67_0; 1511 + 1512 + states->recovery_journal = (struct recovery_journal_state_7_0) { 1513 + .journal_start = RECOVERY_JOURNAL_STARTING_SEQUENCE_NUMBER, 1514 + .logical_blocks_used = 0, 1515 + .block_map_data_blocks = 0, 1516 + }; 1517 + 1518 + /* 1519 + * The layout starts 1 block past the beginning of the data region, as the 1520 + * data region contains the super block but the layout does not. 1521 + */ 1522 + result = vdo_initialize_layout(vdo_config->physical_blocks, 1523 + vdo_get_data_region_start(*geometry) + 1, 1524 + DEFAULT_VDO_BLOCK_MAP_TREE_ROOT_COUNT, 1525 + vdo_config->recovery_journal_size, 1526 + VDO_SLAB_SUMMARY_BLOCKS, 1527 + &states->layout); 1528 + if (result != VDO_SUCCESS) 1529 + return result; 1530 + 1531 + result = vdo_configure_slab(vdo_config->slab_size, 1532 + vdo_config->slab_journal_blocks, 1533 + &slab_config); 1534 + if (result != VDO_SUCCESS) { 1535 + vdo_uninitialize_layout(&states->layout); 1536 + return result; 1537 + } 1538 + 1539 + result = vdo_get_partition(&states->layout, VDO_SLAB_DEPOT_PARTITION, 1540 + &partition); 1541 + if (result != VDO_SUCCESS) { 1542 + vdo_uninitialize_layout(&states->layout); 1543 + return result; 1544 + } 1545 + 1546 + result = vdo_configure_slab_depot(partition, slab_config, 0, 1547 + &states->slab_depot); 1548 + if (result != VDO_SUCCESS) { 1549 + vdo_uninitialize_layout(&states->layout); 1550 + return result; 1551 + } 1552 + 1553 + result = vdo_get_partition(&states->layout, VDO_BLOCK_MAP_PARTITION, 1554 + &partition); 1555 + if (result != VDO_SUCCESS) { 1556 + vdo_uninitialize_layout(&states->layout); 1557 + return result; 1558 + } 1559 + 1560 + states->block_map = (struct block_map_state_2_0) { 1561 + .flat_page_origin = VDO_BLOCK_MAP_FLAT_PAGE_ORIGIN, 1562 + .flat_page_count = 0, 1563 + .root_origin = partition->offset, 1564 + .root_count = DEFAULT_VDO_BLOCK_MAP_TREE_ROOT_COUNT, 1565 + }; 1566 + 1567 + states->vdo.state = VDO_NEW; 1568 + 1569 + return VDO_SUCCESS; 1570 + } 1571 + 1572 + /** 1573 + * vdo_compute_index_blocks() - Compute the number of blocks that the indexer will use. 1574 + * @config: The index config from which the blocks are calculated. 1575 + * @index_blocks_ptr: The number of blocks the index will use. 1576 + * 1577 + * Return: VDO_SUCCESS or an error code. 
1578 + */ 1579 + static int vdo_compute_index_blocks(const struct index_config *config, 1580 + block_count_t *index_blocks_ptr) 1581 + { 1582 + int result; 1583 + u64 index_bytes; 1584 + struct uds_parameters uds_parameters = { 1585 + .memory_size = config->mem, 1586 + .sparse = config->sparse, 1587 + }; 1588 + 1589 + result = uds_compute_index_size(&uds_parameters, &index_bytes); 1590 + if (result != UDS_SUCCESS) 1591 + return vdo_log_error_strerror(result, "error computing index size"); 1592 + 1593 + *index_blocks_ptr = index_bytes / VDO_BLOCK_SIZE; 1594 + return VDO_SUCCESS; 1595 + } 1596 + 1597 + /** 1598 + * vdo_initialize_volume_geometry() - Initialize the volume geometry so it can be written out. 1599 + * @nonce: The nonce to use to identify the vdo. 1600 + * @uuid: The uuid to use to identify the vdo. 1601 + * @index_config: The config used for structure initialization. 1602 + * @geometry: The volume geometry to initialize. 1603 + * 1604 + * Return: VDO_SUCCESS or an error code. 1605 + */ 1606 + int vdo_initialize_volume_geometry(nonce_t nonce, uuid_t *uuid, 1607 + const struct index_config *index_config, 1608 + struct volume_geometry *geometry) 1609 + { 1610 + int result; 1611 + block_count_t index_blocks = 0; 1612 + 1613 + result = vdo_compute_index_blocks(index_config, &index_blocks); 1614 + if (result != VDO_SUCCESS) 1615 + return result; 1616 + 1617 + *geometry = (struct volume_geometry) { 1618 + /* This is for backwards compatibility. */ 1619 + .unused = 0, 1620 + .nonce = nonce, 1621 + .bio_offset = 0, 1622 + .regions = { 1623 + [VDO_INDEX_REGION] = { 1624 + .id = VDO_INDEX_REGION, 1625 + .start_block = 1, 1626 + }, 1627 + [VDO_DATA_REGION] = { 1628 + .id = VDO_DATA_REGION, 1629 + .start_block = 1 + index_blocks, 1630 + } 1631 + } 1632 + }; 1633 + 1634 + memcpy(&(geometry->uuid), uuid, sizeof(uuid_t)); 1635 + memcpy(&geometry->index_config, index_config, sizeof(struct index_config)); 1636 + 1637 + return VDO_SUCCESS; 1539 1638 }
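Taken together, the new encodings.c helpers pin down the on-disk layout that formatting produces. A summary of the block offsets implied by the code above (4096-byte blocks; N is the value computed by vdo_compute_index_blocks()):

	block 0            geometry block
	blocks 1 .. N      UDS index region (VDO_INDEX_REGION starts at block 1)
	block N + 1        start of the data region, holding the super block
	block N + 2 ...    the layout: block map roots, recovery journal, slab summary, slabs

The last line follows from the comment in vdo_initialize_component_states(): the layout begins one block past the data region start because the data region contains the super block but the layout does not.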
+17
drivers/md/dm-vdo/encodings.h
··· 608 608 block_count_t slab_journal_blocks; /* number of slab journal blocks */ 609 609 }; 610 610 611 + /** The maximum logical space is 4 petabytes, which is 1 terablock. */ 612 + #define MAXIMUM_VDO_LOGICAL_BLOCKS ((block_count_t)(1024ULL * 1024 * 1024 * 1024)) 613 + 614 + /** The maximum physical space is 256 terabytes, which is 64 gigablocks. */ 615 + #define MAXIMUM_VDO_PHYSICAL_BLOCKS ((block_count_t)(1024ULL * 1024 * 1024 * 64)) 616 + 611 617 /* This is the structure that captures the vdo fields saved as a super block component. */ 612 618 struct vdo_component { 613 619 enum vdo_state state; ··· 809 803 vdo_get_index_region_start(geometry); 810 804 } 811 805 806 + int vdo_initialize_volume_geometry(nonce_t nonce, uuid_t *uuid, 807 + const struct index_config *index_config, 808 + struct volume_geometry *geometry); 809 + 810 + int vdo_encode_volume_geometry(u8 *buffer, const struct volume_geometry *geometry, 811 + u32 version); 812 812 int __must_check vdo_parse_geometry_block(unsigned char *block, 813 813 struct volume_geometry *geometry); 814 814 ··· 1275 1263 1276 1264 void vdo_encode_super_block(u8 *buffer, struct vdo_component_states *states); 1277 1265 int __must_check vdo_decode_super_block(u8 *buffer); 1266 + 1267 + int vdo_initialize_component_states(const struct vdo_config *vdo_config, 1268 + const struct volume_geometry *geometry, 1269 + nonce_t nonce, 1270 + struct vdo_component_states *states); 1278 1271 1279 1272 /* We start with 0L and postcondition with ~0L to match our historical usage in userspace. */ 1280 1273 static inline u32 vdo_crc32(const void *buf, unsigned long len)
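The two limits above are easy to sanity-check, assuming the usual 4096-byte VDO block; the static_asserts below are illustrative arithmetic, not part of the patch:

	/* 2^40 blocks * 2^12 bytes/block = 2^52 bytes = 4 PiB of logical space. */
	static_assert(MAXIMUM_VDO_LOGICAL_BLOCKS * 4096ULL == 1ULL << 52);

	/* 2^36 blocks * 2^12 bytes/block = 2^48 bytes = 256 TiB of physical space. */
	static_assert(MAXIMUM_VDO_PHYSICAL_BLOCKS * 4096ULL == 1ULL << 48);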
+2 -2
drivers/md/dm-vdo/flush.c
··· 105 105 if ((gfp_mask & GFP_NOWAIT) == GFP_NOWAIT) { 106 106 flush = vdo_allocate_memory_nowait(sizeof(struct vdo_flush), __func__); 107 107 } else { 108 - int result = vdo_allocate(1, struct vdo_flush, __func__, &flush); 108 + int result = vdo_allocate(1, __func__, &flush); 109 109 110 110 if (result != VDO_SUCCESS) 111 111 vdo_log_error_strerror(result, "failed to allocate spare flush"); ··· 134 134 */ 135 135 int vdo_make_flusher(struct vdo *vdo) 136 136 { 137 - int result = vdo_allocate(1, struct flusher, __func__, &vdo->flusher); 137 + int result = vdo_allocate(1, __func__, &vdo->flusher); 138 138 139 139 if (result != VDO_SUCCESS) 140 140 return result;
+1 -1
drivers/md/dm-vdo/funnel-queue.c
··· 14 14 int result; 15 15 struct funnel_queue *queue; 16 16 17 - result = vdo_allocate(1, struct funnel_queue, "funnel queue", &queue); 17 + result = vdo_allocate(1, "funnel queue", &queue); 18 18 if (result != VDO_SUCCESS) 19 19 return result; 20 20
+3 -5
drivers/md/dm-vdo/funnel-workqueue.c
··· 322 322 "queue priority count %u within limit %u", type->max_priority, 323 323 VDO_WORK_Q_MAX_PRIORITY); 324 324 325 - result = vdo_allocate(1, struct simple_work_queue, "simple work queue", &queue); 325 + result = vdo_allocate(1, "simple work queue", &queue); 326 326 if (result != VDO_SUCCESS) 327 327 return result; 328 328 ··· 405 405 return result; 406 406 } 407 407 408 - result = vdo_allocate(1, struct round_robin_work_queue, "round-robin work queue", 409 - &queue); 408 + result = vdo_allocate(1, "round-robin work queue", &queue); 410 409 if (result != VDO_SUCCESS) 411 410 return result; 412 411 413 - result = vdo_allocate(thread_count, struct simple_work_queue *, 414 - "subordinate work queues", &queue->service_queues); 412 + result = vdo_allocate(thread_count, "subordinate work queues", &queue->service_queues); 415 413 if (result != VDO_SUCCESS) { 416 414 vdo_free(queue); 417 415 return result;
+1 -1
drivers/md/dm-vdo/indexer/chapter-index.c
··· 20 20 size_t memory_size; 21 21 struct open_chapter_index *index; 22 22 23 - result = vdo_allocate(1, struct open_chapter_index, "open chapter index", &index); 23 + result = vdo_allocate(1, "open chapter index", &index); 24 24 if (result != VDO_SUCCESS) 25 25 return result; 26 26
+1 -1
drivers/md/dm-vdo/indexer/config.c
··· 325 325 if (result != UDS_SUCCESS) 326 326 return result; 327 327 328 - result = vdo_allocate(1, struct uds_configuration, __func__, &config); 328 + result = vdo_allocate(1, __func__, &config); 329 329 if (result != VDO_SUCCESS) 330 330 return result; 331 331
+5 -8
drivers/md/dm-vdo/indexer/delta-index.c
··· 311 311 { 312 312 int result; 313 313 314 - result = vdo_allocate(size, u8, "delta list", &delta_zone->memory); 314 + result = vdo_allocate(size, "delta list", &delta_zone->memory); 315 315 if (result != VDO_SUCCESS) 316 316 return result; 317 317 318 - result = vdo_allocate(list_count + 2, u64, "delta list temp", 319 - &delta_zone->new_offsets); 318 + result = vdo_allocate(list_count + 2, "delta list temp", &delta_zone->new_offsets); 320 319 if (result != VDO_SUCCESS) 321 320 return result; 322 321 323 322 /* Allocate the delta lists. */ 324 - result = vdo_allocate(list_count + 2, struct delta_list, "delta lists", 325 - &delta_zone->delta_lists); 323 + result = vdo_allocate(list_count + 2, "delta lists", &delta_zone->delta_lists); 326 324 if (result != VDO_SUCCESS) 327 325 return result; 328 326 ··· 350 352 unsigned int z; 351 353 size_t zone_memory; 352 354 353 - result = vdo_allocate(zone_count, struct delta_zone, "Delta Index Zones", 354 - &delta_index->delta_zones); 355 + result = vdo_allocate(zone_count, "Delta Index Zones", &delta_index->delta_zones); 355 356 if (result != VDO_SUCCESS) 356 357 return result; 357 358 ··· 1044 1047 unsigned int z; 1045 1048 u8 *data; 1046 1049 1047 - result = vdo_allocate(DELTA_LIST_MAX_BYTE_COUNT, u8, __func__, &data); 1050 + result = vdo_allocate(DELTA_LIST_MAX_BYTE_COUNT, __func__, &data); 1048 1051 if (result != VDO_SUCCESS) 1049 1052 return result; 1050 1053
+1 -1
drivers/md/dm-vdo/indexer/funnel-requestqueue.c
··· 198 198 int result; 199 199 struct uds_request_queue *queue; 200 200 201 - result = vdo_allocate(1, struct uds_request_queue, __func__, &queue); 201 + result = vdo_allocate(1, __func__, &queue); 202 202 if (result != VDO_SUCCESS) 203 203 return result; 204 204
+1 -1
drivers/md/dm-vdo/indexer/geometry.c
··· 61 61 int result; 62 62 struct index_geometry *geometry; 63 63 64 - result = vdo_allocate(1, struct index_geometry, "geometry", &geometry); 64 + result = vdo_allocate(1, "geometry", &geometry); 65 65 if (result != VDO_SUCCESS) 66 66 return result; 67 67
+39 -15
drivers/md/dm-vdo/indexer/index-layout.c
··· 249 249 return UDS_SUCCESS; 250 250 } 251 251 252 + int uds_compute_index_size(const struct uds_parameters *parameters, u64 *index_size) 253 + { 254 + int result; 255 + struct uds_configuration *index_config; 256 + struct save_layout_sizes sizes; 257 + 258 + if (index_size == NULL) { 259 + vdo_log_error("Missing output size pointer"); 260 + return -EINVAL; 261 + } 262 + 263 + result = uds_make_configuration(parameters, &index_config); 264 + if (result != UDS_SUCCESS) { 265 + vdo_log_error_strerror(result, "cannot compute index size"); 266 + return result; 267 + } 268 + 269 + result = compute_sizes(index_config, &sizes); 270 + uds_free_configuration(index_config); 271 + if (result != UDS_SUCCESS) 272 + return result; 273 + 274 + *index_size = sizes.total_size; 275 + return UDS_SUCCESS; 276 + } 277 + 252 278 /* Create unique data using the current time and a pseudorandom number. */ 253 279 static void create_unique_nonce_data(u8 *buffer) 254 280 { ··· 485 459 type = RH_TYPE_UNSAVED; 486 460 } 487 461 488 - result = vdo_allocate_extended(struct region_table, region_count, 489 - struct layout_region, 462 + result = vdo_allocate_extended(region_count, regions, 490 463 "layout region table for ISL", &table); 491 464 if (result != VDO_SUCCESS) 492 465 return result; ··· 545 520 u8 *buffer; 546 521 size_t offset = 0; 547 522 548 - result = vdo_allocate(table->encoded_size, u8, "index save data", &buffer); 523 + result = vdo_allocate(table->encoded_size, "index save data", &buffer); 549 524 if (result != VDO_SUCCESS) 550 525 return result; 551 526 ··· 667 642 struct region_table *table; 668 643 struct layout_region *lr; 669 644 670 - result = vdo_allocate_extended(struct region_table, region_count, 671 - struct layout_region, "layout region table", 672 - &table); 645 + result = vdo_allocate_extended(region_count, regions, 646 + "layout region table", &table); 673 647 if (result != VDO_SUCCESS) 674 648 return result; 675 649 ··· 714 690 u8 *buffer; 715 691 size_t offset = 0; 716 692 717 - result = vdo_allocate(table->encoded_size, u8, "layout data", &buffer); 693 + result = vdo_allocate(table->encoded_size, "layout data", &buffer); 718 694 if (result != VDO_SUCCESS) 719 695 return result; 720 696 ··· 804 780 if (result != UDS_SUCCESS) 805 781 return result; 806 782 807 - result = vdo_allocate(sizes.save_count, struct index_save_layout, __func__, 808 - &layout->index.saves); 783 + result = vdo_allocate(sizes.save_count, __func__, &layout->index.saves); 809 784 if (result != VDO_SUCCESS) 810 785 return result; 811 786 ··· 1161 1138 header.version); 1162 1139 } 1163 1140 1164 - result = vdo_allocate_extended(struct region_table, header.region_count, 1165 - struct layout_region, 1141 + result = vdo_allocate_extended(header.region_count, regions, 1166 1142 "single file layout region table", &table); 1167 1143 if (result != VDO_SUCCESS) 1168 1144 return result; ··· 1199 1177 u8 *buffer; 1200 1178 size_t offset = 0; 1201 1179 1202 - result = vdo_allocate(saved_size, u8, "super block data", &buffer); 1180 + result = vdo_allocate(saved_size, "super block data", &buffer); 1203 1181 if (result != VDO_SUCCESS) 1204 1182 return result; 1205 1183 ··· 1333 1311 int result; 1334 1312 u64 next_block = first_block; 1335 1313 1336 - result = vdo_allocate(layout->super.max_saves, struct index_save_layout, 1337 - __func__, &layout->index.saves); 1314 + result = vdo_allocate(layout->super.max_saves, __func__, &layout->index.saves); 1338 1315 if (result != VDO_SUCCESS) 1339 1316 return result; 1340 1317 ··· 1466 1445 
u64 last_block = next_block + isl->index_save.block_count; 1467 1446 1468 1447 isl->zone_count = table->header.region_count - 3; 1448 + if (isl->zone_count > MAX_ZONES) 1449 + return vdo_log_error_strerror(UDS_CORRUPT_DATA, 1450 + "invalid zone count"); 1469 1451 1470 1452 last_region = &table->regions[table->header.region_count - 1]; 1471 1453 if (last_region->kind == RL_KIND_EMPTY) { ··· 1696 1672 if (result != UDS_SUCCESS) 1697 1673 return result; 1698 1674 1699 - result = vdo_allocate(1, struct index_layout, __func__, &layout); 1675 + result = vdo_allocate(1, __func__, &layout); 1700 1676 if (result != VDO_SUCCESS) 1701 1677 return result; 1702 1678
+4 -4
drivers/md/dm-vdo/indexer/index-page-map.c
··· 38 38 int result; 39 39 struct index_page_map *map; 40 40 41 - result = vdo_allocate(1, struct index_page_map, "page map", &map); 41 + result = vdo_allocate(1, "page map", &map); 42 42 if (result != VDO_SUCCESS) 43 43 return result; 44 44 45 45 map->geometry = geometry; 46 46 map->entries_per_chapter = geometry->index_pages_per_chapter - 1; 47 - result = vdo_allocate(get_entry_count(geometry), u16, "Index Page Map Entries", 47 + result = vdo_allocate(get_entry_count(geometry), "Index Page Map Entries", 48 48 &map->entries); 49 49 if (result != VDO_SUCCESS) { 50 50 uds_free_index_page_map(map); ··· 118 118 u64 saved_size = uds_compute_index_page_map_save_size(map->geometry); 119 119 u32 i; 120 120 121 - result = vdo_allocate(saved_size, u8, "page map data", &buffer); 121 + result = vdo_allocate(saved_size, "page map data", &buffer); 122 122 if (result != VDO_SUCCESS) 123 123 return result; 124 124 ··· 145 145 u64 saved_size = uds_compute_index_page_map_save_size(map->geometry); 146 146 u32 i; 147 147 148 - result = vdo_allocate(saved_size, u8, "page map data", &buffer); 148 + result = vdo_allocate(saved_size, "page map data", &buffer); 149 149 if (result != VDO_SUCCESS) 150 150 return result; 151 151
+1 -1
drivers/md/dm-vdo/indexer/index-session.c
··· 217 217 int result; 218 218 struct uds_index_session *session; 219 219 220 - result = vdo_allocate(1, struct uds_index_session, __func__, &session); 220 + result = vdo_allocate(1, __func__, &session); 221 221 if (result != VDO_SUCCESS) 222 222 return result; 223 223
+5 -9
drivers/md/dm-vdo/indexer/index.c
··· 88 88 int result; 89 89 struct uds_request *request; 90 90 91 - result = vdo_allocate(1, struct uds_request, __func__, &request); 91 + result = vdo_allocate(1, __func__, &request); 92 92 if (result != VDO_SUCCESS) 93 93 return result; 94 94 ··· 764 764 size_t collated_records_size = 765 765 (sizeof(struct uds_volume_record) * index->volume->geometry->records_per_chapter); 766 766 767 - result = vdo_allocate_extended(struct chapter_writer, index->zone_count, 768 - struct open_chapter_zone *, "Chapter Writer", 769 - &writer); 767 + result = vdo_allocate_extended(index->zone_count, chapters, "Chapter Writer", &writer); 770 768 if (result != VDO_SUCCESS) 771 769 return result; 772 770 ··· 1121 1123 int result; 1122 1124 struct index_zone *zone; 1123 1125 1124 - result = vdo_allocate(1, struct index_zone, "index zone", &zone); 1126 + result = vdo_allocate(1, "index zone", &zone); 1125 1127 if (result != VDO_SUCCESS) 1126 1128 return result; 1127 1129 ··· 1158 1160 u64 nonce; 1159 1161 unsigned int z; 1160 1162 1161 - result = vdo_allocate_extended(struct uds_index, config->zone_count, 1162 - struct uds_request_queue *, "index", &index); 1163 + result = vdo_allocate_extended(config->zone_count, zone_queues, "index", &index); 1163 1164 if (result != VDO_SUCCESS) 1164 1165 return result; 1165 1166 ··· 1170 1173 return result; 1171 1174 } 1172 1175 1173 - result = vdo_allocate(index->zone_count, struct index_zone *, "zones", 1174 - &index->zones); 1176 + result = vdo_allocate(index->zone_count, "zones", &index->zones); 1175 1177 if (result != VDO_SUCCESS) { 1176 1178 uds_free_index(index); 1177 1179 return result;
+1 -1
drivers/md/dm-vdo/indexer/index.h
··· 53 53 54 54 index_callback_fn callback; 55 55 struct uds_request_queue *triage_queue; 56 - struct uds_request_queue *zone_queues[]; 56 + struct uds_request_queue *zone_queues[] __counted_by(zone_count); 57 57 }; 58 58 59 59 enum request_stage {
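__counted_by(zone_count), here and on the other flexible arrays converted in this series, tells the compiler which struct member holds the element count, so CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE builds can bounds-check accesses to the trailing array. The rule that matters at call sites is that the counter must be assigned before the array is indexed. A hypothetical example, not taken from the patch:

	struct example {
		unsigned int slot_count;
		u64 slots[] __counted_by(slot_count);
	};

	/* After e->slot_count = n, any access e->slots[i] with i >= n
	 * traps under UBSAN_BOUNDS instead of silently overrunning. */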
+4
drivers/md/dm-vdo/indexer/indexer.h
··· 282 282 ); 283 283 }; 284 284 285 + /* Compute the number of bytes needed to store an index. */ 286 + int __must_check uds_compute_index_size(const struct uds_parameters *parameters, 287 + u64 *index_size); 288 + 285 289 /* A session is required for most index operations. */ 286 290 int __must_check uds_create_index_session(struct uds_index_session **session); 287 291
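A usage sketch for the newly exported uds_compute_index_size(); the values are hypothetical, and only memory_size and sparse need to be set, which matches how vdo_compute_index_blocks() in encodings.c calls it:

	u64 index_bytes;
	struct uds_parameters params = {
		.memory_size = UDS_MEMORY_CONFIG_256MB,
		.sparse = false,
	};

	if (uds_compute_index_size(&params, &index_bytes) == UDS_SUCCESS)
		vdo_log_debug("index will occupy %llu bytes",
			      (unsigned long long) index_bytes);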
+3 -3
drivers/md/dm-vdo/indexer/io-factory.c
··· 64 64 int result; 65 65 struct io_factory *factory; 66 66 67 - result = vdo_allocate(1, struct io_factory, __func__, &factory); 67 + result = vdo_allocate(1, __func__, &factory); 68 68 if (result != VDO_SUCCESS) 69 69 return result; 70 70 ··· 144 144 if (result != UDS_SUCCESS) 145 145 return result; 146 146 147 - result = vdo_allocate(1, struct buffered_reader, "buffered reader", &reader); 147 + result = vdo_allocate(1, "buffered reader", &reader); 148 148 if (result != VDO_SUCCESS) { 149 149 dm_bufio_client_destroy(client); 150 150 return result; ··· 282 282 if (result != UDS_SUCCESS) 283 283 return result; 284 284 285 - result = vdo_allocate(1, struct buffered_writer, "buffered writer", &writer); 285 + result = vdo_allocate(1, "buffered writer", &writer); 286 286 if (result != VDO_SUCCESS) { 287 287 dm_bufio_client_destroy(client); 288 288 return result;
+1 -3
drivers/md/dm-vdo/indexer/open-chapter.c
··· 68 68 size_t capacity = geometry->records_per_chapter / zone_count; 69 69 size_t slot_count = (1 << bits_per(capacity * LOAD_RATIO)); 70 70 71 - result = vdo_allocate_extended(struct open_chapter_zone, slot_count, 72 - struct open_chapter_zone_slot, "open chapter", 73 - &open_chapter); 71 + result = vdo_allocate_extended(slot_count, slots, "open chapter", &open_chapter); 74 72 if (result != VDO_SUCCESS) 75 73 return result; 76 74
+1 -1
drivers/md/dm-vdo/indexer/open-chapter.h
··· 40 40 /* The number of slots in the hash table */ 41 41 unsigned int slot_count; 42 42 /* The hash table slots, referencing virtual record numbers */ 43 - struct open_chapter_zone_slot slots[]; 43 + struct open_chapter_zone_slot slots[] __counted_by(slot_count); 44 44 }; 45 45 46 46 int __must_check uds_make_open_chapter(const struct index_geometry *geometry,
+1 -2
drivers/md/dm-vdo/indexer/radix-sort.c
··· 211 211 unsigned int stack_size = count / INSERTION_SORT_THRESHOLD; 212 212 struct radix_sorter *radix_sorter; 213 213 214 - result = vdo_allocate_extended(struct radix_sorter, stack_size, struct task, 215 - __func__, &radix_sorter); 214 + result = vdo_allocate_extended(stack_size, stack, __func__, &radix_sorter); 216 215 if (result != VDO_SUCCESS) 217 216 return result; 218 217
+4 -6
drivers/md/dm-vdo/indexer/sparse-cache.c
··· 222 222 chapter->virtual_chapter = NO_CHAPTER; 223 223 chapter->index_pages_count = geometry->index_pages_per_chapter; 224 224 225 - result = vdo_allocate(chapter->index_pages_count, struct delta_index_page, 226 - __func__, &chapter->index_pages); 225 + result = vdo_allocate(chapter->index_pages_count, __func__, &chapter->index_pages); 227 226 if (result != VDO_SUCCESS) 228 227 return result; 229 228 230 - return vdo_allocate(chapter->index_pages_count, struct dm_buffer *, 231 - "sparse index volume pages", &chapter->page_buffers); 229 + return vdo_allocate(chapter->index_pages_count, "sparse index volume pages", 230 + &chapter->page_buffers); 232 231 } 233 232 234 233 static int __must_check make_search_list(struct sparse_cache *cache, ··· 293 294 } 294 295 295 296 /* purge_search_list() needs some temporary lists for sorting. */ 296 - result = vdo_allocate(capacity * 2, struct cached_chapter_index *, 297 - "scratch entries", &cache->scratch_entries); 297 + result = vdo_allocate(capacity * 2, "scratch entries", &cache->scratch_entries); 298 298 if (result != VDO_SUCCESS) 299 299 goto out; 300 300
+4 -6
drivers/md/dm-vdo/indexer/volume-index.c
··· 1211 1211 (zone_count * sizeof(struct volume_sub_index_zone))); 1212 1212 1213 1213 /* The following arrays are initialized to all zeros. */ 1214 - result = vdo_allocate(params.list_count, u64, "first chapter to flush", 1214 + result = vdo_allocate(params.list_count, "first chapter to flush", 1215 1215 &sub_index->flush_chapters); 1216 1216 if (result != VDO_SUCCESS) 1217 1217 return result; 1218 1218 1219 - return vdo_allocate(zone_count, struct volume_sub_index_zone, 1220 - "volume index zones", &sub_index->zones); 1219 + return vdo_allocate(zone_count, "volume index zones", &sub_index->zones); 1221 1220 } 1222 1221 1223 1222 int uds_make_volume_index(const struct uds_configuration *config, u64 volume_nonce, ··· 1227 1228 struct volume_index *volume_index; 1228 1229 int result; 1229 1230 1230 - result = vdo_allocate(1, struct volume_index, "volume index", &volume_index); 1231 + result = vdo_allocate(1, "volume index", &volume_index); 1231 1232 if (result != VDO_SUCCESS) 1232 1233 return result; 1233 1234 ··· 1248 1249 1249 1250 volume_index->sparse_sample_rate = config->sparse_sample_rate; 1250 1251 1251 - result = vdo_allocate(config->zone_count, struct volume_index_zone, 1252 - "volume index zones", &volume_index->zones); 1252 + result = vdo_allocate(config->zone_count, "volume index zones", &volume_index->zones); 1253 1253 if (result != VDO_SUCCESS) { 1254 1254 uds_free_volume_index(volume_index); 1255 1255 return result;
+9 -13
drivers/md/dm-vdo/indexer/volume.c
··· 1509 1509 if (result != VDO_SUCCESS) 1510 1510 return result; 1511 1511 1512 - result = vdo_allocate(VOLUME_CACHE_MAX_QUEUED_READS, struct queued_read, 1513 - "volume read queue", &cache->read_queue); 1512 + result = vdo_allocate(VOLUME_CACHE_MAX_QUEUED_READS, "volume read queue", 1513 + &cache->read_queue); 1514 1514 if (result != VDO_SUCCESS) 1515 1515 return result; 1516 1516 1517 - result = vdo_allocate(cache->zone_count, struct search_pending_counter, 1518 - "Volume Cache Zones", &cache->search_pending_counters); 1517 + result = vdo_allocate(cache->zone_count, "Volume Cache Zones", 1518 + &cache->search_pending_counters); 1519 1519 if (result != VDO_SUCCESS) 1520 1520 return result; 1521 1521 1522 - result = vdo_allocate(cache->indexable_pages, u16, "page cache index", 1523 - &cache->index); 1522 + result = vdo_allocate(cache->indexable_pages, "page cache index", &cache->index); 1524 1523 if (result != VDO_SUCCESS) 1525 1524 return result; 1526 1525 1527 - result = vdo_allocate(cache->cache_slots, struct cached_page, "page cache cache", 1528 - &cache->cache); 1526 + result = vdo_allocate(cache->cache_slots, "page cache cache", &cache->cache); 1529 1527 if (result != VDO_SUCCESS) 1530 1528 return result; 1531 1529 ··· 1546 1548 unsigned int reserved_buffers; 1547 1549 int result; 1548 1550 1549 - result = vdo_allocate(1, struct volume, "volume", &volume); 1551 + result = vdo_allocate(1, "volume", &volume); 1550 1552 if (result != VDO_SUCCESS) 1551 1553 return result; 1552 1554 ··· 1583 1585 return result; 1584 1586 } 1585 1587 1586 - result = vdo_allocate(geometry->records_per_page, 1587 - const struct uds_volume_record *, "record pointers", 1588 + result = vdo_allocate(geometry->records_per_page, "record pointers", 1588 1589 &volume->record_pointers); 1589 1590 if (result != VDO_SUCCESS) { 1590 1591 uds_free_volume(volume); ··· 1623 1626 uds_init_cond(&volume->read_threads_read_done_cond); 1624 1627 uds_init_cond(&volume->read_threads_cond); 1625 1628 1626 - result = vdo_allocate(config->read_threads, struct thread *, "reader threads", 1627 - &volume->reader_threads); 1629 + result = vdo_allocate(config->read_threads, "reader threads", &volume->reader_threads); 1628 1630 if (result != VDO_SUCCESS) { 1629 1631 uds_free_volume(volume); 1630 1632 return result;
+2 -3
drivers/md/dm-vdo/int-map.c
··· 164 164 * without having to wrap back around to element zero. 165 165 */ 166 166 map->bucket_count = capacity + (NEIGHBORHOOD - 1); 167 - return vdo_allocate(map->bucket_count, struct bucket, 168 - "struct int_map buckets", &map->buckets); 167 + return vdo_allocate(map->bucket_count, "struct int_map buckets", &map->buckets); 169 168 } 170 169 171 170 /** ··· 181 182 int result; 182 183 size_t capacity; 183 184 184 - result = vdo_allocate(1, struct int_map, "struct int_map", &map); 185 + result = vdo_allocate(1, "struct int_map", &map); 185 186 if (result != VDO_SUCCESS) 186 187 return result; 187 188
+28 -2
drivers/md/dm-vdo/io-submitter.c
··· 365 365 } 366 366 367 367 /** 368 + * vdo_submit_metadata_vio_wait() - Submit I/O for a metadata vio and wait for completion. 369 + * @vio: the vio for which to issue I/O 370 + * @physical: the physical block number to read or write 371 + * @operation: the type of I/O to perform 372 + * 373 + * The function operates similarly to __submit_metadata_vio except that it will 374 + * block until the work is done. It can be used to do i/o before work queues 375 + * and thread completions are set up. 376 + * 377 + * Return: VDO_SUCCESS or an error. 378 + */ 379 + int vdo_submit_metadata_vio_wait(struct vio *vio, 380 + physical_block_number_t physical, 381 + blk_opf_t operation) 382 + { 383 + int result; 384 + 385 + result = vio_reset_bio(vio, vio->data, NULL, operation | REQ_META, physical); 386 + if (result != VDO_SUCCESS) 387 + return result; 388 + 389 + bio_set_dev(vio->bio, vdo_get_backing_device(vio->completion.vdo)); 390 + submit_bio_wait(vio->bio); 391 + return blk_status_to_errno(vio->bio->bi_status); 392 + } 393 + 394 + /** 368 395 * vdo_make_io_submitter() - Create an io_submitter structure. 369 396 * @thread_count: Number of bio-submission threads to set up. 370 397 * @rotation_interval: Interval to use when rotating between bio-submission threads when enqueuing ··· 410 383 struct io_submitter *io_submitter; 411 384 int result; 412 385 413 - result = vdo_allocate_extended(struct io_submitter, thread_count, 414 - struct bio_queue_data, "bio submission data", 386 + result = vdo_allocate_extended(thread_count, bio_queue_data, "bio submission data", 415 387 &io_submitter); 416 388 if (result != VDO_SUCCESS) 417 389 return result;
+4
drivers/md/dm-vdo/io-submitter.h
··· 56 56 REQ_OP_WRITE | REQ_PREFLUSH, NULL, 0); 57 57 } 58 58 59 + int vdo_submit_metadata_vio_wait(struct vio *vio, 60 + physical_block_number_t physical, 61 + blk_opf_t operation); 62 + 59 63 #endif /* VDO_IO_SUBMITTER_H */
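This synchronous helper is what lets initialize_vdo() touch the geometry before any worker threads exist; the read side, exactly as used in vdo.c later in this diff:

	result = vdo_submit_metadata_vio_wait(&vdo->geometry_block.vio,
					      VDO_GEOMETRY_BLOCK_LOCATION,
					      REQ_OP_READ);
	if (result != VDO_SUCCESS)
		return result;

The formatting path presumably issues the matching REQ_OP_WRITE through vdo_save_geometry_block().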
+1 -2
drivers/md/dm-vdo/logical-zone.c
··· 94 94 if (zone_count == 0) 95 95 return VDO_SUCCESS; 96 96 97 - result = vdo_allocate_extended(struct logical_zones, zone_count, 98 - struct logical_zone, __func__, &zones); 97 + result = vdo_allocate_extended(zone_count, zones, __func__, &zones); 99 98 if (result != VDO_SUCCESS) 100 99 return result; 101 100
+1 -1
drivers/md/dm-vdo/logical-zone.h
··· 60 60 /* The number of zones */ 61 61 zone_count_t zone_count; 62 62 /* The logical zones themselves */ 63 - struct logical_zone zones[]; 63 + struct logical_zone zones[] __counted_by(zone_count); 64 64 }; 65 65 66 66 int __must_check vdo_make_logical_zones(struct vdo *vdo,
+5 -3
drivers/md/dm-vdo/memory-alloc.c
··· 245 245 } else { 246 246 struct vmalloc_block_info *block; 247 247 248 - if (vdo_allocate(1, struct vmalloc_block_info, __func__, &block) == VDO_SUCCESS) { 248 + if (vdo_allocate(1, __func__, &block) == VDO_SUCCESS) { 249 249 /* 250 250 * It is possible for __vmalloc to fail to allocate memory because there 251 251 * are no pages available. A short sleep may allow the page reclaimer ··· 341 341 void *new_ptr) 342 342 { 343 343 int result; 344 + char *temp_ptr; 344 345 345 346 if (size == 0) { 346 347 vdo_free(ptr); ··· 349 348 return VDO_SUCCESS; 350 349 } 351 350 352 - result = vdo_allocate(size, char, what, new_ptr); 351 + result = vdo_allocate(size, what, &temp_ptr); 353 352 if (result != VDO_SUCCESS) 354 353 return result; 354 + *(void **) new_ptr = temp_ptr; 355 355 356 356 if (ptr != NULL) { 357 357 if (old_size < size) ··· 370 368 int result; 371 369 u8 *dup; 372 370 373 - result = vdo_allocate(strlen(string) + 1, u8, what, &dup); 371 + result = vdo_allocate(strlen(string) + 1, what, &dup); 374 372 if (result != VDO_SUCCESS) 375 373 return result; 376 374
+12 -62
drivers/md/dm-vdo/memory-alloc.h
··· 8 8 9 9 #include <linux/cache.h> 10 10 #include <linux/io.h> /* for PAGE_SIZE */ 11 + #include <linux/overflow.h> 11 12 12 13 #include "permassert.h" 13 14 #include "thread-registry.h" ··· 17 16 int __must_check vdo_allocate_memory(size_t size, size_t align, const char *what, void *ptr); 18 17 19 18 /* 20 - * Allocate storage based on element counts, sizes, and alignment. 21 - * 22 - * This is a generalized form of our allocation use case: It allocates an array of objects, 23 - * optionally preceded by one object of another type (i.e., a struct with trailing variable-length 24 - * array), with the alignment indicated. 25 - * 26 - * Why is this inline? The sizes and alignment will always be constant, when invoked through the 27 - * macros below, and often the count will be a compile-time constant 1 or the number of extra bytes 28 - * will be a compile-time constant 0. So at least some of the arithmetic can usually be optimized 29 - * away, and the run-time selection between allocation functions always can. In many cases, it'll 30 - * boil down to just a function call with a constant size. 31 - * 32 - * @count: The number of objects to allocate 33 - * @size: The size of an object 34 - * @extra: The number of additional bytes to allocate 35 - * @align: The required alignment 36 - * @what: What is being allocated (for error logging) 37 - * @ptr: A pointer to hold the allocated memory 38 - * 39 - * Return: VDO_SUCCESS or an error code 40 - */ 41 - static inline int __vdo_do_allocation(size_t count, size_t size, size_t extra, 42 - size_t align, const char *what, void *ptr) 43 - { 44 - size_t total_size = count * size + extra; 45 - 46 - /* Overflow check: */ 47 - if ((size > 0) && (count > ((SIZE_MAX - extra) / size))) { 48 - /* 49 - * This is kind of a hack: We rely on the fact that SIZE_MAX would cover the entire 50 - * address space (minus one byte) and thus the system can never allocate that much 51 - * and the call will always fail. So we can report an overflow as "out of memory" 52 - * by asking for "merely" SIZE_MAX bytes. 53 - */ 54 - total_size = SIZE_MAX; 55 - } 56 - 57 - return vdo_allocate_memory(total_size, align, what, ptr); 58 - } 59 - 60 - /* 61 19 * Allocate one or more elements of the indicated type, logging an error if the allocation fails. 62 20 * The memory will be zeroed. 63 21 * 64 22 * @COUNT: The number of objects to allocate 65 - * @TYPE: The type of objects to allocate. This type determines the alignment of the allocation. 66 23 * @WHAT: What is being allocated (for error logging) 67 24 * @PTR: A pointer to hold the allocated memory 68 25 * 69 26 * Return: VDO_SUCCESS or an error code 70 27 */ 71 - #define vdo_allocate(COUNT, TYPE, WHAT, PTR) \ 72 - __vdo_do_allocation(COUNT, sizeof(TYPE), 0, __alignof__(TYPE), WHAT, PTR) 28 + #define vdo_allocate(COUNT, WHAT, PTR) \ 29 + vdo_allocate_memory(size_mul((COUNT), sizeof(typeof(**(PTR)))), \ 30 + __alignof__(typeof(**(PTR))), WHAT, PTR) 73 31 74 32 /* 75 - * Allocate one object of an indicated type, followed by one or more elements of a second type, 76 - * logging an error if the allocation fails. The memory will be zeroed. 33 + * Allocate a structure with a flexible array member, with a specified number of elements, logging 34 + * an error if the allocation fails. The memory will be zeroed. 77 35 * 78 - * @TYPE1: The type of the primary object to allocate. This type determines the alignment of the 79 - * allocated memory. 
80 36 * @COUNT: The number of objects to allocate 81 - * @TYPE2: The type of array objects to allocate 37 + * @FIELD: The flexible array field at the end of the structure 82 38 * @WHAT: What is being allocated (for error logging) 83 39 * @PTR: A pointer to hold the allocated memory 84 40 * 85 41 * Return: VDO_SUCCESS or an error code 86 42 */ 87 - #define vdo_allocate_extended(TYPE1, COUNT, TYPE2, WHAT, PTR) \ 88 - __extension__({ \ 89 - int _result; \ 90 - TYPE1 **_ptr = (PTR); \ 91 - BUILD_BUG_ON(__alignof__(TYPE1) < __alignof__(TYPE2)); \ 92 - _result = __vdo_do_allocation(COUNT, \ 93 - sizeof(TYPE2), \ 94 - sizeof(TYPE1), \ 95 - __alignof__(TYPE1), \ 96 - WHAT, \ 97 - _ptr); \ 98 - _result; \ 99 - }) 43 + #define vdo_allocate_extended(COUNT, FIELD, WHAT, PTR) \ 44 + vdo_allocate_memory(struct_size(*(PTR), FIELD, (COUNT)), \ 45 + __alignof__(typeof(**(PTR))), \ 46 + WHAT, \ 47 + (PTR)) 100 48 101 49 /* 102 50 * Allocate memory starting on a cache line boundary, logging an error if the allocation fails. The
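With the element type and alignment now derived from the destination pointer, and size_mul()/struct_size() handling overflow in place of the old hand-rolled SIZE_MAX check, call sites reduce to the two patterns below. A hypothetical sketch, not code from the patch:

	struct example_zone {
		unsigned int slot_count;
		struct vio *slots[] __counted_by(slot_count);
	};

	static int make_example(unsigned int count, struct example_zone **zone_ptr)
	{
		struct example_zone *zone;
		u64 *counters;
		int result;

		/* Element type (u64) and alignment are inferred from &counters. */
		result = vdo_allocate(count, "example counters", &counters);
		if (result != VDO_SUCCESS)
			return result;

		/* Allocates struct_size(zone, slots, count) bytes: header plus array. */
		result = vdo_allocate_extended(count, slots, "example zone", &zone);
		if (result != VDO_SUCCESS) {
			vdo_free(counters);
			return result;
		}

		zone->slot_count = count;
		vdo_free(counters);
		*zone_ptr = zone;
		return VDO_SUCCESS;
	}

One side effect: a void * destination no longer compiles, since typeof(**(PTR)) needs a complete type; that is why vdo_reallocate() in memory-alloc.c now allocates through a char *temp_ptr and stores the result into the caller's pointer afterwards.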
+1 -1
drivers/md/dm-vdo/message-stats.c
··· 420 420 struct vdo_statistics *stats; 421 421 int result; 422 422 423 - result = vdo_allocate(1, struct vdo_statistics, __func__, &stats); 423 + result = vdo_allocate(1, __func__, &stats); 424 424 if (result != VDO_SUCCESS) { 425 425 vdo_log_error("Cannot allocate memory to write VDO statistics"); 426 426 return result;
+4 -5
drivers/md/dm-vdo/packer.c
··· 120 120 struct packer_bin *bin; 121 121 int result; 122 122 123 - result = vdo_allocate_extended(struct packer_bin, VDO_MAX_COMPRESSION_SLOTS, 124 - struct vio *, __func__, &bin); 123 + result = vdo_allocate_extended(VDO_MAX_COMPRESSION_SLOTS, incoming, __func__, &bin); 125 124 if (result != VDO_SUCCESS) 126 125 return result; 127 126 ··· 145 146 block_count_t i; 146 147 int result; 147 148 148 - result = vdo_allocate(1, struct packer, __func__, &packer); 149 + result = vdo_allocate(1, __func__, &packer); 149 150 if (result != VDO_SUCCESS) 150 151 return result; 151 152 ··· 167 168 * bin must have a canceler for which it is waiting, and any canceler will only have 168 169 * canceled one lock holder at a time. 169 170 */ 170 - result = vdo_allocate_extended(struct packer_bin, MAXIMUM_VDO_USER_VIOS / 2, 171 - struct vio *, __func__, &packer->canceled_bin); 171 + result = vdo_allocate_extended(MAXIMUM_VDO_USER_VIOS / 2, incoming, __func__, 172 + &packer->canceled_bin); 172 173 if (result != VDO_SUCCESS) { 173 174 vdo_free_packer(packer); 174 175 return result;
+3 -5
drivers/md/dm-vdo/physical-zone.c
··· 200 200 /** @idle_list: A list containing all idle PBN lock instances. */ 201 201 struct list_head idle_list; 202 202 /** @locks: The memory for all the locks allocated by this pool. */ 203 - idle_pbn_lock locks[]; 203 + idle_pbn_lock locks[] __counted_by(capacity); 204 204 }; 205 205 206 206 /** ··· 240 240 struct pbn_lock_pool *pool; 241 241 int result; 242 242 243 - result = vdo_allocate_extended(struct pbn_lock_pool, capacity, idle_pbn_lock, 244 - __func__, &pool); 243 + result = vdo_allocate_extended(capacity, locks, __func__, &pool); 245 244 if (result != VDO_SUCCESS) 246 245 return result; 247 246 ··· 367 368 if (zone_count == 0) 368 369 return VDO_SUCCESS; 369 370 370 - result = vdo_allocate_extended(struct physical_zones, zone_count, 371 - struct physical_zone, __func__, &zones); 371 + result = vdo_allocate_extended(zone_count, zones, __func__, &zones); 372 372 if (result != VDO_SUCCESS) 373 373 return result; 374 374
+1 -2
drivers/md/dm-vdo/priority-table.c
··· 60 60 if (max_priority > MAX_PRIORITY) 61 61 return UDS_INVALID_ARGUMENT; 62 62 63 - result = vdo_allocate_extended(struct priority_table, max_priority + 1, 64 - struct bucket, __func__, &table); 63 + result = vdo_allocate_extended(max_priority + 1, buckets, __func__, &table); 65 64 if (result != VDO_SUCCESS) 66 65 return result; 67 66
+9 -14
drivers/md/dm-vdo/recovery-journal.c
··· 593 593 struct thread_config *config = &vdo->thread_config; 594 594 struct lock_counter *counter = &journal->lock_counter; 595 595 596 - result = vdo_allocate(journal->size, u16, __func__, &counter->journal_counters); 596 + result = vdo_allocate(journal->size, __func__, &counter->journal_counters); 597 597 if (result != VDO_SUCCESS) 598 598 return result; 599 599 600 - result = vdo_allocate(journal->size, atomic_t, __func__, 601 - &counter->journal_decrement_counts); 600 + result = vdo_allocate(journal->size, __func__, &counter->journal_decrement_counts); 602 601 if (result != VDO_SUCCESS) 603 602 return result; 604 603 605 - result = vdo_allocate(journal->size * config->logical_zone_count, u16, __func__, 604 + result = vdo_allocate(journal->size * config->logical_zone_count, __func__, 606 605 &counter->logical_counters); 607 606 if (result != VDO_SUCCESS) 608 607 return result; 609 608 610 - result = vdo_allocate(journal->size, atomic_t, __func__, 611 - &counter->logical_zone_counts); 609 + result = vdo_allocate(journal->size, __func__, &counter->logical_zone_counts); 612 610 if (result != VDO_SUCCESS) 613 611 return result; 614 612 615 - result = vdo_allocate(journal->size * config->physical_zone_count, u16, __func__, 613 + result = vdo_allocate(journal->size * config->physical_zone_count, __func__, 616 614 &counter->physical_counters); 617 615 if (result != VDO_SUCCESS) 618 616 return result; 619 617 620 - result = vdo_allocate(journal->size, atomic_t, __func__, 621 - &counter->physical_zone_counts); 618 + result = vdo_allocate(journal->size, __func__, &counter->physical_zone_counts); 622 619 if (result != VDO_SUCCESS) 623 620 return result; 624 621 ··· 669 672 * Allocate a full block for the journal block even though not all of the space is used 670 673 * since the VIO needs to write a full disk block. 671 674 */ 672 - result = vdo_allocate(VDO_BLOCK_SIZE, char, __func__, &data); 675 + result = vdo_allocate(VDO_BLOCK_SIZE, __func__, &data); 673 676 if (result != VDO_SUCCESS) 674 677 return result; 675 678 ··· 708 711 struct recovery_journal *journal; 709 712 int result; 710 713 711 - result = vdo_allocate_extended(struct recovery_journal, 712 - RECOVERY_JOURNAL_RESERVED_BLOCKS, 713 - struct recovery_journal_block, __func__, 714 - &journal); 714 + result = vdo_allocate_extended(RECOVERY_JOURNAL_RESERVED_BLOCKS, blocks, 715 + __func__, &journal); 715 716 if (result != VDO_SUCCESS) 716 717 return result; 717 718
+6 -11
drivers/md/dm-vdo/repair.c
··· 127 127 * The page completions used for playing the journal into the block map, and, during 128 128 * read-only rebuild, for rebuilding the reference counts from the block map. 129 129 */ 130 - struct vdo_page_completion page_completions[]; 130 + struct vdo_page_completion page_completions[] __counted_by(page_count); 131 131 }; 132 132 133 133 /* ··· 1417 1417 * packed_recovery_journal_entry from every valid journal block. 1418 1418 */ 1419 1419 count = ((repair->highest_tail - repair->block_map_head + 1) * entries_per_block); 1420 - result = vdo_allocate(count, struct numbered_block_mapping, __func__, 1421 - &repair->entries); 1420 + result = vdo_allocate(count, __func__, &repair->entries); 1422 1421 if (result != VDO_SUCCESS) 1423 1422 return result; 1424 1423 ··· 1463 1464 * Allocate an array of numbered_block_mapping structs just large enough to transcribe 1464 1465 * every packed_recovery_journal_entry from every valid journal block. 1465 1466 */ 1466 - result = vdo_allocate(repair->entry_count, struct numbered_block_mapping, 1467 - __func__, &repair->entries); 1467 + result = vdo_allocate(repair->entry_count, __func__, &repair->entries); 1468 1468 if (result != VDO_SUCCESS) 1469 1469 return result; 1470 1470 ··· 1713 1715 vdo_log_warning("Device was dirty, rebuilding reference counts"); 1714 1716 } 1715 1717 1716 - result = vdo_allocate_extended(struct repair_completion, page_count, 1717 - struct vdo_page_completion, __func__, 1718 - &repair); 1718 + result = vdo_allocate_extended(page_count, page_completions, __func__, &repair); 1719 1719 if (result != VDO_SUCCESS) { 1720 1720 vdo_fail_completion(parent, result); 1721 1721 return; ··· 1725 1729 prepare_repair_completion(repair, finish_repair, VDO_ZONE_TYPE_ADMIN); 1726 1730 repair->page_count = page_count; 1727 1731 1728 - result = vdo_allocate(remaining * VDO_BLOCK_SIZE, char, __func__, 1729 - &repair->journal_data); 1732 + result = vdo_allocate(remaining * VDO_BLOCK_SIZE, __func__, &repair->journal_data); 1730 1733 if (abort_on_error(result, repair)) 1731 1734 return; 1732 1735 1733 - result = vdo_allocate(vio_count, struct vio, __func__, &repair->vios); 1736 + result = vdo_allocate(vio_count, __func__, &repair->vios); 1734 1737 if (abort_on_error(result, repair)) 1735 1738 return; 1736 1739
+18 -23
drivers/md/dm-vdo/slab-depot.c
··· 2453 2453 if (result != VDO_SUCCESS) 2454 2454 return result; 2455 2455 2456 - result = vdo_allocate(slab->reference_block_count, struct reference_block, 2457 - __func__, &slab->reference_blocks); 2456 + result = vdo_allocate(slab->reference_block_count, __func__, &slab->reference_blocks); 2458 2457 if (result != VDO_SUCCESS) 2459 2458 return result; 2460 2459 ··· 2462 2463 * so we can word-search even at the very end. 2463 2464 */ 2464 2465 bytes = (slab->reference_block_count * COUNTS_PER_BLOCK) + (2 * BYTES_PER_WORD); 2465 - result = vdo_allocate(bytes, vdo_refcount_t, "ref counts array", 2466 - &slab->counters); 2466 + result = vdo_allocate(bytes, "ref counts array", &slab->counters); 2467 2467 if (result != VDO_SUCCESS) { 2468 2468 vdo_free(vdo_forget(slab->reference_blocks)); 2469 2469 return result; ··· 3561 3563 struct slab_status *statuses; 3562 3564 struct slab_iterator iterator = get_slab_iterator(allocator); 3563 3565 3564 - result = vdo_allocate(allocator->slab_count, struct slab_status, __func__, 3565 - &statuses); 3566 + result = vdo_allocate(allocator->slab_count, __func__, &statuses); 3566 3567 if (result != VDO_SUCCESS) 3567 3568 return result; 3568 3569 ··· 3736 3739 const struct slab_config *slab_config = &slab->allocator->depot->slab_config; 3737 3740 int result; 3738 3741 3739 - result = vdo_allocate(slab_config->slab_journal_blocks, struct journal_lock, 3740 - __func__, &journal->locks); 3742 + result = vdo_allocate(slab_config->slab_journal_blocks, __func__, &journal->locks); 3741 3743 if (result != VDO_SUCCESS) 3742 3744 return result; 3743 3745 3744 - result = vdo_allocate(VDO_BLOCK_SIZE, char, "struct packed_slab_journal_block", 3745 - (char **) &journal->block); 3746 + BUILD_BUG_ON(sizeof(*journal->block) != VDO_BLOCK_SIZE); 3747 + result = vdo_allocate(1, "struct packed_slab_journal_block", &journal->block); 3746 3748 if (result != VDO_SUCCESS) 3747 3749 return result; 3748 3750 ··· 3796 3800 struct vdo_slab *slab; 3797 3801 int result; 3798 3802 3799 - result = vdo_allocate(1, struct vdo_slab, __func__, &slab); 3803 + result = vdo_allocate(1, __func__, &slab); 3800 3804 if (result != VDO_SUCCESS) 3801 3805 return result; 3802 3806 ··· 3853 3857 physical_block_number_t slab_origin; 3854 3858 int result; 3855 3859 3856 - result = vdo_allocate(slab_count, struct vdo_slab *, 3857 - "slab pointer array", &depot->new_slabs); 3860 + result = vdo_allocate(slab_count, "slab pointer array", &depot->new_slabs); 3858 3861 if (result != VDO_SUCCESS) 3859 3862 return result; 3860 3863 ··· 4006 4011 char *journal_data; 4007 4012 int result; 4008 4013 4009 - result = vdo_allocate(VDO_BLOCK_SIZE * slab_journal_size, 4010 - char, __func__, &journal_data); 4014 + result = vdo_allocate(VDO_BLOCK_SIZE * slab_journal_size, __func__, &journal_data); 4011 4015 if (result != VDO_SUCCESS) 4012 4016 return result; 4013 4017 ··· 4039 4045 struct slab_summary_block *block = &allocator->summary_blocks[index]; 4040 4046 int result; 4041 4047 4042 - result = vdo_allocate(VDO_BLOCK_SIZE, char, __func__, &block->outgoing_entries); 4048 + result = vdo_allocate(VDO_BLOCK_SIZE, __func__, &block->outgoing_entries); 4043 4049 if (result != VDO_SUCCESS) 4044 4050 return result; 4045 4051 ··· 4108 4114 if (result != VDO_SUCCESS) 4109 4115 return result; 4110 4116 4111 - result = vdo_allocate(VDO_SLAB_SUMMARY_BLOCKS_PER_ZONE, 4112 - struct slab_summary_block, __func__, 4117 + result = vdo_allocate(VDO_SLAB_SUMMARY_BLOCKS_PER_ZONE, __func__, 4113 4118 &allocator->summary_blocks); 4114 4119 if 
(result != VDO_SUCCESS) 4115 4120 return result; ··· 4167 4174 4168 4175 depot->summary_origin = summary_partition->offset; 4169 4176 depot->hint_shift = vdo_get_slab_summary_hint_shift(depot->slab_size_shift); 4170 - result = vdo_allocate(MAXIMUM_VDO_SLAB_SUMMARY_ENTRIES, 4171 - struct slab_summary_entry, __func__, 4177 + result = vdo_allocate(MAXIMUM_VDO_SLAB_SUMMARY_ENTRIES, __func__, 4172 4178 &depot->summary_entries); 4173 4179 if (result != VDO_SUCCESS) 4174 4180 return result; ··· 4254 4262 } 4255 4263 slab_size_shift = ilog2(slab_size); 4256 4264 4257 - result = vdo_allocate_extended(struct slab_depot, 4258 - vdo->thread_config.physical_zone_count, 4259 - struct block_allocator, __func__, &depot); 4265 + if (state.zone_count > MAX_VDO_PHYSICAL_ZONES) 4266 + return vdo_log_error_strerror(UDS_CORRUPT_DATA, 4267 + "invalid zone count"); 4268 + 4269 + result = vdo_allocate_extended(vdo->thread_config.physical_zone_count, 4270 + allocators, __func__, &depot); 4260 4271 if (result != VDO_SUCCESS) 4261 4272 return result; 4262 4273
+1 -1
drivers/md/dm-vdo/slab-depot.h
··· 509 509 struct slab_summary_entry *summary_entries; 510 510 511 511 /* The block allocators for this depot */ 512 - struct block_allocator allocators[]; 512 + struct block_allocator allocators[] __counted_by(zone_count); 513 513 }; 514 514 515 515 struct reference_updater;
+2
drivers/md/dm-vdo/status-codes.c
··· 80 80 81 81 /* VDO or UDS error */ 82 82 switch (error) { 83 + case VDO_BAD_CONFIGURATION: 84 + return -EINVAL; 83 85 case VDO_NO_SPACE: 84 86 return -ENOSPC; 85 87 case VDO_READ_ONLY:
+1 -1
drivers/md/dm-vdo/thread-utils.c
··· 56 56 struct thread *thread; 57 57 int result; 58 58 59 - result = vdo_allocate(1, struct thread, __func__, &thread); 59 + result = vdo_allocate(1, __func__, &thread); 60 60 if (result != VDO_SUCCESS) { 61 61 vdo_log_warning("Error allocating memory for %s", name); 62 62 return result;
+3
drivers/md/dm-vdo/types.h
··· 227 227 bool compression; 228 228 struct thread_count_config thread_counts; 229 229 block_count_t max_discard_blocks; 230 + block_count_t slab_blocks; 231 + int index_memory; 232 + bool index_sparse; 230 233 }; 231 234 232 235 enum vdo_completion_type {
+234 -101
drivers/md/dm-vdo/vdo.c
··· 34 34 #include <linux/lz4.h> 35 35 #include <linux/mutex.h> 36 36 #include <linux/spinlock.h> 37 + #include <linux/string.h> 37 38 #include <linux/types.h> 39 + #include <linux/uuid.h> 38 40 39 41 #include "logger.h" 40 42 #include "memory-alloc.h" ··· 57 55 #include "slab-depot.h" 58 56 #include "statistics.h" 59 57 #include "status-codes.h" 58 + #include "time-utils.h" 60 59 #include "vio.h" 61 60 62 61 #define PARANOID_THREAD_CONSISTENCY_CHECKS 0 ··· 210 207 config->hash_zone_count = counts.hash_zones; 211 208 } 212 209 213 - result = vdo_allocate(config->logical_zone_count, thread_id_t, 214 - "logical thread array", &config->logical_threads); 210 + result = vdo_allocate(config->logical_zone_count, "logical thread array", 211 + &config->logical_threads); 215 212 if (result != VDO_SUCCESS) { 216 213 uninitialize_thread_config(config); 217 214 return result; 218 215 } 219 216 220 - result = vdo_allocate(config->physical_zone_count, thread_id_t, 221 - "physical thread array", &config->physical_threads); 217 + result = vdo_allocate(config->physical_zone_count, "physical thread array", 218 + &config->physical_threads); 222 219 if (result != VDO_SUCCESS) { 223 220 uninitialize_thread_config(config); 224 221 return result; 225 222 } 226 223 227 - result = vdo_allocate(config->hash_zone_count, thread_id_t, 228 - "hash thread array", &config->hash_zone_threads); 224 + result = vdo_allocate(config->hash_zone_count, "hash thread array", 225 + &config->hash_zone_threads); 229 226 if (result != VDO_SUCCESS) { 230 227 uninitialize_thread_config(config); 231 228 return result; 232 229 } 233 230 234 - result = vdo_allocate(config->bio_thread_count, thread_id_t, 235 - "bio thread array", &config->bio_threads); 231 + result = vdo_allocate(config->bio_thread_count, "bio thread array", &config->bio_threads); 236 232 if (result != VDO_SUCCESS) { 237 233 uninitialize_thread_config(config); 238 234 return result; ··· 258 256 return VDO_SUCCESS; 259 257 } 260 258 261 - /** 262 - * read_geometry_block() - Synchronously read the geometry block from a vdo's underlying block 263 - * device. 264 - * @vdo: The vdo whose geometry is to be read. 265 - * 266 - * Return: VDO_SUCCESS or an error code. 
267 - */ 268 - static int __must_check read_geometry_block(struct vdo *vdo) 259 + static int initialize_geometry_block(struct vdo *vdo, 260 + struct vdo_geometry_block *geometry_block) 269 261 { 270 - struct vio *vio; 271 - char *block; 272 262 int result; 273 263 274 - result = vdo_allocate(VDO_BLOCK_SIZE, u8, __func__, &block); 264 + result = vdo_allocate(VDO_BLOCK_SIZE, "encoded geometry block", 265 + (char **) &vdo->geometry_block.buffer); 275 266 if (result != VDO_SUCCESS) 276 267 return result; 277 268 278 - result = create_metadata_vio(vdo, VIO_TYPE_GEOMETRY, VIO_PRIORITY_HIGH, NULL, 279 - block, &vio); 280 - if (result != VDO_SUCCESS) { 281 - vdo_free(block); 269 + return allocate_vio_components(vdo, VIO_TYPE_GEOMETRY, 270 + VIO_PRIORITY_METADATA, NULL, 1, 271 + (char *) geometry_block->buffer, 272 + &vdo->geometry_block.vio); 273 + } 274 + 275 + static int initialize_super_block(struct vdo *vdo, struct vdo_super_block *super_block) 276 + { 277 + int result; 278 + 279 + result = vdo_allocate(VDO_BLOCK_SIZE, "encoded super block", 280 + (char **) &vdo->super_block.buffer); 281 + if (result != VDO_SUCCESS) 282 282 return result; 283 - } 284 283 285 - /* 286 - * This is only safe because, having not already loaded the geometry, the vdo's geometry's 287 - * bio_offset field is 0, so the fact that vio_reset_bio() will subtract that offset from 288 - * the supplied pbn is not a problem. 289 - */ 290 - result = vio_reset_bio(vio, block, NULL, REQ_OP_READ, 291 - VDO_GEOMETRY_BLOCK_LOCATION); 292 - if (result != VDO_SUCCESS) { 293 - free_vio(vdo_forget(vio)); 294 - vdo_free(block); 295 - return result; 296 - } 297 - 298 - bio_set_dev(vio->bio, vdo_get_backing_device(vdo)); 299 - submit_bio_wait(vio->bio); 300 - result = blk_status_to_errno(vio->bio->bi_status); 301 - free_vio(vdo_forget(vio)); 302 - if (result != 0) { 303 - vdo_log_error_strerror(result, "synchronous read failed"); 304 - vdo_free(block); 305 - return -EIO; 306 - } 307 - 308 - result = vdo_parse_geometry_block((u8 *) block, &vdo->geometry); 309 - vdo_free(block); 310 - return result; 284 + return allocate_vio_components(vdo, VIO_TYPE_SUPER_BLOCK, 285 + VIO_PRIORITY_METADATA, NULL, 1, 286 + (char *) super_block->buffer, 287 + &vdo->super_block.vio); 311 288 } 312 289 313 290 static bool get_zone_thread_name(const thread_id_t thread_ids[], zone_count_t count, ··· 434 453 } 435 454 436 455 /** 456 + * vdo_format() - Format a block device to function as a new VDO. 457 + * @vdo: The vdo to format. 458 + * @error_ptr: The reason for any failure during this call. 459 + * 460 + * This function must be called on a device before a VDO can be loaded for the first time. 461 + * Once a device has been formatted, the VDO can be loaded and shut down repeatedly. 462 + * If a new VDO is desired, this function should be called again. 
463 + *
464 + * Return: VDO_SUCCESS or an error code.
465 + */
466 + static int __must_check vdo_format(struct vdo *vdo, char **error_ptr)
467 + {
468 + int result;
469 + uuid_t uuid;
470 + nonce_t nonce = current_time_us();
471 + struct device_config *config = vdo->device_config;
472 +
473 + struct index_config index_config = {
474 + .mem = config->index_memory,
475 + .sparse = config->index_sparse,
476 + };
477 +
478 + struct vdo_config vdo_config = {
479 + .logical_blocks = config->logical_blocks,
480 + .physical_blocks = config->physical_blocks,
481 + .slab_size = config->slab_blocks,
482 + .slab_journal_blocks = DEFAULT_VDO_SLAB_JOURNAL_SIZE,
483 + .recovery_journal_size = DEFAULT_VDO_RECOVERY_JOURNAL_SIZE,
484 + };
485 +
486 + uuid_gen(&uuid);
487 + result = vdo_initialize_volume_geometry(nonce, &uuid, &index_config, &vdo->geometry);
488 + if (result != VDO_SUCCESS) {
489 + *error_ptr = "Could not initialize volume geometry during format";
490 + return result;
491 + }
492 +
493 + result = vdo_initialize_component_states(&vdo_config, &vdo->geometry, nonce, &vdo->states);
494 + if (result == VDO_NO_SPACE) {
495 + block_count_t slab_blocks = config->slab_blocks;
496 + /* The 1 accounts for the geometry block. */
497 + block_count_t fixed_layout_size = 1 +
498 + vdo->geometry.regions[VDO_DATA_REGION].start_block +
499 + DEFAULT_VDO_BLOCK_MAP_TREE_ROOT_COUNT +
500 + DEFAULT_VDO_RECOVERY_JOURNAL_SIZE + VDO_SLAB_SUMMARY_BLOCKS;
501 + block_count_t necessary_size = fixed_layout_size + slab_blocks;
502 +
503 + vdo_log_error("Minimum required size for VDO volume: %llu bytes",
504 + (unsigned long long) necessary_size * VDO_BLOCK_SIZE);
505 + *error_ptr = "Could not allocate enough space for VDO during format";
506 + return result;
507 + }
508 + if (result != VDO_SUCCESS) {
509 + *error_ptr = "Could not initialize data layout during format";
510 + return result;
511 + }
512 +
513 + vdo->needs_formatting = true;
514 +
515 + return VDO_SUCCESS;
516 + }
517 +
518 + /**
437 519 * initialize_vdo() - Do the portion of initializing a vdo which will clean up after itself on
438 520 * error.
439 521 * @vdo: The vdo being initialized ··· 519 475 vdo_initialize_completion(&vdo->admin.completion, vdo, VDO_ADMIN_COMPLETION); 520 476 init_completion(&vdo->admin.callback_sync); 521 477 mutex_init(&vdo->stats_mutex); 522 - result = read_geometry_block(vdo); 478 + 479 + result = initialize_geometry_block(vdo, &vdo->geometry_block); 480 + if (result != VDO_SUCCESS) { 481 + *reason = "Could not initialize geometry block"; 482 + return result; 483 + } 484 + 485 + result = initialize_super_block(vdo, &vdo->super_block); 486 + if (result != VDO_SUCCESS) { 487 + *reason = "Could not initialize super block"; 488 + return result; 489 + } 490 + 491 + result = vdo_submit_metadata_vio_wait(&vdo->geometry_block.vio, 492 + VDO_GEOMETRY_BLOCK_LOCATION, REQ_OP_READ); 523 493 if (result != VDO_SUCCESS) { 524 494 *reason = "Could not load geometry block"; 525 495 return result; 496 + } 497 + 498 + if (mem_is_zero(vdo->geometry_block.vio.data, VDO_BLOCK_SIZE)) { 499 + result = vdo_format(vdo, reason); 500 + if (result != VDO_SUCCESS) 501 + return result; 502 + } else { 503 + result = vdo_parse_geometry_block(vdo->geometry_block.buffer, 504 + &vdo->geometry); 505 + if (result != VDO_SUCCESS) { 506 + *reason = "Could not parse geometry block"; 507 + return result; 508 + } 526 509 } 527 510 528 511 result = initialize_thread_config(config->thread_counts, &vdo->thread_config); ··· 564 493 config->thread_counts.hash_zones, vdo->thread_config.thread_count); 565 494 566 495 /* Compression context storage */ 567 - result = vdo_allocate(config->thread_counts.cpu_threads, char *, "LZ4 context", 496 + result = vdo_allocate(config->thread_counts.cpu_threads, "LZ4 context", 568 497 &vdo->compression_context); 569 498 if (result != VDO_SUCCESS) { 570 499 *reason = "cannot allocate LZ4 context"; ··· 572 501 } 573 502 574 503 for (i = 0; i < config->thread_counts.cpu_threads; i++) { 575 - result = vdo_allocate(LZ4_MEM_COMPRESS, char, "LZ4 context", 504 + result = vdo_allocate(LZ4_MEM_COMPRESS, "LZ4 context", 576 505 &vdo->compression_context[i]); 577 506 if (result != VDO_SUCCESS) { 578 507 *reason = "cannot allocate LZ4 context"; ··· 608 537 /* Initialize with a generic failure reason to prevent returning garbage. 
*/ 609 538 *reason = "Unspecified error"; 610 539 611 - result = vdo_allocate(1, struct vdo, __func__, &vdo); 540 + result = vdo_allocate(1, __func__, &vdo); 612 541 if (result != VDO_SUCCESS) { 613 542 *reason = "Cannot allocate VDO"; 614 543 return result; ··· 625 554 626 555 snprintf(vdo->thread_name_prefix, sizeof(vdo->thread_name_prefix), 627 556 "vdo%u", instance); 628 - result = vdo_allocate(vdo->thread_config.thread_count, 629 - struct vdo_thread, __func__, &vdo->threads); 557 + result = vdo_allocate(vdo->thread_config.thread_count, __func__, &vdo->threads); 630 558 if (result != VDO_SUCCESS) { 631 559 *reason = "Cannot allocate thread structures"; 632 560 return result; ··· 718 648 } 719 649 } 720 650 651 + static void uninitialize_geometry_block(struct vdo_geometry_block *geometry_block) 652 + { 653 + free_vio_components(&geometry_block->vio); 654 + vdo_free(geometry_block->buffer); 655 + } 656 + 721 657 static void uninitialize_super_block(struct vdo_super_block *super_block) 722 658 { 723 659 free_vio_components(&super_block->vio); ··· 771 695 vdo_uninitialize_layout(&vdo->next_layout); 772 696 if (vdo->partition_copier) 773 697 dm_kcopyd_client_destroy(vdo_forget(vdo->partition_copier)); 698 + uninitialize_geometry_block(&vdo->geometry_block); 774 699 uninitialize_super_block(&vdo->super_block); 775 700 vdo_free_block_map(vdo_forget(vdo->block_map)); 776 701 vdo_free_hash_zones(vdo_forget(vdo->hash_zones)); ··· 795 718 vdo_free(vdo_forget(vdo->compression_context)); 796 719 } 797 720 vdo_free(vdo); 798 - } 799 - 800 - static int initialize_super_block(struct vdo *vdo, struct vdo_super_block *super_block) 801 - { 802 - int result; 803 - 804 - result = vdo_allocate(VDO_BLOCK_SIZE, char, "encoded super block", 805 - (char **) &vdo->super_block.buffer); 806 - if (result != VDO_SUCCESS) 807 - return result; 808 - 809 - return allocate_vio_components(vdo, VIO_TYPE_SUPER_BLOCK, 810 - VIO_PRIORITY_METADATA, NULL, 1, 811 - (char *) super_block->buffer, 812 - &vdo->super_block.vio); 813 721 } 814 722 815 723 /** ··· 840 778 */ 841 779 void vdo_load_super_block(struct vdo *vdo, struct vdo_completion *parent) 842 780 { 843 - int result; 844 - 845 - result = initialize_super_block(vdo, &vdo->super_block); 846 - if (result != VDO_SUCCESS) { 847 - vdo_continue_completion(parent, result); 848 - return; 849 - } 850 - 851 781 vdo->super_block.vio.completion.parent = parent; 852 782 vdo_submit_metadata_vio(&vdo->super_block.vio, 853 783 vdo_get_data_region_start(vdo->geometry), ··· 953 899 vdo->states.layout = vdo->layout; 954 900 } 955 901 902 + static int __must_check clear_partition(struct vdo *vdo, enum partition_id id) 903 + { 904 + struct partition *partition; 905 + int result; 906 + 907 + result = vdo_get_partition(&vdo->states.layout, id, &partition); 908 + if (result != VDO_SUCCESS) 909 + return result; 910 + 911 + return blkdev_issue_zeroout(vdo_get_backing_device(vdo), 912 + partition->offset * VDO_SECTORS_PER_BLOCK, 913 + partition->count * VDO_SECTORS_PER_BLOCK, 914 + GFP_NOWAIT, 0); 915 + } 916 + 917 + int vdo_clear_layout(struct vdo *vdo) 918 + { 919 + int result; 920 + 921 + /* Zero out the uds index's first block. 
*/
922 + result = blkdev_issue_zeroout(vdo_get_backing_device(vdo),
923 + VDO_SECTORS_PER_BLOCK,
924 + VDO_SECTORS_PER_BLOCK,
925 + GFP_NOWAIT, 0);
926 + if (result != VDO_SUCCESS)
927 + return result;
928 +
929 + result = clear_partition(vdo, VDO_BLOCK_MAP_PARTITION);
930 + if (result != VDO_SUCCESS)
931 + return result;
932 +
933 + return clear_partition(vdo, VDO_RECOVERY_JOURNAL_PARTITION);
934 + }
935 +
956 936 /**
957 - * continue_super_block_parent() - Continue the parent of a super block save operation.
958 - * @completion: The super block vio.
937 + * continue_parent() - Continue the parent of a save operation.
938 + * @completion: The completion to continue.
959 939 *
960 - * This callback is registered in vdo_save_components().
961 940 */
962 - static void continue_super_block_parent(struct vdo_completion *completion)
941 + static void continue_parent(struct vdo_completion *completion)
963 942 {
964 943 vdo_continue_completion(vdo_forget(completion->parent), completion->result);
965 944 }
966 945
946 + static void handle_write_endio(struct bio *bio)
947 + {
948 + struct vio *vio = bio->bi_private;
949 + struct vdo_completion *parent = vio->completion.parent;
950 +
951 + continue_vio_after_io(vio, continue_parent,
952 + parent->callback_thread_id);
953 + }
954 +
967 955 /**
968 - * handle_save_error() - Log a super block save error.
956 + * handle_geometry_block_save_error() - Log a geometry block save error.
957 + * @completion: The geometry block vio.
958 + *
959 + * This error handler is registered in vdo_save_geometry_block().
960 + */
961 + static void handle_geometry_block_save_error(struct vdo_completion *completion)
962 + {
963 + struct vdo_geometry_block *geometry_block =
964 + container_of(as_vio(completion), struct vdo_geometry_block, vio);
965 +
966 + vio_record_metadata_io_error(&geometry_block->vio);
967 + vdo_log_error_strerror(completion->result, "geometry block save failed");
968 + completion->callback(completion);
969 + }
970 +
971 + /**
972 + * vdo_save_geometry_block() - Encode the vdo and save the geometry block asynchronously.
973 + * @vdo: The vdo whose state is being saved.
974 + * @parent: The completion to notify when the save is complete.
975 + */
976 + void vdo_save_geometry_block(struct vdo *vdo, struct vdo_completion *parent)
977 + {
978 + struct vdo_geometry_block *geometry_block = &vdo->geometry_block;
979 +
980 + vdo_encode_volume_geometry(geometry_block->buffer, &vdo->geometry,
981 + VDO_DEFAULT_GEOMETRY_BLOCK_VERSION);
982 + geometry_block->vio.completion.parent = parent;
983 + geometry_block->vio.completion.callback_thread_id = parent->callback_thread_id;
984 + vdo_submit_metadata_vio(&geometry_block->vio,
985 + VDO_GEOMETRY_BLOCK_LOCATION,
986 + handle_write_endio, handle_geometry_block_save_error,
987 + REQ_OP_WRITE | REQ_PREFLUSH | REQ_FUA);
988 + }
989 +
990 + /**
991 + * handle_super_block_save_error() - Log a super block save error.
969 992 * @completion: The super block vio.
970 993 *
971 994 * This error handler is registered in vdo_save_components().
972 995 */ 973 - static void handle_save_error(struct vdo_completion *completion) 996 + static void handle_super_block_save_error(struct vdo_completion *completion) 974 997 { 975 998 struct vdo_super_block *super_block = 976 999 container_of(as_vio(completion), struct vdo_super_block, vio); ··· 1066 935 completion->callback(completion); 1067 936 } 1068 937 1069 - static void super_block_write_endio(struct bio *bio) 938 + /** 939 + * vdo_save_super_block() - Save the component states to the super block asynchronously. 940 + * @vdo: The vdo whose state is being saved. 941 + * @parent: The completion to notify when the save is complete. 942 + */ 943 + void vdo_save_super_block(struct vdo *vdo, struct vdo_completion *parent) 1070 944 { 1071 - struct vio *vio = bio->bi_private; 1072 - struct vdo_completion *parent = vio->completion.parent; 945 + struct vdo_super_block *super_block = &vdo->super_block; 1073 946 1074 - continue_vio_after_io(vio, continue_super_block_parent, 1075 - parent->callback_thread_id); 947 + vdo_encode_super_block(super_block->buffer, &vdo->states); 948 + super_block->vio.completion.parent = parent; 949 + super_block->vio.completion.callback_thread_id = parent->callback_thread_id; 950 + vdo_submit_metadata_vio(&super_block->vio, 951 + vdo_get_data_region_start(vdo->geometry), 952 + handle_write_endio, handle_super_block_save_error, 953 + REQ_OP_WRITE | REQ_PREFLUSH | REQ_FUA); 1076 954 } 1077 955 1078 956 /** 1079 - * vdo_save_components() - Encode the vdo and save the super block asynchronously. 957 + * vdo_save_components() - Copy the current state of the VDO to the states struct and save 958 + * it to the super block asynchronously. 1080 959 * @vdo: The vdo whose state is being saved. 1081 960 * @parent: The completion to notify when the save is complete. 1082 961 */ ··· 1105 964 } 1106 965 1107 966 record_vdo(vdo); 1108 - 1109 - vdo_encode_super_block(super_block->buffer, &vdo->states); 1110 - super_block->vio.completion.parent = parent; 1111 - super_block->vio.completion.callback_thread_id = parent->callback_thread_id; 1112 - vdo_submit_metadata_vio(&super_block->vio, 1113 - vdo_get_data_region_start(vdo->geometry), 1114 - super_block_write_endio, handle_save_error, 1115 - REQ_OP_WRITE | REQ_PREFLUSH | REQ_FUA); 967 + vdo_save_super_block(vdo, parent); 1116 968 } 1117 969 1118 970 /** ··· 1131 997 if (result != VDO_SUCCESS) 1132 998 return result; 1133 999 1134 - result = vdo_allocate(1, struct read_only_listener, __func__, 1135 - &read_only_listener); 1000 + result = vdo_allocate(1, __func__, &read_only_listener); 1136 1001 if (result != VDO_SUCCESS) 1137 1002 return result; 1138 1003
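Taken together, the vdo.c hunks above let the target bootstrap a blank device. A condensed sketch of that decision in initialize_vdo(), using only names that appear in the diff (error-message plumbing elided):

	/*
	 * Condensed from initialize_vdo() above: an all-zero geometry block is
	 * taken to mean the device was never formatted, so format it in place;
	 * anything else must parse as a valid geometry block.
	 */
	result = vdo_submit_metadata_vio_wait(&vdo->geometry_block.vio,
					      VDO_GEOMETRY_BLOCK_LOCATION, REQ_OP_READ);
	if (result != VDO_SUCCESS)
		return result;

	if (mem_is_zero(vdo->geometry_block.vio.data, VDO_BLOCK_SIZE))
		result = vdo_format(vdo, reason);
	else
		result = vdo_parse_geometry_block(vdo->geometry_block.buffer,
						  &vdo->geometry);

Formatting only on an all-zero block means a device that merely fails to parse is rejected with "Could not parse geometry block" rather than silently reformatted.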
+19
drivers/md/dm-vdo/vdo.h
··· 144 144 145 145 struct thread_count_config; 146 146 147 + struct vdo_geometry_block { 148 + /* The vio for reading and writing the geometry block to disk */ 149 + struct vio vio; 150 + /* A buffer to hold the geometry block */ 151 + u8 *buffer; 152 + }; 153 + 147 154 struct vdo_super_block { 148 155 /* The vio for reading and writing the super block to disk */ 149 156 struct vio vio; ··· 192 185 struct device_config *device_config; 193 186 /* The thread mapping */ 194 187 struct thread_config thread_config; 188 + 189 + /* The geometry block */ 190 + struct vdo_geometry_block geometry_block; 195 191 196 192 /* The super block */ 197 193 struct vdo_super_block super_block; ··· 246 236 const struct admin_state_code *suspend_type; 247 237 bool allocations_allowed; 248 238 bool dump_on_shutdown; 239 + bool needs_formatting; 249 240 atomic_t processing_message; 250 241 251 242 /* ··· 315 304 316 305 void vdo_destroy(struct vdo *vdo); 317 306 307 + int __must_check vdo_format_components(struct vdo *vdo); 308 + 309 + void vdo_format_super_block(struct vdo *vdo, struct vdo_completion *parent); 310 + 318 311 void vdo_load_super_block(struct vdo *vdo, struct vdo_completion *parent); 319 312 320 313 struct block_device * __must_check vdo_get_backing_device(const struct vdo *vdo); ··· 340 325 enum vdo_state __must_check vdo_get_state(const struct vdo *vdo); 341 326 342 327 void vdo_set_state(struct vdo *vdo, enum vdo_state state); 328 + 329 + int vdo_clear_layout(struct vdo *vdo); 330 + void vdo_save_geometry_block(struct vdo *vdo, struct vdo_completion *parent); 331 + void vdo_save_super_block(struct vdo *vdo, struct vdo_completion *parent); 343 332 344 333 void vdo_save_components(struct vdo *vdo, struct vdo_completion *parent); 345 334
+5 -7
drivers/md/dm-vdo/vio.c
··· 52 52 struct bio *bio = NULL; 53 53 int result; 54 54 55 - result = vdo_allocate_extended(struct bio, size + 1, struct bio_vec, 56 - "bio", &bio); 55 + result = vdo_allocate_memory(sizeof(struct bio) + sizeof(struct bio_vec) * (size + 1), 56 + __alignof__(struct bio), "bio", &bio); 57 57 if (result != VDO_SUCCESS) 58 58 return result; 59 59 ··· 129 129 * Metadata vios should use direct allocation and not use the buffer pool, which is 130 130 * reserved for submissions from the linux block layer. 131 131 */ 132 - result = vdo_allocate(1, struct vio, __func__, &vio); 132 + result = vdo_allocate(1, __func__, &vio); 133 133 if (result != VDO_SUCCESS) { 134 134 vdo_log_error("metadata vio allocation failure %d", result); 135 135 return result; ··· 327 327 int result; 328 328 size_t per_vio_size = VDO_BLOCK_SIZE * block_count; 329 329 330 - result = vdo_allocate_extended(struct vio_pool, pool_size, struct pooled_vio, 331 - __func__, &pool); 330 + result = vdo_allocate_extended(pool_size, vios, __func__, &pool); 332 331 if (result != VDO_SUCCESS) 333 332 return result; 334 333 ··· 335 336 INIT_LIST_HEAD(&pool->available); 336 337 INIT_LIST_HEAD(&pool->busy); 337 338 338 - result = vdo_allocate(pool_size * per_vio_size, char, 339 - "VIO pool buffer", &pool->buffer); 339 + result = vdo_allocate(pool_size * per_vio_size, "VIO pool buffer", &pool->buffer); 340 340 if (result != VDO_SUCCESS) { 341 341 free_vio_pool(pool); 342 342 return result;
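An aside on the open-coded bio sizing above: the expression is the usual flexible-array computation, and could equally be written with the overflow-checked helper from <linux/overflow.h>. A sketch, assuming only that struct bio's trailing vector array is bi_inline_vecs (as in mainline) and that 'size' is the vector count from the surrounding function:

	#include <linux/overflow.h>

	struct bio *bio = NULL;
	/*
	 * Overflow-checked equivalent of
	 * sizeof(struct bio) + sizeof(struct bio_vec) * (size + 1).
	 */
	size_t bytes = struct_size(bio, bi_inline_vecs, size + 1);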
+174 -210
drivers/md/dm-verity-fec.c
··· 11 11 #define DM_MSG_PREFIX "verity-fec" 12 12 13 13 /* 14 - * When correcting a data block, the FEC code performs optimally when it can 15 - * collect all the associated RS blocks at the same time. As each byte is part 16 - * of a different RS block, there are '1 << data_dev_block_bits' RS blocks. 17 - * There are '1 << DM_VERITY_FEC_BUF_RS_BITS' RS blocks per buffer, so that 18 - * gives '1 << (data_dev_block_bits - DM_VERITY_FEC_BUF_RS_BITS)' buffers. 14 + * When correcting a block, the FEC implementation performs optimally when it 15 + * can collect all the associated RS codewords at the same time. As each byte 16 + * is part of a different codeword, there are '1 << data_dev_block_bits' 17 + * codewords. Each buffer has space for the message bytes for 18 + * '1 << DM_VERITY_FEC_BUF_RS_BITS' codewords, so that gives 19 + * '1 << (data_dev_block_bits - DM_VERITY_FEC_BUF_RS_BITS)' buffers. 19 20 */ 20 21 static inline unsigned int fec_max_nbufs(struct dm_verity *v) 21 22 { 22 23 return 1 << (v->data_dev_block_bits - DM_VERITY_FEC_BUF_RS_BITS); 23 24 } 24 25 25 - /* 26 - * Return an interleaved offset for a byte in RS block. 27 - */ 28 - static inline u64 fec_interleave(struct dm_verity *v, u64 offset) 29 - { 30 - u32 mod; 31 - 32 - mod = do_div(offset, v->fec->rsn); 33 - return offset + mod * (v->fec->rounds << v->data_dev_block_bits); 34 - } 35 - 36 - /* 37 - * Read error-correcting codes for the requested RS block. Returns a pointer 38 - * to the data block. Caller is responsible for releasing buf. 39 - */ 40 - static u8 *fec_read_parity(struct dm_verity *v, u64 rsb, int index, 41 - unsigned int *offset, unsigned int par_buf_offset, 42 - struct dm_buffer **buf, unsigned short ioprio) 43 - { 44 - u64 position, block, rem; 45 - u8 *res; 46 - 47 - /* We have already part of parity bytes read, skip to the next block */ 48 - if (par_buf_offset) 49 - index++; 50 - 51 - position = (index + rsb) * v->fec->roots; 52 - block = div64_u64_rem(position, v->fec->io_size, &rem); 53 - *offset = par_buf_offset ? 0 : (unsigned int)rem; 54 - 55 - res = dm_bufio_read_with_ioprio(v->fec->bufio, block, buf, ioprio); 56 - if (IS_ERR(res)) { 57 - DMERR("%s: FEC %llu: parity read failed (block %llu): %ld", 58 - v->data_dev->name, (unsigned long long)rsb, 59 - (unsigned long long)block, PTR_ERR(res)); 60 - *buf = NULL; 61 - } 62 - 63 - return res; 64 - } 65 - 66 26 /* Loop over each allocated buffer. */ 67 27 #define fec_for_each_buffer(io, __i) \ 68 28 for (__i = 0; __i < (io)->nbufs; __i++) 69 29 70 - /* Loop over each RS block in each allocated buffer. */ 71 - #define fec_for_each_buffer_rs_block(io, __i, __j) \ 30 + /* Loop over each RS message in each allocated buffer. */ 31 + /* To stop early, use 'goto', not 'break' (since this uses nested loops). */ 32 + #define fec_for_each_buffer_rs_message(io, __i, __j) \ 72 33 fec_for_each_buffer(io, __i) \ 73 34 for (__j = 0; __j < 1 << DM_VERITY_FEC_BUF_RS_BITS; __j++) 74 35 75 36 /* 76 - * Return a pointer to the current RS block when called inside 77 - * fec_for_each_buffer_rs_block. 37 + * Return a pointer to the current RS message when called inside 38 + * fec_for_each_buffer_rs_message. 
78 39 */ 79 - static inline u8 *fec_buffer_rs_block(struct dm_verity *v, 80 - struct dm_verity_fec_io *fio, 81 - unsigned int i, unsigned int j) 40 + static inline u8 *fec_buffer_rs_message(struct dm_verity *v, 41 + struct dm_verity_fec_io *fio, 42 + unsigned int i, unsigned int j) 82 43 { 83 - return &fio->bufs[i][j * v->fec->rsn]; 44 + return &fio->bufs[i][j * v->fec->rs_k]; 84 45 } 85 46 86 47 /* 87 - * Return an index to the current RS block when called inside 88 - * fec_for_each_buffer_rs_block. 89 - */ 90 - static inline unsigned int fec_buffer_rs_index(unsigned int i, unsigned int j) 91 - { 92 - return (i << DM_VERITY_FEC_BUF_RS_BITS) + j; 93 - } 94 - 95 - /* 96 - * Decode all RS blocks from buffers and copy corrected bytes into fio->output 97 - * starting from block_offset. 48 + * Decode all RS codewords whose message bytes were loaded into fio->bufs. Copy 49 + * the corrected bytes into fio->output starting from out_pos. 98 50 */ 99 51 static int fec_decode_bufs(struct dm_verity *v, struct dm_verity_io *io, 100 - struct dm_verity_fec_io *fio, u64 rsb, int byte_index, 101 - unsigned int block_offset, int neras) 52 + struct dm_verity_fec_io *fio, u64 target_block, 53 + unsigned int target_region, u64 index_in_region, 54 + unsigned int out_pos, int neras) 102 55 { 103 - int r, corrected = 0, res; 56 + int r = 0, corrected = 0, res; 104 57 struct dm_buffer *buf; 105 - unsigned int n, i, j, offset, par_buf_offset = 0; 106 - uint16_t par_buf[DM_VERITY_FEC_RSM - DM_VERITY_FEC_MIN_RSN]; 107 - u8 *par, *block; 58 + unsigned int n, i, j, parity_pos, to_copy; 59 + uint16_t par_buf[DM_VERITY_FEC_MAX_ROOTS]; 60 + u8 *par, *msg_buf; 61 + u64 parity_block; 108 62 struct bio *bio = dm_bio_from_per_bio_data(io, v->ti->per_io_data_size); 109 63 110 - par = fec_read_parity(v, rsb, block_offset, &offset, 111 - par_buf_offset, &buf, bio->bi_ioprio); 112 - if (IS_ERR(par)) 64 + /* 65 + * Compute the index of the first parity block that will be needed and 66 + * the starting position in that block. Then read that block. 67 + * 68 + * block_size is always a power of 2, but roots might not be. Note that 69 + * when it's not, a codeword's parity bytes can span a block boundary. 70 + */ 71 + parity_block = ((index_in_region << v->data_dev_block_bits) + out_pos) * 72 + v->fec->roots; 73 + parity_pos = parity_block & (v->fec->block_size - 1); 74 + parity_block >>= v->data_dev_block_bits; 75 + par = dm_bufio_read_with_ioprio(v->fec->bufio, parity_block, &buf, 76 + bio->bi_ioprio); 77 + if (IS_ERR(par)) { 78 + DMERR("%s: FEC %llu: parity read failed (block %llu): %ld", 79 + v->data_dev->name, target_block, parity_block, 80 + PTR_ERR(par)); 113 81 return PTR_ERR(par); 82 + } 114 83 115 84 /* 116 - * Decode the RS blocks we have in bufs. Each RS block results in 117 - * one corrected target byte and consumes fec->roots parity bytes. 85 + * Decode the RS codewords whose message bytes are in bufs. Each RS 86 + * codeword results in one corrected target byte and consumes fec->roots 87 + * parity bytes. 
118 88 */ 119 - fec_for_each_buffer_rs_block(fio, n, i) { 120 - block = fec_buffer_rs_block(v, fio, n, i); 121 - for (j = 0; j < v->fec->roots - par_buf_offset; j++) 122 - par_buf[par_buf_offset + j] = par[offset + j]; 123 - /* Decode an RS block using Reed-Solomon */ 124 - res = decode_rs8(fio->rs, block, par_buf, v->fec->rsn, 89 + fec_for_each_buffer_rs_message(fio, n, i) { 90 + msg_buf = fec_buffer_rs_message(v, fio, n, i); 91 + 92 + /* 93 + * Copy the next 'roots' parity bytes to 'par_buf', reading 94 + * another parity block if needed. 95 + */ 96 + to_copy = min(v->fec->block_size - parity_pos, v->fec->roots); 97 + for (j = 0; j < to_copy; j++) 98 + par_buf[j] = par[parity_pos++]; 99 + if (to_copy < v->fec->roots) { 100 + parity_block++; 101 + parity_pos = 0; 102 + 103 + dm_bufio_release(buf); 104 + par = dm_bufio_read_with_ioprio(v->fec->bufio, 105 + parity_block, &buf, 106 + bio->bi_ioprio); 107 + if (IS_ERR(par)) { 108 + DMERR("%s: FEC %llu: parity read failed (block %llu): %ld", 109 + v->data_dev->name, target_block, 110 + parity_block, PTR_ERR(par)); 111 + return PTR_ERR(par); 112 + } 113 + for (; j < v->fec->roots; j++) 114 + par_buf[j] = par[parity_pos++]; 115 + } 116 + 117 + /* Decode an RS codeword using the Reed-Solomon library. */ 118 + res = decode_rs8(fio->rs, msg_buf, par_buf, v->fec->rs_k, 125 119 NULL, neras, fio->erasures, 0, NULL); 126 120 if (res < 0) { 127 121 r = res; 128 - goto error; 129 - } 130 - 131 - corrected += res; 132 - fio->output[block_offset] = block[byte_index]; 133 - 134 - block_offset++; 135 - if (block_offset >= 1 << v->data_dev_block_bits) 136 122 goto done; 137 - 138 - /* Read the next block when we run out of parity bytes */ 139 - offset += (v->fec->roots - par_buf_offset); 140 - /* Check if parity bytes are split between blocks */ 141 - if (offset < v->fec->io_size && (offset + v->fec->roots) > v->fec->io_size) { 142 - par_buf_offset = v->fec->io_size - offset; 143 - for (j = 0; j < par_buf_offset; j++) 144 - par_buf[j] = par[offset + j]; 145 - offset += par_buf_offset; 146 - } else 147 - par_buf_offset = 0; 148 - 149 - if (offset >= v->fec->io_size) { 150 - dm_bufio_release(buf); 151 - 152 - par = fec_read_parity(v, rsb, block_offset, &offset, 153 - par_buf_offset, &buf, bio->bi_ioprio); 154 - if (IS_ERR(par)) 155 - return PTR_ERR(par); 156 123 } 124 + corrected += res; 125 + fio->output[out_pos++] = msg_buf[target_region]; 126 + 127 + if (out_pos >= v->fec->block_size) 128 + goto done; 157 129 } 158 130 done: 159 - r = corrected; 160 - error: 161 131 dm_bufio_release(buf); 162 132 163 133 if (r < 0 && neras) 164 134 DMERR_LIMIT("%s: FEC %llu: failed to correct: %d", 165 - v->data_dev->name, (unsigned long long)rsb, r); 166 - else if (r > 0) { 135 + v->data_dev->name, target_block, r); 136 + else if (r == 0) 167 137 DMWARN_LIMIT("%s: FEC %llu: corrected %d errors", 168 - v->data_dev->name, (unsigned long long)rsb, r); 169 - atomic64_inc(&v->fec->corrected); 170 - } 138 + v->data_dev->name, target_block, corrected); 171 139 172 140 return r; 173 141 } ··· 146 178 static int fec_is_erasure(struct dm_verity *v, struct dm_verity_io *io, 147 179 const u8 *want_digest, const u8 *data) 148 180 { 149 - if (unlikely(verity_hash(v, io, data, 1 << v->data_dev_block_bits, 181 + if (unlikely(verity_hash(v, io, data, v->fec->block_size, 150 182 io->tmp_digest))) 151 183 return 0; 152 184 ··· 154 186 } 155 187 156 188 /* 157 - * Read data blocks that are part of the RS block and deinterleave as much as 158 - * fits into buffers. 
Check for erasure locations if @neras is non-NULL.
189 + * Read the message block at index @index_in_region within each of the
190 + * @v->fec->rs_k regions and deinterleave their contents into @io->fec_io->bufs.
191 + *
192 + * @target_block gives the index of the block being corrected, counted from
193 + * the start of all the FEC-covered message blocks (data plus hash).
194 + *
195 + * @out_pos gives the current output position, i.e. the position in (each) block
196 + * from which to start the deinterleaving. Deinterleaving continues until
197 + * either end-of-block is reached or there's no more buffer space.
198 + *
199 + * If @neras is non-NULL, then also use verity hashes and the presence/absence
200 + * of I/O errors to determine which of the message blocks just read are
201 + * likely to be incorrect. Write the number of such blocks to *@neras and the
202 + * indices of the corresponding RS message bytes in [0, k - 1] to
203 + * @io->fec_io->erasures, up to a limit of @v->fec->roots + 1 such blocks.
159 204 */
160 205 static int fec_read_bufs(struct dm_verity *v, struct dm_verity_io *io,
161 - u64 rsb, u64 target, unsigned int block_offset,
162 - int *neras)
206 + u64 target_block, u64 index_in_region,
207 + unsigned int out_pos, int *neras)
163 208 {
164 209 bool is_zero;
165 - int i, j, target_index = -1;
210 + int i, j;
166 211 struct dm_buffer *buf;
167 212 struct dm_bufio_client *bufio;
168 213 struct dm_verity_fec_io *fio = io->fec_io;
169 - u64 block, ileaved;
170 - u8 *bbuf, *rs_block;
214 + u64 block;
215 + u8 *bbuf;
171 216 u8 want_digest[HASH_MAX_DIGESTSIZE];
172 - unsigned int n, k;
217 + unsigned int n, src_pos;
173 218 struct bio *bio = dm_bio_from_per_bio_data(io, v->ti->per_io_data_size);
174 219
175 220 if (neras)
···
191 210 if (WARN_ON(v->digest_size > sizeof(want_digest)))
192 211 return -EINVAL;
193 212
194 - /*
195 - * read each of the rsn data blocks that are part of the RS block, and
196 - * interleave contents to available bufs
197 - */
198 - for (i = 0; i < v->fec->rsn; i++) {
199 - ileaved = fec_interleave(v, rsb * v->fec->rsn + i);
200 -
213 + for (i = 0; i < v->fec->rs_k; i++) {
201 214 /*
202 - * target is the data block we want to correct, target_index is
203 - * the index of this block within the rsn RS blocks
215 + * Read the block from region i. It contains the i'th message
216 + * byte of the target block's RS codewords.
204 217 */
205 - if (ileaved == target)
206 - target_index = i;
207 -
208 - block = ileaved >> v->data_dev_block_bits;
218 + block = i * v->fec->region_blocks + index_in_region;
209 219 bufio = v->fec->data_bufio;
210 220
211 221 if (block >= v->data_blocks) {
···
216 244 bbuf = dm_bufio_read_with_ioprio(bufio, block, &buf, bio->bi_ioprio);
217 245 if (IS_ERR(bbuf)) {
218 246 DMWARN_LIMIT("%s: FEC %llu: read failed (%llu): %ld",
219 - v->data_dev->name,
220 - (unsigned long long)rsb,
221 - (unsigned long long)block, PTR_ERR(bbuf));
247 + v->data_dev->name, target_block, block,
248 + PTR_ERR(bbuf));
222 249
223 250 /* assume the block is corrupted */
224 251 if (neras && *neras <= v->fec->roots)
···
244 273 }
245 274
246 275 /*
247 - * deinterleave and copy the bytes that fit into bufs,
248 - * starting from block_offset
276 + * Deinterleave the bytes of the block, starting from 'out_pos',
277 + * into the i'th byte of the RS message buffers. Stop when
278 + * end-of-block is reached or there are no more buffers.
249 279 */ 250 - fec_for_each_buffer_rs_block(fio, n, j) { 251 - k = fec_buffer_rs_index(n, j) + block_offset; 252 - 253 - if (k >= 1 << v->data_dev_block_bits) 280 + src_pos = out_pos; 281 + fec_for_each_buffer_rs_message(fio, n, j) { 282 + if (src_pos >= v->fec->block_size) 254 283 goto done; 255 - 256 - rs_block = fec_buffer_rs_block(v, fio, n, j); 257 - rs_block[i] = bbuf[k]; 284 + fec_buffer_rs_message(v, fio, n, j)[i] = bbuf[src_pos++]; 258 285 } 259 286 done: 260 287 dm_bufio_release(buf); 261 288 } 262 - 263 - return target_index; 289 + return 0; 264 290 } 265 291 266 292 /* ··· 304 336 unsigned int n; 305 337 306 338 fec_for_each_buffer(fio, n) 307 - memset(fio->bufs[n], 0, v->fec->rsn << DM_VERITY_FEC_BUF_RS_BITS); 339 + memset(fio->bufs[n], 0, v->fec->rs_k << DM_VERITY_FEC_BUF_RS_BITS); 308 340 309 341 memset(fio->erasures, 0, sizeof(fio->erasures)); 310 342 } 311 343 312 344 /* 313 - * Decode all RS blocks in a single data block and return the target block 314 - * (indicated by @offset) in fio->output. If @use_erasures is non-zero, uses 315 - * hashes to locate erasures. 345 + * Try to correct the message (data or hash) block at index @target_block. 346 + * 347 + * If @use_erasures is true, use verity hashes to locate erasures. This makes 348 + * the error correction slower but up to twice as capable. 349 + * 350 + * On success, return 0 and write the corrected block to @fio->output. 0 is 351 + * returned only if the digest of the corrected block matches @want_digest; this 352 + * is critical to ensure that FEC can't cause dm-verity to return bad data. 316 353 */ 317 - static int fec_decode_rsb(struct dm_verity *v, struct dm_verity_io *io, 318 - struct dm_verity_fec_io *fio, u64 rsb, u64 offset, 319 - const u8 *want_digest, bool use_erasures) 354 + static int fec_decode(struct dm_verity *v, struct dm_verity_io *io, 355 + struct dm_verity_fec_io *fio, u64 target_block, 356 + const u8 *want_digest, bool use_erasures) 320 357 { 321 358 int r, neras = 0; 322 - unsigned int pos; 359 + unsigned int target_region, out_pos; 360 + u64 index_in_region; 323 361 324 - for (pos = 0; pos < 1 << v->data_dev_block_bits; ) { 362 + /* 363 + * Compute 'target_region', the index of the region the target block is 364 + * in; and 'index_in_region', the index of the target block within its 365 + * region. The latter value is also the index within its region of each 366 + * message block that shares its RS codewords with the target block. 367 + */ 368 + target_region = div64_u64_rem(target_block, v->fec->region_blocks, 369 + &index_in_region); 370 + if (WARN_ON_ONCE(target_region >= v->fec->rs_k)) 371 + /* target_block is out-of-bounds. Should never happen. */ 372 + return -EIO; 373 + 374 + for (out_pos = 0; out_pos < v->fec->block_size;) { 325 375 fec_init_bufs(v, fio); 326 376 327 - r = fec_read_bufs(v, io, rsb, offset, pos, 377 + r = fec_read_bufs(v, io, target_block, index_in_region, out_pos, 328 378 use_erasures ? 
&neras : NULL); 329 379 if (unlikely(r < 0)) 330 380 return r; 331 381 332 - r = fec_decode_bufs(v, io, fio, rsb, r, pos, neras); 382 + r = fec_decode_bufs(v, io, fio, target_block, target_region, 383 + index_in_region, out_pos, neras); 333 384 if (r < 0) 334 385 return r; 335 386 336 - pos += fio->nbufs << DM_VERITY_FEC_BUF_RS_BITS; 387 + out_pos += fio->nbufs << DM_VERITY_FEC_BUF_RS_BITS; 337 388 } 338 389 339 390 /* Always re-validate the corrected block against the expected hash */ 340 - r = verity_hash(v, io, fio->output, 1 << v->data_dev_block_bits, 341 - io->tmp_digest); 391 + r = verity_hash(v, io, fio->output, v->fec->block_size, io->tmp_digest); 342 392 if (unlikely(r < 0)) 343 393 return r; 344 394 345 395 if (memcmp(io->tmp_digest, want_digest, v->digest_size)) { 346 396 DMERR_LIMIT("%s: FEC %llu: failed to correct (%d erasures)", 347 - v->data_dev->name, (unsigned long long)rsb, neras); 397 + v->data_dev->name, target_block, neras); 348 398 return -EILSEQ; 349 399 } 350 400 ··· 376 390 { 377 391 int r; 378 392 struct dm_verity_fec_io *fio; 379 - u64 offset, res, rsb; 380 393 381 394 if (!verity_fec_is_enabled(v)) 382 395 return -EOPNOTSUPP; ··· 393 408 block = block - v->hash_start + v->data_blocks; 394 409 395 410 /* 396 - * For RS(M, N), the continuous FEC data is divided into blocks of N 397 - * bytes. Since block size may not be divisible by N, the last block 398 - * is zero padded when decoding. 399 - * 400 - * Each byte of the block is covered by a different RS(M, N) code, 401 - * and each code is interleaved over N blocks to make it less likely 402 - * that bursty corruption will leave us in unrecoverable state. 403 - */ 404 - 405 - offset = block << v->data_dev_block_bits; 406 - res = div64_u64(offset, v->fec->rounds << v->data_dev_block_bits); 407 - 408 - /* 409 - * The base RS block we can feed to the interleaver to find out all 410 - * blocks required for decoding. 411 - */ 412 - rsb = offset - res * (v->fec->rounds << v->data_dev_block_bits); 413 - 414 - /* 415 411 * Locating erasures is slow, so attempt to recover the block without 416 412 * them first. Do a second attempt with erasures if the corruption is 417 413 * bad enough. 418 414 */ 419 - r = fec_decode_rsb(v, io, fio, rsb, offset, want_digest, false); 415 + r = fec_decode(v, io, fio, block, want_digest, false); 420 416 if (r < 0) { 421 - r = fec_decode_rsb(v, io, fio, rsb, offset, want_digest, true); 417 + r = fec_decode(v, io, fio, block, want_digest, true); 422 418 if (r < 0) 423 419 goto done; 424 420 } 425 421 426 - memcpy(dest, fio->output, 1 << v->data_dev_block_bits); 422 + memcpy(dest, fio->output, v->fec->block_size); 423 + atomic64_inc(&v->fec->corrected); 427 424 428 425 done: 429 426 fio->level--; ··· 552 585 553 586 } else if (!strcasecmp(arg_name, DM_VERITY_OPT_FEC_ROOTS)) { 554 587 if (sscanf(arg_value, "%hhu%c", &num_c, &dummy) != 1 || !num_c || 555 - num_c < (DM_VERITY_FEC_RSM - DM_VERITY_FEC_MAX_RSN) || 556 - num_c > (DM_VERITY_FEC_RSM - DM_VERITY_FEC_MIN_RSN)) { 588 + num_c < DM_VERITY_FEC_MIN_ROOTS || 589 + num_c > DM_VERITY_FEC_MAX_ROOTS) { 557 590 ti->error = "Invalid " DM_VERITY_OPT_FEC_ROOTS; 558 591 return -EINVAL; 559 592 } ··· 592 625 { 593 626 struct dm_verity_fec *f = v->fec; 594 627 struct dm_target *ti = v->ti; 595 - u64 hash_blocks, fec_blocks; 628 + u64 hash_blocks; 596 629 int ret; 597 630 598 631 if (!verity_fec_is_enabled(v)) { ··· 615 648 * hash device after the hash blocks. 
616 649 */ 617 650 618 - hash_blocks = v->hash_blocks - v->hash_start; 651 + hash_blocks = v->hash_end - v->hash_start; 619 652 620 653 /* 621 654 * Require matching block sizes for data and hash devices for ··· 625 658 ti->error = "Block sizes must match to use FEC"; 626 659 return -EINVAL; 627 660 } 661 + f->block_size = 1 << v->data_dev_block_bits; 628 662 629 663 if (!f->roots) { 630 664 ti->error = "Missing " DM_VERITY_OPT_FEC_ROOTS; 631 665 return -EINVAL; 632 666 } 633 - f->rsn = DM_VERITY_FEC_RSM - f->roots; 667 + f->rs_k = DM_VERITY_FEC_RS_N - f->roots; 634 668 635 669 if (!f->blocks) { 636 670 ti->error = "Missing " DM_VERITY_OPT_FEC_BLOCKS; 637 671 return -EINVAL; 638 672 } 639 673 640 - f->rounds = f->blocks; 641 - if (sector_div(f->rounds, f->rsn)) 642 - f->rounds++; 674 + f->region_blocks = f->blocks; 675 + if (sector_div(f->region_blocks, f->rs_k)) 676 + f->region_blocks++; 643 677 644 678 /* 645 679 * Due to optional metadata, f->blocks can be larger than 646 680 * data_blocks and hash_blocks combined. 647 681 */ 648 - if (f->blocks < v->data_blocks + hash_blocks || !f->rounds) { 682 + if (f->blocks < v->data_blocks + hash_blocks || !f->region_blocks) { 649 683 ti->error = "Invalid " DM_VERITY_OPT_FEC_BLOCKS; 650 684 return -EINVAL; 651 685 } ··· 656 688 * it to be large enough. 657 689 */ 658 690 f->hash_blocks = f->blocks - v->data_blocks; 659 - if (dm_bufio_get_device_size(v->bufio) < f->hash_blocks) { 691 + if (dm_bufio_get_device_size(v->bufio) < 692 + v->hash_start + f->hash_blocks) { 660 693 ti->error = "Hash device is too small for " 661 694 DM_VERITY_OPT_FEC_BLOCKS; 662 695 return -E2BIG; 663 696 } 664 697 665 - f->io_size = 1 << v->data_dev_block_bits; 666 - 667 - f->bufio = dm_bufio_client_create(f->dev->bdev, 668 - f->io_size, 698 + f->bufio = dm_bufio_client_create(f->dev->bdev, f->block_size, 669 699 1, 0, NULL, NULL, 0); 670 700 if (IS_ERR(f->bufio)) { 671 701 ti->error = "Cannot initialize FEC bufio client"; ··· 672 706 673 707 dm_bufio_set_sector_offset(f->bufio, f->start << (v->data_dev_block_bits - SECTOR_SHIFT)); 674 708 675 - fec_blocks = div64_u64(f->rounds * f->roots, v->fec->roots << SECTOR_SHIFT); 676 - if (dm_bufio_get_device_size(f->bufio) < fec_blocks) { 709 + if (dm_bufio_get_device_size(f->bufio) < f->region_blocks * f->roots) { 677 710 ti->error = "FEC device is too small"; 678 711 return -E2BIG; 679 712 } 680 713 681 - f->data_bufio = dm_bufio_client_create(v->data_dev->bdev, 682 - 1 << v->data_dev_block_bits, 714 + f->data_bufio = dm_bufio_client_create(v->data_dev->bdev, f->block_size, 683 715 1, 0, NULL, NULL, 0); 684 716 if (IS_ERR(f->data_bufio)) { 685 717 ti->error = "Cannot initialize FEC data bufio client"; ··· 707 743 } 708 744 709 745 f->cache = kmem_cache_create("dm_verity_fec_buffers", 710 - f->rsn << DM_VERITY_FEC_BUF_RS_BITS, 746 + f->rs_k << DM_VERITY_FEC_BUF_RS_BITS, 711 747 0, 0, NULL); 712 748 if (!f->cache) { 713 749 ti->error = "Cannot create FEC buffer cache"; ··· 724 760 725 761 /* Preallocate an output buffer for each thread */ 726 762 ret = mempool_init_kmalloc_pool(&f->output_pool, num_online_cpus(), 727 - 1 << v->data_dev_block_bits); 763 + f->block_size); 728 764 if (ret) { 729 765 ti->error = "Cannot allocate FEC output pool"; 730 766 return ret;
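To make the new FEC layout concrete: the fec_blocks covered blocks are split into rs_k regions of region_blocks each, blocks at the same index in every region supply one byte apiece to the same block_size codewords, and each codeword's roots parity bytes are stored consecutively on the parity device. A sketch of the arithmetic that fec_decode() and fec_decode_bufs() above rely on (fec_layout_example() is hypothetical, for illustration only):

	/* Hypothetical: where do a target block's codeword partners and parity live? */
	static void fec_layout_example(struct dm_verity_fec *f, u64 target_block)
	{
		u64 index_in_region, parity_byte;
		unsigned int target_region, i;

		/* Which region holds the target, and at what index within it? */
		target_region = div64_u64_rem(target_block, f->region_blocks,
					      &index_in_region);
		pr_info("target is entry %llu of region %u\n",
			(unsigned long long)index_in_region, target_region);

		/* One partner block per region, all at the same in-region index. */
		for (i = 0; i < f->rs_k; i++)
			pr_info("codeword partner: block %llu\n",
				(unsigned long long)(i * f->region_blocks +
						     index_in_region));

		/*
		 * Parity bytes of the target's first codeword (byte 0 of the
		 * block); each following byte's codeword sits 'roots' parity
		 * bytes later, exactly as fec_decode_bufs() walks them.
		 */
		parity_byte = (index_in_region << ilog2(f->block_size)) * f->roots;
		pr_info("parity starts in block %llu at offset %llu\n",
			(unsigned long long)(parity_byte >> ilog2(f->block_size)),
			(unsigned long long)(parity_byte & (f->block_size - 1)));
	}

Because partner blocks sit region_blocks apart, a burst of consecutive bad blocks shorter than a region corrupts each codeword in at most one byte, which is why even small fec_roots values correct well in practice.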
+14 -14
drivers/md/dm-verity-fec.h
··· 11 11 #include "dm-verity.h" 12 12 #include <linux/rslib.h> 13 13 14 - /* Reed-Solomon(M, N) parameters */ 15 - #define DM_VERITY_FEC_RSM 255 16 - #define DM_VERITY_FEC_MAX_RSN 253 17 - #define DM_VERITY_FEC_MIN_RSN 231 /* ~10% space overhead */ 14 + /* Reed-Solomon(n, k) parameters */ 15 + #define DM_VERITY_FEC_RS_N 255 16 + #define DM_VERITY_FEC_MIN_ROOTS 2 /* RS(255, 253): ~0.8% space overhead */ 17 + #define DM_VERITY_FEC_MAX_ROOTS 24 /* RS(255, 231): ~10% space overhead */ 18 18 19 19 /* buffers for deinterleaving and decoding */ 20 - #define DM_VERITY_FEC_BUF_RS_BITS 4 /* 1 << RS blocks per buffer */ 20 + #define DM_VERITY_FEC_BUF_RS_BITS 4 /* log2(RS messages per buffer) */ 21 21 22 22 #define DM_VERITY_OPT_FEC_DEV "use_fec_from_device" 23 23 #define DM_VERITY_OPT_FEC_BLOCKS "fec_blocks" ··· 29 29 struct dm_dev *dev; /* parity data device */ 30 30 struct dm_bufio_client *data_bufio; /* for data dev access */ 31 31 struct dm_bufio_client *bufio; /* for parity data access */ 32 - size_t io_size; /* IO size for roots */ 32 + size_t block_size; /* size of data, hash, and parity blocks in bytes */ 33 33 sector_t start; /* parity data start in blocks */ 34 34 sector_t blocks; /* number of blocks covered */ 35 - sector_t rounds; /* number of interleaving rounds */ 35 + sector_t region_blocks; /* blocks per region: ceil(blocks / rs_k) */ 36 36 sector_t hash_blocks; /* blocks covered after v->hash_start */ 37 - unsigned char roots; /* number of parity bytes, M-N of RS(M, N) */ 38 - unsigned char rsn; /* N of RS(M, N) */ 37 + unsigned char roots; /* parity bytes per RS codeword, n-k of RS(n, k) */ 38 + unsigned char rs_k; /* message bytes per RS codeword, k of RS(n, k) */ 39 39 mempool_t fio_pool; /* mempool for dm_verity_fec_io */ 40 40 mempool_t rs_pool; /* mempool for fio->rs */ 41 41 mempool_t prealloc_pool; /* mempool for preallocated buffers */ ··· 47 47 /* per-bio data */ 48 48 struct dm_verity_fec_io { 49 49 struct rs_control *rs; /* Reed-Solomon state */ 50 - int erasures[DM_VERITY_FEC_MAX_RSN]; /* erasures for decode_rs8 */ 50 + int erasures[DM_VERITY_FEC_MAX_ROOTS + 1]; /* erasures for decode_rs8 */ 51 51 u8 *output; /* buffer for corrected output */ 52 52 unsigned int level; /* recursion level */ 53 53 unsigned int nbufs; /* number of buffers allocated */ 54 54 /* 55 - * Buffers for deinterleaving RS blocks. Each buffer has space for 56 - * the data bytes of (1 << DM_VERITY_FEC_BUF_RS_BITS) RS blocks. The 57 - * array length is fec_max_nbufs(v), and we try to allocate that many 58 - * buffers. However, in low-memory situations we may be unable to 55 + * Buffers for deinterleaving RS codewords. Each buffer has space for 56 + * the message bytes of (1 << DM_VERITY_FEC_BUF_RS_BITS) RS codewords. 57 + * The array length is fec_max_nbufs(v), and we try to allocate that 58 + * many buffers. However, in low-memory situations we may be unable to 59 59 * allocate all buffers. 'nbufs' holds the number actually allocated. 60 60 */ 61 61 u8 *bufs[];
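A sizing corollary of the fields above: every in-region block index yields block_size codewords, each consuming roots parity bytes, so parity occupies exactly region_blocks * roots blocks. A sketch of that computation (the helper is hypothetical; the arithmetic mirrors the device-size check in verity_fec_ctr()):

	/* Hypothetical: parity blocks needed to cover fec_blocks blocks. */
	static inline sector_t fec_parity_blocks(sector_t fec_blocks, unsigned char roots)
	{
		unsigned char rs_k = DM_VERITY_FEC_RS_N - roots;
		sector_t region_blocks = fec_blocks;

		/* region_blocks = ceil(fec_blocks / rs_k), as in verity_fec_ctr() */
		if (sector_div(region_blocks, rs_k))
			region_blocks++;

		/*
		 * Each in-region index produces block_size codewords, i.e.
		 * block_size * roots parity bytes: exactly roots blocks.
		 */
		return region_blocks * roots;
	}

With roots = 2 that is 2 parity blocks per 253 covered blocks, the ~0.8% overhead quoted beside DM_VERITY_FEC_MIN_ROOTS.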
+5 -11
drivers/md/dm-verity-target.c
··· 733 733 734 734 hash_block_start &= ~(sector_t)(cluster - 1); 735 735 hash_block_end |= cluster - 1; 736 - if (unlikely(hash_block_end >= v->hash_blocks)) 737 - hash_block_end = v->hash_blocks - 1; 736 + if (unlikely(hash_block_end >= v->hash_end)) 737 + hash_block_end = v->hash_end - 1; 738 738 } 739 739 no_prefetch_cluster: 740 740 dm_bufio_prefetch_with_ioprio(v->bufio, hash_block_start, ··· 1011 1011 { 1012 1012 struct dm_verity *v = ti->private; 1013 1013 1014 - if (limits->logical_block_size < 1 << v->data_dev_block_bits) 1015 - limits->logical_block_size = 1 << v->data_dev_block_bits; 1016 - 1017 - if (limits->physical_block_size < 1 << v->data_dev_block_bits) 1018 - limits->physical_block_size = 1 << v->data_dev_block_bits; 1019 - 1020 - limits->io_min = limits->logical_block_size; 1014 + dm_stack_bs_limits(limits, 1 << v->data_dev_block_bits); 1021 1015 1022 1016 /* 1023 1017 * Similar to what dm-crypt does, opt dm-verity out of support for ··· 1601 1607 } 1602 1608 hash_position += s; 1603 1609 } 1604 - v->hash_blocks = hash_position; 1610 + v->hash_end = hash_position; 1605 1611 1606 1612 r = mempool_init_page_pool(&v->recheck_pool, 1, 0); 1607 1613 if (unlikely(r)) { ··· 1628 1634 goto bad; 1629 1635 } 1630 1636 1631 - if (dm_bufio_get_device_size(v->bufio) < v->hash_blocks) { 1637 + if (dm_bufio_get_device_size(v->bufio) < v->hash_end) { 1632 1638 ti->error = "Hash device is too small"; 1633 1639 r = -E2BIG; 1634 1640 goto bad;
+2 -2
drivers/md/dm-verity.h
··· 53 53 unsigned int sig_size; /* root digest signature size */
54 54 #endif /* CONFIG_SECURITY */
55 55 unsigned int salt_size;
56 - sector_t hash_start; /* hash start in blocks */
56 + sector_t hash_start; /* index of first hash block on hash_dev */
57 + sector_t hash_end; /* 1 + index of last hash block on hash_dev */
57 58 sector_t data_blocks; /* the number of data blocks */
58 - sector_t hash_blocks; /* the number of hash blocks */
59 59 unsigned char data_dev_block_bits; /* log2(data blocksize) */
60 60 unsigned char hash_dev_block_bits; /* log2(hash blocksize) */
61 61 unsigned char hash_per_block_bits; /* log2(hashes in hash block) */
+1 -9
drivers/md/dm-writecache.c
··· 1640 1640 { 1641 1641 struct dm_writecache *wc = ti->private; 1642 1642 1643 - if (limits->logical_block_size < wc->block_size) 1644 - limits->logical_block_size = wc->block_size; 1645 - 1646 - if (limits->physical_block_size < wc->block_size) 1647 - limits->physical_block_size = wc->block_size; 1648 - 1649 - if (limits->io_min < wc->block_size) 1650 - limits->io_min = wc->block_size; 1643 + dm_stack_bs_limits(limits, wc->block_size); 1651 1644 } 1652 - 1653 1645 1654 1646 static void writecache_writeback_endio(struct bio *bio) 1655 1647 {
+7
include/linux/device-mapper.h
··· 755 755 return (n << SECTOR_SHIFT); 756 756 } 757 757 758 + static inline void dm_stack_bs_limits(struct queue_limits *limits, unsigned int bs) 759 + { 760 + limits->logical_block_size = max(limits->logical_block_size, bs); 761 + limits->physical_block_size = max(limits->physical_block_size, bs); 762 + limits->io_min = max(limits->io_min, bs); 763 + } 764 + 758 765 #endif /* _LINUX_DEVICE_MAPPER_H */
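For target authors, the helper above replaces the three open-coded limit updates that dm-verity and dm-writecache previously carried (see their hunks earlier in this diff). A minimal sketch of an ->io_hints hook using it, where example_target and its block_size field are hypothetical:

	/* Hypothetical target: round the stacked queue limits up to our block size. */
	static void example_io_hints(struct dm_target *ti, struct queue_limits *limits)
	{
		struct example_target *et = ti->private;

		dm_stack_bs_limits(limits, et->block_size);
	}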