Merge tag 'xfs-merge-6.18' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

+14 -55

Documentation/admin-guide/xfs.rst

```diff
···
 	to the file. Specifying a fixed ``allocsize`` value turns off
 	the dynamic behaviour.
 
-  attr2 or noattr2
-	The options enable/disable an "opportunistic" improvement to
-	be made in the way inline extended attributes are stored
-	on-disk. When the new form is used for the first time when
-	``attr2`` is selected (either when setting or removing extended
-	attributes) the on-disk superblock feature bit field will be
-	updated to reflect this format being in use.
-
-	The default behaviour is determined by the on-disk feature
-	bit indicating that ``attr2`` behaviour is active. If either
-	mount option is set, then that becomes the new default used
-	by the filesystem.
-
-	CRC enabled filesystems always use the ``attr2`` format, and so
-	will reject the ``noattr2`` mount option if it is set.
-
   discard or nodiscard (default)
 	Enable/disable the issuing of commands to let the block
 	device reclaim space freed by the filesystem. This is
···
 	Make the data allocator use the filestreams allocation mode
 	across the entire filesystem rather than just on directories
 	configured to use it.
-
-  ikeep or noikeep (default)
-	When ``ikeep`` is specified, XFS does not delete empty inode
-	clusters and keeps them around on disk. When ``noikeep`` is
-	specified, empty inode clusters are returned to the free
-	space pool.
 
   inode32 or inode64 (default)
 	When ``inode32`` is specified, it indicates that XFS limits
···
 
 The deprecation will take place in two parts. Support for mounting V4
 filesystems can now be disabled at kernel build time via Kconfig option.
-The option will default to yes until September 2025, at which time it
-will be changed to default to no. In September 2030, support will be
-removed from the codebase entirely.
+These options were changed to default to no in September 2025. In
+September 2030, support will be removed from the codebase entirely.
 
 Note: Distributors may choose to withdraw V4 format support earlier than
 the dates listed above.
···
 ============================     ================
 Mounting with V4 filesystem      September 2030
 Mounting ascii-ci filesystem     September 2030
-ikeep/noikeep                    September 2025
-attr2/noattr2                    September 2025
 ============================     ================
 
 
···
   osyncisdsync/osyncisosync  v4.0
   barrier                    v4.19
   nobarrier                  v4.19
+  ikeep/noikeep              v6.18
+  attr2/noattr2              v6.18
 ===========================  =======
 
 sysctls
···
 	removes unused preallocation from clean inodes and releases
 	the unused space back to the free pool.
 
-  fs.xfs.speculative_cow_prealloc_lifetime
-	This is an alias for speculative_prealloc_lifetime.
-
   fs.xfs.error_level		(Min: 0 Default: 3 Max: 11)
 	A volume knob for error reporting when internal errors occur.
 	This will generate detailed messages & backtraces for filesystem
···
 		XFS_PTAG_VERIFIER_ERROR		0x00000100
 
 	This option is intended for debugging only.
-
-  fs.xfs.irix_symlink_mode	(Min: 0 Default: 0 Max: 1)
-	Controls whether symlinks are created with mode 0777 (default)
-	or whether their mode is affected by the umask (irix mode).
-
-  fs.xfs.irix_sgid_inherit	(Min: 0 Default: 0 Max: 1)
-	Controls files created in SGID directories.
-	If the group ID of the new file does not match the effective group
-	ID or one of the supplementary group IDs of the parent dir, the
-	ISGID bit is cleared if the irix_sgid_inherit compatibility sysctl
-	is set.
 
   fs.xfs.inherit_sync		(Min: 0 Default: 1 Max: 1)
 	Setting this to "1" will cause the "sync" flag set
···
 Deprecated Sysctls
 ==================
 
-===========================================  ================
-  Name                                       Removal Schedule
-===========================================  ================
-  fs.xfs.irix_sgid_inherit                   September 2025
-  fs.xfs.irix_symlink_mode                   September 2025
-  fs.xfs.speculative_cow_prealloc_lifetime   September 2025
-===========================================  ================
-
+None currently.
 
 Removed Sysctls
 ===============
 
-=============================  =======
-  Name                         Removed
-=============================  =======
-  fs.xfs.xfsbufd_centisec      v4.0
-  fs.xfs.age_buffer_centisecs  v4.0
-=============================  =======
+==========================================  =======
+  Name                                      Removed
+==========================================  =======
+  fs.xfs.xfsbufd_centisec                   v4.0
+  fs.xfs.age_buffer_centisecs               v4.0
+  fs.xfs.irix_symlink_mode                  v6.18
+  fs.xfs.irix_sgid_inherit                  v6.18
+  fs.xfs.speculative_cow_prealloc_lifetime  v6.18
+==========================================  =======
 
 Error handling
 ==============
```

+6 -16

fs/xfs/Kconfig

```diff
···
 config XFS_SUPPORT_V4
 	bool "Support deprecated V4 (crc=0) format"
 	depends on XFS_FS
-	default y
+	default n
 	help
 	  The V4 filesystem format lacks certain features that are supported
 	  by the V5 format, such as metadata checksumming, strengthened
···
 	  filesystem is a V4 filesystem. If no such string is found, please
 	  upgrade xfsprogs to the latest version and try again.
 
-	  This option will become default N in September 2025. Support for the
+	  This option became default N in September 2025. Support for the
 	  V4 format will be removed entirely in September 2030. Distributors
 	  can say N here to withdraw support earlier.
 
···
 config XFS_SUPPORT_ASCII_CI
 	bool "Support deprecated case-insensitive ascii (ascii-ci=1) format"
 	depends on XFS_FS
-	default y
+	default n
 	help
 	  The ASCII case insensitivity filesystem feature only works correctly
 	  on systems that have been coerced into using ISO 8859-1, and it does
···
 	  filesystem is a case-insensitive filesystem. If no such string is
 	  found, please upgrade xfsprogs to the latest version and try again.
 
-	  This option will become default N in September 2025. Support for the
+	  This option became default N in September 2025. Support for the
 	  feature will be removed entirely in September 2030. Distributors
 	  can say N here to withdraw support earlier.
 
···
 
 config XFS_ONLINE_SCRUB
 	bool "XFS online metadata check support"
-	default n
+	default y
 	depends on XFS_FS
 	depends on TMPFS && SHMEM
 	select XFS_LIVE_HOOKS
···
 	  advantage here is to look for problems proactively so that
 	  they can be dealt with in a controlled manner.
 
-	  This feature is considered EXPERIMENTAL. Use with caution!
-
 	  See the xfs_scrub man page in section 8 for additional information.
-
-	  If unsure, say N.
 
 config XFS_ONLINE_SCRUB_STATS
 	bool "XFS online metadata check usage data collection"
···
 
 	  Usage data are collected in /sys/kernel/debug/xfs/scrub.
 
-	  If unsure, say N.
-
 config XFS_ONLINE_REPAIR
 	bool "XFS online metadata repair support"
-	default n
+	default y
 	depends on XFS_FS && XFS_ONLINE_SCRUB
 	select XFS_BTREE_IN_MEM
 	help
···
 	  formatted with secondary metadata, such as reverse mappings and inode
 	  parent pointers.
 
-	  This feature is considered EXPERIMENTAL. Use with caution!
-
 	  See the xfs_scrub man page in section 8 for additional information.
-
-	  If unsure, say N.
 
 config XFS_WARN
 	bool "XFS Verbose Warnings"
```

+3 -4

fs/xfs/libxfs/xfs_ag_resv.c

```diff
···
 	trace_xfs_ag_resv_critical(pag, type, avail);
 
 	/* Critically low if less than 10% or max btree height remains. */
-	return XFS_TEST_ERROR(avail < orig / 10 ||
-			      avail < mp->m_agbtree_maxlevels,
-			mp, XFS_ERRTAG_AG_RESV_CRITICAL);
+	return avail < orig / 10 || avail < mp->m_agbtree_maxlevels ||
+		XFS_TEST_ERROR(mp, XFS_ERRTAG_AG_RESV_CRITICAL);
 }
 
 /*
···
 		return -EINVAL;
 	}
 
-	if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_AG_RESV_FAIL))
+	if (XFS_TEST_ERROR(mp, XFS_ERRTAG_AG_RESV_FAIL))
 		error = -ENOSPC;
 	else
 		error = xfs_dec_fdblocks(mp, hidden_space, true);
```

+2 -3

fs/xfs/libxfs/xfs_alloc.c

```diff
···
 		xfs_verifier_error(bp, -EFSBADCRC, __this_address);
 	else {
 		fa = xfs_agf_verify(bp);
-		if (XFS_TEST_ERROR(fa, mp, XFS_ERRTAG_ALLOC_READ_AGF))
+		if (fa || XFS_TEST_ERROR(mp, XFS_ERRTAG_ALLOC_READ_AGF))
 			xfs_verifier_error(bp, -EFSCORRUPTED, fa);
 	}
 }
···
 	ASSERT(len != 0);
 	ASSERT(type != XFS_AG_RESV_AGFL);
 
-	if (XFS_TEST_ERROR(false, mp,
-			XFS_ERRTAG_FREE_EXTENT))
+	if (XFS_TEST_ERROR(mp, XFS_ERRTAG_FREE_EXTENT))
 		return -EIO;
 
 	error = xfs_free_extent_fix_freelist(tp, pag, &agbp);
```

+6 -19

fs/xfs/libxfs/xfs_attr_leaf.c

```diff
···
 
 	/*
 	 * For attr2 we can try to move the forkoff if there is space in the
-	 * literal area, but for the old format we are done if there is no
-	 * space in the fixed attribute fork.
+	 * literal area
 	 */
-	if (!xfs_has_attr2(mp))
-		return 0;
-
 	dsize = dp->i_df.if_bytes;
 
 	switch (dp->i_df.if_format) {
···
 }
 
 /*
- * Switch on the ATTR2 superblock bit (implies also FEATURES2) unless:
- * - noattr2 mount option is set,
- * - on-disk version bit says it is already set, or
- * - the attr2 mount option is not set to enable automatic upgrade from attr1.
+ * Switch on the ATTR2 superblock bit (implies also FEATURES2) unless
+ * on-disk version bit says it is already set
  */
 STATIC void
 xfs_sbversion_add_attr2(
 	struct xfs_mount	*mp,
 	struct xfs_trans	*tp)
 {
-	if (xfs_has_noattr2(mp))
-		return;
 	if (mp->m_sb.sb_features2 & XFS_SB_VERSION2_ATTR2BIT)
-		return;
-	if (!xfs_has_attr2(mp))
 		return;
 
 	spin_lock(&mp->m_sb_lock);
···
 	/*
 	 * Fix up the start offset of the attribute fork
 	 */
-	if (totsize == sizeof(struct xfs_attr_sf_hdr) && xfs_has_attr2(mp) &&
+	if (totsize == sizeof(struct xfs_attr_sf_hdr) &&
 	    (dp->i_df.if_format != XFS_DINODE_FMT_BTREE) &&
 	    !(args->op_flags & (XFS_DA_OP_ADDNAME | XFS_DA_OP_REPLACE)) &&
 	    !xfs_has_parent(mp)) {
···
 	ASSERT(dp->i_forkoff);
 	ASSERT(totsize > sizeof(struct xfs_attr_sf_hdr) ||
 			(args->op_flags & XFS_DA_OP_ADDNAME) ||
-			!xfs_has_attr2(mp) ||
 			dp->i_df.if_format == XFS_DINODE_FMT_BTREE ||
 			xfs_has_parent(mp));
 	xfs_trans_log_inode(args->trans, dp,
···
 		bytes += xfs_attr_sf_entsize_byname(name_loc->namelen,
 					be16_to_cpu(name_loc->valuelen));
 	}
-	if (xfs_has_attr2(dp->i_mount) &&
-	    (dp->i_df.if_format != XFS_DINODE_FMT_BTREE) &&
+	if ((dp->i_df.if_format != XFS_DINODE_FMT_BTREE) &&
 	    (bytes == sizeof(struct xfs_attr_sf_hdr)))
 		return -1;
 	return xfs_attr_shortform_bytesfit(dp, bytes);
···
 	 * this case.
 	 */
 	if (!(args->op_flags & XFS_DA_OP_REPLACE)) {
-		ASSERT(xfs_has_attr2(dp->i_mount));
 		ASSERT(dp->i_df.if_format != XFS_DINODE_FMT_BTREE);
 		xfs_attr_fork_remove(dp, args->trans);
 	}
···
 
 	trace_xfs_attr_leaf_to_node(args);
 
-	if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_ATTR_LEAF_TO_NODE)) {
+	if (XFS_TEST_ERROR(mp, XFS_ERRTAG_ATTR_LEAF_TO_NODE)) {
 		error = -EIO;
 		goto out;
 	}
```

+11 -20

fs/xfs/libxfs/xfs_bmap.c

```diff
···
 static int
 xfs_bmap_set_attrforkoff(
 	struct xfs_inode	*ip,
-	int			size,
-	int			*version)
+	int			size)
 {
 	int			default_size = xfs_default_attroffset(ip) >> 3;
 
···
 		ip->i_forkoff = xfs_attr_shortform_bytesfit(ip, size);
 		if (!ip->i_forkoff)
 			ip->i_forkoff = default_size;
-		else if (xfs_has_attr2(ip->i_mount) && version)
-			*version = 2;
 		break;
 	default:
 		ASSERT(0);
···
 	int			rsvd)	/* xact may use reserved blks */
 {
 	struct xfs_mount	*mp = tp->t_mountp;
-	int			version = 1;	/* superblock attr version */
 	int			logflags;	/* logging flags */
 	int			error;		/* error return value */
 
···
 	ASSERT(!xfs_inode_has_attr_fork(ip));
 
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
-	error = xfs_bmap_set_attrforkoff(ip, size, &version);
+	error = xfs_bmap_set_attrforkoff(ip, size);
 	if (error)
 		return error;
 
···
 	xfs_trans_log_inode(tp, ip, logflags);
 	if (error)
 		return error;
-	if (!xfs_has_attr(mp) ||
-	   (!xfs_has_attr2(mp) && version == 2)) {
+	if (!xfs_has_attr(mp)) {
 		bool log_sb = false;
 
 		spin_lock(&mp->m_sb_lock);
 		if (!xfs_has_attr(mp)) {
 			xfs_add_attr(mp);
-			log_sb = true;
-		}
-		if (!xfs_has_attr2(mp) && version == 2) {
 			xfs_add_attr2(mp);
 			log_sb = true;
 		}
···
 	/* Trim the allocation back to the maximum an AG can fit. */
 	args.maxlen = min(ap->length, mp->m_ag_max_usable);
 
-	if (unlikely(XFS_TEST_ERROR(false, mp,
-			XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT)))
+	if (unlikely(XFS_TEST_ERROR(mp, XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT)))
 		error = xfs_bmap_exact_minlen_extent_alloc(ap, &args);
 	else if ((ap->datatype & XFS_ALLOC_USERDATA) &&
 			xfs_inode_is_filestream(ap->ip))
···
 	}
 
 	if (XFS_IS_CORRUPT(mp, !xfs_ifork_has_extents(ifp)) ||
-	    XFS_TEST_ERROR(false, mp, XFS_ERRTAG_BMAPIFORMAT)) {
+	    XFS_TEST_ERROR(mp, XFS_ERRTAG_BMAPIFORMAT)) {
 		xfs_bmap_mark_sick(ip, whichfork);
 		return -EFSCORRUPTED;
 	}
···
 			   (XFS_BMAPI_PREALLOC | XFS_BMAPI_ZERO));
 
 	if (XFS_IS_CORRUPT(mp, !xfs_ifork_has_extents(ifp)) ||
-	    XFS_TEST_ERROR(false, mp, XFS_ERRTAG_BMAPIFORMAT)) {
+	    XFS_TEST_ERROR(mp, XFS_ERRTAG_BMAPIFORMAT)) {
 		xfs_bmap_mark_sick(ip, whichfork);
 		return -EFSCORRUPTED;
 	}
···
 			(XFS_BMAPI_ATTRFORK | XFS_BMAPI_PREALLOC));
 
 	if (XFS_IS_CORRUPT(mp, !xfs_ifork_has_extents(ifp)) ||
-	    XFS_TEST_ERROR(false, mp, XFS_ERRTAG_BMAPIFORMAT)) {
+	    XFS_TEST_ERROR(mp, XFS_ERRTAG_BMAPIFORMAT)) {
 		xfs_bmap_mark_sick(ip, whichfork);
 		return -EFSCORRUPTED;
 	}
···
 	int			logflags = 0;
 
 	if (XFS_IS_CORRUPT(mp, !xfs_ifork_has_extents(ifp)) ||
-	    XFS_TEST_ERROR(false, mp, XFS_ERRTAG_BMAPIFORMAT)) {
+	    XFS_TEST_ERROR(mp, XFS_ERRTAG_BMAPIFORMAT)) {
 		xfs_bmap_mark_sick(ip, whichfork);
 		return -EFSCORRUPTED;
 	}
···
 	int			logflags = 0;
 
 	if (XFS_IS_CORRUPT(mp, !xfs_ifork_has_extents(ifp)) ||
-	    XFS_TEST_ERROR(false, mp, XFS_ERRTAG_BMAPIFORMAT)) {
+	    XFS_TEST_ERROR(mp, XFS_ERRTAG_BMAPIFORMAT)) {
 		xfs_bmap_mark_sick(ip, whichfork);
 		return -EFSCORRUPTED;
 	}
···
 	int			i = 0;
 
 	if (XFS_IS_CORRUPT(mp, !xfs_ifork_has_extents(ifp)) ||
-	    XFS_TEST_ERROR(false, mp, XFS_ERRTAG_BMAPIFORMAT)) {
+	    XFS_TEST_ERROR(mp, XFS_ERRTAG_BMAPIFORMAT)) {
 		xfs_bmap_mark_sick(ip, whichfork);
 		return -EFSCORRUPTED;
 	}
···
 
 	trace_xfs_bmap_deferred(bi);
 
-	if (XFS_TEST_ERROR(false, tp->t_mountp, XFS_ERRTAG_BMAP_FINISH_ONE))
+	if (XFS_TEST_ERROR(tp->t_mountp, XFS_ERRTAG_BMAP_FINISH_ONE))
 		return -EIO;
 
 	switch (bi->bi_type) {
```

+1 -1

fs/xfs/libxfs/xfs_btree.c

```diff
···
 
 	fa = __xfs_btree_check_block(cur, block, level, bp);
 	if (XFS_IS_CORRUPT(mp, fa != NULL) ||
-	    XFS_TEST_ERROR(false, mp, xfs_btree_block_errtag(cur))) {
+	    XFS_TEST_ERROR(mp, xfs_btree_block_errtag(cur))) {
 		if (bp)
 			trace_xfs_btree_corrupt(bp, _RET_IP_);
 		xfs_btree_mark_sick(cur);
```

+1 -1

fs/xfs/libxfs/xfs_da_btree.c

```diff
···
 
 	trace_xfs_da_split(state->args);
 
-	if (XFS_TEST_ERROR(false, state->mp, XFS_ERRTAG_DA_LEAF_SPLIT))
+	if (XFS_TEST_ERROR(state->mp, XFS_ERRTAG_DA_LEAF_SPLIT))
 		return -EIO;
 
 	/*
```

+1 -1

fs/xfs/libxfs/xfs_dir2.c

```diff
···
 	bool			ino_ok = xfs_verify_dir_ino(mp, ino);
 
 	if (XFS_IS_CORRUPT(mp, !ino_ok) ||
-	    XFS_TEST_ERROR(false, mp, XFS_ERRTAG_DIR_INO_VALIDATE)) {
+	    XFS_TEST_ERROR(mp, XFS_ERRTAG_DIR_INO_VALIDATE)) {
 		xfs_warn(mp, "Invalid inode number 0x%Lx",
 				(unsigned long long) ino);
 		return -EFSCORRUPTED;
```

+67 -47

fs/xfs/libxfs/xfs_errortag.h

```diff
···
  * Copyright (C) 2017 Oracle.
  * All Rights Reserved.
  */
-#ifndef __XFS_ERRORTAG_H_
+#if !defined(__XFS_ERRORTAG_H_) || defined(XFS_ERRTAG)
 #define __XFS_ERRORTAG_H_
 
 /*
- * error injection tags - the labels can be anything you want
- * but each tag should have its own unique number
+ * There are two ways to use this header file. The first way is to #include it
+ * bare, which will define all the XFS_ERRTAG_* error injection knobs for use
+ * with the XFS_TEST_ERROR macro. The second way is to enclose the #include
+ * with a #define for an XFS_ERRTAG macro, in which case the header will define
+" an XFS_ERRTAGS macro that expands to invoke that XFS_ERRTAG macro for each
+ * defined error injection knob.
  */
 
+/*
+ * These are the actual error injection tags. The numbers should be consecutive
+ * because arrays are sized based on the maximum.
+ */
 #define XFS_ERRTAG_NOERROR				0
 #define XFS_ERRTAG_IFLUSH_1				1
 #define XFS_ERRTAG_IFLUSH_2				2
···
  * Random factors for above tags, 1 means always, 2 means 1/2 time, etc.
  */
 #define XFS_RANDOM_DEFAULT				100
-#define XFS_RANDOM_IFLUSH_1				XFS_RANDOM_DEFAULT
-#define XFS_RANDOM_IFLUSH_2				XFS_RANDOM_DEFAULT
-#define XFS_RANDOM_IFLUSH_3				XFS_RANDOM_DEFAULT
-#define XFS_RANDOM_IFLUSH_4				XFS_RANDOM_DEFAULT
-#define XFS_RANDOM_IFLUSH_5				XFS_RANDOM_DEFAULT
-#define XFS_RANDOM_IFLUSH_6				XFS_RANDOM_DEFAULT
-#define XFS_RANDOM_DA_READ_BUF				XFS_RANDOM_DEFAULT
-#define XFS_RANDOM_BTREE_CHECK_LBLOCK			(XFS_RANDOM_DEFAULT/4)
-#define XFS_RANDOM_BTREE_CHECK_SBLOCK			XFS_RANDOM_DEFAULT
-#define XFS_RANDOM_ALLOC_READ_AGF			XFS_RANDOM_DEFAULT
-#define XFS_RANDOM_IALLOC_READ_AGI			XFS_RANDOM_DEFAULT
-#define XFS_RANDOM_ITOBP_INOTOBP			XFS_RANDOM_DEFAULT
-#define XFS_RANDOM_IUNLINK				XFS_RANDOM_DEFAULT
-#define XFS_RANDOM_IUNLINK_REMOVE			XFS_RANDOM_DEFAULT
-#define XFS_RANDOM_DIR_INO_VALIDATE			XFS_RANDOM_DEFAULT
-#define XFS_RANDOM_BULKSTAT_READ_CHUNK			XFS_RANDOM_DEFAULT
-#define XFS_RANDOM_IODONE_IOERR				(XFS_RANDOM_DEFAULT/10)
-#define XFS_RANDOM_STRATREAD_IOERR			(XFS_RANDOM_DEFAULT/10)
-#define XFS_RANDOM_STRATCMPL_IOERR			(XFS_RANDOM_DEFAULT/10)
-#define XFS_RANDOM_DIOWRITE_IOERR			(XFS_RANDOM_DEFAULT/10)
-#define XFS_RANDOM_BMAPIFORMAT				XFS_RANDOM_DEFAULT
-#define XFS_RANDOM_FREE_EXTENT				1
-#define XFS_RANDOM_RMAP_FINISH_ONE			1
-#define XFS_RANDOM_REFCOUNT_CONTINUE_UPDATE		1
-#define XFS_RANDOM_REFCOUNT_FINISH_ONE			1
-#define XFS_RANDOM_BMAP_FINISH_ONE			1
-#define XFS_RANDOM_AG_RESV_CRITICAL			4
-#define XFS_RANDOM_LOG_BAD_CRC				1
-#define XFS_RANDOM_LOG_ITEM_PIN				1
-#define XFS_RANDOM_BUF_LRU_REF				2
-#define XFS_RANDOM_FORCE_SCRUB_REPAIR			1
-#define XFS_RANDOM_FORCE_SUMMARY_RECALC			1
-#define XFS_RANDOM_IUNLINK_FALLBACK			(XFS_RANDOM_DEFAULT/10)
-#define XFS_RANDOM_BUF_IOERROR				XFS_RANDOM_DEFAULT
-#define XFS_RANDOM_REDUCE_MAX_IEXTENTS			1
-#define XFS_RANDOM_BMAP_ALLOC_MINLEN_EXTENT		1
-#define XFS_RANDOM_AG_RESV_FAIL				1
-#define XFS_RANDOM_LARP					1
-#define XFS_RANDOM_DA_LEAF_SPLIT			1
-#define XFS_RANDOM_ATTR_LEAF_TO_NODE			1
-#define XFS_RANDOM_WB_DELAY_MS				3000
-#define XFS_RANDOM_WRITE_DELAY_MS			3000
-#define XFS_RANDOM_EXCHMAPS_FINISH_ONE			1
-#define XFS_RANDOM_METAFILE_RESV_CRITICAL		4
+
+/*
+ * Table of errror injection knobs. The parameters to the XFS_ERRTAG macro are:
+ *   1. The XFS_ERRTAG_ flag but without the prefix;
+ *   2. The name of the sysfs knob; and
+ *   3. The default value for the knob.
+ */
+#ifdef XFS_ERRTAG
+# undef XFS_ERRTAGS
+# define XFS_ERRTAGS \
+XFS_ERRTAG(NOERROR, noerror, XFS_RANDOM_DEFAULT) \
+XFS_ERRTAG(IFLUSH_1, iflush1, XFS_RANDOM_DEFAULT) \
+XFS_ERRTAG(IFLUSH_2, iflush2, XFS_RANDOM_DEFAULT) \
+XFS_ERRTAG(IFLUSH_3, iflush3, XFS_RANDOM_DEFAULT) \
+XFS_ERRTAG(IFLUSH_4, iflush4, XFS_RANDOM_DEFAULT) \
+XFS_ERRTAG(IFLUSH_5, iflush5, XFS_RANDOM_DEFAULT) \
+XFS_ERRTAG(IFLUSH_6, iflush6, XFS_RANDOM_DEFAULT) \
+XFS_ERRTAG(DA_READ_BUF, dareadbuf, XFS_RANDOM_DEFAULT) \
+XFS_ERRTAG(BTREE_CHECK_LBLOCK, btree_chk_lblk, XFS_RANDOM_DEFAULT/4) \
+XFS_ERRTAG(BTREE_CHECK_SBLOCK, btree_chk_sblk, XFS_RANDOM_DEFAULT) \
+XFS_ERRTAG(ALLOC_READ_AGF, readagf, XFS_RANDOM_DEFAULT) \
+XFS_ERRTAG(IALLOC_READ_AGI, readagi, XFS_RANDOM_DEFAULT) \
+XFS_ERRTAG(ITOBP_INOTOBP, itobp, XFS_RANDOM_DEFAULT) \
+XFS_ERRTAG(IUNLINK, iunlink, XFS_RANDOM_DEFAULT) \
+XFS_ERRTAG(IUNLINK_REMOVE, iunlinkrm, XFS_RANDOM_DEFAULT) \
+XFS_ERRTAG(DIR_INO_VALIDATE, dirinovalid, XFS_RANDOM_DEFAULT) \
+XFS_ERRTAG(BULKSTAT_READ_CHUNK, bulkstat, XFS_RANDOM_DEFAULT) \
+XFS_ERRTAG(IODONE_IOERR, logiodone, XFS_RANDOM_DEFAULT/10) \
+XFS_ERRTAG(STRATREAD_IOERR, stratread, XFS_RANDOM_DEFAULT/10) \
+XFS_ERRTAG(STRATCMPL_IOERR, stratcmpl, XFS_RANDOM_DEFAULT/10) \
+XFS_ERRTAG(DIOWRITE_IOERR, diowrite, XFS_RANDOM_DEFAULT/10) \
+XFS_ERRTAG(BMAPIFORMAT, bmapifmt, XFS_RANDOM_DEFAULT) \
+XFS_ERRTAG(FREE_EXTENT, free_extent, 1) \
+XFS_ERRTAG(RMAP_FINISH_ONE, rmap_finish_one, 1) \
+XFS_ERRTAG(REFCOUNT_CONTINUE_UPDATE, refcount_continue_update, 1) \
+XFS_ERRTAG(REFCOUNT_FINISH_ONE, refcount_finish_one, 1) \
+XFS_ERRTAG(BMAP_FINISH_ONE, bmap_finish_one, 1) \
+XFS_ERRTAG(AG_RESV_CRITICAL, ag_resv_critical, 4) \
+XFS_ERRTAG(LOG_BAD_CRC, log_bad_crc, 1) \
+XFS_ERRTAG(LOG_ITEM_PIN, log_item_pin, 1) \
+XFS_ERRTAG(BUF_LRU_REF, buf_lru_ref, 2) \
+XFS_ERRTAG(FORCE_SCRUB_REPAIR, force_repair, 1) \
+XFS_ERRTAG(FORCE_SUMMARY_RECALC, bad_summary, 1) \
+XFS_ERRTAG(IUNLINK_FALLBACK, iunlink_fallback, XFS_RANDOM_DEFAULT/10) \
+XFS_ERRTAG(BUF_IOERROR, buf_ioerror, XFS_RANDOM_DEFAULT) \
+XFS_ERRTAG(REDUCE_MAX_IEXTENTS, reduce_max_iextents, 1) \
+XFS_ERRTAG(BMAP_ALLOC_MINLEN_EXTENT, bmap_alloc_minlen_extent, 1) \
+XFS_ERRTAG(AG_RESV_FAIL, ag_resv_fail, 1) \
+XFS_ERRTAG(LARP, larp, 1) \
+XFS_ERRTAG(DA_LEAF_SPLIT, da_leaf_split, 1) \
+XFS_ERRTAG(ATTR_LEAF_TO_NODE, attr_leaf_to_node, 1) \
+XFS_ERRTAG(WB_DELAY_MS, wb_delay_ms, 3000) \
+XFS_ERRTAG(WRITE_DELAY_MS, write_delay_ms, 3000) \
+XFS_ERRTAG(EXCHMAPS_FINISH_ONE, exchmaps_finish_one, 1) \
+XFS_ERRTAG(METAFILE_RESV_CRITICAL, metafile_resv_crit, 4)
+#endif /* XFS_ERRTAG */
 
 #endif /* __XFS_ERRORTAG_H_ */
```

+2 -2

fs/xfs/libxfs/xfs_exchmaps.c

```diff
···
 		return error;
 	}
 
-	if (XFS_TEST_ERROR(false, tp->t_mountp, XFS_ERRTAG_EXCHMAPS_FINISH_ONE))
+	if (XFS_TEST_ERROR(tp->t_mountp, XFS_ERRTAG_EXCHMAPS_FINISH_ONE))
 		return -EIO;
 
 	/* If we still have work to do, ask for a new transaction. */
···
 				&new_nextents))
 			return -EFBIG;
 
-		if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_REDUCE_MAX_IEXTENTS) &&
+		if (XFS_TEST_ERROR(mp, XFS_ERRTAG_REDUCE_MAX_IEXTENTS) &&
 		    new_nextents > 10)
 			return -EFBIG;
 
```

+3 -3

fs/xfs/libxfs/xfs_ialloc.c

```diff
···
 	 * remove the chunk if the block size is large enough for multiple inode
 	 * chunks (that might not be free).
 	 */
-	if (!xfs_has_ikeep(mp) && rec.ir_free == XFS_INOBT_ALL_FREE &&
+	if (rec.ir_free == XFS_INOBT_ALL_FREE &&
 	    mp->m_sb.sb_inopblock <= XFS_INODES_PER_CHUNK) {
 		xic->deleted = true;
 		xic->first_ino = xfs_agino_to_ino(pag, rec.ir_startino);
···
 	 * enough for multiple chunks. Leave the finobt record to remain in sync
 	 * with the inobt.
 	 */
-	if (!xfs_has_ikeep(mp) && rec.ir_free == XFS_INOBT_ALL_FREE &&
+	if (rec.ir_free == XFS_INOBT_ALL_FREE &&
 	    mp->m_sb.sb_inopblock <= XFS_INODES_PER_CHUNK) {
 		error = xfs_btree_delete(cur, &i);
 		if (error)
···
 		xfs_verifier_error(bp, -EFSBADCRC, __this_address);
 	else {
 		fa = xfs_agi_verify(bp);
-		if (XFS_TEST_ERROR(fa, mp, XFS_ERRTAG_IALLOC_READ_AGI))
+		if (fa || XFS_TEST_ERROR(mp, XFS_ERRTAG_IALLOC_READ_AGI))
 			xfs_verifier_error(bp, -EFSCORRUPTED, fa);
 	}
 }
```

+2 -2

fs/xfs/libxfs/xfs_inode_buf.c

```diff
···
 		di_ok = xfs_verify_magic16(bp, dip->di_magic) &&
 			xfs_dinode_good_version(mp, dip->di_version) &&
 			xfs_verify_agino_or_null(bp->b_pag, unlinked_ino);
-		if (unlikely(XFS_TEST_ERROR(!di_ok, mp,
-						XFS_ERRTAG_ITOBP_INOTOBP))) {
+		if (unlikely(!di_ok ||
+				XFS_TEST_ERROR(mp, XFS_ERRTAG_ITOBP_INOTOBP))) {
 			if (readahead) {
 				bp->b_flags &= ~XBF_DONE;
 				xfs_buf_ioerror(bp, -EIO);
```

+1 -2

fs/xfs/libxfs/xfs_inode_fork.c

```diff
···
 	if (nr_exts < ifp->if_nextents)
 		return -EFBIG;
 
-	if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_REDUCE_MAX_IEXTENTS) &&
-	    nr_exts > 10)
+	if (XFS_TEST_ERROR(mp, XFS_ERRTAG_REDUCE_MAX_IEXTENTS) && nr_exts > 10)
 		return -EFBIG;
 
 	if (nr_exts > xfs_iext_max_nextents(has_large, whichfork)) {
```

-11

fs/xfs/libxfs/xfs_inode_util.c

```diff
···
 	} else {
 		inode_init_owner(args->idmap, inode, dir, args->mode);
 	}
-
-	/*
-	 * If the group ID of the new file does not match the effective
-	 * group ID or one of the supplementary group IDs, the S_ISGID
-	 * bit is cleared (and only if the irix_sgid_inherit
-	 * compatibility variable is set).
-	 */
-	if (irix_sgid_inherit && (inode->i_mode & S_ISGID) &&
-	    !vfsgid_in_group_p(i_gid_into_vfsgid(args->idmap, inode)))
-		inode->i_mode &= ~S_ISGID;
-
 	ip->i_projid = xfs_get_initial_prid(pip);
 }
 
```

+70 -80

fs/xfs/libxfs/xfs_log_format.h

```diff
···
 	uint32_t	pad2;	/* may as well make it 64 bits */
 };
 
-/* Region types for iovec's i_type */
-#define XLOG_REG_TYPE_BFORMAT		1
-#define XLOG_REG_TYPE_BCHUNK		2
-#define XLOG_REG_TYPE_EFI_FORMAT	3
-#define XLOG_REG_TYPE_EFD_FORMAT	4
-#define XLOG_REG_TYPE_IFORMAT		5
-#define XLOG_REG_TYPE_ICORE		6
-#define XLOG_REG_TYPE_IEXT		7
-#define XLOG_REG_TYPE_IBROOT		8
-#define XLOG_REG_TYPE_ILOCAL		9
-#define XLOG_REG_TYPE_IATTR_EXT		10
-#define XLOG_REG_TYPE_IATTR_BROOT	11
-#define XLOG_REG_TYPE_IATTR_LOCAL	12
-#define XLOG_REG_TYPE_QFORMAT		13
-#define XLOG_REG_TYPE_DQUOT		14
-#define XLOG_REG_TYPE_QUOTAOFF		15
-#define XLOG_REG_TYPE_LRHEADER		16
-#define XLOG_REG_TYPE_UNMOUNT		17
-#define XLOG_REG_TYPE_COMMIT		18
-#define XLOG_REG_TYPE_TRANSHDR		19
-#define XLOG_REG_TYPE_ICREATE		20
-#define XLOG_REG_TYPE_RUI_FORMAT	21
-#define XLOG_REG_TYPE_RUD_FORMAT	22
-#define XLOG_REG_TYPE_CUI_FORMAT	23
-#define XLOG_REG_TYPE_CUD_FORMAT	24
-#define XLOG_REG_TYPE_BUI_FORMAT	25
-#define XLOG_REG_TYPE_BUD_FORMAT	26
-#define XLOG_REG_TYPE_ATTRI_FORMAT	27
-#define XLOG_REG_TYPE_ATTRD_FORMAT	28
-#define XLOG_REG_TYPE_ATTR_NAME		29
-#define XLOG_REG_TYPE_ATTR_VALUE	30
-#define XLOG_REG_TYPE_XMI_FORMAT	31
-#define XLOG_REG_TYPE_XMD_FORMAT	32
-#define XLOG_REG_TYPE_ATTR_NEWNAME	33
-#define XLOG_REG_TYPE_ATTR_NEWVALUE	34
-#define XLOG_REG_TYPE_MAX		34
-
 /*
  * Flags to log operation header
  *
···
 #define XLOG_END_TRANS		0x10	/* End a continued transaction */
 #define XLOG_UNMOUNT_TRANS	0x20	/* Unmount a filesystem transaction */
 
-
-typedef struct xlog_op_header {
+struct xlog_op_header {
 	__be32	   oh_tid;	/* transaction id of operation	:  4 b */
 	__be32	   oh_len;	/* bytes in data region		:  4 b */
 	__u8	   oh_clientid;	/* who sent me this		:  1 b */
 	__u8	   oh_flags;	/*				:  1 b */
 	__u16	   oh_res2;	/* 32 bit align			:  2 b */
-} xlog_op_header_t;
+};
 
 /* valid values for h_fmt */
 #define XLOG_FMT_UNKNOWN  0
···
 	__be32	  h_prev_block;	/* block number to previous LR		:  4 */
 	__be32	  h_num_logops;	/* number of log operations in this LR	:  4 */
 	__be32	  h_cycle_data[XLOG_HEADER_CYCLE_SIZE / BBSIZE];
-	/* new fields */
+
+	/* fields added by the Linux port: */
 	__be32	  h_fmt;	/* format of log record			:  4 */
 	uuid_t	  h_fs_uuid;	/* uuid of FS				: 16 */
+
+	/* fields added for log v2: */
 	__be32	  h_size;	/* iclog size				:  4 */
+
+	/*
+	 * When h_size added for log v2 support, it caused structure to have
+	 * a different size on i386 vs all other architectures because the
+	 * sum of the size ofthe  member is not aligned by that of the largest
+	 * __be64-sized member, and i386 has really odd struct alignment rules.
+	 *
+	 * Due to the way the log headers are placed out on-disk that alone is
+	 * not a problem becaue the xlog_rec_header always sits alone in a
+	 * BBSIZEs area, and the rest of that area is padded with zeroes.
+	 * But xlog_cksum used to calculate the checksum based on the structure
+	 * size, and thus gives different checksums for i386 vs the rest.
+	 * We now do two checksum validation passes for both sizes to allow
+	 * moving v5 file systems with unclean logs between i386 and other
+	 * (little-endian) architectures.
+	 */
+	__u32	  h_pad0;
 } xlog_rec_header_t;
+
+#ifdef __i386__
+#define XLOG_REC_SIZE		offsetofend(struct xlog_rec_header, h_size)
+#define XLOG_REC_SIZE_OTHER	sizeof(struct xlog_rec_header)
+#else
+#define XLOG_REC_SIZE		sizeof(struct xlog_rec_header)
+#define XLOG_REC_SIZE_OTHER	offsetofend(struct xlog_rec_header, h_size)
+#endif /* __i386__ */
 
 typedef struct xlog_rec_ext_header {
 	__be32	  xh_cycle;	/* write cycle of log			: 4 */
···
 } xlog_in_core_2_t;
 
 /* not an on-disk structure, but needed by log recovery in userspace */
-typedef struct xfs_log_iovec {
+struct xfs_log_iovec {
 	void		*i_addr;	/* beginning address of region */
 	int		i_len;		/* length in bytes of region */
 	uint		i_type;		/* type of region */
-} xfs_log_iovec_t;
-
+};
 
 /*
  * Transaction Header definitions.
···
  * Do not change the below structure without redoing the code in
  * xlog_recover_add_to_trans() and xlog_recover_add_to_cont_trans().
  */
-typedef struct xfs_trans_header {
+struct xfs_trans_header {
 	uint		th_magic;		/* magic number */
 	uint		th_type;		/* transaction type */
 	int32_t		th_tid;			/* transaction id (unused) */
 	uint		th_num_items;		/* num items logged by trans */
-} xfs_trans_header_t;
+};
 
 #define	XFS_TRANS_HEADER_MAGIC	0x5452414e	/* TRAN */
 
···
 #define	__XFS_BLF_DATAMAP_SIZE	((XFS_MAX_BLOCKSIZE / XFS_BLF_CHUNK) / NBWORD)
 #define	XFS_BLF_DATAMAP_SIZE	(__XFS_BLF_DATAMAP_SIZE + 1)
 
-typedef struct xfs_buf_log_format {
+struct xfs_buf_log_format {
 	unsigned short	blf_type;	/* buf log item type indicator */
 	unsigned short	blf_size;	/* size of this item */
 	unsigned short	blf_flags;	/* misc state */
···
 	int64_t		blf_blkno;	/* starting blkno of this buf */
 	unsigned int	blf_map_size;	/* used size of data bitmap in words */
 	unsigned int	blf_data_map[XFS_BLF_DATAMAP_SIZE]; /* dirty bitmap */
-} xfs_buf_log_format_t;
+};
 
 /*
  * All buffers now need to tell recovery where the magic number
···
 /*
  * EFI/EFD log format definitions
  */
-typedef struct xfs_extent {
+struct xfs_extent {
 	xfs_fsblock_t	ext_start;
 	xfs_extlen_t	ext_len;
-} xfs_extent_t;
+};
 
 /*
- * Since an xfs_extent_t has types (start:64, len: 32)
- * there are different alignments on 32 bit and 64 bit kernels.
- * So we provide the different variants for use by a
- * conversion routine.
+ * Since the structures in struct xfs_extent add up to 96 bytes, it has
+ * different alignments on i386 vs all other architectures, because i386
+ * does not pad structures to their natural alignment.
+ *
+ * Provide the different variants for use by a conversion routine.
  */
-typedef struct xfs_extent_32 {
+struct xfs_extent_32 {
 	uint64_t	ext_start;
 	uint32_t	ext_len;
-} __attribute__((packed)) xfs_extent_32_t;
+} __attribute__((packed));
 
-typedef struct xfs_extent_64 {
+struct xfs_extent_64 {
 	uint64_t	ext_start;
 	uint32_t	ext_len;
 	uint32_t	ext_pad;
-} xfs_extent_64_t;
+};
 
 /*
  * This is the structure used to lay out an efi log item in the
  * log. The efi_extents field is a variable size array whose
  * size is given by efi_nextents.
  */
-typedef struct xfs_efi_log_format {
+struct xfs_efi_log_format {
 	uint16_t		efi_type;	/* efi log item type */
 	uint16_t		efi_size;	/* size of this item */
 	uint32_t		efi_nextents;	/* # extents to free */
 	uint64_t		efi_id;		/* efi identifier */
-	xfs_extent_t		efi_extents[];	/* array of extents to free */
-} xfs_efi_log_format_t;
+	struct xfs_extent	efi_extents[];	/* array of extents to free */
+};
 
 static inline size_t
 xfs_efi_log_format_sizeof(
···
 			nr * sizeof(struct xfs_extent);
 }
 
-typedef struct xfs_efi_log_format_32 {
+struct xfs_efi_log_format_32 {
 	uint16_t		efi_type;	/* efi log item type */
 	uint16_t		efi_size;	/* size of this item */
 	uint32_t		efi_nextents;	/* # extents to free */
 	uint64_t		efi_id;		/* efi identifier */
-	xfs_extent_32_t		efi_extents[];	/* array of extents to free */
-} __attribute__((packed)) xfs_efi_log_format_32_t;
+	struct xfs_extent_32	efi_extents[];	/* array of extents to free */
+} __attribute__((packed));
 
 static inline size_t
 xfs_efi_log_format32_sizeof(
···
 			nr * sizeof(struct xfs_extent_32);
 }
 
-typedef struct xfs_efi_log_format_64 {
+struct xfs_efi_log_format_64 {
 	uint16_t		efi_type;	/* efi log item type */
 	uint16_t		efi_size;	/* size of this item */
 	uint32_t		efi_nextents;	/* # extents to free */
 	uint64_t		efi_id;		/* efi identifier */
-	xfs_extent_64_t		efi_extents[];	/* array of extents to free */
-} xfs_efi_log_format_64_t;
+	struct xfs_extent_64	efi_extents[];	/* array of extents to free */
+};
 
 static inline size_t
 xfs_efi_log_format64_sizeof(
···
  * log. The efd_extents array is a variable size array whose
  * size is given by efd_nextents;
  */
-typedef struct xfs_efd_log_format {
+struct xfs_efd_log_format {
 	uint16_t		efd_type;	/* efd log item type */
 	uint16_t		efd_size;	/* size of this item */
 	uint32_t		efd_nextents;	/* # of extents freed */
 	uint64_t		efd_efi_id;	/* id of corresponding efi */
-	xfs_extent_t		efd_extents[];	/* array of extents freed */
-} xfs_efd_log_format_t;
+	struct xfs_extent	efd_extents[];	/* array of extents freed */
+};
 
 static inline size_t
 xfs_efd_log_format_sizeof(
···
 			nr * sizeof(struct xfs_extent);
 }
 
-typedef struct xfs_efd_log_format_32 {
+struct xfs_efd_log_format_32 {
 	uint16_t		efd_type;	/* efd log item type */
 	uint16_t		efd_size;	/* size of this item */
 	uint32_t		efd_nextents;	/* # of extents freed */
 	uint64_t		efd_efi_id;	/* id of corresponding efi */
-	xfs_extent_32_t		efd_extents[];	/* array of extents freed */
-} __attribute__((packed)) xfs_efd_log_format_32_t;
+	struct xfs_extent_32	efd_extents[];	/* array of extents freed */
+} __attribute__((packed));
 
 static inline size_t
 xfs_efd_log_format32_sizeof(
···
 			nr * sizeof(struct xfs_extent_32);
 }
 
-typedef struct xfs_efd_log_format_64 {
+struct xfs_efd_log_format_64 {
 	uint16_t		efd_type;	/* efd log item type */
 	uint16_t		efd_size;	/* size of this item */
 	uint32_t
```
efd_nextents; /* # of extents freed */ 715 725 uint64_t efd_efi_id; /* id of corresponding efi */ 716 - xfs_extent_64_t efd_extents[]; /* array of extents freed */ 717 - } xfs_efd_log_format_64_t; 726 + struct xfs_extent_64 efd_extents[]; /* array of extents freed */ 727 + }; 718 728 719 729 static inline size_t 720 730 xfs_efd_log_format64_sizeof( ··· 947 957 * The first two fields must be the type and size fitting into 948 958 * 32 bits : log_recovery code assumes that. 949 959 */ 950 - typedef struct xfs_dq_logformat { 960 + struct xfs_dq_logformat { 951 961 uint16_t qlf_type; /* dquot log item type */ 952 962 uint16_t qlf_size; /* size of this item */ 953 963 xfs_dqid_t qlf_id; /* usr/grp/proj id : 32 bits */ 954 964 int64_t qlf_blkno; /* blkno of dquot buffer */ 955 965 int32_t qlf_len; /* len of dquot buffer */ 956 966 uint32_t qlf_boffset; /* off of dquot in buffer */ 957 - } xfs_dq_logformat_t; 967 + }; 958 968 959 969 /* 960 970 * log format struct for QUOTAOFF records. ··· 964 974 * to the first and ensures that the first logitem is taken out of the AIL 965 975 * only when the last one is securely committed. 966 976 */ 967 - typedef struct xfs_qoff_logformat { 977 + struct xfs_qoff_logformat { 968 978 unsigned short qf_type; /* quotaoff log item type */ 969 979 unsigned short qf_size; /* size of this item */ 970 980 unsigned int qf_flags; /* USR and/or GRP */ 971 981 char qf_pad[12]; /* padding for future */ 972 - } xfs_qoff_logformat_t; 982 + }; 973 983 974 984 /* 975 985 * Disk quotas status in m_qflags, and also sb_qflags. 16 bits.
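The rewritten comment in the xfs_log_format.h hunk is about layout. The fields of struct xfs_extent total 12 bytes (96 bits). i386 aligns a 64-bit integer inside a structure to only 4 bytes, so the native struct is 12 bytes there, while x86-64 pads it to 16. The explicit `_32` (packed) and `_64` (padded) variants pin both layouts. The following minimal sketch reuses those two definitions from the hunk. `struct efi_fmt_32`, `efi_fmt_32_sizeof()` and `extents_32_to_64()` are hypothetical simplifications, not the kernel's conversion routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The two explicit layouts from the hunk (field names kept). */
struct xfs_extent_32 {
	uint64_t	ext_start;
	uint32_t	ext_len;
} __attribute__((packed));		/* 12 bytes on every ABI */

struct xfs_extent_64 {
	uint64_t	ext_start;
	uint32_t	ext_len;
	uint32_t	ext_pad;		/* explicit tail padding */
};					/* 16 bytes on every ABI */

/* Fail the build if either layout ever drifts. */
_Static_assert(sizeof(struct xfs_extent_32) == 12, "packed layout");
_Static_assert(sizeof(struct xfs_extent_64) == 16, "padded layout");

/* Hypothetical, simplified EFI header with a flexible array of extents. */
struct efi_fmt_32 {
	uint16_t		efi_type;
	uint16_t		efi_size;
	uint32_t		efi_nextents;
	uint64_t		efi_id;
	struct xfs_extent_32	efi_extents[];
} __attribute__((packed));

/* Header plus nr elements, the same shape as the *_sizeof() helpers. */
static size_t efi_fmt_32_sizeof(unsigned int nr)
{
	return sizeof(struct efi_fmt_32) + nr * sizeof(struct xfs_extent_32);
}

/* Hypothetical conversion: widen packed records into the padded layout. */
static void extents_32_to_64(struct xfs_extent_64 *dst,
			     const struct xfs_extent_32 *src, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		dst[i].ext_start = src[i].ext_start;
		dst[i].ext_len = src[i].ext_len;
		dst[i].ext_pad = 0;
	}
}
```

Because both variants have a fixed size regardless of the build architecture, a recovery path can tell which layout a log item uses from its length alone.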

+1 -1

fs/xfs/libxfs/xfs_log_recover.h

··· 111 111 struct xlog_recover { 112 112 struct hlist_node r_list; 113 113 xlog_tid_t r_log_tid; /* log's transaction id */ 114 - xfs_trans_header_t r_theader; /* trans header for partial */ 114 + struct xfs_trans_header r_theader; /* trans header for partial */ 115 115 int r_state; /* not needed */ 116 116 xfs_lsn_t r_lsn; /* xact lsn */ 117 117 struct list_head r_itemq; /* q for items */

+1 -1

fs/xfs/libxfs/xfs_metafile.c

··· 121 121 div_u64(mp->m_metafile_resv_target, 10))) 122 122 return true; 123 123 124 - return XFS_TEST_ERROR(false, mp, XFS_ERRTAG_METAFILE_RESV_CRITICAL); 124 + return XFS_TEST_ERROR(mp, XFS_ERRTAG_METAFILE_RESV_CRITICAL); 125 125 } 126 126 127 127 /* Allocate a block from the metadata file's reservation. */
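Several hunks drop the leading `false` argument from XFS_TEST_ERROR(): this one, plus xfs_refcount.c, xfs_rmap.c, xfs_rtbitmap.c and scrub/cow_repair.c. Assuming the old first argument was simply OR'd with the error-injection check, passing `false` made it a no-op, so the two-argument form behaves the same. The following is a minimal userspace sketch of such a check. The tag enum, `struct demo_mount`, and the deterministic "fire on every Nth evaluation" policy are assumptions chosen so the behaviour is testable. They do not describe how XFS decides when to inject.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical tag numbers; the real XFS_ERRTAG_* values differ. */
enum demo_errtag { DEMO_ERRTAG_FINISH_ONE, DEMO_ERRTAG_MAX };

/* Hypothetical per-mount knobs: 0 disables a tag, N > 0 fires every Nth check. */
struct demo_mount {
	unsigned int errortag[DEMO_ERRTAG_MAX];
	unsigned int calls[DEMO_ERRTAG_MAX];
};

static bool demo_errortag_test(struct demo_mount *mp, enum demo_errtag tag)
{
	unsigned int freq = mp->errortag[tag];

	if (freq == 0)
		return false;
	/* Deterministic stand-in for a 1-in-freq trigger. */
	return ++mp->calls[tag] % freq == 0;
}

/* Two-argument form, mirroring the new XFS_TEST_ERROR(mp, tag) call sites. */
#define DEMO_TEST_ERROR(mp, tag)	demo_errortag_test((mp), (tag))

/* Caller pattern from the rmap/refcount hunks: fail the step with -EIO. */
static int demo_finish_one(struct demo_mount *mp)
{
	if (DEMO_TEST_ERROR(mp, DEMO_ERRTAG_FINISH_ONE))
		return -EIO;
	return 0;
}
```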

+2

fs/xfs/libxfs/xfs_ondisk.h

··· 174 174 XFS_CHECK_STRUCT_SIZE(struct xfs_rud_log_format, 16); 175 175 XFS_CHECK_STRUCT_SIZE(struct xfs_map_extent, 32); 176 176 XFS_CHECK_STRUCT_SIZE(struct xfs_phys_extent, 16); 177 + XFS_CHECK_STRUCT_SIZE(struct xlog_rec_header, 328); 178 + XFS_CHECK_STRUCT_SIZE(struct xlog_rec_ext_header, 260); 177 179 178 180 XFS_CHECK_OFFSET(struct xfs_bui_log_format, bui_extents, 16); 179 181 XFS_CHECK_OFFSET(struct xfs_cui_log_format, cui_extents, 16);
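The new XFS_CHECK_STRUCT_SIZE() lines pin struct xlog_rec_header at 328 bytes and struct xlog_rec_ext_header at 260. The same series gives the record header an explicit h_pad0 and uses `__i386__` to choose whether XLOG_REC_SIZE and XLOG_REC_SIZE_OTHER are `sizeof()` or `offsetofend(..., h_size)`. The following minimal sketch shows both ingredients on a toy header. `struct demo_rec_header` and `CHECK_STRUCT_SIZE()` are hypothetical. `offsetofend()` is written out in the same offsetof-plus-member-size form as the kernel macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of the first byte after MEMBER inside TYPE. */
#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Hypothetical compile-time check in the spirit of XFS_CHECK_STRUCT_SIZE. */
#define CHECK_STRUCT_SIZE(s, n) \
	_Static_assert(sizeof(s) == (n), #s " has the wrong size")

/* Toy header: without pad0 it would be 12 bytes on i386 but 16 on x86-64. */
struct demo_rec_header {
	uint64_t	lsn;
	uint32_t	size;
	uint32_t	pad0;	/* explicit padding, like h_pad0 */
};

CHECK_STRUCT_SIZE(struct demo_rec_header, 16);
```

With pad0 present the struct is 16 bytes on every ABI. `offsetofend(..., size)` still reports 12, which is the length an unpadded i386 build would have used.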

+3 -4

fs/xfs/libxfs/xfs_refcount.c

··· 1113 1113 * refcount continue update "error" has been injected. 1114 1114 */ 1115 1115 if (cur->bc_refc.nr_ops > 2 && 1116 - XFS_TEST_ERROR(false, cur->bc_mp, 1117 - XFS_ERRTAG_REFCOUNT_CONTINUE_UPDATE)) 1116 + XFS_TEST_ERROR(cur->bc_mp, XFS_ERRTAG_REFCOUNT_CONTINUE_UPDATE)) 1118 1117 return false; 1119 1118 1120 1119 if (cur->bc_refc.nr_ops == 0) ··· 1397 1398 1398 1399 trace_xfs_refcount_deferred(mp, ri); 1399 1400 1400 - if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_REFCOUNT_FINISH_ONE)) 1401 + if (XFS_TEST_ERROR(mp, XFS_ERRTAG_REFCOUNT_FINISH_ONE)) 1401 1402 return -EIO; 1402 1403 1403 1404 /* ··· 1510 1511 1511 1512 trace_xfs_refcount_deferred(mp, ri); 1512 1513 1513 - if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_REFCOUNT_FINISH_ONE)) 1514 + if (XFS_TEST_ERROR(mp, XFS_ERRTAG_REFCOUNT_FINISH_ONE)) 1514 1515 return -EIO; 1515 1516 1516 1517 /*

+1 -1

fs/xfs/libxfs/xfs_rmap.c

··· 2690 2690 2691 2691 trace_xfs_rmap_deferred(mp, ri); 2692 2692 2693 - if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_RMAP_FINISH_ONE)) 2693 + if (XFS_TEST_ERROR(mp, XFS_ERRTAG_RMAP_FINISH_ONE)) 2694 2694 return -EIO; 2695 2695 2696 2696 /*

+1 -1

fs/xfs/libxfs/xfs_rtbitmap.c

··· 1067 1067 ASSERT(rbmip->i_itemp != NULL); 1068 1068 xfs_assert_ilocked(rbmip, XFS_ILOCK_EXCL); 1069 1069 1070 - if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_FREE_EXTENT)) 1070 + if (XFS_TEST_ERROR(mp, XFS_ERRTAG_FREE_EXTENT)) 1071 1071 return -EIO; 1072 1072 1073 1073 error = xfs_rtcheck_alloc_range(&args, start, len);

+3 -6

fs/xfs/libxfs/xfs_sb.c

··· 142 142 if (sbp->sb_versionnum & XFS_SB_VERSION_MOREBITSBIT) { 143 143 if (sbp->sb_features2 & XFS_SB_VERSION2_LAZYSBCOUNTBIT) 144 144 features |= XFS_FEAT_LAZYSBCOUNT; 145 - if (sbp->sb_features2 & XFS_SB_VERSION2_ATTR2BIT) 146 - features |= XFS_FEAT_ATTR2; 147 145 if (sbp->sb_features2 & XFS_SB_VERSION2_PROJID32BIT) 148 146 features |= XFS_FEAT_PROJID32; 149 147 if (sbp->sb_features2 & XFS_SB_VERSION2_FTYPE) ··· 153 155 154 156 /* Always on V5 features */ 155 157 features |= XFS_FEAT_ALIGN | XFS_FEAT_LOGV2 | XFS_FEAT_EXTFLG | 156 - XFS_FEAT_LAZYSBCOUNT | XFS_FEAT_ATTR2 | XFS_FEAT_PROJID32 | 158 + XFS_FEAT_LAZYSBCOUNT | XFS_FEAT_PROJID32 | 157 159 XFS_FEAT_V3INODES | XFS_FEAT_CRC | XFS_FEAT_PQUOTINO; 158 160 159 161 /* Optional V5 features */ ··· 1522 1524 geo->version = XFS_FSOP_GEOM_VERSION; 1523 1525 geo->flags = XFS_FSOP_GEOM_FLAGS_NLINK | 1524 1526 XFS_FSOP_GEOM_FLAGS_DIRV2 | 1525 - XFS_FSOP_GEOM_FLAGS_EXTFLG; 1527 + XFS_FSOP_GEOM_FLAGS_EXTFLG | 1528 + XFS_FSOP_GEOM_FLAGS_ATTR2; 1526 1529 if (xfs_has_attr(mp)) 1527 1530 geo->flags |= XFS_FSOP_GEOM_FLAGS_ATTR; 1528 1531 if (xfs_has_quota(mp)) ··· 1536 1537 geo->flags |= XFS_FSOP_GEOM_FLAGS_DIRV2CI; 1537 1538 if (xfs_has_lazysbcount(mp)) 1538 1539 geo->flags |= XFS_FSOP_GEOM_FLAGS_LAZYSB; 1539 - if (xfs_has_attr2(mp)) 1540 - geo->flags |= XFS_FSOP_GEOM_FLAGS_ATTR2; 1541 1540 if (xfs_has_projid32(mp)) 1542 1541 geo->flags |= XFS_FSOP_GEOM_FLAGS_PROJID32; 1543 1542 if (xfs_has_crc(mp))
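In xfs_sb.c, the superblock's ATTR2 features2 bit no longer sets a mount feature. XFS_FSOP_GEOM_FLAGS_ATTR2 moves into the unconditional part of the geometry flags. The following sketch shows that "fixed base set plus feature-dependent bits" pattern. The DEMO_GEOM_* bit values and `struct demo_features` are hypothetical stand-ins for the real XFS_FSOP_GEOM_FLAGS_* constants and xfs_has_*() predicates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values; the real XFS_FSOP_GEOM_FLAGS_* constants differ. */
#define DEMO_GEOM_NLINK		(1u << 0)
#define DEMO_GEOM_DIRV2		(1u << 1)
#define DEMO_GEOM_EXTFLG	(1u << 2)
#define DEMO_GEOM_ATTR2		(1u << 3)
#define DEMO_GEOM_ATTR		(1u << 4)
#define DEMO_GEOM_LAZYSB	(1u << 5)

/* Hypothetical feature summary standing in for the xfs_has_*() checks. */
struct demo_features {
	bool attr;
	bool lazysbcount;
};

static uint32_t demo_geom_flags(const struct demo_features *f)
{
	/* ATTR2 is now part of the unconditional base set. */
	uint32_t flags = DEMO_GEOM_NLINK | DEMO_GEOM_DIRV2 |
			 DEMO_GEOM_EXTFLG | DEMO_GEOM_ATTR2;

	if (f->attr)
		flags |= DEMO_GEOM_ATTR;
	if (f->lazysbcount)
		flags |= DEMO_GEOM_LAZYSB;
	return flags;
}
```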

+7

fs/xfs/libxfs/xfs_zones.h

··· 29 29 #define XFS_OPEN_GC_ZONES 1U 30 30 #define XFS_MIN_OPEN_ZONES (XFS_OPEN_GC_ZONES + 1U) 31 31 32 + /* 33 + * For zoned devices that do not have a limit on the number of open zones, and 34 + * for regular devices using the zoned allocator, use the most common SMR disks 35 + * limit (128) as the default limit on the number of open zones. 36 + */ 37 + #define XFS_DEFAULT_MAX_OPEN_ZONES 128 38 + 32 39 bool xfs_zone_validate(struct blk_zone *zone, struct xfs_rtgroup *rtg, 33 40 xfs_rgblock_t *write_pointer); 34 41
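The comment on the new XFS_DEFAULT_MAX_OPEN_ZONES describes a fallback. When the device imposes no open-zone limit, or when a regular device uses the zoned allocator, XFS assumes 128. The following minimal sketch models that choice, assuming the device reports a limit of 0 when it has none. `demo_max_open_zones()` is hypothetical, and it does not model any further clamping the real allocator applies.

```c
#include <assert.h>

/* Values copied from the xfs_zones.h hunk. */
#define XFS_OPEN_GC_ZONES		1U
#define XFS_MIN_OPEN_ZONES		(XFS_OPEN_GC_ZONES + 1U)
#define XFS_DEFAULT_MAX_OPEN_ZONES	128

/*
 * Hypothetical helper: @dev_limit is the device's open-zone limit, assumed
 * to be 0 both for "no limit" and for a regular (non-zoned) device.
 */
static unsigned int demo_max_open_zones(unsigned int dev_limit)
{
	if (dev_limit == 0)
		return XFS_DEFAULT_MAX_OPEN_ZONES;
	return dev_limit;
}
```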

+2 -2

fs/xfs/scrub/cow_repair.c

··· 300 300 * on the debugging knob, replace everything in the CoW fork. 301 301 */ 302 302 if ((sc->sm->sm_flags & XFS_SCRUB_IFLAG_FORCE_REBUILD) || 303 - XFS_TEST_ERROR(false, sc->mp, XFS_ERRTAG_FORCE_SCRUB_REPAIR)) { 303 + XFS_TEST_ERROR(sc->mp, XFS_ERRTAG_FORCE_SCRUB_REPAIR)) { 304 304 error = xrep_cow_mark_file_range(xc, xc->irec.br_startblock, 305 305 xc->irec.br_blockcount); 306 306 if (error) ··· 385 385 * CoW fork and then scan for staging extents in the refcountbt. 386 386 */ 387 387 if ((sc->sm->sm_flags & XFS_SCRUB_IFLAG_FORCE_REBUILD) || 388 - XFS_TEST_ERROR(false, sc->mp, XFS_ERRTAG_FORCE_SCRUB_REPAIR)) { 388 + XFS_TEST_ERROR(sc->mp, XFS_ERRTAG_FORCE_SCRUB_REPAIR)) { 389 389 error = xrep_cow_mark_file_range(xc, xc->irec.br_startblock, 390 390 xc->irec.br_blockcount); 391 391 if (error)

+6 -6

fs/xfs/scrub/metapath.c

··· 79 79 80 80 if (mpath->dp_ilock_flags) 81 81 xfs_iunlock(mpath->dp, mpath->dp_ilock_flags); 82 - kfree(mpath->path); 82 + kfree_const(mpath->path); 83 83 } 84 84 85 85 /* Set up a metadir path scan. @path must be dynamically allocated. */ ··· 98 98 99 99 error = xchk_install_live_inode(sc, ip); 100 100 if (error) { 101 - kfree(path); 101 + kfree_const(path); 102 102 return error; 103 103 } 104 104 105 105 mpath = kzalloc(sizeof(struct xchk_metapath), XCHK_GFP_FLAGS); 106 106 if (!mpath) { 107 - kfree(path); 107 + kfree_const(path); 108 108 return -ENOMEM; 109 109 } 110 110 ··· 132 132 return -ENOENT; 133 133 134 134 return xchk_setup_metapath_scan(sc, sc->mp->m_metadirip, 135 - kasprintf(GFP_KERNEL, "rtgroups"), sc->mp->m_rtdirip); 135 + kstrdup_const("rtgroups", GFP_KERNEL), sc->mp->m_rtdirip); 136 136 } 137 137 138 138 /* Scan a rtgroup inode under the /rtgroups directory. */ ··· 179 179 return -ENOENT; 180 180 181 181 return xchk_setup_metapath_scan(sc, sc->mp->m_metadirip, 182 - kstrdup("quota", GFP_KERNEL), qi->qi_dirip); 182 + kstrdup_const("quota", GFP_KERNEL), qi->qi_dirip); 183 183 } 184 184 185 185 /* Scan a quota inode under the /quota directory. */ ··· 212 212 return -ENOENT; 213 213 214 214 return xchk_setup_metapath_scan(sc, qi->qi_dirip, 215 - kstrdup(xfs_dqinode_path(type), GFP_KERNEL), ip); 215 + kstrdup_const(xfs_dqinode_path(type), GFP_KERNEL), ip); 216 216 } 217 217 #else 218 218 # define xchk_setup_metapath_quotadir(...) (-ENOENT)
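The scrub/metapath.c hunks switch path allocation to kstrdup_const() and every release to kfree_const(). kstrdup_const() returns its argument unchanged when the string lives in the kernel's read-only data section and duplicates it otherwise. The pointer must therefore always be released with kfree_const(), which skips those constants. The following userspace sketch shows that pairing. The `demo_rodata` table standing in for .rodata and all demo_* helpers are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for .rodata: these pointers must never be freed. */
static const char *const demo_rodata[] = { "rtgroups", "quota" };

static int demo_is_rodata(const char *s)
{
	for (size_t i = 0; i < sizeof(demo_rodata) / sizeof(demo_rodata[0]); i++)
		if (s == demo_rodata[i])
			return 1;
	return 0;
}

/* Like kstrdup_const(): share constants, copy everything else. */
static const char *demo_strdup_const(const char *s)
{
	char *copy;

	if (demo_is_rodata(s))
		return s;
	copy = malloc(strlen(s) + 1);
	if (copy)
		strcpy(copy, s);
	return copy;
}

/* Like kfree_const(): release only what demo_strdup_const() allocated. */
static void demo_free_const(const char *s)
{
	if (!demo_is_rodata(s))
		free((void *)s);
}
```

Calling plain free() on a shared constant would be a bug, which is why every error path in the hunk switches to the `_const` release together with the allocation.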

+9

fs/xfs/scrub/newbt.c

··· 28 28 #include "scrub/newbt.h" 29 29 30 30 /* 31 + * This is the maximum number of deferred extent freeing item extents (EFIs) 32 + * that we'll attach to a transaction without rolling the transaction to avoid 33 + * overrunning a tr_itruncate reservation. The newbt code should reserve 34 + * exactly the correct number of blocks to rebuild the btree, so there should 35 + * not be any excess blocks to free when committing a new btree. 36 + */ 37 + #define XREP_MAX_ITRUNCATE_EFIS (128) 38 + 39 + /* 31 40 * Estimate proper slack values for a btree that's being reloaded. 32 41 * 33 42 * Under most circumstances, we'll take whatever default loading value the

+499 -123

fs/xfs/scrub/reap.c

··· 36 36 #include "xfs_metafile.h" 37 37 #include "xfs_rtgroup.h" 38 38 #include "xfs_rtrmap_btree.h" 39 + #include "xfs_extfree_item.h" 40 + #include "xfs_rmap_item.h" 41 + #include "xfs_refcount_item.h" 42 + #include "xfs_buf_item.h" 43 + #include "xfs_bmap_item.h" 44 + #include "xfs_bmap_btree.h" 39 45 #include "scrub/scrub.h" 40 46 #include "scrub/common.h" 41 47 #include "scrub/trace.h" ··· 97 91 struct xreap_state { 98 92 struct xfs_scrub *sc; 99 93 100 - /* Reverse mapping owner and metadata reservation type. */ 101 - const struct xfs_owner_info *oinfo; 102 - enum xfs_ag_resv_type resv; 103 - 104 - /* If true, roll the transaction before reaping the next extent. */ 105 - bool force_roll; 106 - 107 - /* Number of deferred reaps attached to the current transaction. */ 108 - unsigned int deferred; 94 + union { 95 + struct { 96 + /* 97 + * For AG blocks, this is reverse mapping owner and 98 + * metadata reservation type. 99 + */ 100 + const struct xfs_owner_info *oinfo; 101 + enum xfs_ag_resv_type resv; 102 + }; 103 + struct { 104 + /* For file blocks, this is the inode and fork. */ 105 + struct xfs_inode *ip; 106 + int whichfork; 107 + }; 108 + }; 109 109 110 110 /* Number of invalidated buffers logged to the current transaction. */ 111 - unsigned int invalidated; 111 + unsigned int nr_binval; 112 112 113 - /* Number of deferred reaps queued during the whole reap sequence. */ 114 - unsigned long long total_deferred; 113 + /* Maximum number of buffers we can invalidate in a single tx. */ 114 + unsigned int max_binval; 115 + 116 + /* Number of deferred reaps attached to the current transaction. */ 117 + unsigned int nr_deferred; 118 + 119 + /* Maximum number of intents we can reap in a single transaction. */ 120 + unsigned int max_deferred; 115 121 }; 116 122 117 123 /* Put a block back on the AGFL. */ ··· 166 148 } 167 149 168 150 /* Are there any uncommitted reap operations? 
*/ 169 - static inline bool xreap_dirty(const struct xreap_state *rs) 151 + static inline bool xreap_is_dirty(const struct xreap_state *rs) 170 152 { 171 - if (rs->force_roll) 172 - return true; 173 - if (rs->deferred) 174 - return true; 175 - if (rs->invalidated) 176 - return true; 177 - if (rs->total_deferred) 178 - return true; 179 - return false; 153 + return rs->nr_binval > 0 || rs->nr_deferred > 0; 180 154 } 181 - 182 - #define XREAP_MAX_BINVAL (2048) 183 155 184 156 /* 185 - * Decide if we want to roll the transaction after reaping an extent. We don't 186 - * want to overrun the transaction reservation, so we prohibit more than 187 - * 128 EFIs per transaction. For the same reason, we limit the number 188 - * of buffer invalidations to 2048. 157 + * Decide if we need to roll the transaction to clear out the the log 158 + * reservation that we allocated to buffer invalidations. 189 159 */ 190 - static inline bool xreap_want_roll(const struct xreap_state *rs) 160 + static inline bool xreap_want_binval_roll(const struct xreap_state *rs) 191 161 { 192 - if (rs->force_roll) 193 - return true; 194 - if (rs->deferred > XREP_MAX_ITRUNCATE_EFIS) 195 - return true; 196 - if (rs->invalidated > XREAP_MAX_BINVAL) 197 - return true; 198 - return false; 162 + return rs->nr_binval >= rs->max_binval; 199 163 } 200 164 201 - static inline void xreap_reset(struct xreap_state *rs) 165 + /* Reset the buffer invalidation count after rolling. */ 166 + static inline void xreap_binval_reset(struct xreap_state *rs) 202 167 { 203 - rs->total_deferred += rs->deferred; 204 - rs->deferred = 0; 205 - rs->invalidated = 0; 206 - rs->force_roll = false; 168 + rs->nr_binval = 0; 207 169 } 208 170 209 - #define XREAP_MAX_DEFER_CHAIN (2048) 171 + /* 172 + * Bump the number of invalidated buffers, and return true if we can continue, 173 + * or false if we need to roll the transaction. 
174 + */ 175 + static inline bool xreap_inc_binval(struct xreap_state *rs) 176 + { 177 + rs->nr_binval++; 178 + return rs->nr_binval < rs->max_binval; 179 + } 210 180 211 181 /* 212 182 * Decide if we want to finish the deferred ops that are attached to the scrub 213 183 * transaction. We don't want to queue huge chains of deferred ops because 214 184 * that can consume a lot of log space and kernel memory. Hence we trigger a 215 - * xfs_defer_finish if there are more than 2048 deferred reap operations or the 216 - * caller did some real work. 185 + * xfs_defer_finish if there are too many deferred reap operations or we've run 186 + * out of space for invalidations. 217 187 */ 218 - static inline bool 219 - xreap_want_defer_finish(const struct xreap_state *rs) 188 + static inline bool xreap_want_defer_finish(const struct xreap_state *rs) 220 189 { 221 - if (rs->force_roll) 222 - return true; 223 - if (rs->total_deferred > XREAP_MAX_DEFER_CHAIN) 224 - return true; 225 - return false; 190 + return rs->nr_deferred >= rs->max_deferred; 226 191 } 227 192 193 + /* 194 + * Reset the defer chain length and buffer invalidation count after finishing 195 + * items. 196 + */ 228 197 static inline void xreap_defer_finish_reset(struct xreap_state *rs) 229 198 { 230 - rs->total_deferred = 0; 231 - rs->deferred = 0; 232 - rs->invalidated = 0; 233 - rs->force_roll = false; 199 + rs->nr_deferred = 0; 200 + rs->nr_binval = 0; 201 + } 202 + 203 + /* 204 + * Bump the number of deferred extent reaps. 205 + */ 206 + static inline void xreap_inc_defer(struct xreap_state *rs) 207 + { 208 + rs->nr_deferred++; 209 + } 210 + 211 + /* Force the caller to finish a deferred item chain. */ 212 + static inline void xreap_force_defer_finish(struct xreap_state *rs) 213 + { 214 + rs->nr_deferred = rs->max_deferred; 215 + } 216 + 217 + /* Maximum number of fsblocks that we might find in a buffer to invalidate. 
*/ 218 + static inline unsigned int 219 + xrep_binval_max_fsblocks( 220 + struct xfs_mount *mp) 221 + { 222 + /* Remote xattr values are the largest buffers that we support. */ 223 + return xfs_attr3_max_rmt_blocks(mp); 234 224 } 235 225 236 226 /* ··· 250 224 struct xfs_mount *mp, 251 225 xfs_extlen_t fsblocks) 252 226 { 253 - int max_fsbs; 254 - 255 - /* Remote xattr values are the largest buffers that we support. */ 256 - max_fsbs = xfs_attr3_max_rmt_blocks(mp); 257 - 258 - return XFS_FSB_TO_BB(mp, min_t(xfs_extlen_t, fsblocks, max_fsbs)); 227 + return XFS_FSB_TO_BB(mp, min_t(xfs_extlen_t, fsblocks, 228 + xrep_binval_max_fsblocks(mp))); 259 229 } 260 230 261 231 /* ··· 319 297 while ((bp = xrep_bufscan_advance(mp, &scan)) != NULL) { 320 298 xfs_trans_bjoin(sc->tp, bp); 321 299 xfs_trans_binval(sc->tp, bp); 322 - rs->invalidated++; 323 300 324 301 /* 325 302 * Stop invalidating if we've hit the limit; we should 326 303 * still have enough reservation left to free however 327 304 * far we've gotten. 328 305 */ 329 - if (rs->invalidated > XREAP_MAX_BINVAL) { 306 + if (!xreap_inc_binval(rs)) { 330 307 *aglenp -= agbno_next - bno; 331 308 goto out; 332 309 } ··· 437 416 trace_xreap_dispose_unmap_extent(pag_group(sc->sa.pag), agbno, 438 417 *aglenp); 439 418 440 - rs->force_roll = true; 441 - 442 419 if (rs->oinfo == &XFS_RMAP_OINFO_COW) { 443 420 /* 444 - * If we're unmapping CoW staging extents, remove the 421 + * t0: Unmapping CoW staging extents, remove the 445 422 * records from the refcountbt, which will remove the 446 423 * rmap record as well. 
447 424 */ 448 425 xfs_refcount_free_cow_extent(sc->tp, false, fsbno, 449 426 *aglenp); 427 + xreap_inc_defer(rs); 450 428 return 0; 451 429 } 452 430 453 - return xfs_rmap_free(sc->tp, sc->sa.agf_bp, sc->sa.pag, agbno, 454 - *aglenp, rs->oinfo); 431 + /* t1: unmap crosslinked metadata blocks */ 432 + xfs_rmap_free_extent(sc->tp, false, fsbno, *aglenp, 433 + rs->oinfo->oi_owner); 434 + xreap_inc_defer(rs); 435 + return 0; 455 436 } 456 437 457 438 trace_xreap_dispose_free_extent(pag_group(sc->sa.pag), agbno, *aglenp); ··· 466 443 */ 467 444 xreap_agextent_binval(rs, agbno, aglenp); 468 445 if (*aglenp == 0) { 469 - ASSERT(xreap_want_roll(rs)); 446 + ASSERT(xreap_want_binval_roll(rs)); 470 447 return 0; 471 448 } 472 449 473 450 /* 474 - * If we're getting rid of CoW staging extents, use deferred work items 451 + * t2: To get rid of CoW staging extents, use deferred work items 475 452 * to remove the refcountbt records (which removes the rmap records) 476 453 * and free the extent. We're not worried about the system going down 477 454 * here because log recovery walks the refcount btree to clean out the ··· 486 463 if (error) 487 464 return error; 488 465 489 - rs->force_roll = true; 466 + xreap_inc_defer(rs); 490 467 return 0; 491 468 } 492 469 493 - /* Put blocks back on the AGFL one at a time. */ 470 + /* t3: Put blocks back on the AGFL one at a time. */ 494 471 if (rs->resv == XFS_AG_RESV_AGFL) { 495 472 ASSERT(*aglenp == 1); 496 473 error = xreap_put_freelist(sc, agbno); 497 474 if (error) 498 475 return error; 499 476 500 - rs->force_roll = true; 477 + xreap_force_defer_finish(rs); 501 478 return 0; 502 479 } 503 480 504 481 /* 505 - * Use deferred frees to get rid of the old btree blocks to try to 482 + * t4: Use deferred frees to get rid of the old btree blocks to try to 506 483 * minimize the window in which we could crash and lose the old blocks. 507 484 * Add a defer ops barrier every other extent to avoid stressing the 508 485 * system with large EFIs. 
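The reap.c changes replace the fixed caps (XREP_MAX_ITRUNCATE_EFIS, XREAP_MAX_BINVAL and XREAP_MAX_DEFER_CHAIN) with two budgets, max_deferred and max_binval. xreap_configure_limits() derives both from the transaction's log reservation. The following sketch shows that arithmetic and the two predicates the reap loops consult. `struct demo_reap_budget` is a reduced, hypothetical copy of the new xreap_state counters. The byte counts in the usage note are illustrative and do not come from any real filesystem.

```c
#include <assert.h>
#include <stdbool.h>

/* Reduced, hypothetical copy of the counters added to struct xreap_state. */
struct demo_reap_budget {
	unsigned int nr_binval, max_binval;
	unsigned int nr_deferred, max_deferred;
};

/*
 * Mirrors xreap_configure_limits(). It reserves the fixed per-step
 * overhead, sizes max_deferred by charging variable_overhead per extent,
 * deducts only per_intent for each, and spends the rest on buffer
 * invalidations. It returns false where the kernel would shut down.
 */
static bool demo_configure_limits(struct demo_reap_budget *rb,
				  unsigned int log_res,
				  unsigned int fixed_overhead,
				  unsigned int variable_overhead,
				  unsigned int per_intent,
				  unsigned int per_binval)
{
	unsigned int res;

	if (log_res < fixed_overhead + variable_overhead)
		return false;
	res = log_res - fixed_overhead;
	rb->max_deferred = per_intent ? res / variable_overhead : 0;
	res -= rb->max_deferred * per_intent;
	rb->max_binval = per_binval ? res / per_binval : 0;
	return true;
}

/* Same predicate as xreap_inc_binval(): false means "roll now". */
static bool demo_inc_binval(struct demo_reap_budget *rb)
{
	rb->nr_binval++;
	return rb->nr_binval < rb->max_binval;
}

/* Same predicate as xreap_want_defer_finish(). */
static bool demo_want_defer_finish(const struct demo_reap_budget *rb)
{
	return rb->nr_deferred >= rb->max_deferred;
}
```

Take 100000 bytes of reservation, 20000 of fixed step overhead, 600 per intent and 400 per invalidation. The AG metadata weighting (variable_overhead = per_intent + per_binval) then yields 80 deferrals and 80 invalidations. The AG CoW weighting (per_intent + per_binval / 8) yields 123 and 15. The realtime CoW case passes per_intent as the variable overhead and 0 per invalidation, which yields 133 and 0.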
··· 512 489 if (error) 513 490 return error; 514 491 515 - rs->deferred++; 516 - if (rs->deferred % 2 == 0) 492 + xreap_inc_defer(rs); 493 + if (rs->nr_deferred % 2 == 0) 517 494 xfs_defer_add_barrier(sc->tp); 518 495 return 0; 496 + } 497 + 498 + /* Configure the deferral and invalidation limits */ 499 + static inline void 500 + xreap_configure_limits( 501 + struct xreap_state *rs, 502 + unsigned int fixed_overhead, 503 + unsigned int variable_overhead, 504 + unsigned int per_intent, 505 + unsigned int per_binval) 506 + { 507 + struct xfs_scrub *sc = rs->sc; 508 + unsigned int res = sc->tp->t_log_res - fixed_overhead; 509 + 510 + /* Don't underflow the reservation */ 511 + if (sc->tp->t_log_res < (fixed_overhead + variable_overhead)) { 512 + ASSERT(sc->tp->t_log_res >= 513 + (fixed_overhead + variable_overhead)); 514 + xfs_force_shutdown(sc->mp, SHUTDOWN_CORRUPT_INCORE); 515 + return; 516 + } 517 + 518 + rs->max_deferred = per_intent ? res / variable_overhead : 0; 519 + res -= rs->max_deferred * per_intent; 520 + rs->max_binval = per_binval ? res / per_binval : 0; 521 + } 522 + 523 + /* 524 + * Compute the maximum number of intent items that reaping can attach to the 525 + * scrub transaction given the worst case log overhead of the intent items 526 + * needed to reap a single per-AG space extent. This is not for freeing CoW 527 + * staging extents. 528 + */ 529 + STATIC void 530 + xreap_configure_agextent_limits( 531 + struct xreap_state *rs) 532 + { 533 + struct xfs_scrub *sc = rs->sc; 534 + struct xfs_mount *mp = sc->mp; 535 + 536 + /* 537 + * In the worst case, relogging an intent item causes both an intent 538 + * item and a done item to be attached to a transaction for each extent 539 + * that we'd like to process. 
540 + */ 541 + const unsigned int efi = xfs_efi_log_space(1) + 542 + xfs_efd_log_space(1); 543 + const unsigned int rui = xfs_rui_log_space(1) + 544 + xfs_rud_log_space(); 545 + 546 + /* 547 + * Various things can happen when reaping non-CoW metadata blocks: 548 + * 549 + * t1: Unmapping crosslinked metadata blocks: deferred removal of rmap 550 + * record. 551 + * 552 + * t3: Freeing to AGFL: roll and finish deferred items for every block. 553 + * Limits here do not matter. 554 + * 555 + * t4: Freeing metadata blocks: deferred freeing of the space, which 556 + * also removes the rmap record. 557 + * 558 + * For simplicity, we'll use the worst-case intents size to determine 559 + * the maximum number of deferred extents before we have to finish the 560 + * whole chain. If we're trying to reap a btree larger than this size, 561 + * a crash midway through reaping can result in leaked blocks. 562 + */ 563 + const unsigned int t1 = rui; 564 + const unsigned int t4 = rui + efi; 565 + const unsigned int per_intent = max(t1, t4); 566 + 567 + /* 568 + * For each transaction in a reap chain, we must be able to take one 569 + * step in the defer item chain, which should only consist of EFI or 570 + * RUI items. 571 + */ 572 + const unsigned int f1 = xfs_calc_finish_efi_reservation(mp, 1); 573 + const unsigned int f2 = xfs_calc_finish_rui_reservation(mp, 1); 574 + const unsigned int step_size = max(f1, f2); 575 + 576 + /* Largest buffer size (in fsblocks) that can be invalidated. */ 577 + const unsigned int max_binval = xrep_binval_max_fsblocks(mp); 578 + 579 + /* Maximum overhead of invalidating one buffer. */ 580 + const unsigned int per_binval = 581 + xfs_buf_inval_log_space(1, XFS_B_TO_FSBT(mp, max_binval)); 582 + 583 + /* 584 + * For each transaction in a reap chain, we can delete some number of 585 + * extents and invalidate some number of blocks. 
We assume that btree 586 + * blocks aren't usually contiguous; and that scrub likely pulled all 587 + * the buffers into memory. From these assumptions, set the maximum 588 + * number of deferrals we can queue before flushing the defer chain, 589 + * and the number of invalidations we can queue before rolling to a 590 + * clean transaction (and possibly relogging some of the deferrals) to 591 + * the same quantity. 592 + */ 593 + const unsigned int variable_overhead = per_intent + per_binval; 594 + 595 + xreap_configure_limits(rs, step_size, variable_overhead, per_intent, 596 + per_binval); 597 + 598 + trace_xreap_agextent_limits(sc->tp, per_binval, rs->max_binval, 599 + step_size, per_intent, rs->max_deferred); 600 + } 601 + 602 + /* 603 + * Compute the maximum number of intent items that reaping can attach to the 604 + * scrub transaction given the worst case log overhead of the intent items 605 + * needed to reap a single CoW staging extent. This is not for freeing 606 + * metadata blocks. 607 + */ 608 + STATIC void 609 + xreap_configure_agcow_limits( 610 + struct xreap_state *rs) 611 + { 612 + struct xfs_scrub *sc = rs->sc; 613 + struct xfs_mount *mp = sc->mp; 614 + 615 + /* 616 + * In the worst case, relogging an intent item causes both an intent 617 + * item and a done item to be attached to a transaction for each extent 618 + * that we'd like to process. 
619 + */ 620 + const unsigned int efi = xfs_efi_log_space(1) + 621 + xfs_efd_log_space(1); 622 + const unsigned int rui = xfs_rui_log_space(1) + 623 + xfs_rud_log_space(); 624 + const unsigned int cui = xfs_cui_log_space(1) + 625 + xfs_cud_log_space(); 626 + 627 + /* 628 + * Various things can happen when reaping non-CoW metadata blocks: 629 + * 630 + * t0: Unmapping crosslinked CoW blocks: deferred removal of refcount 631 + * record, which defers removal of rmap record 632 + * 633 + * t2: Freeing CoW blocks: deferred removal of refcount record, which 634 + * defers removal of rmap record; and deferred removal of the space 635 + * 636 + * For simplicity, we'll use the worst-case intents size to determine 637 + * the maximum number of deferred extents before we have to finish the 638 + * whole chain. If we're trying to reap a btree larger than this size, 639 + * a crash midway through reaping can result in leaked blocks. 640 + */ 641 + const unsigned int t0 = cui + rui; 642 + const unsigned int t2 = cui + rui + efi; 643 + const unsigned int per_intent = max(t0, t2); 644 + 645 + /* 646 + * For each transaction in a reap chain, we must be able to take one 647 + * step in the defer item chain, which should only consist of CUI, EFI, 648 + * or RUI items. 649 + */ 650 + const unsigned int f1 = xfs_calc_finish_efi_reservation(mp, 1); 651 + const unsigned int f2 = xfs_calc_finish_rui_reservation(mp, 1); 652 + const unsigned int f3 = xfs_calc_finish_cui_reservation(mp, 1); 653 + const unsigned int step_size = max3(f1, f2, f3); 654 + 655 + /* Largest buffer size (in fsblocks) that can be invalidated. */ 656 + const unsigned int max_binval = xrep_binval_max_fsblocks(mp); 657 + 658 + /* Overhead of invalidating one buffer */ 659 + const unsigned int per_binval = 660 + xfs_buf_inval_log_space(1, XFS_B_TO_FSBT(mp, max_binval)); 661 + 662 + /* 663 + * For each transaction in a reap chain, we can delete some number of 664 + * extents and invalidate some number of blocks. 
We assume that CoW 665 + * staging extents are usually more than 1 fsblock, and that there 666 + * shouldn't be any buffers for those blocks. From the assumptions, 667 + * set the number of deferrals to use as much of the reservation as 668 + * it can, but leave space to invalidate 1/8th that number of buffers. 669 + */ 670 + const unsigned int variable_overhead = per_intent + 671 + (per_binval / 8); 672 + 673 + xreap_configure_limits(rs, step_size, variable_overhead, per_intent, 674 + per_binval); 675 + 676 + trace_xreap_agcow_limits(sc->tp, per_binval, rs->max_binval, step_size, 677 + per_intent, rs->max_deferred); 519 678 } 520 679 521 680 /* ··· 736 531 if (error) 737 532 return error; 738 533 xreap_defer_finish_reset(rs); 739 - } else if (xreap_want_roll(rs)) { 534 + } else if (xreap_want_binval_roll(rs)) { 740 535 error = xrep_roll_ag_trans(sc); 741 536 if (error) 742 537 return error; 743 - xreap_reset(rs); 538 + xreap_binval_reset(rs); 744 539 } 745 540 746 541 agbno += aglen; ··· 767 562 ASSERT(xfs_has_rmapbt(sc->mp)); 768 563 ASSERT(sc->ip == NULL); 769 564 565 + xreap_configure_agextent_limits(&rs); 770 566 error = xagb_bitmap_walk(bitmap, xreap_agmeta_extent, &rs); 771 567 if (error) 772 568 return error; 773 569 774 - if (xreap_dirty(&rs)) 570 + if (xreap_is_dirty(&rs)) 775 571 return xrep_defer_finish(sc); 776 572 777 573 return 0; ··· 834 628 if (error) 835 629 goto out_agf; 836 630 xreap_defer_finish_reset(rs); 837 - } else if (xreap_want_roll(rs)) { 631 + } else if (xreap_want_binval_roll(rs)) { 838 632 /* 839 633 * Hold the AGF buffer across the transaction roll so 840 634 * that we don't have to reattach it to the scrub ··· 845 639 xfs_trans_bjoin(sc->tp, sc->sa.agf_bp); 846 640 if (error) 847 641 goto out_agf; 848 - xreap_reset(rs); 642 + xreap_binval_reset(rs); 849 643 } 850 644 851 645 agbno += aglen; ··· 880 674 ASSERT(xfs_has_rmapbt(sc->mp)); 881 675 ASSERT(sc->ip != NULL); 882 676 677 + if (oinfo == &XFS_RMAP_OINFO_COW) 678 + 
xreap_configure_agcow_limits(&rs); 679 + else 680 + xreap_configure_agextent_limits(&rs); 883 681 error = xfsb_bitmap_walk(bitmap, xreap_fsmeta_extent, &rs); 884 682 if (error) 885 683 return error; 886 684 887 - if (xreap_dirty(&rs)) 685 + if (xreap_is_dirty(&rs)) 888 686 return xrep_defer_finish(sc); 889 687 890 688 return 0; ··· 980 770 rtbno = xfs_rgbno_to_rtb(sc->sr.rtg, rgbno); 981 771 982 772 /* 983 - * If there are other rmappings, this block is cross linked and must 773 + * t1: There are other rmappings; this block is cross linked and must 984 774 * not be freed. Remove the forward and reverse mapping and move on. 985 775 */ 986 776 if (crosslinked) { ··· 988 778 *rglenp); 989 779 990 780 xfs_refcount_free_cow_extent(sc->tp, true, rtbno, *rglenp); 991 - rs->deferred++; 781 + xreap_inc_defer(rs); 992 782 return 0; 993 783 } 994 784 995 785 trace_xreap_dispose_free_extent(rtg_group(sc->sr.rtg), rgbno, *rglenp); 996 786 997 787 /* 998 - * The CoW staging extent is not crosslinked. Use deferred work items 788 + * t2: The CoW staging extent is not crosslinked. Use deferred work 999 789 * to remove the refcountbt records (which removes the rmap records) 1000 790 * and free the extent. We're not worried about the system going down 1001 791 * here because log recovery walks the refcount btree to clean out the ··· 1009 799 if (error) 1010 800 return error; 1011 801 1012 - rs->deferred++; 802 + xreap_inc_defer(rs); 1013 803 return 0; 804 + } 805 + 806 + /* 807 + * Compute the maximum number of intent items that reaping can attach to the 808 + * scrub transaction given the worst case log overhead of the intent items 809 + * needed to reap a single CoW staging extent. This is not for freeing 810 + * metadata blocks. 
811 + */ 812 + STATIC void 813 + xreap_configure_rgcow_limits( 814 + struct xreap_state *rs) 815 + { 816 + struct xfs_scrub *sc = rs->sc; 817 + struct xfs_mount *mp = sc->mp; 818 + 819 + /* 820 + * In the worst case, relogging an intent item causes both an intent 821 + * item and a done item to be attached to a transaction for each extent 822 + * that we'd like to process. 823 + */ 824 + const unsigned int efi = xfs_efi_log_space(1) + 825 + xfs_efd_log_space(1); 826 + const unsigned int rui = xfs_rui_log_space(1) + 827 + xfs_rud_log_space(); 828 + const unsigned int cui = xfs_cui_log_space(1) + 829 + xfs_cud_log_space(); 830 + 831 + /* 832 + * Various things can happen when reaping non-CoW metadata blocks: 833 + * 834 + * t1: Unmapping crosslinked CoW blocks: deferred removal of refcount 835 + * record, which defers removal of rmap record 836 + * 837 + * t2: Freeing CoW blocks: deferred removal of refcount record, which 838 + * defers removal of rmap record; and deferred removal of the space 839 + * 840 + * For simplicity, we'll use the worst-case intents size to determine 841 + * the maximum number of deferred extents before we have to finish the 842 + * whole chain. If we're trying to reap a btree larger than this size, 843 + * a crash midway through reaping can result in leaked blocks. 844 + */ 845 + const unsigned int t1 = cui + rui; 846 + const unsigned int t2 = cui + rui + efi; 847 + const unsigned int per_intent = max(t1, t2); 848 + 849 + /* 850 + * For each transaction in a reap chain, we must be able to take one 851 + * step in the defer item chain, which should only consist of CUI, EFI, 852 + * or RUI items. 
853 + */ 854 + const unsigned int f1 = xfs_calc_finish_rt_efi_reservation(mp, 1); 855 + const unsigned int f2 = xfs_calc_finish_rt_rui_reservation(mp, 1); 856 + const unsigned int f3 = xfs_calc_finish_rt_cui_reservation(mp, 1); 857 + const unsigned int step_size = max3(f1, f2, f3); 858 + 859 + /* 860 + * The only buffer for the rt device is the rtgroup super, so we don't 861 + * need to save space for buffer invalidations. 862 + */ 863 + xreap_configure_limits(rs, step_size, per_intent, per_intent, 0); 864 + 865 + trace_xreap_rgcow_limits(sc->tp, 0, 0, step_size, per_intent, 866 + rs->max_deferred); 1014 867 } 1015 868 1016 869 #define XREAP_RTGLOCK_ALL (XFS_RTGLOCK_BITMAP | \ ··· 1128 855 if (error) 1129 856 goto out_unlock; 1130 857 xreap_defer_finish_reset(rs); 1131 - } else if (xreap_want_roll(rs)) { 858 + } else if (xreap_want_binval_roll(rs)) { 1132 859 error = xfs_trans_roll_inode(&sc->tp, sc->ip); 1133 860 if (error) 1134 861 goto out_unlock; 1135 - xreap_reset(rs); 862 + xreap_binval_reset(rs); 1136 863 } 1137 864 1138 865 rgbno += rglen; ··· 1164 891 1165 892 ASSERT(xfs_has_rmapbt(sc->mp)); 1166 893 ASSERT(sc->ip != NULL); 894 + ASSERT(oinfo == &XFS_RMAP_OINFO_COW); 1167 895 896 + xreap_configure_rgcow_limits(&rs); 1168 897 error = xrtb_bitmap_walk(bitmap, xreap_rtmeta_extent, &rs); 1169 898 if (error) 1170 899 return error; 1171 900 1172 - if (xreap_dirty(&rs)) 901 + if (xreap_is_dirty(&rs)) 1173 902 return xrep_defer_finish(sc); 1174 903 1175 904 return 0; ··· 1204 929 ASSERT(sc->ip != NULL); 1205 930 ASSERT(xfs_is_metadir_inode(sc->ip)); 1206 931 932 + xreap_configure_agextent_limits(&rs); 1207 933 xfs_rmap_ino_bmbt_owner(&oinfo, sc->ip->i_ino, XFS_DATA_FORK); 1208 - 1209 934 error = xfsb_bitmap_walk(bitmap, xreap_fsmeta_extent, &rs); 1210 935 if (error) 1211 936 return error; 1212 937 1213 - if (xreap_dirty(&rs)) { 938 + if (xreap_is_dirty(&rs)) { 1214 939 error = xrep_defer_finish(sc); 1215 940 if (error) 1216 941 return error; ··· 1230 955 */ 1231 
956 STATIC int 1232 957 xreap_bmapi_select( 1233 - struct xfs_scrub *sc, 1234 - struct xfs_inode *ip, 1235 - int whichfork, 958 + struct xreap_state *rs, 1236 959 struct xfs_bmbt_irec *imap, 1237 960 bool *crosslinked) 1238 961 { 1239 962 struct xfs_owner_info oinfo; 963 + struct xfs_scrub *sc = rs->sc; 1240 964 struct xfs_btree_cur *cur; 1241 965 xfs_filblks_t len = 1; 1242 966 xfs_agblock_t bno; ··· 1249 975 cur = xfs_rmapbt_init_cursor(sc->mp, sc->tp, sc->sa.agf_bp, 1250 976 sc->sa.pag); 1251 977 1252 - xfs_rmap_ino_owner(&oinfo, ip->i_ino, whichfork, imap->br_startoff); 978 + xfs_rmap_ino_owner(&oinfo, rs->ip->i_ino, rs->whichfork, 979 + imap->br_startoff); 1253 980 error = xfs_rmap_has_other_keys(cur, agbno, 1, &oinfo, crosslinked); 1254 981 if (error) 1255 982 goto out_cur; ··· 1313 1038 */ 1314 1039 STATIC int 1315 1040 xreap_bmapi_binval( 1316 - struct xfs_scrub *sc, 1317 - struct xfs_inode *ip, 1318 - int whichfork, 1041 + struct xreap_state *rs, 1319 1042 struct xfs_bmbt_irec *imap) 1320 1043 { 1044 + struct xfs_scrub *sc = rs->sc; 1321 1045 struct xfs_mount *mp = sc->mp; 1322 1046 struct xfs_perag *pag = sc->sa.pag; 1323 - int bmap_flags = xfs_bmapi_aflag(whichfork); 1047 + int bmap_flags = xfs_bmapi_aflag(rs->whichfork); 1324 1048 xfs_fileoff_t off; 1325 1049 xfs_fileoff_t max_off; 1326 1050 xfs_extlen_t scan_blocks; 1327 1051 xfs_agblock_t bno; 1328 1052 xfs_agblock_t agbno; 1329 1053 xfs_agblock_t agbno_next; 1330 - unsigned int invalidated = 0; 1331 1054 int error; 1332 1055 1333 1056 /* ··· 1352 1079 struct xfs_bmbt_irec hmap; 1353 1080 int nhmaps = 1; 1354 1081 1355 - error = xfs_bmapi_read(ip, off, max_off - off, &hmap, 1082 + error = xfs_bmapi_read(rs->ip, off, max_off - off, &hmap, 1356 1083 &nhmaps, bmap_flags); 1357 1084 if (error) 1358 1085 return error; ··· 1393 1120 xfs_buf_stale(bp); 1394 1121 xfs_buf_relse(bp); 1395 1122 } 1396 - invalidated++; 1397 1123 1398 1124 /* 1399 1125 * Stop invalidating if we've hit the limit; we should 1400 
1126 * still have enough reservation left to free however 1401 - * much of the mapping we've seen so far. 1127 + * far we've gotten. 1402 1128 */ 1403 - if (invalidated > XREAP_MAX_BINVAL) { 1129 + if (!xreap_inc_binval(rs)) { 1404 1130 imap->br_blockcount = agbno_next - bno; 1405 1131 goto out; 1406 1132 } ··· 1421 1149 */ 1422 1150 STATIC int 1423 1151 xrep_reap_bmapi_iter( 1424 - struct xfs_scrub *sc, 1425 - struct xfs_inode *ip, 1426 - int whichfork, 1152 + struct xreap_state *rs, 1427 1153 struct xfs_bmbt_irec *imap, 1428 1154 bool crosslinked) 1429 1155 { 1156 + struct xfs_scrub *sc = rs->sc; 1430 1157 int error; 1431 1158 1432 1159 if (crosslinked) { ··· 1442 1171 imap->br_blockcount); 1443 1172 1444 1173 /* 1445 - * Schedule removal of the mapping from the fork. We use 1174 + * t0: Schedule removal of the mapping from the fork. We use 1446 1175 * deferred log intents in this function to control the exact 1447 1176 * sequence of metadata updates. 1448 1177 */ 1449 - xfs_bmap_unmap_extent(sc->tp, ip, whichfork, imap); 1450 - xfs_trans_mod_dquot_byino(sc->tp, ip, XFS_TRANS_DQ_BCOUNT, 1178 + xfs_bmap_unmap_extent(sc->tp, rs->ip, rs->whichfork, imap); 1179 + xfs_trans_mod_dquot_byino(sc->tp, rs->ip, XFS_TRANS_DQ_BCOUNT, 1451 1180 -(int64_t)imap->br_blockcount); 1452 - xfs_rmap_unmap_extent(sc->tp, ip, whichfork, imap); 1181 + xfs_rmap_unmap_extent(sc->tp, rs->ip, rs->whichfork, imap); 1453 1182 return 0; 1454 1183 } 1455 1184 ··· 1470 1199 * transaction is full of logged buffer invalidations, so we need to 1471 1200 * return early so that we can roll and retry. 1472 1201 */ 1473 - error = xreap_bmapi_binval(sc, ip, whichfork, imap); 1202 + error = xreap_bmapi_binval(rs, imap); 1474 1203 if (error || imap->br_blockcount == 0) 1475 1204 return error; 1476 1205 1477 1206 /* 1478 - * Schedule removal of the mapping from the fork. 
We use deferred log 1479 - * intents in this function to control the exact sequence of metadata 1207 + * t1: Schedule removal of the mapping from the fork. We use deferred 1208 + * work in this function to control the exact sequence of metadata 1480 1209 * updates. 1481 1210 */ 1482 - xfs_bmap_unmap_extent(sc->tp, ip, whichfork, imap); 1483 - xfs_trans_mod_dquot_byino(sc->tp, ip, XFS_TRANS_DQ_BCOUNT, 1211 + xfs_bmap_unmap_extent(sc->tp, rs->ip, rs->whichfork, imap); 1212 + xfs_trans_mod_dquot_byino(sc->tp, rs->ip, XFS_TRANS_DQ_BCOUNT, 1484 1213 -(int64_t)imap->br_blockcount); 1485 1214 return xfs_free_extent_later(sc->tp, imap->br_startblock, 1486 1215 imap->br_blockcount, NULL, XFS_AG_RESV_NONE, 1487 1216 XFS_FREE_EXTENT_SKIP_DISCARD); 1217 + } 1218 + 1219 + /* Compute the maximum mapcount of a file buffer. */ 1220 + static unsigned int 1221 + xreap_bmapi_binval_mapcount( 1222 + struct xfs_scrub *sc) 1223 + { 1224 + /* directory blocks can span multiple fsblocks and be discontiguous */ 1225 + if (sc->sm->sm_type == XFS_SCRUB_TYPE_DIR) 1226 + return sc->mp->m_dir_geo->fsbcount; 1227 + 1228 + /* all other file xattr/symlink blocks must be contiguous */ 1229 + return 1; 1230 + } 1231 + 1232 + /* Compute the maximum block size of a file buffer. */ 1233 + static unsigned int 1234 + xreap_bmapi_binval_blocksize( 1235 + struct xfs_scrub *sc) 1236 + { 1237 + switch (sc->sm->sm_type) { 1238 + case XFS_SCRUB_TYPE_DIR: 1239 + return sc->mp->m_dir_geo->blksize; 1240 + case XFS_SCRUB_TYPE_XATTR: 1241 + case XFS_SCRUB_TYPE_PARENT: 1242 + /* 1243 + * The xattr structure itself consists of single fsblocks, but 1244 + * there could be remote xattr blocks to invalidate. 1245 + */ 1246 + return XFS_XATTR_SIZE_MAX; 1247 + } 1248 + 1249 + /* everything else is a single block */ 1250 + return sc->mp->m_sb.sb_blocksize; 1251 + } 1252 + 1253 + /* 1254 + * Compute the maximum number of buffer invalidations that we can do while 1255 + * reaping a single extent from a file fork. 
1256 + */ 1257 + STATIC void 1258 + xreap_configure_bmapi_limits( 1259 + struct xreap_state *rs) 1260 + { 1261 + struct xfs_scrub *sc = rs->sc; 1262 + struct xfs_mount *mp = sc->mp; 1263 + 1264 + /* overhead of invalidating a buffer */ 1265 + const unsigned int per_binval = 1266 + xfs_buf_inval_log_space(xreap_bmapi_binval_mapcount(sc), 1267 + xreap_bmapi_binval_blocksize(sc)); 1268 + 1269 + /* 1270 + * In the worst case, relogging an intent item causes both an intent 1271 + * item and a done item to be attached to a transaction for each extent 1272 + * that we'd like to process. 1273 + */ 1274 + const unsigned int efi = xfs_efi_log_space(1) + 1275 + xfs_efd_log_space(1); 1276 + const unsigned int rui = xfs_rui_log_space(1) + 1277 + xfs_rud_log_space(); 1278 + const unsigned int bui = xfs_bui_log_space(1) + 1279 + xfs_bud_log_space(); 1280 + 1281 + /* 1282 + * t1: Unmapping crosslinked file data blocks: one bmap deletion, 1283 + * possibly an EFI for underfilled bmbt blocks, and an rmap deletion. 1284 + * 1285 + * t2: Freeing file data blocks: one bmap deletion, possibly an 1286 + * EFI for underfilled bmbt blocks, and another EFI for the space 1287 + * itself. 1288 + */ 1289 + const unsigned int t1 = (bui + efi) + rui; 1290 + const unsigned int t2 = (bui + efi) + efi; 1291 + const unsigned int per_intent = max(t1, t2); 1292 + 1293 + /* 1294 + * For each transaction in a reap chain, we must be able to take one 1295 + * step in the defer item chain, which should only consist of CUI, EFI, 1296 + * or RUI items.
1297 + */ 1298 + const unsigned int f1 = xfs_calc_finish_efi_reservation(mp, 1); 1299 + const unsigned int f2 = xfs_calc_finish_rui_reservation(mp, 1); 1300 + const unsigned int f3 = xfs_calc_finish_bui_reservation(mp, 1); 1301 + const unsigned int step_size = max3(f1, f2, f3); 1302 + 1303 + /* 1304 + * Each call to xreap_ifork_extent starts with a clean transaction and 1305 + * operates on a single mapping by creating a chain of log intent items 1306 + * for that mapping. We need to leave enough reservation in the 1307 + * transaction to log btree buffer and inode updates for each step in 1308 + * the chain, and to relog the log intents. 1309 + */ 1310 + const unsigned int per_extent_res = per_intent + step_size; 1311 + 1312 + xreap_configure_limits(rs, per_extent_res, per_binval, 0, per_binval); 1313 + 1314 + trace_xreap_bmapi_limits(sc->tp, per_binval, rs->max_binval, 1315 + step_size, per_intent, 1); 1488 1316 } 1489 1317 1490 1318 /* ··· 1592 1222 */ 1593 1223 STATIC int 1594 1224 xreap_ifork_extent( 1595 - struct xfs_scrub *sc, 1596 - struct xfs_inode *ip, 1597 - int whichfork, 1225 + struct xreap_state *rs, 1598 1226 struct xfs_bmbt_irec *imap) 1599 1227 { 1228 + struct xfs_scrub *sc = rs->sc; 1600 1229 xfs_agnumber_t agno; 1601 1230 bool crosslinked; 1602 1231 int error; 1603 1232 1604 1233 ASSERT(sc->sa.pag == NULL); 1605 1234 1606 - trace_xreap_ifork_extent(sc, ip, whichfork, imap); 1235 + trace_xreap_ifork_extent(sc, rs->ip, rs->whichfork, imap); 1607 1236 1608 1237 agno = XFS_FSB_TO_AGNO(sc->mp, imap->br_startblock); 1609 1238 sc->sa.pag = xfs_perag_get(sc->mp, agno); ··· 1617 1248 * Decide the fate of the blocks at the beginning of the mapping, then 1618 1249 * update the mapping to use it with the unmap calls. 
1619 1250 */ 1620 - error = xreap_bmapi_select(sc, ip, whichfork, imap, &crosslinked); 1251 + error = xreap_bmapi_select(rs, imap, &crosslinked); 1621 1252 if (error) 1622 1253 goto out_agf; 1623 1254 1624 - error = xrep_reap_bmapi_iter(sc, ip, whichfork, imap, crosslinked); 1255 + error = xrep_reap_bmapi_iter(rs, imap, crosslinked); 1625 1256 if (error) 1626 1257 goto out_agf; 1627 1258 ··· 1645 1276 struct xfs_inode *ip, 1646 1277 int whichfork) 1647 1278 { 1279 + struct xreap_state rs = { 1280 + .sc = sc, 1281 + .ip = ip, 1282 + .whichfork = whichfork, 1283 + }; 1648 1284 xfs_fileoff_t off = 0; 1649 1285 int bmap_flags = xfs_bmapi_aflag(whichfork); 1650 1286 int error; ··· 1658 1284 ASSERT(ip == sc->ip || ip == sc->tempip); 1659 1285 ASSERT(whichfork == XFS_ATTR_FORK || !XFS_IS_REALTIME_INODE(ip)); 1660 1286 1287 + xreap_configure_bmapi_limits(&rs); 1661 1288 while (off < XFS_MAX_FILEOFF) { 1662 1289 struct xfs_bmbt_irec imap; 1663 1290 int nimaps = 1; ··· 1678 1303 * can in a single transaction. 1679 1304 */ 1680 1305 if (xfs_bmap_is_real_extent(&imap)) { 1681 - error = xreap_ifork_extent(sc, ip, whichfork, &imap); 1306 + error = xreap_ifork_extent(&rs, &imap); 1682 1307 if (error) 1683 1308 return error; 1684 1309 1685 1310 error = xfs_defer_finish(&sc->tp); 1686 1311 if (error) 1687 1312 return error; 1313 + xreap_defer_finish_reset(&rs); 1688 1314 } 1689 1315 1690 1316 off = imap.br_startoff + imap.br_blockcount;

+1 -1

fs/xfs/scrub/repair.c

··· 1110 1110 return true; 1111 1111 1112 1112 /* Let debug users force us into the repair routines. */ 1113 - if (XFS_TEST_ERROR(false, sc->mp, XFS_ERRTAG_FORCE_SCRUB_REPAIR)) 1113 + if (XFS_TEST_ERROR(sc->mp, XFS_ERRTAG_FORCE_SCRUB_REPAIR)) 1114 1114 return true; 1115 1115 1116 1116 /* Metadata is corrupt or failed cross-referencing. */
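This call site now passes only the mount and the error tag to XFS_TEST_ERROR(); the dummy `false` expression argument is gone, and the xfs_error.c changes in this merge drop the stringified expression from the injection message. The userspace sketch below shows how a frequency-based check of this shape can work, assuming a tag value of N means "fire roughly once in N calls" and 0 means "off". The sketch_* names are hypothetical, and rand() stands in for the kernel's random number helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical tag numbers for this sketch only. */
enum { SKETCH_TAG_NOERROR, SKETCH_TAG_FORCE_REPAIR, SKETCH_TAG_MAX };

/* Minimal stand-in for the per-mount error-tag table. */
struct sketch_mount {
	unsigned int errortag[SKETCH_TAG_MAX];	/* 0 = off, N = fire ~1 in N */
	const char *id;
};

/* Return true (and log) when an error should be injected at file:line. */
static bool sketch_errortag_test(struct sketch_mount *mp, const char *file,
				 int line, unsigned int tag)
{
	unsigned int randfactor;

	if (tag >= SKETCH_TAG_MAX)
		return false;
	randfactor = mp->errortag[tag];
	if (!randfactor || (unsigned int)rand() % randfactor)
		return false;

	fprintf(stderr,
		"Injecting error at file %s, line %d, on filesystem \"%s\"\n",
		file, line, mp->id);
	return true;
}

/* Two-argument form: callers pass only the mount and the tag. */
#define SKETCH_TEST_ERROR(mp, tag) \
	sketch_errortag_test((mp), __FILE__, __LINE__, (tag))
```

Setting a tag to 1 makes every call fire, which is the usual way to force a path such as the scrub repair routines in a test.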

-8

fs/xfs/scrub/repair.h

··· 18 18 19 19 #ifdef CONFIG_XFS_ONLINE_REPAIR 20 20 21 - /* 22 - * This is the maximum number of deferred extent freeing item extents (EFIs) 23 - * that we'll attach to a transaction without rolling the transaction to avoid 24 - * overrunning a tr_itruncate reservation. 25 - */ 26 - #define XREP_MAX_ITRUNCATE_EFIS (128) 27 - 28 - 29 21 /* Repair helpers */ 30 22 31 23 int xrep_attempt(struct xfs_scrub *sc, struct xchk_stats_run *run);

+1 -1

fs/xfs/scrub/symlink_repair.c

··· 185 185 return 0; 186 186 187 187 nr = min(XFS_SYMLINK_MAXLEN, xfs_inode_data_fork_size(ip)); 188 - strncpy(target_buf, ifp->if_data, nr); 188 + memcpy(target_buf, ifp->if_data, nr); 189 189 return nr; 190 190 } 191 191
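Here `nr` is already the exact number of bytes to copy (the smaller of XFS_SYMLINK_MAXLEN and the data fork size), so memcpy() states the intent directly. strncpy() only behaves differently when a NUL appears before `nr` bytes: it stops there and zero-fills the rest. The kernel's deprecated-interfaces guidance also steers code away from strncpy(). The small userspace demonstration below uses a hypothetical helper, copy_target(), which is not XFS code.

```c
#include <assert.h>
#include <string.h>

/*
 * Copy exactly nr bytes of an in-memory target. Unlike strncpy(), memcpy()
 * does not stop at an embedded NUL or zero-fill the remainder.
 */
static size_t copy_target(char *dst, const char *src, size_t nr)
{
	memcpy(dst, src, nr);
	return nr;
}
```

For a source of `{'a', 'b', '\0', 'c', 'd'}` and `nr` of 5, strncpy() leaves `{'a', 'b', 0, 0, 0}`, while copy_target() reproduces all five bytes.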

+1

fs/xfs/scrub/trace.c

··· 22 22 #include "xfs_parent.h" 23 23 #include "xfs_metafile.h" 24 24 #include "xfs_rtgroup.h" 25 + #include "xfs_trans.h" 25 26 #include "scrub/scrub.h" 26 27 #include "scrub/xfile.h" 27 28 #include "scrub/xfarray.h"

+45

fs/xfs/scrub/trace.h

··· 2000 2000 DEFINE_REPAIR_EXTENT_EVENT(xreap_bmapi_binval); 2001 2001 DEFINE_REPAIR_EXTENT_EVENT(xrep_agfl_insert); 2002 2002 2003 + DECLARE_EVENT_CLASS(xrep_reap_limits_class, 2004 + TP_PROTO(const struct xfs_trans *tp, unsigned int per_binval, 2005 + unsigned int max_binval, unsigned int step_size, 2006 + unsigned int per_intent, 2007 + unsigned int max_deferred), 2008 + TP_ARGS(tp, per_binval, max_binval, step_size, per_intent, max_deferred), 2009 + TP_STRUCT__entry( 2010 + __field(dev_t, dev) 2011 + __field(unsigned int, log_res) 2012 + __field(unsigned int, per_binval) 2013 + __field(unsigned int, max_binval) 2014 + __field(unsigned int, step_size) 2015 + __field(unsigned int, per_intent) 2016 + __field(unsigned int, max_deferred) 2017 + ), 2018 + TP_fast_assign( 2019 + __entry->dev = tp->t_mountp->m_super->s_dev; 2020 + __entry->log_res = tp->t_log_res; 2021 + __entry->per_binval = per_binval; 2022 + __entry->max_binval = max_binval; 2023 + __entry->step_size = step_size; 2024 + __entry->per_intent = per_intent; 2025 + __entry->max_deferred = max_deferred; 2026 + ), 2027 + TP_printk("dev %d:%d logres %u per_binval %u max_binval %u step_size %u per_intent %u max_deferred %u", 2028 + MAJOR(__entry->dev), MINOR(__entry->dev), 2029 + __entry->log_res, 2030 + __entry->per_binval, 2031 + __entry->max_binval, 2032 + __entry->step_size, 2033 + __entry->per_intent, 2034 + __entry->max_deferred) 2035 + ); 2036 + #define DEFINE_REPAIR_REAP_LIMITS_EVENT(name) \ 2037 + DEFINE_EVENT(xrep_reap_limits_class, name, \ 2038 + TP_PROTO(const struct xfs_trans *tp, unsigned int per_binval, \ 2039 + unsigned int max_binval, unsigned int step_size, \ 2040 + unsigned int per_intent, \ 2041 + unsigned int max_deferred), \ 2042 + TP_ARGS(tp, per_binval, max_binval, step_size, per_intent, max_deferred)) 2043 + DEFINE_REPAIR_REAP_LIMITS_EVENT(xreap_agextent_limits); 2044 + DEFINE_REPAIR_REAP_LIMITS_EVENT(xreap_agcow_limits); 2045 + DEFINE_REPAIR_REAP_LIMITS_EVENT(xreap_rgcow_limits); 
2046 + DEFINE_REPAIR_REAP_LIMITS_EVENT(xreap_bmapi_limits); 2047 + 2003 2048 DECLARE_EVENT_CLASS(xrep_reap_find_class, 2004 2049 TP_PROTO(const struct xfs_group *xg, xfs_agblock_t agbno, 2005 2050 xfs_extlen_t len, bool crosslinked),

+1 -1

fs/xfs/xfs_attr_item.c

··· 491 491 /* Reset trans after EAGAIN cycle since the transaction is new */ 492 492 args->trans = tp; 493 493 494 - if (XFS_TEST_ERROR(false, args->dp->i_mount, XFS_ERRTAG_LARP)) { 494 + if (XFS_TEST_ERROR(args->dp->i_mount, XFS_ERRTAG_LARP)) { 495 495 error = -EIO; 496 496 goto out; 497 497 }

+25 -21

fs/xfs/xfs_buf.c

··· 387 387 struct xfs_buftarg *btp, 388 388 struct xfs_buf_map *map) 389 389 { 390 - xfs_daddr_t eofs; 391 - 392 390 /* Check for IOs smaller than the sector size / not sector aligned */ 393 391 ASSERT(!(BBTOB(map->bm_len) < btp->bt_meta_sectorsize)); 394 392 ASSERT(!(BBTOB(map->bm_bn) & (xfs_off_t)btp->bt_meta_sectormask)); ··· 395 397 * Corrupted block numbers can get through to here, unfortunately, so we 396 398 * have to check that the buffer falls within the filesystem bounds. 397 399 */ 398 - eofs = XFS_FSB_TO_BB(btp->bt_mount, btp->bt_mount->m_sb.sb_dblocks); 399 - if (map->bm_bn < 0 || map->bm_bn >= eofs) { 400 + if (map->bm_bn < 0 || map->bm_bn >= btp->bt_nr_sectors) { 400 401 xfs_alert(btp->bt_mount, 401 402 "%s: daddr 0x%llx out of range, EOFS 0x%llx", 402 - __func__, map->bm_bn, eofs); 403 + __func__, map->bm_bn, btp->bt_nr_sectors); 403 404 WARN_ON(1); 404 405 return -EFSCORRUPTED; 405 406 } ··· 1296 1299 if (bio->bi_status) 1297 1300 xfs_buf_ioerror(bp, blk_status_to_errno(bio->bi_status)); 1298 1301 else if ((bp->b_flags & XBF_WRITE) && (bp->b_flags & XBF_ASYNC) && 1299 - XFS_TEST_ERROR(false, bp->b_mount, XFS_ERRTAG_BUF_IOERROR)) 1302 + XFS_TEST_ERROR(bp->b_mount, XFS_ERRTAG_BUF_IOERROR)) 1300 1303 xfs_buf_ioerror(bp, -EIO); 1301 1304 1302 1305 if (bp->b_flags & XBF_ASYNC) { ··· 1717 1720 int 1718 1721 xfs_configure_buftarg( 1719 1722 struct xfs_buftarg *btp, 1720 - unsigned int sectorsize) 1723 + unsigned int sectorsize, 1724 + xfs_rfsblock_t nr_blocks) 1721 1725 { 1722 - int error; 1726 + struct xfs_mount *mp = btp->bt_mount; 1723 1727 1724 - ASSERT(btp->bt_bdev != NULL); 1728 + if (btp->bt_bdev) { 1729 + int error; 1725 1730 1726 - /* Set up metadata sector size info */ 1727 - btp->bt_meta_sectorsize = sectorsize; 1728 - btp->bt_meta_sectormask = sectorsize - 1; 1731 + error = bdev_validate_blocksize(btp->bt_bdev, sectorsize); 1732 + if (error) { 1733 + xfs_warn(mp, 1734 + "Cannot use blocksize %u on device %pg, err %d", 1735 + sectorsize, 
btp->bt_bdev, error); 1736 + return -EINVAL; 1737 + } 1729 1738 1730 - error = bdev_validate_blocksize(btp->bt_bdev, sectorsize); 1731 - if (error) { 1732 - xfs_warn(btp->bt_mount, 1733 - "Cannot use blocksize %u on device %pg, err %d", 1734 - sectorsize, btp->bt_bdev, error); 1735 - return -EINVAL; 1739 + if (bdev_can_atomic_write(btp->bt_bdev)) 1740 + xfs_configure_buftarg_atomic_writes(btp); 1736 1741 } 1737 1742 1738 - if (bdev_can_atomic_write(btp->bt_bdev)) 1739 - xfs_configure_buftarg_atomic_writes(btp); 1743 + btp->bt_meta_sectorsize = sectorsize; 1744 + btp->bt_meta_sectormask = sectorsize - 1; 1745 + /* m_blkbb_log is not set up yet */ 1746 + btp->bt_nr_sectors = nr_blocks << (mp->m_sb.sb_blocklog - BBSHIFT); 1740 1747 return 0; 1741 1748 } 1742 1749 ··· 1750 1749 size_t logical_sectorsize, 1751 1750 const char *descr) 1752 1751 { 1752 + /* The maximum size of the buftarg is only known once the sb is read. */ 1753 + btp->bt_nr_sectors = (xfs_daddr_t)-1; 1754 + 1753 1755 /* Set up device logical sector size mask */ 1754 1756 btp->bt_logical_sectorsize = logical_sectorsize; 1755 1757 btp->bt_logical_sectormask = logical_sectorsize - 1; ··· 2088 2084 * This allows userspace to disrupt buffer caching for debug/testing 2089 2085 * purposes. 2090 2086 */ 2091 - if (XFS_TEST_ERROR(false, bp->b_mount, XFS_ERRTAG_BUF_LRU_REF)) 2087 + if (XFS_TEST_ERROR(bp->b_mount, XFS_ERRTAG_BUF_LRU_REF)) 2092 2088 lru_ref = 0; 2093 2089 2094 2090 atomic_set(&bp->b_lru_ref, lru_ref);

+3 -1

fs/xfs/xfs_buf.h

··· 103 103 size_t bt_meta_sectormask; 104 104 size_t bt_logical_sectorsize; 105 105 size_t bt_logical_sectormask; 106 + xfs_daddr_t bt_nr_sectors; 106 107 107 108 /* LRU control structures */ 108 109 struct shrinker *bt_shrinker; ··· 373 372 extern void xfs_free_buftarg(struct xfs_buftarg *); 374 373 extern void xfs_buftarg_wait(struct xfs_buftarg *); 375 374 extern void xfs_buftarg_drain(struct xfs_buftarg *); 376 - int xfs_configure_buftarg(struct xfs_buftarg *btp, unsigned int sectorsize); 375 + int xfs_configure_buftarg(struct xfs_buftarg *btp, unsigned int sectorsize, 376 + xfs_fsblock_t nr_blocks); 377 377 378 378 #define xfs_readonly_buftarg(buftarg) bdev_read_only((buftarg)->bt_bdev) 379 379

+10

fs/xfs/xfs_buf_item_recover.c

··· 736 736 */ 737 737 xfs_sb_from_disk(&mp->m_sb, dsb); 738 738 739 + /* 740 + * Grow can change the device size. Mirror that into the buftarg. 741 + */ 742 + mp->m_ddev_targp->bt_nr_sectors = 743 + XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks); 744 + if (mp->m_rtdev_targp && mp->m_rtdev_targp != mp->m_ddev_targp) { 745 + mp->m_rtdev_targp->bt_nr_sectors = 746 + XFS_FSB_TO_BB(mp, mp->m_sb.sb_rblocks); 747 + } 748 + 739 749 if (mp->m_sb.sb_agcount < orig_agcount) { 740 750 xfs_alert(mp, "Shrinking AG count in log recovery not supported"); 741 751 return -EFSCORRUPTED;
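Because log recovery can replay a grow, the recovered superblock sizes are copied back into the buftargs. The `m_rtdev_targp != m_ddev_targp` test matters when both pointers refer to the same buftarg: without it, the realtime block count would overwrite the data device's size. The sketch below models only that update rule. The structs and field names are hypothetical simplifications, and BBSHIFT is assumed to be 9.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BBSHIFT 9

struct sketch_buftarg { int64_t nr_sectors; };

struct sketch_mount {
	unsigned int blocklog;			/* log2 of the fs block size */
	uint64_t dblocks, rblocks;		/* data and rt sizes in fs blocks */
	struct sketch_buftarg *ddev, *rtdev;	/* rtdev may be NULL or alias ddev */
};

/* Mirror the (possibly grown) superblock sizes into the buftargs. */
static void sketch_mirror_sb_sizes(struct sketch_mount *mp)
{
	unsigned int shift = mp->blocklog - SKETCH_BBSHIFT;

	mp->ddev->nr_sectors = (int64_t)(mp->dblocks << shift);
	/* Only a distinct rt buftarg gets the rt size; an alias keeps the data size. */
	if (mp->rtdev && mp->rtdev != mp->ddev)
		mp->rtdev->nr_sectors = (int64_t)(mp->rblocks << shift);
}
```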

+30 -186

fs/xfs/xfs_error.c

··· 10 10 #include "xfs_log_format.h" 11 11 #include "xfs_trans_resv.h" 12 12 #include "xfs_mount.h" 13 - #include "xfs_errortag.h" 14 13 #include "xfs_error.h" 15 14 #include "xfs_sysfs.h" 16 15 #include "xfs_inode.h" 17 16 18 17 #ifdef DEBUG 19 18 20 - static unsigned int xfs_errortag_random_default[] = { 21 - XFS_RANDOM_DEFAULT, 22 - XFS_RANDOM_IFLUSH_1, 23 - XFS_RANDOM_IFLUSH_2, 24 - XFS_RANDOM_IFLUSH_3, 25 - XFS_RANDOM_IFLUSH_4, 26 - XFS_RANDOM_IFLUSH_5, 27 - XFS_RANDOM_IFLUSH_6, 28 - XFS_RANDOM_DA_READ_BUF, 29 - XFS_RANDOM_BTREE_CHECK_LBLOCK, 30 - XFS_RANDOM_BTREE_CHECK_SBLOCK, 31 - XFS_RANDOM_ALLOC_READ_AGF, 32 - XFS_RANDOM_IALLOC_READ_AGI, 33 - XFS_RANDOM_ITOBP_INOTOBP, 34 - XFS_RANDOM_IUNLINK, 35 - XFS_RANDOM_IUNLINK_REMOVE, 36 - XFS_RANDOM_DIR_INO_VALIDATE, 37 - XFS_RANDOM_BULKSTAT_READ_CHUNK, 38 - XFS_RANDOM_IODONE_IOERR, 39 - XFS_RANDOM_STRATREAD_IOERR, 40 - XFS_RANDOM_STRATCMPL_IOERR, 41 - XFS_RANDOM_DIOWRITE_IOERR, 42 - XFS_RANDOM_BMAPIFORMAT, 43 - XFS_RANDOM_FREE_EXTENT, 44 - XFS_RANDOM_RMAP_FINISH_ONE, 45 - XFS_RANDOM_REFCOUNT_CONTINUE_UPDATE, 46 - XFS_RANDOM_REFCOUNT_FINISH_ONE, 47 - XFS_RANDOM_BMAP_FINISH_ONE, 48 - XFS_RANDOM_AG_RESV_CRITICAL, 49 - 0, /* XFS_RANDOM_DROP_WRITES has been removed */ 50 - XFS_RANDOM_LOG_BAD_CRC, 51 - XFS_RANDOM_LOG_ITEM_PIN, 52 - XFS_RANDOM_BUF_LRU_REF, 53 - XFS_RANDOM_FORCE_SCRUB_REPAIR, 54 - XFS_RANDOM_FORCE_SUMMARY_RECALC, 55 - XFS_RANDOM_IUNLINK_FALLBACK, 56 - XFS_RANDOM_BUF_IOERROR, 57 - XFS_RANDOM_REDUCE_MAX_IEXTENTS, 58 - XFS_RANDOM_BMAP_ALLOC_MINLEN_EXTENT, 59 - XFS_RANDOM_AG_RESV_FAIL, 60 - XFS_RANDOM_LARP, 61 - XFS_RANDOM_DA_LEAF_SPLIT, 62 - XFS_RANDOM_ATTR_LEAF_TO_NODE, 63 - XFS_RANDOM_WB_DELAY_MS, 64 - XFS_RANDOM_WRITE_DELAY_MS, 65 - XFS_RANDOM_EXCHMAPS_FINISH_ONE, 66 - XFS_RANDOM_METAFILE_RESV_CRITICAL, 67 - }; 19 + #define XFS_ERRTAG(_tag, _name, _default) \ 20 + [XFS_ERRTAG_##_tag] = (_default), 21 + #include "xfs_errortag.h" 22 + static const unsigned int xfs_errortag_random_default[] = { XFS_ERRTAGS 
}; 23 + #undef XFS_ERRTAG 68 24 69 25 struct xfs_errortag_attr { 70 26 struct attribute attr; ··· 49 93 size_t count) 50 94 { 51 95 struct xfs_mount *mp = to_mp(kobject); 52 - struct xfs_errortag_attr *xfs_attr = to_attr(attr); 96 + unsigned int error_tag = to_attr(attr)->tag; 53 97 int ret; 54 - unsigned int val; 55 98 56 99 if (strcmp(buf, "default") == 0) { 57 - val = xfs_errortag_random_default[xfs_attr->tag]; 100 + mp->m_errortag[error_tag] = 101 + xfs_errortag_random_default[error_tag]; 58 102 } else { 59 - ret = kstrtouint(buf, 0, &val); 103 + ret = kstrtouint(buf, 0, &mp->m_errortag[error_tag]); 60 104 if (ret) 61 105 return ret; 62 106 } 63 107 64 - ret = xfs_errortag_set(mp, xfs_attr->tag, val); 65 - if (ret) 66 - return ret; 67 108 return count; 68 109 } 69 110 ··· 71 118 char *buf) 72 119 { 73 120 struct xfs_mount *mp = to_mp(kobject); 74 - struct xfs_errortag_attr *xfs_attr = to_attr(attr); 121 + unsigned int error_tag = to_attr(attr)->tag; 75 122 76 - return snprintf(buf, PAGE_SIZE, "%u\n", 77 - xfs_errortag_get(mp, xfs_attr->tag)); 123 + return snprintf(buf, PAGE_SIZE, "%u\n", mp->m_errortag[error_tag]); 78 124 } 79 125 80 126 static const struct sysfs_ops xfs_errortag_sysfs_ops = { ··· 81 129 .store = xfs_errortag_attr_store, 82 130 }; 83 131 84 - #define XFS_ERRORTAG_ATTR_RW(_name, _tag) \ 132 + #define XFS_ERRTAG(_tag, _name, _default) \ 85 133 static struct xfs_errortag_attr xfs_errortag_attr_##_name = { \ 86 134 .attr = {.name = __stringify(_name), \ 87 135 .mode = VERIFY_OCTAL_PERMISSIONS(S_IWUSR | S_IRUGO) }, \ 88 - .tag = (_tag), \ 89 - } 136 + .tag = XFS_ERRTAG_##_tag, \ 137 + }; 138 + #include "xfs_errortag.h" 139 + XFS_ERRTAGS 140 + #undef XFS_ERRTAG 90 141 91 - #define XFS_ERRORTAG_ATTR_LIST(_name) &xfs_errortag_attr_##_name.attr 92 - 93 - XFS_ERRORTAG_ATTR_RW(noerror, XFS_ERRTAG_NOERROR); 94 - XFS_ERRORTAG_ATTR_RW(iflush1, XFS_ERRTAG_IFLUSH_1); 95 - XFS_ERRORTAG_ATTR_RW(iflush2, XFS_ERRTAG_IFLUSH_2); 96 - XFS_ERRORTAG_ATTR_RW(iflush3, 
XFS_ERRTAG_IFLUSH_3); 97 - XFS_ERRORTAG_ATTR_RW(iflush4, XFS_ERRTAG_IFLUSH_4); 98 - XFS_ERRORTAG_ATTR_RW(iflush5, XFS_ERRTAG_IFLUSH_5); 99 - XFS_ERRORTAG_ATTR_RW(iflush6, XFS_ERRTAG_IFLUSH_6); 100 - XFS_ERRORTAG_ATTR_RW(dareadbuf, XFS_ERRTAG_DA_READ_BUF); 101 - XFS_ERRORTAG_ATTR_RW(btree_chk_lblk, XFS_ERRTAG_BTREE_CHECK_LBLOCK); 102 - XFS_ERRORTAG_ATTR_RW(btree_chk_sblk, XFS_ERRTAG_BTREE_CHECK_SBLOCK); 103 - XFS_ERRORTAG_ATTR_RW(readagf, XFS_ERRTAG_ALLOC_READ_AGF); 104 - XFS_ERRORTAG_ATTR_RW(readagi, XFS_ERRTAG_IALLOC_READ_AGI); 105 - XFS_ERRORTAG_ATTR_RW(itobp, XFS_ERRTAG_ITOBP_INOTOBP); 106 - XFS_ERRORTAG_ATTR_RW(iunlink, XFS_ERRTAG_IUNLINK); 107 - XFS_ERRORTAG_ATTR_RW(iunlinkrm, XFS_ERRTAG_IUNLINK_REMOVE); 108 - XFS_ERRORTAG_ATTR_RW(dirinovalid, XFS_ERRTAG_DIR_INO_VALIDATE); 109 - XFS_ERRORTAG_ATTR_RW(bulkstat, XFS_ERRTAG_BULKSTAT_READ_CHUNK); 110 - XFS_ERRORTAG_ATTR_RW(logiodone, XFS_ERRTAG_IODONE_IOERR); 111 - XFS_ERRORTAG_ATTR_RW(stratread, XFS_ERRTAG_STRATREAD_IOERR); 112 - XFS_ERRORTAG_ATTR_RW(stratcmpl, XFS_ERRTAG_STRATCMPL_IOERR); 113 - XFS_ERRORTAG_ATTR_RW(diowrite, XFS_ERRTAG_DIOWRITE_IOERR); 114 - XFS_ERRORTAG_ATTR_RW(bmapifmt, XFS_ERRTAG_BMAPIFORMAT); 115 - XFS_ERRORTAG_ATTR_RW(free_extent, XFS_ERRTAG_FREE_EXTENT); 116 - XFS_ERRORTAG_ATTR_RW(rmap_finish_one, XFS_ERRTAG_RMAP_FINISH_ONE); 117 - XFS_ERRORTAG_ATTR_RW(refcount_continue_update, XFS_ERRTAG_REFCOUNT_CONTINUE_UPDATE); 118 - XFS_ERRORTAG_ATTR_RW(refcount_finish_one, XFS_ERRTAG_REFCOUNT_FINISH_ONE); 119 - XFS_ERRORTAG_ATTR_RW(bmap_finish_one, XFS_ERRTAG_BMAP_FINISH_ONE); 120 - XFS_ERRORTAG_ATTR_RW(ag_resv_critical, XFS_ERRTAG_AG_RESV_CRITICAL); 121 - XFS_ERRORTAG_ATTR_RW(log_bad_crc, XFS_ERRTAG_LOG_BAD_CRC); 122 - XFS_ERRORTAG_ATTR_RW(log_item_pin, XFS_ERRTAG_LOG_ITEM_PIN); 123 - XFS_ERRORTAG_ATTR_RW(buf_lru_ref, XFS_ERRTAG_BUF_LRU_REF); 124 - XFS_ERRORTAG_ATTR_RW(force_repair, XFS_ERRTAG_FORCE_SCRUB_REPAIR); 125 - XFS_ERRORTAG_ATTR_RW(bad_summary, XFS_ERRTAG_FORCE_SUMMARY_RECALC); 126 - 
XFS_ERRORTAG_ATTR_RW(iunlink_fallback, XFS_ERRTAG_IUNLINK_FALLBACK); 127 - XFS_ERRORTAG_ATTR_RW(buf_ioerror, XFS_ERRTAG_BUF_IOERROR); 128 - XFS_ERRORTAG_ATTR_RW(reduce_max_iextents, XFS_ERRTAG_REDUCE_MAX_IEXTENTS); 129 - XFS_ERRORTAG_ATTR_RW(bmap_alloc_minlen_extent, XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT); 130 - XFS_ERRORTAG_ATTR_RW(ag_resv_fail, XFS_ERRTAG_AG_RESV_FAIL); 131 - XFS_ERRORTAG_ATTR_RW(larp, XFS_ERRTAG_LARP); 132 - XFS_ERRORTAG_ATTR_RW(da_leaf_split, XFS_ERRTAG_DA_LEAF_SPLIT); 133 - XFS_ERRORTAG_ATTR_RW(attr_leaf_to_node, XFS_ERRTAG_ATTR_LEAF_TO_NODE); 134 - XFS_ERRORTAG_ATTR_RW(wb_delay_ms, XFS_ERRTAG_WB_DELAY_MS); 135 - XFS_ERRORTAG_ATTR_RW(write_delay_ms, XFS_ERRTAG_WRITE_DELAY_MS); 136 - XFS_ERRORTAG_ATTR_RW(exchmaps_finish_one, XFS_ERRTAG_EXCHMAPS_FINISH_ONE); 137 - XFS_ERRORTAG_ATTR_RW(metafile_resv_crit, XFS_ERRTAG_METAFILE_RESV_CRITICAL); 138 - 142 + #define XFS_ERRTAG(_tag, _name, _default) \ 143 + &xfs_errortag_attr_##_name.attr, 144 + #include "xfs_errortag.h" 139 145 static struct attribute *xfs_errortag_attrs[] = { 140 - XFS_ERRORTAG_ATTR_LIST(noerror), 141 - XFS_ERRORTAG_ATTR_LIST(iflush1), 142 - XFS_ERRORTAG_ATTR_LIST(iflush2), 143 - XFS_ERRORTAG_ATTR_LIST(iflush3), 144 - XFS_ERRORTAG_ATTR_LIST(iflush4), 145 - XFS_ERRORTAG_ATTR_LIST(iflush5), 146 - XFS_ERRORTAG_ATTR_LIST(iflush6), 147 - XFS_ERRORTAG_ATTR_LIST(dareadbuf), 148 - XFS_ERRORTAG_ATTR_LIST(btree_chk_lblk), 149 - XFS_ERRORTAG_ATTR_LIST(btree_chk_sblk), 150 - XFS_ERRORTAG_ATTR_LIST(readagf), 151 - XFS_ERRORTAG_ATTR_LIST(readagi), 152 - XFS_ERRORTAG_ATTR_LIST(itobp), 153 - XFS_ERRORTAG_ATTR_LIST(iunlink), 154 - XFS_ERRORTAG_ATTR_LIST(iunlinkrm), 155 - XFS_ERRORTAG_ATTR_LIST(dirinovalid), 156 - XFS_ERRORTAG_ATTR_LIST(bulkstat), 157 - XFS_ERRORTAG_ATTR_LIST(logiodone), 158 - XFS_ERRORTAG_ATTR_LIST(stratread), 159 - XFS_ERRORTAG_ATTR_LIST(stratcmpl), 160 - XFS_ERRORTAG_ATTR_LIST(diowrite), 161 - XFS_ERRORTAG_ATTR_LIST(bmapifmt), 162 - XFS_ERRORTAG_ATTR_LIST(free_extent), 163 - 
XFS_ERRORTAG_ATTR_LIST(rmap_finish_one), 164 - XFS_ERRORTAG_ATTR_LIST(refcount_continue_update), 165 - XFS_ERRORTAG_ATTR_LIST(refcount_finish_one), 166 - XFS_ERRORTAG_ATTR_LIST(bmap_finish_one), 167 - XFS_ERRORTAG_ATTR_LIST(ag_resv_critical), 168 - XFS_ERRORTAG_ATTR_LIST(log_bad_crc), 169 - XFS_ERRORTAG_ATTR_LIST(log_item_pin), 170 - XFS_ERRORTAG_ATTR_LIST(buf_lru_ref), 171 - XFS_ERRORTAG_ATTR_LIST(force_repair), 172 - XFS_ERRORTAG_ATTR_LIST(bad_summary), 173 - XFS_ERRORTAG_ATTR_LIST(iunlink_fallback), 174 - XFS_ERRORTAG_ATTR_LIST(buf_ioerror), 175 - XFS_ERRORTAG_ATTR_LIST(reduce_max_iextents), 176 - XFS_ERRORTAG_ATTR_LIST(bmap_alloc_minlen_extent), 177 - XFS_ERRORTAG_ATTR_LIST(ag_resv_fail), 178 - XFS_ERRORTAG_ATTR_LIST(larp), 179 - XFS_ERRORTAG_ATTR_LIST(da_leaf_split), 180 - XFS_ERRORTAG_ATTR_LIST(attr_leaf_to_node), 181 - XFS_ERRORTAG_ATTR_LIST(wb_delay_ms), 182 - XFS_ERRORTAG_ATTR_LIST(write_delay_ms), 183 - XFS_ERRORTAG_ATTR_LIST(exchmaps_finish_one), 184 - XFS_ERRORTAG_ATTR_LIST(metafile_resv_crit), 185 - NULL, 146 + XFS_ERRTAGS 147 + NULL 186 148 }; 187 149 ATTRIBUTE_GROUPS(xfs_errortag); 150 + #undef XFS_ERRTAG 151 + 152 + /* -1 because XFS_ERRTAG_DROP_WRITES got removed, + 1 for NULL termination */ 153 + static_assert(ARRAY_SIZE(xfs_errortag_attrs) == XFS_ERRTAG_MAX); 188 154 189 155 static const struct kobj_type xfs_errortag_ktype = { 190 156 .release = xfs_sysfs_release, ··· 165 295 bool 166 296 xfs_errortag_test( 167 297 struct xfs_mount *mp, 168 - const char *expression, 169 298 const char *file, 170 299 int line, 171 300 unsigned int error_tag) ··· 190 321 return false; 191 322 192 323 xfs_warn_ratelimited(mp, 193 - "Injecting error (%s) at file %s, line %d, on filesystem \"%s\"", 194 - expression, file, line, mp->m_super->s_id); 324 + "Injecting error at file %s, line %d, on filesystem \"%s\"", 325 + file, line, mp->m_super->s_id); 195 326 return true; 196 - } 197 - 198 - int 199 - xfs_errortag_get( 200 - struct xfs_mount *mp, 201 - unsigned int 
error_tag) 202 - { 203 - if (!xfs_errortag_valid(error_tag)) 204 - return -EINVAL; 205 - 206 - return mp->m_errortag[error_tag]; 207 - } 208 - 209 - int 210 - xfs_errortag_set( 211 - struct xfs_mount *mp, 212 - unsigned int error_tag, 213 - unsigned int tag_value) 214 - { 215 - if (!xfs_errortag_valid(error_tag)) 216 - return -EINVAL; 217 - 218 - mp->m_errortag[error_tag] = tag_value; 219 - return 0; 220 327 } 221 328 222 329 int ··· 204 359 205 360 if (!xfs_errortag_valid(error_tag)) 206 361 return -EINVAL; 207 - 208 - return xfs_errortag_set(mp, error_tag, 209 - xfs_errortag_random_default[error_tag]); 362 + mp->m_errortag[error_tag] = xfs_errortag_random_default[error_tag]; 363 + return 0; 210 364 } 211 365 212 366 int
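
The xfs_error.c hunk replaces two hand-maintained lists (the XFS_ERRORTAG_ATTR_RW definitions and the XFS_ERRORTAG_ATTR_LIST array) with an X-macro. It includes xfs_errortag.h with XFS_ERRTAG defined to emit one entry per tag, then uses a static_assert to tie the array length to XFS_ERRTAG_MAX. Below is a minimal standalone sketch of that technique. The DEMO_* macros, tag names and defaults are hypothetical and are not the real contents of xfs_errortag.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical tag table. The real list lives in xfs_errortag.h and is
 * expanded through XFS_ERRTAG(_tag, _name, _default); the names and
 * defaults below are illustrative only.
 */
#define DEMO_ERRTAGS(X)				\
	X(NOERROR,	noerror,	0)	\
	X(IFLUSH_1,	iflush1,	1)	\
	X(READ_AGF,	readagf,	1)	\
	X(LOG_BAD_CRC,	log_bad_crc,	1)

/* Expansion 1: tag numbers plus a MAX sentinel. */
#define DEMO_ENUM(tag, name, dflt)	DEMO_ERRTAG_##tag,
enum demo_errtag { DEMO_ERRTAGS(DEMO_ENUM) DEMO_ERRTAG_MAX };
#undef DEMO_ENUM

/* Expansion 2: default injection values indexed by tag number. */
#define DEMO_DEFAULT(tag, name, dflt)	[DEMO_ERRTAG_##tag] = (dflt),
static const unsigned int demo_errtag_default[] = { DEMO_ERRTAGS(DEMO_DEFAULT) };
#undef DEMO_DEFAULT

/* Expansion 3: NULL-terminated name list (stands in for the sysfs attrs). */
#define DEMO_NAME(tag, name, dflt)	#name,
static const char *const demo_errtag_names[] = { DEMO_ERRTAGS(DEMO_NAME) NULL };
#undef DEMO_NAME

#define DEMO_ARRAY_SIZE(a)	(sizeof(a) / sizeof((a)[0]))

/* All three lists come from one table, so a size mismatch fails the build. */
static_assert(DEMO_ARRAY_SIZE(demo_errtag_default) == DEMO_ERRTAG_MAX,
	      "one default per tag");
static_assert(DEMO_ARRAY_SIZE(demo_errtag_names) == DEMO_ERRTAG_MAX + 1,
	      "one name per tag plus the NULL terminator");

/* Map a sysfs-style name back to its tag number; -1 if unknown. */
static int demo_errtag_lookup(const char *name)
{
	for (size_t i = 0; demo_errtag_names[i] != NULL; i++)
		if (strcmp(demo_errtag_names[i], name) == 0)
			return (int)i;
	return -1;
}
```

The sketch has no gaps in its tag numbering, so it asserts MAX + 1 entries. In the kernel hunk, the removed XFS_ERRTAG_DROP_WRITES number and the NULL terminator cancel each other out, which is why that static_assert compares against XFS_ERRTAG_MAX directly.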

+19 -28

fs/xfs/xfs_error.h

··· 8 8 9 9 struct xfs_mount; 10 10 11 - extern void xfs_error_report(const char *tag, int level, struct xfs_mount *mp, 12 - const char *filename, int linenum, 13 - xfs_failaddr_t failaddr); 14 - extern void xfs_corruption_error(const char *tag, int level, 15 - struct xfs_mount *mp, const void *buf, size_t bufsize, 16 - const char *filename, int linenum, 17 - xfs_failaddr_t failaddr); 11 + void xfs_error_report(const char *tag, int level, struct xfs_mount *mp, 12 + const char *filename, int linenum, xfs_failaddr_t failaddr); 13 + void xfs_corruption_error(const char *tag, int level, struct xfs_mount *mp, 14 + const void *buf, size_t bufsize, const char *filename, 15 + int linenum, xfs_failaddr_t failaddr); 18 16 void xfs_buf_corruption_error(struct xfs_buf *bp, xfs_failaddr_t fa); 19 - extern void xfs_buf_verifier_error(struct xfs_buf *bp, int error, 20 - const char *name, const void *buf, size_t bufsz, 21 - xfs_failaddr_t failaddr); 22 - extern void xfs_verifier_error(struct xfs_buf *bp, int error, 23 - xfs_failaddr_t failaddr); 24 - extern void xfs_inode_verifier_error(struct xfs_inode *ip, int error, 25 - const char *name, const void *buf, size_t bufsz, 26 - xfs_failaddr_t failaddr); 17 + void xfs_buf_verifier_error(struct xfs_buf *bp, int error, const char *name, 18 + const void *buf, size_t bufsz, xfs_failaddr_t failaddr); 19 + void xfs_verifier_error(struct xfs_buf *bp, int error, xfs_failaddr_t failaddr); 20 + void xfs_inode_verifier_error(struct xfs_inode *ip, int error, const char *name, 21 + const void *buf, size_t bufsz, xfs_failaddr_t failaddr); 27 22 28 23 #define XFS_ERROR_REPORT(e, lvl, mp) \ 29 24 xfs_error_report(e, lvl, mp, __FILE__, __LINE__, __return_address) ··· 34 39 #define XFS_CORRUPTION_DUMP_LEN (128) 35 40 36 41 #ifdef DEBUG 37 - extern int xfs_errortag_init(struct xfs_mount *mp); 38 - extern void xfs_errortag_del(struct xfs_mount *mp); 39 - extern bool xfs_errortag_test(struct xfs_mount *mp, const char *expression, 40 - const char *file, 
int line, unsigned int error_tag); 41 - #define XFS_TEST_ERROR(expr, mp, tag) \ 42 - ((expr) || xfs_errortag_test((mp), #expr, __FILE__, __LINE__, (tag))) 42 + int xfs_errortag_init(struct xfs_mount *mp); 43 + void xfs_errortag_del(struct xfs_mount *mp); 44 + bool xfs_errortag_test(struct xfs_mount *mp, const char *file, int line, 45 + unsigned int error_tag); 46 + #define XFS_TEST_ERROR(mp, tag) \ 47 + xfs_errortag_test((mp), __FILE__, __LINE__, (tag)) 43 48 bool xfs_errortag_enabled(struct xfs_mount *mp, unsigned int tag); 44 49 #define XFS_ERRORTAG_DELAY(mp, tag) \ 45 50 do { \ ··· 53 58 mdelay((mp)->m_errortag[(tag)]); \ 54 59 } while (0) 55 60 56 - extern int xfs_errortag_get(struct xfs_mount *mp, unsigned int error_tag); 57 - extern int xfs_errortag_set(struct xfs_mount *mp, unsigned int error_tag, 58 - unsigned int tag_value); 59 - extern int xfs_errortag_add(struct xfs_mount *mp, unsigned int error_tag); 60 - extern int xfs_errortag_clearall(struct xfs_mount *mp); 61 + int xfs_errortag_add(struct xfs_mount *mp, unsigned int error_tag); 62 + int xfs_errortag_clearall(struct xfs_mount *mp); 61 63 #else 62 64 #define xfs_errortag_init(mp) (0) 63 65 #define xfs_errortag_del(mp) 64 - #define XFS_TEST_ERROR(expr, mp, tag) (expr) 66 + #define XFS_TEST_ERROR(mp, tag) (false) 65 67 #define XFS_ERRORTAG_DELAY(mp, tag) ((void)0) 66 - #define xfs_errortag_set(mp, tag, val) (ENOSYS) 67 68 #define xfs_errortag_add(mp, tag) (ENOSYS) 68 69 #define xfs_errortag_clearall(mp) (ENOSYS) 69 70 #endif /* DEBUG */
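
The xfs_error.h hunk removes the `expr` argument from XFS_TEST_ERROR(). Callers now write the real corruption check themselves and OR in the injection test, as the xfs_inode.c and xfs_log.c hunks show. Non-DEBUG builds reduce the macro to `(false)`. The sketch below illustrates that caller pattern with hypothetical demo_* stand-ins for struct xfs_mount and xfs_errortag_test(). Its injection test is deliberately simplified to fire whenever the tag is armed; it is not the kernel's selection logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for struct xfs_mount and its m_errortag[] array. */
enum { DEMO_ERRTAG_IFLUSH_1, DEMO_ERRTAG_MAX };

struct demo_mount {
	unsigned int	errortag[DEMO_ERRTAG_MAX];	/* 0 = injection off */
	unsigned int	injected;			/* times injection fired */
};

/* Simplified injection test: fires every time the tag is armed. */
static bool demo_errortag_test(struct demo_mount *mp, unsigned int tag)
{
	if (tag >= DEMO_ERRTAG_MAX || mp->errortag[tag] == 0)
		return false;
	mp->injected++;
	return true;
}

#ifdef DEMO_NO_DEBUG
#define DEMO_TEST_ERROR(mp, tag)	(false)		/* like non-DEBUG builds */
#else
#define DEMO_TEST_ERROR(mp, tag)	demo_errortag_test((mp), (tag))
#endif

/*
 * New caller shape: the real check comes first and || short-circuits, so
 * injection is only consulted when the value itself looked valid.
 */
static int demo_check_magic(struct demo_mount *mp, unsigned int magic)
{
	if (magic != 0x494e /* illustrative magic value */ ||
	    DEMO_TEST_ERROR(mp, DEMO_ERRTAG_IFLUSH_1))
		return -1;	/* stands in for -EFSCORRUPTED */
	return 0;
}
```

Because the genuine check is now evaluated outside the macro, it still runs in non-DEBUG builds where XFS_TEST_ERROR() is constant `false`. This is also why the old `#expr` string disappears from the warning message in xfs_errortag_test().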

+2 -2

fs/xfs/xfs_extfree_item.c

···
202 202 sizeof(struct xfs_extent));
203 203 return 0;
204 204 } else if (buf->iov_len == len32) {
205 - xfs_efi_log_format_32_t *src_efi_fmt_32 = buf->iov_base;
205 + struct xfs_efi_log_format_32 *src_efi_fmt_32 = buf->iov_base;
206 206
207 207 dst_efi_fmt->efi_type = src_efi_fmt_32->efi_type;
208 208 dst_efi_fmt->efi_size = src_efi_fmt_32->efi_size;
···
216 216 }
217 217 return 0;
218 218 } else if (buf->iov_len == len64) {
219 - xfs_efi_log_format_64_t *src_efi_fmt_64 = buf->iov_base;
219 + struct xfs_efi_log_format_64 *src_efi_fmt_64 = buf->iov_base;
220 220
221 221 dst_efi_fmt->efi_type = src_efi_fmt_64->efi_type;
222 222 dst_efi_fmt->efi_size = src_efi_fmt_64->efi_size;

+2 -2

fs/xfs/xfs_extfree_item.h

···
49 49 struct xfs_log_item efi_item;
50 50 atomic_t efi_refcount;
51 51 atomic_t efi_next_extent;
52 - xfs_efi_log_format_t efi_format;
52 + struct xfs_efi_log_format efi_format;
53 53 };
54 54
55 55 static inline size_t
···
69 69 struct xfs_log_item efd_item;
70 70 struct xfs_efi_log_item *efd_efip;
71 71 uint efd_next_extent;
72 - xfs_efd_log_format_t efd_format;
72 + struct xfs_efd_log_format efd_format;
73 73 };
74 74
75 75 static inline size_t

+34 -41

fs/xfs/xfs_file.c

··· 75 75 return xfs_log_force_inode(ip); 76 76 } 77 77 78 - static xfs_csn_t 79 - xfs_fsync_seq( 80 - struct xfs_inode *ip, 81 - bool datasync) 82 - { 83 - if (!xfs_ipincount(ip)) 84 - return 0; 85 - if (datasync && !(ip->i_itemp->ili_fsync_fields & ~XFS_ILOG_TIMESTAMP)) 86 - return 0; 87 - return ip->i_itemp->ili_commit_seq; 88 - } 89 - 90 78 /* 91 - * All metadata updates are logged, which means that we just have to flush the 92 - * log up to the latest LSN that touched the inode. 79 + * All metadata updates are logged, which means that we just have to push the 80 + * journal to the required sequence number than holds the updates. We track 81 + * datasync commits separately to full sync commits, and hence only need to 82 + * select the correct sequence number for the log force here. 93 83 * 94 - * If we have concurrent fsync/fdatasync() calls, we need them to all block on 95 - * the log force before we clear the ili_fsync_fields field. This ensures that 96 - * we don't get a racing sync operation that does not wait for the metadata to 97 - * hit the journal before returning. If we race with clearing ili_fsync_fields, 98 - * then all that will happen is the log force will do nothing as the lsn will 99 - * already be on disk. We can't race with setting ili_fsync_fields because that 100 - * is done under XFS_ILOCK_EXCL, and that can't happen because we hold the lock 101 - * shared until after the ili_fsync_fields is cleared. 84 + * We don't have to serialise against concurrent modifications, as we do not 85 + * have to wait for modifications that have not yet completed. We define a 86 + * transaction commit as completing when the commit sequence number is updated, 87 + * hence if the sequence number has not updated, the sync operation has been 88 + * run before the commit completed and we don't have to wait for it. 
89 + * 90 + * If we have concurrent fsync/fdatasync() calls, the sequence numbers remain 91 + * set on the log item until - at least - the journal flush completes. In 92 + * reality, they are only cleared when the inode is fully unpinned (i.e. 93 + * persistent in the journal and not dirty in the CIL), and so we rely on 94 + * xfs_log_force_seq() either skipping sequences that have been persisted or 95 + * waiting on sequences that are still in flight to correctly order concurrent 96 + * sync operations. 102 97 */ 103 - static int 98 + static int 104 99 xfs_fsync_flush_log( 105 100 struct xfs_inode *ip, 106 101 bool datasync, 107 102 int *log_flushed) 108 103 { 109 - int error = 0; 110 - xfs_csn_t seq; 104 + struct xfs_inode_log_item *iip = ip->i_itemp; 105 + xfs_csn_t seq = 0; 111 106 112 - xfs_ilock(ip, XFS_ILOCK_SHARED); 113 - seq = xfs_fsync_seq(ip, datasync); 114 - if (seq) { 115 - error = xfs_log_force_seq(ip->i_mount, seq, XFS_LOG_SYNC, 107 + spin_lock(&iip->ili_lock); 108 + if (datasync) 109 + seq = iip->ili_datasync_seq; 110 + else 111 + seq = iip->ili_commit_seq; 112 + spin_unlock(&iip->ili_lock); 113 + 114 + if (!seq) 115 + return 0; 116 + 117 + return xfs_log_force_seq(ip->i_mount, seq, XFS_LOG_SYNC, 116 118 log_flushed); 117 - 118 - spin_lock(&ip->i_itemp->ili_lock); 119 - ip->i_itemp->ili_fsync_fields = 0; 120 - spin_unlock(&ip->i_itemp->ili_lock); 121 - } 122 - xfs_iunlock(ip, XFS_ILOCK_SHARED); 123 - return error; 124 119 } 125 120 126 121 STATIC int ··· 153 158 error = blkdev_issue_flush(mp->m_ddev_targp->bt_bdev); 154 159 155 160 /* 156 - * Any inode that has dirty modifications in the log is pinned. The 157 - * racy check here for a pinned inode will not catch modifications 158 - * that happen concurrently to the fsync call, but fsync semantics 159 - * only require to sync previously completed I/O. 161 + * If the inode has a inode log item attached, it may need the journal 162 + * flushed to persist any changes the log item might be tracking. 
160 163 */ 161 - if (xfs_ipincount(ip)) { 164 + if (ip->i_itemp) { 162 165 err2 = xfs_fsync_flush_log(ip, datasync, &log_flushed); 163 166 if (err2 && !error) 164 167 error = err2;
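
The rewritten xfs_fsync_flush_log() no longer takes the ILOCK or checks the pin count. It samples either ili_datasync_seq or ili_commit_seq under ili_lock and skips the log force when the value is zero. The following sketch covers only that selection step, assuming a reduced, hypothetical log item. A C11 atomic_flag spinlock stands in for the kernel spinlock, and the actual xfs_log_force_seq() call is left to the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t demo_csn_t;	/* stands in for xfs_csn_t */

/* Hypothetical reduction of struct xfs_inode_log_item to the fields used here. */
struct demo_inode_log_item {
	atomic_flag	lock;		/* stands in for the ili_lock spinlock */
	demo_csn_t	commit_seq;	/* ili_commit_seq */
	demo_csn_t	datasync_seq;	/* ili_datasync_seq */
};

static void demo_spin_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* spin until the holder clears the flag */
}

static void demo_spin_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/*
 * Selection step of xfs_fsync_flush_log(): sample the relevant sequence
 * under the item lock. Zero means nothing for this kind of sync is dirty
 * in the journal, so no log force is needed; otherwise the caller would
 * pass the value to the log force outside the lock.
 */
static demo_csn_t demo_fsync_seq(struct demo_inode_log_item *iip, bool datasync)
{
	demo_csn_t seq;

	demo_spin_lock(&iip->lock);
	seq = datasync ? iip->datasync_seq : iip->commit_seq;
	demo_spin_unlock(&iip->lock);
	return seq;
}
```

The lock is held only long enough to read a consistent value. Waiting for the journal happens afterwards, which is what allows the hunk to drop the XFS_ILOCK_SHARED round trip from the fsync path.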

-2

fs/xfs/xfs_globals.c

···
14 14 */
15 15 xfs_param_t xfs_params = {
16 16 /* MIN DFLT MAX */
17 - .sgid_inherit = { 0, 0, 1 },
18 - .symlink_mode = { 0, 0, 1 },
19 17 .panic_mask = { 0, 0, XFS_PTAG_MASK},
20 18 .error_level = { 0, 3, 11 },
21 19 .syncd_timer = { 1*100, 30*100, 7200*100},

+2 -4

fs/xfs/xfs_icache.c

··· 646 646 goto out_destroy; 647 647 648 648 /* 649 - * For version 5 superblocks, if we are initialising a new inode and we 650 - * are not utilising the XFS_FEAT_IKEEP inode cluster mode, we can 649 + * For version 5 superblocks, if we are initialising a new inode, we 651 650 * simply build the new inode core with a random generation number. 652 651 * 653 652 * For version 4 (and older) superblocks, log recovery is dependent on ··· 654 655 * value and hence we must also read the inode off disk even when 655 656 * initializing new inodes. 656 657 */ 657 - if (xfs_has_v3inodes(mp) && 658 - (flags & XFS_IGET_CREATE) && !xfs_has_ikeep(mp)) { 658 + if (xfs_has_v3inodes(mp) && (flags & XFS_IGET_CREATE)) { 659 659 VFS_I(ip)->i_generation = get_random_u32(); 660 660 } else { 661 661 struct xfs_buf *bp;

+64 -53

fs/xfs/xfs_inode.c

··· 877 877 return error; 878 878 } 879 879 880 + static inline int 881 + xfs_projid_differ( 882 + struct xfs_inode *tdp, 883 + struct xfs_inode *sip) 884 + { 885 + /* 886 + * If we are using project inheritance, we only allow hard link/renames 887 + * creation in our tree when the project IDs are the same; else 888 + * the tree quota mechanism could be circumvented. 889 + */ 890 + if (unlikely((tdp->i_diflags & XFS_DIFLAG_PROJINHERIT) && 891 + tdp->i_projid != sip->i_projid)) { 892 + /* 893 + * Project quota setup skips special files which can 894 + * leave inodes in a PROJINHERIT directory without a 895 + * project ID set. We need to allow links to be made 896 + * to these "project-less" inodes because userspace 897 + * expects them to succeed after project ID setup, 898 + * but everything else should be rejected. 899 + */ 900 + if (!special_file(VFS_I(sip)->i_mode) || 901 + sip->i_projid != 0) { 902 + return -EXDEV; 903 + } 904 + } 905 + 906 + return 0; 907 + } 908 + 880 909 int 881 910 xfs_link( 882 911 struct xfs_inode *tdp, ··· 959 930 goto error_return; 960 931 } 961 932 962 - /* 963 - * If we are using project inheritance, we only allow hard link 964 - * creation in our tree when the project IDs are the same; else 965 - * the tree quota mechanism could be circumvented. 966 - */ 967 - if (unlikely((tdp->i_diflags & XFS_DIFLAG_PROJINHERIT) && 968 - tdp->i_projid != sip->i_projid)) { 969 - /* 970 - * Project quota setup skips special files which can 971 - * leave inodes in a PROJINHERIT directory without a 972 - * project ID set. We need to allow links to be made 973 - * to these "project-less" inodes because userspace 974 - * expects them to succeed after project ID setup, 975 - * but everything else should be rejected. 
976 - */ 977 - if (!special_file(VFS_I(sip)->i_mode) || 978 - sip->i_projid != 0) { 979 - error = -EXDEV; 980 - goto error_return; 981 - } 982 - } 933 + error = xfs_projid_differ(tdp, sip); 934 + if (error) 935 + goto error_return; 983 936 984 937 error = xfs_dir_add_child(tp, resblks, &du); 985 938 if (error) ··· 1667 1656 spin_lock(&iip->ili_lock); 1668 1657 iip->ili_last_fields = iip->ili_fields; 1669 1658 iip->ili_fields = 0; 1670 - iip->ili_fsync_fields = 0; 1671 1659 spin_unlock(&iip->ili_lock); 1672 1660 ASSERT(iip->ili_last_fields); 1673 1661 ··· 1831 1821 xfs_iunpin( 1832 1822 struct xfs_inode *ip) 1833 1823 { 1834 - xfs_assert_ilocked(ip, XFS_ILOCK_EXCL | XFS_ILOCK_SHARED); 1824 + struct xfs_inode_log_item *iip = ip->i_itemp; 1825 + xfs_csn_t seq = 0; 1835 1826 1836 1827 trace_xfs_inode_unpin_nowait(ip, _RET_IP_); 1828 + xfs_assert_ilocked(ip, XFS_ILOCK_EXCL | XFS_ILOCK_SHARED); 1829 + 1830 + spin_lock(&iip->ili_lock); 1831 + seq = iip->ili_commit_seq; 1832 + spin_unlock(&iip->ili_lock); 1833 + if (!seq) 1834 + return; 1837 1835 1838 1836 /* Give the log a push to start the unpinning I/O */ 1839 - xfs_log_force_seq(ip->i_mount, ip->i_itemp->ili_commit_seq, 0, NULL); 1837 + xfs_log_force_seq(ip->i_mount, seq, 0, NULL); 1840 1838 1841 1839 } 1842 1840 ··· 2245 2227 if (du_wip.ip) 2246 2228 xfs_trans_ijoin(tp, du_wip.ip, 0); 2247 2229 2248 - /* 2249 - * If we are using project inheritance, we only allow renames 2250 - * into our tree when the project IDs are the same; else the 2251 - * tree quota mechanism would be circumvented. 2252 - */ 2253 - if (unlikely((target_dp->i_diflags & XFS_DIFLAG_PROJINHERIT) && 2254 - target_dp->i_projid != src_ip->i_projid)) { 2255 - error = -EXDEV; 2230 + error = xfs_projid_differ(target_dp, src_ip); 2231 + if (error) 2256 2232 goto out_trans_cancel; 2257 - } 2258 2233 2259 2234 /* RENAME_EXCHANGE is unique from here on. 
*/ 2260 2235 if (flags & RENAME_EXCHANGE) { ··· 2388 2377 * error handling as the caller will shutdown and fail the buffer. 2389 2378 */ 2390 2379 error = -EFSCORRUPTED; 2391 - if (XFS_TEST_ERROR(dip->di_magic != cpu_to_be16(XFS_DINODE_MAGIC), 2392 - mp, XFS_ERRTAG_IFLUSH_1)) { 2380 + if (dip->di_magic != cpu_to_be16(XFS_DINODE_MAGIC) || 2381 + XFS_TEST_ERROR(mp, XFS_ERRTAG_IFLUSH_1)) { 2393 2382 xfs_alert_tag(mp, XFS_PTAG_IFLUSH, 2394 2383 "%s: Bad inode %llu magic number 0x%x, ptr "PTR_FMT, 2395 2384 __func__, ip->i_ino, be16_to_cpu(dip->di_magic), dip); ··· 2405 2394 goto flush_out; 2406 2395 } 2407 2396 } else if (S_ISREG(VFS_I(ip)->i_mode)) { 2408 - if (XFS_TEST_ERROR( 2409 - ip->i_df.if_format != XFS_DINODE_FMT_EXTENTS && 2410 - ip->i_df.if_format != XFS_DINODE_FMT_BTREE, 2411 - mp, XFS_ERRTAG_IFLUSH_3)) { 2397 + if ((ip->i_df.if_format != XFS_DINODE_FMT_EXTENTS && 2398 + ip->i_df.if_format != XFS_DINODE_FMT_BTREE) || 2399 + XFS_TEST_ERROR(mp, XFS_ERRTAG_IFLUSH_3)) { 2412 2400 xfs_alert_tag(mp, XFS_PTAG_IFLUSH, 2413 2401 "%s: Bad regular inode %llu, ptr "PTR_FMT, 2414 2402 __func__, ip->i_ino, ip); 2415 2403 goto flush_out; 2416 2404 } 2417 2405 } else if (S_ISDIR(VFS_I(ip)->i_mode)) { 2418 - if (XFS_TEST_ERROR( 2419 - ip->i_df.if_format != XFS_DINODE_FMT_EXTENTS && 2420 - ip->i_df.if_format != XFS_DINODE_FMT_BTREE && 2421 - ip->i_df.if_format != XFS_DINODE_FMT_LOCAL, 2422 - mp, XFS_ERRTAG_IFLUSH_4)) { 2406 + if ((ip->i_df.if_format != XFS_DINODE_FMT_EXTENTS && 2407 + ip->i_df.if_format != XFS_DINODE_FMT_BTREE && 2408 + ip->i_df.if_format != XFS_DINODE_FMT_LOCAL) || 2409 + XFS_TEST_ERROR(mp, XFS_ERRTAG_IFLUSH_4)) { 2423 2410 xfs_alert_tag(mp, XFS_PTAG_IFLUSH, 2424 2411 "%s: Bad directory inode %llu, ptr "PTR_FMT, 2425 2412 __func__, ip->i_ino, ip); 2426 2413 goto flush_out; 2427 2414 } 2428 2415 } 2429 - if (XFS_TEST_ERROR(ip->i_df.if_nextents + xfs_ifork_nextents(&ip->i_af) > 2430 - ip->i_nblocks, mp, XFS_ERRTAG_IFLUSH_5)) { 2416 + if (ip->i_df.if_nextents + 
xfs_ifork_nextents(&ip->i_af) > 2417 + ip->i_nblocks || XFS_TEST_ERROR(mp, XFS_ERRTAG_IFLUSH_5)) { 2431 2418 xfs_alert_tag(mp, XFS_PTAG_IFLUSH, 2432 2419 "%s: detected corrupt incore inode %llu, " 2433 2420 "total extents = %llu nblocks = %lld, ptr "PTR_FMT, ··· 2434 2425 ip->i_nblocks, ip); 2435 2426 goto flush_out; 2436 2427 } 2437 - if (XFS_TEST_ERROR(ip->i_forkoff > mp->m_sb.sb_inodesize, 2438 - mp, XFS_ERRTAG_IFLUSH_6)) { 2428 + if (ip->i_forkoff > mp->m_sb.sb_inodesize || 2429 + XFS_TEST_ERROR(mp, XFS_ERRTAG_IFLUSH_6)) { 2439 2430 xfs_alert_tag(mp, XFS_PTAG_IFLUSH, 2440 2431 "%s: bad inode %llu, forkoff 0x%x, ptr "PTR_FMT, 2441 2432 __func__, ip->i_ino, ip->i_forkoff, ip); ··· 2511 2502 spin_lock(&iip->ili_lock); 2512 2503 iip->ili_last_fields = iip->ili_fields; 2513 2504 iip->ili_fields = 0; 2514 - iip->ili_fsync_fields = 0; 2515 2505 set_bit(XFS_LI_FLUSHING, &iip->ili_item.li_flags); 2516 2506 spin_unlock(&iip->ili_lock); 2517 2507 ··· 2669 2661 xfs_log_force_inode( 2670 2662 struct xfs_inode *ip) 2671 2663 { 2664 + struct xfs_inode_log_item *iip = ip->i_itemp; 2672 2665 xfs_csn_t seq = 0; 2673 2666 2674 - xfs_ilock(ip, XFS_ILOCK_SHARED); 2675 - if (xfs_ipincount(ip)) 2676 - seq = ip->i_itemp->ili_commit_seq; 2677 - xfs_iunlock(ip, XFS_ILOCK_SHARED); 2667 + if (!iip) 2668 + return 0; 2669 + 2670 + spin_lock(&iip->ili_lock); 2671 + seq = iip->ili_commit_seq; 2672 + spin_unlock(&iip->ili_lock); 2678 2673 2679 2674 if (!seq) 2680 2675 return 0;
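
The xfs_inode.c hunk moves the project-ID check into a shared xfs_projid_differ() helper used by both xfs_link() and xfs_rename(). The sketch below reproduces the decision, assuming a hypothetical inode struct that reduces the flag test and special_file() to booleans.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reduction of struct xfs_inode to the fields the check reads. */
struct demo_inode {
	bool		projinherit;	/* XFS_DIFLAG_PROJINHERIT set */
	bool		special;	/* special_file(i_mode): chr/blk/fifo/sock */
	uint32_t	projid;		/* i_projid */
};

/* Same decision as xfs_projid_differ(): 0 to allow, -EXDEV to refuse. */
static int demo_projid_differ(const struct demo_inode *tdp,
			      const struct demo_inode *sip)
{
	if (tdp->projinherit && tdp->projid != sip->projid) {
		/* Only "project-less" special files (projid 0) get through. */
		if (!sip->special || sip->projid != 0)
			return -EXDEV;
	}
	return 0;
}
```

This also changes rename behaviour. Before this hunk, xfs_rename() returned -EXDEV on any project-ID mismatch. Now a project-less special file can be renamed into a PROJINHERIT directory, just as it could already be hard-linked there.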

+79 -46

fs/xfs/xfs_inode_item.c

··· 131 131 } 132 132 133 133 /* 134 - * Inode verifiers do not check that the extent size hint is an integer 135 - * multiple of the rt extent size on a directory with both rtinherit 136 - * and extszinherit flags set. If we're logging a directory that is 137 - * misconfigured in this way, clear the hint. 134 + * Inode verifiers do not check that the extent size hints are an 135 + * integer multiple of the rt extent size on a directory with 136 + * rtinherit flags set. If we're logging a directory that is 137 + * misconfigured in this way, clear the bad hints. 138 138 */ 139 - if ((ip->i_diflags & XFS_DIFLAG_RTINHERIT) && 140 - (ip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) && 141 - xfs_extlen_to_rtxmod(ip->i_mount, ip->i_extsize) > 0) { 142 - ip->i_diflags &= ~(XFS_DIFLAG_EXTSIZE | 143 - XFS_DIFLAG_EXTSZINHERIT); 144 - ip->i_extsize = 0; 145 - flags |= XFS_ILOG_CORE; 139 + if (ip->i_diflags & XFS_DIFLAG_RTINHERIT) { 140 + if ((ip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) && 141 + xfs_extlen_to_rtxmod(ip->i_mount, ip->i_extsize) > 0) { 142 + ip->i_diflags &= ~(XFS_DIFLAG_EXTSIZE | 143 + XFS_DIFLAG_EXTSZINHERIT); 144 + ip->i_extsize = 0; 145 + flags |= XFS_ILOG_CORE; 146 + } 147 + if ((ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE) && 148 + xfs_extlen_to_rtxmod(ip->i_mount, ip->i_cowextsize) > 0) { 149 + ip->i_diflags2 &= ~XFS_DIFLAG2_COWEXTSIZE; 150 + ip->i_cowextsize = 0; 151 + flags |= XFS_ILOG_CORE; 152 + } 146 153 } 147 154 148 - /* 149 - * Record the specific change for fdatasync optimisation. This allows 150 - * fdatasync to skip log forces for inodes that are only timestamp 151 - * dirty. Once we've processed the XFS_ILOG_IVERSION flag, convert it 152 - * to XFS_ILOG_CORE so that the actual on-disk dirty tracking 153 - * (ili_fields) correctly tracks that the version has changed. 
154 - */ 155 155 spin_lock(&iip->ili_lock); 156 - iip->ili_fsync_fields |= (flags & ~XFS_ILOG_IVERSION); 157 - if (flags & XFS_ILOG_IVERSION) 158 - flags = ((flags & ~XFS_ILOG_IVERSION) | XFS_ILOG_CORE); 159 - 160 - /* 161 - * Inode verifiers do not check that the CoW extent size hint is an 162 - * integer multiple of the rt extent size on a directory with both 163 - * rtinherit and cowextsize flags set. If we're logging a directory 164 - * that is misconfigured in this way, clear the hint. 165 - */ 166 - if ((ip->i_diflags & XFS_DIFLAG_RTINHERIT) && 167 - (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE) && 168 - xfs_extlen_to_rtxmod(ip->i_mount, ip->i_cowextsize) > 0) { 169 - ip->i_diflags2 &= ~XFS_DIFLAG2_COWEXTSIZE; 170 - ip->i_cowextsize = 0; 171 - flags |= XFS_ILOG_CORE; 172 - } 173 - 174 156 if (!iip->ili_item.li_buf) { 175 157 struct xfs_buf *bp; 176 158 int error; ··· 187 205 } 188 206 189 207 /* 208 + * Store the dirty flags back into the inode item as this state is used 209 + * later on in xfs_inode_item_committing() to determine whether the 210 + * transaction is relevant to fsync state or not. 211 + */ 212 + iip->ili_dirty_flags = flags; 213 + 214 + /* 215 + * Convert the flags on-disk fields that have been modified in the 216 + * transaction so that ili_fields tracks the changes correctly. 217 + */ 218 + if (flags & XFS_ILOG_IVERSION) 219 + flags = ((flags & ~XFS_ILOG_IVERSION) | XFS_ILOG_CORE); 220 + 221 + /* 190 222 * Always OR in the bits from the ili_last_fields field. This is to 191 223 * coordinate with the xfs_iflush() and xfs_buf_inode_iodone() routines 192 224 * in the eventual clearing of the ili_fields bits. See the big comment ··· 210 214 spin_unlock(&iip->ili_lock); 211 215 212 216 xfs_inode_item_precommit_check(ip); 213 - 214 - /* 215 - * We are done with the log item transaction dirty state, so clear it so 216 - * that it doesn't pollute future transactions. 
217 - */ 218 - iip->ili_dirty_flags = 0; 219 217 return 0; 220 218 } 221 219 ··· 719 729 struct xfs_log_item *lip, 720 730 int remove) 721 731 { 722 - struct xfs_inode *ip = INODE_ITEM(lip)->ili_inode; 732 + struct xfs_inode_log_item *iip = INODE_ITEM(lip); 733 + struct xfs_inode *ip = iip->ili_inode; 723 734 724 735 trace_xfs_inode_unpin(ip, _RET_IP_); 725 736 ASSERT(lip->li_buf || xfs_iflags_test(ip, XFS_ISTALE)); 726 737 ASSERT(atomic_read(&ip->i_pincount) > 0); 727 - if (atomic_dec_and_test(&ip->i_pincount)) 738 + 739 + /* 740 + * If this is the last unpin, then the inode no longer needs a journal 741 + * flush to persist it. Hence we can clear the commit sequence numbers 742 + * as a fsync/fdatasync operation on the inode at this point is a no-op. 743 + */ 744 + if (atomic_dec_and_lock(&ip->i_pincount, &iip->ili_lock)) { 745 + iip->ili_commit_seq = 0; 746 + iip->ili_datasync_seq = 0; 747 + spin_unlock(&iip->ili_lock); 728 748 wake_up_bit(&ip->i_flags, __XFS_IPINNED_BIT); 749 + } 729 750 } 730 751 731 752 STATIC uint ··· 859 858 return lsn; 860 859 } 861 860 861 + /* 862 + * The modification is now complete, so before we unlock the inode we need to 863 + * update the commit sequence numbers for data integrity journal flushes. We 864 + * always record the commit sequence number (ili_commit_seq) so that anything 865 + * that needs a full journal sync will capture all of this modification. 866 + * 867 + * We then 868 + * check if the changes will impact a datasync (O_DSYNC) journal flush. If the 869 + * changes will require a datasync flush, then we also record the sequence in 870 + * ili_datasync_seq. 871 + * 872 + * These commit sequence numbers will get cleared atomically with the inode being 873 + * unpinned (i.e. pin count goes to zero), and so it will only be set when the 874 + * inode is dirty in the journal. 
This removes the need for checking if the 875 + * inode is pinned to determine if a journal flush is necessary, and hence 876 + * removes the need for holding the ILOCK_SHARED in xfs_file_fsync() to 877 + * serialise pin counts against commit sequence number updates. 878 + * 879 + */ 862 880 STATIC void 863 881 xfs_inode_item_committing( 864 882 struct xfs_log_item *lip, 865 883 xfs_csn_t seq) 866 884 { 867 - INODE_ITEM(lip)->ili_commit_seq = seq; 885 + struct xfs_inode_log_item *iip = INODE_ITEM(lip); 886 + 887 + spin_lock(&iip->ili_lock); 888 + iip->ili_commit_seq = seq; 889 + if (iip->ili_dirty_flags & ~(XFS_ILOG_IVERSION | XFS_ILOG_TIMESTAMP)) 890 + iip->ili_datasync_seq = seq; 891 + spin_unlock(&iip->ili_lock); 892 + 893 + /* 894 + * Clear the per-transaction dirty flags now that we have finished 895 + * recording the transaction's inode modifications in the CIL and are 896 + * about to release and (maybe) unlock the inode. 897 + */ 898 + iip->ili_dirty_flags = 0; 899 + 868 900 return xfs_inode_item_release(lip); 869 901 } 870 902 ··· 1089 1055 { 1090 1056 iip->ili_last_fields = 0; 1091 1057 iip->ili_fields = 0; 1092 - iip->ili_fsync_fields = 0; 1093 1058 iip->ili_flush_lsn = 0; 1094 1059 iip->ili_item.li_buf = NULL; 1095 1060 list_del_init(&iip->ili_item.li_bio_list);
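
The xfs_inode_item.c changes define a simple lifecycle for the two sequence numbers. xfs_inode_item_committing() always records ili_commit_seq, and it also records ili_datasync_seq when the transaction dirtied anything beyond timestamps or iversion. The final unpin then clears both. The sketch below models that lifecycle single-threaded, under these assumptions: the struct is hypothetical, the DEMO_ILOG_* values are illustrative rather than the real XFS_ILOG_* definitions, and the ili_lock / atomic_dec_and_lock() synchronisation is omitted.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t demo_csn_t;

/* Illustrative flag values only; see xfs_log_format.h for the real XFS_ILOG_*. */
#define DEMO_ILOG_CORE		0x0001u
#define DEMO_ILOG_TIMESTAMP	0x4000u
#define DEMO_ILOG_IVERSION	0x8000u

/* Hypothetical, single-threaded reduction of the inode + log item state. */
struct demo_item {
	unsigned int	pincount;	/* i_pincount */
	unsigned int	dirty_flags;	/* ili_dirty_flags (current transaction) */
	demo_csn_t	commit_seq;	/* ili_commit_seq */
	demo_csn_t	datasync_seq;	/* ili_datasync_seq */
};

/* Transaction commit: mirrors xfs_inode_item_committing(). */
static void demo_committing(struct demo_item *iip, demo_csn_t seq)
{
	iip->commit_seq = seq;
	/* Changes beyond timestamps/iversion must also be flushed by fdatasync. */
	if (iip->dirty_flags & ~(DEMO_ILOG_IVERSION | DEMO_ILOG_TIMESTAMP))
		iip->datasync_seq = seq;
	iip->dirty_flags = 0;	/* per-transaction state is consumed */
}

/* Journal I/O completion: mirrors xfs_inode_item_unpin(). */
static void demo_unpin(struct demo_item *iip)
{
	assert(iip->pincount > 0);
	if (--iip->pincount == 0) {
		/* Fully persisted: fsync and fdatasync become no-ops. */
		iip->commit_seq = 0;
		iip->datasync_seq = 0;
	}
}
```

For example, a timestamp-only commit advances commit_seq but leaves datasync_seq at its previous value. An fdatasync then targets only the last data-relevant commit, or skips the log force entirely if that commit has already been unpinned.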

+9 -1

fs/xfs/xfs_inode_item.h

··· 32 32 spinlock_t ili_lock; /* flush state lock */ 33 33 unsigned int ili_last_fields; /* fields when flushed */ 34 34 unsigned int ili_fields; /* fields to be logged */ 35 - unsigned int ili_fsync_fields; /* logged since last fsync */ 36 35 xfs_lsn_t ili_flush_lsn; /* lsn at last flush */ 36 + 37 + /* 38 + * We record the sequence number for every inode modification, as 39 + * well as those that only require fdatasync operations for data 40 + * integrity. This allows optimisation of the O_DSYNC/fdatasync path 41 + * without needing to track what modifications the journal is currently 42 + * carrying for the inode. These are protected by the above ili_lock. 43 + */ 37 44 xfs_csn_t ili_commit_seq; /* last transaction commit */ 45 + xfs_csn_t ili_datasync_seq; /* for datasync optimisation */ 38 46 }; 39 47 40 48 static inline int xfs_inode_clean(struct xfs_inode *ip)

+9 -15

fs/xfs/xfs_ioctl.c

··· 512 512 { 513 513 struct xfs_inode *ip = XFS_I(d_inode(dentry)); 514 514 515 - if (d_is_special(dentry)) 516 - return -ENOTTY; 517 - 518 515 xfs_ilock(ip, XFS_ILOCK_SHARED); 519 516 xfs_fill_fsxattr(ip, XFS_DATA_FORK, fa); 520 517 xfs_iunlock(ip, XFS_ILOCK_SHARED); ··· 732 735 int error; 733 736 734 737 trace_xfs_ioctl_setattr(ip); 735 - 736 - if (d_is_special(dentry)) 737 - return -ENOTTY; 738 738 739 739 if (!fa->fsx_valid) { 740 740 if (fa->flags & ~(FS_IMMUTABLE_FL | FS_APPEND_FL | ··· 1203 1209 current->comm); 1204 1210 return -ENOTTY; 1205 1211 case XFS_IOC_DIOINFO: { 1206 - struct xfs_buftarg *target = xfs_inode_buftarg(ip); 1212 + struct kstat st; 1207 1213 struct dioattr da; 1208 1214 1209 - da.d_mem = target->bt_logical_sectorsize; 1215 + error = vfs_getattr(&filp->f_path, &st, STATX_DIOALIGN, 0); 1216 + if (error) 1217 + return error; 1210 1218 1211 1219 /* 1212 - * See xfs_report_dioalign() for an explanation about why this 1213 - * reports a value larger than the sector size for COW inodes. 1220 + * Some userspace directly feeds the return value to 1221 + * posix_memalign, which fails for values that are smaller than 1222 + * the pointer size. Round up the value to not break userspace. 1214 1223 */ 1215 - if (xfs_is_cow_inode(ip)) 1216 - da.d_miniosz = xfs_inode_alloc_unitsize(ip); 1217 - else 1218 - da.d_miniosz = target->bt_logical_sectorsize; 1224 + da.d_mem = roundup(st.dio_mem_align, sizeof(void *)); 1225 + da.d_miniosz = st.dio_offset_align; 1219 1226 da.d_maxiosz = INT_MAX & ~(da.d_miniosz - 1); 1220 - 1221 1227 if (copy_to_user(arg, &da, sizeof(da))) 1222 1228 return -EFAULT; 1223 1229 return 0;
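
The XFS_IOC_DIOINFO hunk now derives its answer from vfs_getattr() with STATX_DIOALIGN. It rounds dio_mem_align up to the pointer size, because posix_memalign() rejects smaller alignments, and masks INT_MAX down to a multiple of the minimum I/O size. Below is a minimal sketch of that arithmetic. The demo_dioattr struct mirrors the three fields the hunk fills in. DEMO_ROUNDUP is a local stand-in for the kernel's roundup(). The mask step assumes d_miniosz is a power of two.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Mirrors the three fields filled in by the DIOINFO hunk. */
struct demo_dioattr {
	uint32_t	d_mem;		/* buffer memory alignment */
	uint32_t	d_miniosz;	/* minimum transfer size */
	uint32_t	d_maxiosz;	/* maximum transfer size */
};

/* Round x up to a multiple of y, like the kernel's roundup(). */
#define DEMO_ROUNDUP(x, y)	((((x) + (y) - 1) / (y)) * (y))

static struct demo_dioattr demo_dioinfo(uint32_t dio_mem_align,
					uint32_t dio_offset_align)
{
	struct demo_dioattr da;

	/* posix_memalign() needs a multiple of sizeof(void *). */
	da.d_mem = DEMO_ROUNDUP(dio_mem_align, (uint32_t)sizeof(void *));
	da.d_miniosz = dio_offset_align;
	/* Largest int-sized transfer that is a multiple of d_miniosz. */
	da.d_maxiosz = INT_MAX & ~(da.d_miniosz - 1);
	return da;
}
```

With a 4-byte memory alignment and 512-byte offset alignment, d_mem becomes sizeof(void *), which is 8 on a 64-bit host, and d_maxiosz becomes 0x7FFFFE00.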

+14 -5

fs/xfs/xfs_iomap.c

··· 149 149 iomap->bdev = target->bt_bdev; 150 150 iomap->flags = iomap_flags; 151 151 152 - if (xfs_ipincount(ip) && 153 - (ip->i_itemp->ili_fsync_fields & ~XFS_ILOG_TIMESTAMP)) 154 - iomap->flags |= IOMAP_F_DIRTY; 152 + /* 153 + * If the inode is dirty for datasync purposes, let iomap know so it 154 + * doesn't elide the IO completion journal flushes on O_DSYNC IO. 155 + */ 156 + if (ip->i_itemp) { 157 + struct xfs_inode_log_item *iip = ip->i_itemp; 158 + 159 + spin_lock(&iip->ili_lock); 160 + if (iip->ili_datasync_seq) 161 + iomap->flags |= IOMAP_F_DIRTY; 162 + spin_unlock(&iip->ili_lock); 163 + } 155 164 156 165 iomap->validity_cookie = sequence_cookie; 157 166 return 0; ··· 1563 1554 return error; 1564 1555 1565 1556 if (XFS_IS_CORRUPT(mp, !xfs_ifork_has_extents(&ip->i_df)) || 1566 - XFS_TEST_ERROR(false, mp, XFS_ERRTAG_BMAPIFORMAT)) { 1557 + XFS_TEST_ERROR(mp, XFS_ERRTAG_BMAPIFORMAT)) { 1567 1558 xfs_bmap_mark_sick(ip, XFS_DATA_FORK); 1568 1559 error = -EFSCORRUPTED; 1569 1560 goto out_unlock; ··· 1737 1728 return error; 1738 1729 1739 1730 if (XFS_IS_CORRUPT(mp, !xfs_ifork_has_extents(&ip->i_df)) || 1740 - XFS_TEST_ERROR(false, mp, XFS_ERRTAG_BMAPIFORMAT)) { 1731 + XFS_TEST_ERROR(mp, XFS_ERRTAG_BMAPIFORMAT)) { 1741 1732 xfs_bmap_mark_sick(ip, XFS_DATA_FORK); 1742 1733 error = -EFSCORRUPTED; 1743 1734 goto out_unlock;

+7 -7

fs/xfs/xfs_iops.c

··· 431 431 struct dentry *dentry, 432 432 const char *symname) 433 433 { 434 - struct inode *inode; 435 - struct xfs_inode *cip = NULL; 436 - struct xfs_name name; 437 - int error; 438 - umode_t mode; 434 + struct inode *inode; 435 + struct xfs_inode *cip = NULL; 436 + struct xfs_name name; 437 + int error; 438 + umode_t mode = S_IFLNK | S_IRWXUGO; 439 439 440 - mode = S_IFLNK | 441 - (irix_symlink_mode ? 0777 & ~current_umask() : S_IRWXUGO); 442 440 error = xfs_dentry_mode_to_name(&name, dentry, mode); 443 441 if (unlikely(error)) 444 442 goto out; ··· 1333 1335 .setattr = xfs_vn_setattr, 1334 1336 .listxattr = xfs_vn_listxattr, 1335 1337 .update_time = xfs_vn_update_time, 1338 + .fileattr_get = xfs_fileattr_get, 1339 + .fileattr_set = xfs_fileattr_set, 1336 1340 }; 1337 1341 1338 1342 /* Figure out if this file actually supports DAX. */

-2

fs/xfs/xfs_linux.h

···
89 89 #undef XFS_NATIVE_HOST
90 90 #endif
91 91
92 - #define irix_sgid_inherit xfs_params.sgid_inherit.val
93 - #define irix_symlink_mode xfs_params.symlink_mode.val
94 92 #define xfs_panic_mask xfs_params.panic_mask.val
95 93 #define xfs_error_level xfs_params.error_level.val
96 94 #define xfs_syncd_centisecs xfs_params.syncd_timer.val

+18 -17

fs/xfs/xfs_log.c

··· 969 969 * counters will be recalculated. Refer to xlog_check_unmount_rec for 970 970 * more details. 971 971 */ 972 - if (XFS_TEST_ERROR(xfs_fs_has_sickness(mp, XFS_SICK_FS_COUNTERS), mp, 973 - XFS_ERRTAG_FORCE_SUMMARY_RECALC)) { 972 + if (xfs_fs_has_sickness(mp, XFS_SICK_FS_COUNTERS) || 973 + XFS_TEST_ERROR(mp, XFS_ERRTAG_FORCE_SUMMARY_RECALC)) { 974 974 xfs_alert(mp, "%s: will fix summary counters at next mount", 975 975 __func__); 976 976 return; ··· 1240 1240 /* 1241 1241 * Race to shutdown the filesystem if we see an error. 1242 1242 */ 1243 - if (XFS_TEST_ERROR(error, log->l_mp, XFS_ERRTAG_IODONE_IOERR)) { 1243 + if (error || XFS_TEST_ERROR(log->l_mp, XFS_ERRTAG_IODONE_IOERR)) { 1244 1244 xfs_alert(log->l_mp, "log I/O error %d", error); 1245 1245 xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR); 1246 1246 } ··· 1567 1567 struct xlog *log, 1568 1568 struct xlog_rec_header *rhead, 1569 1569 char *dp, 1570 - int size) 1570 + unsigned int hdrsize, 1571 + unsigned int size) 1571 1572 { 1572 1573 uint32_t crc; 1573 1574 1574 1575 /* first generate the crc for the record header ... */ 1575 - crc = xfs_start_cksum_update((char *)rhead, 1576 - sizeof(struct xlog_rec_header), 1576 + crc = xfs_start_cksum_update((char *)rhead, hdrsize, 1577 1577 offsetof(struct xlog_rec_header, h_crc)); 1578 1578 1579 1579 /* ... then for additional cycle data for v2 logs ... */ ··· 1817 1817 1818 1818 /* calculcate the checksum */ 1819 1819 iclog->ic_header.h_crc = xlog_cksum(log, &iclog->ic_header, 1820 - iclog->ic_datap, size); 1820 + iclog->ic_datap, XLOG_REC_SIZE, size); 1821 1821 /* 1822 1822 * Intentionally corrupt the log record CRC based on the error injection 1823 1823 * frequency, if defined. This facilitates testing log recovery in the ··· 1826 1826 * detects the bad CRC and attempts to recover. 
1827 1827 */ 1828 1828 #ifdef DEBUG 1829 - if (XFS_TEST_ERROR(false, log->l_mp, XFS_ERRTAG_LOG_BAD_CRC)) { 1829 + if (XFS_TEST_ERROR(log->l_mp, XFS_ERRTAG_LOG_BAD_CRC)) { 1830 1830 iclog->ic_header.h_crc &= cpu_to_le32(0xAAAAAAAA); 1831 1831 iclog->ic_fail_crc = true; 1832 1832 xfs_warn(log->l_mp, ··· 2655 2655 * until you know exactly how many bytes get copied. Therefore, wait 2656 2656 * until later to update ic_offset. 2657 2657 * 2658 - * xlog_write() algorithm assumes that at least 2 xlog_op_header_t's 2658 + * xlog_write() algorithm assumes that at least 2 xlog_op_header's 2659 2659 * can fit into remaining data section. 2660 2660 */ 2661 - if (iclog->ic_size - iclog->ic_offset < 2*sizeof(xlog_op_header_t)) { 2661 + if (iclog->ic_size - iclog->ic_offset < 2662 + 2 * sizeof(struct xlog_op_header)) { 2662 2663 int error = 0; 2663 2664 2664 2665 xlog_state_switch_iclogs(log, iclog, iclog->ic_size); ··· 3153 3152 */ 3154 3153 3155 3154 /* for trans header */ 3156 - unit_bytes += sizeof(xlog_op_header_t); 3157 - unit_bytes += sizeof(xfs_trans_header_t); 3155 + unit_bytes += sizeof(struct xlog_op_header); 3156 + unit_bytes += sizeof(struct xfs_trans_header); 3158 3157 3159 3158 /* for start-rec */ 3160 - unit_bytes += sizeof(xlog_op_header_t); 3159 + unit_bytes += sizeof(struct xlog_op_header); 3161 3160 3162 3161 /* 3163 3162 * for LR headers - the space for data in an iclog is the size minus ··· 3180 3179 num_headers = howmany(unit_bytes, iclog_space); 3181 3180 3182 3181 /* for split-recs - ophdrs added when data split over LRs */ 3183 - unit_bytes += sizeof(xlog_op_header_t) * num_headers; 3182 + unit_bytes += sizeof(struct xlog_op_header) * num_headers; 3184 3183 3185 3184 /* add extra header reservations if we overrun */ 3186 3185 while (!num_headers || 3187 3186 howmany(unit_bytes, iclog_space) > num_headers) { 3188 - unit_bytes += sizeof(xlog_op_header_t); 3187 + unit_bytes += sizeof(struct xlog_op_header); 3189 3188 num_headers++; 3190 3189 } 3191 3190 
unit_bytes += log->l_iclog_hsize * num_headers; ··· 3322 3321 struct xlog_in_core *iclog, 3323 3322 int count) 3324 3323 { 3325 - xlog_op_header_t *ophead; 3324 + struct xlog_op_header *ophead; 3326 3325 xlog_in_core_t *icptr; 3327 3326 xlog_in_core_2_t *xhdr; 3328 3327 void *base_ptr, *ptr, *p; ··· 3400 3399 op_len = be32_to_cpu(iclog->ic_header.h_cycle_data[idx]); 3401 3400 } 3402 3401 } 3403 - ptr += sizeof(xlog_op_header_t) + op_len; 3402 + ptr += sizeof(struct xlog_op_header) + op_len; 3404 3403 } 3405 3404 } 3406 3405 #endif
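The split-record reservation arithmetic in the `xfs_log.c` hunk above is hard to follow in flattened diff form. Below is a minimal standalone sketch of just that step. It assumes `iclog_space`, the op-header size and the log record header size are passed in as plain integers; in the kernel they come from the iclog geometry, `sizeof(struct xlog_op_header)` and `log->l_iclog_hsize`. The function name is hypothetical.

```c
#include <assert.h>

/* Round-up division, same shape as the kernel's howmany() helper. */
#define howmany(x, y) (((x) + ((y) - 1)) / (y))

/*
 * Hypothetical sketch: account for the extra op headers and log record
 * headers needed when a transaction's data is split across iclogs.
 */
static unsigned int
sketch_split_rec_bytes(unsigned int unit_bytes, unsigned int iclog_space,
		       unsigned int ophdr_size, unsigned int rec_hdr_size,
		       unsigned int *num_headers_out)
{
	/* first guess: how many log records the payload spans */
	unsigned int num_headers = howmany(unit_bytes, iclog_space);

	/* one op header per record the data may be split over */
	unit_bytes += ophdr_size * num_headers;

	/*
	 * Adding headers can push the total past another record boundary,
	 * so add one header at a time until the estimate is stable.
	 * A zero-byte payload still needs at least one record.
	 */
	while (!num_headers ||
	       howmany(unit_bytes, iclog_space) > num_headers) {
		unit_bytes += ophdr_size;
		num_headers++;
	}

	*num_headers_out = num_headers;
	/* finally, reserve a log record header for every record */
	return unit_bytes + rec_hdr_size * num_headers;
}
```

With 100 bytes of payload and 100 bytes of data space per record, the first op header pushes the total into a second record, so the loop runs once and two record headers are reserved.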

+37

fs/xfs/xfs_log.h

··· 20 20 int lv_alloc_size; /* size of allocated lv */ 21 21 }; 22 22 23 + /* Region types for iovec's i_type */ 24 + #define XLOG_REG_TYPE_BFORMAT 1 25 + #define XLOG_REG_TYPE_BCHUNK 2 26 + #define XLOG_REG_TYPE_EFI_FORMAT 3 27 + #define XLOG_REG_TYPE_EFD_FORMAT 4 28 + #define XLOG_REG_TYPE_IFORMAT 5 29 + #define XLOG_REG_TYPE_ICORE 6 30 + #define XLOG_REG_TYPE_IEXT 7 31 + #define XLOG_REG_TYPE_IBROOT 8 32 + #define XLOG_REG_TYPE_ILOCAL 9 33 + #define XLOG_REG_TYPE_IATTR_EXT 10 34 + #define XLOG_REG_TYPE_IATTR_BROOT 11 35 + #define XLOG_REG_TYPE_IATTR_LOCAL 12 36 + #define XLOG_REG_TYPE_QFORMAT 13 37 + #define XLOG_REG_TYPE_DQUOT 14 38 + #define XLOG_REG_TYPE_QUOTAOFF 15 39 + #define XLOG_REG_TYPE_LRHEADER 16 40 + #define XLOG_REG_TYPE_UNMOUNT 17 41 + #define XLOG_REG_TYPE_COMMIT 18 42 + #define XLOG_REG_TYPE_TRANSHDR 19 43 + #define XLOG_REG_TYPE_ICREATE 20 44 + #define XLOG_REG_TYPE_RUI_FORMAT 21 45 + #define XLOG_REG_TYPE_RUD_FORMAT 22 46 + #define XLOG_REG_TYPE_CUI_FORMAT 23 47 + #define XLOG_REG_TYPE_CUD_FORMAT 24 48 + #define XLOG_REG_TYPE_BUI_FORMAT 25 49 + #define XLOG_REG_TYPE_BUD_FORMAT 26 50 + #define XLOG_REG_TYPE_ATTRI_FORMAT 27 51 + #define XLOG_REG_TYPE_ATTRD_FORMAT 28 52 + #define XLOG_REG_TYPE_ATTR_NAME 29 53 + #define XLOG_REG_TYPE_ATTR_VALUE 30 54 + #define XLOG_REG_TYPE_XMI_FORMAT 31 55 + #define XLOG_REG_TYPE_XMD_FORMAT 32 56 + #define XLOG_REG_TYPE_ATTR_NEWNAME 33 57 + #define XLOG_REG_TYPE_ATTR_NEWVALUE 34 58 + #define XLOG_REG_TYPE_MAX 34 59 + 23 60 #define XFS_LOG_VEC_ORDERED (-1) 24 61 25 62 /*

+2 -2

fs/xfs/xfs_log_priv.h

··· 499 499 extern void 500 500 xlog_recover_cancel(struct xlog *); 501 501 502 - extern __le32 xlog_cksum(struct xlog *log, struct xlog_rec_header *rhead, 503 - char *dp, int size); 502 + __le32 xlog_cksum(struct xlog *log, struct xlog_rec_header *rhead, 503 + char *dp, unsigned int hdrsize, unsigned int size); 504 504 505 505 extern struct kmem_cache *xfs_log_ticket_cache; 506 506 struct xlog_ticket *xlog_ticket_alloc(struct xlog *log, int unit_bytes,

+24 -10

fs/xfs/xfs_log_recover.c

··· 2894 2894 int pass, 2895 2895 struct list_head *buffer_list) 2896 2896 { 2897 - __le32 old_crc = rhead->h_crc; 2898 - __le32 crc; 2897 + __le32 expected_crc = rhead->h_crc, crc, other_crc; 2899 2898 2900 - crc = xlog_cksum(log, rhead, dp, be32_to_cpu(rhead->h_len)); 2899 + crc = xlog_cksum(log, rhead, dp, XLOG_REC_SIZE, 2900 + be32_to_cpu(rhead->h_len)); 2901 + 2902 + /* 2903 + * Look at the end of the struct xlog_rec_header definition in 2904 + * xfs_log_format.h for the glory details. 2905 + */ 2906 + if (expected_crc && crc != expected_crc) { 2907 + other_crc = xlog_cksum(log, rhead, dp, XLOG_REC_SIZE_OTHER, 2908 + be32_to_cpu(rhead->h_len)); 2909 + if (other_crc == expected_crc) { 2910 + xfs_notice_once(log->l_mp, 2911 + "Fixing up incorrect CRC due to padding."); 2912 + crc = other_crc; 2913 + } 2914 + } 2901 2915 2902 2916 /* 2903 2917 * Nothing else to do if this is a CRC verification pass. Just return 2904 2918 * if this a record with a non-zero crc. Unfortunately, mkfs always 2905 - * sets old_crc to 0 so we must consider this valid even on v5 supers. 2906 - * Otherwise, return EFSBADCRC on failure so the callers up the stack 2907 - * know precisely what failed. 2919 + * sets expected_crc to 0 so we must consider this valid even on v5 2920 + * supers. Otherwise, return EFSBADCRC on failure so the callers up the 2921 + * stack know precisely what failed. 2908 2922 */ 2909 2923 if (pass == XLOG_RECOVER_CRCPASS) { 2910 - if (old_crc && crc != old_crc) 2924 + if (expected_crc && crc != expected_crc) 2911 2925 return -EFSBADCRC; 2912 2926 return 0; 2913 2927 } ··· 2932 2918 * zero CRC check prevents warnings from being emitted when upgrading 2933 2919 * the kernel from one that does not add CRCs by default. 
2934 2920 */ 2935 - if (crc != old_crc) { 2936 - if (old_crc || xfs_has_crc(log->l_mp)) { 2921 + if (crc != expected_crc) { 2922 + if (expected_crc || xfs_has_crc(log->l_mp)) { 2937 2923 xfs_alert(log->l_mp, 2938 2924 "log record CRC mismatch: found 0x%x, expected 0x%x.", 2939 - le32_to_cpu(old_crc), 2925 + le32_to_cpu(expected_crc), 2940 2926 le32_to_cpu(crc)); 2941 2927 xfs_hex_dump(dp, 32); 2942 2928 }
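The recovery change first computes the record CRC over the header size normally used (`XLOG_REC_SIZE`) and only retries with the alternative size (`XLOG_REC_SIZE_OTHER`) when that fails to match a non-zero stored CRC. The sketch below models that control flow with a plain bitwise CRC32C and a toy header whose CRC field sits at a caller-supplied offset. All names are hypothetical, and unlike the kernel's `xlog_cksum()` it does not checksum the record payload. Like `xfs_start_cksum_update()`, it treats the stored CRC field as zero while checksumming.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Plain bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78). */
static uint32_t sketch_crc32c(uint32_t crc, const unsigned char *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/*
 * Checksum the first hdrsize bytes of hdr, feeding zeroes in place of the
 * 4-byte CRC field at crc_off. Requires hdrsize >= crc_off + 4.
 */
static uint32_t sketch_hdr_cksum(const unsigned char *hdr, size_t hdrsize,
				 size_t crc_off)
{
	static const unsigned char zero[4];
	uint32_t crc = ~0u;

	crc = sketch_crc32c(crc, hdr, crc_off);
	crc = sketch_crc32c(crc, zero, sizeof(zero));
	crc = sketch_crc32c(crc, hdr + crc_off + 4, hdrsize - crc_off - 4);
	return ~crc;
}

/* Verify with the primary header size, falling back to the other one. */
static uint32_t
sketch_verify_rec_crc(const unsigned char *hdr, size_t hdrsize,
		      size_t other_hdrsize, size_t crc_off,
		      uint32_t expected, int *fixed_up)
{
	uint32_t crc = sketch_hdr_cksum(hdr, hdrsize, crc_off);

	*fixed_up = 0;
	/* a zero stored CRC is accepted as-is (mkfs writes 0) */
	if (expected && crc != expected) {
		uint32_t other = sketch_hdr_cksum(hdr, other_hdrsize, crc_off);

		if (other == expected) {
			/* header was checksummed with the other padding */
			crc = other;
			*fixed_up = 1;
		}
	}
	return crc;
}
```

In the kernel the fallback only emits a one-time notice (`xfs_notice_once`) and then continues with the matching CRC, so the later CRC pass and mismatch reporting see a valid record.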

-13

fs/xfs/xfs_mount.c

··· 1057 1057 xfs_inodegc_start(mp); 1058 1058 xfs_blockgc_start(mp); 1059 1059 1060 - /* 1061 - * Now that we've recovered any pending superblock feature bit 1062 - * additions, we can finish setting up the attr2 behaviour for the 1063 - * mount. The noattr2 option overrides the superblock flag, so only 1064 - * check the superblock feature flag if the mount option is not set. 1065 - */ 1066 - if (xfs_has_noattr2(mp)) { 1067 - mp->m_features &= ~XFS_FEAT_ATTR2; 1068 - } else if (!xfs_has_attr2(mp) && 1069 - (mp->m_sb.sb_features2 & XFS_SB_VERSION2_ATTR2BIT)) { 1070 - mp->m_features |= XFS_FEAT_ATTR2; 1071 - } 1072 - 1073 1060 if (xfs_has_metadir(mp)) { 1074 1061 error = xfs_mount_setup_metadir(mp); 1075 1062 if (error)

+6 -6

fs/xfs/xfs_mount.h

··· 363 363 #define XFS_FEAT_EXTFLG (1ULL << 7) /* unwritten extents */ 364 364 #define XFS_FEAT_ASCIICI (1ULL << 8) /* ASCII only case-insens. */ 365 365 #define XFS_FEAT_LAZYSBCOUNT (1ULL << 9) /* Superblk counters */ 366 - #define XFS_FEAT_ATTR2 (1ULL << 10) /* dynamic attr fork */ 367 366 #define XFS_FEAT_PARENT (1ULL << 11) /* parent pointers */ 368 367 #define XFS_FEAT_PROJID32 (1ULL << 12) /* 32 bit project id */ 369 368 #define XFS_FEAT_CRC (1ULL << 13) /* metadata CRCs */ ··· 385 386 386 387 /* Mount features */ 387 388 #define XFS_FEAT_NOLIFETIME (1ULL << 47) /* disable lifetime hints */ 388 - #define XFS_FEAT_NOATTR2 (1ULL << 48) /* disable attr2 creation */ 389 389 #define XFS_FEAT_NOALIGN (1ULL << 49) /* ignore alignment */ 390 390 #define XFS_FEAT_ALLOCSIZE (1ULL << 50) /* user specified allocation size */ 391 391 #define XFS_FEAT_LARGE_IOSIZE (1ULL << 51) /* report large preferred ··· 394 396 #define XFS_FEAT_DISCARD (1ULL << 54) /* discard unused blocks */ 395 397 #define XFS_FEAT_GRPID (1ULL << 55) /* group-ID assigned from directory */ 396 398 #define XFS_FEAT_SMALL_INUMS (1ULL << 56) /* user wants 32bit inodes */ 397 - #define XFS_FEAT_IKEEP (1ULL << 57) /* keep empty inode clusters*/ 398 399 #define XFS_FEAT_SWALLOC (1ULL << 58) /* stripe width allocation */ 399 400 #define XFS_FEAT_FILESTREAMS (1ULL << 59) /* use filestreams allocator */ 400 401 #define XFS_FEAT_DAX_ALWAYS (1ULL << 60) /* DAX always enabled */ ··· 501 504 __XFS_HAS_V4_FEAT(logv2, LOGV2) 502 505 __XFS_HAS_V4_FEAT(extflg, EXTFLG) 503 506 __XFS_HAS_V4_FEAT(lazysbcount, LAZYSBCOUNT) 504 - __XFS_ADD_V4_FEAT(attr2, ATTR2) 505 507 __XFS_ADD_V4_FEAT(projid32, PROJID32) 506 508 __XFS_HAS_V4_FEAT(v3inodes, V3INODES) 507 509 __XFS_HAS_V4_FEAT(crc, CRC) 508 510 __XFS_HAS_V4_FEAT(pquotino, PQUOTINO) 511 + 512 + static inline void xfs_add_attr2(struct xfs_mount *mp) 513 + { 514 + if (IS_ENABLED(CONFIG_XFS_SUPPORT_V4)) 515 + xfs_sb_version_addattr2(&mp->m_sb); 516 + } 509 517 510 518 /* 511 
519 * Mount features ··· 519 517 * bit inodes and read-only state, are kept as operational state rather than 520 518 * features. 521 519 */ 522 - __XFS_HAS_FEAT(noattr2, NOATTR2) 523 520 __XFS_HAS_FEAT(noalign, NOALIGN) 524 521 __XFS_HAS_FEAT(allocsize, ALLOCSIZE) 525 522 __XFS_HAS_FEAT(large_iosize, LARGE_IOSIZE) ··· 527 526 __XFS_HAS_FEAT(discard, DISCARD) 528 527 __XFS_HAS_FEAT(grpid, GRPID) 529 528 __XFS_HAS_FEAT(small_inums, SMALL_INUMS) 530 - __XFS_HAS_FEAT(ikeep, IKEEP) 531 529 __XFS_HAS_FEAT(swalloc, SWALLOC) 532 530 __XFS_HAS_FEAT(filestreams, FILESTREAMS) 533 531 __XFS_HAS_FEAT(dax_always, DAX_ALWAYS)

+1 -1

fs/xfs/xfs_notify_failure.c

··· 165 165 uint64_t *bblen) 166 166 { 167 167 u64 dev_start = btp->bt_dax_part_off; 168 - u64 dev_len = bdev_nr_bytes(btp->bt_bdev); 168 + u64 dev_len = BBTOB(btp->bt_nr_sectors); 169 169 u64 dev_end = dev_start + dev_len - 1; 170 170 171 171 /* Notify failure on the whole device. */

+6 -61

fs/xfs/xfs_super.c

··· 105 105 Opt_logbufs, Opt_logbsize, Opt_logdev, Opt_rtdev, 106 106 Opt_wsync, Opt_noalign, Opt_swalloc, Opt_sunit, Opt_swidth, Opt_nouuid, 107 107 Opt_grpid, Opt_nogrpid, Opt_bsdgroups, Opt_sysvgroups, 108 - Opt_allocsize, Opt_norecovery, Opt_inode64, Opt_inode32, Opt_ikeep, 109 - Opt_noikeep, Opt_largeio, Opt_nolargeio, Opt_attr2, Opt_noattr2, 108 + Opt_allocsize, Opt_norecovery, Opt_inode64, Opt_inode32, 109 + Opt_largeio, Opt_nolargeio, 110 110 Opt_filestreams, Opt_quota, Opt_noquota, Opt_usrquota, Opt_grpquota, 111 111 Opt_prjquota, Opt_uquota, Opt_gquota, Opt_pquota, 112 112 Opt_uqnoenforce, Opt_gqnoenforce, Opt_pqnoenforce, Opt_qnoenforce, ··· 133 133 fsparam_flag("norecovery", Opt_norecovery), 134 134 fsparam_flag("inode64", Opt_inode64), 135 135 fsparam_flag("inode32", Opt_inode32), 136 - fsparam_flag("ikeep", Opt_ikeep), 137 - fsparam_flag("noikeep", Opt_noikeep), 138 136 fsparam_flag("largeio", Opt_largeio), 139 137 fsparam_flag("nolargeio", Opt_nolargeio), 140 - fsparam_flag("attr2", Opt_attr2), 141 - fsparam_flag("noattr2", Opt_noattr2), 142 138 fsparam_flag("filestreams", Opt_filestreams), 143 139 fsparam_flag("quota", Opt_quota), 144 140 fsparam_flag("noquota", Opt_noquota), ··· 171 175 { 172 176 static struct proc_xfs_info xfs_info_set[] = { 173 177 /* the few simple ones we can get from the mount struct */ 174 - { XFS_FEAT_IKEEP, ",ikeep" }, 175 178 { XFS_FEAT_WSYNC, ",wsync" }, 176 179 { XFS_FEAT_NOALIGN, ",noalign" }, 177 180 { XFS_FEAT_SWALLOC, ",swalloc" }, 178 181 { XFS_FEAT_NOUUID, ",nouuid" }, 179 182 { XFS_FEAT_NORECOVERY, ",norecovery" }, 180 - { XFS_FEAT_ATTR2, ",attr2" }, 181 183 { XFS_FEAT_FILESTREAMS, ",filestreams" }, 182 184 { XFS_FEAT_GRPID, ",grpid" }, 183 185 { XFS_FEAT_DISCARD, ",discard" }, ··· 535 541 { 536 542 int error; 537 543 538 - error = xfs_configure_buftarg(mp->m_ddev_targp, mp->m_sb.sb_sectsize); 544 + error = xfs_configure_buftarg(mp->m_ddev_targp, mp->m_sb.sb_sectsize, 545 + mp->m_sb.sb_dblocks); 539 546 if (error) 
540 547 return error; 541 548 ··· 546 551 if (xfs_has_sector(mp)) 547 552 log_sector_size = mp->m_sb.sb_logsectsize; 548 553 error = xfs_configure_buftarg(mp->m_logdev_targp, 549 - log_sector_size); 554 + log_sector_size, mp->m_sb.sb_logblocks); 550 555 if (error) 551 556 return error; 552 557 } ··· 560 565 mp->m_rtdev_targp = mp->m_ddev_targp; 561 566 } else if (mp->m_rtname) { 562 567 error = xfs_configure_buftarg(mp->m_rtdev_targp, 563 - mp->m_sb.sb_sectsize); 568 + mp->m_sb.sb_sectsize, mp->m_sb.sb_rblocks); 564 569 if (error) 565 570 return error; 566 571 } ··· 1084 1089 } 1085 1090 1086 1091 /* 1087 - * V5 filesystems always use attr2 format for attributes. 1088 - */ 1089 - if (xfs_has_crc(mp) && xfs_has_noattr2(mp)) { 1090 - xfs_warn(mp, "Cannot mount a V5 filesystem as noattr2. " 1091 - "attr2 is always enabled for V5 filesystems."); 1092 - return -EINVAL; 1093 - } 1094 - 1095 - /* 1096 1092 * prohibit r/w mounts of read-only filesystems 1097 1093 */ 1098 1094 if ((mp->m_sb.sb_flags & XFS_SBF_READONLY) && !xfs_is_readonly(mp)) { ··· 1529 1543 return 0; 1530 1544 #endif 1531 1545 /* Following mount options will be removed in September 2025 */ 1532 - case Opt_ikeep: 1533 - xfs_fs_warn_deprecated(fc, param, XFS_FEAT_IKEEP, true); 1534 - parsing_mp->m_features |= XFS_FEAT_IKEEP; 1535 - return 0; 1536 - case Opt_noikeep: 1537 - xfs_fs_warn_deprecated(fc, param, XFS_FEAT_IKEEP, false); 1538 - parsing_mp->m_features &= ~XFS_FEAT_IKEEP; 1539 - return 0; 1540 - case Opt_attr2: 1541 - xfs_fs_warn_deprecated(fc, param, XFS_FEAT_ATTR2, true); 1542 - parsing_mp->m_features |= XFS_FEAT_ATTR2; 1543 - return 0; 1544 - case Opt_noattr2: 1545 - xfs_fs_warn_deprecated(fc, param, XFS_FEAT_NOATTR2, true); 1546 - parsing_mp->m_features |= XFS_FEAT_NOATTR2; 1547 - return 0; 1548 1546 case Opt_max_open_zones: 1549 1547 parsing_mp->m_max_open_zones = result.uint_32; 1550 1548 return 0; ··· 1563 1593 xfs_warn(mp, "no-recovery mounts must be read-only."); 1564 1594 return -EINVAL; 
1565 1595 } 1566 - 1567 - /* 1568 - * We have not read the superblock at this point, so only the attr2 1569 - * mount option can set the attr2 feature by this stage. 1570 - */ 1571 - if (xfs_has_attr2(mp) && xfs_has_noattr2(mp)) { 1572 - xfs_warn(mp, "attr2 and noattr2 cannot both be specified."); 1573 - return -EINVAL; 1574 - } 1575 - 1576 1596 1577 1597 if (xfs_has_noalign(mp) && (mp->m_dalign || mp->m_swidth)) { 1578 1598 xfs_warn(mp, ··· 2137 2177 error = xfs_fs_validate_params(new_mp); 2138 2178 if (error) 2139 2179 return error; 2140 - 2141 - /* attr2 -> noattr2 */ 2142 - if (xfs_has_noattr2(new_mp)) { 2143 - if (xfs_has_crc(mp)) { 2144 - xfs_warn(mp, 2145 - "attr2 is always enabled for a V5 filesystem - can't be changed."); 2146 - return -EINVAL; 2147 - } 2148 - mp->m_features &= ~XFS_FEAT_ATTR2; 2149 - mp->m_features |= XFS_FEAT_NOATTR2; 2150 - } else if (xfs_has_attr2(new_mp)) { 2151 - /* noattr2 -> attr2 */ 2152 - mp->m_features &= ~XFS_FEAT_NOATTR2; 2153 - mp->m_features |= XFS_FEAT_ATTR2; 2154 - } 2155 2180 2156 2181 /* Validate new max_atomic_write option before making other changes */ 2157 2182 if (mp->m_awu_max_bytes != new_mp->m_awu_max_bytes) {

+1 -28

fs/xfs/xfs_sysctl.c

··· 50 50 } 51 51 #endif /* CONFIG_PROC_FS */ 52 52 53 - STATIC int 53 + static inline int 54 54 xfs_deprecated_dointvec_minmax( 55 55 const struct ctl_table *ctl, 56 56 int write, ··· 67 67 } 68 68 69 69 static const struct ctl_table xfs_table[] = { 70 - { 71 - .procname = "irix_sgid_inherit", 72 - .data = &xfs_params.sgid_inherit.val, 73 - .maxlen = sizeof(int), 74 - .mode = 0644, 75 - .proc_handler = xfs_deprecated_dointvec_minmax, 76 - .extra1 = &xfs_params.sgid_inherit.min, 77 - .extra2 = &xfs_params.sgid_inherit.max 78 - }, 79 - { 80 - .procname = "irix_symlink_mode", 81 - .data = &xfs_params.symlink_mode.val, 82 - .maxlen = sizeof(int), 83 - .mode = 0644, 84 - .proc_handler = xfs_deprecated_dointvec_minmax, 85 - .extra1 = &xfs_params.symlink_mode.min, 86 - .extra2 = &xfs_params.symlink_mode.max 87 - }, 88 70 { 89 71 .procname = "panic_mask", 90 72 .data = &xfs_params.panic_mask.val, ··· 164 182 .maxlen = sizeof(int), 165 183 .mode = 0644, 166 184 .proc_handler = proc_dointvec_minmax, 167 - .extra1 = &xfs_params.blockgc_timer.min, 168 - .extra2 = &xfs_params.blockgc_timer.max, 169 - }, 170 - { 171 - .procname = "speculative_cow_prealloc_lifetime", 172 - .data = &xfs_params.blockgc_timer.val, 173 - .maxlen = sizeof(int), 174 - .mode = 0644, 175 - .proc_handler = xfs_deprecated_dointvec_minmax, 176 185 .extra1 = &xfs_params.blockgc_timer.min, 177 186 .extra2 = &xfs_params.blockgc_timer.max, 178 187 },

-3

fs/xfs/xfs_sysctl.h

··· 19 19 } xfs_sysctl_val_t; 20 20 21 21 typedef struct xfs_param { 22 - xfs_sysctl_val_t sgid_inherit; /* Inherit S_ISGID if process' GID is 23 - * not a member of parent dir GID. */ 24 - xfs_sysctl_val_t symlink_mode; /* Link creat mode affected by umask */ 25 22 xfs_sysctl_val_t panic_mask; /* bitmask to cause panic on errors. */ 26 23 xfs_sysctl_val_t error_level; /* Degree of reporting for problems */ 27 24 xfs_sysctl_val_t syncd_timer; /* Interval between xfssyncd wakeups */

+12 -11

fs/xfs/xfs_trans.c

··· 452 452 */ 453 453 STATIC void 454 454 xfs_trans_apply_sb_deltas( 455 - xfs_trans_t *tp) 455 + struct xfs_trans *tp) 456 456 { 457 - struct xfs_dsb *sbp; 458 - struct xfs_buf *bp; 459 - int whole = 0; 460 - 461 - bp = xfs_trans_getsb(tp); 462 - sbp = bp->b_addr; 457 + struct xfs_mount *mp = tp->t_mountp; 458 + struct xfs_buf *bp = xfs_trans_getsb(tp); 459 + struct xfs_dsb *sbp = bp->b_addr; 460 + int whole = 0; 463 461 464 462 /* 465 463 * Only update the superblock counters if we are logging them 466 464 */ 467 - if (!xfs_has_lazysbcount((tp->t_mountp))) { 465 + if (!xfs_has_lazysbcount(mp)) { 468 466 if (tp->t_icount_delta) 469 467 be64_add_cpu(&sbp->sb_icount, tp->t_icount_delta); 470 468 if (tp->t_ifree_delta) ··· 489 491 * write the correct value ondisk. 490 492 */ 491 493 if ((tp->t_frextents_delta || tp->t_res_frextents_delta) && 492 - !xfs_has_rtgroups(tp->t_mountp)) { 493 - struct xfs_mount *mp = tp->t_mountp; 494 + !xfs_has_rtgroups(mp)) { 494 495 int64_t rtxdelta; 495 496 496 497 rtxdelta = tp->t_frextents_delta + tp->t_res_frextents_delta; ··· 502 505 503 506 if (tp->t_dblocks_delta) { 504 507 be64_add_cpu(&sbp->sb_dblocks, tp->t_dblocks_delta); 508 + mp->m_ddev_targp->bt_nr_sectors += 509 + XFS_FSB_TO_BB(mp, tp->t_dblocks_delta); 505 510 whole = 1; 506 511 } 507 512 if (tp->t_agcount_delta) { ··· 523 524 * recompute the ondisk rtgroup block log. The incore values 524 525 * will be recomputed in xfs_trans_unreserve_and_mod_sb. 525 526 */ 526 - if (xfs_has_rtgroups(tp->t_mountp)) { 527 + if (xfs_has_rtgroups(mp)) { 527 528 sbp->sb_rgblklog = xfs_compute_rgblklog( 528 529 be32_to_cpu(sbp->sb_rgextents), 529 530 be32_to_cpu(sbp->sb_rextsize)); ··· 536 537 } 537 538 if (tp->t_rblocks_delta) { 538 539 be64_add_cpu(&sbp->sb_rblocks, tp->t_rblocks_delta); 540 + mp->m_rtdev_targp->bt_nr_sectors += 541 + XFS_FSB_TO_BB(mp, tp->t_rblocks_delta); 539 542 whole = 1; 540 543 } 541 544 if (tp->t_rextents_delta) {
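The `xfs_trans.c` hunk keeps the buffer target's cached size (`bt_nr_sectors`) in step with growfs by converting the filesystem-block delta into 512-byte basic blocks with `XFS_FSB_TO_BB()`. The `xfs_notify_failure.c` hunk turns that sector count back into bytes with `BBTOB()`. The sketch below models both conversions with hypothetical helpers. It assumes the usual XFS basic-block shift of 9 and a per-mount `blkbb_log` equal to `blocklog - 9`. It also uses multiplication rather than a left shift so that negative (shrink) deltas stay well defined in standard C.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BBSHIFT	9	/* 512-byte basic blocks */

/* Hypothetical stand-in for the buffer target's cached device size. */
struct sketch_buftarg {
	int64_t	nr_sectors;	/* device size in basic blocks */
};

/* FS blocks -> basic blocks; blkbb_log = blocklog - SKETCH_BBSHIFT */
static int64_t sketch_fsb_to_bb(unsigned int blkbb_log, int64_t fsbs)
{
	return fsbs * ((int64_t)1 << blkbb_log);
}

/* basic blocks -> bytes */
static int64_t sketch_bbtob(int64_t bbs)
{
	return bbs * ((int64_t)1 << SKETCH_BBSHIFT);
}

/* Apply a grow (positive) or shrink (negative) data-block delta. */
static void sketch_apply_dblocks_delta(struct sketch_buftarg *btp,
				       unsigned int blkbb_log, int64_t delta)
{
	btp->nr_sectors += sketch_fsb_to_bb(blkbb_log, delta);
}
```

With 4 KiB blocks (blocklog 12, so `blkbb_log` is 3), growing by 250 blocks adds 2000 sectors, and the byte length derived from the sector count matches the new block count times 4096.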

+1 -1

fs/xfs/xfs_trans_ail.c

··· 374 374 * If log item pinning is enabled, skip the push and track the item as 375 375 * pinned. This can help induce head-behind-tail conditions. 376 376 */ 377 - if (XFS_TEST_ERROR(false, ailp->ail_log->l_mp, XFS_ERRTAG_LOG_ITEM_PIN)) 377 + if (XFS_TEST_ERROR(ailp->ail_log->l_mp, XFS_ERRTAG_LOG_ITEM_PIN)) 378 378 return XFS_ITEM_PINNED; 379 379 380 380 /*
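The `XFS_TEST_ERROR()` call sites in this merge drop the leading expression argument. Callers now write the real condition themselves and OR in the error-injection check, for example `error || XFS_TEST_ERROR(mp, tag)`. The standalone sketch below mimics that calling convention with a hypothetical injection hook that fires on every Nth evaluation. The kernel's own hook is configured per error tag at runtime and is not reproduced here. The sketch shows that `||` still skips the injection check entirely when the real condition is already true.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-mount injection state; not the kernel's structure. */
struct sketch_mount {
	unsigned int	freq;	/* fire on every freq-th check, 0 = off */
	unsigned int	checks;	/* how many times the hook ran */
};

static bool sketch_errortag_test(struct sketch_mount *mp)
{
	mp->checks++;
	return mp->freq && (mp->checks % mp->freq) == 0;
}

/* New-style shape: the macro takes only the mount (plus a tag in XFS). */
#define SKETCH_TEST_ERROR(mp)	sketch_errortag_test(mp)

/* Caller pattern modelled on the log I/O error hunk. */
static bool sketch_should_shutdown(struct sketch_mount *mp, int error)
{
	return error || SKETCH_TEST_ERROR(mp);
}
```

A real I/O error short-circuits the check, so the hook's counter does not advance; with injection enabled at a frequency of 2, every second error-free completion is treated as a failure.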

+59 -63

fs/xfs/xfs_zone_alloc.c

··· 493 493 return oz; 494 494 } 495 495 496 - /* 497 - * For data with short or medium lifetime, try to colocated it into an 498 - * already open zone with a matching temperature. 499 - */ 500 - static bool 501 - xfs_colocate_eagerly( 502 - enum rw_hint file_hint) 503 - { 504 - switch (file_hint) { 505 - case WRITE_LIFE_MEDIUM: 506 - case WRITE_LIFE_SHORT: 507 - case WRITE_LIFE_NONE: 508 - return true; 509 - default: 510 - return false; 511 - } 512 - } 496 + enum xfs_zone_alloc_score { 497 + /* Any open zone will do it, we're desperate */ 498 + XFS_ZONE_ALLOC_ANY = 0, 513 499 514 - static bool 515 - xfs_good_hint_match( 516 - struct xfs_open_zone *oz, 517 - enum rw_hint file_hint) 518 - { 519 - switch (oz->oz_write_hint) { 520 - case WRITE_LIFE_LONG: 521 - case WRITE_LIFE_EXTREME: 522 - /* colocate long and extreme */ 523 - if (file_hint == WRITE_LIFE_LONG || 524 - file_hint == WRITE_LIFE_EXTREME) 525 - return true; 526 - break; 527 - case WRITE_LIFE_MEDIUM: 528 - /* colocate medium with medium */ 529 - if (file_hint == WRITE_LIFE_MEDIUM) 530 - return true; 531 - break; 532 - case WRITE_LIFE_SHORT: 533 - case WRITE_LIFE_NONE: 534 - case WRITE_LIFE_NOT_SET: 535 - /* colocate short and none */ 536 - if (file_hint <= WRITE_LIFE_SHORT) 537 - return true; 538 - break; 539 - } 540 - return false; 541 - } 500 + /* It better fit somehow */ 501 + XFS_ZONE_ALLOC_OK = 1, 502 + 503 + /* Only reuse a zone if it fits really well. */ 504 + XFS_ZONE_ALLOC_GOOD = 2, 505 + }; 506 + 507 + /* 508 + * Life time hint co-location matrix. Fields not set default to 0 509 + * aka XFS_ZONE_ALLOC_ANY. 
510 + */ 511 + static const unsigned int 512 + xfs_zoned_hint_score[WRITE_LIFE_HINT_NR][WRITE_LIFE_HINT_NR] = { 513 + [WRITE_LIFE_NOT_SET] = { 514 + [WRITE_LIFE_NOT_SET] = XFS_ZONE_ALLOC_OK, 515 + }, 516 + [WRITE_LIFE_NONE] = { 517 + [WRITE_LIFE_NONE] = XFS_ZONE_ALLOC_OK, 518 + }, 519 + [WRITE_LIFE_SHORT] = { 520 + [WRITE_LIFE_SHORT] = XFS_ZONE_ALLOC_GOOD, 521 + }, 522 + [WRITE_LIFE_MEDIUM] = { 523 + [WRITE_LIFE_MEDIUM] = XFS_ZONE_ALLOC_GOOD, 524 + }, 525 + [WRITE_LIFE_LONG] = { 526 + [WRITE_LIFE_LONG] = XFS_ZONE_ALLOC_OK, 527 + [WRITE_LIFE_EXTREME] = XFS_ZONE_ALLOC_OK, 528 + }, 529 + [WRITE_LIFE_EXTREME] = { 530 + [WRITE_LIFE_LONG] = XFS_ZONE_ALLOC_OK, 531 + [WRITE_LIFE_EXTREME] = XFS_ZONE_ALLOC_OK, 532 + }, 533 + }; 542 534 543 535 static bool 544 536 xfs_try_use_zone( 545 537 struct xfs_zone_info *zi, 546 538 enum rw_hint file_hint, 547 539 struct xfs_open_zone *oz, 548 - bool lowspace) 540 + unsigned int goodness) 549 541 { 550 542 if (oz->oz_allocated == rtg_blocks(oz->oz_rtg)) 551 543 return false; 552 - if (!lowspace && !xfs_good_hint_match(oz, file_hint)) 544 + 545 + if (xfs_zoned_hint_score[oz->oz_write_hint][file_hint] < goodness) 553 546 return false; 547 + 554 548 if (!atomic_inc_not_zero(&oz->oz_ref)) 555 549 return false; 556 550 ··· 575 581 xfs_select_open_zone_lru( 576 582 struct xfs_zone_info *zi, 577 583 enum rw_hint file_hint, 578 - bool lowspace) 584 + unsigned int goodness) 579 585 { 580 586 struct xfs_open_zone *oz; 581 587 582 588 lockdep_assert_held(&zi->zi_open_zones_lock); 583 589 584 590 list_for_each_entry(oz, &zi->zi_open_zones, oz_entry) 585 - if (xfs_try_use_zone(zi, file_hint, oz, lowspace)) 591 + if (xfs_try_use_zone(zi, file_hint, oz, goodness)) 586 592 return oz; 587 593 588 594 cond_resched_lock(&zi->zi_open_zones_lock); ··· 645 651 * data. 
646 652 */ 647 653 spin_lock(&zi->zi_open_zones_lock); 648 - if (xfs_colocate_eagerly(write_hint)) 649 - oz = xfs_select_open_zone_lru(zi, write_hint, false); 650 - else if (pack_tight) 654 + oz = xfs_select_open_zone_lru(zi, write_hint, XFS_ZONE_ALLOC_GOOD); 655 + if (oz) 656 + goto out_unlock; 657 + 658 + if (pack_tight) 651 659 oz = xfs_select_open_zone_mru(zi, write_hint); 652 660 if (oz) 653 661 goto out_unlock; ··· 663 667 goto out_unlock; 664 668 665 669 /* 666 - * Try to colocate cold data with other cold data if we failed to open a 667 - * new zone for it. 670 + * Try to find an zone that is an ok match to colocate data with. 668 671 */ 669 - if (write_hint != WRITE_LIFE_NOT_SET && 670 - !xfs_colocate_eagerly(write_hint)) 671 - oz = xfs_select_open_zone_lru(zi, write_hint, false); 672 - if (!oz) 673 - oz = xfs_select_open_zone_lru(zi, WRITE_LIFE_NOT_SET, false); 674 - if (!oz) 675 - oz = xfs_select_open_zone_lru(zi, WRITE_LIFE_NOT_SET, true); 672 + oz = xfs_select_open_zone_lru(zi, write_hint, XFS_ZONE_ALLOC_OK); 673 + if (oz) 674 + goto out_unlock; 675 + 676 + /* 677 + * Pick the least recently used zone, regardless of hint match 678 + */ 679 + oz = xfs_select_open_zone_lru(zi, write_hint, XFS_ZONE_ALLOC_ANY); 676 680 out_unlock: 677 681 spin_unlock(&zi->zi_open_zones_lock); 678 682 return oz; ··· 1131 1135 if (bdev_open_zones) 1132 1136 mp->m_max_open_zones = bdev_open_zones; 1133 1137 else 1134 - mp->m_max_open_zones = xfs_max_open_zones(mp); 1138 + mp->m_max_open_zones = XFS_DEFAULT_MAX_OPEN_ZONES; 1135 1139 } 1136 1140 1137 1141 if (mp->m_max_open_zones < XFS_MIN_OPEN_ZONES) { ··· 1244 1248 if (!mp->m_zone_info) 1245 1249 return -ENOMEM; 1246 1250 1247 - xfs_info(mp, "%u zones of %u blocks size (%u max open)", 1251 + xfs_info(mp, "%u zones of %u blocks (%u max open zones)", 1248 1252 mp->m_sb.sb_rgcount, mp->m_groups[XG_TYPE_RTG].blocks, 1249 1253 mp->m_max_open_zones); 1250 1254 trace_xfs_zones_mount(mp);
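The new zone selection replaces the two hint-matching helpers with a lookup table. It then retries the LRU scan with a decreasing minimum score: `XFS_ZONE_ALLOC_GOOD`, then `XFS_ZONE_ALLOC_OK`, then `XFS_ZONE_ALLOC_ANY`. Below is a simplified userspace model of that cascade using the same matrix. It leaves out the MRU packing pass, opening a fresh zone, locking and reference counting from the real code, and its type and function names are hypothetical. The hint enum assumes the same ordering as `enum rw_hint` (not set, none, short, medium, long, extreme), with a trailing count entry like the `WRITE_LIFE_HINT_NR` added in this merge.

```c
#include <assert.h>

enum sketch_hint {
	LIFE_NOT_SET, LIFE_NONE, LIFE_SHORT, LIFE_MEDIUM, LIFE_LONG,
	LIFE_EXTREME, LIFE_HINT_NR,	/* count, sizes the matrix */
};

enum sketch_score { ALLOC_ANY = 0, ALLOC_OK = 1, ALLOC_GOOD = 2 };

/* [zone hint][file hint]; unset entries are 0, i.e. ALLOC_ANY */
static const unsigned int sketch_score[LIFE_HINT_NR][LIFE_HINT_NR] = {
	[LIFE_NOT_SET]	= { [LIFE_NOT_SET] = ALLOC_OK },
	[LIFE_NONE]	= { [LIFE_NONE] = ALLOC_OK },
	[LIFE_SHORT]	= { [LIFE_SHORT] = ALLOC_GOOD },
	[LIFE_MEDIUM]	= { [LIFE_MEDIUM] = ALLOC_GOOD },
	[LIFE_LONG]	= { [LIFE_LONG] = ALLOC_OK, [LIFE_EXTREME] = ALLOC_OK },
	[LIFE_EXTREME]	= { [LIFE_LONG] = ALLOC_OK, [LIFE_EXTREME] = ALLOC_OK },
};

struct sketch_zone {
	enum sketch_hint	hint;	/* hint of the data already in the zone */
	int			full;	/* no free blocks left */
};

/* First zone in LRU order whose score meets the threshold, or -1. */
static int sketch_scan(const struct sketch_zone *zones, int nr,
		       enum sketch_hint file_hint, unsigned int goodness)
{
	for (int i = 0; i < nr; i++) {
		if (zones[i].full)
			continue;
		if (sketch_score[zones[i].hint][file_hint] < goodness)
			continue;
		return i;
	}
	return -1;
}

static int sketch_select(const struct sketch_zone *zones, int nr,
			 enum sketch_hint file_hint)
{
	int i = sketch_scan(zones, nr, file_hint, ALLOC_GOOD);

	/* the kernel may also try MRU packing and a new zone between passes */
	if (i < 0)
		i = sketch_scan(zones, nr, file_hint, ALLOC_OK);
	if (i < 0)
		i = sketch_scan(zones, nr, file_hint, ALLOC_ANY);
	return i;
}
```

With this table, short- and medium-lifetime data is only co-located with identical hints on the first pass, long and extreme data share zones from the second pass onward, and the final pass accepts any zone that still has space.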

+1

include/linux/rw_hint.h

··· 14 14 WRITE_LIFE_MEDIUM = RWH_WRITE_LIFE_MEDIUM, 15 15 WRITE_LIFE_LONG = RWH_WRITE_LIFE_LONG, 16 16 WRITE_LIFE_EXTREME = RWH_WRITE_LIFE_EXTREME, 17 + WRITE_LIFE_HINT_NR, 17 18 } __packed; 18 19 19 20 /* Sparse ignores __packed annotations on enums, hence the #ifndef below. */
