Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux
1
fork

Configure Feed

Select the types of activity you want to include in your feed.

xfs: fix various problems in xfs_atomic_write_cow_iomap_begin

I think there are several things wrong with this function:

A) xfs_bmapi_write can return a much larger unwritten mapping than what
the caller asked for. We convert part of that range to written, but
return the entire written mapping to iomap even though that's
inaccurate.

B) The arguments to xfs_reflink_convert_cow_locked are wrong -- an
unwritten mapping could be *smaller* than the write range (or even
the hole range). In this case, we convert too much file range to
written state because we then return a smaller mapping to iomap.

C) It doesn't handle delalloc mappings. This I covered in the patch
that I already sent to the list.

D) Reassigning count_fsb to handle the hole means that if the second
cmap lookup attempt succeeds (due to racing with someone else) we
trim the mapping more than is strictly necessary. The changing
meaning of count_fsb makes this harder to notice.

E) The tracepoint is kinda wrong because @length is mutated. That makes
it harder to chase the data flows through this function because you
can't just grep on the pos/bytecount strings.

F) We don't actually check that the br_state = XFS_EXT_NORM assignment
is accurate, i.e that the cow fork actually contains a written
mapping for the range we're interested in

G) Somewhat inadequate documentation of why we need to xfs_trim_extent
so aggressively in this function.

H) Not sure why xfs_iomap_end_fsb is used here, the vfs already clamped
the write range to s_maxbytes.

Fix these issues, and then the atomic writes regressions in generic/760,
generic/617, generic/091, generic/263, and generic/521 all go away for
me.

Cc: stable@vger.kernel.org # v6.16
Fixes: bd1d2c21d5d249 ("xfs: add xfs_atomic_write_cow_iomap_begin()")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>

authored by

Darrick J. Wong and committed by
Carlos Maiolino
8d7bba1e 8d54eacd

+50 -11
+50 -11
fs/xfs/xfs_iomap.c
··· 1091 1091 }; 1092 1092 #endif /* CONFIG_XFS_RT */ 1093 1093 1094 + #ifdef DEBUG 1095 + static void 1096 + xfs_check_atomic_cow_conversion( 1097 + struct xfs_inode *ip, 1098 + xfs_fileoff_t offset_fsb, 1099 + xfs_filblks_t count_fsb, 1100 + const struct xfs_bmbt_irec *cmap) 1101 + { 1102 + struct xfs_iext_cursor icur; 1103 + struct xfs_bmbt_irec cmap2 = { }; 1104 + 1105 + if (xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, &cmap2)) 1106 + xfs_trim_extent(&cmap2, offset_fsb, count_fsb); 1107 + 1108 + ASSERT(cmap2.br_startoff == cmap->br_startoff); 1109 + ASSERT(cmap2.br_blockcount == cmap->br_blockcount); 1110 + ASSERT(cmap2.br_startblock == cmap->br_startblock); 1111 + ASSERT(cmap2.br_state == cmap->br_state); 1112 + } 1113 + #else 1114 + # define xfs_check_atomic_cow_conversion(...) ((void)0) 1115 + #endif 1116 + 1094 1117 static int 1095 1118 xfs_atomic_write_cow_iomap_begin( 1096 1119 struct inode *inode, ··· 1125 1102 { 1126 1103 struct xfs_inode *ip = XFS_I(inode); 1127 1104 struct xfs_mount *mp = ip->i_mount; 1128 - const xfs_fileoff_t offset_fsb = XFS_B_TO_FSBT(mp, offset); 1129 - xfs_fileoff_t end_fsb = xfs_iomap_end_fsb(mp, offset, length); 1130 - xfs_filblks_t count_fsb = end_fsb - offset_fsb; 1105 + const xfs_fileoff_t offset_fsb = XFS_B_TO_FSBT(mp, offset); 1106 + const xfs_fileoff_t end_fsb = XFS_B_TO_FSB(mp, offset + length); 1107 + const xfs_filblks_t count_fsb = end_fsb - offset_fsb; 1108 + xfs_filblks_t hole_count_fsb; 1131 1109 int nmaps = 1; 1132 1110 xfs_filblks_t resaligned; 1133 1111 struct xfs_bmbt_irec cmap; ··· 1167 1143 if (cmap.br_startoff <= offset_fsb) { 1168 1144 if (isnullstartblock(cmap.br_startblock)) 1169 1145 goto convert_delay; 1146 + 1147 + /* 1148 + * cmap could extend outside the write range due to previous 1149 + * speculative preallocations. We must trim cmap to the write 1150 + * range because the cow fork treats written mappings to mean 1151 + * "write in progress". 1152 + */ 1170 1153 xfs_trim_extent(&cmap, offset_fsb, count_fsb); 1171 1154 goto found; 1172 1155 } 1173 1156 1174 - end_fsb = cmap.br_startoff; 1175 - count_fsb = end_fsb - offset_fsb; 1157 + hole_count_fsb = cmap.br_startoff - offset_fsb; 1176 1158 1177 - resaligned = xfs_aligned_fsb_count(offset_fsb, count_fsb, 1159 + resaligned = xfs_aligned_fsb_count(offset_fsb, hole_count_fsb, 1178 1160 xfs_get_cowextsz_hint(ip)); 1179 1161 xfs_iunlock(ip, XFS_ILOCK_EXCL); 1180 1162 ··· 1216 1186 * atomic writes to that same range will be aligned (and don't require 1217 1187 * this COW-based method). 1218 1188 */ 1219 - error = xfs_bmapi_write(tp, ip, offset_fsb, count_fsb, 1189 + error = xfs_bmapi_write(tp, ip, offset_fsb, hole_count_fsb, 1220 1190 XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC | 1221 1191 XFS_BMAPI_EXTSZALIGN, 0, &cmap, &nmaps); 1222 1192 if (error) { ··· 1229 1199 if (error) 1230 1200 goto out_unlock; 1231 1201 1202 + /* 1203 + * cmap could map more blocks than the range we passed into bmapi_write 1204 + * because of EXTSZALIGN or adjacent pre-existing unwritten mappings 1205 + * that were merged. Trim cmap to the original write range so that we 1206 + * don't convert more than we were asked to do for this write. 1207 + */ 1208 + xfs_trim_extent(&cmap, offset_fsb, count_fsb); 1209 + 1232 1210 found: 1233 1211 if (cmap.br_state != XFS_EXT_NORM) { 1234 - error = xfs_reflink_convert_cow_locked(ip, offset_fsb, 1235 - count_fsb); 1212 + error = xfs_reflink_convert_cow_locked(ip, cmap.br_startoff, 1213 + cmap.br_blockcount); 1236 1214 if (error) 1237 1215 goto out_unlock; 1238 1216 cmap.br_state = XFS_EXT_NORM; 1217 + xfs_check_atomic_cow_conversion(ip, offset_fsb, count_fsb, 1218 + &cmap); 1239 1219 } 1240 1220 1241 - length = XFS_FSB_TO_B(mp, cmap.br_startoff + cmap.br_blockcount); 1242 - trace_xfs_iomap_found(ip, offset, length - offset, XFS_COW_FORK, &cmap); 1221 + trace_xfs_iomap_found(ip, offset, length, XFS_COW_FORK, &cmap); 1243 1222 seq = xfs_iomap_inode_sequence(ip, IOMAP_F_SHARED); 1244 1223 xfs_iunlock(ip, XFS_ILOCK_EXCL); 1245 1224 return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags, IOMAP_F_SHARED, seq);