Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'erofs-for-5.13-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs fixes from Gao Xiang:
"This mainly fixes 1 lcluster-sized pclusters for the big pcluster
feature, which can be forcely generated by mkfs as a specific on-disk
case for per-(sub)file compression strategies but missed to handle in
runtime properly.

Also, documentation updates are included to fix the broken
illustration due to the ReST conversion by accident and complete the
big pcluster introduction.

Summary:

- update documentation to fix the broken illustration due to ReST
conversion by accident at that time and complete the big pcluster
introduction

- fix 1 lcluster-sized pclusters for the big pcluster feature"

* tag 'erofs-for-5.13-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: fix 1 lcluster-sized pcluster for big pcluster
erofs: update documentation about data compression
erofs: fix broken illustration in documentation

+118 -72
+99 -70
Documentation/filesystems/erofs.rst
···

 - Support POSIX.1e ACLs by using xattrs;

-- Support transparent file compression as an option:
-  LZ4 algorithm with 4 KB fixed-sized output compression for high performance.
+- Support transparent data compression as an option:
+  LZ4 algorithm with the fixed-sized output compression for high performance.

 The following git tree provides the file system user-space tools under
 development (ex, formatting tool mkfs.erofs):
···

 ::

-    |-> aligned with 8B
-    |-> followed closely
-    + meta_blkaddr blocks |-> another slot
-    _____________________________________________________________________
-    | ... | inode | xattrs | extents | data inline | ... | inode ...
-    |________|_______|(optional)|(optional)|__(optional)_|_____|__________
-    |-> aligned with the inode slot size
-    . .
-    . .
-    . .
-    . .
-    . .
-    . .
-    .____________________________________________________|-> aligned with 4B
-    | xattr_ibody_header | shared xattrs | inline xattrs |
-    |____________________|_______________|_______________|
-    |-> 12 bytes <-|->x * 4 bytes<-| .
-    . . .
-    . . .
-    . . .
-    ._______________________________.______________________.
-    | id | id | id | id | ... | id | ent | ... | ent| ... |
-    |____|____|____|____|______|____|_____|_____|____|_____|
-    |-> aligned with 4B
-    |-> aligned with 4B
+    |-> aligned with 8B
+    |-> followed closely
+    + meta_blkaddr blocks |-> another slot
+    _____________________________________________________________________
+    | ... | inode | xattrs | extents | data inline | ... | inode ...
+    |________|_______|(optional)|(optional)|__(optional)_|_____|__________
+    |-> aligned with the inode slot size
+    . .
+    . .
+    . .
+    . .
+    . .
+    . .
+    .____________________________________________________|-> aligned with 4B
+    | xattr_ibody_header | shared xattrs | inline xattrs |
+    |____________________|_______________|_______________|
+    |-> 12 bytes <-|->x * 4 bytes<-| .
+    . . .
+    . . .
+    . . .
+    ._______________________________.______________________.
+    | id | id | id | id | ... | id | ent | ... | ent| ... |
+    |____|____|____|____|______|____|_____|_____|____|_____|
+    |-> aligned with 4B
+    |-> aligned with 4B

 Inode could be 32 or 64 bytes, which can be distinguished from a common
 field which all inode versions have -- i_format::
···
 Each share xattr can also be directly found by the following formula:
      xattr offset = xattr_blkaddr * block_size + 4 * xattr_id

-::
+::

-    |-> aligned by 4 bytes
-    + xattr_blkaddr blocks |-> aligned with 4 bytes
-    _________________________________________________________________________
-    | ... | xattr_entry | xattr data | ... | xattr_entry | xattr data ...
-    |________|_____________|_____________|_____|______________|_______________
+    |-> aligned by 4 bytes
+    + xattr_blkaddr blocks |-> aligned with 4 bytes
+    _________________________________________________________________________
+    | ... | xattr_entry | xattr data | ... | xattr_entry | xattr data ...
+    |________|_____________|_____________|_____|______________|_______________

 Directories
 -----------
···

 ::

-    ___________________________
-    / |
-    / ______________|________________
-    / / | nameoff1 | nameoffN-1
-    ____________.______________._______________v________________v__________
-    | dirent | dirent | ... | dirent | filename | filename | ... | filename |
-    |___.0___|____1___|_____|___N-1__|____0_____|____1_____|_____|___N-1____|
-    \ ^
-    \ | * could have
-    \ | trailing '\0'
-    \________________________| nameoff0
-
-    Directory block
+    ___________________________
+    / |
+    / ______________|________________
+    / / | nameoff1 | nameoffN-1
+    ____________.______________._______________v________________v__________
+    | dirent | dirent | ... | dirent | filename | filename | ... | filename |
+    |___.0___|____1___|_____|___N-1__|____0_____|____1_____|_____|___N-1____|
+    \ ^
+    \ | * could have
+    \ | trailing '\0'
+    \________________________| nameoff0
+    Directory block

 Note that apart from the offset of the first filename, nameoff0 also indicates
 the total number of directory entries in this block since it is no need to
 introduce another on-disk field at all.

-Compression
------------
-Currently, EROFS supports 4KB fixed-sized output transparent file compression,
-as illustrated below::
+Data compression
+----------------
+EROFS implements LZ4 fixed-sized output compression which generates fixed-sized
+compressed data blocks from variable-sized input in contrast to other existing
+fixed-sized input solutions. Relatively higher compression ratios can be gotten
+by using fixed-sized output compression since nowadays popular data compression
+algorithms are mostly LZ77-based and such fixed-sized output approach can be
+benefited from the historical dictionary (aka. sliding window).

-    |---- Variant-Length Extent ----|-------- VLE --------|----- VLE -----
-    clusterofs clusterofs clusterofs
-    | | | logical data
-    _________v_______________________________v_____________________v_______________
-    ... | . | | . | | . | ...
-    ____|____.________|_____________|________.____|_____________|__.__________|____
-    |-> cluster <-|-> cluster <-|-> cluster <-|-> cluster <-|-> cluster <-|
-    size size size size size
-    . . . .
-    . . . .
-    . . . .
-    _______._____________._____________._____________._____________________
-    ... | | | | ... physical data
-    _______|_____________|_____________|_____________|_____________________
-    |-> cluster <-|-> cluster <-|-> cluster <-|
-    size size size
+In details, original (uncompressed) data is turned into several variable-sized
+extents and in the meanwhile, compressed into physical clusters (pclusters).
+In order to record each variable-sized extent, logical clusters (lclusters) are
+introduced as the basic unit of compress indexes to indicate whether a new
+extent is generated within the range (HEAD) or not (NONHEAD). Lclusters are now
+fixed in block size, as illustrated below::

-Currently each on-disk physical cluster can contain 4KB (un)compressed data
-at most. For each logical cluster, there is a corresponding on-disk index to
-describe its cluster type, physical cluster address, etc.
+    |<- variable-sized extent ->|<- VLE ->|
+    clusterofs clusterofs clusterofs
+    | | |
+    _________v_________________________________v_______________________v________
+    ... | . | | . | | . ...
+    ____|____._________|______________|________.___ _|______________|__.________
+    |-> lcluster <-|-> lcluster <-|-> lcluster <-|-> lcluster <-|
+    (HEAD) (NONHEAD) (HEAD) (NONHEAD) .
+    . CBLKCNT . .
+    . . .
+    . . .
+    _______._____________________________.______________._________________
+    ... | | | | ...
+    _______|______________|______________|______________|_________________
+    |-> big pcluster <-|-> pcluster <-|

-See "struct z_erofs_vle_decompressed_index" in erofs_fs.h for more details.
+A physical cluster can be seen as a container of physical compressed blocks
+which contains compressed data. Previously, only lcluster-sized (4KB) pclusters
+were supported. After big pcluster feature is introduced (available since
+Linux v5.13), pcluster can be a multiple of lcluster size.
+
+For each HEAD lcluster, clusterofs is recorded to indicate where a new extent
+starts and blkaddr is used to seek the compressed data. For each NONHEAD
+lcluster, delta0 and delta1 are available instead of blkaddr to indicate the
+distance to its HEAD lcluster and the next HEAD lcluster. A PLAIN lcluster is
+also a HEAD lcluster except that its data is uncompressed. See the comments
+around "struct z_erofs_vle_decompressed_index" in erofs_fs.h for more details.
+
+If big pcluster is enabled, pcluster size in lclusters needs to be recorded as
+well. Let the delta0 of the first NONHEAD lcluster store the compressed block
+count with a special flag as a new called CBLKCNT NONHEAD lcluster. It's easy
+to understand its delta0 is constantly 1, as illustrated below::
+
+    __________________________________________________________
+    | HEAD | NONHEAD | NONHEAD | ... | NONHEAD | HEAD | HEAD |
+    |__:___|_(CBLKCNT)_|_________|_____|_________|__:___|____:_|
+    |<----- a big pcluster (with CBLKCNT) ------>|<-- -->|
+    a lcluster-sized pcluster (without CBLKCNT) ^
+
+If another HEAD follows a HEAD lcluster, there is no room to record CBLKCNT,
+but it's easy to know the size of such pcluster is 1 lcluster as well.
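
The documentation hunks above state two small on-disk rules: the shared xattr offset formula (xattr offset = xattr_blkaddr * block_size + 4 * xattr_id) and the fact that nameoff0 also gives the number of directory entries in a block. A minimal C sketch of both follows. It assumes a 4 KiB block size and the 12-byte on-disk dirent of struct erofs_dirent in erofs_fs.h; neither value is stated in the diff itself, and the sketch_* names are hypothetical helpers, not kernel functions.

```c
#include <stdint.h>

/* Assumption: 4 KiB blocks; a real reader takes this from the superblock. */
#define SKETCH_BLOCK_SIZE	4096u
/* Assumption: on-disk dirent size, as for struct erofs_dirent (12 bytes). */
#define SKETCH_DIRENT_SIZE	12u

/* Byte offset of a shared xattr, following the formula in the documentation. */
static uint64_t sketch_shared_xattr_offset(uint32_t xattr_blkaddr,
					   uint32_t xattr_id)
{
	return (uint64_t)xattr_blkaddr * SKETCH_BLOCK_SIZE +
	       4ull * xattr_id;
}

/*
 * Dirents are packed at the start of a directory block and the first
 * filename starts right after the last dirent, so nameoff0 divided by
 * the dirent size is the number of entries in the block.
 */
static unsigned int sketch_dirent_count(uint16_t nameoff0)
{
	return nameoff0 / SKETCH_DIRENT_SIZE;
}
```

For instance, under these assumptions a shared xattr with id 3 in an area starting at block 2 sits at byte 2 * 4096 + 4 * 3 = 8204, and a directory block whose nameoff0 is 36 holds 3 entries.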
+19 -2
fs/erofs/zmap.c
···
         lcn = m->lcn + 1;
         if (m->compressedlcs)
                 goto out;
-        if (lcn == initial_lcn)
-                goto err_bonus_cblkcnt;

         err = z_erofs_load_cluster_from_disk(m, lcn);
         if (err)
                 return err;

+        /*
+         * If the 1st NONHEAD lcluster has already been handled initially w/o
+         * valid compressedlcs, which means at least it mustn't be CBLKCNT, or
+         * an internal implemenatation error is detected.
+         *
+         * The following code can also handle it properly anyway, but let's
+         * BUG_ON in the debugging mode only for developers to notice that.
+         */
+        DBG_BUGON(lcn == initial_lcn &&
+                  m->type == Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD);
+
         switch (m->type) {
+        case Z_EROFS_VLE_CLUSTER_TYPE_PLAIN:
+        case Z_EROFS_VLE_CLUSTER_TYPE_HEAD:
+                /*
+                 * if the 1st NONHEAD lcluster is actually PLAIN or HEAD type
+                 * rather than CBLKCNT, it's a 1 lcluster-sized pcluster.
+                 */
+                m->compressedlcs = 1;
+                break;
         case Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD:
                 if (m->delta[0] != 1)
                         goto err_bonus_cblkcnt;
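
The rule this hunk implements can be modelled outside the kernel: with big pcluster enabled, the lcluster after a HEAD either carries CBLKCNT, or is itself PLAIN/HEAD, in which case the pcluster is 1 lcluster in size. The following is a simplified userspace sketch of that decision, not the kernel code. The sketch_* types, the is_cblkcnt flag and the handling of a PLAIN head or of the last lcluster are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the Z_EROFS_VLE_CLUSTER_TYPE_* values. */
enum sketch_type { SKETCH_PLAIN, SKETCH_HEAD, SKETCH_NONHEAD };

struct sketch_lcluster {
	enum sketch_type type;
	int is_cblkcnt;		/* hypothetical: NONHEAD carrying CBLKCNT */
	unsigned int cblkcnt;	/* compressed block count if is_cblkcnt */
};

/*
 * Size, in lclusters, of the pcluster that starts at index `head`.
 * Returns 0 if `head` does not start a pcluster or the index is malformed.
 */
static unsigned int sketch_pcluster_lcs(const struct sketch_lcluster *idx,
					size_t n, size_t head)
{
	const struct sketch_lcluster *next;

	if (head >= n || idx[head].type == SKETCH_NONHEAD)
		return 0;	/* not the start of a pcluster */
	if (idx[head].type == SKETCH_PLAIN || head + 1 == n)
		return 1;	/* assumption: uncompressed or last lcluster */

	next = &idx[head + 1];
	switch (next->type) {
	case SKETCH_PLAIN:
	case SKETCH_HEAD:
		/* no room for CBLKCNT: a 1 lcluster-sized pcluster */
		return 1;
	case SKETCH_NONHEAD:
		/* the first NONHEAD must be the CBLKCNT one */
		return next->is_cblkcnt ? next->cblkcnt : 0;
	}
	return 0;
}
```

Before the fix, the second case (a HEAD directly followed by another HEAD or PLAIN lcluster) was treated as an error; the sketch returns 1 there, matching the new `m->compressedlcs = 1` branch.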