Merge tag 'erofs-for-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

+12 -8

Documentation/ABI/testing/sysfs-fs-erofs

··· 3 3 Contact: "Huang Jianan" <huangjianan@oppo.com> 4 4 Description: Shows all enabled kernel features. 5 5 Supported features: 6 - zero_padding, compr_cfgs, big_pcluster, chunked_file, 7 - device_table, compr_head2, sb_chksum, ztailpacking, 8 - dedupe, fragments, 48bit, metabox. 6 + compr_cfgs, big_pcluster, chunked_file, device_table, 7 + compr_head2, sb_chksum, ztailpacking, dedupe, fragments, 8 + 48bit, metabox. 9 9 10 10 What: /sys/fs/erofs/<disk>/sync_decompress 11 11 Date: November 2021 12 12 Contact: "Huang Jianan" <huangjianan@oppo.com> 13 - Description: Control strategy of sync decompression: 13 + Description: Control strategy of synchronous decompression. Synchronous 14 + decompression tries to decompress in the reader thread for 15 + synchronous reads and small asynchronous reads (<= 12 KiB): 14 16 15 - - 0 (default, auto): enable for readpage, and enable for 16 - readahead on atomic contexts only. 17 - - 1 (force on): enable for readpage and readahead. 18 - - 2 (force off): disable for all situations. 17 + - 0 (auto, default): apply to synchronous reads only, but will 18 + switch to 1 (force on) if any decompression 19 + request is detected in atomic contexts; 20 + - 1 (force on): apply to synchronous reads and small 21 + asynchronous reads; 22 + - 2 (force off): disable synchronous decompression completely. 19 23 20 24 What: /sys/fs/erofs/<disk>/drop_caches 21 25 Date: November 2024
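The revised sync_decompress description above can be read as a small decision table. The following is a userspace sketch of that table only, not the kernel implementation: the enum, function names and the `SMALL_ASYNC_READ_BYTES` constant are illustrative assumptions, and only the three modes, the 12 KiB bound and the auto-to-force-on switch come from the documentation text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values match the documented sysfs settings 0, 1 and 2. */
enum sync_decompress_mode {
	SYNC_AUTO = 0,
	SYNC_FORCE_ON = 1,
	SYNC_FORCE_OFF = 2,
};

/* "small asynchronous reads (<= 12 KiB)" per the documentation text. */
#define SMALL_ASYNC_READ_BYTES (12 * 1024)

/* Hypothetical helper: auto switches to force on once a decompression
 * request has been seen in an atomic context; other modes are unchanged. */
enum sync_decompress_mode note_atomic_request(enum sync_decompress_mode mode)
{
	return mode == SYNC_AUTO ? SYNC_FORCE_ON : mode;
}

/* Hypothetical helper: should this read be decompressed in the reader thread? */
bool decompress_in_reader(enum sync_decompress_mode mode, bool sync_read,
			  size_t bytes)
{
	if (mode == SYNC_FORCE_OFF)
		return false;
	if (sync_read)
		return true;	/* both auto and force on cover synchronous reads */
	return mode == SYNC_FORCE_ON && bytes <= SMALL_ASYNC_READ_BYTES;
}
```

Under this reading, an asynchronous 4 KiB read stays asynchronous in mode 0 until an atomic-context request flips the policy to mode 1.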

+13 -5

Documentation/filesystems/erofs.rst

··· 63 63 - Support POSIX.1e ACLs by using extended attributes; 64 64 65 65 - Support transparent data compression as an option: 66 - LZ4, MicroLZMA and DEFLATE algorithms can be used on a per-file basis; In 67 - addition, inplace decompression is also supported to avoid bounce compressed 68 - buffers and unnecessary page cache thrashing. 66 + LZ4, MicroLZMA, DEFLATE and Zstandard algorithms can be used on a per-file 67 + basis; In addition, inplace decompression is also supported to avoid bounce 68 + compressed buffers and unnecessary page cache thrashing. 69 69 70 70 - Support chunk-based data deduplication and rolling-hash compressed data 71 71 deduplication; ··· 125 125 Documentation/filesystems/dax.rst. 126 126 dax A legacy option which is an alias for ``dax=always``. 127 127 device=%s Specify a path to an extra device to be used together. 128 + directio (For file-backed mounts) Use direct I/O to access backing 129 + files, and asynchronous I/O will be enabled if supported. 128 130 fsid=%s Specify a filesystem image ID for Fscache back-end. 129 - domain_id=%s Specify a domain ID in fscache mode so that different images 130 - with the same blobs under a given domain ID can share storage. 131 + domain_id=%s Specify a trusted domain ID for fscache mode so that 132 + different images with the same blobs, identified by blob IDs, 133 + can share storage within the same trusted domain. 134 + Also used for different filesystems with inode page sharing 135 + enabled to share page cache within the trusted domain. 131 136 fsoffset=%llu Specify block-aligned filesystem offset for the primary device. 137 + inode_share Enable inode page sharing for this filesystem. Inodes with 138 + identical content within the same domain ID can share the 139 + page cache. 132 140 =================== ========================================================= 133 141 134 142 Sysfs Entries

+12 -8

fs/erofs/Kconfig

··· 112 112 config EROFS_FS_ZIP_LZMA 113 113 bool "EROFS LZMA compressed data support" 114 114 depends on EROFS_FS_ZIP 115 + default y 115 116 help 116 117 Saying Y here includes support for reading EROFS file systems 117 118 containing LZMA compressed data, specifically called microLZMA. It 118 119 gives better compression ratios than the default LZ4 format, at the 119 120 expense of more CPU overhead. 120 121 121 - If unsure, say N. 122 + Say N if you want to disable LZMA compression support. 122 123 123 124 config EROFS_FS_ZIP_DEFLATE 124 125 bool "EROFS DEFLATE compressed data support" ··· 130 129 ratios than the default LZ4 format, while it costs more CPU 131 130 overhead. 132 131 133 - DEFLATE support is an experimental feature for now and so most 134 - file systems will be readable without selecting this option. 135 - 136 132 If unsure, say N. 137 133 138 134 config EROFS_FS_ZIP_ZSTD ··· 139 141 Saying Y here includes support for reading EROFS file systems 140 142 containing Zstandard compressed data. It gives better compression 141 143 ratios than the default LZ4 format, while it costs more CPU 142 - overhead. 143 - 144 - Zstandard support is an experimental feature for now and so most 145 - file systems will be readable without selecting this option. 144 + overhead and memory footprint. 146 145 147 146 If unsure, say N. 148 147 ··· 187 192 help 188 193 This permits EROFS to configure per-CPU kthread workers to run 189 194 at higher priority. 195 + 196 + If unsure, say N. 197 + 198 + config EROFS_FS_PAGE_CACHE_SHARE 199 + bool "EROFS page cache share support (experimental)" 200 + depends on EROFS_FS && EROFS_FS_XATTR && !EROFS_FS_ONDEMAND 201 + help 202 + This enables page cache sharing among inodes with identical 203 + content fingerprints on the same machine. 190 204 191 205 If unsure, say N.

+1

fs/erofs/Makefile

··· 10 10 erofs-$(CONFIG_EROFS_FS_ZIP_ACCEL) += decompressor_crypto.o 11 11 erofs-$(CONFIG_EROFS_FS_BACKED_BY_FILE) += fileio.o 12 12 erofs-$(CONFIG_EROFS_FS_ONDEMAND) += fscache.o 13 + erofs-$(CONFIG_EROFS_FS_PAGE_CACHE_SHARE) += ishare.o

+32 -24

fs/erofs/data.c

··· 270 270 struct erofs_iomap_iter_ctx { 271 271 struct page *page; 272 272 void *base; 273 + struct inode *realinode; 273 274 }; 274 275 275 276 static int erofs_iomap_begin(struct inode *inode, loff_t offset, loff_t length, ··· 278 277 { 279 278 struct iomap_iter *iter = container_of(iomap, struct iomap_iter, iomap); 280 279 struct erofs_iomap_iter_ctx *ctx = iter->private; 281 - struct super_block *sb = inode->i_sb; 280 + struct inode *realinode = ctx ? ctx->realinode : inode; 281 + struct super_block *sb = realinode->i_sb; 282 282 struct erofs_map_blocks map; 283 283 struct erofs_map_dev mdev; 284 284 int ret; 285 285 286 286 map.m_la = offset; 287 287 map.m_llen = length; 288 - ret = erofs_map_blocks(inode, &map); 288 + ret = erofs_map_blocks(realinode, &map); 289 289 if (ret < 0) 290 290 return ret; 291 291 ··· 299 297 return 0; 300 298 } 301 299 302 - if (!(map.m_flags & EROFS_MAP_META) || !erofs_inode_in_metabox(inode)) { 300 + if (!(map.m_flags & EROFS_MAP_META) || !erofs_inode_in_metabox(realinode)) { 303 301 mdev = (struct erofs_map_dev) { 304 302 .m_deviceid = map.m_deviceid, 305 303 .m_pa = map.m_pa, ··· 325 323 void *ptr; 326 324 327 325 ptr = erofs_read_metabuf(&buf, sb, map.m_pa, 328 - erofs_inode_in_metabox(inode)); 326 + erofs_inode_in_metabox(realinode)); 329 327 if (IS_ERR(ptr)) 330 328 return PTR_ERR(ptr); 331 329 iomap->inline_data = ptr; ··· 366 364 u64 start, u64 len) 367 365 { 368 366 if (erofs_inode_is_data_compressed(EROFS_I(inode)->datalayout)) { 369 - #ifdef CONFIG_EROFS_FS_ZIP 367 + if (!IS_ENABLED(CONFIG_EROFS_FS_ZIP)) 368 + return -EOPNOTSUPP; 370 369 return iomap_fiemap(inode, fieinfo, start, len, 371 370 &z_erofs_iomap_report_ops); 372 - #else 373 - return -EOPNOTSUPP; 374 - #endif 375 371 } 376 372 return iomap_fiemap(inode, fieinfo, start, len, &erofs_iomap_ops); 377 373 } ··· 384 384 .ops = &iomap_bio_read_ops, 385 385 .cur_folio = folio, 386 386 }; 387 - struct erofs_iomap_iter_ctx iter_ctx = {}; 387 + bool need_iput; 388 + 
struct erofs_iomap_iter_ctx iter_ctx = { 389 + .realinode = erofs_real_inode(folio_inode(folio), &need_iput), 390 + }; 388 391 389 - trace_erofs_read_folio(folio, true); 390 - 392 + trace_erofs_read_folio(iter_ctx.realinode, folio, true); 391 393 iomap_read_folio(&erofs_iomap_ops, &read_ctx, &iter_ctx); 394 + if (need_iput) 395 + iput(iter_ctx.realinode); 392 396 return 0; 393 397 } 394 398 ··· 402 398 .ops = &iomap_bio_read_ops, 403 399 .rac = rac, 404 400 }; 405 - struct erofs_iomap_iter_ctx iter_ctx = {}; 401 + bool need_iput; 402 + struct erofs_iomap_iter_ctx iter_ctx = { 403 + .realinode = erofs_real_inode(rac->mapping->host, &need_iput), 404 + }; 406 405 407 - trace_erofs_readahead(rac->mapping->host, readahead_index(rac), 408 - readahead_count(rac), true); 409 - 406 + trace_erofs_readahead(iter_ctx.realinode, readahead_index(rac), 407 + readahead_count(rac), true); 410 408 iomap_readahead(&erofs_iomap_ops, &read_ctx, &iter_ctx); 409 + if (need_iput) 410 + iput(iter_ctx.realinode); 411 411 } 412 412 413 413 static sector_t erofs_bmap(struct address_space *mapping, sector_t block) ··· 427 419 if (!iov_iter_count(to)) 428 420 return 0; 429 421 430 - #ifdef CONFIG_FS_DAX 431 - if (IS_DAX(inode)) 422 + if (IS_ENABLED(CONFIG_FS_DAX) && IS_DAX(inode)) 432 423 return dax_iomap_rw(iocb, to, &erofs_iomap_ops); 433 - #endif 424 + 434 425 if ((iocb->ki_flags & IOCB_DIRECT) && inode->i_sb->s_bdev) { 435 - struct erofs_iomap_iter_ctx iter_ctx = {}; 426 + struct erofs_iomap_iter_ctx iter_ctx = { 427 + .realinode = inode, 428 + }; 436 429 437 430 return iomap_dio_rw(iocb, to, &erofs_iomap_ops, 438 431 NULL, 0, &iter_ctx, 0); ··· 489 480 struct inode *inode = file->f_mapping->host; 490 481 const struct iomap_ops *ops = &erofs_iomap_ops; 491 482 492 - if (erofs_inode_is_data_compressed(EROFS_I(inode)->datalayout)) 493 - #ifdef CONFIG_EROFS_FS_ZIP 483 + if (erofs_inode_is_data_compressed(EROFS_I(inode)->datalayout)) { 484 + if (!IS_ENABLED(CONFIG_EROFS_FS_ZIP)) 485 + return 
generic_file_llseek(file, offset, whence); 494 486 ops = &z_erofs_iomap_report_ops; 495 - #else 496 - return generic_file_llseek(file, offset, whence); 497 - #endif 487 + } 498 488 499 489 if (whence == SEEK_HOLE) 500 490 offset = iomap_seek_hole(inode, offset, ops);
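The read paths in this hunk share one calling convention: resolve the real inode, use it, and drop the reference only if `need_iput` was set. The sketch below models that convention in userspace. How the shared case actually resolves the real inode is not shown in this excerpt, so `struct obj`, its `real` pointer and the plain integer refcount are hypothetical stand-ins; only the "call, use, conditionally put" shape is taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct obj {
	int refs;		/* simplified, non-atomic reference count */
	struct obj *real;	/* hypothetical: backing object of a shared wrapper, else NULL */
};

/* Return the object to operate on; tell the caller whether it now holds
 * an extra reference that it must drop afterwards. */
struct obj *obj_real(struct obj *o, bool *need_put)
{
	if (!o->real) {
		*need_put = false;	/* same shape as the !PAGE_CACHE_SHARE stub */
		return o;
	}
	o->real->refs++;
	*need_put = true;
	return o->real;
}

void obj_put(struct obj *o)
{
	o->refs--;
}

/* Caller pattern: resolve, use, conditionally put. Returns the refcount
 * observed while the object was in use. */
int read_with_real(struct obj *o)
{
	bool need_put;
	struct obj *real = obj_real(o, &need_put);
	int seen = real->refs;

	if (need_put)
		obj_put(real);
	return seen;
}
```

The out-parameter keeps the common case (no sharing) free of reference-count traffic, at the cost of every caller having to pair the lookup with the conditional put.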

+38 -47

fs/erofs/decompressor.c

··· 34 34 } 35 35 } else { 36 36 distance = le16_to_cpu(dsb->u1.lz4_max_distance); 37 + if (!distance && !erofs_sb_has_lz4_0padding(sbi)) 38 + return 0; 37 39 sbi->lz4.max_pclusterblks = 1; 40 + sbi->available_compr_algs = 1 << Z_EROFS_COMPRESSION_LZ4; 38 41 } 39 42 40 43 sbi->lz4.max_distance_pages = distance ? ··· 198 195 return NULL; 199 196 } 200 197 201 - static int z_erofs_lz4_decompress_mem(struct z_erofs_decompress_req *rq, u8 *dst) 198 + static const char *__z_erofs_lz4_decompress(struct z_erofs_decompress_req *rq, 199 + u8 *dst) 202 200 { 203 - bool support_0padding = false, may_inplace = false; 201 + bool may_inplace = false; 204 202 unsigned int inputmargin; 205 203 u8 *out, *headpage, *src; 206 204 const char *reason; 207 205 int ret, maptype; 208 206 209 - DBG_BUGON(*rq->in == NULL); 210 207 headpage = kmap_local_page(*rq->in); 211 - 212 - /* LZ4 decompression inplace is only safe if zero_padding is enabled */ 213 - if (erofs_sb_has_zero_padding(EROFS_SB(rq->sb))) { 214 - support_0padding = true; 215 - reason = z_erofs_fixup_insize(rq, headpage + rq->pageofs_in, 216 - min_t(unsigned int, rq->inputsize, 217 - rq->sb->s_blocksize - rq->pageofs_in)); 218 - if (reason) { 219 - kunmap_local(headpage); 220 - return IS_ERR(reason) ? 
PTR_ERR(reason) : -EFSCORRUPTED; 221 - } 222 - may_inplace = !((rq->pageofs_in + rq->inputsize) & 223 - (rq->sb->s_blocksize - 1)); 208 + reason = z_erofs_fixup_insize(rq, headpage + rq->pageofs_in, 209 + min_t(unsigned int, rq->inputsize, 210 + rq->sb->s_blocksize - rq->pageofs_in)); 211 + if (reason) { 212 + kunmap_local(headpage); 213 + return reason; 224 214 } 215 + may_inplace = !((rq->pageofs_in + rq->inputsize) & 216 + (rq->sb->s_blocksize - 1)); 225 217 226 218 inputmargin = rq->pageofs_in; 227 219 src = z_erofs_lz4_handle_overlap(rq, headpage, dst, &inputmargin, 228 220 &maptype, may_inplace); 229 221 if (IS_ERR(src)) 230 - return PTR_ERR(src); 222 + return ERR_CAST(src); 231 223 232 224 out = dst + rq->pageofs_out; 233 - /* legacy format could compress extra data in a pcluster. */ 234 - if (rq->partial_decoding || !support_0padding) 225 + if (rq->partial_decoding) 235 226 ret = LZ4_decompress_safe_partial(src + inputmargin, out, 236 227 rq->inputsize, rq->outputsize, rq->outputsize); 237 228 else 238 229 ret = LZ4_decompress_safe(src + inputmargin, out, 239 230 rq->inputsize, rq->outputsize); 231 + if (ret == rq->outputsize) 232 + reason = NULL; 233 + else if (ret < 0) 234 + reason = "corrupted compressed data"; 235 + else 236 + reason = "unexpected end of stream"; 240 237 241 - if (ret != rq->outputsize) { 242 - if (ret >= 0) 243 - memset(out + ret, 0, rq->outputsize - ret); 244 - ret = -EFSCORRUPTED; 245 - } else { 246 - ret = 0; 247 - } 248 - 249 - if (maptype == 0) { 238 + if (!maptype) { 250 239 kunmap_local(headpage); 251 240 } else if (maptype == 1) { 252 241 vm_unmap_ram(src, rq->inpages); ··· 246 251 z_erofs_put_gbuf(src); 247 252 } else if (maptype != 3) { 248 253 DBG_BUGON(1); 249 - return -EFAULT; 254 + return ERR_PTR(-EFAULT); 250 255 } 251 - return ret; 256 + return reason; 252 257 } 253 258 254 259 static const char *z_erofs_lz4_decompress(struct z_erofs_decompress_req *rq, 255 260 struct page **pagepool) 256 261 { 257 262 unsigned int 
dst_maptype; 263 + const char *reason; 258 264 void *dst; 259 265 int ret; 260 266 ··· 279 283 dst_maptype = 2; 280 284 } 281 285 } 282 - ret = z_erofs_lz4_decompress_mem(rq, dst); 286 + reason = __z_erofs_lz4_decompress(rq, dst); 283 287 if (!dst_maptype) 284 288 kunmap_local(dst); 285 289 else if (dst_maptype == 2) 286 290 vm_unmap_ram(dst, rq->outpages); 287 - return ERR_PTR(ret); 291 + return reason; 288 292 } 289 293 290 294 static const char *z_erofs_transform_plain(struct z_erofs_decompress_req *rq, ··· 448 452 { 449 453 struct erofs_sb_info *sbi = EROFS_SB(sb); 450 454 struct erofs_buf buf = __EROFS_BUF_INITIALIZER; 451 - unsigned int algs, alg; 455 + unsigned long algs, alg; 452 456 erofs_off_t offset; 453 457 int size, ret = 0; 454 458 455 - if (!erofs_sb_has_compr_cfgs(sbi)) { 456 - sbi->available_compr_algs = 1 << Z_EROFS_COMPRESSION_LZ4; 459 + if (!erofs_sb_has_compr_cfgs(sbi)) 457 460 return z_erofs_load_lz4_config(sb, dsb, NULL, 0); 458 - } 459 461 460 - sbi->available_compr_algs = le16_to_cpu(dsb->u1.available_compr_algs); 461 - if (sbi->available_compr_algs & ~Z_EROFS_ALL_COMPR_ALGS) { 462 - erofs_err(sb, "unidentified algorithms %x, please upgrade kernel", 463 - sbi->available_compr_algs & ~Z_EROFS_ALL_COMPR_ALGS); 462 + algs = le16_to_cpu(dsb->u1.available_compr_algs); 463 + sbi->available_compr_algs = algs; 464 + if (algs & ~Z_EROFS_ALL_COMPR_ALGS) { 465 + erofs_err(sb, "unidentified algorithms %lx, please upgrade kernel", 466 + algs & ~Z_EROFS_ALL_COMPR_ALGS); 464 467 return -EOPNOTSUPP; 465 468 } 466 469 467 470 (void)erofs_init_metabuf(&buf, sb, false); 468 471 offset = EROFS_SUPER_OFFSET + sbi->sb_size; 469 - alg = 0; 470 - for (algs = sbi->available_compr_algs; algs; algs >>= 1, ++alg) { 472 + for_each_set_bit(alg, &algs, Z_EROFS_COMPRESSION_MAX) { 471 473 const struct z_erofs_decompressor *dec = z_erofs_decomp[alg]; 472 474 void *data; 473 - 474 - if (!(algs & 1)) 475 - continue; 476 475 477 476 data = erofs_read_metadata(sb, &buf, 
&offset, &size); 478 477 if (IS_ERR(data)) { ··· 475 484 break; 476 485 } 477 486 478 - if (alg < Z_EROFS_COMPRESSION_MAX && dec && dec->config) { 487 + if (dec && dec->config) { 479 488 ret = dec->config(sb, dsb, data, size); 480 489 } else { 481 - erofs_err(sb, "algorithm %d isn't enabled on this kernel", 490 + erofs_err(sb, "algorithm %ld isn't enabled on this kernel", 482 491 alg); 483 492 ret = -EOPNOTSUPP; 484 493 }

+1 -1

fs/erofs/decompressor_crypto.c

··· 62 62 struct crypto_acomp *tfm; 63 63 }; 64 64 65 - struct z_erofs_crypto_engine *z_erofs_crypto[Z_EROFS_COMPRESSION_MAX] = { 65 + static struct z_erofs_crypto_engine *z_erofs_crypto[Z_EROFS_COMPRESSION_MAX] = { 66 66 [Z_EROFS_COMPRESSION_LZ4] = (struct z_erofs_crypto_engine[]) { 67 67 {}, 68 68 },

-1

fs/erofs/decompressor_deflate.c

··· 89 89 inited = true; 90 90 } 91 91 mutex_unlock(&deflate_resize_mutex); 92 - erofs_info(sb, "EXPERIMENTAL DEFLATE feature in use. Use at your own risk!"); 93 92 return 0; 94 93 failed: 95 94 mutex_unlock(&deflate_resize_mutex);

+4 -3

fs/erofs/erofs_fs.h

··· 17 17 #define EROFS_FEATURE_COMPAT_XATTR_FILTER 0x00000004 18 18 #define EROFS_FEATURE_COMPAT_SHARED_EA_IN_METABOX 0x00000008 19 19 #define EROFS_FEATURE_COMPAT_PLAIN_XATTR_PFX 0x00000010 20 - 20 + #define EROFS_FEATURE_COMPAT_ISHARE_XATTRS 0x00000020 21 21 22 22 /* 23 23 * Any bits that aren't in EROFS_ALL_FEATURE_INCOMPAT should 24 24 * be incompatible with this kernel version. 25 25 */ 26 - #define EROFS_FEATURE_INCOMPAT_ZERO_PADDING 0x00000001 26 + #define EROFS_FEATURE_INCOMPAT_LZ4_0PADDING 0x00000001 27 27 #define EROFS_FEATURE_INCOMPAT_COMPR_CFGS 0x00000002 28 28 #define EROFS_FEATURE_INCOMPAT_BIG_PCLUSTER 0x00000002 29 29 #define EROFS_FEATURE_INCOMPAT_CHUNKED_FILE 0x00000004 ··· 83 83 __le32 xattr_prefix_start; /* start of long xattr prefixes */ 84 84 __le64 packed_nid; /* nid of the special packed inode */ 85 85 __u8 xattr_filter_reserved; /* reserved for xattr name filter */ 86 - __u8 reserved[3]; 86 + __u8 ishare_xattr_prefix_id; 87 + __u8 reserved[2]; 87 88 __le32 build_time; /* seconds added to epoch for mkfs time */ 88 89 __le64 rootnid_8b; /* (48BIT on) nid of root directory */ 89 90 __le64 reserved2;
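The superblock change above carves `ishare_xattr_prefix_id` out of the former `reserved[3]` array, which should leave every later field at its old offset. A compile-time check of that property might look like the following. The two structs are trimmed, hypothetical excerpts covering only the fields visible in this hunk, with fixed-width userspace types standing in for `__u8` and `__le32`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Excerpt of the old layout around the changed bytes (illustrative). */
struct sb_tail_old {
	uint8_t xattr_filter_reserved;
	uint8_t reserved[3];
	uint32_t build_time;
};

/* Same excerpt after the change: one reserved byte gets a name. */
struct sb_tail_new {
	uint8_t xattr_filter_reserved;
	uint8_t ishare_xattr_prefix_id;
	uint8_t reserved[2];
	uint32_t build_time;
};

/* The on-disk layout must not move: same size, same offset for later fields. */
static_assert(sizeof(struct sb_tail_old) == sizeof(struct sb_tail_new),
	      "superblock excerpt changed size");
static_assert(offsetof(struct sb_tail_old, build_time) ==
	      offsetof(struct sb_tail_new, build_time),
	      "build_time moved");
```

Since the new field occupies a byte that older images presumably left as zero, a reader would gate its use on the new `EROFS_FEATURE_COMPAT_ISHARE_XATTRS` bit rather than on the byte value alone; that gating is an inference from the feature flag added in the same hunk.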

+30 -22

fs/erofs/fileio.c

··· 10 10 struct bio bio; 11 11 struct kiocb iocb; 12 12 struct super_block *sb; 13 + refcount_t ref; 13 14 }; 14 15 15 16 struct erofs_fileio { ··· 25 24 container_of(iocb, struct erofs_fileio_rq, iocb); 26 25 struct folio_iter fi; 27 26 28 - if (ret > 0) { 29 - if (ret != rq->bio.bi_iter.bi_size) { 30 - bio_advance(&rq->bio, ret); 31 - zero_fill_bio(&rq->bio); 32 - } 33 - ret = 0; 27 + if (ret >= 0 && ret != rq->bio.bi_iter.bi_size) { 28 + bio_advance(&rq->bio, ret); 29 + zero_fill_bio(&rq->bio); 34 30 } 35 - if (rq->bio.bi_end_io) { 36 - if (ret < 0 && !rq->bio.bi_status) 37 - rq->bio.bi_status = errno_to_blk_status(ret); 38 - } else { 31 + if (!rq->bio.bi_end_io) { 39 32 bio_for_each_folio_all(fi, &rq->bio) { 40 33 DBG_BUGON(folio_test_uptodate(fi.folio)); 41 - erofs_onlinefolio_end(fi.folio, ret, false); 34 + erofs_onlinefolio_end(fi.folio, ret < 0, false); 42 35 } 36 + } else if (ret < 0 && !rq->bio.bi_status) { 37 + rq->bio.bi_status = errno_to_blk_status(ret); 43 38 } 44 39 bio_endio(&rq->bio); 45 40 bio_uninit(&rq->bio); 46 - kfree(rq); 41 + if (refcount_dec_and_test(&rq->ref)) 42 + kfree(rq); 47 43 } 48 44 49 45 static void erofs_fileio_rq_submit(struct erofs_fileio_rq *rq) 50 46 { 51 47 struct iov_iter iter; 52 - int ret; 48 + ssize_t ret; 53 49 54 50 if (!rq) 55 51 return; ··· 62 64 ret = vfs_iocb_iter_read(rq->iocb.ki_filp, &rq->iocb, &iter); 63 65 if (ret != -EIOCBQUEUED) 64 66 erofs_fileio_ki_complete(&rq->iocb, ret); 67 + if (refcount_dec_and_test(&rq->ref)) 68 + kfree(rq); 65 69 } 66 70 67 71 static struct erofs_fileio_rq *erofs_fileio_rq_alloc(struct erofs_map_dev *mdev) ··· 74 74 bio_init(&rq->bio, NULL, rq->bvecs, ARRAY_SIZE(rq->bvecs), REQ_OP_READ); 75 75 rq->iocb.ki_filp = mdev->m_dif->file; 76 76 rq->sb = mdev->m_sb; 77 + refcount_set(&rq->ref, 2); 77 78 return rq; 78 79 } 79 80 ··· 89 88 bio)); 90 89 } 91 90 92 - static int erofs_fileio_scan_folio(struct erofs_fileio *io, struct folio *folio) 91 + static int erofs_fileio_scan_folio(struct 
erofs_fileio *io, 92 + struct inode *inode, struct folio *folio) 93 93 { 94 - struct inode *inode = folio_inode(folio); 95 94 struct erofs_map_blocks *map = &io->map; 96 95 unsigned int cur = 0, end = folio_size(folio), len, attached = 0; 97 96 loff_t pos = folio_pos(folio), ofs; ··· 159 158 160 159 static int erofs_fileio_read_folio(struct file *file, struct folio *folio) 161 160 { 161 + bool need_iput; 162 + struct inode *realinode = erofs_real_inode(folio_inode(folio), &need_iput); 162 163 struct erofs_fileio io = {}; 163 164 int err; 164 165 165 - trace_erofs_read_folio(folio, true); 166 - err = erofs_fileio_scan_folio(&io, folio); 166 + trace_erofs_read_folio(realinode, folio, true); 167 + err = erofs_fileio_scan_folio(&io, realinode, folio); 167 168 erofs_fileio_rq_submit(io.rq); 169 + if (need_iput) 170 + iput(realinode); 168 171 return err; 169 172 } 170 173 171 174 static void erofs_fileio_readahead(struct readahead_control *rac) 172 175 { 173 - struct inode *inode = rac->mapping->host; 176 + bool need_iput; 177 + struct inode *realinode = erofs_real_inode(rac->mapping->host, &need_iput); 174 178 struct erofs_fileio io = {}; 175 179 struct folio *folio; 176 180 int err; 177 181 178 - trace_erofs_readahead(inode, readahead_index(rac), 182 + trace_erofs_readahead(realinode, readahead_index(rac), 179 183 readahead_count(rac), true); 180 184 while ((folio = readahead_folio(rac))) { 181 - err = erofs_fileio_scan_folio(&io, folio); 185 + err = erofs_fileio_scan_folio(&io, realinode, folio); 182 186 if (err && err != -EINTR) 183 - erofs_err(inode->i_sb, "readahead error at folio %lu @ nid %llu", 184 - folio->index, EROFS_I(inode)->nid); 187 + erofs_err(realinode->i_sb, "readahead error at folio %lu @ nid %llu", 188 + folio->index, EROFS_I(realinode)->nid); 185 189 } 186 190 erofs_fileio_rq_submit(io.rq); 191 + if (need_iput) 192 + iput(realinode); 187 193 } 188 194 189 195 const struct address_space_operations erofs_fileio_aops = {
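The fileio.c hunk initialises the request's reference count to 2 and has both the completion callback and the submitter drop one reference, so the request is freed by whichever side finishes last. A userspace sketch of that two-owner pattern using C11 atomics is shown below. The kernel code uses `refcount_t`; `struct request`, `request_alloc()` and `request_put()` here are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct request {
	atomic_int ref;		/* one reference per owner */
};

/* Allocate a request owned by two parties: submitter and completion. */
struct request *request_alloc(void)
{
	struct request *rq = malloc(sizeof(*rq));

	if (!rq)
		return NULL;
	atomic_init(&rq->ref, 2);
	return rq;
}

/* Drop one reference; free on the last one. Returns true if this call freed rq. */
bool request_put(struct request *rq)
{
	/* atomic_fetch_sub returns the value before the decrement. */
	if (atomic_fetch_sub(&rq->ref, 1) != 1)
		return false;
	free(rq);
	return true;
}
```

With this shape neither side needs to know whether the other has already run: the completion may fire before or after the submitting call returns, and the free happens exactly once either way.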

+2 -15

fs/erofs/fscache.c

··· 3 3 * Copyright (C) 2022, Alibaba Cloud 4 4 * Copyright (C) 2022, Bytedance Inc. All rights reserved. 5 5 */ 6 - #include <linux/pseudo_fs.h> 7 6 #include <linux/fscache.h> 8 7 #include "internal.h" 9 8 ··· 11 12 static LIST_HEAD(erofs_domain_list); 12 13 static LIST_HEAD(erofs_domain_cookies_list); 13 14 static struct vfsmount *erofs_pseudo_mnt; 14 - 15 - static int erofs_anon_init_fs_context(struct fs_context *fc) 16 - { 17 - return init_pseudo(fc, EROFS_SUPER_MAGIC) ? 0 : -ENOMEM; 18 - } 19 - 20 - static struct file_system_type erofs_anon_fs_type = { 21 - .owner = THIS_MODULE, 22 - .name = "pseudo_erofs", 23 - .init_fs_context = erofs_anon_init_fs_context, 24 - .kill_sb = kill_anon_super, 25 - }; 26 15 27 16 struct erofs_fscache_io { 28 17 struct netfs_cache_resources cres; ··· 379 392 } 380 393 fscache_relinquish_volume(domain->volume, NULL, false); 381 394 mutex_unlock(&erofs_domain_list_lock); 382 - kfree(domain->domain_id); 395 + kfree_sensitive(domain->domain_id); 383 396 kfree(domain); 384 397 return; 385 398 } ··· 446 459 sbi->domain = domain; 447 460 return 0; 448 461 out: 449 - kfree(domain->domain_id); 462 + kfree_sensitive(domain->domain_id); 450 463 kfree(domain); 451 464 return err; 452 465 }

+38 -40

fs/erofs/inode.c

··· 8 8 #include <linux/compat.h> 9 9 #include <trace/events/erofs.h> 10 10 11 - static int erofs_fill_symlink(struct inode *inode, void *kaddr, 12 - unsigned int m_pofs) 11 + static int erofs_fill_symlink(struct inode *inode, void *bptr, unsigned int ofs) 13 12 { 14 13 struct erofs_inode *vi = EROFS_I(inode); 15 - loff_t off; 14 + char *link; 15 + loff_t end; 16 16 17 - m_pofs += vi->xattr_isize; 18 - /* check if it cannot be handled with fast symlink scheme */ 19 - if (vi->datalayout != EROFS_INODE_FLAT_INLINE || 20 - check_add_overflow(m_pofs, inode->i_size, &off) || 21 - off > i_blocksize(inode)) 22 - return 0; 23 - 24 - inode->i_link = kmemdup_nul(kaddr + m_pofs, inode->i_size, GFP_KERNEL); 25 - return inode->i_link ? 0 : -ENOMEM; 17 + ofs += vi->xattr_isize; 18 + /* check whether the symlink data is small enough to be inlined */ 19 + if (vi->datalayout == EROFS_INODE_FLAT_INLINE && 20 + !check_add_overflow(ofs, inode->i_size, &end) && 21 + end <= i_blocksize(inode)) { 22 + link = kmemdup_nul(bptr + ofs, inode->i_size, GFP_KERNEL); 23 + if (!link) 24 + return -ENOMEM; 25 + if (unlikely(!inode->i_size || strlen(link) != inode->i_size)) { 26 + erofs_err(inode->i_sb, "invalid fast symlink size %llu @ nid %llu", 27 + inode->i_size | 0ULL, vi->nid); 28 + kfree(link); 29 + return -EFSCORRUPTED; 30 + } 31 + inode_set_cached_link(inode, link, inode->i_size); 32 + } 33 + return 0; 26 34 } 27 35 28 36 static int erofs_read_inode(struct inode *inode) ··· 145 137 err = -EFSCORRUPTED; 146 138 goto err_out; 147 139 } 140 + 141 + if (IS_ENABLED(CONFIG_EROFS_FS_POSIX_ACL) && 142 + erofs_inode_has_noacl(inode, ptr, ofs)) 143 + cache_no_acl(inode); 144 + 148 145 switch (inode->i_mode & S_IFMT) { 149 146 case S_IFDIR: 150 147 vi->dot_omitted = (ifmt >> EROFS_I_DOT_OMITTED_BIT) & 1; ··· 183 170 goto err_out; 184 171 } 185 172 186 - if (erofs_inode_is_data_compressed(vi->datalayout)) 187 - inode->i_blocks = le32_to_cpu(copied.i_u.blocks_lo) << 188 - (sb->s_blocksize_bits - 9); 189 
- else 173 + if (!erofs_inode_is_data_compressed(vi->datalayout)) { 190 174 inode->i_blocks = round_up(inode->i_size, sb->s_blocksize) >> 9; 175 + } else if (!IS_ENABLED(CONFIG_EROFS_FS_ZIP) || !sbi->available_compr_algs) { 176 + erofs_err(sb, "compressed inode (nid %llu) is invalid in a plain filesystem", 177 + vi->nid); 178 + err = -EFSCORRUPTED; 179 + goto err_out; 180 + } else { 181 + inode->i_blocks = le32_to_cpu(copied.i_u.blocks_lo) << 182 + (sb->s_blocksize_bits - 9); 183 + } 191 184 192 185 if (vi->datalayout == EROFS_INODE_CHUNK_BASED) { 193 186 /* fill chunked inode summary info */ ··· 222 203 223 204 static int erofs_fill_inode(struct inode *inode) 224 205 { 225 - struct erofs_inode *vi = EROFS_I(inode); 226 206 int err; 227 207 228 208 trace_erofs_fill_inode(inode); ··· 232 214 switch (inode->i_mode & S_IFMT) { 233 215 case S_IFREG: 234 216 inode->i_op = &erofs_generic_iops; 235 - inode->i_fop = &erofs_file_fops; 217 + inode->i_fop = erofs_ishare_fill_inode(inode) ? 218 + &erofs_ishare_fops : &erofs_file_fops; 236 219 break; 237 220 case S_IFDIR: 238 221 inode->i_op = &erofs_dir_iops; ··· 254 235 } 255 236 256 237 mapping_set_large_folios(inode->i_mapping); 257 - if (erofs_inode_is_data_compressed(vi->datalayout)) { 258 - #ifdef CONFIG_EROFS_FS_ZIP 259 - DO_ONCE_LITE_IF(inode->i_blkbits != PAGE_SHIFT, 260 - erofs_info, inode->i_sb, 261 - "EXPERIMENTAL EROFS subpage compressed block support in use. 
Use at your own risk!"); 262 - inode->i_mapping->a_ops = &z_erofs_aops; 263 - #else 264 - err = -EOPNOTSUPP; 265 - #endif 266 - } else { 267 - inode->i_mapping->a_ops = &erofs_aops; 268 - #ifdef CONFIG_EROFS_FS_ONDEMAND 269 - if (erofs_is_fscache_mode(inode->i_sb)) 270 - inode->i_mapping->a_ops = &erofs_fscache_access_aops; 271 - #endif 272 - #ifdef CONFIG_EROFS_FS_BACKED_BY_FILE 273 - if (erofs_is_fileio_mode(EROFS_SB(inode->i_sb))) 274 - inode->i_mapping->a_ops = &erofs_fileio_aops; 275 - #endif 276 - } 277 - 278 - return err; 238 + return erofs_inode_set_aops(inode, inode, false); 279 239 } 280 240 281 241 /*
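The reworked erofs_fill_symlink() above copies the inline target with a terminating NUL and then rejects it if it is empty or if `strlen()` disagrees with the recorded size, which catches embedded NUL bytes. A userspace sketch of that validation follows. `dup_fast_symlink()` is a hypothetical name, and unlike the kernel code it collapses the out-of-memory and corruption cases into a single NULL return.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy `size` bytes of symlink data into a fresh NUL-terminated string.
 * Returns NULL if allocation fails, if the target is empty, or if the data
 * contains an embedded NUL (so strlen() would not match the on-disk size).
 * The caller frees the result.
 */
char *dup_fast_symlink(const char *data, size_t size)
{
	char *link;

	if (!size)
		return NULL;
	link = malloc(size + 1);
	if (!link)
		return NULL;
	memcpy(link, data, size);
	link[size] = '\0';
	if (strlen(link) != size) {
		free(link);
		return NULL;
	}
	return link;
}
```

Validating the length once at fill time is what makes it safe to cache both the string and its size afterwards, as the diff does with `inode_set_cached_link()`.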

+67 -7

fs/erofs/internal.h

··· 59 59 struct erofs_mount_opts { 60 60 /* current strategy of how to use managed cache */ 61 61 unsigned char cache_strategy; 62 - /* strategy of sync decompression (0 - auto, 1 - force on, 2 - force off) */ 63 - unsigned int sync_decompress; 64 - /* threshold for decompression synchronously */ 65 - unsigned int max_sync_decompress_pages; 66 62 unsigned int mount_opt; 67 63 }; 68 64 ··· 112 116 /* managed XArray arranged in physical block number */ 113 117 struct xarray managed_pslots; 114 118 119 + unsigned int sync_decompress; /* strategy for sync decompression */ 115 120 unsigned int shrinker_run_no; 116 - u16 available_compr_algs; 117 121 118 122 /* pseudo inode to manage cached pages */ 119 123 struct inode *managed_cache; ··· 130 134 u32 xattr_blkaddr; 131 135 u32 xattr_prefix_start; 132 136 u8 xattr_prefix_count; 137 + u8 ishare_xattr_prefix_id; 133 138 struct erofs_xattr_prefix_item *xattr_prefixes; 134 139 unsigned int xattr_filter_reserved; 135 140 #endif ··· 153 156 char *volume_name; 154 157 u32 feature_compat; 155 158 u32 feature_incompat; 159 + u16 available_compr_algs; 156 160 157 161 /* sysfs support */ 158 162 struct kobject s_kobj; /* /sys/fs/erofs/<devname> */ ··· 176 178 #define EROFS_MOUNT_DAX_ALWAYS 0x00000040 177 179 #define EROFS_MOUNT_DAX_NEVER 0x00000080 178 180 #define EROFS_MOUNT_DIRECT_IO 0x00000100 181 + #define EROFS_MOUNT_INODE_SHARE 0x00000200 179 182 180 183 #define clear_opt(opt, option) ((opt)->mount_opt &= ~EROFS_MOUNT_##option) 181 184 #define set_opt(opt, option) ((opt)->mount_opt |= EROFS_MOUNT_##option) ··· 186 187 { 187 188 return IS_ENABLED(CONFIG_EROFS_FS_BACKED_BY_FILE) && sbi->dif0.file; 188 189 } 190 + 191 + extern struct file_system_type erofs_anon_fs_type; 189 192 190 193 static inline bool erofs_is_fscache_mode(struct super_block *sb) 191 194 { ··· 221 220 return sbi->feature_##compat & EROFS_FEATURE_##feature; \ 222 221 } 223 222 224 - EROFS_FEATURE_FUNCS(zero_padding, incompat, INCOMPAT_ZERO_PADDING) 223 + 
EROFS_FEATURE_FUNCS(lz4_0padding, incompat, INCOMPAT_LZ4_0PADDING) 225 224 EROFS_FEATURE_FUNCS(compr_cfgs, incompat, INCOMPAT_COMPR_CFGS) 226 225 EROFS_FEATURE_FUNCS(big_pcluster, incompat, INCOMPAT_BIG_PCLUSTER) 227 226 EROFS_FEATURE_FUNCS(chunked_file, incompat, INCOMPAT_CHUNKED_FILE) ··· 237 236 EROFS_FEATURE_FUNCS(xattr_filter, compat, COMPAT_XATTR_FILTER) 238 237 EROFS_FEATURE_FUNCS(shared_ea_in_metabox, compat, COMPAT_SHARED_EA_IN_METABOX) 239 238 EROFS_FEATURE_FUNCS(plain_xattr_pfx, compat, COMPAT_PLAIN_XATTR_PFX) 239 + EROFS_FEATURE_FUNCS(ishare_xattrs, compat, COMPAT_ISHARE_XATTRS) 240 240 241 241 static inline u64 erofs_nid_to_ino64(struct erofs_sb_info *sbi, erofs_nid_t nid) 242 242 { ··· 266 264 267 265 /* default readahead size of directories */ 268 266 #define EROFS_DIR_RA_BYTES 16384 267 + 268 + struct erofs_inode_fingerprint { 269 + u8 *opaque; 270 + int size; 271 + }; 269 272 270 273 struct erofs_inode { 271 274 erofs_nid_t nid; ··· 307 300 }; 308 301 #endif /* CONFIG_EROFS_FS_ZIP */ 309 302 }; 303 + #ifdef CONFIG_EROFS_FS_PAGE_CACHE_SHARE 304 + struct list_head ishare_list; 305 + union { 306 + /* for each anon shared inode */ 307 + struct { 308 + struct erofs_inode_fingerprint fingerprint; 309 + spinlock_t ishare_lock; 310 + }; 311 + /* for each real inode */ 312 + struct inode *sharedinode; 313 + }; 314 + #endif 310 315 /* the corresponding vfs inode */ 311 316 struct inode vfs_inode; 312 317 }; ··· 425 406 426 407 extern const struct file_operations erofs_file_fops; 427 408 extern const struct file_operations erofs_dir_fops; 409 + extern const struct file_operations erofs_ishare_fops; 428 410 429 411 extern const struct iomap_ops z_erofs_iomap_report_ops; 430 412 ··· 471 451 return NULL; 472 452 } 473 453 454 + static inline int erofs_inode_set_aops(struct inode *inode, 455 + struct inode *realinode, bool no_fscache) 456 + { 457 + if (erofs_inode_is_data_compressed(EROFS_I(realinode)->datalayout)) { 458 + if (!IS_ENABLED(CONFIG_EROFS_FS_ZIP)) 
459 + return -EOPNOTSUPP; 460 + DO_ONCE_LITE_IF(realinode->i_blkbits != PAGE_SHIFT, 461 + erofs_info, realinode->i_sb, 462 + "EXPERIMENTAL EROFS subpage compressed block support in use. Use at your own risk!"); 463 + inode->i_mapping->a_ops = &z_erofs_aops; 464 + return 0; 465 + } 466 + inode->i_mapping->a_ops = &erofs_aops; 467 + if (IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) && !no_fscache && 468 + erofs_is_fscache_mode(realinode->i_sb)) 469 + inode->i_mapping->a_ops = &erofs_fscache_access_aops; 470 + if (IS_ENABLED(CONFIG_EROFS_FS_BACKED_BY_FILE) && 471 + erofs_is_fileio_mode(EROFS_SB(realinode->i_sb))) 472 + inode->i_mapping->a_ops = &erofs_fileio_aops; 473 + return 0; 474 + } 475 + 474 476 int erofs_register_sysfs(struct super_block *sb); 475 477 void erofs_unregister_sysfs(struct super_block *sb); 476 478 int __init erofs_init_sysfs(void); ··· 530 488 int z_erofs_gbuf_growsize(unsigned int nrpages); 531 489 int __init z_erofs_gbuf_init(void); 532 490 void z_erofs_gbuf_exit(void); 533 - int z_erofs_parse_cfgs(struct super_block *sb, struct erofs_super_block *dsb); 534 491 #else 535 492 static inline void erofs_shrinker_register(struct super_block *sb) {} 536 493 static inline void erofs_shrinker_unregister(struct super_block *sb) {} ··· 539 498 static inline void z_erofs_exit_subsystem(void) {} 540 499 static inline int z_erofs_init_super(struct super_block *sb) { return 0; } 541 500 #endif /* !CONFIG_EROFS_FS_ZIP */ 501 + int z_erofs_parse_cfgs(struct super_block *sb, struct erofs_super_block *dsb); 542 502 543 503 #ifdef CONFIG_EROFS_FS_BACKED_BY_FILE 544 504 struct bio *erofs_fileio_bio_alloc(struct erofs_map_dev *mdev); ··· 577 535 } 578 536 static inline struct bio *erofs_fscache_bio_alloc(struct erofs_map_dev *mdev) { return NULL; } 579 537 static inline void erofs_fscache_submit_bio(struct bio *bio) {} 538 + #endif 539 + 540 + #ifdef CONFIG_EROFS_FS_PAGE_CACHE_SHARE 541 + int __init erofs_init_ishare(void); 542 + void erofs_exit_ishare(void); 543 + bool 
erofs_ishare_fill_inode(struct inode *inode); 544 + void erofs_ishare_free_inode(struct inode *inode); 545 + struct inode *erofs_real_inode(struct inode *inode, bool *need_iput); 546 + #else 547 + static inline int erofs_init_ishare(void) { return 0; } 548 + static inline void erofs_exit_ishare(void) {} 549 + static inline bool erofs_ishare_fill_inode(struct inode *inode) { return false; } 550 + static inline void erofs_ishare_free_inode(struct inode *inode) {} 551 + static inline struct inode *erofs_real_inode(struct inode *inode, bool *need_iput) 552 + { 553 + *need_iput = false; 554 + return inode; 555 + } 580 556 #endif 581 557 582 558 long erofs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);

+206

fs/erofs/ishare.c

···
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2024, Alibaba Cloud
+ */
+#include <linux/xxhash.h>
+#include <linux/mount.h>
+#include "internal.h"
+#include "xattr.h"
+
+#include "../internal.h"
+
+static struct vfsmount *erofs_ishare_mnt;
+
+static inline bool erofs_is_ishare_inode(struct inode *inode)
+{
+	/* assumed FS_ONDEMAND is excluded with FS_PAGE_CACHE_SHARE feature */
+	return inode->i_sb->s_type == &erofs_anon_fs_type;
+}
+
+static int erofs_ishare_iget5_eq(struct inode *inode, void *data)
+{
+	struct erofs_inode_fingerprint *fp1 = &EROFS_I(inode)->fingerprint;
+	struct erofs_inode_fingerprint *fp2 = data;
+
+	return fp1->size == fp2->size &&
+		!memcmp(fp1->opaque, fp2->opaque, fp2->size);
+}
+
+static int erofs_ishare_iget5_set(struct inode *inode, void *data)
+{
+	struct erofs_inode *vi = EROFS_I(inode);
+
+	vi->fingerprint = *(struct erofs_inode_fingerprint *)data;
+	INIT_LIST_HEAD(&vi->ishare_list);
+	spin_lock_init(&vi->ishare_lock);
+	return 0;
+}
+
+bool erofs_ishare_fill_inode(struct inode *inode)
+{
+	struct erofs_sb_info *sbi = EROFS_SB(inode->i_sb);
+	struct erofs_inode *vi = EROFS_I(inode);
+	struct erofs_inode_fingerprint fp;
+	struct inode *sharedinode;
+	unsigned long hash;
+
+	if (erofs_xattr_fill_inode_fingerprint(&fp, inode, sbi->domain_id))
+		return false;
+	hash = xxh32(fp.opaque, fp.size, 0);
+	sharedinode = iget5_locked(erofs_ishare_mnt->mnt_sb, hash,
+				   erofs_ishare_iget5_eq, erofs_ishare_iget5_set,
+				   &fp);
+	if (!sharedinode) {
+		kfree(fp.opaque);
+		return false;
+	}
+
+	if (inode_state_read_once(sharedinode) & I_NEW) {
+		if (erofs_inode_set_aops(sharedinode, inode, true)) {
+			iget_failed(sharedinode);
+			kfree(fp.opaque);
+			return false;
+		}
+		sharedinode->i_size = vi->vfs_inode.i_size;
+		unlock_new_inode(sharedinode);
+	} else {
+		kfree(fp.opaque);
+		if (sharedinode->i_size != vi->vfs_inode.i_size) {
+			_erofs_printk(inode->i_sb, KERN_WARNING
+				"size(%lld:%lld) not matches for the same fingerprint\n",
+				vi->vfs_inode.i_size, sharedinode->i_size);
+			iput(sharedinode);
+			return false;
+		}
+	}
+	vi->sharedinode = sharedinode;
+	INIT_LIST_HEAD(&vi->ishare_list);
+	spin_lock(&EROFS_I(sharedinode)->ishare_lock);
+	list_add(&vi->ishare_list, &EROFS_I(sharedinode)->ishare_list);
+	spin_unlock(&EROFS_I(sharedinode)->ishare_lock);
+	return true;
+}
+
+void erofs_ishare_free_inode(struct inode *inode)
+{
+	struct erofs_inode *vi = EROFS_I(inode);
+	struct inode *sharedinode = vi->sharedinode;
+
+	if (!sharedinode)
+		return;
+	spin_lock(&EROFS_I(sharedinode)->ishare_lock);
+	list_del(&vi->ishare_list);
+	spin_unlock(&EROFS_I(sharedinode)->ishare_lock);
+	iput(sharedinode);
+	vi->sharedinode = NULL;
+}
+
+static int erofs_ishare_file_open(struct inode *inode, struct file *file)
+{
+	struct inode *sharedinode = EROFS_I(inode)->sharedinode;
+	struct file *realfile;
+
+	if (file->f_flags & O_DIRECT)
+		return -EINVAL;
+	realfile = alloc_empty_backing_file(O_RDONLY|O_NOATIME, current_cred());
+	if (IS_ERR(realfile))
+		return PTR_ERR(realfile);
+	ihold(sharedinode);
+	realfile->f_op = &erofs_file_fops;
+	realfile->f_inode = sharedinode;
+	realfile->f_mapping = sharedinode->i_mapping;
+	path_get(&file->f_path);
+	backing_file_set_user_path(realfile, &file->f_path);
+
+	file_ra_state_init(&realfile->f_ra, file->f_mapping);
+	realfile->private_data = EROFS_I(inode);
+	file->private_data = realfile;
+	return 0;
+}
+
+static int erofs_ishare_file_release(struct inode *inode, struct file *file)
+{
+	struct file *realfile = file->private_data;
+
+	iput(realfile->f_inode);
+	fput(realfile);
+	file->private_data = NULL;
+	return 0;
+}
+
+static ssize_t erofs_ishare_file_read_iter(struct kiocb *iocb,
+					   struct iov_iter *to)
+{
+	struct file *realfile = iocb->ki_filp->private_data;
+	struct kiocb dedup_iocb;
+	ssize_t nread;
+
+	if (!iov_iter_count(to))
+		return 0;
+	kiocb_clone(&dedup_iocb, iocb, realfile);
+	nread = filemap_read(&dedup_iocb, to, 0);
+	iocb->ki_pos = dedup_iocb.ki_pos;
+	return nread;
+}
+
+static int erofs_ishare_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct file *realfile = file->private_data;
+
+	vma_set_file(vma, realfile);
+	return generic_file_readonly_mmap(file, vma);
+}
+
+static int erofs_ishare_fadvise(struct file *file, loff_t offset,
+				loff_t len, int advice)
+{
+	return vfs_fadvise(file->private_data, offset, len, advice);
+}
+
+const struct file_operations erofs_ishare_fops = {
+	.open = erofs_ishare_file_open,
+	.llseek = generic_file_llseek,
+	.read_iter = erofs_ishare_file_read_iter,
+	.mmap = erofs_ishare_mmap,
+	.release = erofs_ishare_file_release,
+	.get_unmapped_area = thp_get_unmapped_area,
+	.splice_read = filemap_splice_read,
+	.fadvise = erofs_ishare_fadvise,
+};
+
+struct inode *erofs_real_inode(struct inode *inode, bool *need_iput)
+{
+	struct erofs_inode *vi, *vi_share;
+	struct inode *realinode;
+
+	*need_iput = false;
+	if (!erofs_is_ishare_inode(inode))
+		return inode;
+
+	vi_share = EROFS_I(inode);
+	spin_lock(&vi_share->ishare_lock);
+	/* fetch any one as real inode */
+	DBG_BUGON(list_empty(&vi_share->ishare_list));
+	list_for_each_entry(vi, &vi_share->ishare_list, ishare_list) {
+		realinode = igrab(&vi->vfs_inode);
+		if (realinode) {
+			*need_iput = true;
+			break;
+		}
+	}
+	spin_unlock(&vi_share->ishare_lock);
+
+	DBG_BUGON(!realinode);
+	return realinode;
+}
+
+int __init erofs_init_ishare(void)
+{
+	erofs_ishare_mnt = kern_mount(&erofs_anon_fs_type);
+	return PTR_ERR_OR_ZERO(erofs_ishare_mnt);
+}
+
+void erofs_exit_ishare(void)
+{
+	kern_unmount(erofs_ishare_mnt);
+}
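The lookup in `erofs_ishare_fill_inode()` hashes the fingerprint with `xxh32()` and leaves the final decision to the `erofs_ishare_iget5_eq()` callback, so two inodes whose fingerprints merely collide on the 32-bit hash are not merged. The comparison itself can be sketched in userspace as below. This is only an illustration: `fp_sketch` and `fp_sketch_equal` are hypothetical names standing in for the kernel structures, not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical userspace stand-in for struct erofs_inode_fingerprint. */
struct fp_sketch {
	const unsigned char *opaque;	/* fingerprint bytes */
	int size;			/* number of valid bytes in opaque */
};

/*
 * Same shape as the check in erofs_ishare_iget5_eq(): compare the sizes
 * first, then the bytes, so memcmp() never reads past the shorter buffer.
 */
static bool fp_sketch_equal(const struct fp_sketch *a,
			    const struct fp_sketch *b)
{
	assert(a && b && a->size >= 0 && b->size >= 0);
	return a->size == b->size &&
	       memcmp(a->opaque, b->opaque, (size_t)b->size) == 0;
}
```

Checking the size before calling `memcmp()` is what makes a prefix such as `"abc"` compare unequal to `"abcd"` instead of being read out of bounds.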

+106 -36

fs/erofs/super.c

···
 #include <linux/fs_parser.h>
 #include <linux/exportfs.h>
 #include <linux/backing-dev.h>
+#include <linux/pseudo_fs.h>
 #include "xattr.h"
 
 #define CREATE_TRACE_POINTS
···
 	}
 	return buffer;
 }
-
-#ifndef CONFIG_EROFS_FS_ZIP
-static int z_erofs_parse_cfgs(struct super_block *sb,
-			      struct erofs_super_block *dsb)
-{
-	if (!dsb->u1.available_compr_algs)
-		return 0;
-
-	erofs_err(sb, "compression disabled, unable to mount compressed EROFS");
-	return -EOPNOTSUPP;
-}
-#endif
 
 static int erofs_init_device(struct erofs_buf *buf, struct super_block *sb,
 			     struct erofs_device_info *dif, erofs_off_t *pos)
···
 	sbi->xattr_prefix_start = le32_to_cpu(dsb->xattr_prefix_start);
 	sbi->xattr_prefix_count = dsb->xattr_prefix_count;
 	sbi->xattr_filter_reserved = dsb->xattr_filter_reserved;
+	if (erofs_sb_has_ishare_xattrs(sbi)) {
+		if (dsb->ishare_xattr_prefix_id >= sbi->xattr_prefix_count) {
+			erofs_err(sb, "invalid ishare xattr prefix id %u",
+				  dsb->ishare_xattr_prefix_id);
+			ret = -EFSCORRUPTED;
+			goto out;
+		}
+		sbi->ishare_xattr_prefix_id = dsb->ishare_xattr_prefix_id;
+	}
 #endif
 	sbi->islotbits = ilog2(sizeof(struct erofs_inode_compact));
 	if (erofs_sb_has_48bit(sbi) && dsb->rootnid_8b) {
···
 	}
 	sbi->packed_nid = le64_to_cpu(dsb->packed_nid);
 	if (erofs_sb_has_metabox(sbi)) {
+		ret = -EFSCORRUPTED;
 		if (sbi->sb_size <= offsetof(struct erofs_super_block,
 					     metabox_nid))
-			return -EFSCORRUPTED;
+			goto out;
 		sbi->metabox_nid = le64_to_cpu(dsb->metabox_nid);
 		if (sbi->metabox_nid & BIT_ULL(EROFS_DIRENT_NID_METABOX_BIT))
-			return -EFSCORRUPTED;	/* self-loop detection */
+			goto out;	/* self-loop detection */
 	}
 	sbi->inos = le64_to_cpu(dsb->inos);
 
···
 	if (dsb->volume_name[0]) {
 		sbi->volume_name = kstrndup(dsb->volume_name,
 					    sizeof(dsb->volume_name), GFP_KERNEL);
-		if (!sbi->volume_name)
-			return -ENOMEM;
+		if (!sbi->volume_name) {
+			ret = -ENOMEM;
+			goto out;
+		}
 	}
 
-	/* parse on-disk compression configurations */
-	ret = z_erofs_parse_cfgs(sb, dsb);
-	if (ret < 0)
+	if (IS_ENABLED(CONFIG_EROFS_FS_ZIP)) {
+		ret = z_erofs_parse_cfgs(sb, dsb);
+		if (ret < 0)
+			goto out;
+	} else if (dsb->u1.available_compr_algs ||
+		   erofs_sb_has_lz4_0padding(sbi)) {
+		erofs_err(sb, "compression disabled, unable to mount compressed EROFS");
+		ret = -EOPNOTSUPP;
 		goto out;
+	}
 
 	ret = erofs_scan_devices(sb, dsb);
 
···
 {
 #ifdef CONFIG_EROFS_FS_ZIP
 	sbi->opt.cache_strategy = EROFS_ZIP_CACHE_READAROUND;
-	sbi->opt.max_sync_decompress_pages = 3;
-	sbi->opt.sync_decompress = EROFS_SYNC_DECOMPRESS_AUTO;
+	sbi->sync_decompress = EROFS_SYNC_DECOMPRESS_AUTO;
 #endif
-#ifdef CONFIG_EROFS_FS_XATTR
-	set_opt(&sbi->opt, XATTR_USER);
-#endif
-#ifdef CONFIG_EROFS_FS_POSIX_ACL
-	set_opt(&sbi->opt, POSIX_ACL);
-#endif
+	if (IS_ENABLED(CONFIG_EROFS_FS_XATTR))
+		set_opt(&sbi->opt, XATTR_USER);
+	if (IS_ENABLED(CONFIG_EROFS_FS_POSIX_ACL))
+		set_opt(&sbi->opt, POSIX_ACL);
 }
 
 enum {
 	Opt_user_xattr, Opt_acl, Opt_cache_strategy, Opt_dax, Opt_dax_enum,
 	Opt_device, Opt_fsid, Opt_domain_id, Opt_directio, Opt_fsoffset,
+	Opt_inode_share,
 };
 
 static const struct constant_table erofs_param_cache_strategy[] = {
···
 	fsparam_string("domain_id", Opt_domain_id),
 	fsparam_flag_no("directio", Opt_directio),
 	fsparam_u64("fsoffset", Opt_fsoffset),
+	fsparam_flag("inode_share", Opt_inode_share),
 	{}
 };
 
···
 		if (!sbi->fsid)
 			return -ENOMEM;
 		break;
+#endif
+#if defined(CONFIG_EROFS_FS_ONDEMAND) || defined(CONFIG_EROFS_FS_PAGE_CACHE_SHARE)
 	case Opt_domain_id:
-		kfree(sbi->domain_id);
-		sbi->domain_id = kstrdup(param->string, GFP_KERNEL);
-		if (!sbi->domain_id)
-			return -ENOMEM;
+		kfree_sensitive(sbi->domain_id);
+		sbi->domain_id = no_free_ptr(param->string);
 		break;
 #else
 	case Opt_fsid:
···
 		break;
 	case Opt_fsoffset:
 		sbi->dif0.fsoff = result.uint_64;
+		break;
+	case Opt_inode_share:
+#ifdef CONFIG_EROFS_FS_PAGE_CACHE_SHARE
+		set_opt(&sbi->opt, INODE_SHARE);
+#else
+		errorfc(fc, "%s option not supported", erofs_fs_parameters[opt].name);
+#endif
 		break;
 	}
 	return 0;
···
 {
 	struct erofs_sb_info *sbi = EROFS_SB(sb);
 
-	if (sbi->domain_id)
+	if (sbi->domain_id && sbi->fsid)
 		super_set_sysfs_name_generic(sb, "%s,%s", sbi->domain_id,
 					     sbi->fsid);
 	else if (sbi->fsid)
···
 	sb->s_flags |= SB_RDONLY | SB_NOATIME;
 	sb->s_maxbytes = MAX_LFS_FILESIZE;
 	sb->s_op = &erofs_sops;
+
+	if (!sbi->domain_id && test_opt(&sbi->opt, INODE_SHARE)) {
+		errorfc(fc, "domain_id is needed when inode_ishare is on");
+		return -EINVAL;
+	}
+	if (test_opt(&sbi->opt, DAX_ALWAYS) && test_opt(&sbi->opt, INODE_SHARE)) {
+		errorfc(fc, "FSDAX is not allowed when inode_ishare is on");
+		return -EINVAL;
+	}
 
 	sbi->blkszbits = PAGE_SHIFT;
 	if (!sb->s_bdev) {
···
 		erofs_info(sb, "unsupported blocksize for DAX");
 		clear_opt(&sbi->opt, DAX_ALWAYS);
 	}
+	if (test_opt(&sbi->opt, INODE_SHARE) && !erofs_sb_has_ishare_xattrs(sbi)) {
+		erofs_info(sb, "on-disk ishare xattrs not found. Turning off inode_share.");
+		clear_opt(&sbi->opt, INODE_SHARE);
+	}
+	if (test_opt(&sbi->opt, INODE_SHARE))
+		erofs_info(sb, "EXPERIMENTAL EROFS page cache share support in use. Use at your own risk!");
 
 	sb->s_time_gran = 1;
 	sb->s_xattr = erofs_xattr_handlers;
···
 {
 	erofs_free_dev_context(sbi->devs);
 	kfree(sbi->fsid);
-	kfree(sbi->domain_id);
+	kfree_sensitive(sbi->domain_id);
 	if (sbi->dif0.file)
 		fput(sbi->dif0.file);
 	kfree(sbi->volume_name);
···
 };
 MODULE_ALIAS_FS("erofs");
 
+#if defined(CONFIG_EROFS_FS_ONDEMAND) || defined(CONFIG_EROFS_FS_PAGE_CACHE_SHARE)
+static void erofs_free_anon_inode(struct inode *inode)
+{
+	struct erofs_inode *vi = EROFS_I(inode);
+
+#ifdef CONFIG_EROFS_FS_PAGE_CACHE_SHARE
+	kfree(vi->fingerprint.opaque);
+#endif
+	kmem_cache_free(erofs_inode_cachep, vi);
+}
+
+static const struct super_operations erofs_anon_sops = {
+	.alloc_inode = erofs_alloc_inode,
+	.drop_inode = inode_just_drop,
+	.free_inode = erofs_free_anon_inode,
+};
+
+static int erofs_anon_init_fs_context(struct fs_context *fc)
+{
+	struct pseudo_fs_context *ctx;
+
+	ctx = init_pseudo(fc, EROFS_SUPER_MAGIC);
+	if (!ctx)
+		return -ENOMEM;
+	ctx->ops = &erofs_anon_sops;
+	return 0;
+}
+
+struct file_system_type erofs_anon_fs_type = {
+	.name = "pseudo_erofs",
+	.init_fs_context = erofs_anon_init_fs_context,
+	.kill_sb = kill_anon_super,
+};
+#endif
+
 static int __init erofs_module_init(void)
 {
 	int err;
···
 	if (err)
 		goto sysfs_err;
 
+	err = erofs_init_ishare();
+	if (err)
+		goto ishare_err;
+
 	err = register_filesystem(&erofs_fs_type);
 	if (err)
 		goto fs_err;
···
 	return 0;
 
 fs_err:
+	erofs_exit_ishare();
+ishare_err:
 	erofs_exit_sysfs();
 sysfs_err:
 	z_erofs_exit_subsystem();
···
 	/* Ensure all RCU free inodes / pclusters are safe to be destroyed. */
 	rcu_barrier();
 
+	erofs_exit_ishare();
 	erofs_exit_sysfs();
 	z_erofs_exit_subsystem();
 	erofs_exit_shrinker();
···
 #endif
 	if (sbi->dif0.fsoff)
 		seq_printf(seq, ",fsoffset=%llu", sbi->dif0.fsoff);
+	if (test_opt(opt, INODE_SHARE))
+		seq_puts(seq, ",inode_share");
 	return 0;
 }
 
 static void erofs_evict_inode(struct inode *inode)
 {
-#ifdef CONFIG_FS_DAX
 	if (IS_DAX(inode))
 		dax_break_layout_final(inode);
-#endif
-
+	erofs_ishare_free_inode(inode);
 	truncate_inode_pages_final(&inode->i_data);
 	clear_inode(inode);
 }
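The mount-time checks added to the superblock setup path above reject two option combinations: `inode_share` without a `domain_id`, and `inode_share` together with `dax=always`. A minimal userspace sketch of that rule, assuming a simplified options structure, might look like the following. `mount_opts_sketch` and `check_inode_share_opts` are hypothetical names for illustration and do not exist in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of the parsed mount options. */
struct mount_opts_sketch {
	const char *domain_id;	/* NULL when domain_id= was not given */
	bool inode_share;	/* inode_share flag */
	bool dax_always;	/* dax or dax=always */
};

/* Returns 0 when the combination is acceptable, -EINVAL otherwise. */
static int check_inode_share_opts(const struct mount_opts_sketch *o)
{
	assert(o);
	if (o->inode_share && !o->domain_id)
		return -EINVAL;	/* sharing needs a trusted domain */
	if (o->inode_share && o->dax_always)
		return -EINVAL;	/* FSDAX and page cache sharing conflict */
	return 0;
}
```

The later fallback in the diff, where `inode_share` is silently cleared when the image lacks the on-disk ishare xattrs feature, is deliberately left out of this sketch because it depends on superblock contents rather than on the options alone.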

+3 -6

fs/erofs/sysfs.c

···
 #define ATTR_LIST(name) (&erofs_attr_##name.attr)
 
 #ifdef CONFIG_EROFS_FS_ZIP
-EROFS_ATTR_RW_UI(sync_decompress, erofs_mount_opts);
+EROFS_ATTR_RW_UI(sync_decompress, erofs_sb_info);
 EROFS_ATTR_FUNC(drop_caches, 0200);
 #endif
 #ifdef CONFIG_EROFS_FS_ZIP_ACCEL
···
 ATTRIBUTE_GROUPS(erofs);
 
 /* Features this copy of erofs supports */
-EROFS_ATTR_FEATURE(zero_padding);
 EROFS_ATTR_FEATURE(compr_cfgs);
 EROFS_ATTR_FEATURE(big_pcluster);
 EROFS_ATTR_FEATURE(chunked_file);
···
 EROFS_ATTR_FEATURE(metabox);
 
 static struct attribute *erofs_feat_attrs[] = {
-	ATTR_LIST(zero_padding),
 	ATTR_LIST(compr_cfgs),
 	ATTR_LIST(big_pcluster),
 	ATTR_LIST(chunked_file),
···
 			return ret;
 		if (t != (unsigned int)t)
 			return -ERANGE;
-#ifdef CONFIG_EROFS_FS_ZIP
-		if (!strcmp(a->attr.name, "sync_decompress") &&
+		if (IS_ENABLED(CONFIG_EROFS_FS_ZIP) &&
+		    !strcmp(a->attr.name, "sync_decompress") &&
 		    (t > EROFS_SYNC_DECOMPRESS_FORCE_OFF))
 			return -EINVAL;
-#endif
 		*(unsigned int *)ptr = t;
 		return len;
 	case attr_pointer_bool:
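The `sync_decompress` value validated by this store path is consumed in `z_erofs_runqueue()` (fs/erofs/zdata.c in this merge), which decides whether decompression runs in the reader thread. The sketch below restates that predicate and the sysfs range check in userspace C. It assumes that `rabytes == 0` denotes a synchronous (non-readahead) read, as the parameter name suggests. The enum and function names are hypothetical; only the numeric mode values (0, 1, 2) and the 12288-byte threshold come from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Mode values as documented for /sys/fs/erofs/<disk>/sync_decompress. */
enum sync_mode_sketch {
	SYNC_AUTO = 0,
	SYNC_FORCE_ON = 1,
	SYNC_FORCE_OFF = 2,
};

/* Same value as Z_EROFS_MAX_SYNC_DECOMPRESS_BYTES in the diff (12 KiB). */
#define MAX_SYNC_DECOMPRESS_BYTES 12288u

/* Range check in the spirit of the sysfs store handler. */
static int sync_mode_store_check(unsigned long t)
{
	return t > SYNC_FORCE_OFF ? -EINVAL : 0;
}

/* True when decompression should happen in the reader thread. */
static bool sync_decompress_in_reader(enum sync_mode_sketch mode,
				      unsigned int rabytes)
{
	assert(mode == SYNC_AUTO || mode == SYNC_FORCE_ON ||
	       mode == SYNC_FORCE_OFF);
	if (mode == SYNC_AUTO)
		return rabytes == 0;		/* synchronous reads only */
	if (mode == SYNC_FORCE_ON)
		return rabytes <= MAX_SYNC_DECOMPRESS_BYTES;
	return false;				/* force off */
}
```

The sketch does not model the one-way switch from auto to force-on that the kernel performs when a decompression request is seen in atomic context.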

+172 -94

fs/erofs/xattr.c

···
 	struct dentry *dentry;
 };
 
+static const char *erofs_xattr_prefix(unsigned int idx, struct dentry *dentry);
+
 static int erofs_init_inode_xattrs(struct inode *inode)
 {
-	struct erofs_inode *const vi = EROFS_I(inode);
-	struct erofs_xattr_iter it;
-	unsigned int i;
-	struct erofs_xattr_ibody_header *ih;
+	struct erofs_buf buf = __EROFS_BUF_INITIALIZER;
+	struct erofs_inode *vi = EROFS_I(inode);
 	struct super_block *sb = inode->i_sb;
+	const struct erofs_xattr_ibody_header *ih;
+	__le32 *xattr_id;
+	erofs_off_t pos;
+	unsigned int i;
 	int ret = 0;
+
+	if (!vi->xattr_isize)
+		return -ENODATA;
 
 	/* the most case is that xattrs of this inode are initialized. */
 	if (test_bit(EROFS_I_EA_INITED_BIT, &vi->flags)) {
···
 		smp_mb();
 		return 0;
 	}
-
 	if (wait_on_bit_lock(&vi->flags, EROFS_I_BL_XATTR_BIT, TASK_KILLABLE))
 		return -ERESTARTSYS;
 
···
 	 * undefined right now (maybe use later with some new sb feature).
 	 */
 	if (vi->xattr_isize == sizeof(struct erofs_xattr_ibody_header)) {
-		erofs_err(sb,
-			  "xattr_isize %d of nid %llu is not supported yet",
+		erofs_err(sb, "xattr_isize %d of nid %llu is not supported yet",
 			  vi->xattr_isize, vi->nid);
 		ret = -EOPNOTSUPP;
 		goto out_unlock;
 	} else if (vi->xattr_isize < sizeof(struct erofs_xattr_ibody_header)) {
-		if (vi->xattr_isize) {
-			erofs_err(sb, "bogus xattr ibody @ nid %llu", vi->nid);
-			DBG_BUGON(1);
-			ret = -EFSCORRUPTED;
-			goto out_unlock;	/* xattr ondisk layout error */
-		}
-		ret = -ENODATA;
+		erofs_err(sb, "bogus xattr ibody @ nid %llu", vi->nid);
+		DBG_BUGON(1);
+		ret = -EFSCORRUPTED;
 		goto out_unlock;
 	}
 
-	it.buf = __EROFS_BUF_INITIALIZER;
-	ret = erofs_init_metabuf(&it.buf, sb, erofs_inode_in_metabox(inode));
-	if (ret)
-		goto out_unlock;
-	it.pos = erofs_iloc(inode) + vi->inode_isize;
-
-	/* read in shared xattr array (non-atomic, see kmalloc below) */
-	it.kaddr = erofs_bread(&it.buf, it.pos, true);
-	if (IS_ERR(it.kaddr)) {
-		ret = PTR_ERR(it.kaddr);
+	pos = erofs_iloc(inode) + vi->inode_isize;
+	ih = erofs_read_metabuf(&buf, sb, pos, erofs_inode_in_metabox(inode));
+	if (IS_ERR(ih)) {
+		ret = PTR_ERR(ih);
 		goto out_unlock;
 	}
-
-	ih = it.kaddr;
 	vi->xattr_name_filter = le32_to_cpu(ih->h_name_filter);
 	vi->xattr_shared_count = ih->h_shared_count;
 	vi->xattr_shared_xattrs = kmalloc_array(vi->xattr_shared_count,
 						sizeof(uint), GFP_KERNEL);
 	if (!vi->xattr_shared_xattrs) {
-		erofs_put_metabuf(&it.buf);
+		erofs_put_metabuf(&buf);
 		ret = -ENOMEM;
 		goto out_unlock;
 	}
 
-	/* let's skip ibody header */
-	it.pos += sizeof(struct erofs_xattr_ibody_header);
-
+	/* skip the ibody header and read the shared xattr array */
+	pos += sizeof(struct erofs_xattr_ibody_header);
 	for (i = 0; i < vi->xattr_shared_count; ++i) {
-		it.kaddr = erofs_bread(&it.buf, it.pos, true);
-		if (IS_ERR(it.kaddr)) {
+		xattr_id = erofs_bread(&buf, pos + i * sizeof(__le32), true);
+		if (IS_ERR(xattr_id)) {
 			kfree(vi->xattr_shared_xattrs);
 			vi->xattr_shared_xattrs = NULL;
-			ret = PTR_ERR(it.kaddr);
+			ret = PTR_ERR(xattr_id);
 			goto out_unlock;
 		}
-		vi->xattr_shared_xattrs[i] = le32_to_cpu(*(__le32 *)it.kaddr);
-		it.pos += sizeof(__le32);
+		vi->xattr_shared_xattrs[i] = le32_to_cpu(*xattr_id);
 	}
-	erofs_put_metabuf(&it.buf);
+	erofs_put_metabuf(&buf);
 
 	/* paired with smp_mb() at the beginning of the function. */
 	smp_mb();
 	set_bit(EROFS_I_EA_INITED_BIT, &vi->flags);
-
 out_unlock:
 	clear_and_wake_up_bit(EROFS_I_BL_XATTR_BIT, &vi->flags);
 	return ret;
 }
-
-static bool erofs_xattr_user_list(struct dentry *dentry)
-{
-	return test_opt(&EROFS_SB(dentry->d_sb)->opt, XATTR_USER);
-}
-
-static bool erofs_xattr_trusted_list(struct dentry *dentry)
-{
-	return capable(CAP_SYS_ADMIN);
-}
-
-static int erofs_xattr_generic_get(const struct xattr_handler *handler,
-				   struct dentry *unused, struct inode *inode,
-				   const char *name, void *buffer, size_t size)
-{
-	if (handler->flags == EROFS_XATTR_INDEX_USER &&
-	    !test_opt(&EROFS_I_SB(inode)->opt, XATTR_USER))
-		return -EOPNOTSUPP;
-
-	return erofs_getxattr(inode, handler->flags, name, buffer, size);
-}
-
-const struct xattr_handler erofs_xattr_user_handler = {
-	.prefix = XATTR_USER_PREFIX,
-	.flags = EROFS_XATTR_INDEX_USER,
-	.list = erofs_xattr_user_list,
-	.get = erofs_xattr_generic_get,
-};
-
-const struct xattr_handler erofs_xattr_trusted_handler = {
-	.prefix = XATTR_TRUSTED_PREFIX,
-	.flags = EROFS_XATTR_INDEX_TRUSTED,
-	.list = erofs_xattr_trusted_list,
-	.get = erofs_xattr_generic_get,
-};
-
-#ifdef CONFIG_EROFS_FS_SECURITY
-const struct xattr_handler __maybe_unused erofs_xattr_security_handler = {
-	.prefix = XATTR_SECURITY_PREFIX,
-	.flags = EROFS_XATTR_INDEX_SECURITY,
-	.get = erofs_xattr_generic_get,
-};
-#endif
-
-const struct xattr_handler * const erofs_xattr_handlers[] = {
-	&erofs_xattr_user_handler,
-	&erofs_xattr_trusted_handler,
-#ifdef CONFIG_EROFS_FS_SECURITY
-	&erofs_xattr_security_handler,
-#endif
-	NULL,
-};
 
 static int erofs_xattr_copy_to_buffer(struct erofs_xattr_iter *it,
 				      unsigned int len)
 {
 	unsigned int slice, processed;
 	struct super_block *sb = it->sb;
-	void *src;
 
 	for (processed = 0; processed < len; processed += slice) {
 		it->kaddr = erofs_bread(&it->buf, it->pos, true);
 		if (IS_ERR(it->kaddr))
 			return PTR_ERR(it->kaddr);
 
-		src = it->kaddr;
 		slice = min_t(unsigned int, sb->s_blocksize -
 			      erofs_blkoff(sb, it->pos), len - processed);
-		memcpy(it->buffer + it->buffer_ofs, src, slice);
+		memcpy(it->buffer + it->buffer_ofs, it->kaddr, slice);
 		it->buffer_ofs += slice;
 		it->pos += slice;
 	}
···
 	return i ? ret : -ENODATA;
 }
 
-int erofs_getxattr(struct inode *inode, int index, const char *name,
-		   void *buffer, size_t buffer_size)
+static int erofs_getxattr(struct inode *inode, int index, const char *name,
+			  void *buffer, size_t buffer_size)
 {
 	int ret;
 	unsigned int hashbit;
···
 	return ret ? ret : it.buffer_ofs;
 }
 
+static bool erofs_xattr_user_list(struct dentry *dentry)
+{
+	return test_opt(&EROFS_SB(dentry->d_sb)->opt, XATTR_USER);
+}
+
+static bool erofs_xattr_trusted_list(struct dentry *dentry)
+{
+	return capable(CAP_SYS_ADMIN);
+}
+
+static int erofs_xattr_generic_get(const struct xattr_handler *handler,
+				   struct dentry *unused, struct inode *inode,
+				   const char *name, void *buffer, size_t size)
+{
+	if (handler->flags == EROFS_XATTR_INDEX_USER &&
+	    !test_opt(&EROFS_I_SB(inode)->opt, XATTR_USER))
+		return -EOPNOTSUPP;
+
+	return erofs_getxattr(inode, handler->flags, name, buffer, size);
+}
+
+static const struct xattr_handler erofs_xattr_user_handler = {
+	.prefix = XATTR_USER_PREFIX,
+	.flags = EROFS_XATTR_INDEX_USER,
+	.list = erofs_xattr_user_list,
+	.get = erofs_xattr_generic_get,
+};
+
+static const struct xattr_handler erofs_xattr_trusted_handler = {
+	.prefix = XATTR_TRUSTED_PREFIX,
+	.flags = EROFS_XATTR_INDEX_TRUSTED,
+	.list = erofs_xattr_trusted_list,
+	.get = erofs_xattr_generic_get,
+};
+
+#ifdef CONFIG_EROFS_FS_SECURITY
+static const struct xattr_handler erofs_xattr_security_handler = {
+	.prefix = XATTR_SECURITY_PREFIX,
+	.flags = EROFS_XATTR_INDEX_SECURITY,
+	.get = erofs_xattr_generic_get,
+};
+#endif
+
+const struct xattr_handler * const erofs_xattr_handlers[] = {
+	&erofs_xattr_user_handler,
+	&erofs_xattr_trusted_handler,
+#ifdef CONFIG_EROFS_FS_SECURITY
+	&erofs_xattr_security_handler,
+#endif
+	NULL,
+};
+
+static const char *erofs_xattr_prefix(unsigned int idx, struct dentry *dentry)
+{
+	static const struct xattr_handler * const xattr_handler_map[] = {
+		[EROFS_XATTR_INDEX_USER] = &erofs_xattr_user_handler,
+#ifdef CONFIG_EROFS_FS_POSIX_ACL
+		[EROFS_XATTR_INDEX_POSIX_ACL_ACCESS] = &nop_posix_acl_access,
+		[EROFS_XATTR_INDEX_POSIX_ACL_DEFAULT] = &nop_posix_acl_default,
+#endif
+		[EROFS_XATTR_INDEX_TRUSTED] = &erofs_xattr_trusted_handler,
+#ifdef CONFIG_EROFS_FS_SECURITY
+		[EROFS_XATTR_INDEX_SECURITY] = &erofs_xattr_security_handler,
+#endif
+	};
+	const struct xattr_handler *handler = NULL;
+
+	if (idx && idx < ARRAY_SIZE(xattr_handler_map)) {
+		handler = xattr_handler_map[idx];
+		if (xattr_handler_can_list(handler, dentry))
+			return xattr_prefix(handler);
+	}
+	return NULL;
+}
+
 void erofs_xattr_prefixes_cleanup(struct super_block *sb)
 {
 	struct erofs_sb_info *sbi = EROFS_SB(sb);
···
 	}
 
 	erofs_put_metabuf(&buf);
+	if (!ret && erofs_sb_has_ishare_xattrs(sbi)) {
+		struct erofs_xattr_prefix_item *pf = pfs + sbi->ishare_xattr_prefix_id;
+		struct erofs_xattr_long_prefix *newpfx;
+
+		newpfx = krealloc(pf->prefix,
+				  sizeof(*newpfx) + pf->infix_len + 1, GFP_KERNEL);
+		if (newpfx) {
+			newpfx->infix[pf->infix_len] = '\0';
+			pf->prefix = newpfx;
+		} else {
+			ret = -ENOMEM;
+		}
+	}
 	sbi->xattr_prefixes = pfs;
 	if (ret)
 		erofs_xattr_prefixes_cleanup(sb);
···
 	acl = posix_acl_from_xattr(&init_user_ns, value, rc);
 	kfree(value);
 	return acl;
+}
+
+bool erofs_inode_has_noacl(struct inode *inode, void *kaddr, unsigned int ofs)
+{
+	static const unsigned int bitmask =
+		BIT(21) |	/* system.posix_acl_default */
+		BIT(30);	/* system.posix_acl_access */
+	struct erofs_sb_info *sbi = EROFS_I_SB(inode);
+	const struct erofs_xattr_ibody_header *ih = kaddr + ofs;
+
+	if (EROFS_I(inode)->xattr_isize < sizeof(*ih))
+		return true;
+
+	if (erofs_sb_has_xattr_filter(sbi) && !sbi->xattr_filter_reserved &&
+	    !check_add_overflow(ofs, sizeof(*ih), &ofs) &&
+	    ofs <= i_blocksize(inode)) {
+		if ((le32_to_cpu(ih->h_name_filter) & bitmask) == bitmask)
+			return true;
+	}
+	return false;
+}
+#endif
+
+#ifdef CONFIG_EROFS_FS_PAGE_CACHE_SHARE
+int erofs_xattr_fill_inode_fingerprint(struct erofs_inode_fingerprint *fp,
+				       struct inode *inode, const char *domain_id)
+{
+	struct erofs_sb_info *sbi = EROFS_SB(inode->i_sb);
+	struct erofs_xattr_prefix_item *prefix;
+	const char *infix;
+	int valuelen, base_index;
+
+	if (!test_opt(&sbi->opt, INODE_SHARE))
+		return -EOPNOTSUPP;
+	if (!sbi->xattr_prefixes)
+		return -EINVAL;
+	prefix = sbi->xattr_prefixes + sbi->ishare_xattr_prefix_id;
+	infix = prefix->prefix->infix;
+	base_index = prefix->prefix->base_index;
+	valuelen = erofs_getxattr(inode, base_index, infix, NULL, 0);
+	if (valuelen <= 0 || valuelen > (1 << sbi->blkszbits))
+		return -EFSCORRUPTED;
+	fp->size = valuelen + (domain_id ? strlen(domain_id) : 0);
+	fp->opaque = kmalloc(fp->size, GFP_KERNEL);
+	if (!fp->opaque)
+		return -ENOMEM;
+	if (valuelen != erofs_getxattr(inode, base_index, infix,
+				       fp->opaque, valuelen)) {
+		kfree(fp->opaque);
+		fp->opaque = NULL;
+		return -EFSCORRUPTED;
+	}
+	memcpy(fp->opaque + valuelen, domain_id, fp->size - valuelen);
+	return 0;
 }
 #endif
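`erofs_xattr_fill_inode_fingerprint()` builds the fingerprint by placing the xattr value first and appending the domain ID bytes without a terminating NUL, so identical content in different trusted domains yields different fingerprints. The userspace sketch below mirrors that layout. `fp_sketch_build` is a hypothetical helper, `malloc()` stands in for `kmalloc()`, and, unlike the kernel function, the sketch skips the second copy entirely when no domain ID is given.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Returns a newly allocated buffer holding value || domain_id (no NUL),
 * or NULL on allocation failure. The caller frees the result.
 */
static unsigned char *fp_sketch_build(const void *value, size_t valuelen,
				      const char *domain_id, size_t *out_size)
{
	size_t idlen = domain_id ? strlen(domain_id) : 0;
	unsigned char *buf;

	assert(value && valuelen > 0 && out_size);
	buf = malloc(valuelen + idlen);
	if (!buf)
		return NULL;
	memcpy(buf, value, valuelen);			/* content digest */
	if (idlen)
		memcpy(buf + valuelen, domain_id, idlen);	/* domain suffix */
	*out_size = valuelen + idlen;
	return buf;
}
```

Because the domain ID is a plain suffix, two fingerprints are equal only when both the xattr value and the domain ID match byte for byte.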

+3 -37

fs/erofs/xattr.h

···
 #include <linux/xattr.h>
 
 #ifdef CONFIG_EROFS_FS_XATTR
-extern const struct xattr_handler erofs_xattr_user_handler;
-extern const struct xattr_handler erofs_xattr_trusted_handler;
-extern const struct xattr_handler erofs_xattr_security_handler;
-
-static inline const char *erofs_xattr_prefix(unsigned int idx,
-					     struct dentry *dentry)
-{
-	const struct xattr_handler *handler = NULL;
-
-	static const struct xattr_handler * const xattr_handler_map[] = {
-		[EROFS_XATTR_INDEX_USER] = &erofs_xattr_user_handler,
-#ifdef CONFIG_EROFS_FS_POSIX_ACL
-		[EROFS_XATTR_INDEX_POSIX_ACL_ACCESS] = &nop_posix_acl_access,
-		[EROFS_XATTR_INDEX_POSIX_ACL_DEFAULT] = &nop_posix_acl_default,
-#endif
-		[EROFS_XATTR_INDEX_TRUSTED] = &erofs_xattr_trusted_handler,
-#ifdef CONFIG_EROFS_FS_SECURITY
-		[EROFS_XATTR_INDEX_SECURITY] = &erofs_xattr_security_handler,
-#endif
-	};
-
-	if (idx && idx < ARRAY_SIZE(xattr_handler_map))
-		handler = xattr_handler_map[idx];
-
-	if (!xattr_handler_can_list(handler, dentry))
-		return NULL;
-
-	return xattr_prefix(handler);
-}
-
 extern const struct xattr_handler * const erofs_xattr_handlers[];
 
 int erofs_xattr_prefixes_init(struct super_block *sb);
 void erofs_xattr_prefixes_cleanup(struct super_block *sb);
-int erofs_getxattr(struct inode *, int, const char *, void *, size_t);
 ssize_t erofs_listxattr(struct dentry *, char *, size_t);
 #else
 static inline int erofs_xattr_prefixes_init(struct super_block *sb) { return 0; }
 static inline void erofs_xattr_prefixes_cleanup(struct super_block *sb) {}
-static inline int erofs_getxattr(struct inode *inode, int index,
-				 const char *name, void *buffer,
-				 size_t buffer_size)
-{
-	return -EOPNOTSUPP;
-}
 
 #define erofs_listxattr (NULL)
 #define erofs_xattr_handlers (NULL)
···
 #define erofs_get_acl (NULL)
 #endif
 
+int erofs_xattr_fill_inode_fingerprint(struct erofs_inode_fingerprint *fp,
+				       struct inode *inode, const char *domain_id);
+bool erofs_inode_has_noacl(struct inode *inode, void *kaddr, unsigned int ofs);
 #endif

+57 -53

fs/erofs/zdata.c

···
 #include <linux/cpuhotplug.h>
 #include <trace/events/erofs.h>

+#define Z_EROFS_MAX_SYNC_DECOMPRESS_BYTES	12288
 #define Z_EROFS_PCLUSTER_MAX_PAGES	(Z_EROFS_PCLUSTER_MAX_SIZE / PAGE_SIZE)
 #define Z_EROFS_INLINE_BVECS		2

···
 };

 struct z_erofs_frontend {
-	struct inode *const inode;
+	struct inode *inode, *sharedinode;
 	struct erofs_map_blocks map;
 	struct z_erofs_bvec_iter biter;

···
 	unsigned int icur;
 };

-#define Z_EROFS_DEFINE_FRONTEND(fe, i, ho) struct z_erofs_frontend fe = { \
-	.inode = i, .head = Z_EROFS_PCLUSTER_TAIL, \
+#define Z_EROFS_DEFINE_FRONTEND(fe, i, si, ho) struct z_erofs_frontend fe = { \
+	.inode = i, .sharedinode = si, .head = Z_EROFS_PCLUSTER_TAIL, \
 	.mode = Z_EROFS_PCLUSTER_FOLLOWED, .headoffset = ho }

 static bool z_erofs_should_alloc_cache(struct z_erofs_frontend *fe)
···
 	struct erofs_map_blocks *map = &fe->map;
 	struct super_block *sb = fe->inode->i_sb;
 	struct z_erofs_pcluster *pcl = NULL;
-	void *ptr;
+	void *ptr = NULL;
 	int ret;

 	DBG_BUGON(fe->pcl);
 	/* must be Z_EROFS_PCLUSTER_TAIL or pointed to previous pcluster */
 	DBG_BUGON(!fe->head);

-	if (!(map->m_flags & EROFS_MAP_META)) {
+	if (map->m_flags & EROFS_MAP_META) {
+		ret = erofs_init_metabuf(&map->buf, sb,
+					 erofs_inode_in_metabox(fe->inode));
+		if (ret)
+			return ret;
+		ptr = erofs_bread(&map->buf, map->m_pa, false);
+		if (IS_ERR(ptr)) {
+			erofs_err(sb, "failed to read inline data %pe @ pa %llu of nid %llu",
+				  ptr, map->m_pa, EROFS_I(fe->inode)->nid);
+			return PTR_ERR(ptr);
+		}
+		ptr = map->buf.page;
+	} else {
 		while (1) {
 			rcu_read_lock();
 			pcl = xa_load(&EROFS_SB(sb)->managed_pslots, map->m_pa);
···
 		/* bind cache first when cached decompression is preferred */
 		z_erofs_bind_cache(fe);
 	} else {
-		ret = erofs_init_metabuf(&map->buf, sb,
-					 erofs_inode_in_metabox(fe->inode));
-		if (ret)
-			return ret;
-		ptr = erofs_bread(&map->buf, map->m_pa, false);
-		if (IS_ERR(ptr)) {
-			ret = PTR_ERR(ptr);
-			erofs_err(sb, "failed to get inline folio %d", ret);
-			return ret;
-		}
-		folio_get(page_folio(map->buf.page));
-		WRITE_ONCE(fe->pcl->compressed_bvecs[0].page, map->buf.page);
+		folio_get(page_folio((struct page *)ptr));
+		WRITE_ONCE(fe->pcl->compressed_bvecs[0].page, ptr);
 		fe->pcl->pageofs_in = map->m_pa & ~PAGE_MASK;
 		fe->mode = Z_EROFS_PCLUSTER_FOLLOWED_NOINPLACE;
 	}
···
 	return err;
 }

-static bool z_erofs_is_sync_decompress(struct erofs_sb_info *sbi,
-				       unsigned int readahead_pages)
-{
-	/* auto: enable for read_folio, disable for readahead */
-	if ((sbi->opt.sync_decompress == EROFS_SYNC_DECOMPRESS_AUTO) &&
-	    !readahead_pages)
-		return true;
-
-	if ((sbi->opt.sync_decompress == EROFS_SYNC_DECOMPRESS_FORCE_ON) &&
-	    (readahead_pages <= sbi->opt.max_sync_decompress_pages))
-		return true;
-
-	return false;
-}
-
 static bool z_erofs_page_is_invalidated(struct page *page)
 {
 	return !page_folio(page)->mapping && !z_erofs_is_shortlived_page(page);
···
 					GFP_NOWAIT | __GFP_NORETRY
 				 }, be->pagepool);
 		if (IS_ERR(reason)) {
-			erofs_err(be->sb, "failed to decompress (%s) %ld @ pa %llu size %u => %u",
-				  alg->name, PTR_ERR(reason), pcl->pos,
-				  pcl->pclustersize, pcl->length);
+			if (pcl->besteffort || reason != ERR_PTR(-ENOMEM))
+				erofs_err(be->sb, "failed to decompress (%s) %pe @ pa %llu size %u => %u",
+					  alg->name, reason, pcl->pos,
+					  pcl->pclustersize, pcl->length);
 			err = PTR_ERR(reason);
 		} else if (unlikely(reason)) {
 			erofs_err(be->sb, "failed to decompress (%s) %s @ pa %llu size %u => %u",
···
 #else
 		queue_work(z_erofs_workqueue, &io->u.work);
 #endif
-		/* enable sync decompression for readahead */
-		if (sbi->opt.sync_decompress == EROFS_SYNC_DECOMPRESS_AUTO)
-			sbi->opt.sync_decompress = EROFS_SYNC_DECOMPRESS_FORCE_ON;
+		/* See `sync_decompress` in sysfs-fs-erofs for more details */
+		if (sbi->sync_decompress == EROFS_SYNC_DECOMPRESS_AUTO)
+			sbi->sync_decompress = EROFS_SYNC_DECOMPRESS_FORCE_ON;
 		return;
 	}
 	z_erofs_decompressqueue_work(&io->u.work);
···
 	z_erofs_decompress_kickoff(q[JQ_SUBMIT], nr_bios);
 }

-static int z_erofs_runqueue(struct z_erofs_frontend *f, unsigned int rapages)
+static int z_erofs_runqueue(struct z_erofs_frontend *f, unsigned int rabytes)
 {
 	struct z_erofs_decompressqueue io[NR_JOBQUEUES];
 	struct erofs_sb_info *sbi = EROFS_I_SB(f->inode);
-	bool force_fg = z_erofs_is_sync_decompress(sbi, rapages);
+	int syncmode = sbi->sync_decompress;
+	bool force_fg;
 	int err;
+
+	force_fg = (syncmode == EROFS_SYNC_DECOMPRESS_AUTO && !rabytes) ||
+		   (syncmode == EROFS_SYNC_DECOMPRESS_FORCE_ON &&
+		    (rabytes <= Z_EROFS_MAX_SYNC_DECOMPRESS_BYTES));

 	if (f->head == Z_EROFS_PCLUSTER_TAIL)
 		return 0;
-	z_erofs_submit_queue(f, io, &force_fg, !!rapages);
+	z_erofs_submit_queue(f, io, &force_fg, !!rabytes);

 	/* handle bypass queue (no i/o pclusters) immediately */
 	err = z_erofs_decompress_queue(&io[JQ_BYPASS], &f->pagepool);
···
 		pgoff_t index = cur >> PAGE_SHIFT;
 		struct folio *folio;

-		folio = erofs_grab_folio_nowait(inode->i_mapping, index);
+		folio = erofs_grab_folio_nowait(f->sharedinode->i_mapping, index);
 		if (!IS_ERR_OR_NULL(folio)) {
 			if (folio_test_uptodate(folio))
 				folio_unlock(folio);
···

 static int z_erofs_read_folio(struct file *file, struct folio *folio)
 {
-	struct inode *const inode = folio->mapping->host;
-	Z_EROFS_DEFINE_FRONTEND(f, inode, folio_pos(folio));
+	struct inode *sharedinode = folio->mapping->host;
+	bool need_iput;
+	struct inode *realinode = erofs_real_inode(sharedinode, &need_iput);
+	Z_EROFS_DEFINE_FRONTEND(f, realinode, sharedinode, folio_pos(folio));
 	int err;

-	trace_erofs_read_folio(folio, false);
+	trace_erofs_read_folio(realinode, folio, false);
 	z_erofs_pcluster_readmore(&f, NULL, true);
 	err = z_erofs_scan_folio(&f, folio, false);
 	z_erofs_pcluster_readmore(&f, NULL, false);
···
 	/* if some pclusters are ready, need submit them anyway */
 	err = z_erofs_runqueue(&f, 0) ?: err;
 	if (err && err != -EINTR)
-		erofs_err(inode->i_sb, "read error %d @ %lu of nid %llu",
-			  err, folio->index, EROFS_I(inode)->nid);
+		erofs_err(realinode->i_sb, "read error %d @ %lu of nid %llu",
+			  err, folio->index, EROFS_I(realinode)->nid);

 	erofs_put_metabuf(&f.map.buf);
 	erofs_release_pages(&f.pagepool);
+
+	if (need_iput)
+		iput(realinode);
 	return err;
 }

 static void z_erofs_readahead(struct readahead_control *rac)
 {
-	struct inode *const inode = rac->mapping->host;
-	Z_EROFS_DEFINE_FRONTEND(f, inode, readahead_pos(rac));
+	struct inode *sharedinode = rac->mapping->host;
+	bool need_iput;
+	struct inode *realinode = erofs_real_inode(sharedinode, &need_iput);
+	Z_EROFS_DEFINE_FRONTEND(f, realinode, sharedinode, readahead_pos(rac));
 	unsigned int nrpages = readahead_count(rac);
 	struct folio *head = NULL, *folio;
 	int err;

-	trace_erofs_readahead(inode, readahead_index(rac), nrpages, false);
+	trace_erofs_readahead(realinode, readahead_index(rac), nrpages, false);
 	z_erofs_pcluster_readmore(&f, rac, true);
 	while ((folio = readahead_folio(rac))) {
 		folio->private = head;
···

 		err = z_erofs_scan_folio(&f, folio, true);
 		if (err && err != -EINTR)
-			erofs_err(inode->i_sb, "readahead error at folio %lu @ nid %llu",
-				  folio->index, EROFS_I(inode)->nid);
+			erofs_err(realinode->i_sb, "readahead error at folio %lu @ nid %llu",
+				  folio->index, EROFS_I(realinode)->nid);
 	}
 	z_erofs_pcluster_readmore(&f, rac, false);
 	z_erofs_pcluster_end(&f);

-	(void)z_erofs_runqueue(&f, nrpages);
+	(void)z_erofs_runqueue(&f, nrpages << PAGE_SHIFT);
 	erofs_put_metabuf(&f.map.buf);
 	erofs_release_pages(&f.pagepool);
+
+	if (need_iput)
+		iput(realinode);
 }

 const struct address_space_operations z_erofs_aops = {
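
The foreground-decompression decision that the hunks above fold into z_erofs_runqueue() can be restated as a small standalone function. This is a userspace sketch for illustration only, not kernel code. The enum and function names are hypothetical; the mode values follow the sysfs `sync_decompress` documentation (0 auto, 1 force on, 2 force off), and the 12288-byte limit mirrors Z_EROFS_MAX_SYNC_DECOMPRESS_BYTES.

```c
#include <stdbool.h>

/* Hypothetical userspace mirror of the three sync_decompress modes. */
enum sync_mode {
	SYNC_AUTO = 0,		/* synchronous reads only */
	SYNC_FORCE_ON = 1,	/* synchronous reads and small readahead */
	SYNC_FORCE_OFF = 2,	/* never decompress in the reader thread */
};

/* Same value as Z_EROFS_MAX_SYNC_DECOMPRESS_BYTES in the diff (12 KiB). */
#define MAX_SYNC_DECOMPRESS_BYTES 12288u

/*
 * rabytes is 0 for a synchronous read (the read_folio path) and the
 * readahead size in bytes otherwise, matching what the callers pass.
 */
static bool should_decompress_in_reader(enum sync_mode mode,
					unsigned int rabytes)
{
	if (mode == SYNC_AUTO)
		return rabytes == 0;
	if (mode == SYNC_FORCE_ON)
		return rabytes <= MAX_SYNC_DECOMPRESS_BYTES;
	return false;
}
```

Assuming 4 KiB pages, a readahead of up to three pages still qualifies under force on, because the readahead caller now passes `nrpages << PAGE_SHIFT` rather than a page count, while z_erofs_read_folio() always passes 0.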

+1

fs/file_table.c

···
 	ff->file.f_mode |= FMODE_BACKING | FMODE_NOACCOUNT;
 	return &ff->file;
 }
+EXPORT_SYMBOL_GPL(alloc_empty_backing_file);

 /**
  * file_init_path - initialize a 'struct file' based on path

+5 -5

include/trace/events/erofs.h

···

 TRACE_EVENT(erofs_read_folio,

-	TP_PROTO(struct folio *folio, bool raw),
+	TP_PROTO(struct inode *inode, struct folio *folio, bool raw),

-	TP_ARGS(folio, raw),
+	TP_ARGS(inode, folio, raw),

 	TP_STRUCT__entry(
 		__field(dev_t,		dev		)
···
 	),

 	TP_fast_assign(
-		__entry->dev = folio->mapping->host->i_sb->s_dev;
-		__entry->nid = EROFS_I(folio->mapping->host)->nid;
-		__entry->dir = S_ISDIR(folio->mapping->host->i_mode);
+		__entry->dev = inode->i_sb->s_dev;
+		__entry->nid = EROFS_I(inode)->nid;
+		__entry->dir = S_ISDIR(inode->i_mode);
 		__entry->index = folio->index;
 		__entry->uptodate = folio_test_uptodate(folio);
 		__entry->raw = raw;
