Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'f2fs-for-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
"In this round, we've added new features such as zone capacity for ZNS
and a new GC policy, ATGC, along with in-memory segment management. In
addition, we could improve the decompression speed significantly by
changing virtual mapping method. Even though we've fixed lots of small
bugs in compression support, I feel that it becomes more stable so
that I could give it a try in production.

Enhancements:
- support zone capacity in NVMe Zoned Namespace devices
- introduce in-memory current segment management
- add standard casefolding support
- support age threshold based garbage collection
- improve decompression speed by changing virtual mapping method

Bug fixes:
- fix condition checks in some ioctl() such as compression, move_range, etc
- fix 32/64bits support in data structures
- fix memory allocation in zstd decompress
- add some boundary checks to avoid kernel panic on corrupted image
- fix disallowing compression for non-empty file
- fix slab leakage of compressed block writes

In addition, it includes code refactoring for better readability and
minor bug fixes for compression and zoned device support"

* tag 'f2fs-for-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (51 commits)
f2fs: code cleanup by removing unnecessary check
f2fs: wait for sysfs kobject removal before freeing f2fs_sb_info
f2fs: fix writecount false positive in releasing compress blocks
f2fs: introduce check_swap_activate_fast()
f2fs: don't issue flush in f2fs_flush_device_cache() for nobarrier case
f2fs: handle errors of f2fs_get_meta_page_nofail
f2fs: fix to set SBI_NEED_FSCK flag for inconsistent inode
f2fs: reject CASEFOLD inode flag without casefold feature
f2fs: fix memory alignment to support 32bit
f2fs: fix slab leak of rpages pointer
f2fs: compress: fix to disallow enabling compress on non-empty file
f2fs: compress: introduce cic/dic slab cache
f2fs: compress: introduce page array slab cache
f2fs: fix to do sanity check on segment/section count
f2fs: fix to check segment boundary during SIT page readahead
f2fs: fix uninit-value in f2fs_lookup
f2fs: remove unneeded parameter in find_in_block()
f2fs: fix wrong total_sections check and fsmeta check
f2fs: remove duplicated code in sanity_check_area_boundary
f2fs: remove unused check on version_bitmap
...

+1797 -495
+2 -1
Documentation/ABI/testing/sysfs-fs-f2fs
···
 Description:	Controls the victim selection policy for garbage collection.
 		Setting gc_idle = 0(default) will disable this option. Setting
 		gc_idle = 1 will select the Cost Benefit approach & setting
-		gc_idle = 2 will select the greedy approach.
+		gc_idle = 2 will select the greedy approach & setting
+		gc_idle = 3 will select the age-threshold based approach.

 What:		/sys/fs/f2fs/<disk>/reclaim_segments
 Date:		October 2013
+67 -15
Documentation/filesystems/f2fs.rst
··· 127 127 current design, f2fs supports only 2, 4, and 6 logs. 128 128 Default number is 6. 129 129 disable_ext_identify Disable the extension list configured by mkfs, so f2fs 130 - does not aware of cold files such as media files. 130 + is not aware of cold files such as media files. 131 131 inline_xattr Enable the inline xattrs feature. 132 132 noinline_xattr Disable the inline xattrs feature. 133 133 inline_xattr_size=%u Support configuring inline xattr size, it depends on 134 134 flexible inline xattr feature. 135 - inline_data Enable the inline data feature: New created small(<~3.4k) 135 + inline_data Enable the inline data feature: Newly created small (<~3.4k) 136 136 files can be written into inode block. 137 - inline_dentry Enable the inline dir feature: data in new created 137 + inline_dentry Enable the inline dir feature: data in newly created 138 138 directory entries can be written into inode block. The 139 139 space of inode block which is used to store inline 140 140 dentries is limited to ~3.4k. ··· 203 203 grpjquota=<file> information can be properly updated during recovery flow, 204 204 prjjquota=<file> <quota file>: must be in root directory; 205 205 jqfmt=<quota type> <quota type>: [vfsold,vfsv0,vfsv1]. 206 - offusrjquota Turn off user journelled quota. 207 - offgrpjquota Turn off group journelled quota. 208 - offprjjquota Turn off project journelled quota. 206 + offusrjquota Turn off user journalled quota. 207 + offgrpjquota Turn off group journalled quota. 208 + offprjjquota Turn off project journalled quota. 209 209 quota Enable plain user disk quota accounting. 210 210 noquota Disable all plain disk quota option. 211 211 whint_mode=%s Control which write hints are passed down to block ··· 266 266 inline encryption hardware. The on-disk format is 267 267 unaffected. For more details, see 268 268 Documentation/block/inline-encryption.rst. 
269 + atgc Enable age-threshold garbage collection, it provides high 270 + effectiveness and efficiency on background GC. 269 271 ======================== ============================================================ 270 272 271 273 Debugfs Entries ··· 303 301 304 302 # insmod f2fs.ko 305 303 306 - 3. Create a directory trying to mount:: 304 + 3. Create a directory to use when mounting:: 307 305 308 306 # mkdir /mnt/f2fs 309 307 ··· 317 315 The mkfs.f2fs is for the use of formatting a partition as the f2fs filesystem, 318 316 which builds a basic on-disk layout. 319 317 320 - The options consist of: 318 + The quick options consist of: 321 319 322 320 =============== =========================================================== 323 321 ``-l [label]`` Give a volume label, up to 512 unicode name. ··· 339 337 1 is set by default, which conducts discard. 340 338 =============== =========================================================== 341 339 340 + Note: please refer to the manpage of mkfs.f2fs(8) to get full option list. 341 + 342 342 fsck.f2fs 343 343 --------- 344 344 The fsck.f2fs is a tool to check the consistency of an f2fs-formatted ··· 348 344 are cross-referenced correctly or not. 349 345 Note that, initial version of the tool does not fix any inconsistency. 350 346 351 - The options consist of:: 347 + The quick options consist of:: 352 348 353 349 -d debug level [default:0] 350 + 351 + Note: please refer to the manpage of fsck.f2fs(8) to get full option list. 354 352 355 353 dump.f2fs 356 354 --------- ··· 377 371 # dump.f2fs -s 0~-1 /dev/sdx (SIT dump) 378 372 # dump.f2fs -a 0~-1 /dev/sdx (SSA dump) 379 373 374 + Note: please refer to the manpage of dump.f2fs(8) to get full option list. 375 + 376 + sload.f2fs 377 + ---------- 378 + The sload.f2fs gives a way to insert files and directories in the exisiting disk 379 + image. This tool is useful when building f2fs images given compiled files. 
380 + 381 + Note: please refer to the manpage of sload.f2fs(8) to get full option list. 382 + 383 + resize.f2fs 384 + ----------- 385 + The resize.f2fs lets a user resize the f2fs-formatted disk image, while preserving 386 + all the files and directories stored in the image. 387 + 388 + Note: please refer to the manpage of resize.f2fs(8) to get full option list. 389 + 390 + defrag.f2fs 391 + ----------- 392 + The defrag.f2fs can be used to defragment scattered written data as well as 393 + filesystem metadata across the disk. This can improve the write speed by giving 394 + more free consecutive space. 395 + 396 + Note: please refer to the manpage of defrag.f2fs(8) to get full option list. 397 + 398 + f2fs_io 399 + ------- 400 + The f2fs_io is a simple tool to issue various filesystem APIs as well as 401 + f2fs-specific ones, which is very useful for QA tests. 402 + 403 + Note: please refer to the manpage of f2fs_io(8) to get full option list. 404 + 380 405 Design 381 406 ====== 382 407 ··· 420 383 segment size identically, but users can easily modify the sizes by mkfs. 421 384 422 385 F2FS splits the entire volume into six areas, and all the areas except superblock 423 - consists of multiple segments as described below:: 386 + consist of multiple segments as described below:: 424 387 425 388 align with the zone size <-| 426 389 |-> align with the segment size ··· 523 486 `- direct node (1018) 524 487 `- data (1018) 525 488 526 - Note that, all the node blocks are mapped by NAT which means the location of 489 + Note that all the node blocks are mapped by NAT which means the location of 527 490 each node is translated by the NAT table. In the consideration of the wandering 528 491 tree problem, F2FS is able to cut off the propagation of node updates caused by 529 492 leaf data writes. ··· 603 566 name is calculated. Then, F2FS scans the hash table in level #0 to find the 604 567 dentry consisting of the file name and its inode number. 
If not found, F2FS 605 568 scans the next hash table in level #1. In this way, F2FS scans hash tables in 606 - each levels incrementally from 1 to N. In each levels F2FS needs to scan only 569 + each levels incrementally from 1 to N. In each level F2FS needs to scan only 607 570 one bucket determined by the following equation, which shows O(log(# of files)) 608 571 complexity:: 609 572 ··· 744 707 Fallocate(2) Policy 745 708 ------------------- 746 709 747 - The default policy follows the below posix rule. 710 + The default policy follows the below POSIX rule. 748 711 749 712 Allocating disk space 750 713 The default operation (i.e., mode is zero) of fallocate() allocates ··· 757 720 as a method of optimally implementing that function. 758 721 759 722 However, once F2FS receives ioctl(fd, F2FS_IOC_SET_PIN_FILE) in prior to 760 - fallocate(fd, DEFAULT_MODE), it allocates on-disk blocks addressess having 723 + fallocate(fd, DEFAULT_MODE), it allocates on-disk block addressess having 761 724 zero or random data, which is useful to the below scenario where: 762 725 763 726 1. create(fd) ··· 776 739 cluster can be compressed or not. 777 740 778 741 - In cluster metadata layout, one special block address is used to indicate 779 - cluster is compressed one or normal one, for compressed cluster, following 742 + a cluster is a compressed one or normal one; for compressed cluster, following 780 743 metadata maps cluster to [1, 4 << n - 1] physical blocks, in where f2fs 781 744 stores data including compress header and compressed data. 782 745 ··· 809 772 +-------------+-------------+----------+----------------------------+ 810 773 | data length | data chksum | reserved | compressed data | 811 774 +-------------+-------------+----------+----------------------------+ 775 + 776 + NVMe Zoned Namespace devices 777 + ---------------------------- 778 + 779 + - ZNS defines a per-zone capacity which can be equal or less than the 780 + zone-size. 
Zone-capacity is the number of usable blocks in the zone. 781 + F2FS checks if zone-capacity is less than zone-size, if it is, then any 782 + segment which starts after the zone-capacity is marked as not-free in 783 + the free segment bitmap at initial mount time. These segments are marked 784 + as permanently used so they are not allocated for writes and 785 + consequently are not needed to be garbage collected. In case the 786 + zone-capacity is not aligned to default segment size(2MB), then a segment 787 + can start before the zone-capacity and span across zone-capacity boundary. 788 + Such spanning segments are also considered as usable segments. All blocks 789 + past the zone-capacity are considered unusable in these segments.
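The zone-capacity rule described in the added "NVMe Zoned Namespace devices" documentation above can be illustrated with a small userspace sketch. This is not the kernel implementation: the helper name usable_blocks_in_seg and its parameters are hypothetical, and the example values assume 4KB blocks, so that a 2MB segment holds 512 blocks.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not an f2fs function): number of usable blocks in
 * the seg_in_zone-th segment of a zone whose capacity is given in blocks.
 *
 * - A segment starting at or past the zone capacity has no usable blocks.
 * - A segment spanning the capacity boundary is usable only up to it.
 * - Any other segment is fully usable.
 */
static uint32_t usable_blocks_in_seg(uint32_t seg_in_zone,
				     uint32_t blocks_per_seg,
				     uint32_t zone_capacity_blocks)
{
	uint64_t seg_start;
	uint64_t remaining;

	assert(blocks_per_seg > 0);

	/* First block of this segment, relative to the start of the zone. */
	seg_start = (uint64_t)seg_in_zone * blocks_per_seg;
	if (seg_start >= zone_capacity_blocks)
		return 0;

	remaining = zone_capacity_blocks - seg_start;
	return remaining < blocks_per_seg ? (uint32_t)remaining : blocks_per_seg;
}
```

With 512 blocks per segment and a zone capacity of 1000 blocks, segment 0 is fully usable, segment 1 spans the boundary and keeps 488 usable blocks, and segment 2 starts past the capacity and has none.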
+3 -3
fs/f2fs/acl.c
···
 	return (void *)f2fs_acl;

 fail:
-	kvfree(f2fs_acl);
+	kfree(f2fs_acl);
 	return ERR_PTR(-EINVAL);
 }

···
 		acl = NULL;
 	else
 		acl = ERR_PTR(retval);
-	kvfree(value);
+	kfree(value);

 	return acl;
 }
···

 	error = f2fs_setxattr(inode, name_index, "", value, size, ipage, 0);

-	kvfree(value);
+	kfree(value);
 	if (!error)
 		set_cached_acl(inode, type, acl);

+14 -3
fs/f2fs/checkpoint.c
···
 	return __get_meta_page(sbi, index, true);
 }

-struct page *f2fs_get_meta_page_nofail(struct f2fs_sb_info *sbi, pgoff_t index)
+struct page *f2fs_get_meta_page_retry(struct f2fs_sb_info *sbi, pgoff_t index)
 {
 	struct page *page;
 	int count = 0;
···
 						blkno * NAT_ENTRY_PER_BLOCK);
 			break;
 		case META_SIT:
+			if (unlikely(blkno >= TOTAL_SEGS(sbi)))
+				goto out;
 			/* get sit block addr */
 			fio.new_blkaddr = current_sit_addr(sbi,
 					blkno * SIT_ENTRY_PER_BLOCK);
···
 			get_pages(sbi, is_dir ?
 			F2FS_DIRTY_DENTS : F2FS_DIRTY_DATA));
 retry:
-	if (unlikely(f2fs_cp_error(sbi)))
+	if (unlikely(f2fs_cp_error(sbi))) {
+		trace_f2fs_sync_dirty_inodes_exit(sbi->sb, is_dir,
+				get_pages(sbi, is_dir ?
+				F2FS_DIRTY_DENTS : F2FS_DIRTY_DATA));
 		return -EIO;
+	}

 	spin_lock(&sbi->inode_lock[type]);

···

 	f2fs_flush_sit_entries(sbi, cpc);

+	/* save inmem log status */
+	f2fs_save_inmem_curseg(sbi);
+
 	err = do_checkpoint(sbi, cpc);
 	if (err)
 		f2fs_release_discard_addrs(sbi);
 	else
 		f2fs_clear_prefree_segments(sbi, cpc);
+
+	f2fs_restore_inmem_curseg(sbi);
 stop:
 	unblock_operations(sbi);
 	stat_inc_cp_count(sbi->stat_info);
···
 	}

 	sbi->max_orphans = (sbi->blocks_per_seg - F2FS_CP_PACKS -
-			NR_CURSEG_TYPE - __cp_payload(sbi)) *
+			NR_CURSEG_PERSIST_TYPE - __cp_payload(sbi)) *
 				F2FS_ORPHANS_PER_BLOCK;
 }

+182 -60
fs/f2fs/compress.c
··· 17 17 #include "node.h" 18 18 #include <trace/events/f2fs.h> 19 19 20 + static struct kmem_cache *cic_entry_slab; 21 + static struct kmem_cache *dic_entry_slab; 22 + 23 + static void *page_array_alloc(struct inode *inode, int nr) 24 + { 25 + struct f2fs_sb_info *sbi = F2FS_I_SB(inode); 26 + unsigned int size = sizeof(struct page *) * nr; 27 + 28 + if (likely(size <= sbi->page_array_slab_size)) 29 + return kmem_cache_zalloc(sbi->page_array_slab, GFP_NOFS); 30 + return f2fs_kzalloc(sbi, size, GFP_NOFS); 31 + } 32 + 33 + static void page_array_free(struct inode *inode, void *pages, int nr) 34 + { 35 + struct f2fs_sb_info *sbi = F2FS_I_SB(inode); 36 + unsigned int size = sizeof(struct page *) * nr; 37 + 38 + if (!pages) 39 + return; 40 + 41 + if (likely(size <= sbi->page_array_slab_size)) 42 + kmem_cache_free(sbi->page_array_slab, pages); 43 + else 44 + kfree(pages); 45 + } 46 + 20 47 struct f2fs_compress_ops { 21 48 int (*init_compress_ctx)(struct compress_ctx *cc); 22 49 void (*destroy_compress_ctx)(struct compress_ctx *cc); ··· 157 130 158 131 int f2fs_init_compress_ctx(struct compress_ctx *cc) 159 132 { 160 - struct f2fs_sb_info *sbi = F2FS_I_SB(cc->inode); 161 - 162 - if (cc->nr_rpages) 133 + if (cc->rpages) 163 134 return 0; 164 135 165 - cc->rpages = f2fs_kzalloc(sbi, sizeof(struct page *) << 166 - cc->log_cluster_size, GFP_NOFS); 136 + cc->rpages = page_array_alloc(cc->inode, cc->cluster_size); 167 137 return cc->rpages ? 
0 : -ENOMEM; 168 138 } 169 139 170 140 void f2fs_destroy_compress_ctx(struct compress_ctx *cc) 171 141 { 172 - kfree(cc->rpages); 142 + page_array_free(cc->inode, cc->rpages, cc->cluster_size); 173 143 cc->rpages = NULL; 174 144 cc->nr_rpages = 0; 175 145 cc->nr_cpages = 0; ··· 406 382 ZSTD_DStream *stream; 407 383 void *workspace; 408 384 unsigned int workspace_size; 385 + unsigned int max_window_size = 386 + MAX_COMPRESS_WINDOW_SIZE(dic->log_cluster_size); 409 387 410 - workspace_size = ZSTD_DStreamWorkspaceBound(MAX_COMPRESS_WINDOW_SIZE); 388 + workspace_size = ZSTD_DStreamWorkspaceBound(max_window_size); 411 389 412 390 workspace = f2fs_kvmalloc(F2FS_I_SB(dic->inode), 413 391 workspace_size, GFP_NOFS); 414 392 if (!workspace) 415 393 return -ENOMEM; 416 394 417 - stream = ZSTD_initDStream(MAX_COMPRESS_WINDOW_SIZE, 418 - workspace, workspace_size); 395 + stream = ZSTD_initDStream(max_window_size, workspace, workspace_size); 419 396 if (!stream) { 420 397 printk_ratelimited("%sF2FS-fs (%s): %s ZSTD_initDStream failed\n", 421 398 KERN_ERR, F2FS_I_SB(dic->inode)->sb->s_id, ··· 579 554 mempool_free(page, compress_page_pool); 580 555 } 581 556 557 + #define MAX_VMAP_RETRIES 3 558 + 559 + static void *f2fs_vmap(struct page **pages, unsigned int count) 560 + { 561 + int i; 562 + void *buf = NULL; 563 + 564 + for (i = 0; i < MAX_VMAP_RETRIES; i++) { 565 + buf = vm_map_ram(pages, count, -1); 566 + if (buf) 567 + break; 568 + vm_unmap_aliases(); 569 + } 570 + return buf; 571 + } 572 + 582 573 static int f2fs_compress_pages(struct compress_ctx *cc) 583 574 { 584 - struct f2fs_sb_info *sbi = F2FS_I_SB(cc->inode); 585 575 struct f2fs_inode_info *fi = F2FS_I(cc->inode); 586 576 const struct f2fs_compress_ops *cops = 587 577 f2fs_cops[fi->i_compress_algorithm]; 588 - unsigned int max_len, nr_cpages; 578 + unsigned int max_len, new_nr_cpages; 579 + struct page **new_cpages; 589 580 int i, ret; 590 581 591 582 trace_f2fs_compress_pages_start(cc->inode, cc->cluster_idx, ··· 616 
575 max_len = COMPRESS_HEADER_SIZE + cc->clen; 617 576 cc->nr_cpages = DIV_ROUND_UP(max_len, PAGE_SIZE); 618 577 619 - cc->cpages = f2fs_kzalloc(sbi, sizeof(struct page *) * 620 - cc->nr_cpages, GFP_NOFS); 578 + cc->cpages = page_array_alloc(cc->inode, cc->nr_cpages); 621 579 if (!cc->cpages) { 622 580 ret = -ENOMEM; 623 581 goto destroy_compress_ctx; ··· 630 590 } 631 591 } 632 592 633 - cc->rbuf = vmap(cc->rpages, cc->cluster_size, VM_MAP, PAGE_KERNEL_RO); 593 + cc->rbuf = f2fs_vmap(cc->rpages, cc->cluster_size); 634 594 if (!cc->rbuf) { 635 595 ret = -ENOMEM; 636 596 goto out_free_cpages; 637 597 } 638 598 639 - cc->cbuf = vmap(cc->cpages, cc->nr_cpages, VM_MAP, PAGE_KERNEL); 599 + cc->cbuf = f2fs_vmap(cc->cpages, cc->nr_cpages); 640 600 if (!cc->cbuf) { 641 601 ret = -ENOMEM; 642 602 goto out_vunmap_rbuf; ··· 658 618 for (i = 0; i < COMPRESS_DATA_RESERVED_SIZE; i++) 659 619 cc->cbuf->reserved[i] = cpu_to_le32(0); 660 620 661 - nr_cpages = DIV_ROUND_UP(cc->clen + COMPRESS_HEADER_SIZE, PAGE_SIZE); 621 + new_nr_cpages = DIV_ROUND_UP(cc->clen + COMPRESS_HEADER_SIZE, PAGE_SIZE); 622 + 623 + /* Now we're going to cut unnecessary tail pages */ 624 + new_cpages = page_array_alloc(cc->inode, new_nr_cpages); 625 + if (!new_cpages) { 626 + ret = -ENOMEM; 627 + goto out_vunmap_cbuf; 628 + } 662 629 663 630 /* zero out any unused part of the last page */ 664 631 memset(&cc->cbuf->cdata[cc->clen], 0, 665 - (nr_cpages * PAGE_SIZE) - (cc->clen + COMPRESS_HEADER_SIZE)); 632 + (new_nr_cpages * PAGE_SIZE) - 633 + (cc->clen + COMPRESS_HEADER_SIZE)); 666 634 667 - vunmap(cc->cbuf); 668 - vunmap(cc->rbuf); 635 + vm_unmap_ram(cc->cbuf, cc->nr_cpages); 636 + vm_unmap_ram(cc->rbuf, cc->cluster_size); 669 637 670 - for (i = nr_cpages; i < cc->nr_cpages; i++) { 638 + for (i = 0; i < cc->nr_cpages; i++) { 639 + if (i < new_nr_cpages) { 640 + new_cpages[i] = cc->cpages[i]; 641 + continue; 642 + } 671 643 f2fs_compress_free_page(cc->cpages[i]); 672 644 cc->cpages[i] = NULL; 673 645 } ··· 
687 635 if (cops->destroy_compress_ctx) 688 636 cops->destroy_compress_ctx(cc); 689 637 690 - cc->nr_cpages = nr_cpages; 638 + page_array_free(cc->inode, cc->cpages, cc->nr_cpages); 639 + cc->cpages = new_cpages; 640 + cc->nr_cpages = new_nr_cpages; 691 641 692 642 trace_f2fs_compress_pages_end(cc->inode, cc->cluster_idx, 693 643 cc->clen, ret); 694 644 return 0; 695 645 696 646 out_vunmap_cbuf: 697 - vunmap(cc->cbuf); 647 + vm_unmap_ram(cc->cbuf, cc->nr_cpages); 698 648 out_vunmap_rbuf: 699 - vunmap(cc->rbuf); 649 + vm_unmap_ram(cc->rbuf, cc->cluster_size); 700 650 out_free_cpages: 701 651 for (i = 0; i < cc->nr_cpages; i++) { 702 652 if (cc->cpages[i]) 703 653 f2fs_compress_free_page(cc->cpages[i]); 704 654 } 705 - kfree(cc->cpages); 655 + page_array_free(cc->inode, cc->cpages, cc->nr_cpages); 706 656 cc->cpages = NULL; 707 657 destroy_compress_ctx: 708 658 if (cops->destroy_compress_ctx) ··· 731 677 if (bio->bi_status || PageError(page)) 732 678 dic->failed = true; 733 679 734 - if (refcount_dec_not_one(&dic->ref)) 680 + if (atomic_dec_return(&dic->pending_pages)) 735 681 return; 736 682 737 683 trace_f2fs_decompress_pages_start(dic->inode, dic->cluster_idx, ··· 743 689 goto out_free_dic; 744 690 } 745 691 746 - dic->tpages = f2fs_kzalloc(sbi, sizeof(struct page *) * 747 - dic->cluster_size, GFP_NOFS); 692 + dic->tpages = page_array_alloc(dic->inode, dic->cluster_size); 748 693 if (!dic->tpages) { 749 694 ret = -ENOMEM; 750 695 goto out_free_dic; ··· 768 715 goto out_free_dic; 769 716 } 770 717 771 - dic->rbuf = vmap(dic->tpages, dic->cluster_size, VM_MAP, PAGE_KERNEL); 718 + dic->rbuf = f2fs_vmap(dic->tpages, dic->cluster_size); 772 719 if (!dic->rbuf) { 773 720 ret = -ENOMEM; 774 721 goto destroy_decompress_ctx; 775 722 } 776 723 777 - dic->cbuf = vmap(dic->cpages, dic->nr_cpages, VM_MAP, PAGE_KERNEL_RO); 724 + dic->cbuf = f2fs_vmap(dic->cpages, dic->nr_cpages); 778 725 if (!dic->cbuf) { 779 726 ret = -ENOMEM; 780 727 goto out_vunmap_rbuf; ··· 791 738 ret = 
cops->decompress_pages(dic); 792 739 793 740 out_vunmap_cbuf: 794 - vunmap(dic->cbuf); 741 + vm_unmap_ram(dic->cbuf, dic->nr_cpages); 795 742 out_vunmap_rbuf: 796 - vunmap(dic->rbuf); 743 + vm_unmap_ram(dic->rbuf, dic->cluster_size); 797 744 destroy_decompress_ctx: 798 745 if (cops->destroy_decompress_ctx) 799 746 cops->destroy_decompress_ctx(dic); 800 747 out_free_dic: 801 748 if (verity) 802 - refcount_set(&dic->ref, dic->nr_cpages); 749 + atomic_set(&dic->pending_pages, dic->nr_cpages); 803 750 if (!verity) 804 751 f2fs_decompress_end_io(dic->rpages, dic->cluster_size, 805 752 ret, false); ··· 1082 1029 1083 1030 { 1084 1031 struct compress_ctx cc = { 1032 + .inode = inode, 1085 1033 .log_cluster_size = F2FS_I(inode)->i_log_cluster_size, 1086 1034 .cluster_size = F2FS_I(inode)->i_cluster_size, 1087 1035 .rpages = fsdata, ··· 1186 1132 */ 1187 1133 down_read(&sbi->node_write); 1188 1134 } else if (!f2fs_trylock_op(sbi)) { 1189 - return -EAGAIN; 1135 + goto out_free; 1190 1136 } 1191 1137 1192 1138 set_new_dnode(&dn, cc->inode, NULL, NULL, 0); ··· 1209 1155 1210 1156 fio.version = ni.version; 1211 1157 1212 - cic = f2fs_kzalloc(sbi, sizeof(struct compress_io_ctx), GFP_NOFS); 1158 + cic = kmem_cache_zalloc(cic_entry_slab, GFP_NOFS); 1213 1159 if (!cic) 1214 1160 goto out_put_dnode; 1215 1161 1216 1162 cic->magic = F2FS_COMPRESSED_PAGE_MAGIC; 1217 1163 cic->inode = inode; 1218 - refcount_set(&cic->ref, cc->nr_cpages); 1219 - cic->rpages = f2fs_kzalloc(sbi, sizeof(struct page *) << 1220 - cc->log_cluster_size, GFP_NOFS); 1164 + atomic_set(&cic->pending_pages, cc->nr_cpages); 1165 + cic->rpages = page_array_alloc(cc->inode, cc->cluster_size); 1221 1166 if (!cic->rpages) 1222 1167 goto out_put_cic; 1223 1168 ··· 1310 1257 spin_unlock(&fi->i_size_lock); 1311 1258 1312 1259 f2fs_put_rpages(cc); 1260 + page_array_free(cc->inode, cc->cpages, cc->nr_cpages); 1261 + cc->cpages = NULL; 1313 1262 f2fs_destroy_compress_ctx(cc); 1314 1263 return 0; 1315 1264 1316 1265 
out_destroy_crypt: 1317 - kfree(cic->rpages); 1266 + page_array_free(cc->inode, cic->rpages, cc->cluster_size); 1318 1267 1319 1268 for (--i; i >= 0; i--) 1320 1269 fscrypt_finalize_bounce_page(&cc->cpages[i]); ··· 1326 1271 f2fs_put_page(cc->cpages[i], 1); 1327 1272 } 1328 1273 out_put_cic: 1329 - kfree(cic); 1274 + kmem_cache_free(cic_entry_slab, cic); 1330 1275 out_put_dnode: 1331 1276 f2fs_put_dnode(&dn); 1332 1277 out_unlock_op: ··· 1334 1279 up_read(&sbi->node_write); 1335 1280 else 1336 1281 f2fs_unlock_op(sbi); 1282 + out_free: 1283 + page_array_free(cc->inode, cc->cpages, cc->nr_cpages); 1284 + cc->cpages = NULL; 1337 1285 return -EAGAIN; 1338 1286 } 1339 1287 ··· 1354 1296 1355 1297 dec_page_count(sbi, F2FS_WB_DATA); 1356 1298 1357 - if (refcount_dec_not_one(&cic->ref)) 1299 + if (atomic_dec_return(&cic->pending_pages)) 1358 1300 return; 1359 1301 1360 1302 for (i = 0; i < cic->nr_rpages; i++) { ··· 1363 1305 end_page_writeback(cic->rpages[i]); 1364 1306 } 1365 1307 1366 - kfree(cic->rpages); 1367 - kfree(cic); 1308 + page_array_free(cic->inode, cic->rpages, cic->nr_rpages); 1309 + kmem_cache_free(cic_entry_slab, cic); 1368 1310 } 1369 1311 1370 1312 static int f2fs_write_raw_pages(struct compress_ctx *cc, ··· 1446 1388 struct writeback_control *wbc, 1447 1389 enum iostat_type io_type) 1448 1390 { 1449 - struct f2fs_inode_info *fi = F2FS_I(cc->inode); 1450 - const struct f2fs_compress_ops *cops = 1451 - f2fs_cops[fi->i_compress_algorithm]; 1452 1391 int err; 1453 1392 1454 1393 *submitted = 0; ··· 1460 1405 1461 1406 err = f2fs_write_compressed_pages(cc, submitted, 1462 1407 wbc, io_type); 1463 - cops->destroy_compress_ctx(cc); 1464 - kfree(cc->cpages); 1465 - cc->cpages = NULL; 1466 1408 if (!err) 1467 1409 return 0; 1468 1410 f2fs_bug_on(F2FS_I_SB(cc->inode), err != -EAGAIN); ··· 1476 1424 1477 1425 struct decompress_io_ctx *f2fs_alloc_dic(struct compress_ctx *cc) 1478 1426 { 1479 - struct f2fs_sb_info *sbi = F2FS_I_SB(cc->inode); 1480 1427 struct 
decompress_io_ctx *dic; 1481 1428 pgoff_t start_idx = start_idx_of_cluster(cc); 1482 1429 int i; 1483 1430 1484 - dic = f2fs_kzalloc(sbi, sizeof(struct decompress_io_ctx), GFP_NOFS); 1431 + dic = kmem_cache_zalloc(dic_entry_slab, GFP_NOFS); 1485 1432 if (!dic) 1486 1433 return ERR_PTR(-ENOMEM); 1487 1434 1488 - dic->rpages = f2fs_kzalloc(sbi, sizeof(struct page *) << 1489 - cc->log_cluster_size, GFP_NOFS); 1435 + dic->rpages = page_array_alloc(cc->inode, cc->cluster_size); 1490 1436 if (!dic->rpages) { 1491 - kfree(dic); 1437 + kmem_cache_free(dic_entry_slab, dic); 1492 1438 return ERR_PTR(-ENOMEM); 1493 1439 } 1494 1440 1495 1441 dic->magic = F2FS_COMPRESSED_PAGE_MAGIC; 1496 1442 dic->inode = cc->inode; 1497 - refcount_set(&dic->ref, cc->nr_cpages); 1443 + atomic_set(&dic->pending_pages, cc->nr_cpages); 1498 1444 dic->cluster_idx = cc->cluster_idx; 1499 1445 dic->cluster_size = cc->cluster_size; 1500 1446 dic->log_cluster_size = cc->log_cluster_size; ··· 1503 1453 dic->rpages[i] = cc->rpages[i]; 1504 1454 dic->nr_rpages = cc->cluster_size; 1505 1455 1506 - dic->cpages = f2fs_kzalloc(sbi, sizeof(struct page *) * 1507 - dic->nr_cpages, GFP_NOFS); 1456 + dic->cpages = page_array_alloc(dic->inode, dic->nr_cpages); 1508 1457 if (!dic->cpages) 1509 1458 goto out_free; 1510 1459 ··· 1538 1489 continue; 1539 1490 f2fs_compress_free_page(dic->tpages[i]); 1540 1491 } 1541 - kfree(dic->tpages); 1492 + page_array_free(dic->inode, dic->tpages, dic->cluster_size); 1542 1493 } 1543 1494 1544 1495 if (dic->cpages) { ··· 1547 1498 continue; 1548 1499 f2fs_compress_free_page(dic->cpages[i]); 1549 1500 } 1550 - kfree(dic->cpages); 1501 + page_array_free(dic->inode, dic->cpages, dic->nr_cpages); 1551 1502 } 1552 1503 1553 - kfree(dic->rpages); 1554 - kfree(dic); 1504 + page_array_free(dic->inode, dic->rpages, dic->nr_rpages); 1505 + kmem_cache_free(dic_entry_slab, dic); 1555 1506 } 1556 1507 1557 1508 void f2fs_decompress_end_io(struct page **rpages, ··· 1578 1529 unlock: 1579 1530 
unlock_page(rpage); 1580 1531 } 1532 + } 1533 + 1534 + int f2fs_init_page_array_cache(struct f2fs_sb_info *sbi) 1535 + { 1536 + dev_t dev = sbi->sb->s_bdev->bd_dev; 1537 + char slab_name[32]; 1538 + 1539 + sprintf(slab_name, "f2fs_page_array_entry-%u:%u", MAJOR(dev), MINOR(dev)); 1540 + 1541 + sbi->page_array_slab_size = sizeof(struct page *) << 1542 + F2FS_OPTION(sbi).compress_log_size; 1543 + 1544 + sbi->page_array_slab = f2fs_kmem_cache_create(slab_name, 1545 + sbi->page_array_slab_size); 1546 + if (!sbi->page_array_slab) 1547 + return -ENOMEM; 1548 + return 0; 1549 + } 1550 + 1551 + void f2fs_destroy_page_array_cache(struct f2fs_sb_info *sbi) 1552 + { 1553 + kmem_cache_destroy(sbi->page_array_slab); 1554 + } 1555 + 1556 + static int __init f2fs_init_cic_cache(void) 1557 + { 1558 + cic_entry_slab = f2fs_kmem_cache_create("f2fs_cic_entry", 1559 + sizeof(struct compress_io_ctx)); 1560 + if (!cic_entry_slab) 1561 + return -ENOMEM; 1562 + return 0; 1563 + } 1564 + 1565 + static void f2fs_destroy_cic_cache(void) 1566 + { 1567 + kmem_cache_destroy(cic_entry_slab); 1568 + } 1569 + 1570 + static int __init f2fs_init_dic_cache(void) 1571 + { 1572 + dic_entry_slab = f2fs_kmem_cache_create("f2fs_dic_entry", 1573 + sizeof(struct decompress_io_ctx)); 1574 + if (!dic_entry_slab) 1575 + return -ENOMEM; 1576 + return 0; 1577 + } 1578 + 1579 + static void f2fs_destroy_dic_cache(void) 1580 + { 1581 + kmem_cache_destroy(dic_entry_slab); 1582 + } 1583 + 1584 + int __init f2fs_init_compress_cache(void) 1585 + { 1586 + int err; 1587 + 1588 + err = f2fs_init_cic_cache(); 1589 + if (err) 1590 + goto out; 1591 + err = f2fs_init_dic_cache(); 1592 + if (err) 1593 + goto free_cic; 1594 + return 0; 1595 + free_cic: 1596 + f2fs_destroy_cic_cache(); 1597 + out: 1598 + return -ENOMEM; 1599 + } 1600 + 1601 + void f2fs_destroy_compress_cache(void) 1602 + { 1603 + f2fs_destroy_dic_cache(); 1604 + f2fs_destroy_cic_cache(); 1581 1605 }
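Several hunks in the compress.c diff above replace refcount_dec_not_one(&dic->ref) with atomic_dec_return(&dic->pending_pages): each completing page decrements the counter and returns early while the result is still non-zero, so only the last completion goes on to do the per-cluster work. A minimal userspace analogue of that pattern, using C11 atomics instead of the kernel's atomic_t, might look like the following. The struct io_ctx type and the io_ctx_* helpers are hypothetical names used for illustration only, not f2fs code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-cluster I/O context. */
struct io_ctx {
	atomic_int pending_pages;	/* pages whose I/O has not completed yet */
};

static void io_ctx_init(struct io_ctx *ctx, int nr_pages)
{
	assert(nr_pages > 0);
	atomic_init(&ctx->pending_pages, nr_pages);
}

/*
 * Called once per completed page. Returns true only for the caller that
 * completes the last pending page. atomic_fetch_sub() returns the value
 * held before the subtraction, so "old - 1" is the new value, which is
 * what a decrement-and-return primitive would report.
 */
static bool io_ctx_page_done(struct io_ctx *ctx)
{
	return atomic_fetch_sub(&ctx->pending_pages, 1) - 1 == 0;
}
```

A completion handler built on this sketch would return immediately when io_ctx_page_done() is false and run the decompression or writeback cleanup exactly once when it is true.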
+104 -15
fs/f2fs/data.c
··· 202 202 dic = (struct decompress_io_ctx *)page_private(page); 203 203 204 204 if (dic) { 205 - if (refcount_dec_not_one(&dic->ref)) 205 + if (atomic_dec_return(&dic->pending_pages)) 206 206 continue; 207 207 f2fs_verify_pages(dic->rpages, 208 208 dic->cluster_size); ··· 517 517 518 518 zero_user_segment(page, 0, PAGE_SIZE); 519 519 SetPagePrivate(page); 520 - set_page_private(page, (unsigned long)DUMMY_WRITTEN_PAGE); 520 + set_page_private(page, DUMMY_WRITTEN_PAGE); 521 521 lock_page(page); 522 522 if (bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) 523 523 f2fs_bug_on(sbi, 1); ··· 1416 1416 set_summary(&sum, dn->nid, dn->ofs_in_node, ni.version); 1417 1417 old_blkaddr = dn->data_blkaddr; 1418 1418 f2fs_allocate_data_block(sbi, NULL, old_blkaddr, &dn->data_blkaddr, 1419 - &sum, seg_type, NULL); 1419 + &sum, seg_type, NULL); 1420 1420 if (GET_SEGNO(sbi, old_blkaddr) != NULL_SEGNO) 1421 1421 invalidate_mapping_pages(META_MAPPING(sbi), 1422 1422 old_blkaddr, old_blkaddr); ··· 1803 1803 static int get_data_block_bmap(struct inode *inode, sector_t iblock, 1804 1804 struct buffer_head *bh_result, int create) 1805 1805 { 1806 - /* Block number less than F2FS MAX BLOCKS */ 1807 - if (unlikely(iblock >= F2FS_I_SB(inode)->max_file_blocks)) 1808 - return -EFBIG; 1809 - 1810 1806 return __get_data_block(inode, iblock, bh_result, create, 1811 1807 F2FS_GET_BLOCK_BMAP, NULL, 1812 1808 NO_CHECK_TYPE, create); ··· 2268 2272 if (IS_ERR(bio)) { 2269 2273 ret = PTR_ERR(bio); 2270 2274 dic->failed = true; 2271 - if (refcount_sub_and_test(dic->nr_cpages - i, 2272 - &dic->ref)) { 2275 + if (!atomic_sub_return(dic->nr_cpages - i, 2276 + &dic->pending_pages)) { 2273 2277 f2fs_decompress_end_io(dic->rpages, 2274 2278 cc->cluster_size, true, 2275 2279 false); ··· 3129 3133 retry = 0; 3130 3134 } 3131 3135 } 3136 + if (f2fs_compressed_file(inode)) 3137 + f2fs_destroy_compress_ctx(&cc); 3132 3138 #endif 3133 3139 if (retry) { 3134 3140 index = 0; ··· 3572 3574 bio->bi_private = 
dio->orig_private; 3573 3575 bio->bi_end_io = dio->orig_end_io; 3574 3576 3575 - kvfree(dio); 3577 + kfree(dio); 3576 3578 3577 3579 bio_endio(bio); 3578 3580 } ··· 3671 3673 err); 3672 3674 if (!do_opu) 3673 3675 set_inode_flag(inode, FI_UPDATE_WRITE); 3676 + } else if (err == -EIOCBQUEUED) { 3677 + f2fs_update_iostat(F2FS_I_SB(inode), APP_DIRECT_IO, 3678 + count - iov_iter_count(iter)); 3674 3679 } else if (err < 0) { 3675 3680 f2fs_write_failed(mapping, offset + count); 3676 3681 } 3677 3682 } else { 3678 3683 if (err > 0) 3679 3684 f2fs_update_iostat(sbi, APP_DIRECT_READ_IO, err); 3685 + else if (err == -EIOCBQUEUED) 3686 + f2fs_update_iostat(F2FS_I_SB(inode), APP_DIRECT_READ_IO, 3687 + count - iov_iter_count(iter)); 3680 3688 } 3681 3689 3682 3690 out: ··· 3811 3807 if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) 3812 3808 filemap_write_and_wait(mapping); 3813 3809 3814 - if (f2fs_compressed_file(inode)) 3815 - blknr = f2fs_bmap_compress(inode, block); 3810 + /* Block number less than F2FS MAX BLOCKS */ 3811 + if (unlikely(block >= F2FS_I_SB(inode)->max_file_blocks)) 3812 + goto out; 3816 3813 3817 - if (!get_data_block_bmap(inode, block, &tmp, 0)) 3818 - blknr = tmp.b_blocknr; 3814 + if (f2fs_compressed_file(inode)) { 3815 + blknr = f2fs_bmap_compress(inode, block); 3816 + } else { 3817 + if (!get_data_block_bmap(inode, block, &tmp, 0)) 3818 + blknr = tmp.b_blocknr; 3819 + } 3819 3820 out: 3820 3821 trace_f2fs_bmap(inode, block, blknr); 3821 3822 return blknr; ··· 3883 3874 #endif 3884 3875 3885 3876 #ifdef CONFIG_SWAP 3877 + static int check_swap_activate_fast(struct swap_info_struct *sis, 3878 + struct file *swap_file, sector_t *span) 3879 + { 3880 + struct address_space *mapping = swap_file->f_mapping; 3881 + struct inode *inode = mapping->host; 3882 + sector_t cur_lblock; 3883 + sector_t last_lblock; 3884 + sector_t pblock; 3885 + sector_t lowest_pblock = -1; 3886 + sector_t highest_pblock = 0; 3887 + int nr_extents = 0; 3888 + unsigned long nr_pblocks; 
3889 + unsigned long len; 3890 + int ret; 3891 + 3892 + /* 3893 + * Map all the blocks into the extent list. This code doesn't try 3894 + * to be very smart. 3895 + */ 3896 + cur_lblock = 0; 3897 + last_lblock = logical_to_blk(inode, i_size_read(inode)); 3898 + len = i_size_read(inode); 3899 + 3900 + while (cur_lblock <= last_lblock && cur_lblock < sis->max) { 3901 + struct buffer_head map_bh; 3902 + pgoff_t next_pgofs; 3903 + 3904 + cond_resched(); 3905 + 3906 + memset(&map_bh, 0, sizeof(struct buffer_head)); 3907 + map_bh.b_size = len - cur_lblock; 3908 + 3909 + ret = get_data_block(inode, cur_lblock, &map_bh, 0, 3910 + F2FS_GET_BLOCK_FIEMAP, &next_pgofs); 3911 + if (ret) 3912 + goto err_out; 3913 + 3914 + /* hole */ 3915 + if (!buffer_mapped(&map_bh)) 3916 + goto err_out; 3917 + 3918 + pblock = map_bh.b_blocknr; 3919 + nr_pblocks = logical_to_blk(inode, map_bh.b_size); 3920 + 3921 + if (cur_lblock + nr_pblocks >= sis->max) 3922 + nr_pblocks = sis->max - cur_lblock; 3923 + 3924 + if (cur_lblock) { /* exclude the header page */ 3925 + if (pblock < lowest_pblock) 3926 + lowest_pblock = pblock; 3927 + if (pblock + nr_pblocks - 1 > highest_pblock) 3928 + highest_pblock = pblock + nr_pblocks - 1; 3929 + } 3930 + 3931 + /* 3932 + * We found a PAGE_SIZE-length, PAGE_SIZE-aligned run of blocks 3933 + */ 3934 + ret = add_swap_extent(sis, cur_lblock, nr_pblocks, pblock); 3935 + if (ret < 0) 3936 + goto out; 3937 + nr_extents += ret; 3938 + cur_lblock += nr_pblocks; 3939 + } 3940 + ret = nr_extents; 3941 + *span = 1 + highest_pblock - lowest_pblock; 3942 + if (cur_lblock == 0) 3943 + cur_lblock = 1; /* force Empty message */ 3944 + sis->max = cur_lblock; 3945 + sis->pages = cur_lblock - 1; 3946 + sis->highest_bit = cur_lblock - 1; 3947 + out: 3948 + return ret; 3949 + err_out: 3950 + pr_err("swapon: swapfile has holes\n"); 3951 + return -EINVAL; 3952 + } 3953 + 3886 3954 /* Copied from generic_swapfile_activate() to check any holes */ 3887 3955 static int 
check_swap_activate(struct swap_info_struct *sis, 3888 3956 struct file *swap_file, sector_t *span) ··· 3975 3889 sector_t highest_block = 0; 3976 3890 int nr_extents = 0; 3977 3891 int ret; 3892 + 3893 + if (PAGE_SIZE == F2FS_BLKSIZE) 3894 + return check_swap_activate_fast(sis, swap_file, span); 3978 3895 3979 3896 blkbits = inode->i_blkbits; 3980 3897 blocks_per_page = PAGE_SIZE >> blkbits; ··· 4078 3989 if (ret) 4079 3990 return ret; 4080 3991 4081 - if (f2fs_disable_compressed_file(inode)) 3992 + if (!f2fs_disable_compressed_file(inode)) 4082 3993 return -EINVAL; 4083 3994 4084 3995 ret = check_swap_activate(sis, file, span);
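The new check_swap_activate_fast() walks the swap file one mapped run at a time, rejects holes, clamps the last run to the swap area size, and tracks the lowest and highest physical block to compute the span. The following is a minimal userspace sketch of that loop, not the kernel code: `struct run` and `scan_swap_runs()` are hypothetical names standing in for the buffer_head returned by the block-mapping call, and reporting a span of 0 when no run was tracked is a choice made only for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one mapped run returned by a block-mapping call. */
struct run {
	uint64_t pblock;	/* first physical block of the run */
	uint64_t nr_blocks;	/* length of the run in blocks, assumed > 0 */
	int mapped;		/* 0 marks a hole */
};

/*
 * Walk the runs in logical order. Returns the number of logical blocks
 * covered, or -1 if a hole is found. *span receives highest - lowest + 1
 * over every run except the one that starts at logical block 0.
 */
static int64_t scan_swap_runs(const struct run *runs, size_t n,
			      uint64_t max_blocks, uint64_t *span)
{
	uint64_t cur = 0, lowest = UINT64_MAX, highest = 0;

	assert(span != NULL);
	for (size_t i = 0; i < n && cur < max_blocks; i++) {
		uint64_t nr = runs[i].nr_blocks;
		uint64_t pblock = runs[i].pblock;

		if (!runs[i].mapped)
			return -1;	/* a swap file must not contain holes */
		if (cur + nr >= max_blocks)
			nr = max_blocks - cur;	/* clamp to the swap area size */
		if (cur) {		/* skip the run holding the header page */
			if (pblock < lowest)
				lowest = pblock;
			if (pblock + nr - 1 > highest)
				highest = pblock + nr - 1;
		}
		cur += nr;
	}
	/* Sketch-only choice: report 0 when no run was tracked. */
	*span = (lowest > highest) ? 0 : 1 + highest - lowest;
	return (int64_t)cur;
}
```

As in the diff, the whole first run is excluded from the span calculation, not only its first block.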
+13 -5
fs/f2fs/debug.c
···
 	si->inline_inode = atomic_read(&sbi->inline_inode);
 	si->inline_dir = atomic_read(&sbi->inline_dir);
 	si->compr_inode = atomic_read(&sbi->compr_inode);
-	si->compr_blocks = atomic_read(&sbi->compr_blocks);
+	si->compr_blocks = atomic64_read(&sbi->compr_blocks);
 	si->append = sbi->im[APPEND_INO].ino_num;
 	si->update = sbi->im[UPDATE_INO].ino_num;
 	si->orphans = sbi->im[ORPHAN_INO].ino_num;
···
 		* 100 / (int)(sbi->user_block_count >> sbi->log_blocks_per_seg)
 		/ 2;
 	si->util_invalid = 50 - si->util_free - si->util_valid;
-	for (i = CURSEG_HOT_DATA; i <= CURSEG_COLD_NODE; i++) {
+	for (i = CURSEG_HOT_DATA; i < NO_CHECK_TYPE; i++) {
 		struct curseg_info *curseg = CURSEG_I(sbi, i);
 		si->curseg[i] = curseg->segno;
 		si->cursec[i] = GET_SEC_FROM_SEG(sbi, curseg->segno);
···
 			   si->inline_inode);
 		seq_printf(s, " - Inline_dentry Inode: %u\n",
 			   si->inline_dir);
-		seq_printf(s, " - Compressed Inode: %u, Blocks: %u\n",
+		seq_printf(s, " - Compressed Inode: %u, Blocks: %llu\n",
 			   si->compr_inode, si->compr_blocks);
 		seq_printf(s, " - Orphan/Append/Update Inode: %u, %u, %u\n",
 			   si->orphans, si->append, si->update);
···
 			   si->dirty_seg[CURSEG_COLD_NODE],
 			   si->full_seg[CURSEG_COLD_NODE],
 			   si->valid_blks[CURSEG_COLD_NODE]);
+		seq_printf(s, " - Pinned file: %8d %8d %8d\n",
+			   si->curseg[CURSEG_COLD_DATA_PINNED],
+			   si->cursec[CURSEG_COLD_DATA_PINNED],
+			   si->curzone[CURSEG_COLD_DATA_PINNED]);
+		seq_printf(s, " - ATGC data: %8d %8d %8d\n",
+			   si->curseg[CURSEG_ALL_DATA_ATGC],
+			   si->cursec[CURSEG_ALL_DATA_ATGC],
+			   si->curzone[CURSEG_ALL_DATA_ATGC]);
 		seq_printf(s, "\n - Valid: %d\n - Dirty: %d\n",
 			   si->main_area_segs - si->dirty_count -
 			   si->prefree_count - si->free_segs,
···
 	atomic_set(&sbi->inline_inode, 0);
 	atomic_set(&sbi->inline_dir, 0);
 	atomic_set(&sbi->compr_inode, 0);
-	atomic_set(&sbi->compr_blocks, 0);
+	atomic64_set(&sbi->compr_blocks, 0);
 	atomic_set(&sbi->inplace_count, 0);
 	for (i = META_CP; i < META_MAX; i++)
 		atomic_set(&sbi->meta_count[i], 0);
···
 	list_del(&si->stat_list);
 	mutex_unlock(&f2fs_stat_mutex);
 
-	kvfree(si);
+	kfree(si);
 }
 
 void __init f2fs_create_root_stats(void)
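The debug.c hunks switch the filesystem-wide compressed block counter from atomic_t to atomic64_t and its format specifier from %u to %llu, presumably so that the total can exceed 32 bits. A rough userspace analogue using C11 atomics is sketched below; `struct counters`, `counters_add()` and `format_blocks()` are hypothetical names, and the kernel's atomic64_t API is not used here.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical pair of counters: one 32-bit, one 64-bit. */
struct counters {
	_Atomic uint32_t narrow;	/* wraps at 2^32 */
	_Atomic uint64_t wide;		/* analogue of a 64-bit atomic counter */
};

/* Add the same block count to both counters. */
static void counters_add(struct counters *c, uint64_t blocks)
{
	assert(c != NULL);
	atomic_fetch_add(&c->narrow, (uint32_t)blocks);
	atomic_fetch_add(&c->wide, blocks);
}

/* Print the wide counter with a 64-bit format specifier. */
static int format_blocks(char *buf, size_t len, struct counters *c)
{
	return snprintf(buf, len, "Blocks: %" PRIu64,
			(uint64_t)atomic_load(&c->wide));
}
```

Adding 3,000,000,000 twice leaves the wide counter at 6,000,000,000, while the narrow one wraps around modulo 2^32.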
+18 -91
fs/f2fs/dir.c
···
 				struct f2fs_filename *fname)
 {
 #ifdef CONFIG_UNICODE
-	struct f2fs_sb_info *sbi = F2FS_SB(dir->i_sb);
+	struct super_block *sb = dir->i_sb;
+	struct f2fs_sb_info *sbi = F2FS_SB(sb);
 
 	if (IS_CASEFOLDED(dir)) {
 		fname->cf_name.name = f2fs_kmalloc(sbi, F2FS_NAME_LEN,
 						   GFP_NOFS);
 		if (!fname->cf_name.name)
 			return -ENOMEM;
-		fname->cf_name.len = utf8_casefold(sbi->s_encoding,
+		fname->cf_name.len = utf8_casefold(sb->s_encoding,
 						   fname->usr_fname,
 						   fname->cf_name.name,
 						   F2FS_NAME_LEN);
 		if ((int)fname->cf_name.len <= 0) {
 			kfree(fname->cf_name.name);
 			fname->cf_name.name = NULL;
-			if (f2fs_has_strict_mode(sbi))
+			if (sb_has_strict_encoding(sb))
 				return -EINVAL;
 			/* fall back to treating name as opaque byte sequence */
 		}
···
 static struct f2fs_dir_entry *find_in_block(struct inode *dir,
 				struct page *dentry_page,
 				const struct f2fs_filename *fname,
-				int *max_slots,
-				struct page **res_page)
+				int *max_slots)
 {
 	struct f2fs_dentry_block *dentry_blk;
-	struct f2fs_dir_entry *de;
 	struct f2fs_dentry_ptr d;
 
 	dentry_blk = (struct f2fs_dentry_block *)page_address(dentry_page);
 
 	make_dentry_ptr_block(dir, &d, dentry_blk);
-	de = f2fs_find_target_dentry(&d, fname, max_slots);
-	if (de)
-		*res_page = dentry_page;
-
-	return de;
+	return f2fs_find_target_dentry(&d, fname, max_slots);
 }
 
 #ifdef CONFIG_UNICODE
···
 static bool f2fs_match_ci_name(const struct inode *dir, const struct qstr *name,
 			       const u8 *de_name, u32 de_name_len)
 {
-	const struct f2fs_sb_info *sbi = F2FS_SB(dir->i_sb);
-	const struct unicode_map *um = sbi->s_encoding;
+	const struct super_block *sb = dir->i_sb;
+	const struct unicode_map *um = sb->s_encoding;
 	struct qstr entry = QSTR_INIT(de_name, de_name_len);
 	int res;
 
···
 		 * In strict mode, ignore invalid names. In non-strict mode,
 		 * fall back to treating them as opaque byte sequences.
 		 */
-		if (f2fs_has_strict_mode(sbi) || name->len != entry.len)
+		if (sb_has_strict_encoding(sb) || name->len != entry.len)
 			return false;
 		return !memcmp(name->name, entry.name, name->len);
 	}
···
 			}
 		}
 
-		de = find_in_block(dir, dentry_page, fname, &max_slots,
-				   res_page);
-		if (de)
+		de = find_in_block(dir, dentry_page, fname, &max_slots);
+		if (de) {
+			*res_page = dentry_page;
 			break;
+		}
 
 		if (max_slots >= s)
 			room = true;
···
 	unsigned int max_depth;
 	unsigned int level;
 
+	*res_page = NULL;
+
 	if (f2fs_has_inline_dentry(dir)) {
-		*res_page = NULL;
 		de = f2fs_find_in_inline_dir(dir, fname, res_page);
 		goto out;
 	}
 
-	if (npages == 0) {
-		*res_page = NULL;
+	if (npages == 0)
 		goto out;
-	}
 
 	max_depth = F2FS_I(dir)->i_current_depth;
 	if (unlikely(max_depth > MAX_DIR_HASH_DEPTH)) {
···
 	}
 
 	for (level = 0; level < max_depth; level++) {
-		*res_page = NULL;
 		de = find_in_level(dir, level, fname, res_page);
 		if (de || IS_ERR(*res_page))
 			break;
···
 };
 
 #ifdef CONFIG_UNICODE
-static int f2fs_d_compare(const struct dentry *dentry, unsigned int len,
-			  const char *str, const struct qstr *name)
-{
-	const struct dentry *parent = READ_ONCE(dentry->d_parent);
-	const struct inode *dir = READ_ONCE(parent->d_inode);
-	const struct f2fs_sb_info *sbi = F2FS_SB(dentry->d_sb);
-	struct qstr entry = QSTR_INIT(str, len);
-	char strbuf[DNAME_INLINE_LEN];
-	int res;
-
-	if (!dir || !IS_CASEFOLDED(dir))
-		goto fallback;
-
-	/*
-	 * If the dentry name is stored in-line, then it may be concurrently
-	 * modified by a rename. If this happens, the VFS will eventually retry
-	 * the lookup, so it doesn't matter what ->d_compare() returns.
-	 * However, it's unsafe to call utf8_strncasecmp() with an unstable
-	 * string. Therefore, we have to copy the name into a temporary buffer.
-	 */
-	if (len <= DNAME_INLINE_LEN - 1) {
-		memcpy(strbuf, str, len);
-		strbuf[len] = 0;
-		entry.name = strbuf;
-		/* prevent compiler from optimizing out the temporary buffer */
-		barrier();
-	}
-
-	res = utf8_strncasecmp(sbi->s_encoding, name, &entry);
-	if (res >= 0)
-		return res;
-
-	if (f2fs_has_strict_mode(sbi))
-		return -EINVAL;
-fallback:
-	if (len != name->len)
-		return 1;
-	return !!memcmp(str, name->name, len);
-}
-
-static int f2fs_d_hash(const struct dentry *dentry, struct qstr *str)
-{
-	struct f2fs_sb_info *sbi = F2FS_SB(dentry->d_sb);
-	const struct unicode_map *um = sbi->s_encoding;
-	const struct inode *inode = READ_ONCE(dentry->d_inode);
-	unsigned char *norm;
-	int len, ret = 0;
-
-	if (!inode || !IS_CASEFOLDED(inode))
-		return 0;
-
-	norm = f2fs_kmalloc(sbi, PATH_MAX, GFP_ATOMIC);
-	if (!norm)
-		return -ENOMEM;
-
-	len = utf8_casefold(um, str, norm, PATH_MAX);
-	if (len < 0) {
-		if (f2fs_has_strict_mode(sbi))
-			ret = -EINVAL;
-		goto out;
-	}
-	str->hash = full_name_hash(dentry, norm, len);
-out:
-	kvfree(norm);
-	return ret;
-}
-
 const struct dentry_operations f2fs_dentry_ops = {
-	.d_hash = f2fs_d_hash,
-	.d_compare = f2fs_d_compare,
+	.d_hash = generic_ci_d_hash,
+	.d_compare = generic_ci_d_compare,
 };
 #endif
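The comment kept in f2fs_match_ci_name() describes a two-step policy: when a name cannot be compared under the encoding, strict mode rejects it, while non-strict mode falls back to comparing the raw bytes. The sketch below illustrates only that policy. `toy_casecmp()` is a hypothetical ASCII-only comparison that treats any byte >= 0x80 as invalid; it is not the kernel's UTF-8 comparison, and the handling of a non-negative result is an assumption modelled on the removed f2fs_d_compare().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct name {
	const unsigned char *bytes;
	size_t len;
};

static unsigned char ascii_lower(unsigned char c)
{
	return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* Toy rule: a name is invalid if it holds any byte >= 0x80. */
static bool toy_valid(const struct name *n)
{
	for (size_t i = 0; i < n->len; i++)
		if (n->bytes[i] >= 0x80)
			return false;
	return true;
}

/* 0 = equal ignoring ASCII case, 1 = different, -1 = invalid input. */
static int toy_casecmp(const struct name *a, const struct name *b)
{
	if (!toy_valid(a) || !toy_valid(b))
		return -1;
	if (a->len != b->len)
		return 1;
	for (size_t i = 0; i < a->len; i++)
		if (ascii_lower(a->bytes[i]) != ascii_lower(b->bytes[i]))
			return 1;
	return 0;
}

/* Strict mode ignores invalid names; otherwise compare them as opaque bytes. */
static bool match_ci_name(bool strict, const struct name *name,
			  const struct name *entry)
{
	int res;

	assert(name != NULL && entry != NULL);
	res = toy_casecmp(name, entry);
	if (res >= 0)
		return res == 0;
	if (strict || name->len != entry->len)
		return false;
	return memcmp(name->bytes, entry->bytes, name->len) == 0;
}
```

In the fallback path the comparison is case-sensitive, since the bytes are no longer interpreted as text.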
+35 -2
fs/f2fs/extent_cache.c
···
 	return re;
 }
 
+struct rb_node **f2fs_lookup_rb_tree_ext(struct f2fs_sb_info *sbi,
+					struct rb_root_cached *root,
+					struct rb_node **parent,
+					unsigned long long key, bool *leftmost)
+{
+	struct rb_node **p = &root->rb_root.rb_node;
+	struct rb_entry *re;
+
+	while (*p) {
+		*parent = *p;
+		re = rb_entry(*parent, struct rb_entry, rb_node);
+
+		if (key < re->key) {
+			p = &(*p)->rb_left;
+		} else {
+			p = &(*p)->rb_right;
+			*leftmost = false;
+		}
+	}
+
+	return p;
+}
+
 struct rb_node **f2fs_lookup_rb_tree_for_insert(struct f2fs_sb_info *sbi,
 				struct rb_root_cached *root,
 				struct rb_node **parent,
···
 }
 
 bool f2fs_check_rb_tree_consistence(struct f2fs_sb_info *sbi,
-						struct rb_root_cached *root)
+				struct rb_root_cached *root, bool check_key)
 {
 #ifdef CONFIG_F2FS_CHECK_FS
 	struct rb_node *cur = rb_first_cached(root), *next;
···
 		cur_re = rb_entry(cur, struct rb_entry, rb_node);
 		next_re = rb_entry(next, struct rb_entry, rb_node);
 
+		if (check_key) {
+			if (cur_re->key > next_re->key) {
+				f2fs_info(sbi, "inconsistent rbtree, "
+					"cur(%llu) next(%llu)",
+					cur_re->key, next_re->key);
+				return false;
+			}
+			goto next;
+		}
+
 		if (cur_re->ofs + cur_re->len > next_re->ofs) {
 			f2fs_info(sbi, "inconsistent rbtree, cur(%u, %u) next(%u, %u)",
 				  cur_re->ofs, cur_re->len,
 				  next_re->ofs, next_re->len);
 			return false;
 		}
-
+next:
 		cur = next;
 	}
 #endif
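f2fs_lookup_rb_tree_ext() descends the tree by a 64-bit key, returns the empty slot where a new node belongs, and clears *leftmost as soon as the descent goes right. Equal keys go to the right, so duplicates are allowed. The same descent can be sketched on a plain, unbalanced binary tree in userspace; `struct node`, `lookup_slot()` and `insert_key()` are hypothetical names, no rebalancing is done, and the caller is assumed to initialise leftmost to true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tree node; the kernel version embeds struct rb_node instead. */
struct node {
	unsigned long long key;
	struct node *left, *right;
};

/*
 * Find the empty link where a node with this key should be attached.
 * *parent is set to the last node visited; *leftmost is cleared on any
 * step to the right and is otherwise left untouched.
 */
static struct node **lookup_slot(struct node **root, struct node **parent,
				 unsigned long long key, bool *leftmost)
{
	struct node **p = root;

	assert(root != NULL && parent != NULL && leftmost != NULL);
	while (*p) {
		*parent = *p;
		if (key < (*p)->key) {
			p = &(*p)->left;
		} else {
			p = &(*p)->right;	/* equal keys also go right */
			*leftmost = false;
		}
	}
	return p;
}

/* Attach n and report whether it became the smallest (leftmost) node. */
static bool insert_key(struct node **root, struct node *n)
{
	struct node *parent = NULL;
	bool leftmost = true;
	struct node **slot = lookup_slot(root, &parent, n->key, &leftmost);

	n->left = n->right = NULL;
	*slot = n;
	return leftmost;
}
```

Knowing whether the new node is leftmost lets a cached tree keep its pointer to the minimum up to date without a second traversal.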
+84 -34
fs/f2fs/f2fs.h
··· 98 98 #define F2FS_MOUNT_RESERVE_ROOT 0x01000000 99 99 #define F2FS_MOUNT_DISABLE_CHECKPOINT 0x02000000 100 100 #define F2FS_MOUNT_NORECOVERY 0x04000000 101 + #define F2FS_MOUNT_ATGC 0x08000000 101 102 102 103 #define F2FS_OPTION(sbi) ((sbi)->mount_opt) 103 104 #define clear_opt(sbi, option) (F2FS_OPTION(sbi).opt &= ~F2FS_MOUNT_##option) ··· 613 612 614 613 struct rb_entry { 615 614 struct rb_node rb_node; /* rb node located in rb-tree */ 616 - unsigned int ofs; /* start offset of the entry */ 617 - unsigned int len; /* length of the entry */ 615 + union { 616 + struct { 617 + unsigned int ofs; /* start offset of the entry */ 618 + unsigned int len; /* length of the entry */ 619 + }; 620 + unsigned long long key; /* 64-bits key */ 621 + } __packed; 618 622 }; 619 623 620 624 struct extent_info { ··· 807 801 struct timespec64 i_disk_time[4];/* inode disk times */ 808 802 809 803 /* for file compress */ 810 - u64 i_compr_blocks; /* # of compressed blocks */ 804 + atomic_t i_compr_blocks; /* # of compressed blocks */ 811 805 unsigned char i_compress_algorithm; /* algorithm type */ 812 806 unsigned char i_log_cluster_size; /* log of cluster size */ 813 807 unsigned int i_cluster_size; /* cluster size */ ··· 979 973 */ 980 974 #define NR_CURSEG_DATA_TYPE (3) 981 975 #define NR_CURSEG_NODE_TYPE (3) 982 - #define NR_CURSEG_TYPE (NR_CURSEG_DATA_TYPE + NR_CURSEG_NODE_TYPE) 976 + #define NR_CURSEG_INMEM_TYPE (2) 977 + #define NR_CURSEG_PERSIST_TYPE (NR_CURSEG_DATA_TYPE + NR_CURSEG_NODE_TYPE) 978 + #define NR_CURSEG_TYPE (NR_CURSEG_INMEM_TYPE + NR_CURSEG_PERSIST_TYPE) 983 979 984 980 enum { 985 981 CURSEG_HOT_DATA = 0, /* directory entry blocks */ ··· 990 982 CURSEG_HOT_NODE, /* direct node blocks of directory files */ 991 983 CURSEG_WARM_NODE, /* direct node blocks of normal files */ 992 984 CURSEG_COLD_NODE, /* indirect node blocks */ 993 - NO_CHECK_TYPE, 994 - CURSEG_COLD_DATA_PINNED,/* cold data for pinned file */ 985 + NR_PERSISTENT_LOG, /* number of persistent log 
*/ 986 + CURSEG_COLD_DATA_PINNED = NR_PERSISTENT_LOG, 987 + /* pinned file that needs consecutive block address */ 988 + CURSEG_ALL_DATA_ATGC, /* SSR alloctor in hot/warm/cold data area */ 989 + NO_CHECK_TYPE, /* number of persistent & inmem log */ 995 990 }; 996 991 997 992 struct flush_cmd { ··· 1220 1209 #ifdef CONFIG_BLK_DEV_ZONED 1221 1210 unsigned int nr_blkz; /* Total number of zones */ 1222 1211 unsigned long *blkz_seq; /* Bitmap indicating sequential zones */ 1212 + block_t *zone_capacity_blocks; /* Array of zone capacity in blks */ 1223 1213 #endif 1224 1214 }; 1225 1215 ··· 1238 1226 spinlock_t ino_lock; /* for ino entry lock */ 1239 1227 struct list_head ino_list; /* inode list head */ 1240 1228 unsigned long ino_num; /* number of entries */ 1229 + }; 1230 + 1231 + /* for GC_AT */ 1232 + struct atgc_management { 1233 + bool atgc_enabled; /* ATGC is enabled or not */ 1234 + struct rb_root_cached root; /* root of victim rb-tree */ 1235 + struct list_head victim_list; /* linked with all victim entries */ 1236 + unsigned int victim_count; /* victim count in rb-tree */ 1237 + unsigned int candidate_ratio; /* candidate ratio */ 1238 + unsigned int max_candidate_count; /* max candidate count */ 1239 + unsigned int age_weight; /* age weight, vblock_weight = 100 - age_weight */ 1240 + unsigned long long age_threshold; /* age threshold */ 1241 1241 }; 1242 1242 1243 1243 /* For s_flag in struct f2fs_sb_info */ ··· 1284 1260 GC_NORMAL, 1285 1261 GC_IDLE_CB, 1286 1262 GC_IDLE_GREEDY, 1263 + GC_IDLE_AT, 1287 1264 GC_URGENT_HIGH, 1288 1265 GC_URGENT_LOW, 1289 1266 }; ··· 1328 1303 #define DUMMY_WRITTEN_PAGE ((unsigned long)-2) 1329 1304 1330 1305 #define IS_ATOMIC_WRITTEN_PAGE(page) \ 1331 - (page_private(page) == (unsigned long)ATOMIC_WRITTEN_PAGE) 1306 + (page_private(page) == ATOMIC_WRITTEN_PAGE) 1332 1307 #define IS_DUMMY_WRITTEN_PAGE(page) \ 1333 - (page_private(page) == (unsigned long)DUMMY_WRITTEN_PAGE) 1308 + (page_private(page) == DUMMY_WRITTEN_PAGE) 1334 
1309 1335 1310 #ifdef CONFIG_F2FS_IO_TRACE 1336 1311 #define IS_IO_TRACED_PAGE(page) \ ··· 1384 1359 struct inode *inode; /* inode the context belong to */ 1385 1360 struct page **rpages; /* pages store raw data in cluster */ 1386 1361 unsigned int nr_rpages; /* total page number in rpages */ 1387 - refcount_t ref; /* referrence count of raw page */ 1362 + atomic_t pending_pages; /* in-flight compressed page count */ 1388 1363 }; 1389 1364 1390 1365 /* decompress io context for read IO path */ ··· 1403 1378 struct compress_data *cbuf; /* virtual mapped address on cpages */ 1404 1379 size_t rlen; /* valid data length in rbuf */ 1405 1380 size_t clen; /* valid data length in cbuf */ 1406 - refcount_t ref; /* referrence count of compressed page */ 1381 + atomic_t pending_pages; /* in-flight compressed page count */ 1407 1382 bool failed; /* indicate IO error during decompression */ 1408 1383 void *private; /* payload buffer for specified decompression algorithm */ 1409 1384 void *private2; /* extra payload buffer */ ··· 1412 1387 #define NULL_CLUSTER ((unsigned int)(~0)) 1413 1388 #define MIN_COMPRESS_LOG_SIZE 2 1414 1389 #define MAX_COMPRESS_LOG_SIZE 8 1415 - #define MAX_COMPRESS_WINDOW_SIZE ((PAGE_SIZE) << MAX_COMPRESS_LOG_SIZE) 1390 + #define MAX_COMPRESS_WINDOW_SIZE(log_size) ((PAGE_SIZE) << (log_size)) 1416 1391 1417 1392 struct f2fs_sb_info { 1418 1393 struct super_block *sb; /* pointer to VFS super block */ ··· 1422 1397 int valid_super_block; /* valid super block no */ 1423 1398 unsigned long s_flag; /* flags for sbi */ 1424 1399 struct mutex writepages; /* mutex for writepages() */ 1425 - #ifdef CONFIG_UNICODE 1426 - struct unicode_map *s_encoding; 1427 - __u16 s_encoding_flags; 1428 - #endif 1429 1400 1430 1401 #ifdef CONFIG_BLK_DEV_ZONED 1431 1402 unsigned int blocks_per_blkz; /* F2FS blocks per zone */ ··· 1529 1508 * race between GC and GC or CP 1530 1509 */ 1531 1510 struct f2fs_gc_kthread *gc_thread; /* GC thread */ 1511 + struct atgc_management am; /* 
atgc management */ 1532 1512 unsigned int cur_victim_sec; /* current victim section num */ 1533 1513 unsigned int gc_mode; /* current GC state */ 1534 1514 unsigned int next_victim_seg[2]; /* next segment in victim section */ ··· 1566 1544 atomic_t inline_inode; /* # of inline_data inodes */ 1567 1545 atomic_t inline_dir; /* # of inline_dentry inodes */ 1568 1546 atomic_t compr_inode; /* # of compressed inodes */ 1569 - atomic_t compr_blocks; /* # of compressed blocks */ 1547 + atomic64_t compr_blocks; /* # of compressed blocks */ 1570 1548 atomic_t vw_cnt; /* # of volatile writes */ 1571 1549 atomic_t max_aw_cnt; /* max # of atomic writes */ 1572 1550 atomic_t max_vw_cnt; /* max # of volatile writes */ ··· 1615 1593 1616 1594 struct kmem_cache *inline_xattr_slab; /* inline xattr entry */ 1617 1595 unsigned int inline_xattr_slab_size; /* default inline xattr slab size */ 1596 + 1597 + #ifdef CONFIG_F2FS_FS_COMPRESSION 1598 + struct kmem_cache *page_array_slab; /* page array entry */ 1599 + unsigned int page_array_slab_size; /* default page array slab size */ 1600 + #endif 1618 1601 }; 1619 1602 1620 1603 struct f2fs_private_dio { ··· 3352 3325 int f2fs_disable_cp_again(struct f2fs_sb_info *sbi, block_t unusable); 3353 3326 void f2fs_release_discard_addrs(struct f2fs_sb_info *sbi); 3354 3327 int f2fs_npages_for_summary_flush(struct f2fs_sb_info *sbi, bool for_ra); 3328 + void f2fs_init_inmem_curseg(struct f2fs_sb_info *sbi); 3329 + void f2fs_save_inmem_curseg(struct f2fs_sb_info *sbi); 3330 + void f2fs_restore_inmem_curseg(struct f2fs_sb_info *sbi); 3331 + void f2fs_get_new_segment(struct f2fs_sb_info *sbi, 3332 + unsigned int *newseg, bool new_sec, int dir); 3355 3333 void f2fs_allocate_segment_for_resize(struct f2fs_sb_info *sbi, int type, 3356 3334 unsigned int start, unsigned int end); 3357 3335 void f2fs_allocate_new_segment(struct f2fs_sb_info *sbi, int type); ··· 3375 3343 int f2fs_inplace_write_data(struct f2fs_io_info *fio); 3376 3344 void 
f2fs_do_replace_block(struct f2fs_sb_info *sbi, struct f2fs_summary *sum, 3377 3345 block_t old_blkaddr, block_t new_blkaddr, 3378 - bool recover_curseg, bool recover_newaddr); 3346 + bool recover_curseg, bool recover_newaddr, 3347 + bool from_gc); 3379 3348 void f2fs_replace_block(struct f2fs_sb_info *sbi, struct dnode_of_data *dn, 3380 3349 block_t old_addr, block_t new_addr, 3381 3350 unsigned char version, bool recover_curseg, ··· 3404 3371 int f2fs_rw_hint_to_seg_type(enum rw_hint hint); 3405 3372 enum rw_hint f2fs_io_type_to_rw_hint(struct f2fs_sb_info *sbi, 3406 3373 enum page_type type, enum temp_type temp); 3374 + unsigned int f2fs_usable_segs_in_sec(struct f2fs_sb_info *sbi, 3375 + unsigned int segno); 3376 + unsigned int f2fs_usable_blks_in_seg(struct f2fs_sb_info *sbi, 3377 + unsigned int segno); 3407 3378 3408 3379 /* 3409 3380 * checkpoint.c ··· 3415 3378 void f2fs_stop_checkpoint(struct f2fs_sb_info *sbi, bool end_io); 3416 3379 struct page *f2fs_grab_meta_page(struct f2fs_sb_info *sbi, pgoff_t index); 3417 3380 struct page *f2fs_get_meta_page(struct f2fs_sb_info *sbi, pgoff_t index); 3418 - struct page *f2fs_get_meta_page_nofail(struct f2fs_sb_info *sbi, pgoff_t index); 3381 + struct page *f2fs_get_meta_page_retry(struct f2fs_sb_info *sbi, pgoff_t index); 3419 3382 struct page *f2fs_get_tmp_page(struct f2fs_sb_info *sbi, pgoff_t index); 3420 3383 bool f2fs_is_valid_blkaddr(struct f2fs_sb_info *sbi, 3421 3384 block_t blkaddr, int type); ··· 3523 3486 unsigned int segno); 3524 3487 void f2fs_build_gc_manager(struct f2fs_sb_info *sbi); 3525 3488 int f2fs_resize_fs(struct f2fs_sb_info *sbi, __u64 block_count); 3489 + int __init f2fs_create_garbage_collection_cache(void); 3490 + void f2fs_destroy_garbage_collection_cache(void); 3526 3491 3527 3492 /* 3528 3493 * recovery.c ··· 3560 3521 int nr_discard_cmd; 3561 3522 unsigned int undiscard_blks; 3562 3523 int inline_xattr, inline_inode, inline_dir, append, update, orphans; 3563 - int compr_inode, 
compr_blocks; 3524 + int compr_inode; 3525 + unsigned long long compr_blocks; 3564 3526 int aw_cnt, max_aw_cnt, vw_cnt, max_vw_cnt; 3565 3527 unsigned int valid_count, valid_node_count, valid_inode_count, discard_blks; 3566 3528 unsigned int bimodal, avg_vblocks; ··· 3646 3606 (atomic_dec(&F2FS_I_SB(inode)->compr_inode)); \ 3647 3607 } while (0) 3648 3608 #define stat_add_compr_blocks(inode, blocks) \ 3649 - (atomic_add(blocks, &F2FS_I_SB(inode)->compr_blocks)) 3609 + (atomic64_add(blocks, &F2FS_I_SB(inode)->compr_blocks)) 3650 3610 #define stat_sub_compr_blocks(inode, blocks) \ 3651 - (atomic_sub(blocks, &F2FS_I_SB(inode)->compr_blocks)) 3611 + (atomic64_sub(blocks, &F2FS_I_SB(inode)->compr_blocks)) 3652 3612 #define stat_inc_meta_count(sbi, blkaddr) \ 3653 3613 do { \ 3654 3614 if (blkaddr < SIT_I(sbi)->sit_base_addr) \ ··· 3827 3787 */ 3828 3788 struct rb_entry *f2fs_lookup_rb_tree(struct rb_root_cached *root, 3829 3789 struct rb_entry *cached_re, unsigned int ofs); 3790 + struct rb_node **f2fs_lookup_rb_tree_ext(struct f2fs_sb_info *sbi, 3791 + struct rb_root_cached *root, 3792 + struct rb_node **parent, 3793 + unsigned long long key, bool *left_most); 3830 3794 struct rb_node **f2fs_lookup_rb_tree_for_insert(struct f2fs_sb_info *sbi, 3831 3795 struct rb_root_cached *root, 3832 3796 struct rb_node **parent, ··· 3841 3797 struct rb_node ***insert_p, struct rb_node **insert_parent, 3842 3798 bool force, bool *leftmost); 3843 3799 bool f2fs_check_rb_tree_consistence(struct f2fs_sb_info *sbi, 3844 - struct rb_root_cached *root); 3800 + struct rb_root_cached *root, bool check_key); 3845 3801 unsigned int f2fs_shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink); 3846 3802 void f2fs_init_extent_tree(struct inode *inode, struct page *ipage); 3847 3803 void f2fs_drop_extent_tree(struct inode *inode); ··· 3927 3883 int f2fs_init_compress_ctx(struct compress_ctx *cc); 3928 3884 void f2fs_destroy_compress_ctx(struct compress_ctx *cc); 3929 3885 void 
f2fs_init_compress_info(struct f2fs_sb_info *sbi); 3886 + int f2fs_init_page_array_cache(struct f2fs_sb_info *sbi); 3887 + void f2fs_destroy_page_array_cache(struct f2fs_sb_info *sbi); 3888 + int __init f2fs_init_compress_cache(void); 3889 + void f2fs_destroy_compress_cache(void); 3930 3890 #else 3931 3891 static inline bool f2fs_is_compressed_page(struct page *page) { return false; } 3932 3892 static inline bool f2fs_is_compress_backend_ready(struct inode *inode) ··· 3947 3899 } 3948 3900 static inline int f2fs_init_compress_mempool(void) { return 0; } 3949 3901 static inline void f2fs_destroy_compress_mempool(void) { } 3902 + static inline int f2fs_init_page_array_cache(struct f2fs_sb_info *sbi) { return 0; } 3903 + static inline void f2fs_destroy_page_array_cache(struct f2fs_sb_info *sbi) { } 3904 + static inline int __init f2fs_init_compress_cache(void) { return 0; } 3905 + static inline void f2fs_destroy_compress_cache(void) { } 3950 3906 #endif 3951 3907 3952 3908 static inline void set_compress_context(struct inode *inode) ··· 3969 3917 f2fs_mark_inode_dirty_sync(inode, true); 3970 3918 } 3971 3919 3972 - static inline u64 f2fs_disable_compressed_file(struct inode *inode) 3920 + static inline bool f2fs_disable_compressed_file(struct inode *inode) 3973 3921 { 3974 3922 struct f2fs_inode_info *fi = F2FS_I(inode); 3975 3923 3976 3924 if (!f2fs_compressed_file(inode)) 3977 - return 0; 3978 - if (S_ISREG(inode->i_mode)) { 3979 - if (get_dirty_pages(inode)) 3980 - return 1; 3981 - if (fi->i_compr_blocks) 3982 - return fi->i_compr_blocks; 3983 - } 3925 + return true; 3926 + if (S_ISREG(inode->i_mode) && 3927 + (get_dirty_pages(inode) || atomic_read(&fi->i_compr_blocks))) 3928 + return false; 3984 3929 3985 3930 fi->i_flags &= ~F2FS_COMPR_FL; 3986 3931 stat_dec_compr_inode(inode); 3987 3932 clear_inode_flag(inode, FI_COMPRESSED_FILE); 3988 3933 f2fs_mark_inode_dirty_sync(inode, true); 3989 - return 0; 3934 + return true; 3990 3935 } 3991 3936 3992 3937 #define 
F2FS_FEATURE_FUNCS(name, flagname) \ ··· 4077 4028 u64 blocks, bool add) 4078 4029 { 4079 4030 int diff = F2FS_I(inode)->i_cluster_size - blocks; 4031 + struct f2fs_inode_info *fi = F2FS_I(inode); 4080 4032 4081 4033 /* don't update i_compr_blocks if saved blocks were released */ 4082 - if (!add && !F2FS_I(inode)->i_compr_blocks) 4034 + if (!add && !atomic_read(&fi->i_compr_blocks)) 4083 4035 return; 4084 4036 4085 4037 if (add) { 4086 - F2FS_I(inode)->i_compr_blocks += diff; 4038 + atomic_add(diff, &fi->i_compr_blocks); 4087 4039 stat_add_compr_blocks(inode, diff); 4088 4040 } else { 4089 - F2FS_I(inode)->i_compr_blocks -= diff; 4041 + atomic_sub(diff, &fi->i_compr_blocks); 4090 4042 stat_sub_compr_blocks(inode, diff); 4091 4043 } 4092 4044 f2fs_mark_inode_dirty_sync(inode, true);
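Among the f2fs.h changes, struct rb_entry gains a union so that the same eight bytes can be read either as an (ofs, len) pair or as a single 64-bit key. A minimal userspace sketch of that layout follows. `struct entry` and `key_from_parts()` are hypothetical names, fixed-width types replace unsigned int and unsigned long long, and the kernel's __packed attribute is left out. Which half of the key holds ofs depends on the byte order of the machine, so the sketch only checks that the two views alias the same storage.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical analogue of the new rb_entry payload. */
struct entry {
	union {
		struct {
			uint32_t ofs;	/* start offset of the entry */
			uint32_t len;	/* length of the entry */
		};
		uint64_t key;		/* 64-bit key over the same bytes */
	};
};

static_assert(sizeof(struct entry) == sizeof(uint64_t),
	      "key must overlay ofs and len exactly");

/* Build the key value that the union view yields for a given pair. */
static uint64_t key_from_parts(uint32_t ofs, uint32_t len)
{
	uint32_t parts[2] = { ofs, len };
	uint64_t key;

	memcpy(&key, parts, sizeof(key));
	return key;
}
```

With this layout, code that orders entries by key and code that checks ofs and len ranges can share one node type, which matches the check_key switch added to f2fs_check_rb_tree_consistence().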
+40 -48
fs/f2fs/file.c
··· 376 376 return f2fs_do_sync_file(file, start, end, datasync, false); 377 377 } 378 378 379 - static pgoff_t __get_first_dirty_index(struct address_space *mapping, 380 - pgoff_t pgofs, int whence) 381 - { 382 - struct page *page; 383 - int nr_pages; 384 - 385 - if (whence != SEEK_DATA) 386 - return 0; 387 - 388 - /* find first dirty page index */ 389 - nr_pages = find_get_pages_tag(mapping, &pgofs, PAGECACHE_TAG_DIRTY, 390 - 1, &page); 391 - if (!nr_pages) 392 - return ULONG_MAX; 393 - pgofs = page->index; 394 - put_page(page); 395 - return pgofs; 396 - } 397 - 398 - static bool __found_offset(struct f2fs_sb_info *sbi, block_t blkaddr, 399 - pgoff_t dirty, pgoff_t pgofs, int whence) 379 + static bool __found_offset(struct address_space *mapping, block_t blkaddr, 380 + pgoff_t index, int whence) 400 381 { 401 382 switch (whence) { 402 383 case SEEK_DATA: 403 - if ((blkaddr == NEW_ADDR && dirty == pgofs) || 404 - __is_valid_data_blkaddr(blkaddr)) 384 + if (__is_valid_data_blkaddr(blkaddr)) 385 + return true; 386 + if (blkaddr == NEW_ADDR && 387 + xa_get_mark(&mapping->i_pages, index, PAGECACHE_TAG_DIRTY)) 405 388 return true; 406 389 break; 407 390 case SEEK_HOLE: ··· 400 417 struct inode *inode = file->f_mapping->host; 401 418 loff_t maxbytes = inode->i_sb->s_maxbytes; 402 419 struct dnode_of_data dn; 403 - pgoff_t pgofs, end_offset, dirty; 420 + pgoff_t pgofs, end_offset; 404 421 loff_t data_ofs = offset; 405 422 loff_t isize; 406 423 int err = 0; ··· 412 429 goto fail; 413 430 414 431 /* handle inline data case */ 415 - if (f2fs_has_inline_data(inode) || f2fs_has_inline_dentry(inode)) { 416 - if (whence == SEEK_HOLE) 417 - data_ofs = isize; 432 + if (f2fs_has_inline_data(inode) && whence == SEEK_HOLE) { 433 + data_ofs = isize; 418 434 goto found; 419 435 } 420 436 421 437 pgofs = (pgoff_t)(offset >> PAGE_SHIFT); 422 - 423 - dirty = __get_first_dirty_index(inode->i_mapping, pgofs, whence); 424 438 425 439 for (; data_ofs < isize; data_ofs = (loff_t)pgofs << 
PAGE_SHIFT) { 426 440 set_new_dnode(&dn, inode, NULL, NULL, 0); ··· 451 471 goto fail; 452 472 } 453 473 454 - if (__found_offset(F2FS_I_SB(inode), blkaddr, dirty, 474 + if (__found_offset(file->f_mapping, blkaddr, 455 475 pgofs, whence)) { 456 476 f2fs_put_dnode(&dn); 457 477 goto found; ··· 544 564 bool compressed_cluster = false; 545 565 int cluster_index = 0, valid_blocks = 0; 546 566 int cluster_size = F2FS_I(dn->inode)->i_cluster_size; 547 - bool released = !F2FS_I(dn->inode)->i_compr_blocks; 567 + bool released = !atomic_read(&F2FS_I(dn->inode)->i_compr_blocks); 548 568 549 569 if (IS_INODE(dn->node_page) && f2fs_has_extra_attr(dn->inode)) 550 570 base = get_extra_isize(dn->inode); ··· 733 753 return err; 734 754 735 755 #ifdef CONFIG_F2FS_FS_COMPRESSION 736 - if (from != free_from) 756 + if (from != free_from) { 737 757 err = f2fs_truncate_partial_cluster(inode, from, lock); 758 + if (err) 759 + return err; 760 + } 738 761 #endif 739 762 740 - return err; 763 + return 0; 741 764 } 742 765 743 766 int f2fs_truncate(struct inode *inode) ··· 1639 1656 } 1640 1657 1641 1658 down_write(&sbi->pin_sem); 1642 - map.m_seg_type = CURSEG_COLD_DATA_PINNED; 1643 1659 1644 1660 f2fs_lock_op(sbi); 1645 - f2fs_allocate_new_segment(sbi, CURSEG_COLD_DATA); 1661 + f2fs_allocate_new_segment(sbi, CURSEG_COLD_DATA_PINNED); 1646 1662 f2fs_unlock_op(sbi); 1647 1663 1664 + map.m_seg_type = CURSEG_COLD_DATA_PINNED; 1648 1665 err = f2fs_map_blocks(inode, &map, 1, F2FS_GET_BLOCK_PRE_DIO); 1666 + 1649 1667 up_write(&sbi->pin_sem); 1650 1668 1651 1669 done += map.m_len; ··· 1812 1828 1813 1829 if ((iflags ^ masked_flags) & F2FS_COMPR_FL) { 1814 1830 if (masked_flags & F2FS_COMPR_FL) { 1815 - if (f2fs_disable_compressed_file(inode)) 1831 + if (!f2fs_disable_compressed_file(inode)) 1816 1832 return -EINVAL; 1817 1833 } 1818 1834 if (iflags & F2FS_NOCOMP_FL) 1819 1835 return -EINVAL; 1820 1836 if (iflags & F2FS_COMPR_FL) { 1821 1837 if (!f2fs_may_compress(inode)) 1838 + return -EINVAL; 
+			if (S_ISREG(inode->i_mode) && inode->i_size)
 				return -EINVAL;

 			set_compress_context(inode);
···
 	if (IS_ENCRYPTED(src) || IS_ENCRYPTED(dst))
 		return -EOPNOTSUPP;

+	if (pos_out < 0 || pos_in < 0)
+		return -EINVAL;
+
 	if (src == dst) {
 		if (pos_in == pos_out)
 			return 0;
···
 	if (ret)
 		goto out;

-	if (f2fs_disable_compressed_file(inode)) {
+	if (!f2fs_disable_compressed_file(inode)) {
 		ret = -EOPNOTSUPP;
 		goto out;
 	}
···
 				min(FSLABEL_MAX, count)))
 		err = -EFAULT;

-	kvfree(vbuf);
+	kfree(vbuf);
 	return err;
 }

···
 	if (!f2fs_compressed_file(inode))
 		return -EINVAL;

-	blocks = F2FS_I(inode)->i_compr_blocks;
+	blocks = atomic_read(&F2FS_I(inode)->i_compr_blocks);
 	return put_user(blocks, (u64 __user *)arg);
 }

···
 	inode_lock(inode);

 	writecount = atomic_read(&inode->i_writecount);
-	if ((filp->f_mode & FMODE_WRITE && writecount != 1) || writecount) {
+	if ((filp->f_mode & FMODE_WRITE && writecount != 1) ||
+			(!(filp->f_mode & FMODE_WRITE) && writecount)) {
 		ret = -EBUSY;
 		goto out;
 	}
···
 	inode->i_ctime = current_time(inode);
 	f2fs_mark_inode_dirty_sync(inode, true);

-	if (!F2FS_I(inode)->i_compr_blocks)
+	if (!atomic_read(&F2FS_I(inode)->i_compr_blocks))
 		goto out;

 	down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
···

 	if (ret >= 0) {
 		ret = put_user(released_blocks, (u64 __user *)arg);
-	} else if (released_blocks && F2FS_I(inode)->i_compr_blocks) {
+	} else if (released_blocks &&
+			atomic_read(&F2FS_I(inode)->i_compr_blocks)) {
 		set_sbi_flag(sbi, SBI_NEED_FSCK);
 		f2fs_warn(sbi, "%s: partial blocks were released i_ino=%lx "
-			"iblocks=%llu, released=%u, compr_blocks=%llu, "
+			"iblocks=%llu, released=%u, compr_blocks=%u, "
 			"run fsck to fix.",
 			__func__, inode->i_ino, inode->i_blocks,
 			released_blocks,
-			F2FS_I(inode)->i_compr_blocks);
+			atomic_read(&F2FS_I(inode)->i_compr_blocks));
 	}

 	return ret;
···
 	if (ret)
 		return ret;

-	if (F2FS_I(inode)->i_compr_blocks)
+	if (atomic_read(&F2FS_I(inode)->i_compr_blocks))
 		goto out;

 	f2fs_balance_fs(F2FS_I_SB(inode), true);
···

 	if (ret >= 0) {
 		ret = put_user(reserved_blocks, (u64 __user *)arg);
-	} else if (reserved_blocks && F2FS_I(inode)->i_compr_blocks) {
+	} else if (reserved_blocks &&
+			atomic_read(&F2FS_I(inode)->i_compr_blocks)) {
 		set_sbi_flag(sbi, SBI_NEED_FSCK);
 		f2fs_warn(sbi, "%s: partial blocks were released i_ino=%lx "
-			"iblocks=%llu, reserved=%u, compr_blocks=%llu, "
+			"iblocks=%llu, reserved=%u, compr_blocks=%u, "
 			"run fsck to fix.",
 			__func__, inode->i_ino, inode->i_blocks,
 			reserved_blocks,
-			F2FS_I(inode)->i_compr_blocks);
+			atomic_read(&F2FS_I(inode)->i_compr_blocks));
 	}

 	return ret;
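The writecount change in the release-compress-blocks path of fs/f2fs/file.c is easy to misread in diff form. The following is a minimal userspace sketch of the old and new conditions, assuming a plain boolean `opened_for_write` in place of `filp->f_mode & FMODE_WRITE` and a plain int in place of `atomic_read(&inode->i_writecount)`. The function names `busy_old` and `busy_new` are illustrative only and do not exist in the kernel.

```c
#include <stdbool.h>

/*
 * Old condition: the trailing "|| writecount" is true whenever any writer
 * exists, including the caller itself, so a caller holding the only write
 * reference was still reported as busy (the false positive).
 */
static bool busy_old(bool opened_for_write, int writecount)
{
	return (opened_for_write && writecount != 1) || writecount;
}

/*
 * New condition: a writable descriptor must be the only writer, and a
 * read-only descriptor requires that nobody holds the file open for write.
 */
static bool busy_new(bool opened_for_write, int writecount)
{
	return (opened_for_write && writecount != 1) ||
	       (!opened_for_write && writecount);
}
```

In this model, a caller that opened the file for write and is its only writer (`writecount == 1`) is rejected by `busy_old` but accepted by `busy_new`; every other combination gives the same answer in both versions.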
+395 -18
fs/f2fs/gc.c
···
 #include "gc.h"
 #include <trace/events/f2fs.h>

+static struct kmem_cache *victim_entry_slab;
+
 static unsigned int count_bits(const unsigned long *addr,
 				unsigned int offset, unsigned int len);

···
 			"f2fs_gc-%u:%u", MAJOR(dev), MINOR(dev));
 	if (IS_ERR(gc_th->f2fs_gc_task)) {
 		err = PTR_ERR(gc_th->f2fs_gc_task);
-		kvfree(gc_th);
+		kfree(gc_th);
 		sbi->gc_thread = NULL;
 	}
 out:
···
 	if (!gc_th)
 		return;
 	kthread_stop(gc_th->f2fs_gc_task);
-	kvfree(gc_th);
+	kfree(gc_th);
 	sbi->gc_thread = NULL;
 }

 static int select_gc_type(struct f2fs_sb_info *sbi, int gc_type)
 {
-	int gc_mode = (gc_type == BG_GC) ? GC_CB : GC_GREEDY;
+	int gc_mode;
+
+	if (gc_type == BG_GC) {
+		if (sbi->am.atgc_enabled)
+			gc_mode = GC_AT;
+		else
+			gc_mode = GC_CB;
+	} else {
+		gc_mode = GC_GREEDY;
+	}

 	switch (sbi->gc_mode) {
 	case GC_IDLE_CB:
···
 	case GC_URGENT_HIGH:
 		gc_mode = GC_GREEDY;
 		break;
+	case GC_IDLE_AT:
+		gc_mode = GC_AT;
+		break;
 	}
+
 	return gc_mode;
 }

···
 	struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);

 	if (p->alloc_mode == SSR) {
+		p->gc_mode = GC_GREEDY;
+		p->dirty_bitmap = dirty_i->dirty_segmap[type];
+		p->max_search = dirty_i->nr_dirty[type];
+		p->ofs_unit = 1;
+	} else if (p->alloc_mode == AT_SSR) {
 		p->gc_mode = GC_GREEDY;
 		p->dirty_bitmap = dirty_i->dirty_segmap[type];
 		p->max_search = dirty_i->nr_dirty[type];
···
 	 */
 	if (gc_type != FG_GC &&
 			(sbi->gc_mode != GC_URGENT_HIGH) &&
+			(p->gc_mode != GC_AT && p->alloc_mode != AT_SSR) &&
 			p->max_search > sbi->max_victim_search)
 		p->max_search = sbi->max_victim_search;

···
 	/* SSR allocates in a segment unit */
 	if (p->alloc_mode == SSR)
 		return sbi->blocks_per_seg;
+	else if (p->alloc_mode == AT_SSR)
+		return UINT_MAX;
+
+	/* LFS */
 	if (p->gc_mode == GC_GREEDY)
 		return 2 * sbi->blocks_per_seg * p->ofs_unit;
 	else if (p->gc_mode == GC_CB)
+		return UINT_MAX;
+	else if (p->gc_mode == GC_AT)
 		return UINT_MAX;
 	else /* No other gc_mode */
 		return 0;
···
 	unsigned char age = 0;
 	unsigned char u;
 	unsigned int i;
+	unsigned int usable_segs_per_sec = f2fs_usable_segs_in_sec(sbi, segno);

-	for (i = 0; i < sbi->segs_per_sec; i++)
+	for (i = 0; i < usable_segs_per_sec; i++)
 		mtime += get_seg_entry(sbi, start + i)->mtime;
 	vblocks = get_valid_blocks(sbi, segno, true);

-	mtime = div_u64(mtime, sbi->segs_per_sec);
-	vblocks = div_u64(vblocks, sbi->segs_per_sec);
+	mtime = div_u64(mtime, usable_segs_per_sec);
+	vblocks = div_u64(vblocks, usable_segs_per_sec);

 	u = (vblocks * 100) >> sbi->log_blocks_per_seg;

···
 	/* alloc_mode == LFS */
 	if (p->gc_mode == GC_GREEDY)
 		return get_valid_blocks(sbi, segno, true);
-	else
+	else if (p->gc_mode == GC_CB)
 		return get_cb_cost(sbi, segno);
+
+	f2fs_bug_on(sbi, 1);
+	return 0;
 }

 static unsigned int count_bits(const unsigned long *addr,
···
 	return sum;
 }

+static struct victim_entry *attach_victim_entry(struct f2fs_sb_info *sbi,
+				unsigned long long mtime, unsigned int segno,
+				struct rb_node *parent, struct rb_node **p,
+				bool left_most)
+{
+	struct atgc_management *am = &sbi->am;
+	struct victim_entry *ve;
+
+	ve = f2fs_kmem_cache_alloc(victim_entry_slab, GFP_NOFS);
+
+	ve->mtime = mtime;
+	ve->segno = segno;
+
+	rb_link_node(&ve->rb_node, parent, p);
+	rb_insert_color_cached(&ve->rb_node, &am->root, left_most);
+
+	list_add_tail(&ve->list, &am->victim_list);
+
+	am->victim_count++;
+
+	return ve;
+}
+
+static void insert_victim_entry(struct f2fs_sb_info *sbi,
+				unsigned long long mtime, unsigned int segno)
+{
+	struct atgc_management *am = &sbi->am;
+	struct rb_node **p;
+	struct rb_node *parent = NULL;
+	bool left_most = true;
+
+	p = f2fs_lookup_rb_tree_ext(sbi, &am->root, &parent, mtime, &left_most);
+	attach_victim_entry(sbi, mtime, segno, parent, p, left_most);
+}
+
+static void add_victim_entry(struct f2fs_sb_info *sbi,
+				struct victim_sel_policy *p, unsigned int segno)
+{
+	struct sit_info *sit_i = SIT_I(sbi);
+	unsigned int secno = GET_SEC_FROM_SEG(sbi, segno);
+	unsigned int start = GET_SEG_FROM_SEC(sbi, secno);
+	unsigned long long mtime = 0;
+	unsigned int i;
+
+	if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED))) {
+		if (p->gc_mode == GC_AT &&
+			get_valid_blocks(sbi, segno, true) == 0)
+			return;
+
+		if (p->alloc_mode == AT_SSR &&
+			get_seg_entry(sbi, segno)->ckpt_valid_blocks == 0)
+			return;
+	}
+
+	for (i = 0; i < sbi->segs_per_sec; i++)
+		mtime += get_seg_entry(sbi, start + i)->mtime;
+	mtime = div_u64(mtime, sbi->segs_per_sec);
+
+	/* Handle if the system time has changed by the user */
+	if (mtime < sit_i->min_mtime)
+		sit_i->min_mtime = mtime;
+	if (mtime > sit_i->max_mtime)
+		sit_i->max_mtime = mtime;
+	if (mtime < sit_i->dirty_min_mtime)
+		sit_i->dirty_min_mtime = mtime;
+	if (mtime > sit_i->dirty_max_mtime)
+		sit_i->dirty_max_mtime = mtime;
+
+	/* don't choose young section as candidate */
+	if (sit_i->dirty_max_mtime - mtime < p->age_threshold)
+		return;
+
+	insert_victim_entry(sbi, mtime, segno);
+}
+
+static struct rb_node *lookup_central_victim(struct f2fs_sb_info *sbi,
+						struct victim_sel_policy *p)
+{
+	struct atgc_management *am = &sbi->am;
+	struct rb_node *parent = NULL;
+	bool left_most;
+
+	f2fs_lookup_rb_tree_ext(sbi, &am->root, &parent, p->age, &left_most);
+
+	return parent;
+}
+
+static void atgc_lookup_victim(struct f2fs_sb_info *sbi,
+						struct victim_sel_policy *p)
+{
+	struct sit_info *sit_i = SIT_I(sbi);
+	struct atgc_management *am = &sbi->am;
+	struct rb_root_cached *root = &am->root;
+	struct rb_node *node;
+	struct rb_entry *re;
+	struct victim_entry *ve;
+	unsigned long long total_time;
+	unsigned long long age, u, accu;
+	unsigned long long max_mtime = sit_i->dirty_max_mtime;
+	unsigned long long min_mtime = sit_i->dirty_min_mtime;
+	unsigned int sec_blocks = BLKS_PER_SEC(sbi);
+	unsigned int vblocks;
+	unsigned int dirty_threshold = max(am->max_candidate_count,
+					am->candidate_ratio *
+					am->victim_count / 100);
+	unsigned int age_weight = am->age_weight;
+	unsigned int cost;
+	unsigned int iter = 0;
+
+	if (max_mtime < min_mtime)
+		return;
+
+	max_mtime += 1;
+	total_time = max_mtime - min_mtime;
+
+	accu = div64_u64(ULLONG_MAX, total_time);
+	accu = min_t(unsigned long long, div_u64(accu, 100),
+					DEFAULT_ACCURACY_CLASS);
+
+	node = rb_first_cached(root);
+next:
+	re = rb_entry_safe(node, struct rb_entry, rb_node);
+	if (!re)
+		return;
+
+	ve = (struct victim_entry *)re;
+
+	if (ve->mtime >= max_mtime || ve->mtime < min_mtime)
+		goto skip;
+
+	/* age = 10000 * x% * 60 */
+	age = div64_u64(accu * (max_mtime - ve->mtime), total_time) *
+								age_weight;
+
+	vblocks = get_valid_blocks(sbi, ve->segno, true);
+	f2fs_bug_on(sbi, !vblocks || vblocks == sec_blocks);
+
+	/* u = 10000 * x% * 40 */
+	u = div64_u64(accu * (sec_blocks - vblocks), sec_blocks) *
+							(100 - age_weight);
+
+	f2fs_bug_on(sbi, age + u >= UINT_MAX);
+
+	cost = UINT_MAX - (age + u);
+	iter++;
+
+	if (cost < p->min_cost ||
+			(cost == p->min_cost && age > p->oldest_age)) {
+		p->min_cost = cost;
+		p->oldest_age = age;
+		p->min_segno = ve->segno;
+	}
+skip:
+	if (iter < dirty_threshold) {
+		node = rb_next(node);
+		goto next;
+	}
+}
+
+/*
+ * select candidates around source section in range of
+ * [target - dirty_threshold, target + dirty_threshold]
+ */
+static void atssr_lookup_victim(struct f2fs_sb_info *sbi,
+						struct victim_sel_policy *p)
+{
+	struct sit_info *sit_i = SIT_I(sbi);
+	struct atgc_management *am = &sbi->am;
+	struct rb_node *node;
+	struct rb_entry *re;
+	struct victim_entry *ve;
+	unsigned long long age;
+	unsigned long long max_mtime = sit_i->dirty_max_mtime;
+	unsigned long long min_mtime = sit_i->dirty_min_mtime;
+	unsigned int seg_blocks = sbi->blocks_per_seg;
+	unsigned int vblocks;
+	unsigned int dirty_threshold = max(am->max_candidate_count,
+					am->candidate_ratio *
+					am->victim_count / 100);
+	unsigned int cost;
+	unsigned int iter = 0;
+	int stage = 0;
+
+	if (max_mtime < min_mtime)
+		return;
+	max_mtime += 1;
+next_stage:
+	node = lookup_central_victim(sbi, p);
+next_node:
+	re = rb_entry_safe(node, struct rb_entry, rb_node);
+	if (!re) {
+		if (stage == 0)
+			goto skip_stage;
+		return;
+	}
+
+	ve = (struct victim_entry *)re;
+
+	if (ve->mtime >= max_mtime || ve->mtime < min_mtime)
+		goto skip_node;
+
+	age = max_mtime - ve->mtime;
+
+	vblocks = get_seg_entry(sbi, ve->segno)->ckpt_valid_blocks;
+	f2fs_bug_on(sbi, !vblocks);
+
+	/* rare case */
+	if (vblocks == seg_blocks)
+		goto skip_node;
+
+	iter++;
+
+	age = max_mtime - abs(p->age - age);
+	cost = UINT_MAX - vblocks;
+
+	if (cost < p->min_cost ||
+			(cost == p->min_cost && age > p->oldest_age)) {
+		p->min_cost = cost;
+		p->oldest_age = age;
+		p->min_segno = ve->segno;
+	}
+skip_node:
+	if (iter < dirty_threshold) {
+		if (stage == 0)
+			node = rb_prev(node);
+		else if (stage == 1)
+			node = rb_next(node);
+		goto next_node;
+	}
+skip_stage:
+	if (stage < 1) {
+		stage++;
+		iter = 0;
+		goto next_stage;
+	}
+}
+static void lookup_victim_by_age(struct f2fs_sb_info *sbi,
+						struct victim_sel_policy *p)
+{
+	f2fs_bug_on(sbi, !f2fs_check_rb_tree_consistence(sbi,
+						&sbi->am.root, true));
+
+	if (p->gc_mode == GC_AT)
+		atgc_lookup_victim(sbi, p);
+	else if (p->alloc_mode == AT_SSR)
+		atssr_lookup_victim(sbi, p);
+	else
+		f2fs_bug_on(sbi, 1);
+}
+
+static void release_victim_entry(struct f2fs_sb_info *sbi)
+{
+	struct atgc_management *am = &sbi->am;
+	struct victim_entry *ve, *tmp;
+
+	list_for_each_entry_safe(ve, tmp, &am->victim_list, list) {
+		list_del(&ve->list);
+		kmem_cache_free(victim_entry_slab, ve);
+		am->victim_count--;
+	}
+
+	am->root = RB_ROOT_CACHED;
+
+	f2fs_bug_on(sbi, am->victim_count);
+	f2fs_bug_on(sbi, !list_empty(&am->victim_list));
+}
+
 /*
  * This function is called from two paths.
  * One is garbage collection and the other is SSR segment selection.
···
  * which has minimum valid blocks and removes it from dirty seglist.
  */
 static int get_victim_by_default(struct f2fs_sb_info *sbi,
-		unsigned int *result, int gc_type, int type, char alloc_mode)
+			unsigned int *result, int gc_type, int type,
+			char alloc_mode, unsigned long long age)
 {
 	struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
 	struct sit_info *sm = SIT_I(sbi);
 	struct victim_sel_policy p;
 	unsigned int secno, last_victim;
 	unsigned int last_segment;
-	unsigned int nsearched = 0;
+	unsigned int nsearched;
+	bool is_atgc;
 	int ret = 0;

 	mutex_lock(&dirty_i->seglist_lock);
 	last_segment = MAIN_SECS(sbi) * sbi->segs_per_sec;

 	p.alloc_mode = alloc_mode;
-	select_policy(sbi, gc_type, type, &p);
+	p.age = age;
+	p.age_threshold = sbi->am.age_threshold;

+retry:
+	select_policy(sbi, gc_type, type, &p);
 	p.min_segno = NULL_SEGNO;
+	p.oldest_age = 0;
 	p.min_cost = get_max_cost(sbi, &p);
+
+	is_atgc = (p.gc_mode == GC_AT || p.alloc_mode == AT_SSR);
+	nsearched = 0;
+
+	if (is_atgc)
+		SIT_I(sbi)->dirty_min_mtime = ULLONG_MAX;

 	if (*result != NULL_SEGNO) {
 		if (!get_valid_blocks(sbi, *result, false)) {
···
 		/* Don't touch checkpointed data */
 		if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED) &&
 					get_ckpt_valid_blocks(sbi, segno) &&
-					p.alloc_mode != SSR))
+					p.alloc_mode == LFS))
 			goto next;
 		if (gc_type == BG_GC && test_bit(secno, dirty_i->victim_secmap))
 			goto next;
+
+		if (is_atgc) {
+			add_victim_entry(sbi, &p, segno);
+			goto next;
+		}

 		cost = get_gc_cost(sbi, segno, &p);

···
 			break;
 		}
 	}
+
+	/* get victim for GC_AT/AT_SSR */
+	if (is_atgc) {
+		lookup_victim_by_age(sbi, &p);
+		release_victim_entry(sbi);
+	}
+
+	if (is_atgc && p.min_segno == NULL_SEGNO &&
+			sm->elapsed_time < p.age_threshold) {
+		p.age_threshold = 0;
+		goto retry;
+	}
+
 	if (p.min_segno != NULL_SEGNO) {
 got_it:
 		*result = (p.min_segno / p.ofs_unit) * p.ofs_unit;
···
 	int phase = 0;
 	bool fggc = (gc_type == FG_GC);
 	int submitted = 0;
+	unsigned int usable_blks_in_seg = f2fs_usable_blks_in_seg(sbi, segno);

 	start_addr = START_BLOCK(sbi, segno);

···
 	if (fggc && phase == 2)
 		atomic_inc(&sbi->wb_sync_req[NODE]);

-	for (off = 0; off < sbi->blocks_per_seg; off++, entry++) {
+	for (off = 0; off < usable_blks_in_seg; off++, entry++) {
 		nid_t nid = le32_to_cpu(entry->nid);
 		struct page *node_page;
 		struct node_info ni;
···
 	block_t newaddr;
 	int err = 0;
 	bool lfs_mode = f2fs_lfs_mode(fio.sbi);
+	int type = fio.sbi->am.atgc_enabled ?
+				CURSEG_ALL_DATA_ATGC : CURSEG_COLD_DATA;

 	/* do not read out */
 	page = f2fs_grab_cache_page(inode->i_mapping, bidx, false);
···
 	}

 	f2fs_allocate_data_block(fio.sbi, NULL, fio.old_blkaddr, &newaddr,
-					&sum, CURSEG_COLD_DATA, NULL);
+					&sum, type, NULL);

 	fio.encrypted_page = f2fs_pagecache_get_page(META_MAPPING(fio.sbi),
 				newaddr, FGP_LOCK | FGP_CREAT, GFP_NOFS);
···
 recover_block:
 	if (err)
 		f2fs_do_replace_block(fio.sbi, &sum, newaddr, fio.old_blkaddr,
-								true, true);
+							true, true, true);
 up_out:
 	if (lfs_mode)
 		up_write(&fio.sbi->io_order_lock);
···
 	int off;
 	int phase = 0;
 	int submitted = 0;
+	unsigned int usable_blks_in_seg = f2fs_usable_blks_in_seg(sbi, segno);

 	start_addr = START_BLOCK(sbi, segno);

 next_step:
 	entry = sum;

-	for (off = 0; off < sbi->blocks_per_seg; off++, entry++) {
+	for (off = 0; off < usable_blks_in_seg; off++, entry++) {
 		struct page *data_page;
 		struct inode *inode;
 		struct node_info dni; /* dnode info for the data */
···

 	down_write(&sit_i->sentry_lock);
 	ret = DIRTY_I(sbi)->v_ops->get_victim(sbi, victim, gc_type,
-					      NO_CHECK_TYPE, LFS);
+					      NO_CHECK_TYPE, LFS, 0);
 	up_write(&sit_i->sentry_lock);
 	return ret;
 }
···

 	if (__is_large_section(sbi))
 		end_segno = rounddown(end_segno, sbi->segs_per_sec);
+
+	/*
+	 * zone-capacity can be less than zone-size in zoned devices,
+	 * resulting in less than expected usable segments in the zone,
+	 * calculate the end segno in the zone which can be garbage collected
+	 */
+	if (f2fs_sb_has_blkzoned(sbi))
+		end_segno -= sbi->segs_per_sec -
+					f2fs_usable_segs_in_sec(sbi, segno);
+
+	sanity_check_seg_type(sbi, get_seg_entry(sbi, segno)->type);

 	/* readahead multi ssa blocks those have contiguous address */
 	if (__is_large_section(sbi))
···
 		goto stop;

 	seg_freed = do_garbage_collect(sbi, segno, &gc_list, gc_type);
-	if (gc_type == FG_GC && seg_freed == sbi->segs_per_sec)
+	if (gc_type == FG_GC &&
+		seg_freed == f2fs_usable_segs_in_sec(sbi, segno))
 		sec_freed++;
 	total_freed += seg_freed;

···
 	return ret;
 }

+int __init f2fs_create_garbage_collection_cache(void)
+{
+	victim_entry_slab = f2fs_kmem_cache_create("f2fs_victim_entry",
+					sizeof(struct victim_entry));
+	if (!victim_entry_slab)
+		return -ENOMEM;
+	return 0;
+}
+
+void f2fs_destroy_garbage_collection_cache(void)
+{
+	kmem_cache_destroy(victim_entry_slab);
+}
+
+static void init_atgc_management(struct f2fs_sb_info *sbi)
+{
+	struct atgc_management *am = &sbi->am;
+
+	if (test_opt(sbi, ATGC) &&
+		SIT_I(sbi)->elapsed_time >= DEF_GC_THREAD_AGE_THRESHOLD)
+		am->atgc_enabled = true;
+
+	am->root = RB_ROOT_CACHED;
+	INIT_LIST_HEAD(&am->victim_list);
+	am->victim_count = 0;
+
+	am->candidate_ratio = DEF_GC_THREAD_CANDIDATE_RATIO;
+	am->max_candidate_count = DEF_GC_THREAD_MAX_CANDIDATE_COUNT;
+	am->age_weight = DEF_GC_THREAD_AGE_WEIGHT;
+}
+
 void f2fs_build_gc_manager(struct f2fs_sb_info *sbi)
 {
 	DIRTY_I(sbi)->v_ops = &default_v_ops;
···
 	if (f2fs_is_multi_device(sbi) && !__is_large_section(sbi))
 		SIT_I(sbi)->last_victim[ALLOC_NEXT] =
 				GET_SEGNO(sbi, FDEV(0).end_blk) + 1;
+
+	init_atgc_management(sbi);
 }

 static int free_segment_range(struct f2fs_sb_info *sbi,
···
 	mutex_unlock(&DIRTY_I(sbi)->seglist_lock);

 	/* Move out cursegs from the target range */
-	for (type = CURSEG_HOT_DATA; type < NR_CURSEG_TYPE; type++)
+	for (type = CURSEG_HOT_DATA; type < NR_CURSEG_PERSIST_TYPE; type++)
 		f2fs_allocate_segment_for_resize(sbi, type, start, end);

 	/* do GC to move out valid blocks in the range */
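The cost computed in atgc_lookup_victim() combines a normalized age term and a normalized free-space term, weighted by age_weight (DEF_GC_THREAD_AGE_WEIGHT is 60 in gc.h). The following is a minimal userspace sketch of that arithmetic, assuming plain 64-bit division in place of div64_u64() and div_u64(). The function name `atgc_cost`, its parameter list, and the `ACCURACY_CLASS` macro are illustrative stand-ins and are not part of the kernel.

```c
#include <limits.h>
#include <stdint.h>

#define ACCURACY_CLASS 10000ULL	/* stand-in for DEFAULT_ACCURACY_CLASS */

/*
 * Sketch of the ATGC cost for one candidate section.
 * Preconditions assumed by this sketch (the kernel checks them elsewhere):
 *   min_mtime <= mtime <= max_mtime, 0 < vblocks < sec_blocks,
 *   age_weight <= 100.
 * Lower cost means a better GC victim.
 */
static unsigned int atgc_cost(uint64_t mtime, uint64_t min_mtime,
			      uint64_t max_mtime, unsigned int vblocks,
			      unsigned int sec_blocks, unsigned int age_weight)
{
	/* the diff bumps max_mtime by one so total_time is never zero */
	uint64_t upper = max_mtime + 1;
	uint64_t total_time = upper - min_mtime;
	uint64_t accu, age, u;

	/* scale factor, capped at ACCURACY_CLASS to avoid overflow */
	accu = UINT64_MAX / total_time / 100;
	if (accu > ACCURACY_CLASS)
		accu = ACCURACY_CLASS;

	/* older sections (smaller mtime) get a larger age term */
	age = accu * (upper - mtime) / total_time * age_weight;

	/* sections with fewer valid blocks get a larger free-space term */
	u = accu * (sec_blocks - vblocks) / sec_blocks * (100 - age_weight);

	return UINT_MAX - (unsigned int)(age + u);
}
```

For example, with min_mtime = 0, max_mtime = 99, age_weight = 60 and a 512-block section holding 128 valid blocks, a section with mtime 0 gets age = 600000 and u = 300000, so its cost is UINT_MAX - 900000. A section with the same occupancy but mtime 50 gets a higher cost and loses the comparison.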
+65 -4
fs/f2fs/gc.h
···
 #define DEF_GC_THREAD_MIN_SLEEP_TIME 30000 /* milliseconds */
 #define DEF_GC_THREAD_MAX_SLEEP_TIME 60000
 #define DEF_GC_THREAD_NOGC_SLEEP_TIME 300000 /* wait 5 min */
+
+/* choose candidates from sections which has age of more than 7 days */
+#define DEF_GC_THREAD_AGE_THRESHOLD (60 * 60 * 24 * 7)
+#define DEF_GC_THREAD_CANDIDATE_RATIO 20 /* select 20% oldest sections as candidates */
+#define DEF_GC_THREAD_MAX_CANDIDATE_COUNT 10 /* select at most 10 sections as candidates */
+#define DEF_GC_THREAD_AGE_WEIGHT 60 /* age weight */
+#define DEFAULT_ACCURACY_CLASS 10000 /* accuracy class */
+
 #define LIMIT_INVALID_BLOCK 40 /* percentage over total user space */
 #define LIMIT_FREE_BLOCK 40 /* percentage over invalid + free space */

···
 	struct radix_tree_root iroot;
 };

+struct victim_info {
+	unsigned long long mtime; /* mtime of section */
+	unsigned int segno; /* section No. */
+};
+
+struct victim_entry {
+	struct rb_node rb_node; /* rb node located in rb-tree */
+	union {
+		struct {
+			unsigned long long mtime; /* mtime of section */
+			unsigned int segno; /* segment No. */
+		};
+		struct victim_info vi; /* victim info */
+	};
+	struct list_head list;
+};
+
 /*
  * inline functions
  */
+
+/*
+ * On a Zoned device zone-capacity can be less than zone-size and if
+ * zone-capacity is not aligned to f2fs segment size(2MB), then the segment
+ * starting just before zone-capacity has some blocks spanning across the
+ * zone-capacity, these blocks are not usable.
+ * Such spanning segments can be in free list so calculate the sum of usable
+ * blocks in currently free segments including normal and spanning segments.
+ */
+static inline block_t free_segs_blk_count_zoned(struct f2fs_sb_info *sbi)
+{
+	block_t free_seg_blks = 0;
+	struct free_segmap_info *free_i = FREE_I(sbi);
+	int j;
+
+	spin_lock(&free_i->segmap_lock);
+	for (j = 0; j < MAIN_SEGS(sbi); j++)
+		if (!test_bit(j, free_i->free_segmap))
+			free_seg_blks += f2fs_usable_blks_in_seg(sbi, j);
+	spin_unlock(&free_i->segmap_lock);
+
+	return free_seg_blks;
+}
+
+static inline block_t free_segs_blk_count(struct f2fs_sb_info *sbi)
+{
+	if (f2fs_sb_has_blkzoned(sbi))
+		return free_segs_blk_count_zoned(sbi);
+
+	return free_segments(sbi) << sbi->log_blocks_per_seg;
+}
+
 static inline block_t free_user_blocks(struct f2fs_sb_info *sbi)
 {
-	if (free_segments(sbi) < overprovision_segments(sbi))
+	block_t free_blks, ovp_blks;
+
+	free_blks = free_segs_blk_count(sbi);
+	ovp_blks = overprovision_segments(sbi) << sbi->log_blocks_per_seg;
+
+	if (free_blks < ovp_blks)
 		return 0;
-	else
-		return (free_segments(sbi) - overprovision_segments(sbi))
-			<< sbi->log_blocks_per_seg;
+
+	return free_blks - ovp_blks;
 }

 static inline block_t limit_invalid_user_blocks(struct f2fs_sb_info *sbi)
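The zoned accounting above can be illustrated outside the kernel. The following is a minimal sketch, assuming a hypothetical `struct seg_model` array in place of the free segment bitmap and f2fs_usable_blks_in_seg(). The names `seg_model`, `free_blks_zoned` and `free_user_blks` are illustrative only, and locking is omitted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one segment: whether it is allocated, and how
 * many of its blocks lie below the zone capacity. */
struct seg_model {
	bool in_use;
	uint32_t usable_blks;
};

/* Sum usable blocks over free segments, so a segment that spans the
 * zone-capacity boundary contributes only its usable part. */
static uint32_t free_blks_zoned(const struct seg_model *segs, size_t nsegs)
{
	uint32_t total = 0;
	size_t i;

	for (i = 0; i < nsegs; i++)
		if (!segs[i].in_use)
			total += segs[i].usable_blks;
	return total;
}

/* Compare in blocks rather than segments, as the rewritten
 * free_user_blocks() does, and clamp at zero. */
static uint32_t free_user_blks(uint32_t free_blks, uint32_t ovp_segs,
			       unsigned int log_blocks_per_seg)
{
	uint32_t ovp_blks = ovp_segs << log_blocks_per_seg;

	return free_blks < ovp_blks ? 0 : free_blks - ovp_blks;
}
```

With 512-block segments (log_blocks_per_seg = 9), two free segments holding 512 and 200 usable blocks give 712 free blocks. After reserving one overprovision segment, 200 blocks remain for users, whereas counting whole segments would have reported 512.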
+2 -2
fs/f2fs/inline.c
···
 			!f2fs_has_inline_xattr(dir))
 		F2FS_I(dir)->i_inline_xattr_size = 0;

-	kvfree(backup_dentry);
+	kfree(backup_dentry);
 	return 0;
 recover:
 	lock_page(ipage);
···
 	set_page_dirty(ipage);
 	f2fs_put_page(ipage, 1);

-	kvfree(backup_dentry);
+	kfree(backup_dentry);
 	return err;
 }
+17 -4
fs/f2fs/inode.c
···
 		return false;
 	}

+	if ((fi->i_flags & F2FS_CASEFOLD_FL) && !f2fs_sb_has_casefold(sbi)) {
+		set_sbi_flag(sbi, SBI_NEED_FSCK);
+		f2fs_warn(sbi, "%s: inode (ino=%lx) has casefold flag, but casefold feature is off",
+			  __func__, inode->i_ino);
+		return false;
+	}
+
 	if (f2fs_has_extra_attr(inode) && f2fs_sb_has_compression(sbi) &&
 			fi->i_flags & F2FS_COMPR_FL &&
 			F2FS_FITS_IN_INODE(ri, fi->i_extra_isize,
 						i_log_cluster_size)) {
 		if (ri->i_compress_algorithm >= COMPRESS_MAX) {
+			set_sbi_flag(sbi, SBI_NEED_FSCK);
 			f2fs_warn(sbi, "%s: inode (ino=%lx) has unsupported "
 				"compress algorithm: %u, run fsck to fix",
 				  __func__, inode->i_ino,
···
 		}
 		if (le64_to_cpu(ri->i_compr_blocks) >
 				SECTOR_TO_BLOCK(inode->i_blocks)) {
+			set_sbi_flag(sbi, SBI_NEED_FSCK);
 			f2fs_warn(sbi, "%s: inode (ino=%lx) has inconsistent "
 				"i_compr_blocks:%llu, i_blocks:%llu, run fsck to fix",
 				  __func__, inode->i_ino,
···
 		}
 		if (ri->i_log_cluster_size < MIN_COMPRESS_LOG_SIZE ||
 			ri->i_log_cluster_size > MAX_COMPRESS_LOG_SIZE) {
+			set_sbi_flag(sbi, SBI_NEED_FSCK);
 			f2fs_warn(sbi, "%s: inode (ino=%lx) has unsupported "
 				"log cluster size: %u, run fsck to fix",
 				  __func__, inode->i_ino,
···
 					(fi->i_flags & F2FS_COMPR_FL)) {
 		if (F2FS_FITS_IN_INODE(ri, fi->i_extra_isize,
 					i_log_cluster_size)) {
-			fi->i_compr_blocks = le64_to_cpu(ri->i_compr_blocks);
+			atomic_set(&fi->i_compr_blocks,
+					le64_to_cpu(ri->i_compr_blocks));
 			fi->i_compress_algorithm = ri->i_compress_algorithm;
 			fi->i_log_cluster_size = ri->i_log_cluster_size;
 			fi->i_cluster_size = 1 << fi->i_log_cluster_size;
···
 	stat_inc_inline_inode(inode);
 	stat_inc_inline_dir(inode);
 	stat_inc_compr_inode(inode);
-	stat_add_compr_blocks(inode, F2FS_I(inode)->i_compr_blocks);
+	stat_add_compr_blocks(inode, atomic_read(&fi->i_compr_blocks));

 	return 0;
 }
···
 			F2FS_FITS_IN_INODE(ri, F2FS_I(inode)->i_extra_isize,
 							i_log_cluster_size)) {
 			ri->i_compr_blocks =
-				cpu_to_le64(F2FS_I(inode)->i_compr_blocks);
+				cpu_to_le64(atomic_read(
+					&F2FS_I(inode)->i_compr_blocks));
 			ri->i_compress_algorithm =
 				F2FS_I(inode)->i_compress_algorithm;
 			ri->i_log_cluster_size =
···
 	stat_dec_inline_dir(inode);
 	stat_dec_inline_inode(inode);
 	stat_dec_compr_inode(inode);
-	stat_sub_compr_blocks(inode, F2FS_I(inode)->i_compr_blocks);
+	stat_sub_compr_blocks(inode,
+			atomic_read(&F2FS_I(inode)->i_compr_blocks));

 	if (likely(!f2fs_cp_error(sbi) &&
 				!is_sbi_flag_set(sbi, SBI_CP_DISABLED)))
+1 -1
fs/f2fs/namei.c
···
 	f2fs_handle_failed_inode(inode);
 out_free_encrypted_link:
 	if (disk_link.name != (unsigned char *)symname)
-		kvfree(disk_link.name);
+		kfree(disk_link.name);
 	return err;
 }
+2 -5
fs/f2fs/node.c
···

 static struct page *get_current_nat_page(struct f2fs_sb_info *sbi, nid_t nid)
 {
-	return f2fs_get_meta_page_nofail(sbi, current_nat_addr(sbi, nid));
+	return f2fs_get_meta_page(sbi, current_nat_addr(sbi, nid));
 }

 static struct page *get_next_nat_page(struct f2fs_sb_info *sbi, nid_t nid)
···
 	nm_i->next_scan_nid = le32_to_cpu(sbi->ckpt->next_free_nid);
 	nm_i->bitmap_size = __bitmap_size(sbi, NAT_BITMAP);
 	version_bitmap = __bitmap_ptr(sbi, NAT_BITMAP);
-	if (!version_bitmap)
-		return -EFAULT;
-
 	nm_i->nat_bitmap = kmemdup(version_bitmap, nm_i->bitmap_size,
 					GFP_KERNEL);
 	if (!nm_i->nat_bitmap)
···
 	kvfree(nm_i->nat_bitmap_mir);
 #endif
 	sbi->nm_info = NULL;
-	kvfree(nm_i);
+	kfree(nm_i);
 }

 int __init f2fs_create_node_manager_caches(void)
+426 -96
fs/f2fs/segment.c
···

 	f2fs_trace_pid(page);

-	f2fs_set_page_private(page, (unsigned long)ATOMIC_WRITTEN_PAGE);
+	f2fs_set_page_private(page, ATOMIC_WRITTEN_PAGE);

 	new = f2fs_kmem_cache_alloc(inmem_entry_slab, GFP_NOFS);

···
 				"f2fs_flush-%u:%u", MAJOR(dev), MINOR(dev));
 	if (IS_ERR(fcc->f2fs_issue_flush)) {
 		err = PTR_ERR(fcc->f2fs_issue_flush);
-		kvfree(fcc);
+		kfree(fcc);
 		SM_I(sbi)->fcc_info = NULL;
 		return err;
 	}
···
 		kthread_stop(flush_thread);
 	}
 	if (free) {
-		kvfree(fcc);
+		kfree(fcc);
 		SM_I(sbi)->fcc_info = NULL;
 	}
 }
···
 	int ret = 0, i;

 	if (!f2fs_is_multi_device(sbi))
+		return 0;
+
+	if (test_opt(sbi, NOBARRIER))
 		return 0;

 	for (i = 1; i < sbi->s_ndevs; i++) {
···
 {
 	struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
 	unsigned short valid_blocks, ckpt_valid_blocks;
+	unsigned int usable_blocks;

 	if (segno == NULL_SEGNO || IS_CURSEG(sbi, segno))
 		return;

+	usable_blocks = f2fs_usable_blks_in_seg(sbi, segno);
 	mutex_lock(&dirty_i->seglist_lock);

 	valid_blocks = get_valid_blocks(sbi, segno, false);
 	ckpt_valid_blocks = get_ckpt_valid_blocks(sbi, segno);

 	if (valid_blocks == 0 && (!is_sbi_flag_set(sbi, SBI_CP_DISABLED) ||
-				ckpt_valid_blocks == sbi->blocks_per_seg)) {
+		ckpt_valid_blocks == usable_blocks)) {
 		__locate_dirty_segment(sbi, segno, PRE);
 		__remove_dirty_segment(sbi, segno, DIRTY);
-	} else if (valid_blocks < sbi->blocks_per_seg) {
+	} else if (valid_blocks < usable_blocks) {
 		__locate_dirty_segment(sbi, segno, DIRTY);
 	} else {
 		/* Recovery routine with SSR needs this */
···
 	for_each_set_bit(segno, dirty_i->dirty_segmap[DIRTY], MAIN_SEGS(sbi)) {
 		se = get_seg_entry(sbi, segno);
 		if (IS_NODESEG(se->type))
-			holes[NODE] += sbi->blocks_per_seg - se->valid_blocks;
+			holes[NODE] += f2fs_usable_blks_in_seg(sbi, segno) -
+							se->valid_blocks;
 		else
-			holes[DATA] += sbi->blocks_per_seg - se->valid_blocks;
+			holes[DATA] += f2fs_usable_blks_in_seg(sbi, segno) -
+							se->valid_blocks;
 	}
 	mutex_unlock(&dirty_i->seglist_lock);

···
 			goto next;
 		if (unlikely(dcc->rbtree_check))
 			f2fs_bug_on(sbi, !f2fs_check_rb_tree_consistence(sbi,
-								&dcc->root));
+							&dcc->root, false));
 		blk_start_plug(&plug);
 		list_for_each_entry_safe(dc, tmp, pend_list, list) {
 			f2fs_bug_on(sbi, dc->state != D_PREP);
···

 	mutex_lock(&dirty_i->seglist_lock);
 	for_each_set_bit(segno, dirty_i->dirty_segmap[PRE], MAIN_SEGS(sbi))
-		__set_test_and_free(sbi, segno);
+		__set_test_and_free(sbi, segno, false);
 	mutex_unlock(&dirty_i->seglist_lock);
 }

···
 				"f2fs_discard-%u:%u", MAJOR(dev), MINOR(dev));
 	if (IS_ERR(dcc->f2fs_issue_discard)) {
 		err = PTR_ERR(dcc->f2fs_issue_discard);
-		kvfree(dcc);
+		kfree(dcc);
 		SM_I(sbi)->dcc_info = NULL;
 		return err;
 	}
···
 	if (unlikely(atomic_read(&dcc->discard_cmd_cnt)))
 		f2fs_issue_discard_timeout(sbi);

-	kvfree(dcc);
+	kfree(dcc);
 	SM_I(sbi)->dcc_info = NULL;
 }

···
 		__mark_sit_entry_dirty(sbi, segno);
 }

+static inline unsigned long long get_segment_mtime(struct f2fs_sb_info *sbi,
+								block_t blkaddr)
+{
+	unsigned int segno = GET_SEGNO(sbi, blkaddr);
+
+	if (segno == NULL_SEGNO)
+		return 0;
+	return get_seg_entry(sbi, segno)->mtime;
+}
+
+static void update_segment_mtime(struct f2fs_sb_info *sbi, block_t blkaddr,
+						unsigned long long old_mtime)
+{
+	struct seg_entry *se;
+	unsigned int segno = GET_SEGNO(sbi, blkaddr);
+	unsigned long long ctime = get_mtime(sbi, false);
+	unsigned long long mtime = old_mtime ? old_mtime : ctime;
+
+	if (segno == NULL_SEGNO)
+		return;
+
+	se = get_seg_entry(sbi, segno);
+
+	if (!se->mtime)
+		se->mtime = mtime;
+	else
+		se->mtime = div_u64(se->mtime * se->valid_blocks + mtime,
+						se->valid_blocks + 1);
+
+	if (ctime > SIT_I(sbi)->max_mtime)
+		SIT_I(sbi)->max_mtime = ctime;
+}
+
 static void update_sit_entry(struct f2fs_sb_info *sbi, block_t blkaddr, int del)
 {
 	struct seg_entry *se;
···
 	offset = GET_BLKOFF_FROM_SEG0(sbi, blkaddr);

 	f2fs_bug_on(sbi, (new_vblocks < 0 ||
-				(new_vblocks > sbi->blocks_per_seg)));
+			(new_vblocks > f2fs_usable_blks_in_seg(sbi, segno))));

 	se->valid_blocks = new_vblocks;
-	se->mtime = get_mtime(sbi, false);
-	if (se->mtime > SIT_I(sbi)->max_mtime)
-		SIT_I(sbi)->max_mtime = se->mtime;

 	/* Update valid block bitmap */
 	if (del > 0) {
···
 	/* add it into sit main buffer */
 	down_write(&sit_i->sentry_lock);

+	update_segment_mtime(sbi, addr, 0);
 	update_sit_entry(sbi, addr, -1);

 	/* add it into dirty seglist */
···
  */
 struct page *f2fs_get_sum_page(struct f2fs_sb_info *sbi, unsigned int segno)
 {
-	return f2fs_get_meta_page_nofail(sbi, GET_SUM_BLOCK(sbi, segno));
+	if (unlikely(f2fs_cp_error(sbi)))
+		return ERR_PTR(-EIO);
+	return f2fs_get_meta_page_retry(sbi, GET_SUM_BLOCK(sbi, segno));
 }

 void f2fs_update_meta_page(struct f2fs_sb_info *sbi,
···
 	f2fs_put_page(page, 1);
 }

-static int is_next_segment_free(struct f2fs_sb_info *sbi, int type)
+static int is_next_segment_free(struct f2fs_sb_info *sbi,
+				struct curseg_info *curseg, int type)
 {
-	struct curseg_info *curseg = CURSEG_I(sbi, type);
 	unsigned int segno = curseg->segno + 1;
 	struct free_segmap_info *free_i = FREE_I(sbi);

···
 {
 	struct curseg_info *curseg = CURSEG_I(sbi, type);
 	struct summary_footer *sum_footer;
+	unsigned short seg_type = curseg->seg_type;

+	curseg->inited = true;
 	curseg->segno = curseg->next_segno;
 	curseg->zone = GET_ZONE_FROM_SEG(sbi, curseg->segno);
 	curseg->next_blkoff = 0;
···

 	sum_footer = &(curseg->sum_blk->footer);
 	memset(sum_footer, 0, sizeof(struct summary_footer));
-	if (IS_DATASEG(type))
+
+	sanity_check_seg_type(sbi, seg_type);
+
+	if (IS_DATASEG(seg_type))
 		SET_SUM_TYPE(sum_footer, SUM_TYPE_DATA);
-	if (IS_NODESEG(type))
+	if (IS_NODESEG(seg_type))
 		SET_SUM_TYPE(sum_footer, SUM_TYPE_NODE);
-	__set_sit_entry_type(sbi, type, curseg->segno, modified);
+	__set_sit_entry_type(sbi, seg_type, curseg->segno, modified);
 }

 static unsigned int __get_next_segno(struct f2fs_sb_info *sbi, int type)
 {
+	struct curseg_info *curseg = CURSEG_I(sbi, type);
+	unsigned short seg_type = curseg->seg_type;
+
+	sanity_check_seg_type(sbi, seg_type);
+
 	/* if segs_per_sec is large than 1, we need to keep original policy. */
 	if (__is_large_section(sbi))
-		return CURSEG_I(sbi, type)->segno;
+		return curseg->segno;
+
+	/* inmem log may not locate on any segment after mount */
+	if (!curseg->inited)
+		return 0;

 	if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED)))
 		return 0;

 	if (test_opt(sbi, NOHEAP) &&
-		(type == CURSEG_HOT_DATA || IS_NODESEG(type)))
+		(seg_type == CURSEG_HOT_DATA || IS_NODESEG(seg_type)))
 		return 0;

 	if (SIT_I(sbi)->last_victim[ALLOC_NEXT])
···
 	if (F2FS_OPTION(sbi).alloc_mode == ALLOC_MODE_REUSE)
 		return 0;

-	return CURSEG_I(sbi, type)->segno;
+	return curseg->segno;
 }

 /*
···
 static void new_curseg(struct f2fs_sb_info *sbi, int type, bool new_sec)
 {
 	struct curseg_info *curseg = CURSEG_I(sbi, type);
+	unsigned short seg_type = curseg->seg_type;
 	unsigned int segno = curseg->segno;
 	int dir = ALLOC_LEFT;

-	write_sum_page(sbi, curseg->sum_blk,
+	if (curseg->inited)
+		write_sum_page(sbi, curseg->sum_blk,
 				GET_SUM_BLOCK(sbi, segno));
-	if (type == CURSEG_WARM_DATA || type == CURSEG_COLD_DATA)
+	if (seg_type == CURSEG_WARM_DATA || seg_type == CURSEG_COLD_DATA)
 		dir = ALLOC_RIGHT;

 	if (test_opt(sbi, NOHEAP))
···
  * This function always allocates a used segment(from dirty seglist) by SSR
  * manner, so it should recover the existing segment information of valid blocks
  */
-static void change_curseg(struct f2fs_sb_info *sbi, int type)
+static void change_curseg(struct f2fs_sb_info *sbi, int type, bool flush)
 {
 	struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
 	struct curseg_info *curseg = CURSEG_I(sbi, type);
···
 	struct f2fs_summary_block *sum_node;
 	struct page *sum_page;
2660 2604 2661 - write_sum_page(sbi, curseg->sum_blk, 2662 - GET_SUM_BLOCK(sbi, curseg->segno)); 2605 + if (flush) 2606 + write_sum_page(sbi, curseg->sum_blk, 2607 + GET_SUM_BLOCK(sbi, curseg->segno)); 2608 + 2663 2609 __set_test_and_inuse(sbi, new_segno); 2664 2610 2665 2611 mutex_lock(&dirty_i->seglist_lock); ··· 2674 2616 __next_free_blkoff(sbi, curseg, 0); 2675 2617 2676 2618 sum_page = f2fs_get_sum_page(sbi, new_segno); 2677 - f2fs_bug_on(sbi, IS_ERR(sum_page)); 2619 + if (IS_ERR(sum_page)) { 2620 + /* GC won't be able to use stale summary pages by cp_error */ 2621 + memset(curseg->sum_blk, 0, SUM_ENTRY_SIZE); 2622 + return; 2623 + } 2678 2624 sum_node = (struct f2fs_summary_block *)page_address(sum_page); 2679 2625 memcpy(curseg->sum_blk, sum_node, SUM_ENTRY_SIZE); 2680 2626 f2fs_put_page(sum_page, 1); 2681 2627 } 2682 2628 2683 - static int get_ssr_segment(struct f2fs_sb_info *sbi, int type) 2629 + static int get_ssr_segment(struct f2fs_sb_info *sbi, int type, 2630 + int alloc_mode, unsigned long long age); 2631 + 2632 + static void get_atssr_segment(struct f2fs_sb_info *sbi, int type, 2633 + int target_type, int alloc_mode, 2634 + unsigned long long age) 2635 + { 2636 + struct curseg_info *curseg = CURSEG_I(sbi, type); 2637 + 2638 + curseg->seg_type = target_type; 2639 + 2640 + if (get_ssr_segment(sbi, type, alloc_mode, age)) { 2641 + struct seg_entry *se = get_seg_entry(sbi, curseg->next_segno); 2642 + 2643 + curseg->seg_type = se->type; 2644 + change_curseg(sbi, type, true); 2645 + } else { 2646 + /* allocate cold segment by default */ 2647 + curseg->seg_type = CURSEG_COLD_DATA; 2648 + new_curseg(sbi, type, true); 2649 + } 2650 + stat_inc_seg_type(sbi, curseg); 2651 + } 2652 + 2653 + static void __f2fs_init_atgc_curseg(struct f2fs_sb_info *sbi) 2654 + { 2655 + struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_ALL_DATA_ATGC); 2656 + 2657 + if (!sbi->am.atgc_enabled) 2658 + return; 2659 + 2660 + down_read(&SM_I(sbi)->curseg_lock); 2661 + 2662 + 
mutex_lock(&curseg->curseg_mutex); 2663 + down_write(&SIT_I(sbi)->sentry_lock); 2664 + 2665 + get_atssr_segment(sbi, CURSEG_ALL_DATA_ATGC, CURSEG_COLD_DATA, SSR, 0); 2666 + 2667 + up_write(&SIT_I(sbi)->sentry_lock); 2668 + mutex_unlock(&curseg->curseg_mutex); 2669 + 2670 + up_read(&SM_I(sbi)->curseg_lock); 2671 + 2672 + } 2673 + void f2fs_init_inmem_curseg(struct f2fs_sb_info *sbi) 2674 + { 2675 + __f2fs_init_atgc_curseg(sbi); 2676 + } 2677 + 2678 + static void __f2fs_save_inmem_curseg(struct f2fs_sb_info *sbi, int type) 2679 + { 2680 + struct curseg_info *curseg = CURSEG_I(sbi, type); 2681 + 2682 + mutex_lock(&curseg->curseg_mutex); 2683 + if (!curseg->inited) 2684 + goto out; 2685 + 2686 + if (get_valid_blocks(sbi, curseg->segno, false)) { 2687 + write_sum_page(sbi, curseg->sum_blk, 2688 + GET_SUM_BLOCK(sbi, curseg->segno)); 2689 + } else { 2690 + mutex_lock(&DIRTY_I(sbi)->seglist_lock); 2691 + __set_test_and_free(sbi, curseg->segno, true); 2692 + mutex_unlock(&DIRTY_I(sbi)->seglist_lock); 2693 + } 2694 + out: 2695 + mutex_unlock(&curseg->curseg_mutex); 2696 + } 2697 + 2698 + void f2fs_save_inmem_curseg(struct f2fs_sb_info *sbi) 2699 + { 2700 + __f2fs_save_inmem_curseg(sbi, CURSEG_COLD_DATA_PINNED); 2701 + 2702 + if (sbi->am.atgc_enabled) 2703 + __f2fs_save_inmem_curseg(sbi, CURSEG_ALL_DATA_ATGC); 2704 + } 2705 + 2706 + static void __f2fs_restore_inmem_curseg(struct f2fs_sb_info *sbi, int type) 2707 + { 2708 + struct curseg_info *curseg = CURSEG_I(sbi, type); 2709 + 2710 + mutex_lock(&curseg->curseg_mutex); 2711 + if (!curseg->inited) 2712 + goto out; 2713 + if (get_valid_blocks(sbi, curseg->segno, false)) 2714 + goto out; 2715 + 2716 + mutex_lock(&DIRTY_I(sbi)->seglist_lock); 2717 + __set_test_and_inuse(sbi, curseg->segno); 2718 + mutex_unlock(&DIRTY_I(sbi)->seglist_lock); 2719 + out: 2720 + mutex_unlock(&curseg->curseg_mutex); 2721 + } 2722 + 2723 + void f2fs_restore_inmem_curseg(struct f2fs_sb_info *sbi) 2724 + { 2725 + __f2fs_restore_inmem_curseg(sbi, 
CURSEG_COLD_DATA_PINNED); 2726 + 2727 + if (sbi->am.atgc_enabled) 2728 + __f2fs_restore_inmem_curseg(sbi, CURSEG_ALL_DATA_ATGC); 2729 + } 2730 + 2731 + static int get_ssr_segment(struct f2fs_sb_info *sbi, int type, 2732 + int alloc_mode, unsigned long long age) 2684 2733 { 2685 2734 struct curseg_info *curseg = CURSEG_I(sbi, type); 2686 2735 const struct victim_selection *v_ops = DIRTY_I(sbi)->v_ops; 2687 2736 unsigned segno = NULL_SEGNO; 2737 + unsigned short seg_type = curseg->seg_type; 2688 2738 int i, cnt; 2689 2739 bool reversed = false; 2690 2740 2741 + sanity_check_seg_type(sbi, seg_type); 2742 + 2691 2743 /* f2fs_need_SSR() already forces to do this */ 2692 - if (!v_ops->get_victim(sbi, &segno, BG_GC, type, SSR)) { 2744 + if (!v_ops->get_victim(sbi, &segno, BG_GC, seg_type, alloc_mode, age)) { 2693 2745 curseg->next_segno = segno; 2694 2746 return 1; 2695 2747 } 2696 2748 2697 2749 /* For node segments, let's do SSR more intensively */ 2698 - if (IS_NODESEG(type)) { 2699 - if (type >= CURSEG_WARM_NODE) { 2750 + if (IS_NODESEG(seg_type)) { 2751 + if (seg_type >= CURSEG_WARM_NODE) { 2700 2752 reversed = true; 2701 2753 i = CURSEG_COLD_NODE; 2702 2754 } else { ··· 2814 2646 } 2815 2647 cnt = NR_CURSEG_NODE_TYPE; 2816 2648 } else { 2817 - if (type >= CURSEG_WARM_DATA) { 2649 + if (seg_type >= CURSEG_WARM_DATA) { 2818 2650 reversed = true; 2819 2651 i = CURSEG_COLD_DATA; 2820 2652 } else { ··· 2824 2656 } 2825 2657 2826 2658 for (; cnt-- > 0; reversed ? 
i-- : i++) { 2827 - if (i == type) 2659 + if (i == seg_type) 2828 2660 continue; 2829 - if (!v_ops->get_victim(sbi, &segno, BG_GC, i, SSR)) { 2661 + if (!v_ops->get_victim(sbi, &segno, BG_GC, i, alloc_mode, age)) { 2830 2662 curseg->next_segno = segno; 2831 2663 return 1; 2832 2664 } ··· 2855 2687 if (force) 2856 2688 new_curseg(sbi, type, true); 2857 2689 else if (!is_set_ckpt_flags(sbi, CP_CRC_RECOVERY_FLAG) && 2858 - type == CURSEG_WARM_NODE) 2690 + curseg->seg_type == CURSEG_WARM_NODE) 2859 2691 new_curseg(sbi, type, false); 2860 - else if (curseg->alloc_type == LFS && is_next_segment_free(sbi, type) && 2692 + else if (curseg->alloc_type == LFS && 2693 + is_next_segment_free(sbi, curseg, type) && 2861 2694 likely(!is_sbi_flag_set(sbi, SBI_CP_DISABLED))) 2862 2695 new_curseg(sbi, type, false); 2863 - else if (f2fs_need_SSR(sbi) && get_ssr_segment(sbi, type)) 2864 - change_curseg(sbi, type); 2696 + else if (f2fs_need_SSR(sbi) && 2697 + get_ssr_segment(sbi, type, SSR, 0)) 2698 + change_curseg(sbi, type, true); 2865 2699 else 2866 2700 new_curseg(sbi, type, false); 2867 2701 ··· 2884 2714 if (segno < start || segno > end) 2885 2715 goto unlock; 2886 2716 2887 - if (f2fs_need_SSR(sbi) && get_ssr_segment(sbi, type)) 2888 - change_curseg(sbi, type); 2717 + if (f2fs_need_SSR(sbi) && get_ssr_segment(sbi, type, SSR, 0)) 2718 + change_curseg(sbi, type, true); 2889 2719 else 2890 2720 new_curseg(sbi, type, true); 2891 2721 ··· 2908 2738 struct curseg_info *curseg = CURSEG_I(sbi, type); 2909 2739 unsigned int old_segno; 2910 2740 2741 + if (!curseg->inited) 2742 + goto alloc; 2743 + 2911 2744 if (!curseg->next_blkoff && 2912 2745 !get_valid_blocks(sbi, curseg->segno, false) && 2913 2746 !get_ckpt_valid_blocks(sbi, curseg->segno)) 2914 2747 return; 2915 2748 2749 + alloc: 2916 2750 old_segno = curseg->segno; 2917 2751 SIT_I(sbi)->s_ops->allocate_segment(sbi, type, true); 2918 2752 locate_dirty_segment(sbi, old_segno); ··· 2980 2806 mutex_lock(&dcc->cmd_lock); 2981 2807 if 
(unlikely(dcc->rbtree_check)) 2982 2808 f2fs_bug_on(sbi, !f2fs_check_rb_tree_consistence(sbi, 2983 - &dcc->root)); 2809 + &dcc->root, false)); 2984 2810 2985 2811 dc = (struct discard_cmd *)f2fs_lookup_rb_tree_ret(&dcc->root, 2986 2812 NULL, start, ··· 3104 2930 return err; 3105 2931 } 3106 2932 3107 - static bool __has_curseg_space(struct f2fs_sb_info *sbi, int type) 2933 + static bool __has_curseg_space(struct f2fs_sb_info *sbi, 2934 + struct curseg_info *curseg) 3108 2935 { 3109 - struct curseg_info *curseg = CURSEG_I(sbi, type); 3110 - if (curseg->next_blkoff < sbi->blocks_per_seg) 3111 - return true; 3112 - return false; 2936 + return curseg->next_blkoff < f2fs_usable_blks_in_seg(sbi, 2937 + curseg->segno); 3113 2938 } 3114 2939 3115 2940 int f2fs_rw_hint_to_seg_type(enum rw_hint hint) ··· 3248 3075 if (fio->type == DATA) { 3249 3076 struct inode *inode = fio->page->mapping->host; 3250 3077 3251 - if (is_cold_data(fio->page) || file_is_cold(inode) || 3252 - f2fs_compressed_file(inode)) 3078 + if (is_cold_data(fio->page)) { 3079 + if (fio->sbi->am.atgc_enabled) 3080 + return CURSEG_ALL_DATA_ATGC; 3081 + else 3082 + return CURSEG_COLD_DATA; 3083 + } 3084 + if (file_is_cold(inode) || f2fs_compressed_file(inode)) 3253 3085 return CURSEG_COLD_DATA; 3254 3086 if (file_is_hot(inode) || 3255 3087 is_inode_flag_set(inode, FI_HOT_DATA) || ··· 3304 3126 { 3305 3127 struct sit_info *sit_i = SIT_I(sbi); 3306 3128 struct curseg_info *curseg = CURSEG_I(sbi, type); 3307 - bool put_pin_sem = false; 3308 - 3309 - if (type == CURSEG_COLD_DATA) { 3310 - /* GC during CURSEG_COLD_DATA_PINNED allocation */ 3311 - if (down_read_trylock(&sbi->pin_sem)) { 3312 - put_pin_sem = true; 3313 - } else { 3314 - type = CURSEG_WARM_DATA; 3315 - curseg = CURSEG_I(sbi, type); 3316 - } 3317 - } else if (type == CURSEG_COLD_DATA_PINNED) { 3318 - type = CURSEG_COLD_DATA; 3319 - } 3129 + unsigned long long old_mtime; 3130 + bool from_gc = (type == CURSEG_ALL_DATA_ATGC); 3131 + struct seg_entry *se = 
NULL; 3320 3132 3321 3133 down_read(&SM_I(sbi)->curseg_lock); 3322 3134 3323 3135 mutex_lock(&curseg->curseg_mutex); 3324 3136 down_write(&sit_i->sentry_lock); 3325 3137 3138 + if (from_gc) { 3139 + f2fs_bug_on(sbi, GET_SEGNO(sbi, old_blkaddr) == NULL_SEGNO); 3140 + se = get_seg_entry(sbi, GET_SEGNO(sbi, old_blkaddr)); 3141 + sanity_check_seg_type(sbi, se->type); 3142 + f2fs_bug_on(sbi, IS_NODESEG(se->type)); 3143 + } 3326 3144 *new_blkaddr = NEXT_FREE_BLKADDR(sbi, curseg); 3145 + 3146 + f2fs_bug_on(sbi, curseg->next_blkoff >= sbi->blocks_per_seg); 3327 3147 3328 3148 f2fs_wait_discard_bio(sbi, *new_blkaddr); 3329 3149 ··· 3336 3160 3337 3161 stat_inc_block_count(sbi, curseg); 3338 3162 3163 + if (from_gc) { 3164 + old_mtime = get_segment_mtime(sbi, old_blkaddr); 3165 + } else { 3166 + update_segment_mtime(sbi, old_blkaddr, 0); 3167 + old_mtime = 0; 3168 + } 3169 + update_segment_mtime(sbi, *new_blkaddr, old_mtime); 3170 + 3339 3171 /* 3340 3172 * SIT information should be updated before segment allocation, 3341 3173 * since SSR needs latest valid block information. 
··· 3352 3168 if (GET_SEGNO(sbi, old_blkaddr) != NULL_SEGNO) 3353 3169 update_sit_entry(sbi, old_blkaddr, -1); 3354 3170 3355 - if (!__has_curseg_space(sbi, type)) 3356 - sit_i->s_ops->allocate_segment(sbi, type, false); 3357 - 3171 + if (!__has_curseg_space(sbi, curseg)) { 3172 + if (from_gc) 3173 + get_atssr_segment(sbi, type, se->type, 3174 + AT_SSR, se->mtime); 3175 + else 3176 + sit_i->s_ops->allocate_segment(sbi, type, false); 3177 + } 3358 3178 /* 3359 3179 * segment dirty status should be updated after segment allocation, 3360 3180 * so we just need to update status only one time after previous ··· 3392 3204 mutex_unlock(&curseg->curseg_mutex); 3393 3205 3394 3206 up_read(&SM_I(sbi)->curseg_lock); 3395 - 3396 - if (put_pin_sem) 3397 - up_read(&sbi->pin_sem); 3398 3207 } 3399 3208 3400 3209 static void update_device_state(struct f2fs_io_info *fio) ··· 3540 3355 3541 3356 void f2fs_do_replace_block(struct f2fs_sb_info *sbi, struct f2fs_summary *sum, 3542 3357 block_t old_blkaddr, block_t new_blkaddr, 3543 - bool recover_curseg, bool recover_newaddr) 3358 + bool recover_curseg, bool recover_newaddr, 3359 + bool from_gc) 3544 3360 { 3545 3361 struct sit_info *sit_i = SIT_I(sbi); 3546 3362 struct curseg_info *curseg; ··· 3586 3400 /* change the current segment */ 3587 3401 if (segno != curseg->segno) { 3588 3402 curseg->next_segno = segno; 3589 - change_curseg(sbi, type); 3403 + change_curseg(sbi, type, true); 3590 3404 } 3591 3405 3592 3406 curseg->next_blkoff = GET_BLKOFF_FROM_SEG0(sbi, new_blkaddr); 3593 3407 __add_sum_entry(sbi, type, sum); 3594 3408 3595 - if (!recover_curseg || recover_newaddr) 3409 + if (!recover_curseg || recover_newaddr) { 3410 + if (!from_gc) 3411 + update_segment_mtime(sbi, new_blkaddr, 0); 3596 3412 update_sit_entry(sbi, new_blkaddr, 1); 3413 + } 3597 3414 if (GET_SEGNO(sbi, old_blkaddr) != NULL_SEGNO) { 3598 3415 invalidate_mapping_pages(META_MAPPING(sbi), 3599 3416 old_blkaddr, old_blkaddr); 3417 + if (!from_gc) 3418 + 
update_segment_mtime(sbi, old_blkaddr, 0); 3600 3419 update_sit_entry(sbi, old_blkaddr, -1); 3601 3420 } 3602 3421 ··· 3613 3422 if (recover_curseg) { 3614 3423 if (old_cursegno != curseg->segno) { 3615 3424 curseg->next_segno = old_cursegno; 3616 - change_curseg(sbi, type); 3425 + change_curseg(sbi, type, true); 3617 3426 } 3618 3427 curseg->next_blkoff = old_blkoff; 3619 3428 } ··· 3633 3442 set_summary(&sum, dn->nid, dn->ofs_in_node, version); 3634 3443 3635 3444 f2fs_do_replace_block(sbi, &sum, old_addr, new_addr, 3636 - recover_curseg, recover_newaddr); 3445 + recover_curseg, recover_newaddr, false); 3637 3446 3638 3447 f2fs_update_data_blkaddr(dn, new_addr); 3639 3448 } ··· 3765 3574 blk_off = le16_to_cpu(ckpt->cur_data_blkoff[type - 3766 3575 CURSEG_HOT_DATA]); 3767 3576 if (__exist_node_summaries(sbi)) 3768 - blk_addr = sum_blk_addr(sbi, NR_CURSEG_TYPE, type); 3577 + blk_addr = sum_blk_addr(sbi, NR_CURSEG_PERSIST_TYPE, type); 3769 3578 else 3770 3579 blk_addr = sum_blk_addr(sbi, NR_CURSEG_DATA_TYPE, type); 3771 3580 } else { ··· 3843 3652 } 3844 3653 3845 3654 if (__exist_node_summaries(sbi)) 3846 - f2fs_ra_meta_pages(sbi, sum_blk_addr(sbi, NR_CURSEG_TYPE, type), 3847 - NR_CURSEG_TYPE - type, META_CP, true); 3655 + f2fs_ra_meta_pages(sbi, 3656 + sum_blk_addr(sbi, NR_CURSEG_PERSIST_TYPE, type), 3657 + NR_CURSEG_PERSIST_TYPE - type, META_CP, true); 3848 3658 3849 3659 for (; type <= CURSEG_COLD_NODE; type++) { 3850 3660 err = read_normal_summaries(sbi, type); ··· 3973 3781 static struct page *get_current_sit_page(struct f2fs_sb_info *sbi, 3974 3782 unsigned int segno) 3975 3783 { 3976 - return f2fs_get_meta_page_nofail(sbi, current_sit_addr(sbi, segno)); 3784 + return f2fs_get_meta_page(sbi, current_sit_addr(sbi, segno)); 3977 3785 } 3978 3786 3979 3787 static struct page *get_next_sit_page(struct f2fs_sb_info *sbi, ··· 4347 4155 struct curseg_info *array; 4348 4156 int i; 4349 4157 4350 - array = f2fs_kzalloc(sbi, array_size(NR_CURSEG_TYPE, sizeof(*array)), 
4351 - GFP_KERNEL); 4158 + array = f2fs_kzalloc(sbi, array_size(NR_CURSEG_TYPE, 4159 + sizeof(*array)), GFP_KERNEL); 4352 4160 if (!array) 4353 4161 return -ENOMEM; 4354 4162 4355 4163 SM_I(sbi)->curseg_array = array; 4356 4164 4357 - for (i = 0; i < NR_CURSEG_TYPE; i++) { 4165 + for (i = 0; i < NO_CHECK_TYPE; i++) { 4358 4166 mutex_init(&array[i].curseg_mutex); 4359 4167 array[i].sum_blk = f2fs_kzalloc(sbi, PAGE_SIZE, GFP_KERNEL); 4360 4168 if (!array[i].sum_blk) ··· 4364 4172 sizeof(struct f2fs_journal), GFP_KERNEL); 4365 4173 if (!array[i].journal) 4366 4174 return -ENOMEM; 4175 + if (i < NR_PERSISTENT_LOG) 4176 + array[i].seg_type = CURSEG_HOT_DATA + i; 4177 + else if (i == CURSEG_COLD_DATA_PINNED) 4178 + array[i].seg_type = CURSEG_COLD_DATA; 4179 + else if (i == CURSEG_ALL_DATA_ATGC) 4180 + array[i].seg_type = CURSEG_COLD_DATA; 4367 4181 array[i].segno = NULL_SEGNO; 4368 4182 array[i].next_blkoff = 0; 4183 + array[i].inited = false; 4369 4184 } 4370 4185 return restore_curseg_summaries(sbi); 4371 4186 } ··· 4493 4294 { 4494 4295 unsigned int start; 4495 4296 int type; 4297 + struct seg_entry *sentry; 4496 4298 4497 4299 for (start = 0; start < MAIN_SEGS(sbi); start++) { 4498 - struct seg_entry *sentry = get_seg_entry(sbi, start); 4300 + if (f2fs_usable_blks_in_seg(sbi, start) == 0) 4301 + continue; 4302 + sentry = get_seg_entry(sbi, start); 4499 4303 if (!sentry->valid_blocks) 4500 4304 __set_free(sbi, start); 4501 4305 else ··· 4518 4316 struct dirty_seglist_info *dirty_i = DIRTY_I(sbi); 4519 4317 struct free_segmap_info *free_i = FREE_I(sbi); 4520 4318 unsigned int segno = 0, offset = 0, secno; 4521 - block_t valid_blocks; 4319 + block_t valid_blocks, usable_blks_in_seg; 4522 4320 block_t blks_per_sec = BLKS_PER_SEC(sbi); 4523 4321 4524 4322 while (1) { ··· 4528 4326 break; 4529 4327 offset = segno + 1; 4530 4328 valid_blocks = get_valid_blocks(sbi, segno, false); 4531 - if (valid_blocks == sbi->blocks_per_seg || !valid_blocks) 4329 + usable_blks_in_seg = 
f2fs_usable_blks_in_seg(sbi, segno); 4330 + if (valid_blocks == usable_blks_in_seg || !valid_blocks) 4532 4331 continue; 4533 - if (valid_blocks > sbi->blocks_per_seg) { 4332 + if (valid_blocks > usable_blks_in_seg) { 4534 4333 f2fs_bug_on(sbi, 1); 4535 4334 continue; 4536 4335 } ··· 4611 4408 * In LFS/SSR curseg, .next_blkoff should point to an unused blkaddr; 4612 4409 * In LFS curseg, all blkaddr after .next_blkoff should be unused. 4613 4410 */ 4614 - for (i = 0; i < NO_CHECK_TYPE; i++) { 4411 + for (i = 0; i < NR_PERSISTENT_LOG; i++) { 4615 4412 struct curseg_info *curseg = CURSEG_I(sbi, i); 4616 4413 struct seg_entry *se = get_seg_entry(sbi, curseg->segno); 4617 4414 unsigned int blkofs = curseg->next_blkoff; 4415 + 4416 + sanity_check_seg_type(sbi, curseg->seg_type); 4618 4417 4619 4418 if (f2fs_test_bit(blkofs, se->cur_valid_map)) 4620 4419 goto out; ··· 4842 4637 { 4843 4638 int i, ret; 4844 4639 4845 - for (i = 0; i < NO_CHECK_TYPE; i++) { 4640 + for (i = 0; i < NR_PERSISTENT_LOG; i++) { 4846 4641 ret = fix_curseg_write_pointer(sbi, i); 4847 4642 if (ret) 4848 4643 return ret; ··· 4883 4678 4884 4679 return 0; 4885 4680 } 4681 + 4682 + static bool is_conv_zone(struct f2fs_sb_info *sbi, unsigned int zone_idx, 4683 + unsigned int dev_idx) 4684 + { 4685 + if (!bdev_is_zoned(FDEV(dev_idx).bdev)) 4686 + return true; 4687 + return !test_bit(zone_idx, FDEV(dev_idx).blkz_seq); 4688 + } 4689 + 4690 + /* Return the zone index in the given device */ 4691 + static unsigned int get_zone_idx(struct f2fs_sb_info *sbi, unsigned int secno, 4692 + int dev_idx) 4693 + { 4694 + block_t sec_start_blkaddr = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, secno)); 4695 + 4696 + return (sec_start_blkaddr - FDEV(dev_idx).start_blk) >> 4697 + sbi->log_blocks_per_blkz; 4698 + } 4699 + 4700 + /* 4701 + * Return the usable segments in a section based on the zone's 4702 + * corresponding zone capacity. Zone is equal to a section. 
4703 + */ 4704 + static inline unsigned int f2fs_usable_zone_segs_in_sec( 4705 + struct f2fs_sb_info *sbi, unsigned int segno) 4706 + { 4707 + unsigned int dev_idx, zone_idx, unusable_segs_in_sec; 4708 + 4709 + dev_idx = f2fs_target_device_index(sbi, START_BLOCK(sbi, segno)); 4710 + zone_idx = get_zone_idx(sbi, GET_SEC_FROM_SEG(sbi, segno), dev_idx); 4711 + 4712 + /* Conventional zone's capacity is always equal to zone size */ 4713 + if (is_conv_zone(sbi, zone_idx, dev_idx)) 4714 + return sbi->segs_per_sec; 4715 + 4716 + /* 4717 + * If the zone_capacity_blocks array is NULL, then zone capacity 4718 + * is equal to the zone size for all zones 4719 + */ 4720 + if (!FDEV(dev_idx).zone_capacity_blocks) 4721 + return sbi->segs_per_sec; 4722 + 4723 + /* Get the segment count beyond zone capacity block */ 4724 + unusable_segs_in_sec = (sbi->blocks_per_blkz - 4725 + FDEV(dev_idx).zone_capacity_blocks[zone_idx]) >> 4726 + sbi->log_blocks_per_seg; 4727 + return sbi->segs_per_sec - unusable_segs_in_sec; 4728 + } 4729 + 4730 + /* 4731 + * Return the number of usable blocks in a segment. The number of blocks 4732 + * returned is always equal to the number of blocks in a segment for 4733 + * segments fully contained within a sequential zone capacity or a 4734 + * conventional zone. For segments partially contained in a sequential 4735 + * zone capacity, the number of usable blocks up to the zone capacity 4736 + * is returned. 0 is returned in all other cases. 
4737 + */ 4738 + static inline unsigned int f2fs_usable_zone_blks_in_seg( 4739 + struct f2fs_sb_info *sbi, unsigned int segno) 4740 + { 4741 + block_t seg_start, sec_start_blkaddr, sec_cap_blkaddr; 4742 + unsigned int zone_idx, dev_idx, secno; 4743 + 4744 + secno = GET_SEC_FROM_SEG(sbi, segno); 4745 + seg_start = START_BLOCK(sbi, segno); 4746 + dev_idx = f2fs_target_device_index(sbi, seg_start); 4747 + zone_idx = get_zone_idx(sbi, secno, dev_idx); 4748 + 4749 + /* 4750 + * Conventional zone's capacity is always equal to zone size, 4751 + * so, blocks per segment is unchanged. 4752 + */ 4753 + if (is_conv_zone(sbi, zone_idx, dev_idx)) 4754 + return sbi->blocks_per_seg; 4755 + 4756 + if (!FDEV(dev_idx).zone_capacity_blocks) 4757 + return sbi->blocks_per_seg; 4758 + 4759 + sec_start_blkaddr = START_BLOCK(sbi, GET_SEG_FROM_SEC(sbi, secno)); 4760 + sec_cap_blkaddr = sec_start_blkaddr + 4761 + FDEV(dev_idx).zone_capacity_blocks[zone_idx]; 4762 + 4763 + /* 4764 + * If segment starts before zone capacity and spans beyond 4765 + * zone capacity, then usable blocks are from seg start to 4766 + * zone capacity. If the segment starts after the zone capacity, 4767 + * then there are no usable blocks. 
4768 + */ 4769 + if (seg_start >= sec_cap_blkaddr) 4770 + return 0; 4771 + if (seg_start + sbi->blocks_per_seg > sec_cap_blkaddr) 4772 + return sec_cap_blkaddr - seg_start; 4773 + 4774 + return sbi->blocks_per_seg; 4775 + } 4886 4776 #else 4887 4777 int f2fs_fix_curseg_write_pointer(struct f2fs_sb_info *sbi) 4888 4778 { ··· 4988 4688 { 4989 4689 return 0; 4990 4690 } 4691 + 4692 + static inline unsigned int f2fs_usable_zone_blks_in_seg(struct f2fs_sb_info *sbi, 4693 + unsigned int segno) 4694 + { 4695 + return 0; 4696 + } 4697 + 4698 + static inline unsigned int f2fs_usable_zone_segs_in_sec(struct f2fs_sb_info *sbi, 4699 + unsigned int segno) 4700 + { 4701 + return 0; 4702 + } 4991 4703 #endif 4704 + unsigned int f2fs_usable_blks_in_seg(struct f2fs_sb_info *sbi, 4705 + unsigned int segno) 4706 + { 4707 + if (f2fs_sb_has_blkzoned(sbi)) 4708 + return f2fs_usable_zone_blks_in_seg(sbi, segno); 4709 + 4710 + return sbi->blocks_per_seg; 4711 + } 4712 + 4713 + unsigned int f2fs_usable_segs_in_sec(struct f2fs_sb_info *sbi, 4714 + unsigned int segno) 4715 + { 4716 + if (f2fs_sb_has_blkzoned(sbi)) 4717 + return f2fs_usable_zone_segs_in_sec(sbi, segno); 4718 + 4719 + return sbi->segs_per_sec; 4720 + } 4992 4721 4993 4722 /* 4994 4723 * Update min, max modified time for cost-benefit GC algorithm ··· 5044 4715 sit_i->min_mtime = mtime; 5045 4716 } 5046 4717 sit_i->max_mtime = get_mtime(sbi, false); 4718 + sit_i->dirty_max_mtime = 0; 5047 4719 up_write(&sit_i->sentry_lock); 5048 4720 } 5049 4721 ··· 5160 4830 5161 4831 destroy_victim_secmap(sbi); 5162 4832 SM_I(sbi)->dirty_info = NULL; 5163 - kvfree(dirty_i); 4833 + kfree(dirty_i); 5164 4834 } 5165 4835 5166 4836 static void destroy_curseg(struct f2fs_sb_info *sbi) ··· 5172 4842 return; 5173 4843 SM_I(sbi)->curseg_array = NULL; 5174 4844 for (i = 0; i < NR_CURSEG_TYPE; i++) { 5175 - kvfree(array[i].sum_blk); 5176 - kvfree(array[i].journal); 4845 + kfree(array[i].sum_blk); 4846 + kfree(array[i].journal); 5177 4847 } 5178 - 
kvfree(array); 4848 + kfree(array); 5179 4849 } 5180 4850 5181 4851 static void destroy_free_segmap(struct f2fs_sb_info *sbi) ··· 5186 4856 SM_I(sbi)->free_info = NULL; 5187 4857 kvfree(free_i->free_segmap); 5188 4858 kvfree(free_i->free_secmap); 5189 - kvfree(free_i); 4859 + kfree(free_i); 5190 4860 } 5191 4861 5192 4862 static void destroy_sit_info(struct f2fs_sb_info *sbi) ··· 5198 4868 5199 4869 if (sit_i->sentries) 5200 4870 kvfree(sit_i->bitmap); 5201 - kvfree(sit_i->tmp_map); 4871 + kfree(sit_i->tmp_map); 5202 4872 5203 4873 kvfree(sit_i->sentries); 5204 4874 kvfree(sit_i->sec_entries); ··· 5210 4880 kvfree(sit_i->sit_bitmap_mir); 5211 4881 kvfree(sit_i->invalid_segmap); 5212 4882 #endif 5213 - kvfree(sit_i); 4883 + kfree(sit_i); 5214 4884 } 5215 4885 5216 4886 void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi) ··· 5226 4896 destroy_free_segmap(sbi); 5227 4897 destroy_sit_info(sbi); 5228 4898 sbi->sm_info = NULL; 5229 - kvfree(sm_info); 4899 + kfree(sm_info); 5230 4900 } 5231 4901 5232 4902 int __init f2fs_create_segment_manager_caches(void)
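Note on the zone-capacity helpers added above: the following is a minimal standalone sketch of the arithmetic in f2fs_usable_zone_blks_in_seg() and f2fs_usable_zone_segs_in_sec(), not the kernel code itself. The function names and plain-integer parameters are hypothetical stand-ins for values the kernel reads from sbi and FDEV(); the conventional-zone and NULL zone_capacity_blocks shortcuts are left out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for f2fs_usable_zone_blks_in_seg().
 * seg_start and sec_start are block addresses; zone_capacity_blocks is
 * the writable part of the zone that backs the section.
 */
static unsigned int usable_blks_in_seg(uint32_t seg_start, uint32_t sec_start,
				       uint32_t zone_capacity_blocks,
				       uint32_t blocks_per_seg)
{
	uint32_t sec_cap = sec_start + zone_capacity_blocks;

	/* Segment lies entirely beyond the zone capacity. */
	if (seg_start >= sec_cap)
		return 0;
	/* Segment straddles the capacity boundary: count up to it. */
	if (seg_start + blocks_per_seg > sec_cap)
		return sec_cap - seg_start;
	/* Segment lies entirely within the zone capacity. */
	return blocks_per_seg;
}

/*
 * Hypothetical stand-in for f2fs_usable_zone_segs_in_sec(): whole segments
 * beyond the zone capacity are subtracted from the section size.
 */
static unsigned int usable_segs_in_sec(uint32_t blocks_per_zone,
				       uint32_t zone_capacity_blocks,
				       unsigned int log_blocks_per_seg,
				       unsigned int segs_per_sec)
{
	unsigned int unusable = (blocks_per_zone - zone_capacity_blocks) >>
				log_blocks_per_seg;

	return segs_per_sec - unusable;
}
```

With assumed values of 512 blocks per segment, a 4096-block zone, and a zone capacity of 3300 blocks, the segment starting at block 3072 has 228 usable blocks, the segment starting at block 3584 has none, and the section counts 7 usable segments because the partially usable segment is still included.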
+51 -20
fs/f2fs/segment.h
··· 16 16 #define DEF_MAX_RECLAIM_PREFREE_SEGMENTS 4096 /* 8GB in maximum */ 17 17 18 18 #define F2FS_MIN_SEGMENTS 9 /* SB + 2 (CP + SIT + NAT) + SSA + MAIN */ 19 + #define F2FS_MIN_META_SEGMENTS 8 /* SB + 2 (CP + SIT + NAT) + SSA */ 19 20 20 21 /* L: Logical segment # in volume, R: Relative segment # in main area */ 21 22 #define GET_L2R_SEGNO(free_i, segno) ((segno) - (free_i)->start_segno) 22 23 #define GET_R2L_SEGNO(free_i, segno) ((segno) + (free_i)->start_segno) 23 24 24 25 #define IS_DATASEG(t) ((t) <= CURSEG_COLD_DATA) 25 - #define IS_NODESEG(t) ((t) >= CURSEG_HOT_NODE) 26 + #define IS_NODESEG(t) ((t) >= CURSEG_HOT_NODE && (t) <= CURSEG_COLD_NODE) 27 + 28 + static inline void sanity_check_seg_type(struct f2fs_sb_info *sbi, 29 + unsigned short seg_type) 30 + { 31 + f2fs_bug_on(sbi, seg_type >= NR_PERSISTENT_LOG); 32 + } 26 33 27 34 #define IS_HOT(t) ((t) == CURSEG_HOT_NODE || (t) == CURSEG_HOT_DATA) 28 35 #define IS_WARM(t) ((t) == CURSEG_WARM_NODE || (t) == CURSEG_WARM_DATA) ··· 41 34 ((seg) == CURSEG_I(sbi, CURSEG_COLD_DATA)->segno) || \ 42 35 ((seg) == CURSEG_I(sbi, CURSEG_HOT_NODE)->segno) || \ 43 36 ((seg) == CURSEG_I(sbi, CURSEG_WARM_NODE)->segno) || \ 44 - ((seg) == CURSEG_I(sbi, CURSEG_COLD_NODE)->segno)) 37 + ((seg) == CURSEG_I(sbi, CURSEG_COLD_NODE)->segno) || \ 38 + ((seg) == CURSEG_I(sbi, CURSEG_COLD_DATA_PINNED)->segno) || \ 39 + ((seg) == CURSEG_I(sbi, CURSEG_ALL_DATA_ATGC)->segno)) 45 40 46 41 #define IS_CURSEC(sbi, secno) \ 47 42 (((secno) == CURSEG_I(sbi, CURSEG_HOT_DATA)->segno / \ ··· 57 48 ((secno) == CURSEG_I(sbi, CURSEG_WARM_NODE)->segno / \ 58 49 (sbi)->segs_per_sec) || \ 59 50 ((secno) == CURSEG_I(sbi, CURSEG_COLD_NODE)->segno / \ 60 - (sbi)->segs_per_sec)) \ 51 + (sbi)->segs_per_sec) || \ 52 + ((secno) == CURSEG_I(sbi, CURSEG_COLD_DATA_PINNED)->segno / \ 53 + (sbi)->segs_per_sec) || \ 54 + ((secno) == CURSEG_I(sbi, CURSEG_ALL_DATA_ATGC)->segno / \ 55 + (sbi)->segs_per_sec)) 61 56 62 57 #define MAIN_BLKADDR(sbi) \ 63 58 (SM_I(sbi) ? 
SM_I(sbi)->main_blkaddr : \ ··· 145 132 * In the victim_sel_policy->alloc_mode, there are two block allocation modes. 146 133 * LFS writes data sequentially with cleaning operations. 147 134 * SSR (Slack Space Recycle) reuses obsolete space without cleaning operations. 135 + * AT_SSR (Age Threshold based Slack Space Recycle) merges fragments into 136 + * fragmented segment which has similar aging degree. 148 137 */ 149 138 enum { 150 139 LFS = 0, 151 - SSR 140 + SSR, 141 + AT_SSR, 152 142 }; 153 143 154 144 /* 155 145 * In the victim_sel_policy->gc_mode, there are two gc, aka cleaning, modes. 156 146 * GC_CB is based on cost-benefit algorithm. 157 147 * GC_GREEDY is based on greedy algorithm. 148 + * GC_AT is based on age-threshold algorithm. 158 149 */ 159 150 enum { 160 151 GC_CB = 0, 161 152 GC_GREEDY, 153 + GC_AT, 162 154 ALLOC_NEXT, 163 155 FLUSH_DEVICE, 164 156 MAX_GC_POLICY, ··· 192 174 unsigned int offset; /* last scanned bitmap offset */ 193 175 unsigned int ofs_unit; /* bitmap search unit */ 194 176 unsigned int min_cost; /* minimum cost */ 177 + unsigned long long oldest_age; /* oldest age of segments having the same min cost */ 195 178 unsigned int min_segno; /* segment # having min. cost */ 179 + unsigned long long age; /* mtime of GCed section*/ 180 + unsigned long long age_threshold;/* age threshold */ 196 181 }; 197 182 198 183 struct seg_entry { ··· 261 240 unsigned long long mounted_time; /* mount time */ 262 241 unsigned long long min_mtime; /* min. modification time */ 263 242 unsigned long long max_mtime; /* max. 
modification time */ 243 + unsigned long long dirty_min_mtime; /* rerange candidates in GC_AT */ 244 + unsigned long long dirty_max_mtime; /* rerange candidates in GC_AT */ 264 245 265 246 unsigned int last_victim[MAX_GC_POLICY]; /* last victim segment # */ 266 247 }; ··· 301 278 /* victim selection function for cleaning and SSR */ 302 279 struct victim_selection { 303 280 int (*get_victim)(struct f2fs_sb_info *, unsigned int *, 304 - int, int, char); 281 + int, int, char, unsigned long long); 305 282 }; 306 283 307 284 /* for active log information */ ··· 311 288 struct rw_semaphore journal_rwsem; /* protect journal area */ 312 289 struct f2fs_journal *journal; /* cached journal info */ 313 290 unsigned char alloc_type; /* current allocation type */ 291 + unsigned short seg_type; /* segment type like CURSEG_XXX_TYPE */ 314 292 unsigned int segno; /* current segment number */ 315 293 unsigned short next_blkoff; /* next block offset to write */ 316 294 unsigned int zone; /* current zone number */ 317 295 unsigned int next_segno; /* preallocated segment */ 296 + bool inited; /* indicate inmem log is inited */ 318 297 }; 319 298 320 299 struct sit_entry_set { ··· 330 305 */ 331 306 static inline struct curseg_info *CURSEG_I(struct f2fs_sb_info *sbi, int type) 332 307 { 333 - if (type == CURSEG_COLD_DATA_PINNED) 334 - type = CURSEG_COLD_DATA; 335 308 return (struct curseg_info *)(SM_I(sbi)->curseg_array + type); 336 309 } 337 310 ··· 434 411 unsigned int secno = GET_SEC_FROM_SEG(sbi, segno); 435 412 unsigned int start_segno = GET_SEG_FROM_SEC(sbi, secno); 436 413 unsigned int next; 414 + unsigned int usable_segs = f2fs_usable_segs_in_sec(sbi, segno); 437 415 438 416 spin_lock(&free_i->segmap_lock); 439 417 clear_bit(segno, free_i->free_segmap); ··· 442 418 443 419 next = find_next_bit(free_i->free_segmap, 444 420 start_segno + sbi->segs_per_sec, start_segno); 445 - if (next >= start_segno + sbi->segs_per_sec) { 421 + if (next >= start_segno + usable_segs) { 446 422 
clear_bit(secno, free_i->free_secmap); 447 423 free_i->free_sections++; 448 424 } ··· 462 438 } 463 439 464 440 static inline void __set_test_and_free(struct f2fs_sb_info *sbi, 465 - unsigned int segno) 441 + unsigned int segno, bool inmem) 466 442 { 467 443 struct free_segmap_info *free_i = FREE_I(sbi); 468 444 unsigned int secno = GET_SEC_FROM_SEG(sbi, segno); 469 445 unsigned int start_segno = GET_SEG_FROM_SEC(sbi, secno); 470 446 unsigned int next; 447 + unsigned int usable_segs = f2fs_usable_segs_in_sec(sbi, segno); 471 448 472 449 spin_lock(&free_i->segmap_lock); 473 450 if (test_and_clear_bit(segno, free_i->free_segmap)) { 474 451 free_i->free_segments++; 475 452 476 - if (IS_CURSEC(sbi, secno)) 453 + if (!inmem && IS_CURSEC(sbi, secno)) 477 454 goto skip_free; 478 455 next = find_next_bit(free_i->free_segmap, 479 456 start_segno + sbi->segs_per_sec, start_segno); 480 - if (next >= start_segno + sbi->segs_per_sec) { 457 + if (next >= start_segno + usable_segs) { 481 458 if (test_and_clear_bit(secno, free_i->free_secmap)) 482 459 free_i->free_sections++; 483 460 } ··· 525 500 return FREE_I(sbi)->free_segments; 526 501 } 527 502 528 - static inline int reserved_segments(struct f2fs_sb_info *sbi) 503 + static inline unsigned int reserved_segments(struct f2fs_sb_info *sbi) 529 504 { 530 505 return SM_I(sbi)->reserved_segments; 531 506 } ··· 557 532 558 533 static inline int reserved_sections(struct f2fs_sb_info *sbi) 559 534 { 560 - return GET_SEC_FROM_SEG(sbi, (unsigned int)reserved_segments(sbi)); 535 + return GET_SEC_FROM_SEG(sbi, reserved_segments(sbi)); 561 536 } 562 537 563 538 static inline bool has_curseg_enough_space(struct f2fs_sb_info *sbi) ··· 571 546 /* check current node segment */ 572 547 for (i = CURSEG_HOT_NODE; i <= CURSEG_COLD_NODE; i++) { 573 548 segno = CURSEG_I(sbi, i)->segno; 574 - left_blocks = sbi->blocks_per_seg - 575 - get_seg_entry(sbi, segno)->ckpt_valid_blocks; 549 + left_blocks = f2fs_usable_blks_in_seg(sbi, segno) - 550 + 
get_seg_entry(sbi, segno)->ckpt_valid_blocks; 576 551 577 552 if (node_blocks > left_blocks) 578 553 return false; ··· 580 555 581 556 /* check current data segment */ 582 557 segno = CURSEG_I(sbi, CURSEG_HOT_DATA)->segno; 583 - left_blocks = sbi->blocks_per_seg - 558 + left_blocks = f2fs_usable_blks_in_seg(sbi, segno) - 584 559 get_seg_entry(sbi, segno)->ckpt_valid_blocks; 585 560 if (dent_blocks > left_blocks) 586 561 return false; ··· 702 677 bool is_valid = test_bit_le(0, raw_sit->valid_map) ? true : false; 703 678 int valid_blocks = 0; 704 679 int cur_pos = 0, next_pos; 680 + unsigned int usable_blks_per_seg = f2fs_usable_blks_in_seg(sbi, segno); 705 681 706 682 /* check bitmap with valid block count */ 707 683 do { 708 684 if (is_valid) { 709 685 next_pos = find_next_zero_bit_le(&raw_sit->valid_map, 710 - sbi->blocks_per_seg, 686 + usable_blks_per_seg, 711 687 cur_pos); 712 688 valid_blocks += next_pos - cur_pos; 713 689 } else 714 690 next_pos = find_next_bit_le(&raw_sit->valid_map, 715 - sbi->blocks_per_seg, 691 + usable_blks_per_seg, 716 692 cur_pos); 717 693 cur_pos = next_pos; 718 694 is_valid = !is_valid; 719 - } while (cur_pos < sbi->blocks_per_seg); 695 + } while (cur_pos < usable_blks_per_seg); 720 696 721 697 if (unlikely(GET_SIT_VBLOCKS(raw_sit) != valid_blocks)) { 722 698 f2fs_err(sbi, "Mismatch valid blocks %d vs. %d", ··· 726 700 return -EFSCORRUPTED; 727 701 } 728 702 703 + if (usable_blks_per_seg < sbi->blocks_per_seg) 704 + f2fs_bug_on(sbi, find_next_bit_le(&raw_sit->valid_map, 705 + sbi->blocks_per_seg, 706 + usable_blks_per_seg) != sbi->blocks_per_seg); 707 + 729 708 /* check segment usage, and check boundary of a given segment number */ 730 - if (unlikely(GET_SIT_VBLOCKS(raw_sit) > sbi->blocks_per_seg 709 + if (unlikely(GET_SIT_VBLOCKS(raw_sit) > usable_blks_per_seg 731 710 || segno > TOTAL_SEGS(sbi) - 1)) { 732 711 f2fs_err(sbi, "Wrong valid blocks %d or segno %u", 733 712 GET_SIT_VBLOCKS(raw_sit), segno);
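The check_block_count() hunk above caps the bitmap walk at the usable block count and then asserts that no bit is set beyond it. The following is a minimal user-space sketch of that idea, not the kernel code: `test_bit_le8()` and `count_valid_blocks()` are hypothetical helpers, and the bit order within each byte is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: test bit `pos` in a byte-wise little-endian bitmap. */
static bool test_bit_le8(const unsigned char *map, size_t pos)
{
	return (map[pos / 8] >> (pos % 8)) & 1;
}

/*
 * Hypothetical helper: count the valid blocks in [0, usable).
 * Returns -1 if any bit is set in [usable, total), which mirrors the
 * "nothing may be valid past the zone capacity" assertion in the diff.
 */
static int count_valid_blocks(const unsigned char *map, size_t usable,
			      size_t total)
{
	int valid = 0;
	size_t i;

	for (i = 0; i < usable; i++)
		if (test_bit_le8(map, i))
			valid++;

	for (i = usable; i < total; i++)
		if (test_bit_le8(map, i))
			return -1;

	return valid;
}
```

A caller would compare the returned count with the count stored in the on-disk entry and treat a mismatch, or a negative return, as corruption.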
+123 -49
fs/f2fs/super.c
··· 146 146 Opt_compress_algorithm, 147 147 Opt_compress_log_size, 148 148 Opt_compress_extension, 149 + Opt_atgc, 149 150 Opt_err, 150 151 }; 151 152 ··· 214 213 {Opt_compress_algorithm, "compress_algorithm=%s"}, 215 214 {Opt_compress_log_size, "compress_log_size=%u"}, 216 215 {Opt_compress_extension, "compress_extension=%s"}, 216 + {Opt_atgc, "atgc"}, 217 217 {Opt_err, NULL}, 218 218 }; 219 219 ··· 582 580 case Opt_active_logs: 583 581 if (args->from && match_int(args, &arg)) 584 582 return -EINVAL; 585 - if (arg != 2 && arg != 4 && arg != NR_CURSEG_TYPE) 583 + if (arg != 2 && arg != 4 && 584 + arg != NR_CURSEG_PERSIST_TYPE) 586 585 return -EINVAL; 587 586 F2FS_OPTION(sbi).active_logs = arg; 588 587 break; ··· 871 868 #ifdef CONFIG_F2FS_FS_COMPRESSION 872 869 case Opt_compress_algorithm: 873 870 if (!f2fs_sb_has_compression(sbi)) { 874 - f2fs_err(sbi, "Compression feature if off"); 875 - return -EINVAL; 871 + f2fs_info(sbi, "Image doesn't support compression"); 872 + break; 876 873 } 877 874 name = match_strdup(&args[0]); 878 875 if (!name) ··· 897 894 break; 898 895 case Opt_compress_log_size: 899 896 if (!f2fs_sb_has_compression(sbi)) { 900 - f2fs_err(sbi, "Compression feature is off"); 901 - return -EINVAL; 897 + f2fs_info(sbi, "Image doesn't support compression"); 898 + break; 902 899 } 903 900 if (args->from && match_int(args, &arg)) 904 901 return -EINVAL; ··· 912 909 break; 913 910 case Opt_compress_extension: 914 911 if (!f2fs_sb_has_compression(sbi)) { 915 - f2fs_err(sbi, "Compression feature is off"); 916 - return -EINVAL; 912 + f2fs_info(sbi, "Image doesn't support compression"); 913 + break; 917 914 } 918 915 name = match_strdup(&args[0]); 919 916 if (!name) ··· 941 938 f2fs_info(sbi, "compression options not supported"); 942 939 break; 943 940 #endif 941 + case Opt_atgc: 942 + set_opt(sbi, ATGC); 943 + break; 944 944 default: 945 945 f2fs_err(sbi, "Unrecognized mount option \"%s\" or missing value", 946 946 p); ··· 967 961 if 
(f2fs_sb_has_casefold(sbi)) { 968 962 f2fs_err(sbi, 969 963 "Filesystem with casefold feature cannot be mounted without CONFIG_UNICODE"); 964 + return -EINVAL; 965 + } 966 + #endif 967 + /* 968 + * The BLKZONED feature indicates that the drive was formatted with 969 + * zone alignment optimization. This is optional for host-aware 970 + * devices, but mandatory for host-managed zoned block devices. 971 + */ 972 + #ifndef CONFIG_BLK_DEV_ZONED 973 + if (f2fs_sb_has_blkzoned(sbi)) { 974 + f2fs_err(sbi, "Zoned block device support is not enabled"); 970 975 return -EINVAL; 971 976 } 972 977 #endif ··· 1018 1001 } 1019 1002 1020 1003 /* Not pass down write hints if the number of active logs is lesser 1021 - * than NR_CURSEG_TYPE. 1004 + * than NR_CURSEG_PERSIST_TYPE. 1022 1005 */ 1023 1006 if (F2FS_OPTION(sbi).active_logs != NR_CURSEG_TYPE) 1024 1007 F2FS_OPTION(sbi).whint_mode = WHINT_MODE_OFF; ··· 1037 1020 1038 1021 /* Initialize f2fs-specific inode info */ 1039 1022 atomic_set(&fi->dirty_pages, 0); 1023 + atomic_set(&fi->i_compr_blocks, 0); 1040 1024 init_rwsem(&fi->i_sem); 1041 1025 spin_lock_init(&fi->i_size_lock); 1042 1026 INIT_LIST_HEAD(&fi->dirty_list); ··· 1202 1184 blkdev_put(FDEV(i).bdev, FMODE_EXCL); 1203 1185 #ifdef CONFIG_BLK_DEV_ZONED 1204 1186 kvfree(FDEV(i).blkz_seq); 1187 + kfree(FDEV(i).zone_capacity_blocks); 1205 1188 #endif 1206 1189 } 1207 1190 kvfree(sbi->devs); ··· 1288 1269 kfree(sbi->raw_super); 1289 1270 1290 1271 destroy_device_list(sbi); 1272 + f2fs_destroy_page_array_cache(sbi); 1291 1273 f2fs_destroy_xattr_caches(sbi); 1292 1274 mempool_destroy(sbi->write_io_dummy); 1293 1275 #ifdef CONFIG_QUOTA ··· 1300 1280 for (i = 0; i < NR_PAGE_TYPE; i++) 1301 1281 kvfree(sbi->write_io[i]); 1302 1282 #ifdef CONFIG_UNICODE 1303 - utf8_unload(sbi->s_encoding); 1283 + utf8_unload(sb->s_encoding); 1304 1284 #endif 1305 1285 kfree(sbi); 1306 1286 } ··· 1654 1634 #ifdef CONFIG_F2FS_FS_COMPRESSION 1655 1635 f2fs_show_compress_options(seq, sbi->sb); 1656 1636 
#endif 1637 + 1638 + if (test_opt(sbi, ATGC)) 1639 + seq_puts(seq, ",atgc"); 1657 1640 return 0; 1658 1641 } 1659 1642 1660 1643 static void default_options(struct f2fs_sb_info *sbi) 1661 1644 { 1662 1645 /* init some FS parameters */ 1663 - F2FS_OPTION(sbi).active_logs = NR_CURSEG_TYPE; 1646 + F2FS_OPTION(sbi).active_logs = NR_CURSEG_PERSIST_TYPE; 1664 1647 F2FS_OPTION(sbi).inline_xattr_size = DEFAULT_INLINE_XATTR_ADDRS; 1665 1648 F2FS_OPTION(sbi).whint_mode = WHINT_MODE_OFF; 1666 1649 F2FS_OPTION(sbi).alloc_mode = ALLOC_MODE_DEFAULT; ··· 1786 1763 bool no_extent_cache = !test_opt(sbi, EXTENT_CACHE); 1787 1764 bool disable_checkpoint = test_opt(sbi, DISABLE_CHECKPOINT); 1788 1765 bool no_io_align = !F2FS_IO_ALIGNED(sbi); 1766 + bool no_atgc = !test_opt(sbi, ATGC); 1789 1767 bool checkpoint_changed; 1790 1768 #ifdef CONFIG_QUOTA 1791 1769 int i, j; ··· 1859 1835 } 1860 1836 } 1861 1837 #endif 1838 + /* disallow enable atgc dynamically */ 1839 + if (no_atgc == !!test_opt(sbi, ATGC)) { 1840 + err = -EINVAL; 1841 + f2fs_warn(sbi, "switch atgc option is not allowed"); 1842 + goto restore_opts; 1843 + } 1844 + 1862 1845 /* disallow enable/disable extent_cache dynamically */ 1863 1846 if (no_extent_cache == !!test_opt(sbi, EXTENT_CACHE)) { 1864 1847 err = -EINVAL; ··· 2710 2679 } 2711 2680 2712 2681 if (main_end_blkaddr > seg_end_blkaddr) { 2713 - f2fs_info(sbi, "Wrong MAIN_AREA boundary, start(%u) end(%u) block(%u)", 2714 - main_blkaddr, 2715 - segment0_blkaddr + 2716 - (segment_count << log_blocks_per_seg), 2682 + f2fs_info(sbi, "Wrong MAIN_AREA boundary, start(%u) end(%llu) block(%u)", 2683 + main_blkaddr, seg_end_blkaddr, 2717 2684 segment_count_main << log_blocks_per_seg); 2718 2685 return true; 2719 2686 } else if (main_end_blkaddr < seg_end_blkaddr) { ··· 2729 2700 err = __f2fs_commit_super(bh, NULL); 2730 2701 res = err ? 
"failed" : "done"; 2731 2702 } 2732 - f2fs_info(sbi, "Fix alignment : %s, start(%u) end(%u) block(%u)", 2733 - res, main_blkaddr, 2734 - segment0_blkaddr + 2735 - (segment_count << log_blocks_per_seg), 2703 + f2fs_info(sbi, "Fix alignment : %s, start(%u) end(%llu) block(%u)", 2704 + res, main_blkaddr, seg_end_blkaddr, 2736 2705 segment_count_main << log_blocks_per_seg); 2737 2706 if (err) 2738 2707 return true; ··· 2741 2714 static int sanity_check_raw_super(struct f2fs_sb_info *sbi, 2742 2715 struct buffer_head *bh) 2743 2716 { 2744 - block_t segment_count, segs_per_sec, secs_per_zone; 2717 + block_t segment_count, segs_per_sec, secs_per_zone, segment_count_main; 2745 2718 block_t total_sections, blocks_per_seg; 2746 2719 struct f2fs_super_block *raw_super = (struct f2fs_super_block *) 2747 2720 (bh->b_data + F2FS_SUPER_OFFSET); ··· 2812 2785 } 2813 2786 2814 2787 segment_count = le32_to_cpu(raw_super->segment_count); 2788 + segment_count_main = le32_to_cpu(raw_super->segment_count_main); 2815 2789 segs_per_sec = le32_to_cpu(raw_super->segs_per_sec); 2816 2790 secs_per_zone = le32_to_cpu(raw_super->secs_per_zone); 2817 2791 total_sections = le32_to_cpu(raw_super->section_count); ··· 2826 2798 return -EFSCORRUPTED; 2827 2799 } 2828 2800 2829 - if (total_sections > segment_count || 2830 - total_sections < F2FS_MIN_SEGMENTS || 2801 + if (total_sections > segment_count_main || total_sections < 1 || 2831 2802 segs_per_sec > segment_count || !segs_per_sec) { 2832 2803 f2fs_info(sbi, "Invalid segment/section count (%u, %u x %u)", 2833 2804 segment_count, total_sections, segs_per_sec); 2805 + return -EFSCORRUPTED; 2806 + } 2807 + 2808 + if (segment_count_main != total_sections * segs_per_sec) { 2809 + f2fs_info(sbi, "Invalid segment/section count (%u != %u * %u)", 2810 + segment_count_main, total_sections, segs_per_sec); 2834 2811 return -EFSCORRUPTED; 2835 2812 } 2836 2813 ··· 2862 2829 if (segment_count != dev_seg_count) { 2863 2830 f2fs_info(sbi, "Segment count (%u) 
mismatch with total segments from devices (%u)", 2864 2831 segment_count, dev_seg_count); 2832 + return -EFSCORRUPTED; 2833 + } 2834 + } else { 2835 + if (__F2FS_HAS_FEATURE(raw_super, F2FS_FEATURE_BLKZONED) && 2836 + !bdev_is_zoned(sbi->sb->s_bdev)) { 2837 + f2fs_info(sbi, "Zoned block device path is missing"); 2865 2838 return -EFSCORRUPTED; 2866 2839 } 2867 2840 } ··· 2945 2906 ovp_segments = le32_to_cpu(ckpt->overprov_segment_count); 2946 2907 reserved_segments = le32_to_cpu(ckpt->rsvd_segment_count); 2947 2908 2948 - if (unlikely(fsmeta < F2FS_MIN_SEGMENTS || 2909 + if (unlikely(fsmeta < F2FS_MIN_META_SEGMENTS || 2949 2910 ovp_segments == 0 || reserved_segments == 0)) { 2950 2911 f2fs_err(sbi, "Wrong layout: check mkfs.f2fs version"); 2951 2912 return 1; ··· 3033 2994 cp_payload = __cp_payload(sbi); 3034 2995 if (cp_pack_start_sum < cp_payload + 1 || 3035 2996 cp_pack_start_sum > blocks_per_seg - 1 - 3036 - NR_CURSEG_TYPE) { 2997 + NR_CURSEG_PERSIST_TYPE) { 3037 2998 f2fs_err(sbi, "Wrong cp_pack_start_sum: %u", 3038 2999 cp_pack_start_sum); 3039 3000 return 1; ··· 3126 3087 } 3127 3088 3128 3089 #ifdef CONFIG_BLK_DEV_ZONED 3129 - static int f2fs_report_zone_cb(struct blk_zone *zone, unsigned int idx, 3130 - void *data) 3131 - { 3132 - struct f2fs_dev_info *dev = data; 3133 3090 3134 - if (zone->type != BLK_ZONE_TYPE_CONVENTIONAL) 3135 - set_bit(idx, dev->blkz_seq); 3091 + struct f2fs_report_zones_args { 3092 + struct f2fs_dev_info *dev; 3093 + bool zone_cap_mismatch; 3094 + }; 3095 + 3096 + static int f2fs_report_zone_cb(struct blk_zone *zone, unsigned int idx, 3097 + void *data) 3098 + { 3099 + struct f2fs_report_zones_args *rz_args = data; 3100 + 3101 + if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL) 3102 + return 0; 3103 + 3104 + set_bit(idx, rz_args->dev->blkz_seq); 3105 + rz_args->dev->zone_capacity_blocks[idx] = zone->capacity >> 3106 + F2FS_LOG_SECTORS_PER_BLOCK; 3107 + if (zone->len != zone->capacity && !rz_args->zone_cap_mismatch) 3108 + 
rz_args->zone_cap_mismatch = true; 3109 + 3136 3110 return 0; 3137 3111 } 3138 3112 ··· 3153 3101 { 3154 3102 struct block_device *bdev = FDEV(devi).bdev; 3155 3103 sector_t nr_sectors = bdev->bd_part->nr_sects; 3104 + struct f2fs_report_zones_args rep_zone_arg; 3156 3105 int ret; 3157 3106 3158 3107 if (!f2fs_sb_has_blkzoned(sbi)) ··· 3179 3126 if (!FDEV(devi).blkz_seq) 3180 3127 return -ENOMEM; 3181 3128 3182 - /* Get block zones type */ 3129 + /* Get block zones type and zone-capacity */ 3130 + FDEV(devi).zone_capacity_blocks = f2fs_kzalloc(sbi, 3131 + FDEV(devi).nr_blkz * sizeof(block_t), 3132 + GFP_KERNEL); 3133 + if (!FDEV(devi).zone_capacity_blocks) 3134 + return -ENOMEM; 3135 + 3136 + rep_zone_arg.dev = &FDEV(devi); 3137 + rep_zone_arg.zone_cap_mismatch = false; 3138 + 3183 3139 ret = blkdev_report_zones(bdev, 0, BLK_ALL_ZONES, f2fs_report_zone_cb, 3184 - &FDEV(devi)); 3140 + &rep_zone_arg); 3185 3141 if (ret < 0) 3186 3142 return ret; 3143 + 3144 + if (!rep_zone_arg.zone_cap_mismatch) { 3145 + kfree(FDEV(devi).zone_capacity_blocks); 3146 + FDEV(devi).zone_capacity_blocks = NULL; 3147 + } 3187 3148 3188 3149 return 0; 3189 3150 } ··· 3395 3328 static int f2fs_setup_casefold(struct f2fs_sb_info *sbi) 3396 3329 { 3397 3330 #ifdef CONFIG_UNICODE 3398 - if (f2fs_sb_has_casefold(sbi) && !sbi->s_encoding) { 3331 + if (f2fs_sb_has_casefold(sbi) && !sbi->sb->s_encoding) { 3399 3332 const struct f2fs_sb_encodings *encoding_info; 3400 3333 struct unicode_map *encoding; 3401 3334 __u16 encoding_flags; ··· 3426 3359 "%s-%s with flags 0x%hx", encoding_info->name, 3427 3360 encoding_info->version?:"\b", encoding_flags); 3428 3361 3429 - sbi->s_encoding = encoding; 3430 - sbi->s_encoding_flags = encoding_flags; 3362 + sbi->sb->s_encoding = encoding; 3363 + sbi->sb->s_encoding_flags = encoding_flags; 3431 3364 sbi->sb->s_d_op = &f2fs_dentry_ops; 3432 3365 } 3433 3366 #else ··· 3506 3439 sbi->s_chksum_seed = f2fs_chksum(sbi, ~0, raw_super->uuid, 3507 3440 
sizeof(raw_super->uuid)); 3508 3441 3509 - /* 3510 - * The BLKZONED feature indicates that the drive was formatted with 3511 - * zone alignment optimization. This is optional for host-aware 3512 - * devices, but mandatory for host-managed zoned block devices. 3513 - */ 3514 - #ifndef CONFIG_BLK_DEV_ZONED 3515 - if (f2fs_sb_has_blkzoned(sbi)) { 3516 - f2fs_err(sbi, "Zoned block device support is not enabled"); 3517 - err = -EOPNOTSUPP; 3518 - goto free_sb_buf; 3519 - } 3520 - #endif 3521 3442 default_options(sbi); 3522 3443 /* parse mount options */ 3523 3444 options = kstrdup((const char *)data, GFP_KERNEL); ··· 3620 3565 err = f2fs_init_xattr_caches(sbi); 3621 3566 if (err) 3622 3567 goto free_io_dummy; 3568 + err = f2fs_init_page_array_cache(sbi); 3569 + if (err) 3570 + goto free_xattr_cache; 3623 3571 3624 3572 /* get an inode for meta space */ 3625 3573 sbi->meta_inode = f2fs_iget(sb, F2FS_META_INO(sbi)); 3626 3574 if (IS_ERR(sbi->meta_inode)) { 3627 3575 f2fs_err(sbi, "Failed to read F2FS meta data inode"); 3628 3576 err = PTR_ERR(sbi->meta_inode); 3629 - goto free_xattr_cache; 3577 + goto free_page_array_cache; 3630 3578 } 3631 3579 3632 3580 err = f2fs_get_valid_checkpoint(sbi); ··· 3819 3761 } 3820 3762 3821 3763 reset_checkpoint: 3764 + f2fs_init_inmem_curseg(sbi); 3765 + 3822 3766 /* f2fs_recover_fsync_data() cleared this already */ 3823 3767 clear_sbi_flag(sbi, SBI_POR_DOING); 3824 3768 ··· 3905 3845 make_bad_inode(sbi->meta_inode); 3906 3846 iput(sbi->meta_inode); 3907 3847 sbi->meta_inode = NULL; 3848 + free_page_array_cache: 3849 + f2fs_destroy_page_array_cache(sbi); 3908 3850 free_xattr_cache: 3909 3851 f2fs_destroy_xattr_caches(sbi); 3910 3852 free_io_dummy: ··· 3918 3856 kvfree(sbi->write_io[i]); 3919 3857 3920 3858 #ifdef CONFIG_UNICODE 3921 - utf8_unload(sbi->s_encoding); 3859 + utf8_unload(sb->s_encoding); 3922 3860 #endif 3923 3861 free_options: 3924 3862 #ifdef CONFIG_QUOTA ··· 4028 3966 err = f2fs_create_extent_cache(); 4029 3967 if (err) 
4030 3968 goto free_checkpoint_caches; 4031 - err = f2fs_init_sysfs(); 3969 + err = f2fs_create_garbage_collection_cache(); 4032 3970 if (err) 4033 3971 goto free_extent_cache; 3972 + err = f2fs_init_sysfs(); 3973 + if (err) 3974 + goto free_garbage_collection_cache; 4034 3975 err = register_shrinker(&f2fs_shrinker_info); 4035 3976 if (err) 4036 3977 goto free_sysfs; ··· 4053 3988 err = f2fs_init_compress_mempool(); 4054 3989 if (err) 4055 3990 goto free_bioset; 3991 + err = f2fs_init_compress_cache(); 3992 + if (err) 3993 + goto free_compress_mempool; 4056 3994 return 0; 3995 + free_compress_mempool: 3996 + f2fs_destroy_compress_mempool(); 4057 3997 free_bioset: 4058 3998 f2fs_destroy_bioset(); 4059 3999 free_bio_enrty_cache: ··· 4072 4002 unregister_shrinker(&f2fs_shrinker_info); 4073 4003 free_sysfs: 4074 4004 f2fs_exit_sysfs(); 4005 + free_garbage_collection_cache: 4006 + f2fs_destroy_garbage_collection_cache(); 4075 4007 free_extent_cache: 4076 4008 f2fs_destroy_extent_cache(); 4077 4009 free_checkpoint_caches: ··· 4090 4018 4091 4019 static void __exit exit_f2fs_fs(void) 4092 4020 { 4021 + f2fs_destroy_compress_cache(); 4093 4022 f2fs_destroy_compress_mempool(); 4094 4023 f2fs_destroy_bioset(); 4095 4024 f2fs_destroy_bio_entry_cache(); ··· 4099 4026 unregister_filesystem(&f2fs_fs_type); 4100 4027 unregister_shrinker(&f2fs_shrinker_info); 4101 4028 f2fs_exit_sysfs(); 4029 + f2fs_destroy_garbage_collection_cache(); 4102 4030 f2fs_destroy_extent_cache(); 4103 4031 f2fs_destroy_checkpoint_caches(); 4104 4032 f2fs_destroy_segment_manager_caches();
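The sanity_check_raw_super() hunks above tighten the segment/section checks: the section count is now bounded by the main-area segment count, and the main area must be an exact multiple of the section size. Here is a small stand-alone sketch of those two conditions. `geometry_is_sane()` is a hypothetical name, and the 64-bit multiplication is a choice made for this sketch rather than something taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper modelled on the checks added in the diff.
 * All counts are in segments, except total_sections (in sections).
 */
static bool geometry_is_sane(uint32_t segment_count,
			     uint32_t segment_count_main,
			     uint32_t total_sections,
			     uint32_t segs_per_sec)
{
	/* Range checks, as in the first condition of the diff. */
	if (total_sections > segment_count_main || total_sections < 1 ||
	    segs_per_sec > segment_count || !segs_per_sec)
		return false;

	/*
	 * The main area must consist of whole sections. The product is
	 * widened to 64 bits here so it cannot wrap (sketch-only choice).
	 */
	return (uint64_t)segment_count_main ==
	       (uint64_t)total_sections * segs_per_sec;
}
```

For example, 512 sections of 2 segments each fit a 1024-segment main area, while 512 sections of 3 segments do not.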
+15 -7
fs/f2fs/sysfs.c
···
 		struct f2fs_sb_info *sbi, char *buf)
 {
 #ifdef CONFIG_UNICODE
+	struct super_block *sb = sbi->sb;
+
 	if (f2fs_sb_has_casefold(sbi))
 		return snprintf(buf, PAGE_SIZE, "%s (%d.%d.%d)\n",
-			sbi->s_encoding->charset,
-			(sbi->s_encoding->version >> 16) & 0xff,
-			(sbi->s_encoding->version >> 8) & 0xff,
-			sbi->s_encoding->version & 0xff);
+			sb->s_encoding->charset,
+			(sb->s_encoding->version >> 16) & 0xff,
+			(sb->s_encoding->version >> 8) & 0xff,
+			sb->s_encoding->version & 0xff);
 #endif
 	return sprintf(buf, "(none)");
 }
···
 		return count;
 	}
 	if (!strcmp(a->attr.name, "gc_idle")) {
-		if (t == GC_IDLE_CB)
+		if (t == GC_IDLE_CB) {
 			sbi->gc_mode = GC_IDLE_CB;
-		else if (t == GC_IDLE_GREEDY)
+		} else if (t == GC_IDLE_GREEDY) {
 			sbi->gc_mode = GC_IDLE_GREEDY;
-		else
+		} else if (t == GC_IDLE_AT) {
+			if (!sbi->am.atgc_enabled)
+				return -EINVAL;
+			sbi->gc_mode = GC_AT;
+		} else {
 			sbi->gc_mode = GC_NORMAL;
+		}
 		return count;
 	}
 
···
 	}
 	kobject_del(&sbi->s_kobj);
 	kobject_put(&sbi->s_kobj);
+	wait_for_completion(&sbi->s_kobj_unregister);
 }
+4 -4
fs/f2fs/xattr.c
···
 	if (is_inline)
 		kmem_cache_free(sbi->inline_xattr_slab, xattr_addr);
 	else
-		kvfree(xattr_addr);
+		kfree(xattr_addr);
 }
 
 static int f2fs_xattr_generic_get(const struct xattr_handler *handler,
···
 	*base_addr = txattr_addr;
 	return 0;
 fail:
-	kvfree(txattr_addr);
+	kfree(txattr_addr);
 	return err;
 }
 
···
 	}
 	error = buffer_size - rest;
 cleanup:
-	kvfree(base_addr);
+	kfree(base_addr);
 	return error;
 }
 
···
 	if (!error && S_ISDIR(inode->i_mode))
 		set_sbi_flag(F2FS_I_SB(inode), SBI_NEED_CP);
 exit:
-	kvfree(base_addr);
+	kfree(base_addr);
 	return error;
 }
 
+87
fs/libfs.c
···
 #include <linux/fs_context.h>
 #include <linux/pseudo_fs.h>
 #include <linux/fsnotify.h>
+#include <linux/unicode.h>
+#include <linux/fscrypt.h>
 
 #include <linux/uaccess.h>
 
···
 	return (inode->i_fop == &empty_dir_operations) &&
 		(inode->i_op == &empty_dir_inode_operations);
 }
+
+#ifdef CONFIG_UNICODE
+/*
+ * Determine if the name of a dentry should be casefolded.
+ *
+ * Return: if names will need casefolding
+ */
+static bool needs_casefold(const struct inode *dir)
+{
+	return IS_CASEFOLDED(dir) && dir->i_sb->s_encoding;
+}
+
+/**
+ * generic_ci_d_compare - generic d_compare implementation for casefolding filesystems
+ * @dentry:	dentry whose name we are checking against
+ * @len:	len of name of dentry
+ * @str:	str pointer to name of dentry
+ * @name:	Name to compare against
+ *
+ * Return: 0 if names match, 1 if mismatch, or -ERRNO
+ */
+int generic_ci_d_compare(const struct dentry *dentry, unsigned int len,
+			  const char *str, const struct qstr *name)
+{
+	const struct dentry *parent = READ_ONCE(dentry->d_parent);
+	const struct inode *dir = READ_ONCE(parent->d_inode);
+	const struct super_block *sb = dentry->d_sb;
+	const struct unicode_map *um = sb->s_encoding;
+	struct qstr qstr = QSTR_INIT(str, len);
+	char strbuf[DNAME_INLINE_LEN];
+	int ret;
+
+	if (!dir || !needs_casefold(dir))
+		goto fallback;
+	/*
+	 * If the dentry name is stored in-line, then it may be concurrently
+	 * modified by a rename.  If this happens, the VFS will eventually retry
+	 * the lookup, so it doesn't matter what ->d_compare() returns.
+	 * However, it's unsafe to call utf8_strncasecmp() with an unstable
+	 * string.  Therefore, we have to copy the name into a temporary buffer.
+	 */
+	if (len <= DNAME_INLINE_LEN - 1) {
+		memcpy(strbuf, str, len);
+		strbuf[len] = 0;
+		qstr.name = strbuf;
+		/* prevent compiler from optimizing out the temporary buffer */
+		barrier();
+	}
+	ret = utf8_strncasecmp(um, name, &qstr);
+	if (ret >= 0)
+		return ret;
+
+	if (sb_has_strict_encoding(sb))
+		return -EINVAL;
+fallback:
+	if (len != name->len)
+		return 1;
+	return !!memcmp(str, name->name, len);
+}
+EXPORT_SYMBOL(generic_ci_d_compare);
+
+/**
+ * generic_ci_d_hash - generic d_hash implementation for casefolding filesystems
+ * @dentry:	dentry of the parent directory
+ * @str:	qstr of name whose hash we should fill in
+ *
+ * Return: 0 if hash was successful or unchanged, and -EINVAL on error
+ */
+int generic_ci_d_hash(const struct dentry *dentry, struct qstr *str)
+{
+	const struct inode *dir = READ_ONCE(dentry->d_inode);
+	struct super_block *sb = dentry->d_sb;
+	const struct unicode_map *um = sb->s_encoding;
+	int ret = 0;
+
+	if (!dir || !needs_casefold(dir))
+		return 0;
+
+	ret = utf8_casefold_hash(um, dentry, str);
+	if (ret < 0 && sb_has_strict_encoding(sb))
+		return -EINVAL;
+	return 0;
+}
+EXPORT_SYMBOL(generic_ci_d_hash);
+#endif
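The control flow of generic_ci_d_compare() above is: try a case-insensitive comparison; if the input is invalid, fail in strict mode, otherwise fall back to an exact byte comparison. The sketch below reproduces only that flow in user space. It is an illustration under stated assumptions: `ascii_casecmp()` is a hypothetical ASCII-only stand-in for utf8_strncasecmp() that treats any byte at or above 0x80 as invalid, and `ci_compare()` is likewise a made-up name.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if any byte falls outside 7-bit ASCII. */
static bool has_non_ascii(const char *s, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if ((unsigned char)s[i] >= 0x80)
			return true;
	return false;
}

/*
 * Hypothetical stand-in for utf8_strncasecmp():
 * 0 on match, 1 on mismatch, -1 when either input is "invalid".
 */
static int ascii_casecmp(const char *a, size_t alen,
			 const char *b, size_t blen)
{
	size_t i;

	if (has_non_ascii(a, alen) || has_non_ascii(b, blen))
		return -1;
	if (alen != blen)
		return 1;
	for (i = 0; i < alen; i++)
		if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
			return 1;
	return 0;
}

/* Same shape as the diff: casefold compare, strict check, byte fallback. */
static int ci_compare(const char *str, size_t len,
		      const char *name, size_t name_len,
		      bool casefolded, bool strict)
{
	if (casefolded) {
		int ret = ascii_casecmp(name, name_len, str, len);

		if (ret >= 0)
			return ret;
		if (strict)
			return -EINVAL;
	}
	/* Fallback: exact byte-wise comparison. */
	if (len != name_len)
		return 1;
	return !!memcmp(str, name, len);
}
```

With these assumptions, "README" and "readme" match only in a casefolded directory, and an invalid name is rejected only when strict mode is on.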
+22 -1
fs/unicode/utf8-core.c
···
 #include <linux/parser.h>
 #include <linux/errno.h>
 #include <linux/unicode.h>
+#include <linux/stringhash.h>
 
 #include "utf8n.h"
 
···
 	}
 	return -EINVAL;
 }
-
 EXPORT_SYMBOL(utf8_casefold);
+
+int utf8_casefold_hash(const struct unicode_map *um, const void *salt,
+		       struct qstr *str)
+{
+	const struct utf8data *data = utf8nfdicf(um->version);
+	struct utf8cursor cur;
+	int c;
+	unsigned long hash = init_name_hash(salt);
+
+	if (utf8ncursor(&cur, data, str->name, str->len) < 0)
+		return -EINVAL;
+
+	while ((c = utf8byte(&cur))) {
+		if (c < 0)
+			return -EINVAL;
+		hash = partial_name_hash((unsigned char)c, hash);
+	}
+	str->hash = end_name_hash(hash);
+	return 0;
+}
+EXPORT_SYMBOL(utf8_casefold_hash);
 
 int utf8_normalize(const struct unicode_map *um, const struct qstr *str,
 		   unsigned char *dest, size_t dlen)
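utf8_casefold_hash() above feeds each case-folded byte into a running hash, so names that differ only in case land in the same hash bucket. The sketch below shows the same pattern in user space under loud assumptions: ASCII tolower() stands in for the kernel's Unicode casefolding, FNV-1a stands in for the kernel's name hash, and `casefold_hash()` and `same_folded_hash()` are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: hash the case-folded form of `name`.
 * Returns 0 and stores the hash in *out, or -1 on a byte this sketch
 * cannot fold (anything outside 7-bit ASCII).
 */
static int casefold_hash(const char *name, size_t len, uint32_t salt,
			 uint32_t *out)
{
	uint32_t hash = 2166136261u ^ salt;	/* FNV-1a offset basis, salted */
	size_t i;

	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)name[i];

		if (c >= 0x80)
			return -1;
		hash ^= (uint32_t)tolower(c);	/* fold, then mix in */
		hash *= 16777619u;		/* FNV-1a prime */
	}
	*out = hash;
	return 0;
}

/* Hypothetical convenience wrapper: do two names hash identically? */
static bool same_folded_hash(const char *a, const char *b)
{
	uint32_t ha, hb;

	if (casefold_hash(a, strlen(a), 0, &ha) < 0 ||
	    casefold_hash(b, strlen(b), 0, &hb) < 0)
		return false;
	return ha == hb;
}
```

Hashing the folded bytes directly, rather than building a folded copy of the string first, avoids a temporary buffer, which appears to be the point of the helper added in the diff.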
-3
include/linux/f2fs_fs.h
···
 #define F2FS_MAX_QUOTAS 3
 
 #define F2FS_ENC_UTF8_12_1 1
-#define F2FS_ENC_STRICT_MODE_FL (1 << 0)
-#define f2fs_has_strict_mode(sbi) \
-	(sbi->s_encoding_flags & F2FS_ENC_STRICT_MODE_FL)
 
 #define F2FS_IO_SIZE(sbi) (1 << F2FS_OPTION(sbi).write_io_size_bits) /* Blocks */
 #define F2FS_IO_SIZE_KB(sbi) (1 << (F2FS_OPTION(sbi).write_io_size_bits + 2)) /* KB */
+16
include/linux/fs.h
···
 #define SB_ACTIVE (1<<30)
 #define SB_NOUSER (1<<31)
 
+/* These flags relate to encoding and casefolding */
+#define SB_ENC_STRICT_MODE_FL (1 << 0)
+
+#define sb_has_strict_encoding(sb) \
+	(sb->s_encoding_flags & SB_ENC_STRICT_MODE_FL)
+
 /*
  * Umount options
  */
···
 #endif
 #ifdef CONFIG_FS_VERITY
 	const struct fsverity_operations *s_vop;
+#endif
+#ifdef CONFIG_UNICODE
+	struct unicode_map *s_encoding;
+	__u16 s_encoding_flags;
 #endif
 	struct hlist_bl_head s_roots; /* alternate root dentries for NFS */
 	struct list_head s_mounts; /* list of mounts; _not_ for fs use */
···
 extern int generic_file_fsync(struct file *, loff_t, loff_t, int);
 
 extern int generic_check_addressable(unsigned, u64);
+
+#ifdef CONFIG_UNICODE
+extern int generic_ci_d_hash(const struct dentry *dentry, struct qstr *str);
+extern int generic_ci_d_compare(const struct dentry *dentry, unsigned int len,
+				const char *str, const struct qstr *name);
+#endif
 
 #ifdef CONFIG_MIGRATION
 extern int buffer_migrate_page(struct address_space *,
+3
include/linux/unicode.h
···
 int utf8_casefold(const struct unicode_map *um, const struct qstr *str,
 		  unsigned char *dest, size_t dlen);
 
+int utf8_casefold_hash(const struct unicode_map *um, const void *salt,
+		       struct qstr *str);
+
 struct unicode_map *utf8_load(const char *version);
 void utf8_unload(struct unicode_map *um);
 
+6 -4
include/trace/events/f2fs.h
···
 
 #define show_alloc_mode(type) \
 	__print_symbolic(type, \
-		{ LFS, "LFS-mode" }, \
-		{ SSR, "SSR-mode" })
+		{ LFS, "LFS-mode" }, \
+		{ SSR, "SSR-mode" }, \
+		{ AT_SSR, "AT_SSR-mode" })
 
 #define show_victim_policy(type) \
 	__print_symbolic(type, \
 		{ GC_GREEDY, "Greedy" }, \
-		{ GC_CB, "Cost-Benefit" })
+		{ GC_CB, "Cost-Benefit" }, \
+		{ GC_AT, "Age-threshold" })
 
 #define show_cpreason(type) \
 	__print_flags(type, "|", \
···
 	__print_symbolic(type, \
 		{ CP_NO_NEEDED, "no needed" }, \
 		{ CP_NON_REGULAR, "non regular" }, \
-		{ CP_COMPRESSED, "compreesed" }, \
+		{ CP_COMPRESSED, "compressed" }, \
 		{ CP_HARDLINK, "hardlink" }, \
 		{ CP_SB_NEED_CP, "sb needs cp" }, \
 		{ CP_WRONG_PINO, "wrong pino" }, \